How to recover from a broken RAID5
How GNU/Linux saved our data
- 2005-08-22
- Server side | Intermediate
In this article I will describe an experience I had that began with the failure of some RAID5 disks at the Hospital of Pediatric Specialties, where I work. While I wouldn’t wish such an event on my worst enemy, it taught me about the power of knowledge: the deep knowledge that is so important in the hacking culture.
Friday, April 29, 2005
A 5-disk (18GB each) RAID5 was mounted on an HP Netserver Rack Storage/12. After a power outage yesterday, the rack would no longer recognize the RAID. As a matter of fact, there were two more RAIDs on the rack that were recovered… but this one (holding about 60GB of data) just wouldn’t work.
The IT manager decided to call in some “gurus” to try to get the data back on-line. I (the only GNU/Linux user at the IT department) thought that something could be done with GNU/Linux. My first thought was: “If I get images of the separate disks, maybe I can start a software RAID on GNU/Linux. All I need is enough disk space to handle all of the images”. I told my crazy (so far) idea to the IT manager and he decided to give it a try… but only once the gurus gave up.
Monday, May 2, 2005
The gurus are still trying to get the data back on-line.
Tuesday, May 3, 2005
The gurus are still trying to get the data back on-line.
Wednesday, May 4, 2005
These guys are stubborn, aren’t they?
Thursday, May 5, 2005
The IT manager called me late in the afternoon. I was given the chance to Save the Republic. One of the disks of the array had been removed. I put the remaining disks on a computer as separate disks (no RAID) and booted with Knoppix. (The environment of the IT department is Windows based, apart from my desktop, which has the XP that came with the HP box and Mandriva, which is what it normally runs.) Then I made images of the four disks left from the original five:
# for i in a b c d; do dd if=/dev/sd$i of=image$i.dat bs=4k; done
I put all the image files on a single hard disk and left the office.
Friday, May 6, 2005
I wanted to start a software RAID, fooling the kernel into thinking that the files were HDs. Just having the images was not enough to bring the RAID on-line. RAID5 has a number of options: the algorithm (left/right parity, synchronous/asynchronous), the chunk (stripe unit) size and, most important, the order of the images in the RAID. I had to tell the kernel how the RAID controller had laid them out so it could replicate the RAID.
I had already been given the hint that the chunks were 64KB long. By the end of the day, the software RAID idea hadn’t worked at all. I started thinking about rebuilding the data the “hard” way: making a single image of the RAID from the separate images.
Weekend, May 7 and May 8, 2005
I did some research during the weekend, plus a little study of the images. The images didn’t look encrypted at all. The first “chunk” of each of the four images looked like garbage, but one of the disks showed a partition table right on the second chunk, and the other chunks appeared to hold other kinds of data:
# fdisk -lu discoa1
You must set cylinders.
You can do this from the extra functions menu.

Disk discoa1: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
discoa1p1              63   142175249    71087593+   7  HPFS/NTFS
Partition 1 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(8849, 254, 63)
fdisk was complaining because it was a 64KB file, not the expected 72GB one (written in the partition table). I studied the images and noticed that the data chunks and the parity chunks were distinguishable from each other, and that they seemed to follow a plain RAID5 distribution and algorithm… I was hopeful.
| Disk 1 | Disk 2 | Disk 3 | Disk 4 | Disk 5 |
| 1 | 2 | 3 | 4 | P |
| 5 | 6 | 7 | P | 8 |
| 9 | 10 | P | 11 | 12 |
| 13 | P | 14 | 15 | 16 |
| P | 17 | 18 | 19 | 20 |
| 21 | 22 | 23 | 24 | P |
| 25 | 26 | 27 | P | 28 |
Table 1 - RAID5’s chunk disposition (in a 5-disk array)
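The layout in Table 1 can be expressed as a small mapping from logical chunk number to disk. The following is only a sketch under the assumption that the array really follows the table (parity starting on the last disk and moving one disk to the left per row, data filling the row left to right while skipping the parity disk; Linux software RAID calls this arrangement left-asymmetric). The class and method names are hypothetical, not taken from the original rebuild tool.

```java
// Hypothetical helper: maps chunks to disks for the layout shown in Table 1.
final class Raid5Layout {

    /** Zero-based index of the disk that holds parity in the given row. */
    static int parityDisk(long row, int disks) {
        return (int) (disks - 1 - (row % disks));
    }

    /** Zero-based index of the disk that holds logical data chunk {@code chunk} (zero-based). */
    static int dataDisk(long chunk, int disks) {
        long row = chunk / (disks - 1);          // each row carries disks-1 data chunks
        int slot = (int) (chunk % (disks - 1));  // position among the data chunks of that row
        // Data fills the row from left to right, skipping the parity disk.
        return slot < parityDisk(row, disks) ? slot : slot + 1;
    }

    public static void main(String[] args) {
        int disks = 5;
        // Print the first seven rows in the same form as Table 1.
        for (long row = 0; row < 7; row++) {
            String[] cells = new String[disks];
            cells[parityDisk(row, disks)] = "P";
            long first = row * (disks - 1);
            for (long c = first; c < first + disks - 1; c++) {
                cells[dataDisk(c, disks)] = Long.toString(c + 1); // Table 1 counts from 1
            }
            System.out.println(String.join(" | ", cells));
        }
    }
}
```

Running `main` prints the rows of Table 1, which makes it easy to compare a guessed layout against what the images actually show.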
I wrote a Java class that could rebuild the RAID content from the separate images (had I used C/C++, I would still be coding!). It was all about placing the right chunk from the right disk (image of disk) at the right place in the final image. I was missing one image, but its content could be calculated with the help of the parity chunks spread over all the disks (see Textbox 1). The class was no big deal: selecting the right chunks from the disks, and using XORs to calculate the missing chunks. I guess it took about three or four hours at most to code it. I was finally ready to give it a try. The problem I hit was that while testing the software RAID at home I had damaged the images. So I had to wait until Monday to test the class with the images of the RAID.
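The XOR step described above can be sketched in a few lines. This is not the original class, only a minimal illustration with hypothetical names: in RAID5 the parity chunk of a row is the XOR of its data chunks, so XORing the four surviving chunks of a row (data and parity alike) yields the chunk of the missing disk.

```java
import java.util.Arrays;

// Hypothetical illustration of rebuilding the chunk of a missing disk.
final class MissingChunk {

    /** XORs the surviving chunks of one row; the result is the missing disk's chunk. */
    static byte[] rebuild(byte[][] survivors) {
        byte[] out = new byte[survivors[0].length];
        for (byte[] chunk : survivors) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= chunk[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny two-byte "chunks" standing in for the real 64KB ones.
        byte[] d1 = {0x10, 0x20};
        byte[] d2 = {0x01, 0x02};
        byte[] d3 = {0x0F, 0x0E};
        byte[] d4 = {0x55, 0x33};
        byte[] parity = rebuild(new byte[][] {d1, d2, d3, d4});

        // Pretend the disk holding d3 is gone and recover it from the rest.
        byte[] recovered = rebuild(new byte[][] {d1, d2, d4, parity});
        System.out.println(Arrays.equals(recovered, d3)); // prints "true"
    }
}
```

The same routine serves both purposes: it computes parity from data, and it recovers any single missing chunk from the others.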
Download the files attached with this article.
Copyright information
Verbatim copying and distribution of this entire article is permitted in any medium without royalty provided this notice is preserved.
Biography
Edmundo Carmona: Edmundo is a Venezuelan Computer Engineer. He recently started working as a freelance Java developer in Colombia. He has also been a GNU/Linux user and consultant for several years. After years of being retired from music, he is now working to regain his classical flute skills.
Comments from old system
Submitted by admin on Thu, 2006-03-30 06:54.
From: MAd MAco
Url:
Date: 2005-08-10
Subject: Don't fight against paleo-toys, buy a new one!
OK... you are fighting against this Baby-Dino-Disk... but have you ever thought about buying a cheap laptop with a 200GB HDD??? All these toys in your server hold less info than my digi-cam!!!
...please, don't be selfish... buy a real one!
From: melissa
Url:
Date: 2005-08-10
Subject: :)
although i'd say 89% of this article made no sense to me.....je je je.....i must say....nice work, sir ;)
From: Tormak
Url: lineak.sourceforge.net
Date: 2005-08-26
Subject: Device order?
How did you determine the order that the disks were in?
From: Redbox
Url: www.pv.com.pl
Date: 2005-08-27
Subject: I must say...
GREAT JOB!!!
From: Moxy
Url:
Date: 2005-08-27
Subject: More info required.
How about publishing your Java class to see how you analysed and rebuilt the RAID image?
From: rich gregory
Url:
Date: 2005-08-28
Subject: disk based backup
Here is a simple way to add a disk based incremental backup
to a production file system.
http://www.people.virginia.edu/~rtg2t/samba/system.admin.html#backup
It MUST be integrated with a full backup system to tape or disk.
It is a simple way to use an older PC and an 80-120GB IDE drive to give sys admins of big RAID systems some peace of
mind.
cheers,
rich
From: Adrin
Url:
Date: 2005-10-03
Subject: RAID 5 failure, maybe you should look at RAID 1
While it is great that you got the data back, I have some questions. The RAID died on April 29, and you didn't get it back until May 15. I hope the practice wasn't down that long.
I hope you were able to restore from backup and get them going in the meantime.
Perhaps you should think about the RAID setup. While RAID 5 is great for making one large disk out of smaller disks, you are screwed when there is a major failure. Perhaps you should think about a mirrored RAID. Yes, you lose a lot of space, but one disaster and it is paid for.
You brought up the NTFS partition in your article. Another reason for me to hate Windows crash recovery. I have yet to see a good disk recovery tool like the one I use on Unix, but this is not the place to spam it, and it is not free.
lol 18 gig drives, those
Submitted by Anonymous visitor on Wed, 2006-09-13 23:26.
lol 18 gig drives, those would have to be old... I'd go ahead and replace the server or at the very _minimum_ implement a backup system of some kind. Although with a server that old, you could be looking at other hardware failing in the future, so you'd want to make sure to back up the data to a medium that another machine will definitely read. For example, don't trust old HP DAT tape drives!!
Answering the comments (after 1+ year)
Submitted by Edmundo Carmona on Sun, 2006-12-17 16:35.
I see there were some comments posted.... I think I replied to some in due time, but I'm not so sure... my memory sucks.
I can see there were some people talking about getting new hardware. Of course, that's one answer to the problem.... but when you are working at a public institution (in Venezuela, should I add) things are not that simple (are they anywhere?). We have to work on a very very very tight budget and crack our heads to get the most out of everything we have at hand (even dino-hardware).
No, the practice wasn't affected for that long. The main system of the hospital (though it was affected) was restored in basically no time. The data that was affected was "Users' Docs". Users had local copies of documents... but perhaps not all documents... and certainly not old documents. They carried on with what they had at hand at the time and waited for us to (hopefully) get the whole thing back.
Coming to the more technical side of the article: how did I (we) determine the order of the disks? That took a little binary math. When we studied the images, we noticed markers that could be read at fixed positions in different chunks of the disk images (things like the string "FILE*", which is part of the structure of the NTFS partition). There were 5 disks in the array. For every chunk "row", 4 disks would hold data and one disk would hold the XOR checksum (the parity) of the other 4 chunks (remember that the checksums are spread among all the disks). If you calculate the XOR of 4 copies of the same value (like "FILE*") you get a beautiful 0 for each byte, so we were able to see where the data chunks and the checksum chunks were. Luckily there were 5 disks. Had there been an even number of disks, it would have taken a little more trickery, because the XOR of an odd number of copies of "FILE*" is "FILE*" itself.
We already knew where the partition table of the disk was (so, the first data chunk) and could see where that image had its first checksum chunk. Together with the positions of the checksums on the other disks, that tells you which algorithm was applied and the order of the disks. I'm wondering if that was clear enough. :-? If you have questions, ask Tony for my email address. I'm sure he will kindly provide it. ;-)
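The marker trick can be sketched as follows. This is an illustration with hypothetical names, not the code we used: it assumes that, at a known offset, every data chunk of a row carries the same marker, so the chunk holding zeros at that offset is the checksum chunk. With only four of the five images available, a row in which no chunk shows zeros would mean the checksum sat on the missing disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of spotting the checksum (parity) chunk in one row.
final class ParityFinder {
    static final byte[] MARKER = "FILE*".getBytes(StandardCharsets.US_ASCII);

    /** XORs the given number of copies of the marker together. */
    static byte[] xorCopies(int copies) {
        byte[] out = new byte[MARKER.length];
        for (int n = 0; n < copies; n++) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= MARKER[i];
            }
        }
        return out;
    }

    /**
     * Returns the index of the one chunk that holds zeros at the offset while
     * all other chunks hold the marker, or -1 if the row does not fit that pattern.
     */
    static int findParityChunk(byte[][] row, int offset) {
        byte[] zeros = new byte[MARKER.length];
        int parity = -1;
        for (int d = 0; d < row.length; d++) {
            byte[] window = Arrays.copyOfRange(row[d], offset, offset + MARKER.length);
            if (Arrays.equals(window, MARKER)) {
                continue; // a data chunk
            }
            if (!Arrays.equals(window, zeros) || parity != -1) {
                return -1; // unexpected content, or more than one candidate
            }
            parity = d;
        }
        return parity;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(xorCopies(4))); // [0, 0, 0, 0, 0]
        System.out.println(new String(xorCopies(3), StandardCharsets.US_ASCII)); // FILE*

        byte[] data = Arrays.copyOf(MARKER, 8);  // marker followed by zero padding
        byte[] parity = new byte[8];             // XOR of four identical chunks is all zeros
        byte[][] row = {data, data, parity, data, data};
        System.out.println(findParityChunk(row, 0)); // 2
    }
}
```

Repeating the check over consecutive rows shows how the checksum moves from disk to disk, which is what reveals both the algorithm and the disk order.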
The source code? It's there, right? It's a Free Software Magazine after all, isn't it? ;-)
Cheers!