How to recover from a broken RAID5


In this article I describe an experience that began with the failure of some RAID5 disks at the Hospital of Pediatric Specialties, where I work. While I wouldn’t wish such an event on my worst enemy, it taught me about the power of knowledge: the deep knowledge that is so important in hacking culture.

Friday, April 29, 2005

This article has downloads!

A 5-disk RAID5 (18GB per disk) was mounted on an HP Netserver Rack Storage/12. After a power outage yesterday, the RAID was no longer recognized. In fact, there were two more RAIDs on the rack that were recovered... but this one (holding about 60GB of data) just wouldn’t work.

The IT manager decided to call in some “gurus” to try to get the data back on-line. I (the only GNU/Linux user at the IT department) thought that something could be done with GNU/Linux. My first thought was: “If I get images of the separate disks, maybe I can start a software RAID on GNU/Linux. All I need is enough disk space to handle all of the images”. I told my crazy (so far) idea to the IT manager and he decided to give it a try... but only once the gurus gave up.

Monday, May 2, 2005

The gurus are still trying to get the data back on-line.

Tuesday, May 3, 2005

The gurus are still trying to get the data back on-line.

Wednesday, May 4, 2005

These guys are stubborn, aren’t they?

Thursday, May 5, 2005

The IT manager called me late in the afternoon. I was given the chance to Save the Republic. One of the disks of the array had been removed. I connected the remaining disks to a computer as separate disks (no RAID) and booted Knoppix. (The IT department’s environment is Windows based, apart from my desktop, which has the XP that came with the HP box plus Mandriva, which is what it normally runs.) I then made four images, one for each of the four disks left from the original five:

# for i in a b c d; do dd if=/dev/sd$i of=image$i.dat bs=4k; done

I put all the image files on a single HD and left the office.

Friday, May 6, 2005

I wanted to start a software RAID, fooling the kernel into thinking that the files were HDs. Just having the images was not enough to bring the RAID on-line. RAID5 has a number of options: the algorithm (left/right parity, symmetric/asymmetric), the chunk (strip) size and, most important, the order of the images in the RAID. I had to tell the kernel how the RAID controller had arranged them so it could replicate the RAID.

I had already been given the hint that the chunks were 64KB long. By the end of the day, the software RAID idea hadn’t worked at all. I started thinking about rebuilding the data the “hard” way: making a single image of the RAID from the separate images.

Weekend, May 7 and May 8, 2005

I did some research during the weekend, plus a little study of the images. The images didn’t look encrypted at all. The first “chunk” of each of the four images looked like garbage, but one of the disks showed a partition table right at the start of the second chunk, and the other chunks appeared to hold other kinds of data:

# fdisk -lu discoa1
You must set cylinders.
You can do this from the extra functions menu.

Disk discoa1: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes

  Device Boot   Start     End   Blocks  Id System
discoa1p1       63  142175249  71087593+  7 HPFS/NTFS
Partition 1 has different physical/logical endings:
   phys=(1023, 254, 63) logical=(8849, 254, 63)

fdisk complained because the file was only 64KB long, not the 72GB disk described in the partition table. I studied the images and noticed that the data chunks and the parity chunks were distinguishable from each other, and that they seemed to follow a plain RAID5 distribution and algorithm... I was hopeful.

Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
  1       2       3       4       P
  5       6       7       P       8
  9      10       P      11      12
 13       P      14      15      16
  P      17      18      19      20
 21      22      23      24       P
 25      26      27       P      28

Table 1 - RAID5’s chunk disposition (in a 5-disk array)
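The rotation shown in Table 1 can be reproduced with a few lines of code. The following Python sketch is only an illustration: the function name `table1_row` is made up for this example, and it is not the Java class described in this article. It assumes the parity chunk starts on the last disk and moves one disk to the left on every row, while the data chunks fill the remaining disks from left to right.

```python
def table1_row(row, n_disks=5):
    """Return one row of Table 1 as a list of cell labels.

    row is 0-based; data chunks are numbered from 1, as in the table.
    """
    # Parity starts on the last disk and moves one disk left per row.
    parity_disk = (n_disks - 1) - (row % n_disks)
    next_chunk = row * (n_disks - 1) + 1  # first data chunk of this row
    cells = []
    for disk in range(n_disks):
        if disk == parity_disk:
            cells.append("P")
        else:
            cells.append(str(next_chunk))
            next_chunk += 1
    return cells


for row in range(7):
    print(" ".join(table1_row(row)))
```

Run as is, the loop prints the seven rows of Table 1. Changing `n_disks` shows how the same rotation would look on an array of a different size.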

I wrote a Java class that could rebuild the RAID content from the separate images (had I used C/C++, I would still be coding!). It was all about placing the right chunk from the right disk (image of disk) at the right place in the final image. I was missing one image, but its content could be calculated with the help of the parity chunks spread all over the disks (see Textbox 1). The class was no big deal: it selected the right chunks from the disks and used XORs to calculate the missing chunks. I guess it took about three or four hours at most to code it. I was finally ready to give it a try. The problem I hit was that, while testing the software RAID at home, I had damaged the images. So I had to wait until Monday to test the class with the images of the RAID.

RAID stands for Redundant Array of Independent Disks. All it does is make a number of disks “look” like they are one to improve throughput or fault-tolerance. There are a number of ways to put them together. Some of them are:

Mirroring: in this case, each disk has exactly the same content. Size of the array: the size of the smallest disk. Redundancy: At least one disk must keep working for the data to remain intact.

Linear: one disk follows the other. The size of each disk doesn’t matter at all. Size of the array: the sum of the size of the separate disks. Redundancy: If you remove one disk, you will lose the information on that disk and potentially all of the data in the array.

RAID5: The information is spread across all of the available devices in a manner different from linear. Size of the array: the size of the smallest disk multiplied by the number of disks minus one. Redundancy: At most one disk can be removed/replaced from the array without data loss. Instead of having disks that follow each other, the information is written in “chunks” of data, one disk at a time (see Table 1).

In Table 1, the numbers represent the order in which the chunks are written on the disks (in this example, it’s left parity, asymmetric). There is one parity chunk for every n – 1 chunks of data, where n is the number of disks. That is done for redundancy.

It works like this: parity is calculated by XORing the n – 1 chunks of data in a row. This logical operator has a very interesting property for redundancy. If you remove one of the data chunks and use the parity chunk instead for the XOR operation, you will get the missing chunk of data:

a xor b xor c xor d xor e = p

If you remove c, then:

a xor b xor d xor e xor p = c

What does this mean for the RAID? It means that if you remove a whole disk from the array, the RAID can still work... though with a little overhead to calculate the missing chunks. Furthermore, if you replace a missing disk with a new one, the data that was in the removed disk can be rewritten to the new disk. There will be no loss of data (provided that no more than a single disk is missing at any given moment).
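The XOR property described above is easy to check. Here is a minimal Python sketch (the helper `xor_chunks` and the sample byte strings are invented for this illustration) that computes a parity chunk from five small data chunks and then recovers chunk c from the other four plus the parity.

```python
def xor_chunks(chunks):
    """XOR equally sized byte strings together, byte by byte."""
    result = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            result[i] ^= byte
    return bytes(result)


# Five tiny "data chunks" and their parity: a xor b xor c xor d xor e = p
a, b, c, d, e = b"alfa", b"beta", b"cafe", b"dead", b"echo"
p = xor_chunks([a, b, c, d, e])

# Remove c and use the parity instead: a xor b xor d xor e xor p = c
recovered = xor_chunks([a, b, d, e, p])
print(recovered == c)  # True
```

The same call works for any of the chunks, which is why it does not matter which disk of the array goes missing.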

The process of making a RAID image wasn’t complicated. I started the Java class by telling it the conditions of the run: algorithm, images, order of the disks, chunk size, skipped chunks (remember there were 64KB of garbage at the beginning of every image), and output file.
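To make the process concrete, here is a rough Python sketch of such a rebuild. It is not the Java class from this article: the names `rebuild_image` and `xor_chunks` are hypothetical, and the code assumes the layout of Table 1 (parity starting on the last disk and moving left, data chunks written left to right), images listed in array order, and at most one missing image, marked with `None`.

```python
def xor_chunks(chunks):
    """XOR equally sized byte strings together."""
    acc = 0
    for chunk in chunks:
        acc ^= int.from_bytes(chunk, "big")
    return acc.to_bytes(len(chunks[0]), "big")


def rebuild_image(image_paths, output_path, chunk_size=64 * 1024, skip_chunks=1):
    """Write the data chunks of a RAID5 array, in order, to output_path.

    image_paths lists the disk images in array order; one entry may be
    None to mark the missing disk. skip_chunks chunks are skipped at the
    beginning of every image.
    """
    n_disks = len(image_paths)
    files = [open(p, "rb") if p is not None else None for p in image_paths]
    try:
        for f in files:
            if f is not None:
                f.seek(skip_chunks * chunk_size)
        with open(output_path, "wb") as out:
            row = 0
            while True:
                chunks = [f.read(chunk_size) if f is not None else None
                          for f in files]
                present = [c for c in chunks if c is not None]
                if any(len(c) < chunk_size for c in present):
                    break  # stop at the first incomplete row
                if None in chunks:
                    # The missing chunk is the XOR of all the others,
                    # whether it held data or parity.
                    chunks[chunks.index(None)] = xor_chunks(present)
                parity_disk = (n_disks - 1) - (row % n_disks)
                for disk, chunk in enumerate(chunks):
                    if disk != parity_disk:
                        out.write(chunk)
                row += 1
    finally:
        for f in files:
            if f is not None:
                f.close()
```

A call could look like `rebuild_image(["imagea.dat", "imageb.dat", None, "imagec.dat", "imaged.dat"], "RAID.dat")`, where the position of `None` and the order of the files are arbitrary examples. Finding the real order is the hard part. A different algorithm would need a different `parity_disk` formula and a different write order for the data chunks.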


Monday, May 9, 2005

I made some attempts at rebuilding the RAID content. Each try took roughly two or three hours. After a run, I had a RAID.dat file (about 72GB in size) that was the “supposed” image of a HD, just like doing:

# dd if=/dev/hda of=ata.dat

Please notice the lack of a partition number in the input file name (a raw HD block device).

Then I had to use that image as a hard drive. First, I used fdisk to find out the “partitioning” of the hard drive (it had no problem handling the file at all). At that point, just as I had thought, I discovered that the file was the image of a HD and I could see a partition starting from sector 63. I was more than happy! There were no complaints from fdisk this time. Unfortunately, I can’t give you console output from now on, because the files have already been erased. Instead, I’ll show the commands that were involved:

# fdisk -lu RAID.dat

Then came mounting. How could I make the kernel think that this file was a hard drive? Well... it took me some more research to learn that losetup is used to link loop devices to files. It felt like the solution was at hand! I had to link the file to a loop device starting from byte 32256 (I had to skip the first 63 sectors of 512 bytes each, according to fdisk):

# losetup -o 32256 /dev/loop0 RAID.dat
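The offset of 32256 is simply the start sector multiplied by the sector size (63 × 512). As an illustration, the Python sketch below reads the start sector straight from the partition table instead of from fdisk’s output. It assumes a classic MBR layout (first partition entry at byte 446, start sector stored as a little-endian 32-bit number at offset 8 of the entry) and 512-byte sectors; the function name is made up for this example.

```python
import struct

SECTOR_SIZE = 512             # bytes per sector, as reported by fdisk
PARTITION_TABLE_OFFSET = 446  # first partition entry in a classic MBR
START_SECTOR_OFFSET = 8       # start sector within a 16-byte entry


def first_partition_byte_offset(first_sector):
    """Return the byte offset of the first partition described in an MBR."""
    if len(first_sector) < SECTOR_SIZE or first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature found")
    (start_sector,) = struct.unpack_from(
        "<I", first_sector, PARTITION_TABLE_OFFSET + START_SECTOR_OFFSET)
    return start_sector * SECTOR_SIZE


# Synthetic first sector whose first partition starts at sector 63.
sector = bytearray(SECTOR_SIZE)
struct.pack_into("<I", sector, PARTITION_TABLE_OFFSET + START_SECTOR_OFFSET, 63)
sector[510:512] = b"\x55\xaa"
print(first_partition_byte_offset(bytes(sector)))  # 32256
```

With a real image, the first 512 bytes of the file would be passed in place of the synthetic sector.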

It linked, no problem! Then mounted:

# mount -t ntfs /dev/loop0 /mnt/tmp

There was no complaint when mounting. All of the pieces were fitting together after all.

I just forgot to take into consideration one very important factor in the IT world: Murphy’s Law. The RAID was not going to give itself away so easily after all.

When I ran ls in the mount point, I could see a few of the directories, but the information wasn’t usable. I couldn’t cd into those directories, and dmesg said there were problems with the NTFS indexes. I guessed I must have made a mistake ordering the disks... or used the wrong algorithm. I tried twice (with different options), but failed.

Tuesday, May 10, 2005

I had left another attempt working when I left the office. That one failed too. I was getting frustrated at the time. Three of the developers at the IT department offered their help and started analysing the whole thing with me. I made another class that rebuilt the missing image, which I felt would help us in the analysis—no matter what the algorithm, order or strip size, according to RAID theory, the missing image’s content would always be the same.
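The reason the missing image does not depend on the algorithm or the disk order is that, at any given offset, the five chunks of a row XOR to zero, and XOR does not care about order. So the missing disk is just the XOR of the four surviving images, byte for byte. A small Python sketch of that idea follows; it is not the class mentioned above, and `rebuild_missing_image` is a hypothetical name.

```python
def rebuild_missing_image(surviving_paths, output_path, block_size=64 * 1024):
    """XOR the surviving disk images together to recreate the missing one."""
    files = [open(path, "rb") for path in surviving_paths]
    try:
        with open(output_path, "wb") as out:
            while True:
                blocks = [f.read(block_size) for f in files]
                size = min(len(block) for block in blocks)
                if size == 0:
                    break  # at least one image has been read completely
                acc = 0
                for block in blocks:
                    acc ^= int.from_bytes(block[:size], "big")
                out.write(acc.to_bytes(size, "big"))
    finally:
        for f in files:
            f.close()
```

Passing the same images in any order produces the same output file, which makes the rebuilt image a stable reference while experimenting with different disk orders.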

We noticed that I had indeed made a mistake when ordering the disks! (Hey, I can’t always be right, can I?) We studied the images a little further to make sure, and started the whole thing again. It was already getting late, so we had to wait until the next morning to see the results.


Wednesday, May 11, 2005

First thing in the morning (and I didn’t sleep very well because of the wait), I ran ls and...

Eureka! It worked.

All of the directories were there (otherwise, I wouldn’t have written this article in the first place, right?). I tried to work with some of the files in the partition... and it was perfect.

I suddenly became the spoiled kid of the IT department! I got a big chocolate cake—that’s what I call a bargain!

Even better, the experience caused some of the guys from the IT department to install GNU/Linux on their own personal computers. That’s quite an achievement!

Conclusion

I want to finish by saying that I did nothing miraculous... but definitely something clever! I certainly used the resources I had at hand... plus Knoppix. I also got a lot of help from the GNU/Linux community (through www.linuxquestions.org mostly). Thank you people!

It’s very important to make sure backups are made on a regular basis to avoid this kind of situation. I don’t think it’s likely you will find yourself in the same situation we got ourselves into. But, if you do find yourself in the same boat, I hope this information saves you from losing the data the Microsoft way (just format the disk, and forget about your data). Don’t freak out, get a Knoppix CD (if you can get a GNU/Linux guru along with it, all the better!), and with a little programming you will most likely solve the problem.

Thanks

I’d like to thank Simon Carreno, Heberto Ramos and Javier Machado, for their help in analysing the way the images of the RAID had to be put together. I’d also like to thank the IT crew as a whole for their support.


Comments

admin's picture
Submitted by admin on

From: MAd MAco
Url:
Date: 2005-08-10
Subject: Don't fight against paleo-toys, buy a new one!

Ok.. you are fighting against this Baby-Dino-Disk... but have you ever mind about buy a cheap laptop with a 200GB HDD??? .. .. all these toys at your server have less info than my Digi-cam!!!

...please, don't be selfish... buy a real one!

From: melissa
Url:
Date: 2005-08-10
Subject: :)

although i'd say 89% of this article made no sense to me.....je je je.....i must say....nice work, sir ;)

From: Tormak
Url: lineak.sourceforge.net
Date: 2005-08-26
Subject: Device order?

How did you determine the order that the disks were in?

From: Redbox
Url: www.pv.com.pl
Date: 2005-08-27
Subject: I must say...

GREAT JOB!!!

From: Moxy
Url:
Date: 2005-08-27
Subject: More info required.

How about publishing your Java class to see how you analysed and rebuilt the RAID image?

From: rich gregory
Url:
Date: 2005-08-28
Subject: disk based backup

Here is a simple way to add a disk based incremental backup to a production file system.

http://www.people.virginia.edu/~rtg2t/samba/system.admin.html#backup

It MUST be integrated with a full backup system to tape or disk.

It is a simple way to use an older PC and an 80-120GB IDE drive to give sys admins of big raid systems some peace of mind.

cheers,

rich

From: Adrin
Url:
Date: 2005-10-03
Subject: RAID 5 failure, Maybe you should look at RAID 1

While it is great that you got the data back, I have some questions. The RAID died on April 29, and you didn't get it back until May 15. I hope the practice wasn't down that long.

I hope you were able to restore from backup and get them going in the meantime.

Perhaps you should think about the RAID setup. While RAID 5 is great for making one large disk out of smaller disks, you are screwed when there is a major failure. Perhaps you should think about a mirrored RAID. Yeah, you lose a lot of space, but one disaster and it is paid for.

You brought up the NTFS partition in your article. Another reason for me to hate Windows crash recovery. I have yet to see a good disk recovery tool, like the one I use in Unix, but this is not the place to spam it and it is not free.

Anonymous visitor's picture
Submitted by Anonymous visitor (not verified) on

lol 18 gig drives, those would have to be old.. I'd go ahead and replace the server or at the very _minimum_ implement a backup system of some kind. although with a server that old, you could be looking at other hardware possibly failing in the future, so you'd want to make sure to back up the data to a medium that another machine definitely will read. such as, don't trust old HP dat tape drives!!

Edmundo Carmona's picture

I see there were some comments posted.... I think I replied to some in due time, but I'm not so sure... my memory sucks.

I can see there were some people talking about getting new hardware. Of course, that's one answer to the problem.... but when you are working at a public institution (in Venezuela, should I add) things are not that simple (are they anywhere?). We have to work on a very very very tight budget and crack our heads to get the most out of everything we have at hand (even dino-hardware).

No, the practice wasn't affected for that long. The main system of the hospital (though it was affected) was restored in basically no time. The data that was affected was "Users' Docs". Users had local copies of documents... but perhaps not all documents... and certainly not old documents. They carried on with what they had at hand at the time and waited for us to (hopefully) get the whole thing back.

Coming to the more technical side of the article: how did I (we) determine the order of the disks? That took a little binary math. When we studied the images we noticed that there were markers that could be read at fixed positions in different chunks of the disk images (things like the string "FILE*" that are part of the structure of the NTFS partition). There were 5 disks in the array. For every chunk "row", 4 disks would hold data and one disk would hold the XOR parity of the other 4 chunks (remember that the parity chunks are spread among all the disks). If you XOR the same value (like FILE*) 4 times you get a beautiful 0 (for each byte), and so we were able to see where the data chunks and the parity chunks were. (Luckily there were 5 disks. Had there been an even number of disks, it would have taken a little more trickery, because the XOR of an odd number of copies of "FILE*" is "FILE*".) We already knew where the partition table of the disk was (so, the first data chunk) and could see where that image had its first parity chunk. Together with the position of the parity chunks on the other disks, that tells you which algorithm was applied and the order of the disks. I'm wondering if that was clear enough. :-? If you have questions, ask Tony for my email address. I'm sure he will kindly provide it. ;-)

The source code? It's there, right? It's a Free Software Magazine after all, isn't it? ;-)

Cheers!

mcontestabile's picture

Hello everyone...
My name is Marco, and I had the same experience...
2 weeks ago my server go down...all disks whit red led on...

The first thing I thought "i've lost all data...all db..all users folders..."...the last backup was a few months ago!!

i tried to change the controller, but nothing to do...the controller don't recover the raid configuration from the disk.

Searching in internet i found this guide...and reading it i've seen that what Edmundo write is the samne thing that happen in my office...i think "ok...i want to try..."...
My raid is a single raid5 array with 8 disks...in the array 2 logical drives.

After three days of work...and with the valuable help of Edmundo (How much patience has this guy? :)))) )...i've recovered all data from the disks...recovere all db and users folders...in few words...all i need to restore the office software.

I want to again thank Edmundo for his help... :))

p.s. the first thing that i've installed in my office after the recovery of the data...is a very good system backup :))

sorry for my english...i hope that who read this can understand :))))

Marco

yoavsil's picture
Submitted by yoavsil on

Hi Edmundo,

apparently the company Raid-Recovery-Online.com can recover all RAID arrays remotely in no time (providing 24/7 services)... So next time you need to Save the Republic , just contact them without all that fuss... and it's not even that expensive.

Cheers,
Yoav

Edmundo Carmona's picture

Hello, everybody.

With the desire to get rid of using java to rebuild the images I have decided to translate (and correct... I think I found a couple of problems) the library to python.

The library is here:
https://code.launchpad.net/~eantoranz/+junk/raidpycovery

I've already written a couple of articles on my blog about it:
http://maratux.blogspot.com/2010/11/broken-raid5-you-said-dont-use-java.html
http://maratux.blogspot.com/2010/11/testing-raidpycovery-through-mdadm.html

There you go!

ShaneW's picture
Submitted by ShaneW on

Hey man nice article. I was looking for a bit of advice, if possible.

I have a 3 disk hardware raid 5 that has collapsed. The backup was incomplete :) so I have been asked to recover it. With a bit of messing around with the hardware I have 2 full disk images and 1 half disk image. What I need advice with is determining the chunk size and algorithm. I had a go rebuilding the 3rd drive myself with left-sync and 512k chunk size then compared what I got with the partial image. I found 50k-ish chunks of identical data in identical locations but the rest was junk. Then I saw this article, what should I be looking for to determine chunk size? what should I be looking for to determine the algorithm?

Would it be worth my time to create the first say 200M of the third disk with your script (using a variety of variables) and compare to the other disk I have?

I am of course working in knoppix.

Cheers
Shane

Edmundo Carmona's picture

Hi!

First, I had said that it was translated to python and life was beautiful... but on python the recovery process was horrendously slow. Last night I migrated it to C++ and let me tell you that it's MUCH faster!

http://maratux.blogspot.com/2011/11/remember-times-with-i-used-python-for.html

Cheers!

Author information

Edmundo Carmona's picture

Biography

Edmundo is a Venezuelan Computer Engineer. He recently started working as a freelance Java developer in Colombia. He has also been a GNU/Linux user and consultant for several years.

After years of being retired from music, he's working right now to regain his classical flute skills.
