Delve deep into drives


I recently read a doctoral thesis on file system robustness by Vijayan Prabhakaran from the University of Wisconsin. It’s very interesting, and may explain in part the recent ruckus on the LKML around file systems.

File system robustness

Prabhakaran takes several file systems out for a spin: ext3, ReiserFS, JFS and XFS (all of which run on UNIX-like systems and follow POSIX). Microsoft’s NTFS is added to the mix, but in a limited fashion. He then describes the different ways a drive can fail: typically media damage, electrical failure, bit rot and mechanical wear, but also firmware bugs, bus timeouts and controller failures.

This is followed by the terminology used in the thesis to describe the communication between the drive and the file system (its driver), and how the two try to work together to ensure data consistency. If you remember recent publications on the subject of S.M.A.R.T., you’ll know that this communication is neither very consistent nor very reliable...

Journaling explained

There are three main journaling modes in use today: writeback, ordered, and data journaling. They differ in what goes through the journal and in what order. But first of all: what is journaling?

Since finding a location for the data to be written is the most time-consuming operation right now, a journaled file system simply has an area set aside to store new data temporarily. Once this data is safe on disk, it is moved to its “final” home and the temporary storage area is freed for the next data block.

Now, there are two things to write on a disk when new data is added to it: the data itself, and the file system’s record for that data (location, file name, size, creation date, access time, fragments, authorizations etc.).

For example, here is how ext3 deals with those modes (writeback, ordered, and data journaling):

  • writeback: the inode is put in the journal. If successful, it is committed right away. Once committed, the inode is written at its final position. Actual data can be written before any journaling, once the journaled inode is committed, or once the inode is in place.
  • ordered: data is written to the disk first. Once on the disk, the inode is then written to the journal, committed if journaling is successful and, at a later time, committed to its final position. This is the most used mode, and the only mode provided by XFS, JFS and NTFS.
  • data: both inode and data are written in the journal. If successful, both are committed. Later on, they are written to their final location.
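To make the difference between the three modes easier to see, here is a small Python sketch of the write sequences described above. It is only a model of the ordering, not ext3 code: the function names and the ("what", "where") step tuples are my own invention, and for writeback I arbitrarily put the data write last, since that mode allows it at several points.

```python
def write_steps(mode):
    """Return the ordered (what, where) disk writes for one new file.

    Hypothetical model of the three ext3 modes described in the text.
    """
    if mode == "writeback":
        # Data is not ordered relative to the journal in this mode; placing
        # it last is just one of the orderings the mode allows.
        return [("inode", "journal"), ("commit", "journal"),
                ("inode", "final"), ("data", "final")]
    if mode == "ordered":
        # Data reaches its final location before the inode is journaled.
        return [("data", "final"), ("inode", "journal"),
                ("commit", "journal"), ("inode", "final")]
    if mode == "data":
        # Both data and inode go through the journal first.
        return [("data", "journal"), ("inode", "journal"),
                ("commit", "journal"),
                ("data", "final"), ("inode", "final")]
    raise ValueError(f"unknown journaling mode: {mode}")


def data_on_disk_before_commit(mode):
    """True if the data is somewhere on disk before the commit record."""
    steps = write_steps(mode)
    commit_at = steps.index(("commit", "journal"))
    return any(what == "data" for what, _ in steps[:commit_at])


if __name__ == "__main__":
    for mode in ("writeback", "ordered", "data"):
        print(mode, write_steps(mode), data_on_disk_before_commit(mode))
```

In this model, a crash right after the commit record leaves a committed inode pointing at data that was never written only in the writeback ordering; ordered and data journaling both have the data on disk by then.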

All three modes have different advantages and problems, and different performance profiles.

Prabhakaran then goes on to track bugs in all the UNIX drivers, describing inconsistencies and danger points. From there he traces the outline of a tougher file system, and describes how to create an evolved file system that would unite the advantages of most current file systems and overcome most of their shortcomings. At the same time, he describes how moving critical pieces of logic from the driver to the kernel (and therefore sharing this code between file system drivers) may make development easier and faster while strengthening existing file systems.

File system summary

Following those descriptions, Prabhakaran describes his testing methodology, which is based essentially on error injection for all the UNIX-based file systems (NTFS couldn’t run his test suite), plus some detailed tests provoked from the end-user side (the NTFS results mostly come from this part). The results are as follows:

  • Ext3: Overall simplicity. Ext3 implements a simple and mostly reliable failure policy, matching the design philosophy found in the ext family of file systems. It checks error codes, uses a modest level of sanity checking, and recovers by propagating errors and aborting operations. The main problem with ext3 is its failure handling for write errors, which are ignored and cause serious problems including possible file system corruption.
  • ReiserFS: First, do no harm. ReiserFS is the most concerned about disk failure. This concern is particularly evident upon write failures, which often induce a panic; ReiserFS takes this action to ensure that the file system is not corrupted. ReiserFS also uses a great deal of sanity and type checking. These behaviors combine to form a Hippocratic failure policy: first, do no harm.
  • JFS: The kitchen sink. JFS is the least consistent and most diverse in its failure detection and recovery techniques. For detection, JFS sometimes uses sanity, sometimes checks error codes, and sometimes does nothing at all. For recovery, JFS sometimes uses available redundancy, sometimes crashes the system, and sometimes retries operations, depending on the block type that fails, the error detection and the API that was called.
  • XFS: Simple and well-defined. From our preliminary analysis, we find that XFS has a simple and well-defined failure policy to handle write failures. It checks error codes and on synchronous write failures, XFS stops the file system and propagates errors. On asynchronous write failures, the failed write is retried persistently.
  • NTFS: Persistence is a virtue. Compared to several Linux file systems, NTFS is more persistent, retrying failed requests many times before giving up. It also seems to propagate errors to the user quite reliably. However, more thorough testing of NTFS is needed in order to broaden these conclusions (a part of our ongoing work).
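The reactions to a failed write listed above boil down to a handful of policies: ignore the error, propagate it, stop the file system, or retry. Here is a rough Python sketch of what each one means for the caller. Everything in it is hypothetical (the policy names, `write_with_policy`, the `flaky` test writer and the retry count are mine, not the thesis’s), and real drivers mix these policies depending on block type and code path.

```python
class FileSystemStopped(Exception):
    """Raised when the 'stop' policy refuses to continue after a failure."""


def flaky(failures):
    """Build a fake block writer that fails `failures` times, then succeeds."""
    remaining = [failures]

    def write_block():
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # simulated write error
        return True

    return write_block


def write_with_policy(write_block, policy, max_retries=3):
    """Attempt one write and react to a failure according to `policy`."""
    if write_block():
        return "ok"
    if policy == "ignore":
        # The error is dropped: the caller believes the write succeeded.
        return "ok"
    if policy == "propagate":
        return "error"
    if policy == "stop":
        raise FileSystemStopped("write failed; refusing further operations")
    if policy == "retry":
        for _ in range(max_retries):
            if write_block():
                return "ok"
        return "error"  # gave up after max_retries attempts
    raise ValueError(f"unknown policy: {policy}")
```

The "ignore" branch is the dangerous one: it returns the same value as a successful write, which is exactly how an unreported error can turn into silent corruption later on.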

Now, I couldn’t find out whether this work is being used in the design of ext4 (ext3’s successor). However, I find it interesting to note that there was indeed some ruckus about file systems (ext4 in particular) around the time this thesis was published. Moreover, I seem to recall some hints at the possibility of a reworking of file systems in general in kernel 2.6.22 and up.

And now, in a similar vein...

You may recall that I’ve talked about NTFS-3G previously. There’s a good reason for that: I’m using this piece of software quite often, and obtaining more information on the subject is almost vital to me.

In fact, I’ve been using NTFS-3G ever since version 1.0 came out. The main milestones since then have been:

  • Release 1.0: first stable version;
  • Release 1.417: major speedup on large file copies, major reduction in file fragmentation;
  • Releases 1.516, 1.616 and 1.710: bugfix releases when combined with fuse 2.7.0, minor speed improvements;
  • Releases 1.810 and 1.826: improved POSIX conformance. The latter version also allows building a statically linked binary (the NTFS-3G library is linked into the executable instead of being installed separately), which saves some space but also increases speed.

Passing the --disable-library option to ./configure builds this statically linked binary. Preliminary tests do show lower CPU use on both small and large file copies (I copied 1,300 files for 6GB worth of disk space, and for once my CPU didn’t max out).

Conclusion

I’m sorry this post isn’t better referenced: I didn’t think about taking notes of where and when I gleaned all those pieces of information, especially since it was quite fragmented. Unfortunately, no one yet has created a brain defragmenter and consistency check utility, otherwise I could verify what I’m writing.

Still, you can check out NTFS-3G’s site for the driver’s latest progress, and download the thesis, which is a very interesting read.

Author information

Mitch Meyran

Biography

Have you ever fixed a computer with a hammer, glue and a soldering iron? Why not? It's fun!
