Learn some command line: using du, df, file, find to make your life easier


I love the command line. If the command line were a dog, it would be a hard-headed labrador: big and somewhat intimidating, but really kind of even-tempered and friendly once she gets to know you.

I just compared the command line to my dog Roscoe. I love them both, and they both frustrate me.

I can't do much with Roscoe, but I can help out a bit with the command line. And so allow me to introduce four of my favorite utilities: df, du, file, and find.

Filesystem sizes with df

This one is easy. According to the man page, df stands for, "report file system disk space usage." I say it stands for, "disk free." But what do I know?

$ df -h

The -h tells df to report in human-readable numbers. Here, "human-readable" means "human-readable if you know the difference between G and M and K." You can also use -k (report in kilobytes) or -m (report in megabytes) if you desire. It's all up to you.

df -h gives us something like this:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             7.4G  4.6G  2.4G  66% /
varrun               1014M  128K 1014M   1% /var/run
varlock              1014M     0 1014M   0% /var/lock
procbususb           1014M  108K 1014M   1% /proc/bus/usb
udev                 1014M  108K 1014M   1% /dev
devshm               1014M     0 1014M   0% /dev/shm
/dev/sda4              61G  7.3G   51G  13% /home
/dev/sda1              40G   17G   23G  43% /media/sda1
/dev/scd0             7.8G  7.8G     0 100% /media/cdrom0

The first column is the device. For disks, this will be something like /dev/sdaN or /dev/hdaN, where N is a small number. Those other filesystems with names like udev or devshm or varrun are OS-specific. This output was taken from a GNU/Linux box running a 2.6.20 kernel.

The middle three columns show the total size, the amount used, and the amount available, just like the column titles say. The Use% column indicates the percentage of the filesystem in use. Generally, you don't want that to read 100%, except for CDs and DVDs, which will always show 100%. The final column tells you where in your directory hierarchy the filesystem is mounted.
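If all you care about is which filesystems are nearly full, the Use% column can be filtered with a little awk. Here is a minimal sketch, assuming the portable -P layout of df (one line per filesystem, percentage in the fifth column) and mount points without spaces in them. The name full_filesystems is made up for this example; it is not a standard tool.

```shell
# full_filesystems LIMIT
# Reads df-style output on standard input and prints the mount point and
# Use% of every filesystem at or above LIMIT percent.
# (Hypothetical helper; assumes mount points contain no spaces.)
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "66%" becomes "66"
            if (use + 0 >= limit) print $6, $5
        }'
}

# Report anything that is 90% full or more.
df -P -k | full_filesystems 90
```

Because the function reads standard input, it can also be tried on df output saved from another machine.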

That's it for the Very Short Tour of df.

Directory sizes with du

Suppose df reports a filesystem is full, and you need to find the culprit fast. Let's say for illustrative purposes the filesystem is /home. Here's one of my favorite commands of all time:

$ du -k /home | sort -n

Now, technically that's two commands. du stands for "estimate file space usage," though I hate the word "usage," because "use" will almost always work instead. I like to call it "disk use," for hopefully obvious reasons. The -k specifies reporting in kilobytes, rather than filesystem blocks. You can also use -m, which specifies megabytes, if you like smaller numbers. Do not use the -h option. -h means, "print in human-readable form," which will break our nifty sort operation.

The '|' (official name: "bar thingy") means "pipe." "Pipe" means, "take the output of this command, and pass it to the next command." In even simpler terms, this means "route STDOUT (standard out) of the first program to STDIN (standard in) of the next program."

sort sorts lines of data, just as the name implies. It isn't short for "somehow order random text" or anything like that. It just means, "sort." The -n option specifies to sort as if the first word were a number, rather than to sort it ASCIIbetically. For fun, try the sort without the -n. You'll quickly observe that "1" sorts before "101" which sorts before "2." For our purposes, the -n is quite important.
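To see the difference without waiting for du to crawl a whole disk, here is a tiny sketch that feeds three made-up numbers through sort both ways:

```shell
# Three "sizes", deliberately out of order.
sizes=$(printf '2\n101\n1\n')

# Plain sort compares character by character: 1, 101, 2.
plain=$(printf '%s\n' "$sizes" | sort)

# sort -n compares numeric values: 1, 2, 101.
numeric=$(printf '%s\n' "$sizes" | sort -n)

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$plain" "$numeric"
```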

On my machine, that command gives this output:

4       /home/tony/.config/xfce4/orage
4       /home/tony/.config/xfce4/xfwm4
4       /home/tony/docs/fsm
4       /home/tony/docs/stories/speleology
4       /home/tony/.gimp-2.2/brushes
     .
     .
     .
512564  /home/tony/src
685672  /home/tony/tmp/zips
714508  /home/tony/tmp/iso
789240  /home/tony/tmp/tony
813236  /home/tony/video/roscoe
881512  /home/tony/video/family
1694756 /home/tony/video
3835596 /home/tony/tmp
7442492 /home/tony
7442496 /home

As you can see, I have a lot of stuff in /home/tony/tmp. I would look there for things to remove to free up space.
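On a big filesystem that listing scrolls by for a long time, and only the last few lines matter. One way to trim it is to add tail to the end of the pipe. This is just a sketch; biggest_dirs is a name I made up, and the 2>/dev/null simply hides complaints about directories you aren't allowed to read.

```shell
# biggest_dirs DIR COUNT
# Prints the COUNT largest directories under DIR, sizes in kilobytes,
# smallest first. (Hypothetical helper name.)
biggest_dirs() {
    du -k "$1" 2>/dev/null | sort -n | tail -n "$2"
}

# The ten biggest directories under the current directory.
biggest_dirs . 10
```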

What kind of file is it?

Unlike some operating systems, GNU/Linux (and Unix-like operating systems in general) doesn't use filename extensions to determine the type of a file. So, a text file does not have to end in .txt, and a jpeg-encoded image file does not have to end in .jpg. Instead, there is a nifty utility called file that will report the filetype for you.

It's really pretty easy to use:

$ file blah.c
blah.c: ASCII C program text

It's really that simple.

Of course, it uses magic. /etc/magic. Really. I'm not kidding.
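To convince yourself that file really ignores the extension, try lying to it. The sketch below creates a plain text file with a .jpg name (the name is made up) in a temporary directory and asks file what it is. On the systems I know of, the answer mentions text, not JPEG, but the exact wording depends on your version of file.

```shell
# A plain text file hiding behind a .jpg name (the name is made up).
scratch=$(mktemp -d)
printf 'Hello, Roscoe.\n' > "$scratch/not-a-picture.jpg"

# -b ("brief") prints the type without the leading filename.
kind=$(file -b "$scratch/not-a-picture.jpg")
printf '%s\n' "$kind"

rm -r "$scratch"
```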

Finding files

Find is one of the unsung heroes of the Free software world. Many do not appreciate the functional finesse, the streamlined beauty of this perfect utility. Find can search for files based on name, on size, on ownership, on permissions, on modification time, on access time, on... well, just about anything. Combined with other utilities, you can search on content or file type.

For instance, to find all files ending in .c:

$ find /home -name \*.c -print

The /home tells find to start the search in the /home directory. The -name \*.c specifies the pattern to search for. The * means "anything," followed by .c, which means just that: search for anything ending in .c. (The backslash keeps the shell from expanding the * itself before find ever sees it.) The -print is the "predicate;" that is, the action we wish to perform on the things we find. We can do more than just print out filenames.

This gives the following output:

/home/tony/src/gnome/gnome-columns/src/jewel.c
/home/tony/src/gnome/gnome-columns/src/texture.c
/home/tony/src/gnome/gnome-columns/src/renderable.c
/home/tony/src/gnome/gnome-columns/src/rectangle.c
/home/tony/src/gnome/gnome-columns/src/gnome-columns.c
/home/tony/src/gnome/gnome-columns/src/gameboard.c

(There was really a lot more output, but I wanted to keep the display simple.)
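If you'd like to experiment without wading through your whole /home, here is a small sketch that builds a throwaway directory and searches it. The file names are invented for the example. It also shows two common variations: quoting the pattern instead of backslashing it, and adding -type f to match regular files only.

```shell
# Build a small throwaway tree to search (file names are made up).
tree=$(mktemp -d)
mkdir "$tree/src"
touch "$tree/src/jewel.c" "$tree/src/README" "$tree/notes.c.txt"

# Quoting the pattern does the same job as the backslash: it stops the
# shell from expanding * before find sees it. -type f limits the match
# to regular files, so a directory named "old.c" would be skipped.
found=$(find "$tree" -type f -name '*.c' -print)
printf '%s\n' "$found"

rm -r "$tree"
```

Only src/jewel.c is printed. notes.c.txt contains ".c" but does not end in it, so the pattern skips it.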

Search for files that have been recently changed:

$ find . -ctime -1 -print

This time I specified the start directory as ".", which is the current directory. I've specified the search criteria as -ctime -1, which means "change time, less than one day ago." Again, I specified print. Here is the output:

.
./blah

It returned only one file, blah (plus ".", the current directory itself). How boring.
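One thing to keep in mind: -ctime looks at when the file's status last changed (contents, permissions, ownership, and so on), while its cousin -mtime looks only at when the contents were last modified. Here is a sketch with two made-up files, one of them backdated with touch -t. Backdating only affects the modification time, so -mtime is the test that can tell the two apart.

```shell
work=$(mktemp -d)
touch "$work/fresh"                      # modified just now
touch -t 200701010000 "$work/stale"      # modification time set to 1 Jan 2007

# -mtime -1: contents modified less than one day ago.
recent=$(find "$work" -type f -mtime -1 -print)
printf '%s\n' "$recent"

rm -r "$work"
```

Only fresh is listed. Swap in -ctime -1 and both files show up, because running touch counts as a status change.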

Let's do something a little more interesting. Let's look for all PDFs in my home directory:

$ find ~ -exec file {} \; | grep PDF

I commanded find to start in my home directory by using the squiggly, '~'. (Actually, it's called a "tilde.") Then I told it to execute a command, using -exec file {} \; as the action. The -exec is a predicate that causes find to execute a command, in this case, file. The '{}' bit means, "substitute the filename here." When find generates the command, it'll be something like, "file ~/stupidname.ext". The '\;' bit marks the end of the executable command. Then I pipe the output to grep, which prints only the lines containing "PDF".

There are better ways of doing this, especially using a command called xargs, but I don't cotton to those new-fangled methods. Well, I do, but you must first learn to crawl before you can fly on the space shuttle.

Here's the output:

/home/tony/src/beerhacker/Documentation/BeerXML_v2_01.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/ewlbook/pre-rendered/ewlbook.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/ewlbook/pre-rendered/ewlbook.es.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.fr.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.es.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.pt-BR.pdf: PDF document, version 1.3
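For the curious, the xargs route might look something like the sketch below. It assumes your find and xargs support -print0 and -0, which the GNU and BSD versions do; those options pass filenames separated by NUL characters so that names with spaces survive the trip. The function name find_pdfs is made up.

```shell
# find_pdfs DIR
# Lists files under DIR that file(1) identifies as PDF, whatever they
# are named. (Hypothetical helper name.)
find_pdfs() {
    find "$1" -type f -print0 | xargs -0 file | grep 'PDF document'
}

# Usage: find_pdfs ~
```

The payoff is speed: xargs hands file many names at once, where -exec file {} \; starts a new file process for every single name.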

Finally, let's use find to delete all our old emacs backup files. WARNING! DANGER, WILL ROBINSON! THIS IS VERY DANGEROUS! Be very careful when using find to do file manipulation. Always print out the results of find before executing a dangerous command.

First, do this:

$ find . -name \*~ -print

This prints all the files that end in ~, starting in the current directory. Once you are sure you won't miss these files, do this:

$ find . -name \*~ -exec rm {} \;
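If you'd rather rehearse somewhere harmless first, here is a sketch of the same two-step routine run inside a throwaway directory. The file names are made up. I've also added -type f so that only regular files are touched, and used the {} + ending, which hands rm many names in one go.

```shell
scratch=$(mktemp -d)
touch "$scratch/notes.txt" "$scratch/notes.txt~" "$scratch/todo~"

# Step 1: look before you leap.
find "$scratch" -type f -name '*~' -print

# Step 2: the same search, now deleting. The {} + form hands rm many
# names at once instead of starting one rm per file.
find "$scratch" -type f -name '*~' -exec rm {} +

# Only notes.txt should be left.
remaining=$(ls "$scratch")
printf '%s\n' "$remaining"

rm -r "$scratch"
```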

That's it! You are now wise in the ways of a couple of minor file utilities. As always, enjoy playing around with them. Be safe. Don't run with scissors, or shave with a rusty razor. Remember that cats have five pointy ends, and that with powerful knowledge comes powerful responsibility. Don't abuse these tools, and they will treat you right until the end of your days.

Comments

Submitted by Anonymous visitor (not verified) on

Anthony!

Although you mention the xargs I would like to give an example for the simplest implementation of the recursive grep
find . -type f|xargs grep -i

Submitted by Anonymous visitor (not verified) on

Maybe it's the resolution or something else on my computer but I can't seem to make out what the sign in front of each command is. It doesn't look like an "&" or "$". So what is it?

Anthony Taylor

The '$' ("dollar sign") signifies a standard shell prompt-- that is, the dollar sign is often used by several of the canonical shells to indicate the shell is waiting for input (bash and ksh, for instance).

You don't type it in. It's used in this article to signify the user types in the following line.

Output from various commands is designated by lack of the dollar sign.

Scott Carpenter

I've dimly known about df by using it at work. ("Hey, what's that command again that shows disk usage?") But now will have more occasion to use it on my emerging home GNU/Linux systems. I also think discus is a neat little df "prettifier."

----
http://www.movingtofreedom.org/

Submitted by Anonymous visitor (not verified) on

Just a note that a lot of these commands will give you feedback on what they're doing if you ask them for it. For instance,

find . -name \*~ -print | xargs rm -v

will tell which files it's deleting. Just an extra bit of reassurance that you've done the right thing. :)

Lawrence D'Oliveiro

Submitted by Anonymous visitor (not verified) on

I use find with the -exec arg:
find . -ctime +10 -exec ls -l {} \;

This means:
find, from this point (.), the files whose change time (-ctime) is more than 10 days ago (+10), and execute (-exec) a command (ls -l). The argument for this command is passed from find as "{}". To end the command, write an escaped ";" (\;).

On my servers I use this to process a lot of files, in combination with awk (I missed it in this article) to make batch job files.

Flavio Camus

Submitted by mobilemail on

I've been a Windows user since DOS/3.1, but I'm just now learning the guts of Linux. Practical clues like this make it easier.

Submitted by Anonymous visitor (not verified) on

Thanks for a wonderful article,
It will definitely help to make my life in Linux easier.
Specially the find command seems really powerful

Submitted by Anonymous visitor (not verified) on

How do I execute a command as well as print the args into a file, using xargs and find?

Anthony Taylor

That's an easy one. Use the "tee" command.

"tee" will copy STDIN to a file, and spit it back to STDOUT. This essentially makes a copy of STDIN into the new file. So, pipe STDIN through "tee" before passing it on to "xargs", like this:

find . -name \*\.c | tee c.list | xargs grep -l sprintf

This would list all the files that use the (slightly dangerous) "sprintf" function, while saving the names of all the files checked in c.list.

Submitted by Anonymous visitor (not verified) on

I'm more used to the AIX version than the Linux version for "find". I'll have to try a couple of your other commands (du vs. df) as they seem pretty relevant to our current situation. (AIX guru retired, all the kids are left to play and learn on our own).

On a different note, is this Tony Taylor that used to work / go to school at UAF ?

-KS

Anthony Taylor

Well, I imagine it depends on who you are, but since I went to UAF, I am most likely that very same Tony Taylor.

If you want, contact me at tony (at the following domain) paperdove.org. I'd love to get in contact with old friends from UAF.

Author information

Biography

Tony Taylor was born, causing his mother great discomfort, and has lived his life ever since. He expects to die some day. Until that day, he hopes to continue writing, and living out his childhood dream of being a geek.
