Picante shell: pipes


Spicy food should cause chemical burns, or spontaneous human combustion. Your mouth should feel as if it’s tangled with an angry badger. Capillaries in your nose should burst. Your gut should sue for punitive damages. If not, your food just isn’t spicy enough.

At least, that’s how I feel. So, when I say things like, “Here, try some of these mild command-line recipes; they’re really quite tasty”, you might keep that in mind. One man’s “mild” is another man’s, “I think you’ve poisoned me”.

If you are ready, settle in, dish up, and keep a nice lager handy. You’ll probably need it before we’re done.

Base ingredients: input and output

The “Unix Way” is a bit like salsa. (It’s also like the Tao, or like processed cheese, or like the way puppy toes smell like popcorn.) As with salsa, there are many ingredients, and no one definition. Many kinds of salsa are made with tomatoes. Others are made without. Some are mild. Some are slightly tingly (these are labeled, “Hot!” in American supermarkets). People recognize salsa when they see it, but there is no one way of making it.

And so the Unix Way is not a single thing, nor a collection of specific things, but a way of combining things. Many have summed up the Unix Way as, “Do one thing, and do it well”. What that misses is a simple concept most of us learned on the playground: “Play well with others”.

The way most Unix command line utilities play together is through another simple concept, one that dates back to the beginning of the digital computer age: “input” and “output”. That is, most of these utilities take some information (input), do something with (process) it, and spit the final product back out (output).

“Input” is handled in many ways. It can be from a file, from user input, from the network, or from another program. Most programs can handle at least three of these: from a file, user input, or another program. Actually, that’s only two ways. Programs treat user input and input from another program in exactly the same way. For the sake of brevity, let’s give a name to this kind of input. Let’s call it, “standard in”.

Processing is up to each program. That is what sets each command line utility apart. Mastering the command line is mostly about mastering each program.

“Output” is generally either to a file, or to the screen, or to another program. Okay, like “standard in”, these last two are really the same. The output goes to the screen, or you can send the output to another program. Let’s call this single output, “standard out”.

This might seem complex, but think of it this way. “Standard out” is just the output of the program, as if it were sent to the screen. So, when you type a command like ls, the output that is printed to the screen (in this case, the directory contents) is really going to something called “standard out”. By default, standard out goes to the screen. “Standard in” is just the opposite: it’s as if the user were typing information into the program.
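If you’d like a small taste first, here is a rough sketch of standard in and standard out at work. The > and < redirections are ordinary shell syntax, but the scratch file and the three words in it are invented for this example, and I’m assuming a POSIX-style shell with mktemp available.

```shell
# A scratch file for the experiment (mktemp picks the path).
scratch=$(mktemp)

# Standard out, left alone, lands on the screen.
printf 'salsa\nbadger\nlager\n'

# The same standard out, sent to a file with > instead of the screen.
printf 'salsa\nbadger\nlager\n' > "$scratch"

# Standard in, fed from that file with < instead of from the keyboard.
sorted=$(sort < "$scratch")
printf '%s\n' "$sorted"

# Clean up the scratch file.
rm -f "$scratch"
```

sort neither knows nor cares that its input came from a file rather than from your fingers. It just reads standard in.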

Instant picante: pipes

So, how do you get the output of one program to go to the input of another program?

It’s a thing called a “pipe”. In the shell, a pipe is created with that broken bar symbol above your Enter key, like this: |. This very spicy symbol takes the standard output of one program, and attaches it to the standard input of another. The best way to see how it works is to try it.

No. Really. It’s not that spicy. There’s hardly any spontaneous human combustion involved at all.

So, give it a shot, starting with the ps command:

$ ps -e

  PID TTY          TIME CMD
    1 ?        00:00:01 init
    2 ?        00:00:00 migration/0
    3 ?        00:00:00 ksoftirqd/0
    4 ?        00:00:00 watchdog/0
    5 ?        00:00:00 migration/1
    6 ?        00:00:00 ksoftirqd/1
    7 ?        00:00:00 watchdog/1
    8 ?        00:00:00 events/0
            .
            .
            .
 8026 ?        00:00:00 gnome-terminal
 8028 ?        00:00:00 gnome-pty-helpe
 8029 pts/0    00:00:00 bash
 8684 pts/0    00:00:00 emacs
 9010 pts/0    00:00:00 ps

The dollar sign $ merely indicates the shell prompt. Type only the ps -e and hit return. The output will look something like mine, with columns of cryptic and arcane information designed to mystify and enlighten.

Notice how it all goes scrolling by, too fast to read? Try this:

$ ps -e | less

Isn’t that better? Now, you can scroll forward one screen at a time by hitting the space bar, or back one screen at a time by hitting the b key. Quit at any time by hitting the q key.

That magic happens by taking the output of the ps program, and passing it to the less program as input.
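less is interactive, which makes it hard to see exactly what crossed the pipe. Here is a hedged sketch with a less glamorous pair: ls feeding wc -l, which counts the lines arriving on its standard input. The directory and the file names are made up for the example.

```shell
# A throwaway directory with three invented items in it.
pantry=$(mktemp -d)
touch "$pantry/salsa" "$pantry/badger" "$pantry/lager"

# ls writes one name per line into the pipe; wc -l counts those lines.
count=$(ls "$pantry" | wc -l)
echo "items in the pantry: $count"

# Clean up.
rm -rf "$pantry"
```

Neither program knows about the other. ls thinks it is printing to the screen, and wc thinks somebody is typing at it.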

less is a Unix command for browsing files. Try it on its own like this:

$ less ~/.bash_history

This should allow you to browse your command line history, if you normally use the bash shell. Note how the input to less can be a file (e.g., ~/.bash_history) or the output from another program (e.g., ps -e). This is a common theme among command line utilities.
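To see that theme with something other than less, here is a small sketch using grep -c, which counts matching lines. The menu file and its contents are invented; the point is only that the answer should be the same whether the data is named as a file or arrives through a pipe.

```shell
# An invented menu to search through.
menu=$(mktemp)
printf 'mild salsa\nhot salsa\ncold lager\n' > "$menu"

# Input named as a file argument...
from_file=$(grep -c salsa "$menu")

# ...or the same data arriving on standard input through a pipe.
from_pipe=$(cat "$menu" | grep -c salsa)

echo "from file: $from_file, from pipe: $from_pipe"

# Clean up.
rm -f "$menu"
```

The cat here exists only to have something on the left side of the pipe; any program that prints those lines would do.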

This simple concept gives us great power.

Extra picante: gluing programs together with pipes

I hope you have your lager handy. This is where it starts to get a little spicy.

See, you can chain programs together using pipes. You can glue as many together as you need, to form a chain of spicy command line goodness. Start simple, with a single program:

$ find . -name \*\.c

This prints out a list of C source files. That’s it. If you happen to have some C source code lying around, give it a go.

Next, save that list to a file, but still print it out.

$ find . -name \*\.c | tee c.list

tee simply takes standard input (here, the list of C source files), saves it to a file (here, c.list), and then prints it to standard out. In fact, the output to the screen will be identical to the find command on its own. The only difference here is, the list is also saved to a file.
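If you’d rather not take my word for it, this sketch checks the claim in a throwaway directory. The directory and its two empty files are invented, and I’ve quoted the pattern as '*.c', which does the same job as the backslashes above. The extra sort is there only because find makes no promise about the order of its output.

```shell
# A throwaway directory with two pretend C source files.
kitchen=$(mktemp -d)
touch "$kitchen/mild.c" "$kitchen/hot.c"

# What find prints on its own...
plain=$(find "$kitchen" -name '*.c' | sort)

# ...and what comes out the far end of tee, which also fills c.list.
through_tee=$(find "$kitchen" -name '*.c' | tee "$kitchen/c.list" | sort)

# The saved copy, read back from the file.
saved=$(sort "$kitchen/c.list")

# All three should be the same list.
[ "$plain" = "$through_tee" ] && [ "$plain" = "$saved" ] && echo "identical"

# Clean up.
rm -rf "$kitchen"
```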

Now, to get really fancy:

$ find . -name \*\.c | tee c.list | xargs grep -l sprintf

This saves a list of all the C source files in and below the current directory to c.list, and prints the names of those files that contain the string sprintf. Cool. Or hot. Whichever.

This is just like the example above, but now you pipe standard output from the tee command directly to the xargs command. xargs then uses grep to search all those files for a string, sprintf, and prints the names of the matching files to standard out.

See how processing gets progressively more complex? As long as you know how each individual program processes its standard input, you can figure out what the chain of programs does.
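To watch a whole chain work without needing C source of your own, here is a sketch that cooks up two tiny files first. Everything about those files (names, contents, the temporary directory) is invented for the example; the pipeline itself is the same find, tee, xargs and grep -l combination.

```shell
# A throwaway directory with two invented source files;
# only hot.c mentions sprintf.
kitchen=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$kitchen/mild.c"
printf 'void f(char *b) { sprintf(b, "hot"); }\n' > "$kitchen/hot.c"

# find lists the files, tee saves the list, xargs hands the names to grep,
# and grep -l prints only the names of files that contain sprintf.
matches=$(find "$kitchen" -name '*.c' | tee "$kitchen/c.list" | xargs grep -l sprintf)

# How many names did tee save along the way?
listed=$(wc -l < "$kitchen/c.list")

echo "$matches"
echo "files in c.list: $listed"

# Clean up.
rm -rf "$kitchen"
```

One caution, in the spirit of mild versus hot: xargs splits its input on whitespace by default, so file names containing spaces will trip up a chain like this one.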

Make it as hot as you want

Okay, you might not be like me. You might not enjoy food that raises blisters on your tongue. That’s okay. With the shell, you can make it as hot or as mild as you want. Just keep in mind, when you must swallow an angry badger, you might put it through the pipeline first. It’ll make the task a little easier to swallow.

Comments

Submitted by Tyler

Hi,

If you want to see your bash history, the command 'history' is probably better. By default, .bash_history only gets updated when you exit your shell, so it will not contain the commands you have used in the current shell. 'history', on the other hand, includes the last 500 commands you've used, including commands issued from the current shell. The 500 command limit can be modified by changing the shell variable HISTSIZE (HISTFILESIZE limits the number of lines kept in the .bash_history file).

Thanks for the articles! I know this is supposed to be the year of Linux on the desktop, (or was that last year, or 2003?), but even with KDE, GNOME, Beryl et al, I still think the command line is where the power of GNU/Linux is really at.

Tyler

Submitted by Anonymous visitor (not verified)

Here's another fun Bash feature: process substitution. Did you know you can put a whole command inside a "<(...)" construct, and Bash will pass its standard output to the containing command via a named pipe (or a /dev/fd entry, depending on the system)? For example, the single-line command:

diff -u <(lsmod | sort | cut --delimiter=' ' --fields=1) <(ssh hypatia lsmod | sort | cut --delimiter=' ' --fields=1)

compares what kernel modules I have loaded on my current machine (Theon) with those on my other machine (Hypatia).

Lawrence D'Oliveiro

Submitted by Anthony Taylor

Nice! That's an excellent use of redirection, one of my favorite things. I did *not* know you could pass in multiple redirects to a single program like that.

Submitted by Mitch Meyran

First thing: I'm not sure 'picante' actually exists in Spanish - and if you were trying to say it in French, the correct spelling is 'piquante'.

Second thing: there are not three ways to process pipe I/O in Unix, nor two; there's only one way, with three different standard behaviours (which may vary depending on the programmer's goal):
- a file is named as input/output: since everything in a Unix system is a file, said file can be... a file on disk, a device, a process. Depending on the application, some programs may only read or write on standard I/O, ignoring such parameters.
- a file is not named, but '-' is used as input/output: the pipe will be connected with a concurrently running process (usually following with '|') instead of a file, if output to a file is the program's default behaviour. Many programs will accept this; however, depending on the input's format, not all may know how to handle it.
It can be nice to, say, convert a PDF file to bitmap graphics and do text recognition on it, or to convert a sound file from one format to another, without writing uncompressed data to disk.
- no file is named: default input/output is used. While it depends on the program, the default input is usually the keyboard and the default output the screen. Some programs REQUIRE a named I/O.

---
A computer is like air conditioning: it becomes useless when you open windows.

Author information

Anthony Taylor

Biography

Tony Taylor was born, causing his mother great discomfort, and has lived his life ever since. He expects to die some day. Until that day, he hopes to continue writing, and living out his childhood dream of being a geek.
