Gutenberg books with GNU/Linux - Part 1


I am a great fan of the Gutenberg project, a noteworthy and honorable effort to digitize copyright-free texts. This project has released into the public domain over 20,000 classic books.

This article will explain how to integrate this huge body of material with the Ubuntu desktop.

The Gutenberg project releases books in a standard text format. Volunteers convert a subset of the text to HTML and PDFs and to a few other less ubiquitous formats besides. Other support sites such as manybooks.net give you even more choice of formats including compressed formats for palmtops, iPod and Java-based telephones. These sites are mostly free with the option to make a small donation.

The Gutenberg project has delivered to the world the base material for serious and progressive Edutainment. Started back in 1971 by Michael S. Hart, the project has blossomed with numerous affiliated projects. The project's great success is measurable with a body of around 100,000 available texts--and rising.

Can software help bring plain text files to life? Or are we doomed to wander the world carrying sheets of paper, waiting for the next advance in smart reader technology and the vendor lock-in that may come with it?

I will discuss the installation and merits of the following packages:

  • gutenbrowser
  • gmountiso
  • espeak
  • recite
  • serpentine

By the end, you will be able to browse e-texts and/or hear the text spoken. I will also explain how to download and mount Gutenberg CDs (which contain collections of books).

Installing the e-text browsing packages

With so many packages available in the GNU/Linux universe, it makes sense to search for a pre-compiled binary rather than installing from a downloaded tar file. I personally find it natural to run apt from the command line; for example typing:

apt-cache search guten

To my surprise, my search only returned two entries: PYGE[13] (a series of related Python-based file manipulation applications) and gutenbrowser. Gutenbrowser sounded promising, so I ran:

apt-cache show gutenbrowser

The output shows that the package is an e-text reader supported by Ubuntu, falling under the general section universe/text. To install the software, type:

sudo apt-get install gutenbrowser

Test the newly installed software by typing gutenbrowser. An e-reader will pop up, as shown in figure 1.

Figure 1: The gutenbrowser under fire.

The gutenbrowser contains basic functionality to search and download text from a central repository, make bookmarks and display the text in any font size that you choose. After reading a few of the smaller books, I concluded that the browser was a solid application for the desktop.

To make a shortcut, right click your mouse on the desktop and select the "create launcher" option. A dialog will then appear asking for extra information for name, command, and comment. Simply fill in Gutenbrowser for the name, gutenbrowser for the command and an optional comment.

To download an e-text and read it, first visit the menu item Library → Gutenberg. A dialog will appear with a list of authors and titles. Double clicking on a book title triggers the e-reader to download the book to the selected directory on your local drive. To import the downloaded file into the main menu, select "file open local library" and import the text you have just downloaded. The e-reader will then display the title of the book along with the titles of any other imported books. Selecting a title and double clicking on it loads the book's text.

You should now be able to read imported books in Gutenberg text format. However, what if you are a road warrior and prefer to keep a large local library? Disk space is cheap: take advantage of ever improving economics, download an ISO image from a Gutenberg site, and mount the image locally, thus allowing you to read all the thousands of books contained in it.

Local storage

After using the e-text reader for a couple of days, I realized that downloading texts was unreliable: it worked most of the time, but sometimes downloads were slow and occasionally the relevant file was missing. I therefore decided to download an ISO image, the "Best Of CD August 2003" mentioned on the Gutenberg website, and mount it on my file system. The advantage of this approach is that I did not need to burn a CD/DVD and then copy the files over to my hard drive. Downloading the ISO is also potentially helpful for schools that have limited internet access for their local network of machines. Please be a good net citizen: if you download an image, choose the nearest mirror site rather than adding to the load on the main site.

A compact Ubuntu-related helper to mount ISO images is Gmount-iso. To install it, just type from the command line:

sudo apt-get install gmountiso

On installation, a new menu option appears on the Gnome desktop under Applications → System Tools → Gmount-iso (figure 2). To mount the image you will first need to create a directory to act as the mount point. For this example, I made a temporary one under my home directory: /home/alan/temp/guten. To mount, simply select the image file through Nautilus, then select the mount point (in this case /home/alan/temp/guten), and finally click the button labelled "mount".

Figure 2: The main dialog from Gmount-iso

Performing the same action from the command line every time means remembering the exact syntax of the mount command, which can be a hassle. However, for the sake of completeness, the equivalent command for the actions just mentioned is:

sudo mount -o loop -t iso9660 /home/alan/Desktop/PG2003-08.ISO /home/alan/temp/guten/
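If remembering the syntax is the main obstacle, the command above can be wrapped in a small shell function. The following is only a minimal sketch: the function name guten_mount_cmd is hypothetical, it assumes paths without spaces, and it merely prints the command so that you can inspect it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the loop-mount
# command for an ISO image. Assumes the paths contain no spaces.
# Usage: guten_mount_cmd ISO_PATH MOUNT_POINT
guten_mount_cmd() {
    iso=$1
    mount_point=$2
    if [ -z "$iso" ] || [ -z "$mount_point" ]; then
        echo "usage: guten_mount_cmd ISO_PATH MOUNT_POINT" >&2
        return 1
    fi
    # -o loop attaches the image file to a loop device;
    # -t iso9660 names the CD-ROM filesystem type.
    printf 'sudo mount -o loop -t iso9660 %s %s\n' "$iso" "$mount_point"
}

# Print the command for the example paths used in this article.
guten_mount_cmd /home/alan/Desktop/PG2003-08.ISO /home/alan/temp/guten/
```

When you have finished with the image, sudo umount followed by the mount point detaches it again.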

The Gutenbrowser e-reader is fast and reads the text files found under the /etext series of directories on the newly mounted ISO without any hesitation.
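To get a feel for how much material the mounted image holds, you can count the text files below the mount point. This is a rough sketch that assumes the e-texts carry a .txt extension (in either case); the function name count_etexts is hypothetical, and the path is the example mount point from this article.

```shell
#!/bin/sh
# Hypothetical helper: count plain-text files below a directory.
# Assumes e-texts end in .txt or .TXT; -iname matches case-insensitively
# (supported by GNU find, as shipped with Ubuntu).
count_etexts() {
    find "$1" -type f -iname '*.txt' | wc -l | tr -d ' '
}

# Only count when the example mount point actually exists.
if [ -d /home/alan/temp/guten ]; then
    count_etexts /home/alan/temp/guten
fi
```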

To create a permanent mount you will need to edit the /etc/fstab file as root. Before you do so, please remember to back up the existing version to a safe location. Assuming that you have moved your ISO image off the desktop and created the directory /media/guten, add something like this at the end of /etc/fstab:

# GutenBerg project.
/home/alan/temp/image/PG2003-08.ISO  /media/guten     iso9660   ro,loop,auto   0   0
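The steps around that entry can be sketched as follows. Treat this as an illustration under stated assumptions rather than a recipe: the function name guten_fstab_line and the backup file name /etc/fstab.bak are hypothetical choices, and the commands that need root are left as comments so that nothing is changed until you run them deliberately.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for a read-only loop
# mount of an ISO image.
# Usage: guten_fstab_line ISO_PATH MOUNT_POINT
guten_fstab_line() {
    iso=$1
    mount_point=$2
    # Fields: device, mount point, filesystem type, options, dump, fsck pass.
    printf '%s  %s  iso9660  ro,loop,auto  0  0\n' "$iso" "$mount_point"
}

# Print the entry for the example paths used in this article.
guten_fstab_line /home/alan/temp/image/PG2003-08.ISO /media/guten

# To apply it by hand (not run here):
#   sudo cp /etc/fstab /etc/fstab.bak   # back up first (assumed backup name)
#   sudo mkdir -p /media/guten          # create the mount point
#   ...append the printed line to /etc/fstab as root...
#   sudo mount /media/guten             # mount using the new fstab entry
```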

Please keep in mind that the download sites suggest, if possible, that you should copy and distribute the CDs and DVDs, spreading the opportunity to read classic literature without affecting bandwidth.

Audio Format (libre)

The e-reader lacks basic text to voice functionality. If you wish to incorporate this extra audio feature, then you should find another e-reader or try out a command line tool, or use the ever-improving accessibility features in Ubuntu. Festival, recite, espeak and a few other well known packages contain the necessary tools.

I was pleasantly surprised to discover that installing these packages had also automatically installed espeak. To test espeak’s ability to read from a text file, just type:

espeak -f location_of_file

To compare Recite with Espeak, I installed Recite and got both of them to say "hello world":

sudo apt-get install recite 
recite "Hello World"
espeak "Hello World"
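Espeak can also write its speech to a WAV file instead of playing it, which is handy for longer texts. The sketch below uses espeak's -f (read from file), -w (write WAV file) and -s (speed in words per minute) options; the function names wav_name and text_to_wav, the speed of 150 and the file name book.txt are illustrative assumptions, not anything prescribed by espeak.

```shell
#!/bin/sh
# Hypothetical helper: derive a .wav file name from a text file name,
# e.g. /some/dir/book.txt -> book.wav
wav_name() {
    base=$(basename "$1")
    printf '%s.wav\n' "${base%.*}"
}

# Hypothetical helper: speak a text file into a WAV file in the
# current directory.
text_to_wav() {
    # -f reads the text from a file, -w writes the audio to a WAV file
    # instead of playing it, -s sets the speed in words per minute.
    espeak -s 150 -f "$1" -w "$(wav_name "$1")"
}

# Only attempt the conversion when espeak and the placeholder input
# file book.txt are both present.
if command -v espeak >/dev/null 2>&1 && [ -f book.txt ]; then
    text_to_wav book.txt
fi
```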

Although I have seen some excellent examples of automated singing on the Internet (the pain still haunts me), practically "speaking", a command line tool reading files is not exactly tuneful when compared with the human version. One Gutenberg-affiliated spin-off project is LibriVox, whose volunteers read texts aloud and record them in MP3 and OGG format. The OGG format is patent-free and its reference codec libraries are released under a free, BSD-style license; therefore, it's my personal format of choice. The volunteers tend to record audio in chunks corresponding to the chapters or sections of a given book. At the beginning of each recording is a short announcement for the project, which can be somewhat intrusive if the sections are small.

My younger son enjoys going to sleep with the CD player turned on and playing classical music or stories. Sadly, his CD player is not modern enough to accept the MP3 or OGG format. Creating an audio CD under Ubuntu from a series of OGG files is straightforward. Simply place a blank recordable CD in your CD burner. A few seconds later, a pop up window will appear: one of the options will be to create an audio CD. Choose the Add option and navigate through Nautilus file manager to the OGG files you wish to convert. Finally, hit "Write to Disc". If all goes well, five minutes later you will have a CD ready to go.

The Serpentine Audio CD Creator

If you do not have Serpentine installed, then the following command will add it to your system:

sudo apt-get install serpentine

Conclusion

I have explored a number of Debian packages that make reading Gutenberg texts a pleasure. In part two of this article I will explore Gutenberg-related Java-based applications, from speed reading to converting text to other handy formats.

Until then, happy reading.

Resources

  1. Distributed proof readers home - http://www.pgdp.net/c/
  2. Espeak - http://espeak.sourceforge.net/
  3. GutenBerg homepage - http://www.gutenberg.org/wiki/Main_Page
  4. Gutenpalm project - http://gutenpalm.sourceforge.net/
  5. ISO image - http://www.gutenberg.org/etext/11220
  6. List of affiliates - http://www.gutenberg.org/wiki/Gutenberg:Partners%2C_Affiliates_and_Resources
  7. Manybooks homepage - http://manybooks.net/about/
  8. Ogg format - http://www.vorbis.com/
  9. Plucker - http://www.plkr.org/
  10. PYGE homepage - http://pyge.sourceforge.net/

Comments

Submitted by lpotter on

Apparently, the version of Gutenbrowser that you reviewed (judging by the screenshot) is a bit out of date. The current version is based on Qt 4 and WebKit, and is able to display full HTML content. It is also able to download newer etexts that live in the Gutenberg project's new file system hierarchy.

At any rate, thanks for the mention!
