Desktop search tools for GNU/Linux: the competition hots up (part one)

I sometimes think that search tools are like my local bus: none comes along for ages and then three turn up in quick succession. For quite some time Beagle and Kat have been meeting the needs of users like you and me, who fill up our hard drives with the results of our internet meanderings. Because we have been remiss in keeping those drives well organized, we eventually have to use search tools to find that PDF or HTML article we spent an eternity looking for.

Beagle has been out for quite some time and was very much launched as the poster boy of Mono along with F-Spot. Before I get flamed, I know some purists will regard the installation of Mono as cause for being strung up outside Micronovell by piano wire from a lamppost, but if you want Beagle you will have to turn a Nelsonian eye to it and grit your teeth. (Is that a mixed metaphor?) Don’t worry. There is no truth in the rumour that every time someone uses a Mono-based application Satan blinds an innocent puppy.

It frequently comes pre-installed with many distros today and, where it does not, it can usually be installed from the software repositories, which will pull in any dependencies required. You can see a list of dependencies here. If a precompiled binary is not available for your particular distro you can always head over to the Beagle website for the source tarball. Install from source? Why not? It reminds me of the good old days when men were real men and installing from source was real installing from source, when real men made their own frocks and crocheted their own yoghurt!

Beagle was designed for the GNOME desktop but it will also work on KDE, and the Kerry frontend was designed specifically for that desktop. In either incarnation, it is still going strong but I’m afraid that Kat is floundering. In fact it appears to be dead. Go to either here or here and you will be redirected to the main page of Kat’s sponsor, Mandriva, which informs you that the Kat project webpage is closed. It is still available for download in either precompiled binaries for earlier versions of various distros or in source format which now looks like the best bet.

GNU/Linux is about choice, and therefore freedom, and therefore power, and when it comes to desktop search tools the same mantra applies. If some efforts have fallen by the wayside, the GNU/Linux community possesses sufficient imagination and talent in depth to come up with alternatives. I want to look at three Beagle rivals and compare them to that established search tool.

Beagle: best in class?

Well, there’s quite a lot to list so let’s get the facts out on the table about Beagle. It has been out for a few years now and version 0.2.17 has just been released. Beagle offers real time searching and it supports the following data sources:

  • File system
  • Evolution mail, calendar, and addressbook
  • Thunderbird mail, news, RSS feeds, and addressbook
  • Korganizer events and TODOs
  • KMail mail
  • KAddressbook addressbook
  • Pidgin (formerly Gaim) and Kopete instant messaging and IRC logs
  • Firefox and Epiphany web pages (as you view them, through browser extensions, automatically disabled for secure websites for things like online banking and credit card transactions)
  • Konqueror web pages
  • Blam, Liferea and Akregator RSS feeds
  • Tomboy, KNotes, and Labyrinth notes
  • Konversation IRC logs
  • Opera Web History

It supports extraction of text and metadata from the following file formats:

  • Folders
  • Office Documents
  • OpenOffice.org (.sxw, .sxc, .sxi and more)
  • OpenDocument (.odt, .ods, .odp)
  • Microsoft Office (.doc, .xls, .ppt)
  • AbiWord (.abw)
  • Scribus (.sla)
  • Rich Text Format (.rtf)
  • PDF (.pdf)
  • Text Documents (.txt)
  • HTML (.xhtml, .html, .htm)
  • Source code (Boo, C, C++, C#, Fortran, Java, JavaScript, Lisp, Matlab, Pascal, Perl, PHP, Python, Ruby, Scilab and Shell scripts)
  • Plain text (.txt, any plain text file that isn’t filed under any other category)
  • Documentation/Help Documents
  • Texinfo
  • Man pages
  • Docbook
  • Monodoc
  • Windows help files (.chm)
  • Images (.jpeg, .png, .bmp, .tiff, .gif, .svg)
  • F-Spot and Digikam tags in the images are also indexed
  • Audio (.mp3, .ogg, .flac, .ape, .mpc, .m4a, .aac, tracker, amiga audio, .wma)
  • .m3u and .pls playlists
  • Archive files (.zip, .tar, .gzip, .bzip2) and their contents
  • Application launchers
  • Linux packages (ebuild, .rpm, .deb)
  • Generic XSLT files
  • Video (.mpeg, .asf, .wmv, .mng, .mp4, quicktime and other formats supported by MPlayer or Totem)

And, if you want to see a list of supported data sources for a Beagle installation, then just issue the following command:

beagle-info --list-backends

This is what I got on my laptop running Mepis:

 
Mail (/usr/lib/beagle/BeagleDaemonLib.dll)
Files (/usr/lib/beagle/BeagleDaemonLib.dll)
GaimLog (/usr/lib/beagle/BeagleDaemonLib.dll)
IndexingService (/usr/lib/beagle/BeagleDaemonLib.dll)
Tomboy (/usr/lib/beagle/BeagleDaemonLib.dll)
Blam (/usr/lib/beagle/BeagleDaemonLib.dll)
Liferea (/usr/lib/beagle/BeagleDaemonLib.dll)
Akregator (/usr/lib/beagle/BeagleDaemonLib.dll)
KonquerorHistory (/usr/lib/beagle/BeagleDaemonLib.dll)
Kopete (/usr/lib/beagle/BeagleDaemonLib.dll) 
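If all you want is the backend names, a small filter will trim the library paths from that output. This is only a minimal sketch: it assumes every line has the form "Name (/path/to/library.dll)" as in my listing above, and the function name backend_names is my own invention, not part of Beagle.

```shell
# Print only the backend names from lines like "Name (/path/lib.dll)".
# backend_names is a hypothetical helper, not a Beagle command.
backend_names() {
    # Remove everything from the opening parenthesis onward,
    # then drop any lines that contain nothing but whitespace.
    sed -e 's/[[:space:]]*(.*$//' -e '/^[[:space:]]*$/d'
}

# Usage (requires Beagle to be installed):
#   beagle-info --list-backends | backend_names
```

On my Mepis listing this would reduce each line to just Mail, Files, GaimLog and so on, which is handier if you want to compare two machines.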

And here is a list of applications using Beagle:

  • The GNOME-based search tool included in Beagle.
  • Kerry Beagle: a KDE-based search tool.
  • Nautilus: The GNOME file manager can use Beagle to quickly find files based on content and metadata.
  • Yelp: The GNOME help browser uses Beagle to search Docbook and man pages.
  • Deskbar: A versatile search interface
  • Tangerine: publishes music over a local network using DAAP and can find all your music automatically using Beagle.
  • Brasero: A CD/DVD-burning application can find files on your system using Beagle.
  • The SUSE Linux Enterprise Desktop main menu has a search entry which uses Beagle.
  • beaglefs: A FUSE filesystem built on top of Beagle.
  • kio-beagle: A KDE IO implementation for Beagle integration into KDE applications like Konqueror.
  • Peagle: Beagle Web frontend, coded in PHP, with a look and feel similar to modern web search engines.
  • catfish: A unified frontend to beagle, (s)locate, doodle and tracker, written in PyGTK2.
  • kBeagleBar: A KDE kicker applet for beagle.
  • yaBi: A pyKDE application for querying beagle. It uses KHTML to display results, CSS for designing and beagle-python bindings to query beagle.

Given such extensive support it is not surprising that the implementation of Beagle has been seen as one of the factors in the adoption of GNU/Linux on the desktop.

Beagle’s a clever dog. How does he do it?

First off, it utilizes the Lucene search engine from Apache and uses inotify. inotify is a Linux kernel subsystem that provides file system event notification and has been in the main kernel since version 2.6.13. This is ideal for search tools like Beagle because it lets them reindex only the files that have changed, obviating the need to scan the file system frequently, which would be CPU-intensive. Another thing you can do is to enable extended file attributes, a file system feature that lets users associate files with metadata not interpreted by the filesystem. It is supported in Ext2, Ext3, ReiserFS and XFS filesystems. If it is not enabled, Beagle falls back to tracking file status with the slower sqlite.
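If you are not sure whether your kernel is new enough for inotify, a quick check along these lines may help. Treat it as a sketch rather than gospel: the function name kernel_has_inotify is made up for illustration, and it assumes GNU sort (for its -V version ordering) is installed.

```shell
# Succeeds if the given kernel version (default: the running kernel)
# is at least 2.6.13, the release in which inotify entered the kernel.
# kernel_has_inotify is a hypothetical helper, not part of Beagle.
kernel_has_inotify() {
    version=${1:-$(uname -r)}
    # Drop any distro suffix, e.g. "2.6.18-5-686" -> "2.6.18".
    version=${version%%-*}
    # sort -V (GNU coreutils) orders version strings; if 2.6.13 sorts
    # first, the kernel is the same version or newer.
    lowest=$(printf '%s\n' "2.6.13" "$version" | sort -V | head -n 1)
    [ "$lowest" = "2.6.13" ]
}

if kernel_has_inotify; then
    echo "inotify should be available"
else
    echo "kernel predates inotify"
fi
```

A distro kernel could in principle be built without inotify even at a suitable version, so a pass here is a good sign rather than a guarantee.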

To enable user_xattr you will need to edit /etc/fstab (as root) and add it to the mount options for the partition in question, usually the partition containing home (e.g., /dev/hda3 mounted on /home). You then need to remount the partition with mount -o remount /home. A word of caution though: I have seen reports of adding this feature to the XFS filesystem which resulted in a read-only filesystem on reboot, so keep a rescue/repair LiveCD on hand just in case! In fact, it might not be a bad idea to try all of this on a fresh, testbed install to be on the safe side and copy in some files for Beagle to chew on when you execute searches. For further information on inotify and extended attributes look here and here respectively.
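Since a typo in /etc/fstab can leave you with an unbootable system, it may be safer to generate the edited file as a copy and inspect it first. The sketch below does just that. The helper name add_user_xattr and the /tmp/fstab.new path are my own assumptions, and it presumes the usual whitespace-separated fstab layout with the mount options in the fourth field.

```shell
# Print a copy of an fstab file with user_xattr appended to the mount
# options of one mount point. The original file is never modified.
# add_user_xattr is a hypothetical helper name.
add_user_xattr() {
    fstab_file=$1      # e.g. /etc/fstab
    mount_point=$2     # e.g. /home
    awk -v mp="$mount_point" '
        # Leave comments, short lines and other mount points untouched.
        /^[ \t]*#/ || NF < 4 || $2 != mp { print; next }
        # Field 4 holds the comma-separated mount options; only append
        # user_xattr if it is not already listed. Assigning to $4 makes
        # awk rejoin the fields of that line with single spaces.
        index("," $4 ",", ",user_xattr,") == 0 { $4 = $4 ",user_xattr" }
        { print }
    ' "$fstab_file"
}

# Usage: review the result before installing it as root, then remount.
#   add_user_xattr /etc/fstab /home > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
#   mount -o remount /home
```

Given the XFS reports mentioned above, I would keep this to Ext2 or Ext3 partitions until you have tried it on a testbed install.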

Allied to the above-listed features, Beagle makes for a powerful search tool, although it has been accused of being something of a resource hog. Run Beagle for a few hours and check it with the top command; enabling user_xattr will use some tens of megabytes for several gigabytes of files. If you are bothered by this then you can set the environment variable BEAGLE_DISABLE_XATTR, but be aware that this will not reset ctime back to mtime, which some worry might affect backups. (For more detail on what this is, have a look at this Wikipedia entry.) This is perhaps a small price to pay for such a versatile piece of software which interfaces to so many applications.

Other tips

If you are determined to run it on your laptop but are concerned about the drain on your battery, you can disable the Beagle daemon either by issuing the command beagle-shutdown or, in all versions since 0.2.14, by simply unchecking the box marked “Start search & indexing services automatically”. Or, to use Beagle without indexing and so conserve battery power, issue the command beagled --disable-scheduler. If you are on AC power this is not strictly necessary; it is a matter of personal preference and of how much in the way of resources you are prepared to give to Beagle. Users of Ubuntu Feisty Fawn will have Beagle version 0.2.14, which has an option to disable indexing whilst on battery power. I have that feature in Kerry Beagle on Fedora Core 6, where it can be set directly from the graphical configuration dialog. (However, Feisty Fawn ships with Tracker, a meta search tool, and I have heard that it might scrap with Beagle like a pitbull unless you disable the Beagle daemon first. As I am not running that version of Ubuntu yet, can anyone comment on this?) If you are using a different version you can disable Beagle and run it manually on a per-session basis as and when needed. To do this, issue the following commands and exercise the little mutt:

beagle-shutdown
export BEAGLE_EXERCISE_THE_DOG=1
beagled

This will index the hard drive aggressively and hog resources, so leave it to get on with it and resume other tasks later.

Whilst for most tasks the GUI for Beagle will be quite ample for your purposes, as always with GNU/Linux, the command line can be your friend—to exercise a fine degree of granular control try these commands for size:

  • beagle-config—Current documentation is probably too limited to be useful. Use the beagle-settings gui instead.
  • beagled—The main Beagle program behind the scenes. Coordinates indexing and searching.
  • beagle-query—A command line search interface to the Beagle indexes.
  • beagle-settings—a graphic interface for beagle-config.
  • beagle-shutdown—Launch to cleanly shutdown the beagled program.
  • beagle-status—Continually displays the number of items indexed by type.
  • beagle-search—The graphic user interface for Beagle.

I mentioned earlier that there were two graphical frontends for Beagle, one for GNOME (which will also run on KDE) and Kerry Beagle. I prefer the latter. Once you have completed a search you can right-click a result and select the application with which you want to open it. In the case of web pages, I have a choice of five installed browsers as well as the extra option on the open icon at the bottom of each search result to open with the default browser, Konqueror. There is another menu for actions that offers me the options to attach to an e-mail, edit as root and send with Bluetooth. By contrast the GNOME-based frontend offers only the opportunity to double-click and open a webpage with Konqueror.

Kerry Beagle as a frontend seems to lack granular configuration, however. You seem to need either to launch the GNOME frontend or, if you prefer, launch the settings option only. In a terminal, as a normal user, type beagle-settings and a small frontend will give you a Search Preferences box with two tabs: search and indexing. There are options for adding file paths and for excluding them too, on the grounds of privacy. If you are curious as to what the busy little canine is up to you can also issue the following command as a normal user in a terminal: ls ~/.beagle/Log/ | grep current.
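Listing the log files only tells you that they exist, so here is a rough sketch that prints the tail end of each one. The function name show_beagle_logs is hypothetical, and I am assuming nothing about the exact file names beyond the "current" prefix that the grep above looks for.

```shell
# Show the last few lines of every "current" log in a Beagle log directory.
# show_beagle_logs is a hypothetical helper; the directory defaults to the
# ~/.beagle/Log location mentioned in the article.
show_beagle_logs() {
    log_dir=${1:-"$HOME/.beagle/Log"}
    lines=${2:-5}
    for f in "$log_dir"/current*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        echo "== $f"
        tail -n "$lines" "$f"
    done
}

# Usage:
#   show_beagle_logs                      # last 5 lines of each current log
#   show_beagle_logs ~/.beagle/Log 20     # last 20 lines instead
```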

Give the dog a bone

If there is one bone Beagle might be grateful for, it would perhaps be integration with Google. Indeed, that is one of the projects at the Google Summer of Code. In fact, it is already available via the Deskbar Applet, a facility for GNOME users which is only as far away as a command in your package manager of choice, but a little more needs to be done to integrate the applet with Google web search.

Basically, you will need to go to the Google API page and use your Google account or create a new one. Once an account has been created a Google Web API licence key will be generated and e-mailed to you. The installation and any problems associated with it are too detailed to itemize here, but there is a very good guide on the following website. You will probably also need to know that Google is no longer issuing new keys for the SOAP Search API which is needed to integrate Google and Beagle and has switched (for new, not existing, users) to an AJAX Search API. There is a good guide here. Just follow the links. Your mileage may vary and I have not tried this yet.

Other frontends: Catfish, Peagle and Yabi

There are, it is said, nine ways to skin a cat and GNU/Linux knows all of them. So, it will come as no surprise to the reader that people have been busy creating new frontends for Beagle and the three listed above are well worth a look. I will look at them in part two. In the meantime, have a look at the Beagle, Catfish, Yabi and Peagle screenshots categories.

Screenshots

Kerry Beagle configuration
Kerry Beagle backends
Kerry Beagle configuration with indexing for laptops
Kerry Beagle Daemon Status

Comments

Submitted by Anonymous visitor (not verified)

Beagle is a bit buggy and extremely slow. It also uses lots of memory/cpu. But fear not: a new search tool is available that doesn't have all the problems of Beagle. It's called Tracker - http://www.gnome.org/projects/tracker/index.html Tracker is written in C, is super fast, uses very few resources and is improving even more every day. It is also much more than a search tool.

Submitted by chemicalscum (not verified)

Yes I use it. It is great on older hardware as it has a very low memory footprint. It doesn't yet cover emails but it already handles a wide range of filetypes: video, music, documents, text and code. It is indeed lightning fast.

Its front end is written using pyGTK so it fits in nicely on a GNOME desktop and it also integrates into Nautilus as its search function. I would recommend it to anyone. Cut out the bloat of Mono. In that vein, a superior replacement for Tomboy is Zim http://www.pardus.nl/projects/zim/ which is written using PerlGTK and which I find I use to organize my whole life, whereas when I had Tomboy installed it was just a nice toy I hardly ever used.

Submitted by Anonymous visitor (not verified)

You do know that Kat is dead? Strigi will be the search backend mainly used in KDE 4.

Submitted by Anonymous visitor (not verified)

I, like a number of people and at least 1 major distro (Fedora), have found Beagle resource heavy (the use of Mono) and a bit buggy and have removed it. I turned to Recoll, which I find lighter on resources and a little more useful as it allows stemming, though it does not have kio_slave functionality like Beagle.

I'd love to see a head to head with Beagle, Recoll, Tracker and Pinot.

A very nice and in-depth article; hoping the next parts are as well done.

J.B. Nicholson-Owens

Is there any plan to include some means of letting OS distributors easily plug in an indexer (the user could then pick what indexer they like, what front-end they like, etc.)? Some effort to make it easy for all sorts of programs to use the indexing system would be great too. I haven't yet done enough work with GNOME or GTK+ to know how to best approach this, however I'm sure others do know how to best allow this kind of flexibility and inter-app indexing.

It would be great to format the search results like another Nautilus window so I could sort the results in the same way I sort a folder contents in Nautilus' detailed view. It would also be helpful to index e-mail without having to download them locally to one's account so I could more easily find emails in the huge hierarchy of emails I've stored on my IMAP server.

I like the speed of Tracker and it seems to use far less RAM than Beagle. But I've noticed that Tracker misses some of the documents that should be in the hit list (for lack of a better term) and the default front-end I tried doesn't allow sorting the results.

J.B. Nicholson-Owens
mail@digitalcitizen.info
http://digitalcitizen.info/

Michael Fierro

I second the recommendation for recoll. It's much nicer to system resources, and does just about as good a job as Beagle. Plus, it doesn't rely on Mono (icky!).

Submitted by Anonymous visitor (not verified)

I prefer Recoll as it's less buggy, less resource hungry and doesn't need to have GNOME libs or KDE libs installed to work.

I can't believe you just ignored it...

Submitted by trollzor

*thought I'd cross-post my comment from digg in reply to opencoder saying beagle used too many resources and "locate" command *

I just spent the last couple of years on Fedora then Ubuntu and searched the same way, but locate is only as good as your last updatedb and sometimes filenames don't cut it. My impression of Beagle when I installed it on Ubuntu was that it was a very good app, but the resource use seemed a bit iffy for how often I used it (at most once a day).

This resource issue is going to be a major issue with Linux in the future, I think. I just got a MacBook and it's got Spotlight integrated and I have Dashboard loaded up with apps, yet there is negligible cpu use at idle. And while it may be acceptable for Beagle to be a little porky (as anyone who's owned a real beagle knows), gDesklets is a resource disaster compared to Dashboard: I have had it up in the 20-30% cpu range with a desktop full of apps on a P4 Northwood 2.8 GHz on Ubuntu. Maybe that doesn't count so much on a desktop (though I'd argue it does) but on a laptop that is suicide. Vista was getting a panning for draining people's batteries faster than XP, so I don't think Linux should be off the hook for that.

All that said after a week of using OS X after using Linux exclusively for 4 years there isn't any technology it has that Linux doesn't, it's all just in the polish which is comforting for the future of the Linux desktop. I got the macbook because it's one of the most Linux compatible laptops out there, in addition to best value laptops you can buy here in Australia (where they are still trying to palm off celerons at the same price point often) and OS X doesn't really impress me enough to swap loyalties, Ubuntu is going on soon.

Submitted by NotThistime (not verified)

"I have seen reports of adding this feature to the XFS filesystem which resulted in a read only filesystem" - no wonder there: XFS supports xattr without (!) any fstab option.
Therefore it doesn't make sense to edit fstab in the case of XFS file systems. It may even be that the additional option causes an error because it is not understood.

Author information

Gary Richmond

Biography

A retired but passionate user of free and open source for nearly ten years, novice Python programmer, Ubuntu user, musical wannabe when "playing" piano and guitar. When not torturing musical instruments, rumoured to be translating Vogon poetry into Swahili.