How to help build a free software search engine and use it too. Welcome to Wikia


Mention Jimmy Wales and you automatically think Wikipedia; however, that is not his only love child. The Wikia search engine is his latest offspring -- or at least it was when it was launched in January 2008. Wikia has been devised as a free software and open source alternative to Google and others.

In an article in June I speculated about how the internet might change the way we think; I mentioned in passing the Wikia search engine. Enough time has passed to see how far this attempt to democratise a web search engine has come.

The philosophy

Google's algorithm is the goose that lays the golden egg but doesn’t cackle. Wales wants it to squawk its head off

Wales thinks that search on the web is broken, for the same reason that proprietary software is broken: lack of freedom and accountability. He thinks that Google's search algorithm is a form of "editorial control"; it is of course true that whilst Google has been a relatively good friend to free software and some open standards, they do nevertheless keep their search algorithm secret. It is the goose that lays the golden egg but doesn't cackle. Wales wants it to squawk its head off. To that end he came up with Wikia. It is released under the GPL and the source code is free for all to examine and amend. Additionally, where search engines like Google rely on that algorithm to power searches, Wikia adds a human element that positively encourages people to build and extend the search engine. Wikia utilizes Nutch for its algorithm and Lucene for search; Wales also acquired Grub as an open source web crawler to index the web (and released the source code for it too).

So far so good. However, unlike Wikipedia, Wikia is a for-profit undertaking, which means two things. First, for all its free software credentials, it has to monetize web searching. Nothing necessarily wrong with that, but if Wikia is built with the assistance of unpaid volunteers, is it fair to make a profit without spreading the wealth around? Second, when Wikia launched back in January (with a relaunch in June) it received consistently bad reviews. Even those who wished it well and wanted it to succeed were appalled and Wales himself said that the search results "sucked". Many people failed to understand that, initially, Wikia was intended to build a search engine by user input so it was always going to be a matter of time before it accumulated a body of information. It seemed to be caught in a vicious circle: no index, no information; no information, no index. Sufficient contributors are the key to breaking that vicious circle--and time, lots of time. So, as of today (Wednesday 12th November) Wikia's real-time counter reveals that it is approaching eight million queries and just over two million contributions. After all, how did Wikipedia look and function on day one?

One solution might have been to fill that temporary gap by pulling in Wikipedia content but that might have been problematical as Wikipedia has always been on a not-for-profit basis.

So, how exactly does Wikia work?

Superficially, you go to the Wikia page (which is available in twenty-six languages) and type in your search term. I'm pleased to report that you will now get a decent number of returns for your search query. Providing you are using Wikia in a Javascript-enabled browser you should get a screen something like this:

Figure 1: Wikia search engine default screen

Anyone who has used Google (virtually everyone) will feel right at home here. I tried several searches at random: selinux, trucrypt, konqueror and gpl. There is quite a lot of detail on these searches. Let's have a look.

The devil's in the detail

I typed "konqueror" into the search bar and here is what it returned:

Figure 2: Wikia's return for konqueror search term

The first difference from Google is that Wikia does not indicate the number of search returns and does not render a numerical list of pages at the bottom of the page. In fact, Wikia will scroll the results continuously, thanks to its Ajax interface. Along the top of the search results there are what look like tags. For example, I know that Konqueror uses the Webkit engine for rendering pages. Clicking on that "tag" with the scroll button loads a related search for that term. If you know a lot about your particular search topic, you can (and should) add a tag. Well, I love Konqueror and decided to add what I knew. I clicked on "add suggestion" and added "kio" as Konqueror uses this feature for integrated transparency. Now, when anyone searches for Konqueror they can also initiate a supplementary search under kio. This may be of limited value for experts, but for anyone whose knowledge is limited this feature is very useful indeed.

Whenever you scroll down a page of search results and move your cursor to the right of any result, Wikia will display a rating system. Moving it across the stars (rated one to five) will highlight them. To give your rating, just click on the number you want and the stars will be highlighted in all future web searches. Wikia will also rank it in real time and move the entry up the page. Further to the right, there are five options for each search result: edit, annotate, spotlight, comment and delete. The last is obvious. If you have checked out a particular result in a separate browser tab and judge it to be really sub-standard, just click on delete and it is gone. If you want to make your own comments on a particular search result, well, just click on "comment" and an input box opens up immediately below:

Figure 3: Wikia's comment box

Selecting Annotation has the effect of opening the specific search result's webpage in a form of "split screen" with the original Wikia page resized to the left. This now gives you the opportunity to add links, images or text and add them back to the Wikia search index. If you select a link, it will be added at the bottom of any given search result. These changes are added back in immediately. Finally, if you think a result is worth a second dose of five-star treatment, then click on "spotlight" and a beige-coloured box will highlight it, although it did not seem to be reversible.
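To make the rating and delete behaviour described above a little more concrete, here is a minimal Python sketch of a result list that re-ranks itself whenever someone rates an entry. It is purely illustrative: the `Result` and `ResultList` names are hypothetical, the average-of-stars scoring is an assumption, and nothing here is taken from Wikia's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    ratings: list[int] = field(default_factory=list)  # star ratings, 1 to 5

    def average(self) -> float:
        # Unrated results score 0.0, so they sort below anything rated.
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


class ResultList:
    """Hypothetical in-memory result list for a single search term."""

    def __init__(self, urls):
        self.results = [Result(u) for u in urls]

    def rate(self, url: str, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        for r in self.results:
            if r.url == url:
                r.ratings.append(stars)
        # Re-rank straight away; the sort is stable, so ties keep their order.
        self.results.sort(key=Result.average, reverse=True)

    def delete(self, url: str) -> None:
        # Drop a result that a user judged to be sub-standard.
        self.results = [r for r in self.results if r.url != url]

    def urls(self):
        return [r.url for r in self.results]
```

Rating the last entry in a list with five stars moves it to the top at once, which is roughly the "rank it in real time and move the entry up the page" effect. A real system would presumably weigh ratings from many users and guard against abuse, which this sketch ignores.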

At the top right of the search results is a link called "add application". Clicking on it opens a box:

Figure 4: Wikia's add application function

From here you can select from either categories or applications. Using the up/down arrows, scroll through the categories relevant to your search and Wikia will bring up what it finds, in this case "images". You can then select from the results:

Figure 5: Wikia's add application in action

If Wikia constitutes the wisdom of crowds then Eric Raymond would be proud of it

This is now added back immediately to the search results where others can edit, annotate, delete or comment upon and hopefully add to the sum total of collective knowledge. If this constitutes the wisdom of crowds then Eric Raymond would be proud of Wikia. Of course, like any search engine, there is scope for abuse as there has been in Wikipedia but Wikia is evolving and as problems arise solutions will rise to meet them.

Wiki evolution

The eagle-eyed amongst you will have noticed a toolbar in the screenshots. This is the Wikia evolution extension which adds a toolbar to the browser (it works in both Firefox and Netscape Navigator). This is a really, really useful add-on because it means you can still help Wikia to build the web index and the search engine even when you are not searching directly from the Wikia search page.

The toolbar gives you the main features you have when working directly from Wikia itself. So, I opened a page from one of my bookmarks about adding repositories to the Asus EeePC and clicked on "Add URL to Keyword". A dialogue box opens and you are prompted to submit a search term under which the URL will be submitted to the Wikia search index. You will know that your URL has been added because an entry will have been added to the toolbar.

One of the best features of installing this toolbar is that you can still contribute to indexing the web and building Wikia as an effective search engine. If you are using either the Google or Yahoo search engines, Wikia adds two features: a small green plus sign against each search result returned and the star rating system. For example, I came across a recent excellent article on "do it yourself Konqueror commands". I clicked on the add button and it changed to a tick accompanied by "URL added". Going back to the konqueror search in Wikia, I see that it has been added and a grayed-out message below the entry indicates that it was added by a Google user. Me.
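The "Add URL to Keyword" idea, together with the note recording which search engine a result came from, can be pictured as a small keyword index. The following Python sketch is an assumption-laden illustration only: `KeywordIndex`, `add_url` and `search` are hypothetical names and the storage is a plain in-memory dictionary, not anything the toolbar or Wikia is known to use.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical keyword -> URL index that remembers where each URL came from."""

    def __init__(self):
        # keyword -> {url: source}; dicts preserve insertion order.
        self._entries = defaultdict(dict)

    def add_url(self, keyword: str, url: str, source: str) -> bool:
        key = keyword.strip().lower()
        if url in self._entries[key]:
            return False  # already indexed; keep the original source
        self._entries[key][url] = source
        return True

    def search(self, keyword: str):
        key = keyword.strip().lower()
        # Each hit carries a provenance note like the grayed-out message.
        return [(url, f"added by a {source} user")
                for url, source in self._entries.get(key, {}).items()]
```

Adding a URL under "Konqueror" with the source "Google" and then searching for "konqueror" would return that URL along with the note "added by a Google user". Normalising the keyword to lower case is a design choice of this sketch, made so that the same term typed in different ways lands in the same bucket.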

By the same token, if you have launched a search term in Wikia and come up short, just click on the search engine links on the right for Google or Yahoo and a new tab will open to search for that term. If you find what you are looking for in these other search engines, you can now add that search result back to Wikia. You get what you were looking for and you have, at little effort and no cost, helped to build a free and open search engine. Two final features I almost missed: if you are in any other search engine and do a search, it will also automatically be copied to the Wikia toolbar. So, if you are having no luck in Google or Yahoo just click on the Wikia search and you can start a search in there instead (select it with the scroll wheel and it will open in a separate tab and you can compare results if you wish). Also, if you have added a search result from Google or Yahoo, it will be annotated as such directly below it. Despite the grayed-out appearance, these are in fact active links which will take you to those search engines.

Conclusion: if you do nothing else install the Grub web crawler

When Wikia debuted in January it was subject to some of the worst reviews I have ever seen. Wales would have needed the hide of an elephant to front that criticism--but then he was expecting it. Wikia now looks unrecognisable from what launched back then. It is usable, it has features, it is highly interactive, that toolbar is a killer idea that allows you to contribute to the project even when you are not using the search engine directly and, yes, it has usable content. I was impressed. I can definitely see myself using it on a regular basis so I will stick my neck out and predict that Wikia may become as ubiquitous as Wikipedia. If you hate or just distrust Google for reasons of privacy, security or corporate politics then Wikia may be just the search engine for which you have been waiting.

However, for Wikia to really take off will require more than individual user contributions. Search engines need huge data farms. Google has in excess of 270,000 servers and they are expensive. If Wikia has to go down that route it is rather obvious why Wales has chosen a for-profit model. Hence the Google ads. How ironic, to fund a Google rival with Google ads. Wales thinks he has a possible solution to this: to store the index on a distributed computing grid composed of individuals' PCs and servers across the net, a la SETI. Of course that has a downside too because whilst servers are usually always on, home PCs are not; Jeremie Miller, Wikia's chief technology officer, thinks the solution may lie with software like BitTorrent which can still work even when some machines are off, by distributing copies of data to other computers.
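The point about copies of data surviving when some machines are switched off can be illustrated with a short Python sketch. It assumes the index is cut into named shards and that each shard is copied to a few peers chosen by a rendezvous-style hash. That placement scheme is my own stand-in for illustration; it is not how BitTorrent works and not something Wikia has said it uses, and the function names are hypothetical.

```python
import hashlib


def replicas_for(shard: str, peers: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` peers for a shard by ranking peers on a hash of (shard, peer)."""
    def score(peer: str) -> str:
        return hashlib.sha256(f"{shard}:{peer}".encode()).hexdigest()
    # The same shard always maps to the same peers, whatever order they arrive in.
    return sorted(peers, key=score)[:copies]


def available(shard: str, peers: list[str], online: set[str], copies: int = 3) -> bool:
    """A shard can be served if at least one of its replica holders is online."""
    return any(p in online for p in replicas_for(shard, peers, copies))
```

With three copies per shard, a shard stays reachable as long as any one of its three holders is switched on. More copies improve the odds for home PCs that are often off, at the cost of more storage and upload traffic.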

That is why Wikia bought Grub, a free and open distributed web crawler licensed under GPLv3 which you can install on your computer to donate your free CPU cycles to indexing the web. On the homepage there is an install.bin executable. In KDE, clicking on it will launch a no-brainer of a GUI install wizard. Trying the same in Gnome will give a message that it does not know how to handle it. Solution? Open a console, go root, cd to the directory where you downloaded it and type the following: ./grubng-0.8-Linux-x86-Install.bin and the GUI wizard will launch.

If you prefer there are noarch.rpms and src.rpms available at rpm.pbone and rpms for OpenSuse 10 and 11 also at rpm.pbone. There are also versions of the install.bin for BSD and Solaris on the Grub project page.

This will install Grub and add a menu entry for it. If you find that on trying to launch Grub you get a flickering LED indicating hard drive activity but no launch (or "silent failure" as I call it), this is almost certainly because you are missing dependencies for the GUI aspect of the program. By name, Mono (versions > 1.25), GTK2 sharp and associated development files. Once these are installed via your distro's package manager, Grub should run in GUI mode. You will be prompted for your login details before you can start, so if you have not done so you will need to create a user account on the Wikia homepage. Once settings are chosen via Preferences you can start helping to crawl and index the web. As you can see, I have been busy:

Figure 6: Grub's progress report
Figure 7: Grub indexing the web and uploading results

You can set as many crawlers as you wish, manage upload/download speeds to suit your connection and usage and run it in semi-automatic or automatic mode.

We shall see if there is any substance to the phrase “the wisdom of crowds” as a potential army of coders and contributors bear down on spammers

Whether you search with Wikia and rate search results or help to index the web with Grub there is still the challenge of spamming to be addressed. Google is embroiled in an endless war with spammers and their algorithm is proprietary. Wikia's is free and open. Anyone is free to look at the code, see how it works and exploit it. We shall see then if there is any substance to the phrase "the wisdom of crowds" as a potential army of coders and contributors bear down on this problem or if it is merely an over-worked cliche.

Those people who slated Wikia in January should revisit it today and see that it has improved massively (albeit from a low base). If you ultimately want better and more transparent search results and you are starting to see Google as being as big a threat to freedom, privacy and transparency as Microsoft then what are you waiting for? Use it, contribute to it, install the Firefox toolbar and the Grub web crawler GNU/Linux client. Get cracking.

Comments

Submitted by thekohser on

"Of course, like any search engine, there is scope for abuse as there has been in Wikipedia but Wikia is evolving and as problems arise solutions will rise to meet them."

You really are full of the JimboJuice cocktail, Gary. Do you know ANYTHING about the real Jimmy Wales? Look at Wikipedia! What makes you think abuse will be curbed in a FOR-PROFIT environment?

I hope you enjoy volunteering your time and labor and intellect, so that a man like Jimmy Wales can profit off you. (I'm sure the Moscow "personal health" industry will be pleased, too.) At least on my wiki (MyWikiBiz.com), the contributors earn and keep 100% of the ad revenues from their banners. So, if they want to visit a Russian massage parlor on their own dime, that's their decision.

Author information

Biography

A retired but passionate user of free and open source for nearly ten years, novice Python programmer, Ubuntu user, musical wannabe when "playing" piano and guitar. When not torturing musical instruments, rumoured to be translating Vogon poetry into Swahili.
