Jim Kent, hero of free and open source software


Percy Shelley averred that poets were the unacknowledged legislators of the world. That was a hope rather than a fact. It might have been true in earlier centuries, but the inexorable rise of scientific methodology relegated poetry as a source of power and influence. Inevitably, the baton passed to science.

Computer designers and software engineers might reasonably claim that they have taken on the mantle previously worn by their artistic ancestors, and everyone is all too unhappily familiar with the baleful, quasi-monopolistic influence of large software companies like Microsoft. Yet that is not the whole story by any means. All the major players in the world of free software may well feel themselves to be, like Shelley, the unacknowledged legislators of the world, given the role they have played in democratizing knowledge and thus empowering people to have control over their digital lives.

If GNU/Linux were a branch of astronomy you would experience little difficulty in navigating the night sky. Against the background of the familiar constellations you could easily name and identify the shooting stars and comets: Torvalds, Stallman, Perens et al. However, if you peer a little harder you might just observe a smaller star of the firmament, an unregarded little stellar object in an unfashionable spiral arm of the galaxy; not a supernova but its influence will be felt for decades to come.

You may not find it listed prominently in the free software equivalent of a star catalogue but look up the celestial index under Kent, Jim and there it is. Some people will have heard of him but I suspect that many have not. His story is well worth the telling and I hope that by the end you will agree with me that it is a story of free software heroism and that Kent deserves the soubriquet as one of the unacknowledged legislators of the world.

The opposition to software patents in the free software community is too well known to require repeating here but if defeating attempts to patent software code is important (and it is very important) what about wetware patents? Wetware? That's you and me, the carbon-based lifeforms on planet Earth. More specifically, in computer terms, our source code. It was not invented but represents three billion years of evolution and the idea that you can or should patent this universal inheritance is repugnant (US patent office please take note).

The same reaction to the very idea of patenting our source code was the catalyst for an extraordinary series of events that began with the race to map the human genome. Ultimately, it was a two-horse race between the National Institutes of Health (NIH, USA), a publicly funded body with financial help from the Wellcome Trust (UK), and Celera Genomics, a private company headed up by Craig Venter. Venter's goal was to sequence the genome, patent it and charge subscribers to the database a fee for access. This closed-source model was deeply unpopular with geneticists worldwide and the Human Genome Project was founded in response. (There may be some poetic justice in the report that Venter was allegedly fired by Celera in 2002 after it became clear that selling genome data would not be profitable.)

The task was huge - some think it equivalent to putting a man on the moon - and the problems encountered reflected that. Without going into the complex details of it all, the task was, to put it in computer terms, to decompress and extract the biological zip drive that was the 400,000 pieces of data coded in the human genome. This was a data problem that required a high degree of processing power. Venter at Celera was reputed to have deployed the resources of the most powerful civilian computer in the world.

Kent's senior colleague, Dave Haussler, at the University of California at Santa Cruz (UCSC) was working on sequencing the genome, and Kent offered his program on worm DNA as a template to assist the human genome project. It was, with the luxury of hindsight, a truly historic moment. A series of events was set in train that ultimately ensured that the human genome remained firmly in the public domain.

How did he do it? He armed himself with the GNU/Linux operating system running on a Beowulf cluster of one hundred 800MHz processors with 256MB of RAM each, the Vim editor, the GCC compiler and icepacks every night to stave off the RSI from hell induced by the herculean task of writing 10,000 lines of code. He worked eighty hours a week for forty days in a converted garage at the back of his bungalow to complete the task! ("He had to ice his wrists at night because of the fury with which he created this extraordinarily complex piece of code... He's unbelievable, this program represents an amount of work that would have taken a team of 5 or 10 programmers at least six months or a year", Haussler said.) The result was a program called GigAssembler, which allowed the project to organize the data into a coherent sequence. It went live on 22 June 2000, followed by another historic date: on 7 July of the same year Kent and Haussler posted the results on the web. The genie was well and truly out of the bottle.

History records that the HGP beat Celera by one day but it was declared officially to be a draw and was announced as such at the White House, where, after Kent had uploaded the results to the web and eventually compressed them onto a single CD, the sequence was presented and put in a time capsule for posterity. Rightly, Kent was showered with job offers (he declined them) and awards, the most appropriate being the Ben Franklin Award from Bioinformatics.org; fitting because Franklin declined to patent his inventions. Oh, and he was also awarded the Overton Prize by the International Society for Computational Biology in 2003.

It was a remarkable instance of serendipity. Before his Ph.D in biology Kent had founded and run a software company and led a team of software engineers, so he possessed the necessary computer skills equal to the task. (He wrote paint and animation programs for computers. These included the award-winning Aegis Animator on the Amiga, Cyber Paint on the Atari ST, and Autodesk Animator on the PC.) That provenance is evident in the way he described the task of genome sequencing: it was essentially a task in reverse engineering. DNA was the machine code, genes were the assembler and proteins a high-level language like C. Alright, the analogy has limits of course but you get the idea.

In the spirit of free software Kent released the GigAssembler under a free license for non-commercial use, and he clearly understands the significance of open systems. In an interview with the O'Reilly Network he described the GPL as "absolutely disastrous" for commercial environments, where even "copyleft" seemed too restrictive to him, but free software as "particularly beneficial" for operating systems. Kent's summary of the value of free software is well worth quoting at length:

I think that free software for computer operating systems is particularly beneficial. We all want to speak the same language so that we can communicate with each other. We all want to use the same operating system so that we can use our favorite programs everywhere, and so that we don't have to port the programs we've written. An operating system in private hands will therefore tend to die out, become a monopoly, or encourage the people who use it to become a closed group. The pro bono work of the many people developing free operating systems really is helping lift the world out of these three gloomy alternatives. Displacing Microsoft in the Windows/Icon/Mouse/Pointer world may not happen, but I would not be in the least surprised if a computer with good out of the box speech recognition, a nice web browser, and a strictly optional keyboard based on Linux takes over China in 3 years and the USA in 6. Free software, and the low cost, encourages the sharing of information between scientists. If I want to share my work with a colleague, they don't need to spend time or $ acquiring the software needed to review or recreate the work (ever try to buy an expensive piece of software via a University purchasing office?). This also evens the playing field for scientists without the research budgets found at most US universities. I can afford most software through research grants, but a colleague in a less prosperous country may not have the same resources.

This points up the similarity between the methodology of science and free software. Both progress and gather kudos by being open, and being open means that the code base and the experiments can be examined in order to debug, verify or refute. If you don't believe me then just recall the episode of Pons and Fleischmann and cold fusion. Karl Popper would certainly have approved.

Even peer review can be tainted though. Just as there is high street fashion, there are fashions too in science. Some topics are "sexy" or topical and getting funding, free software or otherwise, can mean toeing a particular line in political correctness. Just try getting a hearing or funding to study the correlation between solar activity and global warming and see how far you get. Clearly there is an undeniable human input to climate change but it takes a very brave scientist to swim against the tide and dare to suggest that there may be additional factors. Just ask Henrik Svensmark, a Danish scientist whose findings over ten years on solar activity and climate have been consistently turned down for publication by journal after journal and who only managed to get his research published in the Proceedings of the Royal Society last year. Free software means keeping an open mind.

The Human Genome Project, like the Manhattan Project before it, is big science and that means big funding, and the old cliche still stands: he who pays the piper calls the tune. Not for nothing did Eisenhower warn about the military-industrial complex. Open sourcing software or science can be compromised by the politics of its finances. Aubrey de Grey, the gerontologist, has pointed out that being funded by government leaves you hostage to politicians and therefore, at least in a democracy, to the voters and their interests, agendas and lobby groups. If you go down the route of financing from big business, your research is buffeted by directors, shareholders and accountants and the desire to return a quick profit. Disinterested private finance may be the only unfettered option left, but philanthropic donations without strings are as uncommon as politicians saying they're sorry.

All this may sound a little idealistic, even naive. Perhaps it is, but for all the Faustian pacts scientists and software engineers have to make to fund research, and for all their all too human foibles, they have done more than any other organised group of human beings to advance and improve this miserable species. If you took science out of the picture, a returning visiting alien from Mars would be virtually unable to distinguish between modern man and his far ancestors.

Our progress thus far has been underpinned by pure science and applied technology, and the era of computing, particularly free software computing, has acted as an exponential accelerant to the whole enterprise. The next time a clone of John Cleese asks you: "what has free software ever done for us?", go tell them about Jim Kent, not Clark Kent. As Miranda in Shakespeare's The Tempest said: "Brave new world that has such people in it".

Bibliography

Websites

http://www.alternet.org/story/10843/

http://bioinformatics.org/franklin/2003/

http://dir.salon.com/story/tech/feature/2003/10/21/genome/index.html?pn=3

http://www.iscb.org/newsletter6-2/overton.html

http://en.wikipedia.org/wiki/Jim_Kent

http://www.oreillynet.com/pub/a/network/2002/12/10/kent.html

http://www.oreillynet.com/pub/a/network/2002/04/05/kent.html

http://conferences.oreillynet.com/cs/os2002/view/e_spkr/1196

http://www.infoworld.com/articles/hn/xml/02/07/25/020725hngenome.html

http://www.precarios.org/nrecortes/nytimes_130201.html

Comments

Bob The Known Coward Kohlenberger's picture

As an open source developer and former employee of both Celera and its sibling, Applied Biosystems (ABI), I'm writing to provide a different perspective on both open source and the genome project.

For his part, Tony White, CEO of Applera and Craig Venter's former boss, claims he fired him because Venter antagonized ABI's customers. (ABI supplied the gene sequencers used by both the NIH Human Genome Project and Celera.) In fact it was Venter who wanted to give away the sequence and charge only for annotated data on genetic variation among individuals. It was White who was skeptical of Venter's plan to give away the genome sequence instead of charging for it.

In principle we can all agree that nobody invented the human genome, so our gut reaction is outrage that anybody should be permitted to patent it. On the other hand, it cost Celera half a billion dollars to sequence the genome, and shareholders who fund such ventures have a reasonable expectation of return. The tax-paying public has other kinds of return in mind when funding medical research, and does expect the data to be made public. Like it or not though, patent law does provide a financial incentive that in some cases benefits both private and public good.

By the way, the public HGP was not founded in response to Celera as the article states. It was started back in 1987 under the DOE. However, David Haussler and others admit that Venter's group forced them to retool and complete their project several years ahead of schedule - to everybody's benefit.

But back to the question of inventions, IP, and compensation. At the risk of being crude, let me state two propositions that seem to frame the debate:

1. All people have a right to make a living in exchange for providing value to society.
2. Inventors (including open source developers) should not be subject to lawsuits over patents of which they are unaware.

I call #1 the "quid pro quo" argument, and #2 the "presumption of omniscience" argument. Both of these propositions seem reasonable in the abstract, yet in a capitalist society, they often come into conflict. What's amazing to me is that the debate has become so polarized that well-informed people on both sides of the debate act as if the other side's arguments are baseless.

Those of us who provide our source code at no charge should be aware that it is frequently downloaded and used by "engineers" who make six figure salaries, yet wouldn't have a clue how to write it themselves. We aren't bothered by this or we would stop contributing to it. Yet open source developers still need to eat. It's worth noting that Ben Franklin, who refused to patent his inventions, invented most of them after he became wealthy from his printing business. So he made his money not so much from generating information, but from making it widely available. Would he have been so successful in the Internet age? More to the point, does the open source movement mean that only aggregators make money, and developers themselves must be independently wealthy?

Although many believe that information should be free (as in "free beer"), even more believe that software developers should be free to roam the algorithmic landscape without risking stepping on patent landmines. The idea that software developers should be presumed omniscient concerning existing patents, and understand patent law besides, is, we all know, lunacy.

I don't presume to know how to solve the dilemma of "quid pro quo" versus "presumption of omniscience". But I do think that no solution is possible as long as either side views the opposing view as illegitimate. We need to ensure that people who wish to make a living producing useful code have that option, and we need to keep the landscape free of landmines. Perhaps patent law should be changed to exempt those who unknowingly reinvent a method or algorithm from patent infringement litigation, with the burden of proving "foreknowledge and intent" falling on the patent holder. However, I don't favor throwing out patent law wholesale.

Humanity benefits from the Jim Kents of this world, as well as the Tony Whites and Craig Venters, and with such diversity it is indeed a brave new world.

Submitted by Anonymous visitor (not verified) on

[QUOTE]We need to ensure that people who wish to make a living producing useful code have that option, and we need to keep the landscape free of landmines. Perhaps patent law should be changed to exempt those who unknowingly reinvent a method or algorithm from patent infringement litigation, with the burden of proving "foreknowledge and intent" falling on the patent holder. However, I don't favor throwing out patent law wholesale.[/QUOTE]

This argument is flawed. Patent law as it currently stands is wrong because it blurs the line between idea and implementation. If it can not define that line correctly, it needs to be changed or withdrawn from the realm where ideas and implementation can not be reasonably separated by such patent approval groups.

No one is denying any one the opportunity to make money from useful code. What should not be possible however is the ability to patent an idea with no visible implementation. I should not be able to patent "sorting in ascending order" without a demonstrable algorithm that performs this function. And the patent should be applicable to my IMPLEMENTATION, and NOT THE IDEA, of "sorting in ascending order".

This will allow me to sell my fantastic code that "sorts in ascending order" while at the same time allowing any one else to develop code that "sorts in ascending order" but doesn't use my algorithm.

This is real market economics and will facilitate a level playing field allowing innovators to innovate and not allowing me to monopolize the "sorting" market place with my sole algorithm. Buyers get to choose which fantastic "sorting in ascending order" algorithm they want to incorporate in their product on their grounds - not on mine (the seller).

Your argument that "it cost Celera half a billion dollars to sequence the genome, and shareholders who fund such ventures have a reasonable expectation of return" is similarly wrong in that it accepts that Celera may want to patent the genome sequence to gain returns for their investment. Any attempt to exploit that which should be available freely for the greater good of mankind is wrong. And Celera wanted to exploit this by getting to the market first. It is good for humanity that tools like GNU/Linux, the Beowulf cluster, the Internet and capable people like Kent are available to take on funded private pirates looking for every opportunity to exploit the ignorant innocent.

Submitted by Anonymous visitor (not verified) on

The work of Jim Kent (JK) has undoubtedly benefitted all of mankind but I don't believe you can call JK a "hero of open source". A "pioneer for mankind" would be a more appropriate title for him. I believe that JK does not understand "Open Source" or the purpose of the GNU GPL.

By releasing his software under a non-commercial licence and claiming to be a proponent of Open Source, he shows me that he doesn't understand Open Source. The first criterion of the Open Source Definition states:

The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.

and yet, his actions show that he was willing to forgo this criterion for whatever reason. To me, the failure to adhere to this criterion shows a lack of understanding of the intended purpose of this criterion and therefore it shows a lack of understanding of Open Source.

JK's responses from his O'Reilly Network interviews show me that he doesn't understand the purpose of the GNU GPL. He says:
I'm not sure if open source works so well in commercial environments. These days at least, the GPL is absolutely disastrous, since it really doesn't mix peacefully with existing proprietary stuff.

No duh. I believe JK does not understand about giving the user freedom and therefore cannot understand the intent of the GPL. The GNU GPL is designed to ensure the user's freedom. Proprietary software doesn't guarantee any freedom and in fact, it often imposes very strict forms of user subjugation in the name of "protecting the author's interests". So it isn't hard to see why GPL-licensed software does not mix with proprietary stuff.

I do agree with his sentiment that the GPL is disastrous to mixing with proprietary stuff. The GPL was designed so people wouldn't impose extra restrictions on top of what was already restricted and therefore ensuring the freedoms that the GPL provided. Proprietary software imposes restrictions that restrict the user's freedom and therefore proprietary software is inherently incompatible with GPL licensed software.

So yeah, he's a great guy for doing something that progresses society but he's no posterboy for Open Source.

Jim Kent's picture

First, thanks for the kind words. I was in the right place at the right time with the right skills on the genome project, and ended up with more fame than I deserve compared to other people you don't hear of, like LaDeanna Hillier, Asif Chinwalla, Richard Gibbs, Greg Schuler, and hundreds of other people without whom the public genome project would have never completed in time to avoid wholesale gene patents by Celera and Celera's licensees. I am proud of my role in the genome project though, and of the work we continue to do to understand the genome at genome.ucsc.edu and elsewhere.

It's true that I don't buy the predominant open source models. I've made money, good money, selling software I've written. I think writing and selling software is an honorable profession. I think the lion's share of the money in software should go to the people who write the software, not the people who write the manual, since it takes longer to write the software than the manual. I think that a lot of really good software requires very little support, and in fact in the consumer categories, the better it is the less support it requires. So the model where I write something for free, and Tim O'Reilly makes good money selling books on it, and Red Hat makes money providing phone support for it and burning it on CD just doesn't work for me.

The GNU model doesn't work for me, because it's sticky. After 10 years I think code should go into the creative commons free for all to use. I think a lot of code should be put there earlier. However if I spend 2 years writing something that is useful, I think I deserve to profit from it if I want. I don't think that someone else should be able to make a copy on a floppy, or these days a CD ROM or web site, and sell it for 50% of what I'm selling it for. If I incorporate GNU code into my code (if I am reading the license correctly at least) I *do* need to open myself to someone who can undercut me in price because they essentially had no development costs.

That said, I do think that source code should be freely available for others to read, and for customers to modify in all cases.

There are places where the GNU license is useful, where it provides leverage for code to escape from trade secret, or semi-permanent copyright status from a tight fisted company. Still, personally, I like to divide the world of code I write into that which is given away with no strings at all, and that which I sell to make a living. Now that I'm supported by government grants I can afford to give away *all* the code that I write. That has not always been true though for me.

I have been lucky to be my own boss most of my career. If you are working for someone else, having to change jobs reasonably often, I can see how the GPL would be extremely attractive. Among other things once the GPL gets its hooks in, if you contribute to a GPL project, you can keep working on it even after your startup is bought out by Cisco, even after your enlightened manager gets driven out of the company at Oracle, even after your mature software customization shop gets driven into the ground by the founder's idiot nephew. I've got code I wrote 25 years ago and still use. Hopefully the GPL will help make it so that the programmers caught in the intellectual property mazes of corporate America can experience the same pleasures of extreme code reuse. I still think a better solution would be companies that you'd *want* to work for for 10 years at a time because they weren't run by aggressive *ssh*les, a software license standard like we have with patents that tends to limit royalties on a piece to no more than 3% of the profit on the whole, and copyright laws that expired completely after 10 years. I guess I'm dreaming. Maybe in the face of mega-software-corporations GPL is necessary. I'm not comfortable with its effects on the smaller software shops like I used to run though. You might ask Walter Bright, Joel Spolsky, or some of the other good small shoppers what they think of GPL. I'm betting their feelings would run pretty parallel to mine, but I've been wrong before!

Take care,
Jim Kent

Author information

Gary Richmond's picture

Biography

A retired but passionate user of free and open source software for nearly ten years, novice Python programmer, Ubuntu user, musical wannabe when "playing" piano and guitar. When not torturing musical instruments, rumoured to be translating Vogon poetry into Swahili.