Wrapping up PyCON 2007


Well, I didn't quite make it to all of day 3 of PyCON, but I got a good piece of it, starting with some very nice presentations of scientific software from Enthought and finishing with some questions about the future of Python packaging for GNU/Linux distributions.

I first heard about Chaco when I was working at Caltech, supporting scientific software there. We were extremely far in arrears on our payments to RSI for the many seat licenses of IDL that the Caltech/JPL IPAC needed. So, converting scientists over to using a suite of free software packages in a Python-based environment seemed like a very attractive idea.

Space Telescope Science Institute in Baltimore was really taking the initiative on this. Enthought was essentially on contract to produce the necessary astronomy plotting package, which would be named “Chaco” after the famous Anasazi archeological site (though probably a principally religious site, the ruins at Chaco Canyon include a wide range of astronomical markers).

Chaco was meant both to modernize and replace existing plotting packages like Mongo and SuperMongo, which were not only rather kludgy, but also had extremely awkward and uncertain licensing terms.

At the time, however, Chaco was in its infancy and we couldn't really use it. Fast forward four years, and Chaco is looking pretty amazing! It combines the ideas of plot scripting (which the scientific community has long used for publication-quality graphics as well as exploratory visualization) and interactive graphics manipulation (which may be more familiar to users of “business graphics” tools). In Chaco, you write a script that loads your data, and the script then launches a plot browser application in which you can interact with that data. That is far better for exploration, of course.

I didn't actually see how Chaco produces publication graphics, but I’m fairly sure they wouldn’t have overlooked this.

Chaco is part of a larger suite of scientific data reduction and analysis tools called “scipy”. Though not part of scipy, NumPy (the inheritor of both the Numeric and Numarray packages, which have been unified) is the expected way to handle image and spectral array data within scipy.

I’ve been out of astronomical research for many years now, so I haven’t really had any data to analyze, and so I haven’t personally tried these packages out—but I sure wish I had had them back when I was doing active research. They really do look nice.

In another open space talk, Peter Wang of Enthought gives a presentation on Chaco, a free software scientific plotting package.

Immediately afterwards, we had another talk about a package which was not itself free, but demonstrated an application of Enthought’s free-licensed application framework, “Envisage”. The package is (I’m told) not much different from Eclipse, except of course for being written in Python.

Enthought has a plugin/component architecture system, which, if I understood correctly, is called “traits”. Installing all of this stuff is now on my to-do list, I can definitely tell you.

Unfortunately, although the 3D visualization widget is apparently free, the CAD system Enthought used to create the models apparently is not. However, I'm very intrigued by the possibilities of using these tools and extending them. It seems like Envisage might make a very nice start on a CASE tool of some kind.

Dave Peterson presents a talk on “Making Toothpaste with Python”. To be more precise, the package is a computational fluid dynamics environment used by process engineers to model the behavior of mixing machines used to blend all kinds of things like medicines, food products, and of course, toothpaste.

There was also a very handy talk by Sean Reifschneider about the vim Python module, as well as vim scripts for handling Python code: syntax handling, block jumps, re-tabbing and so forth. This was personally interesting to me as a vim user, of course.

Following that was a very edifying if not exactly exciting presentation from Anita Ewing and David Hancock of ARINC Direct. Their company sells a service for business air traffic, which has to aggregate an enormous amount of information from various web services to produce flight plan reports for pilots on demand. This lets a pilot know, for example, what sort of winds to expect on his planned course, and how long it will take him to arrive.

Their problem was that this huge and complex system had to be accurately tested, and they went from a very laborious manual process to an almost fully automated system using Python test scripting based on Fitnesse, twill, buildbot, Selenium, and nose.

Fitnesse is a web-based testing technology that resembles doctest, but with the test strings embedded in a Wiki. Twill automates web browsers, including forms. Buildbot, of course, runs tests automatically, and Selenium is a tool for capturing web-based test runs and recording the information. The ARINC folks automated the process so that Selenium would generate the necessary test scripts automatically, based on an interactive test session.
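For anyone who hasn't met doctest, here is a minimal sketch of the idea in plain Python. It is not Fitnesse and it is not ARINC's code: the function `flight_time_hours` and the helper `run_embedded_tests` are names made up for this illustration. The point is only that the examples written in the docstring are themselves the tests, which is the property Fitnesse is described as moving onto Wiki pages.

```python
import doctest


def flight_time_hours(distance_nm, ground_speed_kt):
    """Return the flight time in hours for one leg of a trip.

    The interpreter-style examples below double as tests.

    >>> flight_time_hours(300, 150)
    2.0
    >>> flight_time_hours(0, 150)
    0.0
    >>> flight_time_hours(300, 0)
    Traceback (most recent call last):
        ...
    ValueError: ground speed must be positive
    """
    if ground_speed_kt <= 0:
        raise ValueError("ground speed must be positive")
    return distance_nm / ground_speed_kt


def run_embedded_tests():
    """Run the docstring examples above and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False tells the finder not to look for an enclosing module,
    # so the names the examples need are supplied explicitly via globs.
    tests = finder.find(
        flight_time_hours,
        name="flight_time_hours",
        module=False,
        globs={"flight_time_hours": flight_time_hours},
    )
    for test in tests:
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

Calling `run_embedded_tests()` executes each `>>>` line and compares what it produces with the text written underneath it. In everyday use you would more likely run `python -m doctest yourfile.py` and let the module do the collecting for you.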

They reported that their biggest problem was getting their programmers to use the automated testing approach, but also that they had managed to get everyone using it by recruiting one programmer to work on the system full time, and another to spend time evangelizing about the benefits of test-driven development. The company has apparently seen this succeed very well.

Afterwards, Titus Brown followed up with a presentation of his various testing packages: twill (automated web browsing and form-filling), scotch (it records and replays WSGI—this is a bad pun on one of the popular alternatives for pronouncing “WSGI”), and figleaf which is a code coverage tool (code coverage tools test how much of a program has been executed during testing to determine whether you've tested everything).
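To make the code coverage idea concrete, here is a rough sketch built on Python's standard `sys.settrace` hook. This is not figleaf and I make no claim about how figleaf works inside; `executed_lines` and `classify_wind` are names invented for the example. It simply records which lines of one function actually ran during a call, so you can see what a given test left untouched.

```python
import sys


def executed_lines(func, *args, **kwargs):
    """Call func and return the line offsets (relative to its def line) that ran."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Only follow frames that are executing the function under study.
        if frame.f_code is code:
            if event == "line":
                seen.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        # Always restore whatever tracer (if any) was active before.
        sys.settrace(previous)
    return seen


def classify_wind(speed_kt):
    if speed_kt < 10:
        return "light"
    return "strong"


# One "test" with a light wind, and one with a strong wind.
light_run = executed_lines(classify_wind, 5)
strong_run = executed_lines(classify_wind, 20)

# Lines that only the second test reached; the first test alone missed them.
missed_by_light_test = (light_run | strong_run) - light_run
```

If `missed_by_light_test` is not empty, the light-wind test on its own never exercised the `return "strong"` branch. A real coverage tool does the same bookkeeping across a whole program and reports it per file.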

A presentation by Dr. Roberto de Almeida returned to the scientific programming theme with a talk on serving and reading “Data Access Protocol” (DAP) data. He has some fairly clever hacks to allow your Python program to essentially pretend that it has the dataset on disk, even though it’s really being served by a remote file server. His package is “PyDAP”.
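I don't have the details of how PyDAP does this, so take the following only as a sketch of the general trick, not of PyDAP's API. The class `LazyRemoteArray` and the function `fake_fetch` are hypothetical, and the "server" is just a list in memory standing in for a network request. The idea is that the object answers indexing like a local sequence, but only asks the server for the piece you actually touch.

```python
class LazyRemoteArray:
    """Behaves like a local sequence but fetches only the requested range."""

    def __init__(self, length, fetch):
        self._length = length
        self._fetch = fetch  # callable(start, stop) -> list of values

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            if step != 1:
                raise ValueError("this sketch only supports contiguous slices")
            return self._fetch(start, stop)
        if key < 0:
            key += self._length
        if not 0 <= key < self._length:
            raise IndexError("index out of range")
        return self._fetch(key, key + 1)[0]


# Stand-in for the remote dataset; a real client would make a network call.
_SERVER_DATA = [t * 0.5 for t in range(1000)]
requests_made = []


def fake_fetch(start, stop):
    """Pretend to ask the server for values in [start, stop)."""
    requests_made.append((start, stop))
    return _SERVER_DATA[start:stop]


temperatures = LazyRemoteArray(len(_SERVER_DATA), fake_fetch)
subset = temperatures[10:13]  # only these three values are "transferred"
```

After the last line, `requests_made` holds a single entry for the range 10 to 13, even though the dataset has a thousand values. That is the appeal for large scientific datasets: the code reads as if the file were local, while the transfer stays proportional to what you look at.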

At the end of the day I wavered a bit. There was a talk about the status of Women in IT which interested me as a journalist. But in the same time slot was a talk about packaging Python applications for use by GNU/Linux distributions, and that has definite practical utility to me, so I decided to go pragmatic.

I will say though that my subjective impression is that the number of women attending this conference, though still a small minority, was probably twice what it was last year. There were certainly a lot more women giving presentations, as well. We must be doing something right.

The package talk raised some interesting issues. Python has a newly maturing package format specifically for Python programs, called “Eggs” (to go alongside Java “Beans” and Ruby “Gems”—Perl has some kind of package format at CPAN. Probably just “modules” without the kitschy names).

However, the major GNU/Linux distributions have language-neutral package formats like DEB and RPM. Creating a Python Egg is (apparently—I haven’t tried it yet) quite easy, because the mechanism is built into Python's “setuptools” package. However, the process for producing RPMs, let alone DEBs, is fairly complicated and not well documented. Also, Debian exercises a great deal of gate control, keeping out packages that they don't like. Depending on your purposes, that can be very good or very bad.

So, the question arises: should programmers worry about distribution-significant packages or should they simply produce Eggs? There are pros and cons on both sides.

After that last talk, I was pretty worn out, so I went home. There was a closing address which I’m sorry to have missed, but overall it was a great conference. Next year's is not expected to be in Dallas (at this point, I’m thinking that Chicago is most likely). I do plan to find out, because PyCON has been a great conference for two years running.

License

Copyright ©2007 Terry Hancock / Creative Commons Attribution-ShareAlike License (http://creativecommons.org/licenses/by-sa/2.5)

Originally published at www.FreeSoftwareMagazine.com.

You must retain this notice if you reprint this article.

Unless otherwise noted, the photos in this piece are by me, and under the same terms.

Comments

Geoffrey Lehr

First off, thanks for writing these articles about PyCon. I've enjoyed reading them very much, and am considering doing something similar when I attend the OSCMS Summit in March ( http://2007.oscms-summit.org/ ).

I would think that the best way to go about delivering Python programs would be to use the Eggs, and let the distro maintainers determine the actual method of delivery on their system. I haven't used Debian for a long time, but I would hope that they would have some sort of system to handle unusual packages. Gentoo, for example, uses "build scripts", so installing an Egg using portage should be straightforward, at least no more difficult than installing/compiling packages from source, since you could merely pass the Egg to Python in the script and let Python handle it (Ok, ok, it's more complicated than that, but not much more).

Terry Hancock

Eggs are fine for developers and Python fanatics, but they aren't the right way to go for user apps and libraries.

The problem is the "language ghetto" thing -- you'd wind up with multiple package formats by language (eggs, beans, gems, modules...) and of course that's added to existing multiple formats by origin (deb, rpm, and whatever that thing is that Inkscape uses).

I don't know about commercial distributors like Red Hat, but Debian needs all the help they can get with 8000+ packages to worry about already, so it's better to package your own work if you can.

The real kicker about having all these different packages is "package hell" -- and now we'd have a multi-layered, multi-bolgia, highly Dantesque Hell!

Because, see, none of these packages recognize each other as dependencies. So you might be fine with Eggs within Python, but then Debian packages that depend on them won't know they are installed, and APT will try to replace them with (possibly outdated) dpkg versions.

So, what we actually need is a package multi-format integration system, which would allow either:

A pluggable APT tool that can read all of these formats and somehow magically figure out which is who, OR...

A pluggable Cheeseshop (and the equivalent for Java, Perl, and Ruby, of course), which allows you to automatically convert Eggs into custom DEBs (or RPMs, or whatever) for download. In fact, we could just provide a "cheeseshop" DEB archive as an unofficial archive. Then APT would just work, as long as you had the cheeseshop in your sources.list file.

I think the latter would be my preference. But of course, that's work to do on the Cheeseshop, then.

Author information

Biography

Terry Hancock is co-owner and technical officer of Anansi Spaceworks. Currently he is working on a free-culture animated series project about space development, called Lunatics, as well as helping out with the Morevna Project.
