Keeping score in test-driven development with Python, PyLint, unittest, doctest, and PyRate

Programming is more fun when you keep score. The extreme programming (XP) development model popularized the idea of test-driven development (TDD) with professional programmers in mind. But TDD turns out to be even more useful for lone amateur programmers, because it provides much needed motivation in the form of more visible rewards for your work. This is true even when simple test runners are used, but I decided to make things a little snappier by including a couple of other types of measurement and generating a "scorecard" for the present state and progress of my Python software projects. Here's how it works, and a download link for my script, which I call "PyRate".

Programming is one of the most extreme examples of delayed gratification -- you can work for months without having anything to show for it (or at least nothing that would convince your family that it was worth all that time and trouble). This can be really demoralizing, especially if you have any doubts about your abilities to begin with.

Testing helps. You won't be able to convince many other people of your progress, but at least you will have an objective measure telling you how much closer to your goal you are every day you work on the project. Also, because automated testing catches bugs right when they happen, it's usually much easier to track them down (because you know what you've changed since the previous test).

This helps any programmer, but it's a big deal for the lone programmer, because you don't have a second set of human eyes to look over everything you've done and check it. At least having the computer check it for you makes a big difference.

Types of testing

There are actually a lot of different kinds of testing. My old favorite method from the beginning of my career is a technique called "instrumenting". Basically, you stick in a bunch of "print" statements at strategic points to tell you what the state of various variables is. I suspect that if you've ever done programming at all, you've done this.

The big advantage of this method is that it is simple, and in most cases, the program runs in exactly the same environment it normally would, so you (usually!) don't risk the sort of bug that appears during use, but not during testing. Probably the biggest disadvantage is that you alter the code every time you insert or remove an instrumenting statement, so there's a small chance that you will accidentally delete or insert something else along with it, causing additional bugs.
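To make the idea concrete, here is a minimal sketch of instrumenting in Python 3. The function `running_total` and the format of the DEBUG message are made up for illustration; the point is only that a temporary `print` exposes the state of the variables at each step.

```python
def running_total(values):
    """Add up the values, reporting the intermediate state on each pass."""
    total = 0
    for index, value in enumerate(values):
        total += value
        # Instrumenting line: temporary, to be deleted once the bug is found.
        print(f"DEBUG step={index} value={value} total={total}")
    return total


result = running_total([3, 4, 5])
```

Removing the `print` line later is exactly the kind of edit that risks touching something else by accident.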

There are special programs called "debuggers" that will essentially automate this process, stepping the program through its statements, and reporting the status on each step, or only at specified "breakpoints". I must confess I don't use these much, preferring to simply inject instrumenting code where I would have to set a breakpoint.

Once the program is finished (or at least runs), it's possible to do what are called "functional tests": this simply means that you use the program to do what it's supposed to do, and see if it actually does it. These are great, and it is part of the free software advantage that users will often do at least some of this testing for you -- that's how you get bug reports.

However, functional tests are very coarse, and it can often be hard work to track down why a particular problem is happening. It's much better to break the program up into logical "units" and test each of these as separately as possible. This is called "unit testing", and it is this sort of testing that "test-driven development" usually refers to.

You can theoretically do ad hoc unit testing yourself, without any automation. This is basically what you are doing when you instrument the code or call it from the Python interactive interpreter (in which case, it's very similar to functional testing, except that you are testing the interface for a module from the interpreter instead of testing the whole application). However, this has some serious disadvantages:

  • Ad hoc tests are hard to repeat exactly, so you may believe a problem is fixed without actually having verified it
  • It's hard for other people to exactly repeat your tests
  • It takes a lot of work, so you won't do the tests very often
  • Within a matter of weeks, you are likely to forget both the tests and exactly how the code is supposed to work

These greatly increase the chance of bugs creeping through unnoticed. They also greatly increase the chance that you simply won't notice when some external dependency changes, subtly breaking the way your module works. That all adds up to making bugs harder to find -- and you don't need that kind of pain.

So, instead, TDD counsels us to automate those unit tests and run them all the time -- preferably every time the program files are changed. Certainly you should run them once a day during the development process.

The Python standard library provides two modules which can help with automated unit-testing: unittest (also called PyUnit) which gives a JUnit-style of testing, and doctest which provides a much more lightweight testing model based on strings which can be pasted directly from the interactive interpreter.
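As a rough illustration of the two styles, the sketch below tests one small function both ways. The function `mean` and the test class `MeanTest` are invented for this example. The doctest lives in the docstring, exactly as it would look if pasted from an interpreter session, while the unittest version uses a `TestCase` class with one method per check.

```python
import doctest
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([1, 2, 3])
    2.0
    """
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    """unittest style: one method per behaviour being checked."""

    def test_single_value(self):
        self.assertEqual(mean([5]), 5.0)

    def test_empty_sequence_is_an_error(self):
        with self.assertRaises(ZeroDivisionError):
            mean([])


# Run the docstring example through doctest.
parser = doctest.DocTestParser()
doc_test = parser.get_doctest(mean.__doc__, {"mean": mean}, "mean", None, 0)
doctest_results = doctest.DocTestRunner(verbose=False).run(doc_test)

# Run the unittest cases and collect the outcome in a result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTest)
unittest_result = unittest.TestResult()
suite.run(unittest_result)

print("doctest examples failed:", doctest_results.failed)
print("unittest tests run:", unittest_result.testsRun)
```

In everyday use you would more likely call `doctest.testmod()` or `unittest.main()`; the explicit runner objects are used here only so that the results are available as values.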

True TDD calls for writing the tests before you write the code they test. This is one way to write a very concrete specification of what your program must do. Obviously these tests will fail, and then you write your code to make them pass -- theoretically that means you're done. Or, you can write tests as you go, which is closer to what I do most of the time. What you shouldn't do is put off testing until the program is "complete" -- that's usually asking for disaster.

Keeping track of progress and quality

The tests above are "acceptance tests" -- either they pass or they fail, and if they pass, you're done. But there are other kinds of tests or "metrics" that you can compute or estimate for your code to keep track of your progress.

One is to simply make note of how close you think you are to finishing work on a given part of the program. Does it have all the features you plan to add (or plan to add in the next release)? If not, it's useful to have a way to make a note to yourself. I find it easy to express this as a "percentage" -- it's not in any way scientific or precise, but I can usually say "Yeah, that's about 40% complete" about a given piece of code.

Similarly, when you write tests for a module, you may not write them all at once, so it's useful to keep track of how much of the module's existing features are actually tested by your testing code. I also assign a subjective percentage value to that -- although this time the rubric is a little more definite: I count the "features" I've already written tests for and divide by the total number of features I think need testing.

There are also some nice tools out there for computing metrics on your code automatically. I'll be using one called "PyLint" in this article. It has a collection of rules and conventions that it uses to analyze your code and try to find potential errors that the interpreter doesn't discover on its own as well as violations of recommended coding style standards. I will say right now that I don't agree with all of PyLint's default style choices -- but that's okay, because you can also reconfigure the program to meet your own style preferences (for example, if you like Java-style ".camelCase" methods, PyLint will grade you down -- it likes ".underscore_spaced" methods).

Keeping score: PyRate

Taking all of these factors into account, I decided to write my own Python script that both runs the automated unit tests and computes the metrics I use to evaluate my progress. It generates a "scorecard", like this one:

Coding & Testing Scorecard
2009-10-28 08:34:19.682646
           Unit  Tests   Done  Q/A  Lint  Description
           ----  -----   ----  ---  ----  -----------
    base_drawing   1/  1|████ |  ○★( 8.4) Basic SVG drawing and I/O     
         collect   2/  2|██   |  ◔✓( 5.1) Collect images referenced in an SVG
         drawing   1/  1|█████|  ◑ ( -- ) SVG drawing with aspects imported
          inx_ui   1/  1|██   |  ○✓( 5.4) Inkscape extension compatible CLI
  Legend:
    *        - Some tests are failing
    ‼        - There are bugs in the testing code
    

    |██   |  - Subjective completeness of coding. Question marks (?)
               indicated code that may have to be re-written or
               removed.

    ○◔◑◕●    - Subjective completeness of testing (of completed code)
               from "totally untested" (○) to "completely tested" (●)
               A blank ( ) means there is no code to test.
               
    ##.#     - Pylint evaluation of coding style out of 10 (run
               pylint on the module to learn why it gave this rating).

For each module or independent test, PyRate collects unit tests from three places:

  • doctest examples in doc strings in the module,
  • doctest documentation file examples named test_*.txt in the testing directory, and
  • unittest test modules named test_*.py in the testing directory

It collects these into a unittest testing suite, runs them, and reports the results as "tests passed" out of "total tests run".
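The standard library already has the pieces needed for this kind of collection. The following sketch shows how the three sources could be combined into one suite. It is not PyRate's own code: the module name `pyrate_demo`, the helper functions, and the sample sources are hypothetical, and the files are written to a temporary directory only so that the example is self-contained.

```python
import doctest
import importlib.util
import pathlib
import sys
import tempfile
import unittest

# Hypothetical project files, written to a temporary directory below.
MODULE_SOURCE = '''
def double(x):
    """Return twice x.

    >>> double(2)
    4
    """
    return 2 * x
'''

DOCFILE_SOURCE = '''
Doubling also works on strings:

    >>> double("ab")
    'abab'
'''

UNITTEST_SOURCE = '''
import unittest
from pyrate_demo import double

class DoubleTest(unittest.TestCase):
    def test_negative(self):
        self.assertEqual(double(-3), -6)
'''


def load_module(name, path):
    """Import a module from an explicit file path and register it."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def collect_suite(module, doc_file, test_module):
    """Gather tests from docstrings, a doctest text file and a test module."""
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite(module))
    suite.addTest(doctest.DocFileSuite(
        str(doc_file), module_relative=False, globs=dict(vars(module))))
    suite.addTest(unittest.defaultTestLoader.loadTestsFromModule(test_module))
    return suite


def run_demo():
    """Build the sample project, run every test, return (passed, total)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        test_dir = root / "test"
        test_dir.mkdir()
        (root / "pyrate_demo.py").write_text(MODULE_SOURCE)
        (test_dir / "test_pyrate_demo.txt").write_text(DOCFILE_SOURCE)
        (test_dir / "test_pyrate_demo.py").write_text(UNITTEST_SOURCE)

        module = load_module("pyrate_demo", root / "pyrate_demo.py")
        test_module = load_module(
            "test_pyrate_demo", test_dir / "test_pyrate_demo.py")
        suite = collect_suite(
            module, test_dir / "test_pyrate_demo.txt", test_module)
        result = unittest.TestResult()
        suite.run(result)
    passed = result.testsRun - len(result.failures) - len(result.errors)
    return passed, result.testsRun


passed, total = run_demo()
print(f"{passed:3d}/{total:3d}")
```

Note that doctest counts each docstring or text file as a single test, however many examples it contains, so the "total tests run" figure is coarser for doctests than for unittest methods.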

Completeness of coding and testing are marked by subjective values you can insert into the code. This is also where the brief descriptive text is placed. The advantage of having it here is that it is immediately available when you are updating the files or the testing code. The notes can appear in three places, in the following order of priority:

  1. In the doctest file (i.e. test/test_<modname>.txt)
  2. In the unittest module (i.e. test/test_<modname>.py)
  3. In the module itself

Within the doctest file (which is a text file), the variables are set with valid Python code appearing after a Python-style comment mark "#", like this:

# pyrate_percent_complete = 30
# pyrate_percent_tested = 20
# pyrate_description = "A description of the module"

Within the python source files, these same values are simply placed at the module level:

pyrate_percent_complete = 30
pyrate_percent_tested = 20
pyrate_description = "A description of the module"

Each value is loaded independently, so you could put the completeness value in the module itself and the testing level in the test module. The one exception is pyrate_description, which isn't checked for in the module itself: if the testing code provides no description, the beginning of the module docstring is used instead.

However, I've so far found it preferable to keep these metadata notations in the testing modules (either test_<modname>.py or test_<modname>.txt), because I find that it is during the writing of tests that I am most aware of what their values should be. This also avoids having stray values in your working module's name space.
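For readers curious how such notes might be picked up, here is a small sketch. It is an assumption about one workable approach, not a description of PyRate's internals: the helper names are invented, and `ast.literal_eval` is used so that only literal values (numbers, strings) are accepted from the commented lines.

```python
import ast
import re

# Matches lines such as:  # pyrate_percent_tested = 20
NOTE_PATTERN = re.compile(r"^#\s*(pyrate_\w+)\s*=\s*(.+)$")


def read_doctest_notes(text):
    """Extract pyrate_* values from comment lines in a doctest text file."""
    notes = {}
    for line in text.splitlines():
        match = NOTE_PATTERN.match(line.strip())
        if match:
            name, expression = match.groups()
            notes[name] = ast.literal_eval(expression.strip())
    return notes


def read_module_notes(namespace):
    """Extract pyrate_* values from a module namespace (a dict of names)."""
    return {name: value for name, value in namespace.items()
            if name.startswith("pyrate_")}


def merge_notes(*sources):
    """Combine note dictionaries; sources listed first take priority."""
    merged = {}
    for source in reversed(sources):
        merged.update(source)
    return merged


doctest_text = """
# pyrate_percent_tested = 20
# pyrate_description = "A description of the module"
"""
module_namespace = {"pyrate_percent_complete": 30, "pyrate_percent_tested": 5}

notes = merge_notes(read_doctest_notes(doctest_text),
                    read_module_notes(module_namespace))
print(notes)
```

Here the doctest file's value for `pyrate_percent_tested` overrides the one in the module, following the priority order listed earlier.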

The format of the scorecard is somewhat configurable. I created a very simple XML configuration file to store the text elements used (you could also modify the code through subclassing or just hack it directly, of course).

Want to try it out?

I have just finished setting up a project site for PyRate at Google Code. The download includes just two files: the script pyrate.py and the configuration file, pyrate.xml. I decided to call this version "0.9" just in case something seriously needs to be fixed to call it "1.0", but I'm using the code as-is already.

You will need to have already installed PyLint, and your Python needs to be version 2.5 or later.

Installation is minimal -- you just place these files in the module source directory of your project. It is assumed that there will be a directory named "test" with a collection of test_<modname>.py and/or test_<modname>.txt files containing unittest and doctest tests respectively.

As of this version 0.9, there is no support for scanning nested source trees (all the tested modules need to be in one directory), although this won't be a problem for your testing modules (it will cause PyRate to miss the doctests inside the modules, though).

The configuration file allows you to alter the installation paths, and also to provide a list of modules not to scan (it can be desirable to block the __init__ module as well as pyrate itself).

Once installed, you can run the program from the command line, or set it up to run in an IDE (I have been running it from Eclipse).

So far, so good

I haven't really been using this version of the program myself for that long, but I'm pretty happy with the results. I ran an earlier (much kludgier) version of the script which helped a lot, though, and I think this will be even better.

It's much more fun to code when you can see how much progress you are making in both quantity and quality, and that's the idea behind PyRate.

Author information

Biography

Terry Hancock is co-owner and technical officer of Anansi Spaceworks. Currently he is working on a free-culture animated series project about space development, called Lunatics, as well as helping out with the Morevna Project.
