Deep in the Docs

This week, and specifically today, marks a minor milestone in my employment at MySQL: I’m finally a full-time employee, no longer on probation. It has also probably been the busiest week since I started at MySQL, except for the week spent at the developers’ conference in Sorrento.

Why so busy?

Because I’ve spent many hours deep in the build process that actually generates the documentation, partly to address some existing errors, but also to improve the documentation after some new content was added. In summary, the following major steps were made this week:

  • Added a new Chinese, Japanese, Korean FAQ, written by our resident languages guru Peter Gulutzan
  • Changed the documentation build so that it now includes a font to display the characters introduced by Peter’s new section
  • Fixed the CHM build process, so that the website now includes up-to-date versions of the manual in CHM format. These now include images, and the past problem where some versions of the CHM would not display anything has also been addressed.
  • Fixed the Texinfo/Info builds so that they now include all sections of the manual. Previously we had to disable the Connectors section because the build would fail.

As part of that entire process, I’ve also addressed a number of other build failure issues, and created some new scripts to check the documentation so that hopefully these won’t happen again.

Of course, such short descriptions don’t really do the process any justice. Developing and deploying the manuals in the different versions, and languages, that we support is no easy matter. This week, many of the issues were actually triggered by some failures in the way the documentation is built.

Well, first of all, on Sunday a complete block of our documentation (namely the online version of our manual) completely disappeared. My initial investigations led me to believe it wasn’t a fault with the files that build the manual, and this was borne out when Eric, head of the web team, worked his magic and determined that a file missing from a recent update had caused the problem.

Back in the Docs source code, however, there were other problems. Two of them had appeared over the weekend. First, PDFs of the manual were failing to build; then the CHM (Compiled HTML) builds also stopped, and some of the existing CHM files had stopped working altogether.

All MySQL documentation, including the manual pages, the internal help tables, and all of the various formats we generate, comes from a single source using the standard DocBook XML format. Every destination format but one (CHM) is then built using open source tools. Using a single source for all of this information is obviously a massive benefit, but it is not entirely without its own problems. DocBook XML is great, but there are certain areas where we have had to tweak the format and work around bugs. When it comes to outputting information in specific formats, there are even more issues to consider.

Any time a build fails for any reason, the Docs team get an email. Often the failure can be traced back to a new submission or fix of some description in the XML source. Simple things like validation errors in the XML cause problems, even though we can check most aspects of our edits before they are committed back using SVN.

Other times the error is more obscure. For example, we use FOP to generate PDFs. FOP is a great tool, but it sometimes trips up on odd cases. In the DocBook XML, nothing stops you from specifying that a table has two columns and then actually defining three in your code. Most XSL transformations (HTML, Texinfo, etc.) pay no attention. FOP, however, falls over badly when it sees this and fails to build the PDF at all. If the definition is the other way round (i.e. you suggest there might be three columns but use only two) you could understand it.

I realize that kind of makes sense, but it would be far more helpful if it generated a warning or error, rather than just falling over.
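A check for this kind of mismatch can run before FOP ever sees the file. What follows is only a minimal sketch in Python, not the script we actually use: it assumes DocBook 4-style markup without XML namespaces, and it deliberately ignores spans (`namest`/`nameend`, `morerows`), so it only flags rows that contain more cells than the `cols` attribute of the enclosing `tgroup` declares. The function name and the sample table are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_column_mismatches(xml_text):
    """Return (declared_cols, actual_cells) for every row that has more
    cells than its enclosing tgroup declares."""
    root = ET.fromstring(xml_text)
    problems = []
    for tgroup in root.iter("tgroup"):
        declared = int(tgroup.get("cols", "0"))
        for section in tgroup:
            # Rows live inside thead, tbody and tfoot.
            if section.tag not in ("thead", "tbody", "tfoot"):
                continue
            for row in section.findall("row"):
                # Each entry (or nested entrytbl) occupies at least one column.
                actual = sum(1 for cell in row if cell.tag in ("entry", "entrytbl"))
                if actual > declared:
                    problems.append((declared, actual))
    return problems


# Hypothetical sample: two columns declared, second row defines three.
SAMPLE = """<table><title>Example</title>
<tgroup cols="2"><tbody>
<row><entry>a</entry><entry>b</entry></row>
<row><entry>a</entry><entry>b</entry><entry>c</entry></row>
</tbody></tgroup></table>"""

if __name__ == "__main__":
    for declared, actual in find_column_mismatches(SAMPLE):
        print(f"row has {actual} entries but tgroup declares {declared}")
```

Because spans are ignored, the sketch never complains about rows that legitimately have fewer cells than declared; it only catches the "two declared, three defined" case described above.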

I spent almost the entire week looking into issues like this to find out why different parts of the building process didn’t work. I’ve now fixed, I think, all the problems, but it required a significant amount of investigation and a lot of waiting for the latest fix to complete while trying out something new. The waiting is necessary because it takes 25 minutes to build a PDF of our docs, and closer to 45 minutes for a build of the CHM documentation.

Another element of the build process that I attempted to fix this week was triggered by the inclusion of the Chinese, Japanese and Korean FAQ. MySQL has supported multiple languages and collations for a while (indeed, I added some information on how this works internally last week).

Peter’s CJK FAQ covers most of the common questions, such as why multi-byte characters don’t work, and the often posed issue with the backslash glyph not existing in many CJK fonts, being replaced by the local currency symbol. That, in itself, added a further dimension to the problems when building.

The HTML version of this works great. Unfortunately, it introduced a few problems for the PDF because of the character sets used in the examples. Without a suitable font (the 14 standard fonts defined by the PDF specification do not cover these characters), unknown characters appear as a hash symbol. That was fixed by embedding a suitable font with the missing glyphs into the PDF, coupled with some Perl magic to ensure we only set that font on glyphs we don’t already know about.
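To give a flavour of the idea (this is not our Perl code, just a rough Python sketch of the same technique), the snippet below wraps each run of characters the base fonts are assumed not to cover in an XSL-FO `fo:inline` element that selects a different font. The cut-off at U+00FF and the font name `CJKFallback` are assumptions made purely for illustration, and the function is meant for plain text content rather than raw markup.

```python
import re

# Assumption: anything above U+00FF is not covered by the base PDF fonts.
NON_LATIN_RUN = re.compile(r"[^\u0000-\u00ff]+")


def wrap_unknown_glyphs(text, font="CJKFallback"):
    """Wrap each run of 'unknown' characters in an fo:inline that
    switches to the fallback font; everything else is left untouched."""
    return NON_LATIN_RUN.sub(
        lambda m: f'<fo:inline font-family="{font}">{m.group(0)}</fo:inline>',
        text,
    )


if __name__ == "__main__":
    print(wrap_unknown_glyphs("abc 日本語 def"))
```

Wrapping whole runs rather than single characters keeps the generated markup small, and leaving ordinary text alone means the rest of the manual keeps its normal fonts.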

The CHM build was more problematic, as the entire process would run fine up until the compilation/compression stage, when the compiler (Microsoft’s own hhc.exe) would fail with a memory error. A lot of investigation led to the index file generated by the DocBook XSL transformations, which would contain blank references when using <see> and <seealso> elements. At the moment we simply comment them out, and the CHM build is fixed, but I’m working on a longer term solution.
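Finding the affected index terms in the source is the easy part. As a hedged sketch (again Python, again assuming DocBook 4-style markup without namespaces, and with a made-up function name and sample), this lists every `indexterm` that carries a `<see>` or `<seealso>` child, along with its primary term and the reference text:

```python
import xml.etree.ElementTree as ET


def indexterms_with_see(xml_text):
    """Return (primary, kind, target) for each see/seealso reference
    found inside an indexterm element."""
    root = ET.fromstring(xml_text)
    hits = []
    for term in root.iter("indexterm"):
        for ref in term:
            if ref.tag in ("see", "seealso"):
                primary = term.findtext("primary", default="").strip()
                target = "".join(ref.itertext()).strip()
                hits.append((primary, ref.tag, target))
    return hits


# Hypothetical sample fragment.
SAMPLE_INDEX = """<para>
<indexterm><primary>CJK</primary><see>character sets</see></indexterm>
<indexterm><primary>collation</primary></indexterm>
<indexterm><primary>Unicode</primary><seealso></seealso></indexterm>
</para>"""

if __name__ == "__main__":
    for primary, kind, target in indexterms_with_see(SAMPLE_INDEX):
        print(f"{primary}: <{kind}> -> {target!r}")
```

A list like this only shows which terms would need attention; it does nothing about the blank references the stylesheets generate in the CHM index itself.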

All in all it was a hard week because of the work involved in getting all of this working. Build failures that are straightforward are easy, but the more detailed investigations are complicated, not just because of the time, but because of the multiple steps required to achieve the result.

Using DocBook, FOP and our own extensions and tools enables us to generate a range of docs and formats (probably the most significant part of the user interface beyond the MySQL tools themselves) to support the wide range of users we have.

Author information

Biography

Martin “MC” Brown is a member of the documentation team at MySQL and a freelance writer. He has worked with Microsoft as a Subject Matter Expert (SME), is a featured blogger for ComputerWorld, a founding member of AnswerSquad.com, and Technical Director of Foodware.net, and has written books on topics as diverse as Microsoft Certification, iMacs, and free software programming.
