From Code to Design Document: A Play in Four Acts

Act One: The Code and The Motivation

Here’s the first player, our code.

    % ls Repository/trunk/clementine/interfaces/gesture
    CLGestureCue.cxx           CLGestureCueAppearance.h   CLGestureListener.cxx      CLGestureParser.h
    CLGestureCue.h             CLGestureInterface.cxx     CLGestureListener.h        CLGestureTrainer.cxx
    CLGestureCueAppearance.cxx CLGestureInterface.h       CLGestureParser.cxx        CLGestureTrainer.h

    % cat CLGestureListener.h
    #ifndef __CLEMENTINE_INTERFACES_GESTURE_CLGESTURELISTENER_H__
    #define __CLEMENTINE_INTERFACES_GESTURE_CLGESTURELISTENER_H__

    #include "CLGestureParser.h"
    #include <string>

    /**
     * CLGestureListener is responsible for listening on the CLGestureInterface
     * for a single gesture type.
     */
    class CLGestureListener
    {
            public:
                    /**
                     * Instantiate the listener.
                     * @param gestureName the name of the gesture
                     * (and consequently, the filename to parse) that this object is created to listen for.
                     */
                    CLGestureListener(std::string gestureName);
    ... etc

It’s a whole bunch of header and implementation files for a C++ project I’m involved with. This project requires thorough documentation, but our design documents are hundreds of revisions behind our code. While it was important to go through initial designs, the code is now miles removed from them. It would be nice to automagically update our design document so that it stays current with the code. The generated material will not make up the entire design document; it is meant to be used alongside hand-written, hand-proofed analysis of the larger architecture.

The second player is AsciiDoc, a wonderful formatting and markup system. You may know it from the Git User’s Manual. AsciiDoc affords us several advantages.

  • It is in plain text, which means it can sit in source control, right alongside our code, and diffs are easily viewable.
  • It is a templating language, which allows output into HTML, PDF, you name it. With enough hacking, you can produce really distinct output for webpages and PDF readers.
  • It allows division of sections into different files, so authors can focus on the sections that concern them without being overwhelmed by the text they’re editing.

Act Two: The Tools

We want the comments in our code to be part of the documentation, but that’s not all. Since UML is the standard when it comes to software engineering description, it would be nice to have UML diagrams in our output as well. The ubiquitous free diagramming tool Dia can convert its diagrams into images straight from the command line. To write the output to image_name.png, it goes like this:

    dia -t png -e image_name.png diagram_name.dia
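
For a whole directory of diagrams, that invocation is easy to script. Here's a minimal Ruby sketch, assuming dia is on your PATH; the helper names and the "same basename, .png extension" convention are mine, not part of Dia or Amorfus:

```ruby
# Build the argument list for the `dia -t png -e OUT IN` invocation shown above.
# Hypothetical helper: the output name simply swaps .dia for .png.
def dia_export_command(diagram_path)
  image_path = "#{diagram_path.chomp('.dia')}.png"
  ['dia', '-t', 'png', '-e', image_path, diagram_path]
end

# Convert every .dia file in a directory and return the image paths attempted.
def export_diagrams(directory)
  Dir.glob(File.join(directory, '*.dia')).sort.map do |diagram|
    command = dia_export_command(diagram)
    warn "dia failed for #{diagram}" unless system(*command)
    command[4] # the image path
  end
end
```

Passing the arguments to system as a list, rather than as one string, keeps filenames with spaces from being split by the shell.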

So this part will be simple. Generating the Dia diagrams from code is already taken care of for us. Aaron Trevena has written an excellent script called AutoDia which provides this functionality. All we need to do is put these pieces together and write the code that extracts the comments we want to become documentation.

Act Three: The Program

I am calling it Amorfus, because I think client-side applications should start getting into the Web 2.0 naming crazes.

How do you use it?

  1. Create a new CommentDocParser, with these parameters: a string pointing to the subdirectory of your trunk that you wish to document, a string containing the conceptual name of the code in that directory, and a string containing your trunk directory. You can leave this last one out, and it will default to '.'.
  2. Run #parse on that parser.
  3. Create a file handle, and output the return value of #to_asciidoc to that file.
  4. That’s it!

Here’s an example:

	require 'amorfus'
	DIRECTORY_HEADER_LEVEL = 1 # This will become the section depth, in AsciiDoc, for each individual class we find
	r = CommentDocParser.new 'interfaces/gesture', 'Gesture Interface', 'Repository/trunk'
	r.parse
	File.open('output.txt', 'w') { |t| t.write r.to_asciidoc }

What does it do in the background?

  1. It uses a dumb regex-based heuristic to determine if a comment has value.
  2. It classifies methods by what it can find of their name.
  3. It generates Dia diagrams whenever it can find an object.
  4. When generating images, it parses Dia diagrams using Nokogiri and removes irrelevant objects from that diagram, so that only the featured object shows up in the image.
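
To give a flavor of the first two steps, here is a rough Ruby sketch. These are hypothetical stand-ins, not the rules Amorfus actually uses: the "three words or an @-tag" test and the name prefixes are assumptions I picked for illustration.

```ruby
# Hypothetical stand-ins for steps 1 and 2 above; not Amorfus's actual rules.

# Pull /** ... */ blocks out of C++ source and strip the leading asterisks.
def doc_comments(source)
  source.scan(%r{/\*\*(.*?)\*/}m).map do |(body)|
    body.lines.map { |line| line.sub(/\A\s*\*?\s?/, '').rstrip }.join("\n").strip
  end
end

# A comment "has value" here if it carries an @-tag or at least three words.
def valuable?(comment)
  comment.match?(/@\w+/) || comment.scan(/\w+/).size >= 3
end

# Classify a method from nothing but its name and the enclosing class name.
def classify(method_name, class_name)
  case method_name
  when class_name then :constructor
  when "~#{class_name}" then :destructor
  when /\A(get|is|has)[A-Z_]/ then :accessor
  when /\Aset[A-Z_]/ then :mutator
  else :method
  end
end
```

A heuristic this dumb will misfire on commented-out code and license headers, which is exactly the kind of thing hand-proofing is for.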

Finally, we’ll need to make a small patch to AutoDia to make it recognize structs as equally valid objects. This is extremely hackish (see the Epilogue), but it manages to work for now.

--- Autodia-2.03.orig/lib/Autodia/Handler/Cpp.pm        2005-04-15 08:02:49.000000000 -0400
+++ Autodia-2.03/lib/Autodia/Handler/Cpp.pm     2009-04-15 01:10:46.000000000 -0400
@@ -72,7 +72,7 @@
          $i++;
 
          # check for class declaration
-         if ($line =~ m/^\s*class\s+(\w+)/)
+         if ($line =~ m/^\s*(?:class|struct)\s+(\w+)/)
            {
 
 #            print "found class : $line \n";
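
If you want to see what the struct-aware pattern buys us without touching Perl, the two regexes carry over to Ruby unchanged. This is only a transliteration for poking at in irb; the constant and method names are mine, and GesturePoint is a made-up struct name:

```ruby
# Ruby transliterations of the two Perl patterns from the patch.
CLASS_ONLY      = /^\s*class\s+(\w+)/
CLASS_OR_STRUCT = /^\s*(?:class|struct)\s+(\w+)/

# Return the declared type name on a line, or nil when the pattern does not match.
def declared_name(line, pattern)
  match = pattern.match(line)
  match && match[1]
end

declared_name('struct GesturePoint', CLASS_ONLY)      # nil: structs are skipped
declared_name('struct GesturePoint', CLASS_OR_STRUCT) # "GesturePoint"
```

Note that neither pattern knows about forward declarations or templates, which is part of why I call the patch hackish.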

Act Four: The Results

The resultant HTML looks like this:

[Image: Example of Amorfus-generated documentation]

From our previous example, this can be generated on the command line like so:

    % asciidoc --unsafe -a data-uri output.txt

The --unsafe and -a data-uri flags allow AsciiDoc to embed the images you’ve created directly into the HTML, instead of referencing them externally. This makes the document self-contained. You can omit these flags if you wish, in which case standard <img> tags that reference the image files will be generated.
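
In case "embedding" sounds mysterious: the image bytes are base64-encoded and placed inline in the src attribute as a data: URI. The Ruby below is a sketch of that idea only, not AsciiDoc's implementation, and the helper names are made up:

```ruby
# Build a data: URI for an image so the HTML needs no external file.
# pack('m0') is base64 without line breaks.
def data_uri(bytes, mime_type = 'image/png')
  "data:#{mime_type};base64,#{[bytes].pack('m0')}"
end

# Wrap the URI in an <img> tag (alt text is assumed to be already HTML-safe).
def embedded_img_tag(bytes, alt)
  %(<img src="#{data_uri(bytes)}" alt="#{alt}">)
end

# Usage, assuming a diagram image exists on disk:
#   puts embedded_img_tag(File.binread('CLGestureListener.png'), 'CLGestureListener')
```

The trade-off is size: base64 inflates each image by about a third, so a document with many diagrams gets heavy quickly.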

Epilogue: Warnings

This functionality was hacked together in about two hours. It does horrible things to Dia’s XML documents, it guesses at what the current spec for AsciiDoc is, and it requires a hastily applied patch to a third-party tool, AutoDia, in order to accomplish its goals. Because of its nature, I can’t make any guarantees as to how effectively it will work, what circumstances it will work under, and the like. If you have any suggestions or improvements, I implore you to fork the code (currently hosted at this Gist) and see what you can come up with. For example:

  • I don’t even think it recognizes variables correctly. You might want to fix this.
  • It ignores public:, private:, and protected:. You might want to fix this, but it is meant for design documents, so all methods should be documented.
  • If a class has more than one constructor, only one will be displayed in the documentation. You might want to fix this.

TextMate, Markdown, and following hyperlinks

I’m using Matt Webb’s excellent Plain Text Wiki TextMate bundle for gathering my thoughts, and needed a really basic way to quickly open hyperlinks I’ve saved without copying/pasting and all that nonsense. So, I added this command to the Plain Text Wiki bundle. I’m sure you could get it to work in Markdown’s bundle, as well. All you have to do is be in the hyperlink portion of a link you’ve formatted like this:

    (This is a link)[http://www.google.com]

Then press the shortcut key you’ve assigned to it. I use ^↘.

I don’t know if it handles other hyperlink markup Markdown offers. Enjoy!

Multiruby and its Rubygems Mirror

# multiruby_setup update:rubygems
      Determining latest version for rubygems
    /opt/local/lib/ruby/1.8/net/http.rb:560:in `initialize': Connection refused - connect(2) (Errno::ECONNREFUSED)
                    from ...
            from /opt/local/bin/multiruby_setup:19:in `load'
            from /opt/local/bin/multiruby_setup:19

Multiruby embeds the name of a specific Rubygems mirror, http://files.rubyforge.vm.bytemark.co.uk/rubygems, into its code. This isn’t a horrible thing, since we can use an environment variable to override it if that mirror stops responding; however, it may be a better idea to make the RubyForge mirror selector the permanent default. Until then,

# GEM_URL=http://master.mirror.rubyforge.org/rubygems/ multiruby_setup update:rubygems
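
The pattern at work is the usual "environment variable wins, baked-in default otherwise". A small Ruby sketch of that pattern follows; it is not copied from Multiruby's source, and the constant and method names are assumptions:

```ruby
# The mirror named above; used only when GEM_URL is not set.
DEFAULT_GEM_URL = 'http://files.rubyforge.vm.bytemark.co.uk/rubygems'

# Look up the mirror, letting the environment take precedence.
# `env` is injectable so the lookup can be exercised without touching ENV.
def gem_url(env = ENV)
  env.fetch('GEM_URL', DEFAULT_GEM_URL)
end
```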

Should Canticore Have a Web-based Editor? and Other Thoughts

So my school term is coming to a close very soon, and I’ll be able to get a little bit more work done on Canticore, which I’ve decided is one of my important projects that deserves my attention. I’ll be happy if I can add 50% more tests to the existing functionality, add a couple of important features, and solidify it all. The transition to Sinatra 0.9 allowed me to really clean up the code base, and I intend to continue that refactoring until I’m satisfied with it.

Two threads of thought have occupied my attention since I stopped working on Canticore. They both have to do with the perception and functionality of blogging engines, and they both kind of merge into each other. The first is the decision whether or not to include a web-based interface for writing, editing, and managing posts. Just as one of NetNewsWire’s design goals was to allow a person to literally have a coffee in one hand and read the news with the other, one of my design goals is to allow people to interact with Canticore in a comfortable environment. The song-and-dance of working in your favorite text editor and then copying and pasting your output to the web-based interface is anachronistic. We have our text editor, we have XML-RPC. Theoretically there should be nothing stopping us from keeping an arm’s length from the administrative process inherent in a web interface.

And if Canticore starts becoming a publishing platform instead of just a blog engine through its plugins, there’s nothing stopping plugin authors from defining mini-specs that would interface with a canticore command-line executable. Suppose, for example, a plugin existed that created a ‘featured’ list of articles like you see on many pages: a series of images and blurbs that transition from one to the next indefinitely. Ignoring for now the problem of uploading the media, the plugin could define a namespaced XML-RPC request that took a set of arguments. For our purposes, a 3-tuple of arguments for each ‘feature’ article will suffice:

  [
    { :image => 'garden.png', :blurb => 'Lorem ipsum...', :link_to_article => 14 }
    # ...
  ]

Canticore already contains an XML generator for the most-used data types in Ruby, so if these 3-tuples were defined in a YAML document, we could perform something like the following:

$ cat features.yml
--- 
- :image: garden.png
  :blurb: An article about gardens...
  :link_to_article: 1
- :image: cleaning.png
  :blurb: An article about cleaning...
  :link_to_article: 4
$ canticore --blog myblog --post pluginName.sendFeatures < features.yml

And we could receive a status message from the plugin telling us whether the features list was well-formed, whether it could find all the necessary images and articles, and so on.
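
The "well-formed" half of that check could be as simple as the Ruby sketch below. Everything here is hypothetical, since neither the plugin nor the canticore executable exists yet; the method name and the error wording are invented for illustration.

```ruby
require 'yaml'

# The three keys each feature is expected to carry (from the example above).
REQUIRED_KEYS = [:image, :blurb, :link_to_article].freeze

# Parse a features document and return [features, errors].
# Symbol keys must be explicitly permitted when loading YAML safely.
def check_features(yaml_text)
  features = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  return [[], ['document must be a list of features']] unless features.is_a?(Array)

  errors = []
  features.each_with_index do |feature, index|
    unless feature.is_a?(Hash)
      errors << "feature #{index}: not a mapping"
      next
    end
    missing = REQUIRED_KEYS - feature.keys
    errors << "feature #{index}: missing #{missing.join(', ')}" unless missing.empty?
  end
  [features, errors]
end
```

Checking that the images and articles actually exist would have to happen on the server side, where the plugin can see them.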

Now we’ve managed to segue into the second train of thought: a canticore command-line client. It would help generate blogs, retrieve information about them, and allow people to automate any task necessary for blog maintenance. It would include a Ruby API so it could be called from Rake scripts and the like, and we could throw it into crontabs whenever necessary, constantly building on the software that works well and already exists instead of trying to reinvent the wheel. The practical upshot of this is that people would have the ability to keep their articles locally, under source control, and even provide post-commit hooks for publishing, a workflow that some prominent publishing tools in the Ruby arena (Jekyll, in particular) already focus on.

Features Gist Still Needs

  • Tagging
  • Most Popular / Most Forked Gists

If these both existed, it would be a cinch to use Gist as the de facto repository for all the Rails Templates we’ll start seeing when 2.3 gets released for reals.