Monday, September 29, 2003

Don't be afraid to code. Far too many projects waste time developing elaborate dynamic dispatch mechanisms when they could achieve the same ends with much less pain by just writing straight code.

Most organizations I've worked for have at some point built a complicated message-driven dispatch system as part of their software. Typically this is accompanied by grandiose pronouncements about how this "software bus/object broker/message controller" will bring about a maintenance utopia because new handlers can be added "just by adding an entry to a table", that ISVs will be beating down the door to provide plugins/cartridges/random-marketing-buzzword for the system, that custom processing for clients can be done by plugging in additional modules, etc. In practice everything these dispatch systems try to achieve (usually badly) can be done with less effort and more reliability using a few common patterns and Java's dynamic class loading mechanism.

Let's start with dynamic dispatch. This is a common C programming technique - a "dispatch table" of function pointers, indexed by message type, is used to select the appropriate function to execute to process the message. A similar Java implementation would use a HashMap or array of classes with each class implementing an interface that provided a common handleMessage method. Of course your architects wouldn't be worth their Rational Rose licenses if they designed something this simple, so instead you'll probably get a complicated reflection-based scheme executing arbitrary class methods based on the closing value of the NASDAQ-100 index, the phases of the moon, and so on. You'll also get a steady stream of run-time failures as any errors in the dispatch table (misspelled class names, incorrect methods, etc.) aren't detected until the corresponding message is processed.
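The straightforward HashMap version really is only a few lines. Here's a minimal sketch - the Message class, the type codes, and the two handlers are all made up for the example:

    import java.util.HashMap;
    import java.util.Map;

    // Made-up message class - just a type code and a payload.
    class Message {
        static final int ORDER = 1;
        static final int CANCEL = 2;
        final int type;
        final String body;
        Message(int type, String body) { this.type = type; this.body = body; }
    }

    // The common interface every handler implements.
    interface MessageHandler {
        void handleMessage(Message msg);
    }

    class OrderHandler implements MessageHandler {
        public void handleMessage(Message msg) { System.out.println("order: " + msg.body); }
    }

    class CancelHandler implements MessageHandler {
        public void handleMessage(Message msg) { System.out.println("cancel: " + msg.body); }
    }

    // The dispatch table: message type -> handler instance.
    class Dispatcher {
        private final Map<Integer, MessageHandler> table = new HashMap<Integer, MessageHandler>();

        Dispatcher() {
            table.put(Message.ORDER, new OrderHandler());
            table.put(Message.CANCEL, new CancelHandler());
        }

        void dispatch(Message msg) {
            MessageHandler handler = table.get(msg.type);
            if (handler == null)
                throw new IllegalArgumentException("unknown message type " + msg.type);
            handler.handleMessage(msg);
        }
    }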
There's a simpler and more reliable alternative - the plain old Factory pattern. Your factory class has a newObject method consisting of a big switch or nested if statement (we want the code to reference the actual classes rather than strings containing class names) that constructs and returns the appropriate message handler object. Simple, fast (reflection usually defeats the JIT), and robust - make a mistake in a class name and the code doesn't compile.
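A sketch of such a factory, reusing the made-up Message and handler classes from the previous example. Note that the switch names the handler classes directly, so a typo is a compile error, not a 3 a.m. production page:

    class MessageHandlerFactory {
        public MessageHandler newObject(Message msg) {
            switch (msg.type) {
                case Message.ORDER:
                    return new OrderHandler();
                case Message.CANCEL:
                    return new CancelHandler();
                default:
                    throw new IllegalArgumentException("unknown message type " + msg.type);
            }
        }
    }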

"But we can't dynamically change the system - we have to modify the code and rebuild to change behavior". Take a good look at your production environment - you're already doing this if your system is of any importance. Nobody just drops in new salary calculation code into the corporate payroll system - it goes through QA, gets change controlled, and is then installed in a scheduled maintenance window (unless it's a fix for a high-priority bug).
You can use the meta-factory pattern for more flexibility without any loss of reliability. A meta-factory provides a static method that returns a factory object which is used to construct new objects. Typically the meta-factory checks whether a custom factory class has been specified (by a system property or the like) and returns a new instance of the custom factory if so, otherwise returning an instance of itself. If you really can't rebuild the entire system you need only provide a new factory class and any associated handler classes and restart the system.

You can also use this approach to add custom handlers for specific messages. In this case the custom factory subclasses the standard factory, with its newObject method returning a custom handler for the messages you've selected and calling super.newObject for the others. This also lets you implement "software bus" designs that dispatch a message to multiple handlers by writing a wrapper handler class that invokes each message handler in turn - but unlike a software bus, a wrapper class gives you an easy way to handle issues like whether to continue or abort executing handlers after one fails (the kind of thing your architects won't discover until the system is in production), and you can do so on a case-by-case basis. Code is more flexible than frameworks.
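A rough sketch of the meta-factory, building on the factory above - the handler.factory property name and the ClientXFactory/CustomOrderHandler classes are invented for the example:

    class MessageHandlerFactory {
        // Meta-factory method: returns a custom factory if one has been
        // named in a system property, otherwise an instance of this class.
        public static MessageHandlerFactory getFactory() {
            String custom = System.getProperty("handler.factory");
            if (custom == null)
                return new MessageHandlerFactory();
            try {
                return (MessageHandlerFactory) Class.forName(custom).newInstance();
            } catch (Exception e) {
                throw new RuntimeException("can't load custom factory " + custom, e);
            }
        }

        public MessageHandler newObject(Message msg) {
            switch (msg.type) {
                case Message.ORDER:  return new OrderHandler();
                case Message.CANCEL: return new CancelHandler();
                default:
                    throw new IllegalArgumentException("unknown message type " + msg.type);
            }
        }
    }

    // Invented client-specific handler for the sketch.
    class CustomOrderHandler extends OrderHandler { /* client-specific behavior here */ }

    // A custom factory overrides newObject for selected messages and
    // defers to the standard factory for everything else.
    class ClientXFactory extends MessageHandlerFactory {
        public MessageHandler newObject(Message msg) {
            if (msg.type == Message.ORDER)
                return new CustomOrderHandler();
            return super.newObject(msg);
        }
    }

Start the JVM with -Dhandler.factory=ClientXFactory and the custom handlers take over; leave the property unset and you get the stock behavior. No table edits, no reflection on the common path, and everything still goes through the compiler.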

Monday, July 28, 2003

From code reviews to writers' workshops. I've never found code reviews to be useful, despite the extensive literature vouching for their effectiveness. The reviews I've participated in typically degenerated into arguments about how many spaces in a tab stop, prima donnas attempting to impose their idiosyncratic programming style on everyone else, and managers advocating some once-useful but now outdated technique (Hungarian notation, casting NULLs, etc.). Worst of all is the adversarial nature of code reviews - reviewers seek to find as many things to criticize as possible while developers attempt to defend (or rationalize) their code.

The phrase "writing code" is more than a metaphor - programming is a form of creative writing. Good code and good writing share the same characteristics: accuracy, conciseness, understandability, and internal consistency. Both are usually written the same way by a process of successive revisions and edits- except we call these "refactoring" and "iterative development" instead of treating them as a natural part of programming. It's more than coincidence that great programmers are at least good writers.

While I won't go as far as advocating a Master of Fine Arts in programming, I believe that many of the techniques writers use to become better writers can also help us become better programmers. Instead of code reviews we need writers' workshops where we can read and discuss our peers' code. These would focus on improving the participants' code and learning from each other's work rather than on scoring points with the boss by spotting errors. It would also give us a chance to size up our peers - who are the really talented people we want to work with in the future, and who are the lousy programmers baffled by any coding task more complex than running a code wizard.

Sunday, April 13, 2003

Breaking open the black box. Look beyond inputs and outputs to locate the cause of those "can't reproduce" production errors.

Recently I was involved with testing a J2EE application that failed with SQL errors when run in a simulated production environment. We tracked the problem down to a PreparedStatement that wasn't being closed, which eventually caused the DBMS to run out of cursors. The reason the error didn't show up in test wasn't the larger volume of transactions being processed or the size of the database, but a difference in garbage collection. The database resources associated with a PreparedStatement are freed when the object is garbage collected, and with the smaller heap size used in QA, garbage collection was frequent enough that the number of open cursors never exceeded the DBMS's limit. With the much larger heap size in our simulated production environment this no longer happened and the cursors associated with the PreparedStatements stayed open.
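The fix itself is mundane: close the statement yourself instead of leaving it to the garbage collector. A minimal sketch, with a made-up query and table:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class AccountLookup {
        public static String nameFor(Connection conn, int id) throws SQLException {
            PreparedStatement stmt = null;
            try {
                stmt = conn.prepareStatement(
                    "SELECT name FROM accounts WHERE id = ?");
                stmt.setInt(1, id);
                ResultSet rs = stmt.executeQuery();
                return rs.next() ? rs.getString(1) : null;
            } finally {
                // Closing the statement releases the DBMS cursor immediately
                // (and closes its ResultSet with it) - no waiting for GC.
                if (stmt != null)
                    stmt.close();
            }
        }
    }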

I've seen similar "works in test but fails in production" problems in other systems, such as a C program that didn't close file descriptors (and didn't test the return value of open()). Underlying this is an inherent deficiency in most QA methodologies - testing normally considers only program inputs and outputs. This is fine for traditional applications that run for some finite amount of time, at the end of which their outputs can be compared to expected results, but is insufficient for server applications that run forever. QA needs to track a third parameter - internal system state.

The SQL cursor problem I described was identified by using Oracle dynamic views (system tables that expose the current state of the Oracle server) to obtain the SQL query that was causing the problem. The cause of the errant C program's problem was discovered when a system call trace revealed that open() was returning unusually high and steadily increasing file descriptor values (most UNIX systems support either truss or strace to trace a process' system calls). Incorporate tests for these values into your normal QA process and you can stop those "unreproducible" production errors before they happen.
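As a rough sketch of what such a state check might look like in Java - assuming an Oracle connection whose user has been granted SELECT on the v$open_cursor dynamic view, and an arbitrary threshold - something like this can run alongside the functional tests:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class CursorCheck {
        // Fail loudly if the number of open cursors visible to this user
        // creeps past a threshold - a leak shows up here long before the
        // DBMS hits its hard limit.
        public static void assertCursorsBelow(Connection conn, int limit)
                throws SQLException {
            PreparedStatement stmt = null;
            try {
                stmt = conn.prepareStatement("SELECT COUNT(*) FROM v$open_cursor");
                ResultSet rs = stmt.executeQuery();
                rs.next();
                int open = rs.getInt(1);
                if (open >= limit)
                    throw new IllegalStateException(
                        open + " open cursors (threshold " + limit + ")");
            } finally {
                if (stmt != null)
                    stmt.close();
            }
        }
    }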

Sunday, March 16, 2003

Every development team needs a "googler". Your team needs at least one person whose first reply to a question is "Let's do a Google search and find out."

In a well-known technical interview problem (usually attributed to Microsoft) prospective employees are asked "How many gas stations are there in the U.S.?". It doesn't matter if the interviewee's answer is anywhere close to the correct value - the purpose is to observe his or her problem-solving skills and logical thought processes. This question also illustrates the source of many software development disasters - developers all too often come up with answers that are ingenious, logical, and utterly wrong, instead of simply researching the problem.

The most obvious manifestation of this is "reinventing the software wheel" - wasting time developing yet another scripting language/security system/RPC mechanism instead of using an existing solution. Just as often (but not as visibly) it happens with business requirements - far too many organizations would rather spend days debating the format of Australian postal addresses or the starting date of the fiscal year for Japanese banks than spend a few minutes researching the correct answers on the internet.

This is why your team needs a "googler" - someone who knows that a right answer is better than a clever one. This person should be familiar with a range of online knowledge sources - Freshmeat for free software, MSDN for anything related to Microsoft, and Citeseer for Computer Science research. For business knowledge, NASDAQ is a good starting point for information regarding publicly-traded competitors and partners, and for legal and regulatory information almost every government agency now has a website with at least a summary of relevant procedures and regulations.

With luck you may already have someone on your team doing this - the person who gets asked the questions nobody else knows the answer to. Best of all, "googling" is contagious - as time goes on, the rest of your team will get into the habit of researching correct answers instead of inventing wrong ones. Oh, and the format of Australian postal addresses? The Universal Postal Union online address format guide has the mailing address formats for almost every country in the world.

Sunday, March 02, 2003

Rethinking code coverage. 100% code coverage (every code path executed by at least one test) is an enshrined part of QA orthodoxy - and an outdated one.

In an iterative development process it's normal to have some unused code, either leftovers from an earlier cycle or partially-complete functionality for a future one. At the same time the shift to higher-level languages such as Java and C# reduces the likelihood that simply executing a code path will expose an error, unlike C, where every string operation is a potential segmentation fault.

This doesn't mean there's no point in code coverage - but its value is as an indicator of other problems rather than as an exit condition for QA. Large sections of unexecuted code may mean that your test cases don't cover all the functionality of the system, or indicate "fossil code" that developers are unwilling to delete because they don't know if it's still in use.