Monday, September 28, 2009

The Duct Tape Programmer and surviving the "death trap"

I really like Joel Spolsky's blog, but I have a lot of problems with one of his recent posts about the Duct Tape Programmer.

He is the guy you want on your team building go-carts, because he has two favorite tools: duct tape and WD-40. And he will wield them elegantly even as your go-cart is careening down the hill at a mile a minute. This will happen while other programmers are still at the starting line arguing over whether to use titanium or some kind of space-age composite material that Boeing is using in the 787 Dreamliner. When you are done, you might have a messy go-cart, but it’ll sure as hell fly.


Seriously? When you're done, you're lucky to still be alive, assuming you make it down the hill in one piece!

Of course, overengineering can be just as bad as fly-by-the-seat-of-your-pants programming. But just because you win the first heat doesn't mean you're going to win the race. Go ahead and win the first heat. I'll gladly take your spot in the second heat, while you're trying to figure out how to keep your front wheels from falling off.

Remember, before you freak out, that Zawinski was at Netscape when they were changing the world. They thought that they only had a few months before someone else came along and ate their lunch. A lot of important code is like that.


Netscape did win the first heat. Great for them! I'll bet it was high times while all this was going on. Of course, since that time, Netscape hasn't had their go-cart in any races. I'll bet after the initial thrill of winning, life wasn't so happy for the employees and developers at Netscape.

No, I think I'll stick with taking a little extra time to make sure my shit works. If you don't have time to do unit tests, you don't have time to write the code. There's nothing about testing that requires you to use "templates, multiple inheritance, multithreading", etc., approaches.

In fact, I do agree that it's best to go for the simple approach. But, most duct tape programmers I've met aren't going for simple - they're going for "works when I try it" without any regard for readability. And since they don't use unit tests, if it doesn't work for you, the only way you can refactor their code is by crossing your fingers and feeling like you're flying down a hill at a mile a minute on a bucket of bolts held together by duct tape and WD-40! No thank you!

Thursday, September 24, 2009

I'm such an asshole (and I really think it's going to pay off big)!

The company I work for moved into new office space a while back. We simply outgrew the old space (well, that, and the boss went in with some other investors to buy a building in Chinatown).

One of the worst parts of the old space was that the developers sat in "the back room". To get to the restrooms, you either went out the front door, or through the back door (which was in the back room). This resulted in numerous "drive bys".

A "drive by" can range from a random thought to "the next big thing". It usually starts in the mind of a salesperson or marketing type. The process can start anywhere, but the urinal seems to be a treasure trove of drive by opportunities. As soon as the thought forms (but hopefully not before hand washing is complete), the thinker looks for the first developer they can find. Then, the thinker unloads his wondrous wisdom upon the developer, who is supposed to stand in awe, absorbing all of the ingenious and innovative ideas spurting forth. Once the stunned feeling subsides, the developer is naturally going to figure out some way to fit the new feature into the product as soon as is humanly possible (or faster).

Needless to say, drive bys suck! These are ideas that, if fed, will be spread to clients and throughout your entire organization's sales engine. Before you know it, what you thought was an innocent head nod has come back to bite you. If you're lucky, you find that this great feature (which has no specification, definition or clear vision) is already overdue. If you're not so lucky, you get a bug report from a client asking why they just paid your company money for a product when they can't get that feature to work (because, of course, that's the one reason the client bought your software instead of your competitor's).

At our new office, we're no longer in the "bathroom path". Drive bys have dropped dramatically! I owe a big thank you to Cody, who was the dev team manager when we moved into the new space - he worked hard to make sure drive bys wouldn't be an issue, including getting us a door - that locks!!!!

But yesterday, while trying to troubleshoot a failing unit test, here comes Mr. Salesman with "the next big thing". To be honest, as he stood over my desk and asked whether our puny brains could fathom how to realize his incredible concept in software, I didn't even listen. All I heard was "will the new product have blah blah blah" and "no one else does that". And, instead of politely nodding, I shook my head and said "I don't know. You'll have to ask Jeff and Sheryl what's going into the new product". When this didn't seem to satisfy him, I added "I don't know what's going to be in there, but I don't think we'll have that."

First of all, I have since found out, we will, indeed, have this innovative concept in our new product. In fact, it's so innovative that it's already been thought of and defined. Apparently, even though I still don't know what the feature is, it's something we've already built.

Secondly, I suppose I could have saved Mr. Salesman a few minutes by actually listening to him. I could have listened and realized that "yes, we've already built that". I could have even taken some of the credit and made myself look smart by explaining how well we wrote that particular branch of code. I should have probably flushed the error I was trying to track down from my mind, realizing that however long it took to get my head back in the code, it was time well spent to make sure that this salesperson could turn his attention to even grander innovations.

But, my reaction was to basically tell him he was talking to the wrong person. I was polite enough to not be offensive, but steadfast enough to make sure there was no question that I would not play the "define a feature" game. If you want to talk about whether a feature is technically possible, then by all means, let's chat! If you want to understand how our architecture will impact future growth, integration possibilities or performance, then I'm all in! But, if you want to tell me about a rough draft of an idea that occurred to you in the shower, then fuck off! I have better things to do!

So, as bad as I feel about turning Mr. Salesman away with the understanding that I will not take part in inflating his ego, I think it will pay off! I'm pretty sure I've helped cure someone of "drive by disease".

Tuesday, September 22, 2009

Zend_JSON_Decode barfs on funky characters

Zend works fairly well with JSON, but we've found many problems with funky characters. The latest character was an "end of text" - \u0003. No idea how it got input into the system, but now that it's there, we need Zend to be able to deal with it.
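The JSON grammar doesn't allow raw control characters (anything below U+0020) inside a string; they have to be written as escapes. So one way to cope is to escape the strays before the text ever reaches the decoder. Here's a minimal sketch of that idea, written in Java like the rest of the code on this blog rather than PHP. The class and method names are made up for illustration, and it assumes the stray characters sit inside string values, which may or may not match what is tripping Zend up.

```java
/** Hypothetical helper; the names are illustrative and not part of Zend or any library. */
final class JsonControlChars {
    private JsonControlChars() {}

    /**
     * Rewrites raw control characters as six-character JSON unicode escapes.
     * Tab, line feed and carriage return are left alone because they are
     * legal whitespace between JSON tokens.
     */
    static String escapeControlChars(String rawJson) {
        StringBuilder out = new StringBuilder(rawJson.length());
        for (int i = 0; i < rawJson.length(); i++) {
            char c = rawJson.charAt(i);
            boolean whitespace = (c == '\t' || c == '\n' || c == '\r');
            if (c < 0x20 && !whitespace) {
                // Emits a literal backslash, the letter u and four hex digits
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Text that contains no control characters comes back unchanged, so the helper could run on every payload before decoding.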

Thanks to Katie for helping us figure out what characters are causing problems!

Monday, September 21, 2009

Is Java dead? Are you kidding?

Posts that contemplate whether or not Java is dead are amusing at best and annoying at worst. Regardless of which company bought who, or which scripting/functional/newfangled language is so much better, or how many people are "jumping ship", Java will continue to thrive for years!

This is no different than people proclaiming that Linux would spell the demise of Windows (I must admit, I was among those fools). True, Windows has much less market share than it once did, but it is by no means dead.

Heck, if COBOL continues to run "almost three quarters of the world’s business applications", then Java will surely survive 2009! And if Java is currently the most popular language, saying that Java is doomed seems pretty ignorant.

Of course, if by dead you mean something you won't use anymore, then sure . . . move along! Those of us who view languages as tools will continue to use it (and claim the money that companies are still willing to hand out). You go ahead and make decisions based on what language "feels" better and which one you think will get you the most respect. In the meantime, I've got work to do, and that, my friend, involves writing heaps and heaps of Java code!

Wednesday, September 16, 2009

Creating an EasyMock Custom Matcher

This blog isn't really supposed to contain "how-to" entries (it's really just a place for me to vent). However, in the past I've found certain tasks worth putting somewhere online in case I need them in the future. I've used planet-source-code for snippets, sourceforge for projects, and maybe a few other sites here and there. Custom matchers in EasyMock are just hard enough to want a reliable tutorial. While there may be better posts about this same subject elsewhere, there's something about reading code you've written to remind you how to write more code!

So, here's my "CollectionMatcher" class:

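package project.test.common;

import java.util.Collection;

import org.easymock.IArgumentMatcher;
import org.easymock.classextension.EasyMock;

/**
 * Matches a Collection argument whose elements are all contained in the
 * expected collection. Note this is a subset check: order is ignored, and
 * the actual argument may hold fewer elements than the expected one.
 */
public class CollectionMatcher implements IArgumentMatcher {

    private final Collection<?> collection;
    // First element of the actual argument that the expected collection lacks
    private String notFound;

    public CollectionMatcher(Collection<?> collection) {
        this.collection = collection;
        notFound = "";
    }

    // Registers the matcher with EasyMock; the return value is only a placeholder
    public static <T> Collection<T> collectionEq(Collection<T> collection) {
        EasyMock.reportMatcher(new CollectionMatcher(collection));
        return null;
    }

    // Builds the text EasyMock prints when the expectation is not met
    public void appendTo(StringBuffer buffer) {
        buffer.append(notFound).append(" not found in {");

        String comma = "";
        for (Object o : collection) {
            // append(Object) copes with null elements, unlike o.toString()
            buffer.append(comma).append(o);
            comma = ",";
        }

        buffer.append("}");
    }

    public boolean matches(Object otherCollection) {
        if (!(otherCollection instanceof Collection)) {
            return false;
        }

        Collection<?> other = (Collection<?>) otherCollection;

        for (Object o : other) {
            if (!collection.contains(o)) {
                // Remember the offender so appendTo can report it
                notFound = String.valueOf(o);
                return false;
            }
        }

        return true;
    }

}
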


So, now we can do things like:

package com.project.api.delivery;

import static project.test.common.CollectionMatcher.*;
import static org.easymock.classextension.EasyMock.*;

//other imports

public class PreviewCommandTest extends TestCase {

    // The test method name is illustrative
    public void testInvoke() throws Exception {
        init(new PreviewCommand(preview));

        {
            ContentReplacementManager manager = managerContext.getContentReplacementManager();
            expect(manager.replaceValues(content, null)).andReturn(content);
            // collectionEq stands in for the Collection argument, just like eq() does for the others
            expect(manager.replaceTags(eq(content), (Preview)eq(null), (Collection)collectionEq(entity.getPreviews()))).andReturn(content);
            managerContext.doReplay();
        }

        command.invoke(webContext);

        managerContext.doVerify();
    }

}

Tuesday, September 1, 2009

Testing the internet (or how to mock a web service)

Part of the architecture of project Awesome involves an engine that processes a high number of tasks, each requiring a significant amount of work. While I can't explain what the engine's main responsibility is, this is related to the persistent queue I mentioned in a previous post.

At the end of each unit of work, a message is sent to an external system for additional work to be done. However, because the external system is essentially a non-blocking remote queue, we get no feedback or information about the state of the work being done. To solve this, there is yet another service which returns feedback on the process of each unit of work as it executes on the remote system. We reconcile the feedback with each unit of work sent for processing.

Because there are so many steps involved, testing this entire process is a bitch. Testing this entire process in Fitnesse is even harder. The engine spawns threads to periodically check the incoming queue and poll the feedback service. Since Fitnesse runs in a single thread, we will need to test some of the threading issues with JMeter (for example, if two callers attempt to place the same item in the queue, only one should be accepted). However, we don't have the resources to do JMeter, so for now, we'll have to hope that our unit tests are enough.
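The "same item, two callers" rule can at least be poked at from a plain unit test. What follows is a minimal sketch, not Awesome's actual queue: DedupQueue is a hypothetical in-memory stand-in for the persistent queue, and the check simply releases several threads at once and counts how many offers were accepted.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the real persistent queue. */
final class DedupQueue {
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();
    private final Queue<String> items = new ConcurrentLinkedQueue<>();

    /** Returns true only for the first caller to offer a given id. */
    boolean offer(String id) {
        if (!seenIds.add(id)) {   // Set.add is atomic, so exactly one caller wins
            return false;
        }
        items.add(id);
        return true;
    }
}

final class DuplicateOfferCheck {
    private DuplicateOfferCheck() {}

    /** Has `callers` threads offer the same id at once; returns how many were accepted. */
    static int countAccepted(int callers, String id) {
        DedupQueue queue = new DedupQueue();
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger accepted = new AtomicInteger();
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try {
                    startGate.await();   // hold every thread until the gate opens
                    if (queue.offer(id)) {
                        accepted.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        startGate.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return accepted.get();
    }
}
```

A test would assert that countAccepted(8, "unit-42") is 1. A passing run doesn't prove the absence of a race, so this is a cheap smoke test rather than a replacement for real load testing.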

So, the external system gets mocked in Fitnesse! To accomplish this, we create an actual HTTP server on the local system which we can control within test tables. Fitnesse starts up this mockable server, and then starts the engine, telling it to call the address of our mocked service. With no changes to its existing code, our engine is now processing records that we can include directly from our wiki.
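For the curious, a controllable local server takes surprisingly little code. This is a rough sketch using the HTTP server that ships with the JDK (com.sun.net.httpserver), not our actual fixture: the class name, the /feedback path and the respondWith method are all invented for illustration, and the Fitnesse glue is left out entirely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the external feedback service; all names are illustrative. */
final class MockFeedbackServer {
    private final HttpServer server;
    // Canned response body; a test can change it between requests
    private volatile String body = "";

    MockFeedbackServer() throws IOException {
        // Port 0 asks the OS for any free port, so parallel runs do not collide
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/feedback", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            // A length of -1 means no response body follows
            exchange.sendResponseHeaders(200, bytes.length == 0 ? -1 : bytes.length);
            if (bytes.length > 0) {
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(bytes);
                }
            }
            exchange.close();
        });
    }

    void start() { server.start(); }
    void stop() { server.stop(0); }
    void respondWith(String newBody) { body = newBody; }

    /** Address to hand to the code under test instead of the real service. */
    String url() {
        return "http://127.0.0.1:" + server.getAddress().getPort() + "/feedback";
    }

    /** Starts the mock, fetches the canned body the way a client would, then shuts down. */
    static String roundTrip(String cannedBody) {
        try {
            MockFeedbackServer mock = new MockFeedbackServer();
            mock.respondWith(cannedBody);
            mock.start();
            try {
                HttpURLConnection conn = (HttpURLConnection)
                        URI.create(mock.url()).toURL().openConnection(Proxy.NO_PROXY);
                try (InputStream in = conn.getInputStream()) {
                    // readAllBytes needs Java 9 or later
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                mock.stop();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that the engine only ever sees a URL, so swapping the real service for the mock is a configuration change rather than a code change.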

While this doesn't test the entire engine, it does allow us to test one critical piece. We can now verify our engine performs correctly, regardless of the state or actions of the external service (in fact, we can even make it do things it shouldn't ever do - just in case).

Thursday, August 27, 2009

Careful with those dates!

Crap! I've been bit in the ass by dates AGAIN.

Back at Microsystems/Biotronik, I got screwed assuming SQL Server could tell the difference between dates that were milliseconds apart. Turns out, they have to be several milliseconds apart, or they look like the same date. When you're tracking items as they move through a manufacturing line and machines are pushing the parts around, milliseconds are critically important. If you want to use a date as part of a compound primary key or unique index, you have to use an epoch-based value (or something other than SQL Server).
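The classic SQL Server datetime type stores time in ticks of 1/300 of a second, which is why values come back ending in .000, .003 or .007. Here's a minimal sketch of that rounding, assuming the column in question was a plain datetime (other types, such as datetime2, behave differently). The class and method names are just for illustration.

```java
/** Illustrative model of datetime rounding; not taken from any SQL Server API. */
final class SqlServerDatetimeRounding {
    private SqlServerDatetimeRounding() {}

    /** Converts a non-negative millisecond count to the nearest 1/300-second tick. */
    static long toTicks(long millis) {
        // millis * 300 / 1000, rounded half up, using integer math only
        return (millis * 3 + 5) / 10;
    }

    /** True when two millisecond values would be stored as the same datetime. */
    static boolean collide(long millisA, long millisB) {
        return toTicks(millisA) == toTicks(millisB);
    }
}
```

Under this model, 0 ms and 1 ms collide, and so do 3 ms and 4 ms, while 1 ms and 2 ms land on different ticks. That is exactly the kind of thing that breaks a unique index on a timestamp.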

And once again, dates are back to haunt me! This time, we're using dates to organize files on the filesystem. Awesome has a concept of "Assets", which is simply binary content (pictures, spreadsheets, documents, etc.). For each binary file, we keep a record (in the database) of metadata. The actual file is stored on the file system. We decided that in order to potentially partition and sort the files, we would store them based on their created date. In other words, the "date added" value in the metadata record tells us where the binary file is stored. This works great within the overall Awesome architecture.

That is, until yesterday!

Turns out, the database was misconfigured. While we're on the West coast, the database thought it was somewhere in the midwest. One of our diligent IT staff noticed this inconsistency, and like any good IT person would, he corrected the problem. Hours later, we were troubleshooting a problem with Assets not being found. Now, the metadata that used to point to 2009/06/12/10/10/24 as a path for the Asset was looking in 2009/06/12/12/10/24. Oops, no file!

BTW, if you're wondering why we use the date and don't store the actual path as a varchar, consider what happens when we want to change how the binary files are stored. While it would be painful to move each file to a new directory structure, changing the code that determines the path is much easier (and safer) than trying to update each record in the database. Just sayin' . . . :-)
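To see how fragile a date-derived path is, here is a small sketch of the idea. It is not Awesome's code: AssetPaths and pathFor are hypothetical names, and it uses java.time (Java 8 and later) for brevity. The same instant produces a different directory depending on which timezone does the formatting.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical path builder; the layout mirrors the year/month/day/hour/minute/second scheme. */
final class AssetPaths {
    private static final DateTimeFormatter LAYOUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH/mm/ss");

    private AssetPaths() {}

    /** Derives the storage directory from the "date added" instant, as seen in the given zone. */
    static String pathFor(Instant added, ZoneId zone) {
        return LAYOUT.withZone(zone).format(added);
    }
}
```

For the instant 2009-06-12T17:10:24Z, pathFor gives 2009/06/12/10/10/24 in America/Los_Angeles but 2009/06/12/12/10/24 in America/Chicago. Pinning the zone in code (UTC, say) instead of inheriting it from whatever the database thinks would keep the path stable.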

Anyway, the database's timezone is back to central time and all is good. For the time being, we're going to write a script that moves all of the files to a different "hour" directory and then change the timezone. Fortunately, we don't have very many Assets that were uploaded between midnight and 2 am. Since the added time is only critical for finding the binary file represented by the record, updating a few records before we run the script should be pretty painless.

So, crisis averted. But next time, I'll pay more attention when choosing whether or not to use a date as an identifier!

Tuesday, August 18, 2009

There's always a beginner in the crowd

Each morning, my RSS reader is full of posts that seem very rudimentary. Absolutely not trying to pick on any one particular post, but there are many of them each day. These are things that you run across when working in a new language or with a new API. These articles strike me as a "here's a puzzle I ran into and here's how the puzzle is solved" type entry.

These used to annoy me. I envisioned a blogger who was so proud of solving a problem that they felt they should share their brilliance with the world. I felt they were trying to announce their advancement from one level of knowledge to the next.

I now realize that that's not the case. People have lots of reasons to post a blog entry. Maybe it's to share an experience, share knowledge, or just (like in my case) to vent.

But the best thing these entries do is remind other developers of the basics. Sure, these are valuable to beginners, but we're all, in one way or another, beginners. Programming has so many back alleys and dark corners that no one is an expert. Sure, maybe you're very knowledgeable about Java concurrency, or the latest scripting language, or . . . whatever. But there's a lot of stuff you don't know (and if you deny it, you're so oblivious to reality that you are probably dangerous to your clients' well-being).

There are professional developers working every day that completely overlook (or ignore, or forget) the basics. Because we work with multiple languages, it's pretty easy to forget the language-specific shortcuts or optimizations that each language offers. There are highly paid developers with many years of experience who do not use version control, unit testing and similar strategies that most of us take for granted (I swear this is true - I've seen it with my own two eyes). If you're a developer with a lot of experience, don't discount the ideas and discoveries that the newbie at the desk next to you, or the blogger that got stuck in your RSS feed, is so excited about -- there's a chance she's found something of value that you've overlooked.

My language is better than your language

I'm flabbergasted at the number of blog posts that try to convince readers why one language (or framework) is better than others. What a waste of bits!

You can find fault in any language. PHP lacks a finally clause, Java is too cumbersome, Ruby uses a poor threading model, and on and on. The funny thing is, while I might argue that each of the above statements shows these languages' downsides, I'm sure there are arguments that can be used to contradict or refute those statements.

Who cares? If you don't like a language, don't use it! If it solves a problem for you, then use it. If you only know one language, and you know how to solve a particular problem with it, then by all means, use it! The end user doesn't give a rat's ass what language you use (unless it won't work on their computer).

Languages are just tools. Use what works to solve a particular problem. Some languages solve certain problems better than others, but that doesn't make them better overall languages. Saying that a language is the best is the same as saying you know that language -- if you say you know a language (unless you're the creator of that language), then you don't realize how much you don't know!

Tuesday, August 11, 2009

Terracotta is a JMS killer

I often feel bad (or maybe it's guilt) at the decision to not use Spring in our software platform. I don't think it would have helped our project, and I certainly don't miss the XML "sit-ups" we would be stuck with now. But, when one of our team leaves (even if that's me), it's going to be one major item missing from their resume.

So, now we come to our current dilemma. Awesome (the code name for our platform) currently has a queue for asynchronously processing tasks. It's not a bad design and is thoroughly tested. But it's also not the cleanest design. We've encountered another place where asynchronous processing is required. So the time has come to refactor the current queue logic so these, and all future queues, are coded consistently.

My experience tells me that this new queue, especially, should be consumed on a remote machine. Warnings of millions of events, each requiring significant processing power, occurring repeatedly for sustained periods of time, are common. Of course, there's no way to currently verify the loads we'll have to support, or the amount of power required to process each event, but my general sense is, we could cripple most machines.

So, my first reaction was to use JMS to manage our queues. This way, we could move them off of our main server easily, and let some other machine churn away on the events without affecting anything else.

But then I realized, at least for the time being, we are developing for a single JVM. Of course, we're planning on using Terracotta to make it a really big JVM, but it's still acting as a single machine. We're also working from a single database and file system. Being able to code for a single machine definitely reduces the complexity of distributed processing, but at some point, this model will not be optimal. I'm pretty convinced that we'll eventually want to move some of the processing off to a separate environment.

So, to use JMS or not? It would certainly look good on everyone's resume. And while we have a persistent queue already coded up and working in production, JMS would handle the persistence for us, but that's probably a "push". JMS would allow us to use a subscription scheme where we currently only support one consumer per message, should we ever need it. But introducing JMS would add more complexity to the overall architecture. The only way to really know which route is best would be to research, test and benchmark. Since our timeline is so compressed and management views QA as a secondary concern, guesswork is all I've got.
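One way to put off the decision without painting ourselves into a corner is to hide every queue behind the same small interface, so a JMS-backed implementation could be dropped in later. This is only a sketch of that idea: TaskQueue and InJvmTaskQueue are hypothetical names, not what Awesome uses, and persistence is ignored completely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical contract that every queue in the platform would share. */
interface TaskQueue<T> {
    /** Hands a task to the queue for asynchronous processing. */
    void publish(T task);

    /** Returns the next task, or null when the queue is empty. */
    T poll();
}

/** Single-JVM implementation; a remote (for example JMS-backed) one could replace it later. */
final class InJvmTaskQueue<T> implements TaskQueue<T> {
    private final BlockingQueue<T> delegate = new LinkedBlockingQueue<>();

    @Override
    public void publish(T task) {
        delegate.add(task);   // unbounded queue, so add never rejects a task
    }

    @Override
    public T poll() {
        return delegate.poll();   // non-blocking; null signals "nothing to do"
    }
}
```

Producers and consumers would depend only on TaskQueue, so moving the work to another machine becomes a matter of swapping the implementation. Whether that swap would really be painless is exactly the part that needs the research and benchmarking we don't have time for.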

All our team can do is build the features we're being told to build (if this sounds bitter, it's not - we have a huge feature list and a very tight deadline). I'm certainly not turning a blind eye to the potential for performance problems down the road, but I don't have the time or resources to make an informed decision. So, for now, we hope that one big JVM can handle the load we're going to encounter.

I hope when shit hits the fan, the switch is set to low.

Thursday, July 30, 2009

Cart before the Horse

Sometimes you can't avoid it. You just can't write featureA until you at least define featureB. It would sure be nice if everything was separate, but on occasion, it's unavoidable.

For example, consider a situation where you're interfacing with an external service. Say this external service accepts requests to do some work and also provides feedback on work that is either completed, or in process. Classic consumer/producer type stuff. Your typical remote work queue.

But, what if you're responsible for both of these systems? It's pretty hard to write the interface that calls the external service if you don't know what the external service will need to complete its work. At the same time, it's difficult to write the external service if you don't know what work is to be performed. Until you know what feedback is required by your system, or what data your system will have to reference the external system, it's pretty difficult to know how you'll gather and return the feedback.

Cart before the Horse. You need to know at least a little bit about each system in order to even abstract out the parts that are generic. Until you know what data each system will have, and what each system needs in order to do its work, you can't create the protocol that they'll use to talk to each other. And until you do that, trying to define one side of the equation is pretty much guesswork.

Wednesday, July 29, 2009

A good post on clean coding

Here's a great post on keeping code clean. Some PHP guys say Smarty isn't useful; maybe that's why there's so much PHP code that sucks to work with.

Tuesday, July 28, 2009

Building just what you need, even when you know it's wrong

I just completed some functionality that I know is incomplete. Sure, all the tests pass, but I know there are other tests that could be written. But since we don't have time to make the tests pass, I'm just not writing them.

The functionality basically merges values into a document. Think "mail merge" type functionality or smarty-style tags. The current implementation will work for all of the "official" user stories we have documented. However, I can think of certain situations where, especially if escaping is involved, the merged output will not be what is intended.
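To make the escaping worry concrete, here's a minimal merge sketch. It is not our implementation: the {$name} tag syntax, the TagMerger class and its behavior for unknown tags are all assumptions for illustration. It shows one trap of this kind in Java, where a merged value containing $ or a backslash gets misread as a regex group reference unless it is quoted first.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical merger for Smarty-style tags such as {$name}. */
final class TagMerger {
    // Matches {$word} and captures the word as group 1
    private static final Pattern TAG = Pattern.compile("\\{\\$(\\w+)\\}");

    private TagMerger() {}

    /** Replaces each known tag with its value; unknown tags are left as written. */
    static String merge(String template, Map<String, String> values) {
        Matcher m = TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement stops '$' and '\' in the value from being
            // interpreted as group references by appendReplacement
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Merged values are not rescanned, so a value that itself looks like a tag is inserted literally. Whether that is the intended behavior is one of those decisions the "official" user stories never spell out.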

So, should the time be spent now to make sure that any possible input can be handled? Or do we stick with something that may handle all of our needs, and add tests only when we run into situations where a customer needs additional functionality?

Regardless of the "right" answer, we're moving on . . .

My love and hatred of Fitnesse

I honestly can't remember how I survived without Fitnesse. What a marvelous tool!

Fitnesse provides regression and integration testing of our backend api, acceptance testing that is clearly understandable to non-programmers (even managers can understand it) and specs for developers. It allows us to put our datastores into a known state before tests run without having to expose the functionality through the api. Truly, exactly what the doctor ordered.

But there's a dark side to Fitnesse too. And it goes beyond the fact that, as a wiki, it's easy to get a big mess of tests with very little organization. After using Fitnesse for over 2 years, we've been able to wrangle the mess and figure out a good organizational strategy.

The biggest problem we're facing now is a limited number of people that can write quality tests. In fact, the only one who puts any serious time into writing tests is . . . me! Sure, our project manager spends time putting in user stories, but these are little more than "items should be able to be deleted" type entries. The act of converting that into actual executable tables falls on my shoulders.

Perhaps it's because I'm too controlling, or perhaps it's because I know that having quality tests ready to go means that the team will always have well defined work to complete. I wonder if I should try offloading some of the testing onto other developers (the only ones who have any chance of creating quality tests), or if I should keep them working on making the tests pass.