Thursday, August 27, 2009

Careful with those dates!

Crap! I've been bit in the ass by dates AGAIN.

Back at Microsystems/Biotronik, I got screwed assuming SQL Server could tell the difference between dates that were milliseconds apart. Turns out, they have to be several milliseconds apart (the legacy DATETIME type only resolves to about 3.33 ms), or they look like the same date. When you're tracking items as they move through a manufacturing line and machines are pushing the parts around, milliseconds are critically important. If you want to use a date as part of a compound primary key or unique index, you have to use an epoch-based value (or something other than SQL Server).
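To see why this bites, here's a minimal Python sketch (not the original T-SQL; the only fact assumed is the documented behavior of SQL Server's legacy DATETIME type, which stores time-of-day in 1/300-second ticks):

```python
from datetime import datetime, timedelta

def as_sqlserver_datetime(ts):
    """Approximate how SQL Server's legacy DATETIME stores a timestamp:
    time-of-day is kept in 1/300-second ticks (~3.33 ms resolution)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    ticks = round((ts - midnight).total_seconds() * 300)
    return midnight + timedelta(seconds=ticks / 300)

# Two scans on the line, one millisecond apart...
a = datetime(2009, 8, 27, 10, 15, 30, 0)
b = datetime(2009, 8, 27, 10, 15, 30, 1000)

# ...collapse to the same stored value: a duplicate in a unique index.
print(as_sqlserver_datetime(a) == as_sqlserver_datetime(b))  # True
```

An epoch-based integer column (microseconds since some epoch) sidesteps the rounding entirely.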

And once again, dates are back to haunt me! This time, we're using dates to organize files on the filesystem. Awesome has a concept of "Assets", which are simply binary content (pictures, spreadsheets, documents, etc.). For each binary file, we keep a metadata record in the database; the actual file is stored on the file system. We decided that, in order to potentially partition and sort the files, we would store them based on their created date. In other words, the "date added" value in the metadata record tells us where the binary file is stored. This works great within the overall Awesome architecture.
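The scheme looks roughly like this (a hypothetical Python sketch, not Awesome's actual Java code; the function name and layout are illustrative):

```python
from datetime import datetime

def asset_path(date_added):
    """Compute an asset's on-disk directory from its "date added"
    metadata value -- the path is derived, never stored."""
    return date_added.strftime("%Y/%m/%d/%H/%M/%S")

print(asset_path(datetime(2009, 6, 12, 10, 10, 24)))  # 2009/06/12/10/10/24
```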

That is, until yesterday!

Turns out, the database was misconfigured. While we're on the West Coast, the database thought it was somewhere in the Midwest. One of our diligent IT staff noticed this inconsistency, and like any good IT person would, he corrected the problem. Hours later, we were troubleshooting a problem with Assets not being found. The metadata that used to point to 2009/06/12/10/10/24 as the path for an Asset was now looking in 2009/06/12/12/10/24. Oops, no file!
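A hypothetical sketch of the failure mode (Python; the instant and zone offsets are illustrative summer-time values, not pulled from our systems): the stored instant never changes, but the path rendered from the server's local time shifts by two hours.

```python
from datetime import datetime, timezone, timedelta

# One fixed instant, stored once.
stored = datetime(2009, 6, 12, 17, 10, 24, tzinfo=timezone.utc)

central = timezone(timedelta(hours=-5))   # CDT: the misconfigured zone
pacific = timezone(timedelta(hours=-7))   # PDT: the corrected zone

# Same instant, two server timezone settings, two different paths.
path_before = stored.astimezone(central).strftime("%Y/%m/%d/%H/%M/%S")
path_after = stored.astimezone(pacific).strftime("%Y/%m/%d/%H/%M/%S")
print(path_before)  # 2009/06/12/12/10/24
print(path_after)   # 2009/06/12/10/10/24
```

The file sits wherever the old rendering put it; the new rendering looks two hours away.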

BTW, if you're wondering why we use the date and don't store the actual path as a varchar, consider what happens when we want to change how the binary files are stored. While it would be painful to move each file to a new directory structure, changing the code that determines the path is much easier (and safer) than trying to update each record in the database. Just sayin' . . . :-)

Anyway, the database's timezone is back to Central time and all is good. For the time being, we're going to write a script that moves all of the files to a different "hour" directory and then change the timezone. Fortunately, we don't have very many Assets that were uploaded between midnight and 2 am. Since the added time is only critical for finding the binary file represented by the record, updating those few records before we run the script should be pretty painless.
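That cleanup script could be as small as this sketch (Python, hypothetical; it assumes the YYYY/MM/DD/HH/MM/SS directory layout and shifts every leaf by the two-hour offset):

```python
import os
import shutil
from datetime import datetime, timedelta

def migrate_assets(root, hours=-2):
    """One-off migration: move every asset leaf directory to the path
    implied by its timezone-corrected timestamp."""
    moves = []
    for dirpath, dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 6:          # only YYYY/MM/DD/HH/MM/SS leaves
            continue
        ts = datetime(*map(int, parts)) + timedelta(hours=hours)
        moves.append((dirpath, os.path.join(root, ts.strftime("%Y/%m/%d/%H/%M/%S"))))
    # Move after the walk finishes so we never re-visit a moved directory.
    for src, dst in moves:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
```

The datetime arithmetic handles the midnight rollover on its own, but a real version would also check for collisions and clean up the emptied parent directories.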

So, crisis averted. But next time, I'll pay more attention when choosing whether or not to use a date as an identifier!

Tuesday, August 18, 2009

There's always a beginner in the crowd

Each morning, my RSS reader is full of posts that seem very rudimentary. I'm absolutely not trying to pick on any one particular post, but there are many of them each day. These are things you run across when working in a new language or with a new API. These articles strike me as "here's a puzzle I ran into, and here's how I solved it" entries.

These used to annoy me. I envisioned a blogger who was so proud of solving a problem that they felt they should share their brilliance with the world. I felt they were trying to announce their advancement from one level of knowledge to the next.

I now realize that that's not the case. People have lots of reasons to post a blog entry. Maybe it's to share an experience, share knowledge, or just (like in my case) to vent.

But the best thing these entries do is remind other developers of the basics. Sure, these are valuable to beginners, but we're all, in one way or another, beginners. Programming has so many back alleys and dark corners that no one is an expert. Sure, maybe you're very knowledgeable about Java concurrency, or the latest scripting language, or . . . whatever. But there's a lot of stuff you don't know (and if you deny it, you're so oblivious to reality that you are probably dangerous to your clients' well-being).

There are professional developers working every day who completely overlook (or ignore, or forget) the basics. Because we work with multiple languages, it's pretty easy to forget the language-specific shortcuts or optimizations that each language offers. There are highly paid developers with many years of experience who do not use version control, unit testing, and similar strategies that most of us take for granted (I swear this is true - I've seen it with my own two eyes). If you're a developer with a lot of experience, don't discount the ideas and discoveries that the newbie at the desk next to you, or the blogger that got stuck in your RSS feed, is so excited about -- there's a chance she's found something of value that you've overlooked.

My language is better than your language

I'm flabbergasted at the number of blog posts that try to convince readers why one language (or framework) is better than the others. What a waste of bits!

You can find fault in any language. PHP lacks a finally clause, Java is too cumbersome, Ruby uses a poor threading model, and on and on. The funny thing is, while I might argue that each of those statements exposes a downside of the language, I'm sure there are arguments that could contradict or refute them.

Who cares? If you don't like a language, don't use it! If it solves a problem for you, then use it. If you only know one language, and you know how to solve a particular problem with it, then by all means, use it! The end user doesn't give a rat's ass what language you use (unless it won't work on their computer).

Languages are just tools. Use what works to solve a particular problem. Some languages solve certain problems better than others, but that doesn't make them better overall languages. Saying that a language is the best is the same as saying you know that language -- if you say you know a language (unless you're the creator of that language), then you don't realize how much you don't know!

Tuesday, August 11, 2009

Terracotta is a JMS killer

I often feel bad (or maybe it's guilt) about the decision not to use Spring in our software platform. I don't think it would have helped our project, and I certainly don't miss the XML "sit-ups" we would be stuck with now. But when one of our team members leaves (even if that's me), it's going to be one major item missing from their resume.

So, now we come to our current dilemma. Awesome (the code name for our platform) currently has a queue for asynchronously processing tasks. It's not a bad design and is thoroughly tested. But it's also not the cleanest design. We've encountered another place where asynchronous processing is required. So the time has come to refactor the current queue logic so these, and all future queues, are coded consistently.

My experience tells me that this new queue, especially, should be consumed on a remote machine. Warnings of millions of events, each requiring significant processing power, occurring repeatedly for sustained periods of time, are common. Of course, there's currently no way to verify the loads we'll have to support, or the amount of power required to process each event, but my general sense is we could cripple most machines.

So, my first reaction was to use JMS to manage our queues. This way, we could move them off of our main server easily, and let some other machine churn away on the events without affecting anything else.

But then I realized, at least for the time being, we are developing for a single JVM. Of course, we're planning on using Terracotta to make it a really big JVM, but it's still acting as a single machine. We're also working from a single database and file system. Being able to code for a single machine definitely reduces the complexity of distributed processing, but at some point, this model will not be optimal. I'm pretty convinced that we'll eventually want to move some of the processing off to a separate environment.

So, to use JMS or not? It would certainly look good on everyone's resume. And while we have a persistent queue already coded up and working in production, JMS would handle the persistence for us, but that's probably a "push". JMS would also allow us to use a subscription scheme, where we currently support only one consumer per message, should we ever need it. But introducing JMS would add more complexity to the overall architecture. The only way to really know which route is best would be to research, test, and benchmark. Since our timeline is so compressed and management views QA as a secondary concern, guesswork is all I've got.
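For context, "one consumer per message" is the point-to-point style sketched below (a Python stand-in for the Java implementation, with hypothetical names): competing workers pull from one queue, and each message is processed exactly once. That matches a JMS queue; a JMS topic would add the publish/subscribe scheme on top.

```python
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    """Compete with the other workers; each task is consumed exactly once."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut this worker down
            break
        with lock:
            results.append(item * 2)   # stand-in for real event processing
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for i in range(10):
    tasks.put(i)
tasks.join()                      # wait for every task to be processed
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Ten tasks in, ten results out, no duplicates: the single-consumer guarantee the current queue already provides.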

All our team can do is build the features we're being told to build (if this sounds bitter, it's not - we have a huge feature list and a very tight deadline). I'm certainly not turning a blind eye to the potential for performance problems down the road, but I don't have the time or resources to make an informed decision. So, for now, we hope that one big JVM can handle the load we're going to encounter.

I hope when shit hits the fan, the switch is set to low.