Making Wrong Code Not Compile

The rest of the programming world is linking to Joel Spolsky’s latest post about the need for Hungarian notation. Joel makes a nice distinction between "Apps Hungarian" and "Systems Hungarian". The latter is the one we all know and hate, all lpszThis and dwThat. The former is more interesting in that it uses prefixes to describe the role of the data in the application and therefore what can be done to it and how it can be used.

But as far as I’m concerned, all this talk about Hungarian notation is just rubbish.

The essence of Joel’s argument is that you, the developer, become attuned to the prefixes and you notice when variables whose names use different prefixes are used inconsistently. You should read Joel’s post since I’ll be discussing his example; go ahead, read it now. I’ll wait.

Now this all sounds groovy, baby, and indeed I imagine several developers have been swayed by Joel’s argument (I know he can be very persuasive) and have suddenly decided to use "Apps Hungarian".

Well, I’m not swayed: I think it’s awful, a complete throwback to the 80s. Why? Because to me Joel’s argument is antithetical to modern object-oriented practices. In fact it just reeks of old-style C programming.

Consider again Joel’s example: given a string variable, it’s hard to say whether its value is the original input from the user (which may contain spurious HTML tags) or the encoded value (where the angle brackets from those tags have been converted to their HTML character entities). From this he proposes using prefixes for string variable names so that you can tell whether the values are "safe" (i.e., encoded) or "unsafe" (i.e., raw, direct from the user).

Well to me a string is a string is a string. It’s just an array of characters, with no other structure or semantic meaning at all. That’s it. Period. It’s just, you know, a primitive type. If you want a string to have some other overlaid semantic meaning, such as safeness, then it is no longer a simple primitive string. It is a string with new behavior; it is a string with extra properties. Certain actions are allowed with this string, others are not.

And Joel’s argument is that we should implement this through a naming convention? Wow. To me, it sounds like a new type. A class. You know: something that encapsulates data, that enforces specific behavior on that data, that constrains what you can do with the data. Then the compiler can help you maintain type safety and behavior safety. Wow, using the compiler to ensure we don’t write bad code? What a concept.

So, off the top of my head (and I’m not saying this is how I’d really do it in a production application, your mileage may vary, etc.), I’d write a UserText class whose constructor accepts the original string from the Request instance. It would have two methods, GetSafeText() and GetUnsafeText(), to return the two variants of the original string. There might be other methods as well, such as Store() and Load() to save the data to the database and read it back. Etc.
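To make that a bit more concrete, here is a rough sketch of the idea in Java (picked only because it reads much like C#; the method names are adapted to Java casing). The encoding rules are my own minimal assumption rather than a complete HTML encoder, the Page class is a hypothetical consumer invented for illustration, and Store() and Load() are left out.

```java
// Hypothetical sketch of the UserText idea. The entity table below is a
// deliberately minimal assumption, not a complete HTML encoder.
final class UserText {
    private final String raw;

    UserText(String rawInput) {
        if (rawInput == null) {
            throw new IllegalArgumentException("rawInput must not be null");
        }
        this.raw = rawInput;
    }

    // The original input, exactly as the user supplied it.
    String getUnsafeText() {
        return raw;
    }

    // The input with HTML-significant characters replaced by entities.
    String getSafeText() {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}

// Hypothetical consumer: it accepts only UserText, never a bare String.
final class Page {
    static String greeting(UserText name) {
        return "<p>Hello, " + name.getSafeText() + "</p>";
    }
}
```

With this in place, Page.greeting(new UserText("<b>Bob</b>")) returns "<p>Hello, &lt;b&gt;Bob&lt;/b&gt;</p>", whereas Page.greeting("<b>Bob</b>") is rejected by the compiler because a String is not a UserText. No prefix-spotting required.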

Think I’m talking rubbish? Look at the Uri class (System.Uri) in the .NET Framework. See what I’m getting at? Joel would have you prefix string variable names and have your eyeballs enforce URL type safety. The Framework designers didn’t take that naïve solution and instead gave us a class with certain behaviors, and in using this class the compiler forces us to use URLs properly. That’s just (shock, horror) so type-safe.
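The same pattern shows up outside .NET. As a rough analogue only (this is Java's java.net.URI, not the .NET class mentioned above, and the Fetcher class is a hypothetical helper made up for illustration), a method that takes a URI instead of a String pushes parsing and validation to the caller and lets the compiler police the rest:

```java
// Hypothetical helper: the parameter type is java.net.URI, so a plain
// String will not compile here; callers have to parse their input first.
final class Fetcher {
    static String hostOf(java.net.URI address) {
        return address.getHost();
    }
}
```

Fetcher.hostOf(java.net.URI.create("https://example.com/a?b=c")) returns "example.com". Fetcher.hostOf("https://example.com") does not compile, and URI.create throws an IllegalArgumentException for malformed input such as "http://exa mple.com", so a bad address fails at the point where it is parsed instead of somewhere downstream.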

In fact, I’d have to say that if you have "primitive data" that has other attributes or properties, or that is constrained in some way, then it should be an object, an instance of a class that you write to enforce the constraints, etc. Let the compiler take the heavy load of making sure you use the data properly, not some wacky naming convention.

Sounds like a plan to me.