Readability Panel
Capitalise Main Text
Increase Word Spacing
Use another Font
More Coming Soon!!

Sunday, 18 August 2013

A few notes on "best" practices

"That's how we've always done it!" - that statement is in itself an indicator that you should have a look at what you're doing. You might be doing everything as well as it can be done, but unless you know why the decisions have been made, you don't know if those reasons were - or are still - valid.

Here are five practices I believe are flawed, why I think so, and what I propose instead.

1. "unsafe" casts - (MyClass)obj vs "safe" casts - obj as MyClass


The names "safe" and "unsafe" are misleading. A "safe" cast simply does not throw an exception when the cast fails. In fact it is often preferable to adopt a "fail fast" style of coding, wherein exceptions are thrown as soon as something unexpected occurs, halting the program and preventing data inconsistencies or unexpected logic flows. Using a so-called "safe" cast will simply return a null reference - one possible source of our old friend "Object reference not set to an instance of an object." II do everything I can to avoid this exception. It's unhelpful and unclear. Using an "unsafe" cast, you'll get an explicit InvalidCastException which will tell you exactly what types the code was dealing with at the time, meaning you stand a far greater chance of nailing down the problem faster.

Some people also cite performance as a compelling reason to use safe casts, but I've run some performance tests in the past and found that there's not enough in it to make it an argument worth considering. You can roll your own test harness to prove the point.

If you're going to change from "safe" casts, you can quite easily check for type compatibility before your cast with the is keyword as follows:
if (!(obj is MyClass))
    throw new Exception(string.Format(
        "Object of type {0} was supplied where {1} was expected.",
        obj.GetType().Name, typeof(MyClass).Name);
However, it's worth noting that this doesn't give a great deal more than the InvalidCastException would, unless you need to fail "gracefully."

2. if (obj == null) vs if (null == obj)


"It's a C++ thing." OK, but if someone wrote some weird code and said "It's a Pascal thing" or "It's a Fortran thing," you'd tell them to get on their bike. This is C#. It's a different language. It behaves differently.

So why do people do it at all? Well, back in C++ everything can be cast to a boolean. Unless it has a specific value, "zero" or null is considered false and anything "non-zero" (such as an object) is true. That means that if you were clumsy enough to use the assignment operator (=) in your if statement instead of the equality operator (==) you would convert your object reference to a null *and* escape the "true" half of the conditional statement, which is meant to execute when that reference is null. So some programmers decided that instead of being more careful, they'd just write less readable code that would safeguard them from their own clumsiness - since you can't assign to null, any mistyped assignment operator will cause a build error

In the world of C#, the result of an assignment is the value assigned, which would be null in the unfortunate case of using the assignment operator. But in C#, null cannot be cast to a boolean, which would cause the condition to be uncompilable. So using the assignment operator will cause a compile error even when the arguments are in Left-to-Right order. There is no reason to write unreadable code.

3. private object fieldName vs private object _fieldName


Private class fields are often given an underscore prefix. The premise is that this enables you to tell the difference between a method's parameters and variables and those owned by the class. I've heard others stating that it also allows you to use the same name inside a method. First off, having two things with the same name probably isn't a good idea anyway. If you've got a field called, for instance, username and then you pass a parameter with the same name into a method, the question is raised over which, between the field and the parameter, contains the username that's intended in that context. Is the class even being used in the right way? Classes having state should be immutable - that is to say, they should be instantiated with their state applied through a constructor - or with the information to build that state - without it being manipulated by code outside of that instance. If it's intended to handle different state information, the information should be passed into its entry method(s) and then cascaded through the call path. Imagine two threads hitting a method that takes a username and then stores it in a field called _username before then calling a 2nd method which reads the _username field. Thread 1 enters, and sets the field. Thread 2 enters, and sets the field. Thread 1 then moves on to the next method call, and reads the _username field which is now the wrong user. The two options are to create a new instance for each username, or enable one instance to handle multiple usernames by removing the field completely.

When refactoring, the first thing I do is remove any field prefixes so I can find any cases where this happens by causing build errors to be raised where there's a name clash. Having those underscores hides - and enables - potentially badly behaved code.

4. catch { throw; } vs catch (Exception ex) { throw ex; }


Throwing the exception will corrupt the call stack (since this becomes the top of the stack) whereas rethrowing the exception (a simple throw statement with no parameter) will preserve it. Moving on from this, if all you're doing is rethrowing the exception then there is absolutely no reason to have a try block unless you have a finally block too.

In the worst case, developers mangle perfectly usable exceptions with cryptic messages and streams of acronyms in an attempt to describe the already accessible call stack (I know of at least one such case in proximity to my current work focus, but I can't touch it,) and in the best case you'll just know the exception was thrown from a catch block and otherwise occurred somewhere in the try block.

Bottom line: Unless you need to do some logging, or expect a specific exception, or have a finally block that must be executed, just don't have a try ... catch block at all. If you do need to do any of those things and are going to rethrow the exception, rethrow it properly.

5. Near-first-use declaration vs Early declaration


By "Early declaration" I'm talking about the idea that you should declare all your variables at the top of a method. What a load of nonsense.

For a long time I've heard it suggested that you should declare variables at the top of the method so you know what variables the method is dealing with. This is usually talked about by the same kinds of people who still think strings should be prefixed with "str," or method parameters should have "p" in front of them - which is obviously all complete insanity. Your method's name should describe what it does. How it does it is irrelevant. Declaring variables early may, at one time, have aided in debugging because one could see how much memory was being allocated up front. But come on, we have heaps of RAM to play with now. We don't need to have tactics for mitigating memory usage in the vast majority of instances.

So why do I dislike Early declaration so much? Surely it doesn't matter where they get declared?

Firstly, declaring the variables early tells you nothing. You can't tell, at a glance, what the method is doing with each of those variables - or that any of the variables is going to be used for what it's named as. You can check on the sanity of the method in a code review, or as part of a refactoring effort. What's important is that the method has an appropriate and descriptive name.

Secondly, if you declare your variable too early, you risk attempting to use the variable before it's been assigned. It's easily done - you've got some complex logic that's difficult to split into separate methods, and you need to use a specific value. There's a variable with that name, so you use it. More likely, someone else is debugging the code and has found that they need that variable to do something, so they try to use it. Then it turns out the variable hadn't been assigned and you've got more work fiddling around to see if you could do what you expected, or finding another way to achieve the same result.

Thirdly is the issue of Variable Re-use. That is, reassigning a variable. This is when its meaning changes half way through the execution of the code. At any given time, there is ambiguity about the value of the variable. Is it null? Is it the first value to be assigned? Or is it something else? The same applies to loops - and I've seen this happen even recently - where the variable is declared before the loop, reassigned n times inside the loop, and then used by an innocent developer later in the code. It will only contain the last value assigned.

So, how does declaring closer to the usage help?

Well, if the variable is declared in scope - that is, between a set of curly braces - it will only be considered to exist within that scope. It becomes far more difficult to 'misuse' the variable or inspect it prematurely, and that's better all round. You're more likely to see when it's completely unused, too. Even if it does't go in a set of braces, moving the declaration nearer to the assignment (or combining both actions into one call) will still prevent it being used prematurely. Tools like Resharper will make it easier to move variables in this way, and will warn you about unused variables.

No comments :

Post a Comment