Saturday, 18 January 2014

Still Alive

No, this isn't a post about Portal.

Just to say I'm still around, and trying to find time to work on this blog. I've been incredibly busy with work since August and we're also now in the middle of moving house - but I've got some ideas around leveraging the power of AMD's HSA, the architecture behind their relatively new line of APUs. That said, anything I post on the issue may well not be C#!

Posts in the pipeline:

  • Rolling your own DI container in C#
  • DI Frameworks (not sure this is necessary)
  • Why I hate the Null Object pattern
  • OCP (Open/Closed Principle) - and why it's important
  • SRP (Single Responsibility Principle) - and why it's important
  • rosettacode.org - go there now!
  • Simplifying ADO.Net (in which I aim to discuss means to avoid the XML Spaghetti nightmare)
  • Self-Hosting WCF Services
  • Fractional calculations in C#
Again, there's a box on the right for suggestions.

Saturday, 24 August 2013

Why I Don't Like Frameworks and XML Configuration

One of the comments on my Singleton Pattern blog post simply gave the name of a framework followed by a question mark. My response:

"What about it? :) There are 101 frameworks out there. IME they fall under two categories: Costly, and Poorly Maintained. If they're not Poorly Maintained now, they probably will be later. There can also be resistance in corporate environments to using anything 3rd-party regardless of how good it is, and there are plenty of discussions to be had on distributed library size and closed-source unmaintainability (and the issues around merging 'official' fixes with 'private' tweaks in open-source libraries.)

You could use Microsoft Enterprise Library for Logging, but I'll be doing a blog post on that and Microsoft's Unity DI crap with bits about why having 20,000 lines of app config XML is a Bad Thing. The same applies to WCF."

I aim now to explain in further detail.

API familiarity


Ben Ellis also commented that there are plenty of other ways of achieving the same result. An example he gave was to use Service Locator. There are reasons why you wouldn't want to use Service Locator - although there are counter-arguments that point out that it does come down to how you use the Service Locator. My issue, on the other hand, is impure intent. Service Locator and Unity are two separate technologies that do roughly the same thing. They can be forced to interact, but looking at that link raises one of the most important points I want to make here: "Make sure that you understand the API and code that you are calling. It may not always be black and white." If you are working in a team, or are likely to be bringing in short term contractors to work on the code, or even if there is the likelihood of someone else having to maintain your code, other people might not understand the API. Expecting them to learn all about that API when they've had an urgent production issue dropped on their desk is unfair, and the most likely outcome is that they will bend and twist whatever framework(s) you've chosen to make them behave the way they expect.

XML Configuration


It might seem like I'm picking on Unity a little here. I just did a quick Google search and found The Unity Configuration Schema - the size of this thing is incredible. Even if you split it out into a separate file (which always causes a bit of fun when people remove the reference or it gets missed out of a deployment package) it requires a huge effort to learn how the damn thing goes together, and again at short notice that's going to become tricky. When you've got a number of retail stores unable to trade because of a bug, time really is of the essence. I have Juval Löwy's Programming WCF Services on my desk so I can respond accurately to questions about The WCF Configuration Schema - another monster I've actually developed some code to avoid having to deal with. The problem with having a lot of XML Configuration is that it's difficult to read.

Imagine the following scenario:

- A Junior Developer has been given the task of maintaining an overnight process
- The Junior Developer deploys some new code, but amongst the thousands of lines of XML, has forgotten to update one of the service references to point to a Production server
- The code runs that night at 02:00 and flags up an error to The Tech Support Guy who gets paid to make sure none of the red lights come on.
- That Guy thinks "That's odd, but there was a deployment so I'll have a look and see if there's anything obvious." He's faced with thousands of lines of cryptic XML and hurriedly closes the file.
- He phones the Developer who gets out of bed, fires up their laptop and signs into the corporate network so they can get a copy of the configuration file as it is in Production.
- 30 minutes later, a tired Junior Developer, not in the best frame of mind for looking at code, finally locates the erroneous setting.
- The Developer emails The Support Guy with the fix - replace "http://testserver/" with "http://prodserver/".
- The Support Guy does so - but also accidentally deletes the opening quote from the bad setting, his view obscured by ragged line lengths and settings interspersed by hierarchical elements.
- The code still doesn't work.
- They agree to come back to it in the morning, and the Junior Developer loses more sleep.

Now let's say we've abolished complex XML (my WCF configuration has App Settings like IService_Host) and replay the scenario:

- A Junior Developer has been given the task of maintaining an overnight process
- The Junior Developer deploys some new code, but - as unlikely as it now is - forgets to change one of the few settings in the XML file.
- The code runs that night at 02:00 and flags up an error to The Tech Support Guy who gets paid to make sure none of the red lights come on.
- That Guy thinks "That's odd, but there was a deployment so I'll have a look and see if there's anything obvious." He opens the config file, and notices that one of the _Host settings is wrong. He knows what "Host" means, being a network guy.
- He phones the Developer and says "There was a problem. I looked at the config file and one of the Host settings says testserver. Should I change it to prodserver and re-run the program?"
- The Developer says "Oh, sorry. Yes, please change that."
- The configuration gets changed, the process runs, the Developer gets a good night's sleep.
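
To make the second scenario a little more concrete, here is a minimal sketch of how a single flat setting such as IService_Host could be turned into a service address. It is not the WCF code mentioned above: the EndpointSettings class, the BuildServiceUri method and the ".svc" address layout are all hypothetical names invented for illustration, and the dictionary stands in for whatever holds the application's App Settings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one readable "<Contract>_Host" setting per service.
public static class EndpointSettings
{
    public static Uri BuildServiceUri(IDictionary<string, string> appSettings, string contractName)
    {
        string key = contractName + "_Host";   // e.g. "IService_Host"
        string host;

        // Fail fast, and say which setting is wrong, rather than limping on.
        if (!appSettings.TryGetValue(key, out host) || string.IsNullOrWhiteSpace(host))
            throw new InvalidOperationException(
                string.Format("App setting '{0}' is missing or empty.", key));

        // Assumed address layout: http://<host>/<contract>.svc
        return new Uri(string.Format("http://{0}/{1}.svc", host, contractName));
    }
}
```

The point of the design is that the only thing a support engineer has to read or change at 02:00 is one key with one host name in it.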

Frameworks try to be All Things to All Men


Frameworks very often try to cover all bases. They try to be configurable in many, many ways so as to cater for the many, many ways people want to work. That means they are open to abuse by people who don't know how to configure them properly and misuse by those who don't have the time to understand how they're meant to be configured. This comes back to maintainability, too. If you have an urgent problem, you're likely to simply create a workaround for the framework - through misconfiguration - not doing what you need. That workaround won't ever get changed.

Frameworks try to use their own Language


Enterprise Library has "Blocks". Someone decided that instead of "Assemblies", "Namespaces" or "Classes", Enterprise Library should introduce a new concept. These "Blocks" are described by Microsoft as "Reusable Software Components" - so what was wrong with using the word "Components"? The Logging "Block" has "Listeners" while in the traditional logging paradigm one would call a method that Pushes a Log item to wherever it's going. Indeed, you still use the Enterprise Library Logging Block in this way. Entity Framework has "Entities" rather than "Classes", "Tables", "Structures" or even "Object Definitions" - terms that pretty much any developer would understand. It goes further. It has "Associations" instead of "Relationships". There's an enum of "Primitive Types" that map directly to already defined CLR types - and, of course, the already present DbType enumeration. NHibernate uses a "Session" instead of a "Connection" - I can see how the concept of a "connection" is somewhat outdated, but it's terminology that will be understood by a wider audience. This introduction of new terms does nothing to help those who need to pick up some code and fix it fast. It does nothing to improve on-boarding speed for developers new to a project.

Frameworks have unknown side-effects


The Enterprise Library Logging "Block" (see above comment about Language) has LOADS of built-in "Listeners". We don't know (without using a tool like dotPeek) how good the code is, and even if we open it up, it might be calling code we can't get or could be too complex to delve into properly. To take Logging as an example, I've written my own because I need there to be NO thread blocking when I'm writing a log entry. I need to log and return as fast as possible. I do this by calling a static class which stores the entry in a Queue which gets drained on another thread. The point is to remove any delay that writing to disk might cause. I don't know what Enterprise Library does internally. Someone could write a new Listener that tries to write to a database - which would seriously spanner the performance of the process that's trying to log. There was a time when we had a service timing out in the middle of the day, around 2pm. It was fine the next morning. It turned out Enterprise Library was creating a file so large that the sum of the time taken to perform all the logging was exceeding 30s.
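
The queue-and-drain idea can be sketched as follows. This is not the logger described above, just a minimal illustration under a few assumptions: it is an instance class rather than a static one (to keep the example self-contained), the QueueLogger name is made up, and the actual writing is delegated to a caller-supplied writeEntry action, which might append to a file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: callers enqueue and return; one background thread does the slow writing.
public sealed class QueueLogger : IDisposable
{
    private readonly BlockingCollection<string> entries = new BlockingCollection<string>();
    private readonly Thread writer;

    public QueueLogger(Action<string> writeEntry)
    {
        writer = new Thread(() =>
        {
            // Blocks this thread (only) until entries arrive or adding is completed.
            foreach (string entry in entries.GetConsumingEnumerable())
                writeEntry(entry);
        });
        writer.IsBackground = true;
        writer.Start();
    }

    // The collection is unbounded, so Add does not wait for the writer.
    public void Write(string message)
    {
        entries.Add(message);
    }

    // Stops accepting entries and waits for the queue to drain.
    public void Dispose()
    {
        entries.CompleteAdding();
        writer.Join();
        entries.Dispose();
    }
}
```

The trade-off is that entries still in the queue are lost if the process dies before they are drained, which may or may not be acceptable for a given log.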

To divert from Logging, we had a case where the team lead decided to use a very basic ORM layer - his friend's project - which was in active development at the time. By the time we found the performance limitations and memory leak issues, it had been abandoned and the code was unavailable to us. Another, more common ORM layer - Entity Framework - has particularly poor performance when you're trying to update a large number of records. It's not really designed for handling 300,000 records at once, since it attempts to "observe" changes and manage the persistence of those changes. Writing one record at a time was slow because of the setup and teardown cost in the Save method. Writing all 300,000 records in bulk was far too unwieldy. We had to save them in batches, in a manner discussed in this StackOverflow post.
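
The batching itself needs nothing from the ORM. Below is a minimal sketch of splitting a large sequence into fixed-size batches; the Batching class and InBatches method are hypothetical names, and this is not the code from the StackOverflow post referred to above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: yields the source in lists of at most batchSize items.
public static class Batching
{
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);   // fresh list, so earlier batches stay intact
            }
        }
        if (batch.Count > 0)
            yield return batch;                   // final, possibly smaller, batch
    }
}
```

Each batch would then be saved on its own, for example by adding its records to a freshly created context and saving once per batch, so that the change tracking never has to hold all 300,000 records at the same time. The right batch size is something to measure rather than guess.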



Of course, there are times that it DOES make sense to use a framework, but like design patterns, you should use a framework because you believe it's going to enhance the development of the code in some way. Not because you haven't used it before, and not because you want to learn it.

Sunday, 18 August 2013

A few notes on "best" practices

"That's how we've always done it!" - that statement is in itself an indicator that you should have a look at what you're doing. You might be doing everything as well as it can be done, but unless you know why the decisions have been made, you don't know if those reasons were - or are still - valid.

Here are five practices I believe are flawed, why I think so, and what I propose instead.

1. "unsafe" casts - (MyClass)obj vs "safe" casts - obj as MyClass


The names "safe" and "unsafe" are misleading. A "safe" cast simply does not throw an exception when the cast fails. In fact it is often preferable to adopt a "fail fast" style of coding, wherein exceptions are thrown as soon as something unexpected occurs, halting the program and preventing data inconsistencies or unexpected logic flows. Using a so-called "safe" cast will simply return a null reference - one possible source of our old friend "Object reference not set to an instance of an object." I do everything I can to avoid this exception. It's unhelpful and unclear. Using an "unsafe" cast, you'll get an explicit InvalidCastException which will tell you exactly what types the code was dealing with at the time, meaning you stand a far greater chance of nailing down the problem faster.
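
A small sketch of the difference, using string as the target type. The CastDemo class and its method names are made up for illustration.

```csharp
using System;

public static class CastDemo
{
    public static int LengthViaAs(object obj)
    {
        string text = obj as string;   // quietly becomes null if obj is not a string
        return text.Length;            // NullReferenceException here, away from the real mistake
    }

    public static int LengthViaCast(object obj)
    {
        string text = (string)obj;     // InvalidCastException here if obj is not a string
        return text.Length;
    }
}
```

Passing a boxed int to LengthViaAs fails with a NullReferenceException on the second line of the method, whereas LengthViaCast fails with an InvalidCastException at the cast itself.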

Some people also cite performance as a compelling reason to use safe casts, but I've run some performance tests in the past and found that there's not enough in it to make it an argument worth considering. You can roll your own test harness to prove the point.

If you're going to change from "safe" casts, you can quite easily check for type compatibility before your cast with the is keyword as follows:
if (!(obj is MyClass))
    throw new Exception(string.Format(
        "Object of type {0} was supplied where {1} was expected.",
        // "is" is also false for null, so guard against calling GetType() on null
        obj == null ? "null" : obj.GetType().Name, typeof(MyClass).Name));
However, it's worth noting that this doesn't give a great deal more than the InvalidCastException would, unless you need to fail "gracefully."

2. if (obj == null) vs if (null == obj)


"It's a C++ thing." OK, but if someone wrote some weird code and said "It's a Pascal thing" or "It's a Fortran thing," you'd tell them to get on their bike. This is C#. It's a different language. It behaves differently.

So why do people do it at all? Well, back in C++ pointers and numbers convert implicitly to a boolean: zero or null is considered false and anything non-zero (such as a valid pointer to an object) is true. That means that if you were clumsy enough to use the assignment operator (=) in your if statement instead of the equality operator (==) you would set your object reference to null *and* skip the "true" half of the conditional statement, which is meant to execute when that reference is null. So some programmers decided that instead of being more careful, they'd just write less readable code that would safeguard them from their own clumsiness - since you can't assign to null, any mistyped assignment operator will cause a build error.

In the world of C#, the result of an assignment is the value assigned, which would be null in the unfortunate case of using the assignment operator. But in C#, null cannot be cast to a boolean, which would cause the condition to be uncompilable. So using the assignment operator will cause a compile error even when the arguments are in Left-to-Right order. There is no reason to write unreadable code.
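
As a quick illustration (the NullChecks class is a made-up name), the mistyped version is shown commented out because it does not compile:

```csharp
public static class NullChecks
{
    public static bool IsMissing(object obj)
    {
        // if (obj = null) { return true; }
        // The line above is rejected by the C# compiler: the assignment
        // has type object, and object does not implicitly convert to bool.

        return obj == null;   // reads left to right, and a typo cannot slip through
    }
}
```

One caveat worth knowing: when the variable is itself a bool, something like if (flag = true) does compile, although the compiler warns about it. That case has nothing to do with null checks, and writing the constant first is still not needed for them.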

3. private object fieldName vs private object _fieldName


Private class fields are often given an underscore prefix. The premise is that this enables you to tell the difference between a method's parameters and variables and those owned by the class. I've heard others stating that it also allows you to use the same name inside a method. First off, having two things with the same name probably isn't a good idea anyway. If you've got a field called, for instance, username and then you pass a parameter with the same name into a method, the question is raised over which, between the field and the parameter, contains the username that's intended in that context. Is the class even being used in the right way? Classes having state should be immutable - that is to say, they should be instantiated with their state applied through a constructor - or with the information to build that state - without it being manipulated by code outside of that instance. If it's intended to handle different state information, the information should be passed into its entry method(s) and then cascaded through the call path. Imagine two threads hitting a method that takes a username and then stores it in a field called _username before then calling a 2nd method which reads the _username field. Thread 1 enters, and sets the field. Thread 2 enters, and sets the field. Thread 1 then moves on to the next method call, and reads the _username field which is now the wrong user. The two options are to create a new instance for each username, or enable one instance to handle multiple usernames by removing the field completely.
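
The second option, removing the field and cascading the value through the call path, might look like the sketch below. GreetingService and its methods are hypothetical names used only for illustration.

```csharp
using System;

public sealed class GreetingService
{
    // There is deliberately no username field. The value travels with each call,
    // so two threads calling Greet at once cannot overwrite each other's state.
    public string Greet(string username)
    {
        if (username == null) throw new ArgumentNullException("username");
        return Format(Normalise(username));
    }

    private static string Normalise(string username)
    {
        return username.Trim();
    }

    private static string Format(string username)
    {
        return "Hello, " + username;
    }
}
```

With no shared field there is also nothing for a parameter name to clash with, so the question of a prefix never comes up.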

When refactoring, the first thing I do is remove any field prefixes so I can find any cases where this happens by causing build errors to be raised where there's a name clash. Having those underscores hides - and enables - potentially badly behaved code.

4. catch { throw; } vs catch (Exception ex) { throw ex; }


Throwing the caught exception (throw ex;) will corrupt the call stack, since the stack trace is reset so that the throw statement becomes its starting point, whereas rethrowing the exception (a simple throw statement with no parameter) will preserve it. Moving on from this, if all you're doing is rethrowing the exception then there is absolutely no reason to have a try block unless you have a finally block too.

In the worst case, developers mangle perfectly usable exceptions with cryptic messages and streams of acronyms in an attempt to describe the already accessible call stack (I know of at least one such case in proximity to my current work focus, but I can't touch it,) and in the best case you'll just know the exception was thrown from a catch block and otherwise occurred somewhere in the try block.

Bottom line: Unless you need to do some logging, or expect a specific exception, or have a finally block that must be executed, just don't have a try ... catch block at all. If you do need to do any of those things and are going to rethrow the exception, rethrow it properly.
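
The difference is easy to see by comparing the stack traces. In this sketch (RethrowDemo and its members are made-up names) the catch blocks exist purely to demonstrate the two statements; as said above, a catch that only rethrows has no place in real code. The NoInlining attribute is there so that the failing method reliably appears as its own stack frame.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void FailDeepInside()
    {
        throw new InvalidOperationException("Something went wrong.");
    }

    public static void Preserving()
    {
        try { FailDeepInside(); }
        catch { throw; }                        // rethrow: original stack trace is kept
    }

    public static void Resetting()
    {
        try { FailDeepInside(); }
        catch (Exception ex) { throw ex; }      // throw ex: stack trace restarts here
    }

    // Runs the action and returns the stack trace of whatever it throws.
    public static string StackTraceOf(Action action)
    {
        try { action(); return string.Empty; }
        catch (Exception ex) { return ex.StackTrace; }
    }
}
```

StackTraceOf(RethrowDemo.Preserving) still mentions FailDeepInside, while StackTraceOf(RethrowDemo.Resetting) starts at Resetting and gives no clue where the problem really came from.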

5. Near-first-use declaration vs Early declaration


By "Early declaration" I'm talking about the idea that you should declare all your variables at the top of a method. What a load of nonsense.

For a long time I've heard it suggested that you should declare variables at the top of the method so you know what variables the method is dealing with. This is usually talked about by the same kinds of people who still think strings should be prefixed with "str," or method parameters should have "p" in front of them - which is obviously all complete insanity. Your method's name should describe what it does. How it does it is irrelevant. Declaring variables early may, at one time, have aided in debugging because one could see how much memory was being allocated up front. But come on, we have heaps of RAM to play with now. We don't need to have tactics for mitigating memory usage in the vast majority of instances.

So why do I dislike Early declaration so much? Surely it doesn't matter where they get declared?

Firstly, declaring the variables early tells you nothing. You can't tell, at a glance, what the method is doing with each of those variables - or that any of the variables is going to be used for what it's named as. You can check on the sanity of the method in a code review, or as part of a refactoring effort. What's important is that the method has an appropriate and descriptive name.

Secondly, if you declare your variable too early, you risk attempting to use the variable before it's been assigned. It's easily done - you've got some complex logic that's difficult to split into separate methods, and you need to use a specific value. There's a variable with that name, so you use it. More likely, someone else is debugging the code and has found that they need that variable to do something, so they try to use it. Then it turns out the variable hadn't been assigned and you've got more work fiddling around to see if you could do what you expected, or finding another way to achieve the same result.

Thirdly is the issue of Variable Re-use. That is, reassigning a variable. This is when its meaning changes half way through the execution of the code. At any given time, there is ambiguity about the value of the variable. Is it null? Is it the first value to be assigned? Or is it something else? The same applies to loops - and I've seen this happen even recently - where the variable is declared before the loop, reassigned n times inside the loop, and then used by an innocent developer later in the code. It will only contain the last value assigned.

So, how does declaring closer to the usage help?

Well, if the variable is declared in scope - that is, between a set of curly braces - it will only be considered to exist within that scope. It becomes far more difficult to 'misuse' the variable or inspect it prematurely, and that's better all round. You're more likely to see when it's completely unused, too. Even if it doesn't go in a set of braces, moving the declaration nearer to the assignment (or combining both actions into one call) will still prevent it being used prematurely. Tools like ReSharper will make it easier to move variables in this way, and will warn you about unused variables.
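
A small sketch of the loop case, with hypothetical names (OrderTotals, SumWithTax):

```csharp
using System.Collections.Generic;

public static class OrderTotals
{
    public static decimal SumWithTax(IEnumerable<decimal> netPrices, decimal taxRate)
    {
        decimal total = 0m;   // genuinely needed across iterations, so it lives outside the loop

        foreach (decimal net in netPrices)
        {
            // Declared and assigned in one step, inside the braces where it is used.
            decimal gross = net * (1 + taxRate);
            total += gross;
        }

        // 'gross' does not exist here, so nobody can pick up a stale "last value" by mistake.
        return total;
    }
}
```

Had gross been declared at the top of the method, any code after the loop could have read it and silently received whatever the final iteration left behind.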

Saturday, 10 August 2013

The Singleton Pattern in C#, its pitfalls and a Solution

The Singleton Pattern, for those who don't know, is a pattern that enforces single instantiation of a class where only one instance is wanted. Like all patterns, one should be aware of the reasons for choosing to use a Singleton over a Static class. Sometimes it's said that the Singleton pattern is overused. I would contend that it is overused by those who have not given proper thought to whether they should be using a Singleton. On the other hand, if one has used a Singleton for one part of an application, it may make it easier to follow if other classes in that application behave in the same way. As per my first post on this blog, I code as part of a team. Code should be readable to not only me, but to anyone who comes to the code after me. This is a concept that seems to be forgotten by many. It's not your code. It's the team's code. If you've decided to do one part of the code one way, try to maintain that theme throughout.

On to the matter at hand...

Some would say that the Singleton Pattern is constructed thusly:

    public class MyClass
    {
        private static MyClass instance;

        public static MyClass Current
        {
            get
            {
                return instance ?? (instance = new MyClass());
            }
        }
    }

This is all well and good, but we'll have to rewrite this exact same code every single time we want a class to be a Singleton. There may also be variations: some might call the property "Instance" or "CurrentInstance" or "Singleton". Some might initialise the instance statically, obviating one of the points of using the Singleton pattern.

With this in mind, my first stab at solving the problem of repeated code (and inconsistent implementation) was to create a base class using Generics:

    public class MyClass : Singleton<MyClass>
    {
    }

    public abstract class Singleton<T> where T : class, new()
    {
        private static T instance;
          
        public static T Current
        {
            get
            {
                return instance ?? (instance = new T());
            }
        }
    }

There are a couple of issues with this approach, however.

Firstly, while MyClass can implement an interface, it is immediately prevented from being a subclass. This might be acceptable if you want to enforce a rule that all Singletons have no Inheritance, but then you should also make all Singletons sealed and asking everyone to implicitly be aware of this could be a tall order.

Secondly, and perhaps more importantly, there's this messy scenario to consider:

    public class YourClass : Singleton<MyClass>
    {
    }

"You'd never do that!" - OK, you might never do that, but someone less experienced might. Someone who's trying to refactor your class without actually deleting it might copy your code and change the class name without changing the Generic. This little gem could cause some hilarity down the line. You can't put type checking into Singleton, because Current is static and consequently, GetType() is unavailable.

I thought about this problem a little, and it occurred to me that since we have Generics, and Activator.CreateInstance, there's no reason not to have a generic Singleton Factory. That is to say we have a static class whose job it is to maintain a keyed list of instances of types. The key is the Type, so it's impossible to have two items in the list being instances of the same Type.

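    using System;
    using System.Collections.Generic;

    public static class SingletonFactory
    {
        // One instance per Type; the Type is the key, so duplicates are impossible.
        private static readonly Dictionary<Type, object> instanceDictionary
                                = new Dictionary<Type, object>();

        public static T Get<T>() where T : class, new()
        {
            return (T) Get(typeof (T));
        }

        public static object Get(Type t)
        {
            // The lock makes the check-then-create step atomic across threads.
            lock (instanceDictionary)
            {
                object instance;
                if (!instanceDictionary.TryGetValue(t, out instance))
                {
                    instance = Activator.CreateInstance(t);
                    instanceDictionary.Add(t, instance);
                }
                return instance;
            }
        }
    }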

This is a simple example. I've got a more advanced version that checks if the type is a class and whether it has a default constructor (public or otherwise) and will then invoke a private constructor if necessary. This enables me to protect classes intended to be used as Singletons from being accidentally instantiated by giving them private default constructors. Of course, others may still favour the Singleton base class - under which circumstances, you might imagine it possible to create an instance of the Singleton with the Factory and another instance from the Base Class. We can deal with that too:

    public abstract class Singleton<T> where T : class, new()
    {
        public static T Current
        {
            get { return SingletonFactory.Get<T>(); }
        }
    }

Here, the Singleton Base Class uses the SingletonFactory as its source of the instance of T instead of holding its own private reference.
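
For the more advanced factory mentioned earlier, the one that checks the type and can invoke a private constructor, a rough sketch might look like this. It is a guess at the shape of such a factory rather than the actual code: the PrivateCtorSingletonFactory and ExampleSettings names are hypothetical, and the new() constraint has been dropped because a class with a private constructor cannot satisfy it. Activator.CreateInstance(Type, bool) with true as the second argument is what allows a non-public parameterless constructor to be used.

```csharp
using System;
using System.Collections.Generic;

public static class PrivateCtorSingletonFactory
{
    private static readonly Dictionary<Type, object> instances = new Dictionary<Type, object>();

    public static T Get<T>() where T : class
    {
        return (T)Get(typeof(T));
    }

    public static object Get(Type t)
    {
        if (t == null) throw new ArgumentNullException("t");
        if (!t.IsClass || t.IsAbstract)
            throw new ArgumentException(t.Name + " is not a concrete class.", "t");

        lock (instances)
        {
            object instance;
            if (!instances.TryGetValue(t, out instance))
            {
                // true = also consider non-public parameterless constructors.
                instance = Activator.CreateInstance(t, true);
                instances.Add(t, instance);
            }
            return instance;
        }
    }
}

// The private constructor stops anyone writing "new ExampleSettings()" by accident.
public sealed class ExampleSettings
{
    private ExampleSettings() { }
}
```

A type with no parameterless constructor at all will still fail at CreateInstance, which is arguably the fail-fast behaviour you'd want.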

After creating the Singleton Factory, I thought to myself "This can't be an uncommon problem." All the same, I hadn't really heard anyone complaining about problems with Singletons. I had a quick Google, and it turns out the Singleton Factory has been around for a while in the land of Java, but it's seen less frequently in C#.

Who are you, and what is this?

Hello,

My name is Hugh. I'm a software engineer with a successful 13 year career behind me. I write fast, clean, efficient code, and I'm passionate about it. I code mainly in C# at the moment. I know you can write faster code in other languages, but I do C# day in, day out, for a corporate client who run on the Microsoft stack so it's in my best interest to keep doing this, rather than writing machine code or programming ASICs.

On this blog I intend to post stuff that I've done or that I'm tinkering with. There will be code. It will be C#. There will be splurges of ideas. There may be some analysis. When I post code I will endeavour to discuss the design decisions, and will try to field any comments that get made. I code as part of a team. I try to write code that other people can understand, and I hope for the same in return.

This blog will be technical, about writing actual code and overcoming issues that might not at first be evident. Another blog worth checking out is Stephen's Coding in the Trenches where he discusses design concepts at a higher level as well as reviewing development tools and blabbering on about training. There may be some overlap from time to time. Although he codes in Objective-C, Daren also makes some good industry-wide observations about TDD and Agile methodologies. He's also planning a video series.

I can't make you a website, but Mark at Radical Geek Web Solutions can if you pay him. He'll also give you access to a comprehensive project management portal.

I could fix your PC, but you wouldn't ask the designer of a hydroelectric dam to unblock your drain. If you can't do it yourself, take it to one of the many capable independent computer repair shops.