Why C# is Not My Favorite Programming Language

by Fred Mameri


There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code.

— Flon’s Law

Abstract

This post is an attempt to recreate a similarly titled paper I published internally at Microsoft many years ago. The title is a reference (and homage) to Brian Kernighan’s 1981 similarly titled paper about Pascal.

It is my opinion that C# is not the first language I would choose for a project, and through this post I will lay out my reasons for that. This is not an attempt to compare C# to any one language, but rather to discuss C# for what it is. I wrote C# code professionally for a few years, and much of this post will be based on my own personal experience.

Overview

I started writing C# code many years after I had been writing C, C++, and Haskell. The first project I undertook in C# was back in 2003, when my team and I tried to write a database engine in C#. I went into the project fairly optimistically, with no preconceived notions of C#. That turned out to be the first of many times I wished I were not using C#.

I have divided the rest of this post into the following categories:

  • Types
  • Resource management
  • Other language problems
  • .NET problems

For each of these, I will present my observations as to why some of the choices in the language are not ideal.

C# may have pushed Java to be a better language. It may have enabled programmers, otherwise not trained or qualified enough to develop C++ programs, to write code. It deserves credit for both these things; but it falls short when it comes to developing complex systems.

Types

No type synonyms

One of the advantages (and purposes) of using types in the first place is to abstract away the representation of the underlying data, even if it’s trivial. Languages like C++ and Haskell provide a mechanism for creating new types that are synonymous with an existing type (typedef in C++, or type in Haskell).

Take the following code:

class Game {
public:
    typedef unsigned int Score;

    Score getScore() const;
};

Experienced developers use this construct extensively in order to increase readability, maintainability, and portability.

There is very simply no way to achieve the same effect in C#. To be fair, there is the using alias directive, but the type synonym is not exported outside of the current .cs file. The outside world will see the original type.
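As a rough sketch of what the synonym buys me (the addPoints and bonusFor helpers are made up for illustration), the alias is exactly the underlying type, so it costs nothing, yet callers only ever spell Game::Score:

```cpp
#include <cassert>
#include <type_traits>

class Game {
public:
    typedef unsigned int Score;   // the synonym is part of Game's public interface

    Score getScore() const { return score_; }
    void addPoints(Score points) { score_ += points; }   // hypothetical helper

private:
    Score score_ = 0;
};

// A synonym introduces no new type: it is exactly unsigned int.
static_assert(std::is_same<Game::Score, unsigned int>::value,
              "Score is only another name for unsigned int");

// Hypothetical caller, possibly in another file: it never mentions unsigned int.
inline Game::Score bonusFor(const Game &game) {
    return game.getScore() / 10;
}

inline void example() {
    Game game;
    game.addPoints(250);
    assert(bonusFor(game) == 25u);
}
```

If Score ever needs to become a wider type, only the typedef changes; bonusFor and every other caller keep compiling unchanged.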

Heavy types

A common feature found in other languages is to redefine an underlying type so that it is not interchangeable with the new type (non-isomorphic identities).

For example, in C++ you could do something like this:

struct Name {
    string value;

public:
    // const reference, so that a temporary such as "Fred" can bind to it
    explicit Name(const string &value) : value(value) {}
};

void print(Name n);

print(Name("Fred")); // ok
print("Fred"); // error

The beauty here is that Name (which is a compile-time concept) has no overhead over string. The size and performance of both types are identical – the only difference is the compile-time check I get from it.

Sure, one could do the same in C#. The problem is that the newly defined type is not free. There is a non-trivial amount of overhead associated with defining a type in .NET. From the compiler, to the type metadata in the assembly, to the JIT compiler, to oftentimes the final machine code itself, defining new types is expensive.
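To make the zero-overhead claim on the C++ side concrete, here is a minimal sketch (the greeting function is hypothetical). On the mainstream compilers I know of, the wrapper is exactly as large as the string it wraps, and the static_assert breaks the build if that ever stops being true:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct Name {
    std::string value;
    explicit Name(std::string v) : value(std::move(v)) {}
};

// The wrapper adds no storage on top of the wrapped string.
static_assert(sizeof(Name) == sizeof(std::string), "Name must not add overhead");

// The two types are not interchangeable: a string never silently becomes a Name.
static_assert(!std::is_convertible<std::string, Name>::value,
              "implicit conversion from string must be rejected");

// Hypothetical function that only accepts a Name.
inline std::string greeting(const Name &n) {
    return "Hello, " + n.value;
}

inline void example() {
    assert(greeting(Name("Fred")) == "Hello, Fred");
    // greeting("Fred");  // would not compile: no implicit conversion to Name
}
```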

Enums

Enums are a great way to make code more legible. This is another feature in programming languages that experienced developers will use extensively to make code more readable and maintainable. And they come with no overhead! They are replaced by the value at compile time.

enum class CacheMode : uint8_t {
    NO_CACHE,
    READ_THROUGH,
    CACHE_ONLY,
};

void read(Cache &cache, CacheMode mode);

read(cache, CacheMode::READ_THROUGH); // very easy to read

C# does have enums, but they are expensive, as they carry metadata (they are not a free compile-time check on top of an integral type). When writing C++ code, I use enums liberally, sometimes even replacing booleans for specific operations:

enum class ShouldFlush : bool {
    NO = false,
    YES = true,
};

void write(Data &data, ShouldFlush flush) {
    if (ShouldFlush::YES == flush) { // still no overhead
    }
}

write(data, ShouldFlush::NO); // easy to read!
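A quick way to check the "no overhead" claim is to ask the compiler. This sketch repeats the two enums from above; the write function here is a simplified, hypothetical stand-in that just reports how many bytes would be flushed:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class CacheMode : std::uint8_t { NO_CACHE, READ_THROUGH, CACHE_ONLY };
enum class ShouldFlush : bool { NO = false, YES = true };

// Each enum occupies exactly the storage of its underlying type.
static_assert(sizeof(CacheMode) == sizeof(std::uint8_t), "CacheMode is one byte");
static_assert(sizeof(ShouldFlush) == sizeof(bool), "ShouldFlush is as small as bool");

// A scoped enum does not decay to its underlying type, so a stray
// true/false cannot be passed where a ShouldFlush is expected.
static_assert(!std::is_convertible<ShouldFlush, bool>::value,
              "no implicit conversion to bool");
static_assert(!std::is_convertible<bool, ShouldFlush>::value,
              "no implicit conversion from bool");

// Hypothetical stand-in: returns the number of bytes flushed.
inline int write(int pendingBytes, ShouldFlush flush) {
    return ShouldFlush::YES == flush ? pendingBytes : 0;
}

inline void example() {
    assert(write(42, ShouldFlush::YES) == 42);
    assert(write(42, ShouldFlush::NO) == 0);
}
```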

As a side note, I happen to know that Anders was against including enums in the language, advocating instead that folks should use public static const fields in a static class. He was obviously finally convinced otherwise by the team.

Structs and classes

In C#, the difference between a struct and a class is that structs are value types, whereas classes are reference types. What seems bizarre to me is that in most languages the differentiation is done at instantiation and parameter passing time, not at declaration time.

There are legitimate reasons why value types are used, for example, for performance reasons or as a view of a part of a data buffer.

C# imposes severe artificial restrictions on structs. For example, they cannot inherit from other structs, as can C++ types:

struct Character {
    Level  level;
    Points health;
    Money  money;
};

struct Wizard : Character {
    Points magic;
};

Wizard wizards[10];

Now, you could have an array of references in C#, but that is not equivalent. Arrays of reference types have no memory locality, and are a non-starter for applications that require high performance (such as graphics-intensive applications, or simulations).
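Here is a small sketch of what the C++ version gives me. Level, Points, and Money are not defined in the snippet above, so I am assuming they are plain 32-bit integers; totalMagic is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the three field types are simple 32-bit integers.
typedef std::uint32_t Level;
typedef std::uint32_t Points;
typedef std::uint32_t Money;

struct Character {
    Level  level;
    Points health;
    Money  money;
};

struct Wizard : Character {
    Points magic;
};

// Inheritance adds the new field and nothing else: no object header, no vtable.
static_assert(sizeof(Wizard) == sizeof(Character) + sizeof(Points),
              "Wizard is just Character plus one field");

// Hypothetical helper: walks ten wizards stored back to back in memory.
inline Points totalMagic(const Wizard (&party)[10]) {
    Points total = 0;
    for (const Wizard &w : party) total += w.magic;
    return total;
}

inline void example() {
    Wizard wizards[10] = {};                     // one contiguous, zeroed block
    wizards[3].magic = 7;
    wizards[9].magic = 5;
    const Character &asCharacter = wizards[3];   // is-a still works
    assert(asCharacter.health == 0);
    assert(totalMagic(wizards) == 12);
}
```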

The other argument that could be made is that inheritance is not all that important, since you could use containment (has-a) instead of inheritance (is-a). Well, that is an argument in favor of my point against the language forcing classes for everything.

No bit fields

C# does not support bit fields. To be fair, C# supports the Flags attribute on enumerations, which achieves the same effect in a much more error-prone and verbose way.

For example, in C++ you could do something like this:

struct Status {
    bool read_only : 1;
    bool deletable : 1;
    bool versioned : 1;
};

// Break the build if Status is not 1-byte long
static_assert(sizeof(Status) == 1, "Size of Status too large");

// somewhere later
if (!table.status.deletable) {
   ...

This representation is compact, efficient, and elegant. It’s also easy to read and maintain.

This particular example could be replicated in C# without bit fields. It would look something like:

[Flags]
public enum Status 
{
    None      = 0,
    ReadOnly  = 1,
    Deletable = 2,
    Versioned = 4
}

// somewhere later
if ((table.Status & Status.Deletable) == 0) {
   ...

The above example is obviously much more error-prone than its C++ counterpart, since the developer is required to do all the heavy lifting (also, don’t forget to zero the instance of Status upon initialization).
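To show how much bookkeeping the compiler takes off my hands, this sketch puts the bit-field version next to a hand-rolled mask version, both in C++. The masks namespace and its two helpers are hypothetical names, standing in for what the flags-enum style forces you to write yourself:

```cpp
#include <cassert>
#include <cstdint>

// Bit-field version: the compiler does the shifting and masking.
struct Status {
    bool read_only : 1;
    bool deletable : 1;
    bool versioned : 1;
};
static_assert(sizeof(Status) == 1, "Size of Status too large");

// Hand-rolled version: every read and write spells out the masks.
namespace masks {
    constexpr std::uint8_t READ_ONLY = 1u << 0;
    constexpr std::uint8_t DELETABLE = 1u << 1;
    constexpr std::uint8_t VERSIONED = 1u << 2;
}

inline bool isDeletable(std::uint8_t status) {
    return (status & masks::DELETABLE) != 0;
}

inline std::uint8_t withDeletable(std::uint8_t status, bool on) {
    return on ? static_cast<std::uint8_t>(status | masks::DELETABLE)
              : static_cast<std::uint8_t>(status & ~masks::DELETABLE);
}

inline void example() {
    Status s = {};            // all flags start cleared
    s.deletable = true;       // reads like an ordinary field
    assert(s.deletable && !s.read_only && !s.versioned);

    std::uint8_t m = 0;
    m = withDeletable(m, true);
    assert(isDeletable(m) && (m & masks::READ_ONLY) == 0);
}
```

Both versions fit in one byte; the difference is who has to get the masks right.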

Some C# supporters criticize the use of bit fields as a relic from a bygone era when memory was expensive. This could not be further from the truth: first, consuming less memory and having better performance never hurts (especially if the language makes it easy and elegant to do so). Secondly, bit fields are extremely useful when dealing with low-level systems. A very good application is for efficiently reading and writing binary files (and hey, some language needs to properly support writing systems such as the .NET CLR, no?).

Whereas the example above was somewhat easy to emulate (albeit more cumbersome and error-prone), more advanced use cases for bit fields become increasingly tricky to emulate. Take the following example:

namespace Zip {
    typedef uint32_t Signature;
    typedef uint16_t Version;

    enum class CompressionOption : uint8_t {
        // options
    };
    enum class CompressionMethod : uint16_t {
        NO_COMPRESSION    = 0,
        SHRUNK            = 1,
        REDUCED_FACTOR_1  = 2,
        // ...
    };

    struct Header {
        Signature             signature;
        Version               version;
        struct {
            bool              encrypted         : 1;
            CompressionOption compression       : 2;
            DataDescriptor    descriptor        : 1;
            bool              enhancedDeflation : 1;
            //
            // other bit fields
            //
        }                     flags;
        CompressionMethod     method;
    };

    static_assert(sizeof(Header) == 30, "Wrong size of ZIP header");
}

In the above example, there are no hacks, no messy annotations. It’s just code that’s semantic, clean, easy to read, and self-documenting. Reading and writing this type is also very fast: it can be done as a binary block (and should endianness correction be required for a platform, it can be done extremely efficiently after being read to main memory).

Poor support for unions

I’ll be the first to admit: unions can be abused, and if misused, can cause serious trouble. At the same time, if used correctly, they are an extremely powerful feature that helps write clean and semantic code.

I have used unions extensively when writing image manipulation software. You can then define the data types and operations very clearly and elegantly, as follows:

union RgbaColor {
    typedef uint8_t Channel;

    uint32_t    rgbaValue;
    uint32_t    rgbValue : 24;
    Channel     rgb[3];
    Channel     rgba[4];
    struct {
        Channel r;
        Channel g;
        Channel b;
        Channel a;
    }           channels;

    RgbaColor(uint32_t rgbaValue) :
        rgbaValue(rgbaValue)
    {
    }
    RgbaColor(uint32_t rgbValue, Channel alpha) :
        rgbValue(rgbValue)
    {
        channels.a = alpha;
    }
    RgbaColor(Channel r, Channel g, Channel b) :
        channels({r, g, b, 0})
    {
    }
    RgbaColor(Channel r, Channel g, Channel b, Channel a) :
        channels({r, g, b, a})
    {
    }
};

// In this example, it is convenient to use a sized array
// of channels, and then a separate alpha channel
RgbaColor toGrayScale(RgbaColor color) {
   RgbaColor::Channel value = average(color.rgb);
   RgbaColor result(value, value, value, color.channels.a);

   return result;
} 

// In this example, it is convenient to see the colors
// as one number
RgbaColor average(RgbaColor c1, RgbaColor c2) {
   RgbaColor result((c1.rgbValue + c2.rgbValue) / 2, 0);

   return result;
}

In C#, there’s no syntactic way for creating unions (which is weird, because C# has way more syntax than it needs to, or should). But if you really want to, you can use the StructLayout attribute to emulate that (the type is actually called StructLayoutAttribute, but don’t even get me started on the compiler schizophrenia of dropping the Attribute part of the name).

The StructLayout attribute, combined with the FieldOffset attribute, somewhat emulates the behavior of unions. The problem is that they work only on the simplest examples. The language was designed to be used as an OO-language with reference types, and has very limited support for anything outside of that.

In C#, the RgbaColor example above would look like this:

[StructLayout(LayoutKind.Explicit, Size=4)]
public unsafe struct RgbaColor {
    [FieldOffset(0)] public uint rgbaValue;

    // - no way to declare rgbValue
    // could add a getter/setter that leave the a channel unchanged

    [FieldOffset(0)] public fixed byte rgb[3];
    [FieldOffset(0)] public fixed byte rgba[4];
    [FieldOffset(0)] public Channels channels;

    public struct Channels {
        public byte r;
        public byte g;
        public byte b;
        public byte a;
    }
}

In order to use “fixed arrays” (which is a C/C++ style array “inlined” in the data structure), the struct has to be marked with unsafe. This is inconvenient for two reasons: first, I need to now enable the /unsafe option to the compiler. But most importantly, unsafe is leaky: any code that reads that array must be marked as unsafe too.

Writing something actually useful with StructLayout and FieldOffset is difficult. And even if you succeed, there is another major obstacle: the compiler reports a lot of definite assignment errors when you try to access the fields in the union.

My suspicion is that since unions are not first-class concepts in the language, the code in the compiler that performs the definite assignment checks is unaware that a certain type is a union, and treats it like any other type. Furthermore, it would need to maintain a map of which memory areas of the object have been initialized, and which fields map to which areas in order for the checks to be accurate.

No const-correctness

The C++ FAQ starts the section on const-correctness with the words “A good thing”. Const-correctness tightens the belt on type safety, by restricting which operations are allowed on types modified with the const keyword. It is a way to ask the compiler to remind you that you do not wish to change a certain value.

In C#, if I try to do this:

const List<int> list = new List<int> { 1, 2, 3 };

I get the following compiler error:

error CS0134: 'list' is of type 'System.Collections.Generic.List<int>'.
   A const field of a reference type other than string can only be
   initialized with null.

The only possible value for a const reference variable is null. In other words, const-correctness does not work in C#. But it gets even worse, as the compiler does not enforce const-correctness at compile time. For example, this code compiles in C# (and then throws during runtime):

const List<int> list = null;
list.Add(1);

For comparison, the same code in C++:

const list<int> list = {1, 2, 3};
list.push_back(4);

The C++ compiler gives me the following error:

error: no matching member function for call to 'push_back'
note: candidate function not viable: 'this' argument has type
   'const std::list<int>', but method is not marked const

Finding errors during compile-time (instead of runtime) is very obviously desirable. But const-correctness also has another very important role: it serves as an important hint to the optimizer. Knowing that a certain value cannot change gives the compiler the opportunity to perform optimizations, such as (from Stack Overflow):

  • incorporating the object’s value directly into the machine instruction opcodes
  • complete elimination of code that can never be reached because the const object is used in a conditional expression that is known at compile time
  • loop unrolling if the const object is controlling the number of iterations of a loop
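The compile-time side of this is easy to sketch. The Inventory class below is hypothetical; the point is only that a function taking a const reference can call the methods marked const and nothing else:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical type with one mutating and two read-only operations.
class Inventory {
public:
    void add(int item) { items_.push_back(item); }            // mutates
    int total() const {                                       // promises not to
        return std::accumulate(items_.begin(), items_.end(), 0);
    }
    std::size_t count() const { return items_.size(); }

private:
    std::vector<int> items_;
};

// The signature alone tells the caller that the inventory is not modified.
inline int report(const Inventory &inv) {
    // inv.add(1);   // would not compile: add() is not marked const
    return inv.total();
}

inline void example() {
    Inventory inv;
    inv.add(1);
    inv.add(2);
    inv.add(3);
    assert(report(inv) == 6);
    assert(inv.count() == 3);   // report() left it untouched
}
```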

No multiple inheritance

If you buy into the OO paradigm, multiple inheritance is a very natural and desirable feature. A Button is both a Rectangle and a ClickTarget, regardless of what C# supports. Not supporting multiple inheritance just means that the code has to be designed in such a way that makes it harder to read and maintain.

Some people criticize multiple inheritance as inherently unsafe (pun intended). This feature is not unsafe or evil. The main problem with it is the ambiguity caused when resolving inherited members. The language must then provide disambiguation mechanisms, which are oftentimes confusing.

Here’s my problem with that argument: it restricts the capabilities in the language and caters to a few developers who would be confused at the expense of those who deem the feature useful. If you are a developer, and you are confused by a certain language feature, then here’s what you do: don’t use that feature.

Also, not supporting multiple inheritance does not mean the problems do not exist: for instance, Java 8 has become susceptible to the diamond problem by introducing default interface methods. The diamond problem is not insurmountable, and there are ways to solve it: For example, C++ follows each inheritance path separately, as well as forcing the programmer to disambiguate the path to the parent. Another related feature is virtual inheritance.
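As a minimal sketch of the Button example (all members are made up for illustration), both bases define a describe method, and the derived class says explicitly which path it means:

```cpp
#include <cassert>
#include <string>

struct Rectangle {
    int width, height;
    Rectangle(int w, int h) : width(w), height(h) {}
    int area() const { return width * height; }
    std::string describe() const { return "rectangle"; }
};

struct ClickTarget {
    int clicks = 0;
    void click() { ++clicks; }
    std::string describe() const { return "click target"; }
};

// A Button is both a Rectangle and a ClickTarget.
struct Button : Rectangle, ClickTarget {
    Button(int w, int h) : Rectangle(w, h) {}

    // Both bases have describe(); the programmer names the path explicitly.
    std::string describe() const {
        return Rectangle::describe() + " + " + ClickTarget::describe();
    }
};

// Code written against either base accepts a Button unchanged.
inline int areaOf(const Rectangle &r) { return r.area(); }
inline void press(ClickTarget &t) { t.click(); }

inline void example() {
    Button ok(4, 3);
    press(ok);
    assert(areaOf(ok) == 12);
    assert(ok.clicks == 1);
    assert(ok.describe() == "rectangle + click target");
}
```

Without the override in Button, a call to describe on a Button would be rejected as ambiguous at compile time rather than silently picking one.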

No templates

C# generics are not nearly as powerful as templates. From Wikipedia:

Although C++ templates, Java generics, and .NET generics are often considered similar, generics only mimic the basic behavior of C++ templates. Some of the advanced template features utilized by libraries such as Boost and STLSoft, and implementations of the STL itself, for template metaprogramming (explicit or partial specialization, default template arguments, template non-type arguments, template template arguments, …) are not available with generics.

Templates are so powerful, and their use is so essential to a modern C++ developer that working in a language without them simply feels limiting.

With default template arguments, non-type arguments, template specialization, and the principle of SFINAE (which is useful for compile-time introspection), there’s absolutely nothing in the C# world that nearly resembles the power of templates.
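A compact sketch of the features just listed, using hypothetical names throughout (FixedBuffer, Factorial, kindOf). It is not meant as a tour of template metaprogramming, only as evidence that each of these is a few lines of ordinary C++:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Non-type argument (N) plus a default type argument (T = int).
template <std::size_t N, typename T = int>
struct FixedBuffer {
    T data[N];
    static constexpr std::size_t capacity() { return N; }
};
static_assert(sizeof(FixedBuffer<8>) == 8 * sizeof(int), "size is known at build time");

// Explicit specialization: the <0> case ends the compile-time recursion.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long value = N * Factorial<N - 1>::value;
};
template <>
struct Factorial<0> {
    static constexpr unsigned long value = 1;
};
static_assert(Factorial<5>::value == 120, "computed by the compiler");

// SFINAE: each overload exists only for the types its condition accepts.
enum class Kind { Integral, FloatingPoint };

template <typename T>
typename std::enable_if<std::is_integral<T>::value, Kind>::type
kindOf(T) { return Kind::Integral; }

template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, Kind>::type
kindOf(T) { return Kind::FloatingPoint; }

inline void example() {
    assert(kindOf(42) == Kind::Integral);
    assert(kindOf(2.5) == Kind::FloatingPoint);
    assert((FixedBuffer<3, char>::capacity() == 3));
}
```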

No way to know the size of an object

It makes me deeply uncomfortable to not have a very good idea of how much memory my application is going to consume.

Let’s start with just something as simple as determining the size of an object in memory. In C++, it’s easy. There’s a built-in operator (sizeof) for that:

sizeof(obj);

In C#, one used to have to use unsafe code to do it:

RuntimeTypeHandle th = obj.GetType().TypeHandle;
unsafe
{
    int size = *(*(int**)&th + 1);
}

Starting in .NET 4, they made it slightly better to query the size of the object. I can now do it without resorting to unsafe code:

RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = Marshal.ReadInt32(th.Value, 4);

Even though this code no longer needs to be in an unsafe block, it is not any safer than the first version. There is, after all, a very implicit assumption about the size and position of that field.

Still, knowing the size of the object does not tell me much. It doesn’t tell me, for example, how the overhead in the object grows in comparison to the data in the object (if at all). It doesn’t tell me the overhead associated with the object in the garbage collector. It doesn’t tell me how the garbage collector itself grows over time.

Take the following example:

class Empty
{
}

Empty obj = new Empty();
RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = Marshal.ReadInt32(th.Value, 4);

In the example above, size is 12. If you replace Empty with Small (a class with one integer), the size is still 12.
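For contrast, this is roughly what the C++ compiler tells me at build time about the same two types. The exact size of an empty class is up to the implementation (it only has to be non-zero), so the first assertion reflects the mainstream compilers I know of rather than a guarantee; arrayBytes is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

class Empty {};

class Small {
public:
    int value;
};

// No object header and no type handle: the size is the data.
static_assert(sizeof(Empty) == 1, "an empty class takes a single byte here");
static_assert(sizeof(Small) == sizeof(int), "Small is exactly one int");

// Hypothetical helper: the footprint of an array grows linearly, with no
// per-element overhead.
template <typename T>
constexpr std::size_t arrayBytes(std::size_t count) {
    return count * sizeof(T);
}
static_assert(arrayBytes<Small>(1000) == 1000 * sizeof(int), "linear growth");

inline void example() {
    Small block[4] = {};
    block[0].value = 1;
    assert(block[0].value == 1);
    assert(sizeof(block) == arrayBytes<Small>(4));
}
```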

Poor support for fixed arrays

In one of the examples above, I used fixed arrays in a union. Support for fixed arrays (which are commonly used in real-world applications) is limited to primitive numerical types in C#.

That means that something like this:

enum class Suit {
   HEARTS,
   DIAMONDS,
   CLUBS,
   SPADES,
};

enum class Value {
   ACE   = 1,
   TWO   = 2,
   THREE = 3,
   // ...
   JACK  = 11,
   QUEEN = 12,
   KING  = 13,
};

struct Card {
   Suit  suit;
   Value value;
};

struct Player {
    Card myHand[5];
};

would be difficult to represent so that sizeof(Player) == sizeof(Card[5]). There are cases where such a property is desirable.
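In C++ that property can simply be asserted. The sketch below trims the Value enum to three entries and adds a hypothetical countSuit helper; everything else follows the declarations above:

```cpp
#include <cassert>

enum class Suit { HEARTS, DIAMONDS, CLUBS, SPADES };
enum class Value { ACE = 1, TWO = 2, THREE = 3 /* remaining values omitted */ };

struct Card {
    Suit  suit;
    Value value;
};

struct Player {
    Card myHand[5];   // stored inline, not as a reference to a separate array
};

// The array is the whole object: no header, no indirection.
static_assert(sizeof(Player) == sizeof(Card[5]), "hand is stored inline");

// Hypothetical helper: counts the cards of one suit in a hand.
inline int countSuit(const Player &player, Suit suit) {
    int count = 0;
    for (const Card &card : player.myHand) {
        if (card.suit == suit) ++count;
    }
    return count;
}

inline void example() {
    Player p = {};   // zero-initialized: every card starts as HEARTS
    p.myHand[0] = Card{Suit::SPADES, Value::ACE};
    p.myHand[1] = Card{Suit::SPADES, Value::TWO};
    p.myHand[2] = Card{Suit::CLUBS, Value::THREE};
    assert(countSuit(p, Suit::SPADES) == 2);
    assert(countSuit(p, Suit::CLUBS) == 1);
    assert(countSuit(p, Suit::HEARTS) == 2);
}
```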

Resource management

Before I start about C#, let me talk about C++. C++ has a brilliant concept called RAII (Resource Acquisition Is Initialization). The way it works is that objects are constructed when they go into scope, and destructed when they go out of scope.

Sure C# is garbage-collected. But it is an incomplete solution to a more comprehensive problem. Resources are more than just memory – resources are files, locks, network sockets, slots in a buffer, memory, and countless others.

In a sense, memory is the least interesting resource. If you are leaking memory, the effects are likely to be far less dire than leaking a semaphore, for instance. (And don’t get me wrong, I’m not saying it’s not bad).

Now this is not about C++, but let me set something straight. Constructors and destructors are absolutely the right way to go when it comes to managing resources. C++-style constructors and destructors, combined with copy constructors, allow for very efficient resource management. They allow for management of memory, as well as any other resource your program might use.

Before you tell me about memory leaks in C++, let me tell you this: Bjarne Stroustrup (the creator of C++) is the first to say that if you are doing manual mallocs and frees in C++, then you are doing it wrong. In C++, you are supposed to be doing RAII-style management of everything, including memory. You can have a full garbage collector if you so wish. But resources are guaranteed by the compiler to always be freed up.

int global_var = 0;
mutex global_var_mutex;

void some_function() {
    lock_guard<mutex> guard(global_var_mutex); // locks the mutex
    global_var++;
}

At the end of some_function, the mutex is always released. It doesn’t matter if the function returned explicitly, implicitly, or threw. The result is always the same: the mutex is released.
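The same pattern works for any resource, not just mutexes. Here is a minimal sketch with a made-up resource (SlotPool and SlotGuard are hypothetical names): the guard takes a slot in its constructor and gives it back in its destructor, including when the function exits by throwing:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: a pool with a fixed number of slots.
class SlotPool {
public:
    explicit SlotPool(int slots) : free_(slots) {}
    int available() const { return free_; }

private:
    friend class SlotGuard;
    int free_;
};

// RAII guard: acquiring is construction, releasing is destruction.
class SlotGuard {
public:
    explicit SlotGuard(SlotPool &pool) : pool_(pool) { --pool_.free_; }
    ~SlotGuard() { ++pool_.free_; }

    SlotGuard(const SlotGuard &) = delete;
    SlotGuard &operator=(const SlotGuard &) = delete;

private:
    SlotPool &pool_;
};

// Returns the number of free slots while one is held, or throws.
inline int useSlot(SlotPool &pool, bool fail) {
    SlotGuard guard(pool);   // takes a slot
    if (fail) throw std::runtime_error("work failed");
    return pool.available();
}                            // slot returned here on every exit path

inline void example() {
    SlotPool pool(2);
    assert(useSlot(pool, false) == 1);
    assert(pool.available() == 2);       // released after a normal return

    bool threw = false;
    try {
        useSlot(pool, true);
    } catch (const std::runtime_error &) {
        threw = true;
    }
    assert(threw);
    assert(pool.available() == 2);       // released after a throw as well
}
```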

When talking about why you shouldn’t use manual mallocs and frees, Bjarne has a quote that I very much enjoy (I heard it from him verbally, and this might be the first transcription of the quote):

It doesn’t matter how good and disciplined you are when dealing with mallocs and frees. It only takes one omission out of the thousands of places you are doing it from to wreak havoc on the system. Let the compiler do it for you.

He is right. This is a good strategy that we should adopt as much as possible.

In order to work around the lack of support for proper deterministic destruction, C# has the notion of finalizers (which they call destructors, but are different from C++ destructors in that they are non-deterministic) and an interface called IDisposable.

If having both Finalize (exposed as a destructor, with the same syntax as a C++ destructor) and IDisposable.Dispose sounds confusing, well, it is.

If your type has unmanaged resources (anything not relating to the garbage collector), then you are expected to implement a destructor (Finalize). If you can’t wait until the garbage collector runs, then you should implement IDisposable.

Placing the burden on the user

Suppose you have a type that contains a private member that implements IDisposable. Because of that, your type should really also implement IDisposable, and your callers must know to call Dispose in a finally block (or wrap your object in a using block).

There are several problems with that. Remember Bjarne’s quote “it only takes one omission to wreak havoc”? Here, the language is placing all the burden to make your object an IDisposable on you. You need to remember to do that (or at the very least, you need to run a tool to tell you to do that). It’s also placing the burden of remembering to use it correctly on your users (which you have no control over).

Leaky abstractions for managing resources

Now suppose you have a public type that does not contain an IDisposable member, but now for the next version of the API you need to add one.

Here’s the problem: your existing users are guaranteed to leak that resource, unless they are willing to go and find every reference to your type and make sure they are invoking Dispose.

Because of that, people have come up with rules such as “public types should always be marked as IDisposable”. Sounds like a lot of work for me. Also, “it only takes one omission”.
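For comparison, here is a sketch of the same API evolution in C++. Handle and Report are hypothetical: Report gains a resource-owning member in its second version, nobody writes a destructor for it, and existing callers such as printReport need no changes to stay leak-free:

```cpp
#include <cassert>

// Hypothetical resource that tracks how many handles are currently open.
struct Handle {
    static int &openCount() {
        static int count = 0;
        return count;
    }
    Handle() { ++openCount(); }
    ~Handle() { --openCount(); }

    Handle(const Handle &) = delete;
    Handle &operator=(const Handle &) = delete;
};

// "Version 2" of a public type: it now owns a Handle. No destructor is
// written by hand; the compiler-generated one releases the member.
class Report {
public:
    int pages() const { return 3; }

private:
    Handle handle_;   // added in version 2
};

// An existing caller, written against version 1 and left untouched.
inline int printReport() {
    Report report;
    return report.pages();
}   // report.handle_ is released here automatically

inline void example() {
    assert(Handle::openCount() == 0);
    assert(printReport() == 3);
    assert(Handle::openCount() == 0);   // nothing leaked
}
```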

Resource management hacks

Obviously all this is very non-ideal. If there’s one resource that you really don’t want to leak, it is locks. They will cause your system to hang or deadlock very quickly.

So instead of providing a sound, general, and universal solution to resource management (such as RAII), C# went ahead and added a lock keyword (for critical sections). And a using keyword (for IDisposables).

So at this point we have Finalize, IDisposable, using, and lock, and we are still not at the same level of resource management that RAII provides.

Other language problems

Scoping of functions

C# claims to be a multi-paradigm language, with support for object orientation. In reality, it is object-mandated, with some support for functional programming. I say that because everything must be contained in a class – even things that do not belong in classes.

C# itself ships with a number of those, such as Math.Floor or GC.Collect. As a matter of fact, both Math and GC are static classes (meaning they cannot be instantiated or inherited from). In other words: they are, in fact, not classes at all – they are namespaces.

If I wanted to implement my custom rounding in a language like C++ (don’t ask why I want to do it, it is sometimes necessary in some domains, like circuit design), that would simply be a few functions on a namespace, like so:

namespace math {
    float scaled_round(const float arg);
    double scaled_round(const double arg);
    long double scaled_round(const long double arg);
}

Furthermore, it’s trivial to augment an existing namespace, or to add new specializations to existing templates (more on that later).

In C#, I’m stuck providing a new class and new implementation of those methods.

namespace Project
{
    public static class MyMath
    {
        public static float ScaledRound(float arg)
        {
            // implementation
        }

        public static double ScaledRound(double arg)
        {
            // implementation
        }

        public static decimal ScaledRound(decimal arg)
        {
            // implementation
        }
    }
}

This approach has several problems. First, it tries to fit everything into a specific model – even when things don’t naturally fit into that model.

Secondly, even if you assume that everything should be object-oriented, this does not fall into the traditional definition of static methods. From Wikipedia:

Static methods are meant to be relevant to all the instances of a class rather than to any specific instance.

And that is very telling. Methods such as floor do not belong to all instances of the Math class. Quite the opposite! They belong to no instances of the Math class (which can’t even have instances!). It goes to show that these functions would be better served off a namespace. The .NET Framework does support functions on a namespace – it is C# as a language that doesn’t.

Thirdly, it highlights one of my pet peeves with C# – there’s too much unnecessary verbiage around my code (more on that later).

Lastly, and perhaps most weirdly, is the fact that the same operations on slightly different types are exposed through entirely different static classes. Take the System.Data.OracleClient.OracleNumber struct, for instance. Compare and contrast:

// on a primitive number
Math.Floor(Math.PI);

// on an OracleNumber
OracleNumber n = OracleNumber.PI;
OracleNumber.Floor(n);

As a side note: in this particular case, I am left wondering why it is that the implementation of the Oracle DB driver didn’t hide the underlying representation of numbers from the user. I will admit I haven’t looked too much into this, but my first instinct tells me that all APIs should have exposed native types to the user.

Now suppose that we are writing the DB layer itself (and thus have to deal with the marshaling of numbers between different systems). In C++, we would have something like this:

class DBFloat {
    // ...
}; 

/*
 * we can define operations such as
 * - math::constants::pi<DBFloat>
 * - DBFloat std::floor(DBFloat)
 */

// The usage is now this:
// Much more natural than C#
floor(pi<float>());
floor(pi<DBFloat>());

Although the example above is in C++, many other languages have similar concepts. For example, Haskell has a very similar concept through the RealFrac type class.
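Here is one way the C++ side could be fleshed out. Everything in it is an assumption made for illustration: DBFloat is modeled as a fixed-point number stored in thousandths, pi is a function template in a math namespace, and floor for DBFloat lives next to the type (found by argument-dependent lookup) rather than inside std, since user code is not supposed to add overloads there:

```cpp
#include <cassert>
#include <cmath>

namespace db {
    // Hypothetical fixed-point number, stored in thousandths.
    class DBFloat {
    public:
        explicit DBFloat(long thousandths) : thousandths_(thousandths) {}
        long thousandths() const { return thousandths_; }

    private:
        long thousandths_;
    };

    // Declared next to the type, so an unqualified floor(x) call finds it.
    inline DBFloat floor(DBFloat x) {
        long whole = x.thousandths() / 1000;          // truncates toward zero
        if (x.thousandths() % 1000 < 0) --whole;      // fix up negative values
        return DBFloat(whole * 1000);
    }
}

namespace math {
    template <typename T> T pi();
    template <> inline float pi<float>() { return 3.14159265f; }
    template <> inline db::DBFloat pi<db::DBFloat>() { return db::DBFloat(3142); }
}

// Generic code: one spelling for primitive and user-defined numbers alike.
template <typename T>
T wholePi() {
    using std::floor;               // covers the built-in types
    return floor(math::pi<T>());    // lookup also finds db::floor for DBFloat
}

inline void example() {
    assert(wholePi<float>() == 3.0f);
    assert(wholePi<db::DBFloat>().thousandths() == 3000);
}
```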

No local static variables

In C++, if I need a constant (or sometimes a global variable, such as a mutex) that only really applies to one function, I can easily define it next to the place it’s used, as in the following example:

void my_function() {
    static mutex m;
    lock_guard<mutex> guard(m);

    // do something
}

void another_function() {
    static const float R5_RESOLUTIONS[] = {1.0, 1.6, 2.5, 4.0, 6.3};

    // R5_RESOLUTIONS is not created for every call to this function
    // it is read-only, compiler-allocated memory in the read-only globals
    // segment
}

In C#, I would have to declare these to be class-wide. The problem with that is that one is unnecessarily expanding the scope of that variable, thus making the code more error-prone, and less readable and maintainable.
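A small sketch of the scoping benefit, with two hypothetical functions. Each piece of state is created once, lives for the rest of the program, and is invisible to every other function in the file:

```cpp
#include <cassert>
#include <cstddef>

// The table is built once and can only be named inside this function.
inline float resolutionAt(std::size_t index) {
    static const float R5_RESOLUTIONS[] = {1.0f, 1.6f, 2.5f, 4.0f, 6.3f};
    static const std::size_t COUNT =
        sizeof(R5_RESOLUTIONS) / sizeof(R5_RESOLUTIONS[0]);

    return R5_RESOLUTIONS[index % COUNT];   // wraps around past the end
}

// The counter is initialized on the first call and keeps its value between
// calls, yet no other code can touch it.
inline int nextTicket() {
    static int last = 0;
    return ++last;
}

inline void example() {
    assert(resolutionAt(1) == 1.6f);
    assert(resolutionAt(5) == 1.0f);   // index 5 wraps to the first entry
    int first = nextTicket();
    int second = nextTicket();
    assert(second == first + 1);
}
```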

Poor optimizations

Let’s be honest here: the C# compiler is not necessarily known for being a highly-optimizing compiler. In a blog post, Eric Lippert discusses the very few cases where the C# compiler optimizes the output, and talks about some of the cases where it purposefully eschews optimizations – for example, in order to preserve certain information in the debugging symbols (in my opinion, this is an argument for a better debugging symbol file format, one which supports temporal references – which the PDB format generated by the C# compiler does not). In his own words:

The /optimize flag does not change a huge amount of our emitting and generation logic. We try to always generate straightforward, verifiable code and then rely upon the jitter to do the heavy lifting of optimizations when it generates the real machine code.

Eric Lippert

By his own admission, the C# compiler relies on the JIT compiler to do most of the optimizations. There are several problems with that:

  • optimizations can be expensive, so ideally all optimizations would happen during build-time, not runtime (and don’t even mention ngen – it’s a joke)
  • just because the JIT can optimize code, it doesn’t mean it always does so
  • there are optimizations that are related to language constructs, and therefore must be done by the compiler, not the linker. Good examples include C’s adoption of the keyword restrict, and C++’s const keyword (discussed above).

Eric’s post continues. He lists a few optimizations performed by the C# compiler, and concludes with:

That’s pretty much it. These are very straightforward optimizations; there’s no inlining of IL, no loop unrolling, no interprocedural analysis whatsoever.

Eric Lippert

And again, they leave that to the JIT compiler. The JIT compiler does a terrible job at a lot of basic optimizations.

David Notario, a developer in the .NET JIT compiler team, discussed JIT optimizations in a blog post:

These are some of the reasons for which we won’t inline a method:
– Valuetypes: We have several limitations regarding value types an inlining. We take the blame here, this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, [excuse omitted for brevity]
– Complicated flowgraph: We don’t inline loops, methods with exception handling regions, etc…

David Notario

So by the .NET JIT compiler team’s own admission, they don’t inline functions with loops, etc.? I really wonder what that etc means.

So there you have it. Both the C# compiler and the JIT compiler lack in their ability to optimize code.

And don’t even get me started on things that cannot be optimized, such as the metadata arrangement in the assembly, which causes the whole file to be read into main memory and processed. No wonder startup times for .NET applications are so slow.

No separate linking stage

Very simply put: building a C# project does not scale. I have been in projects (when I was a developer at Microsoft) where it would take almost 30 minutes to build the full project. And we are not talking about full builds vs. incremental builds – this is exactly the point: incremental builds do not exist for C#.

The C# compiler takes a list of source files, does a two-pass compilation on all of them, and produces a binary. The more files that are passed in, naturally the longer the compilation time.

One way to fix this problem is to generate separate DLLs for small parts of the project. This helps both by keeping the build time of each DLL small and by enabling parallel builds for independent DLLs. The problem with this approach is that now your tax is on the runtime. You have to ship more DLLs (and deal with the GAC problems, DLL loading times, etc.).

Another ill-fated approach would be to use modules (some people call them netmodules, after their filename extension). My team at Microsoft tried that, and it was a nightmare. That did not at all work well.

One contrasting approach adopted by many other languages is to have two separate building stages: compiling and linking. The compiler transforms source code into objects. The linker then combines objects into a final binary. Linking is easier and cheaper than compiling.

When working with large projects, that makes a huge difference. Many modern tools exist that allow teams to store compiled objects in a network cache, so in order to get a working binary, downloading the objects and then linking them is all that is required. And if I make changes to one of the source files, only the affected files need to be re-compiled into objects and then linked together.

Verbiage

When writing C# code, I have this feeling that it has too much syntax around my code. Adding a new source file to a project usually consists of filling in a template containing using clauses, and the namespace and the class declarations. That is such a common pattern that most IDEs will automatically add those for you when you create a new file in the project.

This phenomenon is not exclusive to C#, but C# definitely suffers a lot from it, especially with its properties, and getters and setters. Paul Graham summarized this phenomenon well:

Object-oriented programming generates a lot of what looks like work. Back in the days of fanfold, there was a type of programmer who would only put five or ten lines of code on a page, preceded by twenty lines of elaborately formatted comments. Object-oriented programming is like crack for these people: it lets you incorporate all this scaffolding right into your source code. Something that a Lisp hacker might handle by pushing a symbol onto a list becomes a whole file of classes and methods. So it is a good tool if you want to convince yourself, or someone else, that you are doing a lot of work.

Paul Graham, via Jeff Atwood

That a programming language shapes one’s way of thinking is a real phenomenon. Those who have switched to a new language have probably found themselves asking “How do I do X in this language?”, only to be told something like “You are thinking in a different paradigm. In this language, you don’t have to do X”.

C# verbiage-driven development is also very real. I have seen it first hand. I have seen more people preoccupied with the scaffolding, defining properties and interfaces and pure virtual methods than I care to count.

Eric Lippert (one of the authors of the C# compiler at Microsoft, and a former member of the C# design committee) had the following to say:

What I sometimes see when I interview people and review code is symptoms of a disease I call Object Happiness. Object Happy people feel the need to apply principles of OO design to small, trivial, throwaway projects. They invest lots of unnecessary time making pure virtual abstract base classes — writing programs where IFoos talk to IBars but there is only one implementation of each interface! I suspect that early exposure to OO design principles divorced from any practical context that motivates those principles leads to object happiness. People come away as OO True Believers rather than OO pragmatists.

Eric Lippert, via Jeff Atwood

His was not a criticism of C#, but in my experience there is definitely a lot of Object Happiness in the C# community. I find it hard to make an argument that every single one of those people suffering from Object Happiness is a bad developer. An argument is easier made that there is an underlying force driving them to that – and I believe that force is the language design. I became convinced of that by noticing that this phenomenon is not observed nearly as often among enthusiasts of other languages (such as C++, Haskell, Scala, or even JavaScript!).

There is no partial application

C# LINQ extensions were a welcome functional addition to the language. Using them, however, often involves manually creating lambdas, just because C# doesn’t support partial application.

Suppose there is a function called translate that takes a source language f, a destination language t, and a string s, and returns the translated string. Now suppose I want to translate every string in a list from English to French. In Haskell, I would easily write something like this:

map (translate English French) strings

Notice that the expression (translate English French) is a partial application of translate. It is a new function that takes 1 argument, of type string, and translates it from English to French. Other languages like C++ (through templates!) or JavaScript also have support for partial application.

In C#, however, the burden is on me:

strings.Select(s => Translate(Languages.English, Languages.French, s));

When this has to be done over and over, it becomes quite tedious to write partially applied functions in C#. And they end up reducing readability (instead of increasing it!) due to the unnecessary syntax around the lambda.
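The closest I can get in C# is to hide the lambda behind a helper. The following is only a minimal sketch: the Partial extension method and the PartialApplication class are names made up for illustration, and Translate is a hypothetical stand-in that merely tags its input instead of translating anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Languages { English, French }

public static class PartialApplication
{
    // Hypothetical helper: fixes the first two arguments of a three-argument
    // function and returns a function that still expects the third one.
    public static Func<T3, TResult> Partial<T1, T2, T3, TResult>(
        this Func<T1, T2, T3, TResult> f, T1 a, T2 b)
    {
        return c => f(a, b, c);
    }

    // Hypothetical stand-in for a real translator: it only tags the input.
    public static string Translate(Languages source, Languages target, string s)
    {
        return $"[{source}->{target}] {s}";
    }

    public static void Main()
    {
        var strings = new List<string> { "hello", "world" };

        // The method has to be converted to a delegate before the
        // extension method can be called on it.
        Func<Languages, Languages, string, string> translate = Translate;
        Func<string, string> toFrench =
            translate.Partial(Languages.English, Languages.French);

        foreach (var s in strings.Select(toFrench))
        {
            Console.WriteLine(s);
        }
    }
}
```

Even with such a helper, a separate overload is needed for every combination of arity and number of fixed arguments, and the method must first be assigned to a delegate of the right type. The scaffolding has moved, not disappeared.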

There is no escape

This was also a section in the original paper. It is difficult to override the type mechanism when necessary. This same problem, originally observed in Pascal by Kernighan in 1981, persists in C# today:

There is no way to override the type mechanism when necessary, nothing analogous to the “cast” mechanism in C. This means that it is not possible to write programs like storage allocators or I/O systems.

Brian W. Kernighan

.NET problems

Signed integers for sizes

Even though C# has unsigned integers (unlike Java), the C# community does not seem to understand when to use them. And the example comes from the .NET Framework class library: things like Count on a list produce an int. How a list could possibly contain -3 elements is anyone’s guess, and yet, that’s the type they use.

And this is spread consistently across the class library. ICollection.Count returns an int. The norm is for indexers to take ints. Bizarrely, Object.GetHashCode() also returns an int, which half of the time yields negative hashes. While there’s nothing per se wrong with a hash value being negative, the sign carries no meaning. And that type should be defined as a Hash type, synonymous (but not isomorphic) to uint.

The reason for this weird behavior is that unsigned numbers are not “CLS-compliant”. Still, those are artificial rules. And it doesn’t make the out-of-the-box C# experience any less weird.
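To make the consequence concrete, here is a small sketch of what a caller has to do when a hash is used to pick a bucket. The SignedSizes class, the NegativeKey type, and both bucket functions are invented for this illustration; the key returns a fixed negative hash only so that the result is reproducible.

```csharp
using System;
using System.Collections.Generic;

public static class SignedSizes
{
    // A key whose hash is deliberately negative, to make the effect reproducible.
    public sealed class NegativeKey
    {
        public override int GetHashCode() => -3;
    }

    // Naive bucket selection: in C#, the remainder keeps the sign of the hash.
    public static int NaiveBucket(int hash, int bucketCount) => hash % bucketCount;

    // Reinterpret the hash as unsigned before taking the remainder.
    public static int UnsignedBucket(int hash, int bucketCount) =>
        (int)(unchecked((uint)hash) % (uint)bucketCount);

    public static void Main()
    {
        var list = new List<string> { "a", "b" };
        int count = list.Count;            // declared as int, although it is never negative
        uint size = checked((uint)count);  // the conversion is left to the caller

        int hash = new NegativeKey().GetHashCode();

        Console.WriteLine(size);                      // prints 2
        Console.WriteLine(NaiveBucket(hash, 16));     // prints -3, not a valid index
        Console.WriteLine(UnsignedBucket(hash, 16));  // prints 13
    }
}
```

Nothing in the types stops the naive version from compiling; the cast is something every caller has to remember.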

Exceptions

Exceptions can be a useful feature, but they do not replace expected codepaths for error cases. The .NET Framework class library oftentimes raises exceptions when a status code would have been adequate (and desirable). For example, File.Open can throw a FileNotFoundException. A file not being there is not an exceptional case at all – I would expect any developer trying to open a file to handle that case as something that can – and does – happen.
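What I would have liked is something along the lines of the sketch below. TryOpenRead and the FileOpening class are hypothetical, not part of the class library; they simply wrap File.Open and turn the "not there" cases into a return value. The path is a freshly generated name in the temp directory, assumed not to exist.

```csharp
using System;
using System.IO;

public static class FileOpening
{
    // Hypothetical helper (not part of the class library): reports a missing
    // file through the return value instead of an exception.
    public static bool TryOpenRead(string path, out FileStream stream)
    {
        try
        {
            stream = File.Open(path, FileMode.Open, FileAccess.Read);
            return true;
        }
        catch (FileNotFoundException)
        {
            stream = null;
            return false;
        }
        catch (DirectoryNotFoundException)
        {
            stream = null;
            return false;
        }
    }

    public static void Main()
    {
        // A freshly generated name, assumed not to exist.
        string path = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".txt");

        if (TryOpenRead(path, out FileStream stream))
        {
            using (stream)
            {
                Console.WriteLine("opened, length = " + stream.Length);
            }
        }
        else
        {
            Console.WriteLine("file not found");
        }
    }
}
```

Note that the wrapper still has to catch the exception internally. Checking File.Exists beforehand does not remove the need, since the file can disappear between the check and the open.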

Global Assembly Cache (GAC)

In my experience, more problems have been caused by the GAC than solved by it. Version mismatches are common. Particularly significant is the design decision that a different assembly version cannot be loaded from a file if there’s an assembly with a similar signature in the GAC.

In large systems (and large teams), the GAC tends to cause a lot of problems. Assemblies will oftentimes end up in the GAC as part of setup. Oftentimes (due to bugs during the development cycle, and other reasons) there are some assemblies left behind in the GAC, which makes developing difficult.

The problem happens even when requesting an assembly by filename, even if the full path is provided. In that case, the file is located and opened, and the assembly signature is then extracted. Next, the GAC is probed for an assembly of similar signature, and if one is found, then the file which had been requested by filename is closed and the GAC version is used instead.

The GAC should have been designed to be used as a true cache. Once a file is loaded, the results of JIT optimizing it should be cached for next time, all completely transparent to the application.

Loading untrusted code (plug-ins)

This is not a flaw in the design of the language, but an annoyance inextricably linked to it by virtue of its runtime. The way .NET loads and manages untrusted DLLs is frustrating to say the least. The problems are deep, systemic, and endemic.

Since this is not a problem with C# per se, I don’t want to spend a lot of time here – but some of the problems include not being able to unload individual DLLs (and having to use AppDomains), the very way AppDomains work, and how data is passed between different AppDomains.

Conclusion

I do acknowledge that a lot of these problems I presented also affect other languages, such as Java. But Java is not my favorite language either!

Through much of this post, I compared C# to C++. This is not to say that C++ is perfect – it has its significant share of problems as well. But in the aspects where I find C# problematic, C++ tended to be a good language to illustrate an alternative.

During my career as a professional software developer, I have written programs in a few languages: C, C++, C#, x86 assembly, JavaScript, and Haskell. I am not including DSLs, such as HTML, CSS, or SQL.

The most frustrating of them, by a mile, was C#. The limitations in the language meant that programs were written to satisfy the compiler, rather than long-term requirements, such as maintainability, or user-requirements, such as performance.

The language seemed to direct the team into adopting coding practices I deemed inefficient, such as a liberal usage of design patterns (abstract type factories abounded in the code, as did bridges, proxies, etc.).

In many of those projects, the code was not what I consider good code: semantically clean, small, efficient, and easy to understand. It was a spaghetti of properties, constructors, and factories. Much of the code seemed to be written just so that we could write the code we wanted to in the first place, except in a more complicated way.

I do like many of C#’s features: I like that the compiler performs assignment checks in if expressions. I like it that I can easily check for overflow. I like it that I don’t have to use header files. The tools (especially the Visual Studio IDE) are nice and well-finished.

On the gloomier side, in my opinion, none of those positives is enough to offset the severe limitations in the language – they were nice, but I derived little benefit from them.

The language might be suitable for the development of very small applications, but falls short for large systems. So unless the project is intended to remain perpetually small, the choice of C# is a trap.

The domain of the applications should be considered. Those small applications should not include build tools. I have learned from experience that small tools executed over and over again during the build can significantly slow down a build pipeline, due to the high startup cost.

I feel like it’s a mistake to use C# for the development of large and complex systems. In that sense, it’s a toy language suitable for beginners and amateurs.
