.NET Core brings many optimizations with regard to performance, both in terms of execution speed and memory allocation. Examples are optimizations in collections and LINQ extension methods, text processing, and networking. There are also new types and concepts, such as Span<T>, that allow doing interesting things. In this article, we will look at how these new concepts can be used.
With the release of .NET Core 2.0, Microsoft has delivered the next major version of the general-purpose, modular, cross-platform and open source platform that was initially released in 2016. .NET Core has been created to have many of the APIs that are available in the current release of the .NET Framework. It was initially created to allow for the next generation of ASP.NET solutions, but now drives and is the basis for many other scenarios, including IoT, cloud and next-generation mobile solutions. In this series, we will explore some of the benefits of .NET Core and how it can benefit not only traditional .NET developers, but all technologists that need to bring robust, performant and economical solutions to market.
This InfoQ article is part of the series ".NET Core". You can subscribe to receive notifications via RSS.
Now that .NET Core is on the streets, Microsoft and the open-source community can iterate more quickly over new features and enhancements in the framework. One of the areas of .NET Core that gets continuous attention is performance: .NET Core brings many optimizations, both in execution speed and in memory allocation.
In this article, we’ll go over some of these optimizations and how the continuous stream – or Span, more on that later – of performance work helps us in our lives as developers.
Before we dive deeper, let’s first look at the main differences between the full .NET Framework (let’s call it .NET for convenience) and .NET Core. To simplify things, let’s assume both frameworks respect .NET Standard — essentially a spec that defines the base class library baseline for all of .NET. That makes both worlds very similar, except for two main differences:
First, .NET is mostly a Windows thing, whereas .NET Core is cross-platform and runs on Windows, Linux, Mac OS X and many more. Second, the release cycle is very different. .NET ships as a full-framework installer that is system-wide and often part of a Windows installation, making the release cycle longer. For .NET Core, there can be multiple .NET Core installations on one system, and there is no long release cycle: most of .NET Core ships in NuGet packages and can easily be released and upgraded.
The big advantage is that the .NET Core world can iterate faster and try out new concepts in the wild, eventually feeding them back into the full .NET Framework as part of a future .NET Standard.
Very often (but not always), new features in .NET Core are driven by C# language design. Since the framework can evolve more rapidly, the language can, too. A prime example of both the faster release cycle and a performance enhancement is System.ValueTuple. C# 7 and VB.NET 15 introduced “value tuples”, which were easy to add to .NET Core thanks to its faster release cycle, were made available to the full .NET Framework as a NuGet package for earlier versions, and only became part of the framework itself in .NET 4.7.
Now let’s have a look at a few of these performance and memory improvements that were made.
One of the advantages of the .NET Core effort is that many things had to be either rebuilt or ported from the full .NET Framework. Having all of the internals in flux for a while, combined with the fast release cycles, provided an opportunity to make performance improvements in code that was almost considered “don’t touch, it just works!” before.
Let’s start with SortedSet<T> and its Min and Max implementations. A SortedSet<T> is a collection of objects that is maintained in sorted order by leveraging a self-balancing tree structure. Before, getting the Min or Max object from that set required traversing the tree down (or up), calling a delegate for every element and setting the return value as the minimum or maximum to the current element, eventually reaching the bottom or top of the tree. Calling that delegate and passing objects around meant there was quite some overhead involved, until one developer saw the tree for what it was and removed the unneeded delegate call, as it provided no value. His own benchmarks show a 30%-50% performance gain.
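To make the idea concrete, here is a simplified sketch (not the actual CoreFX code; the type and method names are hypothetical): in a sorted binary tree, the minimum is simply the leftmost node, so there is no need to visit every element through a delegate.

```csharp
using System;

// A node in a binary search tree; the minimum is the leftmost node.
public class Node
{
    public int Value;
    public Node Left;
    public Node Right;
}

public static class TreeMin
{
    // Before: visit every node via a delegate, tracking the smallest value seen.
    public static int MinWithDelegate(Node root)
    {
        int min = int.MaxValue;
        Visit(root, n => { if (n.Value < min) min = n.Value; });
        return min;
    }

    static void Visit(Node node, Action<Node> action)
    {
        if (node == null) return;
        Visit(node.Left, action);
        action(node);       // one delegate invocation per element
        Visit(node.Right, action);
    }

    // After: walk the left spine directly; no delegate invocations at all.
    public static int MinDirect(Node root)
    {
        Node current = root;
        while (current.Left != null) current = current.Left;
        return current.Value;
    }
}
```

Both methods return the same value; the second one just exploits the tree’s ordering instead of treating it as an opaque sequence.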
Another nice example is found in LINQ, more specifically in the commonly used .ToList() method. Most LINQ methods operate as extension methods on top of IEnumerable<T> to provide querying, sorting and methods like .ToList(). By working off an IEnumerable<T>, we don’t have to care about the implementation of the underlying collection, as long as we can iterate over it.
A downside is that when calling .ToList(), we have no idea of the size of the list to create; we just enumerate all objects in the enumerable, doubling the size of the list we’re about to return whenever its capacity is reached. That’s slightly insane, as it potentially wastes memory (and CPU cycles). So, a change was made to create a list or array with a known size if the underlying IEnumerable<T> is in fact a List<T> or array whose size is known. Benchmarks from the .NET team show a ~4x increase in throughput for these cases.
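The shape of that optimization can be sketched as follows (illustrative only, not the actual CoreFX implementation): when the source implements ICollection<T>, its Count is known up front, so the backing storage can be sized once instead of being doubled repeatedly.

```csharp
using System;
using System.Collections.Generic;

public static class ToListSketch
{
    public static List<T> ToListFast<T>(IEnumerable<T> source)
    {
        // Fast path: a collection with a known Count lets us size the
        // backing array once, avoiding repeated doubling and copying.
        if (source is ICollection<T> collection)
        {
            var result = new List<T>(collection.Count);
            foreach (T item in collection) result.Add(item);
            return result;
        }

        // Slow path: unknown size, so the list grows (doubling its
        // capacity) as items are added.
        var list = new List<T>();
        foreach (T item in source) list.Add(item);
        return list;
    }
}
```

The observable behavior is identical for both paths; only the allocation pattern differs.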
When looking through pull requests in the CoreFX lab repository on GitHub, we can see tons of performance improvements that have been made, both by Microsoft and the community. Since .NET Core is open source, you can contribute performance fixes too. Most of these are just that: fixes to existing classes in .NET. But there is more: .NET Core also introduces several new concepts around performance and memory that go beyond fixing existing classes. Let’s look at those for the remainder of this article.
Imagine we want to return more than one value from a method. Previously, we’d have to resort to out parameters, which are not very pleasant to work with and are not supported in async methods. Another option was to use System.Tuple as a return type, but this allocates an object and has rather unpleasant property names to work with (Item1, Item2, …). A third option was to use specific or anonymous types, but that introduces overhead when writing the code, as we’d need the type to be defined, and it also makes unnecessary allocations if all we need is a value embedded in that object.
Meet tuple return types, backed by System.ValueTuple. Both C# 7 and VB.NET 15 added a language feature to return multiple values from a method. Here’s a before and after:
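The original code samples did not survive extraction; here is a minimal reconstruction with hypothetical method names, returning a minimum and maximum from an array:

```csharp
using System;

public static class TupleSamples
{
    // Before: allocates a System.Tuple<int, int> on the managed heap.
    public static Tuple<int, int> GetMinMaxTuple(int[] values)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return Tuple.Create(min, max);
    }

    // After: returns a tuple backed by System.ValueTuple, a struct,
    // so no heap allocation is made for the container.
    public static (int Min, int Max) GetMinMaxValueTuple(int[] values)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }
}
```

Note that the second version also lets us give the elements meaningful names (Min and Max) instead of Item1 and Item2.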
In the first case, we are allocating a Tuple. While in this example the effect will be negligible, the allocation is done on the managed heap and at some point, the Garbage Collector (GC) will have to clean it up. In the second case, the compiler-generated code uses the ValueTuple type which in itself is a struct and is created on the stack – giving us access to the two values we want to work with while making sure no GC has to be done on the containing data structure.
The difference also becomes visible if we use ReSharper’s Intermediate Language (IL) viewer to look at the code the compiler generates in the above examples. Here are just the two method signatures:
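The IL listings themselves did not survive extraction; the signatures would look roughly like the following (method names and the exact assembly reference are assumptions, not the original listing):

```
// Before: the return type is a reference type (class)
.method private hidebysig static class [System.Runtime]System.Tuple`2<int32, int32>
    GetMinMaxTuple(int32[] values) cil managed

// After: the return type is a value type (struct)
.method private hidebysig static valuetype [System.Runtime]System.ValueTuple`2<int32, int32>
    GetMinMaxValueTuple(int32[] values) cil managed
```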
We can clearly see that the first example returns an instance of a class and the second returns an instance of a value type. The class is allocated on the managed heap (tracked and managed by the CLR, and subject to garbage collection), whereas the value type is allocated on the stack (fast, with less overhead). In short: System.ValueTuple itself is not tracked by the CLR and merely serves as a simple container for the embedded values we care about.
Note that, next to the optimized memory usage, features like tuple deconstruction are quite pleasant side effects of making this part of the language as well as the framework.
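Deconstruction looks like this (a small sketch; GetMinMax is a hypothetical helper, not from the original article):

```csharp
using System;

public static class DeconstructionSample
{
    public static (int Min, int Max) GetMinMax(int[] values)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }

    public static void Demo()
    {
        // Deconstruction: unpack the tuple straight into two locals,
        // instead of going through Item1/Item2 or named members.
        var (min, max) = GetMinMax(new[] { 7, 2, 9 });
        Console.WriteLine($"min={min}, max={max}");
    }
}
```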
We already touched on stack vs. managed heap in the previous section. Most .NET developers use just the managed heap, but .NET has three types of memory we can use, depending on the situation:

- Stack memory: where value types and method frames live. Allocating and freeing is practically free, but the space is limited (typically around 1 MB per thread) and scoped to the current method call.
- Unmanaged memory: memory that the garbage collector does not track. It has to be allocated and released explicitly, for example via Marshal.AllocHGlobal and Marshal.FreeHGlobal.
- Managed memory: the managed heap, where objects are allocated, tracked and cleaned up by the garbage collector.
All have their own advantages and disadvantages, and have specific use cases. But what if we want to write a library that works with all of these memory types? We’d have to provide methods for each of them separately: one that takes a managed object, another that takes a pointer to an object on the stack or in the unmanaged heap. A good example is creating a substring of a string.
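A sketch of that duplication problem (names hypothetical): without a unifying type, a substring helper needs a separate, near-identical overload for each representation of the character data.

```csharp
public static class SubstringOverloads
{
    // Managed heap: operates on a string object.
    public static string Substring(string source, int start, int length)
        => source.Substring(start, length);

    // A separate, near-duplicate overload for a char array...
    public static string Substring(char[] source, int start, int length)
        => new string(source, start, length);

    // ...and yet another (unsafe) overload taking a char* would be needed
    // to cover stack-allocated or unmanaged memory. This duplication is
    // exactly what a single span-based method avoids.
}
```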