Diving into the .Net JIT engine

    by  • July 19, 2013 • .Net, Programming

    Managed development using .Net has always been akin to ‘standing on the shoulders of giants’ where code reuse and tooling are concerned. Most .Net developers are happily content with the performance the .Net compilers deliver; however, in the embedded space, eking out the tiniest piece of performance is sometimes essential, so we have to dive a little deeper. This usually entails changing the way we code, using different structures or components, and finally adjusting how the code is generated.

    Chasing ever smaller performance gains usually means that the cost of development rises, and so does the expertise required.

    Note: I’ve always held the belief that the .Net compiler is smarter than I am, so I only start tinkering when I know what I am doing.

    The first and easiest step is to use Native Image Generation (NGen) to compile as much of the code as possible to native instructions ahead of time. Under normal circumstances, the C# compiler converts C# code to MSIL (Microsoft Intermediate Language). While the application runs, each method is JIT-compiled to native code the first time it is called; this happens on demand and is usually transparent. The compiled code is cached, so subsequent calls to the same method reuse it. There is, of course, a small but noticeable overhead during the JIT phase, and the JIT compiler itself has to trade the depth of its code analysis against compilation speed.
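    To see the first-call JIT overhead for yourself, the sketch below times the first and second calls to the same method with Stopwatch. It is a rough illustration under assumptions: the assembly has not been NGen-ed (otherwise there is nothing left to JIT), the class and method names are hypothetical, and the absolute numbers will vary by machine and runtime version.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical demo class: compares a method's first call (which includes
// JIT-compiling its MSIL) with a later call that reuses the cached native code.
public static class JitCostDemo
{
    // NoInlining keeps Work a separately compiled method, so its JIT cost
    // lands on its own first call rather than being folded into Main.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static double Work(int n)
    {
        double acc = 0;
        for (int i = 1; i <= n; i++)
        {
            acc += Math.Sqrt(i);
        }
        return acc;
    }

    public static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        Work(10);                       // first call: JIT compile + run
        long firstTicks = sw.ElapsedTicks;

        sw.Restart();
        Work(10);                       // second call: cached native code only
        long secondTicks = sw.ElapsedTicks;

        Console.WriteLine("First call:  " + firstTicks + " ticks");
        Console.WriteLine("Second call: " + secondTicks + " ticks");
    }
}
```

    If a native image for the assembly is actually being used, you would expect the gap between the two timings to shrink, since the first call no longer has to wait for the JIT.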

    NGen helps to reduce the working set in two ways: the application does not need to load the JIT compiler into the process (a per-process benefit), and the native image for a library is shared across multiple managed applications running at the same time (a machine-wide benefit). As with everything performance related, the only way to decide whether NGen benefits the working set of your application is to measure it.

    When to use NGen:

    • Large applications: Applications that run a lot of managed code at start up are likely to see wins in start up time when using NGen. Microsoft Expression Blend, for example, uses NGen to minimize start up time. If a large amount of code needs to be JIT-compiled at start up, the time needed to compile the IL might be a substantial portion of the total app launch time (even for cold start up). Eliminating JIT compilation from start up can therefore result in warm as well as cold start up wins.
    • Frameworks, libraries, and other reusable components: Code produced by the JIT cannot be shared across multiple processes; NGen images, on the other hand, can be. Therefore any code that is likely to be used by multiple applications at the same time is likely to consume less memory when pre-compiled via NGen. Almost the entire Microsoft .NET Framework, for instance, uses NGen. Microsoft Exchange Server also pre-compiles its core DLLs that are shared across various Exchange services.
    • Applications running in terminal server environments: Once again NGen helps in such a scenario because the generated code can be shared across the different instances of the application – that in turn increases the number of simultaneous user sessions the server can support.

    When not to use NGen:

    • Small applications: Small utilities like caspol.exe in the .NET Framework aren’t NGen-ed because the time spent JIT-compiling the code is typically a small portion of the overall start up time. As a matter of fact, since NGen images are substantially bigger in size than the corresponding IL assemblies, using NGen might actually result in increased disk I/O and hurt cold start up time.
    • Server applications: Server applications that aren’t sensitive to long start up times and don’t have shared components are unlikely to benefit significantly from NGen. In such cases, the cost of using NGen (generating the native images and regenerating them whenever the assembly or its dependencies change) may not be worth the benefits. SQL-CLR, for example, isn’t NGen-ed.

    Function inlining in the JIT’ter

    Inlining is when the body of a small method is inserted directly at its call site, removing the overhead of invoking the method. It is one of the best ways to improve performance in loops where small methods are called repeatedly.

    The JIT already does this automatically, but it is limited by heuristics such as the size of the method being inlined. Again, there is a trade-off between performance improvement and an increased code footprint.

    Good Practice: Keep methods focused. Smaller is sometimes better, because small methods are the ones the JIT will consider for inlining.
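    A minimal sketch of how C# code can nudge the JIT's inlining decisions, assuming .NET 4.5 or later: MethodImplOptions.AggressiveInlining asks the JIT to inline a method if it can, relaxing its usual size limits (it remains a hint, not a guarantee), while MethodImplOptions.NoInlining forbids inlining, which is useful as a baseline when measuring. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical example: the same tiny method with and without inlining allowed.
public static class InliningDemo
{
    // .NET 4.5+: hint that the JIT should inline this method if possible.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x)
    {
        return x * x;
    }

    // Forbids inlining, so every call below is a genuine method call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SquareNoInline(int x)
    {
        return x * x;
    }

    // Hot loop where the JIT can replace Square(i) with i * i.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += Square(i);
        }
        return total;
    }

    // Same loop, but paying the call overhead on every iteration.
    public static long SumOfSquaresNoInline(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += SquareNoInline(i);
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000));         // 332833500
        Console.WriteLine(SumOfSquaresNoInline(1000)); // 332833500
    }
}
```

    Timing the two loops with Stopwatch, in a Release build run outside the debugger (attaching a debugger typically suppresses JIT optimizations), is one way to see the call overhead that inlining removes; the ETW events described below report whether the inlining actually happened.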

    Sooner or later you’ll ask yourself why the compiler, JIT, or runtime did or did not inline a certain method. Unless you worked on the compiler, JIT, or runtime, you had no real way of telling, other than trial and error.

    This changed in CLR 4. Developers can now see why the JIT (and in some cases the runtime) decided to disallow inlining or a tail call, and also when the JIT succeeded in inlining or tail calling a method. All of this information is exposed through Event Tracing for Windows (ETW).

    Enabling an ETW provider used to be expensive, so the runtime only does it when requested. On Vista and newer OSes it is cheap enough to do all the time and you should not notice a performance hit, so you only need to request it on pre-Vista OSes. You request it by setting the following environment variable before running the application you are interested in:

    SET COMPLUS_ETWENABLED=1

    To start logging ETW events, run:

    logman start clrevents -p {e13c0d23-ccbc-4e12-931b-d9cc2eee27e4} 0x1000 5 -ets

    logman has many more options, but the important parts are the GUID (the CLR ETW provider GUID), the mask 0x1000 (the JitTracingKeyword), and the level 5 (verbose, i.e. everything).

    More information about logman.exe can be found at http://technet.microsoft.com/en-us/library/bb490956.aspx.

    After you’ve started ETW, run your scenario, and then stop ETW as follows:

    logman stop clrevents -ets

    This will create clrevents.etl.

    This trace file can then be examined with an ETW trace viewer to determine which methods were inlined and, for those that were not, the reason the JIT reported.

    .Net Optimisations to the Large Object Heap

    The CLR manages two different heaps for allocation: the small object heap (SOH) and the large object heap (LOH). Any object of 85,000 bytes or more is allocated on the LOH. Copying large objects carries a performance penalty, so, unlike the SOH, the LOH is not compacted. Another defining characteristic is that the LOH is only collected during a generation 2 collection. Together, these reflect a built-in assumption that large object allocations are infrequent.
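    The threshold can be observed directly. The sketch below (hypothetical class and method names) allocates a small and a large byte array and asks GC.GetGeneration where each one lives. Because the LOH is only collected along with generation 2, an LOH object reports generation 2 immediately after allocation, while a freshly allocated small object normally starts in generation 0. It assumes the default 85,000-byte threshold has not been reconfigured.

```csharp
using System;

// Hypothetical demo: which heap does a freshly allocated byte[] land on?
public static class LohDemo
{
    // Default LOH threshold. It applies to the whole object size, header
    // included, so a byte[85000] is comfortably over it.
    public const int LohThreshold = 85000;

    // Allocates a byte[] of the given length and returns its GC generation.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // Small array: allocated on the SOH, normally generation 0
        // (unless a collection happens to run in between).
        Console.WriteLine("byte[1000]:  generation " + GenerationOfNewArray(1000));

        // At the threshold: allocated on the LOH, reported as generation 2.
        Console.WriteLine("byte[" + LohThreshold + "]: generation " + GenerationOfNewArray(LohThreshold));
    }
}
```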

    Because the LOH is not compacted, the space freed by dead objects is left behind as fragments tracked on a free list. In .NET 4.5, the runtime’s management of this free list was significantly improved, making more effective use of fragments: the allocator now revisits free fragments that earlier allocations could not use.

    About

    Software engineer. Tea drinker

    http://MrPfister.com
