08 August, 2007

LINQ vs Loop - A performance test

I just installed Visual Studio 2008 beta 2 to see what the future holds for C#. The addition of LINQ has brought a variety of query keywords to the language. "Anything" can be queried; SQL databases (naturally), XML documents, and regular collections. Custom queryable objects can also be created by implementing IQueryable. Sadly, like every abstraction, these goodies all come at a cost. The question is how much?

I decided to create a simple test to see how much of a performance hit LINQ is. The simple test I deviced finds the numbers in an array that are less than 10.

Initially, I assumed the performance impact would not be too large, since its equivalent is the straightforward imperative loop, which should not be too hard for a compiler to deduce given static typing and a single collection to iterate across. Or?

    LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
    Loop: 00:00:00.0790965, avg. 00:00:00.0000790


As you can see, the performance impact is huge. LINQ performs 50 times worse than the traditional loop! This seems rather wild at first glance, but the explanation is this: The keywords introduced by LINQ are syntactic sugar for method invocations to a set of generic routines for iterating across collections and filtering through lambda expressions. Naturally, this will not perform as good as a traditional imperative loop, and less optimization is possible.

Having seen the performance impact, I am still of the view that LINQ is a great step towards a more declarative world for developers. Instead of saying "take these numbers, iterate over all of them, and insert them into this list if they are less then ten", which is an informal description of a classical imperative loop, you can now say "from these numbers, give me those that are less than ten". The difference may be subtle, but the latter is in my opinion far more declarative and easy to read.

This may very well be the next big thing, but it comes at a cost. So far, my advice is to create simple performance tests for the cases where you consider adopting LINQ, to spot possible pitfalls as early as possible.

20 comments:

  1. Iteration of arrays is highly optimized by the C# compiler and .Net JIT. So if all you are doing is a tiny bit of work over an array and it is performance critical then iterate the loop directly. LINQ comes in handy in many other situations, as there are a lot of other API's that expose collections that are not arrays.

    ReplyDelete
  2. Anders Hejlsberg31 October, 2007 02:27

    I just ran across this and tried it out.

    It turns out that the culprit is the explicit mention of the "int" type in your query ("from int i in ..."). This translates into a call to the Cast() extension method which ends up boxing and unboxing each integer. If you remove the "int" from the query you'll see much better performance (about 2 times slower than the for loop). That's representative of the expected performance.

    Explicit typing of a range variable in a query is really only necessary when querying older non-generic collection types (such as ArrayList). We'll give some thought to ways of optimizing away the Cast() overhead in cases where it isn't needed.

    ReplyDelete
  3. Use System.Diagnostics.Stopwatch instead of manually constructing the time checks, it is much easier.
    Note that your input array should have been randomized or shuffled, it appears very predictive to the human eye, also, it does little justice to the sequential execution of the code because of the way branching will occur in one way the first 10 times, and the other way the other 10,000 times, which probably will affect the CPU's pipeline prediction, yielding even better performance to the for-loop.

    ReplyDelete
  4. Axel, thanks for the tip on Stopwatch.

    I just tried running the test with a randomized input array, and randomized partitioning point. Neither had any noticeable effect on the test results.

    ReplyDelete
  5. [...] http://ox.no/posts/linq-vs-loop-a-performance-test [...]

    ReplyDelete
  6. Hey great post. I was wondering the same thing. Some of my colleagues and I were pitching the idea around of "how exactly are they doing the query under the hood?".

    I was hoping that at least it was translating it into the same IL as a nested foreach; the idea that it's slower is terrible. Booo!

    ReplyDelete
  7. Pavol - the ToArray appears to take so long as the query is only executed at this point. Deferred query execution in LINQ means the query is only executed when you iterate through the query result (e.g. by calling ToArray). declaring and initialising the query doesn't execute the query, but this initialisation probably accounts for the shorter time (2ms) you've seen.

    ReplyDelete
  8. Wow, building in release, Loop is 10x faster than Linq without doing the int cast.

    ReplyDelete
  9. I ran the same test and didn't get the same results

    Linq .20
    Loop .08

    ReplyDelete
  10. Brian: Interesting, with 3.5 SP1 I get the following results:

    With explicit int:
    Linq .446 Loop .231

    And without the explicit int:
    Linq .431 Loop .256

    This could indicate that Cast() is optimized away in SP1, but I have not verified this.

    ReplyDelete
  11. All the times in ms.

    I run the tests as Pavol suggested.

    LINQ: 1, avg. 24
    Loop: 92, avg. 1322

    from int i / from i, There is no much difference

    LINQ: 247, avg. 3803
    Loop: 89, avg. 1286

    ReplyDelete
  12. If you run the compiled verision of the application (.exe or .dll), LINQ actually runs faster


    running out of VS is not a valid test and should not be compared.

    ReplyDelete
  13. This function does nothing, and thus benchmarking a compiled version is dangerous (so is benchmarking a debug version, but less so). A compiler will happily say, "This loop does nothing! Let's skip it!" I'd be hesitant to trust any benchmarks that are not in use on production code...though they're a nice tool for developers of compilers to use to check if their optimizations are working.

    ReplyDelete
  14. My code for Linq:

    for (int t = 0; t < RUNS; ++t)
    {
    int[] less = ints.Where(i => i < 10).ToArray();
    }

    My result:
    LINQ: 00:00:00.0936002, avg. 00:00:00.0000936
    Loop: 00:00:00.1092002, avg. 00:00:00.0001092

    ReplyDelete
  15. Thanks! I was getting 50% of CPU usage by writing file to a Stream, like so:

    buffer = formData.Skip(offset).Take(buffer.Length).ToArray();

    After reading your comments, I've rewritten my code to use loop and now CPU is quiet as ever :)

    ReplyDelete
  16. [...] http://ox.no/posts/linq-vs-loop-a-performance-test [...]

    ReplyDelete
  17. LinqTrialerandHater18 March, 2011 07:14

    Bah Linq, ever thought of what your going to do if you wish to port your application to a different language that doesn't use Lamda expressions or Linq! youre stuffed! there really is no difference between linq and a for loop. If it was faster than a for loop I would think its cool, but its not and it's also harder to look at, understand and debug. It's just a bunch a snobs trying to be smarter and cleverer than everyone else, Wow! look at the weird and cryptic linq expression I can write! it's slower and harder to understand and if I ever want to port my code to objective C++ or PHP or something. I can just sit on it, and go wow, I thought it was cool but its slower and really just microsofts way of keeping you locked into their product! So THERE!

    ReplyDelete
  18. The LINQ query object still is used underneath the enumerable object when looping. This means the query is called repeatedly when looping. Take a look at these two articles http://allthingscs.blogspot.com/2011/03/linq-performance.html and http://odetocode.com/code/737.aspx

    ReplyDelete
  19. [...] años y en ese tiempo esta herramienta ha mejorado bastante, basado en el código que encontré en una publicación del 2007 realice varias pruebas ajustando el código de la siguiente [...]

    ReplyDelete