I just installed Visual Studio 2008 beta 2 to see what the future holds for C#. The addition of LINQ has brought a variety of query keywords to the language. "Anything" can be queried; SQL databases (naturally), XML documents, and regular collections. Custom queryable objects can also be created by implementing IQueryable. Sadly, like every abstraction, these goodies all come at a cost. The question is how much?
I decided to create a simple test to see how much of a performance hit LINQ is. The simple test I deviced finds the numbers in an array that are less than 10.
Initially, I assumed the performance impact would not be too large, since its equivalent is the straightforward imperative loop, which should not be too hard for a compiler to deduce given static typing and a single collection to iterate across. Or?
LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790
As you can see, the performance impact is huge. LINQ performs 50 times worse than the traditional loop! This seems rather wild at first glance, but the explanation is this: The keywords introduced by LINQ are syntactic sugar for method invocations to a set of generic routines for iterating across collections and filtering through lambda expressions. Naturally, this will not perform as good as a traditional imperative loop, and less optimization is possible.
Having seen the performance impact, I am still of the view that LINQ is a great step towards a more declarative world for developers. Instead of saying "take these numbers, iterate over all of them, and insert them into this list if they are less then ten", which is an informal description of a classical imperative loop, you can now say "from these numbers, give me those that are less than ten". The difference may be subtle, but the latter is in my opinion far more declarative and easy to read.
This may very well be the next big thing, but it comes at a cost. So far, my advice is to create simple performance tests for the cases where you consider adopting LINQ, to spot possible pitfalls as early as possible.
Iteration of arrays is highly optimized by the C# compiler and .Net JIT. So if all you are doing is a tiny bit of work over an array and it is performance critical then iterate the loop directly. LINQ comes in handy in many other situations, as there are a lot of other API's that expose collections that are not arrays.
ReplyDeleteI just ran across this and tried it out.
ReplyDeleteIt turns out that the culprit is the explicit mention of the "int" type in your query ("from int i in ..."). This translates into a call to the Cast() extension method which ends up boxing and unboxing each integer. If you remove the "int" from the query you'll see much better performance (about 2 times slower than the for loop). That's representative of the expected performance.
Explicit typing of a range variable in a query is really only necessary when querying older non-generic collection types (such as ArrayList). We'll give some thought to ways of optimizing away the Cast() overhead in cases where it isn't needed.
Use System.Diagnostics.Stopwatch instead of manually constructing the time checks, it is much easier.
ReplyDeleteNote that your input array should have been randomized or shuffled, it appears very predictive to the human eye, also, it does little justice to the sequential execution of the code because of the way branching will occur in one way the first 10 times, and the other way the other 10,000 times, which probably will affect the CPU's pipeline prediction, yielding even better performance to the for-loop.
Axel, thanks for the tip on Stopwatch.
ReplyDeleteI just tried running the test with a randomized input array, and randomized partitioning point. Neither had any noticeable effect on the test results.
[...] http://ox.no/posts/linq-vs-loop-a-performance-test [...]
ReplyDeleteHey great post. I was wondering the same thing. Some of my colleagues and I were pitching the idea around of "how exactly are they doing the query under the hood?".
ReplyDeleteI was hoping that at least it was translating it into the same IL as a nested foreach; the idea that it's slower is terrible. Booo!
Pavol - the ToArray appears to take so long as the query is only executed at this point. Deferred query execution in LINQ means the query is only executed when you iterate through the query result (e.g. by calling ToArray). declaring and initialising the query doesn't execute the query, but this initialisation probably accounts for the shorter time (2ms) you've seen.
ReplyDeleteWow, building in release, Loop is 10x faster than Linq without doing the int cast.
ReplyDeleteI ran the same test and didn't get the same results
ReplyDeleteLinq .20
Loop .08
Brian: Interesting, with 3.5 SP1 I get the following results:
ReplyDeleteWith explicit int:
Linq .446 Loop .231
And without the explicit int:
Linq .431 Loop .256
This could indicate that Cast() is optimized away in SP1, but I have not verified this.
All the times in ms.
ReplyDeleteI run the tests as Pavol suggested.
LINQ: 1, avg. 24
Loop: 92, avg. 1322
from int i / from i, There is no much difference
LINQ: 247, avg. 3803
Loop: 89, avg. 1286
If you run the compiled verision of the application (.exe or .dll), LINQ actually runs faster
ReplyDeleterunning out of VS is not a valid test and should not be compared.
This function does nothing, and thus benchmarking a compiled version is dangerous (so is benchmarking a debug version, but less so). A compiler will happily say, "This loop does nothing! Let's skip it!" I'd be hesitant to trust any benchmarks that are not in use on production code...though they're a nice tool for developers of compilers to use to check if their optimizations are working.
ReplyDeleteMy code for Linq:
ReplyDeletefor (int t = 0; t < RUNS; ++t)
{
int[] less = ints.Where(i => i < 10).ToArray();
}
My result:
LINQ: 00:00:00.0936002, avg. 00:00:00.0000936
Loop: 00:00:00.1092002, avg. 00:00:00.0001092
Thanks! I was getting 50% of CPU usage by writing file to a Stream, like so:
ReplyDeletebuffer = formData.Skip(offset).Take(buffer.Length).ToArray();
After reading your comments, I've rewritten my code to use loop and now CPU is quiet as ever :)
[...] http://ox.no/posts/linq-vs-loop-a-performance-test [...]
ReplyDeleteBah Linq, ever thought of what your going to do if you wish to port your application to a different language that doesn't use Lamda expressions or Linq! youre stuffed! there really is no difference between linq and a for loop. If it was faster than a for loop I would think its cool, but its not and it's also harder to look at, understand and debug. It's just a bunch a snobs trying to be smarter and cleverer than everyone else, Wow! look at the weird and cryptic linq expression I can write! it's slower and harder to understand and if I ever want to port my code to objective C++ or PHP or something. I can just sit on it, and go wow, I thought it was cool but its slower and really just microsofts way of keeping you locked into their product! So THERE!
ReplyDelete[...] [...]
ReplyDelete[...] años y en ese tiempo esta herramienta ha mejorado bastante, basado en el código que encontré en una publicación del 2007 realice varias pruebas ajustando el código de la siguiente [...]
ReplyDelete