After working with Linq-to-objects, I started thinking about how this tool could work in the wrong hands. At its simplest, a seemingly elegant query could easily turn into a CPU hog if the underlying data structure isn’t organized well. At its best, how will these queries make use of the underlying data structures?
What to do? Write some code and get some numbers to compare.
What I found was interesting, eye-opening and opened more questions for another day. I set out to compare Linq performance over a reasonably well-optimized data structure by varying the where clause composition and also comparing this to code going directly after the data structure in typical pre-Linq fashion.
What I learned was fascinating – let’s take a look at the code
Given a dictionary defined as:
private Dictionary<int, string> _lookupList = new Dictionary<int, string>();
I populate it using a loop where the upper limit is a const I can vary:
for (int x = 0; x < _listSize; x++ )
_lookupList.Add(x, string.Format(“item#{0}”, x));
I can then use Linq to query the item dictionary with something like this:
string[] myResult = (from i in _lookupList
where i.Key >= lowerLimit && i.Key <= upperLimit && i.Value.Contains(“1″)
select i.Value).ToArray();
Let’s take three variations as follows:
1) where i.Key >= lowerLimit && i.Key <= upperLimit && i.Value.Contains(“1″)
2) where i.Value.Contains(“1″) && i.Key >= lowerLimit && i.Key <= upperLimit
3) Hand-code the query to go against the data structure to compare performance.
To make sure I varied the access in different areas of the dictionary, the query was performed against 10 discrete key values in the beginning, middle and end of the dictionary. Performance was consistent across each item, showing that Linq was utilizing the datastructure to some degree – as expected. The .Contains method further reduced the return set to range from 1-2 items depending on where the midpoint was calculated. These test sets were run 10 times and the averages taken.
The results:
Item#1 – averaged 2643 ms
Item#2 – averaged 671 ms
Item#3 – averaged <1 ms
Observations:
The difference between #1 and #2 is a factor of 4. I am not sure how this actually performed but I suspect that item #2 scanned the entire contents of the dictionary. It is hard to tell since a test where you eliminate the where clause results in a very different memory allocation pattern and therefore inherently larger timing numbers.
The difference between #3 and the other items is staggering. Keep in mind, I tried to handicap this with a try/catch block around the indexer into the dictionary, stored the matching values in a List<string> and then performed a List<string>.ToArray() just for good measure.
That’s eye popping – it’s more than two factors of 10 faster than the best optimized Linq query and more than three factors of 10 faster than the worst optimized.
Bottom line – despite my best efforts I just couldn’t make my hand-written code perform as poorly as Linq.
Summary:
So what did I learn?
· Variations on where-clause construction have an implied order when considering performance. What are these implied rules exactly? I’d love to know.
· An entirely different test would have to be constructed to understand join performance.
· How does the [Indexable] attribute play into optimization choices and what are the performance considerations of that attribute?
· When doing intensive object querying, Linq for Objects needs to be eyed very carefully in context of the application.
+++ Addition per Comment
Here is the code that Jimmy Bogard requested. As noted in my post, I added exception handling and chose to use a List<string> to gather the match values and finally convert to an array of strings just as the Linq function did. If anything, I am going for the most conservative path in this case. The call to this function is responsible for time measurement.
private void HandCodedAccessorFunction(int lowerLimit, int upperLimit)
{
List<string> result = new List<string>();
int y = 0;
for (int x = lowerLimit; x <= upperLimit; x++)
{
try
{
if (_lookupList[x].Contains(“1″))
{
result.Add(_lookupList[x]);
y++;
}
}
catch (Exception)
{
Console.Writeline(“Exception thrown”);
}
}
string[] ar = result.ToArray();
Console.WriteLine(y);
}