.ToString(theory);

My travels in the .Net underworld

One of my favorite features in .Net 4 is the addition of PLINQ, whose AsParallel() extension method on IEnumerable<T> turns ordinary LINQ queries into parallel queries of type ParallelQuery<T>.  Even after talking to plenty of developers, it still seems to be one of the lesser-known additions to .Net.

PLINQ offers an easy way to dramatically speed up a query, provided the work can actually be spread across multiple cores.  Even so, as always, there are some gotchas to keep in mind before you start using it everywhere.

How to Use PLINQ

Thankfully, PLINQ is extremely simple to use.  Have an IEnumerable<T>?  Simply tack .AsParallel() onto the collection, and you’re good to go:

using System.Collections.Generic;
using System.Linq;

var people = new List<Person>();

//imagine code here that adds a whole lot of people to the List

//now, let's go Parallel, people!

var parallelPeople = people.AsParallel();

That’s it!  Any query you now run against parallelPeople can be spread across multiple threads, with PLINQ deciding how many to use based on some quick checks of the machine and the query.  In other words, on a single-core system without hyper-threading, the query will most likely see no speedup at all.  Run the same code on a 6-core, hyper-threaded PC, and chances are you will end up with multiple threads executing your query, if the query calls for it.
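To make that a bit more concrete, here is a minimal sketch, assuming a hypothetical Person class with an Age property (the post never shows what its people type looks like).  AsParallel() on its own only wraps the list; the parallel work happens when an operator like Where() or Count() actually runs over the resulting ParallelQuery<Person>.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type, used only for illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class AsParallelBasics
{
    // Builds a list of hypothetical people whose ages cycle through 0..99.
    public static List<Person> CreatePeople(int count)
    {
        var people = new List<Person>(count);
        for (int i = 0; i < count; i++)
        {
            people.Add(new Person { Name = "Person " + i, Age = i % 100 });
        }
        return people;
    }

    public static int CountAdults(List<Person> people)
    {
        // Wrapping the list does no work yet; it just returns a ParallelQuery<Person>.
        ParallelQuery<Person> parallelPeople = people.AsParallel();

        // Running Where/Count is what actually executes, potentially on several threads.
        return parallelPeople.Where(p => p.Age >= 18).Count();
    }
}
```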

See, just because you have multiple cores doesn’t always mean it is most efficient to run a query across all of them.  What if your list only has 1 or 2 objects in it?  Is it really smart to scale that up to 6 separate threads, or even 2?  Partitioning the data, scheduling the work on thread-pool threads, and merging the results all carry real overhead, so PLINQ does some quick analysis of the shape of your query to determine whether it should even try to run it in parallel, and by default it falls back to sequential execution when it judges parallelism not worth it.  Pretty neat, right?!
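If you ever want to override those heuristics, PLINQ exposes WithExecutionMode() and WithDegreeOfParallelism() on ParallelQuery<T>.  Here is a rough sketch using plain integers as stand-in data; AsOrdered() is only there so the results come back in source order and are easy to compare.

```csharp
using System.Linq;

public static class ExecutionModeSketch
{
    // Squares every number and lets PLINQ decide on its own whether to parallelize.
    public static int[] DefaultMode(int[] numbers)
    {
        return numbers.AsParallel()
                      .AsOrdered()              // keep the output in source order
                      .Select(n => n * n)
                      .ToArray();
    }

    // Same query, but overriding PLINQ's own decisions.
    public static int[] ForcedParallel(int[] numbers, int maxThreads)
    {
        return numbers.AsParallel()
                      .AsOrdered()
                      // run in parallel even if PLINQ's analysis would pick sequential
                      .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                      // use at most maxThreads concurrently executing tasks
                      .WithDegreeOfParallelism(maxThreads)
                      .Select(n => n * n)
                      .ToArray();
    }
}
```

In most code the default mode is the right choice; ForceParallelism is mainly useful once you have measured that PLINQ’s guess is wrong for your particular query.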

Another thing to note: after calling AsParallel(), you can still call almost every standard LINQ operator (Where(), Select(), OrderBy(), Sum(), and so on), and they now run in parallel!  PLINQ also adds a few methods of its own, such as ForAll(), which fills the role that ForEach() plays on a List<T> instance but invokes the action for each element in parallel.
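Here is a small sketch of both points, using integers as stand-in data: ordinary LINQ operators chained after AsParallel(), and the same task done once with List<T>.ForEach() and once with ForAll().  Because ForAll() can call the lambda from several threads at once, the parallel version collects into a ConcurrentBag<int> rather than a plain List<int>.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class OperatorSketch
{
    // Where/Select/Sum bind to ParallelEnumerable here, because AsParallel()
    // returns a ParallelQuery<int> rather than a plain IEnumerable<int>.
    public static long SumOfEvenSquares(IEnumerable<int> numbers)
    {
        return numbers.AsParallel()
                      .Where(n => n % 2 == 0)
                      .Select(n => (long)n * n)
                      .Sum();
    }

    // Sequential: List<T>.ForEach runs the action on the calling thread, in order.
    public static List<int> SquaresWithForEach(List<int> numbers)
    {
        var results = new List<int>();
        numbers.ForEach(n => results.Add(n * n));
        return results;
    }

    // Parallel: ForAll may run the action on several threads at once and in no
    // particular order, so results go into a thread-safe ConcurrentBag<int>.
    public static ConcurrentBag<int> SquaresWithForAll(List<int> numbers)
    {
        var results = new ConcurrentBag<int>();
        numbers.AsParallel().ForAll(n => results.Add(n * n));
        return results;
    }
}
```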

A Practical Example

Let’s say that in your application you need to perform some operation on all of the users in your system, maybe as part of a weekly batch job that calculates statistics.  The first thing you need to do is grab all of the users from your system.  Assuming something like an Entity Framework context:

List<User> users = null;
using (YourAwesomeDbContext context = new YourAwesomeDbContext())
{
	users = context.Users.ToList();
}

You may have noticed that the first thing I did here was declare a List<User> to hold the users from the database, then construct the context and call ToList() on the Users DbSet.  Why, you may ask?  The main reason is that AsParallel() is an extension method on IEnumerable<T>, so it treats whatever it is called on as an in-memory sequence of objects.  If you call it on a query you are still building against your context, every operator after AsParallel() runs in memory instead of in the database, and you can end up unintentionally pulling an entire set of data into memory.  I’m not going to cover those details here, but check out this post for a better overview of what I mean.
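A minimal sketch of the difference, assuming a hypothetical User class with an IsActive flag.  AsQueryable() over an in-memory list stands in for a real DbSet, so nothing here touches a database.  The first method returns a ParallelQuery<User>, meaning its Where() is no longer part of a query a provider could translate; the second filters while the query is still an IQueryable<User> and only then materializes it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for the post's User type.
public class User
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
}

public static class FilterFirstSketch
{
    // Risky: everything after AsParallel() is in-memory PLINQ, so against a real
    // EF DbSet the Where() would run only after every row had been loaded.
    public static ParallelQuery<User> FilterAfterAsParallel(IQueryable<User> users)
    {
        return users.AsParallel().Where(u => u.IsActive);
    }

    // Preferred: filter while the query is still an IQueryable<User> (which a
    // provider can translate), materialize with ToList(), then go parallel.
    public static List<User> LoadActiveUsers(IQueryable<User> users)
    {
        return users.Where(u => u.IsActive).ToList();
    }
}
```

With a real context, the second pattern would look like calling LoadActiveUsers(context.Users) and then AsParallel() on the returned list.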

Now that you have the list of users, you can perform some operations on them:

users.AsParallel().ForAll(u=> {
	//do some extensive operation for a user
});

And that is all it takes to perform an operation, in parallel, on a list!
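Filling in the “extensive operation” a little, here is a sketch of the weekly statistics idea.  The UserStats type, its DailyLogins array, and summing logins as the per-user calculation are all hypothetical.  Because ForAll() may invoke the lambda on several threads at once, per-user results go into a ConcurrentDictionary and the running total is updated with Interlocked.Add rather than a plain +=.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical per-user data for a weekly statistics batch.
public class UserStats
{
    public int Id { get; set; }
    public int[] DailyLogins { get; set; }
}

public static class WeeklyBatchSketch
{
    // Computes a weekly total per user in parallel and a grand total across users.
    public static ConcurrentDictionary<int, int> ComputeWeeklyTotals(List<UserStats> users, out long grandTotal)
    {
        var totals = new ConcurrentDictionary<int, int>();
        long sum = 0;

        users.AsParallel().ForAll(u =>
        {
            int weekly = u.DailyLogins.Sum();   // stand-in for "some extensive operation"
            totals[u.Id] = weekly;              // thread-safe write
            Interlocked.Add(ref sum, weekly);   // atomic update of the shared total
        });

        grandTotal = sum;
        return totals;
    }
}
```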

Conclusion

The .Net PLINQ extensions can be a great addition to your everyday toolset; however, as the old adage goes, “With great power comes great responsibility.”  My point is, don’t just jump in and change ALL of your operations on lists and IEnumerables to PLINQ.  Look at the surrounding code.  Are you sure that you are already operating on an in-memory set, or is the set still in the database/context?  Should you really call AsParallel() at this point, or can you add a few filters first so you aren’t needlessly pulling down full collections when you only need to iterate through 20% of them?

All in all though, PLINQ is a great tool to keep in mind when you want to jumpstart operations, filtering, or ordering on in-memory collections.  I have actually changed code from List<T>.ForEach() to AsParallel().ForAll(...) and seen the running time drop from 60+ seconds to around 3.  YMMV.
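If you want to check this on your own workload, a rough Stopwatch harness like the one below is a starting point.  SimulatedWork is a hypothetical CPU-bound stand-in for real per-item work; the actual speedup depends on your hardware and on how much work each item does, so no particular timings are implied.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical CPU-bound stand-in for real per-item work.
    public static double SimulatedWork(int seed)
    {
        double acc = 0;
        for (int i = 1; i <= 20000; i++)
        {
            acc += Math.Sqrt(seed + i);
        }
        return acc;
    }

    // Times List<T>.ForEach; 'processed' confirms every item was handled.
    public static TimeSpan TimeSequential(List<int> items, out int processed)
    {
        int count = 0;
        var watch = Stopwatch.StartNew();
        items.ForEach(n => { SimulatedWork(n); count++; });
        watch.Stop();
        processed = count;
        return watch.Elapsed;
    }

    // Times AsParallel().ForAll; the counter is shared across threads, so use Interlocked.
    public static TimeSpan TimeParallel(List<int> items, out int processed)
    {
        int count = 0;
        var watch = Stopwatch.StartNew();
        items.AsParallel().ForAll(n => { SimulatedWork(n); Interlocked.Increment(ref count); });
        watch.Stop();
        processed = count;
        return watch.Elapsed;
    }
}
```

Run each timing a couple of times and ignore the first pass, since JIT compilation and thread-pool warm-up can skew a single measurement.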

posted on Sunday, March 10, 2013 12:55 AM
