Toub’s Parallel Cache pattern for C#

June 16, 2012

I’ve recently thanked Steve Toub again for his clear explanation some time back of what I call the “parallel cache pattern”. He says that something similar is in his whitepaper, but I wanted to post a brief description here, if only so that I can return to it easily when wanting to remind myself of it!

The basic need: create a dictionary cache of objects, each of which is expensive to create, in a parallel environment.

You might think that this is built into the Task Parallel Library of .NET 4+. For example:

(Edit: The less-than and greater-than template symbols of C# have been replaced in this post by [ and ], because the former breaks the HTML of the WordPress page.)

ConcurrentDictionary[Foo, Bar] cache = new ConcurrentDictionary[Foo, Bar]();
Bar GetBar(Foo foo) {return cache.GetOrAdd(foo, f => CreateBar(f));}
Bar CreateBar(Foo foo) {return [expensive creation here];}

This is fine if multiple threads tend to initially hit the cache with different keys, because each thread will go off and create a Bar for a different key Foo, and eventually they’ll all be added to the cache (the blocking only occurs for these additions). In cases where two or more threads happen to have created the Bar for the same Foo key foo, all but the first will effectively be discarded (the ConcurrentDictionary doesn’t need a second opinion!).

There is a problem, however, if multiple threads all tend to want the Bar corresponding to the same Foo key foo at the same time. (I frequently find this to be the case, but maybe my use cases are abnormal.) Then all of the threads are told by the ConcurrentDictionary that it doesn’t yet exist, and they all go off and create it. Since we’re assuming that these Bar objects are expensive to create, this takes a long time (in machine cycles) to occur. What effectively happens is that every core gets tied up with a thread that is creating the same Bar object. When they all finish, all but one of them finds, to their disappointment (if threads can be disappointed), that their effort is all for naught: someone else has already given the ConcurrentDictionary the Bar object for that Foo key foo.

The net result is that the code appears to be running nicely in parallel (all the CPUs are lit), but it’s no faster than the non-parallel code. All that you’ve achieved is to burn your other CPUs for no net effect.

What you really want to occur is for only one thread to go off and create that Bar, with all the other threads blocking and yielding to other iterations in your Parallel.For or Parallel.ForEach statement.

Now, if your use case is such that each and every such iteration requires that one Bar object, then they will all eventually block, until that first thread is finished creating it. You haven’t achieved anything over a serial implementation, but at least your CPUs are free to do other things (in another parallel branch of your program, or for something else on your system), and you haven’t been fooled by lit CPUs into thinking that you’ve gained anything from the parallelism. And once that Bar has been created, all those blocked iterations will be resurrected, and can run off in parallel doing whatever they need to do with it. (Unless they all need a second one, in which case the process repeats.)

In my typical use cases, however, eventually some iterations are activated that require different Bar objects. As long as there exist as many iterations in the list that won’t be blocked by requiring that first Bar object as you have CPUs, then they will all eventually get to work creating different Bar objects. Your CPUs will be lit up, productively.

So what is Toub’s trick? Simply this observation: if the ConcurrentDictionary stores not Bar objects, but rather Lazy[Bar] objects, then the initializer of the Lazy[Bar] will automatically achieve what we want.

Basically, it works because the Lazy[Bar] constructor of the first thread returns very quickly: the Lazy[Bar] object is added to the collection almost instantaneously. So rather than all the other threads finding that the cupboard is bare, their wishes are instead fulfilled. They are all given the Lazy[Bar] object. (There is a vanishingly small probability that two or more will ask for the object in the short time it takes for the Lazy[Bar] constructor to complete.)

So how have cheated the expensive creation process? Basically, when all of those other threads “open their presents” (the Lazy[Bar] object), they effectively find a note telling them to wait until it’s ready, because (by default) Lazy[T] is thread-safe. So they all block, yielding to other parallel iterations, until the first thread has created the Bar object.

Magic! And it works really well.

So here’s the Toub generalization of the above example:

ConcurrentDictionary[Foo, Lazy[Bar]] cache = new ConcurrentDictionary[Foo, Lazy[Bar]]();
Bar GetBar(Foo foo) {return cache.GetOrAdd(foo, f => new Lazy[Bar](() => CreateBar(f))).Value;}
Bar CreateBar(Foo foo) {return [expensive creation here];}

Thanks Steve!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: