I managed to offend a few Erlangers by talking about actors as being shared stateful objects a couple of articles back. Now I'm going to continue the tradition by talking about the actor locking model. And I'm not just talking about Erlang, but any of the libraries that have sprung up for other languages to support actor style concurrency. Let me emphasize again that I like both Erlang and actors. But I dislike the number of times I read from enthusiasts that you "can't have deadlocks in Erlang because it doesn't have any locks." Of course you can because it does. This article will explain.
Locks
There are all manner of locks in the world: mutexes, monitors, synchronization barriers, read/write locks etc., etc., yada yada yada. All of them boil down to a spot in the program where if conditions aren't good-to-go the currently executing thread stops executing until some other thread (or threads) change the condition the lock is waiting on[1]. Erlang has a locking primitive, too. It stops the execution of the current thread and waits for the right conditions to move on. The primitive is called (drum roll please)...'receive'.
"But, but, 'receive' receives messages - it's not a lock." Sure, it receives messages. True. But it also brings the currently running thread to a screeching halt until it receives a message it wants to handle. "receive foo -> doSomething()" waits until the atom "foo" is at the front of the current process's mailbox - any other message gets thrown to the back of a "this receive can't handle that message" queue for processing by some other receive. In other words "receive foo ->" is a lock that waits for the condition "foo at head of mailbox queue" before it unlocks.
Deadlocks
A deadlock is when there's a cycle in the locking dependencies of 2 or more threads. Thread 1 is locked waiting for condition 1 after which it will set condition 2, Thread 2 is waiting for condition 2 after which it will set condition 3, etc until you get Thread n which is waiting for condition n after which it will set condition 1. Everybody's waiting on somebody else to go first so nobody can go.
Based on this, a deadlock in Erlang is easy to create. Just spawn two actors that are each waiting for a message from the other
Foo = spawn(fun() -> receive {From, foo} -> From ! {self(), bar}, io:format("foo done~n") end end). Bar = spawn(fun() -> receive {From, bar} -> From ! {self(), foo}, io:format("bar done~n") end end).
If a running system doesn't have any other actors that will send Foo or Bar the right message then they are deadlocked until some poor sleepy eyed admin comes along to break the deadlock with "Foo ! {Bar, foo}". Now, in this case the deadlock is not a big deal. The resources these actors consume are pretty tiny and they don't really do anything useful even when they unlock. But if these actors were meant to do something actually useful such a deadlock could seriously impair the functionality of a system.
This was a toy example, of course. It's easy to see the deadlock. In a "real world" large scale application it might not be so obvious where the deadlocks lurk. Actor's can end up in very complicated, dynamically changing relationships. Couple that with the possibility for race conditions and you'll see that deadlock is a real concern. Erlang style actors make concurrency much easier to deal with, but they don't remove the need for careful design and testing.
That's pretty much all I really wanted to say in this article, but I can't feel my work here is done until I've committed further atrocities against Erlang.
Perverting Erlang
A couple of posts ago I had a problem with race conditions on a shared stateful actor. Here's what it looked like.
1> c(sockmachine). {ok,sockmachine} 2> Machine = sockmachine:start(). <0.38.0> 3> sockmachine:test(Machine, 2). all test processes launched [2] BONUS COIN! [1] BONUS COIN! ok [2] Got my socks. [1] Blasted machine ate my money! Give it to me! Rattle, rattle, ARRRGGHGHHRHRHGH [2] BONUS COIN! [1] BONUS COIN! [2] Got my socks. [1] Blasted machine ate my money! Give it to me! Rattle, rattle, ARRRGGHGHHRHRHGH [2] BONUS COIN! [1] BONUS COIN! etc.
While there are many good solutions, one evil solution is to use a mutex. You see, one well known aspect of concurrent programming is that once you have one locking primitive, many others are pretty easy to build. To the perversion mobile!
-module(mutex). -export([create/0, lock/1, unlock/1]). create() -> spawn(fun() -> unlocked() end). lock(Mutex) -> Mutex ! {lock, self()}, receive ok -> ok end. unlock(Lock) -> Lock ! {unlock, self()}. unlocked() -> receive {lock, LockedBy} -> LockedBy ! ok, locked(LockedBy, 1) end. locked(LockedBy, Count) -> receive {unlock, LockedBy} -> if Count == 1 -> unlocked(); true -> locked(LockedBy, Count - 1) end; {lock, LockedBy} -> LockedBy ! ok, locked(LockedBy, Count + 1) end.
When a Mutex actor is the unlocked state then any actor can tell it to lock. Once in the locked state it won't unlock until the first actor tells it to. [2] The counting business is so that one process can lock a mutex several times and then must unlock the mutex the same number of times before it is considered unlocked.
And now a bit of test code. I've modified the original sockmachine test code so that before inserting coins and pushing buttons a testloop must acquire the lock on a mutex and once done must release it.
-module(mutextest). -export([test/3]). test(Machine, Mutex, Processes) -> if Processes > 0 -> spawn(fun() -> testloop(Processes, Machine, Mutex, 10) end), test(Machine, Mutex, Processes - 1); true -> io:format("all test processes launched~n") end. testloop(Process, Machine, Mutex, Count) -> if Count > 0 -> mutex:lock(Mutex), testzerocoins(Process, Machine), mutex:unlock(Mutex), testloop(Process, Machine, Mutex, Count - 1); true -> io:format("[~w] testing completed~n", [Process]) end. testzerocoins(Process, Machine) -> case sockmachine:insertcoin(Machine) of {nothing} -> testonecoin(Process, Machine); {coin} -> io:format("[~w] BONUS COIN!~n", [Process]), testtwocoins(Process, Machine) end. testonecoin(Process, Machine) -> case sockmachine:insertcoin(Machine) of {nothing} -> testtwocoins(Process, Machine); {coin} -> io:format("[~w] BONUS COIN!~n", [Process]), testtwocoins(Process, Machine) end. testtwocoins(Process, Machine) -> case sockmachine:pushbutton(Machine) of {tubeofsocks} -> io:format("[~w] Got my socks.~n", [Process]); {nothing} -> io:format("[~w] Blasted machine ate my money! Give it to me! Rattle, rattle, ARRRGGHGHHRHRHGH~n", [Process]) end.
And here's what it looks like when I run all this
4> c(mutex). {ok,mutex} 5> c(mutextest). {ok,mutextest} 6> Mutex = mutex:create(). <0.53.0> 7> mutextest:test(Machine, Mutex, 2). all test processes launched ok [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [1] Got my socks. [2] Got my socks. [2] testing completed [1] Got my socks. [1] testing completed
Ta da! Fixed my problem!
Kids, Don't Try This At Home
There are better ways to solve this kind of problem with actors. Seriously. Why use a custom mutex when Erlang has a lock built in as described above? Just change the sock machine to not be so stateful, i.e. make it take one message "{From, coin, coin, pushbutton}" and do all the work necessary before returning the tube of socks. Particularly nasty about this manual mutex code is that it's hard to ensure that all code paths will unlock a mutex. If my test code had thrown an exception at the wrong point then the mutex would be left locked. So don't do this. Unless you're evil. Like me.
Conclusions
Erlang's a great language. Actors are a great model. But they are neither race nor deadlock free. That's because, fundamentally, they're all about shared mutable state and locking. If after all of these articles you still don't believe me or you think I'm using the words wrong then please send your comments to dev/null.
In my next article I'll try to pervert Erlang for good rather than for evil and maybe try to explain the frequent observation that "yeah, races and deadlocks can happen with Erlang, but it seems less common than with more traditional shared memory/manual locking mechanisms."
Footnotes
- Erlangers use the word "process" because of the memory isolation enforced between different actors. I'm using "thread" here to avoid confusion with OS processes. But even here there's confusion because I mean logical threads which may or may not be the same as hardware threads. I've seen a few libraries call this concept "sparks."
- Or until some third actor spoofs an unlock message from the locking actor.