Almost a year ago, I wanted to see how good Akka was, and how easy it would be to create a distributed hash with Akka Clustering…
So I created a sample project to play with it a little bit (https://github.com/roclas/akka-distributed-hash – I actually changed a couple of things just before writing this post).
It consists of a distributed key-value registry backed by several Akka instances that run the same code on different nodes.
The idea behind it is very straightforward: you run the application several times simultaneously (passing a different port number as a parameter each time), and once the first node has started (provided it is in the seeds list), you can connect as many nodes to the cluster as you want. Nodes (in this case, actors) hold a hash that is modified through HTTP requests, and every change to that hash is propagated across the cluster, so that every node ends up with the same information.
The seed nodes are listed in src/main/resources/application.conf.
When the application starts, one of the first things it does is try to join the cluster. To do that, it goes through the list of seeds and tries to connect to the first available node.
An interesting detail: if a node is the first element of the seeds list but is not the first node we start, we will most likely end up with an island in the cluster. When a node finds itself at the head of the seeds list, it connects to itself and forms a cluster of its own. The previously started nodes stay in their own sub-cluster, and the nodes started afterwards connect to this node, creating an isolated second sub-cluster.
I have avoided this problem with a trick (maybe you can find a more elegant way?):
    val selfAddress = Cluster(system).selfAddress
    val seeds = system.settings.config.getList("akka.cluster.seed-nodes")
    System.setProperty("akka.cluster.seed-nodes",
      seeds filter(!_.toString.contains(selfAddress.toString)) toString)
As a general rule, it is not a bad idea to remove your own address from the seeds list before joining the cluster. You could do that by hand in the configuration, but I did not want to keep a different config file for each node I create, which is why each node removes itself from the seeds list programmatically.
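The filtering step can also be looked at in isolation. Below is a minimal sketch, assuming the seed addresses are available as plain strings; the `SeedFilter` object and the `seedsWithoutSelf` name are made up for illustration and are not part of the project. Note that it compares addresses for exact equality, whereas the snippet above uses a substring match.

```scala
object SeedFilter {
  // Drops this node's own address from the configured seed list, so the node
  // never picks itself as the seed to join.
  def seedsWithoutSelf(seeds: Seq[String], selfAddress: String): Seq[String] =
    seeds.filterNot(_ == selfAddress)
}
```

If the filtered list comes back empty, the node has nobody left to contact, so that case probably deserves separate handling (for example, letting that node start the cluster by itself).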
Once the actor is connected to the cluster, it can receive cluster messages such as:
- MemberUp (when a new node joins the cluster)
- MemberRemoved (when a node leaves the cluster)
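To illustrate what a node might do with these two messages, here is a small sketch that keeps a set of known peers up to date. The event classes are local stand-ins that only carry an address string; they mirror the names above but are not Akka's own event classes, and the `Membership` object is hypothetical.

```scala
object Membership {
  // Local stand-ins for the two cluster events; NOT the Akka classes.
  sealed trait MemberEvent
  final case class MemberUp(address: String) extends MemberEvent
  final case class MemberRemoved(address: String) extends MemberEvent

  // Returns the peer set after applying a single membership event.
  def update(peers: Set[String], event: MemberEvent): Set[String] = event match {
    case MemberUp(address)      => peers + address
    case MemberRemoved(address) => peers - address
  }
}
```

Folding a stream of events over an empty set with `update` yields the current view of the cluster, which is the list of peers a node would notify when its hash changes.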
Each actor will have a hash object and a tiny HTTP server that responds to GET, PUT and DELETE requests. PUT requests add elements to the actor’s internal hash, and DELETE requests remove them.
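As a rough sketch of how the three request types could map onto the hash, the function below applies one request to an immutable `Map` and returns the new map together with a response body. It leaves the HTTP layer out entirely; the `Registry` object, the `handle` signature and the choice of `String` keys and values are assumptions made for this example.

```scala
object Registry {
  // Applies one HTTP-style request to the store. Returns the (possibly
  // updated) store and the response body, or None if there is nothing to return.
  def handle(store: Map[String, String], method: String, key: String,
             value: Option[String] = None): (Map[String, String], Option[String]) =
    (method.toUpperCase, value) match {
      case ("GET", _)       => (store, store.get(key))
      case ("PUT", Some(v)) => (store + (key -> v), Some(v))
      case ("DELETE", _)    => (store - key, None)
      case _                => (store, None) // unsupported method, or PUT without a value
    }
}
```

Returning the new store instead of mutating a shared one fits the actor model well: the actor can simply replace its state after each message.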
And from this point on, you can let your imagination run wild:
- Every time we modify an actor’s hash, it will send a message notifying its peers of the change, so that they can change their hash too
- After updating its hash at another actor’s request, an actor sends back a message with the MD5 of its hash, so that the original actor can check whether the change was applied successfully
- When a new member joins the cluster, it synchronizes its state (its hash) with the other nodes
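The propagation and checksum steps above can be sketched without any actors. In this hedged example, `Put` and `Delete` are hypothetical change messages, and the MD5 is computed over the entries sorted by key so that two nodes holding the same data always produce the same digest. The `key=value` line encoding is a simplification (it is ambiguous for keys containing `=` or newlines) and is not necessarily how the project serializes its hash.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Replication {
  // Hypothetical change messages a node would send to its peers.
  sealed trait Change
  final case class Put(key: String, value: String) extends Change
  final case class Delete(key: String) extends Change

  // Applies a change received from a peer to the local store.
  def applyChange(store: Map[String, String], change: Change): Map[String, String] =
    change match {
      case Put(k, v) => store + (k -> v)
      case Delete(k) => store - k
    }

  // Hex MD5 of the store, computed over entries sorted by key so that the
  // result does not depend on the map's iteration order.
  def md5(store: Map[String, String]): String = {
    val canonical = store.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("\n")
    MessageDigest.getInstance("MD5")
      .digest(canonical.getBytes(StandardCharsets.UTF_8))
      .map("%02x".format(_))
      .mkString
  }
}
```

The originating node would apply a change locally, send it to each peer, and compare the digest each peer sends back with its own `md5(store)`; a mismatch signals that the two copies have diverged. A newly joined member could use the same digest to confirm that the state it copied from a peer is complete.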
Please download the zip or clone the project (https://github.com/roclas/akka-distributed-hash) and try it yourselves; it is easy to set up and a lot of fun to play with. It’s a very simple example that you can use as a base for more complex projects. Everything is explained in the README.md file.
Also, don’t forget to tell us about useful things you think you could do with Akka Clustering. Fork, use and improve the project, and suggest things we could do with this awesome technology.