I was a happy Java developer until I started coding in Scala.
In this post, I'll try to convince you to move to Scala by showing you how much easier it is to write concurrent and asynchronous programs in Scala than in Java.
Problem Statement - Word Counting
We will write a program that, given a directory, counts the words in each file it contains. The program does the following:
- Get all files in the directory
- Count the words in each of the files
- Return a Map from each file name to its word count.
Solution 1 - Java Serial Implementation
A very simple Java implementation looks like this.
Nothing fancy here: the method getWordCountForFiles(String path) gets the job done by processing the files one after another.
Solution 2 - Java Concurrent Implementation
Now, after this gets rejected in code review, we go back to our desk and use some Java concurrency constructs so that the solution scales. Every Java developer knows what to do: wrap the work in a Callable, hand it to our favourite ExecutorService, and count the words in all files in parallel.
So the getWordCountForFiles method now creates an executor service, invokes our callables and then collects the result from each one of them.
The code below should be easy to understand and relate to for every Java programmer. [I have tried to use Java 8 features to keep the comparison as honest as possible.]
Until last year I would have been happy with the above code. It has minimal boilerplate, it uses some of the fancy Java 8 features, and it now runs the tasks concurrently, so performance is good too.
But not today: Scala has spoilt me. I don't want to write so much code just to make something concurrent.
Solution 3 - Scala Futures FTW!
The central drive behind Scala is to make life easier and more productive for the developer. – Martin Odersky
Let me introduce you to Scala's Future.
A Future is an object that holds a value which, as its name suggests, may become available at a later time. It essentially acts as a proxy for an actual value that does not yet exist.
To execute something asynchronously, all I need to do is wrap it in a Future.
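import scala.concurrent.{ExecutionContext, Future}

// someFuture holds the result of the computation; T is the type of that result.
// Future { ... } needs an implicit ExecutionContext in scope to run the computation.
def someFuture[T](someExpensiveComputation: () => T)(implicit ec: ExecutionContext): Future[T] = Future {
  someExpensiveComputation()
}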
Yes, that is it. You can read more about how it works in detail here.
Now let's see how we would implement this word count program in Scala and how different it looks.
The above program is almost half the size of our Java program, and every one of its methods is asynchronous.
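A minimal runnable sketch of the same idea, assuming the global ExecutionContext. Here someExpensiveComputation is a hypothetical stand-in that simply sums the numbers 1 to n. Await is used only so the demo can print a value; real async code would keep composing Futures with map and flatMap instead of blocking.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureBasics {
  // Hypothetical stand-in for a slow computation: sums 1..n.
  def someExpensiveComputation(n: Int): Long = (1L to n.toLong).sum

  // Future { ... } schedules the body on the global ExecutionContext
  // and returns a Future[Long] immediately, without waiting for the result.
  def someFuture(n: Int): Future[Long] = Future {
    someExpensiveComputation(n)
  }

  // map registers a transformation that runs once the value is available.
  def doubled(n: Int): Future[Long] = someFuture(n).map(_ * 2)

  def main(args: Array[String]): Unit = {
    // Blocking here is for demonstration only.
    val result = Await.result(doubled(100), 5.seconds)
    println(result) // 10100
  }
}
```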
The method getWordCountForFiles uses a for-comprehension. On Futures, a for-comprehension desugars into flatMap and map calls, which register new callbacks (new things to do) that run when a future completes, i.e., when its computation is finished.
So it invokes getFilesList first, and once the result of getFilesList is available, processFiles is invoked with the list of files. getWordCountForFiles itself returns a Future right away, and that Future completes with the result of processFiles once it is available.
The methods getWordCount and processFiles are essentially the same as in our Java implementation, except that they are wrapped in a Future, which saves me from writing all the executor service code.
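A minimal sketch of how the four methods described above might fit together. This is an illustration under stated assumptions, not the original program: it assumes words are whitespace-separated, that only regular files directly inside the directory are counted, and that processFiles starts one Future per file and combines them with Future.sequence. The object name WordCountSketch is hypothetical.

```scala
import java.io.File
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.io.Source

object WordCountSketch {
  // Lists the regular files directly inside `path` (non-recursive).
  def getFilesList(path: String): Future[List[File]] = Future {
    Option(new File(path).listFiles()).map(_.toList).getOrElse(Nil).filter(_.isFile)
  }

  // Counts whitespace-separated words in a single file.
  def getWordCount(file: File): Future[(String, Int)] = Future {
    val source = Source.fromFile(file)
    try {
      val count = source.getLines().map(_.split("\\s+").count(_.nonEmpty)).sum
      file.getName -> count
    } finally source.close()
  }

  // Starts one Future per file, then gathers their results into a single Map.
  def processFiles(files: List[File]): Future[Map[String, Int]] =
    Future.sequence(files.map(getWordCount)).map(_.toMap)

  // The for-comprehension runs processFiles only after getFilesList completes.
  def getWordCountForFiles(path: String): Future[Map[String, Int]] =
    for {
      files  <- getFilesList(path)
      counts <- processFiles(files)
    } yield counts
}
```

Future.sequence turns a List[Future[(String, Int)]] into a Future[List[(String, Int)]], so the per-file counts run in parallel on the global pool and are combined only once all of them finish. If reading any file fails, the combined Future fails as well.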
We just need to specify an ExecutionContext, an abstraction over a thread pool that is responsible for executing all the tasks submitted to it. Notice the import ExecutionContext.Implicits.global: with it, we use the default global execution context available in the Scala standard library.
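The global context is a sensible default, but an ExecutionContext can also be built from any Java ExecutorService, which is handy when you want a dedicated pool. A minimal sketch, assuming a fixed pool of 4 threads (an arbitrary choice); the object CustomContext and its methods are hypothetical:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object CustomContext {
  // A dedicated pool of 4 threads instead of the shared global pool.
  private val pool = Executors.newFixedThreadPool(4)

  // Futures created where this implicit is in scope run on `pool`.
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Reports which thread actually executed the Future's body.
  def threadName(): Future[String] = Future(Thread.currentThread().getName)

  // The pool's threads are non-daemon, so shut it down when done
  // or the JVM will not exit.
  def shutdown(): Unit = pool.shutdown()
}
```

With the default thread factory, the reported name looks like pool-1-thread-1, which confirms the work ran on the dedicated pool rather than the global one.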
Isn’t this so much better than the Java implementation?
Testing Async programs
If you are happy with the Scala code, you may be wondering how to write test cases for such async code. Don't worry, Scala has you covered, and it's very easy.
We use the ScalaTest library. Its ScalaFutures trait provides a whenReady construct, which can be used as below to test the program we just wrote.
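import org.scalatest.{MustMatchers, WordSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Millis, Seconds, Span}

class WordCountWithFutureSpec extends WordSpec with MustMatchers with ScalaFutures {

  // whenReady waits only 150 ms by default; counting words in large files needs longer.
  implicit override val patienceConfig: PatienceConfig =
    PatienceConfig(timeout = Span(10, Seconds), interval = Span(50, Millis))

  // Notice how readable the tests are.
  "A word count helper" should {
    "return correct number of files and the word count" in {
      val resultFuture = WordCountWithFuture.getFileCount("src/main/resources/")
      whenReady(resultFuture) { result =>
        result.size mustEqual 2
        result.get("File1.txt") mustBe Some(6480000)
        result.get("File2.txt") mustBe Some(1000000)
      }
    }
  }
}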
Code
Both the Java and the Scala code, along with their tests, can be found here -
Actor Model
The Actor Model provides a higher level of abstraction for writing concurrent and distributed systems. It relieves the developer of having to deal with explicit locking and thread management, making it easier to write correct concurrent and parallel systems.
In my next post, I implement the same program with the actor model using Akka, and we will see how much easier concurrency can get.