Resources are good, aren't they? Too bad they never seem to be enough: as soon as we get them, we immediately find a way to exhaust them all. That's an interesting side effect of human ingenuity and an amusing Catch-22.
Some resources are shared by choice, some have to be shared (how often have you wished that the roads were yours alone, with nobody else in the way? Our planet is too small, or there are too many of us, for everyone to have their own streets, dude, give up!). All of them need to be used wisely, and that's not limited to IT (hi, eco-conscious folks!).
In IT, OSs and the low-level layers on top of them were born precisely as platforms to effectively share computational and storage resources between concurrent applications. The trend today is to have general-purpose OSs, but unfortunately optimal resource usage depends heavily on the characteristics of the workload. Some OSs are more flexible than others in being customized for specific usages (e.g. because they're lighter or more sophisticated, or because strategies can be plugged in or set through configuration), but in some cases their generic support is just not good enough, or a new feature would be needed for a specific application. In these cases, if the OS improvement is not going to happen in a timely manner, an additional middleware or runtime support layer might solve the problem.
These are interesting times. The unbelievably fast evolution of our software engineering culture continuously brings down barriers, costs and time required to build applications and networked computing is becoming more and more pervasive in our lives, just like an infinitely abundant resource users can tap into whenever and wherever they decide to.
As a side effect, any networked application has a potentially huge amount of data and concurrent users that it must be able to cope with in case of success. Problem is, networked applications have traditionally been a centralized affair, and an intensive one for our poor server too, as thin-client technologies such as the Web have gained ubiquitous popularity at the expense of rich-client ones. Too bad hardware can't keep up anymore so, simply put, Internet applications need to be able to scale more than ever.
What does "scalable application" mean though? Given a workload measured in some way, and assuming we have an acceptable Quality of Service (QoS) in terms of availability, latency and throughput for all functions in our application, being able to scale up effectively means retaining about the same QoS with n times the workload through an investment in hardware and off-the-shelf software infrastructure that is linear, i.e. about n times the initial one. The application code should be able to run unmodified, except perhaps for minor things like some configuration parameters; plus, application downtime during the infrastructure change should be zero or extremely small.
Today the computing power and storage of one application server and one database server will most likely be enough only for the very first part of a successful Internet application's life, no matter how big and powerful the iron is.
Scalability requires using every single machine extremely well and, most often, adding more. Interestingly, this step raises the design bar for server-centric application architectures: concurrency (i.e. the loss of computation linearity) is a reasonably well understood matter in the server arena (although its optimal exploitation is not always achieved), but distribution (i.e. the loss of shared memory) is much less so. Let alone data or code mobility.
Even more interestingly, though perhaps only for me and a few others: I have some background (yes, my lonely PhD year) in programming languages specifically designed to cope with these difficult aspects, and the time for their theoretical properties to be reified into actual tools might finally be about to come, gradually and incrementally of course.
Resources in IT are of two main kinds: computation (time) and data (space). In this first part I'll stay closer to computation-related technologies and touch on all the (IMHO) coolest technologies available for writing scalable applications. In the second part I'll briefly cover scalable storage solutions instead, i.e. technologies that specialize in persistent data.
1) Continuations and mobile agents
There are a few languages and runtimes around that support continuations, which basically allow a program's control flow to be suspended before thread termination, stored and then resumed on a new live thread (see my first post in the Scala 2.8 Continuations series).
This is especially useful because it makes it possible to avoid the event-driven programming model while still keeping OS thread usage to a minimum (and with it the related computational and memory costs, i.e. context switches, TLB and cache misses, and stack space respectively). So continuations belong to the single-machine concurrency exploitation area, but not so fast: there could be a use for them in distributed systems as well.
This is especially true for those languages that support serializable continuations and (guess what) Scala is one of them (all the more admirably so considering that neither the JVM nor the .NET runtime supports continuations: they're simply translated into Scala closures, which are serializable). The Swarm project's mission is to leverage Scala continuations to implement a full-featured transparent mobile agents system, i.e. one where code disguised as a continuation closure can move transparently to the remote data, not vice versa. Needless to say, it's as ambitious as it is interesting.
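To make the "suspend, store, resume on another thread" idea a bit more tangible, here's a minimal hand-written sketch in plain Scala. It does not use the Scala 2.8 continuations plugin: the continuation is simply an explicit closure, and the names `Step`, `Suspended`, `Done` and `ContinuationSketch` are made up for this illustration.

```scala
import java.util.concurrent.{Callable, Executors}

object ContinuationSketch {
  // A computation is either finished or suspended, with "the rest of the
  // program" captured as an ordinary closure (the continuation).
  sealed trait Step
  final case class Done(result: Int) extends Step
  final case class Suspended(resume: Int => Step) extends Step

  // Hypothetical workflow: wait for two inputs, then add them.
  // Each Suspended marks a point where a thread would otherwise block.
  def addTwoInputs: Step =
    Suspended(a => Suspended(b => Done(a + b)))

  // Feeds the inputs one at a time; every resumption runs on whichever
  // pool thread happens to be free, not on a thread owned by the workflow.
  def runWith(inputs: List[Int]): Int = {
    val pool = Executors.newFixedThreadPool(2)
    def loop(step: Step, rest: List[Int]): Int = (step, rest) match {
      case (Done(r), _) => r
      case (Suspended(k), in :: tail) =>
        val task = new Callable[Step] { def call(): Step = k(in) }
        loop(pool.submit(task).get(), tail)
      case (Suspended(_), Nil) => sys.error("suspended: no more input")
    }
    try loop(addTwoInputs, inputs) finally pool.shutdown()
  }
}
```

Calling `ContinuationSketch.runWith(List(3, 4))` resumes the stored closure twice and yields 7. Between the two inputs no thread is held by the workflow, only a closure, and a closure is just a value that could in principle be serialized and shipped elsewhere.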
2) Inversion of control AKA event-processing AKA poor man's high-scalability programming model
I have already touched on the subject of event-based programming in my first post about Scala 2.8 Continuations. This is the way networked servers, and generally anyone lacking a more sophisticated alternative, usually implement high-scalability applications: we register a piece of code that will react to specific events like I/O (e.g. network) that would otherwise keep a heavyweight thread alive and with eyes wide open in the OS process, wasting precious and scarce resources. Unfortunately this means that we have to give up any computation structure, as our software becomes a collection of uncontextualized reaction procedures; as an alternative, we have to put in extra effort and use bookkeeping data structures to encode, remember and then decode the computational context (through, say, a Java POJO with a bunch of flags and data).
Java NIO, Apache MINA and Netty all belong to this poor man's toolbox, at different levels of abstraction.
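Here's a small sketch of what that bookkeeping looks like in practice. It doesn't use NIO, MINA or Netty at all: `onLine` stands in for a callback that some hypothetical event loop would invoke for every line received on a connection, and `Session` is the flags-and-data object that remembers where each conversation was. All the names are made up for the example.

```scala
import scala.collection.mutable

object EventDrivenSketch {
  // Bookkeeping object: everything a blocking thread would simply keep on
  // its stack has to be remembered here between one event and the next.
  final class Session {
    var nameReceived: Boolean = false
    var name: String = ""
  }

  private val sessions = mutable.Map.empty[Int, Session]

  // Reaction procedure for a toy protocol (name first, then age).
  // Returns the reply to send back on connection `connId`.
  def onLine(connId: Int, line: String): String = {
    val s = sessions.getOrElseUpdate(connId, new Session)
    if (!s.nameReceived) {
      // First event on this connection: store the name, ask for the age.
      s.name = line.trim
      s.nameReceived = true
      "age?"
    } else {
      // Second event: decode the remembered context and finish the session.
      sessions.remove(connId)
      "hello " + s.name + ", " + line.trim
    }
  }
}
```

Events for different connections can be interleaved freely on a single thread, but the two-step conversation is no longer visible as straight-line code: it lives in the `nameReceived` flag.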
3) Immutable message-passing programming model
This software engineering culture could perhaps be considered a more concrete cousin of the research that went on about name-passing calculi (like the Pi Calculus). It tends to abolish memory and synchronisation primitives and to model concurrent and distributed computation only through communication of immutable values between (light- or heavy-weight) threads AKA agents AKA actors.
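Stripped of any framework, the model can be sketched with nothing more than a thread, a queue and a few immutable case classes. This is a hand-rolled illustration, not how Erlang or any actor library implements it; `Counter`, `Add`, `Get` and `Stop` are invented names.

```scala
import java.util.concurrent.LinkedBlockingQueue

object MessagePassingSketch {
  // Messages are immutable values. The only mutable thing that travels is
  // the reply queue inside Get, which acts as a return address.
  sealed trait Msg
  final case class Add(n: Int) extends Msg
  final case class Get(replyTo: LinkedBlockingQueue[Int]) extends Msg
  case object Stop extends Msg

  // A poor man's actor: one thread, one mailbox, private state.
  final class Counter {
    val mailbox = new LinkedBlockingQueue[Msg]()
    private val thread = new Thread(new Runnable {
      def run(): Unit = {
        var total = 0 // only ever touched by this thread, so no locks
        var running = true
        while (running) mailbox.take() match {
          case Add(n)       => total += n
          case Get(replyTo) => replyTo.put(total)
          case Stop         => running = false
        }
      }
    })
    thread.start()
    def join(): Unit = thread.join()
  }

  def demo(): Int = {
    val counter = new Counter
    val reply = new LinkedBlockingQueue[Int]()
    (1 to 10).foreach(i => counter.mailbox.put(Add(i)))
    counter.mailbox.put(Get(reply)) // processed after all the Adds (FIFO)
    val result = reply.take()
    counter.mailbox.put(Stop)
    counter.join()
    result
  }
}
```

`MessagePassingSketch.demo()` returns 55. The counter's state is never shared, so there is nothing to synchronize on; the price is one heavyweight thread per "actor", which is exactly what the lightweight implementations below try to avoid.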
a) Erlang: a dynamic functional programming language created initially by Ericsson (and later quietly but very generously released as open source) to build extremely reliable telecom platforms (even five-nines uptime, i.e. 99.999%). Erlang is a value-building, data-transformation-oriented language: data is immutable and variables are single-assignment, so there's no concept of "variable modification". Functions that don't touch the OS yield the same outputs for the same inputs, as they don't depend on internal state, and concurrency is message-passing only, which can be made network-transparent if needed. In this very regular, maths-like and stateless world, traditional concurrency and distribution problems about state transfer and synchronization are simply avoided.
The Erlang VM has a lot of very desirable technological features we'd all really, really like to see implemented in the JVM yesterday, please: preemptive green (lightweight) threads, hot code replacement for whole modules with upgrade rollback support, network transparency for message passing, and monitoring facilities.
The OTP library is quite notable too as it provides combinators for agents that allow their lifecycle to be managed easily and in very powerful and flexible ways.
Unfortunately Erlang still suffers from a few embarrassing shortcomings due to its origins as a telecom platform: binary protocol handling is as easy as it could ever get but, on the other hand, strings are really linked lists of integers, so forget about performance in any string-intensive task (very common nowadays: think of web template rendering, or data parsing / externalization / processing in textual formats like HTML, XML, JSON or YAML).
Purity with no escape could also get in the way even of very side-effect-aware people just trying to build code that is "functional outside, imperative inside" but this is perhaps a minor problem.
b) Scala Actors: Scala actors are similar to Erlang's support for message-passing concurrency. They're not part of the language but are implemented as a Scala DSL and shipped with the standard library of the language. The interesting thing about them is that they implement a form of cooperative multithreading based on tasks that are effectively closures scheduled to be run by the library itself, so they could all be executed in an interleaved fashion on exactly one OS thread.
This of course becomes impossible when an actor needs to call a blocking (e.g. traditional I/O) operation, in which case the whole OS thread enters a sleep state. For this reason the library launches a controller thread that runs actors on a pool of OS threads whose size is initially the optimal one, i.e. the number of processor cores available to the OS. The library's messaging calls record the time of the last activity in a static field; if it is too old, the library assumes that the tasks on the thread pool are stuck in blocking operations and resizes the pool accordingly.
Please note that the writable-shared-memory-with-locks Java concurrency model is still fully available in Scala (and unfortunately needs to be, because of Java interoperability), so nothing prevents you from messing with it (except your wisdom, of course). Also, nothing prevents you from writing actor-based deadlocks as big as you wish, so that all OS threads available to the process are consumed and the VM starts throwing nice cryptic OOM errors. Erlang solves this problem by encapsulating all blocking calls within a dedicated actor-based I/O server using a dedicated thread pool (this architecture is called I/O ports). A similar server could well be implemented for the JVM, but some code (a library without you knowing, in the worst case) could always resort to calling blocking JDK operations directly, so its usefulness (i.e. safety) would be strongly limited.
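The cooperative scheduling idea itself fits in a few lines. The following is only a toy illustration of "tasks are closures run in an interleaved fashion on one OS thread", not the actual scheduler of the Scala actors library; `Scheduler` and `worker` are names invented for the sketch.

```scala
import scala.collection.mutable

object CooperativeSketch {
  // Runs tasks (plain closures) one at a time on the calling thread.
  final class Scheduler {
    private val ready = mutable.Queue.empty[() => Unit]
    def schedule(task: () => Unit): Unit = ready.enqueue(task)
    def run(): Unit = while (ready.nonEmpty) ready.dequeue()()
  }

  // Two workers log `steps` lines each; after every line a worker yields
  // by scheduling "the rest of itself" as a new closure.
  def demo(steps: Int): List[String] = {
    val log = mutable.ListBuffer.empty[String]
    val sched = new Scheduler

    def worker(name: String, i: Int): Unit =
      if (i < steps) {
        log += name + "-" + i
        sched.schedule(() => worker(name, i + 1))
      }

    sched.schedule(() => worker("ping", 0))
    sched.schedule(() => worker("pong", 0))
    sched.run()
    log.toList
  }
}
```

`CooperativeSketch.demo(2)` yields `List("ping-0", "pong-0", "ping-1", "pong-1")`: the two workers alternate even though there is exactly one thread. It also shows the weak spot discussed above: if a closure blocked instead of returning quickly, every other task in the queue would stall with it.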
Here's a very basic sample that mimics coroutines; run it and observe different actors being run on the same thread:
    import scala.actors.Actor
    import scala.actors.Actor._

    // The messages carry no data, so they are case objects
    // (case classes without a parameter list are deprecated).
    case object Ping
    case object Pong
    case object Stop

    // Logging helper that prefixes each line with the current thread name.
    trait L {
      def l(s: String) = println("[" + Thread.currentThread().getName() + "] " + s)
    }

    object PingActor extends Actor with L {
      def act = {
        var left = 100
        l("Ping: sending Pong")
        PongActor ! Pong
        loop {
          // react (unlike receive) does not hold on to a thread while waiting
          react {
            case Ping => {
              if (left == 0) {
                l("Ping: left = 0, sending Stop and exiting")
                sender ! Stop
                exit()
              } else {
                l("Ping: left = " + left + ", sending Pong")
                sender ! Pong
                left -= 1
              }
            }
          }
        }
      }
    }

    object PongActor extends Actor with L {
      def act = {
        loop {
          react {
            case Pong => {
              l("Pong: sending Ping")
              sender ! Ping
            }
            case Stop => {
              l("Pong: exiting")
              exit()
            }
          }
        }
      }
    }

    // Scala 2.8 entry point; from Scala 2.9 onwards extend App instead.
    object PingPong extends Application with L {
      l("Starting PingPong")
      PongActor.start
      PingActor.start
      l("Terminating PingPong")
    }

AKKA is an amazing Java and Scala library by Jonas Bonér that brings actor programming to an enterprise-ready level; it includes the following:
- Remotable actors with hot-replace support for the message-processing partial function
- Configurable serialization format for messages (e.g. JSON, binary, ...)
- Dispatchers to which actors can be assigned with configurable threading models
- Erlang's OTP-like actors lifecycle hierarchies
- ACID or BASE persistence for Scala objects (MongoDB or Cassandra at the moment)
- Software Transactional Memory (STM) support
- REST-enabling annotations
- JMX monitoring
Jonas Bonér has also presented at JavaOne 2009 in a very clear way the differences and applicability of three main concurrency models: actors, STM and dataflow concurrency (for which he has also written a Scala DSL).
Naggati is quite interesting as well as it bridges Mina and Actors to enable writing NIO servers in a very simple way.
There are other actor frameworks for Java around; have a look at this excellent post on Salmon Run: More Java Actor Frameworks Compared.
As you can see, actor frameworks don't offer a fully transparent scaling solution but provide very powerful and high-level building blocks to construct any such system, transparent or not.
As a last note on the topic, the Lift web framework leverages Scala lightweight actors to provide Comet-style widgets, i.e. bundles of HTML and JavaScript/jQuery code that perform AJAX requests (actor-backed on the server) polling for display updates.
4) Distributed runtimes AKA Runtime clustering
There's a "school of thought" believing that the runtime platform itself should support configurable, transparent distribution of computation among several servers as if they were a single machine (i.e. clustering). That's a systems approach to the problem, and its solution truly is the JVM runtime man's dream come true: Terracotta.
Unfortunately the JVM is not such a distributed runtime platform by itself, so the Terracotta guys decided that they'd leverage the platform-provided hooks to build one. Terracotta is essentially two things:
- A bytecode instrumenter that rewrites memory-related and synchronization-related JVM bytecodes into calls that, respectively, propagate field changes to the main memory of all servers in the cluster that hold a reference to the changed object(s), and acquire and release object locks cluster-wide. Basically, all servers see a unified memory where configured static fields (and the objects reachable from them) are kept in sync, object identity included (no == breakage), with high-performance change propagation policies that don't even require the objects to be serializable.
- A Java heap paging mechanism that provides virtually unlimited memory space by using persistent storage.
This dream tool basically turns an expandable set of servers into one powerful Java machine with unlimited memory, which you can use as you'd normally use it but without the need to store anything in DBs or other explicit persistent stores or, for example, as a high-performance distributed second-level cache.
It's an all-encompassing solution (both computation and storage scaling) but of course the blessing can become the curse as well: clustering means transparent scaling and distribution, so it brings simplicity and platform-level configurability but also forcibly disallows far-distance distribution, application-aware (such as location-based) balancing and scaling and generally application-aware scaling logic.
5) Batch-processing huge amounts of distributed data
If you, not unlike Google, need to batch-process or query huge amounts of data, then the Google distributed batch-processing infrastructure may be the best tool in your box. There are open-source implementations of this computing environment, such as Hadoop.
Google uses a distributed file-system with high replication levels and ability to manage petabytes of data, plus job and task management software that tries hard to minimize costly data moves on the network. Instead, computation is preferably run in parallel by each node (and/or close neighbors) where the data resides. Google's computation model is called MapReduce as it's basically a 2-stage process:
- In the Map or Transform stage(s), nodes process in parallel all the data transforming each data record into a new one, often with a different structure;
- In the Reduce or Aggregate stage(s), nodes synthesize in parallel a set of transformed records they hold into a typically smaller set of new and different records; the process is possibly repeated recursively on the set of synthetic records until even fewer, or only one, are yielded as a result.
Many data analysis tasks can be encoded as one or several MapReduce jobs, so this computing infrastructure is actually quite powerful and flexible, yet jobs can be specified lightly enough that they can move to wherever the data is (instead of the opposite). You can think of this architecture as a simplified distributed code mobility platform specifically tailored to huge (data-wise) batch-processing jobs.
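The classic word count shows the two stages nicely. The sketch below runs on in-memory Scala collections only: it is neither Hadoop's nor Google's API, and each inner `Seq` merely stands in for the data held by one node. All the names are made up for the illustration.

```scala
object MapReduceSketch {
  // Map stage: turn one input record (a line of text) into (word, 1) records.
  def mapLine(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(w => (w, 1))

  // Reduce stage: merge two partial counts into one. The operation is
  // associative, so partial results can be combined in any grouping.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (w, n)) => acc.updated(w, acc.getOrElse(w, 0) + n) }

  def wordCount(partitions: Seq[Seq[String]]): Map[String, Int] = {
    // Every "node" maps and pre-aggregates its own lines locally...
    val perNode: Seq[Map[String, Int]] = partitions.map { lines =>
      lines.flatMap(mapLine).groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
    }
    // ...and only the small per-node summaries are combined at the end.
    perNode.foldLeft(Map.empty[String, Int])(merge)
  }
}
```

For example, `MapReduceSketch.wordCount(Seq(Seq("a b a"), Seq("b c", "a")))` gives `Map("a" -> 3, "b" -> 2, "c" -> 1)`. What would travel over the network in a real cluster is the tiny job description (`mapLine` and `merge`) and the per-node summaries, not the raw lines.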