Wednesday 30 September 2009

Scalability and tools to write scalable applications (part 1)


Resources are good, aren't they? Too bad they never seem to be enough, because as soon as we get them we immediately find the way to exhaust them all. That's an interesting side-effect of human ingenium and an amusing catch 22 situation.
Some resources are then shared, some need to be shared (how often did you make a wish that roads and streets could be only yours and that there's nobody else in the way? Our planet is too small, or there are too many of us, for everyone to have his own streets dude, give up!). All of them need to be used wisely and that's not limited to IT (hi, eco-sensible folks!).
In IT, OSs and low-level layers on top of them were born precisely as platforms to effectively share computational and storage resources between concurrent applications. The trend today is to have general-purpose OSs but unfortunately optimal resources usage highly depends on the characteristics of the workload. Some OSs are more flexible than others in being customized for specific usages (e.g. because they're lighter or more sophisticated or because strategies can be plugged in or through configuration) but in some cases their generic support is just not good enough, or a new additional feature would be meeded for a specific application usage. In these cases, if OS improvement is not going to happen timely, an additional middleware or runtime support layer might solve the problem.

These are interesting times. The unbelievably fast evolution of our software engineering culture continuously brings down barriers, costs and time required to build applications and networked computing is becoming more and more pervasive in our lives, just like an infinitely abundant resource users can tap into whenever and wherever they decide to.
As a side effect, any networked application has a potentially huge amount of data and concurrent users that it must be able to cope with in case of success. Problem is, networked applications have traditionally been a centralized affair and an intensive one too for our poor server, as thin client technologies such as the Web have gained ubiquitous popularity at the expense of rich client ones. Too bad hardware can't keep up anymore so, simply put, Internet applications need to be able scale more than ever.

What does scalable application mean though? Given a workload measured in some way and assuming we have an acceptable Quality of Service (QoS) in terms of availability, latency and throughput for all functions in our application, being able to scale up effectively means retaining about the same QoS with n times the workload through an investment in hardware and off-the-shelf software infrastracture that is linear, i.e. about n times the initial one. The application code should be able to run unmodified, except perhaps for minor things like some configuration parameters; plus, application downtime during the infrastructure change should be 0 or extremely small.

Today one application and one database server's computing power and storage are most likely not going to be enough but for the very first lifetime part of successful Internet-enabled applications, no matter how powerful and big the iron is.
Scalability requires using extremely well every single machine and, most often, adding more. Interestingly, this step raises the server-centric applications architecture design bar: concurrency (i.e. the loss of computation linearity) is a reasonably well understood matter in the server arena (although its optimal exploitation is not always achieved) but distribution (i.e. the loss of shared memory) is much less. Let alone data or code mobility.
Even more interestingly, this time for me and a few others alone perhaps, as I have some background (yes, my lonely PhD year) about programming languages specifically designed to cope with these difficult aspects, time for a reification of their theoretical properties into actual tools might be, gradually and incrementally of course but finally, just about to come.

Resources in IT are of two main kinds: computation (time) and data (space). In this first part I'll stay closer to computation-related technologies and touch all the (IMHO) coolest technologies available around to write scalable applications. In the second part instead I'll cover briefly scalable storage solutions, so technologies that specialize in persistent data.


1) Continuations and mobile agents

There are a few languages and runtimes around that support continuations, which basically allow a program's control flow to be suspended before thread termination, stored and then resumed on a new live thread (see my first post in the Scala 2.8 Continuations series).
This is especially useful because it allows to avoid the event-driven programming model while still retaining minimum OS thread usage (so related computational and memory resources like context switch / TLB / cache miss times and stack space resp.), so it belongs to the single-machine concurrency eploitation area, but not so fast as there could be an usage for them in distributed system as well.
This is especially true for those languages that support serializable continuations and (guess what) Scala is one of them (and more admirably so considering nor JVM neither .NET runtimes support continuations: they're simply translated as Scala closures which are serializable): the Swarm project's mission is to leverage Scala continuations to implement a full-featured transparent mobile agents system, i.e. one where code disguised as a continuation closure can move transparently to the remote data, not vice-versa. Needless to say, it's as ambitious as it is interesting.


2) Inversion of control AKA event-processing AKA poor man's high-scalability programming model

I have touched the subject of event-based programming in my first post about Scala 2.8 Continuations already. This is the way networked servers, and generally anyone lacking a more sophisticated alternative, usually implements high-scalability applications: we register a piece of code that will react to specific events like I/O (e.g. network) that would otherwise keep an heavyweight thread alive and with eyes wide open in the OS process, wasting precious and scarce resources. Unfortunately this means that we have to give up any computation structure as our software becomes a collection of uncontextualized reaction procedures; as an alternative, we have to put extra effort and use bookkeeping data structures to encode, remember and then decode the computational context (through, say, a java POJO with a bunch of flags and data).
Java NIO, Apache Mina and Netty all belong to this poor man's solutions toolbox but with different abstraction levels.


3) Immutable message-passing programming model

This software engineering culture could perhaps be considered a more concrete cousin of the research that went on about name-passing calculi (like the Pi Calculus). It tends to abolish memory and synchronisation primitives and to model concurrent and distributed computation only through communication of immutable values between (light- or heavy-weight) threads AKA agents AKA actors.

a) Erlang: a dynamic, purely functional programming language created initially by Ericsson (and later silently but very generously released in the Open) to build extremely reliable telecom platfoms (even uptime 5 nines or 99.99999%). Being a pure functional language, Erlang is a value-building, datatransform-oriented language and there's no concept of "variable modification" or "state" altogether. OS-unrelated functions yield same outputs for same inputs as they don't depend on internal state and concurrency is only message-passing, which can be made network-transparent if needed. In this very regular, maths-like and stateless world traditional concurrency and distribution problems about state transfer and synchronization are simply avoided.
The Erlang VM has a lot of very desirable technological features we all would really really like to see implemented in the JVM yesterday, please. Preemptive green (lightweight) threads, full modules hot code replacement with upgrade rollback spport, network transparency for message passing, monitoring facilities.
The OTP library is quite notable too as it provides combinators for agents that allow their lifecycle to be managed easily and in very powerful and flexible ways.
Unfortunately Erlang still suffers from a few embarassing shortcomings due to its origins as a telecom platform: binary protocol handling is as easy as it could ever get, on the other hand strings are really linked lists of 32-bit integers, so forget about performance in any string-intensive task (very common nowadays, think of web template rendering or data parsing / externalization / processing in textual formats like HTML, XML, JSON or YAML).
Purity with no escape could also get in the way even of very side-effect-aware people just trying to build code that is "functional outside, imperative inside" but this is perhaps a minor problem.

b) Scala Actors: Scala actors are similar to Erlang's support for message-passing concurrency. They're not part of the language but are implemented as a Scala DSL and shipped with the standard library of the language. The interesting thing about them is that they implement a form of cooperative multithreading based on tasks that are effectively closures scheduled to be run by the library itself, so they could all be executed in an interleaved fashion on exactly one OS thread.
This of course becomes impossible when an actor needs to call a blocking (e.g. traditional I/O) operation, in which case the whole OS thread enters a sleep state. For this reason the library launches a controller thread that runs actors on an pool of OS threads whose size is initially the optimal one, i.e. the number of processor cores available to the OS. The actors messaging calls of the library keep track of last activity in a static; if too old, it assumes that tasks on the thread pool are stuck on blocking thread ops and resizes the thread pool accordingly.
Please note that the writable-shared-memory-with-locks Java concurrency model is still fully available in Scala (and unfortunately needs to be because of Java interoperability), so nothing prevents you from messing with it (except your wisdom of course). Also, nothing prevents you from writing actors-based deadlocks as big as you wish, so that all OS threads available for the process are consumed and the VM starts getting nice cryptical OOM exceptions. Erlang solves this problem by encapsulating all blocking calls within a dedicated actor-based I/O server using a dedicated thread pool (this architecture is called I/O ports). A similar server could be well implemented for the JVM but some code (a library without you knowing, in the worst case) could always resort to calling blocking JDK ops directly, so its usefulness (i.e. safety) would be strongly limited.
Here's a very basic sample that mimics coroutines; run it and observe different actors being run on the same thread:

import scala.actors.Actor
import scala.actors.Actor._

case class Ping
case class Pong
case class Stop

trait L {
def l( s : String ) = println( "[" + Thread.currentThread().getName() + "] " + s )
}

object PingActor extends Actor with L {
def act = {
var left = 100;
l( "Ping: sending Pong" )
PongActor ! Pong;
loop {
react {
case Ping => {
if (left == 0) {
l( "Ping: left = 0, sending Stop and exiting" )
sender ! Stop
exit()
} else {
l( "Ping: left = " + left + ", sending Pong" )
sender ! Pong
left -= 1
}
}
}
}
}
}

object PongActor extends Actor with L {
def act = {
loop {
react {
case Pong => {
l( "Pong: sending Ping" )
sender ! Ping
}
case Stop => {
l( "Pong: exiting" )
exit()
}
}
}
}
}

object PingPong extends Application with L {
l( "Starting PingPong" )
PongActor.start
PingActor.start
l( "Terminating PingPong" )
}
AKKA is an amazing Java and Scala library by Jonas Bonér that brings actors programming at an enterprise-ready level; it includes the following:

- Remotable actors with hot-replace support for the message-processing partial function
- Configurable serialization format for messages (e.g. JSON, binary, ...)
- Dispatchers to which actors can be assigned with configurable threading models
- Erlang's OTP-like actors lifecycle hierarchies
- ACID or BASE scala objects persistence (MongoDB or Cassandra at the moment)
- Software Transactional Memory (STM) support
- REST-enabling annotations
- JMX monitoring

Jonas Bonér has also presented at JavaOne 2009 in a very clear way the differences and applicability of three main concurrency models: actors, STM and dataflow concurrency (for which he has also written a Scala DSL).

Naggati is quite interesting as well as it bridges Mina and Actors to enable writing NIO servers in a very simple way.

There are other actor frameworks for Java around, have a look at this excellent post on Salmon Run: More Java Actor Frameworks Compared.

As you can see, actor frameworks don't offer a fully transparent scaling solution but provide very powerful and high-level building blocks to construct any such system, transparent or not.

As a last note on the topic, the Lift web framework leverages Scala lightweight actors to provide Comet-style widgets, i.e. bundles of HTML and JavaScript/JQuery code that perform AJAX requests (actors-backed on the server) polling for display updates.


4) Distributed runtimes AKA Runtime clustering

There's a "school of thought" believing that the runtime platform itself should support configurable transparent computation distribution among several servers as it they were a single machine (i.e. clustering). That's a systems approach to solve the problem and its solution truly is the JVM runtime man's dream coming true: Terracotta.

Unfortunately the JVM is not such a distributed runtime platform by itself, so the Terracotta guys decided that they'd leverage the platform-provided hooks to build one. Terracotta is essentially 2 things:

- A bytecode instrumenter that will rewrite memory-related and synchronization-related JVM bytecodes into calls that, resp., will propagate field changes to the main memory of all servers in the cluster that hold reference to the changed object(s) and will acquire and release object locks cluster-wise. Basically, all servers will see a unified memory, where configured static fields (and objects reachable from them) are kept in-sync, including object identity, (no == breakage) with high-performance change propagation policies that don't even require the objects to be serializable.

- A Java heap paging mechanism that provides virtually unlimited memory space by using persistent storage.

This dream tool basically turns an expandable set of servers into one powerful Java machine with unlimited memory, which you can use as you'd normally use it but without the need to store anything in DBs or other explicit persistent stores or, for example, as a high-performance distributed second-level cache.

It's an all-encompassing solution (both computation and storage scaling) but of course the blessing can become the curse as well: clustering means transparent scaling and distribution, so it brings simplicity and platform-level configurability but also forcibly disallows far-distance distribution, application-aware (such as location-based) balancing and scaling and generally application-aware scaling logic.


5) Batch-processing huge amounts of distributed data

If you, not unlike Google, need to batch-process or query huge amounts of data, then the Google distributed batch-processing infrastracture may be the best tool in your box. There are open-source implementations of this computing environment such as Hadoop.
Google uses a distributed file-system with high replication levels and ability to manage petabytes of data, plus job and task management software that tries hard to minimize costly data moves on the network. Instead, computation is preferably run in parallel by each node (and/or close neighbors) where the data resides. Google's computation model is called MapReduce as it's basically a 2-stage process:

- In the Map or Transform stage(s), nodes process in parallel all the data transforming each data record into a new one, often with a different structure;

- In the Reduce or Aggregate stage(s), nodes synthetize in parallel a set of transformed records they hold into a typically smaller set of new and different records; the process is possibly repeated recursively on the set of synthetic records until even less, or only one, is yielded as a result.

Many data analysis tasks can be encoded as one or several MapReduce jobs, so this computing infrastructure is actually quite powerful and flexible enough, yet the jobs can be lightly specified enough that they can move wherever the data is (instead of the opposite). You can think of this architecture as a simplified distributed code mobility platform specifically tailored to huge (data-wise) batch-processing jobs.

Monday 28 September 2009

Control flows in Heaven with Scala 2.8 Continuations - Part 3


This is going to be a short one. Let's just recall from my previous post in the Scala Continuations series that a shift call's return value is the continuation function's input value and both have type A. The shift call's type is always decorated (by the type inferrer plugin or by the programmer) with the @[B, C] type annotation, where B is the return value of the continuation function and C is the return value of the redirect function (and of the reset call).

This is where things really get tricky but don't be discouraged!

Let's now suppose we want to use the shift call twice in a single toplevelContinuationFunction body and on the same redirect function argument: this means that the continuation built by the first shift call will contain ONLY the code that will execute up to the next shift call, then will build another continuation and finally will transfer immediately control to the redirect function.
This is the turning point: when the latter returns, ending the second reset, it returns with a value of type C. BUT, its return point is actually the return of the first continuation function call as well, which has type B!

So, this brings us to the following consideration: in order to have multiple shift calls in the same reset block, B must equal C.

This experiment also makes even more apparent a couple of things: continuations "invert" the normal program control flow and the continuations plugin is very good at capturing continuations behaviour with small typechecking and code transformation extensions to the Scala type system and compiler.

Looking forward to seeing it in action in a final 2.8 release soon!


Thursday 17 September 2009

Java.next, the Communities and the Companies


I really believe that today's high-productivity languages and frameworks like Ruby, Python, Groovy, Scala, F#, Clojure, Rails, Django, Grails, Lift and others could change the software engineering landscape forever and a lot of that comes from the fact that they can now interoperate: now and more so in the future, developers will be polyglot and able to use the right tool for the right area / job even within the same project.
This is largely due to these languages being built on (or ported to) well-established runtime platforms like the JVM and Mono/.NET.

Of course it all depends on whether those platforms are going to evolve rapidly enough to accomodate their feature and performance needs; something is being done, like invokeDynamic, thanks to projects like the Da Vinci Machine AKA Multi-language VM; it still remains to be seen if platform-backing companies are going to use effectively this exceptional (largely FOSS community-provided) contribution.
I wonder, for example, why key JRuby developers had to choose to leave Sun for Engine Yard and why more of the MLVM extensions (like continuations for example) aren't being brought into the upcoming JDK 7.
I also wonder why Sun has chosen to compete with Flash in the thin-client rich-graphics RIA area by creating JavaFX (and this late, too). Don't get me wrong: it's a nice technology but why another language? They seem to have augmented Java with a few extensions for first-class functions and closures, DSL-enabled syntax and generally more of declarative, scripting-flavoured, functional- and data transformation-oriented programming constructs. The thing is: Scala and others already had all of the above and more: they could simply spare time and have written an applet to interpret one or several of them as scripting languages instead of creating a new one. And it could have been the occasion to officially show some support for them, as they really are the platform's future.
Could that have something to do with Oracle acquiring Sun? Oracle has a lot of interest in the Java platform being alive and well too as it has many products built on it, so it's perhaps not the case.

It's still a fact, though, that the platforms needs to evolve better and faster. Is that going to happen and how? Through Sun / Oracle support or though the community? Is the Java/JVM ecosystem at risk?

Time will tell and we'll see for sure sooner than later but I'll be glad to hear your thoughts through comments.

Monday 31 August 2009

Control flows in Heaven with Scala 2.8 continuations - Part 2


My previous post in the series covered the continuation concept, the Scala 2.8 support for it and how the control flows in a program using them.

Let's have a look now at how the continuation typechecking and compilation works as this will allow us to use Scala 2.8 continuations with non-Unit types. I'm not grounding my reasoning on the formal typing rules in the paper but on my experiments and understanding so there could well be errors in the following explanation (don't hesitate to let me know!). My goal, though, is to gain and give back just an intuitive and practical understanding of how to use continuations with non-Unit types and how to help the type inferrer when it can't guess the correct types.

Let's first assume that the type of the continuation function built by the shift call and passed back to the redirect function is ContinuationFunctionType = A => B for some types A and B. Then the shift and redirect call types are the following:
  • shift( redirectFunction : ContinuationFunctionType => C ) : A @cps[B, C]; please note that the redirect function can return a value of type C, i.e. belonging to a different type than the value returned by the continuation (B); also note that the return type of the shift call is A because it returns when the continuation function is invoked, and it returns precisely the continuation function's input.
  • reset( toplevelContinuationFunction : () => B @cps[B, C] ) : C; please note that toplevelContinuationFunction returns B @cps[B, C] and not A @cps[B, C]; the B return type is because this function is the end of the continuation body (which returns a value of type B) and the C type in second position inside the annotation is because, as we've seen from the shift call type, it represents the reset return type.
The typechecker can infer the type of the shift call and will try to trace it up (towards the root) in the call tree until it finds a corresponding reset call with compatible types. If it doesn't succeed, it will complain. Sometimes manual annotations can do the trick; if they don't work it means that there's something wrong in the code about the continuation types.

As usual, if it compiles then the continuation structure of your program will run ok (and, as usual, only if you have coded your intended logic correctly you'll get the result you wanted, as no typechecker can read your mind yet).

Here's a modified (and simplified as, for now, there's only one shift call involved) version of Example1.scala in my previous post using different A, B and C types than Unit. Specifically, A = Int, B = Unit and C = String. You can see that the above typing rules are satisfied by explicit types and annotations and, indeed, the program compiles and runs fine:

import scala.continuations._
import ControlContext._

object Example2 {

private def l( msg : String ) {
println( "[Thread '" + Thread.currentThread().getName() + "'] " + msg )
}

private def redirectFunction( cont : Int => Unit ) : String = {
l( "Entered redirectFunction, calling continuation with 1" )

/* 3 */

val r : Unit = /* <-- 10 */ cont( 1 )

/* 11 */

l( "Exiting redirectFunction, continuation called, got result " + r + ", returning ok" )

/* 12 */

"ok"
}

private def packContAndUseItIfNeeded() : Int @cps[Unit, String] = {
l( "Entered packContAndUseItIfNeeded, doing shift" )

/* 2 */

val r : Int @cps[Unit, String] = /* <-- 4 */ shift( redirectFunction )

/* 5 */

l(
"Exiting packContAndUseItIfNeeded, shift done, " +
"ACTUAL CONTINUATION RUNNING! Returning passed in value: " + r
)

/* 6 */

r
}

private def toplevelContinuationFunction() : Unit @cps[Unit, String]= {
l( "Entered reset and toplevelContinuationFunction, before shift" )

/* 1 */

val r1 : Int @cps[Unit, String] = /* <-- 7 */ packContAndUseItIfNeeded

/* 8 */

l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION RUNNING!"
)

l( "Exiting toplevelContinuationFunction and reset, returning ()" )

/* 9 */

()
}

def main( args: Array[String] ) : Unit = {
l( "Before reset" )

val x : String = /* <-- 13 */ reset( toplevelContinuationFunction )

/* 14 */

l( "After reset, terminating" )
}
}

As usual, read it and follow the flow through numbered comments, then just compile it and run it to see if logs match your understanding.

Types are not just used to check the program structure though, but to drive the cps-transform as well. In order to better understand the cps-transform, just use the plugin's compilation logging facility as in Rich Dougherty's post. Here's the result of cps-transforming the above program (with some bureocracy stripped out for easier reading):

object Example2 {
private def l(msg: String): Unit =
println("[Thread '" + Thread.currentThread().getName() + "'] " + msg);

private def redirectFunction(cont: (Int) => Unit): String = {
l( "Entered redirectFunction, calling continuation with 1" );
val r: Unit = cont.apply(1);
l( "Exiting redirectFunction, continuation called, got result " + r + ", returning ok" );
"ok"
};

private def packContAndUseItIfNeeded(): ControlContext[Int,Unit,String] = {
l( "Entered packContAndUseItIfNeeded, doing shift" );
ControlContext.shiftR[Int, Unit, String]( {
((cont: (Int) => Unit) => redirectFunction( cont ))
}).map[Int](((r: Int) => {
l(
"Exiting packContAndUseItIfNeeded, shift done, ACTUAL CONTINUATION RUNNING! " +
"Returning passed in value: " + r
);
r
}))
};

private def toplevelContinuationFunction(): ControlContext[Unit,Unit,String] = {
l("Entered reset and toplevelContinuationFunction, before shift");
packContAndUseItIfNeeded().map[Unit]( ((r1: Int) => {
l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION RUNNING!"
);
l( "Exiting toplevelContinuationFunction and reset, returning ()" );
()
}))
};

def main(args: Array[String]): Unit = {
l( "Before reset" );
val x: String = ControlContext.reset[Unit, String]( toplevelContinuationFunction() );
l( "After reset, terminating" )
}
}
  1. Any expression having been assigned a type of the form A @cps[B, C] by the programmer or by the type inferrer (so not only shift calls but sub-calls containing shift calls as well) will be compiled into a new ControlContext[A, B, C] object. This object stores 2 things:
    1. The closure itself, i.e. the rest of the execution (if the shift call happens inside a sub-method, the rest of the method's body; if it happens inside a toplevel continuation function passed to a reset, up to the end of it's body);
    2. A reference to the redirect function passed to the shift call.
    The essential feature of these objects is that they exibit monadic operations, i.e. they define methods that enable them to be chained. This is essentially what happens in the transformed toplevelContinuationFunction, when the return value of packContAndUseItIfNeeded (i.e. the continuation portion contained in that method after the shift call) is chained, a map calls, with another continuation one built out of the lines in toplevelContinuationFunction after the packContAndUseItIfNeeded call. You can see that, essentially, the cps-transform is about "packaging" code in chainable continuations instead of executing it directly (a task that, incidentally, is made easier because Scala has closures). You can also see that the number of continuations built is always the least possible (and that's why the cps transform is a selective one) as, after the last call to packContAndUseItIfNeeded, there's is only one continuation built out of 2 calls to l().
  2. reset is nothing more than a library function which call takes as an argument the chain of continuations just built, then simply starts executing the redirect function in the first continuation in the chain. The redirect function will then invoke the first continuation, which is able to hand control over to the next ones in the chain and execute up to the end of the toplevelContinuationFunction.
If you'd like to explore more about monads and understand how they relate to continuations, here's an amazing Monads are Elephants posts series by James Iry. You should also have a look at the definition of the cps-transform and of the Shift object in the original paper (also having a look at the type rules won't do any harm if you have the time to study them).

In the next (and last) post we'll explore a few variants of this sample that will enable you to
  1. Invoke more than once shift in the toplevel continuation function;
  2. Call the continuation on the result of another continuation call.

Sunday 30 August 2009

Care about the Data

Developing software is a hard and time-consuming task. We can certainly do much more in much less time than when assembly was the only option, but expectations and needs about software have grown proportionally, and perhaps even faster, than software development processes and tools themselves.
For this reason, improvements in productivity and cost-effectiveness of software development never seem to be enough and it's even more the case because smaller and smaller companies require, and expect to be able to afford with smaller and smaller budgets, the benefits that software and correct information management processes can bring to them in terms of effectiveness and competitive edge.
As a consequence, software development companies try hard to find ways to develop software in less time and with less (and less skilled) people. There are several ways to try to reach this goal, here are a few:
  • Stimulating staff growth, experience, speed and overall professional roundness
  • Studying, evaluating and adopting increasingly sophisticated tools (more and more expressive languages, frameworks, libraries; more mature development management tools and processes; ...), i.e. technology and expertise advance;
  • Relying on an asset of already built libraries or application templates for common cases;
  • Automation and software generation;
  • Of course, last but not least (and most frequent sometimes), trying to sell low quality, fast-hacked solutions as if they were much better than they are, relying on the substantial ignorance of the customer.
Let alone the fact that the latter is clearly a blind and short-termed strategy, there is some risk of quality loss in the other ones as well. This risk is clearly stimulated by software markets that are shaped out of the needs of companies short on budget and with scarce IT awareness (typically frequent, in my experience, with small oens just starting to let in IT) that will easily end up just choosing the cheapest solution.
Quality loss is very dangerous in short, medium and long terms and in every area but I'm perhaps most sensible about the data storage area. Why? Simply because:
  • Understandable and accessible data is easily the most valuable asset for many companies;
  • Application data survives the application itself and even application providers; in fact, any information can be considered legacy as soon as it is stored for use in a production environment;
  • Changing and sometimes even evolving persisted data structures is hard (more so if they're not yours);
  • Most often than not, application data must be easily accessed directly through storage tools usage by the customer itself for maintenance, fixes, queries, OLAP and BI. As there's always little time for docs and for instruction, the more the data structure is clean, clear and open to 3rd-party understanding, the better for anyone;
  • Even if all of the above is satisfied, you'll find yourself fiddling with customer data directly in the storage more often than you might think (and this means daily in most cases) because of bugs and planned / unplanned support or upgrades. You'd soon regret a poorly designed storage for this reason alone.
Another lesson to be learnt is: develop schema and data migration strategies as soon as possible in the development process. You'll soon be happy of having thought about it when you'll find you need a different storage solution for whatever reason (new performance requirements, new data access patterns, ...).

So, for the health of your business (and of yourself) here's my warmest and solicit suggestion: Take Big Care of the Data.

Thursday 27 August 2009

Control flows in Heaven with Scala 2.8 continuations - Part 1

I'm no expert by any means but I'm very interested in high-performance and high-scalability topics (who hasn't broken his neck on those issues too late in the development process?) and continuations are a very promising language concept in this area. They allow to suspend a program control flow and then to resume it when (or if ever) needed.
Yes, that's useful as, for example, you could write a program that processes an HTTP request, then renders a page to gather more info and "freezes" (in the sense that it suspends and releases its thread) while it waits for an answer, then resumes execution in a new request processing thread. You could say: "well, that's more easily done by blocking the thread until an answer arrives or by writing event-style code as any sane framework will force you to do".
The thing is:
  1. Event-style code can be difficult to read and mantain when it doesn't match your logic
  2. OS threads are an expensive resource, as many platforms won't let you have more than a few thousands alive at the same time in a system (you thought many more, didn't you?), so having one just to wait for an answer is an highly suboptimal approach as far as scalability is concerned
See this Apache Cocoon page for a more thorough explanation of this use case. You can easily see, anyway, that any concurrent request-processing software piece could benefit from them.

On a less immediately pragmatic side, continuations can be used to unify seemingly different control flow structures like exceptions or coroutines (which, as an aside, are very useful to write highly-scalable concurrent programs as they allow cooperative multi-threading using only one OS thread).

The Scala 2.8 language will feature a compiler plugin and a library-based DSL for composable continuations (there's a preview post at Scala's website and another very good post by Rich Dougherty here and more excellent continuation-related ones follow in his blog; the full-blown original paper is here).
Continuations can be thought of as values that store the (partial or total) rest of a control flow as it was at a given point in time. In Scala 2.8, they are functions built by the shift call, which always stores a partial remaining flow delimited by an outer (call-wise, not necessarily syntactically) reset call.

Since neither the JVM nor .NET provide primitives to build continuations (or not yet, see the Da Vinci Machine project), the Scala team has implemented them by using a program transformation technique called "Continuation-Passing Style (CPS) transform": they built a compiler plugins that rewrites select portions of a Scala program in such a way that continuations work as intended. This plugins also provides a typechecker / type inferrer extension, on which it relies to check the "reset outside shift" calls structure and to track the portions of the program that need the cps transform to be applied; very intuitively it assigns annotations to shift calls, it propagates them up the calls structure in the program and expects to find them back when checking reset calls.

Perhaps it's just me but the exact behaviour of these new primitives has escaped my full understanding at first, so I decided to drill a bit into the topic and to perform several small experiments to figure out what was happening.

In this post and the next ones I'll show a few Scala 2.8 examples I built that helped me in grasping gradually how exactly Scala continuations and their typechecking work. All of them are self-contained, show explicit annotated types and try to avoid any Scala syntactic sugar in order to make things more apparent and explicit. They also abound in tracing println statements and there are comments with numbers that track the control flow throughout the code.

First of all, how do you invoke shift and reset and what's happening when you do?
  • A reset call takes as its argument the toplevel continuation function, i.e. the code that will be suspended by some shift call (and will possibly be resumed later). The shift call must be performed at a certain point during the execution of the toplevel continuation function itself (possibly not directly in its body but at least somewhere in sub-calls), else the compiler plugin will be able to detect this anomaly and will complain.
  • A shift call builds an object (the continuation) that will "freeze" and store the rest of the control flow up to the end of the toplevel continuation function passed to the reset call. Control is then transferred immediately to the argument that we could name the "redirect" or "replacement" function. The redirect function will, of course, receive the continuation object as its argument and, after doing anything it wants, will have the option to call it to resume the enclosed control flow (and then do again anything it wants with the result of it). Please note that, unless the redirect function calls the continuation, the toplevel continuation function won't finish executing anymore and will just stay freezed until the continuation is garbage collected. The shift call ends when its redirect function returns. The reset call, instead, ends when the first shift call has returned. In case there are subsequent shift calls, they'll be part of the continuation built by a previous shift, which means they're not necessarily executed. But, if they are, the reset call returns when their redirect function has finished as well. This is an odd behaviour at first but, as you'll see, it makes perfectly sense.
The Example1 program below show, in a simplified procedural version (all types involved are Unit) how all of the above pieces fit and work together: you can see the reset call in main method on the toplevel continuation function. The latter doesn't call directly shift but invokes a couple of times a private metod that will take care of it insted. The shift call receives a redirect function as its argument, which uses the continuation parameter to resume the frozen flow. Please read it, follow and understand the control sequence by following number comments, make sure they make sense with the above explanation and then use the info in Rich Dougherty's post to compile and run it:


import scala.continuations._
import ControlContext._

object Example1 {
private def l( msg : String ) {
println( "[Thread '" + Thread.currentThread().getName() + "'] " + msg )
}

private def redirectFunction( continuation : Unit => Unit ) : Unit = {
l( "Entered redirectFunction, calling continuation" )

/* 3 */
/* 11 */

val r : Unit = /* <-- 19 */ /* <-- 22 */ continuation()

/* 20 */
/* 23 */

l( "Exiting redirectFunction, continuation called, returning ()" )

/* 21 */
/* 24 */

()
}

private def packContAndUseItIfNeeded() : Unit @cps[Unit, Unit] = {
l( "Entered packContAndUseItIfNeeded, doing shift" )

/* 2 */
/* 10 */

val r : Unit @cps[Unit, Unit] = /* <-- 4 */ /* <-- 12 */ shift( redirectFunction )

/* 5 */
/* 13 */

l(
"Exiting packContAndUseItIfNeeded, shift done, " +
"ACTUAL CONTINUATION (1 or 2) RUNNING! returning ()"
)

/* 6 */
/* 14 */

()
}

private def toplevelContinuationFunction() : Unit @cps[Unit, Unit] = {
l( "Entered reset and toplevelContinuationFunction, before first shift" )

/* 1 */

val r1 : Unit @cps[Unit, Unit] = /* <-- 7 */ packContAndUseItIfNeeded

/* 8 */

l(
"In toplevelContinuationFunction, after first packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION 1 RUNNING! Before second shift"
)

/* 9 */

val r2 : Unit @cps[Unit, Unit] = /* <-- 15 */ packContAndUseItIfNeeded

/* 16 */

l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION 2 RUNNING!"
)

/* 17 */

l( "Exiting toplevelContinuationFunction and reset, returning ()" )

/* 18 */

()
}

def main( args: Array[String] ) : Unit = {
l( "Before reset" )

val x : Unit = /* <-- 25 */ reset( toplevelContinuationFunction )

l( "After reset, terminating" )
}
}



I really hope this first part has been useful to you. If you find any mistakes or misunderstandings please let me know and I'll fix them ASAP (plus I'll learn something new, thank you!).
In the next posts we'll see slightly different versions of the above example in order to better understand the continuations type-checking and type restrictions.

Wednesday 26 August 2009

Hardcoder for a living

Hello everybody!

I'm a CS graduate and a programmer with 7 years of experience on my shoulders, still trying to make a living of a passion like many others.
A living that (those many others can easily agree I think) is not always pleasant, unless you are masochistic enough to enjoy at least some of the following odd pleasures:
  • Data entry
  • Data fixing
  • Sysadmin (software, hardware and network if needed)
  • Legacy data import / export
  • Unstructured support
  • Coding at night for yesterday's needs
  • Coding at night for req. A from boss X or conflicting req. B from boss Y, provided they have understood the matter, or who-knows-what in case they haven't
  • In general, dealing with bosses having no clues about what "management", "structure", "basic planning", "KISS", "consolidation", "common sense", "development process", "organization", "instruments", "working environment", "precise", "analysis", "dealing with people", "cooperation" and "enterprise functions integration" mean. Let alone "IT" or "software development".
So, until / unless a better world (or a better Enterprise) comes in, for those of us that have broader and higher interests than the above there is no other solution than spending spare time trying to accomplish something that is simply worth the time.
OSS writing is a good one, but I think blogging (and, in general, building and publishing freely available content) is as well and is not this different. After all, software is running and usable form of knowledge that, in turn, requires technical knowledge and expertise to be written. So, I think, sharing self-built knowledge behind software building is as valuable as sharing software itself.
So stay tuned: there will be upcoming posts about any technology or CS concept I'm interested in and looking at in my spare time, which means a broad spectrum of topics and a broad spectrum of depths.

Looking forward to escaping (at least for the time of a post) the hardcoding life, enjoying the sharing, being useful and getting your feedbacks. What miracles can a simple blog accomplish.