Monday 31 August 2009

Control flows in Heaven with Scala 2.8 continuations - Part 2


My previous post in the series covered the continuation concept, the Scala 2.8 support for it and how the control flows in a program using them.

Let's have a look now at how the continuation typechecking and compilation works as this will allow us to use Scala 2.8 continuations with non-Unit types. I'm not grounding my reasoning on the formal typing rules in the paper but on my experiments and understanding so there could well be errors in the following explanation (don't hesitate to let me know!). My goal, though, is to gain and give back just an intuitive and practical understanding of how to use continuations with non-Unit types and how to help the type inferrer when it can't guess the correct types.

Let's first assume that the type of the continuation function built by the shift call and passed back to the redirect function is ContinuationFunctionType = A => B for some types A and B. Then the shift and redirect call types are the following:
  • shift( redirectFunction : ContinuationFunctionType => C ) : A @cps[B, C]; please note that the redirect function can return a value of type C, i.e. belonging to a different type than the value returned by the continuation (B); also note that the return type of the shift call is A because it returns when the continuation function is invoked, and it returns precisely the continuation function's input.
  • reset( toplevelContinuationFunction : () => B @cps[B, C] ) : C; please note that toplevelContinuationFunction returns B @cps[B, C] and not A @cps[B, C]; the B return type is because this function is the end of the continuation body (which returns a value of type B) and the C type in second position inside the annotation is because, as we've seen from the shift call type, it represents the reset return type.
The typechecker can infer the type of the shift call and will try to trace it up (towards the root) in the call tree until it finds a corresponding reset call with compatible types. If it doesn't succeed, it will complain. Sometimes manual annotations can do the trick; if they don't work it means that there's something wrong in the code about the continuation types.

As usual, if it compiles then the continuation structure of your program will run ok (and, as usual, only if you have coded your intended logic correctly you'll get the result you wanted, as no typechecker can read your mind yet).

Here's a modified (and simplified as, for now, there's only one shift call involved) version of Example1.scala in my previous post using different A, B and C types than Unit. Specifically, A = Int, B = Unit and C = String. You can see that the above typing rules are satisfied by explicit types and annotations and, indeed, the program compiles and runs fine:

import scala.continuations._
import ControlContext._

object Example2 {

private def l( msg : String ) {
println( "[Thread '" + Thread.currentThread().getName() + "'] " + msg )
}

private def redirectFunction( cont : Int => Unit ) : String = {
l( "Entered redirectFunction, calling continuation with 1" )

/* 3 */

val r : Unit = /* <-- 10 */ cont( 1 )

/* 11 */

l( "Exiting redirectFunction, continuation called, got result " + r + ", returning ok" )

/* 12 */

"ok"
}

private def packContAndUseItIfNeeded() : Int @cps[Unit, String] = {
l( "Entered packContAndUseItIfNeeded, doing shift" )

/* 2 */

val r : Int @cps[Unit, String] = /* <-- 4 */ shift( redirectFunction )

/* 5 */

l(
"Exiting packContAndUseItIfNeeded, shift done, " +
"ACTUAL CONTINUATION RUNNING! Returning passed in value: " + r
)

/* 6 */

r
}

private def toplevelContinuationFunction() : Unit @cps[Unit, String]= {
l( "Entered reset and toplevelContinuationFunction, before shift" )

/* 1 */

val r1 : Int @cps[Unit, String] = /* <-- 7 */ packContAndUseItIfNeeded

/* 8 */

l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION RUNNING!"
)

l( "Exiting toplevelContinuationFunction and reset, returning ()" )

/* 9 */

()
}

def main( args: Array[String] ) : Unit = {
l( "Before reset" )

val x : String = /* <-- 13 */ reset( toplevelContinuationFunction )

/* 14 */

l( "After reset, terminating" )
}
}

As usual, read it and follow the flow through numbered comments, then just compile it and run it to see if logs match your understanding.

Types are not just used to check the program structure though, but to drive the cps-transform as well. In order to better understand the cps-transform, just use the plugin's compilation logging facility as in Rich Dougherty's post. Here's the result of cps-transforming the above program (with some bureocracy stripped out for easier reading):

object Example2 {
private def l(msg: String): Unit =
println("[Thread '" + Thread.currentThread().getName() + "'] " + msg);

private def redirectFunction(cont: (Int) => Unit): String = {
l( "Entered redirectFunction, calling continuation with 1" );
val r: Unit = cont.apply(1);
l( "Exiting redirectFunction, continuation called, got result " + r + ", returning ok" );
"ok"
};

private def packContAndUseItIfNeeded(): ControlContext[Int,Unit,String] = {
l( "Entered packContAndUseItIfNeeded, doing shift" );
ControlContext.shiftR[Int, Unit, String]( {
((cont: (Int) => Unit) => redirectFunction( cont ))
}).map[Int](((r: Int) => {
l(
"Exiting packContAndUseItIfNeeded, shift done, ACTUAL CONTINUATION RUNNING! " +
"Returning passed in value: " + r
);
r
}))
};

private def toplevelContinuationFunction(): ControlContext[Unit,Unit,String] = {
l("Entered reset and toplevelContinuationFunction, before shift");
packContAndUseItIfNeeded().map[Unit]( ((r1: Int) => {
l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION RUNNING!"
);
l( "Exiting toplevelContinuationFunction and reset, returning ()" );
()
}))
};

def main(args: Array[String]): Unit = {
l( "Before reset" );
val x: String = ControlContext.reset[Unit, String]( toplevelContinuationFunction() );
l( "After reset, terminating" )
}
}
  1. Any expression having been assigned a type of the form A @cps[B, C] by the programmer or by the type inferrer (so not only shift calls but sub-calls containing shift calls as well) will be compiled into a new ControlContext[A, B, C] object. This object stores 2 things:
    1. The closure itself, i.e. the rest of the execution (if the shift call happens inside a sub-method, the rest of the method's body; if it happens inside a toplevel continuation function passed to a reset, up to the end of it's body);
    2. A reference to the redirect function passed to the shift call.
    The essential feature of these objects is that they exibit monadic operations, i.e. they define methods that enable them to be chained. This is essentially what happens in the transformed toplevelContinuationFunction, when the return value of packContAndUseItIfNeeded (i.e. the continuation portion contained in that method after the shift call) is chained, a map calls, with another continuation one built out of the lines in toplevelContinuationFunction after the packContAndUseItIfNeeded call. You can see that, essentially, the cps-transform is about "packaging" code in chainable continuations instead of executing it directly (a task that, incidentally, is made easier because Scala has closures). You can also see that the number of continuations built is always the least possible (and that's why the cps transform is a selective one) as, after the last call to packContAndUseItIfNeeded, there's is only one continuation built out of 2 calls to l().
  2. reset is nothing more than a library function which call takes as an argument the chain of continuations just built, then simply starts executing the redirect function in the first continuation in the chain. The redirect function will then invoke the first continuation, which is able to hand control over to the next ones in the chain and execute up to the end of the toplevelContinuationFunction.
If you'd like to explore more about monads and understand how they relate to continuations, here's an amazing Monads are Elephants posts series by James Iry. You should also have a look at the definition of the cps-transform and of the Shift object in the original paper (also having a look at the type rules won't do any harm if you have the time to study them).

In the next (and last) post we'll explore a few variants of this sample that will enable you to
  1. Invoke more than once shift in the toplevel continuation function;
  2. Call the continuation on the result of another continuation call.

Sunday 30 August 2009

Care about the Data

Developing software is a hard and time-consuming task. We can certainly do much more in much less time than when assembly was the only option, but expectations and needs about software have grown proportionally, and perhaps even faster, than software development processes and tools themselves.
For this reason, improvements in productivity and cost-effectiveness of software development never seem to be enough and it's even more the case because smaller and smaller companies require, and expect to be able to afford with smaller and smaller budgets, the benefits that software and correct information management processes can bring to them in terms of effectiveness and competitive edge.
As a consequence, software development companies try hard to find ways to develop software in less time and with less (and less skilled) people. There are several ways to try to reach this goal, here are a few:
  • Stimulating staff growth, experience, speed and overall professional roundness
  • Studying, evaluating and adopting increasingly sophisticated tools (more and more expressive languages, frameworks, libraries; more mature development management tools and processes; ...), i.e. technology and expertise advance;
  • Relying on an asset of already built libraries or application templates for common cases;
  • Automation and software generation;
  • Of course, last but not least (and most frequent sometimes), trying to sell low quality, fast-hacked solutions as if they were much better than they are, relying on the substantial ignorance of the customer.
Let alone the fact that the latter is clearly a blind and short-termed strategy, there is some risk of quality loss in the other ones as well. This risk is clearly stimulated by software markets that are shaped out of the needs of companies short on budget and with scarce IT awareness (typically frequent, in my experience, with small oens just starting to let in IT) that will easily end up just choosing the cheapest solution.
Quality loss is very dangerous in short, medium and long terms and in every area but I'm perhaps most sensible about the data storage area. Why? Simply because:
  • Understandable and accessible data is easily the most valuable asset for many companies;
  • Application data survives the application itself and even application providers; in fact, any information can be considered legacy as soon as it is stored for use in a production environment;
  • Changing and sometimes even evolving persisted data structures is hard (more so if they're not yours);
  • Most often than not, application data must be easily accessed directly through storage tools usage by the customer itself for maintenance, fixes, queries, OLAP and BI. As there's always little time for docs and for instruction, the more the data structure is clean, clear and open to 3rd-party understanding, the better for anyone;
  • Even if all of the above is satisfied, you'll find yourself fiddling with customer data directly in the storage more often than you might think (and this means daily in most cases) because of bugs and planned / unplanned support or upgrades. You'd soon regret a poorly designed storage for this reason alone.
Another lesson to be learnt is: develop schema and data migration strategies as soon as possible in the development process. You'll soon be happy of having thought about it when you'll find you need a different storage solution for whatever reason (new performance requirements, new data access patterns, ...).

So, for the health of your business (and of yourself) here's my warmest and solicit suggestion: Take Big Care of the Data.

Thursday 27 August 2009

Control flows in Heaven with Scala 2.8 continuations - Part 1

I'm no expert by any means but I'm very interested in high-performance and high-scalability topics (who hasn't broken his neck on those issues too late in the development process?) and continuations are a very promising language concept in this area. They allow to suspend a program control flow and then to resume it when (or if ever) needed.
Yes, that's useful as, for example, you could write a program that processes an HTTP request, then renders a page to gather more info and "freezes" (in the sense that it suspends and releases its thread) while it waits for an answer, then resumes execution in a new request processing thread. You could say: "well, that's more easily done by blocking the thread until an answer arrives or by writing event-style code as any sane framework will force you to do".
The thing is:
  1. Event-style code can be difficult to read and mantain when it doesn't match your logic
  2. OS threads are an expensive resource, as many platforms won't let you have more than a few thousands alive at the same time in a system (you thought many more, didn't you?), so having one just to wait for an answer is an highly suboptimal approach as far as scalability is concerned
See this Apache Cocoon page for a more thorough explanation of this use case. You can easily see, anyway, that any concurrent request-processing software piece could benefit from them.

On a less immediately pragmatic side, continuations can be used to unify seemingly different control flow structures like exceptions or coroutines (which, as an aside, are very useful to write highly-scalable concurrent programs as they allow cooperative multi-threading using only one OS thread).

The Scala 2.8 language will feature a compiler plugin and a library-based DSL for composable continuations (there's a preview post at Scala's website and another very good post by Rich Dougherty here and more excellent continuation-related ones follow in his blog; the full-blown original paper is here).
Continuations can be thought of as values that store the (partial or total) rest of a control flow as it was at a given point in time. In Scala 2.8, they are functions built by the shift call, which always stores a partial remaining flow delimited by an outer (call-wise, not necessarily syntactically) reset call.

Since neither the JVM nor .NET provide primitives to build continuations (or not yet, see the Da Vinci Machine project), the Scala team has implemented them by using a program transformation technique called "Continuation-Passing Style (CPS) transform": they built a compiler plugins that rewrites select portions of a Scala program in such a way that continuations work as intended. This plugins also provides a typechecker / type inferrer extension, on which it relies to check the "reset outside shift" calls structure and to track the portions of the program that need the cps transform to be applied; very intuitively it assigns annotations to shift calls, it propagates them up the calls structure in the program and expects to find them back when checking reset calls.

Perhaps it's just me but the exact behaviour of these new primitives has escaped my full understanding at first, so I decided to drill a bit into the topic and to perform several small experiments to figure out what was happening.

In this post and the next ones I'll show a few Scala 2.8 examples I built that helped me in grasping gradually how exactly Scala continuations and their typechecking work. All of them are self-contained, show explicit annotated types and try to avoid any Scala syntactic sugar in order to make things more apparent and explicit. They also abound in tracing println statements and there are comments with numbers that track the control flow throughout the code.

First of all, how do you invoke shift and reset and what's happening when you do?
  • A reset call takes as its argument the toplevel continuation function, i.e. the code that will be suspended by some shift call (and will possibly be resumed later). The shift call must be performed at a certain point during the execution of the toplevel continuation function itself (possibly not directly in its body but at least somewhere in sub-calls), else the compiler plugin will be able to detect this anomaly and will complain.
  • A shift call builds an object (the continuation) that will "freeze" and store the rest of the control flow up to the end of the toplevel continuation function passed to the reset call. Control is then transferred immediately to the argument that we could name the "redirect" or "replacement" function. The redirect function will, of course, receive the continuation object as its argument and, after doing anything it wants, will have the option to call it to resume the enclosed control flow (and then do again anything it wants with the result of it). Please note that, unless the redirect function calls the continuation, the toplevel continuation function won't finish executing anymore and will just stay freezed until the continuation is garbage collected. The shift call ends when its redirect function returns. The reset call, instead, ends when the first shift call has returned. In case there are subsequent shift calls, they'll be part of the continuation built by a previous shift, which means they're not necessarily executed. But, if they are, the reset call returns when their redirect function has finished as well. This is an odd behaviour at first but, as you'll see, it makes perfectly sense.
The Example1 program below show, in a simplified procedural version (all types involved are Unit) how all of the above pieces fit and work together: you can see the reset call in main method on the toplevel continuation function. The latter doesn't call directly shift but invokes a couple of times a private metod that will take care of it insted. The shift call receives a redirect function as its argument, which uses the continuation parameter to resume the frozen flow. Please read it, follow and understand the control sequence by following number comments, make sure they make sense with the above explanation and then use the info in Rich Dougherty's post to compile and run it:


import scala.continuations._
import ControlContext._

object Example1 {
private def l( msg : String ) {
println( "[Thread '" + Thread.currentThread().getName() + "'] " + msg )
}

private def redirectFunction( continuation : Unit => Unit ) : Unit = {
l( "Entered redirectFunction, calling continuation" )

/* 3 */
/* 11 */

val r : Unit = /* <-- 19 */ /* <-- 22 */ continuation()

/* 20 */
/* 23 */

l( "Exiting redirectFunction, continuation called, returning ()" )

/* 21 */
/* 24 */

()
}

private def packContAndUseItIfNeeded() : Unit @cps[Unit, Unit] = {
l( "Entered packContAndUseItIfNeeded, doing shift" )

/* 2 */
/* 10 */

val r : Unit @cps[Unit, Unit] = /* <-- 4 */ /* <-- 12 */ shift( redirectFunction )

/* 5 */
/* 13 */

l(
"Exiting packContAndUseItIfNeeded, shift done, " +
"ACTUAL CONTINUATION (1 or 2) RUNNING! returning ()"
)

/* 6 */
/* 14 */

()
}

private def toplevelContinuationFunction() : Unit @cps[Unit, Unit] = {
l( "Entered reset and toplevelContinuationFunction, before first shift" )

/* 1 */

val r1 : Unit @cps[Unit, Unit] = /* <-- 7 */ packContAndUseItIfNeeded

/* 8 */

l(
"In toplevelContinuationFunction, after first packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION 1 RUNNING! Before second shift"
)

/* 9 */

val r2 : Unit @cps[Unit, Unit] = /* <-- 15 */ packContAndUseItIfNeeded

/* 16 */

l(
"In toplevelContinuationFunction, after second packContAndUseItIfNeeded, " +
"ACTUAL CONTINUATION 2 RUNNING!"
)

/* 17 */

l( "Exiting toplevelContinuationFunction and reset, returning ()" )

/* 18 */

()
}

def main( args: Array[String] ) : Unit = {
l( "Before reset" )

val x : Unit = /* <-- 25 */ reset( toplevelContinuationFunction )

l( "After reset, terminating" )
}
}



I really hope this first part has been useful to you. If you find any mistakes or misunderstandings please let me know and I'll fix them ASAP (plus I'll learn something new, thank you!).
In the next posts we'll see slightly different versions of the above example in order to better understand the continuations type-checking and type restrictions.

Wednesday 26 August 2009

Hardcoder for a living

Hello everybody!

I'm a CS graduate and a programmer with 7 years of experience on my shoulders, still trying to make a living of a passion like many others.
A living that (those many others can easily agree I think) is not always pleasant, unless you are masochistic enough to enjoy at least some of the following odd pleasures:
  • Data entry
  • Data fixing
  • Sysadmin (software, hardware and network if needed)
  • Legacy data import / export
  • Unstructured support
  • Coding at night for yesterday's needs
  • Coding at night for req. A from boss X or conflicting req. B from boss Y, provided they have understood the matter, or who-knows-what in case they haven't
  • In general, dealing with bosses having no clues about what "management", "structure", "basic planning", "KISS", "consolidation", "common sense", "development process", "organization", "instruments", "working environment", "precise", "analysis", "dealing with people", "cooperation" and "enterprise functions integration" mean. Let alone "IT" or "software development".
So, until / unless a better world (or a better Enterprise) comes in, for those of us that have broader and higher interests than the above there is no other solution than spending spare time trying to accomplish something that is simply worth the time.
OSS writing is a good one, but I think blogging (and, in general, building and publishing freely available content) is as well and is not this different. After all, software is running and usable form of knowledge that, in turn, requires technical knowledge and expertise to be written. So, I think, sharing self-built knowledge behind software building is as valuable as sharing software itself.
So stay tuned: there will be upcoming posts about any technology or CS concept I'm interested in and looking at in my spare time, which means a broad spectrum of topics and a broad spectrum of depths.

Looking forward to escaping (at least for the time of a post) the hardcoding life, enjoying the sharing, being useful and getting your feedbacks. What miracles can a simple blog accomplish.