Main | May 2005 »

April 28, 2005

Meeting the LABOS

Martin, Iulian an me met a delegation of the LABOS (Steven Dropsho and Willy Zwaenepoel) to discuss whether they share our interest in finding better ways to access databases by modifying a (modern / OO / functional) programming language.

For them, a database interface library or language is interesting if one of the following behaviours are better than what currently exists. But in all cases, a decrease in performance is a show-stopper.

  • Complexity of code is demonstrably reduced. But cleaner code with the same complexity is a no-show. Of course, in that case, they are not really interested in the problem itself but rather the result.
  • Performance (in an arbitrary definition) somehow increases. For large systems, and in particular distributed systems, performance becomes an overwhelming problem and offering something that solves or mitigates this would be very useful.
  • And a particular instance of performance that interest them particularly: make middle-ware programs simpler. That means that some of the processing done currently in the middle-ware program at run-time might be done at compile time by looking at the queries. An example of such processing is to find the intersection between queries (used for efficient locking of data). Tackling this problem as such is very specialised and not really in my field though.
In short, the part that interests them directly (pre-processing of queries for middle-ware) does not really intersect with what interests me. However, if we develop a system that can be used to do library-specific optimisations at compile-time, they might have some very interesting use cases to provide: a database library that optimise for a specific middle-ware system. Such a library has the potential to become very important in large distributed application programming.

April 22, 2005

Optimism is paying off (and other stories)

The first optimistic transaction library is now working. The results are rather satisfactory, but improvement is still needed. If you read on, some test results are described. This for-comprehension as a monad proved to be something that I had troubles to grasp, especially since all the different parts (for, flatMap, etc) are named as if they were only for the list comprehension. Anyway, it is now working.

I have also started implementing the library for the database access that will be optimised à la SLinks. Of course, it is quite uninteresting by itself without the changes to the compiler to support its optimisation. But there is a start for everything. Plenty of work remaining for this though.

Test results

Consider the following test environment. 100 threads all execute one transaction that will swap the value of two variables using one support variable (s:=x;x:=y;y:=s). The three used variables for every thread are randomly selected from a pool of 300 variables. That means that the different transactions share some data, but not too much. There are typically 70 transactions that share at least one variable with another transaction. During the execution of the swap function, the threads will each sleep for a randomly chosen time between 0 and 300 ms. This is of course only somewhat realistic (it might correspond to waiting for input or example) but I believe it is also a rather good indication of how well it might work in a multi-processor environment.

In this setting, the policy that locks down the whole system when doing a transaction takes a time around 16'000 ms in average. This is what one would expect as 100 threads times 150 ms (the average time a swap waits) is 15'000 ms plus the locking overhead. On the other hand, the optimistic policy is much faster: around 1000 ms (15 times less). The number of aborts are quite high, but it doesn't seem to be such a problem when compared to the other policy. I did not measure directly the average number of aborts per thread, but I tried to estimate it differently: If the number of aborts allowed for a given transaction is limited to 50, 2-5 percents of transactions will fail (that is, they would have aborted more than 50 times). With a abort limit of 100, the failure of a transaction is too rare to be detected (significantly less than 1%).

Of course, in unfavourable cases, the optimistic algorithm becomes very bad. The good news however is that the optimistic algorithm remains at least as good as the pessimistic one ... until it starts to completely wreak havoc on the system: since the abort mechanism is recursive with no tail call, it starts using up huge amounts of memory and the threads fail because they run out of memory. I will have to improve this.

April 20, 2005

Transactions

I have been working (with Burak's help) on writing a simple library providing atomic transactions in Scala. To do this, the for comprehension is diverted from its original use to be considered as a monad instead. In practice, here is how an atomic exchange of the value of two variables would look like:

val x = new atomic.Variable[Int];
val y = new atomic.Variable[Int];
val z = new atomic.Variable[Int];
x.value = 1; y.value = 2;
(for (
  val a <- x.get();
  val _ <- z.put(a);
  val b <- y.get();
  val _ <- x.put(b);
  val c <- z.get();
  val d <- y.put(c);
) yield d) run

The problem now is that the for loop as a monad is something that I find very difficult to get an intuition about. I have a very simple version that locks down the entire program when evaluating the atomic bloc, but this is of course hopelessly inefficient. But when I try to improve from this basic version, I am having troubles grasping how things interact around this comprehension/monad, and getting the right intuition on how to do it. Please, read on for a small explanation on the locking mechanisms I intend to add.

Both locking mechanism below should considerably improve the performance when compared with the first one. But both are optimistic so could actually also decrease performance in the worst case.

  • Locking variables. With this method, the locking would be done on the variable itself. A variable must be locked as soon as it is read or written to guarantee a consistent state during the entire transaction. When a variable that is already locked by another transaction is reached, the transaction is aborted and restarted to prevent deadlocks. Restarting means reverting all variables that have been changed to their pre-change value. The easiest here is to write all changes to a buffer in the variable and commit the buffers when the transaction is completed or leave them alone when the transaction is aborted.
  • Versioning variables. When a variable is read, its version number is remembered. When a variable is written, its version number is incremented and a function that will undo the write is generated (this is not easy). At the end, the used variables are locked and their current version numbers are compared with the version numbers remembered. If they concur, the transaction is confirmed. If they don't, it means another thread has been messing with the variables; the transaction is restarted after the undo functions are applied to the variables.

April 12, 2005

A world of weblogs

One more weblog on the Internet, what a deal. But this one is special: it speaks about me! On this blog, I will post thoughts and ideas about my work for my PhD, or any other interesting computer science things I might stumble upon during the day.

Very shortly: I started my PhD at the laboratoire des méthodes de programmation (LAMP) at the EPFL in Lausanne under the supervision of Prof. Martin Odersky on the fourth of April 2005. Currently, I intend to work on how to support relational database querying in a completely transparent (that is without SQL) yet efficient (that is with SQL) way in modern programming languages. The project might quickly extend to include more general querying for a variety of data repository systems (who said XQuery?). The modern programming language that will be used as a reference here is the LAMP's very own language, Scala.