<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Gilles&apos; PhD</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/" />
    <link rel="self" type="application/atom+xml" href="http://www.dubochet.ch/gilles/blogs/phd/atom.xml" />
   <id>tag:www.dubochet.ch,2006:/gilles/blogs/phd//2</id>
    <link rel="service.post" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2" title="Gilles' PhD" />
    <updated>2006-07-04T17:33:21Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type  3.2b3</generator>
 
<entry>
    <title>DBC Mark 1</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/07/07/dbc_mark_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=106" title="DBC Mark 1" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.106</id>
    
    <published>2005-07-07T09:50:28Z</published>
    <updated>2006-07-04T17:33:21Z</updated>
    
    <summary><![CDATA[ScalaDBC Mark 1 is now finished. I have added it to the webapp CVS server. It is tested, but not very thoroughly. S&eacute;bastien N. will hopefully start testing it in a real-life application as soon as he comes back from...]]></summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>ScalaDBC Mark 1 is now finished. I have added it to the webapp CVS server. It is tested, but not very thoroughly. S&eacute;bastien N. will hopefully start testing it in a real-life application as soon as he comes back from vacation. Please read the extended entry for some comments about why a Mark 2 version will be needed.</p>]]>
        <![CDATA[<p>There are two things that ScalaDBC does not do as well as it ought to. I believe both have to do with the expressiveness of Scala, but to be totally honest, I am not quite sure: Scala is complex and there might be a satisfying solution to these problems that I did not find.
</p>
<ul>
<li>Static type checking is not good enough. When declaring a query, there is implicit type information attached. For example, the query <code>select fields ("a" of boolean, "b" of varchar) from "table"</code> gives information about what type fields "a" and "b" have ("boolean" and "varchar" are instances of classes that define the native Scala type for this field as type members). However, the query in the above code does not contain the type information in a form exploitable by the compiler. It would need to be given as type parameters, but the number of fields, and therefore of types, is arbitrary, and Scala does not support classes with an arbitrary number of type parameters. The Tuple and Function classes simulate this by declaring one class per size (Tuple1, Tuple2, etc.), but a relation can have a large number of fields, and declaring, say, 100 classes to cover all sizes would be very impractical. Martin suggested that a mechanism could generate, on request, a relation (or tuple or function) of the right size, but this is not planned and would not be trivial to do. The same type-safety problem will affect ScalaDALL, so this remains very much an open problem.</li>
<li>Automatic type conversion using views is not broad enough. Bug #449 (or a variant of it) prevents the conversion of all the required values. In particular, it is not possible to view a value of type integer (SQL) as a value of type long (Scala), for example, even though it would be perfectly legal. I hope it will be possible to improve this in NSC with the new view system (based on "implicit"). Another sad thing is that views are only one level deep. Of course, multi-level views would be quite a different problem altogether: cycles, for one thing, would probably be very difficult to manage, and I am not even sure the problem is solvable. Multi-level views would also make development much less predictable, though quite a bit more powerful. In practice, this means that if a field is a value, and a value an int, for example, a field is not an int unless this is made explicit by adding a direct view from one to the other.</li>
</ul>]]>
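        <![CDATA[<p>To make the one-level limit on views concrete, here is a minimal sketch written with the implicit conversions of a much later Scala (2.13 is assumed), not with the view syntax ScalaDBC Mark 1 was built on. <code>Value</code>, <code>Field</code> and the three conversion methods are hypothetical stand-ins, not ScalaDBC classes. The point is only that the compiler applies a single conversion per expression, so the direct <code>Field</code> to <code>Int</code> view has to be written by hand.</p>

```scala
import scala.language.implicitConversions

object ViewDepthSketch {
  // Hypothetical stand-ins for a database value and a field holding one.
  final case class Value(underlying: Int)
  final case class Field(name: String, value: Value)

  // Two single-step views: Field => Value and Value => Int.
  implicit def fieldToValue(f: Field): Value = f.value
  implicit def valueToInt(v: Value): Int = v.underlying

  // The compiler never chains the two views above, so a direct
  // Field => Int view has to be declared explicitly.
  implicit def fieldToInt(f: Field): Int = f.value.underlying

  def demo(): (Int, Int) = {
    val f = Field("a", Value(42))
    val v: Value = f      // one step: Field => Value
    val viaValue: Int = v // one step: Value => Int
    val direct: Int = f   // one step, only possible thanks to fieldToInt
    (viaValue, direct)
  }
}
```

<p>Removing <code>fieldToInt</code> should make the line <code>val direct: Int = f</code> fail to compile, which is the behaviour described in the second point above.</p>]]>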
    </content>
</entry>
<entry>
    <title>ScalaDBC, ScalaDALL and al.</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/06/27/scaladbc_scaladall_and_al.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=105" title="ScalaDBC, ScalaDALL and al." />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.105</id>
    
    <published>2005-06-27T17:10:20Z</published>
    <updated>2006-07-04T17:33:00Z</updated>
    
    <summary>This blog has been a little forsaken. But with this heat wave over Europe, it is the right time to bring it back to life (all right, that isn&apos;t a reason). So here is my current status: ScalaDBC, the generic database...</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>This blog has been a little forsaken. But with this heat wave over Europe, it is the right time to bring it back to life (all right, that isn't a reason). So here is my current status:</p>

<p>ScalaDBC, the generic database library for Scala, is more or less working. Well, actually it is working as far as I can tell, but I haven't put it through any serious testing (special cases etc.). There are two things that need to be finished though:<ol><li>Automatic type conversion of database types to native Scala types using views. Bug #449 is a problem here, but I can't quite understand how and why.</li><li>A factory (and I mean a <a href="http://supergrass.densitron.net/diary4/berlin/wolfsburg.JPG">real hard-core factory</a>) to write statement (query) ASTs in a nice way. This is already partially working (one can write "selectBag fields ("a", "b" as "c") from ("a" join "b")") but it doesn't yet support enough of the SQL standard to be useful. Currently, I'm stuck with a bug that might be a Scala bug, but I still have to track it down to be sure.</li></ol></p>
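
<p>As an illustration of how such a factory can turn the quoted expression into an AST, here is a minimal sketch assuming Scala 2.13 syntax. Only the surface expression comes from this entry; every class and method name below (<code>Table</code>, <code>Join</code>, <code>Field</code>, <code>Select</code>, <code>NameOps</code>, <code>SelectFields</code>) is a hypothetical stand-in and not the actual ScalaDBC API.</p>

```scala
import scala.language.implicitConversions

object QueryFactorySketch {
  // Hypothetical AST nodes; the real ScalaDBC classes are not shown in this entry.
  sealed trait Relation
  final case class Table(name: String) extends Relation
  final case class Join(left: Relation, right: Relation) extends Relation
  final case class Field(name: String, alias: Option[String] = None)
  final case class Select(fields: List[Field], from: Relation)

  // Enrich strings so that "b" as "c" and "a" join "b" build AST nodes.
  implicit class NameOps(name: String) {
    def as(alias: String): Field = Field(name, Some(alias))
    def join(other: String): Relation = Join(Table(name), Table(other))
  }

  // A bare string used where a Field is expected becomes an unaliased field.
  implicit def stringToField(name: String): Field = Field(name)

  // Intermediate builder returned by `selectBag fields (...)`.
  final class SelectFields(fields: List[Field]) {
    def from(relation: Relation): Select = Select(fields, relation)
  }

  object selectBag {
    def fields(fs: Field*): SelectFields = new SelectFields(fs.toList)
  }

  // The expression quoted in the text, now building a small AST.
  val query: Select = selectBag fields ("a", "b" as "c") from ("a" join "b")
}
```

<p>The infix form works because <code>fields</code> and <code>from</code> are ordinary methods; supporting more of SQL in this style would mean adding more builder classes and more AST nodes.</p>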

<p>ScalaDALL is the name I am giving to the database library using for-comprehensions and specific optimisations that is the next task after ScalaDBC. I have started some exploratory work for this as I was really getting frustrated with the bugs that were hindering the advance of ScalaDBC. But there isn't anything really working yet.</p>

<p>Otherwise, I also did a little poster that you can find next to my office's door.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Better transactions</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/05/09/better_transactions.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=104" title="Better transactions" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.104</id>
    
    <published>2005-05-09T10:15:32Z</published>
    <updated>2006-07-04T17:32:28Z</updated>
    
    <summary>I have improved the retry mechanism for my simple transaction library. There are two improvements: First, memory usage is now constant with respect to the number of retries. Second, the time that a thread will wait until it retries is...</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>I have improved the retry mechanism for my simple transaction library. There are two improvements: First, memory usage is now constant with respect to the number of retries. Second, the time that a thread will wait until it retries is now calculated in a smarter way: the more often a thread has retried getting the lock, the longer it will wait until it tries again.</p>

<p>The current version of the library is usable, but there are at least two things that I think should be improved if one wants to use it in real applications:<ul><li>Allow more user control over how the wait time before a retry is calculated, as this is key to the performance one will get. This could be done by supporting a "pluggable wait policy", for example.</li><li>Implement an even more optimistic policy that does not lock variables at all, but instead computes the entire locked section as if it were alone and only then, just before committing, tests whether some other thread was messing around with the data.</li></ul></p>]]>
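        <![CDATA[<p>A "pluggable wait policy" could look roughly like the sketch below (Scala 2.13 syntax assumed). The trait <code>WaitPolicy</code>, the two policies and the <code>retry</code> helper are all hypothetical names chosen for illustration; none of them is taken from the library itself. The retry loop is iterative, so its memory use does not grow with the number of retries.</p>

```scala
object WaitPolicySketch {
  // Hypothetical plug-in point: maps the index of a retry (0 for the
  // first retry) to the time to wait before it, in milliseconds.
  trait WaitPolicy {
    def waitMillis(retry: Int): Long
  }

  // Waits the same amount of time before every retry.
  final case class Constant(millis: Long) extends WaitPolicy {
    def waitMillis(retry: Int): Long = millis
  }

  // Doubles the wait with every retry, up to a ceiling.
  final case class Exponential(baseMillis: Long, maxMillis: Long) extends WaitPolicy {
    def waitMillis(retry: Int): Long =
      math.min(maxMillis, baseMillis << math.min(retry, 20))
  }

  // Calls `attemptOnce` until it yields a result or `maxAttempts` is reached,
  // sleeping according to `policy` before every attempt except the first.
  def retry[A](policy: WaitPolicy, maxAttempts: Int)(attemptOnce: () => Option[A]): Option[A] = {
    var attempt = 0
    var result: Option[A] = None
    while (result.isEmpty && attempt < maxAttempts) {
      if (attempt > 0) Thread.sleep(policy.waitMillis(attempt - 1))
      result = attemptOnce()
      attempt += 1
    }
    result
  }
}
```

<p>With this shape, switching from a constant to a growing wait is a matter of passing a different <code>WaitPolicy</code> value, which is the kind of user control the first point asks for.</p>]]>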
        
    </content>
</entry>
<entry>
    <title>ScalaDBC: first results</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/05/05/scaladbc_first_results.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=103" title="ScalaDBC: first results" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.103</id>
    
    <published>2005-05-05T10:27:37Z</published>
    <updated>2006-07-04T17:31:47Z</updated>
    
    <summary>I have obtained the first data from a database through ScalaDBC. Nothing extraordinary yet, but at least, something is working. I will now expand ScalaDBC&apos;s capabilities (it is very limited for now) and smooth all edges that are still a...</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>I have obtained the first data from a database through ScalaDBC. Nothing extraordinary yet, but at least, something is working. I will now expand ScalaDBC's capabilities (it is very limited for now) and smooth all edges that are still a bit rough (and there are plenty of them).</p>]]>
        
    </content>
</entry>
<entry>
    <title>Meeting the LABOS</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/04/28/meeting_the_labos.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=102" title="Meeting the LABOS" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.102</id>
    
    <published>2005-04-28T17:20:05Z</published>
    <updated>2006-07-04T17:31:31Z</updated>
    
    <summary><![CDATA[Martin, Iulian and I met a delegation of the LABOS (Steven Dropsho and Willy Zwaenepoel) to discuss whether they share our interest in finding better ways to access databases by modifying a (modern&nbsp;/ OO&nbsp;/ functional) programming language. For them, a...]]></summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Querying" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>Martin, Iulian and I met a delegation of the LABOS (Steven Dropsho and Willy Zwaenepoel) to discuss whether they share our interest in finding better ways to access databases by modifying a (modern&nbsp;/ OO&nbsp;/ functional) programming language.</p>

<p>For them, a database interface library or language is interesting if one of the following behaviours is better than what currently exists. But in all cases, a decrease in performance is a show-stopper.<ul><li>Complexity of code is demonstrably reduced. But cleaner code with the same complexity is a non-starter. Of course, in that case, they are not really interested in the problem itself but rather in the result.</li><li>Performance (in an arbitrary definition) somehow increases. For large systems, and in particular distributed systems, performance becomes an overwhelming problem and offering something that solves or mitigates this would be very useful.</li><li>And a particular instance of performance that especially interests them: making middle-ware programs simpler. That means that some of the processing currently done in the middle-ware program at run-time might be done at compile-time by looking at the queries. An example of such processing is finding the intersection between queries (used for efficient locking of data). Tackling this problem as such is very specialised and not really in my field though.</li></ul>In short, the part that interests them directly (pre-processing of queries for middle-ware) does not really intersect with what interests me. However, if we develop a system that can be used to do library-specific optimisations at compile-time, they might have some very interesting use cases to provide: a database library that optimises for a specific middle-ware system. Such a library has the potential to become very important in large distributed application programming.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Optimism is paying off (and other stories)</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/04/22/optimism_is_paying_off_and_oth.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=101" title="Optimism is paying off (and other stories)" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.101</id>
    
    <published>2005-04-22T18:02:42Z</published>
    <updated>2005-08-10T17:51:16Z</updated>
    
    <summary>The first optimistic transaction library is now working. The results are rather satisfactory, but improvement is still needed. If you read on, some test results are described. This for-comprehension as a monad proved to be something that I had trouble...</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>The first optimistic transaction library is now working. The results are rather satisfactory, but improvement is still needed. If you read on, some test results are described. This for-comprehension as a monad proved to be something that I had trouble grasping, especially since all the different parts (<code>for</code>, <code>flatMap</code>, etc.) are named as if they were only meant for list comprehensions. Anyway, it is now working. </p>

<p>I have also started implementing the library for the database access that will be optimised &agrave; la SLinks. Of course, it is quite uninteresting by itself without the changes to the compiler to support its optimisation. But there is a start for everything. Plenty of work remaining for this though.</p>]]>
        <![CDATA[<h2>Test results</h2>

<p>Consider the following test environment. 100 threads each execute one transaction that swaps the values of two variables using one support variable (s:=x;x:=y;y:=s). The three variables used by each thread are randomly selected from a pool of 300 variables. That means that the different transactions share some data, but not too much. There are typically 70 transactions that share at least one variable with another transaction. During the execution of the swap function, the threads will each sleep for a randomly chosen time between 0 and 300 ms. This is of course only somewhat realistic (it might correspond to waiting for input, for example) but I believe it is also a rather good indication of how well it might work in a multi-processor environment.</p>

<p>In this setting, the policy that locks down the whole system when doing a transaction takes around 16'000 ms on average. This is what one would expect, as 100 threads times 150 ms (the average time a swap waits) is 15'000 ms plus the locking overhead. On the other hand, the optimistic policy is much faster: around 1000 ms (15 times less). The number of aborts is quite high, but it doesn't seem to be much of a problem when compared to the other policy. I did not directly measure the average number of aborts per thread, but I tried to estimate it differently: if the number of aborts allowed for a given transaction is limited to 50, 2-5 percent of transactions will fail (that is, they would have aborted more than 50 times). With an abort limit of 100, the failure of a transaction is too rare to be detected (significantly less than 1%).</p>

<p>Of course, in unfavourable cases, the optimistic algorithm becomes very bad. The good news however is that the optimistic algorithm remains at least as good as the pessimistic one ... until it starts to completely wreak havoc on the system: since the abort mechanism is recursive with no tail call, it starts using up huge amounts of memory and the threads fail because they run out of memory. I will have to improve this.</p>]]>
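        <![CDATA[<p>One way to keep an abort-and-retry mechanism from growing the stack is to put the recursive call in tail position. The sketch below (Scala 2.13 syntax assumed) is not the library's code: <code>Outcome</code>, <code>Committed</code>, <code>Aborted</code> and <code>runWithRetry</code> are hypothetical names, and the abort limit mirrors the limit of 50 or 100 aborts used in the tests above.</p>

```scala
import scala.annotation.tailrec

object RetrySketch {
  // Hypothetical outcome of one attempt at running a transaction body.
  sealed trait Outcome[+A]
  final case class Committed[A](value: A) extends Outcome[A]
  case object Aborted extends Outcome[Nothing]

  // Re-runs `body` until it commits, or gives up once it has aborted
  // more than `abortLimit` times. @tailrec makes the compiler reject the
  // method unless the recursive call is in tail position, so the stack
  // depth stays constant however many aborts occur.
  @tailrec
  def runWithRetry[A](abortLimit: Int, aborts: Int = 0)(body: () => Outcome[A]): Option[A] =
    body() match {
      case Committed(value)                => Some(value)
      case Aborted if aborts >= abortLimit => None
      case Aborted                         => runWithRetry(abortLimit, aborts + 1)(body)
    }
}
```

<p>The trade-off is that any work needed after a failed attempt (such as restoring variables) has to happen before the recursive call rather than after it returns.</p>]]>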
    </content>
</entry>
<entry>
    <title>Transactions</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/04/20/transactions.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=100" title="Transactions" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.100</id>
    
    <published>2005-04-20T11:23:05Z</published>
    <updated>2006-07-04T17:29:00Z</updated>
    
    <summary>I have been working (with Burak&apos;s help) on writing a simple library providing atomic transactions in Scala. To do this, the for comprehension is diverted from its original use to be considered as a monad instead. In practice, here is...</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Status" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>I have been working (with Burak's help) on writing a simple library providing atomic transactions in Scala. To do this, the for comprehension is diverted from its original use to be considered as a monad instead. In practice, here is what an atomic exchange of the values of two variables would look like:</p><pre>val x = new atomic.Variable[Int];
val y = new atomic.Variable[Int];
val z = new atomic.Variable[Int];
x.value = 1; y.value = 2;
(for (
  val a <- x.get();
  val _ <- z.put(a);
  val b <- y.get();
  val _ <- x.put(b);
  val c <- z.get();
  val d <- y.put(c);
) yield d) run
</pre><p>
The problem now is that the for loop as a monad is something that I find very difficult to get an intuition about. I have a very simple version that locks down the entire program when evaluating the atomic block, but this is of course hopelessly inefficient. When I try to improve on this basic version, I have trouble grasping how things interact around this comprehension/monad, and getting the right intuition on how to do it. Please read on for a short explanation of the locking mechanisms I intend to add.</p>]]>
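        <![CDATA[<p>For readers who want to see how a for comprehension can drive such a transaction, here is a minimal sketch of the "lock down the entire program" variant, assuming Scala 2.13 syntax rather than the 2005 syntax used above. It is not the library's implementation: <code>Transaction</code>, <code>Variable</code> (which here takes an initial value) and <code>swap</code> are hypothetical, and the only thing it demonstrates is that <code>map</code> and <code>flatMap</code> are enough for the compiler to accept the comprehension.</p>

```scala
object AtomicSketch {
  // One lock for the whole program: simple, but no two transactions overlap.
  private val globalLock = new Object

  // A transaction is a deferred computation; nothing happens until `run`.
  final class Transaction[A](private val step: () => A) {
    def map[B](f: A => B): Transaction[B] =
      new Transaction(() => f(step()))
    def flatMap[B](f: A => Transaction[B]): Transaction[B] =
      new Transaction(() => f(step()).step())
    // Executes all the chained steps while holding the global lock.
    def run(): A = globalLock.synchronized(step())
  }

  final class Variable[A](initial: A) {
    private var current: A = initial
    def get: Transaction[A] = new Transaction(() => current)
    def put(a: A): Transaction[Unit] = new Transaction(() => { current = a })
  }

  // The exchange from the text: z is the support variable.
  def swap(x: Variable[Int], y: Variable[Int]): Transaction[Unit] = {
    val z = new Variable[Int](0)
    for {
      a <- x.get
      _ <- z.put(a)
      b <- y.get
      _ <- x.put(b)
      c <- z.get
      _ <- y.put(c)
    } yield ()
  }
}
```

<p>The compiler rewrites the comprehension into nested <code>flatMap</code> calls ending in one <code>map</code>, so each generator only describes a step and the whole chain executes inside <code>run()</code>.</p>]]>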
        <![CDATA[<p>Both locking mechanisms below should considerably improve performance compared with the first one. But both are optimistic, so they could actually decrease performance in the worst case.</p><ul>
<li>Locking variables. With this method, the locking would be done on the variable itself. A variable must be locked as soon as it is read or written to guarantee a consistent state during the entire transaction. When a variable that is already locked by another transaction is reached, the transaction is aborted and restarted to prevent deadlocks. Restarting means reverting all variables that have been changed to their pre-change value. The easiest here is to write all changes to a buffer in the variable and commit the buffers when the transaction is completed or leave them alone when the transaction is aborted.</li>
<li>Versioning variables. When a variable is read, its version number is remembered. When a variable is written, its version number is incremented and a function that will undo the write is generated (this is not easy). At the end, the used variables are locked and their current version numbers are compared with the version numbers remembered. If they concur, the transaction is confirmed. If they don't, it means another thread has been messing with the variables; the transaction is restarted after the undo functions are applied to the variables.</li>
</ul>]]>
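        <![CDATA[<p>The version check at commit time can be sketched for the simplest possible case, a transaction that updates a single variable (Scala 2.13 syntax assumed). This is a simplification and not the planned design: rather than generating undo functions, the sketch computes the new value off to the side and writes it only once the version check has passed. <code>Versioned</code>, <code>update</code> and <code>commitLock</code> are hypothetical names.</p>

```scala
import scala.annotation.tailrec

object VersionedSketch {
  // Held only for the short compare-and-write step at commit time.
  private val commitLock = new Object

  final class Versioned[A](initial: A) {
    // The value and its version number are always replaced together.
    @volatile private var state: (A, Long) = (initial, 0L)
    def read(): (A, Long) = state
    // Must only be called while holding commitLock.
    private[VersionedSketch] def write(value: A): Unit =
      state = (value, state._2 + 1)
  }

  // Applies `f` to the variable; restarts if another thread committed
  // a newer version between our read and our commit.
  @tailrec
  def update[A](v: Versioned[A])(f: A => A): Unit = {
    val (seen, version) = v.read() // remember the version that was read
    val computed = f(seen)         // possibly slow, done without any lock
    val committed = commitLock.synchronized {
      if (v.read()._2 == version) { v.write(computed); true } else false
    }
    if (!committed) update(v)(f)
  }
}
```

<p>A multi-variable transaction would need to remember one version per variable it read and check all of them under the lock, which is where the bookkeeping described in the second point comes in.</p>]]>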
    </content>
</entry>
<entry>
    <title>A world of weblogs</title>
    <link rel="alternate" type="text/html" href="http://www.dubochet.ch/gilles/blogs/phd/2005/04/12/a_world_of_weblogs.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.dubochet.ch/movabletype/mt-atom.cgiweblog/blog_id=2/entry_id=99" title="A world of weblogs" />
    <id>tag:www.dubochet.ch,2005:/gilles/blogs/phd//2.99</id>
    
    <published>2005-04-12T18:25:58Z</published>
    <updated>2006-07-04T17:28:39Z</updated>
    
    <summary>I started my PhD at the LAMP. My current research domain is transparent database querying in a programming language.</summary>
    <author>
        <name>Gilles Dubochet</name>
        <uri>http://www.dubochet.ch/gilles/</uri>
    </author>
            <category term="Anything" />
    
    <content type="html" xml:lang="en" xml:base="http://www.dubochet.ch/gilles/blogs/phd/">
        <![CDATA[<p>One more weblog on the Internet, what a deal. But this one is special: it speaks about me! On this blog, I will post thoughts and ideas about my work for my PhD, or any other interesting computer science things I might stumble upon during the day.</p>

<p>Very briefly: I started my PhD at the <a href="http://lampwww.epfl.ch/">laboratoire des m&eacute;thodes de programmation</a> (LAMP) at the <a href="http://www.epfl.ch/">EPFL</a> in Lausanne under the supervision of Prof. Martin Odersky on the fourth of April 2005. Currently, I intend to work on how to support relational database querying in a completely transparent (that is without SQL) yet efficient (that is with SQL) way in modern programming languages. The project might quickly extend to include more general querying for a variety of data repository systems (who said XQuery?). The modern programming language that will be used as a reference here is the LAMP's very own language, <a href="http://scala.epfl.ch/">Scala</a>.</p>]]>
        
    </content>
</entry>

</feed> 

