« October 2004 | Main | December 2004 »

November 30, 2004

Pushing selections

After correcting a bug with the pattern matching (due to the restriction of the built-in ocaml parser to LR(1) syntax), and a bug with the projection optimisation, I continued working on the selection push optimisation. I believe I understand the principle in detail now, but haven't finished implementing it yet.

November 29, 2004

Looking at Kleisli again

I tried working on the other SQL optimisations (selection and joint optimisations). To do this, I wanted to see how Kleisli was doing it. I ended up trying (unsuccessfully) to compile Kleisli and reading tons of code: I am not sure I understand better though. I did however start to understand some parts of the general architecture of Kleisli.

I also noticed that the projection optimisation I had implemented had a sad limitation: the optimiser doesn't take into account functions applied to an 'iteration variable' (a variable defined from a generator expression in a comprehension). This means that if this expression is optimised [bag x.#a | ^x <bag table ... from db] this one isn't [bag (fun ^x->x.#a)(x) | ^x <bag table ... from db]. I will have to think about the best way to solve that.

November 26, 2004

Projection optimisation

I had promised yesterday not to post anything until I have something interesting to report. But since it would be sad to break the uninterrupted row of entries I have been producing for two month, and since I have something to report, here it is:

Projections are now optimised into SQL queries. That means that the system will only query the database for the columns it will actually really need. Read on for an example. Before that, I also added another optimisation to try out the optimisation mechanism: remove 'let' operators that declare variables that won't be used. But this will probably disappear from the final version since it is not an acceptable optimisation if there are side-effects, and for an interpreter, might even be counter-productive.

I have also improved the syntactic sugar module because it was producing useless 'let', that were simply renaming variables, when handling pattern matching.

Here is an example of projection optimisation (in SLinks' verbose debug mode):
? [set x.#a | ^x <bag (table "tata" with [bag {#a:pre(int),#b:pre(string),#c:pre(float)}] from db)];;
  Raw: (for x <bag table '(SELECT tata.a AS a, tata.b AS b, tata.c AS c FROM tata WHERE true)' from db in [set (let {#a=$gv3$|$gv4$} = x in $gv3$)])
  Opt: (for x <bag table '(SELECT tata.a AS a FROM tata WHERE true)' from db in [set (let {#a=$gv3$|$gv4$} = x in $gv3$)])
  Is: [set 1, 2, 3] : [set int]

'Raw' is a string representation of the AST for the expression. One can see the effect of syntactic sugar: comprehensions are transformed into simpler 'for' expressions and the dot notation becomes a record selection operation. 'Opt' is the AST after it has been optimised. Notice that the query now only fetches the 'a' column that will be needed later.

November 25, 2004

Optimality

Since I will be working on the optimisation for a little while, I will only post messages when something interesting happens.

November 24, 2004

Back to Limsoon

I started working on the optimisation part. I re-read some of the material about optimisation in Kleisli by Limsoon Wong, and did some preparatory work for the implementation.

I also had to re-organise the module structure, because the original structure was no longer coping very well with the increased size and complexity of the program. Code for the database was in the AST module (because of unhealthy dependencies) and horrible things like that. For now, everything is neatly in it's place, and it should allow clean addition of optimisers.

November 23, 2004

Queries executed

It is now possible to query (in a very simple sense) a database from within a SLinks program. Please read on for more details. Tomorrow, I will start with the optimisation part.

Here is how a database-enabled session looks (notice that the developer has to give the type of the table):
? def ^conn = {#name="gilles",#user="gilles"};;
  Defined conn as {#name="gilles",#user="gilles"} : {#name:pre(string), #user:pre(string)}
? def ^db = database conn;;
  Defined db as (database gilles from gilles@localhost:5432) : database
? table "tata" with [bag {#a:pre(int),#b:pre(string),#c:pre(float)}] from db;;
  [bag {#a=1,#b="cool",#c=2.2}, {#a=2,#b="super",#c=3.5}] : [bag {#a:pre(int), #b:pre(string), #c:pre(float)}]

The 'table' operator really is a hidden 'SELECT' statement, in this case 'SELECT a, b, c FROM tata'. By 'optimisation part', I mean completing this statement with conditions from SLinks comprehensions applied to it.

Type safety is only guaranteed if the type of the table provided by the developer is correct. Testing it during type inference would be nice (at least for an interpreter) but is problematic because one needs to have the database connection already, which is not the case during type inference. I haven't found a good way to solve this yet (currently, a runtime error is raised when there is a type incompatibility between the developer-provided type and the table).

November 22, 2004

Executing queries II

I continued working on the execution of SQL queries. I am trying to have a reasonably 'neat' implementation of the database part, which takes a little time. I expect to have it working tomorrow.

After that, I will start with the optimisation part, first with single expression optimisation (like in Kleisli). That means that a query will always be executed before the end of an expression: a value defined as the result of a query will be a list of records, not the query itself, even if executing it later might yield to further optimisations.

Later in this project, it might be interesting to look into lazy evaluation of queries.

November 19, 2004

Executing queries

Today, I continued working on the execution of SQL queries. There is still some work to be done.

November 18, 2004

Exchanging recursive types for a bug

After all it seems that it wasn't the type's lack of expressiveness that explained the wrong typing described yesterday. Actually, the expression 'defrec ^f = fun ^x -> case x of <#a=^a> in f(a) or <#b=^b> in b' should not be accepted by the type checker: it was really a bug, as Prof. Wadler pointed it out. Of course, this means one can't use recursive functions to create or parse data structures such as trees anymore, and that the solution drafted Tuesday for the expression problem wouldn't work: recursive types would be needed for that.

But for now, it is enough to have the simple non-recursive types. So I fixed this new bug (it was a problem with the type inference rule I was using for letrec in algorithm W, the expression above now raises a type error), and started working on a way to make SLinks execute SQL queries.

November 17, 2004

Recursive types

I added the 'case' without 'otherwise' operator to the interpreter and managed to get the Postgres library for O'Caml to work.

I also discovered I have a problem with the SLinks type inference algorithm: I believe the current type representation is not powerful enough to give a safe type to recursive functions. Unfortunately, I was not careful enough when adding recursive functions to the language and only tested (and thought) it with recursive functions that returned single values, and not structures built by the recursive function. Here is an example of the problem:
defrec ^f = fun ^x -> case x of <#a=^a> in f(a) or <#b=^b> in b;
Receives type: (<#a:pre('1), #b:pre('2)> -> '2). Of course, type variable '1 should really be <#a:pre('1), #b:pre('2)> where '1 should really be ... and so this f function is not type safe:
f(<#a=5>);
Leads to a fatal exception at runtime.

November 16, 2004

Expression problem

I finished with the syntactic sugar mentioned yesterday. From last week's list, only the profiling and comprehension problem are remaining.

I started looking into the comprehension problem. I still have to think about it, because the first solution I had found is not type safe. I have the impression that to solve it I will need a 'case' operator without the 'otherwise' clause, that I don't have right now. To understand why, please read on.

To start with, here is the current algorithm:

This function evaluates an expression with only the 'constant' case (represented as variants like <#c=value>):
def ^eval = fun ^exp -> case exp of <#c=^c> in c | _ in 0;

This function extends the eval function to supports the 'plus' case in expressions (represented as variants like <#p={#l=expression1,#r=expression2}>), that is adds a 'row':
defrec ^eval2 = fun ^exp -> case exp of <#p={#l=^l,#r=^r}> in (eval2(l))+(eval2(r)) | ^e in eval(e);

This function adds print string capabilities to the expression, that is adds a 'column':
defrec ^show2 = fun ^exp -> case exp of <#p={#l=^l,#r=^r}> in "("&show2(l)&"+"&show2(r)&")" or <#c=^c> in string_of_int(c) | _ in "";

The problem is that expressions like 'eval(<#p={#l=<#c=5>,#r=4}>);' are not detected as wrong by the typer (because eval produces a completely incorrect dummy value in case the variant is not of the right type instead of forbidding it). If eval would forbid anything else than the constant value for the variant, I believe everything would be great and working. I will think about this in more detail (and implement the case without otherwise if it makes sense to do so) tomorrow.

November 15, 2004

Conditional pattern matching

Et voilà, pattern matching is available everywhere it makes sense. Comprehensions support 'conditional pattern matching' where the pattern contains non-binding variables, constants or variants that are used to filter the values. This allows nice code like this (using a constant as the filter in this case):

? def ^the_list = [lst {#x=1,#y=true}, {#x=2,#y=true}, {#x=3,#y=false}, {#x=4,#y=true}];
  Defined the_list : [lst {#y:pre(bool), #x:pre(int)}]
? [lst x | {#x=^x,#y=true} <lst the_list];
  [lst 1, 2, 4] : [lst int]

I must admit that getting this conditional pattern matching to work, even though it was only syntactic sugar, was a bit more difficult than I had thought. Anyway, it is working now. Letrec does not support pattern matching because I didn't find any way of obtaining it as syntactic sugar from the basic let operator (actually, I believe it is impossible). I am not the only one: O'Caml doesn't support it either (It says 'Only variables are allowed as left-hand side of `let rec'').

From last week's list, two big points are remaining: profiling the software to understand why it is slow for lists (I got an answer from an O'Caml guy that told me he would be working on the bug that prevents me from doing that) and looking up the expression problem. A few smaller points also need to be done (dot notation, unlabelled tuples) but I should be able to finish them quickly tomorrow morning.

November 12, 2004

Pattern matching

I continued working on the lists of things to do. In particular, the pattern matching is starting to work. There is still a little bit to do, but the basic infrastructure is in place. Lovely expressions like def ^mul = (fun {#a=^x,#b=^y} -> x*y);; now work. When playing with this new feature, I noticed however that using a hat for the binding of variables has a little drawback: on 'continental' keyboards (like a swiss-french one) you end up typing 'îd' more often than '^id', because the input method 'folds' a hat with the next character.

November 11, 2004

Not all bugs are mine

I continued working on Tuesday's list of things to do. Unfortunately, I spent a large part of the day working on a bug ... of O'Caml (I think). The profiler seems to be broken when handling some codes generated by the parser/lexer generator. I have filed a bug report for it: hopefully they will fix it reasonably quickly so that I can profile my code.

Philip Wadler proposed a solution for adding pattern matching as syntactic sugar, which is much easier than the solution I outlined yesterday.

November 10, 2004

Things to improve

Today I started working on adding pattern matching to the SLinks interpreter. I actually only need to add pattern matching to the function operator, the rest can be obtained by syntactic sugar. However, I still have to add quite a lot of code, and in particular, I have to write a W-like algorithm specially for patterns. The unification and other type algorithm however will not need modifications, I believe.

And here is the list of things to improve mentioned in yesterday's post:

  • Print types correctly (with parenthesis)
  • Correct type system for missing absent field
  • Disallow set to list and bag to list operators
  • Order of fields in records must be preseved
  • Add syntactic sugar for f(x,y,z) as f(x)(y)(z) and (a,b,c) -> d as a -> b -> c -> d
  • Change semantic of 'ext' operator to for x <- s in e
  • Add '^' in front of all binding variables
  • Add syntactic sugar for dot notation
  • Add syntactic sugar for tuples as records with labels '#1', '#2' etc.
  • Find out why lists are so slow (if it is an algorithmic problem, or a faulty implementation)
  • Add support for pattern matching
  • Look up the expression problem and try to encode a solution in SLinks

November 09, 2004

Softening the edges

I started working on the list of things to improve that has been decided with Philip Wadler today. I got rid of the easy things. I also tried to get the profiler to work: but for now there seems to be a mood incompatibility between the profiler and the lexer or parser: strange errors arise when using the profiler that do not exist otherwise.

November 08, 2004

Recursive function declarations

Today, I added the recursive let operator to my interpreter. Unfortunately, typing 'let rec' for the most general case is not trivial: actually, it has been proved undecidable for normal Hindley-Milner types (see this article). In O'Caml, the 'let rec' operator is permitted only for 'definitions where the defined names occur only inside function bodies or as argument to a data constructor' as the documentation states it; and I believe it is in reality only allowed if the function or the data constructor is the root operator in the definition. My current implementation supports a subset of this, which is the recursive definition of functions. I suppose I might add support for data constructors but I didn't think it was a priority: I couldn't find any example where using a data constructor in a 'let rec' with O'Caml's limitations was useful, so I guess one might live without it (if anyone has an example where data constructors in 'let rec' are really great, let me know).

I also tried again to get ODBC working on my computer, still with no success. And lastly, keeping the best for the end, I finally received the source code for Kleisli. I quickly looked at it, but considering the complexity of the system (a quick estimation of the number of files puts it at least at 400) it will be a while before I can get anything out of it.

November 05, 2004

ODBC

I tried setting up ODBC on my computer all day ... without success. All this ODBC is such a mess: poor documentation, buggy programs (all GUIs I encountered fail miserably at even running), etc. I also tried using a native O'Caml connector to PostgreSQL (installing the database was easy, for once), but I didn't succeed (yet) there either. I really hope to get this mess running quickly: it is so boring to fight with missing libraries!

November 04, 2004

SLinks

I have now more or less finished the "sufficient part" of the work outlined in Tuesday's post. That is, I have a version of my program that types and interprets SLinks expressions with records, variants, lists, sets and bags. It supports simple operators for the basic types (boolean, integer and float; string is supported but has no operators), and all operators described earlier in this weblog for records, variants and collections, including full comprehensions. I had forgotten in tuesday's post that I would need a 'define' operator (like caml's 'let' without 'in' operator), and this has been added also.

Of course, the interpreter is still a little bit 'rough on the edges': it doesn't have the nicest error handling available out there (to say the least), the lack of recursive function is annoying, and the code for the unification algorithm for rows is not as nice as it could be. There is also another problem with collections: The 'collection with many elements' operator is syntactic sugar for the union/concatenation of single element collections, and the compilation/typing phase for this is slow. That means there is a visible wait to compile a list of 50 elements. I am not quite sure yet whether this inefficiency can be optimised away, or if I will have to add the multi-collection as a base operator for the language. Interpretation does not have the same problems since it represents lists as lists.

November 03, 2004

Collections

I have added simple support for collections today. However, I have just finished it and didn't have time to test it thoroughly (it works on my simple examples though). My program is able to parse, type and interpret a language which has the following constructs for set, bag and list: empty collection, singleton collection, union (or append) and extension (a basic construct from which comprehensions can be obtained by syntactic sugar).

What is still missing are the list-bag-set translation constructs (actually the typer and interpreter should already be able to handle these, but the parser is not ready yet), syntactic sugar to facilitate work on collections and some operators on basic types. I will work on this tomorrow.

November 02, 2004

The type is not enough

I wrote the interpreter for the language I have now. Currently, it uses strict evaluation.

Here is what I think needs to be done until I have something that is sufficient to start the SQL part:

  • add a few base operators for my base types (record, variant and boolean have what they need, but the others (int, float, char) don't),
  • add support for collections (lists, bags and sets).

Furthermore, there are a few other things which might be useful but are not necessary:

  • pattern matching (Kleisli supports it, but one can define everything without them),
  • recursive functions (Kleisli does not support it, I believe, but it might be usefull).

November 01, 2004

The creeping bug is gone

I spent the day rewriting the unification algorithm for rows to solve the problem mentioned in the last post: there was a fundamental error in the concept. The good news is that the rewrite means that one gets (slightly) more understandable type error messages for row unification, but the program is now 150 lines longer. For now, all test cases I have work with the current version of the program, I hope they cover all situations, this time, and that I didn't miss some tricky type error somewhere.

I can now get back to the business of adding more basic types and basic operators to obtain a language that is actually usable. I hope to get a better idea about what kind of lists to implement tomorrow when I meet Philip Wadler.