« Hitting a ladder | Main | System F »

December 20, 2004

Type erasure

To work on the type-driven optimisation system, I need of course correct types for the expressions. However, some of the optimisation modify the type of an expression (not the top level type, of course, but the type of some internal sub-expressions). Here is an example: In '[set x.#a | ^x <bag (table "tata" with {#a:int,#b:string,#c:float} from db)]' the table operator has type '[bag {#a:int,#b:string,#c:float}]'. However, after a selection optimisation, the type of that table operator is '[bag {#a:int]'.

This means that the expression to optimise must be re-typed between two optimisations, which is a severe efficiency drawback. Another more complicated (I don't know if it is reasonably feasible) solution would be to modify not only the expression but also the type during optimisation. But since the SLinks interpreter is a prototype for a language that should ultimately be compiled, this is probably not such a big problem.

With these correct type information even after the optimisation phase, I have been able to add dynamic type checking of queries. That means that when a table operator is executed, the result provided by the DBMS will be tested to see if it is compatible with the expected type (if the interpreter knows how to translate every column into the right SLinks type). That also means that if the system was until now a "type erasure" system, this is no longer the case: the type of the table operator is used during interpretation.

And finally, I have changed the table operator (it's representation, typing and interpretation) to make it more powerful. An example of the new syntax is: 'table "name" where {#a:int,#b:string} unique order [#a:asc,#b:desc] from db'. Notice the simplified types that are now used. The behaviour of 'unique' and 'order' are self-explanatory. Order is optional, the default value being no ordering at all. Notice also that the type does not require the class (bag, set or list) of the collection anymore (only bags were allowed until now). This is now calculated automatically: if neither 'unique' nor 'order' are set, it is a bag, if only 'unique' is there, it is a set, if 'order' is set, it is a list. This is also a simplified solution to the addition of a 'sortby' operator.

As a side note, this points to a slightly non-logical decision in the design of the language. SLinks has 3 classes of collections, but there should really be 4 of them: ordered or unordered with or without duplicates. In SLinks, following the lines of Kleisli, the ordered collection without duplicates is missing.

And with all this, I didn't have time to work on the type-driven optimisation system itself today, but the structure is here, and I will try it tomorrow.