I have finished the new selection optimisation that I have been working on for the past few days. It behaves as expected on my standard test set, with one exception: a case that joins a table with itself, where I still have to think about whether the result is really correct. Please read on for a short comment about types, this latest optimisation and the projection optimisation.
To make things clear, the selection optimisation is the one that pushes conditions into the WHERE part of an SQL query. The projection optimisation reduces the number of fields in the SELECT part of an SQL query to the columns that the code really needs. What I want to show is that the projection optimisation can use types to obtain the information it needs, while the selection optimisation must look at the actual code. I will use a new, hopefully easier to read, syntax for these examples. Please note that this syntax is not implemented yet.
Here is an example for the projection optimisation. Consider this expression:
{ x.a | ^x <- (table "tata" with (a:int,b:string,c:float) from db)}
Type inference on the body of this comprehension, 'x.a', gives 'x' the type '(a:'a,'b)', where ''a' and ''b' are type variables. From this type, one can easily see that the comprehension only uses the 'a' field of the table, and the query can be optimised accordingly. Furthermore, since the type inference mechanism keeps track of the types of all values declared earlier, the optimisation can use this knowledge when such values (functions, for example) are used in the body.
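As a rough illustration of the idea, here is a minimal sketch in Python (not the actual implementation, and not the syntax used in this post). It assumes the inferred row type is already available as a small data structure; the names 'RowType' and 'projected_select' are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class RowType:
    """Open row type such as (a:'a,'b): the known fields plus a row variable.

    Hypothetical representation, assumed for this sketch only.
    """
    fields: dict   # field name -> type variable, e.g. {"a": "'a"}
    row_var: str   # stands for "any other fields", e.g. "'b"


def projected_select(table, schema, row_type):
    """Build a SELECT that fetches only the columns named in the row type."""
    missing = [f for f in row_type.fields if f not in schema]
    if missing:
        raise TypeError(f"table {table!r} has no field(s): {', '.join(missing)}")
    # Keep the declaration order of the table so the output is deterministic.
    columns = [c for c in schema if c in row_type.fields]
    return f"SELECT {', '.join(columns)} FROM {table}"


# The table declared as: table "tata" with (a:int,b:string,c:float)
schema = {"a": "int", "b": "string", "c": "float"}
# The type inferred for x from the body x.a
x_type = RowType(fields={"a": "'a"}, row_var="'b")

print(projected_select("tata", schema, x_type))  # SELECT a FROM tata
```

Note that the function never looks at the body expression itself; everything it needs is in the type.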
Now, for the selection optimisation. Consider this expression:
{ b | ^(a=^a,b=^b|^r) <- (table "tata" with (a:int,b:string,c:float) from db), a == 1 }
Once the syntactic sugar has been removed, it is transformed into this:
for ^(a=^a,b=^b|^r) = (table "tata" with (a:int,b:string,c:float) from db) in
if (a == 1) then
{ b }
else {}
The information needed to determine what can be optimised comes from the syntactic structure of the body expression (the expression after the 'for … in'). A condition where one of the branches is the empty list can be pushed into the query. The information contained in the type, at any point in the above expression, is useless for detecting this kind of optimisation. I do not have a proof of this, but I really believe that type information is not usable in this case. If a reader disagrees, I'd be happy to hear their views.
One last little note: while writing this post, I noticed that my code for the projection optimisation still does a little too much syntactic exploration. I will correct that tomorrow.
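The syntactic pattern can be sketched as a small rewrite over an expression tree. The following Python sketch is only an illustration under stated assumptions, not the actual implementation: the node classes ('Table', 'For', 'If', 'Singleton', 'Empty') and the function 'push_selection' are hypothetical, conditions are kept as SQL text, and the condition is assumed to mention only fields of the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    name: str
    where: Optional[str] = None   # SQL condition already pushed into the query


@dataclass
class Empty:
    """The empty list {}."""


@dataclass
class Singleton:
    expr: str                     # e.g. { b }


@dataclass
class If:
    cond: str                     # condition, written as SQL text in this sketch
    then: object
    orelse: object


@dataclass
class For:
    pattern: str
    source: Table
    body: object


def push_selection(expr):
    """Push 'if cond then body else {}' (or its mirror image) into the WHERE part."""
    if not (isinstance(expr, For) and isinstance(expr.body, If)):
        return expr
    test = expr.body
    if isinstance(test.orelse, Empty):
        cond, rest = test.cond, test.then
    elif isinstance(test.then, Empty):
        cond, rest = f"NOT ({test.cond})", test.orelse
    else:
        return expr               # neither branch is empty: nothing to push
    old = expr.source.where
    where = cond if old is None else f"({old}) AND ({cond})"
    return For(expr.pattern, Table(expr.source.name, where), rest)


query = For("(a=a,b=b|r)", Table("tata"), If("a = 1", Singleton("b"), Empty()))
optimised = push_selection(query)
print(optimised.source.where)     # a = 1
```

The rewrite is driven entirely by the shape of the tree: the test for an empty branch is a check on the syntax, which is exactly the information that the type does not carry.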