« The report | Main | Closing down »

February 18, 2005

Good bye Edinburgh

Et voilà, I'm out of here, on Monday an aeroplane will fly me back across the channel to mountainous Switzerland. And I have one month of holiday before I start working again ...

Between the cleaning of my desk, last re-readings of my report etc. I discussed with Philip about whether there could be better, more formal ways to represent all theses optimisations I've worked on. For the projection optimisation we came up with a neat and nice way based on the type of variables. Re-implementing the optimisation with this new solution could be a great idea for whoever will continue to work on SLinks in the future. For the other optimisations, we wrote some basic transformation rules, but did not come up with any completely satisfactory solution. If you read on, you will find the notes we came up when thinking about these optimisations.

def ^db = database (name="gilles", user="gilles");;

-----------------------------------------
| Remove renaming variable declarations |
-----------------------------------------

fun ^x -> let ^y = x in y;;
\/ \/ \/ \/ \/
fun ^x -> x;;

---------------------------------------
| Remove unused variable declarations |
---------------------------------------

fun ^x -> let ^y = x in x;;
\/ \/ \/ \/ \/
fun ^x -> x;;

fun ^x -> let ^(a=^a|^r) = x in x;;
\/ \/ \/ \/ \/
fun ^x -> x;;

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in b);;
\/ \/ \/ \/ \/
fun ^x -> let ^(a=^a|^r) = x in b;;

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in r);;
\/ \/ \/ \/ \/
no optimisation

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in (b,r));;
\/ \/ \/ \/ \/
no optimisation

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^s) = r in (b,r));;
\/ \/ \/ \/ \/
no optimisation

---------------------------
| Push projections to SQL |
---------------------------

How to work out the projection using types.
(1) Push as much as possible into the table expression (selections, etc.).
(2) Replace the table type by forall r.{ (|r) }
Here (|r) is a record type with no fields and row r.
(3) Infer the type of the table.
The solution for r tells you which rows to project.

{ x | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
no optimisation

{ {x.a, x.b} | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ {x.a, x.b} | ^x <-- (table "tata" with (a:int, b:string) from db) };;

{ x.a | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ x.a | ^x <-- (table "tata" with (a:int) from db) };;

{| 4 | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |};;
\/ \/ \/ \/ \/
{| 4 | ^x <-- (table "tata" with () from db) |};;

{ (fun ^x -> x.a)(x) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |};;
\/ \/ \/ \/ \/
{ (fun ^x -> x.a)(x) | ^x <-- (table "tata" with (a:int) from db) |};;

def ^takea = fun ^x -> x.a;;
{ takea(x) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ takea(x) | ^x <-- (table "tata" with (a:int) from db) };;

--------------------------
| Push selections to SQL |
--------------------------

for y <- (for x <- d return e) return f ==> for x <- d return (for y <- e return f)

for x <- f in (if b then d else e) ==> if b then (for x <- f in d) else (for x <- f in e),
if x not in fv(b)

for x <- (select R from T where C order by S) return if b then d else []
==>
for x <- (select R from T where C, b order by S) return d,
if all operators in b are available in SQL

{| b | ^(a=^a, b=^b|^r) <-- (table "tata" with (a:int, b:string, c:float) from db), a == 1 |};;
\/ \/ \/ \/ \/
{| b | ^(a=^a, b=^b|^r) <-- (table "tata" with (a:int, b:string, c:float) where a == 1 from db) |};;

{| (x.a, y.b) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db),
^y <-- (table "tata" with (a:int, b:string, c:float) from db),
x.a == 1,
y.b <> "cool" |};;
\/ \/ \/ \/ \/
{| (x.a, y.b) | ^x <-- (table "tata" with (a:int, b:string, c:float) where a == 1 from db),
^y <-- (table "tata" with (a:int, b:string, c:float) where b <> "cool" from db) |};;

{| (auth.name, {| pub.title | ^pub <-- (table "publication" with (publication_id:int, author_id:int, title:string) from db),
pub.author_id == auth.author_id |} )
|^auth <-- (table "author" with (author_id:int, name:string) from db) |};;
\/ \/ \/ \/ \/
{| (auth.name, {| pub.title | ^pub <-- (table "publication" with (publication_id:int, author_id:int, title:string)
where author_id == (auth.author_id) from db) |} )
|^auth <-- (table "author" with (author_id:int, name:string) from db) |};;

// auth.author_id in the optimised expression is a variable in a query that will be set to the calculated value at runtime,
// as far as the query is concerned, it is a constant.

---------------------
| Push joins to SQL |
---------------------

for x1 <- (select R1 from T1 where C1 order by S1) return
for x2 <- (select R2 from T2 where C2 order by S2) return
d
==> for (x1,x2) <- (select R1,R2 from T1,T2 where C1,C2 order by S1,S2) return d

Here is a justification for the law

select R from T where C ~ for T return (if C return [R] else [])

for x1 <- (for y1 <- e1 return [f1]) return
for x2 <- (for y2 <- e2 return [f2]) return
d
==>
for (x1,x2) <- (for y1 <- e1 return for y2 <- e2 return (f1,f2)) return d

for x1 <- (for y1 <- e1 return [f1]) return
for x2 <- (for y2 <- e2 return [f2]) return
d
==>
for y1 <- e1 return
for x1 <- [f1] return
for y2 <- e2 return
for x2 <- [f2] return
d
==>
for y1 <- e1 return
for y2 <- e2 return
d[f1/x1, f2/x2]
<==
for (x1,x2) <- (for y1 <- e1 return for y2 <- e2 return [(f1,f2)]) return d


{| (t1.name1, t2.name2) | ^t1 <-- (table "join1" with (id1:int, name1:string) from db),
^t2 <-- (table "join2" with (id2:int, id1f:int, name2:string) from db),
t1.id1 == t2.id1f |};;
\/ \/ \/ \/ \/
{| {t1.name1, t2.name2} | ^t1 <-- (table "join1" join "join2"
with (id1:int, name1:string, id2:int, id1f:int, name2:string)
where id1 == id1f from db) |};;

-----------------------
| Push sorting to SQL |
-----------------------

sortby S (select R from T where C sortby S') ==> select R from T where C order by S,S'
if sortby is stable
if it's not stable, we can just drop S'


sort_up ({| a | ^(a=^a, b=^b, c=^c) <-- (table "tata" with (a:int, b:string, c:float) from db) |});
\/ \/ \/ \/ \/
[ a | ^(a=^a, b=^b, c=^c) <--- (table "tata" with (a:int, b:string, c:float) order [a:asc] from db)];

sort_up ({| x | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |});
\/ \/ \/ \/ \/
[ x | ^x <--- (table "tata" with (a:int, b:string, c:float) order [a:asc, b:asc, c:asc] from db)];

Compilation rules

R ::= () | x.a, R
T ::= () | TABLE x, T where TABLE is a table name
C ::= () | b, C


select R from TABLE x, T where C ==>
for x <- TABLE return (select R from T where C)

select R from () where b, C ==>
if b then (select R from () where C) else []

select R from () where () ==>
[R]

much harder with an order by clause!