« January 2005 | Main | April 2005 »

February 18, 2005

Good bye Edinburgh

Et voilà, I'm out of here, on Monday an aeroplane will fly me back across the channel to mountainous Switzerland. And I have one month of holiday before I start working again ...

Between the cleaning of my desk, last re-readings of my report etc. I discussed with Philip about whether there could be better, more formal ways to represent all theses optimisations I've worked on. For the projection optimisation we came up with a neat and nice way based on the type of variables. Re-implementing the optimisation with this new solution could be a great idea for whoever will continue to work on SLinks in the future. For the other optimisations, we wrote some basic transformation rules, but did not come up with any completely satisfactory solution. If you read on, you will find the notes we came up when thinking about these optimisations.

def ^db = database (name="gilles", user="gilles");;

-----------------------------------------
| Remove renaming variable declarations |
-----------------------------------------

fun ^x -> let ^y = x in y;;
\/ \/ \/ \/ \/
fun ^x -> x;;

---------------------------------------
| Remove unused variable declarations |
---------------------------------------

fun ^x -> let ^y = x in x;;
\/ \/ \/ \/ \/
fun ^x -> x;;

fun ^x -> let ^(a=^a|^r) = x in x;;
\/ \/ \/ \/ \/
fun ^x -> x;;

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in b);;
\/ \/ \/ \/ \/
fun ^x -> let ^(a=^a|^r) = x in b;;

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in r);;
\/ \/ \/ \/ \/
no optimisation

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^r) = r in (b,r));;
\/ \/ \/ \/ \/
no optimisation

fun ^x -> let ^(a=^a|^r) = x in (let ^(b=^b|^s) = r in (b,r));;
\/ \/ \/ \/ \/
no optimisation

---------------------------
| Push projections to SQL |
---------------------------

How to work out the projection using types.
(1) Push as much as possible into the table expression (selections, etc.).
(2) Replace the table type by forall r.{ (|r) }
Here (|r) is a record type with no fields and row r.
(3) Infer the type of the table.
The solution for r tells you which rows to project.

{ x | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
no optimisation

{ {x.a, x.b} | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ {x.a, x.b} | ^x <-- (table "tata" with (a:int, b:string) from db) };;

{ x.a | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ x.a | ^x <-- (table "tata" with (a:int) from db) };;

{| 4 | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |};;
\/ \/ \/ \/ \/
{| 4 | ^x <-- (table "tata" with () from db) |};;

{ (fun ^x -> x.a)(x) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |};;
\/ \/ \/ \/ \/
{ (fun ^x -> x.a)(x) | ^x <-- (table "tata" with (a:int) from db) |};;

def ^takea = fun ^x -> x.a;;
{ takea(x) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) };;
\/ \/ \/ \/ \/
{ takea(x) | ^x <-- (table "tata" with (a:int) from db) };;

--------------------------
| Push selections to SQL |
--------------------------

for y <- (for x <- d return e) return f ==> for x <- d return (for y <- e return f)

for x <- f in (if b then d else e) ==> if b then (for x <- f in d) else (for x <- f in e),
if x not in fv(b)

for x <- (select R from T where C order by S) return if b then d else []
==>
for x <- (select R from T where C, b order by S) return d,
if all operators in b are available in SQL

{| b | ^(a=^a, b=^b|^r) <-- (table "tata" with (a:int, b:string, c:float) from db), a == 1 |};;
\/ \/ \/ \/ \/
{| b | ^(a=^a, b=^b|^r) <-- (table "tata" with (a:int, b:string, c:float) where a == 1 from db) |};;

{| (x.a, y.b) | ^x <-- (table "tata" with (a:int, b:string, c:float) from db),
^y <-- (table "tata" with (a:int, b:string, c:float) from db),
x.a == 1,
y.b <> "cool" |};;
\/ \/ \/ \/ \/
{| (x.a, y.b) | ^x <-- (table "tata" with (a:int, b:string, c:float) where a == 1 from db),
^y <-- (table "tata" with (a:int, b:string, c:float) where b <> "cool" from db) |};;

{| (auth.name, {| pub.title | ^pub <-- (table "publication" with (publication_id:int, author_id:int, title:string) from db),
pub.author_id == auth.author_id |} )
|^auth <-- (table "author" with (author_id:int, name:string) from db) |};;
\/ \/ \/ \/ \/
{| (auth.name, {| pub.title | ^pub <-- (table "publication" with (publication_id:int, author_id:int, title:string)
where author_id == (auth.author_id) from db) |} )
|^auth <-- (table "author" with (author_id:int, name:string) from db) |};;

// auth.author_id in the optimised expression is a variable in a query that will be set to the calculated value at runtime,
// as far as the query is concerned, it is a constant.

---------------------
| Push joins to SQL |
---------------------

for x1 <- (select R1 from T1 where C1 order by S1) return
for x2 <- (select R2 from T2 where C2 order by S2) return
d
==> for (x1,x2) <- (select R1,R2 from T1,T2 where C1,C2 order by S1,S2) return d

Here is a justification for the law

select R from T where C ~ for T return (if C return [R] else [])

for x1 <- (for y1 <- e1 return [f1]) return
for x2 <- (for y2 <- e2 return [f2]) return
d
==>
for (x1,x2) <- (for y1 <- e1 return for y2 <- e2 return (f1,f2)) return d

for x1 <- (for y1 <- e1 return [f1]) return
for x2 <- (for y2 <- e2 return [f2]) return
d
==>
for y1 <- e1 return
for x1 <- [f1] return
for y2 <- e2 return
for x2 <- [f2] return
d
==>
for y1 <- e1 return
for y2 <- e2 return
d[f1/x1, f2/x2]
<==
for (x1,x2) <- (for y1 <- e1 return for y2 <- e2 return [(f1,f2)]) return d


{| (t1.name1, t2.name2) | ^t1 <-- (table "join1" with (id1:int, name1:string) from db),
^t2 <-- (table "join2" with (id2:int, id1f:int, name2:string) from db),
t1.id1 == t2.id1f |};;
\/ \/ \/ \/ \/
{| {t1.name1, t2.name2} | ^t1 <-- (table "join1" join "join2"
with (id1:int, name1:string, id2:int, id1f:int, name2:string)
where id1 == id1f from db) |};;

-----------------------
| Push sorting to SQL |
-----------------------

sortby S (select R from T where C sortby S') ==> select R from T where C order by S,S'
if sortby is stable
if it's not stable, we can just drop S'


sort_up ({| a | ^(a=^a, b=^b, c=^c) <-- (table "tata" with (a:int, b:string, c:float) from db) |});
\/ \/ \/ \/ \/
[ a | ^(a=^a, b=^b, c=^c) <--- (table "tata" with (a:int, b:string, c:float) order [a:asc] from db)];

sort_up ({| x | ^x <-- (table "tata" with (a:int, b:string, c:float) from db) |});
\/ \/ \/ \/ \/
[ x | ^x <--- (table "tata" with (a:int, b:string, c:float) order [a:asc, b:asc, c:asc] from db)];

Compilation rules

R ::= () | x.a, R
T ::= () | TABLE x, T where TABLE is a table name
C ::= () | b, C


select R from TABLE x, T where C ==>
for x <- TABLE return (select R from T where C)

select R from () where b, C ==>
if b then (select R from () where C) else []

select R from () where () ==>
[R]

much harder with an order by clause!

February 17, 2005

The report

That's it, my report is finished. I had a little panic attack this morning when I realised that the entire section I had written about algorithm W and unification was garbage, but I rewrote it and everything is fine now. Jeremy has promised he'll proof-read it until tomorrow morning and I will re-read it too until then. But basically it is done. I even have a nice title page. The grand total of pages is 61. It is quite a pleasant feeling to have finished this report, and even well in time!

February 16, 2005

The end is near

More or less everything is written now, even most of the introduction and conclusion. Tomorrow I will have to make a re-reading orgy and make sure all the layout is nice and neat. I might even finish my report in time (not that I have any choice about that anyway).

February 15, 2005

Fighting boredom

Well, it seems I might not be that bored on Thursday after all. I am already late on my week plan outlined yesterday: there are still some unfinished spots lingering around my report. I will have to boost my productivity tomorrow.

February 14, 2005

And plenty of pages

I added 19 pages (or 17 if counted in a new smaller font) this time. Well, actually there is a little bit of cheating going on here since this amount includes my week-end production. I decided on Friday it was time to start to be nervous about the delay for my report: after all only one week was remaining. It clearly helped my productivity since I worked on the week-end and wrote an average of more than 6 pages per day for the last three days, and I believe better pages too.

The chapter about the system structure is finished. The three other chapters about the language, the type checking system and the database optimisations are mostly done but still have some unfinished spots from place to place. If everything goes well, I hope to have finished them tomorrow and have started the introduction and conclusion. Wednesday I will finish the introduction and conclusion and write the review for the article for Philip (now that I know what it is about) and Thursday I will be bored to death because I won't have anything to do (why do I feel this might not be true?).

February 11, 2005

And pages

I added 5 more pages this time.

February 10, 2005

And pages

I added 4 more pages to my report today. Nothing more to say really.

February 09, 2005

And pages

I added 3 pages to my report today, but found out that the number of pages announced on this blog do not sum up to the real number: 2 pages are missing, probably lost while merging different pieces of text together. I also re-read the article that I must review, and understood it quite well this time. There is just one section - that deals with category theory - that I do not understand at all. Now, I only have to find out what a review exactly should contain.

February 08, 2005

Pages

4 pages is the yield of today's report writing.

February 07, 2005

A slow Monday

On this rather slow Monday, I added three more pages to my report. Nothing much to say, except maybe that the end of my stay here is approaching very quickly: only two more weeks and I will be gone.

February 04, 2005

Administrative work

Today, I mostly worked on various forms and reports I have to send back to my university in Switzerland before I go home in order to satisfy the voracious appetite of it's administration. I also worked on my report, but I mostly changed existing content, which actually reduced the size of my report by two pages. But then also, quality is more important than quantity, isn't it?

February 03, 2005

Reporting

I wrote three more pages for my report, and changed some other things in the existing report.

February 02, 2005

Reporting

In a now slightly repetitive fashion, I have continued working on my report. I am afraid I will not be able to report a grand number of additional pages, as I have mostly been modifying old pages to make them (hopefully) better. Two additional pages have nevertheless been added to the total.

February 01, 2005

A little reading

I have been working most of the time on my report (4 new pages). Since this is not very exciting, I am also pleased to announce that I read an article that Philip wants me to review (and that will require a few additional reads to be fully understood) and went to a LFCS lab-lunch talk about relational database queries optimisations which was interesting.