
CS789 Relational Database in Scheme

School of Computer Science, University of Nevada, Las Vegas

(c) 2007, Matt Pedersen


1 Introduction

This assignment deals with relational databases. You will learn about relations, and you will work with a relational database system that I have created for you. The purpose of the assignment is for you to become familiar with relational databases, to learn to write expressions for them, and, in the end, to extend the query language that the database system uses. This assignment contains two kinds of exercises: regular exercises, which you must complete and hand in for marks, and special exercises, which you should do to become familiar with the database and the query language.

Please take the extra 1-2 hours it might take to make your assignment answer look nice. It took me well over 40 hours to prepare this assignment, to write the database and to make it all look nice. I am sure you appreciate that the material I present to you is easy to read and nice to look at, just as much as I value looking at a well-written assignment, especially since I have to read and correct them.

2 Preliminaries

A database consists of a number of relations. Figure 1 shows an example of a relation containing address information. A relation consists of 2 parts:

1. A Schema (the first row in the relation in Fig. 1).

2. A number of tuples (the last 3 rows in the relation in Fig. 1).

Name : string | Address : string | Phone : symbol
Matt | Helmcken Street | 604-688-0654
Alex | Alma Street | 604-228-2121
Yvonne | Hornby Street | 604-123-1234

Figure 1: Example of a simple relation with 3 columns and 3 tuples.

The schema of a relation names the columns and gives them types (a name/type pair is called an attribute). In Fig. 1 we have 3 columns, so the schema contains 3 entries: Name of type string, Address of type string and Phone of type symbol. In the following, we often refer to the name part of an attribute as a 'field'. (Note that even though strings are normally enclosed in " ", I have omitted the quotes in the above relation, so it is actually impossible to tell whether the name is a string or a symbol; the address, however, must be a string, as symbols cannot contain spaces.)

The relation in Fig. 1 contains 3 tuples, which are illustrated in Fig. 2.


("Matt" "Helmcken Street" 604-688-0654)
("Alex" "Alma Street" 604-228-2121)
("Yvonne" "Hornby Street" 604-123-1234)

Figure 2: The 3 tuples of the sample relation.

If we want to insert more tuples into the relation, they must have exactly 3 entries; the first and second must be of type string and the third must be of type symbol.

I have written a little database system which you can download from the course website. The file is called cs789-database.scm. You also need to download the file called relations.scm, which contains some example relations. When you load the file and evaluate it, the system should start up automatically. If you break the execution or it crashes (which it might do if you forget parentheses in the commands!), you can restart it by typing (run).

3 Getting Started

When you start the system you should see something similar to Fig. 3.

Welcome to DrScheme, version 103p1.
Language: Graphical Full Scheme (MrEd).
Please wait....
Starting up the CS-789 Database
/Database:4>

Figure 3: Starting up the CS-789 Database.

When the database starts, it already contains a number of predefined relations. The number 4 following the /Database: prompt shows the number of relations in the database at any given time. Every command to the database system takes the form of a list. The simplest commands are (quit) and (list). The (quit) command quits the database system and returns to the regular Scheme prompt, while the (list) command lists the relations currently defined in the database. This is illustrated in Fig. 4.

As you can see from Fig. 4, the result of executing the (list) command is itself a relation. The schema of the relation returned by the (list) command has 2 attributes: name : symbol and size : number. As expected, it has 4 tuples, one for each relation in the database. The size attribute represents the number of tuples in the relation, and the name represents the name that the relation has in the database.


/Database:4> (list)

name : symbol | size : number
help | 2
summer2002 | 8
billboard | 500
test | 5

Figure 4: Executing the (list) command.

All commands take the form ( ... ), where ... is a database command. You will learn many more commands later on, and eventually implement some yourself. If at any time you need a list of commands while you are working with the database, you can use the command (show help), which returns a relation with 3 attributes: command, syntax, and description. This relation contains information about all the commands available in the database system (including the ones you will implement).

To view the contents of a relation you use the command (show <rel-expr>), where, for now, <rel-expr> is the name of a relation. If we wish to see the contents of the summer2002 relation, we do as in Fig. 5.

/Database:4> (show summer2002)

course : number | instructor : string | course-name : string
124 | Matt Pedersen | Principles of Computer Science I
128 | Michael Huggett | Principles of Computer Science
219 | Andrew Warfield | Computer Organization
220 | Stephane Durocher | Introduction to Discrete Structures
304 | Raymond Ng | Introduction to Relational Databases
126 | Jesse Hoey | Principles of Computer Science II
216 | George Tsiknis | Program Design and Data Structures
218 | Mark McCann | Computer Organization

Figure 5: Executing the (show summer2002) command.

Exercise 3.1 (No deliverables required): Download the database system, load the cs789-database.scm file and run it. Try all the commands you have learned so far. Remember that if you want to restart the database from the Scheme prompt, you can do so by typing (run).

3.1 Recap

In this section we have introduced 2 simple commands: (quit) and (list). In addition, we have introduced the (show <rel-expr>) command to inspect the contents of a relation.

4 Creating Relations and Inserting Tuples

A database is not much fun if you cannot create relations yourself. In this section you will learn how to create schemas and how to turn them into relations. We will also look at inserting tuples into relations. The result of executing the (list) command was itself a relation; almost everything in our database is a relation. What about the schema of a relation? As you might have guessed, even a schema is a relation. Using the (schema <rel-expr>) command you can obtain the schema of a relation as a relation. Fig. 6 illustrates this.

/Database:4> (schema summer2002)

name : symbol | type : symbol
course | number
instructor | string
course-name | string

Figure 6: Executing the (schema summer2002) command.

As you can see from Fig. 6, the schema of a relation is itself a relation with 2 attributes: name of type symbol and type of type symbol.

Exercise 4.1: Explain why the expression (schema (schema R)), where R is any relation, always computes and returns the same relation no matter which relation you choose for R.

Now that you have an understanding of what a schema is, we need to be able to create schemas ourselves, so that we can eventually create relations. Since the schema of a 'schema relation' always looks the same (see Fig. 6 and Exercise 4.1), we have a simple command, (make-schema), that creates an empty 'schema relation' with the 2 well-known schema attributes.


/Database:4> (make-schema)

name : symbol | type : symbol

Figure 7: Executing the (make-schema) command.

Figure 7 shows the result of executing the (make-schema) command. We need to name this relation (we refer to such a relation as a 'schema relation') so we can insert tuples into it, thus defining the schema for the relation we are going to create. What we want to do is add this new 'schema relation' to the database and give it a name. To do this we use the (let <name> be <rel-expr>) command, where <name> is the name we wish to give the relation in the database, and <rel-expr> is a relation or a command/expression that returns a relation. (make-schema) happens to be such a command: it returns a 'schema relation'. Suppose we wish to name this new 'schema relation' my-schema; we do as illustrated in Fig. 8. We can use the (list) command to verify that my-schema has been added to the database, and we can use (show my-schema) to verify its contents (so far the relation should not contain any tuples).

/Database:4> (let my-schema be (make-schema))

name : symbol | type : symbol

Figure 8: Creating a new 'schema relation' and naming it in the database.

Note that every command in the database language returns its result, which is then printed, and also note that (let ...) in the database is equivalent to (define ...) in Scheme.

Exercise 4.2 (No deliverables): Verify, using (list), that my-schema has been added successfully to the database, and use (show ...) to list its contents to ensure that it has the right schema and no tuples.

We now have the ability to add relations (so far only 'schema relations') to the database, but we still need to be able to add tuples to a relation. If we wish to add a tuple T to a relation R, we must make sure that the tuple is compatible with the relation, that is, that the tuple matches the schema: the type of each element in the tuple must match the type of the corresponding attribute in the schema. More formally, assume $T = (e_1\ e_2\ \ldots\ e_n)$, where $T$ is a tuple with $n$ elements, and assume that the type of $e_i$ is $t_i$. Let $\mathrm{Schema}(R) = (a_1\ a_2\ \ldots\ a_m)$ be the schema of relation $R$, where each $a_i$ is an attribute consisting of a name $n_i$ and a type $s_i$.

In order to be able to insert T into R the following constraints must be met:

1. $n = m$ (the tuple must have as many elements as the relation is 'wide').

2. $t_i = s_i$ for all $1 \le i \le n$ (the types must match).
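The two constraints translate directly into a small predicate. The sketch below is a minimal illustration, assuming a tuple is a plain Scheme list and each schema type is one of the symbols number, string or symbol; value-type and tuple-compatible? are hypothetical names, not functions from cs789-database.scm.

```scheme
;; Hypothetical helper: the schema type symbol that describes a Scheme value.
(define value-type
  (lambda (v)
    (cond ((number? v) 'number)
          ((string? v) 'string)
          ((symbol? v) 'symbol)
          (else 'unknown))))

;; Hypothetical predicate: may tuple TUP be inserted into a relation whose
;; attribute types are TYPE-LIST?
(define tuple-compatible?
  (lambda (tup type-list)
    (and (= (length tup) (length type-list))          ; constraint 1: n = m
         (let loop ((vals tup) (types type-list))     ; constraint 2: t_i = s_i
           (or (null? vals)
               (and (eq? (value-type (car vals)) (car types))
                    (loop (cdr vals) (cdr types))))))))
```

For example, (tuple-compatible? (list "Matt" "Helmcken Street" (string->symbol "604-688-0654")) '(string string symbol)) returns #t, while dropping the phone number makes it return #f because n is no longer equal to m. The phone number is built with string->symbol here because some Scheme readers do not accept a token that starts with a digit as a symbol literal.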

If these constraints are met, the tuple can be successfully inserted into the relation. The command to insert tuples into a relation is (insert <tup> into <rel-expr>), where <tup> is a tuple that is compatible with the relation returned by <rel-expr>. Figure 9 shows how to insert a tuple into the my-schema 'schema relation' we created earlier.

/Database:4> (insert (name string) into my-schema)

name : symbol | type : symbol
name | string

Figure 9: Inserting a tuple into a relation.

In Fig. 9 we insert a tuple (name string), which will eventually become an attribute name : string in the relation we are about to create.

Exercise 4.3: We wish to eventually create an address database, so finish the schema by adding more tuples (attributes in the final relation): add an address of type string, a phone-number of type symbol and an age of type number.

Now you should have a relation called my-schema with 4 tuples in it. We can now use this schema to create a relation, namely our own little directory. From now on we will refer to all commands as expressions; this makes more sense because they all return relations. In order to create a relation from a 'schema relation' we use the (make-relation <rel-expr>) expression. Figure 10 shows how this works. Note that (make-relation <rel-expr>) only accepts a relational expression that returns a 'schema relation', that is, a relation whose schema contains two attributes: first name : symbol and second type : symbol. If you try to create a relation by passing something that is not a 'schema relation', you will get an error.

/Database:4> (make-relation my-schema)

name : string | address : string | phone-number : symbol | age : number

Figure 10: Creating a relation based on a 'schema relation'.

Exercise 4.4: Explain why it is impossible to create a relation based on a non-'schema relation'.

Exercise 4.5: We wish to eventually create an address database called directory. Add a new relation named directory, with the schema represented by the my-schema relation, to the database.

4.1 Recap

In this section we have introduced the (schema <rel-expr>) expression to retrieve the schema of a relation. We also saw how to create new schemas using the (make-schema) expression. Tuples can be inserted into relations using the (insert <tup> into <rel-expr>) expression, and new relations can be created based on schema relations using (make-relation <rel-expr>).

5 The Internals of the CS789 Database Program

In this section we will study the implementation of the database program that, so far, we have only used.

5.1 Data Representation

A relation is stored as a list with 4 elements:

1. A list of attribute names (list of symbols) called the schema (technically the entire schema contains both names and types).

2. A list of attribute types (list of symbols) called the type-list.

3. A list of the tuples in the relation (list of lists).

4. An internal name, typically the name of the function that created the relation, e.g., _make_schema_.

Figure 11 shows the internal representation for the my-schema relation after inserting all 4 tuples, and for the directory relation immediately after creation. We call this representation the Relation Abstract Data Type, or Relation ADT for short.


my-schema:

((name type)
 (symbol symbol)
 ((name string) (address string) (phone-number symbol) (age number))
 _make_schema_)

directory:

((name address phone-number age)
 (string string symbol number)
 ()
 _make_relation_)

Figure 11: The my-schema and the directory relation (internal representation).
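To make the representation concrete, here is a minimal sketch that writes Figure 11's my-schema as a Scheme literal and derives the directory relation from it. The accessor names get-name-list, get-type-list and get-tuples are used in the handout's code, but their bodies here are assumptions based on the four-element layout; schema-relation? and make-relation-sketch are hypothetical and are not the database's actual make-relation.

```scheme
;; Assumed accessors for the four-element Relation ADT:
;; (name-list type-list tuples internal-name)
(define get-name-list (lambda (rel) (car rel)))
(define get-type-list (lambda (rel) (cadr rel)))
(define get-tuples (lambda (rel) (caddr rel)))

;; my-schema after inserting its four tuples (Figure 11).
(define my-schema
  '((name type)
    (symbol symbol)
    ((name string) (address string) (phone-number symbol) (age number))
    _make_schema_))

;; A 'schema relation' has the attributes name : symbol and type : symbol.
(define schema-relation?
  (lambda (rel)
    (and (equal? (get-name-list rel) '(name type))
         (equal? (get-type-list rel) '(symbol symbol)))))

;; Hypothetical make-relation: each (attribute-name type) tuple of the
;; schema relation becomes one column of a new, empty relation.
;; This sketch returns #f for a non-schema relation instead of an error.
(define make-relation-sketch
  (lambda (schema-rel)
    (if (schema-relation? schema-rel)
        (list (map car (get-tuples schema-rel))
              (map cadr (get-tuples schema-rel))
              '()
              '_make_relation_)
        #f)))
```

Evaluating (make-relation-sketch my-schema) yields the same list as the directory representation in Figure 11: the names come from the first element of each schema tuple and the types from the second.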

Exercise 5.1: Show the internal representation of the directory relation after inserting the following two tuples: ("Matt Pedersen" "Helmcken Street" 604-112-2334 35) and ("Yvonne Coady" "Hornby Street" 604-998-8776 40).

5.2 Evaluating Relational Expressions

The main idea behind the evaluation part of the database system is as follows. Remember that each command is written as a list, so each pair of parentheses ( ... ) represents a command. The evaluation works in a number of steps:

1. Read in an expression.

2. Type check the expression.

3. Evaluate all sub-expressions that could themselves be expressions of the form ( ... ) by calling the evaluate function recursively (repeating this step on each new expression).

4. Evaluate the expression with the values obtained in the previous step and return the resulting relation.
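Steps 3 and 4 can be illustrated with a toy evaluator over the Relation ADT. This is a sketch under assumptions, not the handout's evaluate: it skips step 2 (type checking), supports only relation names, (make-schema), (insert <tup> into <rel-expr>) and (count <rel-expr>), and builds new relations instead of modifying existing ones. toy-evaluate and mk-rel are hypothetical names, and the relations argument is an association list standing in for the run loop's relations variable.

```scheme
;; Relation ADT: (name-list type-list tuples internal-name)
(define mk-rel (lambda (names types tuples tag) (list names types tuples tag)))
(define get-name-list (lambda (rel) (car rel)))
(define get-type-list (lambda (rel) (cadr rel)))
(define get-tuples (lambda (rel) (caddr rel)))

;; Hypothetical evaluator for a tiny subset of the query language.
;; RELATIONS is an association list of (name . relation) pairs.
(define toy-evaluate
  (lambda (expr relations)
    (cond
     ;; A bare symbol names a relation defined earlier with (let ...).
     ((symbol? expr)
      (let ((entry (assq expr relations)))
        (if entry
            (cdr entry)
            (error "unknown relation" expr))))
     ;; (make-schema) has no sub-expressions to evaluate.
     ((eq? (car expr) 'make-schema)
      (mk-rel '(name type) '(symbol symbol) '() '_make_schema_))
     ;; (insert <tup> into <rel-expr>): step 3 evaluates <rel-expr>
     ;; recursively, step 4 builds the result.  Nothing is mutated.
     ((eq? (car expr) 'insert)
      (let ((rel (toy-evaluate (cadddr expr) relations)))
        (mk-rel (get-name-list rel)
                (get-type-list rel)
                (append (get-tuples rel) (list (cadr expr)))
                '_insert_)))
     ;; (count <rel-expr>): the extension discussed in Section 5.3.
     ((eq? (car expr) 'count)
      (let ((rel (toy-evaluate (cadr expr) relations)))
        (mk-rel '(count) '(number)
                (list (list (length (get-tuples rel))))
                '_count_)))
     (else (error "unknown command" (car expr))))))
```

For instance, evaluating '(count (insert (address string) into (insert (name string) into (make-schema)))) with an empty relations list evaluates the innermost (make-schema) first, then the two inserts, and finally count, giving a relation whose only tuple is (2).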

You will need to familiarize yourself with the run and evaluate functions.

I have written a (type) checker that checks your commands before trying to evaluate them, thus (hopefully) preventing the system from crashing if you type malformed expressions or non-existent attribute or relation names.

The type-checking algorithm works in much the same way as the evaluation function; the only difference is that it does not perform any computation on the tuples, but only on the schemas. That way we can catch anything that might cause an error before we try to evaluate the expression.

Study the name-check function and convince yourself that it works.

Exercise 5.2: Show, using a recursion-droid-like model, how name-check is called recursively when checking the following expression, and indicate which cases are taken in the case statement inside the name-check function:

(make-relation (insert (phone symbol) into
                 (insert (address string) into
                   (insert (name string) into (make-schema)))))

Exercise 5.3: Consider the following two sets of expressions (the first set contains 2 expressions, the second contains 1):

1.)

(let myrel be (make-relation (insert (name string) into (make-schema))))

(insert ("matt") into myrel)

2.)

(let myrel be (insert ("matt") into (make-relation (insert (name string) into (make-schema)))))

In the database system all expressions are checked (by calling name-check) before they are evaluated, to avoid errors during evaluation; this is called static type checking. Static type checking checks everything without looking at the tuples (that is, without actually evaluating the expression), by simply considering the schemas of the relations computed by the expressions.

1. Explain why the first set of expressions, numbered 1.), can be statically type checked.

2. Explain why the second set of expressions, numbered 2.), cannot be statically type checked.

3. Explain why (schema (make-relation (schema R))), where R is any relation, can be statically type checked.

Note that in the main loop of the run function, the variable relations contains a list of all the relations that have been defined by the (let ...) command.


5.3 Extending the Evaluator

When you extend the evaluator, you do not need to extend its type-checking part. Type checking can be a little complicated at times, so I have already added all the functionality that you need. You will concentrate on 2 tasks every time you add a new function to the database query language:

1. Adding a new case in the evaluate function that calls a function to do the evaluation of the new expression.

2. Writing the function that performs the calculation itself.

We will now briefly explain how to implement a simple function and add it to the language.

Suppose we wish to implement a function (count <rel-expr>) that returns a relation whose schema has just one attribute, count : number, and which has just one tuple: a number representing the number of tuples in <rel-expr>.

If you look closely at the other functions already implemented, you will see that they take in the relations they work on plus whatever extra information is needed. One important thing to remember is that <rel-expr> can itself be an expression, so before calling the function that computes the count we need to evaluate <rel-expr>. This is done by a recursive call to evaluate:

((COUNT)(count (evaluate (cadr expr) relations)))

We now have to write the count function. All it has to do is return a relation with one tuple. Let us assume that we were using the database language to create this relation. We would probably proceed using the commands listed in Fig. 12.

/Database:3> (let count-schema be (make-schema))
/Database:4> (insert (count number) into count-schema)
/Database:4> (let count-rel be (make-relation count-schema))
/Database:5> (insert (???) into count-rel)

Figure 12: The equivalent expressions for (count <rel-expr>) in the query language.

The first step is to create the schema for the resulting relation. Next, we insert the one attribute that the result relation will have. Then we use the schema relation to create the final relation. The only problem is: which number do we insert? The ??? marks the number that the tuple holds; this number should be the number of tuples in the relation we are interested in.

Now that you have a good idea of how you would have written it in the query language (note that you cannot complete Fig. 12, because the query language has no way of counting the tuples of a relation; that is exactly what (count <rel-expr>) will do), you can implement it and add it to the language. The implementation could look like this:

(define count
  (lambda (rel)
    (let* ((count-schema (make-schema))
           (v1 (insert count-schema '(count number)))
           (count-rel (make-relation count-schema))
           (size (length (get-tuples rel))))
      (insert count-rel (list size))
      count-rel)))

A different solution would be to build the relation by hand and simply return it. We know how it will look, so it is not difficult:

(define count
  (lambda (rel)
    (list '(count)
          '(number)
          (list (list (length (get-tuples rel))))
          '_count_)))

Exercise 5.4: Implement the (count <rel-expr>) function, and show that it works.

Exercise 5.5: Implement a function (union of <rel-expr1> and <rel-expr2>) that computes the union of two relations. Do not worry about tuples that occur more than once; that is fine for now. The union is only legal if the schema of <rel-expr1> is the same as that of <rel-expr2>. However, that is a type-checking issue, and I have already dealt with it, so all you need to do is implement a union function that computes the union of the two relations.

Exercise 5.6: Show that your new (union ...) function works. Create 2 relations with the same schema and use union to create a new relation containing all their tuples.


5.4 Recap

In this section we have looked more closely at the implementation of the database language evaluator. You implemented the (count <rel-expr>) expression and the (union of <rel-expr> and <rel-expr>) expression yourself.

6 More Database Expressions

In this section we will introduce two of the most useful functions in any database language: select and project. select is used when you want to select (!) a number of tuples from a relation. The syntax of select is (select from <rel-expr> where <attribute-name> <op> <value>), where <rel-expr> as usual denotes an expression that returns a relation, <attribute-name> is the name of an attribute in the <rel-expr> relation, <op> is one of =, !=, <, >, <=, or >=, and, finally, <value> is a constant value of the same type as the attribute specified by <attribute-name>. Figure 13 shows an example that selects all the courses from the summer2002 relation that have a course number smaller than 200.

/Database:4> (select from summer2002 where course < 200)

course : number | instructor : string | course-name : string
124 | Matt Pedersen | Principles of Computer Science I
128 | Michael Huggett | Principles of Computer Science
126 | Jesse Hoey | Principles of Computer Science II

Figure 13: Executing the (select from summer2002 where course < 200) command.
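The following sketch shows one way a select, together with an operator-to-function mapping in the spirit of compare, can work on the Relation ADT, using the summer2002 data from Figure 5. toy-compare, toy-select, keep and position are hypothetical, and the internal name _summer2002_ is made up; the handout's own compare and select are organised differently (its select, for example, uses make-accessor).

```scheme
;; Hypothetical helper: keep the elements of LST that satisfy PRED.
(define keep
  (lambda (pred lst)
    (cond ((null? lst) '())
          ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
          (else (keep pred (cdr lst))))))

;; Hypothetical helper: index of NAME in NAMES, or #f.
(define position
  (lambda (name names)
    (let loop ((ns names) (i 0))
      (cond ((null? ns) #f)
            ((eq? (car ns) name) i)
            (else (loop (cdr ns) (+ i 1)))))))

;; Map the operator symbol to a test that suits the type of A:
;; numbers use <, strings use string<?, symbols compare by their names.
(define toy-compare
  (lambda (op a b)
    (let ((lt? (cond ((number? a) <)
                     ((string? a) string<?)
                     (else (lambda (x y)
                             (string<? (symbol->string x)
                                       (symbol->string y)))))))
      (case op
        ((=) (equal? a b))
        ((!=) (not (equal? a b)))
        ((<) (lt? a b))
        ((>) (lt? b a))
        ((<=) (not (lt? b a)))
        ((>=) (not (lt? a b)))
        (else (error "unknown operator" op))))))

;; (select from rel where name op value) on the four-element Relation ADT.
(define toy-select
  (lambda (rel name op value)
    (let ((i (position name (car rel))))
      (list (car rel)
            (cadr rel)
            (keep (lambda (tup) (toy-compare op (list-ref tup i) value))
                  (caddr rel))
            '_select_))))

(define summer2002
  '((course instructor course-name)
    (number string string)
    ((124 "Matt Pedersen" "Principles of Computer Science I")
     (128 "Michael Huggett" "Principles of Computer Science")
     (219 "Andrew Warfield" "Computer Organization")
     (220 "Stephane Durocher" "Introduction to Discrete Structures")
     (304 "Raymond Ng" "Introduction to Relational Databases")
     (126 "Jesse Hoey" "Principles of Computer Science II")
     (216 "George Tsiknis" "Program Design and Data Structures")
     (218 "Mark McCann" "Computer Organization"))
    _summer2002_))
```

With these definitions, (map car (caddr (toy-select summer2002 'course '< 200))) gives (124 128 126), the same courses as in Figure 13.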

Exercise 6.1: Construct a query expression that returns a relation with all thecourses that have course numbers smaller than 200 or bigger than 300.

select can be used to select tuples from a relation, that is, it picks entire rows from the relation, exactly those that satisfy some predicate. We would also like to be able to select a number of attributes (a number of columns) in our output. For example, it would be nice if we could easily create a relation that contains all the tuples in summer2002 but only the course and the instructor columns (no one ever knows the names of the courses anyway!). To do that we need a new function, called project. We will implement project together, that is, I will provide some of it and you will finish it. First of all, let us consider the syntax of project: (project <rel-expr> over <attribute-name_1> ... <attribute-name_n>), where <rel-expr> is what it normally is, and each <attribute-name_i> is the name of a column (an attribute). The resulting relation contains all the same tuples as the relation <rel-expr>, but only the columns specified by the attribute names. Figure 14 shows an example of project.

/Database:4> (project summer2002 over course instructor)

course : number | instructor : string
124 | Matt Pedersen
128 | Michael Huggett
219 | Andrew Warfield
220 | Stephane Durocher
304 | Raymond Ng
126 | Jesse Hoey
216 | George Tsiknis
218 | Mark McCann

Figure 14: Executing the (project summer2002 over course instructor) command.

Exercise 6.2: We are now going to implement the project command, but before we do that we need to look at how to easily select the value in a tuple that belongs to a certain column. The following algorithm can be used. Assume that we have a relation rel which has the following internal representation:

((name address phone-number gender age)
 (string string symbol symbol number)
 ( ... <all the tuples go here> ... )
 _insert_)

We would like to make a function that, when applied to a single tuple, returns the value in a given column specified by the attribute name of that column; e.g., say we want to make a function that, when applied to a tuple, returns the phone-number. Let us write a function that can create such a function (this is a good example of a factory function: a function that returns another function). Assuming that we are trying to create a function that can extract the phone-number from a tuple, here is the way we do it:

1. Search for the phone-number attribute in the schema, and remember its index (for the example this index is 2).

2. Return a function that, when applied to a tuple, returns the element at position 2 (i.e., the 3rd element). This can be done using list-ref. The function returned looks like this:

(lambda (tup)
  (list-ref tup index))

Here is the make-accessor function that does exactly that:

(define make-accessor
  (lambda (rel name)
    (letrec ((mk-acc
              (lambda (name-list index)
                (if (null? name-list)
                    #f
                    (if (equal? (car name-list) name)
                        (lambda (tup) (list-ref tup index))
                        (mk-acc (cdr name-list) (add1 index)))))))
      (mk-acc (get-name-list rel) 0))))

If we create such an accessor for every attribute name in the project command and put them in a list, we can apply these accessors to each tuple and obtain the relation we wish. Note that we can also use this list of accessors to create the schema and type-list of the result, in exactly the same way as we do with the tuples. (Hint: use map, sometimes twice!)

(define project
  (lambda (rel names)
    ;;; remove any attributes that were repeated
    (let* ((unique-names (remove-doubles names))
           ;;; create the list of accessors
           (accessors (map (lambda (x)
                             (make-accessor rel x))
                           unique-names))
           ;;; extract the schema, type-list and tuples
           ;;; from the relation.
           (tuples (get-tuples rel))
           (name-list (get-name-list rel))
           (type-list (get-type-list rel)))
      ;;; return a new relation with the wanted columns
      (list
       ;;; code that computes the new name-list
       ;;; code that computes the new type-list
       ;;; code that computes the new list of tuples
       '_project_))))


Finish the implementation of project. You just need to write the 3 pieces of code that compute the new name-list, type-list and list of tuples.
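The hint about map can be illustrated on a two-row sample of summer2002. This sketch reuses make-accessor as given above, with DrScheme's add1 written as (+ index 1) so it also runs in other Schemes, and assumes get-name-list and get-tuples are the car and caddr of the Relation ADT; sample, accessors and projected are illustrative names. It only builds the new tuples; the name-list and type-list are left for the exercise.

```scheme
;; Assumed Relation ADT accessors: (name-list type-list tuples internal-name)
(define get-name-list (lambda (rel) (car rel)))
(define get-tuples (lambda (rel) (caddr rel)))

;; make-accessor as given above, with add1 written as (+ index 1).
(define make-accessor
  (lambda (rel name)
    (letrec ((mk-acc
              (lambda (name-list index)
                (if (null? name-list)
                    #f
                    (if (equal? (car name-list) name)
                        (lambda (tup) (list-ref tup index))
                        (mk-acc (cdr name-list) (+ index 1)))))))
      (mk-acc (get-name-list rel) 0))))

;; Two rows of summer2002 (the internal name _sample_ is made up).
(define sample
  '((course instructor course-name)
    (number string string)
    ((124 "Matt Pedersen" "Principles of Computer Science I")
     (219 "Andrew Warfield" "Computer Organization"))
    _sample_))

;; One accessor per requested column, in the requested order.
(define accessors
  (map (lambda (n) (make-accessor sample n)) '(instructor course)))

;; Inner map: apply every accessor to one tuple.
;; Outer map: repeat that for every tuple of the relation.
(define projected
  (map (lambda (tup) (map (lambda (acc) (acc tup)) accessors))
       (get-tuples sample)))
```

Here projected is (("Matt Pedersen" 124) ("Andrew Warfield" 219)): the columns appear in the order in which the attribute names were given, not the order of the original schema.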

Exercise 6.3: Use your implementation of project to compute a relation that contains the instructor names of all the courses that have course numbers between 200 and 299.

Exercise 6.4 (No deliverables): Study the implementation of select and try to understand how it works (you will need to change the implementation in a coming exercise). In particular, study the compare function and understand how the textual representation (as a symbol) of the operator maps to a function that can compare the correct types (remember, e.g., < can only be used on numbers, not strings!).

Exercise 6.5: Implement a function (minimum <rel-expr>) that returns a relation with the same schema as <rel-expr> and with just one tuple: the minimum value of each column. (Hint: you probably need to call (compare '< ... ).) Implement a function (maximum <rel-expr>) that works just like (minimum ...), except that it computes the largest element in each column. Figure 15 shows the result of applying minimum and maximum to summer2002. Note that the tuple that the result relation contains might not be in the original relation, as the minimum/maximum is computed for each column separately and then assembled into one tuple. (Hint: assume that we are working our way through a list of tuples, and also assume that mins = (m0 m1 ... mn) represents the minimum elements we have seen so far. Let tup be the next tuple in the tuple list; what does (map (lambda (t m) (if (compare '< t m) t m)) tup mins) do?)

Exercise 6.6: Write an expression that returns a relation containing the name of the minimum and maximum instructor for all courses with a course number less than 300.


/Database:4> (minimum summer2002)

course : number | instructor : string | course-name : string
124 | Andrew Warfield | Computer Organization

/Database:4> (maximum summer2002)

course : number | instructor : string | course-name : string
304 | Stephane Durocher | Program Design and Data Structures

Figure 15: Executing the (minimum summer2002) and the (maximum summer2002) commands.

Exercise 6.7: Recall the syntax of the (select ...) expression:

(select from <rel-expr> where <name> <op> <value>)

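We would like to replace <value> by $.<name>, that is, we would like to exchange the constant value on the right-hand side of the operator for a way of accessing the value of another element in the same tuple. For example, (select from <rel-expr> where <name1> <op> $.<name2>) should keep exactly those tuples in which the value in column <name1> and the value in column <name2> satisfy <op>.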

The filter operation of the select looks like this:

(filter (lambda (tup) (compare op (accessor tup) value))
        (get-tuples rel))

That is, all the tuples for which (compare op (accessor tup) value) evaluates to true will be included in the end result. We want to change this filter expression into something more general, where <value> is replaced by a function that also takes in tup: if we indeed have a <value>, the function simply returns this value; otherwise it returns the value of the element in the tuple that corresponds to the name in $.<name>.

So we will rewrite the select function to look like this:

(define select
  (lambda (rel name op value)
    (let ((accessor (make-accessor rel name))
          (value-accessor << ...............>> ))
      (mk-relation
       (get-name-list rel)
       (get-type-list rel)
       (filter (lambda (tup)
                 (compare op (accessor tup) (value-accessor tup)))
               (get-tuples rel))
       '_select_))))

where << ...............>> is some code that returns either

• a function that just returns value, if value is not of the form $.<name>, or

• if value is a symbol $.<name>, that is, it has $ as the first character when turned into a string: use extract-field to get <name>, use make-accessor to make an accessor that can extract the correct value from a tuple, and return a function that, when applied to a tuple, returns the value obtained by applying that accessor.

Write the missing piece of code.
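The test described in the second bullet can be sketched as a small helper. dollar-field is hypothetical: it stands in for extract-field, whose definition is not shown here, and it assumes the $.<name> form always has a dot after the $. It returns #f instead of a name when value is not written as $.<name>.

```scheme
;; Hypothetical stand-in for extract-field: if VALUE is a symbol written
;; as $.<name>, return <name> as a symbol; otherwise return #f.
(define dollar-field
  (lambda (value)
    (if (symbol? value)
        (let ((s (symbol->string value)))
          (if (and (> (string-length s) 2)
                   (char=? (string-ref s 0) #\$)
                   (char=? (string-ref s 1) #\.))
              (string->symbol (substring s 2 (string-length s)))
              #f))
        #f)))
```

For example, (dollar-field '$.course) gives the symbol course, while a number such as 200 or a plain symbol such as course gives #f, which is the case where value should be returned unchanged.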

The next command we will introduce is the join command. The syntax is as follows: (join <rel-expr> with <rel-expr>). The result of joining two relations is a new relation whose attributes are a combination of all the attributes in both relations. Any attributes that exist in both relations (common attributes) are only added once. (Note: names and types must both match for 2 attributes to be common.) The tuples in the new relation are formed by combining all pairs of tuples from the 2 relations where the 2 tuples share the same values on the common attributes. The join expression can return an empty relation if no two tuples agree on the common attributes. If the 2 relations do not have any common attributes, the result relation contains every combination of a tuple from the first relation with a tuple from the second relation (this is really just the cross product of the two relations).

Figure 16 shows an example of using join where the two relations have common attributes: addresses and numbers have the attribute Name : string in common. If you inspect both addresses and numbers, you will see that each contains tuples that are not in the result returned by join, because those names are not in both relations; they are therefore left out. The name "Paul Kry" is not in numbers, and the names "Dima Brodsky" and "Andy Warfield" are not in addresses.

Let us now create 2 new relations: one with just the name column of the addresses relation and one with just the address column of the addresses relation. Figure 17 shows what happens when we join these 2 new relations. Since they do not have any common attributes, join produces the cross product of the two relations; that is, every tuple in r1 is combined with every tuple in r2, so if r1 has n tuples and r2 has m, the result relation has n × m tuples.
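Applied to Figure 17, where r1 and r2 each hold 4 tuples, the count works out as:

$$|r_1 \times r_2| = n \cdot m = 4 \cdot 4 = 16,$$

which matches the 16 rows listed in the figure.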


/Database:4> (join addresses with numbers)

Name : string | Address : string | Phone : symbol
Matt Pedersen | Helmcken Street | 604-688-5456
Alex Brodsky | West 2nd Avenue | 604-283-1635
Yvonne Coady | Hornby Street | 604-682-5432

Figure 16: The result of executing the (join ...) command on two relations with common attributes.


/Database:4> (let r1 be (project addresses over name))

Name : string
Matt Pedersen
Alex Brodsky
Yvonne Coady
Paul Kry

/Database:4> (let r2 be (project addresses over address))

Address : string
Helmcken Street
West 2nd Avenue
Hornby Street
West 1st Avenue

/Database:4> (join r1 with r2)

Name : string | Address : string
Matt Pedersen | Helmcken Street
Matt Pedersen | West 2nd Avenue
Matt Pedersen | Hornby Street
Matt Pedersen | West 1st Avenue
Alex Brodsky | Helmcken Street
Alex Brodsky | West 2nd Avenue
Alex Brodsky | Hornby Street
Alex Brodsky | West 1st Avenue
Yvonne Coady | Helmcken Street
Yvonne Coady | West 2nd Avenue
Yvonne Coady | Hornby Street
Yvonne Coady | West 1st Avenue
Paul Kry | Helmcken Street
Paul Kry | West 2nd Avenue
Paul Kry | Hornby Street
Paul Kry | West 1st Avenue

Figure 17: The result of executing the (join ...) command with two relations that do not have any common attributes.

Exercise 6.8: I have implemented the part of the join command that deals with two relations that have common attributes. You must implement the code that computes the cross product of two relations without any common attributes. Complete the code for the join command.

The second-to-last expression you need to know is the rename expression. The syntax is (rename <att-name> in <rel-expr> to <att-name>), where the first <att-name> is an existing column of the relation and the second is the new name it should be given. Figure 18 shows an example of using the rename expression.

/Database:4> (show addresses)

Name : string    Address : string
Matt Pedersen    Helmcken Street
Alex Brodsky     West 2nd Avenue
Yvonne Coady     Hornby Street
Paul Kry         West 1st Avenue

/Database:4> (rename address in addresses to where-do-they-live)

Name : string    Where-do-they-live : string
Matt Pedersen    Helmcken Street
Alex Brodsky     West 2nd Avenue
Yvonne Coady     Hornby Street
Paul Kry         West 1st Avenue

Figure 18: An example of renaming a column using the (rename ...) expression.

The rename expression can be extremely helpful if you need to join two relations that have a column in common but no shared attribute, because the columns are not named the same. Using the rename expression, you can rename one of them and thus avoid computing a big cross product and then having to use select to pick out the tuples where the two columns have the same value.
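
Because a column is identified by its position in the tuples, renaming only needs to touch the schema; the tuples pass through unchanged. A minimal sketch, assuming a schema is a list of (field . type) pairs with symbols for both parts; rename-field is a hypothetical helper, not the database's own procedure.

```scheme
;; Hypothetical sketch: rename one field in a schema represented as a
;; list of (field . type) pairs. The type is kept; other pairs are copied.
(define (rename-field schema old new)
  (map (lambda (attr)
         (if (eq? (car attr) old)
             (cons new (cdr attr))
             attr))
       schema))

(rename-field '((name . string) (address . string)) 'address 'where-do-they-live)
;; => ((name . string) (where-do-they-live . string))
```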

Example:

Assume that we have two relations R1 and R2 where

Schema(R1) = (name : string address : string)
Schema(R2) = (nombre : string numero-de-telefono : symbol)

So R1 has a column with a name and one with an address, and R2 has a name column, but it happens to be in Spanish (nombre), and a (yet again Spanish) telephone number column. We know that the name and the nombre columns both contain names, so if we want to join these together without renaming first, we will get a big cross product. Assuming that R1 has n tuples and R2 has m tuples, the result of joining R1 with R2 will have n ∗ m tuples. If we want the tuples where name = nombre (to get one tuple per person with his/her name, address and phone number), we would do something like:

(select from (join R1 with R2) where name = $.nombre)

We would compute a lot of useless tuples: all the ones where name is not equal to nombre (which would probably be most of them).

Let us now rename the nombre column to name and then join R1 and R2:

(join R1 with (rename nombre in R2 to name))

This would not compute any tuples that we do not get in the final relation.
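
To make the difference concrete, here is a small Scheme sketch of both plans. It is illustrative only. It assumes relations are lists of tuples with name and nombre in column 0, it uses a hypothetical keep filter in place of the database's select, and it stores phone numbers as strings for portability. The toy data reuses addresses and phone numbers from the earlier figures. Both plans find the same two people, but plan 1 first builds all 3 ∗ 3 = 9 combined tuples.

```scheme
;; Hypothetical sketch: name is column 0 of R1, nombre is column 0 of R2,
;; and relations are lists of tuples (lists).

;; Every tuple of r1 paired with every tuple of r2.
(define (cross-product r1 r2)
  (apply append
         (map (lambda (t1) (map (lambda (t2) (append t1 t2)) r2)) r1)))

;; Keep only the tuples that satisfy pred (a hand-rolled filter).
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

(define R1 '(("Matt Pedersen" "Helmcken Street")
             ("Alex Brodsky" "West 2nd Avenue")
             ("Paul Kry" "West 1st Avenue")))
(define R2 '(("Matt Pedersen" "604-688-5456")
             ("Alex Brodsky" "604-283-1635")
             ("Yvonne Coady" "604-682-5432")))

;; Plan 1, like (select from (join R1 with R2) where name = $.nombre):
;; materialise all n * m combined tuples, then throw most of them away.
(define product (cross-product R1 R2))
(define plan-1
  (keep (lambda (t) (equal? (list-ref t 0) (list-ref t 2))) product))

;; Plan 2, like (join R1 with (rename nombre in R2 to name)):
;; only matching pairs become tuples, and the second name column is dropped.
(define plan-2
  (apply append
         (map (lambda (l)
                (apply append
                       (map (lambda (r)
                              (if (equal? (car l) (car r))
                                  (list (append l (cdr r)))
                                  '()))
                            R2)))
              R1)))

(length product) ; => 9
(length plan-1)  ; => 2
plan-2
;; => (("Matt Pedersen" "Helmcken Street" "604-688-5456")
;;     ("Alex Brodsky" "West 2nd Avenue" "604-283-1635"))
```

Plan 2 in this sketch still compares every pair of tuples; what it saves is building the non-matching tuples, which is the point made above. A join could also avoid most of the comparisons, for example by indexing one relation in a hash table keyed on the shared column.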

Finally, to improve the output, it might be useful to sort a relation according to a column. We wish to implement a sort command with the following syntax:

(sort <rel-expr> by <attr-name>)

Exercise 6.9: Implement the sort command. Implement quick-sort (use set! when appending to the various lists in quick-sort). Use make-accessor and compare '< ... to construct the correct accessors to extract the wanted column from a tuple, and to compare such values according to their type.
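
One possible shape for such a quick-sort is sketched below. It is a sketch under stated assumptions, not the intended solution: tuples are plain lists, and the hypothetical parameters key-of and less? stand in for what make-accessor and compare would supply in the database. The two partition lists grow with set! (by consing onto them), and the example data is the moon diameters listed in section 7.

```scheme
;; Hypothetical sketch: quick-sort a list of tuples by one column.
;; key-of extracts the column from a tuple; less? compares two keys.
(define (quick-sort tuples key-of less?)
  (if (or (null? tuples) (null? (cdr tuples)))
      tuples
      (let ((pivot (car tuples))
            (smaller '())
            (larger '()))
        ;; Partition the remaining tuples around the pivot's key,
        ;; growing the two lists with set!.
        (for-each (lambda (t)
                    (if (less? (key-of t) (key-of pivot))
                        (set! smaller (cons t smaller))
                        (set! larger (cons t larger))))
                  (cdr tuples))
        ;; Sort each side recursively and put the pivot between them.
        (append (quick-sort smaller key-of less?)
                (list pivot)
                (quick-sort larger key-of less?)))))

;; (Planet Name Diameter) tuples, deliberately out of order.
(define moons '(("Jupiter" "Ganymede" 5262.0) ("Earth" "Moon" 3476.0)
                ("Neptune" "Triton" 2700.0) ("Saturn" "Titan" 5150.0)
                ("Jupiter" "Io" 3630.0) ("Jupiter" "Callisto" 4800.0)
                ("Jupiter" "Europa" 3138.0)))

;; Sort by the Diameter column (position 2) using numeric <.
(map cadr (quick-sort moons (lambda (t) (list-ref t 2)) <))
;; => ("Triton" "Europa" "Moon" "Io" "Callisto" "Titan" "Ganymede")
```

Passing string<? and a string accessor instead sorts by a string column; choosing the right comparison for the column's type is the job compare does in the exercise. This version is not stable: tuples with equal keys may change their relative order.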

6.1 Recap

In section 6 you have learned how to select tuples from a relation using the (select from <rel-expr> where <attr-name> <op> <value / $.attr-name>) expression, and how to project a relation over a number of columns using the (project <rel-expr> over <attr-name-1> ... <attr-name-n>) expression. You implemented a (minimum <rel-expr>) and a (maximum <rel-expr>) expression yourself, we did (join <rel-expr> with <rel-expr>) together, and finally you implemented (sort <rel-expr> by <attr-name>) to sort a relation based on one column.


7 The Planets and their Moons

You will find two relations in the relation list that have to do with the planets and moons in our solar system: planets and moons. In this section you will use everything you have learned to compute interesting relations on this material.

Exercise 7.1: Compute a relation that has the following schema:

(planet-name : string name : string distance : number)

and contains exactly one tuple matching the schema above. This one tuple should represent the moon in our solar system that is closest to its planet.

Exercise 7.2: Write an expression that returns a relation with the name and the days-to-rotate for all planets that have one or more moons closer than 25000 miles to the planet.

Exercise 7.3: Write an expression that returns a relation with one column containing the names of the planets that have moons discovered by "Voyager 2".

Exercise 7.4: Write an expression that returns a relation with two columns: the first contains moon names, the second planet names. We wish to compute a relation where a tuple (moon planet) is present in the relation if the diameter of the moon is bigger than the diameter of the planet; that is, compute a relation of all the moons that have a diameter bigger than at least one planet, and list which planet each moon is bigger than. (Hint: a cross product followed by a select might be useful.) The final relation has 9 tuples in it.

Exercise 7.5: (Hard) Create a query with the following schema:

(Planet-name : string Planet-Diameter : number)

that lists all planets which have at least one moon with a diameter (moon diameter) greater than the smallest planet's diameter. The relation must be sorted by Planet-Diameter. (The correct result has just 4 tuples.)

We can (fairly easily) create a relation that contains the names of the moons (along with their planets), much like the previous exercise, that is, a query that results in the following relation:


Planet : string    Name : string    Diameter : number
Neptune            Triton           2700.0
Jupiter            Europa           3138.0
Earth              Moon             3476.0
Jupiter            Io               3630.0
Jupiter            Callisto         4800.0
Saturn             Titan            5150.0
Jupiter            Ganymede         5262.0

but creating the following relation is a different story:

Planet : string    Number-of-moons : number
Neptune            1
Jupiter            4
Earth              1
Saturn             1

That is, a relation where, for each planet that has a moon larger than the smallest planet, the number of such moons is listed in the Number-of-moons column.

Exercise 7.5: Explain why you cannot write a query for your database that returns the above relation, and describe the extension to the database system that would be needed to compute such a relation.
