CS 542 Putting it all together -- Storage Management
CS 542 Database Management Systems
J Singh March 14, 2011
Putting it all together
Storage Hierarchy Secondary Storage Management
System Catalogs Data Modeling
© J Singh, 2011
Plan for today
• Putting it all together
  – Storage Hierarchy
  – Secondary Storage Management
  – System Catalogs
• By the second break, we will have arrived at a place where we have most of the tools to build our own database
• After the second break
  – Data Modeling
    • This topic does not fit with the other three, but some of you will need it for your project, so I am including it
Storage Hierarchy
• A “typical” system is shown here
• Many different levels
  – Main Memory (2011)
    • 1 μsec access / word, $12.50/GB
  – Disk (2011, ref)
    • 3.4 msec access / page, $1.35/GB
• Further away from primary storage,
  – Cost per MB decreases
  – Access speed decreases
  – Storage capacity increases
• Secondary and tertiary storage must be non-volatile
Source: Wikipedia
DBMS vs. OS File System
The OS already does disk space and buffer management, so why not let the OS manage these tasks?
• Differences in OS support: portability issues
• Some limitations, e.g., files don’t span multiple disk devices.
• Buffer management in a DBMS requires the ability to:
  – pin a page in the buffer pool,
  – force a page to disk (important for implementing concurrency control (CC) and recovery),
  – adjust the replacement policy, and pre-fetch pages based on access patterns in typical DB operations.
Structure of a DBMS
• A typical DBMS has a layered architecture.
• Disk: storage hierarchy, RAID
• Disk Space Management: roles, free blocks
• Buffer Management: buffer pool, replacement policy
• Files and Access Methods: file organization
  – heaps, sorted files, indexes
  – file- and page-level storage
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB (Index Files, System Catalog, Data Files). These layers must consider concurrency control and recovery.]
Five-minute rule (p1)
• Jim Gray, 1985, 1997
• When it comes to improving database performance,
  – Pay per MB when buying RAM
    • If RAM is cheaper, we can afford to buy more
  – Pay for speed when buying disk
    • With a faster disk, we can get away with less RAM
• The critical question:
  – At what point does the cost of keeping data in memory balance the cost of getting it from disk?
Source: ACM Queue
Five-minute Rule (p2)
• In 1985 context: (ref)
  – Block size: 1KB
  – Disk:
    • 15 I/Os per second = $15K
    • 1 I/O per second: $1K + overhead ≈ $2K
  – Memory
    • $5K/MB → $5/KB → 1KB = $5
  – Trade-off
    • Spend $5 for 1KB in memory to save $2K in I/O cost? Sure!
    • Any RAM cost up to $2K is good!
    • A page re-read once every T seconds uses 1/T of an I/O per second, i.e., $2K/T of disk; this equals the $5 RAM cost at T = $2K/$5 = 400 seconds
• Five-minute rule:
  – Have enough RAM to keep any data that will be used within 5 minutes
• In 1997 context: (ref)
  – Re-validated and re-published
• In 2008 context: (ref)
  – Still valid, but for much larger block sizes (64KB). Why?
    • Disk speeds have not increased at the same rate as RAM capacity, making it more economical to bring in bigger blocks.
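The 1985 break-even arithmetic can be reproduced in a few lines. This is only a sketch of the calculation on the slide; the function name `break_even_seconds` is made up for illustration, and the dollar figures are the slide's 1985 numbers.

```python
def break_even_seconds(disk_cost_per_io_per_sec: float, ram_cost_per_page: float) -> float:
    """Re-reference interval at which caching a page costs as much as re-reading it.

    A page read once every T seconds consumes 1/T of an I/O per second, so its
    share of the disk costs disk_cost_per_io_per_sec / T. Setting that equal to
    the RAM cost of holding the page and solving for T gives the interval.
    """
    return disk_cost_per_io_per_sec / ram_cost_per_page


# 1985 figures from the slide: $2K per (I/O per second), $5 per 1KB page of RAM.
interval_1985 = break_even_seconds(2000.0, 5.0)
print(f"{interval_1985:.0f} seconds, about {interval_1985 / 60:.1f} minutes")
```

Pages re-used more often than this interval are cheaper to keep in RAM; pages re-used less often are cheaper to re-read from disk.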
Extending the Hierarchy (p1)
• Flash memories (Solid-State Disks) (ref)
  – Fit nicely at the half-way point between memory and disk
  – 2011 price: 80¢/GB
  – Persistent storage
  – Low power
  – Implications of the 5-minute rule:
    • RAM-Flash buffer size 4KB
    • Flash-Disk buffer size 256KB
  – Special uses?
    • Index structures?
    • Materialized views?
Extending the Hierarchy (P2)
• Principles of caching also extend into the cloud. Comparison:
  – Disk (2011)
    • 3.4 msec access, $1.35/GB
  – Amazon S3 (2011)
    • Highly redundant storage constructed from cheap disks
    • 100-250 msec access (across the network), $0.14/GB
• Potential applications:
  – How to turn a key-value store (e.g., Amazon SimpleDB) into a document store
    • Use SimpleDB for its indexing capabilities
    • Use S3 for storing documents
  – Database backups / checkpoints into the cloud
Disks
• Secondary storage device of choice.
• Data is stored and retrieved in units called disk blocks or pages.
• Unlike RAM, the time to retrieve a disk page varies depending upon its location on disk.
  – Therefore, the relative placement of pages on disk has a major impact on DBMS performance
Components of a Disk
• The platters spin.
  – 15,000 rpm available
  – 7,200 or 5,400 rpm are more typical
• The arm assembly is moved in or out to position a head on a desired track.
• Tracks under heads make a cylinder (imaginary!).
• Only one head reads/writes at any one time.
• Block size is a multiple of sector size (which is fixed).
[Figure: disk components: platters, spindle, disk head, arm movement, arm assembly, tracks, sector]
Accessing a Disk Page
• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)
• Seek time and rotational delay dominate.
  – Seek time varies from about 1 to 20 msec
  – Rotational delay varies from 0 to 10 msec
  – Transfer time is about 1 msec per 4KB page
• Lower I/O cost: reduce seek/rotation delays
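To see why seek and rotational delay dominate, the three components can be added up for a random versus a sequential read of the same pages. The sketch below is a rough model, not a measurement: the 9 msec average seek and the 7,200 rpm speed are assumed values within the ranges above, and the function names are illustrative.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """On average the head waits half a revolution for the block to arrive."""
    return 0.5 * 60_000.0 / rpm


def random_read_ms(pages: int, seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Every page pays its own seek, rotational delay and transfer."""
    return pages * (seek_ms + avg_rotational_delay_ms(rpm) + transfer_ms)


def sequential_read_ms(pages: int, seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """One seek and one rotational delay, then the pages stream off the track."""
    return seek_ms + avg_rotational_delay_ms(rpm) + pages * transfer_ms


SEEK_MS = 9.0       # assumed average seek time
RPM = 7200          # a typical spindle speed from the earlier slide
TRANSFER_MS = 1.0   # about 1 msec per 4KB page

random_ms = random_read_ms(100, SEEK_MS, RPM, TRANSFER_MS)
sequential_ms = sequential_read_ms(100, SEEK_MS, RPM, TRANSFER_MS)
print(f"random: {random_ms:.0f} msec, sequential: {sequential_ms:.0f} msec")
```

Under these assumptions the sequential read is more than ten times faster, which is the motivation for placing related pages next to each other on disk.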
Disk Access Optimizations
• Instead of responding to access requests in FIFO order, respond to them in an order that takes the disk characteristics into account
  – Disk Scheduling
• Distribute data accesses among several disks and have them return data in parallel
  – also known as Disk Striping
• Mirror disks
• Prefetch for sequential accesses
  – Locate blocks strategically on disk to minimize seek times and rotational delay
Disk Scheduling
• The Elevator Algorithm (like an elevator in a building)
1. Sort all requests by cylinder outer to inner,
2. Move arm inward and return results for each request
3. Sort all requests inner to outer,
4. Move arm outward and return results for each request
5. Repeat
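One sweep of the elevator algorithm can be sketched as follows. This is a simplified illustration rather than a real disk driver: cylinders are assumed to be numbered from 0 at the outer edge, all requests are assumed to be known up front, and the request queue and head position are made-up sample values.

```python
def elevator_order(pending, head):
    """Service order for one inward sweep followed by one outward sweep.

    Cylinders are numbered from 0 (outermost) upward, so moving inward means
    visiting increasing cylinder numbers.
    """
    inward = sorted(c for c in pending if c >= head)
    outward = sorted((c for c in pending if c < head), reverse=True)
    return inward + outward


def total_arm_travel(order, head):
    """Number of cylinders the arm crosses when servicing requests in this order."""
    travel = 0
    for cylinder in order:
        travel += abs(cylinder - head)
        head = cylinder
    return travel


requests = [98, 183, 37, 122, 14, 124, 65, 67]   # sample queue, in arrival order
start = 53                                        # sample head position

fifo_travel = total_arm_travel(requests, start)
elevator_travel = total_arm_travel(elevator_order(requests, start), start)
print(elevator_order(requests, start), fifo_travel, elevator_travel)
```

For this queue the arm travels less than half as far as it would in FIFO order, because it never reverses direction more than once.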
[Figure: disk components: platters, spindle, disk head, arm movement, arm assembly, tracks, sector]
Disk Striping
• Distribute the data among several disks
• Seek time goes up, not down!
• But data transfer time can go down
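[Figure: records R1 to R12 striped over four disks: disk 1 holds R1, R5, R9; disk 2 holds R2, R6, R10; disk 3 holds R3, R7, R11; disk 4 holds R4, R8, R12]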
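The round-robin placement in the figure can be expressed as a small mapping function. This is a sketch under the assumption that records and disks are numbered from 1, as in the figure; `stripe_location` and `layout` are illustrative names.

```python
def stripe_location(record_number, num_disks):
    """Return (disk, slot) for a record, both counted from 1, using round-robin striping."""
    index = record_number - 1
    return index % num_disks + 1, index // num_disks + 1


def layout(num_records, num_disks):
    """Map each disk number to the list of record numbers stored on it."""
    disks = {disk: [] for disk in range(1, num_disks + 1)}
    for record in range(1, num_records + 1):
        disk, _slot = stripe_location(record, num_disks)
        disks[disk].append(record)
    return disks


print(layout(12, 4))
```

Consecutive records land on different disks, so a scan of R1 to R4 can be served by all four disks in parallel.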
Mirror disks
• Can read from either copy, whichever is faster
• Must write to both copies
[Figure: two mirrored disks, Copy 1 and Copy 2]
Locating Blocks for Sequential access
• ‘Next’ block concept:
  – blocks on same track, followed by
  – blocks on same cylinder, followed by
  – blocks on adjacent cylinder
• Blocks in a file should be arranged sequentially on disk (by ‘next’), to minimize seek and rotational delay.
• For a sequential scan, pre-fetching several pages at a time is a big win
CS 542 Database Management Systems
Redundant Array of Independent Disks (RAID)
Introduction to RAID
• Arrangement of several disks that gives the abstraction of a single large disk
• Goals: Increase Performance and Reliability
• Techniques
  1. Data striping
    • Data is partitioned
    • Definition: the size of each partition is called the striping unit
    • Partitions are distributed over several disks
  2. Redundancy
    • More disks → increased reliability
    • Redundant information allows reconstruction of data if a disk fails
RAID Levels 0 and 1
• Level 0: No redundancy
  – Best write performance
  – Not best in reading. (Why?)
• Level 1: Mirrored (two identical copies)
  – Each disk has a mirror image
  – Parallel reads; a write involves two disks.
  – Maximum transfer rate = transfer rate of one disk
RAID Level 4
• Arrangement
  – Uses n data disks and 1 parity disk
  – Block is striped across the n disks.
  – The parity disk holds the XOR of the blocks.
• Read: from the n data disks
• Write: update the data disks and also the parity disk
• Upon crash: the remaining disks are used to reconstruct the data
• Problem: performance bottleneck on the parity disk
Example (three data blocks and their XOR parity):
  11110000
  10101010
  00111000
  01100010 (parity)
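The parity computation and the reconstruction after a disk failure can be checked with the bit patterns from the example. The following is a minimal sketch that treats each block as a single byte; the helper name `parity` and the sample new value written at the end are illustrative.

```python
from functools import reduce


def parity(blocks):
    """XOR of all blocks; this is what the parity disk stores."""
    return reduce(lambda a, b: a ^ b, blocks)


data = [0b11110000, 0b10101010, 0b00111000]   # the three data blocks from the slide
p = parity(data)

# Suppose the second data disk fails: XOR the survivors with the parity block.
rebuilt = parity([data[0], data[2], p])

# A small write only needs the old data, the new data and the old parity.
new_value = 0b00001111                         # arbitrary new contents for block 0
new_parity = p ^ data[0] ^ new_value

print(format(p, "08b"), format(rebuilt, "08b"), format(new_parity, "08b"))
```

The last step also shows why the parity disk is a bottleneck: every write, to any data disk, has to read and rewrite the parity block.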
RAID Level 5
• Arrangement
  – 1/n of the cylinders of each disk are set aside for parity
• Thus the bottleneck is distributed evenly
Disk Space Management
• Lowest layer of DBMS software manages space on disk.
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Higher levels don’t need to know how this is done, or how free space is managed.
Buffer Management in a DBMS
• Data must be in RAM for the DBMS to operate on it!
• A table of <frame#, pageid> pairs is maintained.
[Figure: page requests from higher levels arrive at the buffer pool in main memory; disk pages are read from the DB on disk into free frames; the choice of frame is dictated by the replacement policy]
When a Page is Requested ...
• If the requested page is not in the buffer pool:
  – Choose a frame for replacement
  – If the frame is dirty, write it to disk
  – Read the requested page into the chosen frame
• Pin the page and return its address.
• If requests can be predicted (e.g., sequential scans), pages can be pre-fetched
More on Buffer Management
• The requestor of a page must unpin it and indicate whether the page has been modified:
  – a dirty bit is used for this.
• A page in the pool may be requested many times,
  – a pin count is used.
  – A page is a candidate for replacement iff pin count = 0.
• CC & recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later.)
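The request/pin/unpin protocol from the last two slides can be sketched with a toy buffer pool. This is only an illustration under simplifying assumptions: the "disk" is a Python dict, the replacement policy is LRU over unpinned frames, and concurrency control and logging are ignored. The class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                 # page_id -> contents, standing in for the disk
        self.frames = OrderedDict()      # page_id -> frame, least recently used first
        self.reads = 0
        self.writes = 0

    def request(self, page_id):
        """Bring the page into the pool if needed, pin it, and return its frame."""
        if page_id not in self.frames:
            if len(self.frames) >= self.num_frames:
                self._replace_one()
            self.frames[page_id] = {"data": self.disk[page_id], "pin": 0, "dirty": False}
            self.reads += 1
        self.frames.move_to_end(page_id)          # mark as most recently used
        frame = self.frames[page_id]
        frame["pin"] += 1
        return frame

    def unpin(self, page_id, modified=False):
        """The requestor releases the page and reports whether it changed it."""
        frame = self.frames[page_id]
        frame["pin"] -= 1
        frame["dirty"] = frame["dirty"] or modified

    def _replace_one(self):
        """Evict the least recently used frame whose pin count is 0."""
        victim = next((p for p, f in self.frames.items() if f["pin"] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame["dirty"]:                        # write back before reuse
            self.disk[victim] = frame["data"]
            self.writes += 1


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
frame = pool.request(1)
frame["data"] = "A"
pool.unpin(1, modified=True)
pool.request(2)
pool.unpin(2)
pool.request(3)    # pool is full: page 1 is unpinned and dirty, so it is written back
```

A real buffer manager would also have to honor the Write-Ahead Log protocol before writing a dirty page, which this sketch leaves out.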
Buffer Replacement Policy
• A frame is chosen for replacement by a replacement policy:
  – Least-recently-used (LRU), Clock, MRU, etc.
• The policy can have a big impact on the number of I/Os; it depends on the access pattern.
• Sequential flooding: a nasty situation caused by LRU + repeated sequential scans.
  – # buffer frames < # pages in file means each page request causes an I/O.
  – MRU is much better in this situation (but not in all situations, of course).
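Sequential flooding is easy to reproduce in a small simulation. The sketch below counts page reads for repeated scans of a 5-page file through a 4-frame pool; the sizes are arbitrary sample values, and pinning is ignored to keep the example short.

```python
def count_ios(policy, num_frames, requests):
    """Count page reads for a request sequence under "LRU" or "MRU" replacement."""
    frames = []            # ordered from least to most recently used
    ios = 0
    for page in requests:
        if page in frames:
            frames.remove(page)                   # hit: will be re-appended as most recent
        else:
            ios += 1                              # miss: the page must be read from disk
            if len(frames) == num_frames:
                frames.pop(0 if policy == "LRU" else -1)
        frames.append(page)
    return ios


file_pages = list(range(5))          # a 5-page file
scans = file_pages * 3               # three sequential scans
lru_ios = count_ios("LRU", 4, scans)
mru_ios = count_ios("MRU", 4, scans)
print(lru_ios, mru_ios)
```

With LRU, every request evicts exactly the page that will be needed soonest, so all 15 requests miss; MRU keeps most of the file resident across scans.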
Representing addresses (p1)
• We need pointers, especially in object-oriented databases.
• Two kinds of addresses:
  – Physical (e.g., host, driveID, cylinder, surface, sector (block), offset)
  – Logical (unique ID).
• Physical addresses are very long
  – 8B is the minimum; up to 16B in some systems
• Example: A database that is designed to last 100 years.
  – If the database grows to encompass 1 million machines and each machine creates 1 object each nanosecond, then we could have about 2^77 objects.
  – 10 bytes are needed to represent addresses for that many objects.
Representing Addresses (p2)
• We need a map table for flexibility.
• The level of indirection gives the flexibility.
• For example, we often move records around, either within a block or from block to block.
• What about the programs that are pointing to these records?
• They will have dangling pointers if they work with physical addresses.
• With logical addresses, we only need to rearrange the map table!
[Figure: map table with columns "logical" and "physical", translating a logical address into a physical address]
Pointer Swizzling (p1)
Typical DB structure:
• Data is maintained by a server process, using physical or logical addresses of perhaps 8 bytes.
• Application programs are clients with their own (conventional memory) address spaces.
• When blocks and records are copied to the client's memory, DB addresses must be swizzled, i.e., translated to virtual memory addresses.
  – Allows conventional pointer following.
  – Especially important in an OODBMS, where pointers as data are common.
• The DBMS uses a translation table
[Figure: translation table mapping DB address to memory address]
Pointer swizzling (p2)
• The DBMS uses a translation table
• Map Table vs. Translation Table
  – Logical and physical addresses are both representations of the database address.
  – In contrast, memory addresses in the translation table are for copies of the corresponding object in memory.
  – All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.
[Figure: translation table with columns DBaddr (database address) and Mem-addr (memory address)]
Swizzling Example
[Figure: Block 1 and Block 2 on disk; Block 1 is read into memory, where some of its pointers are swizzled and others remain unswizzled]
Pointer Swizzling (p3)
• Swizzling options:
  – Never swizzle. Keep a translation table of DB pointers → local pointers; consult the table to follow any DB pointer.
    • Problem: time to follow pointers.
  – Automatic swizzling. When a block is copied to memory, replace all its DB pointers by local pointers.
    • Problem: requires knowing where every pointer is (use block and record headers for schema info).
    • Problem: large investment if not too many pointer followings occur.
  – Swizzle on demand. When a block is copied to memory, enter its own address and those of member records into the translation table, but do not translate pointers within the block. If we follow a pointer, translate it the first time.
    • Problem: requires a bit in pointer fields for DB/local.
    • Problem: extra decision at each pointer following.
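Swizzle on demand can be sketched with ordinary Python objects. This is a toy model, assuming integer DB addresses and a module-level translation table; the classes `Pointer` and `Record` and the counter `lookups` are invented for the illustration.

```python
class Pointer:
    def __init__(self, db_addr):
        self.db_addr = db_addr
        self.swizzled = False    # the DB/local bit kept in every pointer field
        self.target = None       # in-memory object, valid only once swizzled


class Record:
    def __init__(self, db_addr, payload, next_addr=None):
        self.db_addr = db_addr
        self.payload = payload
        self.next = Pointer(next_addr) if next_addr is not None else None


translation_table = {}           # DB address -> in-memory Record
lookups = 0                      # how often the translation table was consulted


def load_block(records):
    """Copy a block into memory: register its records, but translate no pointers."""
    for record in records:
        translation_table[record.db_addr] = record


def follow(pointer):
    """Follow a pointer, swizzling it the first time it is used."""
    global lookups
    if not pointer.swizzled:                       # the extra decision on every follow
        lookups += 1
        pointer.target = translation_table[pointer.db_addr]
        pointer.swizzled = True
    return pointer.target


a = Record(100, "a", next_addr=200)
b = Record(200, "b")
load_block([a, b])
first = follow(a.next)     # consults the translation table and swizzles
second = follow(a.next)    # already swizzled: a plain memory reference
```

The `if not pointer.swizzled` test is the per-follow cost the slide lists as a drawback; automatic swizzling would instead translate every pointer inside `load_block`.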
Pinned records
• Pinned record = some swizzled pointer points to it
• Pointers to pinned records have to be unswizzled before the pinned record is returned to disk
• We need to know where the pointers to it are
• Implementation: keep a linked list of all (swizzled) records pointing to a record.
[Figure: record y with several swizzled pointers to it, including one from record x]
Variable-Length Data
• Skipped discussion of fixed-length records; please read in the book.
• The real complexity is with variable-length records
  – Varying-size data items (e.g., address)
  – Repeating fields (stars-to-movie relationship)
• Sliding records
  – Use an offset table in a block, pointing to current records.
  – If a record grows, slide records around the block.
• Not enough space? Create an overflow block; the offset table must indicate “record moved.”
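The offset-table idea can be sketched as a tiny block class. This is an in-memory toy, not an on-disk page format: the block is a `bytearray`, the offset table is a Python list, and running out of space simply raises an error where a real system would create an overflow block. All names are illustrative.

```python
class Block:
    def __init__(self, size):
        self.size = size
        self.data = bytearray()
        self.slots = []              # offset table: slot number -> (offset, length)

    def insert(self, record: bytes) -> int:
        """Append a record and return its slot number."""
        if len(self.data) + len(record) > self.size:
            raise ValueError("block full: an overflow block would be needed")
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def update(self, slot: int, record: bytes) -> None:
        """Replace a record in place, sliding the records stored after it."""
        offset, old_length = self.slots[slot]
        delta = len(record) - old_length
        if len(self.data) + delta > self.size:
            raise ValueError("block full: an overflow block would be needed")
        self.data[offset:offset + old_length] = record      # later bytes shift by delta
        self.slots[slot] = (offset, len(record))
        self.slots = [(o + delta if o > offset else o, n) for o, n in self.slots]


block = Block(64)
ann = block.insert(b"Ann|12 Elm")
bob = block.insert(b"Bob|9 Oak")
block.update(ann, b"Ann|1200 Elm Street")   # Ann's record grows; Bob's slides over
```

Because outside references use the slot number rather than the byte offset, sliding Bob's record does not invalidate any pointer to it.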
System Catalogs
• Meta information is stored in system catalogs.
  – For each index:
    • structure (e.g., B+ tree) and search key fields
  – For each relation:
    • name, file name, file structure (e.g., heap file)
    • attribute name and type, for each attribute
    • index name, for each index
    • integrity constraints
  – For each view:
    • view name and definition
  – Plus statistics, authorization, buffer pool size, etc.
• Catalogs are themselves stored as relations
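Because catalogs are relations, they can be queried like any other table. The sketch below uses SQLite through Python's standard `sqlite3` module purely because it runs without a server; it is a stand-in for whatever DBMS is used in the course, and the `Employee` table and `emp_dob` index are made-up examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_id TEXT PRIMARY KEY, DoB DATE)")
conn.execute("CREATE INDEX emp_dob ON Employee (DoB)")

# SQLite keeps its catalog in a relation named sqlite_master.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# Per-attribute information: (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(Employee)").fetchall()

for row in catalog:
    print(row)
for row in columns:
    print(row)
```

The catalog rows describe both the table and its indexes, which mirrors the "for each relation" and "for each index" entries above.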
Example: Attribute Catalog
• Conceptually speaking…
attr_name    rel_name       type     position
attr_name    Attribute_Cat  string   1
rel_name     Attribute_Cat  string   2
type         Attribute_Cat  string   3
position     Attribute_Cat  integer  4
employee_id  Employee       string   1
DoB          Employee       Date     2
…            …              …        …
Example: MySQL Information Schema
• INFORMATION_SCHEMA Tables
  – The INFORMATION_SCHEMA SCHEMATA Table
  – The INFORMATION_SCHEMA TABLES Table
  – The INFORMATION_SCHEMA COLUMNS Table
  – The INFORMATION_SCHEMA STATISTICS Table
  – The INFORMATION_SCHEMA USER_PRIVILEGES Table
  – The INFORMATION_SCHEMA SCHEMA_PRIVILEGES Table
  – The INFORMATION_SCHEMA TABLE_PRIVILEGES Table
  – The INFORMATION_SCHEMA COLUMN_PRIVILEGES Table
  – The INFORMATION_SCHEMA CHARACTER_SETS Table
  – … (Total 18 tables)
Summary
• Disks provide cheap, non-volatile storage.
  – Random access, but cost depends on the location of the page on disk
  – Important to arrange data sequentially to minimize seek and rotation delays.
• The buffer manager brings pages into RAM.
  – A page stays in RAM until released by the requestor.
  – Written to disk when its frame is chosen for replacement.
  – The frame to replace is based on the replacement policy.
  – Tries to pre-fetch several pages at a time.
More Summary
• DBMS vs. OS File Support
  – A DBMS needs features not found in many OSs.
    • forcing a page to disk
    • controlling the order of page writes to disk
    • files spanning disks
    • ability to control pre-fetching and page replacement policy based on predictable access patterns
• Two mapping structures help us map addresses
  – Map tables take us from logical addresses to physical addresses
  – Translation tables take us from physical addresses to in-memory addresses (where applicable)
    • Swizzling helps keep track of where in memory
Even More Summary
• Catalog relations store information about relations, indexes and views.
– Information common to all records in collection.
CS 542 Database Management Systems
Data Modeling
Data Modeling Techniques
• Entity-Relationship Modeling
  – E/R diagrams allow us to sketch database schema designs
    • Designs are pictures called entity-relationship diagrams
  – Weak Entity Sets
    • Skipping, please read in book if interested, not on exam
  – Converting E/R Diagrams to Relations
• Unified Modeling Language (UML)
  – Skipping, please read in book if interested, not on exam
• Object Definition Language (ODL)
  – Skipping, please read in book if interested, not on exam
Framework for E/R
• Design is a serious business.
• The “boss” (or customer) knows they want a database, but they don’t know what they want in it.
• Sketching the key components is an efficient way to develop a working database.
Entity Sets
• Entity = “thing” or object.
• Entity set = collection of similar entities.
  – Similar to a class in object-oriented languages.
• Attribute = property of (the entities of) an entity set.
  – Attributes are simple values, e.g., integers or character strings, not structs, sets, etc.
• In an entity-relationship diagram:
  – Entity set = rectangle.
  – Attribute = oval, with a line to the rectangle representing its entity set.
Example: Beer Manufacturers
• Entity set Beers has two attributes, name and manf (manufacturer).
• Each Beers entity has values for these two attributes, e.g. (Bud, Anheuser-Busch)
[Figure: entity set Beers with attributes name and manf]
Relationships
• A relationship connects two or more entity sets.
• It is represented by a diamond, with lines to each of the entity sets involved.
[Figure: E/R diagram with entity sets Bars (name, addr, license), Beers (name, manf) and Drinkers (name, addr), connected by the relationships Sells (bars sell some beers), Likes (drinkers like some beers) and Frequents (drinkers frequent some bars). Note: license = beer, full, none.]
Relationship Set
• The current “value” of an entity set is the set of entities that belong to it.
– Example: the set of all bars in our database.
• The “value” of a relationship is a relationship set, a set of tuples with one component for each related entity set.
• For the relationship Sells, we might have a relationship set like:
Bar        Beer
Joe’s Bar  Bud
Joe’s Bar  Miller
Sue’s Bar  Bud
Sue’s Bar  Pete’s Ale
Sue’s Bar  Bud Lite
Multiway Relationships
• Sometimes, we need a relationship that connects more than two entity sets.
• Suppose that drinkers will only drink certain beers at certain bars.
– Our three binary relationships Likes, Sells, and Frequents do not allow us to make this distinction.
– But a 3-way relationship would.
Example: 3-Way Relationship
[Figure: E/R diagram in which the 3-way relationship Preferences connects Bars (name, addr, license), Beers (name, manf) and Drinkers (name, addr)]
A Typical Relationship Set
Bar        Drinker  Beer
Joe’s Bar  Ann      Miller
Sue’s Bar  Ann      Bud
Sue’s Bar  Ann      Pete’s Ale
Joe’s Bar  Bob      Bud
Joe’s Bar  Bob      Miller
Joe’s Bar  Cal      Miller
Sue’s Bar  Cal      Bud Lite
• Each row of a relationship set typically consists of foreign keys to other tables
Many-Many Relationships
• Focus: binary relationships, such as Sells between Bars and Beers.
• In a many-many relationship, an entity of either set can be connected to many entities of the other set.
– E.g., a bar sells many beers; a beer is sold by many bars.
Many-One Relationships
• Some binary relationships are many-one from one entity set to another.
• Each entity of the first set is connected to at most one entity of the second set.
• But an entity of the second set can be connected to zero, one, or many entities of the first set.
• FavBeer (Drinkers → Beers) is many-one
  – A drinker has at most one favBeer
  – A beer can be the favorite of any number of drinkers, including zero
One-One Relationships
• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.
• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
– A beer cannot be made by more than one manufacturer
– No manufacturer can have more than one best-seller (assume no ties).
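The many-one and one-one definitions can be checked mechanically against a relationship set stored as a list of pairs. The sketch below reuses the Sells pairs from the earlier slide; the `favorite` and `best_seller` pairs are invented sample data, and the function names are illustrative.

```python
def is_many_one(pairs):
    """True if every first component is paired with at most one second component."""
    seen = {}
    for first, second in pairs:
        if seen.setdefault(first, second) != second:
            return False
    return True


def is_one_one(pairs):
    """True if the relationship is many-one in both directions."""
    pairs = list(pairs)
    return is_many_one(pairs) and is_many_one((b, a) for a, b in pairs)


sells = [("Joe's Bar", "Bud"), ("Joe's Bar", "Miller"), ("Sue's Bar", "Bud"),
         ("Sue's Bar", "Pete's Ale"), ("Sue's Bar", "Bud Lite")]
favorite = [("Ann", "Bud"), ("Bob", "Bud"), ("Cal", "Miller")]            # sample data
best_seller = [("Anheuser-Busch", "Bud"), ("Pete's", "Pete's Ale")]       # sample data

print(is_many_one(sells), is_many_one(favorite), is_one_one(favorite), is_one_one(best_seller))
```

A check like this only describes the current data; multiplicity in an E/R design is a constraint on every relationship set the database may ever hold.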
Representing “Multiplicity”
• Show a many-one relationship by an arrow entering the “one” side.
• Show a one-one relationship by arrows entering both entity sets.
• Rounded arrow = “exactly one,” i.e., each entity of the first set is related to exactly one entity of the target set.
Example: Many-One Relationship
[Figure: Drinkers and Beers connected by two relationships, Likes and Favorite]
• Notice: two relationships connect the same entity sets, but are different.
Example: One-One Relationship
• Consider Best-seller between Manfs and Beers.
• But a beer manufacturer has to have a best-seller.
  – Shown with a rounded arrow
• Some beers are not the best-seller of any manufacturer
  – A rounded arrow to Manfs would be inappropriate
[Figure: Manfs and Beers connected by Best-seller. A manufacturer has exactly one best-seller. A beer is the best-seller for 0 or 1 manufacturer.]
Attributes on Relationships
• Sometimes it is useful to attach an attribute to a relationship.
• Think of this attribute as a property of tuples in the relationship set.
[Figure: Bars and Beers connected by Sells, with the attribute price attached to Sells]
Price is a function of both the bar and the beer, not of one alone.
Subclasses in E/R Diagrams
• Subclass – fewer entities, more properties.
• Example: Ales are a kind of beer.
  – Not every beer is an ale, but some are.
  – In addition to all the properties (attributes and relationships) of beers, suppose ales also have the attribute color.
• Assume subclasses form a tree.
  – I.e., no multiple inheritance.
• Isa triangles indicate the subclass relationship.
[Figure: Beers (name, manf) with subclass Ales (color), connected by an isa triangle]
E/R vs. Object-Oriented Subclasses
• In OO, objects are in one class only.
– Subclasses inherit from superclasses.
• In contrast, E/R entities have representatives in all subclasses to which they belong.
– Rule: if entity e is represented in a subclass, then e is represented in the superclass (and recursively up the tree).
[Figure: Beers (name, manf) with subclass Ales (color) via isa; the entity Pete’s Ale is represented in both Beers and Ales]
Designating keys
• Show keys by underlining the attribute
[Figure: Beers (name, manf) with subclass Ales (color) via isa, with the key attribute underlined]
Example: Good
[Figure: Beers (name) connected by ManfBy to Manfs (name, addr)]
This design gives the address of each manufacturer exactly once.
Example: Bad
[Figure: Beers (name, manf) connected by ManfBy to Manfs (name, addr)]
This design states the manufacturer of a beer twice: as an attribute and as a related entity.
Example: Bad
[Figure: Beers (name, manf, manfAddr), with no Manfs entity set]
This design repeats the manufacturer’s address once for each beer and loses the address if there are temporarily no beers for a manufacturer.
Example: Good
• Manfs deserves to be an entity set because of the nonkey attribute addr.
• Beers deserves to be an entity set because it is the “many” of the many-one relationship ManfBy.
[Figure: Beers (name) connected by ManfBy to Manfs (name, addr)]
Example: Bad
• Since the manufacturer is nothing but a name, and is not at the “many” end of any relationship, it should not be an entity set.
[Figure: Beers (name) connected by ManfBy to Manfs (name)]
From E/R Diagrams to Relations
• Entity set → relation.
  – Attributes → attributes.
• Relationships → relations whose attributes are only:
  – The keys of the connected entity sets.
  – Attributes of the relationship itself.
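As an illustration of these two rules, the Bars/Beers/Sells fragment of the running example can be turned into tables. The sketch uses SQLite via Python's `sqlite3` module only so that it runs without a server; the column types, the key choices and the sample rows (including the prices and the address) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity sets become relations; their attributes become columns.
    CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT);
    CREATE TABLE Bars  (name TEXT PRIMARY KEY, addr TEXT, license TEXT);

    -- The relationship Sells keeps only the keys of the connected entity
    -- sets plus its own attribute, price.
    CREATE TABLE Sells (
        bar   TEXT REFERENCES Bars(name),
        beer  TEXT REFERENCES Beers(name),
        price REAL,
        PRIMARY KEY (bar, beer)
    );
""")

conn.execute("INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch')")
conn.execute("INSERT INTO Bars VALUES ('Joe''s Bar', '1 Main St', 'full')")   # made-up row
conn.execute("INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 2.50)")          # made-up price

rows = conn.execute(
    "SELECT Sells.bar, Beers.manf, Sells.price "
    "FROM Sells JOIN Beers ON Sells.beer = Beers.name"
).fetchall()
print(rows)
```

Note that Sells carries no address or manufacturer columns; those are recovered by joining on the keys.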
Entity Set → Relation
[Figure: entity set Beers with attributes name and manf]
Relation: Beers(name, manf)
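Relationship → Relation
[Figure: Drinkers (name, addr) and Beers (name, manf) connected by Likes and Favorite; Drinkers is also related to itself by Married (roles husband and wife) and Buddies (roles 1 and 2)]
Likes(drinker, beer)
Favorite(drinker, beer)
Married(husband, wife)
Buddies(name1, name2)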
Combining Relations
• OK to combine into one relation:
  1. The relation for an entity set E
  2. The relations for many-one relationships of which E is the “many.”
• Example: Drinkers(name, addr) and Favorite(drinker, beer) combine to make Drinker1(name, addr, favBeer).
Risk with Many-Many Relationships
• Combining Drinkers with Likes would be a mistake. It leads to redundancy, as:
name   addr       beer
Sally  123 Maple  Bud
Sally  123 Maple  Miller
Redundancy: the addr value is repeated for each beer.
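The difference between the safe and the risky combination can be seen by building both from the same data. This is a small sketch using plain Python tuples; Sally's rows come from the slide, while the `favorite` pair and the helper `combine` are made up for the illustration.

```python
drinkers = {"Sally": "123 Maple"}                  # Drinkers(name, addr)
favorite = [("Sally", "Bud")]                      # many-one: at most one row per drinker
likes = [("Sally", "Bud"), ("Sally", "Miller")]    # many-many: several rows per drinker


def combine(drinkers, relationship):
    """Join Drinkers with a relationship on the drinker's name."""
    return [(name, drinkers[name], beer) for name, beer in relationship]


drinker1 = combine(drinkers, favorite)     # Drinker1(name, addr, favBeer)
with_likes = combine(drinkers, likes)      # the mistaken combination

addr_copies = [addr for _name, addr, _beer in with_likes]
print(drinker1)
print(with_likes)
```

Combining with the many-one Favorite stores each address once; combining with the many-many Likes stores it once per liked beer.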
Subclasses: Three Approaches
1. Object-oriented: One relation per subset of subclasses, with all relevant attributes.
2. Use nulls: One relation; entities have NULL in attributes that don’t belong to them.
3. E/R style: One relation for each subclass:
  – Key attribute(s).
  – Attributes of that subclass.
Example: Subclass Relations
[Figure: Beers (name, manf) with subclass Ales (color), connected by an isa triangle]
Object-Oriented
• Beers
  Name  Manf
  Bud   Anheuser-Busch
• Ales
  Name        Manf         Color
  Summerbrew  Pete’s Ales  Dark
• Good for queries like “find the color of ales made by Pete’s”
[Figure: Beers (name, manf) with subclass Ales (color), connected by an isa triangle]
E/R Style
• Beers
  Name        Manf
  Bud         Anheuser-Busch
  Summerbrew  Pete’s
• Ales
  Name        Color
  Summerbrew  Dark
• Good for queries like “find all beers, including ales, made by Pete’s”
[Figure: Beers (name, manf) with subclass Ales (color), connected by an isa triangle]
Using NULLS
• Beers
  Name        Manf            Color
  Summerbrew  Pete’s Ales     Dark
  Bud         Anheuser-Busch  NULL
• Saves space and does everything with one table
[Figure: Beers (name, manf) with subclass Ales (color), connected by an isa triangle]
Data Models for NoSQL Databases
• Class discussion at next meeting.
  – How would you represent a many-to-many relationship in:
    • Amazon SimpleDB?
    • Cassandra?
    • Google App Engine?
    • MongoDB?
    • Redis?
    • Other?
  – Inviting a 3-minute presentation (on 3/21) for 20 bonus points
    • Only one presentation per DB
    • Please volunteer by Tuesday noon if interested
    • I will let you know by Wednesday noon if you were selected
Summary
• Data modeling is an essential part of designing an application
  – Intersects business and technology
  – Essential elements
    • Entities
    • Relationships
      – Is-a relationships
      – Multiplicity
  – Has to be done with an eye toward the long term
    • (But has to avoid analysis paralysis)
    • Attributes can be added later, but entities and relationships are baked in at the beginning and very hard to change later
  – Pay particular attention to the multiplicity of relationships
• Best to separate modeling from “table design”
  – Needed for all databases, relational or not.
Next meetings
• March 21: Sort and Join Processing
  – Sort: Chapter 15
  – Join: Sections 16.1 – 16.4