0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x...
Transcript of 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x...
![Page 1: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/1.jpg)
1-1
Qin Zhang
B561 Advanced Database
Concepts
§0 Introduction
![Page 2: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/2.jpg)
2-1
Self introduction: my research interests
• Algorithms for Big Data:streaming/sketching algorithms;algorithms on distributed data;I/O-efficient algorithms;data structures;
• Complexity:communication complexity.
I am a theoretician, and occasionally work ondatabases and data mining
![Page 3: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/3.jpg)
2-2
Self introduction: my research interests
• Algorithms for Big Data:streaming/sketching algorithms;algorithms on distributed data;I/O-efficient algorithms;data structures;
• Complexity:communication complexity.
I am a theoretician, and occasionally work ondatabases and data mining
You may ask: “why do you teach databases(and probably make our lives harder)”?
![Page 4: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/4.jpg)
2-3
Self introduction: my research interests
• Algorithms for Big Data:streaming/sketching algorithms;algorithms on distributed data;I/O-efficient algorithms;data structures;
• Complexity:communication complexity.
I am a theoretician, and occasionally work ondatabases and data mining
You may ask: “why do you teach databases(and probably make our lives harder)”?
Hope you will not ask me again after this course :)I am learning together with you.
![Page 5: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/5.jpg)
3-1
What does a typicalundergrad databasecourse cover?
![Page 6: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/6.jpg)
4-1
How to represent data?
How to represent the data in the computer?
Title Year Length Type
Star Wars 1977 124 colorMighty Ducks 1991 104 colorWayne’s World 1992 95 color
Movie
Sci-Fi Cartoon Comedy
StarWars,1977,. . .
MightyDucks,1991,. . .
Wayne’sWorld,1992,. . .
OR?
![Page 7: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/7.jpg)
4-2
How to represent data?
How to represent the data in the computer?
Title Year Length Type
Star Wars 1977 124 colorMighty Ducks 1991 104 colorWayne’s World 1992 95 color
Movie
Sci-Fi Cartoon Comedy
StarWars,1977,. . .
MightyDucks,1991,. . .
Wayne’sWorld,1992,. . .
OR?
OR?
![Page 8: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/8.jpg)
5-1
How to operate on data?
Given the data, say, a set of tables, how to answer queries?
Difficulty: Queries may depend crucially on the data in all tables.
Q: Find all products under price 200 manufactured in Japan?
cName StockPrice Country
GizmoWorks 25 USACanon 65 JapanHitachi 15 Japan
PName Price Category Manufacturer
Gizmo 19.99 Gadgets GizmoWorksPowergizmo 29.99 Gadgets GizmoWorksSingleTouch 149.99 Photography CanonMultiTouch 203.99 Household Hitachi
Company
Product
![Page 9: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/9.jpg)
6-1
• SQL
SELECT x .PName, x .PriceFROM Product x , Company y
WHERE x .Manufacturer=y .CNameAND y .Country=‘Japan’AND x .Price ≤ 200
• Relational AlgebraπPName, Price
(σPrice≤200∧Country=‘Japan’(Product 1Manufacturer=CName Company))
How to operate on data? (cont.)
CName StockPrice Country
GizmoWorks 25 USACanon 65 JapanHitachi 15 Japan
PName Price Category Manufacturer
Gizmo 19.99 Gadgets GizmoWorksPowergizmo 29.99 Gadgets GizmoWorksSingleTouch 149.99 Photography CanonMultiTouch 203.99 Household Hitachi
Company
Product
![Page 10: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/10.jpg)
7-1
How to speed up the operation?
Relational operations can sometimes be computed muchfaster if we have precomputed a suitable data structure onthe data. This is called Indexing.
How to speed up the operation?
![Page 11: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/11.jpg)
7-2
How to speed up the operation?
Relational operations can sometimes be computed muchfaster if we have precomputed a suitable data structure onthe data. This is called Indexing.
How to speed up the operation?
Most notably, two kinds of index structures are essential todatabase performance:
1. B-trees.
2. External hash tables.
For example, hash tables may speed up relationaloperations that involve finding all occurrences in a relationof a particular value.
![Page 12: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/12.jpg)
8-1
How to make a good operation plan?
R(A,B,C ,D),S(E ,F ,G )
Find all pairs (x , y), x ∈ R, y ∈ S such that(1) x .D = y .E , (2) x .A = 5 and (3) y .G = 9
σA=5∧G=9(R 1D=E S) = σA=5(R) 1D=E σG=9(S)
How to optimize the orders of the operations?
Q: Use the LHS or RHS?
![Page 13: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/13.jpg)
9-1
How to deal with transactions?
Transactions with the ideal ACID properties resolve thesemantic problems that arise when many concurrent usersaccess and change the same database.
• Atomicity (= recovery)• Consistency• Isolation (= concurrency control)• Durability
![Page 14: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/14.jpg)
9-2
How to deal with transactions?
Transactions with the ideal ACID properties resolve thesemantic problems that arise when many concurrent usersaccess and change the same database.
We will talk about how transactions are implemented usinglocking and timestamp mechanisms.
This knowledge is useful in database programming, e.g., itmakes it possible in some cases to avoid (or reduce)rollbacks of transactions, and generally make transactionswait less for each other.
• Atomicity (= recovery)• Consistency• Isolation (= concurrency control)• Durability
![Page 15: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/15.jpg)
10-1
Summarize
Database =
Logic(express the query)
Algorithm(solve the query)
System(implementation)
![Page 16: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/16.jpg)
10-2
Summarize
Database =
Logic(express the query)
Algorithm(solve the query)
System(implementation)
Concept (our focus)
Implementation
(see B662 Database Systemand Internal Design)
![Page 17: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/17.jpg)
10-3
Summarize
Database =
Logic(express the query)
Algorithm(solve the query)
System(implementation)
Concept (our focus)
Implementation
(see B662 Database Systemand Internal Design)
Data Representation, RelationalAlgebra, SQL (Datalog), etc.
Indexing, Query Optimization,Concurrency Control, etc.
![Page 18: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/18.jpg)
10-4
Summarize
Database =
Logic(express the query)
Algorithm(solve the query)
System(implementation)
Concept (our focus)
Implementation
(see B662 Database Systemand Internal Design)
Data Representation, RelationalAlgebra, SQL (Datalog), etc.
Indexing, Query Optimization,Concurrency Control, etc.
And you need math!!
![Page 19: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/19.jpg)
11-1
What’s more in this course?
![Page 20: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/20.jpg)
12-1
Advanced topics
Beyond ”SQL, Relational Algebra, Data Models,Storage, Views and Indexing, Query Processing, QueryOptimization, Transaction Recovery, ConcurrencyControl”
I will give you a taste of1. Data Privacy (Data Suppression, Differential
Privacy)2. External Memory a.k.a. I/O-Efficient
Algorithms (Sorting, List Ranking)3. Streaming Algorithms (Sampling, Heavy Hitters,
Distinct Elements)4. Data Integration / Cleaning (Deduplication)5. MapReduce
![Page 21: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/21.jpg)
13-1
Other important topics in databases
More but probably will not cover
1. Tree-based data models e.g., XML2. Graph-based data models e.g., RDF3. Spatial databases4. Parallel and Distributed databases
partly covered in MapReduce5. Social Networks6. Uncertainty in databases
etc.
![Page 22: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/22.jpg)
14-1
Tentative course plan
Part 0 : Introductions
Part 1 & 2 : Basics
– SQL, Relational Algebra– Data Models, Storage, Indexing
Part 3 : Optimization
Part 4 : Trasactions
Part 5 : Data Privacy
Part 6 : I/O-Efficient Algorithms
Part 7 : Streaming Algorithms
Part 8 : Data Integration
Part 9 : MapReduce
![Page 23: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/23.jpg)
14-2
Tentative course plan
Part 0 : Introductions
Part 1 & 2 : Basics
– SQL, Relational Algebra– Data Models, Storage, Indexing
Part 3 : Optimization
Part 4 : Trasactions
Part 5 : Data Privacy
Part 6 : I/O-Efficient Algorithms
Part 7 : Streaming Algorithms
Part 8 : Data Integration
Part 9 : MapReduce
We will also have some student presentations at the end of the course
![Page 24: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/24.jpg)
15-1
Resources
Main reference book (we will go beyond this)
• Database System Concepts
by UllSilberschatz, Korth and Sudarshan,
6th Edition
Other reference books (undergrad textbooks ... )
• Database Management Systems
by Ramakrishnan and Guhrke, 3rd Edition
• Database Systems: The Complete Book
by Hector Garcia-Molina, Jeff Ullman
and Jennifer Widom, 2nd Edition
![Page 25: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/25.jpg)
16-1
Resources (cont.)
Other reference books (cont.):
• Readings in Database Systems “Red book”
Hellerstein and Stonebraker, eds., 4th Edition
(Will be one of our readings)
• Foundations of Databases: The Logical Level
“Alice book”
by Abiteboul, Hull, Vianu
• Concurrency Control and Recovery in DatabaseSystems a
by Bernstein, Hadzilacos, Goodman
ahttp://research.microsoft.com/en-us/people/philbe/
ccontrol.aspx
![Page 26: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/26.jpg)
17-1
Resources (cont.)
Other reference books (cont.):
• Algorithms and Data Structures
for External Memory a
by Vitter
ahttp://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf
• Data Streams: Algorithms and Applications a
by S. Muthukrishnan
ahttp://www.cs.rutgers.edu/ muthu/stream-1-1.ps
![Page 27: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/27.jpg)
17-2
Resources (cont.)
Other reference books (cont.):
• Algorithms and Data Structures
for External Memory a
by Vitter
ahttp://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf
• Data Streams: Algorithms and Applications a
by S. Muthukrishnan
ahttp://www.cs.rutgers.edu/ muthu/stream-1-1.ps
These are surely not enough, and sometimes dated. Doyou want to learn more? Reading original papers!
In fact, some of my slides are directly from VLDB toturials
![Page 28: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/28.jpg)
18-1
Instructors
Instructor: Qin ZhangEmail: [email protected] hours: Tuesday 2:45pm-3:45pm(Lindley 215E temporary, then Lindley 430A)
Associate Instructors:
• Erfan Sadeqi Azer• Le Liu• Yifan Pan• Ali Varamesh• Prasanth Velamala
Office hours: Posted on course website
![Page 29: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/29.jpg)
19-1
Grading
Assignments 50% : Three written assignments (each 10%).Solutions should be typeset in LaTeX(highly recommended) or Word.
And one reading assignment (20%)(next slide for details)
Selected/volunteer students will givepresentations
Exams 50% : Mid-term (20%) and Final (30%).
![Page 30: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/30.jpg)
19-2
Grading
Assignments 50% : Three written assignments (each 10%).Solutions should be typeset in LaTeX(highly recommended) or Word.
And one reading assignment (20%)(next slide for details)
Selected/volunteer students will givepresentations
Exams 50% : Mid-term (20%) and Final (30%).
Use A,B, . . . for each item (assignments, exams). Final gradewill be a weighted average (according to XX%).
![Page 31: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/31.jpg)
20-1
Reading assignment
One or a group of two read some (1 to +∞) papers/surveys/articlesand write a report (4 pages for one, and 8 pages for a group of two)on what you think of the articles you read (not just a repeat of whatthey have said).
Topics can be found in redbookhttp://redbook.cs.berkeley.edu/bib4.html,and more topics on the course website “More reading topics” (googlethe papers / surveys yourself; contact AI if you cannot find it).
Selected students/groups (volunteer first) will give 25mins talks(20mins presentation +5mins Q&A) in class. The best 1/3individuals/groups will get a bonus in their final grades. A penaltywill be given if you agree to give a talk but cannot do at the end,while the quality of the talk is irrelevant.
![Page 32: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/32.jpg)
21-1
LaTeX
LaTeX: Highly recommended tools forassignments/reports
1. Read wiki articles:http://en.wikipedia.org/wiki/LaTeX
2. Find a good LaTeX editor.
3. Learn how to use it, e.g., read “A Not So ShortIntroduction to LaTeX 2e” (Google it)
![Page 33: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/33.jpg)
22-1
Prerequisite
Participants are expected to have a background inalgorithms and data structures. For example, have taken
1. C241 Discrete Structures for Computer Science
2. C343 Data Structures
3. B403 Introduction to Algorithm Design and Analysis
or equivalent courses, and know some basics of databases.
![Page 34: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/34.jpg)
23-1
Frequently asked questions
Is this a course good for my job hunting in industry?
Yes, if you get to know some advanced concepts in databases, thatwill certainly help.
But, this is a course on theoretical foundations of databases, butnot designed for teaching commercially available techniquesand not a programming language (SQL? PHP?) course,and not a “hands on” course (this is not a course for professionaltraining; this is a graduate course in a major research universitythus should be much more advanced)
![Page 35: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/35.jpg)
23-2
Frequently asked questions
Is this a course good for my job hunting in industry?
Yes, if you get to know some advanced concepts in databases, thatwill certainly help.
But, this is a course on theoretical foundations of databases, butnot designed for teaching commercially available techniquesand not a programming language (SQL? PHP?) course,and not a “hands on” course (this is not a course for professionaltraining; this is a graduate course in a major research universitythus should be much more advanced)
I haven’t taken B403 “Introduction to Algorithm Designand Analysis” or equivalent courses. Can I take thecourse? Or, will this course fit me?
Generally speaking, this is an advanced course. It will be difficultif you do not have enough background. You can take intoconsideration the touch-base exam.
![Page 36: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/36.jpg)
24-1
The goal of this course
Open / change your
views of the world
(of databases)
![Page 37: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/37.jpg)
24-2
The goal of this course
Open / change your
views of the world
(of databases)
Seriously, it is not just SQL programming.
Read “The relational model is dead, SQL is dead,
and I don’t feel so good myself”
![Page 38: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/38.jpg)
25-1
Big Data
![Page 39: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/39.jpg)
26-1
• : over 2.5 petabytes of sales transactions• : an index of over 19 billion web pages• : over 40 billion of pictures• . . .
Big Data
Big data is everywhere
![Page 40: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/40.jpg)
26-2
• : over 2.5 petabytes of sales transactions• : an index of over 19 billion web pages• : over 40 billion of pictures• . . .
Big Data
Big data is everywhere
Nature ’06 Nature ’08 Economist ’10CACM ’08
Magazine covers
![Page 41: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/41.jpg)
27-1
• Retailer databases: Amazon, Walmart
• Logistics, financial & health data: Stock prices
• Social network: Facebook, twitter
• Pictures by mobile devices: iphone
• Internet traffic: IP addresses
• New forms of scientific data: Large Synoptic Survey Telescope
Source and challenge
Source
![Page 42: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/42.jpg)
27-2
• Retailer databases: Amazon, Walmart
• Logistics, financial & health data: Stock prices
• Social network: Facebook, twitter
• Pictures by mobile devices: iphone
• Internet traffic: IP addresses
• New forms of scientific data: Large Synoptic Survey Telescope
Source and challenge
Source
• Volume
• Velocity
• Variety (Documents, Stock records, Personal profiles,Photographs, Audio & Video, 3D models, Location data, . . . )
Challenge
![Page 43: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/43.jpg)
27-3
• Retailer databases: Amazon, Walmart
• Logistics, financial & health data: Stock prices
• Social network: Facebook, twitter
• Pictures by mobile devices: iphone
• Internet traffic: IP addresses
• New forms of scientific data: Large Synoptic Survey Telescope
Source and challenge
Source
• Volume
• Velocity
• Variety (Documents, Stock records, Personal profiles,Photographs, Audio & Video, 3D models, Location data, . . . )
Challenge
} The main technical challenges
![Page 44: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/44.jpg)
28-1
What does Big Data really mean?
We don’t define Big Data in terms of TB, PB, EB, . . .
The data is too big to fit in memory. What can we do?
![Page 45: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/45.jpg)
28-2
What does Big Data really mean?
We don’t define Big Data in terms of TB, PB, EB, . . .
The data is too big to fit in memory. What can we do?
Store them in the disk, and read/write a block of data each time
![Page 46: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/46.jpg)
28-3
What does Big Data really mean?
We don’t define Big Data in terms of TB, PB, EB, . . .
The data is too big to fit in memory. What can we do?
Processing one by one as they come,and throw some of them away on the fly.
Store them in the disk, and read/write a block of data each time
![Page 47: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/47.jpg)
28-4
What does Big Data really mean?
We don’t define Big Data in terms of TB, PB, EB, . . .
The data is too big to fit in memory. What can we do?
Processing one by one as they come,and throw some of them away on the fly.
Store in multiple machines, which collaborate via communication
Store them in the disk, and read/write a block of data each time
![Page 48: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/48.jpg)
28-5
What does Big Data really mean?
We don’t define Big Data in terms of TB, PB, EB, . . .
The data is too big to fit in memory. What can we do?
Processing one by one as they come,and throw some of them away on the fly.
Store in multiple machines, which collaborate via communication
RAM model does not fit
A processor and an infinite size memory
Probing each cell of the memory has a unit cost
RAM
CPU
Store them in the disk, and read/write a block of data each time
![Page 49: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/49.jpg)
29-1
Big Data:
A marketing buzzword??
![Page 50: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/50.jpg)
29-2
Big Data:
A marketing buzzword??
A good reading topic
![Page 51: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/51.jpg)
30-1
Popular models for big data
(see another slides)
![Page 52: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/52.jpg)
31-1
Summary for the introduction
We have discussed topics that will be covered in thiscourse
We have introduced some models for big datacomputation.
We have talked about the course plan and assessment.
![Page 53: 0 Introductionhomes.sice.indiana.edu/qzhangcs/B561-14-fall-advanceDB/intro.pdf6-1 SQL SELECT x .PName, x .Price FROM Product x , Company y WHERE x .Manufacturer=y .CName AND y .Country=‘Japan’](https://reader034.fdocuments.net/reader034/viewer/2022042803/5f49c1c41645244c165b541e/html5/thumbnails/53.jpg)
32-1
Thank you!Questions?
A few introductory slides are based on RasmusPagh’s slideshttp://www.itu.dk/people/pagh/ADBT06/