DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering...
![Page 1: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/1.jpg)
Stratos Idreos
![Page 2: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/2.jpg)
DATA
INDEX
HOW TO STORE DATA
![Page 3: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/3.jpg)
DATA
INDEX
data structure decisions define the algorithms that access data
ALGORITHMS
![Page 4: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/4.jpg)
DATA
INDEX
[7,4,2,6,1,3,9,10,5,8] (unordered)
ALGORITHMS
![Page 5: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/5.jpg)
![Page 6: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/6.jpg)
DATA
INDEX
[7,4,2,6,1,3,9,10,5,8] (unordered)
[1,2,3,4,5,6,7,8,9,10] (ordered)
ALGORITHMS
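A minimal sketch of how the layout choice above determines the access algorithm, using the two arrays from the slide. The function names and the "elements inspected" counter are illustrative assumptions, not part of the lecture material.

```python
unordered = [7, 4, 2, 6, 1, 3, 9, 10, 5, 8]
ordered = sorted(unordered)  # the ordered layout from the slide


def find_unordered(data, key):
    """Linear scan: the only option when nothing is known about the order."""
    for inspected, value in enumerate(data, start=1):
        if value == key:
            return True, inspected
    return False, len(data)


def find_ordered(data, key):
    """Binary search: possible only because the data is kept ordered."""
    lo, hi, inspected = 0, len(data), 0
    while lo < hi:
        mid = (lo + hi) // 2
        inspected += 1
        if data[mid] == key:
            return True, inspected
        if data[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return False, inspected


print(find_unordered(unordered, 8))
print(find_ordered(ordered, 8))
```

Both functions return whether the key was found and how many elements were inspected, so the two layouts can be compared on the same lookup.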
![Page 7: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/7.jpg)
DATA
INDEX
ALGORITHMS
![Page 8: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/8.jpg)
DATA
INDEX
ALGORITHMS
DATA SYSTEMS
![Page 9: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/9.jpg)
DATA STRUCTURES
DEFINE PERFORMANCE
[Figure: speed of COMPUTE vs. DATA MOVEMENT, 2020]
![Page 10: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/10.jpg)
[Figure: speed of COMPUTE vs. DATA MOVEMENT, 2020]
register = this room
caches = this city
memory = nearby city
disk = Pluto
Jim Gray, Turing Award 1998
![Page 11: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/11.jpg)
Read, Update, Memory amplification
no perfect structure
![Page 12: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/12.jpg)
![Page 13: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/13.jpg)
![Page 14: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/14.jpg)
Read, Update, Memory amplification
no perfect structure
differential, approximate, point, tree
Array
Linked-List
Skip-List
Trie
Hash-Table
Sorted Array
B-tree
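A minimal sketch of the read/update tension behind "no perfect structure", comparing two of the structures listed above. The class names and the cost counters (elements written per insert, elements touched per lookup) are assumptions made for illustration; memory amplification is not modeled here.

```python
import bisect


class UnsortedArray:
    """Append-only array: one write per insert, linear-time lookups."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        self.items.append(key)
        return 1  # elements written

    def contains(self, key):
        touched = 0
        for value in self.items:
            touched += 1
            if value == key:
                return True, touched
        return False, touched


class SortedArray:
    """Sorted array: binary-search lookups, but inserts shift elements."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        pos = bisect.bisect_left(self.items, key)
        shifted = len(self.items) - pos
        self.items.insert(pos, key)
        return shifted + 1  # shifted elements plus the new key

    def contains(self, key):
        lo, hi, touched = 0, len(self.items), 0
        while lo < hi:
            mid = (lo + hi) // 2
            touched += 1
            if self.items[mid] == key:
                return True, touched
            if self.items[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return False, touched


keys = [7, 4, 2, 6, 1, 3, 9, 10, 5, 8]
unsorted_array, sorted_array = UnsortedArray(), SortedArray()
unsorted_writes = sum(unsorted_array.insert(k) for k in keys)
sorted_writes = sum(sorted_array.insert(k) for k in keys)
print(unsorted_writes, sorted_writes)
print(unsorted_array.contains(8), sorted_array.contains(8))
```

In this toy model the unsorted array writes less on insert and touches more on lookup, while the sorted array does the opposite.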
![Page 15: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/15.jpg)
How do I make my data system run x times as fast? (sql,nosql,bigdata, …)
![Page 16: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/16.jpg)
How do I make my data system run x times as fast? (sql,nosql,bigdata, …)
How do I minimize my bill in the cloud?
![Page 17: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/17.jpg)
How do I make my data system run x times as fast? (sql,nosql,bigdata, …)
How do I minimize my bill in the cloud?
How do I extend the lifetime of my hardware?
![Page 18: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/18.jpg)
How do I make my data system run x times as fast? (sql,nosql,bigdata, …)
How do I minimize my bill in the cloud?
How to accelerate statistics computation for data science/ML?
How do I extend the lifetime of my hardware?
![Page 19: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/19.jpg)
How do I make my data system run x times as fast? (sql,nosql,bigdata, …)
How do I minimize my bill in the cloud?
How do I train my neural network x times faster?
How to accelerate statistics computation for data science/ML?
How do I extend the lifetime of my hardware?
![Page 20: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/20.jpg)
NEW APPLICATIONS
![Page 21: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/21.jpg)
NEW APPLICATIONS
existing systems need to change too
![Page 22: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/22.jpg)
NEW APPLICATIONS
existing systems need to change too
WORKLOAD HARDWARE
ADAPT
![Page 23: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/23.jpg)
NEW APPLICATIONS
existing systems need to change too
WORKLOAD HARDWARE
ADAPT
IMPROVE WITHIN A BUDGET
WHAT WILL BREAK MY SYSTEM?
REASON
![Page 24: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/24.jpg)
more data
new applications
new h/w
continuous need for new storage solutions
![Page 25: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/25.jpg)
fundamental of storage learning outcome
![Page 26: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/26.jpg)
fundamental of storage learning outcome
software engineering, data-driven startup, research
![Page 27: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/27.jpg)
fundamental of storage learning outcome
software engineering, data-driven startup, research
data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics, Data Science
![Page 28: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/28.jpg)
fundamental of storage learning outcome
software engineering, data-driven startup, research
data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics, Data Science
small set of principles across all fields
![Page 29: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/29.jpg)
![Page 30: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/30.jpg)
first 4 weeks: introduction to research problems/thinking through lectures
![Page 31: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/31.jpg)
first 4 weeks: introduction to research problems/thinking through lectures
Reading research papers
Open ended projects/research
![Page 32: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/32.jpg)
as of week 5: discussions/presentations
![Page 33: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/33.jpg)
as of week 5: discussions/presentations
interaction: in and out of class
M/W/F OH/labs, Sat/Sun remote OH
![Page 34: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/34.jpg)
There is no such thing as a wrong question/answer!!!!
as of week 5: discussions/presentations
interaction: in and out of class
M/W/F OH/labs, Sat/Sun remote OH
![Page 35: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/35.jpg)
Recent Research Papers
review and slides should focus on
what is the problem
why is it important
why is it hard
why existing solutions do not work
what is the core intuition for the solution
solution step by step
does the paper prove its claims
exact setup of analysis/experiments
are there any gaps in the logic/proof
possible next steps
* follow a few citations to gain more background
Each student: 2 reviews per week/1 presentation
![Page 36: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/36.jpg)
Recent Research Papers
review and slides should focus on
what is the problem
why is it important
why is it hard
why existing solutions do not work
what is the core intuition for the solution
solution step by step
does the paper prove its claims
exact setup of analysis/experiments
are there any gaps in the logic/proof
possible next steps
* follow a few citations to gain more background
Each student: 2 reviews per week/1 presentation
learn to judge constructively
learn to present
learn to prepare slides
![Page 37: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/37.jpg)
systems project
research project
semester project: due at the end of the semester + a midway check-in (early March, 10%)
![Page 38: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/38.jpg)
systems project
individual project
NoSQL, in C/C++
research project
semester project: due at the end of the semester + a midway check-in (early March, 10%)
![Page 39: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/39.jpg)
systems project
individual project
NoSQL, in C/C++
research project
groups of three
NoSQL, Neural Networks
Periodic Table of Data Structures
semester project: due at the end of the semester + a midway check-in (early March, 10%)
![Page 40: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/40.jpg)
![Page 41: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/41.jpg)
ACM Special Interest Group on Management of Data (SIGMOD)
Undergrad Research Competition
first prize in 2016, 2017, 2018, 2019
Adaptive Denormalization, Evolving Trees, Splaying LSM-Trees, Adaptive NoSQL
![Page 42: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/42.jpg)
ACM Special Interest Group on Management of Data (SIGMOD)
Undergrad Research Competition
first prize in 2016, 2017, 2018, 2019
Adaptive Denormalization, Evolving Trees, Splaying LSM-Trees, Adaptive NoSQL
Design continuums at CIDR 2019, two projects in SIGMOD 2020 finals
![Page 43: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/43.jpg)
piazza forum
all announcements & discussions as of week 2
link on class website - check out usage guidelines
![Page 44: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/44.jpg)
piazza forum
all announcements & discussions as of week 2
link on class website - check out usage guidelines
classes are recorded (links on class website)
![Page 45: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/45.jpg)
piazza forum
all announcements & discussions as of week 2
link on class website - check out usage guidelines
classes are recorded (links on class website)
NO LAPTOP/PHONE POLICY
class is based on participation!
![Page 46: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/46.jpg)
piazza forum
all announcements & discussions as of week 2
link on class website - check out usage guidelines
classes are recorded (links on class website)
NO LAPTOP/PHONE POLICY
class is based on participation!
Project: 40%, Midway Check-in: 10%, Discussion: 20%, Presentation: 15%, Reviews: 15%
![Page 47: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/47.jpg)
Check out: syllabus, preparation readings, project 0, systems project, online sections
http://daslab.seas.harvard.edu/classes/cs265/
Get familiar with the very basics of traditional database architectures: Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton. Foundations and Trends in Databases, 2007
Get familiar with the very basics of modern database architectures: The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013
Get familiar with the very basics of modern large scale systems: Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and Herodotos Herodotou. Foundations and Trends in Databases, 2013
![Page 48: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/48.jpg)
Teaching Fellows: Subarna, Brian, Siqiang
Off-class discussions are key! questions on readings, ideas, help with code/analysis
![Page 49: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/49.jpg)
Prerequisites
knowledge of algorithms, data structures, hardware, systems
![Page 50: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/50.jpg)
Prerequisites
knowledge of algorithms, data structures, hardware, systems
Systems track: allows taking the class without all prerequisites
Research track: open to CS165 students
![Page 51: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/51.jpg)
Prerequisites
knowledge of algorithms, data structures, hardware, systems
Systems track: allows taking the class without all prerequisites
Research track: open to CS165 students
(165/265 will not be offered in fall 2020/spring 2021)
![Page 52: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/52.jpg)
questions on logistics?
![Page 53: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/53.jpg)
Next few classes:
BASICS of storage
Intro to RESEARCH topics
Discussion phase/presentation as of week 5
![Page 54: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/54.jpg)
periodic table of data structures
![Page 55: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/55.jpg)
CPU
registers
on chip cache (SRAM, ~1ns)
on board cache (SRAM, ~10ns)
memory (DRAM, ~100ns)
disk
[Figure: memory hierarchy with arrows labeled "cheaper" and "faster"; memory wall plot of speed over time for cpu vs. mem]
cache miss: looking for something which is not in the cache
memory miss: looking for something which is not in memory
![Page 56: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/56.jpg)
![Page 57: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/57.jpg)
Jim Gray (IBM, Tandem, DEC, Microsoft): ACM Turing award, ACM SIGMOD Edgar F. Codd Innovations award
disk: 100Kx, Pluto, 2 years
memory: 100x, New York, 1.5 hours
on board cache: 10x, this building, 10 min
on chip cache: 2x, this room, 1 min
registers: my head, ~0
![Page 58: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/58.jpg)
need to only read x… but have to read all of page 1
page1 page2 page3 …
data value x
CPU
registers
on chip cache
on board cache
memory
disk
data move
![Page 59: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/59.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
![Page 60: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/60.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
![Page 61: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/61.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
result: 4
![Page 62: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/62.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
result: 4
data moved: 40 bytes
![Page 63: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/63.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
scan: 2 8 9 7 6
result: 4
data moved: 40 bytes
![Page 64: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/64.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
scan: 2 8 9 7 6
result: 4 2
data moved: 40 bytes
![Page 65: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/65.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
scan: 5 10 6 4 12
scan: 2 8 9 7 6
result: 4 2
data moved: 80 bytes
![Page 66: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/66.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
page: 2 8 9 7 6
result: 4 2
data moved: 80 bytes
![Page 67: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/67.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
page: 2 8 9 7 6
scan: 7 11 3 9 6
result: 4 2
data moved: 80 bytes
![Page 68: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/68.jpg)
… 5 10 6 4 12 | 2 8 9 7 6 | 7 11 3 9 6 (size=120 bytes)
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
page: 2 8 9 7 6
scan: 7 11 3 9 6
result: 4 2 3
data moved: 80 bytes
![Page 69: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/69.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6
scan
3
120 bytes
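The scan walkthrough above can be replayed in a few lines of Python. This is a minimal sketch, assuming 8-byte values and 5 values per page (taken from "page size: 5x8 bytes"); the names `pages_of` and `scan_less_than` are hypothetical and not from the lecture.

```python
VALUE_BYTES = 8   # assumption: each value occupies 8 bytes
PAGE_VALUES = 5   # assumption: a page holds 5 values (5x8 = 40 bytes)

def pages_of(data, page_values=PAGE_VALUES):
    """Split the data at memory level N into fixed-size pages."""
    return [data[i:i + page_values] for i in range(0, len(data), page_values)]

def scan_less_than(data, bound):
    """Move every page to level N-1 and filter it there.

    Returns the qualifying values and the total bytes moved.
    """
    result = []
    bytes_moved = 0
    for page in pages_of(data):
        bytes_moved += len(page) * VALUE_BYTES   # the whole page is moved
        result.extend(v for v in page if v < bound)
    return result, bytes_moved

data = [5, 10, 6, 4, 12, 2, 8, 9, 7, 6, 7, 11, 3, 9, 6]
result, bytes_moved = scan_less_than(data, 5)
print(result, bytes_moved)  # [4, 2, 3] 120
```

The running total matches the slides: 40 bytes after the first page, 80 after the second, 120 after the third.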
![Page 70: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/70.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
5 10 6 4 12
(size=120 bytes)
2 8 9 7 6 7 11 3 9 6
an oracle gives us the positions
![Page 71: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/71.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle
5 10 6 4 12
(size=120 bytes)
2 8 9 7 6 7 11 3 9 6
an oracle gives us the positions
![Page 72: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/72.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle
5 10 6 4 12
(size=120 bytes)
2 8 9 7 6
4
7 11 3 9 6
an oracle gives us the positions
![Page 73: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/73.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle
40 bytes
5 10 6 4 12
(size=120 bytes)
2 8 9 7 6
4
7 11 3 9 6
an oracle gives us the positions
![Page 74: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/74.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle oracle
40 bytes
5 10 6 4 12
(size=120 bytes) 2 8 9 7 6 4
7 11 3 9 6
an oracle gives us the positions
![Page 75: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/75.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle oracle
40 bytes
5 10 6 4 12
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
an oracle gives us the positions
![Page 76: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/76.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
oracle oracle
5 10 6 4 12
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
80 bytes
an oracle gives us the positions
![Page 77: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/77.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
80 bytes
an oracle gives us the positions
![Page 78: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/78.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
oracle
80 bytes
an oracle gives us the positions
![Page 79: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/79.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
oracle
3
80 bytes
an oracle gives us the positions
![Page 80: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/80.jpg)
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
memory level N
memory level N-1
query x<5
page size: 5x8 bytes
(size=120 bytes) 2 8 9 7 6 4 2
7 11 3 9 6
oracle
3
120 bytes
an oracle gives us the positions
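The oracle walkthrough can be sketched the same way. The sketch below assumes the oracle is simply a function with perfect knowledge of the qualifying positions, and that data still moves one whole page at a time; `oracle_positions` and `fetch_with_oracle` are hypothetical names used only for illustration.

```python
VALUE_BYTES = 8   # assumption: each value occupies 8 bytes
PAGE_VALUES = 5   # assumption: a page holds 5 values (5x8 = 40 bytes)

def oracle_positions(data, bound):
    """Stand-in for the oracle: the positions of all values below bound."""
    return [i for i, v in enumerate(data) if v < bound]

def fetch_with_oracle(data, positions):
    """Move only the pages that contain a qualifying position.

    Returns the qualifying values and the total bytes moved.
    """
    needed_pages = sorted({pos // PAGE_VALUES for pos in positions})
    result = []
    bytes_moved = 0
    for p in needed_pages:
        start = p * PAGE_VALUES
        page = data[start:start + PAGE_VALUES]
        bytes_moved += len(page) * VALUE_BYTES   # the whole page still moves
        result.extend(data[pos] for pos in positions
                      if start <= pos < start + PAGE_VALUES)
    return result, bytes_moved

data = [5, 10, 6, 4, 12, 2, 8, 9, 7, 6, 7, 11, 3, 9, 6]
positions = oracle_positions(data, 5)
print(positions)                           # [3, 5, 12]
print(fetch_with_oracle(data, positions))  # ([4, 2, 3], 120)
```

With this data layout every page holds one qualifying value, so the oracle moves the same 120 bytes as the scan.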
![Page 81: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/81.jpg)
when does it make sense to have an oracle?
how can we minimize the cost?
…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
5 10 6 4 12 2 8 9 7 6 7 11 3 9 6
e.g., query x<5
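One way to explore these two questions is to count the bytes an oracle-guided fetch moves under different data layouts. The sketch below is an illustration under assumptions, not the lecture's answer: it assumes 8-byte values, 5-value pages, and page-granularity movement, and it compares the original order with a sorted copy of the same values. `oracle_bytes_moved` is a hypothetical helper.

```python
VALUE_BYTES = 8   # assumption: each value occupies 8 bytes
PAGE_VALUES = 5   # assumption: a page holds 5 values (5x8 = 40 bytes)

def oracle_bytes_moved(data, bound):
    """Bytes moved when only pages holding a qualifying value are fetched."""
    needed_pages = {i // PAGE_VALUES for i, v in enumerate(data) if v < bound}
    total = 0
    for p in needed_pages:
        page = data[p * PAGE_VALUES:(p + 1) * PAGE_VALUES]
        total += len(page) * VALUE_BYTES
    return total

data = [5, 10, 6, 4, 12, 2, 8, 9, 7, 6, 7, 11, 3, 9, 6]
print(oracle_bytes_moved(data, 5))          # 120: qualifying values sit in all 3 pages
print(oracle_bytes_moved(sorted(data), 5))  # 40: qualifying values share one page
```

Under these assumptions the oracle only pays off when the qualifying values are concentrated in few pages; when they are spread across every page, it moves as much data as a full scan.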
![Page 82: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/82.jpg)
algorithm and system design = not just computation
![Page 83: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/83.jpg)
CPU DATA MOVEMENT
MEMORY REQUIREMENT
SPACE REQUIREMENT
(ENERGY)
ROBUSTNESS
![Page 84: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/84.jpg)
CPU DATA MOVEMENT
MEMORY REQUIREMENT
SPACE REQUIREMENT
(ENERGY)
SQL, NoSQL, Graph, Neural Nets, Statistics, Vision
ROBUSTNESS
![Page 85: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/85.jpg)
CPU DATA MOVEMENT
MEMORY REQUIREMENT
SPACE REQUIREMENT
(ENERGY)
SQL, NoSQL, Graph, Neural Nets, Statistics, Vision
TIME —— CLOUD COSTS
ROBUSTNESS
![Page 86: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/86.jpg)
Check out: syllabus, preparation readings, project 0, systems project, online sections
http://daslab.seas.harvard.edu/classes/cs265/
Get familiar with the very basics of traditional database architectures: Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton. Foundations and Trends in Databases, 2007
Get familiar with the very basics of modern database architectures: The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013
Get familiar with the very basics of modern large scale systems: Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and Herodotos Herodotou. Foundations and Trends in Databases, 2013
![Page 87: DATA - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics,](https://reader034.fdocuments.net/reader034/viewer/2022042806/5f6b36e72d57076dee64fd7d/html5/thumbnails/87.jpg)
Stratos Idreos