Master of Science in Data Science Syllabus
First Course Algorithms for Searching, Sorting, and Indexing
About this Course: This course covers the basics of algorithm design and analysis, as well as
algorithms for sorting arrays, data structures such as priority queues, hash functions, and
applications such as Bloom filters.
Duration: 4 weeks
What you will learn
1. Explain fundamental concepts for algorithmic searching and sorting
2. Design basic algorithms to implement sorting, selection, and hash functions in
heap data structures
3. Describe heap data structures and analyze heap components, such as arrays and
priority queues
Skills
1. Analysis of Algorithms
2. Hash tables
3. Algorithm Design
4. Python Programming
5. Data Structure Design
Syllabus - What you will learn from this course
1. Basics of Algorithms Through Searching and Sorting
2. Heaps and Hashtable Data Structures
3. Randomization: Quicksort, Quickselect, and Hashtables
4. Applications of Hashtables
Week one: - (7 videos (Total 202 min), 9 readings, 5 quizzes)
Basics of Algorithms Through Searching and Sorting
In this module the student will learn the very basics of algorithms through three examples:
insertion sort (sort an array in ascending/descending order); binary search (search whether
an element is present in a sorted array and, if yes, find its index); and merge sort (a
faster method for sorting an array). Through these algorithms the student will be introduced
to the analysis of algorithms -- i.e., proving that the algorithm is correct for the task it
has been designed for and establishing a bound on how the time taken to execute the
algorithm grows as a function of input. The student is also exposed to the notion of a
faster algorithm and asymptotic complexity through the O, big-Omega and big-Theta notations.
Videos
1- What is an Algorithm? 28m
2- An Introduction Through the Insertion Sort Algorithm 44m
3- Time and Space Complexity 30m
4- Asymptotic Notation 31m
5- Binary Search 22m
6- Merge Sort Algorithm, Analysis and Proof of Correctness 28m
7- Pitfalls and Logarithms 15m
Readings
1- Important Prerequisites 10m
2- Logistics: Textbook and Readings 10m
3- CLRS Chapter 1 10m
4- Overview of Module 1 10m
5- CLRS Chapter 2 10m
6- CLRS Chapter 3 10m
7- Binary Search Lecture Slides 10m
8- Jupyter Notebook on Binary Search 10m
9- Notes on MergeSort 10m
Practice Exercises
1. Insertion Sort and Running Times 30m
2. Asymptotic Notation and Complexity 30m
3. Binary Search 30m
4. Mergesort Algorithm 30m
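To make the week-one topics concrete, the following is a minimal Python sketch of binary search and merge sort. It is an illustration only and is not taken from the course materials; the function names `binary_search` and `merge_sort` are assumptions chosen for this example.

```python
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or None if absent.

    Each iteration halves the search range, so the loop runs O(log n) times.
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return None


def merge_sort(arr):
    """Return a new sorted list; runs in O(n log n) time."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Sorting first and then searching, for example `binary_search(merge_sort(data), x)`, shows how the two algorithms combine: binary search is only correct on sorted input.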
Week Two:- (5 videos (Total 120 min), 6 readings, 6 quizzes)
Heaps and Hashtable Data Structures
In this module, the student will learn about the basics of data structures that organize
data to make certain types of operations faster. The module starts with a broad introduction
to data structures and talks about some simple data structures such as first-in first-out
queues and last-in first-out stacks. Next, we introduce the heap data structure and the
basic properties of heaps. This is followed by algorithms for insertion, deletion and
finding the minimum element of a heap along with their time complexities. Finally, we will
study the priority queue data structure and showcase some applications.
Videos
1. A Simple Data Structure: The Dynamic Array 20m
2. Heap, Min/Max-Heaps and Properties of Heaps 24m
3. Heap Primitives: Bubble Up/Bubble Down 29m
4. Priority Queues, Heapify, and Heapsort 28m
5. Hashtables – Introduction 17m
Readings
1. Overview of Module 2 10m
2. CLRS Chapter 10, 10.1 (Optional) 10m
3. CLRS Chapter 6.1 and 6.2 10m
4. CLRS Chapter 6.3 10m
5. CLRS Chapter 6.4 and 6.5 10m
6. CLRS Chapter 11.1 and 11.2 10m
Practice Exercises
1. Basics of Data Structures 30m
2. Basics of Heap Data Structures 30m
3. Bubble-Up/Bubble-Down, Insertion and Deletion Operations 30m
4. Heapify, Priority Queues and Heapsort 30m
5. Hashtables 30m
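The heap operations named in this module (insertion, deletion of the minimum, bubble up and bubble down) can be sketched as follows. This is a minimal array-backed min-heap written for illustration; the class name `MinHeap` and its method names are assumptions, not the course's own code.

```python
class MinHeap:
    """Array-backed binary min-heap: the parent of index i is (i - 1) // 2."""

    def __init__(self):
        self._a = []

    def __len__(self):
        return len(self._a)

    def find_min(self):
        """Return the smallest element in O(1) time."""
        return self._a[0]

    def insert(self, x):
        """Append x, then bubble it up; O(log n)."""
        self._a.append(x)
        self._bubble_up(len(self._a) - 1)

    def delete_min(self):
        """Remove and return the smallest element; O(log n)."""
        a = self._a
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last          # move the last leaf to the root
            self._bubble_down(0)
        return top

    def _bubble_up(self, i):
        a = self._a
        while i > 0:
            parent = (i - 1) // 2
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def _bubble_down(self, i):
        a = self._a
        n = len(a)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                break
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
```

Repeatedly calling `delete_min` until the heap is empty yields the elements in ascending order, which is the idea behind heapsort and priority queues.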
Week Three:- (7 videos (Total 152 min), 6 readings, 6 quizzes)
Randomization: Quicksort, Quickselect, and Hashtables
We will go through the quicksort and quickselect algorithms for sorting and selecting the
kth smallest element in an array efficiently. This will also be an introduction to the role
of randomization in algorithm design. Next, we will study hashtables: a highly useful data
structure that allows for efficient search and retrieval from large amounts of data. We
will learn about the basic principles of hash-tables and operations on hashtables.
Videos
1. Introduction to Randomization + Average Case Analysis + Recurrences 23m
2. Partition and Quicksort Algorithm 13m
3. Detailed Design of Partitioning Schemes 25m
4. Analysis of Quicksort Algorithm 28m
5. Quickselect Algorithm and its Applications 18m
6. Selecting Hash Functions 22m
7. Universal Hash Functions and Analysis 20m
Readings
1. Overview of Module 3 10m
2. CLRS Chapter 7.1 10m
3. CLRS Chapter 7.1 10m
4. CLRS Chapter 7.2 - 7.4 10m
5. CLRS Chapter 9.1, 9.2 10m
6. CLRS Chapter 11.3 10m
Practice Exercises
1. Quicksort and Partition 30m
2. Partition Schemes 30m
3. Analysis of Quicksort 30m
4. Quickselect Algorithm 30m
5. Universal Hash Functions 30m
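As an illustration of randomized selection, here is a minimal Python sketch of quickselect with a random pivot. It uses a Lomuto-style partition, which is only one of several partitioning schemes and is an assumption of this example; the function name `quickselect` is likewise illustrative.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (k = 1 gives the minimum).

    Expected running time is O(n) because each random pivot discards,
    on average, a constant fraction of the remaining range.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    a = list(items)          # work on a copy so the input is not modified
    lo, hi = 0, len(a) - 1
    target = k - 1           # zero-based index of the answer in sorted order
    while True:
        if lo == hi:
            return a[lo]
        # Pick a random pivot and move it to the end of the range.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Lomuto partition: elements smaller than the pivot go to the left.
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # pivot lands at its final index
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1
        else:
            lo = store + 1
```

Quicksort uses the same partition step but recurses into both sides; quickselect only continues into the side that contains the requested rank.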
Week four: - (5 videos (Total 113 min), 6 readings, 2 quizzes)
Applications of Hashtables
In this module, we will learn randomized pivot selection for quicksort and quickselect. We
will learn how to analyze the complexity of the randomized quicksort/quickselect algorithms.
We will learn open address hashing: a technique that simplifies hashtable design. Next we
will study the design of hash functions and their analysis. Finally, we present and analyze
Bloom filters that are used in various applications such as querying streaming data and
counting
Videos
1. Open Address Hashing 18m
2. Perfect hashing and Cuckoo hashing 33m
3. Bloom Filters and Analysis 14m
4. Count-Min Sketching Using Hashing 31m
5. String Matching Using Hashing 16m
Readings
1. Overview of Module 4 10m
2. CLRS 11.4 10m
3. CLRS Chapter 11.5 (Perfect Hashing) and Slides with Scribbles 10m
4. Bloom Filter: Slides 10m
5. Count-Min Sketches Slides 10m
6. Slides with Scribbles 10m
Practice Exercises
Open Address Hashing 30m
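A Bloom filter can be sketched in a few lines of Python. The version below is an illustration under stated assumptions: it derives its hash functions by salting SHA-256 from the standard library, and the class name `BloomFilter` and the default sizes are hypothetical choices, not values from the course.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        """Yield num_hashes bit positions for item, one per salted hash."""
        data = str(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))
```

Because bits are only ever set, an item that was added is always reported as present; the price is that an item never added may occasionally be reported as present too.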
Second Course Trees and Graphs: Basics
About this Course: Basic algorithms on tree data structures, binary search trees, self-
balancing trees, graph data structures and basic traversal algorithms on graphs. This course
also covers advanced topics such as kd-trees for spatial data and algorithms for spatial data.
Introduction: Trees and Graphs: Basics can be taken for academic credit as part of CU
Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera
platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU
Boulder’s departments of Applied Mathematics, Computer Science, Information Science,
and others. With performance-based admissions and no application process, the MS-DS is
ideal for individuals with a broad range of undergraduate education and/or professional
experience in computer science, information science, mathematics, and statistics.
WHAT YOU WILL LEARN
1. Define basic tree data structures and identify algorithmic functions associated with
them
2. Execute traversals and create graphs within a binary search tree structure
3. Describe strongly connected components in graphs
SKILLS YOU WILL GAIN
1. Analysis of Algorithms
2. Algorithm Design
3. Python Programming
4. Data Structure Design
5. Graph Algorithms
WEEK 1: 5 videos (Total 147 min), 8 readings, 6 quizzes
Binary Search Trees and Algorithms on Trees
In this module, you will learn about binary search trees and basic algorithms on binary
search trees. We will also become familiar with the problem of balancing in binary search
trees and study some solutions for balanced binary search trees such as Red-Black Trees.
Videos
1- Binary Search Trees -- Introduction and Properties 22m
2- Binary Search Trees -- Insertion and Deletion 31m
3- Red-Black Trees Basics 33m
4- Red-Black Trees -- Rotations/Algorithms for Insertion (and Deletion) 29m
5- Skip Lists 30m
Readings
1- Important Prerequisites 10m
2- Logistics: Textbook and Readings 10m
3- Overview of Module 1 10m
4- Reading CLRS Chapter 12 10m
5- CLRS Chapter 12.1-12.3 10m
6- CLRS Chapter 13 - 13.1 10m
7- CLRS Chapter 13.2 - 13.3 10m
8- Skip 10m
Quizzes
1- Basics of Binary Search Trees 30m
2- Binary Search Tree: Insert and Delete 30m
3- Red-Black Tree Basics 30m
4- Tree Rotations 30m
5- Skip Lists 30m
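The basic binary search tree operations from this module can be sketched as follows. This is an unbalanced tree written for illustration only; it does not perform the rotations that Red-Black Trees use, and the names `Node`, `bst_insert`, `bst_contains` and `inorder` are assumptions of this example.

```python
class Node:
    """A binary search tree node: smaller keys go left, larger keys go right."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key and return the root; duplicate keys are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break   # key already present
    return root


def bst_contains(root, key):
    """Search takes time proportional to the height of the tree."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder(root):
    """Return all keys in ascending order using an in-order traversal."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```

Inserting keys in sorted order makes this tree degenerate into a linked list of height n, which is the balancing problem that self-balancing trees address.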
WEEK 2: 7 videos (Total 125 min), 6 readings, 5 quizzes
Basics of Graphs and Graph Traversals
In this module, you will learn about graphs and various basic algorithms on graphs such as
depth first/breadth first traversals, finding strongly connected components, and
topological sorting.
Videos
1- Graphs and Their Representations 14m
2- Graph Traversals and Breadth First Traversal 17m
3- Depth First Search 33m
4- Topological Sorting and Applications 11m
5- Strongly Connected Components - Definitions 15m
6- Strongly Connected Components - Properties 16m
7- Strongly Connected Components - Algorithm 16m
Readings
1- Overview of Module 2 10m
2- CLRS Chapter 22 (Section 22.1) 10m
3- CLRS Chapter 22 (Section 22.2) 10m
4- CLRS Chapter 22 (Section 22.3) 10m
5- CLRS Chapter 22 (Section 22.4) 10m
6- CLRS Chapter 22 (Section 22.5) 10m
Quizzes
1- Graph Representations 30m
2- Combined Quiz on Graph Traversals 30m
3- Topological Sort Graphs 30m
4- Strongly Connected Components 30m
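Two of the traversal ideas in this module, breadth-first traversal and topological sorting by depth-first search, can be sketched in Python as below. The adjacency-list dictionary representation and the function names `bfs_order` and `topological_sort` are assumptions made for this illustration.

```python
from collections import deque


def bfs_order(graph, source):
    """Return vertices reachable from source in breadth-first order.

    graph maps each vertex to a list of its out-neighbors.
    """
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def topological_sort(graph):
    """Return a topological order of a DAG; raise ValueError on a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, in progress, finished
    color = {}
    finished = []

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:
                raise ValueError("graph has a cycle")
            if state == WHITE:
                visit(v)
        color[u] = BLACK
        finished.append(u)

    for u in graph:
        if color.get(u, WHITE) == WHITE:
            visit(u)
    finished.reverse()   # reverse finishing order is a topological order
    return finished
```

The recursive `visit` is fine for small graphs; for very deep graphs an explicit stack would avoid Python's recursion limit.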
WEEK 3: 5 videos (Total 127 min), 5 readings, 5 quizzes
Union-Find Data Structures and Spanning Tree Algorithms
Union Find Data-structure with rank compression. Spanning trees and properties of spanning
trees. Prim’s algorithm for finding minimal spanning trees. Kruskal’s algorithm for finding
minimal spanning trees.
Videos
1- Amortized Analysis of Data Structures 27m
2- Amortized Analysis: Potential Functions 26m
3- Spanning Trees and Minimal Spanning Trees with Applications 26m
4- Kruskal’s Algorithm for Finding Minimal Spanning Trees 8m
5- Union-Find Data Structures and Rank Compression 38m
Readings
1- Overview of Module 3 10m
2- CLRS Chapter 17 10m
3- CLRS Chapter 23 (Section 23.1) 10m
4- CLRS Chapter 23 (Section 23.2) 10m
5- CLRS Chapter 21 10m
Quizzes
1- Amortized Analysis 30m
2- Minimum Spanning Tree 30m
3- Kruskal's Algorithm 30m
4- Disjoint Set Forest 30m
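Kruskal's algorithm and the union-find structure it relies on can be sketched together. The code below is an illustration that assumes union by rank with path compression and vertices numbered from 0 to n - 1; the names `DisjointSet` and `kruskal` are hypothetical.

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root   # point every node on the path at the root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra        # attach the shallower tree under the deeper
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def kruskal(n, edges):
    """Return the edges of a minimum spanning forest.

    edges is a list of (weight, u, v) tuples over vertices 0..n-1.
    """
    ds = DisjointSet(n)
    tree = []
    for w, u, v in sorted(edges):   # consider edges by increasing weight
        if ds.union(u, v):          # keep the edge only if it joins two trees
            tree.append((w, u, v))
    return tree
```

The `union` return value is what lets Kruskal's loop reject an edge that would close a cycle.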
WEEK 4: 6 videos (Total 155 min), 6 readings, 4 quizzes
Shortest Path Algorithms
In this module, you will learn about: Shortest Path Problem: Basics. Bellman-Ford Algorithm
for single source shortest path. Dijkstra’s algorithm. Algorithms for all-pairs shortest
path problem (Floyd-Warshall Algorithm)
Videos
1- Shortest Path Problems and Their Properties 29m
2- Bellman-Ford Algorithm for Single Source Shortest Paths 45m
3- Shortest Path on DAGs 11m
4- Dijkstra’s Algorithm for Single Source Shortest Paths with Nonnegative Edge Weights 20m
5- Proof of Dijkstra's Algorithm 12m
6- All Pairs Shortest Path Problems and Floyd-Warshall’s Algorithm 34m
Readings
1- Overview of Module 4 10m
2- CLRS Chapter 24 (up to section 24.1) 10m
3- CLRS Chapter 24 (Section 24.1) 10m
4- CLRS Chapter 24 (Section 24.2) 10m
5- CLRS Chapter 24 (Section 24.3 and 24.5) 10m
6- CLRS Chapter 25 (Sections 25.1 and 25.2) 10m
Quizzes
1- Shortest Path Problems Properties 30m
2- Shortest Path - Bellman Ford Algorithm 30m
3- Dijkstra's Algorithm 30m
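Dijkstra's algorithm for nonnegative edge weights can be sketched with the standard-library `heapq` module as the priority queue. The graph representation (a dictionary of `(neighbor, weight)` lists) and the function name `dijkstra` are assumptions of this illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable vertex.

    graph maps a vertex to a list of (neighbor, weight) pairs.
    All weights must be nonnegative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue   # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            if w < 0:
                raise ValueError("Dijkstra requires nonnegative weights")
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                    # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist
```

With negative edge weights the greedy choice no longer holds, which is where Bellman-Ford applies instead.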
Third Course Data Science as a Field
About this Course: This course provides a general introduction to the field of Data
Science. It has been designed for aspiring data scientists, content experts who work
with data scientists, or anyone interested in learning about what Data Science is and
what it’s used for. Weekly topics include an overview of the skills needed to be a data
scientist; the process and pitfalls involved in data science; and the practice of data
science in the professional and academic world. This course is part of CU Boulder’s
Master of Science in Data Science and was collaboratively designed by both
academics and industry professionals to provide learners with an insider’s perspective
on this exciting, evolving, and increasingly vital discipline.
Introduction: Data Science as a Field can be taken for academic credit as part of CU
Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera
platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU
Boulder’s departments of Applied Mathematics, Computer Science, Information Science,
and others. With performance-based admissions and no application process, the MS-DS is
ideal for individuals with a broad range of undergraduate education and/or professional
experience in computer science, information science, mathematics, and statistics.
WHAT YOU WILL LEARN
By taking this course, you will be able to explain what data science is and identify the key
disciplines involved. You will be able to use the steps of the data science process to
create a reproducible data analysis and identify personal biases. You will be able to
identify interesting data science applications, locate jobs in Data Science, and begin
developing a professional network.
SKILLS YOU WILL GAIN
1. Data Science
2. Applied Mathematics
3. Information Science
4. Statistics
5. Computer Science
WEEK 1: (1 hour to complete) 4 videos (total 15 min)
Introduction to Data Science: the Past, Present, and Future of a New Discipline
This week we will talk about the past, present and future of data science. The growth of
data science has been fueled by the growth of the internet, social media and online shopping
as well as by the rapid increases in data storage capabilities. You will watch several short
videos and participate in discussions about the future of data science.
Videos
1- Data Science as a Field Course Introduction 2m
2- Where Does Data Science Come From? 2m
3- The Current State of the Field 7m
4- Where is Data Science Going? 2m
WEEK 2: (4 hours to complete) 8 videos (total 97 min), 7 readings, 1 quiz
Data Science in Industry, Government, and Academia
Videos
1- Introduction to "Data Science in Business, Industry, and the Professional World" 1m
2- Brian Brown & Rinaldo Madera 16m
3- Natalie Jackson 11m
4- Villa Hulden 16m
5- Robin Burke 9m
6- Seth Spielman 16m
7- Katharina Kann 15m
8- Dan Larremore 10m
Readings
1- Introducing Brian Brown and Rinaldo Maldera 10m
2- Introducing Natalie Jackson 10m
3- Introducing Villa Holden 10m
4- Introducing Robin Burke 10m
5- Introducing Seth Spielman 10m
6- Introducing Katharina Kann 10m
7- Introducing Dan Larremore 10m
Quizzes
1 quiz
WEEK 3: (4 hours to complete) 11 videos (total 64 min), 9 readings, 2 quizzes
Data Science Process and Pitfalls
This week you will learn about the importance of reproducibility and how to achieve it,
learn the steps in a data analysis process and learn about the possible pitfalls in data
science. You will watch videos demonstrating the various steps in the data science process
and try out these processes for yourself on a different dataset.
Videos
1- Importance and Process of Reproducibility 4m
2- Knit to PDF 3m
3- Intro to R Markdown 8m
4- Overview of Steps in the Data Science Process 2m
5- Importing Data 6m
6- Tidying and Transforming Data 8m
7- Visualizing Data 6m
8- Analyzing Data 7m
9- Modeling Data 5m
10- Bias sources 4m
11- Intro to Data Ethics course with Bobby Schnabel 5m
Readings
1- Before You Watch the Next Video... 10m
2- Knit the Template 10m
3- Use R Markdown to Create a Document 10m
4- For More Info On Tidyverse Packages... 10m
5- Project Files 10m
6- Project Step 1: Start an Rmd Document 10m
7- Project Step 2: Tidy and Transform Your Data 10m
8- Project Step 3: Add Visualizations and Analysis 10m
9- Project Step 4: Add Bias Identification 10m
Quizzes
File Unlocking Quiz 1m
WEEK 4: 6 videos (Total 155 min), 6 readings, 4 quizzes
Communicating Your Results
This week you will learn about important ways of communicating your results. We will discuss
the important things to know about presentations and reports. You will also learn about the
importance of networking and try it out.
Videos
1- Do’s and Don’ts for Good Reports and Presentations 4m
2- CU Boulder’s MS in Data Science: Where to Go from Here? 3m
Readings
- Imposter Syndrome 10m
Quizzes
1 quiz
Fourth Course Cybersecurity for Data Science
About this Course: This course aims to help anyone interested in data science
understand the cybersecurity risks and the tools/techniques that can be used to
mitigate those risks. We will cover the distinctions between confidentiality, integrity,
and availability, introduce learners to relevant cybersecurity tools and techniques
including cryptographic tools, software resources, and policies that will be essential
to data science. We will explore key tools and techniques for authentication and
access control so producers, curators, and users of data can help ensure the security
and privacy of the data.
Introduction: This course can be taken for academic credit as part of CU Boulder’s
Master of Science in Data Science (MS-DS) degree offered on the Coursera platform.
The MS-DS is an interdisciplinary degree that brings together faculty from CU
Boulder’s departments of Applied Mathematics, Computer Science, Information
Science, and others. With performance-based admissions and no application process,
the MS-DS is ideal for individuals with a broad range of undergraduate education
and/or professional experience in computer science, information science,
mathematics, and statistics.
WHAT YOU WILL LEARN
1. Characterize the CIA principles and use them to classify a variety of cyber
scenarios.
2. Identify and disseminate vulnerabilities in the data security space: social
(human) and technical (digital).
3. Distinguish ethical boundaries of hacking and its applications.
4. Explore professional cybersecurity networks and connect with experts from the
field.
SKILLS YOU WILL GAIN
1. Communication
2. Risk Analysis
3. Problem Solving
WEEK 1: 4 videos (Total 20 min), 4 readings, 1 quiz
Basic Cybersecurity Concepts and Principles
In this module, you will learn the basics of cybersecurity and the CIA triad.
Videos
1- Introduction 1m
2- Introduce the CIA Triad and Cybersecurity Basics 5m
3- LinkedIn and Twitter for Professionals 6m
4- Join LinkedIn Cybersecurity Course Group and Post 6m
Readings
1- The CIA Triad 10m
2- Create Your LinkedIn and Accounts 10m
3- Networking with LinkedIn and Twitter 10m
4- Assignment: Join Cybersecurity Course Group on LinkedIn and Submit Posts 10m
Quizzes
Basic Cybersecurity Concepts and Principles 30m
WEEK 2: 3 videos (Total 38 min), 5 readings, 2 quizzes
Your Cyber Story and Your Public Data Profile
In this module, you will explore your Cyber Story and examine your public data profile.
Videos
1- Digital Reputation and Cyberstory, Yourself 10m
2- Passwords and Cybersecurity 16m
3- Basic Cryptography and Encryption 11m
Readings
1- Your Online Reputation 10m
2- Exercise - What is Your Cyber Story? 10m
3- Exercise - Ungoogle Yourself and Set up Google Alerts 10m
4- Passwords 20m
5- Cryptography 1h 10m
Quizzes
Your Cyber Story and Your Public Data Profile 30m
WEEK 3: 3 videos (Total 29 min), 4 readings, 1 quiz
Wi-Fi, IoT, Hacking, Data Breaches and Social Engineering
This module explores the world of hacking, IoT and social engineering.
Videos
1- Hacking - White, Grey and Black Hackers 10m
2- IoT 9m
3- Social Engineering 9m
Readings
1- Hackers 10m
2- Internet of Things 10m
3- Social Engineering 10m
4- LinkedIn Discussion: Hacking or IoT and Cybersecurity 10m
Quizzes
Wi-Fi, IoT, Hacking, Data Breaches and Social Engineering 30m
WEEK 4: 2 videos (Total 21 min), 4 readings, 1 quiz
The Ethics of Cyber Security
In this session, students will leverage social media to connect with cybersecurity experts
and explore the ethics around cybersecurity and data.
Videos
1- Facial Recognition and 'Big Brother' 12m
2- Getting Yourself Out There on LinkedIn and Twitter 8m
Readings
1- Big Brother and Surveillance 7h 43m
2- Optional Discussion 10m
3- Cybersecurity Experts on Twitter 10m
4- Explore and Network 10m
Quizzes
The Ethics of Cyber Security 30m
Fifth Course Ethical Issues in Data Science (core course)
About this Course: The applications of computing that involve large amounts of
data - the field of data science - affect the lives of most people in the United States
and the world. These impacts include recommendations made to us by Internet-based
systems, information about us online, technologies used for security and monitoring,
data used in healthcare, and much more. In many cases, they are affected by artificial
intelligence and machine learning techniques.
Introduction: This course can be taken for academic credit as part of CU Boulder’s
Master of Science in Data Science (MS-DS) degree offered on the Coursera platform.
The MS-DS is an interdisciplinary degree that brings together faculty from CU
Boulder’s departments of Applied Mathematics, Computer Science, Information
Science, and others. With performance-based admissions and no application process,
the MS-DS is ideal for individuals with a broad range of undergraduate education
and/or professional experience in computer science, information science,
mathematics, and statistics.
WHAT YOU WILL LEARN
1. Characterize the CIA principles and use them to classify a variety of cyber
scenarios.
2. Identify and disseminate vulnerabilities in the data security space- social
(human) and technical (digital).
3. Distinguish ethical boundaries of hacking and its applications.
4. Explore professional cybersecurity networks and connect with experts from the
field.
SKILLS YOU WILL GAIN
1. Communication
2. Risk Analysis
3. Problem Solving
WEEK 1: 2 hours to complete 4 videos 4 readings 1 practice exercise
What are Ethics?
Module 1 of this course establishes a basic foundation in the notion of simple utilitarian
ethics we use for this course. The lecture material and the quiz questions are designed to
get most people to come to an agreement about right and wrong, using the utilitarian
framework taught here. If you bring your own moral sense to bear, or think hard about
possible counter-arguments, it is likely that you can arrive at a different conclusion. But
that discussion is not what this course is about. So, resist that temptation, so that we
can jointly lay a common foundation for the rest of this course.
Videos
1- What are Ethics? 9m
2- Data Science Needs Ethics 3m
3- Case Study: Spam (not the meat) 4m
Readings
1- Course Syllabus 10m
2- Welcome Announcement 10m
3- Help us learn more about you! 10m
4- What are Ethics? - Introduction 10m
Quizzes
Module 1 Quiz 30m
History, Concept of Informed Consent
Early experiments on human subjects were by scientists intent on advancing medicine, to the
benefit of all humanity, with disregard for the welfare of individual human subjects. Often
these were performed by white scientists on black subjects. In this module we will talk
about the laws that govern the Principle of Informed Consent. We will also discuss why
informed consent doesn’t work well for retrospective studies, or for the customers of
electronic businesses.
Videos
1- Human Subjects Research and Informed Consent: Part 2 8m
2- Limitations of Informed Consent 9m
Case Study: It's Not Occupied 6m
Quizzes
Module 2 Quiz 30m
Data Ownership
Who owns data about you? We'll explore that question in this module. A few examples of
personal data include copyrights for biographies; ownership of photos posted online, Yelp,
Trip Advisor, public data capture, and data sale. We'll also explore the limits on
recording and use of data.
Videos
1- Limits on Recording and Use 7m
2- Data Ownership Finale 3m
3- Case Study: Rate My Professor 3m
4- Case Study: Privacy After Bankruptcy 2m
Quizzes
Module 3 Quiz 30m
WEEK 2: 3 videos (Total 38 min), 5 readings, 2 quizzes
Privacy
Privacy is a basic human need. Privacy means the ability to control information about
yourself, not necessarily the ability to hide things. We have seen the rise of different
value systems with regard to privacy. Kids today are more likely to share personal
information on social media, for example. So while values are changing, this doesn’t remove
the fundamental need to be able to control personal information. In this module we'll
examine the relationship between the services we are provided and the data we provide in
exchange: for example, the location for a cell phone. We'll also compare and contrast "data"
against "metadata".
Videos
1- History of Privacy 15m
2- Degrees of Privacy 10m
3- Modern Privacy Risks 12m
4- Case Study: Targeted Ads 3m
5- Case Study: The Naked Mile 2m
6- Case Study: Sneaky Mobile Apps 5m
Readings
Privacy - Introduction 10m
Module 4 Discussion Prompt References 10m
Quizzes
Module 4 Quiz 30m
Anonymity
Certain transactions can be performed anonymously. But many cannot, including where there is
physical delivery of product. Two examples related to anonymous transactions we'll look at
are "block chains" and "bitcoin". We'll also look at some of the drawbacks that come with
anonymity.
Videos
1- Anonymity 5m
De-identification Has Limited Value: Part 1 7m
2- De-identification Has Limited Value: Part 2 10m
3- Case Study: Credit Card Statements 2m
Quizzes
Module 5 Quiz 30m
WEEK 3: 3 videos (Total 29 min), 4 readings, 1 quiz
Data Validity
Data validity is not a new concern. All too often, we see the inappropriate use of Data
Science methods leading to erroneous conclusions. This module points out common errors, in
language suited for a student with limited exposure to statistics. We'll focus on the notion
of representative sample: opinionated customers, for example, are not necessarily
representative of all customers.
Videos
1- Validity 9m
Choice of Attributes and Measures 6m
Errors in Data Processing 8m
2- Errors in Model Design 8m
3- Managing Change 5m
Case Study: Three Blind Mice 4m
4- Case Study: Algorithms and Race 3m
Case Study: Algorithms in the Office 3m
5- Case Study: Germanwings Crash 5m
Case Study: Google Flu 5m
Readings
Data Validity - Introduction 10m
Quizzes
Module 6 Quiz 30m
Algorithmic Fairness
What could be fairer than a data driven analysis? Surely the dumb computer cannot harbor
prejudice or stereotypes. While indeed the analysis technique may be completely neutral,
given the assumptions, the model, the training data, and so forth, all of these boundary
conditions are set by humans, who may reflect their biases in the analysis result, possibly
without even intending to do so. Only recently have people begun to think about how
algorithmic decisions can be unfair. Consider this article, published in the New York Times.
This module discusses this cutting edge issue.
Videos
1- Algorithmic Fairness 10m
Correct but Misleading Results 12m
2- P Hacking 10m
Case Study: High Throughput Biology 3m
3- Case Study: Geopricing 2m
4- Case Study: Your Safety Is My Lost Income 10m
Readings
Algorithmic Fairness - Introduction 10m
Quizzes
Module 7 Quiz 30m
WEEK 4: 2 videos (Total 21 min), 4 readings, 1 quiz
Societal Consequences
In Module 8, we consider societal consequences of Data Science that we should be concerned
about even if there are no issues with fairness, validity, anonymity, privacy, ownership or
human subjects research. These “systemic” concerns are often the hardest to address, yet
just as important as other issues discussed before. For example, we consider ossification,
or the tendency of algorithmic methods to learn and codify the current state of the world
and thereby make it harder to change. Information asymmetry has long been exploited for the
advantage of some, to the disadvantage of others. Information technology makes the spread of
information easier, and hence generally decreases asymmetry. However, Big Data sets and
sophisticated analyses increase asymmetry in favor of those with the ability to
acquire/access.
Videos
1- Societal Impact 16m
Ossification 7m
2- Surveillance 4m
Case Study: Social Credit Scores 7m
3- Case Study: Predictive Policing 8m
Readings
Societal Consequences - Introduction 10m
Quizzes
Module 8 Quiz 30m
Code of Ethics
Finally, in Module 9, we tie all the issues we have considered together into a simple,
two-point code of ethics for the practitioner.
Videos
Wrap Up 2m
Readings
Post Course Survey 10m
Quizzes
Module 9 Quiz 30m
Attributions
This module contains lists of attributions for the external audiovisual resources used
throughout the course.
Readings
Week 1 Attributions 10m
Sixth Course Data Mining Pipeline
About this Course: This course introduces the key steps involved in the data mining
pipeline, including data understanding, data preprocessing, data warehousing, data
modeling, interpretation and evaluation, and real-world applications.
Introduction: Data Mining Pipeline can be taken for academic credit as part of CU
Boulder’s Master of Science in Data Science (MS-DS) degree offered on the
Coursera platform. The MS-DS is an interdisciplinary degree that brings together
faculty from CU Boulder’s departments of Applied Mathematics, Computer
Science, Information Science, and others. With performance-based admissions and
no application process, the MS-DS is ideal for individuals with a broad range of
undergraduate education and/or professional experience in computer science,
information science, mathematics, and statistics.
WHAT YOU WILL LEARN
1. By the end of this course, you will be able to identify the key components of
the data mining pipeline and describe how they're related.
2. You will be able to identify particular challenges presented by each component
of the data mining pipeline.
3. You will be able to apply techniques to address challenges in each component
of the data mining pipeline.
SKILLS YOU WILL GAIN
1. Data Pre-Processing
2. Data Warehousing
3. Data Understanding
4. Data Mining Pipeline
WEEK 1: 2 videos (Total 88 min), 1 reading, 2 quizzes
Data Mining Pipeline
This module provides an introduction to data mining and data mining pipeline, including the
four views of data mining and the key components in the data mining pipeline.
Videos
1- Introduction to Data Mining 41m
2- Introduction to Data Mining Pipeline 46m
Readings
Course Information 10m
WEEK 2: 2 videos (Total 71 min)
Data Understanding
This module covers data understanding by identifying key data properties and applying
techniques to characterize different datasets.
Videos
1- Objects & Attributes, Statistics, Visualization 30m
2- Data Similarity 39m
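To give a sense of what data similarity can mean in practice, the following Python sketch implements three commonly used measures. Which measures the course itself covers is not stated in this syllabus, so the selection here (Euclidean distance, cosine similarity, Jaccard similarity) and the function names are assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0   # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)
```

Euclidean distance suits numeric attributes on comparable scales, cosine similarity ignores vector length, and Jaccard similarity applies to set-valued or binary attributes.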
WEEK 3: 2 videos (Total 77 min)
Data Preprocessing
This module explains why data preprocessing is needed and what techniques can be used to
preprocess data.
Videos
1- Data Cleaning, Data Integration 33m
2- Data Transformation, Data Reduction 43m
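Data cleaning and data transformation can be illustrated with a few small Python helpers. The specific techniques shown (mean imputation, min-max scaling, z-score standardization) are common examples chosen for this sketch and are not confirmed by the syllabus; the function names are hypothetical.

```python
import statistics


def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values available to compute a mean")
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]   # constant column carries no spread
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Data transformation: center on the mean, scale by population std dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

Scaling matters whenever attributes have different units, since distance-based methods would otherwise be dominated by the attribute with the largest range.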
WEEK 4: 2 videos (Total 54 min)
Data Warehousing
This module covers the key characteristics of data warehousing and the techniques to
support data warehousing.
Videos
1- Data Warehouse, Data Cube and OLAP 25m
2- Data Cube Computation, Data Warehouse Architecture 28m
Seventh Course Statistical Modeling for Data Science
About this Course: Statistical modeling lies at the heart of data science. Well-crafted
statistical models allow data scientists to draw conclusions about the world from the
limited information present in their data. In this three-credit sequence, learners will
add some intermediate and advanced statistical modeling techniques to their data
science toolkit. In particular, learners will become proficient in the theory and
application of linear regression analysis; ANOVA and experimental design; and
generalized linear and additive models. Emphasis will be placed on analyzing real
data using the R programming language.
Introduction: This specialization can be taken for academic credit as part of CU
Boulder’s Master of Science in Data Science (MS-DS) degree offered on the
Coursera platform. The MS-DS is an interdisciplinary degree that brings together
faculty from CU Boulder’s departments of Applied Mathematics, Computer Science,
Information Science, and others. With performance-based admissions and no
application process, the MS-DS is ideal for individuals with a broad range of
undergraduate education and/or professional experience in computer science,
information science, mathematics, and statistics.
WHAT YOU WILL LEARN
1. Correctly analyze and apply tools of regression analysis to model relationships
between variables and make predictions given a set of input variables.
2. Successfully conduct experiments based on best practices in experimental
design.
3. Use advanced statistical modeling techniques, such as generalized linear and
additive models, to model a wide range of real-world relationships.
SKILLS YOU WILL GAIN
1. Linear Model
2. R Programming
3. Statistical Model
4. Regression
5. Calculus
6. Linear Algebra
7. Probability Theory
1- Modern Regression Analysis in R
About this Course: This course will provide a set of foundational statistical
modeling tools for data science. In particular, students will be introduced to methods,
theory, and applications of linear statistical models, covering the topics of parameter
estimation, residual diagnostics, goodness of fit, and various strategies for variable
selection and model comparison. Attention will also be given to the misuse of
statistical models and ethical implications of such misuse.
SKILLS YOU WILL GAIN
1. Linear Model
2. R Programming
3. Statistical Model
4. Regression
WEEK 1: 8 videos (Total 82 min)
Introduction to Statistical Models
Videos Readings Quizzes
In this module, we will introduce the basic conceptual framework for statistical
modeling in general, and for linear regression models in particular.
1- Frameworks and Goals of Statistical Modeling 14m
2- The Assumption of Concept Validity 7m
3- The Linear Regression Model 11m
4- Matrix Representation of the Linear Regression Model 15m
5- Assumptions of Linear Regression 9m
6- The Appropriateness of Linear Regression 11m
7- Interpreting the Linear Regression Model I 7m
8- Interpreting the Linear Regression Model II 5m
Introduction to Statistical Modeling 30m
The Linear Regression Model 30m
WEEK 2: 9 videos (Total 134 min)
Linear Regression Parameter Estimation
Videos Readings Quizzes
In this module, we will learn how to fit linear regression models with least squares.
We will also study the properties of least squares, and describe some goodness of fit
metrics for linear regression models.
1- Introduction to Least Squares 12m
2- Linear Algebra for Least Squares 9m
3- Deriving the Least Squares Solution 20m
4- Regression Modeling in R: a First Pass 19m
5- Justifying Least Squares: the Gauss-Markov Theorem and Maximum Likelihood
Estimation 13m
6- Sums of Squares and Estimating the Error Variance 19m
7- The Coefficient of Determination 9m
8- The Problem of Non-identifiability 6m
9- Regression Modeling in R: a Second Pass 22m
1- Least Squares 30m
2- Variability and Identifiability in Regression Models 30m
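As a rough illustration of the least squares fit and the coefficient of determination
named in this module, the sketch below fits a one-predictor model y = b0 + b1*x in
closed form. The course itself works in R; Python is used here only for illustration,
and the function name fit_simple_ols and the sample data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is not identifiable")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope estimate
    b0 = y_bar - b1 * x_bar       # intercept estimate
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return b0, b1, r_squared


# Illustrative data only (not from the course).
b0, b1, r2 = fit_simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.3f}")
```

The check on sxx mirrors the non-identifiability topic in the video list: with no
variation in the predictor, the slope cannot be estimated.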
WEEK 3: 8 videos (Total 121 min), 1 reading, 5 quizzes
Inference in Linear Regression
Videos Readings Quizzes
In this module, we will study the uses of linear regression modeling for justifying
inferences from samples to populations.
1- Motivating Statistical Inference in the Linear Regression Context 9m
2- The Sampling Distribution of the Least Squares Estimator 23m
3- T-Tests for Individual Regression Parameters 14m
4- T-Tests in R 20m
5- Motivating the F-Test: Multiple Statistical Comparisons 8m
6- The F-Test 22m
7- The F-Test in R 10m
8- Confidence Intervals in the Regression Context 11m
Ethics in Statistical Practice and Communication: Five Recommendations 30m
1- Statistical Inference: Intro and T-Tests 30m
2- Statistical Inference: the F-tests and Confidence Intervals 30m
WEEK 4: 6 videos (Total 82 min)
Prediction and Explanation in Linear Regression Analysis
Videos Readings Quizzes
In this module, we will identify how models can predict future values, as well as
construct interval estimates for those values. We will also explore the relationship
between statistical modelling and causal explanations.
1- Differentiating Prediction and Explanation 12m
2- Point Estimates for Prediction 10m
3- Interval Estimates for Prediction 9m
4- Making Predictions Using Real Data in R 19m
5- When Prediction Goes Wrong 7m
6- Defining Causality 22m
Prediction 30m
2- ANOVA and Experimental Design
About this Course: This second course in statistical modeling will introduce
students to the study of the analysis of variance (ANOVA), analysis of covariance
(ANCOVA), and experimental design. ANOVA and ANCOVA, presented as a type
of linear regression model, will provide the mathematical basis for designing
experiments for data science applications. Emphasis will be placed on important
design-related concepts, such as randomization, blocking, factorial design, and
causality. Some attention will also be given to ethical issues raised in
experimentation.
SKILLS YOU WILL GAIN
1. Calculus
2. Linear Algebra
3. Probability Theory
WEEK 1: 9 videos (Total 87 min)
Introduction to ANOVA and Experimental Design
Videos Readings Quizzes
In this module, we will introduce the basic conceptual framework for experimental
design and define the models that will allow us to answer meaningful questions about
the differences between group means with respect to a continuous variable. Such models
include the one-way Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA)
models.
1- Introduction to Experimental Design 10m
2- The One-Way ANOVA and ANCOVA Models 6m
3- ANOVA Variance Decomposition 8m
4- ANOVA Sums of Squares and the F-test 14m
5- ANOVA and ANCOVA as Regression Models 10m
6- One-Way ANOVA Interpretation in the Regression Context 10m
7- The ANCOVA Model 15m
8- ANCOVA with Interactions 7m
9- ANCOVA with Interactions in R 4m
1- Introduction to ANOVA and Experimental Design 30m
2- The One-Way ANOVA and ANCOVA Models 30m
3- ANOVA Variance Decomposition 30m
4- ANOVA Sums of Squares and the F-Test 30m
5- ANOVA and ANCOVA as Regression Models 30m
6- One-Way ANOVA Interpretation in the Regression Context 30m
7- The ANCOVA Model 30m
8- ANCOVA with Interactions 30m
9- ANCOVA with Interactions in R 30m
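The variance decomposition and F-test named in this module can be sketched for the
one-way case as follows. This is a minimal illustration in Python rather than the
course's R, and the function name one_way_anova_f and the data are hypothetical. It
splits the total sum of squares into between-group and within-group parts and forms
the F statistic from their mean squares.

```python
def one_way_anova_f(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of groups."""
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least 2 non-empty groups and more values than groups")
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F statistic is undefined")
    f_statistic = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f_statistic


# Illustrative data only: three treatment groups.
ssb, ssw, f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"SS_between={ssb:.2f}, SS_within={ssw:.2f}, F={f:.2f}")
```

The F statistic would then be compared against an F distribution with k - 1 and n - k
degrees of freedom; that lookup is left out to keep the sketch dependency-free.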
WEEK 2: 6 videos (Total 91 min), 2 readings, 6 quizzes
Hypothesis Testing in the ANOVA Context
Videos Readings Quizzes
In this module, we will learn how statistical hypothesis testing and confidence
intervals, in the ANOVA/ANCOVA context, can help answer meaningful questions about the
differences between group means with respect to a continuous variable.
1. Beyond the Full F-test 12m
2. Planned Comparisons: Defining Contrasts 16m
3. Planned Comparisons: Hypothesis Testing with Contrasts 14m
4. Post Hoc Comparisons 13m
5. Post Hoc Comparisons in R 16m
6. Type II Error and Power in the ANOVA Context 18m
1. Patrizio E. Tressoldi and David Giofré: "The pervasive avoidance of prospective
statistical power: major consequences and practical solutions" 10m
2. Optional: Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude)
Errors 10m
1- Beyond the Full F-test 30m
2- Planned Comparisons: Defining Contrasts 30m
3- Planned and Unplanned Comparisons 30m
4- Type II Error and Power in the ANOVA Context 30m
WEEK 3: 7 videos (Total 79 min)
Two-Way ANOVA and Interactions
Videos Readings Quizzes
In this module, we will study the two-way ANOVA model and use it to answer research
questions using real data.
1. Motivating the Two-way ANOVA Model 10m
2. The Two-way ANOVA Model 9m
3. The Two-way ANOVA Model as a Regression Model 9m
4. Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations 13m
5. Interactions in the Two-way ANOVA Model: Formal Tests 15m
6. Two-way ANOVA Hypothesis Testing (no interaction) 14m
7. Looking Ahead: Two-Way ANOVA and Experimental Design 5m
1. Motivating the Two-way ANOVA Model 30m
2. The Two-way ANOVA Model 30m
3. The Two-way ANOVA Model as a Regression Model 30m
4. Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations 30m
5. Interactions in the Two-way ANOVA Model: Formal Tests 30m
6. Two-way ANOVA Hypothesis Testing (no interaction) 30m
WEEK 4: 7 videos (Total 79 min), 2 readings, 7 quizzes
Experimental Design: Basic Concepts and Designs
Videos Readings Quizzes
In this module, we will study fundamental experimental design concepts, such as
randomization, treatment design, replication, and blocking. We will also look at basic
factorial designs as an improvement over elementary “one factor at a time” methods. We
will combine these concepts with the ANOVA and ANCOVA models to conduct meaningful
experiments.
1. The Conceptual Framework of Experimental Design 19m
2. The Completely Randomized Design 12m
3. The Randomized Complete Block Design (RCBD) 8m
4. The Randomized Complete Block Design (RCBD): Hypothesis Testing 8m
5. The Factorial Design 10m
6. Further Issues in Experimental Design 7m
7. Ethical Issues in Experimental Design 12m
1- Causation and Experimental Design 10m
2- Resources on Ethics 10m
1- The Conceptual Framework of Experimental Design 30m
2- The Completely Randomized Design 30m
3- The Randomized Complete Block Design (RCBD) 30m
4- The Factorial Design 30m
5- Further Issues in Experimental Design 30m
3- Generalized Linear Models and Nonparametric Regression
About this Course: In the final course of the statistical modeling for data science
program, learners will study a broad set of more advanced statistical modeling tools.
Such tools will include generalized linear models (GLMs), which will provide an
introduction to classification (through logistic regression); nonparametric modeling,
including kernel estimators and smoothing splines; and semi-parametric generalized
additive models (GAMs). Emphasis will be placed on a firm conceptual
understanding of these tools. Attention will also be given to ethical issues raised by
using complicated statistical models.
SKILLS YOU WILL GAIN
1- Calculus
2- Linear Algebra
3- Probability Theory
WEEK 1: 7 videos (Total 75 min), 1 reading, 7 quizzes
An Introduction to Generalized Linear Models Through Binomial Regression
Videos Readings Quizzes
In this module, we will introduce generalized linear models (GLMs) through the study
of binomial data. In particular, we will motivate the need for GLMs; introduce the
binomial regression model, including the most common binomial link functions;
correctly interpret the binomial regression model; and consider various methods for
assessing the fit and predictive power of the binomial regression model.
1. From Linear Models to Generalized Linear Models 12m
2. The Components of a GLM 6m
3. The Exponential Family of Distributions 14m
4. Introduction to Binomial Regression 9m
5. Binomial Regression Parameter Estimation 11m
6. Interpretation of Binomial Regression 7m
7. Binomial Regression in R 11m
Fair ML Book, Introduction 10m
1- Introduction to Generalized Linear Models 30m
2- Binomial Regression 30m
3- Binomial Regression Inference 30m
WEEK 2: 7 videos (Total 83 min)
Models for Count Data
Videos Readings Quizzes
In this module, we will consider how to model count data. When the response variable
is a count of some phenomenon, and when that count is thought to depend on a set of
predictors, we can use Poisson regression as a model. We will describe the Poisson
regression in some detail and use Poisson regression on real data. Then, we will
describe situations in which Poisson regression is not appropriate, and briefly
present solutions to those situations.
1. Poisson Regression: A New Model for Count Data 13m
2. Poisson Regression Parameter Estimation 6m
3. Interpreting the Poisson Regression Model 7m
4. Poisson Regression on Real Data in R 21m
5. Goodness of Fit for Poisson Regression I 16m
6. Goodness of Fit for Poisson Regression II 4m
7. Overdispersion 12m
1- Poisson Regression Basics 30m
2- Poisson Regression Inference and Goodness of Fit 30m
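To make the idea of Poisson regression parameter estimation concrete, here is a
minimal sketch that fits a one-predictor model with a log link, log(mu) = b0 + b1*x,
by Newton's method on the Poisson log-likelihood. The course uses R for this; the
Python version, the function name fit_poisson_regression, the starting values, and
the data are all assumptions made for illustration.

```python
import math


def fit_poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by Newton's method; return (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_y = sum(y) / len(y)
    if mean_y <= 0:
        raise ValueError("counts must have a positive mean")
    b0, b1 = math.log(mean_y), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("information matrix is singular (no variation in x?)")
        # Newton step: solve the 2x2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Illustrative counts only (not course data).
b0, b1 = fit_poisson_regression([0, 1, 2, 3, 4], [1, 2, 2, 5, 6])
print(f"b0={b0:.3f}, b1={b1:.3f}, rate ratio per unit x={math.exp(b1):.3f}")
```

Exponentiating b1 gives the multiplicative change in the expected count per unit
increase in x, which is the usual way such a coefficient is read.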
WEEK 3: 6 videos (Total 66 min)
Introduction to Nonparametric Regression
Videos Readings Quizzes
In this module, we will introduce the concept of a nonparametric regression model. We
will contrast this notion with the parametric models that we have studied so far.
Then, we’ll study particular nonparametric regression models: kernel estimators and
splines. Finally, we will introduce additive models as a blending of parametric and
nonparametric methods.
1. Introduction to Nonparametric Regression Models 11m
2. Motivating Kernel Estimators 6m
3. Kernel Estimators 14m
4. Smoothing Splines 13m
5. Loess: Locally Estimated Scatterplot Smoothing 14m
6. Kernel Estimation in R 5m
Nonparametric Regression: Theory 30m
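One common kernel estimator is the Nadaraya-Watson form, which predicts the response
at a point as a weighted average of observed responses, with weights that decay with
distance. Whether the course uses exactly this form is an assumption; the sketch
below, in Python rather than R, uses a Gaussian kernel and hypothetical names
(gaussian_kernel, nadaraya_watson) and data.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train, evaluated at the point x0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if len(x_train) != len(y_train) or not x_train:
        raise ValueError("x_train and y_train must be non-empty and equal length")
    # Observations near x0 receive larger weights than distant ones.
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights underflowed; try a larger bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


# Illustrative data only: a noisy-looking curve smoothed at x0 = 2.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 1.8, 1.2, 0.4, 0.2]
print(nadaraya_watson(xs, ys, x0=2.5, bandwidth=1.0))
```

The bandwidth controls the trade-off: small values track the nearest observations
closely, while large values pull the estimate toward the overall mean.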
WEEK 4: 6 videos (Total 81 min), 1 reading, 4 quizzes
Introduction to Generalized Additive Models
Videos Readings Quizzes
Some models, such as linear regression, are easily interpretable, but inflexible, in
that they don't capture many real-world relationships accurately. Other models, such
as neural networks, are quite flexible, but very difficult to interpret. Generalized
additive models (GAMs) are a nice balance between flexibility and interpretability. In
this module, we will further motivate GAMs, learn the basic mathematics of fitting
GAMs, and implement them on simulated and real data in R.
1. Motivating Generalized Additive Models 17m
2. Generalized Additive Models in R 16m
3. Inference with Generalized Additive Models: Effective Degrees of Freedom 12m
4. Inference with Generalized Additive Models: Tests 4m
5. Generalized Additive Models in R: Inference and Interpretation 13m
6. Generalized Additive Models: A Complete Example with Real Data 16m
Required: Generalized additive models for data science 10m
1- Generalized Additive Models: Basics 30m
2- Generalized Additive Models: Inference and Data Analysis 30m
Eighth Course Introduction to High-Performance and Parallel Computing
About this Course: This course introduces the fundamentals of high-performance
and parallel computing. It is aimed at scientists, engineers, scholars, and anyone
else seeking to develop the software skills necessary for work in parallel
software environments. We will cover everything from the basics of Linux environments
and bash scripting to high-throughput computing and parallelizing code.
Introduction: This course can be taken for academic credit as part of CU Boulder’s
Master of Science in Data Science (MS-DS) degree offered on the Coursera
platform. The MS-DS is an interdisciplinary degree that brings together faculty from
CU Boulder’s departments of Applied Mathematics, Computer Science, Information
Science, and others. With performance-based admissions and no application
process, the MS-DS is ideal for individuals with a broad range of undergraduate
education and/or professional experience in computer science, information science,
mathematics, and statistics.
WHAT YOU WILL LEARN
1. The components of a high-performance distributed computing system
2. Types of parallel programming models and the situations in which they
might be used
3. High-throughput computing
4. Shared memory parallelism
5. Distributed memory parallelism
6. Navigating a typical Linux-based HPC environment
7. Assessing and analyzing application scalability including weak and strong
scaling
8. Quantifying the processing, data, and cost requirements for a computational
project or workflow
SKILLS YOU WILL GAIN
These skills include big-data analysis, machine learning, parallel programming,
optimization, and data understanding.
WEEK 1: 9 videos (Total 46 min), 1 reading, 1 practice exercise, 1 quiz
High-Performance Computing (HPC) for Non-Computer Scientists
Videos Readings Quizzes
Get to know the basics of an HPC system. Users will learn how to work with common
high-performance computing systems they may encounter in future efforts. This includes
navigating filesystems, working with a typical HPC operating system (Linux), and some
of the basic concepts of HPC. We will also provide users with some key information
that is specific to the logistics of this course.
1- Course Overview 2m
2- Tour of JupyterLab 4m
3- Submitting Assignments 6m
4- Linux - Part 1 5m
5- Linux - Part 2 3m
6- Accessing Remote Systems 6m
7- Filesystems 4m
8- Bash Scripting - Part 1 7m
9- Bash Scripting - Part 2 5m
Course Syllabus 10m
Week 1 Quiz 30m
WEEK 2: 9 videos (Total 26 min), 1 practice exercise, Quiz 30m
Nuts and Bolts of HPC
Videos Readings Quizzes
During this week we will begin to use HPC infrastructure. We will learn how to load
software appropriately onto an HPC system, which types of nodes a user can expect to
encounter on a system, and how to submit a job to conduct work, such as performing
calculations.
1- HPC Architecture 4m
2- Software 4m
3- Allocations 3m
4- Node Types 1m
5- Job Submission with Slurm - Part 1 6m
6- Job Submission with Slurm - Part 2 8m
Week 2 Quiz 30m
WEEK 3: 6 videos (Total 25 min), practice exercise, Quiz 30m
Basic Parallelism
Videos Readings Quizzes
In this module, we will introduce users to the nuances of memory on a high-performance
computing system. We will also cover some ways to conduct work on a system most
efficiently, and introduce some beginning components of parallel programming.
1- Simple Application Timing 3m
2- Serial vs. Parallel Processing - Part 1 3m
3- Serial vs. Parallel Processing - Part 2 5m
4- Parallel Memory Models 5m
5- Data vs. Task Parallelism 5m
6- High Throughput Computing 4m
Week 3 Quiz 30m
WEEK 4: 4 videos (Total 17 min), 1 reading, 1 practice exercise, Quiz 30m
Evaluating Parallel Program Performance
Videos Readings Quizzes
In this module, we will continue to review topics related to using a high-performance
computing system most efficiently, including scaling your workflow, measuring how
efficient your work on a system is, and utilizing as much of the computing resource as
possible.
1- How to Parallelize Code 6m
2- Speedup and Parallel Efficiency 4m
3- Scalability 4m
4- Limits to Scaling 3m
Summary of This Course 10m
Week 4 Quiz 30m
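The speedup and parallel efficiency measures named in this module follow standard
definitions: speedup is serial time divided by parallel time, and efficiency is
speedup divided by the number of workers. The sketch below also includes Amdahl's
law as one well-known limit to scaling; whether the course presents that particular
law is an assumption, and the function names and timings are hypothetical.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive")
    return serial_time / parallel_time


def parallel_efficiency(serial_time, parallel_time, workers):
    """Efficiency E = S / p; a value of 1.0 means perfect scaling."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(serial_time, parallel_time) / workers


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative timings only: a job that takes 100 s serially and 25 s on 8 cores.
print(speedup(100.0, 25.0))
print(parallel_efficiency(100.0, 25.0, 8))
# With 90% of the work parallelizable, adding cores gives diminishing returns.
for p in (1, 2, 8, 64):
    print(p, round(amdahl_speedup(0.9, p), 2))
```

In a strong-scaling study the problem size stays fixed while the worker count grows,
so these functions would be applied to timings measured at each worker count.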
Ninth Course Managing, Describing, and Analyzing Data
About this Course: In this course, you will learn the basics of understanding the data
you have and why correctly classifying data is the first step to making correct decisions.
You will describe data both graphically and numerically using descriptive statistics and R
software. You will learn four probability distributions commonly used in the analysis of
data. You will analyze data sets using the appropriate probability distribution. Finally, you
will learn the basics of sampling error, sampling distributions, and errors in decision-
making.
Introduction: This course can be taken for academic credit as part of CU Boulder’s
Master of Science in Data Science (MS-DS) degree offered on the Coursera
platform. The MS-DS is an interdisciplinary degree that brings together faculty from
CU Boulder’s departments of Applied Mathematics, Computer Science, Information
Science, and others. With performance-based admissions and no application
process, the MS-DS is ideal for individuals with a broad range of undergraduate
education and/or professional experience in computer science, information science,
mathematics, and statistics.
WHAT YOU WILL LEARN
1. Calculate descriptive statistics and create graphical representations using R
software
2. Explore the basics of sampling and sampling distributions with respect to
statistical inference
3. Solve problems and make decisions using probability distributions
4. Classify types of data with scales of measurement
SKILLS YOU WILL GAIN
1. Analyzing Data
2. Describing Data
3. Using R
4. Graphing Data
WEEK 1: 3 hours to complete, 7 videos (Total 48 min), 1 reading, 2 quizzes
Data and Measurement
Videos Readings Quizzes
Upon completion of this module, students will be able to use R and RStudio to work
with data and classify types of data using measurement scales.
1. Welcome to Managing, Describing and Analyzing Data 1m
2. Types of Data and Measurement Scales 6m
3. Measurement Scales: Nominal and Ordinal 6m
4. Measurement Scales: Interval, Ratio and Absolute 6m
5. Measurement as a Process, The Big 5 Aspects of Data 10m
6. Sampling Concepts 6m
7. Working in RStudio 10m
Attention Learners: R Code / File Resources 10m
Week 1 Practice Assessment 30m
Assessment: Data and Measurement 45m
WEEK 2: 5 hours to complete, 11 videos (Total 85 min)
Describing Data Graphically and Numerically
Videos Readings Quizzes
Upon completion of this module, students will be able to use R and RStudio to create
visual representations of data, and calculate descriptive statistics to describe the
location, spread, and shape of data.
1. Create a Run Chart 9m
2. Frequency Distributions 7m
3. Frequency Polygons and Histograms 7m
4. Histogram Patterns and Density Plots 7m
5. Box and Whisker Plots 7m
6. Measures of Central Tendency: Mean 9m
7. Measures of Central Tendency: Median, Mode 7m
8. Measures of Position 8m
9. Measures of Dispersion 7m
10. Measures of Shape 6m
11. Measures of Relationship 6m
1. Week 2 Practice Assessment 30m
2. Assessment: Describing Data Graphically 1h 15m
3. Assessment: Describing Data Numerically 1h 15m
WEEK 3: 4 hours to complete, 8 videos (Total 70 min)
Probability and Probability Distributions
Videos Readings Quizzes
Upon completion of this module, students will be able to apply the rules and
conditions of probability and probability distributions to make decisions and solve
problems using R and RStudio.
1. Introduction to Probability Part 1 6m
2. Introduction to Probability Part 2 8m
3. Probability Distributions Part 1 5m
4. Probability Distributions Part 2 7m
5. The Binomial Distribution 10m
6. The Poisson Distribution 8m
7. The Normal Distribution 12m
8. The Exponential Distribution 9m
1. Week 3 Practice Assessment 30m
2. Probability and Probability Distributions 2h
WEEK 4: 3 hours to complete, 8 videos (Total 55 min)
Sampling Distributions, Error and Estimation
Videos Readings Quizzes
Upon completion of this module, students will be able to use R and RStudio to
characterize sampling and sampling distributions, error and estimation with respect to
statistical inference.
1. Sampling Error 7m
2. Random Sampling Distributions 8m
3. The Central Limit Theorem 5m
4. Probability with RSDs 7m
5. Estimates and Estimators 6m
6. Confidence Intervals 4m
7. Confidence Intervals for the Mean and Variance 10m
8. Confidence Intervals for Proportions and Poisson Counts 4m
1. Week 4 Practice Assessment 30m
2. Sampling Distributions, Error and Estimation 1h 30m
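As a small illustration of the confidence interval for a mean covered in this module,
the sketch below builds an interval from the sample mean and its standard error. It
is written in Python rather than the course's R, uses a normal critical value (an
assumption; for small samples a t critical value would give a somewhat wider
interval), and the function name mean_confidence_interval and the data are
hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the population mean, normal approximation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least 2 observations")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean + z * std_err


# Illustrative measurements only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = mean_confidence_interval(data, confidence=0.95)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, and a larger sample narrows it
through the sqrt(n) term in the standard error.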