Master of Science in Data Science Syllabus


First Course Algorithms for Searching, Sorting, and Indexing

About this Course: This course covers the basics of algorithm design and analysis, as well as algorithms for sorting arrays, data structures such as priority queues, hash functions, and applications such as Bloom filters.

Duration: 4 weeks

What you will learn

1. Explain fundamental concepts for algorithmic searching and sorting

2. Design basic algorithms to implement sorting, selection, and hash functions in heap data structures

3. Describe heap data structures and analyze heap components, such as arrays and priority queues

Skills

1. Analysis of Algorithms

2. Hash tables

3. Algorithm Design

4. Python Programming

5. Data Structure Design

Syllabus - What you will learn from this course

1. Basics of Algorithms Through Searching and Sorting

2. Heaps and Hashtable Data Structures

3. Randomization: Quicksort, Quickselect, and Hashtables

4. Applications of Hashtables

Week One: 7 videos (Total 202 min), 9 readings, 5 quizzes

Basics of Algorithms Through Searching and Sorting

In this module, the student will learn the very basics of algorithms through three examples: insertion sort (sort an array in ascending or descending order); binary search (determine whether an element is present in a sorted array and, if so, find its index); and merge sort (a faster method for sorting an array). Through these algorithms the student will be introduced to the analysis of algorithms, i.e., proving that an algorithm is correct for the task it has been designed for, and establishing a bound on how the time taken to execute the algorithm grows as a function of the input size. The student is also exposed to the notion of a faster algorithm and to asymptotic complexity through the big-O, big-Omega, and big-Theta notations.

Videos
1- What is an Algorithm? 28m
2- An Introduction Through the Insertion Sort Algorithm 44m
3- Time and Space Complexity 30m
4- Asymptotic Notation 31m
5- Binary Search 22m
6- Merge Sort Algorithm, Analysis and Proof of Correctness 28m
7- Pitfalls and Logarithms 15m

Readings
1- Important Prerequisites 10m
2- Logistics: Textbook and Readings 10m
3- CLRS Chapter 1 10m
4- Overview of Module 1 10m
5- CLRS Chapter 2 10m
6- CLRS Chapter 3 10m
7- Binary Search Lecture Slides 10m
8- Jupyter Notebook on Binary Search 10m
9- Notes on MergeSort 10m

Practice Exercises
1. Insertion Sort and Running Times 30m
2. Asymptotic Notation and Complexity 30m
3. Binary Search 30m
4. Mergesort Algorithm 30m
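To make two of the module's three examples more concrete, here is a minimal Python sketch. The function names `binary_search` and `merge_sort` are our own, not taken from the course materials, and the sketch assumes integer lists and uses -1 to signal an absent target (a convention chosen here). Binary search halves the candidate range on every comparison, giving O(log n) time on a sorted array; merge sort splits the array, sorts each half recursively, and merges the sorted halves in O(n log n) time.

```python
from typing import List, Sequence


def binary_search(arr: Sequence[int], target: int) -> int:
    """Return an index of target in the ascending sequence arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


def merge_sort(arr: Sequence[int]) -> List[int]:
    """Return a new list holding the elements of arr in ascending order."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge step: repeatedly take the smaller front element of the two sorted halves.
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


data = merge_sort([5, 2, 9, 1, 7])
print(data)                    # [1, 2, 5, 7, 9]
print(binary_search(data, 7))  # 3
print(binary_search(data, 4))  # -1
```

Returning a new list keeps the sketch short; the CLRS presentation of merge sort instead sorts a subarray in place, copying the two halves into auxiliary arrays before merging.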

Week Two: 5 videos (Total 120 min), 6 readings, 6 quizzes

Heaps and Hashtable Data Structures

In this module, the student will learn about the basics of data structures that organize data to make certain types of operations faster. The module starts with a broad introduction to data structures and covers some simple data structures such as first-in, first-out queues and last-in, first-out stacks. Next, we introduce the heap data structure and the basic properties of heaps. This is followed by algorithms for insertion, deletion, and finding the minimum element of a heap, along with their time complexities. Finally, we will study the priority queue data structure and showcase some applications.

Videos
1. A Simple Data Structure: The Dynamic Array 20m
2. Heap, Min/Max-Heaps and Properties of Heaps 24m
3. Heap Primitives: Bubble Up/Bubble Down 29m
4. Priority Queues, Heapify, and Heapsort 28m
5. Hashtables – Introduction 17m

Readings
1. Overview of Module 2 10m
2. CLRS Chapter 10, 10.1 (Optional) 10m
3. CLRS Chapter 6.1 and 6.2 10m
4. CLRS Chapter 6.3 10m
5. CLRS Chapter 6.4 and 6.5 10m
6. CLRS Chapter 11.1 and 11.2 10m

Practice Exercises
1. Basics of Data Structures 30m
2. Basics of Heap Data Structures 30m
3. Bubble-Up/Bubble-Down, Insertion and Deletion Operations 30m
4. Heapify, Priority Queues and Heapsort 30m
5. Hashtables 30m
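The bubble-up and bubble-down primitives in this module can be sketched as an array-backed binary min-heap. This is a minimal illustration assuming integer keys; the `MinHeap` class and its method names are hypothetical, of our own design. The children of index i sit at 2i + 1 and 2i + 2, insertion appends and bubbles up, and deleting the minimum moves the last element to the root and bubbles it down, so both operations take O(log n) time while finding the minimum takes O(1).

```python
from typing import List, Optional


class MinHeap:
    """Array-backed binary min-heap: the parent of index i is (i - 1) // 2."""

    def __init__(self) -> None:
        self.items: List[int] = []

    def insert(self, value: int) -> None:
        self.items.append(value)
        self._bubble_up(len(self.items) - 1)

    def find_min(self) -> Optional[int]:
        return self.items[0] if self.items else None

    def delete_min(self) -> int:
        if not self.items:
            raise IndexError("delete_min from empty heap")
        smallest = self.items[0]
        last = self.items.pop()
        if self.items:
            # Move the last element to the root, then restore the heap property.
            self.items[0] = last
            self._bubble_down(0)
        return smallest

    def _bubble_up(self, i: int) -> None:
        # Swap with the parent while the child is smaller than its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.items[i] >= self.items[parent]:
                break
            self.items[i], self.items[parent] = self.items[parent], self.items[i]
            i = parent

    def _bubble_down(self, i: int) -> None:
        # Swap with the smaller child while some child is smaller than the node.
        n = len(self.items)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and self.items[left] < self.items[smallest]:
                smallest = left
            if right < n and self.items[right] < self.items[smallest]:
                smallest = right
            if smallest == i:
                return
            self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
            i = smallest


heap = MinHeap()
for x in [7, 3, 9, 1, 4]:
    heap.insert(x)
print([heap.delete_min() for _ in range(5)])  # [1, 3, 4, 7, 9]
```

For everyday use, Python's standard `heapq` module offers the same push and pop operations on a plain list.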

Week Three: 7 videos (Total 152 min), 6 readings, 6 quizzes

Randomization: Quicksort, Quickselect, and Hashtables

We will go through the quicksort and quickselect algorithms for efficiently sorting an array and selecting its kth smallest element. This will also be an introduction to the role of randomization in algorithm design. Next, we will study hashtables: a highly useful data structure that allows for efficient search and retrieval from large amounts of data. We will learn about the basic principles of hashtables and the operations on them.

Videos
1. Introduction to Randomization + Average Case Analysis + Recurrences 23m
2. Partition and Quicksort Algorithm 13m
3. Detailed Design of Partitioning Schemes 25m
4. Analysis of Quicksort Algorithm 28m
5. Quickselect Algorithm and its Applications 18m
6. Selecting Hash Functions 22m
7. Universal Hash Functions and Analysis 20m

Readings
1. Overview of Module 3 10m
2. CLRS Chapter 7.1 10m
3. CLRS Chapter 7.1 10m
4. CLRS Chapter 7.2 - 7.4 10m
5. CLRS Chapter 9.1, 9.2 10m
6. CLRS Chapter 11.3 10m

Practice Exercises
1. Quicksort and Partition 30m
2. Partition Schemes 30m
3. Analysis of Quicksort 30m
4. Quickselect Algorithm 30m
5. Universal Hash Functions 30m
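Quickselect, as described above, partitions the array once and then continues only on the side that contains the requested rank. The following is a minimal sketch under our own assumptions, not the course's code: it uses the Lomuto partition scheme (the one presented in CLRS Section 7.1), a 1-based rank `k`, and a pivot chosen uniformly at random. Its expected running time is O(n), although an unlucky sequence of pivots can still cost O(n^2).

```python
import random
from typing import List


def partition(a: List[int], lo: int, hi: int) -> int:
    """Lomuto partition of a[lo..hi] around the pivot a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo  # a[lo..i-1] holds elements smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quickselect(values: List[int], k: int) -> int:
    """Return the k-th smallest element of values (k is 1-based) without fully sorting."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    a = list(values)  # work on a copy so the caller's list is untouched
    lo, hi, target = 0, len(a) - 1, k - 1
    while True:
        # Randomized pivot: swap a uniformly chosen element into the pivot slot a[hi].
        r = random.randint(lo, hi)
        a[r], a[hi] = a[hi], a[r]
        p = partition(a, lo, hi)
        if p == target:
            return a[p]
        if p < target:
            lo = p + 1  # the k-th smallest lies to the right of the pivot
        else:
            hi = p - 1  # the k-th smallest lies to the left of the pivot


data = [9, 1, 8, 2, 7, 3]
print(quickselect(data, 2))  # 2
```

Quicksort uses the same `partition` routine but recurses on both sides instead of one, which is why its expected cost is O(n log n) rather than O(n).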

Week Four: 5 videos (Total 113 min), 6 readings, 2 quizzes

Applications of Hashtables

In this module, we will learn randomized pivot selection for quicksort and quickselect, and how to analyze the complexity of the randomized quicksort and quickselect algorithms. We will learn open address hashing: a technique that simplifies hashtable design. Next, we will study the design of hash functions and their analysis. Finally, we present and analyze Bloom filters, which are used in various applications such as querying streaming data and counting.

Videos
1. Open Address Hashing 18m
2. Perfect Hashing and Cuckoo Hashing 33m
3. Bloom Filters and Analysis 14m
4. Count-Min Sketching Using Hashing 31m
5. String Matching Using Hashing 16m

Readings
1. Overview of Module 4 10m
2. CLRS 11.4 10m
3. CLRS Chapter 11.5 (Perfect Hashing) and Slides with Scribbles 10m
4. Bloom Filter: Slides 10m
5. Count-Min Sketches Slides 10m
6. Slides with Scribbles 10m

Practice Exercises
Open Address Hashing 30m
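A Bloom filter stores a set as a bit array and answers membership queries with no false negatives but occasional false positives. The sketch below is a minimal illustration assuming string items; the class name, the sizes, and the trick of deriving the k hash functions by salting SHA-256 with the hash index are our own choices, not the hash family analyzed in the course.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit-array Bloom filter that sets num_hashes positions per inserted item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by hashing "index:item" with SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.might_contain("banana"))  # True
print(bf.might_contain("durian"))  # usually False; a false positive is possible
```

With m bits, k hash functions, and n inserted items, the false-positive probability is approximately (1 - e^(-kn/m))^k, which is why m and k are normally chosen from the expected number of items.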

Second Course Trees and Graphs: Basics

About this Course: This course covers basic algorithms on tree data structures, binary search trees, self-balancing trees, graph data structures, and basic traversal algorithms on graphs. It also covers advanced topics such as kd-trees and other algorithms for spatial data.

Introduction: Trees and Graphs: Basics can be taken for academic credit as part of CU

Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera

platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU

Boulder’s departments of Applied Mathematics, Computer Science, Information Science,

and others. With performance-based admissions and no application process, the MS-DS is

ideal for individuals with a broad range of undergraduate education and/or professional

experience in computer science, information science, mathematics, and statistics.

WHAT YOU WILL LEARN

1. Define basic tree data structures and identify algorithmic functions associated with them

2. Execute traversals and create graphs within a binary search tree structure

3. Describe strongly connected components in graphs

SKILLS YOU WILL GAIN

1. Analysis of Algorithms

2. Algorithm Design

3. Python Programming

4. Data Structure Design

5. Graph Algorithms

WEEK 1: 5 videos (Total 147 min), 8 readings, 6 quizzes

Binary Search Trees and Algorithms on Trees

In this module, you will learn about binary search trees and basic algorithms on binary search trees. We will also become familiar with the problem of balancing in binary search trees and study some solutions for balanced binary search trees such as Red-Black Trees.

Videos
1- Binary Search Trees -- Introduction and Properties 22m
2- Binary Search Trees -- Insertion and Deletion 31m
3- Red-Black Trees Basics 33m
4- Red-Black Trees -- Rotations/Algorithms for Insertion (and Deletion) 29m
5- Skip Lists 30m

Readings
1- Important Prerequisites 10m
2- Logistics: Textbook and Readings 10m
3- Overview of Module 1 10m
4- Reading CLRS Chapter 12 10m
5- CLRS Chapter 12.1-12.3 10m
6- CLRS Chapter 13 - 13.1 10m
7- CLRS Chapter 13.2 - 13.3 10m
8- Skip 10m

Quizzes
1- Basics of Binary Search Trees 30m
2- Binary Search Tree: Insert and Delete 30m
3- Red-Black Tree Basics 30m
4- Tree Rotations 30m
5- Skip Lists 30m
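The basic binary search tree operations in this module can be sketched in a few lines of Python. This is a minimal illustration assuming integer keys without duplicates (equal keys are simply ignored); `Node`, `insert`, `search`, and `in_order` are hypothetical names. Because nothing rebalances the tree, inserting keys in sorted order degrades it into a chain of height O(n), which is the problem that Red-Black Trees address.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Equal keys fall through and are ignored in this sketch.
    return root


def search(root: Optional[Node], key: int) -> bool:
    """Walk down from the root, going left or right by comparison with each key."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def in_order(root: Optional[Node]) -> Iterator[int]:
    """Yield keys in ascending order: left subtree, node, right subtree."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


root: Optional[Node] = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(list(in_order(root)))                # [20, 30, 40, 50, 60, 70, 80]
print(search(root, 60), search(root, 65))  # True False
```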


Basics of Graphs and Graph Traversals

In this module, you will learn about graphs and various basic algorithms on graphs such as depth-first/breadth-first traversals, finding strongly connected components, and topological sorting.

Videos
1- Graphs and Their Representations 14m
2- Graph Traversals and Breadth First Traversal 17m
3- Depth First Search 33m
4- Topological Sorting and Applications 11m
5- Strongly Connected Components - Definitions 15m
6- Strongly Connected Components - Properties 16m
7- Strongly Connected Components - Algorithm 16m

Readings
1- Overview of Module 2 10m
2- CLRS Chapter 22 (Section 22.1) 10m
3- CLRS Chapter 22 (Section 22.2) 10m
4- CLRS Chapter 22 (Section 22.3) 10m
5- CLRS Chapter 22 (Section 22.4) 10m
6- CLRS Chapter 22 (Section 22.5) 10m

Quizzes
1- Graph Representations 30m
2- Combined Quiz on Graph Traversals 30m
3- Topological Sort Graphs 30m
4- Strongly Connected Components 30m
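Two of the graph algorithms listed for this module can be sketched over an adjacency-list dictionary. This is an illustration under our own assumptions: `bfs_distances` computes breadth-first (edge-count) distances from a source, and `topological_sort` uses Kahn's in-degree algorithm, a swapped-in alternative to the depth-first-search-based method presented in CLRS Section 22.4. Both run in O(V + E) time; the function names and the example graph are made up for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # adjacency list: vertex -> out-neighbors


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Number of edges on a shortest path from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:  # first discovery gives the shortest edge count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_sort(graph: Graph) -> List[Hashable]:
    """Kahn's algorithm: repeatedly output a vertex with no remaining incoming edges."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    ready = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so no topological order exists")
    return order


dag = {"shirt": ["tie", "belt"], "tie": ["jacket"], "pants": ["shoes", "belt"],
       "belt": ["jacket"], "socks": ["shoes"], "shoes": [], "jacket": []}
print(bfs_distances(dag, "shirt"))  # {'shirt': 0, 'tie': 1, 'belt': 1, 'jacket': 2}
print(topological_sort(dag))
```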


Union-Find Data Structures and Spanning Tree Algorithms

Union-Find data structure with rank compression. Spanning trees and properties of spanning trees. Prim’s algorithm for finding minimal spanning trees. Kruskal’s algorithm for finding minimal spanning trees.

Videos
1- Amortized Analysis of Data Structures 27m
2- Amortized Analysis: Potential Functions 26m
3- Spanning Trees and Minimal Spanning Trees with Applications 26m
4- Kruskal’s Algorithm for Finding Minimal Spanning Trees 8m
5- Union-Find Data Structures and Rank Compression 38m

Readings
1- Overview of Module 3 10m
2- CLRS Chapter 17 10m
3- CLRS Chapter 23 (Section 23.1) 10m
4- CLRS Chapter 23 (Section 23.2) 10m
5- CLRS Chapter 21 10m

Quizzes
1- Amortized Analysis 30m
2- Minimum Spanning Tree 30m
3- Kruskal's Algorithm 30m
4- Disjoint Set Forest 30m
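Kruskal's algorithm and the union-find structure fit together naturally: edges are scanned in order of increasing weight, and an edge is kept only if its endpoints are still in different sets. The sketch below is a minimal illustration assuming vertices numbered 0 to n - 1 and edges given as `(u, v, weight)` tuples; the class and function names are hypothetical. It implements union by rank together with path compression, which is our reading of "rank compression" in the video title.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]


class UnionFind:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets containing a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the lower-rank root under the higher-rank root
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def kruskal(n: int, edges: List[Edge]) -> Tuple[float, List[Edge]]:
    """Minimum spanning tree (or forest) of vertices 0..n-1; returns (total weight, edges)."""
    uf = UnionFind(n)
    tree: List[Edge] = []
    total = 0.0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if uf.union(u, v):  # the edge joins two different components, so keep it
            tree.append((u, v, w))
            total += w
    return total, tree


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
total, tree = kruskal(4, edges)
print(total, tree)  # 8.0 [(0, 2, 1), (1, 2, 2), (1, 3, 5)]
```

Sorting the edges dominates the cost at O(E log E); with both heuristics, the union-find operations add only a nearly constant amortized cost per edge.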


Shortest Path Algorithms

In this module, you will learn about the basics of the shortest path problem; the Bellman-Ford algorithm for single-source shortest paths; Dijkstra’s algorithm; and algorithms for the all-pairs shortest path problem (the Floyd-Warshall algorithm).

Videos
1- Shortest Path Problems and Their Properties 29m
2- Bellman-Ford Algorithm for Single Source Shortest Paths 45m
3- Shortest Path on DAGs 11m
4- Dijkstra’s Algorithm for Single Source Shortest Paths with Nonnegative Edge Weights 20m
5- Proof of Dijkstra's Algorithm 12m
6- All Pairs Shortest Path Problems and Floyd-Warshall’s Algorithm 34m

Readings
1- Overview of Module 4 10m
2- CLRS Chapter 24 (up to section 24.1) 10m
3- CLRS Chapter 24 (Section 24.1) 10m
4- CLRS Chapter 24 (Section 24.2) 10m
5- CLRS Chapter 24 (Sections 24.3 and 24.5) 10m
6- CLRS Chapter 25 (Sections 25.1 and 25.2) 10m

Quizzes
1- Shortest Path Problems Properties 30m
2- Shortest Path - Bellman Ford Algorithm 30m
3- Dijkstra's Algorithm 30m
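Dijkstra's algorithm for graphs with nonnegative edge weights can be sketched with a binary heap as the priority queue. This is a minimal illustration assuming a dictionary-of-lists graph representation and made-up example data; `dijkstra` is a hypothetical name. Instead of a decrease-key operation, it pushes duplicate heap entries and skips stale ones when they are popped, a common simplification that runs in O((V + E) log V) time.

```python
import heapq
from typing import Dict, List, Tuple

WeightedGraph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, weight)]


def dijkstra(graph: WeightedGraph, source: str) -> Dict[str, float]:
    """Shortest path lengths from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry: u was already settled with a smaller distance
        settled.add(u)
        for v, w in graph.get(u, []):
            if w < 0:
                raise ValueError("negative edge weight found; Bellman-Ford handles these")
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist


roads = {"a": [("b", 7), ("c", 9), ("f", 14)], "b": [("c", 10), ("d", 15)],
         "c": [("d", 11), ("f", 2)], "d": [("e", 6)], "f": [("e", 9)]}
print(dijkstra(roads, "a"))
# {'a': 0.0, 'b': 7.0, 'c': 9.0, 'f': 11.0, 'd': 20.0, 'e': 20.0}
```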

Third Course Data Science as a Field

About this Course: This course provides a general introduction to the field of Data

Science. It has been designed for aspiring data scientists, content experts who work

with data scientists, or anyone interested in learning about what Data Science is and

what it’s used for. Weekly topics include an overview of the skills needed to be a data

scientist; the process and pitfalls involved in data science; and the practice of data

science in the professional and academic world. This course is part of CU Boulder’s

Master of Science in Data Science and was collaboratively designed by both

academics and industry professionals to provide learners with an insider’s perspective

on this exciting, evolving, and increasingly vital discipline.

Introduction: Data Science as a Field can be taken for academic credit as part of CU

Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera

platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU

Boulder’s departments of Applied Mathematics, Computer Science, Information Science,

and others. With performance-based admissions and no application process, the MS-DS is

ideal for individuals with a broad range of undergraduate education and/or professional

experience in computer science, information science, mathematics, and statistics.


WHAT YOU WILL LEARN

By taking this course, you will be able to explain what data science is and identify the key disciplines involved. You will be able to use the steps of the data science process to create a reproducible data analysis and identify personal biases. You will be able to identify interesting data science applications, locate jobs in Data Science, and begin developing a professional network.


SKILLS YOU WILL GAIN

1. Data Science

2. Applied Mathematics

3. Information Science

4. Statistics

5. Computer Science

WEEK 1: (1 hour to complete) 4 videos (total 15 min)

Introduction to Data Science: the Past, Present, and Future of a New Discipline

This week we will talk about the past, present, and future of data science. The growth of data science has been fueled by the growth of the internet, social media, and online shopping, as well as by rapid increases in data storage capabilities. You will watch several short videos and participate in discussions about the future of data science.

Videos
1- Data Science as a Field Course Introduction 2m
2- Where Does Data Science Come From? 2m
3- The Current State of the Field 7m
4- Where is Data Science Going? 2m

WEEK 2: (4 hours to complete) 8 videos (total 97 min), 7 readings, 1 quiz

Data Science in Industry, Government, and Academia



Videos
1- Introduction to "Data Science in Business, Industry, and the Professional World" 1m
2- Brian Brown & Rinaldo Madera 16m
3- Natalie Jackson 11m
4- Villa Hulden 16m
5- Robin Burke 9m
6- Seth Spielman 16m
7- Katharina Kann 15m
8- Dan Larremore 10m

Readings
1- Introducing Brian Brown and Rinaldo Maldera 10m
2- Introducing Natalie Jackson 10m
3- Introducing Villa Holden 10m
4- Introducing Robin Burke 10m
5- Introducing Seth Spielman 10m
6- Introducing Katharina Kann 10m
7- Introducing Dan Larremore 10m

1 quiz

WEEK 3: (4 hours to complete) 11 videos (total 64 min), 9 readings, 2 quizzes

Data Science Process and Pitfalls

This week you will learn about the importance of reproducibility and how to achieve it, learn the steps in a data analysis process, and learn about the possible pitfalls in data science. You will watch videos demonstrating the various steps in the data science process and try out these processes for yourself on a different dataset.

Videos
1- Importance and Process of Reproducibility 4m
2- Knit to PDF 3m
3- Intro to R Markdown 8m
4- Overview of Steps in the Data Science Process 2m
5- Importing Data 6m
6- Tidying and Transforming Data 8m
7- Visualizing Data 6m
8- Analyzing Data 7m
9- Modeling Data 5m
10- Bias Sources 4m
11- Intro to Data Ethics Course with Bobby Schnabel 5m

Readings
1- Before You Watch the Next Video... 10m
2- Knit the Template 10m
3- Use R Markdown to Create a Document 10m
4- For More Info On Tidyverse Packages... 10m
5- Project Files 10m
6- Project Step 1: Start an Rmd Document 10m
7- Project Step 2: Tidy and Transform Your Data 10m
8- Project Step 3: Add Visualizations and Analysis 10m
9- Project Step 4: Add Bias Identification 10m

Quizzes
File Unlocking Quiz 1m


https://www.coursera.org/lecture/data-science-as-a-field/importance-and-process-of-reproducibility-Mv5N3



Communicating Your Results

This week you will learn about important ways of communicating your results. We will discuss the important things to know about presentations and reports. You will also learn about the importance of networking and try it out.

Videos
1- Do’s and Don’ts for Good Reports and Presentations 4m
2- CU Boulder’s MS in Data Science: Where to Go from Here? 3m

Readings
- Imposter Syndrome 10m

1 quiz

Fourth Course Cybersecurity for Data Science

About this Course: This course aims to help anyone interested in data science

understand the cybersecurity risks and the tools/techniques that can be used to

mitigate those risks. We will cover the distinctions between confidentiality, integrity,

and availability, introduce learners to relevant cybersecurity tools and techniques

including cryptographic tools, software resources, and policies that will be essential

to data science. We will explore key tools and techniques for authentication and

access control so producers, curators, and users of data can help ensure the security

and privacy of the data.

Introduction: This course can be taken for academic credit as part of CU Boulder’s

Master of Science in Data Science (MS-DS) degree offered on the Coursera platform.

The MS-DS is an interdisciplinary degree that brings together faculty from CU

Boulder’s departments of Applied Mathematics, Computer Science, Information

Science, and others. With performance-based admissions and no application process,

the MS-DS is ideal for individuals with a broad range of undergraduate education

and/or professional experience in computer science, information science,

mathematics, and statistics.


WHAT YOU WILL LEARN

1. Characterize the CIA principles and use them to classify a variety of cyber scenarios.

2. Identify and disseminate vulnerabilities in the data security space: social (human) and technical (digital).

3. Distinguish ethical boundaries of hacking and its applications.

4. Explore professional cybersecurity networks and connect with experts from the field.


SKILLS YOU WILL GAIN

1. Communication

2. Risk Analysis

3. Problem Solving

WEEK 1: 4 videos (Total 20 min), 4 readings, 1 quiz

Basic Cybersecurity Concepts and Principles

In this module, you will learn the basics of cybersecurity and the CIA triad.

Videos
1- Introduction 1m
2- Introduce the CIA Triad and Cybersecurity Basics 5m
3- LinkedIn and Twitter for Professionals 6m
4- Join LinkedIn Cybersecurity Course Group and Post 6m

Readings
1- The CIA Triad 10m
2- Create Your LinkedIn and Twitter Accounts 10m
3- Networking with LinkedIn and Twitter 10m
4- Assignment: Join Cybersecurity Course Group on LinkedIn and Submit Posts 10m

Quiz
Basic Cybersecurity Concepts and Principles 30m


Your Cyber Story and Your Public Data Profile

In this module, you will explore your Cyber Story and examine your public data profile.

Videos
1- Digital Reputation and Cyberstory, Google Yourself 10m
2- Passwords and Cybersecurity 16m
3- Basic Cryptography and Encryption 11m

Readings
1- Your Online Reputation 10m
2- Exercise: What is Your Cyber Story? 10m
3- Exercise: Ungoogle Yourself and Set up Google Alerts 10m
4- Passwords 20m
5- Cryptography 1h 10m

Quiz
Your Cyber Story and Your Public Data Profile 30m


Wi-Fi, IoT, Hacking, Data Breaches and Social Engineering

This module explores the world of hacking, IoT, and social engineering.

Videos
1- Hacking: White, Grey and Black Hackers 10m
2- IoT 9m
3- Social Engineering 9m

Readings
1- Hackers 10m
2- Internet of Things 10m
3- Social Engineering 10m
4- LinkedIn Discussion: Hacking or IoT and Cybersecurity 10m

Quiz
Wi-Fi, IoT, Hacking, Data Breaches and Social Engineering 30m


The Ethics of Cyber Security

In this session, students will leverage social media to connect with cybersecurity experts and explore the ethics around cybersecurity and data.

Videos
1- Facial Recognition and 'Big Brother' 12m
2- Getting Yourself Out There on LinkedIn and Twitter 8m

Readings
1- Big Brother and Surveillance 7h 43m
2- Optional LinkedIn Discussion 10m
3- Cybersecurity Experts on Twitter 10m
4- Explore and Network 10m

Quiz
The Ethics of Cyber Security 30m

Fifth Course Ethical Issues in Data Science (core course)

About this Course: The applications of computing that involve large amounts of

data - the field of data science - affect the lives of most people in the United States

and the world. These impacts include recommendations made to us by Internet-based

systems, information about us online, technologies used for security and monitoring,

data used in healthcare, and much more. In many cases, they are affected by artificial intelligence and machine learning techniques.


Introduction: This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics.




WEEK 1: 2 hours to complete, 4 videos, 4 readings, 1 practice exercise

What are Ethics?

Module 1 of this course establishes a basic foundation in the notion of simple utilitarian ethics that we use for this course. The lecture material and the quiz questions are designed to get most people to come to an agreement about right and wrong, using the utilitarian framework taught here. If you bring your own moral sense to bear, or think hard about possible counter-arguments, it is likely that you can arrive at a different conclusion. But that discussion is not what this course is about. So, resist that temptation, so that we can jointly lay a common foundation for the rest of this course.

Videos
1- What are Ethics? 9m
2- Data Science Needs Ethics 3m
3- Case Study: Spam (not the meat) 4m

Readings
1- Course Syllabus 10m
2- Welcome Announcement 10m
3- Help us learn more about you! 10m
4- What are Ethics? - Introduction 10m

Quiz
Module 1 Quiz 30m

History, Concept of Informed Consent

Early experiments on human subjects were driven by scientists’ intent on advancing medicine, to the benefit of all humanity, with disregard for the welfare of individual human subjects. Often these experiments were performed by white scientists on black subjects. In this module, we will talk about the laws that govern the Principle of Informed Consent. We will also discuss why informed consent doesn’t work well for retrospective studies, or for the customers of electronic businesses.

Videos
1- Human Subjects Research and Informed Consent: Part 2 8m
2- Limitations of Informed Consent 9m
3- Case Study: It's Not Occupied 6m

Quiz
Module 2 Quiz 30m

Data Ownership

Who owns data about you? We'll explore that question in this module. A few examples of personal data include copyrights for biographies; ownership of photos posted online, Yelp, Trip Advisor, public data capture, and data sale. We'll also explore the limits on recording and use of data.

Videos
1- Limits on Recording and Use 7m
2- Data Ownership Finale 3m
3- Case Study: Rate My Professor 3m
4- Case Study: Privacy After Bankruptcy 2m

Quiz
Module 3 Quiz 30m


Privacy

Privacy is a basic human need. Privacy means the ability to control information about yourself, not necessarily the ability to hide things. We have seen the rise of different value systems with regard to privacy. Kids today are more likely to share personal information on social media, for example. So while values are changing, this doesn’t remove the fundamental need to be able to control personal information. In this module, we'll examine the relationship between the services we are provided and the data we provide in exchange: for example, the location of a cell phone. We'll also compare and contrast "data" against "metadata".

Videos
1- History of Privacy 15m
2- Degrees of Privacy 10m
3- Modern Privacy Risks 12m
4- Case Study: Targeted Ads 3m
5- Case Study: The Naked Mile 2m
6- Case Study: Sneaky Mobile Apps 5m

Readings
Privacy - Introduction 10m
Module 4 Discussion Prompt References 10m

Quiz
Module 4 Quiz 30m

Anonymity

Certain transactions can be performed anonymously. But many cannot, including those that involve physical delivery of a product. Two examples related to anonymous transactions that we'll look at are "block chains" and "bitcoin". We'll also look at some of the drawbacks that come with anonymity.

Videos
1- Anonymity 5m
2- De-identification Has Limited Value: Part 1 7m
3- De-identification Has Limited Value: Part 2 10m
4- Case Study: Credit Card Statements 2m

Quiz
Module 5 Quiz 30m


Data Validity

Data validity is not a new concern. All too often, we see the inappropriate use of Data Science methods leading to erroneous conclusions. This module points out common errors, in language suited for a student with limited exposure to statistics. We'll focus on the notion of a representative sample: opinionated customers, for example, are not necessarily representative of all customers.

Videos
1- Validity 9m
2- Choice of Attributes and Measures 6m
3- Errors in Data Processing 8m
4- Errors in Model Design 8m
5- Managing Change 5m
6- Case Study: Three Blind Mice 4m
7- Case Study: Algorithms and Race 3m
8- Case Study: Algorithms in the Office 3m
9- Case Study: Germanwings Crash 5m
10- Case Study: Google Flu 5m

Readings
Data Validity - Introduction 10m

Quiz
Module 6 Quiz 30m

Algorithmic Fairness

What could be fairer than a data-driven analysis? Surely the dumb computer cannot harbor prejudice or stereotypes. While the analysis technique may indeed be completely neutral, given the assumptions, the model, the training data, and so forth, all of these boundary conditions are set by humans, who may reflect their biases in the analysis result, possibly without even intending to do so. Only recently have people begun to think about how algorithmic decisions can be unfair. Consider this article, published in the New York Times. This module discusses this cutting-edge issue.

Videos
1- Algorithmic Fairness 10m
2- Correct but Misleading Results 12m
3- P-Hacking 10m
4- Case Study: High Throughput Biology 3m
5- Case Study: Geopricing 2m
6- Case Study: Your Safety Is My Lost Income 10m

Readings
Algorithmic Fairness - Introduction 10m

Quiz
Module 7 Quiz 30m


Societal Consequences

In Module 8, we consider societal consequences of Data Science that we should be concerned about even if there are no issues with fairness, validity, anonymity, privacy, ownership, or human subjects research. These “systemic” concerns are often the hardest to address, yet just as important as the other issues discussed before. For example, we consider ossification, or the tendency of algorithmic methods to learn and codify the current state of the world and thereby make it harder to change. Information asymmetry has long been exploited for the advantage of some, to the disadvantage of others. Information technology makes the spread of information easier, and hence generally decreases asymmetry. However, Big Data sets and sophisticated analyses increase asymmetry in favor of those with the ability to acquire or access them.

Videos
1- Societal Impact 16m
2- Ossification 7m
3- Surveillance 4m
4- Case Study: Social Credit Scores 7m
5- Case Study: Predictive Policing 8m

Readings
Societal Consequences - Introduction 10m

Quiz
Module 8 Quiz 30m

Code of Ethics

Finally, in Module 9, we tie all the issues we have considered together into a simple, two-point code of ethics for the practitioner.

Videos
Wrap Up 2m

Readings
Post Course Survey 10m

Quiz
Module 9 Quiz 30m

Attributions

This module contains lists of attributions for the external audiovisual resources used throughout the course.

Readings
Week 1 Attributions 10m

Sixth Course Data Mining Pipeline

About this Course: This course introduces the key steps involved in the data mining

pipeline, including data understanding, data preprocessing, data warehousing, data

modeling, interpretation and evaluation, and real-world applications.

Introduction: Data Mining Pipeline can be taken for academic credit as part of CU

Boulder’s Master of Science in Data Science (MS-DS) degree offered on the

Coursera platform. The MS-DS is an interdisciplinary degree that brings together

faculty from CU Boulder’s departments of Applied Mathematics, Computer

Science, Information Science, and others. With performance-based admissions and

no application process, the MS-DS is ideal for individuals with a broad range of

undergraduate education and/or professional experience in computer science,

information science, mathematics, and statistics.


WHAT YOU WILL LEARN

1. By the end of this course, you will be able to identify the key components of the data mining pipeline and describe how they're related.

2. You will be able to identify particular challenges presented by each component of the data mining pipeline.

3. You will be able to apply techniques to address challenges in each component of the data mining pipeline.

SKILLS YOU WILL GAIN

1. Data Pre-Processing

2. Data Warehousing

3. Data Understanding

4. Data Mining Pipeline

WEEK 1: 2 videos (Total 88 min), 1 reading, 2 quizzes

Data Mining Pipeline

This module provides an introduction to data mining and the data mining pipeline, including the four views of data mining and the key components in the data mining pipeline.

Videos
1- Introduction to Data Mining 41m
2- Introduction to Data Mining Pipeline 46m

Readings
Course Information 10m

WEEK 2: 2 videos (Total 71 min)

Data Understanding

This module covers data understanding by identifying key data properties and applying techniques to characterize different datasets.

Videos
1- Objects & Attributes, Statistics, Visualization 30m
2- Data Similarity 39m
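The data similarity topic in this module can be illustrated with two widely used measures. This is a minimal sketch under our own assumptions, since the syllabus does not say which measures the course covers: Jaccard similarity for sets of items (such as market baskets) and cosine similarity for numeric attribute vectors. The function names are hypothetical, and treating two empty sets as identical (similarity 1.0) is a convention chosen here.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union of two item sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0  # a zero vector is treated as dissimilar


print(jaccard({"milk", "bread", "eggs"}, {"milk", "bread", "jam"}))  # 0.5
print(cosine([1, 0, 1], [1, 1, 0]))  # approximately 0.5
```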


Data Preprocessing

This module explains why data preprocessing is needed and what techniques can be used to preprocess data.

Videos
1- Data Cleaning, Data Integration 33m
2- Data Transformation, Data Reduction 43m
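Data transformation often starts with normalization, so that attributes measured on very different scales can be compared. The sketch below shows two common choices, min-max scaling and z-score standardization. It is an illustration with hypothetical function names and made-up data, not code from the course, and mapping a constant column to a fixed value is a convention of our own.

```python
import statistics
from typing import List


def min_max_scale(values: List[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant column: nothing to spread out
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values: List[float]) -> List[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


incomes = [20_000, 35_000, 50_000, 95_000]
print(min_max_scale(incomes))  # [0.0, 0.2, 0.4, 1.0]
print(z_score(incomes))
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers (here the 95,000 value compresses the others toward 0), whereas z-scores express each value in standard deviations from the mean.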


Data Warehousing

This module covers the key characteristics of data warehousing and the techniques to support data warehousing.

Videos
1- Data Warehouse, Data Cube and OLAP 25m
2- Data Cube Computation, Data Warehouse Architecture 28m

Seventh Course Statistical Modeling for Data Science

About this Course: Statistical modeling lies at the heart of data science. Well-crafted statistical models allow data scientists to draw conclusions about the world from the limited information present in their data. In this three-credit sequence, learners will add some intermediate and advanced statistical modeling techniques to their data science toolkit. In particular, learners will become proficient in the theory and application of linear regression analysis; ANOVA and experimental design; and generalized linear and additive models. Emphasis will be placed on analyzing real data using the R programming language.

Introduction: This specialization can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics.


Will learn

1. Correctly analyze and apply tools of regression analysis to model relationships between variables and make predictions given a set of input variables.
2. Successfully conduct experiments based on best practices in experimental design.
3. Use advanced statistical modeling techniques, such as generalized linear and additive models, to model a wide range of real-world relationships.


Skills

1. Linear Model
2. R Programming
3. Statistical Model
4. Regression
5. Calculus
6. Probability Theory
7. Linear Algebra

1- Modern Regression Analysis in R

About this Course: This course will provide a set of foundational statistical modeling tools for data science. In particular, students will be introduced to methods, theory, and applications of linear statistical models, covering the topics of parameter estimation, residual diagnostics, goodness of fit, and various strategies for variable selection and model comparison. Attention will also be given to the misuse of statistical models and the ethical implications of such misuse.


Skills

1. Linear Model
2. R Programming
3. Statistical Model
4. Regression


Introduction to Statistical Models

In this module, we will introduce the basic conceptual framework for statistical modeling in general, and for linear regression models in particular.

1- Frameworks and Goals of Statistical Modeling 14m
2- The Assumption of Concept Validity 7m
3- The Linear Regression Model 11m
4- Matrix Representation of the Linear Regression Model 15m
5- Assumptions of Linear Regression 9m
6- The Appropriateness of Linear Regression 11m
7- Interpreting the Linear Regression Model I 7m
8- Interpreting the Linear Regression Model II 5m

Introduction to Statistical Modeling 30m
The Linear Regression Model 30m


Linear Regression Parameter Estimation

In this module, we will learn how to fit linear regression models with least squares. We will also study the properties of least squares, and describe some goodness of fit metrics for linear regression models.

1- Introduction to Least Squares 12m
2- Linear Algebra for Least Squares 9m
3- Deriving the Least Squares Solution 20m
4- Regression Modeling in R: a First Pass 19m
5- Justifying Least Squares: the Gauss-Markov Theorem and Maximum Likelihood Estimation 13m
6- Sums of Squares and Estimating the Error Variance 19m
7- The Coefficient of Determination 9m
8- The Problem of Non-identifiability 6m
9- Regression Modeling in R: a Second Pass 22m

1- Least Squares 30m
2- Variability and Identifiability in Regression Models 30m
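The following is a rough illustration of the least squares fit, the coefficient of determination, and the error-variance estimate named above. It is a minimal sketch for the single-predictor case y = b0 + b1*x, written in Python with only the standard library. The function name `fit_simple_ols` is hypothetical. The course itself works in R, and the multiple-regression case uses the matrix form of the same normal equations:

```python
def fit_simple_ols(x, y):
    """Least squares fit of y = b0 + b1*x.

    Returns (b0, b1, r_squared, sigma2_hat).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation; the slope is not identifiable")
    b1 = sxy / sxx              # least squares slope
    b0 = y_bar - b1 * x_bar     # least squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    if tss == 0:
        raise ValueError("y is constant; R^2 is undefined")
    r_squared = 1 - rss / tss
    sigma2_hat = rss / (n - 2)  # unbiased error-variance estimate (2 parameters)
    return b0, b1, r_squared, sigma2_hat

b0, b1, r2, s2 = fit_simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(b0, 2), round(b1, 2))  # prints 0.05 1.99
```

For comparison, in R `summary(lm(y ~ x))` reports the same coefficients, the R² (as "Multiple R-squared"), and the square root of `sigma2_hat` as the residual standard error. The `sxx == 0` check is the one-predictor version of the non-identifiability problem: with no variation in x, no unique slope exists.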


Inference in Linear Regression

In this module, we will study the uses of linear regression modeling for justifying inferences from samples to populations.

1- Motivating Statistical Inference in the Linear Regression Context 9m
2- The Sampling Distribution of the Least Squares Estimator 23m
3- T-Tests for Individual Regression Parameters 14m
4- T-Tests in R 20m
5- Motivating the F-Test: Multiple Statistical Comparisons 8m
6- The F-Test 22m
7- The F-Test in R 10m
8- Confidence Intervals in the Regression Context 11m

Ethics in Statistical Practice and Communication: Five Recommendations 30m

1- Statistical Inference: Intro and T-Tests 30m
2- Statistical Inference: the F-tests and Confidence Intervals 30m


Prediction and Explanation in Linear Regression Analysis

In this module, we will identify how models can predict future values, as well as construct interval estimates for those values. We will also explore the relationship between statistical modelling and causal explanations.

1- Differentiating Prediction and Explanation 12m
2- Point Estimates for Prediction 10m
3- Interval Estimates for Prediction 9m
4- Making Predictions Using Real Data in R 19m
5- When Prediction Goes Wrong 7m
6- Defining Causality 22m

Prediction 30m

2- ANOVA and Experimental Design

About this Course: This second course in statistical modeling will introduce students to the study of the analysis of variance (ANOVA), analysis of covariance (ANCOVA), and experimental design. ANOVA and ANCOVA, presented as a type of linear regression model, will provide the mathematical basis for designing experiments for data science applications. Emphasis will be placed on important design-related concepts, such as randomization, blocking, factorial design, and causality. Some attention will also be given to ethical issues raised in experimentation.


1. Calculus
2. Probability Theory
3. Linear Algebra


Introduction to ANOVA and Experimental Design

In this module, we will introduce the basic conceptual framework for experimental design and define the models that will allow us to answer meaningful questions about the differences between group means with respect to a continuous variable. Such models include the one-way Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA) models.

1- Introduction to Experimental Design 10m
2- The One-Way ANOVA and ANCOVA Models 6m
3- ANOVA Variance Decomposition 8m
4- ANOVA Sums of Squares and the F-test 14m
5- ANOVA and ANCOVA as Regression Models 10m
6- One-Way ANOVA Interpretation in the Regression Context 10m
7- The ANCOVA Model 15m
8- ANCOVA with Interactions 7m
9- ANCOVA with Interactions in R 4m

1- Introduction to ANOVA and Experimental Design 30m
2- The One-Way ANOVA and ANCOVA Models 30m
3- ANOVA Variance Decomposition 30m
4- ANOVA Sums of Squares and the F-Test 30m
5- ANOVA and ANCOVA as Regression Models 30m
6- One-Way ANOVA Interpretation in the Regression Context 30m
7- The ANCOVA Model 30m
8- ANCOVA with Interactions 30m
9- ANCOVA with Interactions in R 30m
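The variance decomposition and F-test listed above can be illustrated with a short sketch. The following Python function is a hypothetical helper that uses only the standard library. It splits the total sum of squares into between-group and within-group parts and forms F = MS_between / MS_within on (k - 1, n - k) degrees of freedom. It does not compute a p-value, which would need an F-distribution routine such as R's `pf()`:

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a one-way layout.

    `groups` is a list of lists, one list of observations per treatment group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group (treatment) SS: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) SS: deviations of observations from their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return ss_between, ss_within, ms_between / ms_within

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # toy data, three treatments
print([round(v, 6) for v in one_way_anova(groups)])  # prints [38.0, 6.0, 19.0]
```

In R, `summary(aov(y ~ group))` or `anova(lm(y ~ group))` reports the same sums of squares together with the p-value. The second form reflects the module's view of ANOVA as a regression model.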


Hypothesis Testing in the ANOVA Context

In this module, we will learn how statistical hypothesis testing and confidence intervals, in the ANOVA/ANCOVA context, can help answer meaningful questions about the differences between group means with respect to a continuous variable.

1. Beyond the Full F-test 12m
2. Planned Comparisons: Defining Contrasts 16m
3. Planned Comparisons: Hypothesis Testing with Contrasts 14m
4. Post Hoc Comparisons 13m
5. Post Hoc Comparisons in R 16m
6. Type II Error and Power in the ANOVA Context 18m

1. Patrizio E. Tressoldi and David Giofré: "The pervasive avoidance of prospective statistical power: major consequences and practical solutions" 10m
2. Optional: Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors 10m

1- Beyond the Full F-test 30m
2- Planned Comparisons: Defining Contrasts 30m
3- Planned and Unplanned Comparisons 30m
4- Type II Error and Power in the ANOVA Context 30m
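A planned comparison is usually expressed as a contrast L = sum(c_i * mu_i), whose coefficients c_i sum to zero. The sketch below is a hypothetical helper using the Python standard library. It estimates L from the group means and computes its standard error from the pooled within-group mean square. This gives a t statistic on n - k degrees of freedom. P-values and any multiplicity adjustments for post hoc comparisons are left to statistical software:

```python
import math

def contrast_estimate(groups, coefficients):
    """Estimate a contrast L = sum(c_i * mean_i); return (estimate, se, t)."""
    if len(groups) != len(coefficients):
        raise ValueError("need one coefficient per group")
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Pooled within-group mean square (MSE) from the one-way ANOVA model.
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n - k)
    estimate = sum(c * m for c, m in zip(coefficients, means))
    # Var(L_hat) = MSE * sum(c_i^2 / n_i) under equal-variance, independent groups.
    se = math.sqrt(mse * sum(c * c / len(g) for c, g in zip(coefficients, groups)))
    return estimate, se, estimate / se  # t statistic on n - k degrees of freedom

# Toy example: compare group 1 with the average of groups 2 and 3.
print(contrast_estimate([[4, 5, 6], [6, 7, 8], [9, 10, 11]], [1, -0.5, -0.5]))
```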


Two-Way ANOVA and Interactions

In this module, we will study the two-way ANOVA model and use it to answer research questions using real data.

1. Motivating the Two-way ANOVA Model 10m
2. The Two-way ANOVA Model 9m
3. The Two-way ANOVA Model as a Regression Model 9m
4. Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations 13m
5. Interactions in the Two-way ANOVA Model: Formal Tests 15m
6. Two-way ANOVA Hypothesis Testing (no interaction) 14m
7. Looking Ahead: Two-Way ANOVA and Experimental Design 5m

1. Motivating the Two-way ANOVA Model 30m
2. The Two-way ANOVA Model 30m
3. The Two-way ANOVA Model as a Regression Model 30m
4. Interaction Terms in the Two-way ANOVA Model: Definitions and Visualizations 30m
5. Interactions in the Two-way ANOVA Model: Formal Tests 30m
6. Two-way ANOVA Hypothesis Testing (no interaction) 30m


Experimental Design: Basic Concepts and Designs

In this module, we will study fundamental experimental design concepts, such as randomization, treatment design, replication, and blocking. We will also look at basic factorial designs as an improvement over elementary “one factor at a time” methods. We will combine these concepts with the ANOVA and ANCOVA models to conduct meaningful experiments.

1. The Conceptual Framework of Experimental Design 19m
2. The Completely Randomized Design 12m
3. The Randomized Complete Block Design (RCBD) 8m
4. The Randomized Complete Block Design (RCBD): Hypothesis Testing 8m
5. The Factorial Design 10m
6. Further Issues in Experimental Design 7m
7. Ethical Issues in Experimental Design 12m

1- Causation and Experimental Design 10m
2- Resources on Ethics 10m

1- The Conceptual Framework of Experimental Design 30m
2- The Completely Randomized Design 30m
3- The Randomized Complete Block Design (RCBD) 30m
4- The Factorial Design 30m
5- Further Issues in Experimental Design 30m
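Randomization and blocking can be made concrete with a small sketch. The hypothetical helpers below use the Python standard library; the names and seeding scheme are our own choices. The first assigns units to treatments under a completely randomized design. The second builds a randomized complete block design, in which every treatment appears exactly once within each block:

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign units to treatments in (near-)equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Cycling through the treatments keeps group sizes within one of each other.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_complete_block_design(blocks, treatments, seed=None):
    """Within each block, apply every treatment exactly once in a random order."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # independent randomization inside each block
        plan[block] = dict(zip(units, order))
    return plan

print(completely_randomized_design(range(6), ["A", "B"], seed=42))
print(randomized_complete_block_design(
    {"day1": ["u1", "u2", "u3"], "day2": ["u4", "u5", "u6"]}, ["A", "B", "C"], seed=42))
```

Fixing `seed` makes an assignment reproducible. This is useful when the randomization plan must be documented before the experiment runs.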

3- Generalized Linear Models and Nonparametric Regression

About this Course: In the final course of the statistical modeling for data science program, learners will study a broad set of more advanced statistical modeling tools. Such tools will include generalized linear models (GLMs), which will provide an introduction to classification (through logistic regression); nonparametric modeling, including kernel estimators and smoothing splines; and semi-parametric generalized additive models (GAMs). Emphasis will be placed on a firm conceptual understanding of these tools. Attention will also be given to ethical issues raised by using complicated statistical models.


1- Calculus
2- Probability Theory
3- Linear Algebra


An Introduction to Generalized Linear Models Through Binomial Regression

In this module, we will introduce generalized linear models (GLMs) through the study of binomial data. In particular, we will motivate the need for GLMs; introduce the binomial regression model, including the most common binomial link functions; correctly interpret the binomial regression model; and consider various methods for assessing the fit and predictive power of the binomial regression model.

1. From Linear Models to Generalized Linear Models 12m
2. The Components of a GLM 6m
3. The Exponential Family of Distributions 14m
4. Introduction to Binomial Regression 9m
5. Binomial Regression Parameter Estimation 11m
6. Interpretation of Binomial Regression 7m
7. Binomial Regression in R 11m

Fair ML Book, Introduction 10m

1- Introduction to Generalized Linear Models 30m
2- Binomial Regression 30m
3- Binomial Regression Inference 30m
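Binomial regression parameters are usually estimated by maximum likelihood. The sketch below uses Newton-Raphson updates, which for the canonical logit link coincide with iteratively reweighted least squares. It assumes the Python standard library, one predictor, a logit link, and 0/1 responses; the helper name and fixed iteration count are also assumptions, not the course's code:

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1*x to 0/1 responses by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)  # binomial variance weight
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("information matrix is singular")
        # Newton step: beta += I^{-1} * score, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]  # toy data; the classes overlap, so the MLE exists
b0, b1 = fit_logistic(x, y)
print(b0, b1, math.exp(b1))  # exp(b1): odds ratio per one-unit increase in x
```

In R, `glm(y ~ x, family = binomial)` fits the same model, since logit is the default binomial link. With perfectly separated data the estimates diverge and this loop would not settle. A more careful implementation would also test for convergence instead of running a fixed number of steps.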


Models for Count Data

In this module, we will consider how to model count data. When the response variable is a count of some phenomenon, and when that count is thought to depend on a set of predictors, we can use Poisson regression as a model. We will describe Poisson regression in some detail and use Poisson regression on real data. Then, we will describe situations in which Poisson regression is not appropriate, and briefly present solutions to those situations.

1. Poisson Regression: A New Model for Count Data 13m
2. Poisson Regression Parameter Estimation 6m
3. Interpreting the Poisson Regression Model 7m
4. Poisson Regression on Real Data in R 21m
5. Goodness of Fit for Poisson Regression I 16m
6. Goodness of Fit for Poisson Regression II 4m
7. Overdispersion 12m

1- Poisson Regression Basics 30m
2- Poisson Regression Inference and Goodness of Fit 30m
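Poisson regression assumes Var(Y) = E(Y), and overdispersion is the situation where the observed variance is larger. The sketch below computes the Pearson dispersion estimate for the simplest case, an intercept-only Poisson model. This restriction is an assumption made to keep the example short, and the helper name is hypothetical:

```python
def poisson_dispersion_intercept_only(counts):
    """Pearson dispersion estimate for an intercept-only Poisson model.

    For that model the MLE of the mean is the sample mean, so the statistic
    reduces to sum((y - mean)^2 / mean) / (n - 1). Values well above 1
    suggest overdispersion relative to the Poisson assumption Var(Y) = E(Y).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mu = sum(counts) / n
    if mu == 0:
        raise ValueError("all counts are zero; the dispersion is undefined")
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)  # residual df = n - 1 (one fitted parameter)

print(poisson_dispersion_intercept_only([2, 3, 2, 3]))         # below 1
print(poisson_dispersion_intercept_only([0, 0, 0, 10, 0, 10]))  # well above 1
```

For a fitted Poisson `glm` in R, a common analogue is `sum(residuals(fit, type = "pearson")^2) / fit$df.residual`. Values well above 1 point toward remedies such as quasi-Poisson or negative binomial models.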


Introduction to Nonparametric Regression

In this module, we will introduce the concept of a nonparametric regression model. We will contrast this notion with the parametric models that we have studied so far. Then, we’ll study particular nonparametric regression models: kernel estimators and splines. Finally, we will introduce additive models as a blending of parametric and nonparametric methods.

1. Introduction to Nonparametric Regression Models 11m
2. Motivating Kernel Estimators 6m
3. Kernel Estimators 14m
4. Smoothing Splines 13m
5. Loess: Locally Estimated Scatterplot Smoothing 14m
6. Kernel Estimation in R 5m

Nonparametric Regression: Theory 30m
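A kernel estimator predicts at a point x0 by averaging nearby responses, with weights that decay with distance. The sketch below implements the Nadaraya-Watson form with a Gaussian kernel, using the Python standard library. The choice of kernel, the helper name, and the bandwidth convention are assumptions:

```python
import math

def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel regression estimate at x0 with Gaussian-kernel weights."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # The Gaussian normalizing constant cancels in the ratio, so it is omitted.
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train]
    total = sum(weights)
    if total == 0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.5, 2.0, 3.5, 4.0]  # toy data
for h in (0.3, 1.0, 5.0):
    print(h, nadaraya_watson(xs, ys, 2.0, h))
```

Small bandwidths track the data closely (low bias, high variance). Large bandwidths approach the overall mean of y. R's `ksmooth()` with `kernel = "normal"` computes a similar estimate but scales its `bandwidth` argument differently.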


Introduction to Generalized Additive Models

Some models, such as linear regression, are easily interpretable but inflexible, in that they don't capture many real-world relationships accurately. Other models, such as neural networks, are quite flexible but very difficult to interpret. Generalized additive models (GAMs) are a nice balance between flexibility and interpretability. In this module, we will further motivate GAMs, learn the basic mathematics of fitting GAMs, and implement them on simulated and real data in R.

1. Motivating Generalized Additive Models 17m
2. Generalized Additive Models in R 16m
3. Inference with Generalized Additive Models: Effective Degrees of Freedom 12m
4. Inference with Generalized Additive Models: Tests 4m
5. Generalized Additive Models in R: Inference and Interpretation 13m
6. Generalized Additive Models: A Complete Example with Real Data 16m

Required: Generalized additive models for data science 10m

1- Generalized Additive Models: Basics 30m
2- Generalized Additive Models: Inference and Data Analysis 30m

Eighth Course Introduction to High-Performance and Parallel Computing

About this Course: This course introduces the fundamentals of high-performance and parallel computing. It is targeted at scientists, engineers, scholars, and really everyone seeking to develop the software skills necessary for work in parallel software environments. We will cover everything from the basics of Linux environments and bash scripting all the way to high-throughput computing and parallelizing code.


Introduction: This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics.



Will learn

1. The components of a high-performance distributed computing system
2. Types of parallel programming models and the situations in which they might be used
3. High-throughput computing
4. Shared memory parallelism
5. Distributed memory parallelism
6. Navigating a typical Linux-based HPC environment
7. Assessing and analyzing application scalability, including weak and strong scaling
8. Quantifying the processing, data, and cost requirements for a computational project or workflow


Skills

These skills include big-data analysis, machine learning, parallel programming, and optimization.

WEEK 1: 9 videos (Total 46 min), 1 reading, 1 practice exercise, 1 quiz

High-Performance Computing (HPC) for Non-Computer Scientists

Get to know the basics of an HPC system. Users will learn how to work with common high-performance computing systems they may encounter in future efforts. This includes navigating filesystems, working with a typical HPC operating system (Linux), and some of the basic concepts of HPC. We will also provide users with some key information that is specific to the logistics of this course.

1- Course Overview 2m
2- Tour of JupyterLab 4m
3- Submitting Assignments 6m
4- Linux - Part 1 5m
5- Linux - Part 2 3m
6- Accessing Remote Systems 6m
7- Filesystems 4m
8- Bash Scripting - Part 1 7m
9- Bash Scripting - Part 2 5m

Course Syllabus 10m

Week 1 Quiz 30m

WEEK 2: 6 videos (Total 26 min), 1 practice exercise, 1 quiz (30m)

Nuts and Bolts of HPC

During this week we will actually begin to use HPC infrastructure. Some concepts we will learn are how to load software appropriately onto an HPC system, what types of nodes a user can expect to encounter on a system, and how to submit a job to conduct work, such as performing calculations.

1- HPC Architecture 4m
2- Software 4m
3- Allocations 3m
4- Node Types 1m
5- Job Submission with Slurm - Part 1 6m
6- Job Submission with Slurm - Part 2 8m

Week 2 Quiz 30m

WEEK 3: 6 videos (Total 25 min), 1 practice exercise, 1 quiz (30m)

Basic Parallelism

In this module, we will introduce users to the nuances of memory on a high-performance computing system. We will also cover some ways to conduct work on a system most efficiently. We will also introduce some beginning components of parallel programming.

1- Simple Application Timing 3m
2- Serial vs. Parallel Processing - Part 1 3m
3- Serial vs. Parallel Processing - Part 2 5m
4- Parallel Memory Models 5m
5- Data vs. Task Parallelism 5m
6- High Throughput Computing 4m

Week 3 Quiz 30m

WEEK 4: 4 videos (Total 17 min), 1 reading, 1 practice exercise, 1 quiz (30m)

Evaluating Parallel Program Performance

In this module, we will continue to review topics related to using a high-performance computing system most efficiently, including scaling your workflow, measuring how efficient your work on a system is, and how to utilize as much of the computing resource as possible.

1- How to Parallelize Code 6m
2- Speedup and Parallel Efficiency 4m
3- Scalability 4m
4- Limits to Scaling 3m

Summary of This Course 10m

Week 4 Quiz 30m
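The speedup and parallel efficiency measures from this module fit in a few lines of Python, as does Amdahl's law, the usual upper bound on strong-scaling speedup. In the sketch below the timings are hypothetical and the function names are our own:

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p): how many times faster the parallel run is."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, p):
    """E(p) = S(p) / p; 1.0 means perfect (linear) speedup on p workers."""
    return speedup(t_serial, t_parallel) / p

def amdahl_max_speedup(parallel_fraction, p):
    """Amdahl's law: best possible strong-scaling speedup on p workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)

# Hypothetical timings: 120 s on 1 core, 20 s on 8 cores.
print(speedup(120.0, 20.0))                 # 6.0
print(parallel_efficiency(120.0, 20.0, 8))  # 0.75
print(amdahl_max_speedup(0.95, 8))          # about 5.93
```

With 95% of the work parallelizable, Amdahl's law caps the speedup at 1 / 0.05 = 20 no matter how many cores are added, which is one of the limits to scaling. Weak scaling, where the problem size grows with the core count, is analyzed differently.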

Ninth Course Managing, Describing, and Analyzing Data

About this Course: In this course, you will learn the basics of understanding the data you have and why correctly classifying data is the first step to making correct decisions. You will describe data both graphically and numerically using descriptive statistics and R software. You will learn four probability distributions commonly used in the analysis of data. You will analyze data sets using the appropriate probability distribution. Finally, you will learn the basics of sampling error, sampling distributions, and errors in decision-making.


Introduction: This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics.



Will learn

1. Calculate descriptive statistics and create graphical representations using R software
2. Explore the basics of sampling and sampling distributions with respect to statistical inference
3. Solve problems and make decisions using probability distributions
4. Classify types of data with scales of measurement


Skills

1. Analyzing data
2. Describing data
3. Using R
4. Graphing data

WEEK 1: (3 hours to complete; 7 videos (Total 48 min), 1 reading, 2 quizzes)

Data and Measurement

Upon completion of this module, students will be able to use R and RStudio to work with data and classify types of data using measurement scales.

1. Welcome to Managing, Describing and Analyzing Data 1m
2. Types of Data and Measurement Scales 6m
3. Measurement Scales: Nominal and Ordinal 6m
4. Measurement Scales: Interval, Ratio and Absolute 6m
5. Measurement as a Process, The Big 5 Aspects of Data 10m
6. Sampling Concepts 6m
7. Working in RStudio 10m

Attention Learners: R Code / File Resources 10m

Week 1 Practice Assessment 30m
Assessment: Data and Measurement 45m

WEEK 2: (5 hours to complete; 11 videos (Total 85 min))

Describing Data Graphically and Numerically

Upon completion of this module, students will be able to use R and RStudio to create visual representations of data, and calculate descriptive statistics to describe the location, spread and shape of data.

1. Create a Run Chart 9m
2. Frequency Distributions 7m
3. Frequency Polygons and Histograms 7m
4. Histogram Patterns and Density Plots 7m
5. Box and Whisker Plots 7m
6. Measures of Central Tendency: Mean 9m
7. Measures of Central Tendency: Median, Mode 7m
8. Measures of Position 8m
9. Measures of Dispersion 7m
10. Measures of Shape 6m
11. Measures of Relationship 6m

1. Week 2 Practice Assessment 30m
2. Assessment: Describing Data Graphically 1h 15m
3. Assessment: Describing Data Numerically 1h 15m
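The measures of location, position, and dispersion in this module have direct counterparts in Python's `statistics` module (Python 3.8 or later). The sketch below is a hypothetical helper that collects them. The course itself uses R, where `mean()`, `median()`, `sd()`, and `quantile()` play the same roles:

```python
import statistics

def describe(data):
    """Location, position, and spread summaries for a numeric sample."""
    # method="inclusive" interpolates the same way as R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),   # every most-frequent value
        "sample_sd": statistics.stdev(data),  # n - 1 denominator, like R's sd()
        "range": max(data) - min(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                       # spread of the middle 50%
    }

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```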

WEEK 3: (4 hours to complete; 8 videos (Total 70 min))

Probability and Probability Distributions

Upon completion of this module, students will be able to apply the rules and conditions of probability and probability distributions to make decisions and solve problems using R and RStudio.

1. Introduction to Probability Part 1 6m
2. Introduction to Probability Part 2 8m
3. Probability Distributions Part 1 5m
4. Probability Distributions Part 2 7m
5. The Binomial Distribution 10m
6. The Poisson Distribution 8m
7. The Normal Distribution 12m
8. The Exponential Distribution 9m

1. Week 3 Practice Assessment 30m
2. Probability and Probability Distributions 2h
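The four distributions in this module can be evaluated with the Python standard library. The sketch below is minimal, and the helper names are our own. The corresponding R functions are `dbinom()`, `dpois()`, `pnorm()`, and `pexp()`:

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(x)

def exponential_cdf(x, rate):
    """P(X <= x) for X ~ Exponential(rate); zero for negative x."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

print(binomial_pmf(2, 4, 0.5))      # 0.375: exactly 2 heads in 4 fair tosses
print(poisson_pmf(0, 2.0))          # chance of no events when 2 are expected
print(normal_cdf(1.96, 0.0, 1.0))   # about 0.975
print(exponential_cdf(1.0, 1.0))    # about 0.632
```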

WEEK 4: (3 hours to complete; 8 videos (Total 55 min))

Sampling Distributions, Error and Estimation

Upon completion of this module, students will be able to use R and RStudio to characterize sampling and sampling distributions, error and estimation with respect to statistical inference.

1. Sampling Error 7m
2. Random Sampling Distributions 8m
3. The Central Limit Theorem 5m
4. Probability with RSDs 7m
5. Estimates and Estimators 6m
6. Confidence Intervals 4m
7. Confidence Intervals for the Mean and Variance 10m
8. Confidence Intervals for Proportions and Poisson Counts 4m

1. Week 4 Practice Assessment 30m
2. Sampling Distributions, Error and Estimation 1h 30m
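A confidence interval has the form estimate ± critical value × standard error. The sketch below uses large-sample normal critical values. This is an assumption for simplicity: for small samples, t-based intervals such as R's `t.test(x)$conf.int` are wider. The helper names are hypothetical:

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci_normal(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(data) / math.sqrt(n)     # z * estimated standard error
    m = mean(data)
    return m - half_width, m + half_width

def proportion_ci_wald(successes, n, confidence=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

print(mean_ci_normal([4.1, 5.0, 5.9, 4.8, 5.2, 6.1, 4.4, 5.5]))  # toy sample
print(proportion_ci_wald(50, 100))  # about (0.402, 0.598)
```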
