Unit 1 Fundamentals, Course 1: Introduction to Data …data analysis in Excel using classic tools,...

Microsoft Professional Program: Data Science

Unit 1 – Fundamentals, Course 1: Introduction to Data Science

Learn what it takes to become a data scientist. This is the first stop in the Data

Science curriculum from Microsoft. It will help you get started with the program, plan

your learning schedule, and connect with fellow students and teaching assistants.

Along the way, you’ll get an introduction to working with and exploring data using a

variety of visualization, analytical, and statistical techniques.

What you'll learn

• How the Microsoft Data Science curriculum works

• How to navigate the curriculum and plan your course schedule

• Basic data exploration and visualization techniques in Microsoft Excel

• Foundational statistics that can be used to analyze data

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Introductory

Prerequisite knowledge: none

Language: English, with a Q&A workshop in Croatian

Unit 1 – Fundamentals, Course 2a: Analyzing and Visualizing Data with Excel

Excel is one of the most widely used solutions for analyzing and visualizing data. It

now includes tools that enable the analysis of more data, with improved

visualizations and more sophisticated business logic. In this data science course,

you will get an introduction to the latest versions of these new tools in Excel 2016

from an expert on the Excel Product Team at Microsoft.

Learn how to import data from different sources, create mashups between data

sources, and prepare data for analysis. After preparing the data, find out how

business calculations can be expressed using the DAX calculation engine. See how

the data can be visualized and shared to the Power BI cloud service, after which it

can be used in dashboards, queried using plain English sentences, and even

consumed on mobile devices.

Do you feel that the content of this course is a bit too advanced for you and you

need to fill some gaps in your Excel knowledge? Do you need a better understanding

of how pivot tables, pivot charts and slicers work together, and help in creating

dashboards? If so, check out DAT205x: Introduction to Data Analysis using Excel.

What you'll learn

• Gather and transform data from multiple sources

• Discover and combine data in mashups

• Learn about data model creation

• Explore, analyze, and visualize data

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Intermediate

Prerequisite knowledge: Understanding of Excel analytic tools such as tables, pivot

tables and pivot charts. Some experience in working with data from databases

and text files is also helpful.

Language: English, with a Q&A workshop in Croatian

Syllabus

Week 1

Set up the lab environment by installing Office applications. Learn how to perform

data analysis in Excel using classic tools, such as pivot tables, pivot charts, and

slicers, on data that is already in a worksheet / grid data. Explore an Excel data

model, its content, and its structure, using the Power Pivot add-in. Create your first

DAX expressions for calculated columns and measures.

Learn about queries (Power Query add-in in Excel 2013 and Excel 2010), and build

an Excel data model from a single flat table. Learn how to import multiple tables from

a SQL database, and create an Excel data model from the imported data. Create a

mash-up between data from text-files and data from a SQL database.

Week 2

Get the details on how measures are calculated for each cell and how filter context

affects the calculation, and explore several advanced DAX functions. Find out how to use

advanced text query to import data from a formatted Excel report. Perform queries

beyond the standard user interface.

Explore ways to create stunning visualizations in Excel. Use the cube functions to

perform year-over-year comparisons. Create timelines, hierarchies, and slicers to

enhance your visualizations. Learn how Excel can work together with Power BI.

Upload an Excel workbook to the Power BI service. Explore the use of Excel on the

mobile platform.

Unit 1 – Fundamentals, Course 2b: Analyzing and Visualizing Data with Power BI

Learn Power BI, a powerful cloud-based service that helps data scientists visualize

and share insights from their data.

Power BI is quickly gaining popularity among professionals in data science as a

cloud-based service that helps them easily visualize and share insights from their

organizations’ data.

In this data science course, you will learn from the Power BI product team at

Microsoft with a series of short, lecture-based videos, complete with demos, quizzes,

and hands-on labs. You’ll walk through Power BI, end to end, starting from how to

connect to and import your data, author reports using Power BI Desktop, and publish

those reports to the Power BI service. Plus, learn to create dashboards and share

with business users—on the web and on mobile devices.

What you'll learn

• Connect, import, shape, and transform data for business intelligence (BI)

• Visualize data, author reports, and schedule automated refresh of your reports

• Create and share dashboards based on reports in Power BI Desktop and Excel

• Use natural language queries

• Create real-time dashboards

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Introductory

Prerequisite knowledge: Some experience in working with data from Excel,

databases, or text files.

Language: English, with a Q&A workshop in Croatian

Syllabus

Week 1

• Understanding key concepts in business intelligence, data analysis, and data visualization

• Importing your data and automatically creating dashboards from services such as Marketo, Salesforce, and Google Analytics

• Connecting to and importing your data, then shaping and transforming that data

• Enriching your data with business calculations

• Visualizing your data and authoring reports

• Scheduling automated refresh of your reports

• Creating dashboards based on reports and natural language queries

• Sharing dashboards across your organization

• Consuming dashboards in mobile apps

Week 2

• Leveraging your Excel reports within Power BI

• Creating custom visualizations that you can use in dashboards and reports

• Collaborating within groups to author reports and dashboards

• Sharing dashboards effectively based on your organization’s needs

• Exploring live connections to data with Power BI

• Connecting directly to SQL Azure, HDInsight Spark, and SQL Server Analysis Services

• Introduction to Power BI Development API

• Leveraging custom visuals in Power BI

Unit 1 – Fundamentals, Course 3: Analytics Storytelling for Impact

All analytics work begins and ends with a story. Storytelling with data is the analytics

professional’s missing link in delivering the essence of data signals and insights to

executives, management, and other stakeholders.

In this analytics storytelling course, you’ll learn effective strategies and tools to

master data communication in the most impactful way possible—through well-crafted

analytics stories.

You'll explore what a story is and, perhaps more importantly, what a story is not. Find

out how stories create value and why they matter. Learn to craft stories, command

the room, finish strong, and assess your impact. Get practical help applying these

ideas to your data analytics work. Plus, you'll learn guidelines and best practices for

creating high-impact reports and presentations.

edX offers financial assistance for learners who want to earn Verified Certificates but

who may not be able to pay the fee. To apply for financial assistance, enroll in the

course, then follow this link to complete an application for assistance.

What you'll learn

• How to apply storytelling principles to your analytics work

• How to improve your analytics presentations through storytelling

• Guidelines and best practices for creating high-impact reports and presentations

Duration: 1 week

Total effort: 12 – 24 hours

Level: Introductory

Prerequisite knowledge: one of the following courses or equivalent knowledge and

skills:

• Analyzing and Visualizing Data with Excel

• Analyzing and Visualizing Data with Power BI

• Working knowledge of PowerPoint.

Language: English, with a Q&A workshop in Croatian

Unit 1 – Fundamentals, Course 4: Ethics and Law in Data and Analytics

Analytics and AI are powerful tools that have real-world outcomes. Learn how to

apply practical, ethical, and legal constructs and scenarios so that you can be an

effective analytics professional.

Corporations, governments, and individuals have powerful tools in Analytics and AI to

create real-world outcomes, for good or for ill.

Data professionals today need both the frameworks and the methods in their job to

achieve optimal results while being good stewards of their critical role in society

today.

In this course, you'll learn to apply ethical and legal frameworks to initiatives in the

data profession. You'll explore practical approaches to data and analytics problems

posed by work in Big Data, Data Science, and AI. You'll also investigate applied data

methods for ethical and legal work in Analytics and AI.

edX offers financial assistance for learners who want to earn Verified Certificates but

who may not be able to pay the fee. To apply for financial assistance, enroll in the

course, then follow this link to complete an application for assistance.

What you'll learn

• Foundational abilities in applying ethical and legal frameworks for the data profession

• Practical approaches to data and analytics problems, including Big Data, Data Science, and AI

• Applied data methods for ethical and legal work in Analytics and AI

Duration: 1 week

Total effort: 12 – 18 hours

Level: Introductory

Prerequisite knowledge: No prerequisites

Language: English, with a Q&A workshop in Croatian

Unit 1 – Fundamentals, Course 5: Querying Data with Transact-SQL

From querying and modifying data in SQL Server or Azure SQL to programming with

Transact-SQL, learn essential skills that employers need.

Transact-SQL is an essential skill for data professionals and developers working with

SQL databases. With this combination of expert instruction, demonstrations, and

practical labs, step from your first SELECT statement through to implementing

transactional programmatic logic.

Work through multiple modules, each of which explores a key area of the

Transact-SQL language, with a focus on querying and modifying data in Microsoft

SQL Server or Azure SQL Database. The labs in this course use a sample database

that can be deployed easily in Azure SQL Database, so you get hands-on experience

with Transact-SQL without installing or configuring a database server.

What you'll learn

• Create Transact-SQL SELECT queries

• Work with data types and NULL

• Query multiple tables with JOIN

• Explore set operators

• Use functions and aggregate data

• Work with subqueries and APPLY

• Use table expressions

• Group sets and pivot data

• Modify data

• Program with Transact-SQL

• Implement error handling and transactions

Duration: 3 weeks

Total effort: 24 – 30 hours

Level: Intermediate

Prerequisite knowledge: Basic understanding of databases and IT systems.

Language: English, with a Q&A workshop in Croatian

Unit 2 - Core Data Science, Course 6a: Introduction to R for Data Science

Learn the R statistical programming language, the lingua franca of data science, in

this hands-on course.

R is rapidly becoming the leading language in data science and statistics. Today, R

is the tool of choice for data science professionals in every industry and field.

Whether you are a full-time number cruncher, or just the occasional data analyst, R

will suit your needs.

This introduction to R programming course will help you master the basics of R. In

seven sections, you will cover its basic syntax, making you ready to undertake your

own first data analysis using R. Starting from variables and basic operations, you will

eventually learn how to handle data structures such as vectors, matrices, data

frames and lists. In the final section, you will dive deeper into the graphical

capabilities of R and create your own stunning data visualizations. No prior knowledge in

programming or data science is required.

What makes this course unique is that you will continuously practice your newly

acquired skills through interactive in-browser coding challenges using the DataCamp

platform. Instead of passively watching videos, you will solve real data problems

while receiving instant and personalized feedback that guides you to the correct

solution.

What you'll learn

• Introductory R language fundamentals and basic syntax

• What R is and how it’s used to perform data analysis

• Become familiar with the major R data structures

• Create your own visualizations using R

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Introductory

Prerequisite knowledge: none, but previous experience in basic mathematics is

helpful.

Language: English, with a Q&A workshop in Croatian

Syllabus

Module 1: Introduction to Basics

Take your first steps with R. Discover the basic data types in R and assign your first

variable.

Module 2: Vectors

Analyze gambling behaviour using vectors. Create, name and select elements from

vectors.

Module 3: Matrices

Learn how to work with matrices in R. Do basic computations with them and

demonstrate your knowledge by analyzing the Star Wars box office figures.

Module 4: Factors

R stores categorical data in factors. Learn how to create, subset and compare

categorical data.

Module 5: Data Frames

When working with R, you’ll probably deal with data frames all the time. Therefore, you

need to know how to create one, select the most interesting parts of it, and order

them.

Module 6: Lists

Lists allow you to store components of different types. Module 6 will show you how

to deal with lists.

Module 7: Basic Graphics

Discover R’s packages to do graphics and create your own data visualizations.

Unit 2 - Core Data Science, Course 6b: Introduction to Python for Data Science

The ability to analyze data with Python is critical in data science. Learn the basics,

and move on to create stunning visualizations.

Python is a very powerful programming language used for many different

applications. Over time, the huge community around this open source language has

created quite a few tools to efficiently work with Python. In recent years, a number of

tools have been built specifically for data science. As a result, analyzing data with

Python has never been easier.

In this practical course, you will start from the very beginning, with basic arithmetic

and variables, and learn how to handle data structures, such as Python lists, NumPy

arrays, and Pandas DataFrames. Along the way, you’ll learn about Python functions

and control flow. Plus, you’ll look at the world of data visualizations with Python and

create your own stunning visualizations based on real data.

What you'll learn

• Explore Python language fundamentals, including basic syntax, variables, and types

• Create and manipulate regular Python lists

• Use functions and import packages

• Build NumPy arrays, and perform interesting calculations

• Create and customize plots on real data

• Supercharge your scripts with control flow, and get to know the Pandas DataFrame

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Introductory

Prerequisite knowledge: Some experience in working with data from Excel,

databases, or text files.

Language: English, with a Q&A workshop in Croatian

Syllabus

Module 1: Python Basics

Take your first steps in the world of Python. Discover the different data types and

create your first variable.

Module 2: Python Lists

Get to know the first way to store many different data points under a single name.

Create, subset and manipulate Lists in all sorts of ways.

Module 3: Functions and Packages

Learn how to get the most out of other people's efforts by importing Python

packages and calling functions.

Module 4: NumPy

Write superfast code with Numerical Python, a package to efficiently store and do

calculations with huge amounts of data.

Module 5: Matplotlib

Create different types of visualizations depending on the message you want to

convey. Learn how to build complex and customized plots based on real data.

Module 6: Control flow and Pandas

Write conditional constructs to tweak the execution of your scripts and get to know

the Pandas DataFrame: the key data structure for Data Science in Python.
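
As a rough illustration of the kind of material Modules 1, 2, 3, and 6 describe, the sketch below uses only the Python standard library. The height values and the `mean` helper are hypothetical and are not taken from the course. NumPy, Matplotlib, and pandas are deliberately left out so that the snippet runs without extra packages.

```python
import math

# Module 1: variables and basic types
savings = 100           # int
growth = 1.1            # float
label = "compound"      # str
future_value = savings * growth ** 7   # basic arithmetic on variables

# Module 2: a list stores many values under one name
# (hypothetical heights in metres)
heights = [1.73, 1.68, 1.71, 1.89, 1.79]
tallest_two = sorted(heights)[-2:]     # subset a list by slicing

# Module 3: define a function and call one from an imported package
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return math.fsum(values) / len(values)

# Module 6: control flow with a loop and a conditional
above_average = []
for h in heights:
    if h > mean(heights):
        above_average.append(h)

print(label, round(future_value, 2))
print(tallest_two)
print(above_average)
```

In the course itself, the same ideas are extended to NumPy arrays and the Pandas DataFrame, which replace the plain list used here.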

Unit 2 - Core Data Science, Course 7a: Essential Statistics for Data Analysis using Excel

Gain a solid understanding of statistics and basic probability, using Excel, and build

on your data analysis and data science foundation.

If you’re considering a career as a data analyst, you need to know about histograms,

Pareto charts, Boxplots, Bayes’ theorem, and much more. In this applied statistics

course, the second in our Microsoft Excel Data Analyst XSeries, use the powerful

tools built into Excel, and explore the core principles of statistics and basic

probability—from both the conceptual and applied perspectives. Learn about

descriptive statistics, basic probability, random variables, sampling and confidence

intervals, and hypothesis testing. And see how to apply these concepts and

principles using the environment, functions, and visualizations of Excel.

As a data science pro, the ability to analyze data helps you to make better decisions,

and a solid foundation in statistics and basic probability helps you to better

understand your data. Using real-world concepts applicable to many industries,

including medical, business, sports, insurance, and much more, learn from leading

experts why Excel is one of the top tools for data analysis and how its built-in

features make Excel a great way to learn essential skills.

Before taking this course, you should be familiar with organizing and summarizing

data using Excel analytic tools, such as tables, pivot tables, and pivot charts. You

should also be comfortable (or willing to try) creating complex formulas and

visualizations. Want to start with the basics? Check out DAT205x: Introduction to

Data Analysis using Excel. As you learn these concepts and get more experience

with this powerful tool that can be extremely helpful in your journey as a data analyst

or data scientist, you may want to also take the third course in our series, DAT206x

Analyzing and Visualizing Data with Excel. This course includes excerpts from

Microsoft Excel 2016: Data Analysis and Business Modeling from Microsoft Press

and authored by course instructor Wayne Winston.

What you'll learn

• Descriptive statistics

• Basic probability

• Random variables

• Sampling and confidence intervals

• Hypothesis testing

Duration: 2 weeks

Total effort: 12 – 24 hours

Level: Intermediate

Prerequisite knowledge: Secondary school (high school) algebra. Ability to work

with tables, formulas, and charts in Excel. Ability to organize and summarize data

using Excel analytic tools such as tables, pivot tables, and pivot charts.

Language: English, with a Q&A workshop in Croatian

System Requirements

Excel 2016 is required for the full course experience. Excel 2013 will work but will

not support all the visualizations and functions.

Syllabus

Module 1: Descriptive Statistics

You will learn how to describe data using charts and basic statistical measures. Full

use will be made of the new histograms, Pareto charts, Boxplots, and Treemap and

Sunburst charts in Excel 2016.

Module 2: Basic Probability

You will learn basic probability including the law of complements, independent

events, conditional probability, and Bayes’ theorem.
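
To make the Module 2 topics concrete, here is a minimal Python sketch of Bayes' theorem applied to a screening test. The course itself works in Excel; the function name `posterior` and the three input probabilities are hypothetical values chosen only for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of complements: P(no condition) = 1 - P(condition)
    p_no_condition = 1 - prior
    # Law of total probability: P(positive test)
    p_positive = sensitivity * prior + false_positive_rate * p_no_condition
    # Bayes' theorem: P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
    return sensitivity * prior / p_positive

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false-positive rate
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the posterior probability stays low here because the condition is rare, which is the kind of result conditional probability is meant to expose.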

Module 3: Random Variables

You will learn how to find the mean and variance of random variables and then learn

about the binomial, Poisson, and Normal random variables. We close with a

discussion of the beautiful and important Central Limit Theorem.

Module 4: Sampling and Confidence Intervals

You will learn the mechanics of sampling, point estimation, and interval estimation of

population parameters.
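
The difference between point estimation and interval estimation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's Excel workflow: the sample values are hypothetical, and the interval uses a normal approximation, whereas a t-distribution is the more usual choice for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (point estimate, lower bound, upper bound) for the population mean."""
    point = mean(sample)                             # point estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(len(sample))   # z times the standard error
    return point, point - margin, point + margin

# Hypothetical sample of 12 measurements
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.6, 10.1, 10.2, 9.9]
point, low, high = mean_confidence_interval(sample)
print(f"{point:.2f} ({low:.2f}, {high:.2f})")
```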

Module 5: Hypothesis Testing

You will learn about null and alternative hypotheses, Type I and Type II errors,

one-sample tests for means and proportions, tests for the difference between the

means of two populations, and the chi-square test for independence.
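
As one possible illustration of a one-sample test for a proportion, the Python sketch below computes a two-sided z-test. The function name and the counts (60 successes in 100 trials against a null proportion of 0.5) are hypothetical, and the normal approximation is an assumption that only holds for reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n                       # observed proportion
    standard_error = sqrt(p0 * (1 - p0) / n)    # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, null proportion 0.5
z, p_value = one_sample_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

Rejecting the null hypothesis when the p-value falls below a chosen significance level (for example 0.05) caps the Type I error rate at that level.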

Unit 2 - Core Data Science, Course 7b: Essential Math for Machine Learning: R Edition

Want to study machine learning or artificial intelligence, but worried that your math

skills may not be up to it? Do words like “algebra” and “calculus” fill you with dread?

Has it been so long since you studied math at school that you’ve forgotten much of

what you learned in the first place?

You’re not alone. Machine learning and AI are built on mathematical principles like

Calculus, Linear Algebra, Probability, Statistics, and Optimization; and many

would-be AI practitioners find this daunting. This course is not designed to make you a

mathematician. Rather, it aims to help you learn some essential foundational

concepts and the notation used to express them. The course provides a hands-on

approach to working with data and applying the techniques you’ve learned.

This course is not a full math curriculum. It’s not designed to replace school or

college math education. Instead, it focuses on the key mathematical concepts that

you’ll encounter in studies of machine learning. It is designed to fill the gaps for

students who missed these key concepts as part of their formal education, or who

need to refresh their memories after a long break from studying math.

What you'll learn

• Familiarity with Equations, Functions, and Graphs

• Differentiation and Optimization

• Vectors and Matrices

• Statistics and Probability

Duration: 3 weeks

Total effort: 36 – 48 hours

Level: Intermediate

Prerequisite knowledge: To complete this course successfully, you should have:

• A basic knowledge of math

• Some programming experience – R is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• Introduction

• Equations, Functions, and Graphs

• Differentiation and Optimization

• Vectors and Matrices

• Statistics and Probability

Unit 2 - Core Data Science, Course 7c: Essential Math for Machine Learning: Python Edition

Want to study machine learning or artificial intelligence, but worried that your math

skills may not be up to it? Do words like “algebra” and “calculus” fill you with dread?

Has it been so long since you studied math at school that you’ve forgotten much of

what you learned in the first place?

You’re not alone. Machine learning and AI are built on mathematical principles like

Calculus, Linear Algebra, Probability, Statistics, and Optimization; and many

would-be AI practitioners find this daunting. This course is not designed to make you a

mathematician. Rather, it aims to help you learn some essential foundational

concepts and the notation used to express them. The course provides a hands-on

approach to working with data and applying the techniques you’ve learned.

This course is not a full math curriculum; it’s not designed to replace school or

college math education. Instead, it focuses on the key mathematical concepts that

you’ll encounter in studies of machine learning. It is designed to fill the gaps for

students who missed these key concepts as part of their formal education, or who

need to refresh their memories after a long break from studying math.

What you'll learn

After completing this course, you will be familiar with the following mathematical

concepts and techniques:

• Equations, Functions, and Graphs

• Differentiation and Optimization

• Vectors and Matrices

• Statistics and Probability

Duration: 3 weeks

Total effort: 36 – 48 hours

Level: Intermediate

Prerequisite knowledge: To complete this course successfully, you should have:

• A basic knowledge of math

• Some programming experience – Python is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• Introduction

• Equations, Functions, and Graphs

• Differentiation and Optimization

• Vectors and Matrices

• Statistics and Probability
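
To give a flavour of how differentiation and optimization connect, the sketch below estimates a derivative numerically and uses it to minimize a simple function by gradient descent. It is an illustration only: the function `f`, the helper names, the learning rate, and the step count are all hypothetical choices, not course material.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of the derivative f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the slope to approach a local minimum of f."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * derivative(f, x)
    return x

def f(x):
    # Hypothetical function with its minimum at x = 3
    return (x - 3) ** 2 + 1

x_min = gradient_descent(f, x0=0.0)
print(round(x_min, 4), round(f(x_min), 4))
```

The same idea, applied to functions of many variables expressed as vectors and matrices, underlies how many machine learning models are trained.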

Unit 3 - Applied Data Science, Course 8a: Data Science Research Methods: Python Edition

Get hands-on experience with the science and research aspects of data science

work, from setting up a proper data study to making valid claims and inferences from

data experiments.

Data scientists are often trained in the analysis of data. However, the goal of data

science is to produce a good understanding of some problem or idea and build

useful models on this understanding. Because of the principle of “garbage in,

garbage out,” it is vital that a data scientist know how to evaluate the quality of

information that comes into a data analysis. This is especially the case when data

are collected specifically for some analysis (e.g., a survey).

In this course, you will learn the fundamentals of the research process—from

developing a good question to designing good data collection strategies to putting

results in context. Although a data scientist may often play a key part in data

analysis, the entire research process must work cohesively for valid insights to be

gleaned.

Developed as a powerful and flexible language used in everything from Data

Science to cutting-edge and scalable Artificial Intelligence solutions, Python has

become an essential tool for doing Data Science and Machine Learning. With this

edition of Data Science Research Methods, all of the labs are done with Python,

while the videos are language-agnostic. If you prefer your Data Science to be done

with R, please see Data Science Research Methods: R Edition.

What you'll learn

After completing this course, you will be familiar with the following concepts and

techniques:

• Data analysis and inference

• Data science research design

• Experimental data analysis and modeling

Duration: 2 weeks

Total effort: 12 – 18 hours

Level: Intermediate

Prerequisite knowledge:

• A basic knowledge of math

• Some programming experience – Python is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• The Research Process

• Planning for Analysis

• Research Claims

• Measurement

• Correlational and Experimental Design

Unit 3 - Applied Data Science, Course 8b: Data Science

Research Methods: R Edition

Get hands-on experience with the science and research aspects of data science

work, from setting up a proper data study to making valid claims and inferences from

data experiments.

Data scientists are often trained in the analysis of data. However, the goal of data

science is to produce a good understanding of some problem or idea and build useful

models on this understanding. Because of the principle of “garbage in, garbage out,”

it is vital that the data scientist know how to evaluate the quality of information that

comes into a data analysis. This is especially the case when data are collected

specifically for some analysis (e.g., a survey).

In this course, you will learn the fundamentals of the research process—from

developing a good question to designing good data collection strategies to putting

results in context. Although the data scientist may often play a key part in data

analysis, the entire research process must work cohesively for valid insights to be

gleaned.

Developed as a language with statistical analysis and modeling in mind, R has

become an essential tool for doing real-world Data Science. With this edition of Data

Science Research Methods, all of the labs are done with R, while the videos are

tool-agnostic. If you prefer your Data Science to be done with Python, please see

Data Science Research Methods: Python Edition.

What you'll learn

After completing this course, you will be familiar with the following concepts and

techniques:

• Data analysis and inference

• Data science research design

• Experimental data analysis and modeling

Duration: 2 weeks

Total effort: 12 – 18 hours

Level: Intermediate

Prerequisite knowledge:

• A basic knowledge of math

• Some programming experience – R is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• The Research Process

• Planning for Analysis

• Research Claims

• Measurement

• Correlational and Experimental Design

Unit 3 - Applied Data Science, Course 9a: Principles of

Machine Learning: R Edition

Get hands-on experience building and deriving insights from machine learning models using R and Azure Notebooks.

Machine learning uses computers to run predictive models that learn from existing data in order to forecast future behaviors, outcomes, and trends. In this data science course, you will be given clear explanations of machine learning theory combined with practical scenarios and hands-on experience building, validating, and deploying machine learning models. You will learn how to build and derive insights from these models using R and Azure Notebooks.

What you'll learn

After completing this course, you will be familiar with the following concepts and

techniques:

• Data exploration, preparation and cleaning

• Supervised machine learning techniques

• Unsupervised machine learning techniques

• Model performance improvement

Duration: 4 weeks

Total effort: 36 – 48 hours

Level: Intermediate

Prerequisite knowledge:

• A basic knowledge of math

• Some programming experience – R is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• Introduction to Machine Learning

• Exploring Data

• Data Preparation and Cleaning

• Getting Started with Supervised Learning

• Improving Model Performance

• Machine Learning Algorithms

• Unsupervised Learning

Unit 3 - Applied Data Science, Course 9b: Principles of

Machine Learning: Python Edition

Get hands-on experience building and deriving insights from machine learning models using Python and Azure Notebooks.

Machine learning uses computers to run predictive models that learn from existing data in order to forecast future behaviors, outcomes, and trends. In this data science course, you will be given clear explanations of machine learning theory combined with practical scenarios and hands-on experience building, validating, and deploying machine learning models. You will learn how to build and derive insights from these models using Python and Azure Notebooks.
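
As a rough illustration of what "learning from existing data" and "validating" a model mean in practice, the following minimal sketch fits a straight line to synthetic data and checks it on held-out rows. It is not taken from the course materials: the data, the function names `fit_line` and `rmse`, and the 80/20 split are all assumptions made for the example, and only the Python standard library is used.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given points."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) ** 0.5


# Synthetic "existing data" (an assumption for this sketch): y = 3x + 2 plus noise.
rng = random.Random(0)
data = [(x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)) for x in range(100)]

# Hold out 20% of the rows so the model is validated on data it has not seen.
rng.shuffle(data)
train, test = data[:80], data[80:]

model = fit_line(*zip(*train))
test_rmse = rmse(model, *zip(*test))
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} held-out RMSE={test_rmse:.2f}")
```

The held-out error, rather than the error on the training rows, is what indicates how well the model is likely to forecast new cases.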

What you'll learn

After completing this course, you will be familiar with the following concepts and

techniques:

• Data exploration, preparation and cleaning

• Supervised machine learning techniques

• Unsupervised machine learning techniques

• Model performance improvement

Duration: 4 weeks

Total effort: 36 – 48 hours

Level: Intermediate

Prerequisite knowledge:

• A basic knowledge of math

• Some programming experience – Python is preferred.

Language: English, with a Q&A workshop in Croatian

Syllabus

• Introduction to Machine Learning

• Exploring Data

• Data Preparation and Cleaning

• Getting Started with Supervised Learning

• Improving Model Performance

• Machine Learning Algorithms

• Unsupervised Learning

Unit 3 - Applied Data Science, Course 10a: Developing

Big Data Solutions with Azure Machine Learning

The past can often be the key to predicting the future. Big data from historical

sources is a valuable resource for identifying trends and building machine learning

models that apply statistical patterns and predict future outcomes.

This course introduces Azure Machine Learning, and explores techniques and

considerations for using it to build models from big data sources, and to integrate

predictive insights into big data processing workflows.

What you'll learn

• How to create predictive web services with Azure Machine Learning

• How to work with big data sources in Azure Machine Learning

• How to integrate Azure Machine Learning into big data batch processing

pipelines

• How to integrate Azure Machine Learning into real-time big data

processing solutions

Duration: 2 weeks

Total effort: 12 – 16 hours

Level: Intermediate

Prerequisite knowledge:

• Building data processing pipelines with Azure Data Factory

• Building real-time data processing solutions with Azure Stream

Analytics

Language: English, with a Q&A workshop in Croatian

Syllabus

• Module 1: Introduction to Azure Machine Learning

• Module 2: Building Predictive Models with Azure Machine Learning

• Module 3: Operationalizing Machine Learning Models

• Module 4: Using Azure Machine Learning in Big Data Solutions

Unit 3 - Applied Data Science, Course 10b:

Implementing Predictive Solutions with Spark in Azure

HDInsight

Learn how to use Spark in Microsoft Azure HDInsight to create predictive analytics

and machine learning solutions.

Are you ready for big data science? In this course, learn how to implement predictive

analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.

See how to work with Scala or Python to cleanse and transform data and build

machine learning models with Spark ML (the machine learning library in Spark).

What you'll learn

• Use Spark to explore data and prepare it for modeling

• Build supervised machine learning models

• Evaluate and optimize models

• Build recommenders and unsupervised machine learning models

Duration: 3 weeks

Total effort: 18 – 24 hours

Level: Intermediate

Prerequisite knowledge: Familiarity with Azure HDInsight. Familiarity with

databases and SQL. Some programming experience. A willingness to learn actively

in a self-paced manner.

Language: English, with a Q&A workshop in Croatian

System Requirements

To complete the hands-on elements in this course, you will require an Azure

subscription and a Windows client computer. You can sign up for a free Azure trial

subscription (a valid credit card is required for verification, but you will not be

charged for Azure services). Note that the free trial is not available in all regions.

Syllabus

Module 1: Introduction to Data Science with Spark

Get started with Spark clusters in Azure HDInsight, and use Spark to run Python or

Scala code to work with data.

Module 2: Getting Started with Machine Learning

Learn how to build classification and regression models using the Spark ML library.

Module 3: Evaluating Machine Learning Models

Learn how to evaluate supervised learning models, and how to optimize model

parameters.

Module 4: Recommenders and Unsupervised Models

Learn how to build recommenders and clustering models using Spark ML.

Unit 3 - Applied Data Science, Course 10c: Analyzing

Big Data with Microsoft R

Learn how to use Microsoft R Server to analyze large datasets using R, one of the

most powerful programming languages.

The open-source programming language R has long been popular (particularly in

academia) for data processing and statistical analysis. Among R's strengths are its

succinct syntax and its extensive repository of third-party libraries for performing

all kinds of analyses. Together, these two features let a data scientist go very

quickly from raw data to summaries, charts, and even full-blown reports. However,

R traditionally uses a lot of memory, both because it needs to load the entire

dataset as a data.frame object and because processing the data often involves

making further copies (a behavior sometimes referred to as copy-on-modify). This is

one of the reasons R has been adopted more slowly in industry than in academia.

The main component of Microsoft R Server (MRS) is the RevoScaleR package,

an R library that offers functions for processing large datasets without having to

load them into memory all at once. RevoScaleR offers a rich set of distributed

statistical and machine learning algorithms, which is extended over time. Finally,

RevoScaleR also offers a mechanism for taking code developed on a laptop and

deploying it to a remote server such as SQL Server or Spark (where the

infrastructure is very different under the hood) with minimal effort.
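
The general idea of processing a dataset without loading it all into memory can be sketched in a few lines. The example below is not RevoScaleR code and does not use its API; it is a plain Python illustration, under stated assumptions, of computing summary statistics one chunk at a time. The helper names `read_chunks` and `chunked_mean_and_variance`, the column name `fare`, and the tiny in-memory sample standing in for a large file are all hypothetical.

```python
import csv
import io
from itertools import islice


def read_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean_and_variance(handle, column, chunk_size=1000):
    """Accumulate count, sum, and sum of squares so only one chunk is in memory."""
    n, total, total_sq = 0, 0.0, 0.0
    for chunk in read_chunks(handle, chunk_size):
        values = [float(row[column]) for row in chunk]
        n += len(values)
        total += sum(values)
        total_sq += sum(v * v for v in values)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return n, mean, variance


# A small in-memory CSV stands in for a file too large to load at once.
sample = io.StringIO("fare\n" + "\n".join(str(v) for v in range(1, 11)))
n, mean, variance = chunked_mean_and_variance(sample, "fare", chunk_size=3)
print(n, mean, variance)
```

Only running totals survive between chunks, so memory use depends on the chunk size rather than on the size of the dataset. The sum-of-squares formula is kept for brevity; a numerically more stable update (for example Welford's method) would be preferable for real data.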

In this course, we will show you how to use MRS to run an analysis on a large

dataset and provide some examples of how to deploy it on a Spark cluster or a SQL

Server database. Upon completion, you will know how to use R for big-data

problems.

Since RevoScaleR is an R package, we assume that the course participants are

familiar with R. A solid understanding of R data structures (vectors, matrices, lists,

data frames, environments) is required. Familiarity with third-party packages such as

dplyr is also helpful.

What you'll learn

You will learn how to use MRS to read, process, and analyze large datasets,

including how to:

• Read data from flat files into R’s data frame object, investigate the

structure of the dataset and make corrections, and store prepared

datasets for later use

• Prepare and transform the data

• Calculate essential summary statistics, do crosstabulation, write your own

summary functions, and visualize data with the ggplot2 package

• Build predictive models, evaluate and compare models, and generate

predictions on new data

Duration: 2 weeks

Total effort: 8 – 16 hours

Level: Intermediate

Prerequisite knowledge:

• Familiarity with R

Language: English, with a Q&A workshop in Croatian

Unit 4 - Capstone Project: Data Science

Solve a real-world data science problem in this capstone project for the Microsoft

Professional Program in Data Science.

Showcase the knowledge and skills you’ve acquired during the Microsoft

Professional Program for Data Science, and solve a real-world data science problem

in this program capstone project. The project takes the form of a challenge in which

you will explore a dataset and develop a machine learning solution that is tested and

scored to determine your grade.

Duration: 4 weeks

Total effort: 12 – 16 hours

Level: Advanced

Language: English, with a Q&A workshop in Croatian