Data Analytics: Using an Interdisciplinary Approach to ...
![Page 1: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/1.jpg)
CS-22-Data Analytics: Using an Interdisciplinary Approach to Teach STEM and non-STEM Students
Grambling State University
Connie Walton, Corisma Akins, Yenumula Reddy
December 2019 Annual SACSCOC Meeting
Date/Time: 12/8/2019: Sunday: 1:30PM - 2:30PM
Location: 351 F, Level 3, GRB
![Page 2: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/2.jpg)
Grambling State University
Founded in 1901
Located in north Louisiana
Enrollment ~5200 students
Offers degrees at the bachelor's, master's, and doctoral levels
Center of Academic Excellence in Mathematical Achievement for Science & Technology
![Page 3: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/3.jpg)
Academic Divisions
• College of Education and Graduate Studies
• College of Business
• College of Professional Studies
• College of Arts & Sciences
Accreditations/Certifications
AACSB, ABET-CS, ABET-TAC, ACEN, ACS-Committee on Professional Training, CAEP, COAPRT, CWSE, NASM, NAST, NASPAA
![Page 4: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/4.jpg)
NSF HBCU-UP FUNDED PROJECT
Expand Data Science/Data Analytics Training of undergraduate STEM and non-STEM Students
![Page 5: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/5.jpg)
Data Analytics
(source of info-https://searchdatamanagement.techtarget.com/definition/data-analytics)
“Data analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
Used to make business decisions and used by researchers to prove or disprove theories.
“Data analytics applications involve more than just analyzing data. Particularly on advanced analytics projects, much of the required work takes place upfront, in collecting, integrating and preparing data and then developing, testing and revising analytical models to ensure that they produce accurate results. In addition to data scientists and other data analysts, analytics teams often include data engineers, whose job is to help get data sets ready for analysis.”
Data from different source systems may need to be combined via data integration routines, transformed into a common format and loaded into an analytics system, such as a Hadoop cluster, NoSQL database or data warehouse.
![Page 6: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/6.jpg)
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/The%20age%20of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.ashx
![Page 7: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/7.jpg)
National need
2011 McKinsey & Company Report
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_exec_summary.ashx
The United States faces a shortage of 140,000 to 190,000 workers with deep analytic skills.
An additional 1.5 million managers and analysts who understand big data science enough to ask the correct questions and use the results effectively to solve problems are also needed.
“just three exabytes of data existed in 1986—but by 2011, that figure was up to more than 300 exabytes. The trend has not only continued but has accelerated since then. One analysis estimates that the United States alone has more than two zettabytes (2,000 exabytes) of data, and that volume is projected to double every three years.”
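As a rough check on the growth figures quoted above, the arithmetic can be sketched in Python. This assumes steady exponential growth, which the quote does not claim for 1986 to 2011; the function names are illustrative, and the quoted "more than" figures are treated as exact values.

```python
import math

def doubling_time(v_start, v_end, years):
    """Years needed to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(v_end / v_start)

def project_volume(v_now, years_ahead, doubling_years=3.0):
    """Volume after years_ahead if it doubles every doubling_years."""
    return v_now * 2 ** (years_ahead / doubling_years)

# 3 exabytes (1986) to 300 exabytes (2011): implied doubling time
implied = doubling_time(3, 300, 2011 - 1986)

# 2 zettabytes, doubling every three years, projected six years ahead
future = project_volume(2.0, 6)

print(f"Implied doubling time 1986-2011: {implied:.1f} years")
print(f"Projected volume after 6 years: {future:.0f} zettabytes")
```

Under these assumptions, two doublings in six years turn 2 zettabytes into 8, and the 1986 to 2011 figures imply a doubling time somewhat longer than three years, consistent with the quote's claim that growth has accelerated.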
![Page 8: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/8.jpg)
Strategies to Expand Data Analytics Skills of GSU Undergraduate Students
Certificate Program in Data Analytics
Infuse Topics into Existing Courses
Undergraduate Research Projects
Professional Development for Faculty
![Page 9: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/9.jpg)
Certificate Courses
INTRO TO DATA ANALYTICS
DATA ANALYTICS STATISTICS
![Page 10: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/10.jpg)
Intro to Big Data Course
3 credit hour Computer Science course
100 level course
Topics include characteristics of big data, sources of big data, big data platforms, text analysis/streams, and an introduction to the R language
![Page 11: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/11.jpg)
Data Analytics Course
Sophomore level course
Learning outcomes include demonstrating a fundamental understanding of the Hadoop Distributed File System, understanding how to test and debug MapReduce applications, and using RHadoop to analyze big data.
Mini-projects infused
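For readers unfamiliar with MapReduce, the idea behind the learning outcomes above can be sketched in a few lines. This is a single-process illustration in plain Python, not Hadoop or RHadoop code and not the course's material; the function names and the sample input are made up for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

sample_lines = ["big data big ideas", "data science"]  # made-up input
counts = word_count(sample_lines)
print(counts)
```

Because each phase is an ordinary function, each can be tested and debugged on a small input before the same logic is run on a cluster.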
![Page 12: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/12.jpg)
Intro to Big Data Course-CS 112
In the first semester it was offered, students felt it was a programming course
Course introduces Apache Hadoop and the R language.
![Page 13: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/13.jpg)
Big Data Science Camp
Middle & high school students
• One Week Summer Camp
• Daily Themes (Healthcare, Sports, Social Media, Natural Disaster, Music, etc.)
• Mini Projects
• Guest Speakers from Different Professions
• Daily Presentations
![Page 14: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/14.jpg)
Sample Social Media Project
![Page 15: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/15.jpg)
Sample Social Media Project
LeBron James was traded during the camp
• Observed data changing in real time
• Experienced how social media data could be used to analyze various aspects of the sports industry
Stephen Curry
LeBron James
![Page 16: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/16.jpg)
Sample Project
Students completed projects using ArcMap and ArcGIS
![Page 17: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/17.jpg)
Lessons Learned from Camp
Use activities that are of interest to students
Health Care
Music
Sports
Natural disasters
Politics
Use activities that show diverse uses of data
![Page 18: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/18.jpg)
Revamped Intro to Big Data Course
Team Taught
Less coding
Solicited mini projects from campus community and alumni
![Page 19: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/19.jpg)
Students Introduced to Data Analysis through Varied Projects
Find common and distinctive words in song lyrics and books
Discover trends in university class registration data
Compare nutritional information from different cereal brands
Correlate bike accidents by weather, conditions, and driver sex
Track the shift in literary genres by distribution of texts
Time how long politicians take to delete typos on
Measure the emotions expressed by social media users
Determine a flower’s species using machine learning
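The first project in the list, finding common and distinctive words, might look roughly like the sketch below. It uses only the Python standard library; the two short texts are made-up stand-ins for real lyrics and book passages, and the function names are illustrative rather than taken from the course.

```python
from collections import Counter
import re

def word_counts(text):
    """Count lowercase words, keeping apostrophes inside words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def common_and_distinctive(text_a, text_b):
    """Return words shared by both texts and words unique to each."""
    a, b = word_counts(text_a), word_counts(text_b)
    common = sorted(set(a) & set(b))
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    return common, only_a, only_b

lyrics = "the night is young and the music is loud"  # made-up sample
book = "the night was dark and the road was long"    # made-up sample

common, only_lyrics, only_book = common_and_distinctive(lyrics, book)
print("Common:", common)
print("Only in lyrics:", only_lyrics)
print("Only in book:", only_book)
```

A real project would likely also drop very frequent words such as "the" and "and" before comparing, since they dominate the common list without saying much about either text.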
![Page 20: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/20.jpg)
Faculty shared the data processes in their research
![Page 21: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/21.jpg)
Faculty created example reports to show some data workflows
![Page 22: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/22.jpg)
Students enrolled in Intro to Big Data presented at the 12th Annual Undergraduate Research Conference hosted at University of Louisiana at Lafayette (November 2019):
• Cancer and Cyberbullying: Monitoring and analyzing Data from Social Media
• Predictive Modelling of Gender Classification with Caret
![Page 23: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/23.jpg)
Certificate Program
Need a certificate program that can address needs of both STEM & non-STEM majors
Need to have core set of required foundational courses that will be taken by both STEM and non-STEM majors
Have one set of required courses for STEM majors and another set of required courses for non-STEM majors (courses at 300 & 400 levels)
Require completion of 18 credit hours, half at 300 & 400 levels
![Page 24: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/24.jpg)
Probability and Statistics I Course
Data Analytics
Basic Probability and Statistical Distributions
Data Manipulation
Data Visualization and Statistical Graphics
Statistical Inference
Techniques for Supervised Learning
Techniques for Unsupervised Learning
![Page 25: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/25.jpg)
Statistics I Course
The focus is on preparing students to use data to obtain information.
Extensive examples using actual data are provided, illustrating diverse informatics sources in socioeconomics, marketing, advertising and finance, among many others.
In many cases, computer code using Python is employed to analyze the data.
![Page 26: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/26.jpg)
Getting Insights from Data (1)
Descriptive Statistics
• Scale Types
• Descriptive Univariate Analysis
• Descriptive Bivariate Analysis
Descriptive Multivariate Analysis
• Multivariate Frequencies
• Multivariate Data Visualization
• Multivariate Statistics
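A minimal sketch of the descriptive univariate and bivariate analysis named on this slide, using Python's standard library. The two small data series are hypothetical, and `pearson_r` is an illustrative helper written out by hand so that the formula is visible.

```python
import statistics
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]         # hypothetical study hours
scores = [52, 55, 61, 70, 72]   # hypothetical exam scores

# Univariate: summarize one variable at a time
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

# Bivariate: describe how two variables move together
r = pearson_r(hours, scores)

print(summary)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship in this sample; it says nothing by itself about causation.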
![Page 27: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/27.jpg)
Getting Insights from Data (2)
Data Quality and Preprocessing
• Data Quality
• Converting to a Different Scale Type
• Data Transformation
• Dimensionality Reduction
Clustering
• Distance Measure
• Clustering Validation
• Clustering Techniques
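The three clustering topics above (a distance measure, a technique, and a validation measure) can be illustrated with a small k-means sketch. The slides do not say which technique the course uses, so k-means with Euclidean distance is an assumption here, as are the sample points and starting centroids; the within-cluster sum of squares is one simple internal validation measure among many.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def k_means(points, centroids, max_iter=20):
    """Alternate assignment and update until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances to assigned centroids (lower is tighter)."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up 2-D points forming two visible groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
wss = within_cluster_ss(points, labels, centroids)

print(labels)
print(centroids)
print(f"Within-cluster sum of squares: {wss:.3f}")
```

Fixed starting centroids keep the example deterministic; in practice the starting points are usually chosen at random and the algorithm is run several times.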
![Page 28: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/28.jpg)
A Project on Data Analytics: Statistics I Course
Understanding the problem to be solved
Defining the objectives of the project
Looking for the necessary data
Preparing these data so that they can be used
Identifying suitable methods and choosing between them
Analyzing and evaluating the results
Redoing the pre-processing tasks and repeating the experiments
![Page 29: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/29.jpg)
Data Analysis Application Examples
Data Munging
Cleaning Data
Filtering
Merging Data
Reshaping Data
Data Aggregation
Grouping Data
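Several of the operations listed above can be shown on a tiny example. The sketch below uses plain Python lists and dictionaries instead of a data-frame library, and all records are made up; it covers cleaning, filtering, merging, grouping, and aggregation, but not reshaping.

```python
from collections import defaultdict

# Made-up records for illustration
students = [
    {"id": 1, "name": " Ana ", "major": "Biology"},
    {"id": 2, "name": "Ben", "major": "biology"},
    {"id": 3, "name": "Cy", "major": "Marketing"},
]
grades = [
    {"id": 1, "score": 88},
    {"id": 2, "score": None},  # missing value
    {"id": 3, "score": 92},
    {"id": 1, "score": 94},
]

# Cleaning: trim stray whitespace and normalize capitalization
clean_students = [
    {**s, "name": s["name"].strip(), "major": s["major"].title()}
    for s in students
]

# Filtering: drop rows with a missing score
valid_grades = [g for g in grades if g["score"] is not None]

# Merging: join grades to students on the shared "id" key
by_id = {s["id"]: s for s in clean_students}
merged = [
    {**by_id[g["id"]], "score": g["score"]}
    for g in valid_grades
    if g["id"] in by_id
]

# Grouping: collect scores by major
groups = defaultdict(list)
for row in merged:
    groups[row["major"]].append(row["score"])

# Aggregation: compute the mean score for each group
mean_by_major = {major: sum(s) / len(s) for major, s in groups.items()}

print(mean_by_major)
```

Note that the cleaning step matters for the later ones: without normalizing "biology" to "Biology", the same major could be split across two groups.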
![Page 30: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/30.jpg)
Infusion of Data Analytics into Existing Courses
![Page 31: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/31.jpg)
Infusion of Big Data in Existing Courses
BIOL 409: Biological Research
CHEM 226: Organic Chemistry Lab
CS 435: Big Data and Cloud Computing
Select Business Courses
![Page 32: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/32.jpg)
Big Data in BIOL 409: Biological Research
Enrollment
• Fall 2018: 6 students
• Spring 2019: 10 students
• Offered only as a Spring course starting 2020
Description
• Training in use of big data analytics in biological research applications, culminating in group project
• In class lectures
• Online bioinformatics modules via Pine Biotech (New Orleans, LA)
• Bioinformatics analyses via T-BioInfo platform
Objectives
• Understand research methodologies and experimental design
• Apply descriptive and inferential statistical methods to datasets
• Analyze Next Generation Sequencing (NGS) datasets using genomic/transcriptomic approaches
![Page 33: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/33.jpg)
BIOL 409 Data Analytics Content
Statistics
• Descriptive: mean, median, mode, range, standard deviation, frequency table, frequency histogram, bivariate scatterplot
• Inferential: Pearson's correlation coefficient, chi-square test, Student's t-test, factor regression, null and alternative hypothesis testing
Transcriptomics
• Map RNA sequencing reads to reference genome using TopHat > Cufflinks > Cuffmerge > Bowtie2-t
• Convert to gene expression levels using RsemExptable
• Find differential gene expression using DESeq2
• Visualize and compress data using Principal Component Analysis
Genomics
• Map genomic sequencing reads to reference genome using Bowtie2
• Call variants using Strelka
• Visualize Single Nucleotide Polymorphisms using UCSC Genome Browser
![Page 34: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/34.jpg)
BIOL 409 Modules to Pipelines to Data
![Page 35: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/35.jpg)
BIOL 409 Project
• Bioinformatics project relevant to students in Environmental Science concentration
• Genes involved in drought resistance
• Discuss journal article
• Use bioinformatics tools to explore published results
• Future: carry out novel analysis
![Page 36: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/36.jpg)
Big Data in CHEM 226: Organic Chemistry Lab
Molecular Modeling Experiment:
• NIH database of small molecules
• Dock molecules in a protein binding site
• Molecules get scored based on various properties such as intermolecular vs. intramolecular bonds
• Put together a drug molecule for the disease state
![Page 37: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/37.jpg)
Seminars on Big Data
Health Disparities
Data Analytics Professionals
Corporate Executives
![Page 38: Data Analytics: Using an Interdisciplinary Approach to ...](https://reader031.fdocuments.net/reader031/viewer/2022020700/61f4b0f7e8a2b704ef4fcfb1/html5/thumbnails/38.jpg)
Contact Information
Dr. Connie Walton
Mrs. Corisma Akins
Dr. Yenumula Reddy