Introduction to Data Science and Data...

Introduction to Data Science and Data Visualization

disi.unal.edu.co/.../talks/RapidminerJCEstevez.pdf

Rapidminer

Juan Camilo Estevez Cárdenas

July 5th to 29th of 2016

Juan Camilo Estevez Cárdenas

Systems Engineer, Universidad Nacional de Colombia, 2013

Master's in Industrial Engineering, Universidad Nacional de Colombia, 2015

Teaching Assistant Scholarship, Computer Programming, Universidad Nacional de Colombia, 2013 – 2014

Universidad de Buenos Aires (UBA): IT Project Management, Intelligent Systems, 2015 – I

Project Management Professional (PMP), Project Management Institute

Organizational analytical evolution

Advanced analytics

Business Intelligence Architecture

(Rapidminer,2015)

(Chaudhuri,2011)

Rapidminer

OPEN SOURCE DATA SCIENCE PLATFORM

Prep data, create models, validate, operationalize and embed in business processes.

https://rapidminer.com/

http://www.kdnuggets.com/

Code-free tool for data scientists

Characteristics

Connection to different data sources

- Excel, CSV, databases, text files, Dropbox, Amazon, Twitter, Salesforce.

Preprocessing or data preparation (formatting and cleaning)

- Attribute creation

- Attribute formatting and cleaning

- Table operations, replacements

- Filters

- Type conversions

- Missing value treatment

- Normalization

- Outlier treatment
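
Three of the preparation steps listed above (missing value treatment, outlier treatment, normalization) can be sketched in plain Python. This is only an illustration of the ideas, not how Rapidminer implements its operators: the function names, the median fill, the 1.5 × IQR fence, and the sample values are all assumptions made for this example.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute with one missing value and one extreme value.
raw = [10.0, 12.0, None, 11.0, 13.0, 95.0]
clean = min_max_normalize(clip_outliers(impute_missing(raw)))
```

The order matters in this sketch: imputing first gives the quartile computation a complete column, and clipping before normalization keeps a single extreme value from compressing the rest of the range.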

Characteristics

Modeling (Data mining)

- Predictive

- Segmentation (Clustering)

- Classification

- Association

- Correlation

Model validation

- Cross validation, split validation...
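
The difference between the two validation schemes named above can be shown with row indices alone. The sketch below is a hedged illustration in plain Python; the function names, the default ratio, and the number of folds are assumptions for this example and do not describe Rapidminer's operators.

```python
import random

def split_validation(n_rows, train_ratio=0.7, seed=0):
    """Shuffle row indices once and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_ratio)
    return indices[:cut], indices[cut:]

def cross_validation(n_rows, k=5, seed=0):
    """Yield (train, test) index lists; each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Split validation trains and tests once, so the estimate depends on a single partition. Cross validation repeats the procedure k times and averages the results, at k times the training cost.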

Characteristics

Extensions

- Series

- R

- Python

- Text processing

- Weka

- Reporting

Learning rapidminer

Documentation

- Web page: http://docs.rapidminer.com/

- Stand alone installation:

Examples

- Welcome window

- Click on an operator and review the help menu

- Repository Samples

Rapidminer Academia

https://rapidminer.com/academia/students/

Rapidminer example Beauty Data

- Load data from BeautyData.csv

- Exploratory data analysis.

- Example of a decision tree with Rapidminer
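
For readers who want to see what the three steps above do under the hood, here is a minimal Python sketch. The slides do not show the contents of BeautyData.csv, so the inline data, the column names (`age`, `visits`, `buys`), and all function names are hypothetical. Only the first split of a tree is computed, and Gini impurity is chosen for simplicity; it is not necessarily the criterion the tool uses.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for BeautyData.csv; the real columns are not shown in the slides.
CSV_TEXT = """age,visits,buys
22,1,no
25,2,no
31,4,yes
35,5,yes
41,3,yes
28,1,no
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({"age": float(rec["age"]),
                     "visits": float(rec["visits"]),
                     "buys": rec["buys"]})
    return rows

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, features, target):
    """Return (feature, threshold, weighted_gini) for the best single split."""
    best = None
    for f in features:
        # Try every observed value except the largest as a "<=" threshold.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

rows = load_rows(CSV_TEXT)                       # load data
class_counts = Counter(r["buys"] for r in rows)  # a first exploratory summary
feature, threshold, impurity = best_split(rows, ["age", "visits"], "buys")
```

A full decision tree would apply `best_split` recursively to the left and right subsets until the nodes are pure or a depth limit is reached.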

Bibliography

Chaudhuri, S., Dayal, U., & Narasayya, V. (2011). An overview of business intelligence technology. Communications of the ACM, 54(8), 88. doi:10.1145/1978542.1978562

Laudon, K. C., & Laudon, J. P. (2012). Management Information Systems (12th ed.). Prentice Hall.

http://businessanalytics.com.mx/2014/08/27/diferencias-entre-business-analytics-y-business-intelligence/

Gartner. Magic Quadrant Survey, 2012.

Rapidminer. 2015. An introduction to advanced analytics.
