Want Awesome Models? Build Awesome Training Data!
Transcript of Want Awesome Models? Build Awesome Training Data!
©2016 RapidMiner, Inc. All rights reserved. - 1 -
Expertly Prepared Data Produces Better Models, Faster!
Tom Ott, Marketing Data Scientist
RapidMiner
Today’s Agenda
• Introduction
• Challenges of Dirty Data
• Data Prep Overview
  – Data Exploration
  – Data Blending
  – Data Cleansing
• Demo
• Q&A
Unified Platform Accelerates Time to Value
• Data Prep: Speed & optimize ALL data exploration, blending & cleansing tasks
• Model & Validate: Apply machine learning to rapidly prototype & confidently validate predictive models
• Operationalize: Easily deploy & maintain models and embed analytic results
Incorporate all types of data. Embed results in all types of business apps & data visualization tools.
Data in the Real World is… Dirty
• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  – e.g., occupation=“”
• Noisy: containing errors or outliers
  – e.g., Salary=“-10”, Age=“222”
• Inconsistent: containing discrepancies in codes or names
  – e.g., Age=“42” Birthday=“03/07/1997”
  – e.g., Was rating “1,2,3”, now rating “A, B, C”
  – e.g., discrepancy between duplicate records
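The three kinds of dirty data on this slide can be checked for programmatically. The following is a minimal Python sketch, not RapidMiner functionality: the record layout, field names, the 0 to 120 age range, and the reference date are all assumptions chosen to mirror the slide's examples.

```python
from datetime import date

# Hypothetical records modeled on the slide's examples; field names are assumptions.
records = [
    {"occupation": "", "salary": -10, "age": 222, "birthday": date(1950, 1, 1)},
    {"occupation": "engineer", "salary": 52000, "age": 42, "birthday": date(1997, 3, 7)},
    {"occupation": "teacher", "salary": 48000, "age": 30, "birthday": date(1986, 1, 15)},
]

def age_on(birthday, today):
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)

def find_issues(record, today):
    """Return a list of data-quality problems found in one record."""
    issues = []
    if not record["occupation"]:
        issues.append("incomplete: occupation is empty")
    if record["salary"] < 0:
        issues.append("noisy: negative salary")
    if not 0 <= record["age"] <= 120:  # assumed plausible range
        issues.append("noisy: implausible age")
    elif abs(age_on(record["birthday"], today) - record["age"]) > 1:
        issues.append("inconsistent: age does not match birthday")
    return issues

today = date(2016, 6, 1)  # assumed reference date for the age check
report = [find_issues(r, today) for r in records]
for record, issues in zip(records, report):
    print(record["occupation"] or "<missing>", issues)
```

Only the third record passes all checks; the first is both incomplete and noisy, and the second is inconsistent because the stated age cannot match a 1997 birthday.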
Time Consuming
• Every real-world dataset needs some kind of data pre-processing
  – Deal with missing values
  – Correct erroneous values
  – Select relevant attributes
  – Adapt data set format to the model type
• In general, data prep or pre-processing consumes more than 60% of a data science project's effort
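The four pre-processing tasks listed on this slide can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not how RapidMiner implements them: the rows, column names, median imputation, and one-hot encoding are all choices made for the sketch.

```python
from statistics import median

# Hypothetical raw rows; column names and values are assumptions.
rows = [
    {"id": 1, "age": 34, "salary": 50000, "segment": "A"},
    {"id": 2, "age": None, "salary": 62000, "segment": "B"},
    {"id": 3, "age": 222, "salary": 58000, "segment": "A"},
    {"id": 4, "age": 45, "salary": 71000, "segment": "C"},
]

def is_valid_age(value):
    """An age is usable if it is present and within an assumed plausible range."""
    return value is not None and 0 <= value <= 120

# Steps 1 and 2: treat erroneous ages like missing ones, then fill both
# with the median of the valid ages.
fill_age = median(r["age"] for r in rows if is_valid_age(r["age"]))
cleaned = [
    {**r, "age": r["age"] if is_valid_age(r["age"]) else fill_age}
    for r in rows
]

# Step 3: select relevant attributes (the row id carries no signal, so it is
# left out of the feature vector below).
# Step 4: adapt the format to a model that expects numeric vectors by
# one-hot encoding the categorical "segment" column.
segments = sorted({r["segment"] for r in cleaned})
matrix = [
    [r["age"], r["salary"]] + [1 if r["segment"] == s else 0 for s in segments]
    for r in cleaned
]

for vector in matrix:
    print(vector)
```

Median imputation is only one option; dropping incomplete rows or predicting the missing value from other attributes are common alternatives.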
Reduces Model Accuracy & Performance
It’s Time to Wrangle Some Data!
• Data Exploration
  – Discovery through Stats, Charts and Graphs
• Data Blending
  – Attribute Selection & Generation
  – Data Types & Conversions
  – Filters, Sorts & Joins
  – Sampling
• Data Cleansing
  – Missing Values
  – Transformation - Normalization
  – Outliers
  – Feature Selection
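Two of the cleansing steps named above, outlier handling and normalization, can be illustrated with a short Python sketch. The slides do not say which methods the demo uses; the 1.5 × IQR rule, min-max scaling, and the sample values here are assumptions for illustration.

```python
from statistics import quantiles

# Hypothetical "age" column containing one obvious outlier.
ages = [21, 25, 29, 33, 38, 41, 47, 222]

# Outliers: flag values outside 1.5 * IQR beyond the quartiles (Tukey's rule).
q1, _, q3 = quantiles(ages, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower or a > upper]
kept = [a for a in ages if lower <= a <= upper]

# Normalization: min-max scale the remaining values into [0, 1].
lo, hi = min(kept), max(kept)
scaled = [(a - lo) / (hi - lo) for a in kept]

print("outliers:", outliers)
print("scaled:", [round(s, 2) for s in scaled])
```

Removing outliers before scaling matters here: with 222 left in, min-max scaling would squeeze every other value into a narrow band near zero.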
Demonstration
Next Steps
• Resources
  – RapidMiner Blog: rapidminer.com/resources/blog/
  – RapidMiner Community: community.rapidminer.com
• On-Demand Demos
  – Advanced Data Prep: rapidminer.com/resource/advanced-data-prep/
  – Data Prep Subprocess: rapidminer.com/resource/creating-data-prep-Subprocess
• Training Videos
  – Data Exploration: rapidminer.com/training/videos/
  – Data Prep: rapidminer.com/training/videos/
Contact Us
[email protected]
@RapidMiner
www.rapidminer.com
Q & A
Discuss Data Prep in the Community
community.rapidminer.com