Inspire 2015 - Alteryx: Data Blending: Best Practices

#inspire 15 Data Blending: Best Practices Tuesday, May 19, 2015 Ben Gomez, Senior Product Manager, Alteryx Dr. Poornima Farrar, Product Manager, Alteryx

Transcript of Inspire 2015 - Alteryx: Data Blending: Best Practices

Page 1: Inspire 2015 - Alteryx: Data Blending: Best Practices

#inspire15

Data Blending: Best Practices

Tuesday, May 19, 2015

Ben Gomez, Senior Product Manager, Alteryx
Dr. Poornima Farrar, Product Manager, Alteryx

Page 2: Inspire 2015 - Alteryx: Data Blending: Best Practices

Agenda

• Develop Workflows Effectively
  • Evaluate the Data
  • Sample the Data

• Develop Clear Workflows
  • Rename Fields
  • Simplify the Process

• Develop Efficient Workflows
  • Sort Data Sparingly
  • Organize Data Sources
  • Process Near the Data

Page 3: Inspire 2015 - Alteryx: Data Blending: Best Practices

Effective Workflow Development

Page 4: Inspire 2015 - Alteryx: Data Blending: Best Practices

Effective Workflows

Evaluate the Data

• Data problems can slow down your workflow development or give you invalid results
  • Duplicate records
  • Missing values
  • Unexpected characters
  • Invalid values or ranges
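
The four kinds of data problems listed above can be checked with a small profiling pass before building a workflow. The following is a minimal Python sketch of that idea, not the Field Summary tool itself; the function name `profile_field`, its parameters, and the sample values are all hypothetical.

```python
from collections import Counter


def profile_field(values, valid_range=None, allowed_chars=None):
    """Count duplicates, missing values, unexpected characters and
    out-of-range numbers in one column (hypothetical helper)."""

    def is_missing(v):
        return v is None or str(v).strip() == ""

    missing = sum(1 for v in values if is_missing(v))
    present = [v for v in values if not is_missing(v)]

    # Every repeat beyond the first occurrence counts as a duplicate.
    counts = Counter(present)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)

    unexpected = 0
    if allowed_chars is not None:
        unexpected = sum(
            1 for v in present if any(ch not in allowed_chars for ch in str(v))
        )

    out_of_range = 0
    if valid_range is not None:
        low, high = valid_range
        for v in present:
            try:
                number = float(v)
            except (TypeError, ValueError):
                continue  # non-numeric values are reported as unexpected, not out of range
            if not low <= number <= high:
                out_of_range += 1

    return {
        "missing": missing,
        "duplicates": duplicates,
        "unexpected_chars": unexpected,
        "out_of_range": out_of_range,
    }


# Made-up "age" column: one blank, one None, one repeat, one typo ("4O"), one bad value.
ages = ["34", "", "34", "210", "4O", None]
report = profile_field(ages, valid_range=(0, 120), allowed_chars="0123456789")
print(report)
```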

Page 5: Inspire 2015 - Alteryx: Data Blending: Best Practices

Demo – Field Summary

Page 6: Inspire 2015 - Alteryx: Data Blending: Best Practices

Effective Workflows

Sample the Data

Sample limits the data stream to a number, percentage or random set of records.

Random % Sample returns a randomly selected number or percentage of the records passing through the data stream.

Oversample Field samples incoming data to ensure equal representation of data values.
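
The three sampling styles above can be approximated outside Alteryx as follows. This is a rough Python sketch under stated assumptions: the function names are hypothetical, and `balance_by_field` only imitates the "equal representation" idea by trimming every group to the size of the smallest one, which may differ from how the Oversample Field tool is actually configured.

```python
import random
from collections import defaultdict


def first_n(records, n):
    """Limit the stream to the first n records."""
    return records[:n]


def random_percent(records, percent, seed=None):
    """Keep a random percentage of the records."""
    rng = random.Random(seed)
    k = round(len(records) * percent / 100)
    return rng.sample(records, k)


def balance_by_field(records, field, seed=None):
    """Give each value of `field` equal representation by randomly
    trimming every group to the size of the smallest group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    if not groups:
        return []
    size = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, size))
    return balanced


# Made-up data: 100 records, 10 of which have churned == True.
records = [{"id": i, "churned": i % 10 == 0} for i in range(100)]

head = first_n(records, 5)
tenth = random_percent(records, 10, seed=1)
balanced = balance_by_field(records, "churned", seed=1)
print(len(head), len(tenth), len(balanced))
```

Developing against a small sample like `head` or `tenth` keeps each test run short; the full data set can be switched back in once the logic is settled.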

Page 7: Inspire 2015 - Alteryx: Data Blending: Best Practices

Clear Workflows

Page 8: Inspire 2015 - Alteryx: Data Blending: Best Practices

Clear Workflows

Rename fields

Page 9: Inspire 2015 - Alteryx: Data Blending: Best Practices

Clear Workflows

Simplify the Process

How would one parse an email address?

[email protected]

([^@]*)(@)([^\.]*)(.*)
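
To show what the four capture groups in that expression pick out, here is a short Python sketch that applies the same pattern. The sample address is made up (the address on the slide is redacted in this transcript), and the `parse_email` helper and its field names are hypothetical.

```python
import re

# Same pattern as on the slide: user, "@", domain up to the first dot, the rest.
EMAIL_PATTERN = re.compile(r"([^@]*)(@)([^\.]*)(.*)")


def parse_email(address):
    """Split an address into user, domain and suffix; None if there is no "@"."""
    match = EMAIL_PATTERN.match(address)
    if match is None:
        return None
    user, _separator, domain, suffix = match.groups()
    return {"user": user, "domain": domain, "suffix": suffix}


parsed = parse_email("jane.doe@example.com")  # hypothetical address
print(parsed)
```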

Page 10: Inspire 2015 - Alteryx: Data Blending: Best Practices

Demo - Parsing

Page 11: Inspire 2015 - Alteryx: Data Blending: Best Practices

Demo - Data Macros

Page 12: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Page 13: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Sort Data Sparingly

• Sorting is an expensive operation.
• Sorting is necessary for several operations.

• When sorting, the more data in each record, the longer the sort will take.

• Alteryx holds onto a sort if possible.
  • Formula resets the sort.
  • Sorting by a new field resets the sort.

Page 14: Inspire 2015 - Alteryx: Data Blending: Best Practices

Demo - Sorting

Page 15: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Gathering Data Sources

http://www.alteryx.com/technical-specifications#data-sources

Page 16: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Configuring Data Sources

Format Selection

Bulk Load

Page 17: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Configuring Data Sources

Page 18: Inspire 2015 - Alteryx: Data Blending: Best Practices

Efficient Workflows

Processing Near the Data

• Private/Public Server
• Amazon Redshift and S3
• Marketo and Salesforce

Page 19: Inspire 2015 - Alteryx: Data Blending: Best Practices

Summary

• Evaluate and clean your data (Field Summary Tool)
• Simplify your process when possible
• Rename your fields
• Control your sorts
  • Set data aside and rejoin it later
    • Best: Add a Record ID field early that can be used to rejoin records later
    • More Advanced: Keep track of records and join by record position
• Create Input Macros
• Keep your processing close to your data sources
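
The "set data aside and rejoin it later" advice can be illustrated outside Alteryx with a short Python sketch. It assumes a list of dictionaries stands in for the data stream; the `RecordID` field name, the helper names, and the sample rows are all hypothetical. Only the record ID and the sort key are sorted, and the wide columns are rejoined afterwards by ID.

```python
def add_record_id(records, start=1):
    """Attach a sequential RecordID to every record, early in the flow."""
    return [{"RecordID": i, **rec} for i, rec in enumerate(records, start=start)]


def rank_by(records, sort_field):
    """Sort a narrow copy (RecordID + sort key), then rejoin the wide data."""
    # Set the wide columns aside: the sort only sees two small fields.
    narrow = [{"RecordID": r["RecordID"], sort_field: r[sort_field]} for r in records]
    narrow.sort(key=lambda r: r[sort_field], reverse=True)

    # Rejoin the full records by RecordID, keeping the sorted order.
    by_id = {r["RecordID"]: r for r in records}
    return [
        {**by_id[r["RecordID"]], "rank": position}
        for position, r in enumerate(narrow, start=1)
    ]


# Made-up rows; "notes" stands in for bulky columns that make sorting slower.
rows = add_record_id([
    {"customer": "A", "amount": 10, "notes": "long free text " * 50},
    {"customer": "B", "amount": 30, "notes": "long free text " * 50},
    {"customer": "C", "amount": 20, "notes": "long free text " * 50},
])
ranked = rank_by(rows, "amount")
print([(r["customer"], r["rank"]) for r in ranked])
```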

Page 20: Inspire 2015 - Alteryx: Data Blending: Best Practices

THANK YOU!
