ACIS 1504 - Introduction to Data Analytics & Business Intelligence Text Mining Data Cleaning.

ACIS 1504 - Introduction to Data Analytics & Business Intelligence

Text Mining

Data Cleaning

Concept Map: Text Mining

Implementation

Mixed Cell References

Design: Accuracy

Random

Search, Left, Right, Mid, Len, &

Paste Values

Objectives

• Define text mining.

• Demonstrate Excel features that support text mining.

Segment A: Text Mining

Text Analytics / Text Mining

• Software that searches vast amounts of unstructured textual data to identify patterns.

Nestle

• Nestle processes Social Media

http://uk.reuters.com/article/video/idUKBRE89P07Q20121026?videoId=238680321

Segment B: Text Functions

Text Mining

• Search: SEARCH

• Parse: LEFT, MID, RIGHT, LEN

• Concatenate: &

Name Example

Open Grades Textfile.xlsx.

Divide Last Name, First Name into two separate columns.

1. Locate the comma (SEARCH)

2. Extract all characters to the left of the comma (LEFT)

3. Locate the end of the full name (LEN)

4. Extract almost all characters between the comma and the end of the name (RIGHT)
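
The four steps above can be sketched outside Excel as well. In a worksheet, assuming the full name sits in a hypothetical cell A2 in the form "Last, First" with exactly one space after the comma, the last name would be roughly =LEFT(A2, SEARCH(",", A2) - 1) and the first name roughly =RIGHT(A2, LEN(A2) - SEARCH(",", A2) - 1). The Python below mirrors that logic so it can be run and checked; the function name and the sample name are illustrative and do not come from Grades Textfile.xlsx.

```python
def split_last_first(full_name):
    """Split 'Last, First' into (last, first), following the four slide steps."""
    # Step 1: locate the comma. Adding 1 gives a 1-based position like SEARCH.
    comma_pos = full_name.find(",") + 1
    if comma_pos == 0:
        raise ValueError("no comma found in name")

    # Step 2: everything to the left of the comma, like LEFT(A2, comma_pos - 1).
    last = full_name[:comma_pos - 1]

    # Step 3: total length of the full name, like LEN(A2).
    total_len = len(full_name)

    # Step 4: characters after the comma AND the space that follows it,
    # like RIGHT(A2, total_len - comma_pos - 1). The extra "- 1" is the
    # "almost all": it skips the space (assumed to be exactly one).
    n_right = total_len - comma_pos - 1
    first = full_name[total_len - n_right:]

    return last, first


# Hypothetical sample value, not taken from the course workbook.
last_name, first_name = split_last_first("Smith, John")
```

The extra 1 subtracted in step 4 is what makes it "almost all" of the remaining characters: it drops the space after the comma.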

SEARCH Function

LEFT Function

LEN or Length Function

RIGHT Function

MID Function

Extract the first initial of the first name.
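
As a rough sketch, and assuming the same "Last, First" layout in a hypothetical cell A2 with one space after the comma, the Excel formula could look like =MID(A2, SEARCH(",", A2) + 2, 1). The Python equivalent below uses an illustrative function name and sample value.

```python
def first_initial(full_name):
    """Return the first letter of the first name in a 'Last, First' string."""
    # 1-based comma position, like SEARCH(",", A2).
    comma_pos = full_name.find(",") + 1
    if comma_pos == 0:
        raise ValueError("no comma found in name")

    # The first name starts two characters after the comma
    # (one for the comma itself, one for the assumed single space).
    start = comma_pos + 2

    # MID(A2, start, 1): one character beginning at the 1-based position.
    return full_name[start - 1:start]


# Hypothetical sample value.
initial = first_initial("Smith, John")
```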

Concatenate

• Combine First Name, a space, and Last Name.

• & is the concatenate symbol

• Quotes are required around constant strings of text
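
For illustration, assuming the first name is in a hypothetical cell B2 and the last name in C2, the Excel formula would be along the lines of =B2 & " " & C2, where the quoted space is the constant string. The same idea in Python, with made-up sample values:

```python
def join_name(first_name, last_name):
    """Combine first name, a space, and last name."""
    # "+" plays the role of Excel's "&"; the quoted " " is the constant string.
    return first_name + " " + last_name


# Hypothetical sample values.
display_name = join_name("John", "Smith")
```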

Student ID Example

Extract each student’s PID from their email address.

Create a new student identifier by combining the first three letters of the last name with the last four digits of the student ID number.
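
One possible approach, assuming the PID is the part of the email address before the "@" (the slide does not spell this out): in Excel, with hypothetical cells E2 (email), B2 (last name), and D2 (student ID number), the formulas might be =LEFT(E2, SEARCH("@", E2) - 1) and =LEFT(B2, 3) & RIGHT(D2, 4). The Python sketch below follows the same steps; all names and sample values are invented for the example.

```python
def pid_from_email(email):
    """Return the text before '@', assumed here to be the student's PID."""
    # 1-based position of "@", like SEARCH("@", E2).
    at_pos = email.find("@") + 1
    if at_pos == 0:
        raise ValueError("no @ found in email address")
    # LEFT(E2, at_pos - 1): everything before the "@".
    return email[:at_pos - 1]


def new_identifier(last_name, student_id):
    """First three letters of the last name plus last four digits of the ID."""
    # LEFT(B2, 3) & RIGHT(D2, 4); str() lets the ID be a number or text.
    return last_name[:3] + str(student_id)[-4:]


# Hypothetical sample values.
pid = pid_from_email("jsmith@example.edu")
identifier = new_identifier("Smith", "905123456")
```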

Segment C: Data Cleaning & Generation

Data Cleaning

• Delete Unnecessary Columns & Rows

• Resize Columns

• Format Numeric Values

• Separate Distinct Values

• Shorten Lengthy Values

• Data Validation for Future Entries

• Generate Values

Favorite Pie Example

1. Ensure pie flavor data is consistent.

2. Replace confidential clicker ID # with a randomly generated 6-digit number.

3. Ensure new ID number is static and unique.
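
Step 1 can be illustrated with a small sketch. It assumes that "consistent" means the same flavor is always spelled and capitalized the same way; the sample responses and the alias table are hypothetical, not taken from the class data.

```python
def normalize_flavor(raw, aliases):
    """Return a consistent spelling for one pie flavor response."""
    # Collapse stray spaces and use one capitalization style.
    cleaned = " ".join(raw.split()).title()
    # Map known variants or misspellings to a single agreed value.
    return aliases.get(cleaned, cleaned)


# Hypothetical variants found after sorting the column and scanning it.
aliases = {"Pumkin": "Pumpkin", "Apple Pie": "Apple"}

# Hypothetical raw responses.
responses = ["apple", " Apple pie", "PUMPKIN", "pumkin"]
consistent = [normalize_flavor(r, aliases) for r in responses]

# Sorting the distinct values makes any remaining inconsistencies easy to spot.
distinct = sorted(set(consistent))
```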

Pie flavor data shown in three stages: Original, Sorted, Consistent.

Random Number Functions

• =RAND()

• =RANDBETWEEN(low#, high#)
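
In Excel, =RAND() returns a decimal from 0 up to (but not including) 1, and =RANDBETWEEN(100000, 999999) would return a random 6-digit whole number. Both recalculate whenever the sheet changes, and neither guarantees uniqueness. The sketch below shows one way to get 6-digit IDs that are unique and generated only once; the function name, the seed parameter, and the sample clicker IDs are assumptions made for the example.

```python
import random


def generate_unique_ids(count, low=100000, high=999999, seed=None):
    """Return `count` distinct random integers between low and high, inclusive."""
    if count > high - low + 1:
        raise ValueError("not enough distinct values in the range")
    rng = random.Random(seed)
    ids = []
    seen = set()
    while len(ids) < count:
        # randint is inclusive on both ends, like RANDBETWEEN(low, high).
        candidate = rng.randint(low, high)
        if candidate not in seen:  # reject duplicates so every ID is unique
            seen.add(candidate)
            ids.append(candidate)
    return ids


# Hypothetical confidential clicker IDs.
clicker_ids = ["A1F3", "B27C", "C9D0"]

# Generated once and stored, so the values stay static afterwards
# (comparable to pasting the formula results back as values).
new_ids = dict(zip(clicker_ids, generate_unique_ids(len(clicker_ids))))
```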

Paste Special - Values

Mac: Edit menu, Paste Special

Exam Feedback Example

Open Exam Feedback.xlsx