BIA 660 Web Analytics - Midterm Akshta Chougule Hao Han Di Huo Xi Lu Laura Sills Bank Of America.

BIA 660 Web Analytics - Midterm

Akshta Chougule, Hao Han, Di Huo, Xi Lu, Laura Sills

Bank Of America

Business Problem

Customer Strategy: grow base by forming life-long banking relationships with young adults

Current Account Demographics Report Shows

● fewer new student accounts

● increase in cancellation of accounts by the young adult demographic

Impact: Losing market share to other banks

Business Questions

● What is Bank of America’s reputation with this age group - do they like Bank of America or not?

● How does Bank of America compare to other banks?

● Are customers in this demographic group unhappy with the bank’s services?

● Are there any banking products that customers in this group want but that Bank of America does not offer?

Source of Information

Online social media sites are a good source for comments from this age group

YouTube Statistics

● More than 1 billion unique users monthly

● Nielsen ratings show that YouTube reaches more US adults ages 18-34 than any cable network

http://www.youtube.com/yt/press/statistics.html

Demographics of Reddit

http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-one-chart/277513/

What do People Think About Banks?

Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%      1%       0%
CEO                      2%      2%       0%

Data Gathering and Validation

Use Python to obtain comments from the web

● Crawling Reddit

● API for Twitter

● API for YouTube

Data Cleansing and Exploration

● Delete incomplete comments, extra whitespace, punctuation, and stopwords

● Explore data using Python to analyze the frequency of words in the comments in order to identify “key words” related to banking

● Word scan confirmed the key words
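
The cleansing and word-frequency steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the stopword list, the sample comments, and the function names `clean` and `word_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the original list is not given).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "my", "for", "in", "on", "i", "me", "so"}


def clean(comment):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", comment.lower())
    return [w for w in text.split() if w not in STOPWORDS]


def word_frequencies(comments):
    """Count how often each remaining word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(clean(comment))
    return counts


# Made-up comments, for illustration only.
comments = [
    "The overdraft fee is ridiculous!!",
    "Closed my account   because of the fee.",
    "Teller at the branch waived the fee for me",
]
freq = word_frequencies(comments)
print(freq.most_common(5))
```

Words that rank high in such a count and relate to banking (here "fee") are candidates for the key-word list.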

Gathering data from Twitter

● Technique: Twitter API

● Amount of tweets:

BOA -- 125 KB

Citibank -- 104 KB

Chase -- 100 KB

● Timestamp: 1 week

● Type of Data:

Tweet text

Tweet created_at

Geocode

Data Processing

● Two libraries: positive & negative

● Score each tweet
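
One common way to score a tweet against a positive and a negative word list is sketched below. The slides do not give the scoring formula, so the normalization by word count is an assumption, and the two mini-lexicons and the name `score_tweet` are hypothetical.

```python
import re

# Hypothetical mini-lexicons; the actual positive and negative word lists are not given.
POSITIVE = {"good", "great", "helpful", "love", "thanks"}
NEGATIVE = {"bad", "terrible", "fee", "hate", "fraud"}


def score_tweet(text):
    """Return (positive hits - negative hits) / word count, or 0.0 for an empty tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


# Made-up tweets, for illustration only.
for tweet in ["I hate the overdraft fee", "great helpful teller today"]:
    print(tweet, "->", score_tweet(tweet))
```

Under this assumed formula a score below zero means negative words outnumber positive ones, and dividing by length keeps long and short tweets comparable.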

Tweets by Location

Data Processing

● Summary for BOA tweets:

● Good or bad?

Min.      1st Qu.   Median   Mean      3rd Qu.  Max.
-0.20000  -0.04348  0.00000  -0.01176  0.02857  0.20000

Competitor Analysis

Distribution of tweet scores

Mean:

BOA: -0.01176

Citibank: -0.0006146

Chase: -0.00731

Two Sample T-test

Null hypothesis: true difference in means is equal to 0

Alpha = 0.1

● BOA and Citibank: p-value = 0.0009004 < 0.1

● Citibank and Chase: p-value = 0.06971 < 0.1

● BOA and Chase: p-value = 0.2289 > 0.1
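
A two-sample comparison of mean scores like the one above can be sketched with Welch's t statistic. The slides do not say which variant of the test was run, so the unequal-variance form is an assumption. The p-value here uses a normal approximation to the t distribution, which is only reasonable for large samples; an exact p-value needs the t distribution's CDF from a statistics library. The function name `welch_t_test` and the sample scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return Welch's t statistic, its degrees of freedom, and an approximate two-sided p-value."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom.
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution (an assumption; fine only for large samples).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Made-up scores, not the project's data; far too few for the approximation to be accurate.
boa = [-0.20, -0.04, 0.00, 0.03, -0.05, 0.02]
citi = [0.00, 0.05, -0.02, 0.10, 0.01, 0.04]
t, df, p = welch_t_test(boa, citi)
alpha = 0.1
print(f"t={t:.3f} df={df:.1f} p~{p:.4f} reject H0: {p < alpha}")
```

The null hypothesis is rejected at the chosen alpha when the p-value falls below it.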

Gathering data from YouTube

● Techniques: BeautifulSoup

g.data

● Amount for general analysis: 3097

YouTube data for each category

● Training data: 600

● Loan: 2430

● Account: 2700

● Service: 520

Naive Bayes Classification Algorithm

A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
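
The independence assumption means a comment's score for a class is just the class prior multiplied by one probability per word. A minimal sketch follows, using the multinomial (word-count) variant with add-one smoothing, which is an assumption since the slides do not say which variant or library was used. The class name `NaiveBayes`, the labels, and the training comments are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over word lists with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    # Each word contributes independently, given the class.
                    logp += math.log((counts[w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up, already tokenized training comments, for illustration only.
docs = [
    ["overdraft", "fee", "account"],
    ["closed", "account", "fee"],
    ["mortgage", "loan", "rate"],
    ["loan", "modification", "mortgage"],
    ["teller", "rude", "branch"],
    ["representative", "rude", "teller"],
]
labels = ["account", "account", "loan", "loan", "service", "service"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["mortgage", "rate"]))
```

Working in log probabilities avoids numeric underflow when comments contain many words.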

Naive Bayes Classification Algorithm

Splitting the dataset into training and test data

(Manual rating of comments)

● Training (400)

● Testing (200)

● Predicting (5700)
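
The 400/200 split of manually rated comments, and the accuracy measured on the held-out 200, can be sketched as below. The labeled data is a synthetic stand-in, the helper names `split_labeled` and `accuracy` are hypothetical, and the majority-label baseline only marks where a trained classifier would go.

```python
import random


def split_labeled(examples, n_train, seed=0):
    """Shuffle a copy of the labeled examples and split it into train and test parts."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predict, test_set):
    """Fraction of test examples whose predicted label matches the manual label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(predict(x) == y for x, y in test_set)
    return correct / len(test_set)


# Synthetic stand-in for 600 manually rated comments (not the project's data).
labeled = [(f"comment {i}", "negative" if i % 3 else "positive") for i in range(600)]
train, test = split_labeled(labeled, n_train=400)

# Placeholder baseline: always predict the most common training label.
train_labels = [y for _, y in train]
majority = max(set(train_labels), key=train_labels.count)
acc = accuracy(lambda x: majority, test)
print(len(train), len(test), round(acc, 3))
```

Measuring accuracy only on comments the classifier never saw during training gives a fairer estimate of how it will do on the remaining unlabeled comments.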

Primary Categories of Customer Complaints

Accuracy of Classification

● Mortgage: 64.5%

● Accounts: 58.7%

● Service: 68.4%

Mortgage

Account

Service

Thank you!