BIA 660 Web Analytics - Midterm Akshta Chougule Hao Han Di Huo Xi Lu Laura Sills Bank Of America.
BIA 660 Web Analytics - Midterm
Akshta Chougule, Hao Han, Di Huo, Xi Lu, Laura Sills
Bank Of America
Business Problem
Customer Strategy: grow base by forming life-long banking relationships with young adults
Current Account Demographics Report Shows
● fewer new student accounts
● increase in cancellation of accounts by the young adult demographic
Impact: Losing market share to other banks
Business Questions
● What is Bank of America’s reputation with this age group - do they like Bank of America or not?
● How does Bank of America compare to other banks?
● Are customers in this demographic group unhappy with the bank’s services?
● Are there banking products that customers in this group want but Bank of America does not offer?
Source of Information
Online social media sites are a good source for comments from this age group
YouTube Statistics
● More than 1 billion unique users monthly
● Nielsen ratings show that YouTube reaches more US adults ages 18-34 than any cable network
http://www.youtube.com/yt/press/statistics.html
Demographics of Reddit
http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-one-chart/277513/
What do People Think About Banks?
Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%      1%       0%
CEO                      2%      2%       0%
Data Gathering and Validation
Use Python to obtain comments from the web
● Crawling Reddit
● API for Twitter
● API for YouTube
Data Cleansing and Exploration
● Delete incomplete comments, extra whitespace, punctuation, and stopwords
● Explore data using Python to analyze the frequency of words in the comments in order to identify “key words” related to banking
● Word scan confirmed the key words
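The cleansing and word-frequency steps above can be sketched as follows. This is a minimal illustration, not the team's actual script: the stopword list, the sample comments, and the helper names `clean` and `word_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "my",
             "is", "in", "for", "on", "i", "me", "it"}

def clean(comment):
    """Lowercase, replace punctuation with spaces, collapse whitespace, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", comment.lower())
    return [w for w in text.split() if w not in STOPWORDS]

def word_frequencies(comments):
    """Count how often each remaining word appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(clean(comment))
    return counts

# Hypothetical sample comments, not taken from the collected data.
comments = [
    "The overdraft fee on my account is outrageous!",
    "Closed my account...   moving to a credit union.",
    "Another fee?  The teller could not explain it.",
]
freq = word_frequencies(comments)
print(freq.most_common(5))
```

Words that rank high in such a count (for example "account" or "fee") are candidates for the banking "key words".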
Gathering data from Twitter
● Technique: Twitter API
● Amount of tweets:
BOA -- 125 KB
Citibank -- 104 KB
Chase -- 100 KB
● Time window: 1 week
● Type of data:
Tweet text
Tweet created_at
Geocode
Data Processing
● Two word lists (lexicons): positive & negative
● Score each tweet
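One possible way to score a tweet against the positive and negative lists is sketched below. The exact formula is not given in the slides; this sketch assumes the score is (positive matches - negative matches) divided by the number of words. The word lists here are tiny hypothetical stand-ins, and `score_tweet` is an illustrative name.

```python
import re

# Hypothetical miniature lexicons; real sentiment lexicons hold thousands of words.
POSITIVE = {"good", "great", "love", "helpful", "thanks"}
NEGATIVE = {"bad", "worst", "hate", "fee", "fraud"}

def score_tweet(text):
    """Return (positive hits - negative hits) / word count, or 0.0 for an empty tweet.

    Normalising by tweet length is an assumption, not something the slides state.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

for tweet in ["Great service thanks", "worst bank ever hate it", "just opened an account"]:
    print(tweet, "->", round(score_tweet(tweet), 3))
```

Under this scheme a score above 0 leans positive, below 0 leans negative, and 0 is neutral or balanced.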
Tweets by Location
Data Processing
● Summary for BOA tweets:
● Good or bad?
Min.      1st Qu.   Median   Mean      3rd Qu.  Max.
-0.20000  -0.04348  0.00000  -0.01176  0.02857  0.20000
Competitor Analysis
Distribution for tweets’ score
Mean:
BOA: -0.01176
Citibank: -0.0006146
Chase: -0.00731
Two Sample T-test
Null hypothesis: true difference in means is equal to 0
Alpha = 0.1
● BOA and Citibank: p-value = 0.0009004 < 0.1 (reject the null hypothesis)
● Citibank and Chase: p-value = 0.06971 < 0.1 (reject the null hypothesis)
● BOA and Chase: p-value = 0.2289 > 0.1 (fail to reject the null hypothesis)
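A two-sample comparison of this kind can be sketched with the standard library alone. The sketch assumes Welch's unequal-variance form of the test and uses a normal approximation for the p-value, which is reasonable only for large samples such as a week of tweets. The function name `welch_t_test` and the sample scores are illustrative, not the figures behind the slides.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Return (t, df, p) for H0: mean(a) - mean(b) = 0.

    t and df follow Welch's formulas. p is a two-sided value from the
    standard normal distribution, a large-sample approximation of the t
    distribution (an assumption of this sketch).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p

# Hypothetical sentiment scores for two banks.
bank_a = [-0.05, 0.00, -0.10, 0.02, -0.04, 0.00, -0.08, 0.03]
bank_b = [0.01, 0.00, 0.04, -0.02, 0.05, 0.02, 0.00, 0.03]

ALPHA = 0.1
t, df, p = welch_t_test(bank_a, bank_b)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
print("reject H0" if p < ALPHA else "fail to reject H0")
```

With small samples the exact t distribution should be used in place of the normal approximation.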
Gathering data from YouTube
● Techniques: BeautifulSoup, g.data
● Amount for general analysis: 3097
YouTube data for each category
● Training data: 600
● Loan: 2430
● Account: 2700
● Service: 520
Naive Bayes Classification Algorithm
A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
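In symbols, the independence assumption lets the class posterior factor into per-feature terms. The notation is introduced here for illustration: c is a class (for example a complaint category) and x_1, ..., x_n are the features (for example words) of a comment.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The classifier assigns each comment to the class with the largest product.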
Splitting the dataset into training and test data
(Manual rating of comments)
● Training (400)
● Testing (200)
● Predicting (5700)
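The train / test / predict workflow can be sketched with a small multinomial naive Bayes classifier. This is an illustrative stand-in, not the team's implementation: the `NaiveBayes` class, the add-one (Laplace) smoothing, the `accuracy` helper, and the hand-labelled sample comments are all assumptions of the sketch.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated words, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # comments per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)   # log prior
            for w in doc.lower().split():
                if w in self.vocab:                  # ignore unseen words
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, docs, labels):
    """Fraction of test comments whose predicted label matches the manual label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)

# Hypothetical manually rated comments, split into training and test sets.
train = [
    ("mortgage rate on my home loan", "mortgage"),
    ("refinance mortgage denied", "mortgage"),
    ("overdraft fee on checking account", "account"),
    ("closed my savings account", "account"),
    ("teller was rude", "service"),
    ("customer service kept me on hold", "service"),
]
test = [
    ("mortgage loan refinance", "mortgage"),
    ("account overdraft fee", "account"),
    ("rude teller", "service"),
]

model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
print("test accuracy:", accuracy(model, [d for d, _ in test], [y for _, y in test]))
print(model.predict("why is there a fee on my account"))  # label for an unrated comment
```

Accuracy is measured only on the held-out test comments; the fitted model is then applied to the remaining unrated comments.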
Primary Categories of Customer Complaints
Accuracy of Classification
● Mortgage: 64.5%
● Accounts: 58.7%
● Service: 68.4%
Mortgage
Account
Service
Thank you!