Pictures through Numbers, OpenDataCamp 2012 Bangalore
DATA VISUALISATION
PICTURES USING NUMBERS
S ANAND, DATA SCIENTIST, GRAMENER.COM
You will be shown a set of numbers along with a summary (average, etc.). Can you make sense of the figures?
WHY VISUALISE?
The average price is the same. The average sales is the same too.
The variance in price is the same. So is the variance in sales.
Take a look at the sales report alongside. A company has branches in 4 cities, and each branch changes the product price every month. This leads to a corresponding change in the sales.
Here is the performance of the 4 branches with their monthly price and sales for each month.
Looking at the average, the four branches have an identical performance.
2010       Bangalore       Delhi           Hyderabad       Mumbai
Month      Price  Sales    Price  Sales    Price  Sales    Price  Sales
Jan         10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
Feb          8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
Mar         13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
Apr          9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
May         11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
Jun         14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
Jul          6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
Aug          4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
Sep         12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
Oct          7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
Nov          5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
Average      9.0   7.50      9.0   7.50      9.0   7.50      9.0   7.50
Variance    10.0   3.75     10.0   3.75     10.0   3.75     10.0   3.75
DO THESE FOUR CITIES LOOK IDENTICAL TO YOU?
DO YOU AGREE?
ARE THEY REALLY IDENTICAL? CHECK AGAIN…
But in fact, the four cities are totally different in behaviour.
Bangalore’s sales generally increase with price.
Hyderabad has a nearly perfect increase in sales with price, except for one aberration.
Delhi shows a decline in sales beyond a price of 10.
Mumbai’s sales fluctuate despite a nearly constant price.
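The summary rows of the table can be checked with a short script. The sketch below is an illustration added here, not part of the original talk. It recomputes the average and variance for each city from the figures in the table, assuming the variance shown is the population variance (dividing by the number of months), since that is what reproduces 10.0 and 3.75. The names `PRICES`, `SALES` and `summarise` are made up for this example.

```python
from statistics import mean, pvariance

# Monthly figures for Jan to Nov, copied from the sales report table.
COMMON_PRICES = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

PRICES = {
    "Bangalore": COMMON_PRICES,
    "Delhi": COMMON_PRICES,
    "Hyderabad": COMMON_PRICES,
    # Mumbai holds the price at 8.0 in every month except August.
    "Mumbai": [8.0] * 7 + [19.0] + [8.0] * 3,
}

SALES = {
    "Bangalore": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "Delhi": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "Hyderabad": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "Mumbai": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def summarise(city):
    """Return (avg price, avg sales, price variance, sales variance), rounded to 2 places."""
    prices, sales = PRICES[city], SALES[city]
    return (
        round(mean(prices), 2),
        round(mean(sales), 2),
        round(pvariance(prices), 2),  # population variance: an assumption
        round(pvariance(sales), 2),
    )


for city in PRICES:
    print(city, summarise(city))
```

Every city should print the same four numbers, which is the point of the slide: the summary statistics cannot tell the branches apart, so the monthly figures have to be plotted.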
DETECTING FRAUD
“We know meter readings are incorrect, for various reasons.
We don’t, however, have the concrete proof we need to start the process of meter reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is our inexperience with the tools and analyses needed to identify such patterns.”
ENERGY UTILITY
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.
This clearly shows collusion of some form with the customers.
This happens with specific customers, not randomly. Here are such customers’ meter readings.
Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
   217     219     200     200     200     200     200     200     200     350     200     200
   250     200     200     200     201     200     200     200     250     200     200     150
   250     150     150     200     200     200     200     200     200     200     200     150
   150     200     200     200     200     200     200     200     200     200     200      50
   200     200     200     150     180     150      50     100      50      70     100     100
   100     100     100     100     100     100     100     100     100     100     110     100
   100     150     123     123      50     100      50     100     100     100     100     100
     0     111     100     100     100     100     100     100     100     100      50      50
     0     100      27     100      50     100     100     100     100     100      70     100
     1       1       1     100      99      50     100     100     100     100     100     100
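One way to flag such customers is to measure how often their readings land exactly on a slab boundary. The sketch below is illustrative only: the talk does not list the tariff slabs, so `SLAB_BOUNDARIES` is an assumed set chosen because 50, 100 and 200 recur in the readings above, and `share_on_boundary` is a hypothetical helper, not the method used in the original analysis.

```python
# Assumed tariff slab boundaries, in units. The real slabs are not given in the talk.
SLAB_BOUNDARIES = {50, 100, 200}


def share_on_boundary(readings, boundaries=SLAB_BOUNDARIES):
    """Return the fraction of readings that fall exactly on a slab boundary."""
    if not readings:
        return 0.0
    hits = sum(1 for reading in readings if reading in boundaries)
    return hits / len(readings)


# Monthly readings (Apr-10 to Mar-11) for three of the customers listed above.
customers = {
    "customer 1": [217, 219, 200, 200, 200, 200, 200, 200, 200, 350, 200, 200],
    "customer 6": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 110, 100],
    "customer 10": [1, 1, 1, 100, 99, 50, 100, 100, 100, 100, 100, 100],
}

for name, readings in customers.items():
    print(name, round(share_on_boundary(readings), 2))
```

A genuine household would be expected to hit a boundary only occasionally, so a share close to 1 over twelve months is the kind of pattern worth investigating further.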
Section    Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
Section 1     70%     97%    136%     65%    110%    116%    121%    107%    114%     88%     74%    109%
Section 2     66%     92%     66%     87%     70%     64%     63%     50%     58%     38%     41%     54%
Section 3     90%     46%     47%     43%     28%     31%     50%     32%     19%     38%      8%     34%
Section 4     44%     24%     36%     39%     21%     18%     24%     49%     56%     44%     31%     14%
Section 5      4%     63%    -27%     20%     41%     82%     26%     34%     43%      2%     37%     15%
Section 6     18%     23%     30%     21%     28%     33%     39%     41%     39%     18%      0%     33%
Section 7     36%     51%     33%     33%     27%     35%     10%     39%     12%      5%     15%     14%
Section 8     22%     21%     28%     12%     24%     27%     10%     31%     13%     11%     22%     17%
Section 9     19%     35%     14%      9%     16%     32%     37%     12%      9%      5%     -3%     11%
If we define the “extent of fraud” as the percentage excess of the 100-unit meter reading, the value varies considerably across sections and over time…
New section manager arrives
… and is transferred out
… with some explainable anomalies.
Why would these happen?
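The slide does not spell out how the “percentage excess” is computed. One plausible reading, sketched below purely as an assumption, compares the number of readings at exactly 100 units with the average count at neighbouring values, which would also explain how the figure can go negative. The function name `percent_excess` and the `width` parameter are invented for this example.

```python
from collections import Counter


def percent_excess(readings, unit=100, width=5):
    """Percentage by which the count at `unit` exceeds the average count
    at the neighbouring values within `width` units on either side.

    Returns None when no neighbouring value occurs, since the baseline is
    then zero and the percentage is undefined.
    """
    counts = Counter(readings)
    neighbours = [u for u in range(unit - width, unit + width + 1) if u != unit]
    expected = sum(counts[u] for u in neighbours) / len(neighbours)
    if expected == 0:
        return None
    return 100.0 * (counts[unit] - expected) / expected


# Made-up readings for illustration: every value from 95 to 105 occurs
# three times, and 100 occurs three extra times.
toy_readings = [u for u in range(95, 106) for _ in range(3)] + [100] * 3
print(percent_excess(toy_readings))
```

Computed per section and per month, a measure of this kind would give a grid like the one above, where changes over time can be lined up against events such as a change of section manager.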
EDUCATION
PREDICTING MARKS
What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?
Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
June-borns score the lowest
The marks shoot up for Aug-borns
… and peak for Sep-borns
120 marks out of 1,200 explainable by month of birth
An identical pattern was observed in 2009 and 2010…
… and across districts, gender, subjects, and class X & XII.
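The comparison behind this chart amounts to averaging marks by month of birth and looking at the gap between the best and worst months. The sketch below shows that calculation under assumptions: the record layout (birth month, total marks) and the function names are hypothetical, and the sample values are invented to exercise the code, not taken from the exam data.

```python
from collections import defaultdict


def mean_marks_by_month(records):
    """Average the marks for each birth month.

    `records` is an iterable of (birth_month, marks) pairs, with the month
    given as a number from 1 to 12.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [sum of marks, count]
    for month, marks in records:
        totals[month][0] += marks
        totals[month][1] += 1
    return {month: total / count for month, (total, count) in totals.items()}


def spread(averages):
    """Gap between the highest and lowest monthly average."""
    return max(averages.values()) - min(averages.values())


# Invented sample records, for illustration only.
sample = [(6, 700), (6, 720), (8, 790), (9, 830), (9, 810)]
averages = mean_marks_by_month(sample)
print(averages)
print(spread(averages))
```

Running the same grouping separately by year, district, gender or subject is how one would check whether the pattern holds across those cuts.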
“It’s simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.”
-- Malcolm Gladwell, Outliers
CRICKET
FASTEST SCORERS
“I’ve always been curious… who among India’s prolific one-day run-getters had the best strike rate?
Sachin?
Sehwag?
What about the rest of the world?”
INDIAN ODI BATTING
http://gramener.com/cricket
FINDING PATTERNS
Which securities move together?
How should I diversify?
What should I sell to reduce risk?
What’s a reliable predictor of a security?
SECURITIES
68% correlation between AUD & EUR
Plot of 6 month daily AUD - EUR values
Block of correlated currencies
… clustered hierarchically
… that move counter-cyclically to indices
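Each cell in a correlation view like this one comes from comparing two series of daily values. As a minimal sketch, assuming the figure quoted is the Pearson correlation coefficient (the talk does not say which measure was used), the calculation for one pair looks like this. The function name and the sample series are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Invented daily values for two securities, for illustration only.
series_a = [1.00, 1.02, 1.01, 1.05, 1.04]
series_b = [0.70, 0.71, 0.71, 0.74, 0.72]
print(round(pearson(series_a, series_b), 2))
```

Computing this for every pair of securities gives the matrix that can then be clustered so that securities which move together sit next to each other.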
EDUCATION
VISUALISING CHANGE
WEATHER
What was the weather in India like…
THE LAST 100 YEARS?
DASHBOARDS
WEB ANALYTICS
“Today, we use a 40-page weekly report to summarise our online operations.
This is prepared by a team of 6 analysts pulling data from multiple sources – both online and offline.
We distribute it to the entire senior management team.
I’m fairly sure they don’t read it.”
ASSET MANAGEMENT
COMPUTER USAGE
NETWORKS
EXPLORING RELATIONS
This is the social network of programmers across various Indian cities, using the follower network at Github.com – a Facebook for developers.
Each circle represents a coder. The size shows their number of followers. The colour shows the language they develop in. The lines show whom they follow.
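The circle sizes in such a network come from counting followers in the list of who follows whom. The sketch below is a hypothetical illustration of that step: the edge list format, the function name and the coder handles are all invented, and it does not show how the data was actually collected.

```python
from collections import Counter


def follower_counts(follows):
    """Count followers per coder.

    `follows` is an iterable of (follower, followed) pairs, one per line
    drawn in the network.
    """
    return Counter(followed for _follower, followed in follows)


# Invented follow relationships, for illustration only.
edges = [
    ("coder_a", "coder_b"),
    ("coder_c", "coder_b"),
    ("coder_b", "coder_a"),
    ("coder_d", "coder_b"),
]
counts = follower_counts(edges)
print(counts.most_common())
```

Each count would set the size of a circle, while the pairs themselves become the lines between circles.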
Pune, Chennai, Bangalore
Delhi, Mumbai, Hyderabad
http://gramener.com/codersearch
SIMPLE REDESIGNS
TIMING
We handle terabyte-size data
via non-traditional analytics
and visualise it in real-time.
Gramener visualises your data
Gramener transforms your data into concise dashboards that make your business problem & solution visually obvious. We help you find insights quickly, based on cognitive research, and our visualisations guide you towards actionable decisions.
Gramener
A data analytics and visualisation company