Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood
© 2011 IBM Corporation
Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood
Jeff Jonas, IBM Distinguished Engineer
Chief Scientist, IBM Entity Analytics
January 18th, 2011
The data will find the data … and the relevance will find you.
My Background
Early 80’s: Founded Systems Research & Development (SRD), a custom software consultancy
1989 – 2003: Built numerous systems for Las Vegas casinos including a technology known as Non-Obvious Relationship Awareness (NORA)
2005: IBM acquires SRD, now chief scientist of IBM Entity Analytics
Personally designed and deployed roughly 100 systems, a number of which contained multiple billions of transactions describing hundreds of millions of entities
Today: My focus is in the area of ‘sensemaking on streams’ with special attention towards privacy and civil liberties protections
Sensemaking on Streams
1) Evaluate new information against previous information … as it arrives.
2) Determine if what is being observed is relevant.
3) Deliver this relevant, actionable insight fast enough to do something about it … as it’s happening.
4) Do this with sufficient accuracy and scale to really matter.
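The four steps above can be sketched as a loop that checks each arriving observation against what has already been seen. This is only a minimal illustration and not the system described in the talk: the `Observation` type, the idea that "relevant" means "another source already reported the same key", and every name below are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    source: str   # hypothetical: which system reported it
    key: str      # hypothetical: a feature shared across sources, e.g. a phone number
    payload: str


class StreamSensemaker:
    """Evaluates each new observation against prior ones as it arrives."""

    def __init__(self):
        self._seen = defaultdict(list)  # key -> earlier observations

    def ingest(self, obs):
        # Step 1: compare the new observation with what is already known.
        earlier = list(self._seen[obs.key])
        self._seen[obs.key].append(obs)
        # Step 2: "relevant" here simply means a different source already reported this key.
        related = [e for e in earlier if e.source != obs.source]
        # Step 3: hand back the insight at ingest time, not in a later batch job.
        return related


maker = StreamSensemaker()
first = maker.ingest(Observation("hr", "555-0100", "job applicant"))
alerts = maker.ingest(Observation("fraud", "555-0100", "open investigation"))
```

The second call returns the earlier job-applicant record at the moment the fraud record arrives. Step 4 (accuracy and scale) is outside what a toy like this can show.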
Trend: Organizations Are Getting Dumber
[Figure: axes "Time" and "Computing Power Growth"; labels "Sensemaking Algorithms", "Available Observation Space", "Context"]
• Your transactional data (inc. logs)
• Available reference data
• Plus, shared third party data
• And an avalanche of open source
Simply Overwhelming
“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
~ Eric Schmidt, CEO Google
Trend: Organizations Are Getting Dumber
[Figure: axes "Time" and "Computing Power Growth"; labels "Sensemaking Algorithms", "Available Observation Space", "Context"]
WHY?
Algorithms at Dead End.
You Can’t Squeeze Knowledge Out of a Pixel.
Context, definition
Better understanding something by taking into account the things around it.
Information in Context … and Accumulating
Top 200 Customer
Job Applicant
Identity Thief
Criminal Investigation
From Pixels to Pictures to Insight
Observations
Contextualization
Information in Context
Relevance
Consumer (an analyst, a system, the sensor itself, etc.)
The Puzzle Metaphor
Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
What it represents is unknown (there is no picture on hand)
Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
Some pieces may even be professionally fabricated lies
Point being: Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with
How Context Accumulates
With each new observation … one of three assertions is made: 1) un-associated; 2) placed near like neighbors; or 3) connected
Must favor the false negative
New observations sometimes reverse earlier assertions
Some observations produce novel discovery
As the working space expands, computational effort increases
Given sufficient observations, there can come a tipping point, at which time: 1) confidence begins to improve; and 2) computational effort begins to decrease!
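The three assertions can be sketched in a few lines. This is a toy under stated assumptions, not the method from the talk: the split into "strong" and "weak" features, the `Entity` type, and the `place` function are all hypothetical. It connects two observations only when they share a strong identifier, which is one simple way to favor the false negative. It does not model reversing earlier assertions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    strong: set = field(default_factory=set)     # hypothetical strong identifiers (e.g. passport no.)
    weak: set = field(default_factory=set)       # hypothetical weak features (e.g. city)
    neighbors: set = field(default_factory=set)  # indexes of "near" entities


def place(entities, strong, weak):
    """Return 'connected', 'near', or 'un-associated' for one new observation."""
    strong, weak = set(strong), set(weak)
    for ent in entities:
        if ent.strong & strong:      # shared strong identifier: connect
            ent.strong |= strong
            ent.weak |= weak
            return "connected"
    new = Entity(strong, weak)
    new_index = len(entities)
    for i, ent in enumerate(entities):
        if ent.weak & weak:          # only weak overlap: keep apart, but nearby
            ent.neighbors.add(new_index)
            new.neighbors.add(i)
    entities.append(new)
    return "near" if new.neighbors else "un-associated"


entities = []
results = [
    place(entities, {"passport:123"}, {"city:Tampa"}),
    place(entities, {"phone:555-0100"}, {"city:Tampa"}),
    place(entities, {"passport:123"}, {"city:Biloxi"}),
    place(entities, {"vin:XYZ"}, {"city:Reno"}),
]
```

Each observation lands in exactly one of the three states, and the third one enriches an existing entity instead of creating a new one.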
One Form of Context Is “Expert Counting”
Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
If one cannot count … one cannot estimate vector or velocity (direction and speed).
Without vector and velocity … prediction is nearly impossible.
Counting: Degrees of Difficulty
Exactly Same: Bob Jones, 123455 vs. Bob Jones, 123455
Fuzzy: Bob Jones, 123455 vs. Robert T Jonnes, 000123455
Incompatible Features: Bob Jones, 123455 vs. bjones@hotmail
Deceit: Bob Jones, 123455 vs. Ken Wells, 550119
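The first two degrees of difficulty above can be illustrated with a small counting sketch. The rule used here (treat records as the same person when the account number matches after dropping leading zeros) is an assumption chosen for the example, and both function names are hypothetical. Incompatible features and deceit cannot be resolved this way; they need more context.

```python
def naive_count(records):
    """Count records that differ in any field as different people."""
    return len(set(records))


def normalized_count(records):
    """Count by account number only, ignoring leading zeros (an assumed rule)."""
    return len({number.lstrip("0") for _name, number in records})


records = [
    ("Bob Jones", "123455"),
    ("Bob Jones", "123455"),
    ("Robert T Jonnes", "000123455"),
]

exact = naive_count(records)
fuzzy = normalized_count(records)
```

Exact matching sees two people in these three records, while the normalized count sees one.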
“Key Features” Enable Expert Counting
People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, etc.
Cars: Make, Model, Year, License Plate No., VIN, Owner, etc.
Router: Device ID, Make, Model, Firmware Vers., Asset ID, etc.
Consider Lying Identical Twins
PASSPORT: #123, Sue, 3/3/84, Uberstan, Exp 2011
PASSPORT: #123, Sue, 3/3/84, Uberstan, Exp 2011
Fingerprint
DNA
Most Trusted Authority: “Same person – trust me.”
The same thing cannot be in two places … at the same time.
Two different things cannot occupy the same space … at the same time.
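The first rule lends itself to a simple plausibility check: if two sightings would require an impossible travel speed, they cannot be the same thing. The sketch below uses the standard haversine great-circle formula; the speed ceiling `max_kmh`, the 50 m same-instant tolerance, and the function names are assumptions made for illustration.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def could_be_same(a, b, max_kmh=1000.0):
    """a and b are (lat, lon, datetime); max_kmh is an assumed travel-speed ceiling."""
    dist = haversine_km(a[0], a[1], b[0], b[1])
    hours = abs((b[2] - a[2]).total_seconds()) / 3600
    if hours == 0:
        return dist < 0.05  # same instant: must be essentially the same spot (assumed 50 m)
    return dist / hours <= max_kmh


noon = datetime(2011, 1, 18, 12, 0)
midnight = datetime(2011, 1, 19, 0, 0)
seattle_noon = (47.61, -122.33, noon)
tampa_noon = (27.95, -82.46, noon)
tampa_midnight = (27.95, -82.46, midnight)

same_instant = could_be_same(seattle_noon, tampa_noon)
half_day_apart = could_be_same(seattle_noon, tampa_midnight)
```

Two sightings in Seattle and Tampa at the same instant are ruled out, while the same pair twelve hours apart remains possible.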
Space & Time Enable Absolute Disambiguation
People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, etc., plus When and Where
Cars: Make, Model, Year, License Plate No., VIN, Owner, etc., plus When and Where
Router: Device ID, Make, Model, Firmware Vers., Asset ID, etc., plus When and Where
“Life Arcs” Are Also Telling
Bill Smith, 4/13/67
Salem, Oregon
Bill Smith, 4/13/67
Seattle, Washington
Address History
Tampa, FL 2008-2008
Biloxi, MS 2005-2008
NY, NY 1996-2005
Tampa, FL 1984-1996
Address History
San Diego, CA 2005-2009
San Fran, CA 2005-2005
Phoenix, AZ 1990-2005
San Jose, CA 1982-1990
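One way to compare two life arcs is to ask whether they ever place a person in the same city in the same year. The sketch below uses the two address histories from the slide; the function name and the choice of treating end years as inclusive are assumptions.

```python
def shared_place_years(history_a, history_b):
    """Histories are lists of (city, start_year, end_year), end year inclusive.

    Returns the set of (city, year) pairs that appear in both histories.
    """
    def expand(history):
        return {
            (city, year)
            for city, start, end in history
            for year in range(start, end + 1)
        }
    return expand(history_a) & expand(history_b)


history_1 = [
    ("Tampa, FL", 2008, 2008),
    ("Biloxi, MS", 2005, 2008),
    ("NY, NY", 1996, 2005),
    ("Tampa, FL", 1984, 1996),
]
history_2 = [
    ("San Diego, CA", 2005, 2009),
    ("San Fran, CA", 2005, 2005),
    ("Phoenix, AZ", 1990, 2005),
    ("San Jose, CA", 1982, 1990),
]

overlap = shared_place_years(history_1, history_2)
```

The two histories share no city in any year, which is evidence that the two Bill Smith records describe different people despite the matching name and date of birth.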
Space-Time-Travel
Cell phones are generating a staggering amount of geo-locational data – 600B transactions per day being created in the US alone
This data is being “de-identified” and shared with third parties – in volume and in real-time
Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours) and who you spend your time with
Re-identification (figuring out who is who) is somewhat trivial
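To see why movement alone reveals so much, consider a sketch that guesses a "home" and a "work" location from nothing but time-stamped location pings. The hour bands, the `place_id` labels, and the function name are assumptions for illustration, and real location data would be far noisier.

```python
from collections import Counter


def likely_home_and_work(pings):
    """pings: list of (hour_of_day, place_id). The hour bands are assumptions."""
    night = Counter(place for hour, place in pings if hour >= 20 or hour < 6)
    day = Counter(place for hour, place in pings if 9 <= hour < 17)
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work


pings = [
    (22, "cell_A"), (23, "cell_A"), (2, "cell_A"), (21, "cell_D"),
    (10, "cell_B"), (14, "cell_B"), (11, "cell_C"),
]
home, work = likely_home_and_work(pings)
```

A home and work pair narrows the set of candidate people considerably, which is why removing names from such data offers limited protection.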
Analytic Superfood for Prediction
Route suggestions pushed to drivers, just-in-time, to avert significant traffic events
Search results optimized using personalized life arc forecasts
A nation able to work right through an extreme global pandemic
And Other Predictions …
Prediction with 87% certainty where you will be next Thursday at 5:35pm
Names of the top 10 people you co-locate with, not at home and not at work
The Uberstan intelligence service preempts the next mass protest in real-time
A political opponent is crushed and resigns two days after announcing their candidacy
Consequences
Space-time-travel data is the ultimate biometric
It will enable enormous opportunity
It will unravel one’s secrets
It will challenge existing notions of privacy
And it’s here now, with more to come
Surveillance society is irresistible.
And you are doing it.
GPS-enhanced search, free email, Facebook, etc.
Responsible innovation
Privacy by design
Better data protection
Data anonymization, active audit logs, etc.
Closing Thoughts
Wish This On The Adversary
[Figure: axes "Time" and "Computing Power Growth"; labels "Sensemaking Algorithms", "Available Observation Space", "Context"]
Context Accumulation: The Way Forward
[Figure: axes "Time" and "Computing Power Growth"; labels "Sensemaking Algorithms", "Available Observation Space", "Context", "Context Accumulation"]
Geospatial-Enabled Intelligence ... Today
Geospatial Analytics
Geospatial Visualization
Current Focus
Geospatial-Enabled Intelligence … Tomorrow
Geospatial Visualization
Geospatial Analytics
Future Focus
Big Data. New Physics.
More Data: Better prediction
– Fewer false positives
– Fewer false negatives
More Data: Bad data good
More Data: Less compute effort
Related Blog Posts
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Puzzling: How Observations Are Accumulated Into Context
Big Data. New Physics.
Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!
Big Data Flows vs. Wicked Leaks
Data Finds Data
“Macro Trends: The Privacy and Civil Liberties Consequences … and Comments on Responsible Innovation” – My DHS DPIAC Testimony, September 2008