DAMA Presentation


Page 1: DAMA Presentation

Driving Business Transformations with Big Data Analytics

DAMA SouthWest Ohio, September 13, 2012

Page 2: DAMA Presentation

Key Business Trends

• Mega Trends
   • Socialization
   • Collaboration
   • Gamification
   • Mobile

• Micro Trends
   • Micro-Segmentation
   • Advanced Analytics

Page 3: DAMA Presentation

copyright  @Sixth  Sense  Advisors  Inc  2012  3  

Crowdsourcing & Collaboration

GoldCorp

• $575,000 prize money
• 400 MB of data
• 55,000 acres

Within 1 month:

• More than 1,000 virtual prospectors
• 50 countries
• 110 new targets, 50% previously unidentified
• 80% yielded gold

Within a few years:

• From a $100 million company into a $9 billion juggernaut

Page 4: DAMA Presentation

Collaboration & Gamification

Page 5: DAMA Presentation

Gamification

Page 6: DAMA Presentation

Peer 2 Peer Collaboration

Page 7: DAMA Presentation

Crowdsourcing    

Page 8: DAMA Presentation

Game Changer

• To move from competitor to leader and create an undisputed market presence, companies need to create new and vibrant business models.

• These business models need a lot of research, ideation and execution (read: data, data and more data).

• Companies that can harvest data efficiently and effectively will emerge as the winners of the game, ultimately changing the game.

Page 9: DAMA Presentation

What Does It Take

Page 10: DAMA Presentation

A Growing Trend

Requirement     Expectations                    Reality
Speed           Speed of the Internet           Speed = Infra + Arch + Design
Accessibility   Accessibility of a smartphone   BI tool licenses & security
Usability       iPad - mobility                 Web-enabled BI tool
Availability    Google Search                   Data & report metadata
Delivery        Speed of questions              Methodology & signoff
Data            Access to everything            Structured data
Scalability     Cloud (Amazon)                  Existing infrastructure
Cost            Cell phone or free Wi-Fi        Millions

Expectations for BI are changing without anyone telling us.

Page 11: DAMA Presentation

Long Tail

[Figure: long-tail curve. Labels: 20%; Control; The Old Way (Pareto Principle, or 80/20 rule); The New Way (with a bigger, longer tail), when Web 2.0 is applied.]

Source: http://en.wikipedia.org/wiki/The_Long_Tail

Page 12: DAMA Presentation

2008 US Presidential Elections

$32 million raised from 275,000 people who gave $100 or less

Page 13: DAMA Presentation

Long Tail Example

[Figure: long-tail curve of donors. Labels: 20%; high $ value donors, small constellation; low $ value donors, larger constellation.]

Web 2.0 significantly increases total value contributed/received by aggregating the “long tail” of smaller value donors.

Source: http://en.wikipedia.org/wiki/The_Long_Tail

Page 14: DAMA Presentation

Brand Management

Page 15: DAMA Presentation

Big Data

Page 16: DAMA Presentation

The Buzz

Page 17: DAMA Presentation

Data Disruptions

Porter Competitive Model

Page 18: DAMA Presentation

State of Data Today

Page 19: DAMA Presentation

Future of Data

Page 20: DAMA Presentation

Big Data

Big Data can be defined as data that grows in volume, velocity, variety and complexity at an unprecedented pace. This growth and complexity present challenges for capture, storage, management, analysis and visualization with the typical BI tool stack.

Page 21: DAMA Presentation

Tapping into the data

[Figure labels: Business; Infrastructure]

• Structured data used today
• Big Data existing across the enterprise that can be made available to business

Today we do big or small compute with small and large structured data sets.

Big Data will mean big or small compute with big data sets that are not always available in structured or semi-structured formats.

Page 22: DAMA Presentation

Analytics

• Analytics is the key technique for analyzing, visualizing and monetizing Big Data

• The field of analytics is resurging with the advent of Big Data
   • Social Analytics
   • Sensor Analytics
   • Text Analytics
   • Deep Data Mining

• Analytics needs metadata for integration

• Applications
   • Fraud Detection
   • Campaign Optimization
   • Demand and Supply Optimization
   • Forecast Optimization

Page 23: DAMA Presentation

What’s so Big about Big Data

• Velocity
• Volume
• Variety
• Complexity
• Ambiguity

Page 24: DAMA Presentation

What do we collect

• Facebook has an average of 30 billion pieces of content added every month

• YouTube receives 24 hours of video every minute

• 5 billion mobile phones in use in 2010

• A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions

• Amazon.com: 30% of sales come from its recommendation engine

• A Boeing jet engine produces 20 TB/hour for engineers to examine in real time to make improvements

Page 25: DAMA Presentation

Potential Business Insights

• Trends

• Brand Identity & Management

• Consumer Education

• Competitive Intelligence

• Micro-Targeting

• Leverage “crowdsourcing”-driven innovation to better products and services (DELL, Innocentive (SAP, P&G))

• eDiscovery (legal trends and patterns, financial fraud)

• Pharmaceutical Companies
   • Patient Education
   • Physician Enriched Content Management
   • Reduce Clinical Trial Cycles and Errors
   • Pharmacovigilance

• Financial
   • Fraud
   • Customer Management

• Manufacturing
   • Supply chain optimization
   • Track & Trace
   • Compliance

Page 26: DAMA Presentation

Why DWBI Fails Repeatedly

[Figure: business value lost over time. Axes: Business Value against Time (action time or action distance). Events: business situation; data is ready; information is available; decision is made. Intervals: data latency; analysis latency; decision latency. Shaded area: lost value.]

Base graph courtesy of Dr. Richard Hackathorn

Lost value = Sum (Latencies) + Opportunity Cost
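The lost-value formula can be worked through with a small sketch. The slide does not say how latency (time) converts into value, so the linear rate `value_lost_per_hour`, the class name `Latencies` and every number below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """The three delays named on the slide, in hours (illustrative units)."""
    data_hours: float       # business situation -> data is ready
    analysis_hours: float   # data is ready -> information is available
    decision_hours: float   # information is available -> decision is made

    def total_hours(self) -> float:
        return self.data_hours + self.analysis_hours + self.decision_hours


def lost_value(latencies: Latencies, value_lost_per_hour: float,
               opportunity_cost: float) -> float:
    """Lost value = Sum(latencies) + opportunity cost.

    Assumption: value decays linearly with delay, so the summed latency
    is converted to money with a hypothetical hourly rate.
    """
    return latencies.total_hours() * value_lost_per_hour + opportunity_cost


# Hypothetical numbers: 24h to load data, 8h to analyze, 16h to decide.
example = Latencies(data_hours=24, analysis_hours=8, decision_hours=16)
total = lost_value(example, value_lost_per_hour=100.0, opportunity_cost=2000.0)
print(total)
```

Under these assumptions, shortening any one of the three latencies reduces the lost value by the same hourly rate, which is the point the graph makes.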

Page 27: DAMA Presentation

The Data Landscape

[Diagram labels: Transactional Systems and ODS (three instances); Data Transformation; Enterprise Data Warehouse; Datamarts & Analytical Databases (three instances); Reports; Dashboards; Analytic Models; Other Applications]

Page 28: DAMA Presentation

ACID Kills

• Atomic – All of the work in a transaction completes (commit) or none of it completes

• Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.

• Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.

• Durable – The results of a committed transaction survive failures
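A minimal sketch of atomicity and consistency, using Python's built-in `sqlite3` module. The `accounts` table, the account names and the amounts are hypothetical. The database is in memory, so durability is not demonstrated here.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which also undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # rolled back as a whole
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits one account before the constraint rejects the debit; the rollback removes that credit too, so no partial result is ever visible.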

Page 29: DAMA Presentation

BIG Data Scenarios: Examples

To: [email protected]

Dear Mr. Collins,

This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand.

I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn’t done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to “Insufficient balance” just because my cheque didn’t make on time.

My relationship with your bank has matured with the time and it’s a shame to tell you about this kind of services are not acceptable when it is question of somebody’s reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day.

Yours sincerely

Daniel Carter

Ph: 564-009-2311

Page 30: DAMA Presentation

BIG Data Text Example

• We will often imply additional information in spoken language by the way we place stress on words.

• The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. The stressed word is shown in capitals:
   • "I never said she stole my money" (stress on I) - Someone else said it, but I didn't.
   • "I NEVER said she stole my money" - I simply didn't ever say it.
   • "I never SAID she stole my money" - I might have implied it in some way, but I never explicitly said it.
   • "I never said SHE stole my money" - I said someone took it; I didn't say it was she.
   • "I never said she STOLE my money" - I just said she probably borrowed it.
   • "I never said she stole MY money" - I said she stole someone else's money.
   • "I never said she stole my MONEY" - I said she stole something, but not my money.

• Depending on which word the speaker stresses, this sentence could have several distinct meanings.

Example source: Wikipedia

Page 31: DAMA Presentation

Pattern Detection

Reduction Techniques
• Backward Elimination
• Forward Selection
• Attribute Removal
• Principal Components

Clustering Techniques
• K-Means
• Maximin
• Agglomerative
• Divisive

Regression

Classification Techniques
• Naive Bayes
• Neural Networks
• Back Propagation
• Recursively Splitting
• K-Nearest Neighbor
• Minimum Distance

Utilities
• Accuracy Measures
• Range Filters
• K-Fold Cross Validation
• Merge & Subset
• Vector Magnitude

Examples
• Text – OCR, Machine, Digital
• Face recognition, verification, retrieval
• Fingerprint recognition
• Speech recognition
• Medical diagnosis: X-Ray, EKG analysis
• Machine diagnostics data
• Geological data
• Automated Target Recognition (ATR)
• Image segmentation and analysis (recognition from aerial or satellite photographs)
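To make one of the listed clustering techniques concrete, here is a minimal K-Means sketch in plain Python (3.8 or later for `math.dist`). The six sample points, the choice of k = 2 and the rule of seeding centroids from the first k points are assumptions made so the run is deterministic; real tools normally pick starting centroids more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Cluster 2-D points with Lloyd's algorithm.

    Centroids are seeded from the first k points (an illustrative choice).
    """
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Hypothetical sample: two visibly separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
          (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

Each point ends up labeled with the index of its cluster, and each centroid sits at the mean of the points assigned to it.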

Page 32: DAMA Presentation

So you are about to start the Big Data Project

[Figure labels: Tools; Instructions; Data; Output]

Page 33: DAMA Presentation

The Normal Way Results In ……..

Page 34: DAMA Presentation

Performance

+ New Data Types

+ New Volume

+ New Analytics

+ New Data Retention

+ New Data Workloads

Re-engineering a Ferrari engine into a Yugo does not make the fastest race car.

Page 35: DAMA Presentation

BIG Data

• Workload Demands
   • Process dynamic data content
   • Process unstructured data
   • Systems that can scale up and scale out with high volume data
   • Perform complex operations within reasonable response time

• Infrastructure Needs
   • Scalable platform
   • Database independence
   • Fault Tolerance
   • Supported by standard toolsets

Page 36: DAMA Presentation

Data Warehouse Appliance

• A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing.

• DW appliances offer an attractive price / performance value proposition and are frequently a fraction of the cost of traditional data warehouse solutions.

Features:
• High Availability
• Standard SQL Interface
• Advanced Compression
• MPP
• Leverages existing BI, ETL and OLTP investments
• Hadoop & MapReduce Interface / Embedded
• Minimal disk I/O bottleneck; simultaneously load & query
• Auto Database Management

Page 37: DAMA Presentation

Hadoop

Page 38: DAMA Presentation

Hadoop & RDBMS Analogy

RDBMS is a sports car:
• refined
• has a lot of features
• accelerates very fast
• pricey
• expensive to maintain

Hadoop is a cargo train:
• rough
• missing a lot of “luxury”
• slow to accelerate
• carries almost anything
• moves a lot of stuff very efficiently

* Original slide author: Amr Awadallah, Cloudera

Page 39: DAMA Presentation

NoSQL

• Stands for Not Only SQL

• Based on CAP Theorem / BASE

• Usually does not require a fixed table schema, nor does it use the concept of joins

• All NoSQL offerings relax one or more of the ACID properties

• Scalable replication and distribution
   • Potentially thousands of machines
   • Potentially distributed around the world

• Queries need to return answers quickly

• Mostly query, few updates

• Asynchronous inserts & updates

• NoSQL databases come in a variety of flavors
   • XML (myXMLDB, Tamino, Sedna)
   • Wide Column (Cassandra, HBase, BigTable)
   • Key/Value (Redis, Memcached with BerkeleyDB)
   • Graph (neo4j, InfoGrid)
   • Document store (CouchDB, MongoDB)
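Two of the properties listed above, no fixed schema and asynchronous updates, can be illustrated with a toy in-process store. This is a sketch only: `ReplicatedStore` and its methods are hypothetical names and do not correspond to the API of any product named on the slide.

```python
from collections import deque


class ReplicatedStore:
    """Toy key/value store: writes land on the primary immediately and
    reach the replica only when replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # replication log not yet applied

    def put(self, key, document):
        self.primary[key] = document
        self._pending.append((key, document))  # replicated asynchronously

    def get_from_replica(self, key):
        return self.replica.get(key)  # may be stale or missing

    def replicate(self):
        """Apply all pending writes to the replica, in order."""
        while self._pending:
            key, document = self._pending.popleft()
            self.replica[key] = document


store = ReplicatedStore()
# Documents under different keys need not share the same fields.
store.put("user:1", {"name": "Ann", "tags": ["gold"]})
store.put("user:2", {"name": "Raj", "city": "Dayton"})

stale = store.get_from_replica("user:1")  # replica has not caught up yet
store.replicate()
fresh = store.get_from_replica("user:1")  # now consistent with the primary
print(stale, fresh)
```

A read from the replica before replication returns nothing, and the same read afterwards returns the document: the store is eventually consistent rather than immediately consistent, which is the ACID relaxation the slide refers to.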

Page 40: DAMA Presentation

NoSQL Footprint

[Figure: NoSQL systems plotted on two axes, Size and Complexity. Labels: Amazon Dynamo; Google BigTable; Cassandra; Lotus Notes; HBase; Voldemort; Graph Theory]

Page 41: DAMA Presentation

Map Reduce

• Technique for indexing and searching large data volumes

• Two phases, Map and Reduce
   • Map
      • Extract sets of key-value pairs from underlying data
      • Potentially in parallel on multiple machines
   • Reduce
      • Merge and sort sets of key-value pairs
      • Results may be useful for other searches
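The two phases can be sketched as a single-process word count in Python. This only illustrates the programming model; it is not Hadoop code, and the function names and sample documents are made up for the example. In a real cluster the map calls would run in parallel on many machines and the framework would perform the grouping step.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key in sorted key order, as the framework would
    do between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single count."""
    return key, sum(values)


documents = ["big data big compute", "small data"]

# Each document could be mapped on a different machine.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped))
print(counts)
```

The resulting word counts are themselves key-value pairs, so they can feed a further Map Reduce job, which is what "results may be useful for other searches" suggests.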

Page 42: DAMA Presentation

Textual ETL Engine

Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.

• Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.

• The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords

• Easy to implement and easy to realize ROI

• Advantages
   • Simple to use
   • No MR or coding required for text analysis and mining
   • Extensible by taxonomy integration
   • Works on standard and new databases
   • Produces a highly columnar key-value store, ready for metadata integration

• Disadvantages
   • Not integrated with Hadoop as a rules interface
   • Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   • Current GA does not handle distributed processing outside the Windows platform
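The general idea of rule-driven text extraction, keywords and patterns in, key-value pairs out, can be sketched with regular expressions. This is not the Textual ETL Engine's interface, which the slides do not show; the `RULES` table, the key names and the `extract` function are hypothetical. The sample text is an abridged version of the customer email shown earlier in the deck.

```python
import re

# Hypothetical rules: each key maps to a pattern with one capture group.
RULES = {
    "account_number": re.compile(r"account no:\s*(\d+)", re.IGNORECASE),
    "phone": re.compile(r"Ph:\s*([\d-]+)"),
    "complaint_keyword": re.compile(
        r"\b(carelessness|negligence|bounced)\b", re.IGNORECASE
    ),
}


def extract(text):
    """Apply every rule to the text and return (key, value) pairs,
    in rule order and then in order of appearance."""
    pairs = []
    for key, pattern in RULES.items():
        for match in pattern.finditer(text):
            pairs.append((key, match.group(1).lower()))
    return pairs


email = (
    "I have deposited one of my high amount cheque to my bank account "
    "no: 65656512 but due to your staff carelessness it wasn't done and "
    "because of this negligence my payment cheque bounced. "
    "Ph: 564-009-2311"
)
pairs = extract(email)
for key, value in pairs:
    print(key, value)
```

The output is a flat list of key-value pairs that could be loaded into a table and joined to structured data through shared keys such as the account number.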

Page 43: DAMA Presentation

Integration

• All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
   • Oracle Exalytics / Big Data Appliance
   • Teradata Aster Appliance
   • EMC Greenplum Appliance
   • IBM BigInsights
   • Microsoft Windows Azure Integration

• There are multiple providers of Hadoop distributions
   • Cloudera
   • Hortonworks
   • Hadapt
   • Zettaset
   • IBM

• Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms

Page 44: DAMA Presentation

Conceptual Solution Architecture

[Diagram labels: Metadata; Taxonomy; Data Warehouse; Big Data DW; Textual ETL; ETL / ELT / CDC; MDM; Data Marts; OLTP; BIG Data content (email, docs); And / Or]

Page 45: DAMA Presentation

Which Tool

Application   Hadoop   NoSQL   Textual ETL

Machine Learning   x   x

Sentiments   x   x   x

Text Processing   x   x   x

Image Processing   x   x

Video Analytics   x   x

Log Parsing   x   x   x

Collaborative Filtering   x   x   x

Context Search   x

Email & Content   x

Page 46: DAMA Presentation

Integration Tips

• The key to the castle in integrating Big Data is metadata

• Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail

• Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models

• Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can be of help

• Third-party data providers also provide keywords, trending tags and scores; these can provide a lot of integration support

• Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce

Page 47: DAMA Presentation

Success Stories

• Machine learning & Recommendation Engines – Amazon, Orbitz

• CRM – Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity

• Finance – Fraud, Compliance

• Telco – CDR, Fraud

• Healthcare – Provider / Patient analytics, fraud, proactive care

• Lifesciences – clinical analytics, physician outreach

• Pharma – Pharmacovigilance, clinical trials

• Insurance – fraud, geo-spatial

• Manufacturing – warranty analytics, supplier quality metrics

Page 48: DAMA Presentation

Big Data Challenges

• Integration with the EDW is still an open issue: Big Data reduces to small metrics, and this translates into the current-state issues faced with EDW data

• Big Data requires a lot of taxonomy processing, especially in content-related search

• Several applications need high-performing memory architectures because the data is compute intensive, for example image processing of brain scans

• Technology is improving by the day, but integration and deployment are becoming equally complex.

Page 49: DAMA Presentation

Data Science

[Diagram labels: Data Analytics; Content; Customer; Product; Behaviors; Optimization; Big Data Processing & ETL; Business Intelligence; Advanced Analytics; Art & Science]

Business analysts, data analysts, metadata architects and data architects are all at some evolutionary stage of becoming a data scientist.

Page 50: DAMA Presentation

Summary

• With effective use of Big Data and Analytics, you can
   • Drive successful business transformations
   • Create an agile environment for business decision processes
   • Use the Data Warehouse for analytical processes, as it was originally designed
   • Create predictive insights
   • Practically “mine (explore)” any data from any source
   • Create powerful dashboards from near-real-time data
   • Reduce risk
   • Increase competitiveness

Page 51: DAMA Presentation

Contact

Krish Krishnan

[email protected]

Twitter - @datagenius