Expanding Big Data Science: Forward & Backward
April 4, 2013 Technology Trends, Big Data and Data-Driven Decisions
C. Randall (Randy) Howard, Ph.D., PMP Big Data Scientist, Thought Leader, Systems Innovation Analyst, Solutions Architect
Sr. Data Scientist, Novetta Solutions
Adjunct Professor, Mason’s Volgenau School of Engineering
[email protected] www.crhphdconsulting.net/
May 20, 2014
C. Randall (Randy) Howard, Ph.D., PMP
Senior Data Scientist, Novetta Solutions
Adjunct Professor, Volgenau School of Engineering, GMU
o Big Data Overview
o Systems Analysis & Design Determining Needs in Big Data
o Big Data, Small Details & Time (Metadata)
2013 Teaching Excellence Award Nominee
Co-Organizer of Big Data Lecture Series, EIT Award Nominee
Member, Data Science Working Groups & Sub-teams
International Author & Speaker
30 years IT & systems engineering, architecture, trouble-shooting, change & innovation
Ph.D., Information Technology, GMU
BS, MS: Information Systems, VCU
Agenda
Context: What is Big Data All About?
Forward: Considering Multiple Perspectives
Backward: Refactor/Repurpose Legacy Approaches
Context: What is Big Data Science All About?
Context of Material
How was the big data collected?
o Empirical Observations & Applications
o Critical Thinking
Where is it stored?
o Case Studies
o Feverishly Codifying
o Move from Rescuing to Preventing
What are the results?
o Clarifying and Connecting Disparate, Contentious Pieces
o Still Working…
My Positions on Big Data
Big Data Science
o Big Data: Problem & Opportunity Space
o Data Science: Potential Solution Discipline
o Big Data Science: “Applying Data Science to Big Data”
Technology “Reboot” CAN Usher in New Generation of Capabilities
o Big Data Today
o New “Big Data” Tomorrow
Must Clarify Business Value
Have To Think Horizontally & Corporately
But, I am a professor… Heresy Now? Genius Tomorrow?
IT Disasters & Dilemmas: Possible w/ Big Data? [IT-Failures]
Disasters
o UK Inland Revenue*: $3.5B, Software Errors
o FBI’s Trilogy Virtual Case File*: $170M, Scrapped
o Ford’s Purchasing System*: $400M, Abandoned
o NSA Trailblazer*: $1.2B, over-budget, ineffective, 7-yr boondoggle
Dilemmas
o Economic Winter (Do more w/ Less)
o What is it? Exactly?
Obama Care?
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Peak of Inflated Expectations: Early publicity produces a number of success stories, often accompanied by scores of failures. Some companies take action; many do not.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters.
Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off.
Curve of Complacency: Early successes satisfy stakeholders that the problem or opportunity is handled, and it is time to move on to the next issue. Meanwhile the Plateau of Productivity that is achieved is much lower. [crh] Dr. C. Randall Howard, PMP (not a position of Gartner or Dr. Aiken, yet)
[Aiken] [Gartner]
My Big Concern!!
[Conway]
Big Data & Data Science “1-Page Summary”
Big Data “V”s [IBM]:
o Volume (How much in total)
o Variety (How many sources)
o Velocity (How fast does it come in)
o Veracity, Variability, Complexity, etc. [various]
“Hard” Data Science [various]
o Math, Science, Analytics
o Data-Driven Organizations
o Creating data products
o Looking to the future
“Soft” Data Science? (Hold on)
[Figure, notional depiction: Data V’s over Time, comparing Creation & Collection Capabilities with Processing & Analytical Capabilities. Capability gaps due to surges in data collections: increases in sensors, social media, mobile data.]
Soft Data Science [crh]
[Figure, notional depiction: Data V’s over Time, comparing Creation & Collection Capabilities with Processing & Analytical Capabilities. Shrink the Capability Gap: w/ “Hard” Data Science alone vs. w/ “Soft” & “Hard” Data Science (“Soft Head Start”).]
• Backlogs increase exponentially
• Signals become noise
• “Action” windows lost / missed
• We become bottlenecks to partners
• Notoriety to date
• Performed by a few
• Bottlenecked by a few?
Changing Term to Tacit Data Science, but that’s another talk
Hardening the “Soft”
• Automate “Hard-to-Automate”
• Predict Predictable
• To-be Performed by Many
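The capability gap in the notional depiction, and the claim that backlogs grow when collection outpaces processing, can be illustrated numerically. The following is a minimal sketch only: the function name `backlog_over_time`, the starting values, and the growth rates (collection growing 50% per period, processing capacity 10% per period) are illustrative assumptions and do not come from the talk.

```python
def backlog_over_time(periods, collected=100.0, capacity=100.0,
                      collect_growth=1.5, capacity_growth=1.1):
    """Notional model of the capability gap (all numbers are assumptions).

    Each period, newly collected data joins the queue and processing
    capacity removes as much as it can. Returns the unprocessed backlog
    at the end of each period.
    """
    backlog = 0.0
    history = []
    for _ in range(periods):
        # The backlog can never go below zero: spare capacity is simply unused.
        backlog = max(0.0, backlog + collected - capacity)
        history.append(backlog)
        # Collection and processing capabilities grow at different rates.
        collected *= collect_growth
        capacity *= capacity_growth
    return history


if __name__ == "__main__":
    for period, backlog in enumerate(backlog_over_time(6), start=1):
        print(f"period {period}: backlog = {backlog:.1f}")
```

When both growth rates are equal the backlog stays at zero in this model; the gap appears only when collection grows faster than processing, which is the situation the slide describes.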
Big Data Science Value Parameters
Increased Actionable Intelligence
Trends Noticed / Confirmed
Leverage Unstructured
Faster Knowledge / Awareness / Ability to Search Data
Flexibility / Extensibility of Data Utilization
New, More Adaptable HW/SW Acquisition Models
More TBD
Other Big Data Considerations
Capabilities Have Their Own Separate ROIs
Process Data w/in Acceptable Tolerances:
o Time
o Errors
o Accuracy
o Reliability
o Etc.
Accountability: Find Critical Intelligence & Make Time Windows
Thus, Big Data Is “Having more data than you can process and manage within acceptable tolerances (e.g. time, quality, cost)”[crh]
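The working definition above can be read as a simple test: a workload is "big data" for a given organization when it exceeds at least one acceptable tolerance. The following is a minimal sketch under stated assumptions: the class names, field names (`hours`, `error_rate`, `cost`), and any thresholds a caller supplies are hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class Tolerances:
    """Hypothetical acceptable limits; field names are illustrative only."""
    max_hours: float        # time window in which results are still useful
    max_error_rate: float   # quality tolerance (fraction of bad records)
    max_cost: float         # budget available for processing


@dataclass
class Workload:
    """Hypothetical measured or estimated figures for one data workload."""
    hours: float
    error_rate: float
    cost: float


def exceeded_tolerances(w: Workload, t: Tolerances) -> list[str]:
    """Return the names of the tolerances that the workload exceeds."""
    exceeded = []
    if w.hours > t.max_hours:
        exceeded.append("time")
    if w.error_rate > t.max_error_rate:
        exceeded.append("quality")
    if w.cost > t.max_cost:
        exceeded.append("cost")
    return exceeded


def is_big_data(w: Workload, t: Tolerances) -> bool:
    """Under the definition above, any exceeded tolerance means 'big data'."""
    return bool(exceeded_tolerances(w, t))
```

The point of the sketch is that the test is relative: the same workload can be "big data" for one organization and routine for another, depending on the tolerances each one sets.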
Forward: Considering Multiple Perspectives
BDLS: A Broader Look at Big Data Science
Each channel is difficult
Each complements the other
Complexities are compounded exponentially in cross-sections
Multiple Perspectives in Publications
Multi-disciplinary [Gartner-ERDS] teams [Patil]: a “broad sample of the population” & involves “teams that frequently partner w/ diverse roles in an organization… to gather, organize, & make use of their data” [EMC-DS]
“Wetware” [Gleichauf] (vs. HW & SW): “People, their skillsets, corporate policies, & organizational structures that define our analytic communities”
Soft Skills [Gartner-ERDS]:
o Communication
o Collaboration
o Leadership
o Creativity
o Discipline
o Passion
Data Scientist can be invaluable… unique combination of technical & business skills… makes them difficult to find or cultivate. [Gartner-ERDS]
Data Science Teams
Data Science Teams [Patil]
o Small-team members should sit close to each other
o Mix of skill-sets, some experts, some not
o Train people to fish
o Functional areas must stay in regular contact and communication
Impediments
o Measuring Performance: Rewarding & Disciplining Teams vs. Individuals
o Sharing Intellectual Property w/ Integrated Product Teams (esp. cross-vendor)
o “Expert Teams”????
“Expert Teams”
o May find Big Data Science trivial
o Typically
• Have more control over their environment
• Don’t need to have the masses engaged
But…
o Most organizations need to have the knowledge & skills spread out to “Non-experts”
Life-Cycle Service Orchestration
Legal Review
Life Cycle
OODA Loop
Acquisition (FAR)
Classroom Exercise Findings
Wicked Problems
Wicked Problems Tip-off Words[Nixon]
Integrated Joint
Interoperable
Shared
Cross-organizational
Networked
Multi-organizational
Virtual
Coalition
Community
Combined
Big Data is a Wicked Problem!
Wicked Problems[Nixon]
Requires Multiple Stakeholders’ Perspectives
Key Driver: Social Complexity from Integrated Networks
Traditional linear solution styles are not well suited
Needs focus on:
o Social Aspects
o Gaining Shared Understanding
o Try Things
o Let Solution Emerge From Cycle of Adaptation
Thus [crh],
o Multiple Perspectives Involves Collaboration
o Collaboration Technologies MUST BE INNOVATED
Sample Collaboration Innovation[InnovationGames]
Sample Collaboration Innovation[InnovationGames]
[InnovationGames] http://innovationgames.com/
Learning Organizations
Learning Organization [Senge]
Peter Senge (http://www.infed.org/thinkers/senge.htm)
o Studied how adaptive capabilities developed
o The Fifth Discipline (1990): ‘Learning Organization’ (LO)
Basic Learning Organization Disciplines:
o Systems Thinking
o Personal Mastery
o Mental Models
o Building Shared Vision
o Team Learning
Learning Organizations’ Disciplines
Discipline: Explanation
Systems Thinking:
• Cannot understand the parts until you understand the whole [Aiken]
• Balance Theory w/ Data [Barbara’]
• Balance Ideas w/ Tools [Sagan]
System Maps: Diagrams that show key elements of systems and how they connect. You may have heard them called Landscape or Ecosystem.
Personal Mastery: Clarify & deepen our personal vision… of seeing reality objectively. OR: Know Yourself.
Mental Models: Carry on ‘learningful’ conversations that:
• Expose our internal pictures of the world & hold them up to scrutiny
• Balance inquiry and advocacy, where people share their thoughts
OR: Express Yourself.
Building Shared Vision: Capacity to hold a shared picture of the future we seek to create. Has power to encourage experimentation and innovation.
Team Learning: Process of aligning & developing capacities of a team to create results its members truly desire.
Changing Culture
Culture Obstacles[econBD]
Changing Culture
Examples:
o Hard-drives
o Management Visibility of Data Processing
o Target’s former CEO?
Leadership needs to foster a culture of:
o Increased curiosity about data
o Rewarding experimentation
o Counting “Assists”
Need “democratization”, or open access, of data [Patil]
o Or Horizontal Orientation / Governance of Data [crh]
Not trivial: sharing data exposes risks of:
o Misinterpretation
o Loss of “credit” associated with results from the data
Education
Education
Establish a new baseline of knowledge to advance
Mason’s Big Data Lecture Series Purpose:
o Separate Hype from Reality
o Have marquee experts expose what in Big Data:
• Is really working and making a difference?
• Shows promise?
• Has failed? Needs another try?
• Are the impediments?
o Convey that the daunting challenge is feasible, but still a challenge
Big Data Adoption [IBM-Analytics]
Learning Revolution [Robinson]
Big Data Science is a REVOLUTION that starts (& continues) w/ LEARNING
o Requires new skills
o New leadership models
http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html
Backward: Refactor / Repurpose Legacy Approaches
What is Legacy?
What “brought us here”
o Business Basics (e.g., Planning, ROI)
o Structured Systems Analysis (e.g., Waterfall methodology, CMMI)
Yes,
o Very Cumbersome
o Have Failed too
But…
o Developed by Very Smart People
o For Very Similar Issues
o Been “Tested”
So…
o Re-invent the Wheel?
To leverage:
o Consider Context: Intent & Issues
o Re-calibrate / Re-factor For Today
o Come Back to “Common Sense”, What Works
Examples:
o Meeting Management
o Scaled Agile
Enterprise Architecture
“Process of translating business vision and strategy into effective enterprise change by creating, communicating and improving the key requirements, principles and models that describe the enterprise's future state and enable its evolution.” [Gartner-EA]
Short: Simple Structure & Alignment of Technical & Business Capabilities
So…. Take “Business Back to IT”[crh]
Maintain Line-of-Sight to Value[crh]
Focus on the Mission and Mission Capabilities!
Capability Dependencies Hierarchy
Example: Tool x requires staff time for training & learning
Strategic Planning Survey[Bain]
14-year Compilation of:
o 11 Surveys
o 8,504 respondents
2006: 88% 3.93
Establish Enterprise-wide Decision Criteria
Convey & Carry Commander’s Intent to Execution Levels
Strategy to Tactics Line-of-Sight[crh]
Engineering “Risky Art” Landscape
• Most impactful, hardest to tame, most ignored
• Least concrete, hardest to sell / prove
• Needs the most “innovation attention”
Big Data Lecture Series, Fall 2012
Session 4: Solving the Risk Equation
Big Data Systems Analysis & Engineering “So-What”
Users
A Big Data Systems Analysis & Engineering “Success” Story
Lots of ways to do this. Lots of requirements. Lots of ways to get requirements across lots of different stakeholders.
Wrapup
Big Data Science Postulates[crh]
If Big Data Science is not a technology problem, then let’s focus on the PROBLEM: the non-technology side, or the human-side.
We must perfect the blending of disciplines to educate & train on Big Data Science (vs. perfecting specific disciplines)
Doing what you are doing will not get you out of the fix you are in since it got you in the fix in the first place – innovate and improve!
Our Big Data Science, Analytics & Intelligence is an ENVIRONMENT and a SYSTEM, not an APP
Big Data / Data Science Postulates (cont’d)
How did we do?
One last time…
References
References
[1000v] http://www.1000ventures.com/design_elements/selfmade/quaity_cost-4components_6x4.png
[Aiken] Dr. Peter Aiken, Data Blueprint, 2012-2013
[arcweb] http://www.arcweb.com/events/arc-orlando-forum/pages/analytics-for-industry.aspx
[asq] http://asq.org/learn-about-quality/cost-of-quality/overview/read-more.html
[Bain] http://www.bain.com/management_tools/management_tools_and_trends_2007.pdf
[Barbara’] Dr. Daniel Barbara’, George Mason University, 2012 Big Data Lecture Series
[Batni] Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41, 3, Article 16 (July 2009), 52 pages. DOI=10.1145/1541880.1541883 http://doi.acm.org/10.1145/1541880.1541883
[Conway] http://www.drewconway.com/zia/?p=2378
[coq] http://costofquality.org/wp-content/uploads/2011/02/Cost-of-Quality.jpg
[crh] Dr. C. Randall Howard, PMP, crhPhDConsulting.net
[Crosby] http://www.philipcrosby.com/25years/crosby.html
[ct-bdtech] http://cloudtimes.org/2013/06/13/big-data-techniques-for-analyzing-large-data-sets-infographic/
[dddm] http://www.clrn.org/elar/dddm.cfm
[DTIC] http://www.dtic.mil/doctrine/new_pubs/
[econBD] http://www.economistinsights.com/analysis/evolving-role-data-decision-making, August 12th 2013
[EMC-DS] http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf
[Forbes] http://www.forbes.com/sites/christopherfrank/2012/03/25/improving-decision-making-in-the-world-of-big-data/
[FSAM/BAH] http://www.fsam.gov/about-federal-segment-architecture-methodology.php
[Gartner-EA] http://www.gartner.com/technology/it-glossary/enterprise-architecture.jsp
[Gartner-ERDS] "Emerging Role of the Data Scientist and the Art of Data Science", Gartner, 20 March 2012, ID:G00227058, Douglas Laney, Lisa Kart
[Gartner-HC] http://www.gartner.com/newsroom/id/1763814
References
[gayatri-patele-bay] http://www.slideshare.net/AsterData/gayatri-patele-bay
[Gleichauf] See Bob Gleichauf’s article: http://www.iqt.org/technology-portfolio/on-our-radar/Big_Data_Advanced_Analytics.pdf
[IBM-usingBD] ftp://ftp.software.ibm.com/software/tw/Using_Big_Data_for_Smarter_Decision-Making_v.pdf
[IBM] http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/index.html?cmp=dw&cpb=dwinf&ct=dwnew&cr=dwnen&ccy=zz&csr=051211
[IBM-Analytics] http://www-935.ibm.com/services/multimedia/Analytics_The_real_world_use_of_big_data_in_Financial_services_Mai_2013.pdf
[Infocus] http://infocus.emc.com/robert_abate/the-business-case-for-big-data-part-1/
[Infostory] http://infostory.com/2012/03/28/data-information-knowledge-web/
[InnovationGames] http://innovationgames.com/
[IT-Failures]
o http://it-project-failures.blogspot.com
o http://it.slashdot.org/submission
o http://www.sfgate.com
[Lwanga] The Job of the Information/Data Quality Professional (2010) Lwanga, Walenta, Talburt (IAIDQ Publication)
[Madnick] Stuart E. Madnick, Richard Y. Wang, Yang W. Lee, and Hongwei Zhu. 2009. Overview and Framework for Data and Information Quality Research. J. Data and Information Quality 1, 1, Article 2 (June 2009), 22 pages. DOI=10.1145/1515693.1516680 http://doi.acm.org/10.1145/1515693.1516680
[Mason-BDLS] George Mason University Volgenau School of Engineering Big Data Lecture Series, 2011-2012
[MIT] http://lean.mit.edu/downloads/2010-theses/view-category.html
[Nixon] [email protected] - 08/29/2011, Mason Big Data Lecture Series 2011
[Nonaka, Hirotaka, Knowledge-Creating Company] Nonaka, Ikujiro, and Hirotaka Takeuchi. The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford University Press, USA, 1995.
[O’Reily] https://docs.google.com/present/view?hl=en_US&id=0AXaXKp9bt6OXZGd4YzlnYmRfNThjMmo4dm5yaA from What is data science? O'Reilly Radar
[p36] http://information-retrieval.info/taipale/papers/p36-popp.pdf
[Patil] Patil, D.J., Building Data Science Teams, 2011
[RG] http://www.riskglossary.com/link/risk_metric_and_risk_measure.htm
[Robinson] http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html
[Sagan] Dr. Philip Sagan, Infiniti, 2012 Big Data Lecture Series
[Senge] http://www.infed.org/thinkers/senge.htm
[Talburt] Dr. John Talburt, 2012 Big Data Lecture Series
[Tandem] http://www.tandemlabs.com/documents/CPSA2008.pdf
Backup Slides
J. C. R. Licklider's Man-Computer Symbiosis [Aiken]
Humans Generally Better
• Sense low level stimuli
• Detect stimuli in noisy background
• Recognize constant patterns in varying situations
• Sense unusual and unexpected events
• Remember principles and strategies
• Retrieve pertinent details without a priori connection
• Draw upon experience and adapt decision to situation
• Select alternatives if original approach fails
• Reason inductively; generalize from observations
• Act in unanticipated emergencies and novel situations
• Apply principles to solve varied problems
• Make subjective evaluations
• Develop new solutions
• Concentrate on important tasks when overload occurs
• Adapt physical response to changes in situation
Machines Generally Better
• Sense stimuli outside human's range
• Count or measure physical quantities
• Store quantities of coded information accurately
• Monitor prespecified events, especially infrequent
• Make rapid and consistent responses to input signals
• Recall quantities of detailed information accurately
• Retrieve pertinent details without a priori connection
• Process quantitative data in prespecified ways
• Perform repetitive preprogrammed actions reliably
• Exert great, highly controlled physical force
• Perform several activities simultaneously
• Maintain operations under heavy operation load
• Maintain performance over extended periods of time
The best approach combines manual and automated reconciliation!