Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
-
Upload
lucidworks-archived -
Category
Technology
-
view
236 -
download
2
Transcript of Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges
in the DoD and IC
Wes Caldwell Chief Architect
Intelligent Software Solutions
Topics
• Introduction to ISS
• The growth of data
• Our customer’s data environment
• The need for effective big-data management
• Search as the cornerstone of a big-data strategy
About ISS
• Headquartered in Colorado Springs • Other offices located in Washington DC, Hampton VA,
Tampa FL, and Rome NY
• Innovative Solutions from “Space to Mud and Everything Between” • Sole prime on multiple Air Force Research Labs
programs IDIQ • Currently Executing More Than 100 Software
Development Projects • Over 800 employees • Strength in Solutions Development and
Deployment
• Consistently Recognized as a Leader • Recognized as a Deloitte Fast 50 Colorado
company and a Deloitte Fast 500 company over eight consecutive years
• Three-time Inc. Magazine 500 winner • 2009 Defense Company of the Year
ISS Solution Space/Value Proposition
• Reusable and license-free to US Federal Government (GOTS)
• Committed to providing best ROI to our customers by integrating leading open-source solutions into our products and services
• Scalable from a single desktop solution to large distributed networks with thousands of users
• Customizable to each organization’s unique analytical and information technology infrastructure
• Operationally proven, secure and accredited for all major classified networks
ISS Business Strategy
Government
Off The Shelf
(GOTS)
Commercial
Off The Shelf
(COTS)
Subject
Matter Experts
(SMEs)
• Low Barrier to Entry: No license fees to US Government Agencies
• Fast: Proven baseline provides immediate capability
• Turnkey: Highly customizable solutions can be implemented quickly with no development
• Solutions Oriented: Subject Matter Experts support implementation in each domain
• Low Cost: Cost of Adding Features is shared across large customer base; all customers benefit
Blending the best elements of each industry model to provide low risk, nonproprietary, high payoff solutions—fast! 6
The growth of data
• Most electronic information is not relational,
but unstructured (textual, binary) or semi-
structured (spreadsheet, RSS feed, etc.)
– In 2007, the estimated information content of all
human knowledge was 295 exabytes(295 million
terabytes)
– Data production will be 44 times greater in 2020
than in 2009
• Approx 35 zetabytes total (35 billion terabytes)
• A majority of the data produced in the future will
be unstructured
– A tremendous amount of information and
knowledge is dormant within unstructured data
Our customer’s data environment
• Literally thousands of data sources/feeds
from a variety of strategic, national, and
tactical sources
– Media (documents, images, etc.)
– Human interactions
– Geospatial
– Open Source (News feeds, RSS)
– Imagery/Video
– Many more…
How our analysts feel
The need for effective “big-data” management
• Analysts are looking to extract knowledge from the massive heterogeneous
data sets, providing “actionable intelligence”
• Tactical environments absolutely demand effective management of data
– Time to live on the relevance of data collected can be very short
– Communications pipes aren’t as optimal as large CONUS-based data
centers, so reduction of data based on tactical conditions (i.e. AOR,
Problem Domain, etc.) is critical
• Search and Analytics are key enablers to allow an analyst to reliably search
through large amounts of information, and to focus their efforts around a
subset of that information to perform deeper analysis
Search IS the cornerstone of an effective big-data strategy
Structured Content
Semi-Structured Content
Un-Structured Content
Content Cache (Haystacks)
Content Acquisition
Tenets • Connector architecture • Data normalization • Data staging • Data Compartmenting
(Multiple Haystacks)
Tenets • Optimized Index of Content
for Search and Discovery of Big Data
• Analyst Topics that “Shrink the Haystack” Search Features (Facets, Auto-Complete, Tagging, Comments, etc.)
• Semantic (Synonym) Search based on pluggable taxonomies
Search/Discovery
Content Index
NLP Pipeline
Semantic Enrichment
Categorization
Named Entity
Recognition
Clustering
Gazetteers
Tenets • “Domain Spaces” that
support pluggable entity recognition and categorization
• Continuous feedback loop that improves the system over time with analyst input
• Lexicon-based analytics that allows for targeted categorization across corpus of data
Tenets • Data Reduction into
focused “Data Perspectives”
• Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked
• Leveraging industry-standard parallel processing frameworks for scalable analytics
Data Perspectives
Data
How can Search help you?
Have a great conference!!!