Witness tree text analysis
Uploader: cole-capital
Category: Data & Analytics
Transcript of Witness tree text analysis
“Making Mole-hills out of Mountains”
www.witnesstree.com
30-70% of Big Data is Unstructured
• Difficult to mine and analyze
• Ergo, Largely ignored
• Represents a potential, undiscovered gold mine
• NEED: a seamless, structured representation of unstructured data
Text Analytics
• Software and transformational processes that uncover business value in unstructured text
• Uses statistical, linguistic, machine learning, data analysis and visualization techniques
• $2Bn market expected to grow at a 25% CAGR
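To make the growth claim concrete, here is a small sketch of what a 25% CAGR implies for a $2Bn market. The five-year horizon and the function name `project_market` are assumptions for illustration only; the slide states no time frame.

```python
def project_market(start_bn: float, cagr: float, years: int) -> float:
    """Compound a starting market size (in $Bn) at a constant annual growth rate."""
    return start_bn * (1 + cagr) ** years

# Assumed horizon of 5 years; the slide gives only the starting size and the rate.
for year in range(6):
    print(f"year {year}: ${project_market(2.0, 0.25, year):.2f}Bn")
```

Under these assumptions the market would roughly triple in five years, since 1.25 raised to the fifth power is a little over 3.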
WitnessTree Analytics
API
VISUALIZE
Structured Data
Unstructured Data
Data Information Knowledge
DISCOVER
REDUCE
ORGANIZE
WitnessTree: Text Analytics
• Discover: boost search accuracy, reduce ambiguity, contextual analysis
• Reduce: analyze relevant data, identify & define themes, content + contextual similarity
• Organize: dynamic categories, named entities (people, places, brands, dates), facets (metadata, real and derived)
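The "Organize" step mentions facets built from real and derived metadata. A minimal sketch of facet counting and facet filtering might look like the following; the document fields (`author`, `year`, `people`) and both function names are hypothetical and are not taken from the WitnessTree product.

```python
from collections import Counter

# Hypothetical documents: "author" and "year" stand in for real metadata,
# "people" for a derived facet such as extracted named entities.
docs = [
    {"id": 1, "author": "alice", "year": 2012, "people": ["Bob", "Carol"]},
    {"id": 2, "author": "bob", "year": 2013, "people": ["Carol"]},
    {"id": 3, "author": "alice", "year": 2013, "people": []},
]

def _values(doc, field):
    """Return the facet values of a document as a list (empty if absent)."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def facet_counts(documents, field):
    """Count how many documents carry each value of a facet field."""
    counts = Counter()
    for doc in documents:
        counts.update(set(_values(doc, field)))
    return counts

def filter_by_facet(documents, field, wanted):
    """Keep only documents whose facet field equals or contains the wanted value."""
    return [doc for doc in documents if wanted in _values(doc, field)]

print(facet_counts(docs, "author"))
print([d["id"] for d in filter_by_facet(docs, "people", "Carol")])
```

Counting documents per facet value is what lets a search interface show how many results remain behind each refinement.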
WT Semantic Analysis Machine (SAM)
Client App/service
Semantic Analysis Machine components, each exposed as an API/web service:
• Near Duplicate Detector
• Thread Analyzer
• Topic Explorer
• Search & Facet
• Named Entity Extractor
• Unsupervised Doc Clustering
• Theme Detector
Started with 1,000,000 docs
draw associations with no prior knowledge of docs
Clustering
SET-UP Near-Dup De-dup
Reduce redundant docs by 40% to 60%
SET-UP
Smart Search
Categories
Clustering “on the fly”
Refine Search
Found 10,000 docs: the Few, the Relevant
WitnessTree hosted solution for legal eDiscovery How to e-discover 10,000 from 1M?
“Find the Relevant. With intuitive ease."
chains near-dups
removes duplicates
Labeled
cluster tree
600k unique docs
create “categories” of search results
dynamic clustering on categories
concept, example, similarity, paragraph, boolean, proximity, fuzzy
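Two of the search modes listed above, fuzzy and proximity, are easy to illustrate with the Python standard library. This is a sketch under assumptions: the function names, the similarity cutoff, and the token window are invented for the example and say nothing about how WitnessTree implements search.

```python
import difflib
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def fuzzy_match(term, text, cutoff=0.8):
    """True if any token in the text is approximately equal to the term."""
    return bool(difflib.get_close_matches(term.lower(), tokenize(text), n=1, cutoff=cutoff))

def proximity_match(term_a, term_b, text, window=5):
    """True if both terms occur within `window` tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

print(fuzzy_match("contract", "The contrat was signed"))  # tolerates the typo
print(proximity_match("contract", "signed", "The contract was signed on Monday", window=3))
```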
Topic detection
Email threading
Recreates email threads and identifies missing and inclusive emails
Extracts themes from clusters
Backend
SaaS
Hosted
Licensed
Application Platforms / Development Tools
Presentation Technologies
Operating Systems
Integration Services
WitnessTree Technology Stack
Topic Explorer
• Discover concepts.
• Cross-reference ideas.
• Connect the dots.
• Build relevant queries.
• Get results. INSTANTLY!!!
(Un)supervised Doc Clustering
• Clusters related documents: hierarchical clustering
• Labels each cluster
• User-guided, system-generated: guided flexibility!!!
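As an illustration of clustering related documents and labeling each cluster, here is a minimal agglomerative sketch. It assumes Jaccard overlap of word sets as the similarity measure and the most widely shared words as labels; both choices, the stopword list, and the threshold are assumptions for the example, not the product's actual method.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "for", "on", "in", "is"}

def tokens(text):
    """Lowercase words with a small, assumed stopword list removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def jaccard(a, b):
    """Share of distinct words that two vocabularies have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.2):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    Merging stops once the best similarity drops below `threshold`, which
    corresponds to cutting the merge hierarchy at one level.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(docs))]
    vocab = [set(tokens(d)) for d in docs]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(vocab[i], vocab[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None or best < threshold:
            break
        i, j = pair  # j > i, so popping j leaves index i valid
        clusters[i] += clusters.pop(j)
        vocab[i] |= vocab.pop(j)
    return clusters

def label(docs, members, top=2):
    """Label a cluster with the words that appear in the most member documents."""
    counts = Counter(t for m in members for t in set(tokens(docs[m])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top]]

sample = [
    "merger agreement draft",
    "draft merger agreement signed",
    "lunch menu friday",
    "friday lunch order",
]
for members in cluster(sample):
    print(members, label(sample, members))
```

A user-guided variant could seed or rename clusters by hand, while the system-generated labels serve as defaults.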
• Re-construct email threads
• Identify Inclusive emails
• Find Missing/Deleted emails
Email Thread Analyzer
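The three thread-analysis tasks above can be sketched with reply links alone. The example assumes each email has already been reduced to an `id` and a `parent` (for instance from the Message-ID and In-Reply-To headers), and it treats any email with no replies as "inclusive", which is a simplifying assumption; the function name is hypothetical.

```python
from collections import defaultdict

def analyze_threads(emails):
    """emails: list of dicts with 'id' and 'parent' (None for a thread root).

    Returns (children, missing, inclusive):
      children  - maps each message id to the ids that reply to it
      missing   - ids referenced as a parent but absent from the collection
      inclusive - ids with no replies (assumed here to quote the full thread)
    """
    known = {e["id"] for e in emails}
    children = defaultdict(list)
    for e in emails:
        if e["parent"] is not None:
            children[e["parent"]].append(e["id"])
    missing = sorted(p for p in children if p not in known)
    inclusive = sorted(i for i in known if i not in children)
    return dict(children), missing, inclusive

emails = [
    {"id": "a", "parent": None},
    {"id": "b", "parent": "a"},
    {"id": "d", "parent": "c"},  # "c" was never collected
]
children, missing, inclusive = analyze_threads(emails)
print(children, missing, inclusive)
```

Reviewing only the inclusive emails is one way such an analysis can cut the volume a reviewer has to read, since earlier messages are quoted inside them.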
Near-Duplicate Detection
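A common way to detect near-duplicates is to compare overlapping word sequences (shingles) between documents. The sketch below assumes three-word shingles, Jaccard similarity, and a 0.5 threshold, and it chains near-duplicates into groups with a union-find structure; none of these choices is confirmed by the slides, and the function names are hypothetical.

```python
import re

def shingles(text, k=3):
    """Set of k-word sequences in the text (the whole text if it is shorter)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 1.0

def near_duplicate_groups(docs, threshold=0.5, k=3):
    """Group documents whose shingle sets overlap at or above `threshold`.

    Groups are chained: if A is near B and B is near C, all three share a group.
    """
    sets = [shingles(d, k) for d in docs]
    parent = list(range(len(docs)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

sample_docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "lunch is at noon today",
]
groups = near_duplicate_groups(sample_docs)
print(groups)
print(f"reduction: {1 - len(groups) / len(sample_docs):.0%}")
```

Keeping one representative per group is what shrinks the collection. The pairwise comparison here is quadratic, so a collection of a million documents would need an approximate scheme such as MinHash with locality-sensitive hashing.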
Theme Detection
• Detects recurring themes
• Filters based on relevancy ranking
• Search Wide, Dig Deep
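One simple reading of "recurring themes" with "relevancy ranking" is to rank phrases by how many documents they appear in and drop those below a cutoff. The sketch below uses two-word phrases and document frequency; the stopword list, the `min_docs` cutoff, and the function names are assumptions made for the example.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "was", "at", "for"}

def bigrams(text):
    """Distinct two-word phrases, formed after stopwords are removed."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {" ".join(pair) for pair in zip(words, words[1:])}

def recurring_themes(docs, min_docs=2):
    """Rank phrases by the number of documents containing them.

    Phrases found in fewer than `min_docs` documents are filtered out.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(bigrams(doc))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(phrase, n) for phrase, n in ranked if n >= min_docs]

cluster_docs = [
    "The supply contract was renewed",
    "Renewal of the supply contract is delayed",
    "Lunch is at noon",
]
print(recurring_themes(cluster_docs))
```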
Named Entity Recognition
Identifies:
• People
• Places
• Companies
• Time/Date
• Monetary
Crew members on the ISS will open the hatch Monday and unload 2,780 pounds of supplies and experiments, the news release said.
"From the men and women involved in the design, integration and test, to those who launched the Antares (rocket) and operated the Cygnus, our whole team," said David W. Thompson, president and chief executive officer of Orbital, in a written statement from the company.
It will burn up during re-entry over the Pacific Ocean, officials said.
Orbital has a $1.9 billion contract with NASA to make eight flights to the space station under the space agency's commercial supply program.
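To show what entity extraction over the sample passage could return, here is a deliberately simple pattern-based sketch covering only two of the listed types, Monetary and Time/Date. Regular expressions are a stand-in chosen for brevity; production named entity recognition typically relies on trained statistical models, and the patterns and function name here are assumptions.

```python
import re

# Dollar amounts such as "$1.9 billion" or "$2,500".
MONEY = re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?")
# Weekday names only; a real extractor would also cover dates and times.
WEEKDAY = re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")

def extract_entities(text):
    """Return the monetary and weekday mentions found in the text."""
    return {
        "Monetary": MONEY.findall(text),
        "Time/Date": WEEKDAY.findall(text),
    }

passage = (
    "Crew members on the ISS will open the hatch Monday and unload 2,780 pounds "
    "of supplies and experiments, the news release said. "
    "Orbital has a $1.9 billion contract with NASA to make eight flights "
    "to the space station under the space agency's commercial supply program."
)
print(extract_entities(passage))
```

People, places, and companies (for example David W. Thompson, the Pacific Ocean, Orbital, NASA) are much harder to capture with patterns alone, which is why they are left out of this sketch.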
Our Differentiators
Analytics Framework
• Structured and unstructured (text) data
• API or web application
Easy to Use
• Minimal training required
• Web browser + internet connection
Flexibility
• Hosted model, SaaS, licensed in-house
Versatility
• Document classification, visualization, categorization, API
Rich Feature-set
• State-of-the-art feature set, in place
Partnership Models
• OEM, white-label, reseller