Witness tree text analysis

“Making Mole-hills out of Mountains” www.witnesstree.com

Transcript of Witness tree text analysis

Page 1: Witness tree   text analysis

“Making Mole-hills out of Mountains”

www.witnesstree.com

Page 2: Witness tree   text analysis

30-70% of Big Data is Unstructured

• Difficult to mine and analyze

• Ergo, largely ignored

• Represents a potential, undiscovered gold mine

• NEED: a seamless, structured representation of unstructured data

Page 3: Witness tree   text analysis

Text Analytics

• Software and transformational processes that uncover business value in unstructured text

• Uses statistical, linguistic, machine learning, data analysis and visualization techniques

• $2Bn market, expected to grow at a 25% CAGR

Page 4: Witness tree   text analysis

WitnessTree Analytics

[Diagram. Inputs: Structured Data, Unstructured Data. Stages: DISCOVER, REDUCE, ORGANIZE, VISUALIZE. Access: API. Progression: Data, Information, Knowledge.]

Page 5: Witness tree   text analysis

WitnessTree: Text Analytics

Discover

• Boost search accuracy

• Reduce ambiguity

• Contextual analysis

Reduce

• Analyze relevant data

• Identify & define themes

• Content + contextual similarity

Organize

• Dynamic categories

• Named entities (people, places, brands, dates)

• Facets (metadata, real and derived)

Page 6: Witness tree   text analysis

WT Semantic Analysis Machine (SAM)

[Diagram: a client app/service connects to the Semantic Analysis Machine. Each component below is exposed as its own API/web service.]

• Near Duplicate Detector

• Thread Analyzer

• Topic Explorer

• Search & Facet

• Named Entity Extractor

• Unsupervised Doc Clustering

• Theme Detector

Page 7: Witness tree   text analysis

WitnessTree hosted solution for legal eDiscovery

How to e-discover 10,000 from 1M?

“Find the Relevant. With intuitive ease.”

Started with 1,000,000 docs

SET-UP

• Near-Dup De-dup: chains near-dups, removes duplicates. Reduces redundant docs by 40% to 60% (600k unique docs)

• Clustering: draws associations with no prior knowledge of docs; produces a labeled cluster tree

• Topic detection: extracts themes from clusters

• Email threading: recreates email threads and identifies missing and inclusive emails

Smart Search

• Search types: concept, example, similarity, paragraph, boolean, proximity, fuzzy

• Categories: create “categories” of search results

• Clustering “on the fly”: dynamic clustering on categories

• Refine Search

Found 10,000 docs: the Few, the Relevant
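The figures on this slide (1,000,000 docs at the start, 600k unique docs after de-duplication, 10,000 docs found) can be checked against the stated 40% to 60% reduction with a few lines of arithmetic. This is only a worked check of the slide's own numbers; the variable names are illustrative.

```python
# Figures taken from the slide.
total_docs = 1_000_000
unique_docs = 600_000
relevant_docs = 10_000

# Share of the collection removed by near-dup de-duplication.
removed_docs = total_docs - unique_docs
dedup_reduction_pct = 100 * removed_docs / total_docs

# Share of the original and of the de-duplicated collection that is relevant.
relevant_pct_of_total = 100 * relevant_docs / total_docs
relevant_pct_of_unique = 100 * relevant_docs / unique_docs

print(f"De-dup removed {removed_docs:,} docs ({dedup_reduction_pct:.0f}%)")
print(f"Relevant docs: {relevant_pct_of_total:.1f}% of all docs, "
      f"{relevant_pct_of_unique:.2f}% of unique docs")
```

The 600k figure corresponds to the low end of the stated range: a 40% reduction.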

Page 8: Witness tree   text analysis

WitnessTree Technology Stack

[Diagram. Layers: Backend; Application Platforms / Development Tools; Presentation Technologies; Operating Systems; Integration Services. Deployment options: SaaS, Hosted, Licensed.]

Page 9: Witness tree   text analysis

Topic Explorer

• Discover concepts.

• Cross-reference ideas.

• Connect the dots.

• Build relevant queries.

• Get results. INSTANTLY!!!

Page 10: Witness tree   text analysis

(Un)supervised Doc Clustering

• Clusters related documents

• Labels each cluster

• User-guided, system-generated

Hierarchical clustering

Guided flexibility!!!
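The slide does not say how WitnessTree clusters or labels documents. As a minimal sketch of the general idea, assuming bag-of-words vectors, cosine similarity and bottom-up (agglomerative) merging, hierarchical clustering with system-generated labels could look like this. The function names, the stopword list and the `min_similarity` threshold are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "in", "for", "on", "is"}


def vectorize(text):
    """Turn a document into a term-frequency vector, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs, min_similarity=0.2):
    """Repeatedly merge the two most similar clusters (bottom-up).

    Each cluster is (member_indices, summed_term_vector). Merging stops
    when no pair of clusters reaches min_similarity.
    """
    clusters = [([i], vectorize(text)) for i, text in enumerate(docs)]
    while len(clusters) > 1:
        score, i, j = max(
            (cosine(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score < min_similarity:
            break
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def label(term_vector, n=2):
    """System-generated label: the n most frequent terms in the cluster."""
    return [term for term, _ in term_vector.most_common(n)]


docs = [
    "merger agreement draft",
    "draft merger agreement signed",
    "quarterly sales forecast",
    "sales forecast update",
]
clusters = cluster(docs)
for members, vector in clusters:
    print(sorted(members), label(vector))
```

The order in which clusters merge forms the hierarchy. A user-guided variant could let a reviewer rename a label or adjust the threshold.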

Page 11: Witness tree   text analysis

Email Thread Analyzer

• Reconstruct email threads

• Identify inclusive emails

• Find missing/deleted emails
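The slide does not describe the analyzer's internals. One way to sketch the three tasks, assuming each message carries an identifier and the identifier of the message it replies to (as the standard `Message-ID` and `In-Reply-To` email headers do), is shown below. The dictionary keys and the function name are hypothetical. Treating the last message of each branch as "inclusive" is a simplifying assumption that holds only if every reply quotes its predecessors in full.

```python
from collections import defaultdict


def analyze_threads(messages):
    """Group messages into threads using reply links.

    messages: iterable of dicts with keys "id" and "in_reply_to"
    (None for a thread starter).
    Returns (threads, missing, inclusive).
    """
    parent_of = {m["id"]: m["in_reply_to"] for m in messages}
    replied_to = {p for p in parent_of.values() if p is not None}

    # A message that is replied to but absent from the collection is a
    # missing (possibly deleted) email.
    missing = sorted(replied_to - set(parent_of))

    def root_of(msg_id):
        # Follow reply links upward; stop at a starter or a missing parent.
        seen = set()
        while parent_of.get(msg_id) is not None and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in sorted(parent_of):
        threads[root_of(msg_id)].append(msg_id)

    # Assumption: a message nobody replied to ends a branch and is
    # treated as inclusive of the messages before it.
    inclusive = sorted(m for m in parent_of if m not in replied_to)
    return dict(threads), missing, inclusive


messages = [
    {"id": "a", "in_reply_to": None},
    {"id": "b", "in_reply_to": "a"},
    {"id": "c", "in_reply_to": "b"},
    {"id": "e", "in_reply_to": "d"},  # "d" was never collected
]
threads, missing, inclusive = analyze_threads(messages)
print(threads, missing, inclusive)
```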

Page 12: Witness tree   text analysis

Near-Duplicate Detection
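This slide carries only a title, so the detection method is not stated. A common textbook approach, shown here as a sketch rather than as WitnessTree's method, compares documents by the overlap of their word shingles (Jaccard similarity). The shingle size `k` and the `threshold` are illustrative assumptions.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs, threshold=0.5, k=3):
    """Return (id_a, id_b, score) for pairs at or above the threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


docs = {
    "d1": "Please review the attached contract before the meeting on Friday",
    "d2": "Please review the attached contract before the meeting on Monday",
    "d3": "Lunch menu for the cafeteria this week",
}
pairs = near_duplicate_pairs(docs)
print(pairs)
```

Comparing every pair grows quadratically with the number of documents, so a collection of a million documents would normally need an approximate scheme such as MinHash with locality-sensitive hashing instead of this exhaustive loop.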

Page 13: Witness tree   text analysis

Theme Detection

• Detects recurring themes

• Filters based on relevancy ranking

• Search Wide, Dig Deep
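How themes are detected and ranked is not specified on the slide. A minimal sketch, assuming a "theme" is a two-word phrase that recurs across documents and that relevancy is the share of documents containing it, might look like this. The stopword list, the `min_share` cutoff and the function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "in", "for", "on", "is",
             "was", "under"}


def phrases(text, n=2):
    """Return the set of n-word phrases left after dropping stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def detect_themes(docs, min_share=0.5):
    """Return (phrase, share_of_docs) pairs, most widespread first.

    A phrase counts once per document, so the score reflects how widely
    it recurs rather than how often one document repeats it.
    """
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(phrases(text))
    total = len(docs)
    ranked = [(p, count / total) for p, count in doc_freq.items()
              if count > 1 and count / total >= min_share]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


docs = [
    "The supply contract was renewed in March",
    "Renewal of the supply contract is delayed",
    "Delayed shipment under the supply contract",
    "Team lunch on Friday",
]
themes = detect_themes(docs)
print(themes)
```

Lowering `min_share` searches wider; raising it keeps only the most widespread themes.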

Page 14: Witness tree   text analysis

Named Entity Recognition

Identifies:

• People

• Places

• Companies

• Time/Date

• Monetary

Crew members on the ISS will open the hatch Monday and unload 2,780 pounds of supplies and experiments, the news release said.

“From the men and women involved in the design, integration and test, to those who launched the Antares (rocket) and operated the Cygnus, our whole team”, said David W. Thompson, president and chief executive officer of Orbital, in a written statement from the company.

It will burn up during re-entry over the Pacific Ocean, officials said.

Orbital has a $1.9 billion contract with NASA to make eight flights to the space station under the space agency's commercial supply program.
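Two of the entity types listed above, Monetary and Time/Date, follow surface patterns regular enough to sketch with regular expressions, using sentences from the sample text. This is an illustration only: the patterns are assumptions covering dollar amounts and weekday names, and recognizing people, places and companies generally needs dictionaries or a trained model, which the slide does not describe.

```python
import re

# Dollar amounts such as "$1.9 billion" or "$2,500".
MONEY = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?"
)
# Weekday names only; a fuller Time/Date pattern would also cover dates.
WEEKDAY = re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = [(m.start(), "Monetary", m.group()) for m in MONEY.finditer(text)]
    found += [(m.start(), "Time/Date", m.group())
              for m in WEEKDAY.finditer(text)]
    return [(label, value) for _, label, value in sorted(found)]


text = (
    "Crew members on the ISS will open the hatch Monday and unload "
    "2,780 pounds of supplies and experiments, the news release said. "
    "Orbital has a $1.9 billion contract with NASA to make eight flights "
    "to the space station under the space agency's commercial supply program."
)
entities = extract_entities(text)
print(entities)
```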

Page 15: Witness tree   text analysis

Our Differentiators

Analytics Framework

• Structured and unstructured (text) data

• API or web application

Easy to Use

• Minimal training required

• Web browser + internet connection

Flexibility

• Hosted model, SaaS, licensed in-house

Versatility

• Document classification, visualization, categorization, API

Rich Feature-set

• State-of-the-art feature set, in place

Partnership Models

• OEM, white-label, reseller