Extractiv

description

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

Transcript of Extractiv

Page 1: Extractiv

E X T R A C T I V

The long-awaited beta!

Page 2: Extractiv

E X T R A C T I V

What it is: A semantic service that transforms unstructured web content into structured semantic data.

What it does:

• Crawls millions of pages

• Applies NLP tools to perform entity extraction

• Produces marked-up files for you to use
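To make "entity extraction" and "marked-up files" concrete, here is a minimal sketch. It is not Extractiv's NLP: a plain dictionary lookup stands in for the real extraction tools, and the `GAZETTEER` table, the `mark_up` function and the `<entity>` tag format are all assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
# This is an illustrative stand-in, not Extractiv's entity set.
GAZETTEER = {
    "Austin": "CITY",
    "Hadoop": "SOFTWARE",
}

def mark_up(text: str) -> str:
    """Wrap each known surface form in an inline tag carrying its entity type."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b")
    return pattern.sub(
        lambda m: f'<entity type="{GAZETTEER[m.group(1)]}">{m.group(1)}</entity>',
        text,
    )

marked = mark_up("The Hadoop user group meets in Austin.")
print(marked)
```

A real extractor would use statistical or rule-based NLP rather than exact string matches, but the output has the same shape: the original text plus typed annotations.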

Who it’s for:

• Need high-volume text extraction

• Need more types of entities

• Want to go above OpenCalais limits

• Don’t want to pay a ton

Page 3: Extractiv

E X T R A C T I V

Demo time

Page 4: Extractiv

How it works

Create Extractiv Job

• Automatically builds extraction app

• Packages entities into data model blob

Pass Job to Crawling Service

• Checks duplicate links against link graph

• Packages links and app into a work unit

Pass Work Unit to Grid Server

• Packages work units into version

• Identifies available nodes

Pass Version to Nodes

• Sends version to nodes

• Downloads content of link

Node Completes Work Units

• Runs app, returns result

• Sends results back
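The flow on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Extractiv's implementation: it reads the steps in the order create job, crawling service, grid server, nodes, and every name in it (`WorkUnit`, `Version`, `build_app`, `crawling_service`, `grid_server`, `node_complete`, `FAKE_WEB`) is hypothetical. Downloads are replaced by a dictionary lookup so the sketch runs offline, and round-robin node assignment is a guess.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

App = Callable[[str], List[str]]  # extraction app: page text -> entities found

@dataclass
class WorkUnit:
    links: List[str]
    app: App

@dataclass
class Version:
    work_units: List[WorkUnit]

# Stand-in for the web so the sketch needs no network access.
FAKE_WEB: Dict[str, str] = {
    "http://a.example": "Austin hosts a Hadoop group",
    "http://b.example": "Nothing to see here",
}

def build_app(entities: Set[str]) -> App:
    """Create job: build an extraction app from the requested entities."""
    return lambda text: [word for word in text.split() if word in entities]

def crawling_service(links: List[str], link_graph: Set[str], app: App,
                     unit_size: int = 1) -> List[WorkUnit]:
    """Drop links already in the link graph, then package the rest with the app."""
    fresh = []
    for link in links:
        if link not in link_graph:
            link_graph.add(link)
            fresh.append(link)
    return [WorkUnit(fresh[i:i + unit_size], app)
            for i in range(0, len(fresh), unit_size)]

def node_complete(unit: WorkUnit) -> Dict[str, List[str]]:
    """Node: fetch each link's content, run the app, return the result."""
    return {link: unit.app(FAKE_WEB.get(link, "")) for link in unit.links}

def grid_server(units: List[WorkUnit], nodes: List[str]) -> Dict[str, List[str]]:
    """Package work units into a version, hand it to nodes, gather the results."""
    version = Version(units)
    results: Dict[str, List[str]] = {}
    for i, unit in enumerate(version.work_units):
        _node = nodes[i % len(nodes)]  # round-robin assignment is an assumption
        results.update(node_complete(unit))
    return results

app = build_app({"Austin", "Hadoop"})
units = crawling_service(
    ["http://a.example", "http://b.example", "http://a.example"], set(), app
)
results = grid_server(units, nodes=["pc-1", "pc-2"])
print(results)
```

The duplicate `http://a.example` link is filtered out by the link-graph check, so only two work units reach the grid.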

Page 5: Extractiv

Some fun metrics

Max theoretical processing speed (per user): 5 million documents per hour

Available node pool: 50,000+ completely heterogeneous PCs

Back-end architecture: 12 grid service servers, 8 crawling service servers

Number of available entities: 239 now, 1000+ in the next few months

Minimum time to create new entity: 2-3 hours
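A quick back-of-the-envelope check puts these figures in perspective. The per-node numbers assume that the peak rate is spread evenly over a pool of exactly 50,000 nodes, which the slide does not state, so treat them as rough orders of magnitude.

```python
DOCS_PER_HOUR = 5_000_000   # max theoretical speed per user, from the slide
NODE_POOL = 50_000          # lower bound of the "50,000+" pool, from the slide

# Aggregate rate across the whole grid.
docs_per_second = DOCS_PER_HOUR / 3600

# Assumption: the load is spread evenly over the full pool.
docs_per_node_hour = DOCS_PER_HOUR / NODE_POOL
seconds_per_doc = 3600 / docs_per_node_hour

print(f"{docs_per_second:.0f} documents per second overall")
print(f"{docs_per_node_hour:.0f} documents per node per hour")
print(f"{seconds_per_doc:.0f} seconds per document on each node")
```

Under that assumption each PC only needs to finish one document every 36 seconds or so, which is plausible for a pool of ordinary, heterogeneous machines.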

Page 6: Extractiv

Coming soon…

New features:

RDF

Facts

Relations

Entity linking

Triples
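For readers new to the terms above, a triple is a subject, predicate, object statement, and RDF is a standard way to write such statements down. The sketch below shows one relation as a triple serialized in N-Triples style. The `Triple` type, the `to_ntriples` helper, the `presentedAt` predicate and the `http://example.org/` namespace are placeholders for illustration, not Extractiv's planned output format.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

BASE = "http://example.org/"  # placeholder namespace, not an Extractiv URL

def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples line with full IRIs."""
    return (f"<{BASE}{triple.subject}> "
            f"<{BASE}{triple.predicate}> "
            f"<{BASE}{triple.obj}> .")

# A relation taken from this deck's own description line.
fact = Triple("Shion_Deysarkar", "presentedAt", "Austin_Hadoop_User_Group")
line = to_ntriples(fact)
print(line)
```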

Pricing plans:

Monthly access + per-document pricing

Higher document limits

Advanced features

API:

Create jobs and retrieve results

Integrate directly with your applications

Page 7: Extractiv

If you want to try it out

Go to http://www.extractiv.com

Follow us @extractiv