Extractiv

E X T R A C T I V

The long-awaited beta!

What it is: A semantics service that transforms unstructured web content into structured semantic data.

What it does:

• Crawls millions of pages

• Applies NLP tools to perform entity extraction

• Produces marked-up files for you to use
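The slides don't say which NLP tools Extractiv applies. As a rough illustration of what "entity extraction" plus "marked-up files" can look like, here is a minimal sketch that swaps in a simple gazetteer (dictionary) lookup: every known name found in the text is wrapped in an XML-style tag named after its entity type. The `GAZETTEER` entries, the tag format, and `mark_up` are hypothetical and are not Extractiv's actual method or output format.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical gazetteer: surface string -> entity type.
# A stand-in for real NLP tooling, not Extractiv's method.
GAZETTEER = {
    "Barack Obama": "Person",
    "New York": "City",
    "IBM": "Company",
}

def mark_up(text: str, gazetteer: dict[str, str]) -> str:
    """Wrap every gazetteer match in an XML-style tag named after its type."""
    if not gazetteer:
        return escape(text)
    # Longest names first, so a longer name wins over a shorter prefix of it.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    out = []
    last = 0
    for m in pattern.finditer(text):
        out.append(escape(text[last:m.start()]))  # untagged text before the match
        tag = gazetteer[m.group(0)]
        out.append(f"<{tag}>{escape(m.group(0))}</{tag}>")
        last = m.end()
    out.append(escape(text[last:]))  # untagged tail
    return "".join(out)

print(mark_up("IBM opened an office in New York.", GAZETTEER))
```

A real extractor would use statistical or rule-based NLP rather than exact string matching, but the output shape (original text with typed entity spans marked inline) is the same idea.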

Who it’s for: anyone who

• Needs high-volume text extraction

• Needs more types of entities

• Wants to go above OpenCalais limits

• Doesn’t want to pay a ton

Demo time

How it works

Create Extractiv Job
• Automatically builds extraction app

Pass Job to Crawling Service
• Checks duplicate links against link graph
• Packages links and app into a work unit

Pass Work Unit to Grid Server
• Packages work units into version
• Identifies available nodes
• Sends version to nodes

Pass Version to Nodes
• Downloads content of link
• Runs app, returns result
• Sends results back

Node Completes Work Units
• Packages entities into data model blob
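The steps above can be sketched as a small single-process Python pipeline. Everything here is an assumption for illustration: the names (`build_app`, `crawl`, `distribute`, `run_node`, `package`), the keyword-spotting "app", the round-robin assignment of work units to nodes, the in-memory dictionary standing in for downloads, and JSON as the "data model blob" format. In the architecture described, these stages run on separate crawling servers, grid servers, and remote nodes rather than in one process.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an "app" maps page text to a list of extracted entity strings.
App = Callable[[str], list[str]]

@dataclass
class WorkUnit:
    links: list[str]
    app: App

def build_app(keywords: list[str]) -> App:
    """Create job: 'build' a trivial extraction app (keyword spotting)."""
    return lambda text: [k for k in keywords if k in text]

def crawl(links: list[str], link_graph: set[str], app: App, unit_size: int) -> list[WorkUnit]:
    """Crawling service: drop links already in the link graph, package the rest with the app."""
    fresh = []
    for link in links:
        if link not in link_graph:
            link_graph.add(link)
            fresh.append(link)
    return [WorkUnit(fresh[i:i + unit_size], app) for i in range(0, len(fresh), unit_size)]

def distribute(units: list[WorkUnit], nodes: dict[str, bool]) -> dict[str, list[WorkUnit]]:
    """Grid server: bundle units into a 'version' and hand them round-robin to available nodes."""
    available = [name for name, free in nodes.items() if free]
    if not available:
        raise RuntimeError("no available nodes")
    version: dict[str, list[WorkUnit]] = {name: [] for name in available}
    for i, unit in enumerate(units):
        version[available[i % len(available)]].append(unit)
    return version

def run_node(units: list[WorkUnit], fetch: Callable[[str], str]) -> dict[str, list[str]]:
    """Node: download each link's content, run the app, send results back."""
    return {link: unit.app(fetch(link)) for unit in units for link in unit.links}

def package(results: dict[str, list[str]]) -> str:
    """Package entities into a data model blob (JSON here, as an assumption)."""
    return json.dumps(results, sort_keys=True)

# Usage with an in-memory "web" standing in for real downloads.
web = {"a.html": "IBM hires in Paris", "b.html": "Paris museum", "c.html": "nothing"}
app = build_app(["IBM", "Paris"])
units = crawl(["a.html", "b.html", "a.html", "c.html"], set(), app, unit_size=2)
version = distribute(units, {"node1": True, "node2": False, "node3": True})
results: dict[str, list[str]] = {}
for node_units in version.values():
    results.update(run_node(node_units, web.__getitem__))
print(package(results))
```

The repeated `a.html` is dropped by the crawling step because it is already in the link graph when it reappears, and `node2` receives no work because it is marked unavailable.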

Some fun metrics

Max theoretical processing speed (per user): 5 million documents per hour

Available node pool: 50,000+ completely heterogeneous PCs

Back-end architecture: 12 grid service servers, 8 crawling service servers

Number of available entities: 239 now, 1,000+ in the next few months

Minimum time to create new entity: 2-3 hours
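A quick back-of-the-envelope check puts the headline numbers in perspective. Treating the pool as exactly 50,000 nodes (the slide says "50,000+", so this is a lower bound) and assuming every node works on one user's job at the theoretical maximum, each node only needs to finish about one document every 36 seconds:

```python
# Figures from the slide. Using exactly 50,000 nodes and assuming all of them
# serve a single user at the theoretical maximum are both assumptions.
DOCS_PER_HOUR = 5_000_000
NODES = 50_000

docs_per_second = DOCS_PER_HOUR / 3600                # about 1,389 documents per second overall
docs_per_node_hour = DOCS_PER_HOUR / NODES            # 100 documents per node per hour
seconds_per_doc_per_node = 3600 / docs_per_node_hour  # 36 seconds per document on each node

print(f"{docs_per_second:.0f} docs/s, {docs_per_node_hour:.0f} docs/node/h, "
      f"{seconds_per_doc_per_node:.0f} s/doc/node")
```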

Coming soon…

New features:

RDF

Facts

Relations

Entity linking

Triples

Pricing plans:

Monthly access + per-document pricing

Higher document limits

Advanced features

API:

Create jobs and retrieve results

Integrate directly with your applications

If you want to try it out

Go to http://www.extractiv.com

Follow us @extractiv
