Extractiv
Uploaded by steve-watt
E X T R A C T I V
The long-awaited beta!
What it is: A semantic service that transforms unstructured web content into structured semantic data.
What it does:
• Crawls millions of pages
• Applies NLP tools to perform entity extraction
• Produces marked-up files for you to use
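To make "entity extraction" and "marked-up files" concrete, here is a minimal sketch. It is not Extractiv's NLP: it swaps in a plain dictionary lookup (a gazetteer), and the `GAZETTEER` table, the `mark_up` function, and the `<entity>` tag format are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (illustrative only).
GAZETTEER = {
    "OpenCalais": "PRODUCT",
    "Paris": "CITY",
}


def mark_up(text: str) -> str:
    """Wrap each known entity mention in a simple inline tag."""
    # Try longer names first so a longer name wins over a shorter prefix.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(
        lambda m: f'<entity type="{GAZETTEER[m.group(1)]}">{m.group(1)}</entity>',
        text,
    )


if __name__ == "__main__":
    print(mark_up("A page about Paris mentions OpenCalais."))
```

A real extractor would use statistical or rule-based NLP rather than exact string matches; the point here is only the shape of the output, with plain text going in and tagged text coming out.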
Who it’s for: anyone who
• Needs high-volume text extraction
• Needs more types of entities
• Wants to go above OpenCalais limits
• Doesn’t want to pay a ton
Demo time
How it works
Create Extractiv Job
• Automatically builds extraction app
• Packages entities into data model blob
Pass Job to Crawling Service
• Checks duplicate links against link graph
• Packages links and app into a work unit
Pass Work Unit to Grid Server
• Packages work units into version
Pass Version to Nodes
• Identifies available nodes
• Sends version to nodes
Node Completes Work Units
• Downloads content of link
• Runs app, returns result
• Sends results back
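The crawl, grid, and node stages on this slide can be sketched in a few lines. This is a rough illustration under stated assumptions, not the actual Extractiv code: the `WorkUnit` and `Version` classes, the function names, the work-unit size, and the round-robin assignment of work units to nodes are all hypothetical, and `fetch` and `run_app` are stand-in callables so that nothing touches the network.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkUnit:
    links: list[str]
    app: str  # name of the extraction app to run (hypothetical field)


@dataclass
class Version:
    work_units: list[WorkUnit]


def crawl_stage(candidate_links, link_graph, app, unit_size=2):
    """Drop links already in the link graph, then package the rest into work units."""
    fresh = []
    for link in candidate_links:
        if link not in link_graph:
            link_graph.add(link)  # remember it so later duplicates are skipped
            fresh.append(link)
    return [WorkUnit(fresh[i:i + unit_size], app)
            for i in range(0, len(fresh), unit_size)]


def grid_stage(work_units, nodes):
    """Package work units into a version and pick out the available nodes."""
    available = [n for n in nodes if n["available"]]
    return Version(work_units), available


def node_stage(version, available, fetch, run_app):
    """Assign work units round-robin (an assumption); each node downloads and runs the app."""
    results = []
    for i, unit in enumerate(version.work_units):
        node = available[i % len(available)]
        for link in unit.links:
            content = fetch(link)
            results.append({"node": node["name"], "link": link,
                            "result": run_app(unit.app, content)})
    return results


if __name__ == "__main__":
    graph = {"http://a.example/1"}  # already-seen links
    units = crawl_stage(["http://a.example/1", "http://a.example/2",
                         "http://a.example/3"], graph, "demo-app")
    version, free = grid_stage(units, [{"name": "n1", "available": True}])
    print(node_stage(version, free, lambda link: "stub page", lambda app, c: len(c)))
```

Passing `fetch` and `run_app` in as parameters keeps the sketch testable offline; in a real grid those would be the node's HTTP download and the generated extraction app.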
Some fun metrics
Max theoretical processing speed (per user):
5 million documents per hour
Available node pool:
50,000+ completely heterogeneous PCs
Back-end architecture:
12 grid service servers
8 crawling service servers
Number of available entities:
239 now, 1000+ in the next few months
Minimum time to create new entity:
2-3 hours
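As a quick sanity check on these figures, the arithmetic below converts the peak rate into per-second and per-node terms. It assumes, purely for illustration, that a single user's job is spread evenly over exactly 50,000 nodes; the slide gives only "50,000+" and does not say how work is actually distributed.

```python
DOCS_PER_HOUR = 5_000_000  # from the slide: max theoretical speed per user
NODE_POOL = 50_000         # from the slide: lower bound on the node pool

docs_per_second = DOCS_PER_HOUR / 3600             # aggregate throughput
docs_per_node_hour = DOCS_PER_HOUR / NODE_POOL     # even-spread assumption
seconds_per_doc_per_node = 3600 / docs_per_node_hour

print(f"{docs_per_second:.0f} documents per second overall")   # 1389
print(f"{docs_per_node_hour:.0f} documents per node per hour")  # 100
print(f"{seconds_per_doc_per_node:.0f} seconds per document on each node")  # 36
```

Under that assumption each PC would need to handle only one document every 36 seconds, which suggests the headline number depends on pool size more than on per-node speed.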
Coming soon…
New features:
RDF
Facts
Relations
Entity linking
Triples
Pricing plans:
Monthly access + per-document pricing
Higher document limits
Advanced features
API:
Create jobs and retrieve results
Integrate directly with your applications
If you want to try it out
Go to http://www.extractiv.com
Follow us @extractiv