Physics, Health, and Human Trafficking?i.stanford.edu/~ullman/cs341slides/deepdive.pdf · For DARPA...

Physics, Health, and Human Trafficking?

CS341 datasets:

NOT exhaustive

chrismre@cs.stanford.edu

Extraction

The DeepDive Team

http://deepdive.stanford.edu/

Dark Data System: ETL on Steroids

Quality that can exceed paid human annotators and volunteers

Human Trafficking on the (Dark) Web…

In Plain sight: Web ads for such services

In Use by Law Enforcement

For DARPA MEMEX, we were operational in 6 months

Processed >35M documents (~26M records)

Tens of columns (location, phone #, price, etc)

With compute times of less than a day

>90% Precision for most relations

New York DA use MEMEX Data for all trafficking investigations this year. Real Arrests

Project: Analyze world’s biggest (oldest) trade.

For Official Use Only – Not for Public Release

Extracting P-Values from Scientific Literature

• P-value used to show the statistical significance of the study or findings

• Under no hacking p-values show be uniformly distributed!

• Using DeepDive we analyzed PubMed articles and extracted p-value mentions

• Processed 500,000 articles and extracted 94,434 p-value mentions

• Automatic generation of p-curves for 2,000 Journals

- Extraction count - Observed distribution - Uniform distribution

Do it for real!

TAIR: A Plant Genome Database

• Model plant: lessons learned in this organism can be translated to plants that are important to humans: food, medicine, biofuel, housing, etc.

• genome completely sequenced in 2000, 30K+ genes, 10K labs all over the world

• 50-70K academic papers describing basic and applied research. Need many more!

• www.arabidopsis.org

Class project pitches

Learning representations from time series data

• Extracting features from time series

is a painstaking process

relies on domain knowledge

often requires multiple iterations

• Project aim: automate the extraction process

• Data: accelerometer step counts,

motion tracking wave forms, vital sign data

Learning representations from time series data

Contacts: Ina Fiterau (mfiterau@cs.stanford.edu), Jason Fries (jfries@stanford.edu)

References:

[1] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

[2] Zhihua Chen, Wangmeng Zuo, Qinghua Hu, Liang Lin, Kernel sparse representation for time series classification, Information Sciences, Volume 292, 20 January 2015

Time series representations

Recurrent neural networks Kernel representation

Extreme learning machines

Detecting Running Injury Patterns

Running Injury Clinic

Extracted features (structured)

obtained by applying domain expertise

Stereo recordings (video)

Motion tracking (time series)

Medical chart (structured) NOT UNIFORM

(collection depends on study)

Detecting Running Injury Patterns

• 1645 sessions (clinic visits), 1424 patients, 18 studies

• Studies handle patients with different conditions

• Aim: predict injury type/severity based on running pattern

• Topics: missing data imputation, domain adaptation

• Collaborators: Jennifer Hicks, Kevin Thomas (MD PhD student)

• Data agreement signature required

Session Data - subject height, gender, study - running frequency, distance - skeletal structure measurements - muscle strength, flexibility - injury description

Kinematics session (as many as 10 per subject) - at least one per session - different test conditions - 3 types of markersets - features extracted from motion tracking

(foot, ankle, knee, hip, pelvis)

Contact: Ina Fiterau (mfiterau@cs.stanford.edu)

SLAC. Imaging and Physics

Email chrismre@cs.stanford.edu to get access to physics projects.

Physics, Health, and Human Trafficking?i.stanford.edu/~ullman/cs341slides/deepdive.pdf · For DARPA...

Documents

Transcript of Physics, Health, and Human Trafficking?i.stanford.edu/~ullman/cs341slides/deepdive.pdf · For DARPA...

Memex Project--DARPA (2014) DARPA-BAA-14-21

BPS Memex Intelligence System - Aims and ObjectivesTitle: BPS Memex Intelligence System - Aims and Objectives Author: benzihug Created Date: 10/21/2014 3:38:51 PM

Bot2.0: The Memex and Social Learning

Ean-26M Digital Inclinometer

memex-merlin-continuous-improvement-roadmap (pdf)

Memex-Merlin FOEE S102134 view...MEMEX, with its deep commitment towards machine connectivity, offers solutions that are focused on finding hidden capacity by measuring and managing

MyLifeBits: Attempting to realize the Memex Vision

Memex Memory Upgrade Manual for Fanuc 16 & 18

Opp ortunities for Online P - i.stanford.edu

MEMEX YASNAC i80 MEMORY UPGRADE INSTALLATION MANUAL · MEMEX YASNAC i80 MEMORY UPGRADE INSTALLATION MANUAL Memex Automation Inc. 200 – 3425 Harvester Rd., Burlington, Ontario, Canada

Mx1000 BTR Installation & User’s Manual - MEMEX Inc.€¦ · Mx1000 BTR Installation & User’s Manual For Fanuc Tape Readers © 1994-2007 Memex Automation Inc. 777 Walkers Line,

Memex - PyData Seattle

“Integrating Sustainable Urban Logistics Plans (SULP) … · 2016-04-25 · MemEx Srl IEE ENCLOSE Project Coordinator MemEx general manager US-TRB Urban Freight Transportation Committee

Celebrate - King School › uploaded › King_School › ... · 2016-2017 Operating Revenue $26M 2016-2017 Operating Expenses $26M Net Tuition & Fees 91% Annual Fund 8% Endowment

Department of Computer Science - i.stanford.edu

From Memex to Google in 120 minutes

Kreativ Downloaddata M 26M RXB en,TemplateId=Render

Memex Merlin Presentation

Municipals 26M - Candidatures municipals de la zona de la ...

Friday, December 2, 2005 Human Information Processing Personal Memex Virginia Tech Plans with SenseCam: Early Ideas Microsoft Memex Workshop, July 2006.