IPTC EXTRA - Open Source Rules Classification

“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c

Transcript of IPTC EXTRA - Open Source Rules Classification

Page 1: IPTC EXTRA - Open Source Rules Classification

“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c

Page 2: IPTC EXTRA - Open Source Rules Classification

EXTRA

Stuart Myles * Associated Press * 14th June 2016
© 2016 IPTC (www.iptc.org) All rights reserved

https://flic.kr/p/tgYcsA

Page 3: IPTC EXTRA - Open Source Rules Classification

EXTRA
EXTraction Rules Apparatus

• Rules-based classification of text
• Open source software
• EXTRA is being developed by the IPTC
• Grant from the Digital News Initiative

https://iptc.github.io/extra/


Page 4: IPTC EXTRA - Open Source Rules Classification

Google DNI

• Google’s €150 million Digital News Initiative fund
  – Stimulate innovation among European news organizations
  – https://www.digitalnewsinitiative.com/fund/
• Multiple funding rounds
  – First funding of €27 million to projects in 23 countries
  – http://googlepolicyeurope.blogspot.gr/2016/02/digital-news-initiative-first-funding_24.html
• IPTC’s EXTRA project funded in first round - October 2015
  – Developer €35,000
  – Linguist €10,000
  – Project Manager €5,000
  – Total grant to IPTC from DNI = €50,000


Page 5: IPTC EXTRA - Open Source Rules Classification

EXTRA
EXTraction Rules Apparatus

• Open source
  – IPTC always uses open licenses
• Rules-based
  – Better for breaking news than statistical methods
  – More consistent and scalable than hand tagging
  – Easier to explain why rules classify content
• Multilingual
  – Developing rules for two IPTC Media Topics languages
• News classification
  – Rules will be developed using news content corpora
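To illustrate why a rules-based classifier can explain its decisions, here is a minimal Python sketch. It is not EXTRA's rules language or implementation, which this deck does not specify; the `Rule` structure, the `classify` function, the topic labels and the keyword lists are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical keyword rule (not EXTRA's actual rule format)."""
    topic: str           # illustrative label standing in for a Media Topics term
    any_of: tuple        # at least one of these words must appear
    none_of: tuple = ()  # the rule is suppressed if any of these appear


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, rules):
    """Return {topic: [matched words]} for every rule that fires.

    The matched words are the explanation: they show exactly which
    terms caused the document to receive each topic.
    """
    words = tokenize(text)
    results = {}
    for rule in rules:
        hits = sorted(words.intersection(rule.any_of))
        blocked = words.intersection(rule.none_of)
        if hits and not blocked:
            results[rule.topic] = hits
    return results


# Assumed example rules; real rules would be developed against news corpora.
RULES = [
    Rule("sport", any_of=("football", "goal", "league")),
    Rule("economy", any_of=("inflation", "bank", "market"), none_of=("river",)),
]

matches = classify("The central bank warned that inflation may rise.", RULES)
print(matches)  # {'economy': ['bank', 'inflation']}
```

Because each result carries the words that triggered it, an editor can see why a story was tagged and adjust the rule directly, for example by adding an exclusion term such as "river" in the sketch above.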


Page 6: IPTC EXTRA - Open Source Rules Classification

EXTRA Progress

• Technical use cases
  – https://docs.google.com/document/d/1O8pmFlohcGXThzyrWil_OFbDyqJk1Hcjpml_RRXuw6U/edit?usp=sharing
• Rules language requirements
  – https://docs.google.com/document/d/1MMv5qlrLF71bBN1w1ErXaSyTKB2Kd1ksgixBnUOw0fQ/edit?usp=sharing
• Delivered roadmap to DNI
• Securing news corpora in two Media Topics languages
  – English from Thomson Reuters
  – German from APA
  – French from AFP


Page 7: IPTC EXTRA - Open Source Rules Classification

Communications plan

• Who needs to be kept informed
  – People working on EXTRA, but who might not make every meeting
  – IPTC members who are interested in EXTRA
  – People beyond IPTC who are interested in, or might want to work on, EXTRA
• Current channels
  – Teleconferences https://iptc.org/events/
  – Email https://groups.yahoo.com/neo/groups/iptc-extra/info
  – Documentation
    • http://dev.iptc.org/Topic-EXTRA
    • https://iptc.github.io/extra/
• Do we need
  – Team communications - Slack?
  – Outreach - Twitter? Blog? Medium? LinkedIn?


Page 8: IPTC EXTRA - Open Source Rules Classification

EXTRA Wednesday Workshop

• Review technical use cases
• Review rules language requirements
• Select licenses
  – Source code
  – Corpora
• Decide on communications plan
• A plan for a plan
  – Technical foundations
  – Select consultants


Page 9: IPTC EXTRA - Open Source Rules Classification

Date and Place of Next Meeting
Berlin, Germany, 24 – 26 October 2016

https://flic.kr/p/dzWJB

Tack och adjö! (Thank you and goodbye!)
