Crowdsourcing and Natural Language Processing for Humanitarian Response

37
Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University Crowdsourcing and Natural Language Processing for Humanitarian Response Robert Munro Tulane Disaster Resilience Leadership Academy 2013

Transcript of Crowdsourcing and Natural Language Processing for Humanitarian Response

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Crowdsourcing and Natural Language Processing for Humanitarian Response

Robert MunroTulane Disaster Resilience Leadership Academy

2013

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Crowdsourcing and Natural Language Processing:

• Mission 4636, Haiti

• Pakreport, Pakistain

• MapMill, Sandy

• Libya Crisis Map

• EpidemicIQ

Outline

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Information is increasing

• Scale (well-known)

• Diversity (less understood)

– On a given day, what is the average number of languages that someone could potentially hear?

– How has this changed?

Processing information at scale

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

99% of languages don’t have machine-translation or similar services:

• Disproportionately lower healthcare & education

• Disproportionately greater exposure to disasters

Crowdsourcing can bridge part of the gap.

Diversity

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

GRAPH OF DEPLOYMENTS

Crowdsourcing

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Crowdsourcedprocessing of information in Haitian Kreyol.

1000s of Haitians in Haiti and among the diaspora.

Haiti – Mission 4636

Apo

Dalila

Haiti

(18.4957, -72.3185)

“I need Thomassin Apo please”

“Kenscoff Route: Lat: 18.4957, Long:-72.3185”

“This Area after Petion-Ville and Pelerin 5 is not on Google Map. We have no streets name”

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Lopital Sacre-Coeur ki nan vil Okap, pre pou li resevwamoun malad e lap mande pou mounki malad yo ale la.

“Sacre-Coeur Hospital which located in this village of Okap is ready to receive those who are injured. Therefore, we are asking those who are sick to report to that hospital.”

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Lopital Sacre-Coeur ki nan vil Okap, pre pou li resevwamoun malad e lap mande pou mounki malad yo ale la.

“Sacre-Coeur Hospital which located in this village of Okap is ready to receive those who are injured. Therefore, we are asking those who are sick to report to that hospital.”

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Lopital Sacre-Coeur ki nan vil Okap, pre pou li resevwamoun malad e lap mande pou mounki malad yo ale la.

“Sacre-Coeur Hospital which located in this village of Okap is ready to receive those who are injured. Therefore, we are asking those who are sick to report to that hospital.”

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Evaluating local knowledge

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Haitians (volunteers and paid), and those working with them.

“Non-Haitians,Ushahidi@Tufts

3,000 messages< 5 minutes each

> 4 hours each

45,000 messages

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Lopital Sacre-Coeur ki nan vilOkap, pre pou li resevwa mounmalad e lapmande pou mounki malad yo alela.

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

The public facing mapThe map was not time-critical.This email chain was about successful responses to 4636 already sent through private channels.

This was the majority decision.Ushahidi@Tufts might not have realized that Mission 4636 was 20x larger, but they did know we had the majority of Haitian volunteers.

Mission 4636 stayed with the decision. Ushahidi@Tufts rejected the decision, citing permission from lawyers.(exact response from Ushahidi@Tufts is omitted as we don’t have permission to reprint their communications)

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

• In 2012 we found that no legal support existed. Just two (non-expert) emails:

• “If people are texting you, with the intent of getting aid or reaching out to someone, then consent would be implied. They know what you're doing, right?”

• “I am not sure who would have the expertise on this, but it seems quite clear to me that if you are able to obtain their numbers and they are sending you this information, consent is implied.”

• The public map was not known in Haiti (Clemenzo, 2011), so we assume the lawyers agree with Mission 4636.(NB: these emails have been reprinted elsewhere with various edits)

Haiti – Mission 4636

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

• Broadly, so we do not repeat the mistake.

• More narrowly, while the response is over in Haiti, people’s identities remain at risk, as Ushahidi@Tufts members are still putting 4636 messages online. Last month, the full name and phone number of a school girl. They refused to remove it, supported by post-hoc cover-ups:– “I did not receive a single email from anyone involved in Mission 4636

demanding that the SMS’s not be made public.” (Patrick Meier; compare to prior slide)

– Consent was given from ‘experts’, note again: “I am not sure who would have the expertise on this, but it seems quite clear to me that if you are able to obtain their numbers and they are sending you this information, consent is implied.” (part in red ommited from public statements by Meier)

Why it still matters

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Haiti – Mission 4636

Lessons learned

• Default to private data practicesMajority decision was not to use a public map

Duplicated data on the map led to false responses

• Find volunteers through strong social ties 10x larger/faster than the publicized efforts

• Avoid anyone with their own agenda coming in (‘bloggers’, ‘crisis-mappers’ …)

• Localize to the crisis-affected community25% of work was by paid workers in Haiti

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Haiti – Mission 4636

Paid workers in Mirebalais, Haiti (FATEM)

Benchmarks we can use:*

$ 0.25 per translation

$ 0.20 per geolocation

$ 0.05 per categorization / filtering

4:00 minutes per report processed

Can volunteerism undercut this cost?

* Munro. 2012. Crowdsourcing and crisis-affected community: lessons learned and looking forward from Mission 4636. Journal of Information Retrieval

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Data-structuring for 2010 floods in Pakistan

Pakreport

*Chohan, Hester and Munro. 2012. Pakreport: Crowdsourcing for Multipurpose and MulticategoryClimate-related Disaster Reporting. Climate Change, Innovation & ICTs Project. CDI

Multiple inexperienced people are more accurate than one experienced person.*

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Pakreport

Lessons learned

• Default to private data practices (!)

Taliban threatened to attack mapped aid workers

• Cross-validate tasks across multiple workers

We used CrowdFlower, as with Mission 4636

• Localize to the crisis-affected community

Data obtained by hand / created jobs

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Sandy Mapmill

Volunteer damage assessment of Civil Air Patrol photographs

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Sandy Mapmill

Used to populate a damage map for FEMA.

30,000+ images.

(run by Idibon’s CTO,

Schuyler Erle, on his

first 2 days on the job!)

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

• f

Volunteers v professionals

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Inter-annotator agreement can smooth errors

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

MapMill Sandy

Lessons learned

• We were able to successfully get enough volunteers at scale.

• Volunteers were substantially less accurate than professionals, but multiple judgments help mitigate via inter-annotator agreement.

• Despite the scale, it still would have only cost ~$3000 to pay crowdsourcing professionals

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Libya Crisis Map

A negative example

• 2283 reports already-open, English sources

• 1 month of full-time management and contributions from >100 volunteers

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Libya Crisis Map

Equivalent cost from paid workers

• $575.75

- or about $800 with multiple steps

Equivalent time cost from Libyan nationals:

• 152.2 hours = less than 1 month for 1 person

- would also address some security concerns

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Libya Crisis Map

Lessons learned

• Crowdsourced volunteers were not required

- cost more to run than was saved by not paying

- a single in-house Libyan could have achieved more

• Default to private data practices

- assume all identities of volunteers were exposed

- many Libyans expressed concern about the public map and decided not to take part

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

• Imagine if there was too much information, even for large volumes of crowdsourcedworkers?

Natural Language Processing

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Scaling beyond purely manual processing.

Disease outbreaks are the world’s single greatest killer.

No organization is tracking them all.

Epidemics

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Diseases eradicated in the last 75 years:

Increase in air travel in the last 75 years:

sma l l p o x

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

90% of ecological diversity

90% of linguistic diversity

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Reported locally before identification

H1N1 (Swine Flu)

months

(10% of world infected)

HIV

decades

(35 million infected)

H1N5 (Bird Flu) weeks

(>50% fatal)

Simply finding these early reports can help prevent epidemics.

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

epidemicIQ

M a c h i n e -

l e a r n i n g

( m i l l i o n s )

R e p o r t s

( m i l l i o n s )

M i c r o t a s k e r s

( t h o u s a n d s )

A n a l y s t s – d o m a i n

e x p e r t s

( c a p p e d n u m b e r )

в предстоящий осенне-

зимний период в Украине

ожидаются две эпидемии

гриппа

نفلونزمزيد من في مصرا الطيور ا

香港现1例H5N1禽流感病例曾游上海南京等地

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

E Coli in Germany

The AI head-start

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

epidemicIQ

Lessons learned

• Current data privacy practices are insufficient

- reports from areas where victims are vilified

• Crowdsourcing can provide needed skill-sets

- 100s of German speakers at short notice

• Natural language processing can scale beyond human processing capacity

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Crowdsourcing and risk

People’s real-time locations are their most sensitive personal information.

Crowdsourcing distributes information to a large number of individuals for processing.

For information about at-risk individuals:

• Is it right to crowdsource the processing?

• Is it right to use a public-facing map?

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Natural Language Processing and risk

We can trust machines with some information that we don’t want people to see.

Machine-learning is never 100% accurate:

• What is the risk of having less time for a human to evaluate each data point?

• Does the pattern of the machine-learning errors contain inherent biases?

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Recommendations

• Engage people with local knowledge

• Employ people with local knowledge

• Statistically cross-validate on-the-fly

• Default to private data practices

• Scale via natural language processing

• Use professional technology solutions

Conclusions

Crowdsourcing and Natural Language Processing for Humanitarian Response | Robert Munro Disaster Resilience Leadership Academy | Tulane University

Thank you

Robert Munro

@WWRob