Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet


Transcript of Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet

Page 1: Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet

http://www.plantnet-project.org/


Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski, Henning Müller, Pierre Bonnet

Acknowledgement: Nozha Boujemaa, Daniel Barthelemy, Jean-François Molino

Page 2: Context

• Global warming, food crisis and biodiversity erosion
• Accurate knowledge of the distribution and evolution of living species is essential
• Ultimate goal: sustainable and global biodiversity monitoring tools
  – Surveillance of global warming consequences, plant & animal diseases, human activities impact, invasive species propagation
• The taxonomic impediment
  – Fewer and fewer people can identify plants and animals
  – Fewer and fewer nature observers can produce biodiversity data

Page 3: Pl@ntNet project (launched 2010)

Bridging the taxonomic impediment thanks to an innovative crowdsourcing workflow based on automated plant identification

Page 4: Pl@ntNet project (launched 2010)

The positive feedback loop does work!

Page 5: Pl@ntNet app today

- 2.5 M downloads
- 14 M sessions
- 10-50 K users / day
- 150 countries
- 5 languages; codes listed on the slide: FR, EN, ES, IT, PT, DE, AR, ZH, SK

Page 6: Pl@ntNet data

Validated data = 3% of the queried plant images
- 30K collaboratively revised observations per year (TelaBotanica)
- Publicly available through international initiatives (GBIF, LifeCLEF)
- Validation is a slow and hard process

Page 7: Pl@ntNet data

Unlabeled data = 97% of the raw query stream
- > 1 million observations per year (5.1M today)
- Not exploited today
- A high potential for biodiversity monitoring

Page 8: Pl@ntNet mobile search logs

Page 9: Species Distribution Modelling from UGC image streams?

Can we predict (real-time and/or long-term) Species Distribution Models directly from Pl@ntNet mobile search logs?

Or from any other UGC (user-generated content) image stream?

Page 10: Challenges

1. Improve recognition in open-world streams

Page 11: Recognizing plants in an open world

An open-set recognition problem
- With tens of thousands of known and unknown classes
- Highly imbalanced training data

We carried out an evaluation within LifeCLEF 2016
- Training set of 1000 known species (113K pictures)
- Test set = 8K manually annotated Pl@ntNet queries (half known, half distractors)
- Classification Mean Average Precision on a subset of 26 invasive species
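
To make the evaluation measure concrete, here is a minimal sketch of a classification Mean Average Precision computed over a query set that contains distractors. It is an illustration under assumptions, not the official LifeCLEF scoring code: the function names, the toy scores and the convention of labelling distractors with -1 are all hypothetical.

```python
def average_precision(scores, relevant):
    """AP for one class: rank all queries by score, then average the
    precision measured at the rank of each truly relevant query."""
    ranked = sorted(zip(scores, relevant), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(score_rows, labels, classes):
    """score_rows[q][c] is the score of class c for query q.
    labels[q] is the true class index, or -1 for a distractor (assumed convention)."""
    aps = []
    for c in classes:
        class_scores = [row[c] for row in score_rows]
        relevant = [label == c for label in labels]
        aps.append(average_precision(class_scores, relevant))
    return sum(aps) / len(aps)


# Toy data (invented): 4 queries, 2 known classes, the last query is a distractor.
scores = [[0.9, 0.1],
          [0.2, 0.7],
          [0.6, 0.3],
          [0.8, 0.2]]
labels = [0, 1, 0, -1]
map_score = mean_average_precision(scores, labels, classes=[0, 1])
print(round(map_score, 4))
```

In this toy case the distractor receives a high score for class 0 and is ranked above a true positive, which lowers that class's average precision. This is how unknown-class queries penalise a system under such a measure.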

Page 12: Recognizing plants in an open world

1. Improve automatic recognition of plants in open-world streams
- Novelty affects all systems, whatever the rejection method used (even supervised)
- No rejection method can deal with strong novelty rates

→ We are still far from being able to monitor biodiversity in Twitter or Snapchat streams!
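
For readers unfamiliar with rejection methods, the sketch below shows one of the simplest: thresholding the top softmax probability. It is not necessarily one of the methods evaluated in LifeCLEF 2016; the species names, logits and the 0.5 threshold are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, species, threshold=0.5):
    """Return (species, confidence), or (None, confidence) when the top
    probability is below the threshold and the query is rejected as unknown."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return species[best], probs[best]


SPECIES = ["species_a", "species_b", "species_c"]  # hypothetical known classes

confident = predict_or_reject([4.0, 0.5, 0.1], SPECIES)  # one class dominates
ambiguous = predict_or_reject([1.0, 0.9, 0.8], SPECIES)  # no clear winner, rejected
print(confident[0], ambiguous[0])
```

A threshold of this kind only rejects queries on which the classifier is unsure. An image of an unknown species that happens to receive a confident score still passes, which is one reason novelty degrades results.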

Page 13: Challenges

1. Improve recognition in open-world streams
2. Use geo-location and date

Page 14: Geo-location and date?

- Not so easy!
  - No real success within 5 years of the PlantCLEF challenge
- Why?
  - Plant distributions are not well known (this is actually our objective!)
  - Habitats are extremely heterogeneous from one species to another (some plants live everywhere while others live in very specific biotopes)
- What can we do?
  - Big occurrence data (like GBIF) might help but is biased, heterogeneous and incomplete (no absence data)
  - Environmental variables might help but are heterogeneous, incomplete, noisy, etc.

→ This will be one of the focuses of LifeCLEF 2017

Page 15: Challenges

1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy

Page 16: Using taxonomy?

Taxonomy = a hierarchical classification built by botanists over hundreds of years

→ 600 families > 14K genera > 300K species

But taxonomy is highly heterogeneous and imbalanced
- Some genera contain up to 1000 very similar species
- But many genera and families include very distinct species
- The long-tail distribution occurs at each level and in each node

→ Classical hierarchical classification algorithms cannot be used directly

[Figure labels: Genus Orobanche, Genus Bupleurum, Family Bupleurum]
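
One simple way to use the hierarchy without a dedicated hierarchical classifier is to roll the species-level probabilities of a flat classifier up to genus or family level. The sketch below illustrates that idea only; it is not a method described in the slides. The three-species taxonomy is a tiny illustrative excerpt and the probabilities are invented.

```python
from collections import defaultdict

# Illustrative excerpt: species -> (genus, family)
TAXONOMY = {
    "Orobanche minor": ("Orobanche", "Orobanchaceae"),
    "Orobanche hederae": ("Orobanche", "Orobanchaceae"),
    "Bupleurum falcatum": ("Bupleurum", "Apiaceae"),
}


def aggregate(species_probs, level):
    """Sum species probabilities per genus or per family."""
    index = {"genus": 0, "family": 1}[level]
    totals = defaultdict(float)
    for species, prob in species_probs.items():
        totals[TAXONOMY[species][index]] += prob
    return dict(totals)


# Invented classifier output for one query image
species_probs = {
    "Orobanche minor": 0.40,
    "Orobanche hederae": 0.35,
    "Bupleurum falcatum": 0.25,
}
genus_probs = aggregate(species_probs, "genus")
family_probs = aggregate(species_probs, "family")
print(genus_probs)
```

Here no single species reaches 0.5, yet the genus Orobanche collects most of the probability mass. This matches the case of genera whose species look very similar. The approach helps less for genera or families whose members look very different from each other.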

Page 17: Challenges

1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost training data production

Page 18: Pro-active crowdsourcing

[Diagram labels: Classifier (CNN); Annotators (heterogeneous skills); Tasks selection & assignment]

Page 19

1. ConvNet predictions
2. Create quizzes by Monte Carlo sampling
3. Sort quizzes by difficulty (= success expectation across all workers)

[Diagram labels: Training, Training; Beginner, Intermediate]
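
The three steps can be sketched as follows. The slide does not say how quizzes are built, so this sketch assumes a quiz is a set of candidate species drawn without replacement in proportion to the ConvNet probabilities, and that difficulty is estimated from past worker outcomes. All names and numbers are hypothetical.

```python
import random


def sample_quiz(predictions, n_options, rng):
    """Monte Carlo sampling: draw distinct candidate species, each draw
    weighted by the ConvNet probability of the remaining species."""
    remaining = dict(predictions)
    options = []
    for _ in range(min(n_options, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        choice = rng.choices(names, weights=weights, k=1)[0]
        options.append(choice)
        del remaining[choice]
    return options


def expected_success(outcomes):
    """Share of workers who answered the quiz correctly (1 = success)."""
    return sum(outcomes) / len(outcomes)


def sort_by_difficulty(worker_outcomes):
    """Easiest quiz first, i.e. highest success expectation across all workers."""
    return sorted(worker_outcomes, key=lambda quiz: -expected_success(worker_outcomes[quiz]))


# 1. ConvNet predictions for one image (invented)
predictions = {"species_a": 0.6, "species_b": 0.3, "species_c": 0.08, "species_d": 0.02}
# 2. Quiz options drawn by Monte Carlo sampling
options = sample_quiz(predictions, n_options=3, rng=random.Random(0))
# 3. Quizzes ordered by difficulty from past outcomes (invented)
worker_outcomes = {"quiz_1": [1, 1, 1, 1], "quiz_2": [0, 1, 0, 0], "quiz_3": [1, 0, 1, 1]}
ordered = sort_by_difficulty(worker_outcomes)
print(options, ordered)
```

Sampling in proportion to the predicted probabilities tends to put visually plausible confusions among the options, which keeps the quiz informative for training annotators.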

Page 20: Experiments: Simpson's paradox

[Plot axes: Identification success rate, Declared expertise]

Workers are assigned tasks they have been trained on before!
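
As a reminder of how Simpson's paradox can arise when tasks are not assigned uniformly, here is a toy example. The numbers are invented and are not the experiment's results; the assumption is simply that workers declaring more expertise receive harder quizzes.

```python
# (declared expertise, quiz difficulty) -> (successes, attempts); invented numbers
RESULTS = {
    ("beginner", "easy"): (80, 100),
    ("beginner", "hard"): (2, 10),
    ("expert", "easy"): (9, 10),
    ("expert", "hard"): (40, 100),
}


def success_rate(expertise, difficulty=None):
    """Success rate of one group, for one difficulty level or pooled over all."""
    successes = attempts = 0
    for (group, level), (ok, total) in RESULTS.items():
        if group == expertise and difficulty in (None, level):
            successes += ok
            attempts += total
    return successes / attempts


for level in ("easy", "hard", None):
    label = level or "all"
    print(label, success_rate("beginner", level), success_rate("expert", level))
```

Within each difficulty level the experts do better, yet the pooled rate is higher for beginners because most of their quizzes are easy. A raw plot of success rate against declared expertise therefore has to be read together with the task assignment policy.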

Page 21: Challenges

1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost data validation processes
5. Control bias in Species Distribution Models

Page 22: Modeling bias factors?

Objective: Estimate the relative abundance Aij of species i in place j, supposing

Nij ~ Law(Aij, Bij)

- Nij: number of observations of i in j
- Aij: abundance of i in j
- Bij: bias, which might be complex because of the diversity of contributors, the opportunistic nature of the observations, and the confusions
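
The slide leaves the law unspecified. As a sketch, assume the simplest case: the expected count is E[Nij] = Aij × Bj, where the bias Bj depends only on the place (observation effort), as in a Poisson model with that mean. Under this assumption the bias cancels when counts are normalised within each place. All names and numbers below are invented.

```python
# True abundances Aij (invented): identical in both places
ABUNDANCE = {
    "place_1": {"sp_a": 30.0, "sp_b": 10.0},
    "place_2": {"sp_a": 30.0, "sp_b": 10.0},
}
# Place-level bias Bj (assumed observation effort): place_1 is visited far more often
EFFORT = {"place_1": 5.0, "place_2": 0.5}


def expected_counts(abundance, effort):
    """E[Nij] = Aij * Bj under the assumed multiplicative place bias."""
    return {
        place: {sp: a * effort[place] for sp, a in species.items()}
        for place, species in abundance.items()
    }


def relative_abundance(counts):
    """Normalise counts within each place; a place-only bias cancels out."""
    result = {}
    for place, species in counts.items():
        total = sum(species.values())
        result[place] = {sp: n / total for sp, n in species.items()}
    return result


counts = expected_counts(ABUNDANCE, EFFORT)
relative = relative_abundance(counts)
print(counts)
print(relative)
```

The raw counts differ by a factor of ten between the two places although the abundances are identical, while the within-place proportions agree. A bias that also depends on the species, such as conspicuous plants or systematic confusions, does not cancel this way and needs explicit modelling.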

Page 23: Conclusion: biodiversity informatics needs MM

| Biodiversity Dimension | Biodiversity Conservation Challenge | Who? | Multimedia research topics |
| Aesthetic | Enjoy and love it | Everybody | IR, Recommendation |
| Diverse | Identify and classify | Taxonomists | Multimodal & Large-scale classification |
| Complex | Decipher & model | Biologists | Multimedia Data analytics |
| Unknown | Discover & associate | Taxonomists | Multimedia Data mining |
| Endangered | Define & implement policies | Decision makers | Visualization, Interactivity |
| Indispensable | Use sustainably | Everybody | Cross-media streams monitoring |

Page 24: Thank you