Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet
http://www.plantnet-project.org/
Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski, Henning Müller, Pierre Bonnet
Acknowledgement: Nozha Boujemaa, Daniel Barthelemy, Jean-François Molino
Context
• Global warming, food crisis and biodiversity erosion
• Accurate knowledge of the distribution and evolution of living species is essential
• Ultimate goal: sustainable and global biodiversity monitoring tools
  - Surveillance of global warming consequences, plant & animal diseases, human activities impact, invasive species propagation
• The taxonomic impediment
  - Fewer and fewer people can identify plants and animals
  - Fewer and fewer nature observers can produce biodiversity data
Pl@ntNet project (launched 2010)
Bridging the taxonomic impediment thanks to an innovative
crowdsourcing workflow based on automated plant identification
The positive feedback loop does work!
Pl@ntNet app today
- 2.5 M downloads
- 14 M sessions
- 10-50 K users / day
- 150 countries
- 5 languages: FR, EN, ES, IT, PT, DE, AR, ZH, SK
Pl@ntNet data
Validated data = 3% of the queried plant images
- 30K collaboratively revised observations per year (TelaBotanica)
- Publicly available through international initiatives (GBIF, LifeCLEF)
- Validation is a slow and hard process
Unlabeled data = 97% of the raw query stream
- More than 1 million observations per year (5.1M today)
- Not exploited today
- A high potential for biodiversity monitoring
Pl@ntNet mobile search logs
Species Distribution Modelling from user-generated content (UGC) image streams?
Can we predict (real-time and/or long-term) Species Distribution Models directly from Pl@ntNet mobile search logs?
Or from any other UGC image stream?
Challenges
1. Improve recognition in open-world streams
Recognizing plants in an open world
An open-set recognition problem
- With tens of thousands of known and unknown classes
- Highly imbalanced training data
We carried out an evaluation within LifeCLEF 2016
- Training set of 1000 known species (113K pictures)
- Test set = 8K manually annotated Pl@ntNet queries (half known, half distractors)
- Metric: classification Mean Average Precision on a subset of 26 invasive species
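As a rough illustration of the metric named above, the sketch below computes average precision per species over a ranked list of test queries and then averages across species. Distractor queries simply carry a label of 0 for every species. The function names and the toy scores are hypothetical, and this is not the official LifeCLEF evaluation code.

```python
def average_precision(scores, labels):
    """AP for one species: scores are classifier confidences, labels are 1 (relevant) or 0."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_species):
    """per_species maps a species name to (scores, labels) over the whole test set."""
    aps = [average_precision(s, l) for s, l in per_species.values()]
    return sum(aps) / len(aps)


# Toy data (made up): five queries, labels mark the queries that truly show the species.
toy = {
    "species_a": ([0.9, 0.8, 0.3, 0.2, 0.1], [1, 0, 1, 0, 0]),
    "species_b": ([0.2, 0.7, 0.6, 0.1, 0.4], [0, 1, 0, 0, 1]),
}
print(mean_average_precision(toy))
```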
1. Improve automatic recognition of plants in open-world streams
- Novelty affects all systems, whatever the rejection method used (even supervised)
- No rejection method can deal with strong novelty rates
→ We are still far from being able to monitor biodiversity in Twitter or Snapchat streams!
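One common baseline for rejection, shown here only as a minimal sketch, is to threshold the classifier's top softmax probability and answer "unknown" below it. Nothing in the slides says this is the method that was evaluated; the function names and the 0.5 threshold are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, class_names, threshold=0.5):
    """Return (class name, confidence), or (None, confidence) when the query is rejected."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # treated as an unknown class or a distractor
    return class_names[best], probs[best]


# Hypothetical known classes and raw scores.
known = ["species_a", "species_b", "species_c"]
print(predict_or_reject([5.0, 0.0, 0.0], known))   # confident query
print(predict_or_reject([0.1, 0.0, 0.05], known))  # ambiguous query, rejected
```

Such a threshold trades missed known species against accepted distractors, and it cannot compensate for a stream dominated by novel content.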
Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
Geo-location and date?
- Not so easy!
- No real success within 5 years of the PlantCLEF challenge
- Why?
  - Plant distributions are not well known (this is actually our objective!)
  - Habitats are extremely heterogeneous from one species to another (some plants live everywhere while others live in very specific biotopes)
- What can we do?
  - Big occurrence data (like GBIF) might help but is biased, heterogeneous and incomplete (no absence data)
  - Environmental variables might help but are heterogeneous, incomplete, noisy, etc.
→ This will be one of the focuses of LifeCLEF 2017
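To make the occurrence-data idea concrete, here is a minimal sketch that re-weights visual classifier scores with a regional prior built from presence-only counts. It assumes per-species counts are available for the query's region (for example, aggregated from GBIF records); the function names and all numbers are hypothetical. Additive smoothing keeps unrecorded species possible, since the data contains no absences.

```python
def regional_prior(occurrence_counts, species, alpha=1.0):
    """Smoothed prior per species from presence-only counts in one region."""
    total = sum(occurrence_counts.get(s, 0) for s in species) + alpha * len(species)
    return {s: (occurrence_counts.get(s, 0) + alpha) / total for s in species}


def rerank(visual_probs, prior):
    """Multiply visual probabilities by the prior and renormalise."""
    weighted = {s: p * prior[s] for s, p in visual_probs.items()}
    z = sum(weighted.values())
    return {s: w / z for s, w in weighted.items()}


# Made-up example: the image alone slightly favours species_b,
# but only species_a has been recorded in this region.
visual = {"species_a": 0.45, "species_b": 0.55}
prior = regional_prior({"species_a": 9}, list(visual))
print(rerank(visual, prior))
```

The biases listed above carry straight into such a prior: heavily visited regions and popular species are over-represented in the counts.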
Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
Using taxonomy?
Taxonomy = a hierarchical classification built by botanists over hundreds of years
→ 600 families > 14K genera > 300K species
But taxonomy is highly heterogeneous and imbalanced
→ Classical hierarchical classification algorithms cannot be used directly
- Some genera have up to 1000 very similar species
- But many genera and families include very distinct species
- The long-tail distribution occurs at each level and in each node
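One simple way to use the hierarchy without a dedicated hierarchical classifier is to sum species-level probabilities up to the genus level. The sketch below assumes a flat classifier that outputs one probability per binomial name; the probabilities are made up, and this is an illustration rather than the approach used in Pl@ntNet.

```python
from collections import defaultdict


def genus_of(binomial):
    """The genus is the first word of a binomial species name."""
    return binomial.split()[0]


def roll_up(species_probs, parent_of):
    """Sum species probabilities into their parent taxon."""
    totals = defaultdict(float)
    for name, prob in species_probs.items():
        totals[parent_of(name)] += prob
    return dict(totals)


# Made-up classifier output for one query image.
species_probs = {
    "Orobanche minor": 0.30,
    "Orobanche hederae": 0.25,
    "Bupleurum falcatum": 0.35,
    "Bupleurum fruticosum": 0.10,
}
genus_probs = roll_up(species_probs, genus_of)
print(max(species_probs, key=species_probs.get))  # most likely species
print(max(genus_probs, key=genus_probs.get))      # most likely genus
```

In this toy case the top species and the top genus disagree: mass spread over similar species of one genus adds up, so a genus-level answer can be more reliable than a species-level one.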
Genus Orobanche
Genus Bupleurum
Family Bupleurum
Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost training data production
Pro-active crowdsourcing
[Figure: Classifier (CNN), Annotators (heterogeneous skills), Tasks selection & assignment, Training]
1. ConvNet predictions
2. Create quizzes by Monte Carlo sampling
3. Sort quizzes by difficulty (= success expectation across all workers)
[Figure labels: Beginner, Intermediate]
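The three steps above can be sketched as follows. This is one possible reading, under stated assumptions: a quiz is a set of candidate species drawn without replacement in proportion to the ConvNet probabilities, and a quiz's success expectation is the average, over workers, of each worker's mean estimated accuracy on the candidates. The function names, the skill table and all numbers are hypothetical.

```python
import random


def sample_quiz(predictions, k, rng):
    """Draw k distinct candidate species, each draw weighted by ConvNet probability."""
    pool = dict(predictions)
    chosen = []
    for _ in range(min(k, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sampling without replacement
    return chosen


def expected_success(quiz, worker_skills):
    """Assumed difficulty proxy: mean per-species accuracy, averaged over all workers."""
    per_worker = [
        sum(skills.get(s, 0.0) for s in quiz) / len(quiz)
        for skills in worker_skills.values()
    ]
    return sum(per_worker) / len(per_worker)


rng = random.Random(42)
convnet_predictions = {  # step 1: image id -> species probabilities (made up)
    "img_1": {"sp_a": 0.6, "sp_b": 0.3, "sp_c": 0.1},
    "img_2": {"sp_c": 0.5, "sp_d": 0.4, "sp_a": 0.1},
}
worker_skills = {  # estimated accuracy per species (made up)
    "beginner": {"sp_a": 0.9, "sp_b": 0.4, "sp_c": 0.3, "sp_d": 0.2},
    "intermediate": {"sp_a": 0.9, "sp_b": 0.8, "sp_c": 0.6, "sp_d": 0.5},
}
# Step 2: one quiz per image. Step 3: easiest quizzes first.
quizzes = {img: sample_quiz(p, 2, rng) for img, p in convnet_predictions.items()}
ranked = sorted(quizzes, key=lambda i: expected_success(quizzes[i], worker_skills), reverse=True)
for img in ranked:
    print(img, quizzes[img], round(expected_success(quizzes[img], worker_skills), 3))
```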
Experiments: Simpson's paradox
[Figure: identification success rate vs. declared expertise]
Workers are assigned tasks they have been trained on before!
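Simpson's paradox can be shown with a small worked example. The counts below are invented for illustration and are not the experimental results; task difficulty stands in for any factor that drives non-random task assignment, such as prior training.

```python
# (successes, attempts) per declared expertise and task difficulty; invented numbers.
results = {
    "expert": {"easy": (81, 87), "hard": (192, 263)},
    "beginner": {"easy": (234, 270), "hard": (55, 80)},
}


def success_rate(successes, attempts):
    return successes / attempts


def overall_rate(by_difficulty):
    """Pooled success rate, ignoring task difficulty."""
    successes = sum(s for s, _ in by_difficulty.values())
    attempts = sum(a for _, a in by_difficulty.values())
    return successes / attempts


for group, by_difficulty in results.items():
    per_level = {lvl: round(success_rate(*sa), 3) for lvl, sa in by_difficulty.items()}
    print(group, per_level, "overall:", round(overall_rate(by_difficulty), 3))
```

With these counts the experts do better on easy tasks and on hard tasks taken separately, yet worse when the two are pooled, because they receive mostly hard tasks. Comparing groups on pooled success rates is therefore misleading when task assignment is not random.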
Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost data validation processes
5. Control bias in Species Distribution Models
Modeling bias factors?
Objective: estimate the relative abundance A_ij of species i in place j, supposing
N_ij ~ Law(A_ij, B_ij)
- N_ij: number of observations of i in j
- A_ij: abundance of i in j
- B_ij: bias, which might be complex because of the diversity of contributors, the opportunistic nature of the observations, and the confusions
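A minimal sketch of why the bias term matters, assuming the unspecified law is a Poisson distribution with mean A_ij × B_ij and that the bias reduces to a known per-place observation effort. Both are simplifying assumptions made for illustration; the slides leave the law and the bias structure open. All names and numbers are made up.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(abundance, effort, rng):
    """Assumed model: N_j ~ Poisson(A_j * B_j) for one species across places j."""
    return {j: poisson_sample(abundance[j] * effort[j], rng) for j in abundance}


def correct_for_effort(counts, effort):
    """Abundance estimate when the bias is a known multiplicative effort."""
    return {j: counts[j] / effort[j] for j in counts}


rng = random.Random(0)
true_abundance = {"city_park": 10.0, "remote_valley": 40.0}
effort = {"city_park": 20.0, "remote_valley": 2.0}  # many more contributors in town
counts = simulate_counts(true_abundance, effort, rng)
print("raw counts:", counts)
print("effort-corrected:", correct_for_effort(counts, effort))
```

Raw counts rank the heavily visited place first even though the species is four times more abundant in the remote one. In practice the effort is not known and has to be modelled as well, which is the open question raised here.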
Conclusion: biodiversity informatics needs multimedia (MM) research
Biodiversity Dimension | Biodiversity Conservation Challenge | Who? | Multimedia research topics
Aesthetic | Enjoy and love it | Everybody | IR, Recommendation
Diverse | Identify and classify | Taxonomists | Multimodal & Large-scale classification
Complex | Decipher & model | Biologists | Multimedia Data analytics
Unknown | Discover & associate | Taxonomists | Multimedia Data mining
Endangered | Define & implement policies | Decision makers | Visualization, Interactivity
Indispensable | Use sustainably | Everybody | Cross-media streams monitoring
Thank you