Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017)

📅 2025-09-25

🤖 AI Summary

This study investigates whether large-scale, noisy web-crawled plant image datasets, which contain substantial label inaccuracies, can compete with smaller, expert-annotated datasets in a continental-scale plant identification task covering 10,000 species. Method: The participating systems employ deep convolutional neural networks enhanced with transfer learning, aggressive data augmentation, and label-noise-robust training strategies to model unstructured plant images sourced from blogs, websites, and image-sharing platforms. Contribution/Results: The challenge results indicate that models trained exclusively on noisy data achieve accuracy on an independent test set comparable to, and in some cases exceeding, that of models trained on expert-labeled data. This provides empirical evidence that deep neural networks are highly robust to pervasive label noise in fine-grained visual recognition. The findings point to a low-cost, scalable approach to species identification when expert-labeled data is scarce, supporting the deployment of plant identification systems in real-world, open-domain scenarios.

📝 Abstract

The 2017 edition of the LifeCLEF plant identification challenge is an important milestone towards automated plant identification systems working at the scale of continental floras, with 10,000 plant species living mainly in Europe and North America illustrated by a total of 1.1M images. Nowadays, such ambitious systems are enabled by the conjunction of the dazzling recent progress in image classification with deep learning and several outstanding international initiatives, such as the Encyclopedia of Life (EOL), which aggregate the visual knowledge on plant species coming from the main national botany institutes. However, despite all these efforts, the majority of plant species still remain without pictures or are poorly illustrated. Outside the institutional channels, a much larger number of plant pictures are available and spread across the web through botanist blogs, plant lovers' web pages, image hosting websites and online plant retailers. The LifeCLEF 2017 plant challenge presented in this paper aimed at evaluating to what extent a large noisy training dataset collected through the web and containing many labelling errors can compete with a smaller but trusted training dataset checked by experts. To fairly compare both training strategies, the test dataset was created from a third data source, i.e., the Pl@ntNet mobile application, which collects millions of plant image queries all over the world. More precisely, this paper presents the resources and assessments of the challenge, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.
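The evaluation protocol described in the abstract (two alternative training sets, one independent test source) can be illustrated with a toy simulation. The sketch below is not the challenge's pipeline and does not reproduce its results: the 2-D synthetic "species", the nearest-centroid classifier, the 30% label-flip rate, and all function names are assumptions chosen purely to keep the example self-contained. The outcome depends entirely on these parameters.

```python
import random


def make_samples(centers, n_per_class, noise_rate, rng, spread=1.0):
    """Draw 2-D points around each class center; flip a fraction of labels."""
    num_classes = len(centers)
    samples = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            if rng.random() < noise_rate:
                # Simulated labelling error: pick a different class uniformly.
                other = rng.randrange(num_classes - 1)
                assigned = other if other < label else other + 1
            else:
                assigned = label
            samples.append((point, assigned))
    return samples


def fit_centroids(samples, num_classes):
    """Average the points carrying each (possibly wrong) label."""
    sums = [[0.0, 0.0, 0] for _ in range(num_classes)]
    for (x, y), label in samples:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return [(sx / n, sy / n) if n else (0.0, 0.0) for sx, sy, n in sums]


def predict(centroids, point):
    """Return the index of the closest centroid (squared Euclidean distance)."""
    px, py = point
    return min(
        range(len(centroids)),
        key=lambda k: (px - centroids[k][0]) ** 2 + (py - centroids[k][1]) ** 2,
    )


def accuracy(centroids, samples):
    hits = sum(1 for point, label in samples if predict(centroids, point) == label)
    return hits / len(samples)


def run_comparison(seed=0, num_classes=20):
    """Train on a small trusted set and on a large noisy set; test on a third set."""
    rng = random.Random(seed)
    centers = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(num_classes)]
    trusted = make_samples(centers, n_per_class=3, noise_rate=0.0, rng=rng)
    noisy = make_samples(centers, n_per_class=200, noise_rate=0.3, rng=rng)
    # Independent test draws with correct labels, standing in for a third source.
    test = make_samples(centers, n_per_class=50, noise_rate=0.0, rng=rng)
    return {
        "trusted": accuracy(fit_centroids(trusted, num_classes), test),
        "noisy": accuracy(fit_centroids(noisy, num_classes), test),
    }


if __name__ == "__main__":
    for name, acc in run_comparison().items():
        print(f"{name}: {acc:.3f}")
```

The key design point mirrored from the challenge is that neither training set supplies the test data, so neither strategy is favored by sharing a source with the evaluation set.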

Problem

Research questions and friction points this paper is trying to address.

Evaluating whether a large, noisy web-collected training dataset can compete with a smaller, expert-verified one

Developing automated plant identification systems for 10,000 European and North American species

Assessing deep learning performance on plant identification at the scale of 1.1 million images, including noisy web data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning for plant identification challenge

Web-collected noisy training dataset evaluation

Comparison with expert-checked trusted dataset

🔎 Similar Papers

Automatic Fused Multimodal Deep Learning for Plant Identification