The Artificial Intelligence benchmark datasets for Land Cover Classification from Satellite Imagery (AI4LCC) dataset collection is an initiative of THEIA, the Continental Surfaces Data and Services Hub, part of the DATA TERRA Research Infrastructure. It distributes training sets for AI-based land cover classification from satellite imagery. The datasets can be used to train classical machine learning algorithms or more advanced deep learning algorithms for land cover classification.
Currently, the collection consists of the following datasets:
1) Collection MultiSenGE for multi-temporal and multi-modal land cover classification, with 8157 multi-temporal patches of Sentinel-1 and Sentinel-2 imagery (256x256 pixels) over the Grand-Est region (France). The collection is organized following a standard structure described in the metadata. The products are available at:
- The metadata:
AI4LCC-MultiSenGE.json
- The Sentinel-1 time series patches (GRD):
Sentinel-1 patches
- The Sentinel-2 time series patches (L2A):
Sentinel-2 patches
- Ground reference patches:
Ground reference patches
- JSON files for each patch:
label files
2) Collection MultiSenNA for multi-temporal and multi-modal land cover classification, with 12258 multi-temporal patches of Sentinel-1 and Sentinel-2 imagery (256x256 pixels) over the Nouvelle-Aquitaine region (France). The collection is organized following a standard structure described in the metadata. The products are available at:
- The metadata:
AI4LCC-MultiSenNA.json
- The Sentinel-1 time series patches (GRD):
Sentinel-1 patches
- The Sentinel-2 time series patches (L2A):
Sentinel-2 patches
- Ground reference patches:
Ground reference patches
- JSON files for each patch:
label files
Information on these collections is available in
Wenger et al., 2022 [1] and
Wenger et al., 2022 [2].
The PhD thesis related to this work is available here:
Wenger, 2023 [3].
A scientific publication related to this work is available here:
Wenger, 2025 [4].
In addition, useful Python tools to extract information from the datasets can be found on
GitHub.
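Since each patch is distributed with its own JSON file, a first look at the label files can be done without knowing their schema in advance. The following is a minimal sketch, not one of the tools hosted on GitHub: the function name summarize_label_files and the directory name "labels" are hypothetical, and the only assumption is that the label files have been downloaded into a single local folder as *.json files. It counts how often each top-level key occurs across the files, which helps reveal the structure described in the collection metadata.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_label_files(label_dir):
    """Count how often each top-level key appears across the JSON files in label_dir.

    Hypothetical helper: it makes no assumption about the keys the files contain.
    """
    key_counts = Counter()
    n_files = 0
    for path in sorted(Path(label_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            content = json.load(f)
        n_files += 1
        if isinstance(content, dict):
            # Record every top-level key found in this patch file.
            key_counts.update(content.keys())
        else:
            # Non-object JSON (list, string, ...) is tallied by its Python type name.
            key_counts[f"<{type(content).__name__}>"] += 1
    return n_files, key_counts


# Example call, assuming the label files were saved in a local folder named "labels":
# n_files, key_counts = summarize_label_files("labels")
# print(n_files, key_counts.most_common())
```

Keys that appear in every file are likely part of the standard patch description, while keys that appear only in some files are worth checking against the metadata before relying on them.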