Multimodal Artificial Intelligence Strategies for Remote Sensing Earth Observation

Barbato, M

The study of the land is among the most relevant scientific tasks, because the Earth and the management of its resources shape our lives as individuals and as a society. Where we live, how the population is distributed, the food we consume, our culture, and the social relationships among the world's societies are all partially defined by the characteristics of the surrounding land. This is why we need to observe and study the Earth. Such studies aim to describe the features of the terrain and involve many tasks, including classification, segmentation, and the estimation of soil characteristics, with the final goal of obtaining information that is fundamental in applications ranging from precision agriculture to the study of land cover and land use. To this end, the use of remote sensing technologies has increased sharply, in turn increasing the collection and availability of data. This growth and the use of new technologies open two crucial opportunities for the remote sensing field: 1) the possibility of using AI techniques for Earth Observation and 2) data that increase not only in cardinality but also in the kinds of information they convey. The former allows the use of highly efficient techniques, derived in particular from computer vision, that can greatly improve our ability to study the Earth. The latter enables multimodal strategies. These strategies aim to combine different kinds of data (modalities), such as RGB images, hyperspectral data, and LiDAR, to exploit the information that each of them provides. In many computer vision tasks, multimodal approaches have established themselves as a further step toward a better understanding of reality, thus improving our ability to exploit the available data.
However, in remote sensing applications it is still difficult to combine these approaches with AI techniques because of the lack of datasets that offer both high cardinality and multiple modalities. This thesis analyzes in depth the usefulness of multimodality in remote sensing. To this end, it investigates different tasks that can characterize a multimodal remote sensing application, from the acquisition of new data to the study of specific tasks and their integration in a real scenario. In particular, the tasks considered are 1) hyperspectral pansharpening for the enhancement of this kind of data; 2) unsupervised segmentation with hyperspectral data; 3) multimodal supervised semantic segmentation; and 4) digital soil mapping for the estimation of soil parameters (such as chemical and texture features). For each of these tasks, the goal is to demonstrate the usefulness of information that differs from typical RGB images and the advantages of combining these data using AI techniques. Finally, the knowledge derived from these studies results in a real-case pipeline for estimating soil parameters in agricultural areas to support resource management. The analysis presented in this work demonstrates that each of these tasks benefits from multimodality, and it provides new data and techniques that can support future studies in Earth Observation.

Barbato, M. (2024). Multimodal Artificial Intelligence Strategies for Remote Sensing Earth Observation. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2024).