Feature Extraction and Analysis for Information Mining in Large EO Data Archives
Bruzzone, Lorenzo1; Demir, Begüm1; Bovolo, Francesca1; Brockmann, Carsten2; Fomferra, Norman2; Iapaolo, Michele3; Jha, Rajesh4; Lu, Jun4; Marchetti, Pier Giorgio3; Quast, Ralf2; Stelzer, Kerstin2; Veci, Luis4
1University of Trento, ITALY; 2Brockmann Consult GmbH, GERMANY; 3ESA, ITALY; 4Array Systems Computing Inc., CANADA

Nowadays, several optical and SAR sensors operate on board satellites, with a variety of spatial, spectral, and temporal resolutions. As a result, millions of Earth Observation (EO) scenes are collected in very large EO data archives, and analysing, mining and retrieving useful information from them is a major challenge [1]. EO data archives grow rapidly, which motivates the need for efficient and effective processing and analysis tools. This will become even more true once the new Sentinel missions are launched and operated by ESA. In this context, this paper presents the activities developed in the framework of the ESA Long Term Data Preservation (LTDP) - Product Feature Extraction and Analysis project. This project, which is at an initial stage, aims to efficiently exploit this huge amount of data by defining: i) feature extraction methods for populating an EO database with a set of effective features computed on different kinds of remote sensing data (i.e., SAR and optical images, as well as time series of them), and ii) data analysis methods for extracting semantics from the features in the context of different scenarios and applications.
The high spatial and temporal resolution of images acquired by the new generation of satellite sensors (e.g., the future Sentinel-1 and Sentinel-2 missions for high spatial resolution SAR and optical images, respectively, and the future Sentinel-3 mission for high temporal resolution images) requires robust feature extractors that can capture the high information content of these images. In the project we focus on implementing feature extractors that are effective on large amounts of EO data and on several kinds of EO data. These feature extractors include: i) features capable of effectively modelling the spatial/geometrical information in a large variety of EO image data (e.g., high resolution SAR and optical images), such as attribute filters (which include morphological filters as a special case) [2], morphological attribute profiles, etc.; ii) features that capture the multitemporal nature of satellite data, such as the backscattering temporal variability and long-term coherence for SAR time series, the Fourier descriptors for optical and SAR time series [3], etc.; iii) features that model the multiscale nature of spatial information in EO data, such as stationary and non-stationary discrete wavelet transforms and Gabor filters. These methods can be applied to both SAR and optical images, and can also model the multiscale time variability of temporal signatures in time series.
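To make the multitemporal features in ii) more concrete, the following is a minimal Python sketch of two per-pixel descriptors computed on a time series of backscatter (or reflectance) values. It is an illustration under stated assumptions, not the project's implementation. The variability index is assumed here to be the coefficient of variation (standard deviation over mean) of positive, linear-scale intensities. The Fourier descriptors are taken as the normalised magnitudes of the first k discrete Fourier transform coefficients. The function names temporal_variability and fourier_descriptors are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def temporal_variability(series: Sequence[float]) -> float:
    """Coefficient of variation (std / mean) of a time series.

    Assumption: a simple proxy for backscattering temporal variability,
    computed on positive, linear-scale intensities (not dB).
    """
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    return math.sqrt(variance) / mean


def fourier_descriptors(series: Sequence[float], k: int) -> List[float]:
    """Normalised magnitudes |X_f| / n of the first k DFT coefficients.

    f = 0 gives the mean level. For a cosine of amplitude A completing
    f cycles over the series (0 < f < n/2), the value is A / 2.
    Magnitudes discard phase, so they do not depend on when the cycle starts.
    """
    n = len(series)
    descriptors = []
    for f in range(k):
        # Direct evaluation of one DFT coefficient (an FFT would be used at scale).
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        descriptors.append(abs(coeff) / n)
    return descriptors


# Usage: 12 monthly samples with mean level 1.0 plus an annual cycle of amplitude 0.5.
monthly = [1.0 + 0.5 * math.cos(2 * math.pi * t / 12) for t in range(12)]
print(fourier_descriptors(monthly, 3))   # approximately [1.0, 0.25, 0.0]
print(temporal_variability(monthly))     # approximately 0.3536
```

The descriptor vector can serve directly as a time-series signature in the retrieval scenarios. Only low-order coefficients are kept, because they summarise the dominant seasonal behaviour compactly.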
Extracted features will be employed in three main scenarios: i) content-based image retrieval; ii) content-based time-series retrieval; and iii) unsupervised classification with kernel methods.
The first and second scenarios are devoted to fast and effective content-based retrieval of single images and image time series. To this end, a query must be defined. The query can be an EO image, parameter values (such as threshold values), the temporal trend of a time series, or a step change in bitemporal images. Once the query is fixed, efficient approaches are required to retrieve the images (or time series) that match the query from large EO data archives. Here, active-learning-based methods [4] are considered in the context of relevance feedback [5] for both scenarios, in order to exploit interaction with the user efficiently. The classification stage of content-based retrieval will be based on machine learning classifiers such as SVMs. SVMs are non-parametric (and thus make no assumption about the statistical distribution of the data) and are widely recognized as effective [6]. Since ground truth data are generally not available when analysing EO archives, the third scenario is devoted to efficient and robust unsupervised classification for scene understanding and interpretation. Here, the most promising unsupervised classification techniques, i.e., kernel-based methods such as kernel k-means [7], will be considered. In the project, special emphasis is placed on: 1) giving priority (when possible) to the development of data-independent methodologies within the above-mentioned scenarios (i.e., methodologies suitable for SAR and optical images, and for time series of SAR and optical images); and 2) developing a coherent data processing framework in which the methodologies implemented for the individual scenarios can be exploited in the other scenarios.
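For the third scenario, the following is a minimal Python sketch of kernel k-means applied to per-image (or per-pixel) feature vectors. It assumes a Gaussian RBF kernel with a hand-picked width gamma and a deterministic farthest-first initialisation. The function names rbf_kernel and kernel_kmeans are hypothetical. This is the plain batch algorithm, not the large-scale scheme proposed in [7]. Each sample is assigned to the cluster whose mean in the kernel-induced feature space is closest, using the identity ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl. As a result, the mapping phi never has to be computed explicitly.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(features: Sequence[Sequence[float]], k: int,
                  gamma: float = 1.0, max_iter: int = 100) -> List[int]:
    """Cluster feature vectors with batch kernel k-means; returns one label per sample."""
    n = len(features)
    K = [[rbf_kernel(features[i], features[j], gamma) for j in range(n)]
         for i in range(n)]

    def dist2(i: int, j: int) -> float:
        # Squared distance between samples i and j in the kernel feature space.
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Farthest-first seeding (an assumption; random restarts are also common).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(dist2(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: dist2(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Third term of the distance, (1/|c|^2) * sum_{j,l in c} K[j][l], per cluster.
        compact = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                   for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that became empty
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + compact[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels


# Usage: two well-separated groups of 2-D feature vectors.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(kernel_kmeans(points, k=2, gamma=0.5))  # [0, 0, 0, 1, 1, 1]
```

Because the sketch builds the full n x n kernel matrix, memory and time grow quadratically with the number of samples. This quadratic growth is what makes large-scale variants such as [7] necessary for archive-sized data.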
[1] M. Datcu, S. D'Elia, R. L. King, and L. Bruzzone, "Introduction to the special section on image information mining for earth observation data," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 795-798, 2007.
[2] M. Dalla Mura, J. A. Benediktsson, B. Waske, and L. Bruzzone, "Morphological attribute profiles for the analysis of very high resolution images," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3747-3762, 2010.
[3] L. Bruzzone, M. Marconcini, U. Wegmuller, and A. Wiesmann, "An advanced system for the automatic classification of multitemporal SAR images," IEEE Trans. Geosci. Remote Sens., vol. 25, no. 13, pp. 1491-1500, 2004.
[4] B. Demir, C. Persello, and L. Bruzzone, "Batch mode active learning methods for the interactive classification of remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 3, pp. 1014-1031, 2011.
[5] M. Ferecatu and N. Boujemaa, "Interactive remote-sensing image retrieval using active relevance feedback," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 818-826, 2007.
[6] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351-1362, 2005.
[7] I. R. Zhang and A. I. Rudnicky, "A large scale clustering scheme for kernel k-means," in Proc. IEEE Int. Conf. Pattern Recognit., pp. 289-292, 2002.