Data Mining
Prof. Dr. Eng. habil. Mihai Datcu
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 2
EO Data Mining EO Data Mining, in analogy to other forms of Data Mining, such as text or semantic web technologies, aims at making the image content accessible. It enables: • Automatically highlighting of satellite image subsets • Generates semantic catalogues • Explores and helps finding specific objects of interest • Assists image content assessment and understanding • Supports time-consuming visual interpretation • Performs analysis on multi-temporal and multi-sensor datasets.
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 3
EO images
Geodata
Metadata
GIS
Geoinformation extraction
Reasoning & AI
Semantic annoation
Shareable information
Data Features Semantics Information
The Concept: from Data to Information
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 4
Physical Parameters
Geometrical Parameters
Patterns Objects Structures
Atmosphere Meteorology LR Superspectral
Single Pass InSAR Altimeter PS InSAR
PolSAR
Multispectral
VHR Optical VHR SAR
A taxonomy of EO sensors
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 5
A [log]History 2014 Sentinel 1 Free data at 5 m resolution
2010 TanDEM-X Free scientific data at 1-3m resolution
2007 CosmoSky Med, RADRSAT 2 TerraSAR-X Free scientific data at 1m resolution 2000 SRTM First time > 80% of continental DEM in 30 m grid 1992 ERS-1 First time a lot of free data !
1974 Sea-SAT First EO spaceborne SAR
1954 Wiley Patent on SAR
WW I – WW II RADAR
1904 Hulsmeyer Patent on Telemobilskope
1897 Marconi First wireless communication
1888 Hertz Experimental validation
1865 Maxewll Theory of electromagnetical waves
100 years 10 years
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 6
Title • First idea
• Second idea
History 1900s
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 7
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 8
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 9
The EO Big Data
" EO Archive = O {10 000 000} EO Products " EO product = “image” + “xml file” " “image” = 30 000 x 30 000 pixels " “pixel” = 16/32…/1000s bits, real/complex/multiband " “xml file” = a lot about the imaging mode, orbit, position, timing
… " TOTAL = many ZettaBytes
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 10
Information vs. data
www.DLR.de • Chart 10 > Lecture > Jagmal Singh • IGARS2012 > 25 July 2012
TerraSAR-X 11-OCT-2008
512x512 pixels
ERS1 24-JUL-1992
512x512 pixels
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 11
SPATIAL CONTEXT & INFORMATION CONTENT
At meter resolution, if context is ignored (the analysis windows being used are relatively small as compared to the object scale), very different scene classes may be confused. The example shows a bridge (left) and buildings (right) which are comprehensible only when the window size is sufficiently large to incorporate relevant context. Otherwise, both scene classes “look” similar and can be confused.
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 12
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 13
1 HS TerraSAR-X Scene = up to10 000 image patches (100 x 100 m)
SCENE CATEGORIES & INFORMATION CONTENT
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 14
EXOGENOUS SOURCES & INFORMATION CONTENT
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 15
Concept and Components
Data Model Generation
Visual Data Mining
Interpretation
&
Understanding
Query, Data Mining & Knowledge Discovery
Users DBMS
Data Sources
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 16
Visual Data Mining
Interpretation
&
Understanding
Query, Data Mining
&
Knowledge Discovery
Users
DBMS
Data Sources Content Analysis Context Analysis
Image Content
Metadata Content
GIS Content
Context Analysis
GIS
Metadata
EO images
Ø Metadata extraction Ø Patch generation Ø Feature extraction methods: Gabor filters, Grey-
Level Co-occurrence Ø Matrix (GLCM), Bag of words, dictionary-based
compression features, etc.
Data Model Generation
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 17
Interpretation & Understanding
Visual Data Mining Data Model
Generation Users
Data Sources
Strabon
Ø SQL_RETRIEVE_QUERY_BY_DICTIONARY=""" WITH
user_dict_size AS (
SELECT uls.id AS id, uls.label as label, COUNT(*) AS cnt
FROM user_labels AS uls, user_dictionaries AS uds
WHERE uls.label LIKE '%(user_label_label)s' AND uls.id=uds.user_label_id
GROUP BY uls.id, uls.label ),
DBMS
Ø Database scheme relation DB Ø Some data mining functions
• Similarity metrics Ø SQL functions for querying
• Distances
Query, Data Mining &
Knowledge Discovery
Database Management System
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 18
Database Management System
Example of data model for Earth-Observation images
Ø SQL_RETRIEVE_QUERY_BY_DICTIONARY=""" WITH
user_dict_size AS (
SELECT uls.id AS id, uls.label as label, COUNT(*) AS cnt
FROM user_labels AS uls, user_dictionaries AS uds
WHERE uls.label LIKE '%(user_label_label)s' AND uls.id=uds.user_label_id
GROUP BY uls.id, uls.label
),
inter_size AS (
-- size of dictionary intersection for all pairs of patches
SELECT
ud1.user_label_id AS user_label_1 ,
ul1.label AS label_1 ,
d2.patch_id AS patch_2 ,
p2.label AS label_2 ,
count(*) AS cnt_1_2
FROM
……………….
Fast Compression distance computation
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 19
Interpretation & Understanding
Visual Data Mining
Data Model Generation
Query, Data Mining & Knowledge Discovery
Users DBMS
Data Sources
Ø Queries: • Query Builder • Ontologies: Strabon • SQL • Query by Example Ø Data Mining Ø Semantic Definition
Primitive Feature Extraction Blocks
NL-STFT QMF
Gabor GLCM
GUI Relevance Feedback
SVM
Semantic Class DB
Query, Data Mining and Knowledge Discovery
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 20
TerraSAR-X metadata used for RDFs and Queries: • XML file contains information about productComponent, annotation, imageData, missionInfo, acquisitionInfo,
sceneInfo, etc
Extraction of Metadata from XML annotation file
queries
Ontology
Query by Metadata and Ontologies
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 21
Query by Semantics
Semantic annotation of TerraSAR-X image content
queries
Earth-Observation data model
SELECT label_id, name, FROM annotation a Join label l on a.label_id=l.label_id
Ur
ba n
+
W at
er
U
rban
type
1
Vege
tatio
n an
d W
ater
B
uoy
W
ater
type
1 W a
te
r an d
B oa t
s
Fo re s
t t
ype
1
1 8 1 7
Gra
ssl
an
d F
ores
t typ
e 2
B
ridge
type
2 W
at
er
an
d U
rba
n
R
oa
d
an
d
Stru
ctur
e ro
of
Rai
lway
trac
ks
or V
eget
atio
n
V
eget
ati
on
ertic
al
Urb
an
type
2
G
rass
land
Gra
ssla
nd
wit
h ob
jec
ts
B
uil
ding
s
ha
pe
Urb
an
type
3 B
uil
ding
re
flec
tio
n
Veg
etat
ion
and
w
ith
rect
ang
ular
sh
ap
e
U
rban
R o a ds
a n d
Ur
b a n Tr
e es
a n d
B uil
di
n gs
W at
er
wi
th
la n d
C h a n n el
Ai
rp or
t
F or
es t
U
rb a n
ty p e
4
8 1 7
4 4 4
B ri
d g e
ty p e
1 H ar b or R iv er
d e p o si
ts A gr ic ul
tu re B re a ki
n g
w a v e s F or e st
a n d
W at er
V e g et at io n 1 7
B uil
di
n g
re fle cti
o n
R o a d
+
fo re st
U
rb a n
ty p e
5 R o a ds
–
h
ig h w ay
U
rb a n
ty p e
6
W a
te
r ty p e
2
C h a n n el
6 1
Sky
scr
ape
rs
Urb
an ty
pe 7
Str
ee
ts
wit
h bu
ildin
gs
Spo
rt
fi
eld
s
R
ail
wa
y tra
cks
S
kysc
rape
rs
typ
e1
horiz
onta
l
ty
pe
2
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 22 11/09/15 22
Semantic catalogues
- Bangkok (Thailand); - Shenyang (China); - Nazca Lines (Peru); - Havana (Cuba); - Venice (Italy); - Vasteras (Sweden); - Oran (Algeria); - Bogota (Columbia).
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 23
Interpretation & Understanding
Visual Data Mining
Data Model Generation
Query, Data Mining & Knowledge Discovery
Users DBMS
Data Sources
Ø Queries: • Query Builder • Ontologies: Strabon • SQL • Query by Example Ø Data Mining Ø Semantic Definition
Primitive Feature Extraction Blocks
NL-STFT QMF
Gabor GLCM
GUI Relevance Feedback
SVM
Semantic Class DB
Query, Data Mining and Knowledge Discovery
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 24
PF algorithm Classification SVM with RF
Ground truth
Annotated category
Optimal parameters:
• product type (MGD) , • mode (High resolution Spotlight), • geometric resolution configurations (RE), • patch size (160 x 160 pixels); • PF algorithm (Gabor filters)
Semantic Patches
Collections
Methodology:
Semantic annotation
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 25
Data Model composed of: ~110000 patches ~320 semantic categories
Location of the 100 TerraSAR-X scenes and the distribution of the scenes over the World
Data Model Generation: Ingested images
C.O. Dumitru and M. Datcu, “Information Content of Very High Resolution SAR Images: Study of Feature Extraction and Imaging Parameters”, IEEE Trans. Geoscience and Remote Sensing. To be published
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 26
Visual Data Mining
Data Model Generation
Query, Data Mining & Knowledge Discovery
Users
DBMS Data
Sources
Ø Rapid Mapping generation
Interpretation and Understanding
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 27
Example of Image understanding The damages in the agriculture can be clearly seen by comparing the classification in pre disaster image (left figure) with the post disaster image (right figure).
Agriculture
Bridges
Aquaculture
H. Voltage poles
Flooded areas
Bridges
Debris
H. Voltage poles
TerraSAR-X scene before Tsunami – 20.10.2010 TerraSAR-X scene after Tsunami – 12.03.2011
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 28
Automated Rapid Mapping scenario: Tsunami in Japan 2011
28
Two multitemporal TerraSAR-X images used for Japan Tsunami scenario
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 29
Damage Assessment
29
Flooded areas Ocean Bridges
after Flooded areas
Flooded areas Ocean Flooded
areas Debris
Agriculture Bridges before
High voltage poles
Mountains Structures
Changes 504 20 0 8 12 75 16 2
0
100
200
300
400
500
600
Bridges before Tsunami Bridges after Tsunami
Debris caused by Tsunami
Mountains 9% Debris
0%
Flooded area 7%
High voltage poles 0%
Ocean 78%
Bridges before
0% Bridges
after 0%
Agriculture
5%
Structures 1%
Semantic class Total of annotated
patches Mountains 1874 Debris 33 Flooded area 1428
High voltage poles 50 Ocean 16562
Bridges before 66 Bridges after 66 Agriculture 981 Structures 287
Semantic Catalogue
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 30
Interactive Learning for Rapid Mapping
City Industry
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 31
Visual Data Mining
Query, Data Mining & Knowledge Discovery
Interpretation and Understanding
Visual Data Mining
Data Model Generation
Users
DBMS Data
Sources
Ø Representation of the data in the 3D space using a Laplacian eigenmap
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 32
Immersive Visual Information Mining for EO image archives Navigation inside the EO image collections using the CAVE automatic virtual environment
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 33
EO data and Users
Volume and multimodality of data is growing Data and information is spatio-temporal and unstructured Users want to have the knowledge Interactive is the only way of operation Exploration is the predominant mode of interaction Context is critical and relevant Users are interested in information and knowledge independent of conjecture
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 34
Big Data trends
Information must be obtained from the data Databases and search engines were not designed to provide contents Visualization is very important HMI are crucial "Cognitive vs. instruments""Reactive systems (may be more then concious ...)""All data must be availabe (no data, no discovery)""Simple, but not naïve""Selective, not only addaptive
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 35
Processing and learning
Pre-processing integrated with post processing Data cleaning integrated with semantic understanding Non-visual integrated in visual Latent integrated in intuitive learning Learning controlled by unlearning Information integrated with human centric knowledge Only realtime
Institut für Methodik der Fernerkundung bzw. Deutsches Fernerkundungsdatenzentrum
Folie 36
Algorithms and tools
Pattern Recognition and Data Mining Using Data Compression EO Ontologies Geospatial Information Retrieval and Indexing Systems Content Mining, Semantics Modeling, and Complex Queries Mining Image Time Series DM supported by Web Services, CLOUD technologies
Top Related