The EarthServer initiative: towards Agile Big Data Services
description
Transcript of The EarthServer initiative: towards Agile Big Data Services
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
The EarthServer initiative: towards Agile Big Data Services
2nd GEOSS Science and Technology Stakeholder WorkshopBonn, Germany, 2012-aug-29
Peter Baumann
Jacobs University | rasdaman GmbHBremen, Germany
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
About the Presenter
Professor of CS, Jacobs University• Head, Large-Scale Scientific Information Systems
research group
Main outcome so far:
rasdaman• first „Big Raster Data Analytics“ server
Standardization• OGC: chair of raster-relevant working groups,
editor of 12+ standards & candidate standards• ISO: working on Raster („Array“) SQL• INSPIRE: Invited expert for coverages
www.jacobs-university.de/lsis, www.rasdaman.org
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Roadmap
OGC standards
rasdaman
EarthServer
EarthServer & GEOSS
Conclusions
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Feature and Coverage Data Standards
Core element in OGC: geographic feature• = abstraction of a real world phenomenon• associated with a location relative to Earth
Special kind of feature: coverage• = space-time varying multi-dimensional phenomenon• Typical representative: raster image• ...but there is more!
Typically, coverages are Big Data
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
«FeatureType»AbstractCoverage
MultiSolidCoverage
MultiSurfaceCoverage
MultiCurveCoverage
MultiPointCoverage
DiscreteCoverage
ContinuousCoverage
as per GML 3.2.1
RectifiedGridCoverage
ReferenceableGridCoverage
GridCoverage
all n-DNew subtypes possible
Coverage Types
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Coverage Encoding
Pure GML: complete coverage represented by GML
Special Format: other suitable file format (ex: MIME type “image/tiff”)
Multipart-Mixed: multipart MIME, type “multipart/mixed”
GML CoverageDomain set
Range type
Range set
App Metadata
GML CoverageDomain set
Range type
xlinkApp Metadata
NetCDF file
NetCDFDomain set
Range type
Range set
App Metadata
GeoTIFF
Range type
Range set
6
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
feature coverage
data
WMS
images data
meta
data
WCPS
WCS-T
WCS
FE
WFS-T
WFS
CQL
CS-T
CS-W
Core OGC Service Standards
• WMS "portrays spatial data” pictures• WCS "provides data + descriptions; data with original semantics,
may be interpreted, extrapolated, etc.“ [09-110r4]
……
…
7
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Web Coverage Service (WCS) Core: Simple & efficient access to multi-dimensional coverages
subset = trim | slice
WCS Extensions for additional functionality facets• “band extraction”, scaling, reprojection, interpolation, query language, ...
Application Profiles define domain-oriented bundling
8
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
Time series
Image processing
Summary data
Sensor fusion& pattern mining
Web Coverage Processing Service (WCPS)
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Scalable On-Demand Processing for the Earth Sciences• EU funded, 3 years, 5.85 mEUR• Platform: rasdaman (Array Analytics server) • Distributed query processing, integrated data/metadata search, 3D clients
Strictly open standards: OGC WMS+WCS+WCPS; W3C Xquery; X3D
6 * 100+ TB databases for all Earth sciences + planetary science
EarthServer: Big Earth Data Analytics
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Array DBMS for massive n-D raster data• new database attribute type: array<celltype,extent>• Data integration: rasters stored in standard database
Extending ISO SQL with array operators: •
“tile streaming” architecture• n-D array set of n-D tiles• extensive optimization, hw/sw parallelization
In operational use • dozen-Terabyte objects• Analytics queries in 50 ms on laptop
The rasdaman Raster Analytics Server
select img.green[x0:x1,y0:y1] > 130from LandsatArchive as img
www.rasdaman.org
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Value-Added Satellite Image Archive
[Diedrich et al 2001]
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
WCPS peer-to-peer cloud • each node accepts all requests • Incoming node distributes query, semantics based• Manifold optimization criteria
for $a in ( A ), $b in ( B )return encode( ( ($a.nir - $a.red) / ($a.nir + $a.red) - ($b.nir - $b.red) / ($b.nir + $b.red) ), “HDF5“ )
coverageA
for $b in ( B )return encode( ($b.nir - $b.red) / ($b.nir + $b.red), “array-compressed“ )
for $a in ( A )return encode( ($a.nir - $a.red) / ($a.nir + $a.red), “array-compressed“ )
rasdaman: Distributed Query Processing
coverageB
[Owonibi 2012]
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
EarthServer Contribution to GEOSS
Integrated n-D coverage data / metadata search• Smooth integration with Broker
[Nativi, Mazzetti 2012]
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Integrated n-D coverage data / metadata search• Smooth integration with Broker
EarthServer Contribution to GEOSS
• Including „reverse lookup“ queries: „give me metadata for data with specific properties“• Also integration with MapServer, GDAL, ...
Scalable n-D interfaces, based on OGC standards• Working „in situ“on existing archives; no copying!
Flexible ad-hoc processing & filtering• Through OGC standardized query language
nD visual Web clients• 1D diagrams, 2D maps, 3D data cubes, 3D timeseries sets, ...• Dynymically composed from query results
EarthServer :: GEOSS 2012, Bonn :: ©2012 Peter Baumann
Conclusion
Sensor, image, & statistics data = a main source of Big Data in Earth Sciences• Petrol industry has „more bytes than barrels“
OGC standards offer common platform• spatio-temporal coverages – a unified, cross-domain data model• Web Coverage Service suite – from simple download to flexible analytics• www.ogcnetwork.net/wcs
EarthServer can contribute Agile Analytics to GEOSS• OGC coverage standards• rasdaman technology
www.earthserver.eu