SCALABLE EARTH OBSERVATION ANALYTICS WITH SCIDB
EO DATA ORGANIZATION
LANDSAT 8
SENTINEL 2
EO DATA ORGANIZATION
• EO imagery is distributed as files
• GDAL interfaces EO imagery with GIS software
• Large image collections are difficult to analyze due to
– data volume
– irregularities
– lack of time support in GDAL
• Higher-level data organization as an alternative to files?
– Key requirement: scalability
SCIDB INTRODUCTION
• Array-based data management and analytical system [1]
• Relies on a shared-nothing architecture
• Open-source version available, extensible through user-defined functions (UDFs)
• Basic data representation as multidimensional arrays:
– 𝑛 dimensions, 𝑚 attributes with different data types
[Figure: multidimensional array with the axes time, latitude, and longitude]
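The array model described above can be pictured with a small Python sketch. It only illustrates cells addressed by dimension coordinates and carrying several typed attributes; the names `Cell`, `band4`, and `quality` are hypothetical, and this is not how SciDB stores data internally.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """Attributes of one cell; each attribute has its own data type."""
    band4: int       # hypothetical integer attribute
    quality: float   # hypothetical floating-point attribute


# Dimensions of this example array: (time, latitude, longitude) indices.
Coordinate = Tuple[int, int, int]

# Only cells that actually hold data are present in the mapping.
array: Dict[Coordinate, Cell] = {
    (0, 10, 20): Cell(band4=812, quality=0.9),
    (1, 10, 20): Cell(band4=790, quality=0.7),
}


def time_series(arr: Dict[Coordinate, Cell], lat: int, lon: int) -> List[int]:
    """Return the band4 values of one pixel, ordered by time."""
    hits = sorted((t, cell.band4) for (t, y, x), cell in arr.items()
                  if (y, x) == (lat, lon))
    return [value for _, value in hits]


print(time_series(array, 10, 20))
```

Here the array has three dimensions and two attributes; selecting along one dimension (time) for a fixed latitude and longitude yields a pixel time series.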
[1] Stonebraker, M., Brown, P., Zhang, D., & Becla, J. (2013). SciDB: A database management system for applications with complex analytics. Computing in Science & Engineering, 15(3), 54-62.
SCIDB ARCHITECTURE
[Figure: SciDB cluster layout]
• Clients
• Coordinator node: instances 0 to 3
• Worker node: instances 4 to 7
• Worker node: instances 8 to 11
• Worker node: instances 12 to 15
• …
SCIDB ARCHITECTURE
• Arrays are divided into equally sized chunks
• Chunks are distributed over many SciDB instances
• Size and shape of chunks are defined by users per array and have strong effects on computation times
• Storage is nearly sparse
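The chunking idea above can be sketched in a few lines of Python. The helper names and the example shapes are hypothetical, and `instance_of` is a toy placement rule chosen for illustration only; it is not SciDB's actual distribution scheme.

```python
from typing import Tuple


def chunk_of(cell: Tuple[int, ...], chunk_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Index of the chunk that contains a cell (dimensions start at 0)."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))


def chunk_count(array_shape: Tuple[int, ...], chunk_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Chunks per dimension, rounding up for partial chunks at the edges."""
    return tuple(-(-a // s) for a, s in zip(array_shape, chunk_shape))


def instance_of(chunk: Tuple[int, ...], n_instances: int) -> int:
    """Toy placement rule (an assumption, not SciDB's real scheme)."""
    return sum(chunk) % n_instances


# Hypothetical array of 1000 x 1000 pixels and 365 time steps.
shape = (1000, 1000, 365)
per_pixel_block = (64, 64, 365)   # each chunk holds full time series
per_time_slice = (1000, 1000, 1)  # each chunk holds one complete image

print(chunk_count(shape, per_pixel_block))
print(chunk_count(shape, per_time_slice))
print(chunk_of((130, 70, 200), per_pixel_block))
```

The two chunk shapes hold the same data but favor different analyses: the first keeps every pixel's time series inside one chunk, the second keeps every image inside one chunk.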
QUERY LANGUAGE AND FUNCTIONALITY
• SciDB query language: Array Functional Language (AFL)
• Built-in functionality:
– Load / write arrays from / to files
– Arithmetic operations
– Subsetting by dimensions, attributes, or values
– Aggregations
– Joins
– Changing array schemas (repartitioning, redimensioning)
– Linear algebra routines (GEMM, GESVD) and basic statistics
– …
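AFL queries are written as nested operator calls, which the following Python sketch illustrates by composing a query string. The array name `scenes`, the attributes `red` and `nir` (assumed to be floating-point), and the dimension `t` are hypothetical; the helper `afl` is only a string builder and does not talk to a database.

```python
def afl(operator: str, *args: object) -> str:
    """Format one AFL operator call; nesting calls nests the operators."""
    return f"{operator}({', '.join(str(a) for a in args)})"


# Subset by dimension ranges: lower corner first, then upper corner.
subset = afl("between", "scenes", 0, 0, 0, 99, 99, 364)
# Subset by attribute values.
valid = afl("filter", subset, "red > 0 and nir > 0")
# Arithmetic: derive a new attribute from existing ones.
with_ndvi = afl("apply", valid, "ndvi", "(nir - red) / (nir + red)")
# Aggregation: mean of the new attribute per time step.
query = afl("aggregate", with_ndvi, "avg(ndvi)", "t")

print(query)
```

The resulting string could then be submitted through one of the SciDB clients; each inner operator produces an array that the enclosing operator consumes.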
EXTENSIONS FOR EO DATA
• scidb4geo (https://github.com/appelmar/scidb4geo)
– SciDB plugin adds metadata and simple operations on space-time referenced arrays
• scidb4gdal (https://github.com/appelmar/scidb4gdal)
– ingest / download to / from GDAL-supported files
– space-time mosaicing
• R package scidbst (https://github.com/flahn/scidbst)
– mimics functionality of common packages on SciDB arrays
SCIDB CLIENTS
• Low-level clients: iquery, Shim
• High-level R client (similar for Python)
– overrides standard methods, e.g. %*%
– makes extensive use of proxy objects
– lazy evaluation: results are computed only when they are read, and computations for unread parts of the results are skipped
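The combination of proxy objects, overridden operators, and lazy evaluation can be sketched in plain Python. This is a minimal illustration of the pattern, not the code of the SciDB R or Python client; the class `LazyMatrix` is hypothetical, and Python's `@` operator stands in for R's `%*%`.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


class LazyMatrix:
    """Proxy that records an operation instead of running it right away."""

    def __init__(self, compute: Callable[[], Matrix], description: str) -> None:
        self._compute = compute
        self._cache: Optional[Matrix] = None
        self.description = description
        self.evaluations = 0  # how often the computation actually ran

    @classmethod
    def from_rows(cls, rows: Matrix, name: str) -> "LazyMatrix":
        return cls(lambda: [list(row) for row in rows], name)

    def __matmul__(self, other: "LazyMatrix") -> "LazyMatrix":
        # Overridden operator: returns a new proxy, computes nothing yet.
        def compute() -> Matrix:
            a, b = self.values(), other.values()
            return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                    for row in a]
        return LazyMatrix(compute, f"({self.description} %*% {other.description})")

    def values(self) -> Matrix:
        """Reading the result triggers the computation (once)."""
        if self._cache is None:
            self._cache = self._compute()
            self.evaluations += 1
        return self._cache


a = LazyMatrix.from_rows([[1, 2], [3, 4]], "a")
b = LazyMatrix.from_rows([[0, 1], [1, 0]], "b")
product = a @ b
print(product.description, product.evaluations)
print(product.values())
```

The sketch covers only the first half of lazy evaluation (deferring work until a result is read); skipping computations for unread parts of a result would additionally require evaluating per chunk or per subset.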
SCIDB STREAMING
• Runs external programs (e.g., R, Python) within SciDB with chunk-level parallelism
• Chunk size selection must be adapted to the analysis
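Chunk-level parallelism can be illustrated with a small Python sketch in which a function is applied to every chunk independently. The function `analyze` is a hypothetical stand-in for the external program, and a thread pool stands in for the SciDB instances; this is not the streaming plugin's interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Chunk = Tuple[int, List[float]]  # (chunk id, cell values)


def analyze(chunk: Chunk) -> Tuple[int, float]:
    """Stand-in for the external program: here, the mean of one chunk."""
    chunk_id, values = chunk
    return chunk_id, sum(values) / len(values)


def stream(chunks: List[Chunk], workers: int) -> Dict[int, float]:
    """Apply analyze to every chunk in isolation, several at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze, chunks))


chunks = [(0, [1.0, 2.0, 3.0]), (1, [4.0, 6.0]), (2, [10.0])]
print(stream(chunks, workers=2))
```

Because each call sees exactly one chunk, everything an analysis needs (for example a pixel's complete time series) has to be located inside a single chunk.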
STUDY CASE: LAND USE CHANGE MONITORING IN SOUTH WEST ETHIOPIA FROM LANDSAT 7 IMAGERY
• Landsat 7 data from 12 tiles captured between 2003-07-21 and 2014-12-27 (1975 scenes)
• approx. 325,000 km²
• monitor changes starting from 2010-01-01
• using Breaks For Additive Season and Trend (BFAST) and its R implementation [2]
[2] Verbesselt, J., Hyndman, R., Newnham, G., & Culvenor, D. (2010). Detecting trend and seasonal changes in satellite image time series. Remote Sensing of Environment, 114, 106-115. DOI: 10.1016/j.rse.2009.08.014.
EO DATA AS REGULAR ARRAYS
LANDSAT 7 IN SCIDB
Images form a single three-dimensional array with daily temporal resolution
• 49548 x 47713 x 4177 cells in total
• Only 0.5% (54 · 10⁹) of the cells contain data, hence sparse storage
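The figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers given on the slide; the variable names are chosen for illustration.

```python
# Array extents and number of filled cells as stated on the slide.
shape = (49548, 47713, 4177)
filled_cells = 54e9

total_cells = 1
for extent in shape:
    total_cells *= extent

fill_fraction = filled_cells / total_cells

print(f"total cells: {total_cells:.2e}")
print(f"filled cells: {fill_fraction:.2%}")
```

The array has roughly 9.9 · 10¹² cells in total, so the 54 · 10⁹ filled cells amount to about half a percent, which is why a dense representation would be wasteful.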
STUDY CASE IMPLEMENTATION
1. Ingestion using GDAL
2. Preprocessing (with built-in SciDB functionality)
– remove any values <= -9999 or > 10000
– compute the NDVI vegetation index
– reorganize chunks such that one chunk stores the complete time series of 64 x 64 pixels
3. Run R scripts on all chunks using streaming
4. Postprocessing (with built-in SciDB functionality)
– reshape the one-dimensional result array to form a two-dimensional map
5. Export results using GDAL
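The preprocessing and postprocessing steps above can be sketched in plain Python. In the study these steps run inside SciDB, so this is only an illustration of the logic; the function names are hypothetical, NDVI is computed with its standard definition (NIR - red) / (NIR + red), and the row-major order in `reshape_to_map` is an assumption.

```python
from typing import List, Optional


def is_valid(value: float) -> bool:
    """Keep a value unless it is <= -9999 or > 10000."""
    return -9999 < value <= 10000


def ndvi(red: float, nir: float) -> Optional[float]:
    """NDVI of one cell, or None if an input is invalid or the sum is zero."""
    if not (is_valid(red) and is_valid(nir)) or red + nir == 0:
        return None
    return (nir - red) / (nir + red)


def reshape_to_map(values: List[float], n_cols: int) -> List[List[float]]:
    """Turn a one-dimensional result into rows of a two-dimensional map."""
    if len(values) % n_cols != 0:
        raise ValueError("number of values must be a multiple of n_cols")
    return [values[i:i + n_cols] for i in range(0, len(values), n_cols)]


print(ndvi(red=1000, nir=3000))
print(ndvi(red=-9999, nir=3000))
print(reshape_to_map([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], n_cols=3))
```

Cells that fail the validity check simply yield no NDVI value, which fits the sparse storage model: they do not need to be stored at all.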
STUDY CASE: RESULTS
STUDY CASE SCALABILITY
• 16 SciDB instances
• Change analysis run repeatedly with different numbers of available CPU cores
CONCLUSIONS
• The array model with chunking and sparse storage seems well suited to represent large EO datasets from many scenes at a higher level than files
• Analyses scale well with available hardware
• Little reimplementation is needed to scale complex time-series processing through streaming (and no need to care about parallelization / external memory)
• Installation and data ingestion are time-consuming and not straightforward
• Mostly useful for re-analysis, not for real-time processing
• Missing interactive(!) user interfaces (à la Google Earth Engine) to make the technology more accessible to end users?
THANK YOU
• Questions?
• Hands-on with SciDB tomorrow!
• Slides available at GitHub: https://github.com/appelmar/edcforum2017
• Contact [email protected]