
GeoServer Experimental Solution: Exploiting XML record sets

Pavel Golodoniuc

Computer scientist

Perth – 11 March 2011

CSIRO. GeoServer Experimental Solution: Exploiting XML record sets.

Brainstorming session

1. Solve ORM problem properly? 12 months?

2. Andrea’s idea – caching

3. XSLT backend

4. More threading in feature building

5. Virtual tables

6. Something we’ve crossed out

7. XML record sets

8. Denormalized views – chain simple properties

9. Decorators on PGSDF…
   • to allow SQL smuggling past GeoAPI

Potential improvement options


Post-processing

Data replication

XML DB Cache GML

Victor

Rini

Niels

Pavel

Idea’s foundation: XML record sets

• XML data type support

• FOR XML clause that enables you to further enhance the XML support in your applications and to write easy-to-maintain relational data-to-XML aggregations

• AUTO mode that employs a heuristic to infer a simple, one element name-per-level hierarchy based on the lineage information and the order of the data in a SELECT statement

• Generation of the XML in a streamable way that allows large documents to be produced efficiently

• XSLT-based post-processing


Goals

• One SQL query per WFS request – Goal #1

• Generic solution that may use present configuration approach

• Straightforward data processing

• Extensible and optimisable solution


“Big picture”

Start-up costs:
WFS config + DB schema → Prefabricated SQL SELECT statements + Dynamically generated XSLT stylesheet

Runtime costs:
Incoming WFS request → Generated WHERE clause for SQL query
Prefabricated SQL SELECT statement + Generated WHERE clause → DBMS → Raw XML
Raw XML + XSLT stylesheet → XSLT transformation → GML

Generation of SQL queries – 1 of 4
Top-level request – simple data retrieval pattern

SELECT TOP 50
    q0.GML_ID,
    q0.DESCRIPTION,
    q0.GML_NAME,
    q0.OBSERVATION_METHOD,
    q0.OBSERVATION_METHOD_CODESPACE,
    q0.SPECIFICATION_URI,
    q0.SHAPE.AsGml() AS 'SHAPE',
    'http://www.opengis.net/gml/srs/epsg.xml#' + CONVERT(VARCHAR(10), q0.SHAPE.STSrid) AS 'SHAPE/srsName',
    q0.POSITIONAL_ACCURACY,
    q0.POSITIONAL_ACCURACY_UOM
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');

• TOP 50: limits the number of returned features at the database level. Support for paging.
• Simple single-valued properties.
• Geometry objects are encoded as GML and returned along with supplemental attributes (e.g. SRID).
• Source table/view name is taken from the FeatureTypeMapping/sourceType element in the configuration file.
• All records are serialized in XML at the database level.

Generation of SQL queries – 2 of 4
Property nesting pattern

SELECT TOP 50
    -- All single-valued properties are retrieved here.
    (
        -- Get associated mineral occurrences.
        SELECT
            q1.GML_ID,
            q1.SITE_ID,
            q1.URI,
            q1.OCCURRENCE_URI,
            q1.DEPTH,
            q1.LENGTH,
            q1.WIDTH,
            q1.TYPE
            -- All other multi-valued properties go here.
        FROM dbo.ER_MINERALOCCURRENCE q1
        WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
        FOR XML PATH('MineralOccurrence'), TYPE
    ) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');

• Use the same pattern as before to get single-valued properties.
• All other simple and nested multi-valued properties are encoded using the same patterns as many times as required.
• The subquery is also serialised in XML under the __SPECIFICATION element.

Linkage according to the mapping file:

<AttributeMapping>
  <targetAttribute>gsml:specification</targetAttribute>
  <sourceExpression>
    <OCQL>SPECIFICATION_URI</OCQL>
    <linkElement>er:MineralOccurrence</linkElement>
    <linkField>gml:name</linkField>
  </sourceExpression>
</AttributeMapping>

Generation of SQL queries – 3 of 4
Encoding pattern for polymorphic properties

SELECT TOP 50
    -- Omitted for simplicity.
    (
        -- Get associated mineral occurrences.
        SELECT
            -- Omitted for simplicity.
            (
                SELECT
                    -- Get properties for er:Resource objects.
                FROM dbo.ER_MINOCC_RESOURCE q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('Resource'), TYPE
            ) AS __ORE_AMOUNT,
            (
                SELECT
                    -- Get properties for er:Reserve objects.
                FROM dbo.ER_MINOCC_RESERVE q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('Reserve'), TYPE
            ) AS __ORE_AMOUNT
        FROM dbo.ER_MINERALOCCURRENCE q1
        WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
        FOR XML PATH('MineralOccurrence'), TYPE
    ) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');

Encoding of polymorphic properties

Linkage according to the mapping file for er:Resource:

<AttributeMapping>
  <targetAttribute>er:oreAmount</targetAttribute>
  <sourceExpression>
    <OCQL>SITE_ID</OCQL>
    <linkElement>'_er_mo_oreAmount_Resource'</linkElement>
    <linkField>FEATURE_LINK</linkField>
  </sourceExpression>
  <isMultiple>true</isMultiple>
</AttributeMapping>

Linkage according to the mapping file for er:Reserve:

<AttributeMapping>
  <targetAttribute>er:oreAmount</targetAttribute>
  <sourceExpression>
    <OCQL>SITE_ID</OCQL>
    <linkElement>'_er_mo_oreAmount_Reserve'</linkElement>
    <linkField>FEATURE_LINK</linkField>
  </sourceExpression>
  <isMultiple>true</isMultiple>
</AttributeMapping>

Both properties are mapped to the same intermediate element.

Different types are disambiguated by the name of the sub-element.
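
The way a generator might detect a polymorphic property from the mapping file can be sketched in Python. This is only an illustration, not part of the presented solution (the deck states that all queries and stylesheets were hand-crafted). The <mappings> wrapper element and the group_links function are hypothetical names introduced here; the AttributeMapping content is copied from the slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two er:oreAmount mappings from the slides, wrapped in a
# hypothetical <mappings> root so they can be parsed together.
MAPPINGS = """
<mappings>
  <AttributeMapping>
    <targetAttribute>er:oreAmount</targetAttribute>
    <sourceExpression>
      <OCQL>SITE_ID</OCQL>
      <linkElement>'_er_mo_oreAmount_Resource'</linkElement>
      <linkField>FEATURE_LINK</linkField>
    </sourceExpression>
    <isMultiple>true</isMultiple>
  </AttributeMapping>
  <AttributeMapping>
    <targetAttribute>er:oreAmount</targetAttribute>
    <sourceExpression>
      <OCQL>SITE_ID</OCQL>
      <linkElement>'_er_mo_oreAmount_Reserve'</linkElement>
      <linkField>FEATURE_LINK</linkField>
    </sourceExpression>
    <isMultiple>true</isMultiple>
  </AttributeMapping>
</mappings>
"""


def group_links(mappings_xml):
    """Group link elements by target attribute.

    A target attribute with more than one entry is a polymorphic
    property: each entry would become its own subquery, all sharing
    one column alias in the parent query.
    """
    groups = defaultdict(list)
    root = ET.fromstring(mappings_xml.strip())
    for mapping in root.iter("AttributeMapping"):
        target = mapping.findtext("targetAttribute").strip()
        source = mapping.find("sourceExpression")
        groups[target].append({
            "parent_column": source.findtext("OCQL").strip(),
            # The mapping file quotes the link element name; drop the quotes.
            "link_element": source.findtext("linkElement").strip().strip("'"),
            "link_field": source.findtext("linkField").strip(),
            "multiple": mapping.findtext("isMultiple", default="false").strip() == "true",
        })
    return dict(groups)


links = group_links(MAPPINGS)
for target, variants in links.items():
    print(target, [v["link_element"] for v in variants])
```

Resolving a link element such as _er_mo_oreAmount_Resource to its source table (dbo.ER_MINOCC_RESOURCE in the slides) would need the feature type mapping of the linked element, which this sketch does not cover.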

Generation of SQL queries – 4 of 4
Summary

• Straightforward iterative process
• Follows a few very simple encoding patterns
• Every nested property adds another subquery to the parent query
• Results in a formidable “query from hell”
• Database-specific routine
• Depends on a specific SQL dialect
• Implements ‘Single SQL query per WFS request’ approach
• XML record set is structured and is easy to process
• XML record set is self-sufficient and contains all the information needed to encode the target feature type
• Process needs to be performed once during start-up
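
The rule that every nested property adds another subquery to the parent query lends itself to a recursive generator. The following Python sketch is one possible shape for it, under stated assumptions: the Node class and build_query function are hypothetical, the aliases follow the q0, q1, q2 convention of the slides, and the emitted text uses the SQL Server FOR XML PATH dialect shown in the deck. It is not the author's implementation, which the slides describe as not yet automated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One level of the feature hierarchy (hypothetical structure)."""
    table: str                                # source table or view
    element: str                              # element name for FOR XML PATH
    columns: List[str]                        # single-valued columns
    alias: Optional[str] = None               # alias of the subquery in its parent
    join: Optional[Tuple[str, str]] = None    # (child column, parent column)
    children: List["Node"] = field(default_factory=list)


def build_query(node, depth=0, top=None):
    """Emit a SELECT for this node, recursing into nested properties."""
    q = f"q{depth}"
    pad = "    " * depth
    items = [f"{pad}    {q}.{column}" for column in node.columns]
    for child in node.children:
        sub = build_query(child, depth + 1)
        items.append(f"{pad}    (\n{sub}\n{pad}    ) AS {child.alias}")
    head = f"{pad}SELECT" + (f" TOP {int(top)}" if top else "")
    lines = [head, ",\n".join(items), f"{pad}FROM {node.table} {q}"]
    if node.join:
        child_column, parent_column = node.join
        lines.append(f"{pad}WHERE {q}.{child_column} = q{depth - 1}.{parent_column}")
    if depth == 0:
        # Only the outermost query wraps the result in a root element.
        lines.append(f"FOR XML PATH('{node.element}'), ROOT('RawFeatureCollection');")
    else:
        lines.append(f"{pad}FOR XML PATH('{node.element}'), TYPE")
    return "\n".join(lines)


feature = Node(
    table="dbo.GSML_MAPPEDFEATURE",
    element="MappedFeature",
    columns=["GML_ID", "SPECIFICATION_URI"],
    children=[
        Node(
            table="dbo.ER_MINERALOCCURRENCE",
            element="MineralOccurrence",
            columns=["GML_ID", "SITE_ID"],
            alias="__SPECIFICATION",
            join=("NAME_VALUE", "SPECIFICATION_URI"),
        )
    ],
)
sql = build_query(feature, top=50)
print(sql)
```

Because sibling subqueries at the same depth reuse the same alias, polymorphic properties can be expressed as two child nodes that share one alias value.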


Some interesting facts

The query that is used to obtain all the information required for encoding of the gsml:MappedFeature feature type according to AuScope EarthResourceML Profile 1.1…

• Consists of 250+ lines of SQL code (without WHERE clause)
• Is a query with 20 subqueries
• Retrieves data from 13 views
• All views involved in the query contain 881,184 records in total
• Serializes and returns data in XML format automatically


Generation of XSLT stylesheets

• Similar iterative process
• Follows a few very simple encoding patterns
• Each feature type and nested property have their own XSLT template for processing
• XSLT may be generated in parallel with SQL query generation

• All information required for generation of XSLT stylesheets is obtained from GeoServer configuration files and XML Schemas

• XSLT transformation also contains a few helper templates

• Process needs to be performed once during start-up


Start-up vs. Runtime costs

Start-up costs:
• Prefabrication of SQL SELECT statements
• Generation of XSLT stylesheet for each feature type

Runtime costs:
• Conversion of WFS filters into SQL WHERE clauses for prefabricated SQL SELECT statements
• SQL query execution
• Conversion of raw XML obtained from database into GML using pre-generated XSLT stylesheet

Conversion of WFS filters into SQL

<?xml version="1.0" encoding="UTF-8"?>
<wfs:GetFeature service="WFS" version="1.1.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:er="urn:cgi:xmlns:GGIC:EarthResource:1.1"
    maxFeatures="50">
  <wfs:Query typeName="gsml:MappedFeature" srsName="EPSG:4326">
    <ogc:Filter>
      <ogc:And>
        <ogc:BBOX>
          <ogc:PropertyName>gsml:shape</ogc:PropertyName>
          <gml:Envelope srsName="EPSG:4326">
            <gml:lowerCorner>78.75 -54.5210814954436</gml:lowerCorner>
            <gml:upperCorner>352.08984375 11.350796722383672</gml:upperCorner>
          </gml:Envelope>
        </ogc:BBOX>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>gsml:specification/er:MineralOccurrence/er:commodityDescription/er:Commodity/er:commodityName</ogc:PropertyName>
          <ogc:Literal>urn:cgi:classifier:GA:commodity:Ag</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:And>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>

WHERE
    SHAPE.STIntersects(
        geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436, 78.75 11.350796722383672, 352.08984375 11.350796722383672, 352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
    ) = 1
    AND EXISTS(
        SELECT TOP 1 1
        FROM dbo.ER_MINERALOCCURRENCE q1
        INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
        WHERE
            q1.NAME_VALUE = q0.SPECIFICATION_URI
            AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
    )

• BBOX filter is directly mapped to the STIntersects function.
• “XPath” expression is tracked down to identify the source of the underlying property.
• Logical operations are preserved in the generated SQL query.
• Spatial Reference System is taken into account in the generated SQL query.
• Literal operand is passed directly into the SQL query.
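
The BBOX part of this conversion can be sketched in Python. The sketch assumes that gsml:shape has already been resolved to the SHAPE column (passed in as a parameter) and that the srsName has the form EPSG:<code>; the function name bbox_to_sql is hypothetical. The emitted text mirrors the STIntersects clause shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "ogc": "http://www.opengis.net/ogc",
    "gml": "http://www.opengis.net/gml",
}

# The BBOX element of the request, with its namespaces declared locally.
BBOX = """
<ogc:BBOX xmlns:ogc="http://www.opengis.net/ogc"
          xmlns:gml="http://www.opengis.net/gml">
  <ogc:PropertyName>gsml:shape</ogc:PropertyName>
  <gml:Envelope srsName="EPSG:4326">
    <gml:lowerCorner>78.75 -54.5210814954436</gml:lowerCorner>
    <gml:upperCorner>352.08984375 11.350796722383672</gml:upperCorner>
  </gml:Envelope>
</ogc:BBOX>
"""


def bbox_to_sql(bbox_xml, column="SHAPE"):
    """Turn an ogc:BBOX filter into an STIntersects condition."""
    bbox = ET.fromstring(bbox_xml.strip())
    envelope = bbox.find("gml:Envelope", NS)
    x1, y1 = envelope.findtext("gml:lowerCorner", namespaces=NS).split()
    x2, y2 = envelope.findtext("gml:upperCorner", namespaces=NS).split()
    # Reject anything that is not a number before it reaches the SQL text.
    for value in (x1, y1, x2, y2):
        float(value)
    srid = int(envelope.get("srsName").split(":")[-1])
    # Closed ring: lower-left, upper-left, upper-right, lower-right, lower-left.
    corners = [(x1, y1), (x1, y2), (x2, y2), (x2, y1), (x1, y1)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return (
        f"{column}.STIntersects(geometry::STGeomFromText("
        f"'POLYGON(({ring}))', {srid})) = 1"
    )


clause = bbox_to_sql(BBOX)
print(clause)
```

The coordinates are kept as strings so that the generated polygon carries exactly the digits of the request. For string literals such as the commodity name, a production version would probably bind the value as a query parameter rather than paste it into the SQL text.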

Final query

SELECT TOP 50
    q0.GML_ID,
    q0.DESCRIPTION,
    q0.GML_NAME,
    q0.OBSERVATION_METHOD,
    q0.OBSERVATION_METHOD_CODESPACE,
    q0.SPECIFICATION_URI,
    q0.SHAPE.AsGml() AS 'SHAPE',
    'http://www.opengis.net/gml/srs/epsg.xml#' + CONVERT(VARCHAR(10), q0.SHAPE.STSrid) AS 'SHAPE/srsName',
    q0.POSITIONAL_ACCURACY,
    q0.POSITIONAL_ACCURACY_UOM,
    (
        -- Get associated mineral occurrences.
        SELECT
            q1.GML_ID,
            q1.SITE_ID,
            q1.URI,
            q1.OCCURRENCE_URI,
            (
                SELECT
                    q2.NAME_VALUE,
                    q2.NAME_CODESPACE
                FROM dbo.ER_MINERALOCCURRENCE q2
                WHERE q2.GML_ID = q1.GML_ID
                FOR XML PATH('name'), TYPE
            ) AS __NAME,
            q1.DEPTH,
            q1.LENGTH,
            q1.WIDTH,
            q1.TYPE,
            (
                -- Get er:expression attributes
                SELECT
                    q2.EXPRESSION
                FROM dbo.ER_MINOCC_EXPRESSION q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('expression'), TYPE
            ) AS __EXPRESSION,
            (
                -- Get er:form attributes
                SELECT
                    q2.FORM
                FROM dbo.ER_MINOCC_FORM q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('form'), TYPE
            ) AS __FORM,
            (
                SELECT
                    q2.GML_ID,
                    q2.SITE_ID,
                    q2.DIP_NUMERIC,
                    q2.DIP_STRING,
                    q2.AZIMUTH_NUMERIC,
                    q2.AZIMUTH_STRING,
                    q2.CONVENTION
                FROM dbo.ER_MINOCC_PLANAR_ORIENTATION q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('planarOrientation'), TYPE
            ) AS __PLANAR_ORIENTATION,
            (
                -- Get er:shape attributes
                SELECT
                    q2.SHAPE
                FROM dbo.ER_MINOCC_SHAPE q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('shape'), TYPE
            ) AS __SHAPE,
            (
                SELECT TOP 1 -- Temporary. Assuming that ER_MINERALDEPOSITMODEL shouldn't be denormalised.
                    q2.GML_ID,
                    q2.SITE_ID,
                    (
                        SELECT DISTINCT
                            q3.GROUP_VALUE,
                            q3.GROUP_CODESPACE
                        FROM dbo.ER_MINERALDEPOSITMODEL q3
                        WHERE q3.GML_ID = q2.GML_ID
                        FOR XML PATH('mineralDepositGroup'), TYPE
                    ) AS __GROUP,
                    (
                        SELECT DISTINCT
                            q3.TYPE_VALUE,
                            q3.TYPE_CODESPACE
                        FROM dbo.ER_MINERALDEPOSITMODEL q3
                        WHERE q3.GML_ID = q2.GML_ID
                        FOR XML PATH('mineralDepositType'), TYPE
                    ) AS __TYPE
                FROM dbo.ER_MINERALDEPOSITMODEL q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('classification'), TYPE
            ) AS __CLASSIFICATION,
            (
                -- Get associated commodities
                SELECT
                    q2.GML_ID,
                    (
                        SELECT
                            q3.GML_NAME,
                            q3.GML_NAME_CODESPACE
                        FROM dbo.ER_COMMODITY q3
                        WHERE q3.GML_ID = q2.GML_ID
                        FOR XML PATH('name'), TYPE
                    ) AS __NAME,
                    q2.COMMODITYNAME,
                    q2.COMMODITYNAME_CODESPACE,
                    q2.COMMODITYGROUP,
                    q2.COMMODITYRANK,
                    q2.SOURCE_URI
                FROM dbo.ER_COMMODITY q2
                WHERE q2.SOURCE_URI = q1.NAME_VALUE
                    AND q2.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
                FOR XML PATH('Commodity'), TYPE
            ) AS __COMMODITY_DESCRIPTION,
            (
                SELECT
                    q2.GML_ID,
                    q2.SITE_URI,
                    q2.MATERIAL_ROLE,
                    q2.LITHOLOGY_URI
                FROM dbo.ER_EARTHRESOURCEMATERIAL q2
                WHERE q2.SITE_URI = q1.URI
                FOR XML PATH('composition'), TYPE
            ) AS __COMPOSITION,
            (
                SELECT
                    q2.GML_NAME
                FROM dbo.ER_MININGACTIVITY q2
                WHERE q2.DEPOSIT_URI = q1.NAME_VALUE
                FOR XML PATH('resourceExtraction'), TYPE
            ) AS __RESOURCE_EXTRACTION,
            (
                SELECT
                    q2.GML_ID,
                    q2.SITE_ID,
                    q2.ESTIMATE_ID,
                    q2.CALCULATION_METHOD,
                    q2.DATE,
                    q2.SOURCE_REFERENCE,
                    q2.ORE,
                    q2.CATEGORY,
                    (
                        -- Get er:measureDetails
                        SELECT
                            q3.GML_ID,
                            q3.ESTIMATE_ID,
                            q3.COMMODITY_AMOUNT,
                            q3.COMMODITY_AMOUNT_UOM,
                            q3.CUTOFF_GRADE,
                            q3.CUTOFF_GRADE_UOM,
                            q3.GRADE,
                            q3.GRADE_UOM,
                            q3.COMMODITY_OF_INTEREST,
                            (
                                -- Get associated commodities
                                SELECT
                                    q4.GML_ID,
                                    q4.GML_NAME,
                                    q4.GML_NAME_CODESPACE,
                                    q4.COMMODITYNAME,
                                    q4.COMMODITYNAME_CODESPACE,
                                    q4.COMMODITYGROUP,
                                    q4.COMMODITYRANK,
                                    q4.SOURCE_URI
                                FROM dbo.ER_COMMODITY q4
                                WHERE q4.GML_NAME = q3.COMMODITY_OF_INTEREST
                                    AND q4.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
                                FOR XML PATH('Commodity'), TYPE
                            ) AS __COMMODITY_OF_INTEREST
                        FROM dbo.ER_MINOCC_OREAMOUNT_MEASUREDETAILS q3
                        WHERE q3.ESTIMATE_ID = q2.ESTIMATE_ID
                        FOR XML PATH('CommodityMeasure'), TYPE
                    ) AS __MEASURE_DETAILS
                FROM dbo.ER_MINOCC_RESOURCE q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('Resource'), TYPE
            ) AS __ORE_AMOUNT,
            (
                SELECT
                    q2.GML_ID,
                    q2.SITE_ID,
                    q2.ESTIMATE_ID,
                    q2.CALCULATION_METHOD,
                    q2.DATE,
                    q2.SOURCE_REFERENCE,
                    q2.ORE,
                    q2.CATEGORY,
                    (
                        -- Get er:measureDetails
                        SELECT
                            q3.GML_ID,
                            q3.ESTIMATE_ID,
                            q3.COMMODITY_AMOUNT,
                            q3.COMMODITY_AMOUNT_UOM,
                            q3.CUTOFF_GRADE,
                            q3.CUTOFF_GRADE_UOM,
                            q3.GRADE,
                            q3.GRADE_UOM,
                            q3.COMMODITY_OF_INTEREST,
                            (
                                -- Get associated commodities
                                SELECT
                                    q4.GML_ID,
                                    q4.GML_NAME,
                                    q4.GML_NAME_CODESPACE,
                                    q4.COMMODITYNAME,
                                    q4.COMMODITYNAME_CODESPACE,
                                    q4.COMMODITYGROUP,
                                    q4.COMMODITYRANK,
                                    q4.SOURCE_URI
                                FROM dbo.ER_COMMODITY q4
                                WHERE q4.GML_NAME = q3.COMMODITY_OF_INTEREST
                                    AND q4.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
                                FOR XML PATH('Commodity'), TYPE
                            ) AS __COMMODITY_OF_INTEREST
                        FROM dbo.ER_MINOCC_OREAMOUNT_MEASUREDETAILS q3
                        WHERE q3.ESTIMATE_ID = q2.ESTIMATE_ID
                        FOR XML PATH('CommodityMeasure'), TYPE
                    ) AS __MEASURE_DETAILS
                FROM dbo.ER_MINOCC_RESERVE q2
                WHERE q2.SITE_ID = q1.SITE_ID
                FOR XML PATH('Reserve'), TYPE
            ) AS __ORE_AMOUNT
        FROM dbo.ER_MINERALOCCURRENCE q1
        WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
        FOR XML PATH('MineralOccurrence'), TYPE
    ) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
WHERE
    SHAPE.STIntersects(
        geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436, 78.75 11.350796722383672, 352.08984375 11.350796722383672, 352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
    ) = 1
    AND EXISTS(
        SELECT TOP 1 1
        FROM dbo.ER_MINERALOCCURRENCE q1
        INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
        WHERE
            q1.NAME_VALUE = q0.SPECIFICATION_URI
            AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
    )
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');

WHERE clause. The only part that changes according to the WFS request.
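
Since only the WHERE clause varies per request, the runtime step can amount to splicing generated conditions into a statement prepared at start-up. A minimal Python sketch, assuming a shortened stand-in for the prefabricated statement and hypothetical names PREFABRICATED and final_query:

```python
# Shortened stand-in for a statement prefabricated once at start-up.
# {limit} and {where} are the only request-dependent parts.
PREFABRICATED = (
    "SELECT TOP {limit}\n"
    "    q0.GML_ID,\n"
    "    q0.SPECIFICATION_URI\n"
    "FROM dbo.GSML_MAPPEDFEATURE q0\n"
    "{where}"
    "FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');"
)


def final_query(conditions, limit=50):
    """Combine the prefabricated SELECT with per-request conditions."""
    where = ""
    if conditions:
        where = "WHERE\n    " + "\n    AND ".join(conditions) + "\n"
    return PREFABRICATED.format(limit=int(limit), where=where)


query = final_query(
    ["q0.SPECIFICATION_URI IS NOT NULL", "q0.GML_ID LIKE 'gsml.%'"],
    limit=50,
)
print(query)
```

The limit corresponds to the maxFeatures attribute of the request; the two conditions here are placeholders rather than output of a real filter conversion.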

Improvement options

• Normalization of underlying views
• Will reduce the size of views by a factor of 2-3
• Will simplify SQL query
• Database views may be tailored for this specific solution
• Proof-of-concept solution has been implemented on the existing GSWA database without any structural changes

• Proper use of spatial indices


Query execution and post-processing

Constructed SQL query → DBMS → Raw XML
Raw XML + XSLT stylesheet → XSLT transformation → GML

Benchmarking: Data retrieval

[Chart: Raw XML size. X axis: Number of Features (3, 6, 12, 24, 48, 96, 192, 384, 768, 1,536); Y axis: File Size (bytes), 0 to 14,000,000.]

[Chart: Data retrieval and serialization. X axis: Number of Features (3 to 1,536); Y axis: Data retrieval (ms), 0 to 18,000.]

Benchmarking: XSLT execution time

[Chart: XSLT execution time, XSLTC 4.0.30319 vs. Saxon 9.1.0.2. X axis: Number of Features (3 to 1,536); Y axis: time (ms), 0 to 45,000.]

Benchmarking: XSLT processing (XSLTC vs. Saxon)

[Chart: Total XSLT processing time with XSLTC 4.0.30319. Series: Execution time (ms), XSLT compilation time (ms). X axis: Number of Features (3 to 1,536); Y axis: time (ms), 0 to 4,000.]

[Chart: Total XSLT processing time with Saxon 9.1.0.2. Series: Execution time (ms), XSLT compilation time (ms). X axis: Number of Features (3 to 1,536); Y axis: time (ms), 0 to 45,000.]

[Chart: Total processing time with XSLTC 4.0.30319. Series: Execution time (ms), XSLT compilation time (ms), Data retrieval (ms). X axis: Number of Features (3 to 1,536); Y axis: time (ms), 0 to 25,000.]

[Chart: Total processing time with Saxon 9.1.0.2. Series: Execution time (ms), XSLT compilation time (ms), Data retrieval (ms). X axis: Number of Features (3 to 1,536); Y axis: time (ms), 0 to 60,000.]

XSLTC: approximately 10 times faster than Saxon.

Comparison with existing Web Feature Service

[Chart: processing time of the proposed solution (XSLTC 4.0.30319 and Saxon 9.1.0.2) against the existing solution (measured 9/3/2011 and 10/3/2011). X axis: Number of Features (3 to 1,536); Y axis: Processing time (ms), logarithmic scale from 10 to 10,000,000.]

On average, the existing solution is 61 times slower than the proposed solution with the Saxon processing pipeline and 269 times slower than the proposed solution with the XSLTC processing pipeline.

The existing solution often times out after an hour of execution when “too many” features are requested.

Benefits

• No complex processing logic behind the scenes
• ORM is not a problem in this solution
• No Hibernate
• No intermediate mapping to business objects
• Implements ‘Single SQL query per WFS request’ approach
• All data retrieval logic is performed at the DBMS level, which opens up opportunities for further SQL query and execution improvements
• No XSLT knowledge required from the user
• Unlike deegree or GIN Mediator
• Internal XSLT processing is just natural when it comes to XML processing
• Highly optimised XSLT processors are readily available
• Feature chaining / nesting is transparent

<gsml:MappedFeature gml:id="gsml.mappedfeature.S0000263">
  <gml:description>KAMBALDA</gml:description>
  <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/mappedfeature/S0000263</gml:name>
  <gsml:observationMethod>
    <gsml:CGI_TermValue>
      <gsml:value codeSpace="http://www.dmp.wa.gov.au/371.aspx">Aerial Photograph</gsml:value>
    </gsml:CGI_TermValue>
  </gsml:observationMethod>
  <gsml:positionalAccuracy>
    <gsml:CGI_NumericValue>
      <gsml:principalValue uom="urn:ogc:def:nil:OGC::unknown">10.000</gsml:principalValue>
    </gsml:CGI_NumericValue>
  </gsml:positionalAccuracy>
  <gsml:samplingFrame xlink:href="http://resource.geosciml.org/uri-cgi/earthnaturalsurface" xlink:title="Earth's Natural Surface"/>
  <gsml:specification>
    <er:MineralOccurrence gml:id="er.mineraloccurrence.S0000263">
      <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263</gml:name>
      <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">PARIS</gml:name>
      <gsml:observationMethod>
        <gsml:CGI_TermValue>
          <gsml:value codeSpace="http://urn.opengis.net/">urn:ogc:def:nil:OGC::missing</gsml:value>
        </gsml:CGI_TermValue>
      </gsml:observationMethod>
      <gsml:purpose>instance</gsml:purpose>
      <gsml:occurrence xlink:href="http://services.auscope.org/resource/feature/gswa/mappedfeature/S0000263"/>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.121">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/121</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Gold (Au)</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Precious metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Au</er:commodityName>
          <er:commodityRank>1</er:commodityRank>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.264">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/264</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Silver</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Precious metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Ag</er:commodityName>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.88">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/88</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Copper (Cu)</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Base metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Cu</er:commodityName>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/183"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/184"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/185"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/186"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/187"/>
      <er:type>mineral deposit</er:type>
    </er:MineralOccurrence>
  </gsml:specification>
  <gsml:shape>
    <Point srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns="http://www.opengis.net/gml">
      <pos>121.975881 -31.588763</pos>
    </Point>
  </gsml:shape>
</gsml:MappedFeature>

GML Caching

Database as persistent layer for XML Cache

Every record that represents a feature in the database is also associated with its GML representation in an XML field, which is readily available for retrieval.

Simplification of SQL query

SELECT TOP 50
    q0.XML_CACHE
FROM dbo.GSML_MAPPEDFEATURE q0
WHERE
    SHAPE.STIntersects(
        geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436, 78.75 11.350796722383672, 352.08984375 11.350796722383672, 352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
    ) = 1
    AND EXISTS(
        SELECT TOP 1 1
        FROM dbo.ER_MINERALOCCURRENCE q1
        INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
        WHERE
            q1.NAME_VALUE = q0.SPECIFICATION_URI
            AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
    )
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');

• TOP 50: the same mechanism may be used to limit the number of returned features. Support for paging.
• XML_CACHE is the only field you need to retrieve. No formidable subqueries.
• The same WHERE clause as before.
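
The idea of keeping each feature's GML next to its record can be illustrated with a small, self-contained Python sketch. It uses an in-memory SQLite database purely as a stand-in for the DBMS in the slides; the table layout and the functions to_gml, store and fetch_cached are hypothetical, and to_gml replaces the full SQL + XSLT pipeline with a two-element fragment.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapped_feature ("
    "gml_id TEXT PRIMARY KEY, description TEXT, xml_cache TEXT)"
)


def to_gml(gml_id, description):
    """Stand-in for the expensive encoding step (query plus XSLT)."""
    return (
        f"<gsml:MappedFeature gml:id={quoteattr(gml_id)}>"
        f"<gml:description>{escape(description)}</gml:description>"
        "</gsml:MappedFeature>"
    )


def store(gml_id, description):
    """Write the record together with its pre-encoded GML."""
    conn.execute(
        "INSERT OR REPLACE INTO mapped_feature VALUES (?, ?, ?)",
        (gml_id, description, to_gml(gml_id, description)),
    )


def fetch_cached(limit):
    """Answer a request by reading the cached column only."""
    rows = conn.execute(
        "SELECT xml_cache FROM mapped_feature ORDER BY gml_id LIMIT ?",
        (limit,),
    )
    return "".join(row[0] for row in rows)


store("gsml.mappedfeature.S0000263", "KAMBALDA")
store("gsml.mappedfeature.S0000264", "EXAMPLE")  # made-up second record
print(fetch_cached(1))
```

In this sketch the cache is refreshed whenever a record is stored. How the cache would be kept in step with changes to the underlying views is not covered in the slides.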

Cached vs. Non-cached data retrieval

[Chart: Non-cached retrieval vs. Cached retrieval. X axis: Number of Features (3 to 1,536); Y axis: Execution time (ms), 0 to 18,000.]

Retrieval from XML Cache is on average 3.6 times faster.

Total processing time: Retrieval from Source vs. XML Cache

[Chart: four series. Non-cached data retrieval with XSLTC processing; Cached data retrieval with XSLTC processing; Non-cached data retrieval with Saxon processing; Cached data retrieval with Saxon processing. X axis: Number of Features (3 to 1,536); Y axis: Total execution time (ms), 0 to 60,000.]

Total processing time when using data from XML Cache is on average 2.1 times faster when using Saxon processing pipeline and 3.5 times faster when using XSLTC processing pipeline.

The bottom line

While people are struggling for a 5% performance improvement…

…this solution gives a 98.36% performance boost (61 times faster) even without the XML Cache, and a 99.06% performance boost (106 times faster) with the XML Cache*, when using the Saxon processing pipeline

… or 99.63% (269 times faster) and 99.88% (822 times faster) respectively when using the XSLTC processing pipeline.
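
The percentages quoted here are consistent with reading "performance boost" as the reduction in processing time, that is, (1 - 1/n) × 100 for a solution that is n times faster. That reading is an interpretation, not something the slides spell out; the short Python check below reproduces the four figures under it.

```python
def boost_percent(times_faster):
    """Reduction in processing time, in percent, for an n-fold speed-up."""
    return round((1 - 1 / times_faster) * 100, 2)


cases = [
    ("Saxon, no XML Cache", 61),
    ("Saxon, XML Cache", 106),
    ("XSLTC, no XML Cache", 269),
    ("XSLTC, XML Cache", 822),
]
for label, factor in cases:
    print(f"{label}: {factor} times faster = {boost_percent(factor)}% less time")
```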

Disclaimer

This is an experimental solution and is not guaranteed to be bug-free or fool-proof. Some aspects have not been investigated thoroughly and require additional in-depth research. However, the proposed approach has been tested on an existing production-grade dataset and has proved itself a feasible solution. All intermediate SQL queries and XSLT stylesheets have been hand-crafted according to the proposed approach. The SQL and XSLT stylesheet generation process needs to be automated in the final product.

Although the proposed approach has been tested on the most sophisticated feature type we had implemented in the existing GeoServer environment, it may need more extensive testing with other information models and use cases.


Thank you

CSIRO Earth Science and Resource Engineering
Pavel Golodoniuc
Computer scientist
Phone: +61 8 6436 8776
Email: [email protected]
Web: www.csiro.au/cesre

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au