GeoServer Experimental Solution: Exploiting XML record sets
Pavel Golodoniuc
Computer scientist
Perth – 11 March 2011
Potential improvement options
1. Solve ORM problem properly? 12 months?
2. Andrea’s idea – caching
3. XSLT backend
4. More threading in feature building
5. Virtual tables
6. Something we’ve crossed out
7. XML record sets
8. Denormalized views – chain simple properties
9. Decorators on PGSDF…
   • to allow SQL smuggling past GeoAPI
CSIRO. GeoServer Experimental Solution: Exploiting XML record sets.
[Diagram labels: Post-processing; Data replication; XML DB Cache GML. Names: Victor, Rini, Niels, Pavel]
Idea’s foundation: XML record sets
• XML data type support
• FOR XML clause that lets you write easy-to-maintain relational-data-to-XML aggregations
• AUTO mode that uses a heuristic to infer a simple hierarchy (one element name per level) from the lineage information and the order of the data in a SELECT statement
• Streamable XML generation, which allows large documents to be produced efficiently
• XSLT-based post-processing
Goals
• One SQL query per WFS request – Goal #1
• Generic solution that may use present configuration approach
• Straightforward data processing
• Extensible and optimisable solution
“Big picture”
Start-up costs:
• WFS config + DB schema → prefabricated SQL SELECT statements and a dynamically generated XSLT stylesheet
Runtime costs:
• Incoming WFS request → generated WHERE clause for the SQL query
• Prefabricated SQL SELECT statement + generated WHERE clause → DBMS → raw XML → XSLT transformation → GML
Generation of SQL queries – 1 of 4: Top level request – simple data retrieval pattern
SELECT TOP 50
q0.GML_ID,
q0.DESCRIPTION,
q0.GML_NAME,
q0.OBSERVATION_METHOD,
q0.OBSERVATION_METHOD_CODESPACE,
q0.SPECIFICATION_URI,
q0.SHAPE.AsGml() AS 'SHAPE',
'http://www.opengis.net/gml/srs/epsg.xml#' + CONVERT(VARCHAR(10), q0.SHAPE.STSrid) AS 'SHAPE/srsName',
q0.POSITIONAL_ACCURACY,
q0.POSITIONAL_ACCURACY_UOM
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');
Notes on the query above:
• TOP 50 limits the number of returned features at the database level. Support for paging.
• Simple single-valued properties are selected directly.
• Geometry objects are encoded as GML and returned along with supplemental attributes (e.g. SRID).
• The source table/view name is taken from the FeatureTypeMapping/sourceType element in the configuration file.
• All records are serialized in XML at the database level.
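To make the shape of the raw record set concrete, the following Python sketch mimics the layout that a FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection') query of this form would be expected to return: one row element per record, one child element per column, a '/' in a column alias producing a nested element, and NULL columns omitted. The helper name rows_to_raw_xml and the sample row values are illustrative assumptions, not output from the real database.

```python
import xml.etree.ElementTree as ET


def rows_to_raw_xml(rows, row_name="MappedFeature", root_name="RawFeatureCollection"):
    """Mimic the element-per-column layout of FOR XML PATH(...), ROOT(...)."""
    root = ET.Element(root_name)
    for row in rows:
        row_el = ET.SubElement(root, row_name)
        for alias, value in row.items():
            if value is None:
                continue  # NULL columns are left out of the XML by default
            parent = row_el
            # An alias such as 'SHAPE/srsName' nests srsName inside SHAPE.
            for step in alias.split("/"):
                child = parent.find(step)
                if child is None:
                    child = ET.SubElement(parent, step)
                parent = child
            if isinstance(value, ET.Element):
                parent.append(value)  # XML-typed column, e.g. the result of AsGml()
            else:
                parent.text = str(value)
    return root


# Illustrative row only; real values come from dbo.GSML_MAPPEDFEATURE.
point = ET.Element("Point")
ET.SubElement(point, "pos").text = "121.975881 -31.588763"
sample_rows = [{
    "GML_ID": "gsml.mappedfeature.S0000263",
    "DESCRIPTION": "KAMBALDA",
    "OBSERVATION_METHOD": None,
    "SHAPE": point,
    "SHAPE/srsName": "http://www.opengis.net/gml/srs/epsg.xml#4326",
}]

raw = rows_to_raw_xml(sample_rows)
print(ET.tostring(raw, encoding="unicode"))
```

The point of the sketch is only that the record set is already a tree keyed by column aliases, which is what makes the later XSLT step straightforward.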
Generation of SQL queries – 2 of 4: Property nesting pattern
SELECT TOP 50
-- All single-valued properties are retrieved here.
(
-- Get associated mineral occurrences.
SELECT
q1.GML_ID,
q1.SITE_ID,
q1.URI,
q1.OCCURRENCE_URI,
q1.DEPTH,
q1.LENGTH,
q1.WIDTH,
q1.TYPE
-- All other multi-valued properties go here.
FROM dbo.ER_MINERALOCCURRENCE q1
WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
FOR XML PATH('MineralOccurrence'), TYPE
) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');
Notes on the query above:
• The same pattern as before is used to get single-valued properties.
• All other simple and nested multi-valued properties are encoded using the same patterns as many times as required.
• The subquery is also serialised in XML, under the __SPECIFICATION element.

Property nesting pattern
Linkage according to the mapping file:
<AttributeMapping>
  <targetAttribute>gsml:specification</targetAttribute>
  <sourceExpression>
    <OCQL>SPECIFICATION_URI</OCQL>
    <linkElement>er:MineralOccurrence</linkElement>
    <linkField>gml:name</linkField>
  </sourceExpression>
</AttributeMapping>
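As a rough illustration of how such a mapping entry could drive the correlated WHERE condition of the nested subquery, the Python sketch below reads the AttributeMapping and derives the join condition. The LINK_TARGETS lookup (which view and column back a given linkElement/linkField pair) and the function name link_condition are hypothetical; in a real generator that information would presumably come from the nested feature type's own mapping.

```python
import xml.etree.ElementTree as ET

MAPPING = """
<AttributeMapping>
  <targetAttribute>gsml:specification</targetAttribute>
  <sourceExpression>
    <OCQL>SPECIFICATION_URI</OCQL>
    <linkElement>er:MineralOccurrence</linkElement>
    <linkField>gml:name</linkField>
  </sourceExpression>
</AttributeMapping>
"""

# Hypothetical lookup: (linkElement, linkField) -> (source view, source column).
LINK_TARGETS = {
    ("er:MineralOccurrence", "gml:name"): ("dbo.ER_MINERALOCCURRENCE", "NAME_VALUE"),
}


def link_condition(mapping_xml, parent_alias, child_alias):
    """Return the nested view and the condition that ties it to its parent row."""
    mapping = ET.fromstring(mapping_xml)
    source_column = mapping.findtext("sourceExpression/OCQL").strip()
    link_element = mapping.findtext("sourceExpression/linkElement").strip()
    link_field = mapping.findtext("sourceExpression/linkField").strip()
    view, column = LINK_TARGETS[(link_element, link_field)]
    return view, f"{child_alias}.{column} = {parent_alias}.{source_column}"


view, condition = link_condition(MAPPING, "q0", "q1")
print(view)       # dbo.ER_MINERALOCCURRENCE
print(condition)  # q1.NAME_VALUE = q0.SPECIFICATION_URI
```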
Generation of SQL queries – 3 of 4: Encoding pattern for polymorphic properties
SELECT TOP 50
-- Omitted for simplicity.
(
-- Get associated mineral occurrences.
SELECT
-- Omitted for simplicity.
(
SELECT
-- Get properties for er:Resource objects.
FROM dbo.ER_MINOCC_RESOURCE q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('Resource'), TYPE
) AS __ORE_AMOUNT,
(
SELECT
-- Get properties for er:Reserve objects.
FROM dbo.ER_MINOCC_RESERVE q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('Reserve'), TYPE
) AS __ORE_AMOUNT
FROM dbo.ER_MINERALOCCURRENCE q1
WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
FOR XML PATH('MineralOccurrence'), TYPE
) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');
Encoding of polymorphic properties
Linkage according to the mapping file for er:Resource:
<AttributeMapping>
  <targetAttribute>er:oreAmount</targetAttribute>
  <sourceExpression>
    <OCQL>SITE_ID</OCQL>
    <linkElement>'_er_mo_oreAmount_Resource'</linkElement>
    <linkField>FEATURE_LINK</linkField>
  </sourceExpression>
  <isMultiple>true</isMultiple>
</AttributeMapping>
Linkage according to the mapping file for er:Reserve:
<AttributeMapping>
  <targetAttribute>er:oreAmount</targetAttribute>
  <sourceExpression>
    <OCQL>SITE_ID</OCQL>
    <linkElement>'_er_mo_oreAmount_Reserve'</linkElement>
    <linkField>FEATURE_LINK</linkField>
  </sourceExpression>
  <isMultiple>true</isMultiple>
</AttributeMapping>
Both properties are mapped to the same intermediate element (__ORE_AMOUNT).
Different types are disambiguated by the name of the sub-element.
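The disambiguation step can be sketched in Python as a dispatch on the child element name inside the shared intermediate element. The sample fragment is an assumption about how the two same-named subquery columns would be serialised (a single __ORE_AMOUNT wrapper holding both Resource and Reserve rows); the ESTIMATE_ID values and the TARGET_TYPES table are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the raw XML for one mineral occurrence (illustrative values).
RAW = """
<MineralOccurrence>
  <__ORE_AMOUNT>
    <Resource><ESTIMATE_ID>101</ESTIMATE_ID></Resource>
    <Reserve><ESTIMATE_ID>202</ESTIMATE_ID></Reserve>
  </__ORE_AMOUNT>
</MineralOccurrence>
"""

# Hypothetical table: sub-element name -> target type of the er:oreAmount value.
TARGET_TYPES = {"Resource": "er:Resource", "Reserve": "er:Reserve"}


def ore_amount_types(occurrence):
    """List (target type, estimate id) for every value of the polymorphic property."""
    result = []
    for wrapper in occurrence.findall("__ORE_AMOUNT"):
        for child in wrapper:
            # The sub-element name, not the wrapper, decides the concrete type.
            result.append((TARGET_TYPES[child.tag], child.findtext("ESTIMATE_ID")))
    return result


amounts = ore_amount_types(ET.fromstring(RAW))
print(amounts)
```

In the proposed pipeline the same decision would be made by XSLT templates matching Resource and Reserve rather than by Python code.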
Generation of SQL queries – 4 of 4: Summary
• Straightforward iterative process
  • Follows a few very simple encoding patterns
  • Every nested property adds another subquery to the parent query
  • Results in a formidable “query from hell”
• Database-specific routine
  • Depends on a specific SQL dialect
• Implements ‘Single SQL query per WFS request’ approach
  • XML record set is structured and is easy to process
  • XML record set is self-sufficient and contains all the information needed to encode the target feature type
• Process needs to be performed once during start-up
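The iterative process summarised above can be sketched as a small recursive generator: each nested property contributes one more correlated subquery to its parent. This is a minimal Python sketch, not the author's generator (the disclaimer notes that the queries were hand-crafted); the Node structure, the build_select name and the three-level example tree are assumptions chosen to reproduce the patterns shown on the previous slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    view: str                                # source table/view
    element: str                             # name passed to FOR XML PATH(...)
    columns: List[str]                       # single-valued columns
    link: Optional[Tuple[str, str]] = None   # (own column, parent column)
    alias_as: Optional[str] = None           # intermediate element, e.g. __SPECIFICATION
    children: List["Node"] = field(default_factory=list)


def build_select(node, depth=0, top=None):
    """Emit a SELECT for this node with one nested subquery per child node."""
    alias, pad = f"q{depth}", "  " * depth
    items = [f"{pad}  {alias}.{column}" for column in node.columns]
    for child in node.children:
        sub = build_select(child, depth + 1)
        items.append(f"{pad}  (\n{sub}\n{pad}  ) AS {child.alias_as}")
    lines = [f"{pad}SELECT" + (f" TOP {top}" if top else ""),
             ",\n".join(items),
             f"{pad}FROM {node.view} {alias}"]
    if node.link:
        own, parent = node.link
        lines.append(f"{pad}WHERE {alias}.{own} = q{depth - 1}.{parent}")
    if depth == 0:
        lines.append(f"FOR XML PATH('{node.element}'), ROOT('RawFeatureCollection');")
    else:
        lines.append(f"{pad}FOR XML PATH('{node.element}'), TYPE")
    return "\n".join(lines)


feature = Node(
    view="dbo.GSML_MAPPEDFEATURE", element="MappedFeature",
    columns=["GML_ID", "SPECIFICATION_URI"],
    children=[Node(
        view="dbo.ER_MINERALOCCURRENCE", element="MineralOccurrence",
        columns=["GML_ID", "SITE_ID"],
        link=("NAME_VALUE", "SPECIFICATION_URI"), alias_as="__SPECIFICATION",
        children=[Node(view="dbo.ER_MINOCC_FORM", element="form", columns=["FORM"],
                       link=("SITE_ID", "SITE_ID"), alias_as="__FORM")],
    )],
)
sql = build_select(feature, top=50)
print(sql)
```

Siblings at the same depth share an alias (q2, q2, …), as in the hand-written query, which is safe because each subquery has its own scope.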
Some interesting facts
The query used to obtain all the information required to encode the gsml:MappedFeature feature type according to the AuScope EarthResourceML Profile 1.1…
• Consists of 250+ lines of SQL code (without the WHERE clause)
• Contains 20 subqueries
• Retrieves data from 13 views
• The views involved in the query contain 881,184 records in total
• Serializes and returns data in XML format automatically
Generation of XSLT stylesheets
• Similar iterative process
  • Follows a few very simple encoding patterns
  • Each feature type and nested property has its own XSLT template for processing
  • XSLT may be generated in parallel with SQL query generation
• All information required for generation of XSLT stylesheets is obtained from GeoServer configuration files and XML Schemas
• XSLT transformation also contains a few helper templates
• Process needs to be performed once during start-up
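A minimal Python sketch of the "one template per feature type and nested property" idea follows. It only generates a skeleton stylesheet and checks that it is well-formed XML; the TEMPLATES table, the generate_stylesheet name and the gsml namespace URI are assumptions for illustration (the er namespace is the one used in the WFS request shown later), and a real stylesheet would also map individual columns to GML properties and include the helper templates mentioned above.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical mapping: raw row element -> target GML element.
TEMPLATES = {
    "MappedFeature": "gsml:MappedFeature",
    "MineralOccurrence": "er:MineralOccurrence",
    "Commodity": "er:Commodity",
}

NAMESPACES = {
    "gsml": "urn:cgi:xmlns:CGI:GeoSciML:2.0",      # assumed, not stated in the slides
    "er": "urn:cgi:xmlns:GGIC:EarthResource:1.1",
}


def generate_stylesheet(templates, namespaces):
    """Emit one xsl:template per raw element, wrapping its content in the target element."""
    ns_decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in namespaces.items())
    parts = [f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}" {ns_decls}>']
    for raw_name, target in templates.items():
        parts.append(
            f'  <xsl:template match="{raw_name}">\n'
            f'    <{target}>\n'
            f'      <xsl:apply-templates/>\n'
            f'    </{target}>\n'
            f'  </xsl:template>'
        )
    parts.append("</xsl:stylesheet>")
    return "\n".join(parts)


stylesheet = generate_stylesheet(TEMPLATES, NAMESPACES)
parsed = ET.fromstring(stylesheet)  # raises if the generated XSLT is not well-formed
print(stylesheet)
```

Because the templates are keyed by the same element names that the SQL generator passes to FOR XML PATH, both artefacts can be produced in the same pass over the configuration.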
Start-up vs. Runtime costs
Start-up costs:
• Prefabrication of SQL SELECT statements
• Generation of XSLT stylesheet for each feature type
Runtime costs:
• Conversion of WFS filters into SQL WHERE clauses for prefabricated SQL SELECT statements
• SQL query execution
• Conversion of raw XML obtained from database into GML using pre-generated XSLT stylesheet
Conversion of WFS filters into SQL
Incoming WFS request:
<?xml version="1.0" encoding="UTF-8"?>
<wfs:GetFeature service="WFS" version="1.1.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:er="urn:cgi:xmlns:GGIC:EarthResource:1.1"
    maxFeatures="50">
  <wfs:Query typeName="gsml:MappedFeature" srsName="EPSG:4326">
    <ogc:Filter>
      <ogc:And>
        <ogc:BBOX>
          <ogc:PropertyName>gsml:shape</ogc:PropertyName>
          <gml:Envelope srsName="EPSG:4326">
            <gml:lowerCorner>78.75 -54.5210814954436</gml:lowerCorner>
            <gml:upperCorner>352.08984375 11.350796722383672</gml:upperCorner>
          </gml:Envelope>
        </ogc:BBOX>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>gsml:specification/er:MineralOccurrence/er:commodityDescription/er:Commodity/er:commodityName</ogc:PropertyName>
          <ogc:Literal>urn:cgi:classifier:GA:commodity:Ag</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:And>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>
Generated WHERE clause:
WHERE
  SHAPE.STIntersects(
    geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436, 78.75 11.350796722383672, 352.08984375 11.350796722383672, 352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
  ) = 1
  AND EXISTS(
    SELECT TOP 1 1
    FROM dbo.ER_MINERALOCCURRENCE q1
    INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
    WHERE
      q1.NAME_VALUE = q0.SPECIFICATION_URI
      AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
  )
Notes:
• The BBOX filter is mapped directly to the STIntersects function.
• The “XPath” expression is traced back to identify the source of the underlying property.
• Logical operations are preserved in the generated SQL query.
• The Spatial Reference System is taken into account in the generated SQL query.
• The literal operand is passed directly into the SQL query.
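The conversion described in these notes can be sketched in Python as a recursive walk over the ogc:Filter tree. This is an illustration under stated assumptions, not the deck's implementation: the PROPERTY_SOURCES table stands in for the step that traces a property path back to its source views, only And, Or, BBOX and PropertyIsEqualTo are handled, and the literal is quote-escaped before being embedded (a production version would more likely bind it as a query parameter).

```python
import xml.etree.ElementTree as ET

OGC = "{http://www.opengis.net/ogc}"
GML = "{http://www.opengis.net/gml}"

# Hypothetical resolution table: property path -> geometry column or EXISTS template.
PROPERTY_SOURCES = {
    "gsml:shape": "SHAPE",
    "gsml:specification/er:MineralOccurrence/er:commodityDescription/er:Commodity/er:commodityName": (
        "EXISTS(SELECT TOP 1 1 FROM dbo.ER_MINERALOCCURRENCE q1 "
        "INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE "
        "WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI AND q2.COMMODITYNAME = {literal})"
    ),
}


def sql_literal(text):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def to_sql(node):
    tag = node.tag
    if tag in (OGC + "And", OGC + "Or"):
        op = " AND " if tag == OGC + "And" else " OR "
        return "(" + op.join(to_sql(child) for child in node) + ")"
    if tag == OGC + "BBOX":
        column = PROPERTY_SOURCES[node.findtext(OGC + "PropertyName").strip()]
        env = node.find(GML + "Envelope")
        x1, y1 = env.findtext(GML + "lowerCorner").split()
        x2, y2 = env.findtext(GML + "upperCorner").split()
        for value in (x1, y1, x2, y2):
            float(value)  # reject anything that is not a number
        srid = int(env.get("srsName").split(":")[-1])
        ring = f"{x1} {y1}, {x1} {y2}, {x2} {y2}, {x2} {y1}, {x1} {y1}"
        return (f"{column}.STIntersects(geometry::STGeomFromText("
                f"'POLYGON(({ring}))', {srid})) = 1")
    if tag == OGC + "PropertyIsEqualTo":
        template = PROPERTY_SOURCES[node.findtext(OGC + "PropertyName").strip()]
        return template.format(literal=sql_literal(node.findtext(OGC + "Literal").strip()))
    raise ValueError(f"unsupported filter element: {tag}")


FILTER = """
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml">
  <ogc:And>
    <ogc:BBOX>
      <ogc:PropertyName>gsml:shape</ogc:PropertyName>
      <gml:Envelope srsName="EPSG:4326">
        <gml:lowerCorner>78.75 -54.5210814954436</gml:lowerCorner>
        <gml:upperCorner>352.08984375 11.350796722383672</gml:upperCorner>
      </gml:Envelope>
    </ogc:BBOX>
    <ogc:PropertyIsEqualTo>
      <ogc:PropertyName>gsml:specification/er:MineralOccurrence/er:commodityDescription/er:Commodity/er:commodityName</ogc:PropertyName>
      <ogc:Literal>urn:cgi:classifier:GA:commodity:Ag</ogc:Literal>
    </ogc:PropertyIsEqualTo>
  </ogc:And>
</ogc:Filter>
"""

where = "WHERE " + to_sql(ET.fromstring(FILTER)[0])
print(where)
```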
Final query
SELECT TOP 50
q0.GML_ID,
q0.DESCRIPTION,
q0.GML_NAME,
q0.OBSERVATION_METHOD,
q0.OBSERVATION_METHOD_CODESPACE,
q0.SPECIFICATION_URI,
q0.SHAPE.AsGml() AS 'SHAPE',
'http://www.opengis.net/gml/srs/epsg.xml#' + CONVERT(VARCHAR(10), q0.SHAPE.STSrid) AS 'SHAPE/srsName',
q0.POSITIONAL_ACCURACY,
q0.POSITIONAL_ACCURACY_UOM,
(
-- Get associated mineral occurrences.
SELECT
q1.GML_ID,
q1.SITE_ID,
q1.URI,
q1.OCCURRENCE_URI,
(
SELECT
q2.NAME_VALUE,
q2.NAME_CODESPACE
FROM dbo.ER_MINERALOCCURRENCE q2
WHERE q2.GML_ID = q1.GML_ID
FOR XML PATH('name'), TYPE
) AS __NAME,
q1.DEPTH,
q1.LENGTH,
q1.WIDTH,
q1.TYPE,
(
-- Get er:expression attributes
SELECT
q2.EXPRESSION
FROM dbo.ER_MINOCC_EXPRESSION q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('expression'), TYPE
) AS __EXPRESSION,
(
-- Get er:form attributes
SELECT
q2.FORM
FROM dbo.ER_MINOCC_FORM q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('form'), TYPE
) AS __FORM,
(
SELECT
q2.GML_ID,
q2.SITE_ID,
q2.DIP_NUMERIC,
q2.DIP_STRING,
q2.AZIMUTH_NUMERIC,
q2.AZIMUTH_STRING,
q2.CONVENTION
FROM dbo.ER_MINOCC_PLANAR_ORIENTATION q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('planarOrientation'), TYPE
) AS __PLANAR_ORIENTATION,
(
-- Get er:shape attributes
SELECT
q2.SHAPE
FROM dbo.ER_MINOCC_SHAPE q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('shape'), TYPE
) AS __SHAPE,
(
SELECT
TOP 1 -- Temporary. Assuming that ER_MINERALDEPOSITMODEL shouldn't be denormalised.
q2.GML_ID,
q2.SITE_ID,
(
SELECT DISTINCT
q3.GROUP_VALUE,
q3.GROUP_CODESPACE
FROM dbo.ER_MINERALDEPOSITMODEL q3
WHERE q3.GML_ID = q2.GML_ID
FOR XML PATH('mineralDepositGroup'), TYPE
) AS __GROUP,
(
SELECT DISTINCT
q3.TYPE_VALUE,
q3.TYPE_CODESPACE
FROM dbo.ER_MINERALDEPOSITMODEL q3
WHERE q3.GML_ID = q2.GML_ID
FOR XML PATH('mineralDepositType'), TYPE
) AS __TYPE
FROM dbo.ER_MINERALDEPOSITMODEL q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('classification'), TYPE
) AS __CLASSIFICATION,
(
-- Get associated commodities
SELECT
q2.GML_ID,
(
SELECT
q3.GML_NAME,
q3.GML_NAME_CODESPACE
FROM dbo.ER_COMMODITY q3
WHERE q3.GML_ID = q2.GML_ID
FOR XML PATH('name'), TYPE
) AS __NAME,
q2.COMMODITYNAME,
q2.COMMODITYNAME_CODESPACE,
q2.COMMODITYGROUP,
q2.COMMODITYRANK,
q2.SOURCE_URI
FROM dbo.ER_COMMODITY q2
WHERE q2.SOURCE_URI = q1.NAME_VALUE
AND q2.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
FOR XML PATH('Commodity'), TYPE
) AS __COMMODITY_DESCRIPTION,
(
SELECT
q2.GML_ID,
q2.SITE_URI,
q2.MATERIAL_ROLE,
q2.LITHOLOGY_URI
FROM dbo.ER_EARTHRESOURCEMATERIAL q2
WHERE q2.SITE_URI = q1.URI
FOR XML PATH('composition'), TYPE
) AS __COMPOSITION,
(
SELECT
q2.GML_NAME
FROM dbo.ER_MININGACTIVITY q2
WHERE q2.DEPOSIT_URI = q1.NAME_VALUE
FOR XML PATH('resourceExtraction'), TYPE
) AS __RESOURCE_EXTRACTION,
(
SELECT
q2.GML_ID,
q2.SITE_ID,
q2.ESTIMATE_ID,
q2.CALCULATION_METHOD,
q2.DATE,
q2.SOURCE_REFERENCE,
q2.ORE,
q2.CATEGORY,
(
-- Get er:measureDetails
SELECT
q3.GML_ID,
q3.ESTIMATE_ID,
q3.COMMODITY_AMOUNT,
q3.COMMODITY_AMOUNT_UOM,
q3.CUTOFF_GRADE,
q3.CUTOFF_GRADE_UOM,
q3.GRADE,
q3.GRADE_UOM,
q3.COMMODITY_OF_INTEREST,
(
-- Get associated commodities
SELECT
q4.GML_ID,
q4.GML_NAME,
q4.GML_NAME_CODESPACE,
q4.COMMODITYNAME,
q4.COMMODITYNAME_CODESPACE,
q4.COMMODITYGROUP,
q4.COMMODITYRANK,
q4.SOURCE_URI
FROM dbo.ER_COMMODITY q4
WHERE q4.GML_NAME = q3.COMMODITY_OF_INTEREST
AND q4.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
FOR XML PATH('Commodity'), TYPE
) AS __COMMODITY_OF_INTEREST
FROM dbo.ER_MINOCC_OREAMOUNT_MEASUREDETAILS q3
WHERE q3.ESTIMATE_ID = q2.ESTIMATE_ID
FOR XML PATH('CommodityMeasure'), TYPE
) AS __MEASURE_DETAILS
FROM dbo.ER_MINOCC_RESOURCE q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('Resource'), TYPE
) AS __ORE_AMOUNT,
(
SELECT
q2.GML_ID,
q2.SITE_ID,
q2.ESTIMATE_ID,
q2.CALCULATION_METHOD,
q2.DATE,
q2.SOURCE_REFERENCE,
q2.ORE,
q2.CATEGORY,
(
-- Get er:measureDetails
SELECT
q3.GML_ID,
q3.ESTIMATE_ID,
q3.COMMODITY_AMOUNT,
q3.COMMODITY_AMOUNT_UOM,
q3.CUTOFF_GRADE,
q3.CUTOFF_GRADE_UOM,
q3.GRADE,
q3.GRADE_UOM,
q3.COMMODITY_OF_INTEREST,
(
-- Get associated commodities
SELECT
q4.GML_ID,
q4.GML_NAME,
q4.GML_NAME_CODESPACE,
q4.COMMODITYNAME,
q4.COMMODITYNAME_CODESPACE,
q4.COMMODITYGROUP,
q4.COMMODITYRANK,
q4.SOURCE_URI
FROM dbo.ER_COMMODITY q4
WHERE q4.GML_NAME = q3.COMMODITY_OF_INTEREST
AND q4.GML_NAME_CODESPACE = 'http://www.ietf.org/rfc/rfc2616' -- Temporary. Assuming that ER_COMMODITY shouldn't be denormalised.
FOR XML PATH('Commodity'), TYPE
) AS __COMMODITY_OF_INTEREST
FROM dbo.ER_MINOCC_OREAMOUNT_MEASUREDETAILS q3
WHERE q3.ESTIMATE_ID = q2.ESTIMATE_ID
FOR XML PATH('CommodityMeasure'), TYPE
) AS __MEASURE_DETAILS
FROM dbo.ER_MINOCC_RESERVE q2
WHERE q2.SITE_ID = q1.SITE_ID
FOR XML PATH('Reserve'), TYPE
) AS __ORE_AMOUNT
FROM dbo.ER_MINERALOCCURRENCE q1
WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI
FOR XML PATH('MineralOccurrence'), TYPE
) AS __SPECIFICATION
FROM dbo.GSML_MAPPEDFEATURE q0
WHERE
SHAPE.STIntersects(
geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436, 78.75 11.350796722383672, 352.08984375 11.350796722383672, 352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
) = 1
AND EXISTS(
SELECT TOP 1 1
FROM dbo.ER_MINERALOCCURRENCE q1
INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
WHERE
q1.NAME_VALUE = q0.SPECIFICATION_URI
AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
)
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');
The WHERE clause is the only part that changes according to the WFS request.
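Since only the WHERE clause varies, the runtime assembly step can be very small. The Python sketch below keeps the prefabricated SELECT body and the FOR XML tail fixed and splices in a per-request filter. The class and method names are hypothetical, the SELECT body is abbreviated, and deriving the TOP value from the request's maxFeatures attribute is an assumption.

```python
class PrefabricatedQuery:
    """Holds the parts of the SELECT statement that are fixed at start-up."""

    def __init__(self, select_body, row_element, root_element="RawFeatureCollection"):
        self.select_body = select_body  # column list, subqueries and FROM clause
        self.tail = f"FOR XML PATH('{row_element}'), ROOT('{root_element}');"

    def with_filter(self, where_clause=None, max_features=None):
        """Build the final statement for one WFS request."""
        head = "SELECT" if max_features is None else f"SELECT TOP {int(max_features)}"
        parts = [head, self.select_body]
        if where_clause:
            parts.append("WHERE " + where_clause)
        parts.append(self.tail)
        return "\n".join(parts)


# Abbreviated body; the real one is the 250+ line query shown above.
MAPPED_FEATURE = PrefabricatedQuery(
    select_body="  q0.GML_ID,\n  q0.DESCRIPTION\nFROM dbo.GSML_MAPPEDFEATURE q0",
    row_element="MappedFeature",
)

sql = MAPPED_FEATURE.with_filter(
    "EXISTS(SELECT TOP 1 1 FROM dbo.ER_MINERALOCCURRENCE q1 "
    "WHERE q1.NAME_VALUE = q0.SPECIFICATION_URI)",
    max_features=50,
)
print(sql)
```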
Improvement options
• Normalization of underlying views
  • Will reduce the size of views by a factor of 2-3
  • Will simplify the SQL query
• Database views may be tailored for this specific solution
  • Proof-of-concept solution has been implemented on the existing GSWA database without any structural changes
• Proper use of spatial indices
Query execution and post-processing
Constructed SQL query → DBMS → Raw XML → XSLT transformation (using the XSLT stylesheet) → GML
Benchmarking: Data retrieval
[Figure: Raw XML size. File size (bytes) vs. number of features (3 to 1,536); y-axis from 0 to 14,000,000.]
[Figure: Data retrieval and serialization. Data retrieval time (ms) vs. number of features (3 to 1,536); y-axis from 0 to 18,000.]
Benchmarking: XSLT execution time
[Figure: XSLT execution time (ms) vs. number of features (3 to 1,536) for XSLTC 4.0.30319 and Saxon 9.1.0.2; y-axis from 0 to 45,000.]
Benchmarking: XSLT processing (XSLTC vs. Saxon)
[Figure: Total XSLT processing time with XSLTC 4.0.30319. Execution time and XSLT compilation time (ms) vs. number of features (3 to 1,536); y-axis from 0 to 4,000.]
[Figure: Total XSLT processing time with Saxon 9.1.0.2. Execution time and XSLT compilation time (ms) vs. number of features (3 to 1,536); y-axis from 0 to 45,000.]
[Figure: Total processing time with XSLTC 4.0.30319. Execution time, XSLT compilation time and data retrieval (ms) vs. number of features (3 to 1,536); y-axis from 0 to 25,000.]
[Figure: Total processing time with Saxon 9.1.0.2. Execution time, XSLT compilation time and data retrieval (ms) vs. number of features (3 to 1,536); y-axis from 0 to 60,000.]
XSLTC is approximately 10 times faster than Saxon.
Comparison with existing Web Feature Service
[Figure: Processing time (ms, logarithmic scale from 10 to 10,000,000) vs. number of features (3 to 1,536) for XSLTC 4.0.30319, Saxon 9.1.0.2, the existing solution (10/3/2011) and the existing solution (9/3/2011).]
The existing solution is on average 61 times slower than the proposed solution with the Saxon processing pipeline and 269 times slower than the proposed solution with the XSLTC processing pipeline.
It often times out after an hour of execution when “too many” features are requested.
Benefits
• No complex processing logic behind the scenes
  • ORM is not a problem in this solution
  • No Hibernate
  • No intermediate mapping to business objects
• Implements ‘Single SQL query per WFS request’ approach
  • All data retrieval logic is performed at the DBMS level, which opens up opportunities for further SQL query and execution improvements
• No XSLT knowledge required from the user
  • Unlike deegree or GIN Mediator
  • Internal XSLT processing is only natural when it comes to XML processing
  • Highly optimised XSLT processors are readily available
• Feature chaining / nesting is transparent
<gsml:MappedFeature gml:id="gsml.mappedfeature.S0000263">
  <gml:description>KAMBALDA</gml:description>
  <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/mappedfeature/S0000263</gml:name>
  <gsml:observationMethod>
    <gsml:CGI_TermValue>
      <gsml:value codeSpace="http://www.dmp.wa.gov.au/371.aspx">Aerial Photograph</gsml:value>
    </gsml:CGI_TermValue>
  </gsml:observationMethod>
  <gsml:positionalAccuracy>
    <gsml:CGI_NumericValue>
      <gsml:principalValue uom="urn:ogc:def:nil:OGC::unknown">10.000</gsml:principalValue>
    </gsml:CGI_NumericValue>
  </gsml:positionalAccuracy>
  <gsml:samplingFrame xlink:href="http://resource.geosciml.org/uri-cgi/earthnaturalsurface" xlink:title="Earth's Natural Surface"/>
  <gsml:specification>
    <er:MineralOccurrence gml:id="er.mineraloccurrence.S0000263">
      <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263</gml:name>
      <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">PARIS</gml:name>
      <gsml:observationMethod>
        <gsml:CGI_TermValue>
          <gsml:value codeSpace="http://urn.opengis.net/">urn:ogc:def:nil:OGC::missing</gsml:value>
        </gsml:CGI_TermValue>
      </gsml:observationMethod>
      <gsml:purpose>instance</gsml:purpose>
      <gsml:occurrence xlink:href="http://services.auscope.org/resource/feature/gswa/mappedfeature/S0000263"/>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.121">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/121</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Gold (Au)</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Precious metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Au</er:commodityName>
          <er:commodityRank>1</er:commodityRank>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.264">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/264</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Silver</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Precious metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Ag</er:commodityName>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:commodityDescription>
        <er:Commodity gml:id="er.commodity.S0000263.88">
          <gml:name codeSpace="http://www.ietf.org/rfc/rfc2616">http://services.auscope.org/resource/feature/gswa/commodity/S0000263/88</gml:name>
          <gml:name codeSpace="http://www.dmp.wa.gov.au/371.aspx">Copper (Cu)</gml:name>
          <er:commodityGroup codeSpace="http://www.dmp.wa.gov.au/371.aspx">Base metal</er:commodityGroup>
          <er:commodityName codeSpace="urn:cgi:classifierScheme:GA:commodity">urn:cgi:classifier:GA:commodity:Cu</er:commodityName>
          <er:source xlink:href="http://services.auscope.org/resource/feature/gswa/mineraloccurrence/S0000263"/>
        </er:Commodity>
      </er:commodityDescription>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/183"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/184"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/185"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/186"/>
      <er:resourceExtraction xlink:href="http://services.auscope.org/resource/feature/gswa/miningactivity/S0000263/187"/>
      <er:type>mineral deposit</er:type>
    </er:MineralOccurrence>
  </gsml:specification>
  <gsml:shape>
    <Point srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns="http://www.opengis.net/gml">
      <pos>121.975881 -31.588763</pos>
    </Point>
  </gsml:shape>
</gsml:MappedFeature>
GML Caching
Database as persistent layer for XML Cache
Every record that represents a feature in the database is also associated with its GML representation in an XML field, which is readily available for retrieval.
Simplification of SQL query
(The WHERE clause is the same as before.)
SELECT TOP 50
q0.XML_CACHE
FROM dbo.GSML_MAPPEDFEATURE q0
WHERE
SHAPE.STIntersects(
geometry::STGeomFromText('POLYGON((78.75 -54.5210814954436,
78.75 11.350796722383672, 352.08984375 11.350796722383672,
352.08984375 -54.5210814954436, 78.75 -54.5210814954436))', 4326)
) = 1
AND EXISTS(
SELECT TOP 1 1
FROM dbo.ER_MINERALOCCURRENCE q1
INNER JOIN dbo.ER_COMMODITY q2 ON q2.SOURCE_URI = q1.NAME_VALUE
WHERE
q1.NAME_VALUE = q0.SPECIFICATION_URI
AND q2.COMMODITYNAME = 'urn:cgi:classifier:GA:commodity:Ag'
)
FOR XML PATH('MappedFeature'), ROOT('RawFeatureCollection');
Notes on the query above:
• The same mechanism (TOP) may be used to limit the number of returned features. Support for paging.
• XML_CACHE is the only field you need to retrieve. No formidable subqueries.
Cached vs. Non-cached data retrieval
[Figure: Execution time (ms) vs. number of features (3 to 1,536) for non-cached and cached retrieval; y-axis from 0 to 18,000.]
Retrieval from XML Cache is on average 3.6 times faster.
Total processing time: Retrieval from Source vs. XML Cache
[Figure: Total execution time (ms) vs. number of features (3 to 1,536) for non-cached and cached data retrieval, each with XSLTC processing and with Saxon processing; y-axis from 0 to 60,000.]
Total processing with data from the XML Cache is on average 2.1 times faster with the Saxon processing pipeline and 3.5 times faster with the XSLTC processing pipeline.
The bottom line
While people are struggling for a 5% performance improvement…
…this solution reduces processing time by 98.36% (61 times faster) even without the XML Cache, and by 99.06% (106 times faster) with the XML Cache*, when using the Saxon processing pipeline
…or by 99.63% (269 times faster) and 99.88% (822 times faster) respectively when using the XSLTC processing pipeline.
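The percentages follow from the speed-up factors: being N times faster means the work takes 1/N of the original time, which is a reduction of (1 - 1/N) × 100 percent. A short Python check of the four quoted figures (the function name is illustrative):

```python
def time_reduction_percent(speedup):
    """Convert an 'N times faster' factor into a percentage reduction in time."""
    return round((1 - 1 / speedup) * 100, 2)


figures = {
    "Saxon, no cache": 61,
    "Saxon, XML Cache": 106,
    "XSLTC, no cache": 269,
    "XSLTC, XML Cache": 822,
}
reductions = {label: time_reduction_percent(factor) for label, factor in figures.items()}
for label, value in reductions.items():
    print(f"{label}: {value}%")
```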
Disclaimer
This is an experimental solution and is not guaranteed to be bug-free and/or fool-proof. Some aspects have not been investigated thoroughly and require additional in-depth research. However, the proposed approach has been tested on an existing production-grade dataset and has proved to be feasible. All intermediate SQL queries and XSLT stylesheets were hand-crafted according to the proposed approach. The SQL and XSLT stylesheet generation process needs to be automated in the final product.
Although the proposed approach has been tested on the most sophisticated feature type we had implemented in the existing GeoServer environment, it may need more extensive testing with other information models and use cases.
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
Thank you
CSIRO Earth Science and Resource Engineering
Pavel Golodoniuc
Computer scientist
Phone: +61 8 6436 8776
Email: [email protected]
Web: www.csiro.au/cesre