Cleansing land ownership data, an FME use case - David Eagle
Uploaded by: Association for Geographic Information (AGI)
Cleansing land ownership data, an FME use case
David Eagle
Principal [email protected]
@david_eagle
Agenda
• 1Spatial
• Asset management, the case for good data
• The data challenge
• Technical solution
– Regex and Lists
• Benefits
• Founded in 1969
– Part of the Cambridge Tech Cluster
• Headquarters in Cambridge, UK
– International offices in Australia, Ireland, Belgium & France
• A group of innovative, market leading technology companies:
Our Customers
• A specialist provider to National Mapping and Charting Agencies, Government, Defence and Utilities
Our Partners
Customer Case Study
• Fisher German
– Multi-discipline firm of Chartered Surveyors, Town Planners, Property Consultants & Specialist Engineers
– Management of:
• 4000km of high pressure oil pipeline
• 2500km fibre network
– Creators of:
• www.linesearchbeforeudig.co.uk, a free to use enquiry tool used by BT, HA, Utilities, Local Gov’t etc
• >45 members with protected assets such as:
Linear Asset Management
• Key role is management and protection of buried and overhead assets:
– High pressure oil and gas pipelines
– Fibre optics
– Overhead power lines
• Need to ensure access to assets for inspection, maintenance, upgrade and safety.
• Document, maintain and manage details of land ownership in the vicinity of assets.
Why is Linear Asset Management Important?
Hunton Hill – Birmingham
Shop - New gas supply connection
25mm PE connection to a 150mm cast iron main
1hr job!
Found 300mm steel pipe
Drilled anyway
3hrs later…
A close call…
5mm wall
0.5mm left
Petrol pressure 100 Bar (1400psi)
Gas main is 100psi
Cut-out showing carrier pipe and epoxy shell repair
Cross section highlighting carrier pipe and epoxy shell repair
The importance of accurate data
• Ownership rights – Gas pipe and pond in Dorset
• Incorrect grantor was on the mailing list
• Land Registry data saves the day
The systems
Before
• Asset management system – UDB
• Desktop GIS – Spatial data managed and edited
– No synchronisation and some duplication
After
• Database extended to support ‘spatial’
• Single data source served to UDB and desktop
• Addition of web client for view only
• Data editing via WFS-T
Mitigating the risk
• New project = New desk exercise
• Data is purchased from the Land Registry
• Known ownership along alignment is collated
• Site visits enhance ownership details
– Access points
– Difficult access
– Tenants
– Where is asset exactly?
– Dogs!
Data to feed the systems
• At the start of a project it’s necessary to collate a number of datasets
• Project inputs:
1. Existing asset data and records
2. Route Corridor
3. Land Registry Shape and CSV
4. On site inspection data
5. Constraints mapping – Environmental Stewardship, Commonland Register
6. Other External Datasets
The process
• Manual QA and formatting steps:
1. Processing of the CSVs into the required schema
2. Merge with the cleaned and aggregated geospatial data
3. Import into online management systems
• The manual process could take several days and involve 2 or 3 people
– Each project can have over 10,500 title deeds & 7,000 grantors
• 300 grantors = 2 days of manual effort
Land Registry - Attributes
• Fundamental but presents some challenges
• The deed address details are supplied in a CSV
– Title Number – Title reference number
– Tenure – Freehold etc
– Proprietor – Full name and address
– Address – Description of position of address/land
• Extra fee to get a ‘slightly’ better structure
• It still requires significant manual effort to format
Land Registry - Geometry
• All geometry (each title polygon) is held in an ESRI Shape file
• Many polygons are split into a number of pieces
• The Land Registry holds and exports the data tiled
• Features are not aggregated on export
• The geometry first needs joining to the attributes via the PK (primary key)
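The aggregation and join described on this slide can be sketched in plain Python. This is an illustration only, not the FME workspace used in the project: the sample records, the field names and the choice of the title number as the join key are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: polygon pieces exported tile by tile, so one
# title can appear several times. Keys and coordinates are made up.
pieces = [
    {"title_number": "T1", "ring": [(0, 0), (1, 0), (1, 1), (0, 1)]},
    {"title_number": "T1", "ring": [(1, 0), (2, 0), (2, 1), (1, 1)]},
    {"title_number": "T2", "ring": [(5, 5), (6, 5), (6, 6), (5, 6)]},
]

# Hypothetical attribute rows, as if read from the CSV and keyed by title number.
attributes = {
    "T1": {"tenure": "Freehold", "proprietor": "JOHN EDMUND SMITH"},
    "T2": {"tenure": "Freehold", "proprietor": "SOUTH EASTERN POWER NETWORKS PLC"},
}


def aggregate_and_join(pieces, attributes):
    """Group polygon pieces by key, then attach the attribute row for that key."""
    grouped = defaultdict(list)
    for piece in pieces:
        grouped[piece["title_number"]].append(piece["ring"])

    joined = []
    for key, rings in grouped.items():
        record = {"title_number": key, "rings": rings}
        # Titles with no attribute row keep their geometry only.
        record.update(attributes.get(key, {}))
        joined.append(record)
    return joined


titles = aggregate_and_join(pieces, attributes)
```

Each title ends up as one record holding all of its pieces plus its attributes, which is the shape the later cleaning steps need.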
What is FME?
• Industry standard translation and transformation software
• Supports >300 formats
• Allows manipulation of many data types:
The case for FME
• FME is often bought for a specific task.
• The value comes when it’s used for tasks not previously considered
– Fisher German’s initial impetus was loading their database
• They turned to FME to clean and conflate their data later
• Building a case for FME wasn’t necessary
– Re-use the flexible technology and get a better ROI
Automate and re-use
• Automate out the mundane with FME
• Avoid hours of Excel copy/paste
• Allow staff to focus on the analysis
• First task, process 6 linear asset project files
• 24,000 Land Registry records processed in 30 seconds with FME
• Previously this would have taken >6 days.
• Subsequent steps clean up the geometry and merge the attributes – but this is a classic FME task!
Automate and re-use
• Lots of Testers/TestFilters
• Popular Transformers: http://goo.gl/4rOGf
• Adopt an “if, then, else” approach.
• FME 2013 SP1 is more capable with ‘Conditional Mapping’
• http://evangelism.safe.com/fmeevangelist113/
• The success of the process relies on two capabilities.
1. Lists
2. Regex
Lists
• A list is a method by which FME permits a single attribute to hold multiple values
Polygon contains 12 trees
tree.Species{0} oak
tree.Species{1} ash
tree.Species{2} birch
tree.Species{3} oak
tree.Species{4} birch
tree.Species{5} birch
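For readers more used to scripting, the list idea maps roughly onto an ordinary Python list. The sketch below expands such a list into the indexed names shown on the slide; the helper name `as_indexed_attributes` is made up for this illustration and is not part of FME.

```python
# Species values are the ones shown on the slide.
species = ["oak", "ash", "birch", "oak", "birch", "birch"]


def as_indexed_attributes(list_name, values):
    """Expand a Python list into indexed names such as tree.Species{0}."""
    return {f"{list_name}{{{i}}}": value for i, value in enumerate(values)}


feature = as_indexed_attributes("tree.Species", species)
```

Looking up `feature["tree.Species{2}"]` then returns the third value of the list, in the same way a single list attribute holds several values on one feature.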
Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’
“ SOUTH EASTERN POWER NETWORKS PLC Newington House, 99 Southwark Bridge Street, London SN1 1AB ”
• Tester – Pass: If Proprietor begins with <space>
• AttributeSetter: It’s a Commercial business
• AttributeSplitter: Split on 2 <spaces> and trim whitespace
– proprietor.Proprietor{0} SOUTH EASTERN POWER NETWORKS PLC
– proprietor.Proprietor{1} Newington House, 99 Southwark Bridge Street, London SN1 1AB
• AttributeRenamer:
– Name = SOUTH EASTERN POWER NETWORKS PLC
– Address = Newington House, 99 Southwark Bridge Street, London SN1 1AB
Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’
“JOHN EDMUND SMITH Big Farm, Preston, Canterbury, Kent ” *
• Tester – Fail: (Proprietor did NOT begin with <space>)
• AttributeSetter: It’s a Residential property
• AttributeSplitter: Split on 4 <spaces> and trim whitespace
– proprietor.Proprietor{0} JOHN EDMUND SMITH
– proprietor.Proprietor{1} Big Farm, Preston, Canterbury, Kent
• AttributeRenamer:
– Name = JOHN EDMUND SMITH
– Address = Big Farm, Preston, Canterbury, Kent
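The two branches of Challenge 1 can be sketched as one small Python function. It mirrors the rules stated on the slides (leading space means commercial and a 2-space separator, otherwise residential and a 4-space separator). The function name and return shape are assumptions for illustration, and the sample strings restore the runs of spaces that the rules imply.

```python
def split_proprietor(proprietor):
    """Split a raw Proprietor string into (kind, name, address).

    Assumed rules, taken from the two slides: a leading space marks a
    commercial owner whose name and address are separated by 2 spaces;
    otherwise the owner is residential and the separator is 4 spaces.
    """
    if proprietor.startswith(" "):
        kind, separator = "Commercial", "  "
    else:
        kind, separator = "Residential", "    "

    # Split once on the separator, then trim whitespace from each part.
    parts = [part.strip() for part in proprietor.strip().split(separator, 1)]
    name = parts[0]
    address = parts[1] if len(parts) > 1 else ""
    return kind, name, address


commercial = split_proprietor(
    " SOUTH EASTERN POWER NETWORKS PLC  Newington House, "
    "99 Southwark Bridge Street, London SN1 1AB "
)
residential = split_proprietor(
    "JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent "
)
```

A record whose separator is missing comes back with an empty address rather than raising an error, so it can be routed to manual review.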
Challenge 2: Split the Address into appropriate parts
“Newington House, 99 Southwark Bridge Street, London SN1 1AB”
• AttributeSplitter: Split on , and trim whitespace
– proprietor.Address{0} Newington House
– proprietor.Address{1} 99 Southwark Bridge Street
– proprietor.Address{2} London SN1 1AB
• ListElementCounter = 3
• AttributeRenamer:
– Address1 = Newington House
– Address2 = 99 Southwark Bridge Street
– Town = London SN1 1AB
• Depending on data, 3 elements may or may not include a postcode!?
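A minimal Python sketch of Challenge 2 follows. It covers only the three-element case shown on the slide; how other element counts should map to columns is not stated in the source, so those are returned unmapped. The function and key names are assumptions for illustration.

```python
def split_address(address):
    """Split on commas, trim whitespace, and map three elements to columns."""
    elements = [element.strip() for element in address.split(",")]
    count = len(elements)  # plays the role of ListElementCounter
    record = {"element_count": count}
    if count == 3:
        record["Address1"], record["Address2"], record["Town"] = elements
    else:
        # Other counts need their own mapping; keep the raw elements for now.
        record["unmapped"] = elements
    return record


three_parts = split_address("Newington House, 99 Southwark Bridge Street, London SN1 1AB")
four_parts = split_address("Big Farm, Preston, Canterbury, Kent")
```

Note that `Town` still carries the postcode here ("London SN1 1AB"), which is why a further step is needed to separate the two.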
Regex
• Regular Expressions are a language used for:
– Pattern matching
– String searching
– String parsing
– String replacement
/colou?r/ matches “FME is colourful!” and “FME is colorful!” (? = optional char.)
/FME/ matches “We love FME 2013!” and “FME is great!”
/^FME/ matches “FME is great!” but not “We love FME 2013!” (^ = at start, $ = at end)
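The three patterns on this slide can be tried out with Python's standard `re` module. This is a small sketch using the slide's own sample strings; the variable names are only illustrative.

```python
import re

samples = [
    "FME is colourful!",
    "FME is colorful!",
    "We love FME 2013!",
    "FME is great!",
]

# "u?" makes the letter u optional, so both spellings match.
optional_u = [s for s in samples if re.search(r"colou?r", s)]

# A plain pattern matches anywhere in the string.
contains_fme = [s for s in samples if re.search(r"FME", s)]

# "^" anchors the pattern to the start of the string.
starts_with_fme = [s for s in samples if re.search(r"^FME", s)]
```

Only "We love FME 2013!" drops out of `starts_with_fme`, because FME is not at the start of that string.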
Challenge 3: Spot the Postcode
• Regex = pattern matching and string manipulation
• http://rubular.com/ - Helps you test!
String: AGI NORTH
Regex: ([A-Z]*)[ ]([A-Z]*)
String: London SN1 1AB
Regular Expression: ^(.*\S)\s+(\S{2,4}\s\S{3})\s*$
• Use StringSearcher = Matched output port provides…
– _matched_parts{0} London
– _matched_parts{1} SN1 1AB
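The postcode expression from the slide behaves the same way in Python's `re` module, which makes it easy to experiment with outside FME. The wrapper function below is an assumption for illustration; only the pattern itself comes from the slide.

```python
import re

# Pattern copied from the slide: text, whitespace, then a postcode-shaped tail
# (2 to 4 non-space characters, one whitespace character, 3 non-space characters).
POSTCODE_TAIL = re.compile(r"^(.*\S)\s+(\S{2,4}\s\S{3})\s*$")


def split_town_postcode(text):
    """Return (town, postcode), or (text, None) when no postcode-shaped tail is found."""
    match = POSTCODE_TAIL.match(text)
    if match is None:
        return text.strip(), None
    return match.group(1), match.group(2)


with_postcode = split_town_postcode("London SN1 1AB")
without_postcode = split_town_postcode("Kent")
```

The pattern checks only the shape of the tail, not whether it is a real postcode, so a value such as "Isle of Man" would also be split (into "Isle" and "of Man"). Results are worth reviewing before they are loaded.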
There were lots more challenges on a similar theme…
Other tasks: Structure and Schema
• Remove duplicate records
• Apply common format to names e.g. A A Smith to A.A. Smith
• Resolve addresses listed twice in the same string
– Common where 2 partners live at same address
– “2, High Street, Leicester 2 High Street Leicester”
• Apply Title Case to names & tidy up use of hyphens
• Add extra columns and fixed values for target schema
• Split first names and last name into 2 columns – more Regex!
• Validate the County names against a list of allowed Counties & resolve abbreviations - AttributeValueMapper
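Three of the tasks above (duplicate removal, the initials format, and county validation) can be sketched in Python as follows. The function names and the small county lookup table are hypothetical and exist only for this example; in the project these steps were FME transformers such as the AttributeValueMapper.

```python
def drop_duplicates(rows):
    """Remove repeated records while keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def format_initials(name):
    """Turn leading single-letter tokens into dotted initials: A A Smith -> A.A. Smith."""
    tokens = name.split()
    initials = []
    while tokens and len(tokens[0]) == 1 and tokens[0].isalpha():
        initials.append(tokens.pop(0).upper() + ".")
    return f"{''.join(initials)} {' '.join(tokens)}".strip()


# Hypothetical lookup of allowed counties and abbreviations (lower-case keys).
COUNTY_LOOKUP = {
    "kent": "Kent",
    "leics": "Leicestershire",
    "leicestershire": "Leicestershire",
}


def normalise_county(value):
    """Return the allowed county name, or None when the value is not recognised."""
    return COUNTY_LOOKUP.get(value.strip().rstrip(".").lower())


rows = [
    {"Name": "A A Smith", "County": "Leics."},
    {"Name": "A A Smith", "County": "Leics."},
    {"Name": "John Smith", "County": "Kent"},
]
cleaned = [
    {"Name": format_initials(r["Name"]), "County": normalise_county(r["County"])}
    for r in drop_duplicates(rows)
]
```

Returning `None` for an unrecognised county keeps the validation failure visible so it can be resolved by hand rather than silently passed through.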
Summary
• Saves time
– Before: >1 day of data prep per project
– After: Using FME, a few seconds to do 80% of the work
• Saves money
– No extra fee to the Land Registry to restructure the data
– No unnecessary staff time on mundane formatting tasks
• Increased ROI
– Fisher German already had FME
• Just consider what else you could adapt FME to do…