Cleansing land ownership data, an FME use case - David Eagle

David Eagle, Principal Consultant, david.eagle@1spatial.com, @david_eagle


Page 1: Cleansing land ownership data, an FME use case - David Eagle

Cleansing land ownership data, an FME use case

David Eagle
Principal Consultant
david.eagle@1spatial.com @david_eagle

Page 2: Cleansing land ownership data, an FME use case - David Eagle

Agenda

• 1Spatial
• Asset management, the case for good data
• The data challenge
• Technical solution
– Regex and Lists
• Benefits

Page 3: Cleansing land ownership data, an FME use case - David Eagle

• Founded in 1969
– Part of the Cambridge Tech Cluster
• Headquarters in Cambridge, UK
– International offices in Australia, Ireland, Belgium & France

Page 4: Cleansing land ownership data, an FME use case - David Eagle

• A group of innovative, market leading technology companies:

Page 5: Cleansing land ownership data, an FME use case - David Eagle

Our Customers

• A specialist provider to National Mapping and Charting Agencies, Government, Defence and Utilities

Page 6: Cleansing land ownership data, an FME use case - David Eagle

Our Partners

Page 7: Cleansing land ownership data, an FME use case - David Eagle

Customer Case Study

• Fisher German
– Multi-discipline firm of Chartered Surveyors, Town Planners, Property Consultants & Specialist Engineers
– Management of:
  • 4000km of high pressure oil pipeline
  • 2500km fibre network
– Creators of:
  • www.linesearchbeforeudig.co.uk, a free to use enquiry tool used by BT, HA, Utilities, Local Gov’t etc
  • >45 members with protected assets such as:

Page 8: Cleansing land ownership data, an FME use case - David Eagle

Linear Asset Management

• Key role is management and protection of buried and overhead assets:
– High pressure oil and gas pipelines
– Fibre optics
– Overhead power lines

• Need to ensure access to assets for inspection, maintenance, upgrade and safety.

• Document, maintain and manage details of land ownership in the vicinity of assets.

Page 9: Cleansing land ownership data, an FME use case - David Eagle

Why is Linear Asset Management Important?

Hunton Hill – Birmingham

Shop - New gas supply connection

25mm PE connection to a 150mm cast iron main

1hr job!

Found 300mm steel pipe

Drilled anyway

3hrs later…

Page 10: Cleansing land ownership data, an FME use case - David Eagle
Page 11: Cleansing land ownership data, an FME use case - David Eagle

A close call…

5mm wall

0.5mm left

Petrol pressure 100 Bar (1400psi)

Gas main is 100psi

Cut-out showing carrier pipe and epoxy shell repair

Cross section highlighting carrier pipe and epoxy shell repair

Page 12: Cleansing land ownership data, an FME use case - David Eagle

The importance of accurate data

• Ownership rights – Gas pipe and pond in Dorset
• Incorrect grantor was on the mailing list
• Land Registry data saves the day

Page 13: Cleansing land ownership data, an FME use case - David Eagle

The systems

Before
• Asset management system – UDB
• Desktop GIS – Spatial data managed and edited
– No synchronisation and some duplication

After
• Database extended to support ‘spatial’
• Single data source served to UDB and desktop
• Addition of web client for view only
• Data editing via WFS-T

Page 14: Cleansing land ownership data, an FME use case - David Eagle

Mitigating the risk

• New project = New desk exercise
• Data is purchased from the Land Registry
• Known ownership along alignment is collated
• Site visits enhance ownership details
– Access points
– Difficult access
– Tenants
– Where is asset exactly?
– Dogs!

Page 15: Cleansing land ownership data, an FME use case - David Eagle

Data to feed the systems

• At the start of a project it’s necessary to collate a number of datasets

• Project inputs:

1. Existing asset data and records

2. Route Corridor

3. Land Registry Shape and CSV

4. On site inspection data

5. Constraints mapping – Environmental Stewardship, Commonland Register

6. Other External Datasets

Page 16: Cleansing land ownership data, an FME use case - David Eagle

The process

• Manual QA and formatting steps:

1. Processing of the CSVs into the required schema

2. Merge with the cleaned and aggregated geospatial data

3. Import into online management systems

• The manual process could take several days and involve 2 or 3 people
– Each project can have over 10,500 title deeds & 7,000 grantors

• 300 grantors = 2 days of manual effort

Page 17: Cleansing land ownership data, an FME use case - David Eagle

Land Registry - Attributes

• Fundamental but presents some challenges
• The deed address details are supplied in a CSV
– Title Number – Title reference number
– Tenure – Freehold etc
– Proprietor – Full name and address
– Address – Description of position of address/land
• Extra fee to get a ‘slightly’ better structure
• It still requires significant manual effort to format
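
As a rough illustration of the attribute structure described on this slide, the sketch below reads such a CSV in Python rather than FME. The four column headings are taken from the slide; the exact header spelling in a real Land Registry export, the title number K123456 and the land description are assumptions made up for the example.

```python
import csv
import io

# Illustrative sample only: column names follow the slide, the row is made up.
SAMPLE_CSV = (
    "Title Number,Tenure,Proprietor,Address\n"
    'K123456,Freehold,"JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent ",'
    '"Land lying to the north of Big Farm"\n'
)

def read_titles(text):
    """Return one dict per title deed, keyed by the CSV column names."""
    return list(csv.DictReader(io.StringIO(text)))

titles = read_titles(SAMPLE_CSV)
for row in titles:
    # Proprietor still holds name and address in one unstructured string.
    print(row["Title Number"], "|", row["Tenure"], "|", repr(row["Proprietor"]))
```

The Proprietor value arrives as a single string, which is why the later challenges concentrate on splitting it.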

Page 18: Cleansing land ownership data, an FME use case - David Eagle

Land Registry - Geometry

• All geometry (each title polygon) is held in an ESRI Shapefile
• Many polygons are split into a number of pieces
• The Land Registry holds and exports the data tiled
• Features are not aggregated on export
• The geometry first needs joining to the attributes using the PK
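
The aggregation and join steps can be sketched outside FME as follows. This is a minimal Python illustration using plain dictionaries, assuming the title number is the PK shared by the Shapefile pieces and the CSV rows. The field names, title numbers and coordinates are made up, and a real workflow would rely on FME or a geometry library instead.

```python
from collections import defaultdict

# Hypothetical polygon pieces as exported tile by tile: (title number, ring).
pieces = [
    ("K123456", [(0, 0), (10, 0), (10, 5), (0, 5), (0, 0)]),
    ("K123456", [(10, 0), (20, 0), (20, 5), (10, 5), (10, 0)]),
    ("K654321", [(0, 10), (5, 10), (5, 15), (0, 15), (0, 10)]),
]

# Hypothetical attribute rows from the CSV, keyed by the same title number.
attributes = {
    "K123456": {"Tenure": "Freehold"},
    "K654321": {"Tenure": "Leasehold"},
}

def aggregate_and_join(pieces, attributes):
    """Group pieces per title, then attach the attributes via the key."""
    grouped = defaultdict(list)
    for title_number, ring in pieces:
        grouped[title_number].append(ring)
    features = []
    for title_number, rings in grouped.items():
        feature = {"Title Number": title_number, "parts": rings}
        # Titles missing from the CSV keep their geometry but gain no attributes.
        feature.update(attributes.get(title_number, {}))
        features.append(feature)
    return features

features = aggregate_and_join(pieces, attributes)
```

Here the two pieces of title K123456 end up on one feature with two parts, which mirrors aggregating the tiled export before the attribute merge.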

Page 19: Cleansing land ownership data, an FME use case - David Eagle

What is FME?

• Industry standard translation and transformation software
• Supports >300 formats
• Allows manipulation of many data types:

Page 20: Cleansing land ownership data, an FME use case - David Eagle

The case for FME

• FME is often bought for a specific task.
• The value comes when it’s used for tasks not previously considered
– Fisher German’s initial impetus was loading their database
• They turned to FME to clean and conflate their data later
• Building a case for FME wasn’t necessary
– Re-use the flexible technology and get a better ROI

Page 21: Cleansing land ownership data, an FME use case - David Eagle

Automate and re-use

• Automate out the mundane with FME

• Avoid hours of Excel copy/paste

• Allow staff to focus on the analysis

• First task, process 6 linear asset project files

• 24,000 Land Registry records processed in 30 seconds with FME

• Previously this would have taken >6 days.

• Subsequent steps clean up the geometry and merge the attributes – but this is a classic FME task!

Page 22: Cleansing land ownership data, an FME use case - David Eagle

Automate and re-use

• Lots of Testers/TestFilters

• Popular Transformers: http://goo.gl/4rOGf

• Adopt an “if, then, else” approach.

• FME 2013 SP1 more capable with ‘Conditional Mapping’

• http://evangelism.safe.com/fmeevangelist113/

• The success of the process relies on two capabilities.

1. Lists

2. Regex

Page 23: Cleansing land ownership data, an FME use case - David Eagle

Lists

• A list is a method by which FME permits a single attribute to hold multiple values

Polygon contains 12 trees

tree.Species{0} oak
tree.Species{1} ash
tree.Species{2} birch
tree.Species{3} oak
tree.Species{4} birch
tree.Species{5} birch
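
For readers more familiar with general-purpose languages, an FME list attribute behaves much like an array stored on a feature. The following Python sketch is only an analogy, not FME code. The dictionary layout is an assumption, and the species values are the six shown on the slide.

```python
from collections import Counter

# A feature modelled as a dict; the list attribute holds several values.
polygon = {
    "tree.Species": ["oak", "ash", "birch", "oak", "birch", "birch"],
}

# Equivalent of reading tree.Species{2} in FME (indices start at 0).
third_species = polygon["tree.Species"][2]

# Summarise the list, e.g. how many trees of each species were recorded.
species_counts = Counter(polygon["tree.Species"])
```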

Page 24: Cleansing land ownership data, an FME use case - David Eagle

Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’

“ SOUTH EASTERN POWER NETWORKS PLC Newington House, 99 Southwark Bridge Street, London SN1 1AB ”

• Tester – Pass: If Proprietor begins with <space>
• AttributeSetter: It’s a Commercial business
• AttributeSplitter: Split on 2 <spaces> and trim whitespace
  • proprietor.Proprietor{0} SOUTH EASTERN POWER NETWORKS PLC
  • proprietor.Proprietor{1} Newington House, 99 Southwark Bridge Street, London SN1 1AB
• AttributeRenamer:
  • Name = SOUTH EASTERN POWER NETWORKS PLC
  • Address = Newington House, 99 Southwark Bridge Street, London SN1 1AB

Page 25: Cleansing land ownership data, an FME use case - David Eagle

Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’

“JOHN EDMUND SMITH Big Farm, Preston, Canterbury, Kent ” *

• Tester – Fail: (Proprietor did NOT begin with <space>)
• AttributeSetter: It’s a Residential property
• AttributeSplitter: Split on 4 <spaces> and trim whitespace
  • proprietor.Proprietor{0} JOHN EDMUND SMITH
  • proprietor.Proprietor{1} Big Farm, Preston, Canterbury, Kent
• AttributeRenamer:
  • Name = JOHN EDMUND SMITH
  • Address = Big Farm, Preston, Canterbury, Kent
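
The Tester, AttributeSetter, AttributeSplitter and AttributeRenamer chain for both cases can be approximated in a few lines of Python. This is a sketch of the same if/then/else logic, not the FME workspace itself. The output key Type is a hypothetical name, and the runs of 2 and 4 spaces in the sample strings are reconstructed from the slide text, since whitespace is collapsed in this transcript.

```python
def split_proprietor(proprietor):
    """Split a raw Proprietor string into type, name and address."""
    if proprietor.startswith(" "):
        # Tester passed: treated as a commercial business, 2-space separator.
        owner_type, separator = "Commercial", " " * 2
    else:
        # Tester failed: treated as a residential property, 4-space separator.
        owner_type, separator = "Residential", " " * 4
    name, _, address = proprietor.strip().partition(separator)
    return {"Type": owner_type, "Name": name.strip(), "Address": address.strip()}

commercial = split_proprietor(
    " SOUTH EASTERN POWER NETWORKS PLC  Newington House, "
    "99 Southwark Bridge Street, London SN1 1AB "
)
residential = split_proprietor(
    "JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent "
)
```

The leading-space test has to run before any trimming, because trimming removes the only clue that separates the two record types.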

Page 26: Cleansing land ownership data, an FME use case - David Eagle

Challenge 2: Split the Address into appropriate parts

“Newington House, 99 Southwark Bridge Street, London SN1 1AB”

• AttributeSplitter: Split on , and trim whitespace
  • proprietor.Address{0} Newington House
  • proprietor.Address{1} 99 Southwark Bridge Street
  • proprietor.Address{2} London SN1 1AB
• ListElementCounter = 3
• AttributeRenamer:
  • Address1 = Newington House
  • Address2 = 99 Southwark Bridge Street
  • Town = London SN1 1AB

• Depending on data, 3 elements may or may not include a postcode!?
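
A comparable split-and-count step might look like this in Python. It is a minimal sketch covering only the three-element case shown on the slide. The keys parts and element_count are hypothetical names, and how addresses with other element counts should be mapped is left open, as it is on the slide.

```python
def split_address(address):
    """Split an address on commas and name the parts when there are three."""
    parts = [part.strip() for part in address.split(",")]
    result = {"parts": parts, "element_count": len(parts)}
    if len(parts) == 3:
        # Three elements: same mapping as the AttributeRenamer on the slide.
        result.update(Address1=parts[0], Address2=parts[1], Town=parts[2])
    return result

example = split_address("Newington House, 99 Southwark Bridge Street, London SN1 1AB")
```

Note that Town still carries the postcode at this point, which is the problem Challenge 3 picks up.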

Page 27: Cleansing land ownership data, an FME use case - David Eagle

Regex

• Regular Expressions are a language used for:
  • Pattern matching
  • String searching
  • String parsing
  • String replacement

/colou?r/ (? = optional char.) matches “FME is colourful!” and “FME is colorful!”

/FME/ matches “We love FME 2013!” and “FME is great!”

/^FME/ (^ at start, $ at end) matches “FME is great!” but not “We love FME 2013!”
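
The same patterns can be tried in Python's re module, as a small sketch of the behaviour described on this slide. The helper name matching and the final pattern great!$ are additions for illustration only; the three other patterns and the four sentences come from the slide.

```python
import re

sentences = [
    "FME is colourful!",
    "FME is colorful!",
    "We love FME 2013!",
    "FME is great!",
]

def matching(pattern):
    """Return the sentences in which the pattern is found."""
    return [s for s in sentences if re.search(pattern, s)]

optional_u = matching(r"colou?r")   # '?' makes the preceding 'u' optional
anywhere = matching(r"FME")         # plain text matches anywhere in the string
at_start = matching(r"^FME")        # '^' anchors the match to the start
at_end = matching(r"great!$")       # '$' anchors the match to the end
```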

Page 28: Cleansing land ownership data, an FME use case - David Eagle

Challenge 3: Spot the Postcode

• Regex = pattern matching and string manipulation

• http://rubular.com/ - Helps you test!

String: AGI NORTH
Regex: ([A-Z]*)[ ]([A-Z]*)

String: London SN1 1AB
Regular Expression: ^(.*\S)\s+(\S{2,4}\s\S{3})\s*$

• Use StringSearcher = Matched output port provides…
  • _matched_parts{0} London
  • _matched_parts{1} SN1 1AB
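
Both expressions on this slide behave the same way in Python's re module, so they can be checked without FME. The sketch below wraps the postcode pattern in a hypothetical helper, split_town_postcode, and treats a failed match as "no postcode present", which is an assumption about how the unmatched case might be handled.

```python
import re

# Pattern from the slide: text, whitespace, then a postcode-shaped ending.
POSTCODE_AT_END = re.compile(r"^(.*\S)\s+(\S{2,4}\s\S{3})\s*$")

def split_town_postcode(value):
    """Return (town, postcode); postcode is None when the pattern fails."""
    match = POSTCODE_AT_END.match(value)
    if match is None:
        return value.strip(), None
    # Python numbers capture groups from 1; group(0) is the whole match.
    return match.group(1), match.group(2)

with_postcode = split_town_postcode("London SN1 1AB")
without_postcode = split_town_postcode("Kent")
words = re.match(r"([A-Z]*)[ ]([A-Z]*)", "AGI NORTH").groups()
```

The pattern only checks the shape of the last two tokens, so a value such as "Old Mill Inn" would also match. Some further validation is likely to be needed on real data.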

Page 29: Cleansing land ownership data, an FME use case - David Eagle

There were lots more challenges on a similar theme…

Page 30: Cleansing land ownership data, an FME use case - David Eagle

Other tasks: Structure and Schema

• Remove duplicate records

• Apply common format to names e.g. A A Smith to A.A. Smith

• Resolve addresses listed twice in the same string
  • Common where 2 partners live at same address
  • “2, High Street, Leicester 2 High Street Leicester”

• Apply Title Case to names & tidy up use of hyphens

• Add extra columns and fixed values for target schema

• Split first names and last name into 2 columns – more Regex!

• Validate the County names against a list of allowed Counties & resolve abbreviations - AttributeValueMapper
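
Several of these clean-up tasks are simple string operations, sketched here in Python for illustration. All function names are hypothetical, and the county lookup is a made-up stand-in for an AttributeValueMapper table, not the list used in the project.

```python
import re

def format_initials(name):
    """Dot leading single-letter initials: A A Smith -> A.A. Smith."""
    tokens = name.split()
    count = 0
    while count < len(tokens) and len(tokens[count]) == 1 and tokens[count].isalpha():
        count += 1
    initials = "".join(token + "." for token in tokens[:count])
    return " ".join(filter(None, [initials, " ".join(tokens[count:])]))

def tidy_name(name):
    """Remove spaces around hyphens, then apply Title Case."""
    return re.sub(r"\s*-\s*", "-", name.strip()).title()

def split_name(full_name):
    """Split into (first names, last name) on the final run of whitespace."""
    match = re.match(r"^(.*\S)\s+(\S+)$", full_name.strip())
    return (match.group(1), match.group(2)) if match else ("", full_name.strip())

# Hypothetical lookup: allowed counties plus one abbreviation.
COUNTY_LOOKUP = {
    "Kent": "Kent",
    "Leicestershire": "Leicestershire",
    "Leics": "Leicestershire",
}

def resolve_county(value):
    """Return the allowed county name, or None if the value is not recognised."""
    return COUNTY_LOOKUP.get(value.strip())

def remove_duplicates(records):
    """Drop repeated (hashable) records while keeping the original order."""
    return list(dict.fromkeys(records))
```

For example, tidy_name applied to "JOHN SMITH - JONES" gives "John Smith-Jones", and resolve_county returns None for any value outside the lookup, so it can be flagged for manual review.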

Page 31: Cleansing land ownership data, an FME use case - David Eagle

Summary

• Saves time

• Before: >1 day of data prep per project

• After: Using FME, a few seconds to do 80% of the work

• Saves money

• No extra fee to the Land Registry to restructure the data

• No unnecessary staff time on mundane formatting tasks

• Increased ROI

• Fisher German already had FME

• Just consider what else you could adapt FME to do…

Page 32: Cleansing land ownership data, an FME use case - David Eagle

Thank you

David Eagle
Principal Consultant
david.eagle@1spatial.com @david_eagle