Geocoding – the Columbus way! Bakshi. About the Research

44
Geocoding – the Columbus way! Rahul Bakshi

Transcript of Geocoding – the Columbus way! Bakshi. About the Research

Page 1: Geocoding – the Columbus way! Bakshi. About the Research

Geocoding – the Columbus way!

Rahul Bakshi

Page 2: Geocoding – the Columbus way! Bakshi. About the Research

About the Research

Part of Masters’ ThesisAdvisor: Craig KnoblockOther Committee members:Cyrus Shahabi and John WilsonBuild a Geocoder with maximum accuracy

Page 3: Geocoding – the Columbus way! Bakshi. About the Research

Thesis statement

The accuracy of the geocoded coordinates of a location can be significantly improved by exploiting online property-related data

Page 4: Geocoding – the Columbus way! Bakshi. About the Research

Motivating Problem

Inaccuracies in the existing applicationsThe error margins become critical in some applications:

Aligning Vector Data and Satellite ImageryEnvironmental Health StudiesUrban Rescue and Recovery Operations

Page 5: Geocoding – the Columbus way! Bakshi. About the Research

Positional Error Comparison

Reference: Cayo, M. R. and T. O. Talbot (2003). "Positional error in automated geocoding of residential addresses." International Journal of Health Geographics 2(10).

Page 6: Geocoding – the Columbus way! Bakshi. About the Research

Street Data

For the US, there are three main providers for street data

Geographic Data Technology (GDT)Navigation Technologies (NavTech)TIGER/Lines (Bureau of the Census)

Page 7: Geocoding – the Columbus way! Bakshi. About the Research

Limitations of these sources

Provide the address ranges and latitude/longitude information for the end pointsNo data about number of addresses in a segmentNo data about the size of address/lots

Page 8: Geocoding – the Columbus way! Bakshi. About the Research

Information in Street Sources

Page 9: Geocoding – the Columbus way! Bakshi. About the Research

Existing Approach

Address range methodGet the street data from sources like NavTech, GDT, TigerLinesApproximate the location based on information in the street dataExample

Address to locate: 645 Sierra St, El Segundo, CA -90245

Page 10: Geocoding – the Columbus way! Bakshi. About the Research

ExampleSierra StFrom: A ( 33.923413, -118.408709 )To: B ( 33.924813, -118.408809 )

Addresses on the Left: 601-699Addresses on the Right: 600-698

645: Left Side22nd out of the 50 addresses on the left side

Interpolate the address on the street

A

B

Page 11: Geocoding – the Columbus way! Bakshi. About the Research

Limitations of the existing approach

Assumes all addresses are present in the given range – which is seldom the caseDoes not take into account the lot sizesGeocodes non-existent addresses as wellE.g.: The following address does not exist -2622 Ellendale Pl, Los Angeles, CA – 90007Lets see what do the existing services have to say…

Page 12: Geocoding – the Columbus way! Bakshi. About the Research

All of them geocode it !

Page 13: Geocoding – the Columbus way! Bakshi. About the Research

The Columbus approach

Make use of the data already on the InternetProperty tax sites – repository of information that one requires to make the interpolations more accurateTake the number of houses in accountTake the lot sizes in account

Page 14: Geocoding – the Columbus way! Bakshi. About the Research

Uniform lot-size method

Works when data source having information on the property parcels/addresses existsExploits these sources to get the number of lots on the street segmentAssumes all lots are equal in dimension

Page 15: Geocoding – the Columbus way! Bakshi. About the Research

Outline of the method

Get the information of the street segment from the street data sourceQuery the property tax source to get the number of parcels before and after the current addressApproximate the location of the address based on the new values

Page 16: Geocoding – the Columbus way! Bakshi. About the Research

Corner lot problem

Number of dimensions on the street =number of lots on the street +

corner lot

Page 17: Geocoding – the Columbus way! Bakshi. About the Research

AlgorithmGet the street data from the street-data-sourceGet number of lots before and after the current address from the property data sourceAdd a corner lotCalculate the street length in terms of earth coordinatesCalculate the lot size based on the street length and the number of lots on the streetInterpolate the location of the address based on the average lot size

Page 18: Geocoding – the Columbus way! Bakshi. About the Research

Address-range (traditional) method

Page 19: Geocoding – the Columbus way! Bakshi. About the Research

Uniform lot-size method

Page 20: Geocoding – the Columbus way! Bakshi. About the Research

Actual lot-size method

The corner lot problem motivates us to optimize furtherPalm St, I do worse than traditional approachPossible only if the lot sizes available in the Property Tax sitesCompute the sizes of each of the lots/streets and then run a matching algorithmWorks on rectangular blocks

Page 21: Geocoding – the Columbus way! Bakshi. About the Research

136

240

482575

256

240

420575

204

240

482533

324

240

420533

136

120

542575

256

120

482575

204

120

542533

324

120

482533

136

256

542482

256

256

482482

204

256

542440

324

256

440

136

375

482482

256

375

420482

204

375

482440

324

375

440

482

420

Page 22: Geocoding – the Columbus way! Bakshi. About the Research

Finding the optimal layout

Calculate the actual length and breadth (width) of the block using the information in the street data source[length, width]

Truedim

257

257

480480

Page 23: Geocoding – the Columbus way! Bakshi. About the Research

Finding the optimal layout

Get the coordinates of the block from the street data sourceQuery the property source and get the dimension of every lot on the blockCompute the dimensions of the 16 possible orientationsCompare these with the true dimensionThe layout that most closely matches / least error is chosen as the layout

Page 24: Geocoding – the Columbus way! Bakshi. About the Research

Integrating data sources

Unified Query InterfaceLarge number of property sitesQuery a single relations

Different property sources for different placesNew York: State, Los Angeles: CountyDisparate representations : structure and attribute namesStreet Data: organized by county or states

Page 25: Geocoding – the Columbus way! Bakshi. About the Research

Source Descriptions

Describe the Source as view over Domain description

A single property relation

Three types of SourcesProperty TaxProperty Tax with details of dimensionsStreet Data Sources

Page 26: Geocoding – the Columbus way! Bakshi. About the Research

PropertyTax

USPDR

PropertyTaxCA PropertyTaxNY

State = ‘CA’ State = ‘NY’

PropertyTaxLA PropertyTaxSF

LA Property SF Property

County = ‘LA’ City = ‘SF’

LAProperty(sa, ci, st, zi, fraddr, fraddl, toaddr, toaddl, before, after) :-

PropertyTax(sa, ci, co, st, zi, fraddr, fraddl, toaddr, toaddl, before, after, lotwidth, lotdepth)^

(co = ‘Los Angeles’)^

(st = ‘CA’)

Page 27: Geocoding – the Columbus way! Bakshi. About the Research

UniformLotSizeGeocoder

PropertyTaxStreet

JoinUniformLotSizeApproximation

Join

UniformLotSizeGeocoder(sa, ci, co, st, zi, lat, lon):-

Street(sa, ci, co, st, zi, frlat, frlon,tolat, tolon, fename, fetype, zipl, zipr, fraddr, fraddl, toaddr, toaddl)^

PropertyTax(sa, ci, co, st, zi, fraddr, fraddl, toaddr, toaddl, before, after,lotwidth, lotdepth)^

UniformLotApproximation(frlat, frlon, tolat, tolon, before, after, lat, lon)

Page 28: Geocoding – the Columbus way! Bakshi. About the Research

Query

•Inverse the source descriptions

•Generate datalog program to solve the query

Page 29: Geocoding – the Columbus way! Bakshi. About the Research

Datalog program generated

Page 30: Geocoding – the Columbus way! Bakshi. About the Research

Advantage of this model

GLAV (Global-Local as View)Easy to add new sources

Page 31: Geocoding – the Columbus way! Bakshi. About the Research

ResultsChosing a region

El Segundo

Data SourceConflated TIGER/Lines

Fetch Agent Platform to convert website data into XMLPrometheus 2.0 information mediatorGeocoded 267 addresses spanning 13 blocksActual lot-size method could not be applied to 58 addressesNone of the methods could be applied to one addressResults based on the remaining 208 addresses

Page 32: Geocoding – the Columbus way! Bakshi. About the Research

N

Chosen area for goecoding

Page 33: Geocoding – the Columbus way! Bakshi. About the Research

Driving distance

Page 34: Geocoding – the Columbus way! Bakshi. About the Research

Address-range (traditional) method

Page 35: Geocoding – the Columbus way! Bakshi. About the Research

Uniform lot-size method

Page 36: Geocoding – the Columbus way! Bakshi. About the Research

Actual lot-size method

Page 37: Geocoding – the Columbus way! Bakshi. About the Research

506 Oak A

ve

504 Oak A

ve

508 Oak A

ve

512 Oak A

ve

510 Oak A

ve

514 Oak A

ve

518 Oak A

ve

501 E Palm A

ve

505 E Palm A

ve

509 E Palm A

ve

513 E Palm A

ve

519 E Palm A

ve

521 E Palm A

ve

591 E Palm A

ve

Page 38: Geocoding – the Columbus way! Bakshi. About the Research

501 Mariposa Ave

511

Mar

ipos

a A

ve

517

Mar

ipos

a A

ve

523

Mar

ipos

a

525

Mar

ipos

a52

7 M

arip

osa

535 Mariposa Ave

615 Penn St

609 Penn St

627 Penn St

621 Penn St

633 Penn St

639 Penn St

645 Penn St

524

Palm

Ave

520

Palm

Ave

610 Sheldon St

622 Sheldon St

628 Sheldon St

634 Sheldon St

640 Sheldon St

646 Sheldon St

616 Sheldon St

Page 39: Geocoding – the Columbus way! Bakshi. About the Research

Comparison of Results

7.8024256.6407273.80526Maximum Error

0.034870.070860.86578Minimum Error

1.469589.9236120.49335Standard Deviation

1.629937.8714936.85359Average Error

Actual lot-sizeUniform lot-sizeAddress-range(all errors are in meters)

Average percentage of improvement over traditional approach

Uniform lot-size method: 78.65%Actual lot-size method: 95.59%

Page 40: Geocoding – the Columbus way! Bakshi. About the Research

Address Range Methodµ = 36.85 σ =20.49

Uniform lot-size Methodµ = 7.87 σ = 9.92

Actual lot-size Methodµ = 1.63 σ = 1.47

Error in meter

Pro

bab

ilit

y

Normal Distribution of the error

Page 41: Geocoding – the Columbus way! Bakshi. About the Research

Related WorkCayo, M. R. and T. O. Talbot (2003) Positional error in automated geocoding of residential addressesRatcliffe (2001) On the accuracy of TIGER-type geocoded address data in relation to cadastral and census areal unitsKrieger et al. (2001) Evaluating the accuracy of geocoding in public health research Gupta, Marciano et al.(1999) Integrating GIS and Imagery through XML-Based Information Mediation

Page 42: Geocoding – the Columbus way! Bakshi. About the Research

Conclusion & Future Work

More accurate geocoding achievedIntegrating other sources to get property dataSolved the address-validating problemExtend the actual lot size method to non-rectangular blocksIntegrate more property tax data sources

Page 43: Geocoding – the Columbus way! Bakshi. About the Research

Acknowledgements

Thanks to Craig for his valuable guidance, Snehal for help with the algorithms and implementation, Shou-de for the calculations in the actual lot size methodThanks to Cyrus Shahabi and John Wilson

Page 44: Geocoding – the Columbus way! Bakshi. About the Research

Questions / Comments