Geospatial data on enterprises – challenges to geolocate enterprises and their local units for statistical purposes

Geospatial data on enterprises – challenges to geolocate enterprises and their local units for statistical purposes. Zrinka Pavlović, Head of Statistical Business Register, Classifications, Sampling, Statistical Methods and Analyses Department. INSPIRE Conference 2016, Barcelona.


Page 1

Geospatial data on enterprises – challenges to geolocate enterprises and their local units for statistical purposes

Zrinka Pavlović, Head of Statistical Business Register, Classifications, Sampling, Statistical Methods and Analyses Department

INSPIRE Conference 2016 Barcelona

Page 2: Enterprises vs. people

• Lifecycle
– Birth, existence, death: demography / business demography statistics
– Enterprises can be transformed through the years
• "Same enterprise or not?"
• Location
– Residence / headquarters address
– Enterprises can operate in many locations: local units
• Both are in the focus of statistics

Page 3: Enterprises vs. territory

• Allocation of enterprises within the country
– Cities / parts of cities
– Territorial / climate / relief characteristics
– Other
• Is "doing business" easier in some parts of the country?
• Are more enterprises born, do they survive longer, grow faster or more steadily, or die faster in some parts of the country?
• Are enterprises more successful in some parts of the country?
• Which activities are present or missing in some areas?
• Why?
• Should the government put more effort and assets into improving conditions for "doing business" in some areas to stop depopulation?

Page 4: Statistical observation of enterprises

• Statistics can give a lot of information about the business population related to location (number of units, demography, density, activities, employment, efficiency, etc.)
• Precondition (business register):
– Accurate
– Comprehensive
– Good coverage
– Relevant
– Good quality

Page 5: Legal background

• Regulation (EC) No 177/2008 requires that the Business Register contain a geographical location code for local units

2.11 Geographical location code

Purpose: The geographical location code complements the address and postal codes (2.2) and can be used to derive classifications relating to the geographical location of units at the most detailed level. Other national classifications such as administrative regions, travel-to-work areas, health or education regions etc. can also be derived from it.

Page 6: Geolocation of enterprises

Address data consist of:
• Settlement (city/village), code
• Street name
• House number
• Appendix to house number

Only the settlement code is unique and official and used by the majority of administrative sources. The other parts of the address are in free-text form.

Page 7: Sources of address data

• The Statistical Business Register is compiled from several sources that provide address data:
– Administrative Business Register
– Central Register of Craft Businesses
– Tax Administration
– Commercial Court Register

Page 8: Sources of spatial data

• Central Register of Territorial Units: State Geodetic Administration (SGA) of the Republic of Croatia
• The Statistical Spatial Register in the Croatian Bureau of Statistics is updated from the SGA register
• Not all administrative registers and administrative records are directly connected to, or updated with, SGA register data

Page 9: Project: merging statistics and geospatial information

• Goal: geolocate enterprises and their local units by assigning them a geocode
• Activities:
– Establish a system that enables every new address entry in the SBR to get a unique official street code, street name and geocode
– Develop an application that matches new address entries with official addresses, automatically or through clerk intervention
– Code the existing addresses in the Statistical Business Register
– Publish a geographical presentation of selected SBR data on the CBS website

Page 10: Starting point

• The SBR contains addresses of enterprises and their local units that had to be coded:
– More than 500,000 enterprises (regardless of activity status)
– More than 600,000 local units (98,000 with an address different from the legal unit address)
– More than 1,100,000 address records
– 109,215 different forms of street names within particular settlements
– 68,863 different forms of street names regardless of settlement
• In the Statistical Spatial Register:
– 51,798 streets in 6,759 settlements

Page 11: Existing situation

Page 12: Existing situation

• Street names in the Statistical Business Register are stored as they appear in the administrative sources, not as the official street name in the Register of Territorial Units. E.g.:

Street code   Street name             Sett. code   Sett. name
0721501371    ULICA KNEZA BRANIMIRA   072150       Zagreb

• There are many variations of the same street name

Page 13: Conditions

• To assign a code to these forms of the name of the same street in the same town, all these forms should be stored in the Thesaurus.
• Matching of names must be 100% exact, because a single letter can mean a different street.

ULICA MARIJA BABIĆA   Ulica M. Babića
ULICA NENADA BABIĆA   Ulica N. Babića
ULICA MATIJE BAKIĆA   Ulica M. Bakića
ULICA VOJINA BAKIĆA   Ulica V. Bakića
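
The need for a 100% match can be illustrated with a minimal Python sketch. It assumes that the left-hand column above holds the official names and the right-hand column the forms as stored in the source data; the names `THESAURUS`, `similarity` and `exact_lookup` are hypothetical, and the similarity measure (`difflib` ratio) is only a stand-in for any fuzzy matcher, not the method used in the project.

```python
from difflib import SequenceMatcher
from typing import Optional

# Thesaurus for one settlement: stored name form -> official street name.
# The four pairs come from the slide; keys are upper-cased for lookup.
THESAURUS = {
    "ULICA M. BABIĆA": "ULICA MARIJA BABIĆA",
    "ULICA N. BABIĆA": "ULICA NENADA BABIĆA",
    "ULICA M. BAKIĆA": "ULICA MATIJE BAKIĆA",
    "ULICA V. BAKIĆA": "ULICA VOJINA BAKIĆA",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def exact_lookup(stored_name: str) -> Optional[str]:
    """Return the official name only when the whole form is in the thesaurus."""
    return THESAURUS.get(stored_name.upper())


# Two different streets whose stored forms differ by a single letter.
score = similarity("Ulica M. Babića", "Ulica M. Bakića")
print(f"similarity of two different streets: {score:.2f}")
print(exact_lookup("Ulica M. Babića"))
print(exact_lookup("Ulica M. Bakića"))
```

A fuzzy matcher would rate the two forms as nearly identical although they are different streets, which is why the sketch accepts only complete thesaurus hits.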

Page 14: Module for coding of streets

Diagram components: SBR; Module for coding of streets; Manual connecting of entries with official street name; Thesaurus of street names.

1) Street name without code + settlement code
2a) Street is identified and code is selected. Code of street is saved in SBR
2b) Street is not identified automatically: manual connecting is needed
3a) Entry is found in the list of units in the selected settlement: updating of thesaurus
3b) Updating of SBR with street code
4) Entry is not found in the list of units in the selected settlement: investigation needed, possible mis-linking of settlement and street. Data on settlement code is changed in SBR
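
The numbered flow can be sketched in Python as follows. This is an illustration under assumptions, not the CBS application: the class `StreetCoder`, its fields and its method names are hypothetical, the official list is reduced to a dictionary per settlement, and the name variant "UL. KNEZA BRANIMIRA" is invented for the example. Only the street code, official name and settlement code are taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (settlement code, street name as stored in the SBR)


@dataclass
class StreetCoder:
    # official[settlement code][official street name] -> street code
    official: Dict[str, Dict[str, str]]
    # thesaurus[(settlement code, stored name)] -> street code
    thesaurus: Dict[Key, str] = field(default_factory=dict)
    manual_queue: List[Key] = field(default_factory=list)

    def code(self, settlement: str, name: str) -> Optional[str]:
        """Steps 1 and 2: try to code a street automatically."""
        key = (settlement, name)
        if key in self.thesaurus:            # 2a) known name form
            return self.thesaurus[key]
        streets = self.official.get(settlement, {})
        if name in streets:                  # 2a) exact official name
            self.thesaurus[key] = streets[name]
            return streets[name]
        self.manual_queue.append(key)        # 2b) manual connecting needed
        return None

    def connect_manually(self, settlement: str, name: str,
                         official_name: str) -> Optional[str]:
        """Steps 3 and 4: a clerk links a name form to an official name."""
        street_code = self.official.get(settlement, {}).get(official_name)
        if street_code is None:
            return None                      # 4) not in settlement: investigate
        key = (settlement, name)
        self.thesaurus[key] = street_code    # 3a) thesaurus is updated
        if key in self.manual_queue:
            self.manual_queue.remove(key)
        return street_code                   # 3b) caller stores the code in SBR


coder = StreetCoder(official={"072150": {"ULICA KNEZA BRANIMIRA": "0721501371"}})
first_try = coder.code("072150", "UL. KNEZA BRANIMIRA")   # hypothetical variant
linked = coder.connect_manually("072150", "UL. KNEZA BRANIMIRA",
                                "ULICA KNEZA BRANIMIRA")
second_try = coder.code("072150", "UL. KNEZA BRANIMIRA")
print(first_try, linked, second_try)
```

Once a clerk has linked a name form, the same form is coded automatically the next time it appears, which is how the thesaurus reduces manual work over time.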

Page 15: Challenges faced during project

• Complex spatial situation with many incorrect address entries:
– Streets that do not belong to the settlement as registered
– Streets that do not exist in the Stat. Spatial Register
• Address contains an invalid or former street name
• Address contains a street name that never existed
– House number is not registered in the Stat. Spatial Register
• Various formats of street names and house numbers with appendices
– Street name contains house numbers and appendices
– House number is mixed with appendices
– Appendices are separated by different characters
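
The format problems listed above (house numbers inside the street field, appendices attached with different separators) might be handled with a small parser like the Python sketch below. The accepted separators, the limit of two appendix letters, the function names and the sample inputs are all assumptions made for illustration; the slides do not specify the actual formats.

```python
import re
from typing import Optional, Tuple

# Assumed appendix alphabet: Latin letters plus Croatian diacritics.
LETTERS = "A-Za-zČĆĐŠŽčćđšž"
# House number, optional separator (/ \ - . ,), optional 1-2 letter appendix.
HOUSE = rf"(\d+)\s*[/\\\-.,]?\s*([{LETTERS}]{{0,2}})"
HOUSE_RE = re.compile(rf"^\s*{HOUSE}\s*$")
TRAILING_RE = re.compile(rf"^(.*\D)\s+({HOUSE})\s*$")


def split_house_number(raw: str) -> Optional[Tuple[int, str]]:
    """Split '12A', '12/a' or '12 - b' into (number, upper-cased appendix)."""
    m = HOUSE_RE.match(raw)
    if m is None:
        return None  # not a recognisable house number: clerical check
    return int(m.group(1)), m.group(2).upper()


def split_street_field(raw: str) -> Tuple[str, Optional[str]]:
    """Separate a house number that was typed at the end of the street field."""
    m = TRAILING_RE.match(raw)
    if m is None:
        return raw.strip(), None
    return m.group(1).strip(), m.group(2)


for sample in ["12A", "12/a", "12 - b", "7", "bb"]:
    print(repr(sample), "->", split_house_number(sample))
print(split_street_field("ILICA 12A"))
```

Inputs that do not fit the assumed pattern return `None` instead of a guess, so they can be passed on for clerical checking.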

Page 16: Challenges faced during project (cont.)

• House numbers and appendices in different formats; house numbers that do not exist in the Register of Territorial Units
• No unique and comprehensive database of historical street names exists
• The most difficult cases: units that are no longer active
• Too much clerical work to investigate all problematic street names, so the focus is on active units

Page 17: Way forward

• Normalising of data from both databases is needed (removing special characters and double spaces, normalising usual abbreviations and some very common street titles).
• Automatic matching at the beginning of the project: slightly above 40% for unique street names, with little possibility of increasing the automatic share, since only a 100% match of cleaned data counts (one character might make the difference).
• Clerical matching of the remaining non-coded streets
• Thesaurus filled with normalised data and pairs of street names (Stat. Business Register name and official street name from the Stat. Spatial Register) and successively supplemented with new manual entries
• Many dead units with non-existent addresses: disregarded
• Searching for files with changed street names in several institutions (in larger cities)
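
The normalisation step described above could look roughly like the following Python sketch. The function name `normalise` and the abbreviation table are hypothetical; the real list of abbreviations and street titles used in the project is not given in the slides.

```python
import re

# Hypothetical abbreviation table; the real list is not given in the slides.
ABBREVIATIONS = {
    "UL": "ULICA",
    "DR": "DOKTORA",
}


def normalise(name: str) -> str:
    """Upper-case, drop special characters, collapse spaces, expand abbreviations."""
    text = name.upper()
    # \w is Unicode-aware in Python 3, so Č, Ć, Đ, Š, Ž and digits are kept.
    text = re.sub(r"[^\w\s]", " ", text).replace("_", " ")
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalise("  ul.  Kneza   Branimira "))
print(normalise("Ulica M. Babića"))
print(normalise("Ulica M. Bakića"))
```

Normalisation removes differences in case, punctuation and spacing, but two different streets still normalise to different strings, so the exact-match requirement remains.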

Page 18: Results

• IT application developed for coding streets with a thesaurus of streets
– Stand-alone application consisting of three modules:
• Street database from the Statistical Spatial Register
• Module for matching street names that enables:
– Transmission of non-coded streets from the source database
– Automatic coding of streets
– Module for manual matching of street names
• Thesaurus of street names
• 98% of addresses of active enterprises coded with street codes

Page 19: Results (cont.)

• Automatic procedure developed for assigning geocodes from the Statistical Spatial Register database, based on street code and house number, at the moment of entry into the SBR database
– Batch updating
– Manual updating
• Statistical Business Register upgraded with additional attributes
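
A batch update of this kind can be pictured as a lookup keyed on street code and house number, as in the Python sketch below. The record layout, the function name `assign_geocodes`, the house numbers and the geocode value "GEO-0001" are invented for illustration; the slides do not say what form the geocode takes, so it is treated here as an opaque string.

```python
from typing import Dict, List, Optional, Tuple

GeoKey = Tuple[str, str]  # (street code, house number including appendix)


def assign_geocodes(records: List[Dict[str, Optional[str]]],
                    spatial: Dict[GeoKey, str]) -> List[Dict[str, Optional[str]]]:
    """Batch step: fill 'geocode' where the address is found in the lookup.

    Returns the records that could not be geocoded, for manual updating.
    """
    unresolved = []
    for record in records:
        key = (record["street_code"], record["house_number"])
        record["geocode"] = spatial.get(key)
        if record["geocode"] is None:
            unresolved.append(record)
    return unresolved


# Hypothetical extract of the spatial register and two SBR address records.
spatial_register = {("0721501371", "29"): "GEO-0001"}
sbr_records = [
    {"unit": "A", "street_code": "0721501371", "house_number": "29"},
    {"unit": "B", "street_code": "0721501371", "house_number": "999"},
]
to_review = assign_geocodes(sbr_records, spatial_register)
print(sbr_records[0]["geocode"], [r["unit"] for r in to_review])
```

Records left without a geocode, for example because the house number is not registered, end up in the list for manual updating.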

Page 20: Geographical presentation of Business Register data

• Business Demography Statistics are convenient for geographical presentation
• Available tools for presentation:
– ArcMap 10
– Geostat Portal of CBS

Page 21: Lessons learned

• Addresses of enterprises are problematic in many cases. The main types of mistakes were identified; the new system enables detecting mistakes and gives an opportunity to correct them
• Accuracy of spatial data varies between counties
• Spatial registers need to be improved in the coming years; historical data are needed
• Administrative sources should be contacted so that they put much more focus on the accuracy of their address data (link between settlement and street) and on connection to the official Register of Territorial Units

Page 22: Potential reuse

• The module for coding streets, with the thesaurus and the functionality for manual matching and supplementing the thesaurus content, was developed as a separate application
• Envisaged use by other users in CBS, e.g. Farm Register, Population Register, Population Census
• Possible availability for other institutions (administrative sources of the SBR): incoming data of better quality?

Page 23

Thank you for your attention!

Zrinka Pavlović
Head of Statistical Business Register, Classifications, Sampling, Statistical Methods and Analyses Department

Tel: +385 (0)1 4893 514

E-mail: [email protected]