Parallel Python: Multiprocessing With ArcPy

Clinton Dow - Geoprocessing

Neeraj Rajasekar - Spatial Analyst

Agenda

• What Multiprocessing Is
• What Multiprocessing Is Not
• Demo of Multiprocessing Modules
  - Multiprocessing
  - Subprocess
  - Asyncio
• Multiprocessing in GIS
  - Raster Concepts
  - Demo of Multiprocessing with Rasters
  - Vector Concepts
  - Demo of Multiprocessing with Vectors
• Best practices and considerations
• Resources

Multiprocessing and Concurrency In Pure Python

What is Multiprocessing

• Modern computers have multiple CPUs
• Single CPU vs. multiple CPUs
• Program 'threads'
  - A sewing thread is interwoven fibers
  - Twisted together, tied at the ends
  - A threaded application is interwoven processes
  - Running together, aggregated at the ends
• Single-lane road vs. multi-lane highway
  - Cost of infrastructure
  - Resilience to traffic jams and accidents
  - Complexity of interoperability

Multiprocessing In Python

What isn't Multiprocessing

• Serial Programs
  - One instruction at a time in one process
  - Instruction runs from start to finish
  - Crash ends the whole thing
  - Start over again
  - Fail and close
• Concurrent Programs
  - Multiple instructions at a time in one process
  - Instructions may 'sleep' to let another run
  - Only one instruction is being executed at any one time
• Decentralized Programs
  - Multiple computers on a network
  - Processes/instructions run across the network
  - Controlled by a central 'hub'
• Distributed Programs
  - Multiple computers on a network
  - Processes/instructions run across the network
  - Each computer operates independently


Python Multiprocessing Ideals

• Replace all loops with parallel iteration
• Replace all collections with iterators/generators
• Combine multiprocessing and concurrency
  - Parallel functions with concurrent instructions
• Fault tolerance
  - A failed process does not halt the application
  - Ability to 'try again' in parallel
• Throttled by input or 'mapping' function
  - Validate, send data to available CPUs
• Two forms of output
  - Discrete: returns individually processed units of data
  - Aggregated: 'reduces' combined units of data to a collection
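The fault-tolerance ideal can be sketched with the standard-library concurrent.futures module. This is only an illustration, not code from the demos: risky_square, run_with_retry and the repair argument are hypothetical names, and the retry policy (one retry with a repaired input) is an assumption.

```python
import concurrent.futures as cf


def risky_square(value):
    """Hypothetical unit of work that fails on negative input."""
    if value < 0:
        raise ValueError(f"cannot process {value}")
    return value * value


def run_with_retry(values, repair, executor):
    """Submit every value, then resubmit repaired copies of the failures once."""
    results, failed = {}, []
    futures = {executor.submit(risky_square, v): v for v in values}
    for future in cf.as_completed(futures):
        value = futures[future]
        try:
            results[value] = future.result()
        except ValueError:
            # One failed unit of work does not stop the others
            failed.append(value)
    # 'Try again' in parallel, this time with repaired inputs
    retries = {executor.submit(risky_square, repair(v)): v for v in failed}
    for future in cf.as_completed(retries):
        results[retries[future]] = future.result()
    return results


if __name__ == "__main__":
    # Worker processes need the __main__ guard on platforms that spawn them
    with cf.ProcessPoolExecutor(max_workers=2) as pool:
        print(run_with_retry([1, 2, -3, 4], repair=abs, executor=pool))
```

Passing the executor in as an argument keeps the retry logic independent of how the workers are created, so the same function can be exercised with any Executor.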


Python Modules

• threading
  - Don't use unless you have a very specific reason to do so
  - core developers
  - Global Interpreter Lock
  - Two threads controlled by a single python.exe cannot execute Python bytecode at the same time
• multiprocessing
  - Creates multiple python.exe instances
  - Not subject to GIL problem
  - Operating system deals with threading of python.exe
• subprocess
  - Use to launch non-python.exe processes
  - Serial or parallel
  - Callback allows subprocess to run in parallel
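A minimal sketch of running external processes side by side with subprocess. It starts each process with Popen and collects the output afterwards, rather than using a callback; launch and collect are hypothetical helper names, and extra Python interpreters stand in for the external programs only so that the example runs anywhere.

```python
import subprocess
import sys


def launch(commands):
    """Start every command without waiting, so the child processes overlap."""
    return [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]


def collect(processes):
    """Wait for each child process and return its stripped standard output."""
    outputs = []
    for proc in processes:
        out, _ = proc.communicate()
        outputs.append(out.strip())
    return outputs


if __name__ == "__main__":
    # Stand-in external programs: interpreters running one-line scripts
    commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in (2, 3, 4)]
    print(collect(launch(commands)))
```

Calling subprocess.run on each command in a loop would give the serial form instead, because run waits for each process to finish before the next one starts.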


Python Modules (cont'd)

• twisted
  - 'Twists' threads from threading for you
  - Open source Python package
  - Designed to handle I/O concurrency
• asyncio
  - Pure Python package
  - Designed to handle I/O concurrency
  - Brand new; fully accepted in Python 3.6.0
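A small asyncio sketch of I/O concurrency. The fetch coroutine and its task names are hypothetical, asyncio.sleep stands in for a real read or network request, and asyncio.run assumes Python 3.7 or later (on 3.6 the event loop would have to be driven explicitly).

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical I/O-bound task; the sleep stands in for a read or request."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs the coroutines concurrently on a single thread and
    # returns the results in the order the awaitables were passed in
    return await asyncio.gather(
        fetch("tiles", 0.2),
        fetch("features", 0.1),
        fetch("metadata", 0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

While one task is waiting, the event loop lets another one run, so the three waits overlap instead of adding up. No extra processes are involved, which is why this suits waiting on I/O rather than CPU-heavy work.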

Concurrency In Python

Multiprocessing In Action

Multiprocessing Demo

Subprocess In Action

Subprocessing Demo

Concurrency In Action

Concurrency Demo

Multiprocessing and Concurrency In GIS

Considerations

• "GIS is the Language of Geography" - Jack Dangermond, Esri UC 2004
• "Python is the Language of GIS" - Bill Moreland, Esri Dev Summit 2014
• GIS is inherently more complex than Python on its own
  - Combining programming with geography
• Challenges
  - Projected data
  - User interface considerations

Multiprocessing in GIS

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster in parallel (limited to local/focal/zonal operations)

Working with Rasters in Multiprocessing

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Serialized execution of a model workflow

[Diagram: as time proceeds, Worker-1 runs each step in turn: Elevation raster -> Slope; Elevation raster -> Aspect; Land use raster -> Reclassify; Weighted Sum -> Output suitability raster]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Parallelized execution of a model workflow

[Diagram: as time proceeds, Worker-1 (Elevation raster -> Slope), Worker-2 (Elevation raster -> Aspect) and Worker-3 (Land use raster -> Reclassify) run at the same time; Worker-4 then runs Weighted Sum -> Output suitability raster]
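The parallelized workflow can be sketched in pure Python as three independent tasks submitted at once, followed by a combining step. Everything here is a stand-in: fake_slope, fake_aspect, fake_reclassify, the class table and the weights are hypothetical placeholders operating on plain lists, not ArcPy tools, and the weighted sum runs in the calling process rather than on a fourth worker.

```python
import concurrent.futures as cf


def fake_slope(elevation):
    """Placeholder for Slope: absolute difference to the next cell."""
    shifted = elevation[1:] + elevation[-1:]
    return [abs(b - a) for a, b in zip(elevation, shifted)]


def fake_aspect(elevation):
    """Placeholder for Aspect: 1 where the next cell is higher, else 0."""
    shifted = elevation[1:] + elevation[-1:]
    return [1 if b > a else 0 for a, b in zip(elevation, shifted)]


def fake_reclassify(land_use):
    """Placeholder for Reclassify: map class names to scores."""
    table = {"forest": 1, "urban": 3}
    return [table[name] for name in land_use]


def weighted_sum(layers, weights):
    """Combine the layers cell by cell once all of them are available."""
    return [
        sum(w * layer[i] for layer, w in zip(layers, weights))
        for i in range(len(layers[0]))
    ]


def run_workflow(elevation, land_use, executor):
    """Run the three independent steps concurrently, then combine them."""
    futures = [
        executor.submit(fake_slope, elevation),
        executor.submit(fake_aspect, elevation),
        executor.submit(fake_reclassify, land_use),
    ]
    layers = [future.result() for future in futures]
    return weighted_sum(layers, weights=(0.5, 0.25, 0.25))


if __name__ == "__main__":
    with cf.ProcessPoolExecutor(max_workers=3) as pool:
        print(run_workflow([1, 3, 6], ["forest", "urban", "forest"], pool))
```

The combining step has to wait for all three inputs, so the total time is bounded by the slowest of the independent steps plus the weighted sum.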


Easy-to-parallelize raster operations

- Four types of raster operations
  - Local
  - Focal
  - Zonal
  - Global


[Figures: local raster operation; focal raster operation]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Tool executed serially on a large input dataset

[Diagram: Large elevation raster -> SquareRoot (math) tool on Worker-1 -> Output SquareRoot raster]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Tool executed in parallel on a large input dataset

[Diagram: Large elevation raster -> SquareRoot (math) tool running on Worker-1, Worker-2, Worker-3 and Worker-4 -> Output SquareRoot raster]
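A sketch of the chunk-and-merge pattern for a local operation, using nested lists in place of a real raster. The names split_rows, sqrt_chunk and parallel_sqrt are hypothetical, and splitting by whole rows is an assumption; the demos may chunk the raster differently.

```python
import math
from multiprocessing import Pool


def split_rows(grid, n_chunks):
    """Split a 2-D grid (list of rows) into at most n_chunks row blocks."""
    size = max(1, math.ceil(len(grid) / n_chunks))
    return [grid[i:i + size] for i in range(0, len(grid), size)]


def sqrt_chunk(chunk):
    """Local operation: each output cell depends only on the same input cell."""
    return [[math.sqrt(cell) for cell in row] for row in chunk]


def parallel_sqrt(grid, n_chunks, map_fn=map):
    """Apply sqrt_chunk to every block via map_fn, then stitch rows in order."""
    chunks = split_rows(grid, n_chunks)
    return [row for block in map_fn(sqrt_chunk, chunks) for row in block]


if __name__ == "__main__":
    grid = [[float(r * c) for c in range(4)] for r in range(16)]
    with Pool(processes=4) as pool:
        # pool.map keeps the chunk order, so the rows come back in place
        result = parallel_sqrt(grid, n_chunks=4, map_fn=pool.map)
    print(len(result), len(result[0]))
```

Because a local operation never looks at neighbouring cells, the blocks need no overlap. A focal operation would generally need each block padded with rows from its neighbours before processing, which this sketch does not attempt.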

Demo of multiprocessing with Rasters

Run a local raster operation in parallel on a large raster dataset


Raster analysis performance improvements with multiprocessing

Run a local raster operation in parallel on a large raster dataset

[Chart: time to execute (seconds, axis 0 to 500) against number of processes (axis 0 to 9). Execute tool in parallel on 57344 x 57344 cells. Methods: focal function in parallel, 16 chunks created; local function in parallel, 16 chunks created]

Working with Vectors in Multiprocessing

• Vectors are discrete units of data
  - Concept aligns easily with multiprocessing
  - Each unit of data can be mapped to an independent process
  - To aggregate return results
    - Independent processes return results to a collection
    - Minimal amount of functionality on aggregation step
    - Collection can be returned to calling process or yielded to chained process
    - Example: for each point in a collection, project the point, add to array, return array as line
  - To return discrete results
    - Independent processes yield results to calling process or chained process
    - Example: for each location-enabled tweet in a collection, geocode the coordinates, return the address
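The two output styles for vector data can be sketched with multiprocessing.Pool. This is an illustration only: project_point is a hypothetical stand-in (a shift and scale, not a real map projection), and aggregate_line and discrete_results are made-up helper names.

```python
from multiprocessing import Pool


def project_point(point):
    """Hypothetical 'projection': shift and scale an (x, y) pair."""
    x, y = point
    return (x * 2.0 + 10.0, y * 2.0 - 5.0)


def aggregate_line(points, map_fn=map):
    """Aggregated output: project every point, return one ordered line."""
    return list(map_fn(project_point, points))


def discrete_results(points, imap_fn=map):
    """Discrete output: yield each projected point as the mapper delivers it."""
    for projected in imap_fn(project_point, points):
        yield projected


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
    with Pool(processes=2) as pool:
        # Aggregated: pool.map preserves input order, which a line needs
        print(aggregate_line(points, pool.map))
        # Discrete: imap_unordered hands back results as workers finish
        for item in discrete_results(points, pool.imap_unordered):
            print(item)
```

Order matters for the aggregated case because the points form a line, so an order-preserving map is used there. The discrete case can take results in whatever order they complete.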


Example of multiprocessing with Vectors

Vectors in ArcPy


ArcPy Multiprocessing Best Practices

• Use "in_memory" workspace to store temporary results
• Avoid writing to FGDB data types or GRID raster data types. These data formats can often cause schema lock/synchronization issues
• Use ArcGIS Pro 1.4, ArcGIS Server 10.5, or ArcMap with ArcGIS for Desktop Background Geoprocessing (64-bit). Using 64-bit processing to perform analysis on systems with large amounts of RAM may help when processing large data which may have otherwise failed in a 32-bit environment

Looking ahead

• Dask - Parallelization on local machine or distributed
  - http://dask.pydata.org/en/latest/
• Asynchrony - Utilizing asyncio to concurrently read/write and process data
  - Python 3.6+
• Enhanced Interoperability with Services - Utilizing services is natural with multiprocessing
  - Deploy as packages or through desktop publishing workflow
  - Chaining services together through scripts


Resources

• Obtain sample scripts and data that you saw in the demos: https://github.com/nRajasekar92/DevSummit-2017
• Esri blog post on parallel geoprocessing: https://blogs.esri.com/esri/arcgis/2012/09/26/distributed-processing-with-arcgis-part-1
• Python 3.5 multiprocessing API: https://docs.python.org/3.5/library/multiprocessing.html

Page 4: Parallel Python: Multiprocessing With ArcPy€¦ · Python Modules •threading-Don’t use unless you have a very specific reason to do so-core developers-Global Interpreter Lock-Two

What is Multiprocessing

bull Modern Computers have multiple CPUs

bull Single CPU vs Multiple CPU

bull Program lsquoThreadsrsquo

- A sewing thread is interwoven fibers

- Twisted together tied at the ends

- A threaded application is interwoven processes

- Running together aggregated at the ends

bull Single lane road vs Multi-lane Highway

- Cost of infrastructure

- Resilience to traffic jams and accidents

- Complexity of Interoperability

Multiprocessing In Python

What isnrsquot Multiprocessing

bull Serial Programs

- One Instruction at a time in one process

- Instruction runs from start to finish

- Crash ends the whole thing

- Start over again

- Fail and close

bull Concurrent Programs

- Multiple instructions at a time in one

process

- Instructions may lsquosleeprsquo to let another run

- Only one instruction is being executed at any

one time

bull Decentralized Programs

- Multiple Computers on a network

- ProcessesInstructions run across the

network

- Controlled by a central lsquohubrsquo

bull Distributed Programs

- Multiple Computers on a network

- ProcessesInstructions run across the

network

- Each computer operates independantly

Multiprocessing In Python

Python Multiprocessing Ideals

bull Replace all loops with parallel iteration

bull Replace all collections with iteratorsgenerators

bull Combine Multiprocessing and Concurrency

- Parallel functions with concurrent instructions

bull Fault Tolerance

- A failed process does not halt the application

- Ability to lsquotry againrsquo in parallel

bull Throttled by input or lsquomappingrsquo function

- Validate send data to available CPUs

bull Two forms of output

- Discrete returns individually processed units of data

- Aggregated lsquoreducesrsquo combined units of data to a collection

Multiprocessing In Python

Python Modules

bull threading

- Donrsquot use unless you have a very specific reason to do so

- core developers

- Global Interpreter Lock

- Two threads controlled by a single pythonexe cannot run at the same time

bull multiprocessing

- Creates multiple pythonexe instances

- Not subject to GIL problem

- Operating System deals with threading of pythonexe

bull subprocess

- Use to launch non pythonexe processes

- Serial or Parallel

- Callback allows subprocess to run in parallel

Multiprocessing In Python

Python Modules conrsquot

bull twisted

- lsquoTwistsrsquo threads from threading for you

- Open source Python package

- Designed to handle IO concurrency

bull asyncio

- Pure Python Package

- Designed to handle IO concurrency

- Brand new Fully accepted in Python 360

Concurrency In Python

Multiprocessing In Action

Multiprocessing Demo

Subprocess In Action

Subprocessing Demo

Concurrency In Action

Concurrency Demo

Multiprocessing and

Concurrency In GIS

Considerations

bull ldquoGIS is the Language of Geographyrdquo ndash Jack Dangermond Esri UC 2004

bull ldquoPython is the Language of GISrdquo ndash Bill Moreland Esri Dev Summit 2014

bull GIS is inherently more complex than Python on its own

- Combining Programming with Geography

bull Challenges

- Projected Data

- User Interface Considerations

Multiprocessing in GIS

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster parallelly (limited to localfocalzonal operations)

Working with Rasters in Multiprocessing

Pleasingly parallel problems

Worker-1

SlopeWorker-1

Aspect Worker-1

Reclassify

Worker-1

Weighted

Sum

Elevation

Raster

Elevation

Raster

Land use

raster

Output

suitability

raster

Why is multiprocessing relevant to Geoprocessing workflows

Serialized execution of a model

workflow

As time proceedshellip

Pleasingly parallel problems

Parallelized execution of a model workflow

Why is multiprocessing relevant to Geoprocessing workflows

Elevation

RasterWorker-1 Slope

Elevation

RasterWorker-2 Aspect

Land use

RasterWorker-3 Reclassify

Worker-4Weighted Sum

Output

suitability

raster

As time proceedshellip

Working with Rasters in Multiprocessing

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster in parallel (limited to local/focal/zonal operations)

Easy-to-parallelize raster operations

- Four types of raster operations

- Local

- Focal

- Zonal

- Global

Multiprocessing in GIS

[Figures: local raster operation; focal raster operation]
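
The difference between the two operation types can be shown on a tiny grid. This is a plain-Python sketch, not ArcPy: local_sqrt and focal_mean are hypothetical helpers, and the 3 x 3 neighborhood is an assumption made for the example.

```python
import math


def local_sqrt(raster):
    """Local operation: each output cell depends only on the same input cell."""
    return [[math.sqrt(cell) for cell in row] for row in raster]


def focal_mean(raster):
    """Focal operation: each output cell is the mean of its 3 x 3 neighborhood.

    At the edges, positions outside the raster are simply left out.
    """
    rows, cols = len(raster), len(raster[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [
                raster[y][x]
                for y in range(max(0, i - 1), min(rows, i + 2))
                for x in range(max(0, j - 1), min(cols, j + 2))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


if __name__ == "__main__":
    grid = [[1, 4, 9], [16, 25, 36], [49, 64, 81]]
    print(local_sqrt(grid))
    print(focal_mean(grid))
```

A local operation can be cut into chunks anywhere. A focal operation needs each chunk to overlap its neighbors by the neighborhood radius (one cell for a 3 x 3 window), otherwise the cells along the cut see an incomplete neighborhood.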

Why is multiprocessing relevant to Geoprocessing workflows?

Pleasingly parallel problems

Tool executed serially on a large input dataset

[Diagram: a single worker (Worker-1) runs the SquareRoot (math) tool on the whole large elevation raster to produce the output SquareRoot raster.]

Why is multiprocessing relevant to Geoprocessing workflows?

Pleasingly parallel problems

Tool executed in parallel on a large input dataset

[Diagram: the large elevation raster is divided among four workers (Worker-1 to Worker-4); each runs the SquareRoot (math) tool on its part, and the parts are combined into the output SquareRoot raster.]
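
The split, process and merge pattern in this diagram can be sketched in plain Python. This is not the demo code and does not use ArcPy: the helpers split_rows, sqrt_chunk, merge_rows and parallel_sqrt are hypothetical names, and the raster is a nested list standing in for a real dataset.

```python
import math
import multiprocessing


def split_rows(raster, n_chunks):
    """Split a raster (list of rows) into at most n_chunks blocks of rows."""
    size = math.ceil(len(raster) / n_chunks)
    return [raster[i:i + size] for i in range(0, len(raster), size)]


def sqrt_chunk(chunk):
    """Apply the local square-root operation to one block of rows."""
    return [[math.sqrt(cell) for cell in row] for row in chunk]


def merge_rows(chunks):
    """Stitch processed blocks back together in their original order."""
    return [row for chunk in chunks for row in chunk]


def parallel_sqrt(raster, n_chunks=4, processes=4):
    """Run the square-root operation on row blocks in separate processes."""
    chunks = split_rows(raster, n_chunks)
    with multiprocessing.Pool(processes=processes) as pool:
        # map() keeps results in input order, so merging is a simple concat.
        done = pool.map(sqrt_chunk, chunks)
    return merge_rows(done)


if __name__ == "__main__":
    raster = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    result = parallel_sqrt(raster)
    print(len(result), len(result[0]))
```

Because square root is a local operation, the blocks need no overlap and the merge step is a plain concatenation.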

Demo of multiprocessing with Rasters

Run a local raster operation in parallel on a large raster dataset

Raster analysis performance improvements with multiprocessing

Run a local raster operation in parallel on a large raster dataset

[Chart: time to execute (seconds, axis 0 to 500) against number of processes (axis 0 to 9). Execute tool in parallel on 57344 × 57344 cells. Method: focal function in parallel, 16 chunks created.]

Raster analysis performance improvements with multiprocessing

Run a local raster operation in parallel on a large raster dataset

[Chart: time to execute (seconds, axis 0 to 500) against number of processes (axis 0 to 9). Execute tool in parallel on 57344 × 57344 cells. Method: focal function in parallel, 16 chunks created. Method: local function in parallel, 16 chunks created.]

Working with Vectors in Multiprocessing

• Vectors are discrete units of data

- Concept aligns easily with multiprocessing

- Each unit of data can be mapped to an independent process

- To return aggregated results:

- Independent processes return results to a collection

- Minimal amount of functionality in the aggregation step

- Collection can be returned to the calling process or yielded to a chained process

- Example – for each point in a collection: project the point, add it to an array, return the array as a line

- To return discrete results:

- Independent processes yield results to the calling process or a chained process

- Example – for each location-enabled tweet in a collection: geocode the coordinates, return the address
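
The two forms of output can be sketched with the standard multiprocessing module. This is not the ArcPy demo: project_point is a hypothetical stand-in for a real projection, and the function names aggregate_as_line and discrete_results are assumptions made for the example.

```python
import multiprocessing


def project_point(point):
    """Hypothetical stand-in for a projection: scale and shift x, y."""
    x, y = point
    return (x * 2.0 + 100.0, y * 2.0 + 200.0)


def aggregate_as_line(points, pool):
    """Aggregated result: collect every projected point into one line."""
    # map() returns a list in input order, so vertex order is preserved.
    return pool.map(project_point, points)


def discrete_results(points, pool):
    """Discrete results: yield each projected point as soon as it is ready."""
    # imap_unordered() hands results back in completion order.
    for projected in pool.imap_unordered(project_point, points):
        yield projected


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
    with multiprocessing.Pool(processes=2) as pool:
        print(aggregate_as_line(points, pool))
        for item in discrete_results(points, pool):
            print(item)
```

The aggregated form suits results whose order matters, such as the vertices of a line. The discrete form suits independent results, such as one address per tweet, that a chained process can consume as they arrive.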

Multiprocessing in GIS

Example of multiprocessing with Vectors

Vectors in ArcPy

ArcPy Multiprocessing Best Practices

• Use the “in_memory” workspace to store temporary results

• Avoid writing to FGDB data types or GRID raster data types. These data formats can often cause schema lock and synchronization issues

• Use ArcGIS Pro 1.4 OR ArcGIS Server 10.5 OR ArcMap with ArcGIS for Desktop Background Geoprocessing (64-bit). Using 64-bit processing to perform analysis on systems with large amounts of RAM may help when processing large data which may have otherwise failed in a 32-bit environment

Looking ahead

• Dask – Parallelization on local machine or distributed

- http://dask.pydata.org/en/latest

• Asynchrony – Utilizing asyncio to concurrently read/write and process data

- Python 3.6+

• Enhanced Interoperability with Services – Utilizing services is natural with multiprocessing

- Deploy as packages or through desktop publishing workflow

- Chaining Services together through scripts

Looking Ahead

Resources

• Obtain sample scripts and data that you saw in the demos - https://github.com/nRajasekar92/DevSummit-2017

• Esri blog post on parallel geoprocessing - https://blogs.esri.com/esri/arcgis/2012/09/26/distributed-processing-with-arcgis-part-1

• Python 3.5 multiprocessing API - https://docs.python.org/3.5/library/multiprocessing.html

Page 8: Parallel Python: Multiprocessing With ArcPy€¦ · Python Modules •threading-Don’t use unless you have a very specific reason to do so-core developers-Global Interpreter Lock-Two

Python Modules conrsquot

bull twisted

- lsquoTwistsrsquo threads from threading for you

- Open source Python package

- Designed to handle IO concurrency

bull asyncio

- Pure Python Package

- Designed to handle IO concurrency

- Brand new Fully accepted in Python 360

Concurrency In Python

Multiprocessing In Action

Multiprocessing Demo

Subprocess In Action

Subprocessing Demo

Concurrency In Action

Concurrency Demo

Multiprocessing and

Concurrency In GIS

Considerations

bull ldquoGIS is the Language of Geographyrdquo ndash Jack Dangermond Esri UC 2004

bull ldquoPython is the Language of GISrdquo ndash Bill Moreland Esri Dev Summit 2014

bull GIS is inherently more complex than Python on its own

- Combining Programming with Geography

bull Challenges

- Projected Data

- User Interface Considerations

Multiprocessing in GIS

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster parallelly (limited to localfocalzonal operations)

Working with Rasters in Multiprocessing

Pleasingly parallel problems

Worker-1

SlopeWorker-1

Aspect Worker-1

Reclassify

Worker-1

Weighted

Sum

Elevation

Raster

Elevation

Raster

Land use

raster

Output

suitability

raster

Why is multiprocessing relevant to Geoprocessing workflows

Serialized execution of a model

workflow

As time proceedshellip

Pleasingly parallel problems

Parallelized execution of a model workflow

Why is multiprocessing relevant to Geoprocessing workflows

Elevation

RasterWorker-1 Slope

Elevation

RasterWorker-2 Aspect

Land use

RasterWorker-3 Reclassify

Worker-4Weighted Sum

Output

suitability

raster

As time proceedshellip

Working with Rasters in Multiprocessing

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster parallelly (limited to localfocalzonal operations)

Easy-to-parallelize raster operations

- Four types of

raster

operations

- Local

- Focal

- Zonal

- Global

Multiprocessing in GIS

Local raster operation Focal raster operation

Pleasingly parallel problems

Tool executed serially on a large input

dataset

Why is multiprocessing relevant to Geoprocessing workflows

Large

Elevation

Raster

SquareRoot

(math) tool

Output

SquareRoot

raster

Worker-1

Pleasingly parallel problems

Why is multiprocessing relevant to Geoprocessing workflows

Tool executed parallelly on a large

input dataset

Large

Elevation

Raster

Output

SquareRoot

raster

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

Worker-1

Worker-2

Worker-3

Worker-4

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Raster analysis performance improvements with multiprocessing

Run a local raster operation parallelly on a large raster dataset

0

50

100

150

200

250

300

350

400

450

500

0 1 2 3 4 5 6 7 8 9

Tim

e t

o e

xe

cu

te (

se

co

nd

s)

Number of processes

Execute tool in parallel on 5734457344 cells Method Focal function in parallel 16 chunks created

Raster analysis performance improvements with multiprocessing

Run a local raster operation parallelly on a large raster dataset

0

50

100

150

200

250

300

350

400

450

500

0 1 2 3 4 5 6 7 8 9

Tim

e t

o e

xe

cu

te (

se

co

nd

s)

Number of processes

Execute tool in parallel on 5734457344 cells Method Focal function in parallel 16 chunks created Method Local function in parallel 16 chunks created

Working with Vectors in Multiprocessing

• Vectors are discrete units of data
- Concept aligns easily with multiprocessing
- Each unit of data can be mapped to an independent process
- To aggregate return results
  - Independent processes return results to a collection
  - Minimal amount of functionality on aggregation step
  - Collection can be returned to calling process or yielded to chained process
  - Example: for each point in a collection, project the point, add it to an array, return the array as a line
- To return discrete results
  - Independent processes yield results to calling process or chained process
  - Example: for each location-enabled tweet in a collection, geocode the coordinates, return the address
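The two vector patterns above (aggregate into a collection, or yield discrete results) might look like this in plain Python. This is a sketch under stated assumptions: `project_point` and `geocode` are hypothetical stand-ins for a real projection and a real geocoding service, and points are plain tuples rather than ArcPy geometries.

```python
from multiprocessing import Pool


def project_point(point):
    """Hypothetical stand-in for projecting one point: a fixed x/y offset."""
    x, y = point
    return (x + 100.0, y + 200.0)


def points_to_line(points, n_workers=4):
    """Aggregate pattern: project every point, collect them, return one line."""
    with Pool(processes=n_workers) as pool:
        # map() preserves input order, so the vertices stay in sequence.
        projected = pool.map(project_point, points)
    return projected  # the ordered vertices of the line


def geocode(tweet):
    """Hypothetical stand-in for geocoding one tweet's coordinates."""
    address = "address near ({:.2f}, {:.2f})".format(tweet["x"], tweet["y"])
    return tweet["id"], address


def addresses(tweets, n_workers=4):
    """Discrete pattern: yield each result as soon as a worker finishes it."""
    with Pool(processes=n_workers) as pool:
        # imap_unordered() hands back results in completion order.
        for result in pool.imap_unordered(geocode, tweets):
            yield result
```

The aggregate version uses `map` because vertex order matters for a line. The discrete version uses `imap_unordered` because each address is useful on its own and can be passed along as soon as it is ready.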

Multiprocessing in GIS

Example of multiprocessing with Vectors

Vectors in ArcPy


ArcPy Multiprocessing Best Practices

• Use the “in_memory” workspace to store temporary results

• Avoid writing to FGDB data types or GRID raster data types. These data formats can often cause schema lock or synchronization issues

• Use ArcGIS Pro 1.4, ArcGIS Server 10.5, or ArcMap with ArcGIS for Desktop Background Geoprocessing (64-bit). Using 64-bit processing to perform analysis on systems with large amounts of RAM may help when processing large data which may have otherwise failed in a 32-bit environment

Looking ahead

• Dask: parallelization on local machine or distributed
- http://dask.pydata.org/en/latest/

• Asynchrony: utilizing asyncio to concurrently read/write and process data
- Python 3.6+

• Enhanced interoperability with services: utilizing services is natural with multiprocessing
- Deploy as packages or through desktop publishing workflow
- Chaining services together through scripts
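To illustrate the asyncio item above, here is a minimal sketch of reading and processing several pieces of data concurrently. The names `read_tile`, `process_tile`, and `process_all` are hypothetical, and `asyncio.sleep` stands in for a real I/O wait such as a file or network read.

```python
import asyncio


async def read_tile(tile_id):
    """Stand-in for an I/O-bound read; the sleep yields control to other tasks."""
    await asyncio.sleep(0.01)
    return {"id": tile_id, "values": [tile_id, tile_id + 1]}


async def process_tile(tile_id):
    """Read one tile, then reduce its values to a single number."""
    tile = await read_tile(tile_id)
    return tile["id"], sum(tile["values"])


async def process_all(tile_ids):
    """Run every tile concurrently; gather returns results in input order."""
    return await asyncio.gather(*(process_tile(t) for t in tile_ids))
```

On Python 3.7 and later this can be driven with `asyncio.run(process_all([1, 2, 3]))`; on Python 3.6 the equivalent is `asyncio.get_event_loop().run_until_complete(...)`. All of it runs in one process, so it helps with waiting on I/O rather than with CPU-bound work.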

Resources

• Obtain sample scripts and data that you saw in the demos: https://github.com/nRajasekar92/DevSummit-2017

• Esri blog post on parallel geoprocessing: https://blogs.esri.com/esri/arcgis/2012/09/26/distributed-processing-with-arcgis-part-1

• Python 3.5 multiprocessing API: https://docs.python.org/3.5/library/multiprocessing.html


Multiprocessing In Action

Multiprocessing Demo

Subprocess In Action

Subprocessing Demo
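The demo code itself is not part of this transcript. As a stand-in, the sketch below shows the basic idea of the subprocess module: launching a separate interpreter process and collecting its output. `run_in_child` is a hypothetical helper written for this example.

```python
import subprocess
import sys


def run_in_child(expression):
    """Evaluate a Python expression in a separate interpreter process."""
    completed = subprocess.run(
        [sys.executable, "-c", "print({})".format(expression)],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the child's stdout as text
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()
```

Because the child is a completely separate process, a crash there does not take down the calling script; the caller only sees an exit code and whatever was written to stdout.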

Concurrency In Action

Concurrency Demo

Multiprocessing and Concurrency In GIS

Considerations

• “GIS is the Language of Geography” - Jack Dangermond, Esri UC 2004

• “Python is the Language of GIS” - Bill Moreland, Esri Dev Summit 2014

• GIS is inherently more complex than Python on its own
- Combining Programming with Geography

• Challenges
- Projected Data
- User Interface Considerations

Multiprocessing in GIS

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster in parallel (limited to local/focal/zonal operations)

Working with Rasters in Multiprocessing

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Serialized execution of a model workflow

[Diagram: as time proceeds, Worker-1 alone runs Slope on the elevation raster, Aspect on the elevation raster, Reclassify on the land use raster, and then Weighted Sum to produce the output suitability raster.]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Parallelized execution of a model workflow

[Diagram: as time proceeds, Worker-1 runs Slope on the elevation raster, Worker-2 runs Aspect on the elevation raster, and Worker-3 runs Reclassify on the land use raster; Worker-4 runs Weighted Sum to produce the output suitability raster.]
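The parallelized workflow in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the Spatial Analyst tools themselves: `slope`, `aspect`, `reclassify`, and `weighted_sum` are hypothetical stand-ins that work on flat lists of cell values, and the weights are arbitrary.

```python
from multiprocessing import Pool


def slope(elevation):
    """Hypothetical stand-in for the Slope tool."""
    return [cell // 10 for cell in elevation]


def aspect(elevation):
    """Hypothetical stand-in for the Aspect tool."""
    return [cell % 4 for cell in elevation]


def reclassify(land_use):
    """Hypothetical stand-in for Reclassify: map classes to scores."""
    scores = {"forest": 1, "urban": 3}
    return [scores[cell] for cell in land_use]


def weighted_sum(layers, weights):
    """Combine the layers cell by cell using one weight per layer."""
    return [sum(w * v for w, v in zip(weights, cells)) for cells in zip(*layers)]


def run_workflow(elevation, land_use, weights=(2, 1, 1)):
    with Pool(processes=3) as pool:
        # The three inputs to Weighted Sum do not depend on each other,
        # so they are submitted together and run in separate processes.
        pending = [
            pool.apply_async(slope, (elevation,)),
            pool.apply_async(aspect, (elevation,)),
            pool.apply_async(reclassify, (land_use,)),
        ]
        layers = [task.get() for task in pending]
    # Weighted Sum needs all three results, so it runs after they finish.
    return weighted_sum(layers, weights)
```

Only the independent steps are parallelized. The final step still has to wait for the slowest of the three, so the total time is bounded by the longest branch plus the Weighted Sum.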

Working with Rasters in Multiprocessing

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster in parallel (limited to local/focal/zonal operations)

Easy-to-parallelize raster operations

- Four types of raster operations
  - Local
  - Focal
  - Zonal
  - Global

Multiprocessing in GIS

[Figures: local raster operation; focal raster operation]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

Tool executed serially on a large input dataset

[Diagram: Worker-1 runs the SquareRoot (math) tool on the large elevation raster to produce the output SquareRoot raster.]

Page 13: Parallel Python: Multiprocessing With ArcPy€¦ · Python Modules •threading-Don’t use unless you have a very specific reason to do so-core developers-Global Interpreter Lock-Two

Considerations

bull ldquoGIS is the Language of Geographyrdquo ndash Jack Dangermond Esri UC 2004

bull ldquoPython is the Language of GISrdquo ndash Bill Moreland Esri Dev Summit 2014

bull GIS is inherently more complex than Python on its own

- Combining Programming with Geography

bull Challenges

- Projected Data

- User Interface Considerations

Multiprocessing in GIS

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster parallelly (limited to localfocalzonal operations)

Working with Rasters in Multiprocessing

Pleasingly parallel problems

Worker-1

SlopeWorker-1

Aspect Worker-1

Reclassify

Worker-1

Weighted

Sum

Elevation

Raster

Elevation

Raster

Land use

raster

Output

suitability

raster

Why is multiprocessing relevant to Geoprocessing workflows

Serialized execution of a model

workflow

As time proceedshellip

Pleasingly parallel problems

Parallelized execution of a model workflow

Why is multiprocessing relevant to Geoprocessing workflows

Elevation

RasterWorker-1 Slope

Elevation

RasterWorker-2 Aspect

Land use

RasterWorker-3 Reclassify

Worker-4Weighted Sum

Output

suitability

raster

As time proceedshellip

Working with Rasters in Multiprocessing

Candidate operations for parallelization

- Parallelize independent tasks within a workflow

- Process a large raster parallelly (limited to localfocalzonal operations)

Easy-to-parallelize raster operations

- Four types of

raster

operations

- Local

- Focal

- Zonal

- Global

Multiprocessing in GIS

Local raster operation Focal raster operation

Pleasingly parallel problems

Tool executed serially on a large input

dataset

Why is multiprocessing relevant to Geoprocessing workflows

Large

Elevation

Raster

SquareRoot

(math) tool

Output

SquareRoot

raster

Worker-1

Pleasingly parallel problems

Why is multiprocessing relevant to Geoprocessing workflows

Tool executed parallelly on a large

input dataset

Large

Elevation

Raster

Output

SquareRoot

raster

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

Worker-1

Worker-2

Worker-3

Worker-4

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Raster analysis performance improvements with multiprocessing

Run a local raster operation parallelly on a large raster dataset

0

50

100

150

200

250

300

350

400

450

500

0 1 2 3 4 5 6 7 8 9

Tim

e t

o e

xe

cu

te (

se

co

nd

s)

Number of processes

Execute tool in parallel on 5734457344 cells Method Focal function in parallel 16 chunks created

Raster analysis performance improvements with multiprocessing

Run a local raster operation parallelly on a large raster dataset

0

50

100

150

200

250

300

350

400

450

500

0 1 2 3 4 5 6 7 8 9

Tim

e t

o e

xe

cu

te (

se

co

nd

s)

Number of processes

Execute tool in parallel on 5734457344 cells Method Focal function in parallel 16 chunks created Method Local function in parallel 16 chunks created

Working with Vectors in Multiprocessing

bull Vectors are discrete units of data

- Concept aligns easily with multiprocessing

- Each unit of data can be mapped to an independent process

- To aggregate return results

- Independent processes return results to a collection

- Minimal amount of functionality on aggregation step

- Collection can be returned to calling process or yielded to chained process

- Example ndash for each point in a collection project the point add to array return array as line

- To return discrete results

- Independent processes yield results to calling process or chained process

- Example ndash for each location-enabled tweet in a collection geocode the coordinates return the

address

Multiprocessing in GIS

Example of multiprocessing with Vectors

Vectors in ArcPy

Example of multiprocessing with Vectors

Vectors in Arcpy

Example of multiprocessing with Vectors

Vectors in Arcpy

ArcPy Multiprocessing Best Practices

bull Use ldquoin_memoryrdquo workspace to store temporary results

bull Avoid writing to FGDB data types or GRID raster data types These data formats can

often cause schema lock synchronization issues

bull Use ArcGIS Pro 14 OR ArcGIS Server 105 OR ArcMAP with ArcGIS for Desktop-

Background Geoprocessing (64-bit) Using 64-bit processing to perform analysis on

systems with large amounts of RAM may help when processing large data which

may have otherwise failed in a 32-bit environment

Looking ahead

• Dask - Parallelization on local machine or distributed
  - http://dask.pydata.org/en/latest
• Asynchrony - Utilizing asyncio to concurrently read/write and process data
  - Python 3.6+
• Enhanced interoperability with services - Utilizing services is natural with multiprocessing
  - Deploy as packages or through desktop publishing workflow
  - Chaining services together through scripts
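As a rough idea of what the asyncio bullet means, the sketch below overlaps several simulated reads inside one process. `read_tile` is hypothetical, and `asyncio.sleep` stands in for waiting on disk or network. The example uses `asyncio.run`, which needs Python 3.7 or later; on 3.6 the same coroutine can be driven with `loop.run_until_complete`.

```python
import asyncio


async def read_tile(tile_id):
    """Hypothetical read step; sleep stands in for waiting on disk or network."""
    await asyncio.sleep(0.01)
    return tile_id, [tile_id] * 4


async def process_tile(tile_id):
    """Read one tile, then do a small amount of work on it."""
    _, values = await read_tile(tile_id)
    return tile_id, sum(values)


async def process_all(tile_ids):
    """Run all tiles concurrently; gather() returns results in input order."""
    return await asyncio.gather(*(process_tile(t) for t in tile_ids))


def run(tile_ids):
    """Drive the coroutine from ordinary synchronous code."""
    return asyncio.run(process_all(tile_ids))


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

This is concurrency rather than multiprocessing: only one instruction executes at a time, but the waits overlap, which is where read/write-heavy work tends to benefit.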


Resources

• Obtain sample scripts and data that you saw in the demos - https://github.com/nRajasekar92/DevSummit-2017
• Esri blog post on parallel geoprocessing - https://blogs.esri.com/esri/arcgis/2012/09/26/distributed-processing-with-arcgis-part-1
• Python 3.5 multiprocessing API - https://docs.python.org/3.5/library/multiprocessing.html


Working with Rasters in Multiprocessing

Candidate operations for parallelization
- Parallelize independent tasks within a workflow
- Process a large raster in parallel (limited to local/focal/zonal operations)

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

[Diagram: serialized execution of a model workflow. As time proceeds, Worker-1 runs Slope (elevation raster), Aspect (elevation raster), Reclassify (land use raster) and Weighted Sum one after another to produce the output suitability raster.]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

[Diagram: parallelized execution of a model workflow. As time proceeds, Worker-1 runs Slope on the elevation raster, Worker-2 runs Aspect on the elevation raster and Worker-3 runs Reclassify on the land use raster; Worker-4 then runs Weighted Sum to produce the output suitability raster.]
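The parallelized workflow can be sketched with `concurrent.futures.ProcessPoolExecutor`: the three independent steps are submitted together and the weighted sum waits for all of them. Everything here is a toy under stated assumptions. The `slope`, `aspect` and `reclassify` functions are simplified stand-ins on nested lists, not the Spatial Analyst tools, and the remap table and weights are invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def slope(elevation):
    """Stand-in for Slope: absolute difference to the next cell in the row."""
    return [[abs(row[min(i + 1, len(row) - 1)] - row[i]) for i in range(len(row))]
            for row in elevation]


def aspect(elevation):
    """Stand-in for Aspect: 1 where the surface rises toward the next cell, else 0."""
    return [[1 if row[min(i + 1, len(row) - 1)] > row[i] else 0
             for i in range(len(row))]
            for row in elevation]


def reclassify(land_use):
    """Stand-in for Reclassify: map land-use codes to suitability scores."""
    scores = {1: 10, 2: 5}  # hypothetical remap table
    return [[scores.get(code, 0) for code in row] for row in land_use]


def weighted_sum(rasters, weights):
    """Combine equally sized rasters cell by cell."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[sum(w * r[y][x] for r, w in zip(rasters, weights)) for x in range(cols)]
            for y in range(rows)]


def suitability(elevation, land_use, parallel=True):
    """Run the three independent steps (optionally in parallel), then combine."""
    if parallel:
        with ProcessPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(slope, elevation),
                       pool.submit(aspect, elevation),
                       pool.submit(reclassify, land_use)]
            layers = [f.result() for f in futures]  # blocks until each is done
    else:
        layers = [slope(elevation), aspect(elevation), reclassify(land_use)]
    return weighted_sum(layers, [0.5, 0.25, 0.25])  # illustrative weights


if __name__ == "__main__":
    print(suitability([[1, 3], [2, 2]], [[1, 2], [3, 1]]))
```

Only the first three steps run side by side; the weighted sum depends on all of their outputs, so it stays a serial final step.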


Easy-to-parallelize raster operations
- Four types of raster operations
  - Local
  - Focal
  - Zonal
  - Global

Multiprocessing in GIS

[Illustrations: local raster operation; focal raster operation]
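The difference between a local and a focal operation is easiest to see on a tiny grid. The sketch below uses plain nested lists rather than ArcPy rasters; the 3 x 3 mean is just one example of a focal function, chosen for illustration.

```python
import math


def local_sqrt(raster):
    """Local operation: each output cell depends only on the same input cell."""
    return [[math.sqrt(v) for v in row] for row in raster]


def focal_mean(raster):
    """Focal operation: each output cell is the mean of its 3 x 3 neighbourhood.

    Cells on the edge use only the neighbours that exist.
    """
    rows, cols = len(raster), len(raster[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [raster[j][i]
                      for j in range(max(0, y - 1), min(rows, y + 2))
                      for i in range(max(0, x - 1), min(cols, x + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


if __name__ == "__main__":
    grid = [[1, 4], [9, 16]]
    print(local_sqrt(grid))
    print(focal_mean(grid))
```

A local operation can be split anywhere without affecting the result. A focal operation reads neighbouring cells, so a chunk also needs the cells just outside its border.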

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

[Diagram: tool executed serially on a large input dataset. Worker-1 runs the SquareRoot (math) tool on the large elevation raster to produce the output SquareRoot raster.]

Pleasingly parallel problems

Why is multiprocessing relevant to geoprocessing workflows?

[Diagram: tool executed in parallel on a large input dataset. The large elevation raster is divided among Worker-1, Worker-2, Worker-3 and Worker-4, each running the SquareRoot (math) tool on its part; the results form the output SquareRoot raster.]
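The split, process and reassemble pattern in the diagram might look like the sketch below. It is an assumption-laden toy: the raster is a list of rows, the split is by contiguous row blocks, and `square_root_block` stands in for the SquareRoot tool. The session's demo scripts may chunk the data differently.

```python
import math
from multiprocessing import Pool


def square_root_block(block):
    """Worker: apply the local square-root operation to one block of rows."""
    return [[math.sqrt(v) for v in row] for row in block]


def split_rows(raster, n_blocks):
    """Split a raster (list of rows) into at most n_blocks contiguous row blocks."""
    size = max(1, math.ceil(len(raster) / n_blocks))
    return [raster[i:i + size] for i in range(0, len(raster), size)]


def parallel_square_root(raster, processes=4):
    """Process the row blocks in worker processes and stitch them back together."""
    blocks = split_rows(raster, processes)
    if processes <= 1:
        results = [square_root_block(b) for b in blocks]  # serial fallback
    else:
        with Pool(processes) as pool:
            results = pool.map(square_root_block, blocks)  # order is preserved
    # Mosaic: concatenate the row blocks back into a single raster.
    return [row for block in results for row in block]


if __name__ == "__main__":
    raster = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    out = parallel_square_root(raster, processes=4)
    print(len(out), "rows,", len(out[0]), "columns")
```

Because the square root is a local operation, the blocks need no overlap and the mosaic step is a plain concatenation.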

Demo of multiprocessing with Rasters

Run a local raster operation in parallel on a large raster dataset

Raster analysis performance improvements with multiprocessing

Run a local raster operation in parallel on a large raster dataset

[Chart: time to execute (seconds, axis 0 to 500) against number of processes (axis 0 to 9). Tool executed in parallel on 57344 x 57344 cells. Series: focal function in parallel, 16 chunks created; local function in parallel, 16 chunks created.]
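The chart reports 16 chunks for both the focal and the local run. The deck does not show how the chunks were laid out, so the sketch below assumes a 4 x 4 grid of tiles and a hypothetical `halo` parameter: extra cells read around each tile so that a focal window has its neighbours available. The 57344 x 57344 size is the reading of the slide's cell count used above.

```python
def tile_windows(n_rows, n_cols, tiles_per_side=4, halo=0):
    """Return (row_start, row_stop, col_start, col_stop) windows covering a raster.

    halo is the number of extra cells read on every side of a tile, clipped to
    the raster edge. A 3 x 3 focal window needs halo=1; a local operation needs 0.
    """
    windows = []
    for ty in range(tiles_per_side):
        r0 = ty * n_rows // tiles_per_side
        r1 = (ty + 1) * n_rows // tiles_per_side
        for tx in range(tiles_per_side):
            c0 = tx * n_cols // tiles_per_side
            c1 = (tx + 1) * n_cols // tiles_per_side
            windows.append((max(0, r0 - halo), min(n_rows, r1 + halo),
                            max(0, c0 - halo), min(n_cols, c1 + halo)))
    return windows


if __name__ == "__main__":
    local_tiles = tile_windows(57344, 57344)           # no overlap needed
    focal_tiles = tile_windows(57344, 57344, halo=1)   # one-cell overlap
    print(len(local_tiles), "tiles; first focal tile:", focal_tiles[0])
```

With a halo, each worker computes on the enlarged window and only the interior of the result is written back, so the overlapping cells are not written twice.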







Page 20: Parallel Python: Multiprocessing With ArcPy€¦ · Python Modules •threading-Don’t use unless you have a very specific reason to do so-core developers-Global Interpreter Lock-Two

Pleasingly parallel problems

Why is multiprocessing relevant to Geoprocessing workflows

Tool executed parallelly on a large

input dataset

Large

Elevation

Raster

Output

SquareRoot

raster

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

SquareRoot

(math) tool

Worker-1

Worker-2

Worker-3

Worker-4

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Demo of multiprocessing with Rasters

Run a local raster operation parallelly on a large raster dataset

Raster analysis performance improvements with multiprocessing

Run a local raster operation in parallel on a large raster dataset

[Chart: time to execute (seconds, axis from 0 to 500) against number of processes (axis from 0 to 9). Tool executed in parallel on 57344 × 57344 cells. Series: focal function in parallel, 16 chunks created; local function in parallel, 16 chunks created.]

Working with Vectors in Multiprocessing

• Vectors are discrete units of data

- Concept aligns easily with multiprocessing

- Each unit of data can be mapped to an independent process

- To aggregate returned results:

  - Independent processes return results to a collection

  - Minimal amount of functionality in the aggregation step

  - Collection can be returned to the calling process or yielded to a chained process

  - Example: for each point in a collection, project the point, add it to an array, and return the array as a line

- To return discrete results:

  - Independent processes yield results to the calling process or a chained process

  - Example: for each location-enabled tweet in a collection, geocode the coordinates and return the address
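
Both patterns above can be sketched with the standard `multiprocessing` module. This is an illustrative sketch rather than the presenters' demo: `project_point` is a hypothetical stand-in for a real projection (it only scales and shifts coordinates), and plain tuples stand in for ArcPy geometry objects. `Pool.map` collects every result into one ordered list (the aggregate pattern), while `Pool.imap_unordered` hands each result back as soon as it is ready (the discrete pattern).

```python
import multiprocessing


def project_point(point, offset=(100.0, 200.0), scale=2.0):
    """Hypothetical stand-in for a projection: scale, then shift, one (x, y) point."""
    x, y = point
    return (x * scale + offset[0], y * scale + offset[1])


def points_to_line(points, processes=2):
    """Aggregate pattern: project every point, return the ordered vertices of a line."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(project_point, points)


def iter_projected(points, processes=2):
    """Discrete pattern: yield each projected point as soon as a worker finishes it."""
    with multiprocessing.Pool(processes=processes) as pool:
        for result in pool.imap_unordered(project_point, points):
            yield result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
    print("line vertices:", points_to_line(pts))
    for projected in iter_projected(pts):
        print("ready:", projected)
```

The aggregate version preserves input order, which matters when the vertices form a line. The discrete version gives up ordering in exchange for handing results to the caller, or to a chained step, without waiting for the slowest item.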

Multiprocessing in GIS

Example of multiprocessing with Vectors

Vectors in ArcPy

ArcPy Multiprocessing Best Practices

• Use the "in_memory" workspace to store temporary results

• Avoid writing to FGDB data types or GRID raster data types. These data formats can often cause schema lock or synchronization issues

• Use ArcGIS Pro 1.4, ArcGIS Server 10.5, or ArcMap with ArcGIS for Desktop Background Geoprocessing (64-bit). Using 64-bit processing to perform analysis on systems with large amounts of RAM may help when processing large data that may otherwise have failed in a 32-bit environment
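
One way to act on the advice about lock contention is to give every worker its own output location and let the parent process do the merging. The sketch below shows that idea with plain text files in a temporary directory. It is a stand-in built on assumptions, not ArcPy code: `process_chunk`, `merge_outputs`, and `run` are hypothetical names, and doubling each value is a placeholder for real analysis. In an ArcPy script the per-worker outputs would be uniquely named datasets rather than text files.

```python
import multiprocessing
import os
import tempfile


def process_chunk(task):
    """Worker: write results for one chunk to a file that no other worker touches."""
    chunk_id, values, scratch_dir = task
    out_path = os.path.join(scratch_dir, "chunk_{}.txt".format(chunk_id))
    with open(out_path, "w") as handle:
        for value in values:
            # Placeholder computation standing in for a geoprocessing step.
            handle.write("{}\n".format(value * 2))
    return out_path


def merge_outputs(paths):
    """Parent: read the per-worker files back in order and combine them."""
    merged = []
    for path in paths:
        with open(path) as handle:
            merged.extend(int(line) for line in handle)
    return merged


def run(chunks, processes=2):
    """Give every chunk its own output path, then merge in the parent process."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        tasks = [(i, chunk, scratch_dir) for i, chunk in enumerate(chunks)]
        with multiprocessing.Pool(processes=processes) as pool:
            paths = pool.map(process_chunk, tasks)
        # Merge before the temporary directory is cleaned up.
        return merge_outputs(paths)


if __name__ == "__main__":
    print(run([[1, 2], [3, 4], [5, 6]]))
```

Because only the parent writes the combined result, workers never compete for the same output.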

Looking ahead

• Dask: parallelization on a local machine or distributed

- http://dask.pydata.org/en/latest/

• Asynchrony: utilizing asyncio to concurrently read/write and process data

- Python 3.6+

• Enhanced interoperability with services: utilizing services is natural with multiprocessing

- Deploy as packages or through the desktop publishing workflow

- Chaining services together through scripts
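
As a rough illustration of the asyncio idea above, the sketch below overlaps simulated reads with processing. It is a sketch under stated assumptions: `read_tile`, `process_tile`, and `main` are hypothetical names, `asyncio.sleep` stands in for real I/O, and `asyncio.run` needs Python 3.7 or later (on 3.6 the same coroutine can be driven with `loop.run_until_complete`).

```python
import asyncio


async def read_tile(tile_id, delay=0.01):
    """Simulate an I/O-bound read; awaiting lets other reads proceed meanwhile."""
    await asyncio.sleep(delay)
    return list(range(tile_id, tile_id + 3))


async def process_tile(tile_id):
    """Read one tile, then do a small amount of processing on it."""
    values = await read_tile(tile_id)
    return tile_id, sum(values)


async def main(tile_ids):
    """Run all tiles concurrently; gather returns results in input order."""
    return await asyncio.gather(*(process_tile(t) for t in tile_ids))


if __name__ == "__main__":
    print(asyncio.run(main([0, 10, 20])))
```

All of this runs in one process, so it helps when the work is dominated by waiting on reads and writes. CPU-heavy steps still need separate processes to run at the same time.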
