Spring Batch
Christopher Jeffers
August 2012
Agenda
• Intro to Spring Batch and Use-Cases
• Spring Batch Technical Explanation
– Architecture
– The Batch Job
– Skipping and Retrying Steps
– Scaling Features
• Spring Batch Evaluation
– Solving Use-Cases
– Benefits
– Issues
– Integration Options
– Future Steps
Spring Batch Overview
• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems
• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary
• Provides reusable functions that are essential in processing large volumes of data
• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs
Batch Use-Cases
• DataRoomBatch
– Physically delete all rows marked for deletion from a given bucket (DeepSix)
– Rerun user documents through publishing workflow
– Proactive auditing of the environment
• Public Records Batch Processing
– User inputs a file with search criteria for many individuals; the program searches the database for changes in information and returns a report of hits to the user
– Read, Process, and Write sequence
– Satisfies Government and Corporate requirements
Reason for Spring Batch POC
• Current batch system for public records is not powerful enough to handle very large requests
• Have had to turn away customers because of this
• A more powerful and flexible batch solution could solve this problem
Architecture
• Layered architecture
• The application layer contains all batch jobs and custom code
• Batch Core contains runtime classes necessary to launch and control a batch job
• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework
http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html
The Batch Job
• A Job entity encapsulates an entire batch process
• A Job is composed of Steps, each of which encapsulates one phase of the batch job
– A Step can be as simple or as complex as the developer wants
http://static.springsource.org/spring-batch/reference/html/domain.html
Chunk Processing
• Typical Spring Batch Step
– Read, Process, Write sequence
• Multiple items are read and processed before being written together as a “chunk”
– The chunk size is declared in the configuration (commit-interval)
http://static.springsource.org/spring-batch/reference/html/configureStep.html
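The read, process, write sequence with a commit interval can be sketched in plain Java. This is only an illustration of the chunking idea, not Spring Batch's own implementation: the class ChunkSketch, the method runStep, and the use of an Iterator and a Function as stand-ins for a reader and a processor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkSketch {
    /**
     * Hypothetical stand-in for a chunk-oriented step: items are read and
     * processed one at a time, then handed to the "writer" one chunk at a time.
     * The returned list records each chunk that would have been written.
     */
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next())); // read + process one item
            if (chunk.size() == commitInterval) {
                written.add(chunk);                    // write the full chunk at once
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                        // final, possibly shorter, chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e");
        List<List<String>> out = runStep(input.iterator(), s -> s.toUpperCase(), 2);
        System.out.println(out); // [[A, B], [C, D], [E]]
    }
}
```

With five items and a commit interval of 2, the writer is called three times, the last time with a single item.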
Step Flow
• Steps can be configured to flow sequentially or conditionally
– Conditional flow makes more complex jobs possible
http://static.springsource.org/spring-batch/reference/html/configureStep.html
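Conditional flow can be pictured as a transition table keyed by each step's exit status. The sketch below is plain Java and does not use Spring Batch classes; the names StepFlowSketch and runJob, the status strings, and the "*" catch-all key are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class StepFlowSketch {
    /**
     * Runs steps starting at 'start'. After each step, its exit status is
     * looked up in that step's transition map to choose the next step.
     * "*" is used here as a catch-all; no matching entry ends the job.
     * Returns the names of the steps in the order they were executed.
     */
    static List<String> runJob(String start,
                               Map<String, Supplier<String>> steps,
                               Map<String, Map<String, String>> transitions) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            executed.add(current);
            String exitStatus = steps.get(current).get();
            Map<String, String> next = transitions.getOrDefault(current, Map.of());
            current = next.containsKey(exitStatus) ? next.get(exitStatus) : next.get("*");
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> steps = Map.of(
                "load", () -> "FAILED",
                "process", () -> "COMPLETED",
                "cleanup", () -> "COMPLETED");
        Map<String, Map<String, String>> transitions = Map.of(
                "load", Map.of("FAILED", "cleanup", "*", "process"));
        System.out.println(runJob("load", steps, transitions)); // [load, cleanup]
    }
}
```

A purely sequential job is the special case where every step has only a catch-all transition to its successor.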
Job Repository
• The JobRepository performs CRUD operations on the meta-data relating to Job and Step execution
– Examples: Job Parameters, Job/Step status, etc.
http://static.springsource.org/spring-batch/reference/html/domain.html
Step Skipping
• If an exception listed in the configuration is thrown, the failing item is skipped rather than the batch execution being stopped
• Used for exceptions that will be thrown on every attempt of the Step
– FileNotFoundException, parse exceptions, etc.
• SkipListener can be used to log skipped items
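The skipping behaviour can be sketched in plain Java as a loop that tolerates a configured exception type up to a limit. This is an illustration under assumptions, not Spring Batch code: SkipSketch, Result, run, and the skipLimit parameter are hypothetical names, and the skipped list only plays the role a SkipListener would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SkipSketch {
    /** Holds what was written and which raw items were skipped. */
    static final class Result {
        final List<Integer> written = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /**
     * Processes each item; if the processor throws an exception of the
     * skippable type and the skip limit is not yet used up, the item is
     * recorded as skipped and processing continues. Any other exception,
     * or exceeding the limit, fails the whole run.
     */
    static Result run(List<String> items,
                      Function<String, Integer> processor,
                      Class<? extends RuntimeException> skippable,
                      int skipLimit) {
        Result result = new Result();
        for (String item : items) {
            try {
                result.written.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (!skippable.isInstance(e) || result.skipped.size() >= skipLimit) {
                    throw e; // not skippable, or too many skips: stop the batch
                }
                result.skipped.add(item); // a listener would log this item
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = run(List.of("1", "x", "3"), Integer::parseInt,
                       NumberFormatException.class, 1);
        System.out.println("written=" + r.written + " skipped=" + r.skipped);
    }
}
```

Here "x" cannot be parsed no matter how often it is tried, which is the kind of deterministic failure that skipping is meant for.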
Retrying Steps
• If an exception listed in the configuration is thrown, the operation is attempted again
• Used for exceptions that may not be thrown on every attempt of the Step
– ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.
• A limit can be set on the number of retries
• RetryListener can be used to log retried items
• RetryTemplate can be used to further customize retry logic
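The retry idea above can be sketched as a bounded loop in plain Java. This does not use RetryTemplate or any Spring class; RetrySketch, withRetry, and maxAttempts (counted here as the total number of attempts, including the first) are assumptions for the example, and the println stands in for what a RetryListener might log.

```java
import java.util.function.Supplier;

class RetrySketch {
    /**
     * Calls the operation until it succeeds. A failure is retried only if
     * the exception is of the retryable type and fewer than maxAttempts
     * attempts have been made; otherwise the exception is rethrown.
     */
    static <T> T withRetry(Supplier<T> operation,
                           Class<? extends RuntimeException> retryable,
                           int maxAttempts) {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e) || attempt >= maxAttempts) {
                    throw e; // not retryable, or attempts exhausted
                }
                System.out.println("attempt " + attempt + " failed, retrying");
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated transient failure: the first two calls fail, the third succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, IllegalStateException.class, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The simulated failure goes away on the third call, which mirrors the transient errors (such as a lost deadlock) that make retrying worthwhile.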
Scaling Features (Single Process)
• Multi-Threaded Jobs or Steps
– Using Spring’s TaskExecutor object
• Parallel Steps
– Using split flows and a TaskExecutor in the Job configuration
http://static.springsource.org/spring-batch/reference/html/scalability.html
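Running independent steps side by side within one process can be illustrated with the JDK's ExecutorService, used here as a stand-in for Spring's TaskExecutor. The class ParallelStepsSketch, the method runInParallel, and the status strings are hypothetical; a real split flow is declared in the Job configuration rather than coded this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelStepsSketch {
    /**
     * Submits every step to a fixed-size thread pool, waits for all of them,
     * and returns their results in the same order as the input list.
     */
    static List<String> runInParallel(List<Callable<String>> steps, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> statuses = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(steps)) { // blocks until all finish
                statuses.add(future.get());
            }
            return statuses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for steps", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a step failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Callable<String> stepA = () -> "stepA:COMPLETED";
        Callable<String> stepB = () -> "stepB:COMPLETED";
        System.out.println(runInParallel(List.of(stepA, stepB), 2));
    }
}
```

The job continues past the split only once every branch has finished, which is why the sketch waits on all futures before returning.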
Scaling Features (Multi-Process)
• Remote Chunking
– Splits Step processing across multiple processes, using some middleware to communicate
http://static.springsource.org/spring-batch/reference/html/scalability.html
Scaling Features (Multi-Process)
• Step Partitioning
– Splits input and executes remote steps in parallel
– PartitionHandler sends StepExecution requests to remote steps
– Partitioner generates the input for new step executions
http://static.springsource.org/spring-batch/reference/html/scalability.html
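The input-splitting half of partitioning can be sketched as dividing a row range into near-equal slices, one per step execution. This is a plain-Java illustration: RangePartitionSketch and its partition method are hypothetical, and each slice is returned as a simple [from, to) pair rather than the per-partition execution context a Spring Batch Partitioner would produce.

```java
import java.util.ArrayList;
import java.util.List;

class RangePartitionSketch {
    /**
     * Splits rows [0, rowCount) into at most gridSize contiguous ranges.
     * Each range is a two-element array {fromInclusive, toExclusive};
     * sizes differ by at most one row, and empty ranges are omitted.
     */
    static List<long[]> partition(long rowCount, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = rowCount / gridSize;       // minimum rows per partition
        long remainder = rowCount % gridSize;  // first 'remainder' partitions get one extra
        long from = 0;
        for (int i = 0; i < gridSize && from < rowCount; i++) {
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {from, from + size});
            from += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : partition(10, 3)) {
            System.out.println("[" + range[0] + ", " + range[1] + ")");
        }
        // prints [0, 4) then [4, 7) then [7, 10)
    }
}
```

Each range would then be handed to a separate step execution, locally or on a remote machine, which reads only its own rows.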
Job Flow with Client/Server and Partitioning
Solving the Use-Cases
• DataRoomBatch (DeepSix Example)
– Bucket is input to JdbcCursorItemReader
– Create an Item Processor to check if the row is marked for deletion and delete it if so
– Item Writer could be empty or used to output statistics
– Partitioning is easily done by dividing the rows evenly among the partitions
Solving the Use-Cases
• Public Records Batch Processing
– Input file is input to FlatFileItemReader
– Custom Item Processor to search the database for hits
– Custom Item Writer to compile report of search results
– A following Step sends the report to the user
– Easy to implement a Partitioner for the input file
Benefits of Spring Batch
• Part of Spring Framework
– Allows easy integration with other Spring features
– General simplicity offered by Spring
• Step flow customizable
• Basic Item Readers and Writers already available
• Features available for monitoring Jobs and Steps
• Many scaling options available
Issues with Spring Batch
• No built-in scheduler
– Not a big issue; scheduler libraries are easily integrated
• Potentially a lot of XML configuration
– Business logic spread across Java and XML files can complicate debugging and maintenance
– Annotations can help
• Anything but very basic components will need to be created as new classes
Helpful Integration Options
• Spring Batch Admin
– Web-based administration console
– Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs
• Scheduler (cron, Spring Scheduling, Quartz)
• Clustering Framework (Hadoop, GridGain, Terracotta)
– Ideal for improving horizontal scaling
– Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop
Future Steps
• Set up Spring Batch in a clustered environment
– Evaluate performance
– Figure out dynamic load balancing
• Experiment with more features and integration options
– Spring Batch Admin, manual job restarting, etc.
• Integrate Spring Batch Admin into the Cobalt GUI?
• Look more into the information stored in Meta-data database and figure out how to use for monitoring/managing jobs
• Look into Partitioning and how much must be done to implement sending partitions off to remote machines
• Look into job/step timeout
Questions?