Continuous ETL Testing for Pentaho Data Integration (kettle)
Uploaded by slawomir-chodnicki
![Page 2: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/2.jpg)
The sample project
.
|-- bin # entry point scripts
|-- environments # environment configuration
|-- etl # ETL solution
`-- spec # tests and helpers
Twineworks GmbH | Helmholtzstr. 28 | 10587 Berlin | twineworks.com
![Page 3: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/3.jpg)
Test orchestration
$ bin/robot test
![Page 4: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/4.jpg)
Test orchestration
![Page 5: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/5.jpg)
Test orchestration
![Page 6: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/6.jpg)
Test orchestration
![Page 7: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/7.jpg)
Jenkins is a continuous integration server.
Its basic role is to run the test suite and build any artifacts upon changes
in version control.
Example server:
http://ci.pentaho.com/
![Page 8: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/8.jpg)
![Page 9: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/9.jpg)
![Page 10: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/10.jpg)
![Page 11: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/11.jpg)
![Page 12: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/12.jpg)
Testable solutions
![Page 13: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/13.jpg)
Configuration management
• configure all data sources/targets and paths through kettle variables or parameters
• local environment - not in version control
• test environment - reference environment
• production environment - optional
![Page 14: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/14.jpg)
Configuration management
environments
|-- local                      # dev environment – not in version control
|   |-- environment.sh         # shell environment variables
|   |-- my.cnf                 # database config file
|   `-- .kettle                # KETTLE_HOME
|       |-- shared.xml         # database connections
|       `-- kettle.properties  # kettle variables
|-- test                       # test environment – in version control
`-- production                 # other environments
![Page 15: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/15.jpg)
Environments are self-contained
• share nothing
• reproducible results:
– you can run it
– your team can run it
– ci-server can run it
![Page 16: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/16.jpg)
Configuration management
$ bin/robot spoon
![Page 17: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/17.jpg)
Configuration management
![Page 18: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/18.jpg)
Initialize db
$ bin/robot db reset
clearing database [ OK ]
initializing database
2017/10/20 15:32:37 - Kitchen - Start of run.
2017/10/20 15:32:37 - reset_dwh - Start of job execution
…
…
2017/10/20 15:32:41 - Kitchen - Finished!
2017/10/20 15:32:41 - Kitchen - Processing ended after 4 seconds.
[ OK ]
![Page 19: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/19.jpg)
Sub-systems/Phases
• Define sub-systems/phases
– define pre-requisites
• data expected in certain sources
– define outcomes
• data written to certain sinks
• A sub-system/phase of the ETL process is responsible for making a small set of related side-effects happen
![Page 20: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/20.jpg)
Sub-systems/Phases
![Page 21: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/21.jpg)
Sub-systems/Phases
![Page 22: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/22.jpg)
Entry points
• Define entry points with a full functional contract.
• An entry point implements an application feature.
![Page 23: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/23.jpg)
Entry points
![Page 24: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/24.jpg)
Testing ETL solutions
![Page 25: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/25.jpg)
Kinds of automated tests
• Computation tests
• Integration tests
• Functional tests
• Non-functional tests
![Page 26: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/26.jpg)
Computation tests
• Single unit of ETL under test
• Performs a computation (no side-effects)
• What is a “unit” in PDI?
– Job?
– Transformation?
– Sub-transformation (mapping)?
![Page 27: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/27.jpg)
A simple computation test
Test job
spec/dwh/validate_params/validate_params_spec.kjb
![Page 28: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/28.jpg)
A simple computation test
Test job
spec/dwh/validate_params/validate_params_spec.kjb
The job calls
etl/dwh/validate_params.kjb
- with DATA_DATE=2016-07-01 and expects it to succeed
- with DATA_DATE=1867-12-21 and expects it to fail
The job succeeds if all expectations are met. It fails otherwise.
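The expectations of such a test job can be pictured as a small table of inputs and expected outcomes. The following is a minimal sketch in plain Ruby, not the actual test job: `run_validate_params` is a hypothetical stand-in for invoking etl/dwh/validate_params.kjb, and the accepted date range is invented for illustration because the real validation rule is not given here.

```ruby
require "date"

# Hypothetical stand-in for running etl/dwh/validate_params.kjb.
# The accepted range below is an assumption made for this sketch only.
def run_validate_params(params)
  date = Date.strptime(params.fetch("DATA_DATE"), "%Y-%m-%d")
  date >= Date.new(2000, 1, 1) && date <= Date.new(2100, 1, 1)
rescue ArgumentError, KeyError
  false # an unparseable or missing date counts as a failed run
end

# Each expectation pairs job parameters with the expected outcome.
EXPECTATIONS = [
  { params: { "DATA_DATE" => "2016-07-01" }, succeeds: true },
  { params: { "DATA_DATE" => "1867-12-21" }, succeeds: false }
].freeze

# The test passes only if every expectation is met.
def all_expectations_met?(expectations)
  expectations.all? { |e| run_validate_params(e[:params]) == e[:succeeds] }
end

puts all_expectations_met?(EXPECTATIONS)
```

Keeping the cases in a table makes it cheap to add further boundary dates without touching the checking logic.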
![Page 29: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/29.jpg)
Test jobs
![Page 30: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/30.jpg)
Test transformation results
![Page 31: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/31.jpg)
Test transformation results
![Page 32: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/32.jpg)
Test sub-transformations
![Page 33: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/33.jpg)
Test sub-transformations
![Page 34: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/34.jpg)
Integration tests
![Page 35: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/35.jpg)
Integration tests
• ETL responsible for a set of related side-effects under test
• Most common case in ETL testing
– Test individual phases of a batch process
![Page 36: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/36.jpg)
Integration tests
![Page 37: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/37.jpg)
Integration tests
![Page 38: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/38.jpg)
Integration tests
![Page 39: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/39.jpg)
Functional testing
![Page 40: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/40.jpg)
Functional tests
• Entry point of ETL solution under test
• Assertions reflect invocation contract
– Behavior on happy path
– Behavior on errors
– Behavior on incorrect invocation
![Page 41: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/41.jpg)
Test the daily run
![Page 42: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/42.jpg)
Test the daily run
![Page 43: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/43.jpg)
Test the daily run
![Page 44: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/44.jpg)
Non-functional tests
![Page 45: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/45.jpg)
Non-functional tests
• Performance
– How long does workload x take?
• Stability
– What does it take to break it?
– How much memory is too little?
– What happens when loading unexpected data? (truncated file, column too long, 50MB XML in string field, badly formatted CSV reads as single field, empty files)
![Page 46: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/46.jpg)
Non-functional tests
• Security
– Verify configuration assumptions automatically
• Compliance
– We must use version x of library y
![Page 47: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/47.jpg)
Test compliance
![Page 48: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/48.jpg)
Scripting tests
![Page 49: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/49.jpg)
JRuby
JRuby is the Ruby language on the JVM
http://jruby.org
Maintained by Red Hat.
Runs Rails on JBoss
![Page 50: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/50.jpg)
RSpec
RSpec is a testing framework for Ruby
http://rspec.info/
https://relishapp.com/rspec
![Page 51: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/51.jpg)
RSpec
$ bin/robot test
• Includes helper files in spec/support
• Traverses the spec folder looking for files whose names end in _spec.rb and loads them as tests
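The discovery step can be illustrated with a recursive glob. This is only a sketch of the idea, not the code behind bin/robot or RSpec itself; `find_spec_files` and the sample file names are made up for the example.

```ruby
require "tmpdir"
require "fileutils"

# Returns all files below root whose names end in _spec.rb, sorted.
def find_spec_files(root)
  Dir.glob(File.join(root, "**", "*_spec.rb")).sort
end

# Build a throwaway spec folder so the example runs anywhere.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "support"))
  FileUtils.mkdir_p(File.join(root, "dwh"))
  FileUtils.touch(File.join(root, "support", "spec_helpers.rb")) # helper, not a test
  FileUtils.touch(File.join(root, "dwh", "clear_spec.rb"))       # picked up as a test

  puts find_spec_files(root).map { |f| f.sub("#{root}/", "") }
end
```

Only the file following the naming convention is listed; helper files are loaded separately and never treated as tests.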
![Page 52: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/52.jpg)
describe "dwh clear job" do
end
![Page 53: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/53.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
  end
end
![Page 54: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/54.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/load/load.kjb", {}
    end
  end
end
![Page 55: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/55.jpg)
RSpec helpers
spec/support/spec_helpers.rb
def dwh_db
  ...
end
Returns a JDBC database object.
Connects on demand and closes automatically when the test suite ends.
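One way to get "connect on demand, close at the end" is memoization combined with an exit hook. The sketch below is an assumption about how such a helper could be built, not the body of the real dwh_db; `FakeConnection` is a hypothetical stand-in for a JDBC connection so the example runs without a database.

```ruby
# Stand-in for a real JDBC connection; hypothetical, for illustration only.
class FakeConnection
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Connects on first use and reuses the same connection afterwards.
def dwh_db
  $dwh_db ||= begin
    connection = FakeConnection.new
    # Close the connection when the Ruby process (the test suite) exits.
    at_exit { connection.close }
    connection
  end
end

first = dwh_db
second = dwh_db
puts first.equal?(second)
```

Tests that never touch the database never pay for a connection, and no test needs its own teardown code.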
![Page 56: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/56.jpg)
RSpec helpers
spec/support/spec_helpers.rb
def dwh_db
  ...
end
In addition:
dwh_db.load_fixture(path) loads a SQL or JSON fixture file
dwh_db.reset() triggers $ bin/robot db reset
![Page 57: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/57.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
    end
  end
end
![Page 58: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/58.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/util/clear.kjb", {}
    end
  end
end
![Page 59: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/59.jpg)
RSpec helpers
spec/support/spec_helpers.rb
def run_job file, params
  ...
end
Runs a kettle job and returns a map {
  :successful? => true/false,
  :log => "log text",
  :result => [row1, row2, row3, …]
}
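The real helper's body is elided above. As a rough sketch of the same contract, a job could be launched through PDI's Kitchen command-line tool; this is an assumption, not the author's implementation. In particular, `kitchen.sh` being on the PATH, the `kitchen_args` helper, and the injectable `runner:` parameter are all invented for this example, and a command-line run cannot return result rows, so `:result` stays empty.

```ruby
require "open3"

# Builds Kitchen command-line arguments for a job file and its named parameters.
def kitchen_args(file, params)
  ["-file=#{file}"] + params.map { |name, value| "-param:#{name}=#{value}" }
end

# Default runner: executes the command and returns [combined output, success flag].
# "kitchen.sh" on the PATH is an assumption of this sketch.
SHELL_RUNNER = lambda do |args|
  output, status = Open3.capture2e("kitchen.sh", *args)
  [output, status.success?]
end

# Runs a job and returns a hash shaped like the map described above.
# Result rows are not available from the command line, so :result stays empty.
def run_job(file, params, runner: SHELL_RUNNER)
  log, ok = runner.call(kitchen_args(file, params))
  { :successful? => ok, :log => log, :result => [] }
end

# Usage with a fake runner, so the example runs without a PDI installation.
fake = ->(args) { ["ran #{args.join(' ')}", true] }
result = run_job("etl/dwh/util/clear.kjb", { "DATA_DATE" => "2016-07-01" }, runner: fake)
puts result[:successful?]
puts result[:log]
```

Injecting the runner keeps the helper itself testable: the specs for the helper can run without starting a JVM or touching a database.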
![Page 60: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/60.jpg)
![Page 61: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/61.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/util/clear.kjb", {}
    end
    it "completes successfully" do
      expect(@result[:successful?]).to be true
    end
  end
end
![Page 62: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/62.jpg)
describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/util/clear.kjb", {}
    end
    it "completes successfully" do
      expect(@result[:successful?]).to be true
    end
    it "clears the db" do
      expect(dwh_db.query("SHOW TABLES").to_a.length).to eq 0
    end
  end
end
![Page 63: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/63.jpg)
Test orchestration
$ bin/robot test
![Page 64: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/64.jpg)
Test orchestration
![Page 65: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/65.jpg)
Test orchestration
![Page 66: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/66.jpg)
Test orchestration
![Page 67: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/67.jpg)
Running jobs as RSpec tests
spec/etl/etl_spec.rb
Recursively traverses etl/spec looking for files whose names
end in _spec.kjb, and dynamically generates a describe and
it block for each one.
Hence all such job files are part of the test suite.
![Page 68: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/68.jpg)
describe "ETL" do
  Dir.glob("./**/*_spec.kjb").each do |path|
    describe "#{path}" do
      it "completes successfully" do
        @result = run_job path.to_s, {}
        expect(@result[:successful?]).to be true
      end
    end
  end
end
![Page 69: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/69.jpg)
RSpec – test orchestration
RSpec runs in two phases
Phase 1: collects tests, recording the structure as given by
the describe blocks.
Phase 2: filters found tests as per command line parameters
and executes them
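The two phases can be modeled in a few lines of plain Ruby. The class below is a toy illustration of collect-then-run, not RSpec's actual implementation; `TinySuite` and its methods are invented for this sketch.

```ruby
# Minimal model of collect-then-run; not RSpec's actual implementation.
class TinySuite
  Example = Struct.new(:full_name, :body)

  def initialize
    @groups = []   # names of the describe blocks currently open
    @examples = [] # every example found during collection
  end

  # Phase 1: evaluating describe blocks only records structure.
  def describe(name)
    @groups.push(name)
    yield
  ensure
    @groups.pop
  end

  # Registers an example under the currently open describe blocks.
  def it(name, &body)
    @examples << Example.new((@groups + [name]).join(" "), body)
  end

  # Phase 2: filter by name, then execute the remaining examples.
  def run(filter = nil)
    selected = @examples.select { |e| filter.nil? || e.full_name.include?(filter) }
    selected.map { |e| [e.full_name, e.body.call] }
  end
end

suite = TinySuite.new
suite.describe("dwh clear job") do
  suite.it("completes successfully") { true }
end
suite.describe("dwh load job") do
  suite.it("completes successfully") { true }
end

suite.run("clear").each { |name, passed| puts "#{name}: #{passed}" }
```

Because nothing executes during collection, code placed directly inside a describe block (such as a file glob that generates examples) runs before any test does, while code inside it blocks runs only if the example survives filtering.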
![Page 70: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/70.jpg)
RSpec – test orchestration
Run only tests containing the word ‘clear’ in their name or
enclosing describe blocks:
$ bin/robot test --example 'clear'
Run only tests tagged ‘long_running’:
$ bin/robot test --tag 'long_running'
Run only tests in spec/commands:
$ bin/robot test spec/commands
![Page 71: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/71.jpg)
Test orchestration
$ bin/robot test
![Page 72: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/72.jpg)
Test orchestration
![Page 73: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/73.jpg)
Test orchestration
![Page 74: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/74.jpg)
Test orchestration
![Page 75: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/75.jpg)
![Page 76: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/76.jpg)
Thank you!
![Page 77: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/77.jpg)
Backup Slides
![Page 78: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/78.jpg)
Testing in practice
![Page 79: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/79.jpg)
Test what you run
• Verify behavior of the entity you run directly
![Page 80: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/80.jpg)
Tools of the trade
• Helpers
– reusable utility code/ETL components that keep tests about the what, not about the how
– fixture loaders
– assertion helpers
– data comparison helpers
![Page 81: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/81.jpg)
Data Fixtures
• Data Fixtures
– sets of test data, encoded in a convenient way, easily loaded into data sources and sinks
• JSON, CSV, SQL, XML, YAML
– Use whatever is easiest to maintain for the team
• Generate data fixtures through parameterized scripts if you need to generate datasets with consistent relationships
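A parameterized generator might look like the sketch below. The table and column names (customers, orders, customer_id, order_id) are invented for illustration; the point is only that every generated order refers to a customer that exists, whatever sizes are requested.

```ruby
require "json"

# Generates customers and orders whose foreign keys always line up.
# Table and column names are invented for this sketch.
def generate_fixture(customer_count:, orders_per_customer:)
  customers = (1..customer_count).map do |id|
    { "customer_id" => id, "name" => "Customer #{id}" }
  end

  orders = customers.flat_map do |customer|
    (1..orders_per_customer).map do |n|
      {
        # Derive a unique order id from the customer id and the order index.
        "order_id" => (customer["customer_id"] - 1) * orders_per_customer + n,
        "customer_id" => customer["customer_id"]
      }
    end
  end

  { "customers" => customers, "orders" => orders }
end

puts JSON.pretty_generate(generate_fixture(customer_count: 2, orders_per_customer: 2))
```

The output can be written to a JSON fixture file and loaded with whatever fixture loader the test suite provides.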
![Page 82: Continuous ETL Testing for Pentaho Data Integration (kettle)](https://reader036.fdocuments.net/reader036/viewer/2022082213/5a6475ad7f8b9afc4d8b458d/html5/thumbnails/82.jpg)
File Fixtures
• File Fixtures
– sets of test files acted upon during a run
• Maintain file fixtures separate from source location expected by ETL
• If fixture files are changed as part of the test, copy them to a temporary location before running tests
• Create a unique source location per test run, if the file location is shared (like sftp)
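Copying fixtures to a temporary, per-run location can be done with Ruby's standard library. The helper name `with_fixture_copy` and the sample file are hypothetical; this is a minimal sketch of the idea rather than code from the sample project.

```ruby
require "tmpdir"
require "fileutils"

# Copies the fixture directory to a fresh temporary location, yields that
# location to the block, and removes it afterwards.
def with_fixture_copy(fixture_dir)
  Dir.mktmpdir("etl-fixtures-") do |tmp|
    FileUtils.cp_r(File.join(fixture_dir, "."), tmp)
    yield tmp
  end
end

# Demonstration: the "ETL" consumes the copied file, the original survives.
Dir.mktmpdir do |fixtures|
  File.write(File.join(fixtures, "orders.csv"), "id\n1\n")

  with_fixture_copy(fixtures) do |work_dir|
    File.delete(File.join(work_dir, "orders.csv"))
  end

  puts File.exist?(File.join(fixtures, "orders.csv"))
end
```

Because each call gets its own directory, the same approach also gives every test run a unique source location.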