Jpylyzer, a validation and feature extraction tool developed in SCAPE project
![Page 1: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/1.jpg)
SCAPE
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
1 Koninklijke Bibliotheek – National Library of the Netherlands
2 Open Planets Foundation
3 The British Library
IS&T Archiving 2012, Copenhagen, 15.6.2012
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
![Page 2: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/2.jpg)
Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
TIFF → JP2: 146 TB, migrate by end 2012
![Page 3: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/3.jpg)
JP2 from JISC 1 Newspaper Collection (BL)
![Page 4: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/4.jpg)
“Well-formed and valid”
JP2 from JISC 1 Newspaper Collection (BL)
![Page 5: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/5.jpg)
Hardware failure may result in corrupted images
Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
![Page 6: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/6.jpg)
Not all encoders produce standard-compliant images
![Page 7: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/7.jpg)
Possible solutions
- Option 1: improve the JPEG 2000 module of JHOVE. But: no institutional support, superseded by JHOVE2 (?)
- Option 2: develop a JPEG 2000 module for JHOVE2. But: not ready for operational use (yet)
- Option 3: develop a dedicated tool
![Page 8: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/8.jpg)
Jpylyzer tool
![Page 9: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/9.jpg)
Jpylyzer tool
- First prototype: December 2011
- Refactoring of original code: Jan 2012
- Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
- Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)
![Page 10: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/10.jpg)
JP2 file
- JPEG 2000 Signature box
- File Type box
- JP2 Header box (superbox)
- Contiguous Codestream box 0
- Contiguous Codestream box n
- IPR box
- XML box(es)
- UUID box(es)
- UUID Info box(es) (superbox)
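To make the box layout concrete, here is a minimal Python sketch that walks the top-level boxes of a JP2 byte string. It relies only on the generic box header defined in JPEG 2000 Part 1: a 4-byte big-endian length, a 4-byte type, and an optional 8-byte extended length. The names `iter_boxes` and `make_box` and the synthetic sample data are illustrative assumptions. This is not jpylyzer's own code, and it does not validate box contents.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (box type, box contents) for each top-level box in `data`."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        # Every box starts with a 4-byte big-endian length and a 4-byte type.
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if length == 1:
            # Length 1: the real length follows as an 8-byte integer.
            if offset + 16 > len(data):
                raise ValueError("truncated extended length")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif length == 0:
            # Length 0: the box runs to the end of the file.
            length = len(data) - offset
        if length < header or offset + length > len(data):
            raise ValueError("box length out of range")
        yield box_type.decode("latin-1"), data[offset + header:offset + length]
        offset += length


def make_box(box_type: bytes, contents: bytes) -> bytes:
    """Build a box with a plain 4-byte length field (helper for the sample)."""
    return struct.pack(">I4s", 8 + len(contents), box_type) + contents


# Synthetic sample: the box order is realistic, the contents are mostly stubs
# (for instance, a real JP2 Header box is never empty).
sample = (
    make_box(b"jP  ", b"\r\n\x87\n")                      # JPEG 2000 Signature box
    + make_box(b"ftyp", b"jp2 " + b"\x00" * 4 + b"jp2 ")  # File Type box
    + make_box(b"jp2h", b"")                              # JP2 Header box (stub)
    + make_box(b"jp2c", b"\xff\x4f")                      # Contiguous Codestream box (stub)
)
box_types = [box_type for box_type, _ in iter_boxes(sample)]
print(box_types)
```

A validator has to go further than this: it must also check the order and number of boxes, descend into superboxes, and parse the codestream itself.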
![Page 11: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/11.jpg)
Command-line use
![Page 12: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/12.jpg)
Result
![Page 13: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/13.jpg)
Properties extraction (excerpt)
![Page 14: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/14.jpg)
Properties of embedded ICC profile
![Page 15: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/15.jpg)
Documentation
![Page 16: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/16.jpg)
Example 1: detection of broken JP2s in JISC 1 Newspapers
- Number of images: 2,152,116
- Total size: 45 TB
- Average image size: 21.8 MB
- Number of threads: 1
- Time: 21 days*
- Images/day/thread: 100,000
- TB/day/thread: 2
*Includes unzipping; the actual time needed by jpylyzer is much less!
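The per-thread throughput figures in Example 1 follow directly from the totals. The short calculation below reproduces them from the numbers on the slide. The variable names are illustrative, and the slide evidently rounds the results.

```python
# Figures taken from the Example 1 slide.
images = 2_152_116
total_tb = 45
days = 21
threads = 1

# Throughput per day and per thread.
images_per_day = images / days / threads
tb_per_day = total_tb / days / threads

print(round(images_per_day), round(tb_per_day, 2))
```

Both values round to the slide's 100,000 images and 2 TB per day per thread. Since the 21 days include unzipping, they are a lower bound on what jpylyzer alone achieves.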
![Page 17: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/17.jpg)
Results
- 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
![Page 18: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/18.jpg)
Example 2: quality control Metamorfoze migration
TIFF → JP2: 146 TB, migrate by end 2012
![Page 19: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/19.jpg)
TIFF → Aware JP2K SDK → JP2
- Jpylyzer*: valid JP2? (no: fail)
- Compare image properties against properties profile: properties match? (no: fail)
- Pixel compare: pixels identical? (no: fail)
- Yes to all three: pass
*Imported as module in Python-based workflow
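The pass/fail logic of this workflow could be sketched as follows. This is only an illustration under stated assumptions: the function `quality_control`, the order in which the three checks run, and the stub checks are all hypothetical, and the real workflow calls jpylyzer, a properties comparison and a pixel comparison where the stubs appear here.

```python
from typing import Callable, List, Tuple

# A check is a question from the flowchart plus a function that answers it.
Check = Tuple[str, Callable[[], bool]]


def quality_control(checks: List[Check]) -> Tuple[str, str]:
    """Return ("pass", "") or ("fail", first question answered with no)."""
    for question, answer in checks:
        if not answer():
            return "fail", question
    return "pass", ""


# Stub checks standing in for jpylyzer, the properties comparison
# and the pixel comparison (hypothetical, for illustration only).
good_image = [
    ("valid JP2?", lambda: True),
    ("properties match?", lambda: True),
    ("pixels identical?", lambda: True),
]
bad_image = [
    ("valid JP2?", lambda: True),
    ("properties match?", lambda: False),
    ("pixels identical?", lambda: True),
]

result_good = quality_control(good_image)
result_bad = quality_control(bad_image)
print(result_good, result_bad)
```

Stopping at the first failed check keeps the expensive pixel comparison from running on files that are already known to be rejected, provided the cheaper checks are listed first.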
![Page 20: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/20.jpg)
Example 3: pre-ingest quality control Wellcome Library
- JP2s produced in-house and by external suppliers
- Use jpylyzer to validate against JP2 spec
- Use extracted properties to validate against a profile (progression order, compression ratio, layers, ...)
- Profile coded as XML schema (so jpylyzer output can be validated against the schema)
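The idea of checking extracted properties against a profile can be illustrated with a small Python sketch. Note that it swaps in a plain dictionary comparison for the XML schema described above, to stay within the standard library. The property names (`progressionOrder`, `layers`, `compressionRatio`) and all values are hypothetical and are not taken from jpylyzer's output or from the Wellcome Library's profile.

```python
from typing import Any, Dict, List


def check_profile(properties: Dict[str, Any], profile: Dict[str, Any]) -> List[str]:
    """Return a list of mismatches; an empty list means the profile is met."""
    problems = []
    for name, expected in profile.items():
        if name not in properties:
            problems.append(f"{name}: missing")
        elif properties[name] != expected:
            problems.append(
                f"{name}: expected {expected!r}, found {properties[name]!r}"
            )
    return problems


# Hypothetical profile and hypothetical extracted properties.
profile = {"progressionOrder": "RPCL", "layers": 8}
conforming = {"progressionOrder": "RPCL", "layers": 8, "compressionRatio": 20.0}
deviating = {"progressionOrder": "LRCP", "layers": 8}

problems_conforming = check_profile(conforming, profile)
problems_deviating = check_profile(deviating, profile)
print(problems_conforming, problems_deviating)
```

Properties that the profile does not mention are ignored, so the same extracted output can be checked against several profiles. An XML schema has the advantage that it can also express ranges and structural constraints.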
![Page 21: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/21.jpg)
Platforms and licensing stuff
![Page 22: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/22.jpg)
http://www.openplanetsfoundation.org/software/jpylyzer
![Page 23: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/23.jpg)
Community involvement
![Page 24: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/24.jpg)
Acknowledgements
Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)
Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
![Page 25: Jpylyzer, a validation and feature extraction tool developed in SCAPE project](https://reader033.fdocuments.net/reader033/viewer/2022061219/54833ce9b07959380c8b49c8/html5/thumbnails/25.jpg)
#SCAPEProject
http://www.scape-project.eu
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).