US GPO AIP Independence Test
![Page 1: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/1.jpg)
US GPO AIP Independence Test
CS 496A – Senior Design
Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong
Faculty advisor: Dr. Russ Abbott
GPO contact: Kate Zwaard
![Page 2: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/2.jpg)
Overview
Background
  OAIS
  FDsys
AIP
  METS, MODS, and PREMIS
Project Objectives
Solution Strategy
  XML parsing
  A note on deliverables
  Repositories
  Testing
Conclusion
![Page 3: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/3.jpg)
OAIS: Open Archival Information System
“An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community”
Developed by the Consultative Committee for Space Data Systems (ISO 14721:2003)
![Page 4: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/4.jpg)
FDsys: Federal Digital System
FDsys – An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.
![Page 5: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/5.jpg)
OAIS Primary Functions
Ingest – Turn SIPs into AIPs
Archival Storage – Storage and retrieval of AIPs
Data Management – Populating, maintaining and accessing the varieties of information
Administration – Controls day-to-day operations
Preservation Planning – Maintaining archive accessibility
Access – Functions for access of archive
![Page 6: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/6.jpg)
Information Package: critical component of OAIS
The information package is a conceptual linking of content information with its preservation description and packaging information.
Three kinds of information packages (submitted for ingest, stored in the archive, and delivered on access)
SIP – Submission Information Package
AIP – Archival Information Package
DIP – Dissemination Information Package
![Page 7: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/7.jpg)
AIP
Archival Information Package
What is AIP?
METS
MODS
PREMIS
![Page 8: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/8.jpg)
Project Objectives:
Prove AIP Independence
Improve GPO's file system.
![Page 9: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/9.jpg)
AIP: METS
Understanding METS
Schema
File format
Seven major sections
![Page 10: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/10.jpg)
AIP: METS Schema
5 of the 7 Major Sections
METS Header
Descriptive Metadata
Administrative Metadata
File Section
Structural Map
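The sections named on this slide correspond to the METS elements `metsHdr`, `dmdSec`, `amdSec`, `fileSec`, and `structMap`. As a rough sketch only, the Java below lists which top-level METS sections appear in a document. The class name `MetsSectionLister` and the skeleton document in `main` are hypothetical illustrations, not FDsys data, and the skeleton is not meant to be schema-valid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reports the top-level METS sections found in a document.
final class MetsSectionLister {

    // Namespace URI of the METS schema.
    static final String METS_NS = "http://www.loc.gov/METS/";

    /** Returns the local names of the root element's direct METS children, in document order. */
    static List<String> topLevelSections(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so getNamespaceURI/getLocalName work
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(metsXml)));

        List<String> sections = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && METS_NS.equals(child.getNamespaceURI())) {
                sections.add(child.getLocalName());
            }
        }
        return sections;
    }

    public static void main(String[] args) throws Exception {
        // Made-up skeleton for illustration only; real AIP METS files carry far more detail.
        String skeleton =
                "<mets xmlns=\"http://www.loc.gov/METS/\">"
              + "<metsHdr/><dmdSec/><amdSec/><fileSec/><structMap/>"
              + "</mets>";
        System.out.println(topLevelSections(skeleton));
    }
}
```

Run against a real AIP, a listing like this would be one quick way to see which METS sections a given package actually uses before deciding what to carry into a new SIP.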
![Page 11: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/11.jpg)
AIP: MODS
Descriptive metadata
Extension to METS
Top-level elements
Mandatory
Recommended
Optional
![Page 12: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/12.jpg)
AIP: MODS
![Page 13: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/13.jpg)
AIP: PREMIS
Preservation metadata
Extension to METS
PREMIS Data Model
Intellectual Entity
Object Entity
Event Entity
Agent Entity
Rights Entity*
![Page 14: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/14.jpg)
AIP: PREMIS
![Page 15: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/15.jpg)
Solution Strategy
The data we have received are AIPs, not SIPs. Repository software can only ingest SIPs. We must therefore write scripts that parse the AIPs and construct SIPs from an arbitrary file structure, and then ingest those SIPs into repository software to create new AIPs for the same information.
![Page 16: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/16.jpg)
XML Parsing
We plan to use the Java programming language for our scripting needs.
The Java API for XML Processing (JAXP) is the standard Java library for parsing XML.
It provides several different representations for XML, including in-memory DOM trees and streaming SAX events.
After being rendered human-readable, the AIP files will need to be converted into a new SIP schema of our own design, which would describe only the information that still appears relevant.
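JAXP can hand the same XML to a program in more than one form. As a minimal sketch of the streaming (SAX) form, the Java below walks a document and records each element name as it is encountered, without building a tree in memory. The class name `SaxElementLister` and the sample input are assumptions made for illustration, not project code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists element names using JAXP's SAX (event-based) parser.
final class SaxElementLister {

    /** Streams through the XML and returns element names in document order. */
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName); // called once per opening tag
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up input for illustration.
        String xml = "<record><title>Sample</title><date>2010-10-06</date></record>";
        System.out.println(elementNames(xml)); // prints [record, title, date]
    }
}
```

A streaming parser like this may suit a first survey of large AIP files, while a DOM tree is usually more convenient when elements need to be looked up or rearranged.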
![Page 17: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/17.jpg)
XML Parsing Example
This is a portion of a sample FDsys MODS file that summarizes a bill in Congress:

    <extension>
      <collectionCode>BILLS</collectionCode>
      <searchTitle>To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)</searchTitle>
      <category>Bills and Statutes</category>
      <waisDatabaseName>111_cong_bills</waisDatabaseName>
      <branch>legislative</branch>
      <dateIngested>2010-10-06</dateIngested>
    </extension>
![Page 18: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/18.jpg)
XML Parsing Example
We might expect this type of output once properly parsed:
<extension>
Collection code: “BILLS”
Search title: “To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)”
Category: “Bills and Statutes”
WAIS database name: “111_cong_bills”
Branch: legislative
Date ingested: 2010-10-06
</extension>
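One way such output might be produced is sketched below using JAXP's DOM parser. This is an illustration under stated assumptions, not the team's actual script: the class name `ExtensionPrinter` and the label table are hypothetical, the elements are treated as un-namespaced as they appear on the slide, and values are printed without surrounding quotation marks.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns an <extension> fragment into "Label: value" lines.
final class ExtensionPrinter {

    // Assumed mapping from element names to human-readable labels.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("collectionCode", "Collection code");
        LABELS.put("searchTitle", "Search title");
        LABELS.put("category", "Category");
        LABELS.put("waisDatabaseName", "WAIS database name");
        LABELS.put("branch", "Branch");
        LABELS.put("dateIngested", "Date ingested");
    }

    /** Returns one "Label: value" line for each recognized child of the root element. */
    static String describe(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        StringBuilder out = new StringBuilder();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            String label = LABELS.get(child.getNodeName());
            if (label == null) {
                continue; // skip elements with no known label
            }
            out.append(label).append(": ")
               .append(child.getTextContent().trim()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the slide's sample, for illustration.
        String xml = "<extension>"
                + "<collectionCode>BILLS</collectionCode>"
                + "<category>Bills and Statutes</category>"
                + "<waisDatabaseName>111_cong_bills</waisDatabaseName>"
                + "<branch>legislative</branch>"
                + "<dateIngested>2010-10-06</dateIngested>"
                + "</extension>";
        System.out.print(describe(xml));
    }
}
```

Keeping the label table separate from the parsing loop means new fields can be added without touching the traversal code.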
![Page 19: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/19.jpg)
A Note on Deliverables
Because our aim is not to design software, this is not a typical computer science design project. Instead, we are conducting coded experimental tests on real data and forming conclusions based on the results.
Deliverables will most likely include:
a written report of our findings and recommendations
a reorganized version of the input data
![Page 20: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/20.jpg)
Testing
After parsing and organizing the data, it will be important to perform checks to ensure that the reconstruction is accurate.
We may send a preliminary report to GPO for verification.
The exact testing procedure is still undefined, as we haven’t had a chance to investigate the data in depth yet.
Our goals should be clearer once we understand exactly what type of data we are dealing with.
![Page 21: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/21.jpg)
Repositories
Third-party repository software to ingest created SIPs.
DSpace, Fedora Commons (DuraSpace)
Based on a few simple technologies:
Java
MySQL
Apache Tomcat
JavaScript Server
![Page 22: US GPO AIP Independence Test](https://reader035.fdocuments.net/reader035/viewer/2022070404/56813b89550346895da4b63f/html5/thumbnails/22.jpg)
Conclusion
Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.