File Management Chris A. Mattmann OODT Component Working Group.


File Management

Chris A. Mattmann

OODT Component Working Group

Apr 18, 2023 FILE-MGMT CAM-2

What is File Management?

• Managing the locations and ancillary information about files, and collections of files
    – Ancillary information is metadata

• What’s a product?
    – A collection of some set of files, and/or collections of files
        • So, you could have collections of other collections
    – Along with metadata about the product

The state of things

• The existing CAS system does file management
    – For past missions and projects, it’s done the job well

• CAS implementation
    – Needs an update, and overall refactoring to allow for modularity and separation of concerns, and general technology and architectural updates

• In particular, a couple of new requirements and drivers for projects
    – Suggested some ways to extend and improve the CAS to satisfy the new requirements and drivers

• What are these new requirements and drivers?

New Requirements and Drivers

• Persisting archived files using dynamic metadata and flexible, adaptable policies based on product types
    – rather than the monolithic and inflexible existing method of ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location to store products for all product types.

• Clearly separating out the Workflow aspects of the File Manager, from Product ingestion, and flexibly supporting association of Workflows and their subsequent Tasks with any event, not only ingestion.

New Requirements and Drivers

• Leverage existing transactional models such as Java's Transaction API to support transactional management rather than building our own API.

• If we do use any database communication, make sure that all DB communication is handled using standard, available, existing DB pooling APIs such as commons-dbcp, available from Apache.

New Requirements and Drivers

• Clearly separating out the administrative portions of policy management from the existing webapp, and distinguishing what pieces of the webapp are user-centric, and what are administrative-centric.

• Supporting hierarchical product structures, such as nested directories that contain many sub-directories, and sub-directories of those sub-directories, with files strewn about at all levels
    – rather than only supporting the existing method of flat product structures, where all files in a product are at the same tree level.
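
One way to picture a hierarchical product is as one reference per directory and file in the tree. The following is a minimal sketch, assuming references are plain file: URI strings; the names HierarchicalRefsSketch, collectReferences, and buildSampleTree are illustrative only and are not part of the CAS API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HierarchicalRefsSketch {

    /** Returns a file: URI for the root directory and for every entry below it. */
    static List<String> collectReferences(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.sorted()
                       .map(p -> p.toUri().toString())
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small nested tree in a temp directory (sample data only). */
    static Path buildSampleTree() {
        try {
            Path root = Files.createTempDirectory("product");
            Files.createDirectories(root.resolve("a").resolve("b"));
            Files.createFile(root.resolve("top.dat"));
            Files.createFile(root.resolve("a").resolve("b").resolve("deep.dat"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the root, both sub-directories, and both files.
        collectReferences(buildSampleTree()).forEach(System.out::println);
    }
}
```

A flat product is the special case where the walk finds files only at the top level, so the same traversal could serve both structures.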

New Requirements and Drivers

• Support metadata extraction based on product type or mime-type

• Support dynamic product types. The file management component should not need to know about every product type a priori
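
These two requirements could be met together with a lookup table of extractors plus a generic fallback, so that a type nobody registered in advance can still be ingested. This is only a sketch under assumptions: the class MetExtractorRegistrySketch, its methods, and the "Filename" key are hypothetical, and the mp3 extractor returns a placeholder instead of reading real tags.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class MetExtractorRegistrySketch {

    // Extractors keyed by mime type; each maps a file name to key/value metadata.
    private final Map<String, Function<String, Map<String, String>>> byMimeType = new HashMap<>();

    void register(String mimeType, Function<String, Map<String, String>> extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** Uses the registered extractor, or the generic one for unknown types. */
    Map<String, String> extract(String mimeType, String fileName) {
        Function<String, Map<String, String>> extractor =
                byMimeType.getOrDefault(mimeType, MetExtractorRegistrySketch::genericMetadata);
        return extractor.apply(fileName);
    }

    /** Generic fallback: records only the file name (the key name is an assumption). */
    static Map<String, String> genericMetadata(String fileName) {
        Map<String, String> met = new HashMap<>();
        met.put("Filename", fileName);
        return met;
    }

    public static void main(String[] args) {
        MetExtractorRegistrySketch registry = new MetExtractorRegistrySketch();
        registry.register("audio/mpeg", name -> {
            Map<String, String> met = genericMetadata(name);
            met.put("Mp3Genre", "unknown"); // placeholder; a real extractor would read the file
            return met;
        });
        System.out.println(registry.extract("audio/mpeg", "gangsta-rap.mp3"));
        System.out.println(registry.extract("application/x-unknown", "notes.txt"));
    }
}
```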

New Requirements and Drivers

• You can read/add to the list
    – Available at: http://oodt.jpl.nasa.gov/wiki/display/oodt/File+Management

• Please, speak your mind!

File Management: Architectural implications

• Managing files
    – Data Store: follow the typical repository pattern
    – Manage information about Products, Product Types, and References to products

• Managing metadata
    – Metadata Store: follow the typical registry pattern
    – Manage product Metadata
        • Key/Value pairs

• Separate out the data store and metadata store
    – This allows data and metadata to be managed independently

Data Store

DataStore «Interface»
    +addProduct(Product product):Product
    +addProductReferences(String productId, String productTypeId, List refs)
    +addProductType(ProductType productType)
    +modifyProduct(Product product):Product
    +modifyProductType(ProductType productType):ProductType
    +removeProduct(String productId)
    +removeProductType(String productTypeId)
    +getProductById(String productId):Product
    +getProductByName(String productName):Product
    +getProductReferences(String productId, String productTypeId):List
    +getProducts():List
    +getProductsByProductTypeId(String productTypeId):List
    +getProductsGroupedByProductType():Map
    +getProductTypeById(String productTypeId):ProductType
    +getProductTypeByName(java.lang.String productTypeName):ProductType
    +getProductTypes():List

DataStoreFactory «Interface»
    +createDataStore():DataStore

Product «Object»
    +getProductId():String
    +getProductName():String
    +getProductReferences():List
    +getProductStructure():String
    +getProductType():ProductType
    +getTransferStatus():String
    +setProductId(String productId)
    +setProductName(String productName)
    +setProductReferences(List references)
    +setProductStructure(String productStructure)
    +setProductType(ProductType productType)
    +setTransferStatus(String transferStatus)

ProductType «Object»
    +getDescription():String
    +getName():String
    +getProductRepositoryPath():String
    +getProductTypeId():String
    +getVersioner():String
    +setDescription(String description)
    +setName(String name)
    +setProductRepositoryPath(String Path)
    +setProductTypeId(String productTypeId)
    +setVersioner(String versioner)

Reference «Object»
    +getDataStoreReference():String
    +getOrigReference():String
    +setDataStoreReference(String dataStoreRef)
    +setOrigReference(String origReference)
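
To make the repository pattern concrete, here is a minimal in-memory sketch of a few of the DataStore operations listed above. It is not the CAS implementation: Product is trimmed to three fields, List is given a type parameter for convenience, sequential id assignment is an assumption, and the names DataStoreSketch and InMemoryDataStore are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataStoreSketch {

    /** Trimmed-down Product: only the id, name, and product type id. */
    static class Product {
        String productId;
        final String productName;
        final String productTypeId;

        Product(String productName, String productTypeId) {
            this.productName = productName;
            this.productTypeId = productTypeId;
        }
    }

    /** A subset of the DataStore operations from the slide. */
    interface DataStore {
        Product addProduct(Product product);
        void removeProduct(String productId);
        Product getProductById(String productId);
        List<Product> getProductsByProductTypeId(String productTypeId);
    }

    interface DataStoreFactory {
        DataStore createDataStore();
    }

    /** Map-backed implementation; ids are assigned sequentially (an assumption). */
    static class InMemoryDataStore implements DataStore {
        private final Map<String, Product> products = new LinkedHashMap<>();
        private int nextId = 1;

        public Product addProduct(Product product) {
            product.productId = String.valueOf(nextId++);
            products.put(product.productId, product);
            return product;
        }

        public void removeProduct(String productId) {
            products.remove(productId);
        }

        public Product getProductById(String productId) {
            return products.get(productId);
        }

        public List<Product> getProductsByProductTypeId(String productTypeId) {
            List<Product> matches = new ArrayList<>();
            for (Product p : products.values()) {
                if (p.productTypeId.equals(productTypeId)) {
                    matches.add(p);
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        // The factory hides which backend is created.
        DataStoreFactory factory = InMemoryDataStore::new;
        DataStore store = factory.createDataStore();
        Product added = store.addProduct(new Product("gangsta-rap", "Mp3"));
        System.out.println(added.productId + " -> " + store.getProductById(added.productId).productName);
    }
}
```

Because callers only see DataStore and DataStoreFactory, a database-backed store could be swapped in without touching them.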

Metadata Store

MetadataStore «Interface»
    +addMetadataElement(Element element):Element
    +addMetadataElementToProductType(String typeId, Element element)
    +modifyMetadataElement(Element element):Element
    +removeMetadataElement(String elementId)
    +removeMetadataElementFromProductType(String typeId, Element elem)
    +getMetadata(String productId, String productTypeId):Metadata
    +getMetadataElements():List
    +getMetadataElements(String productTypeId):List

MetadataStoreFactory «Interface»
    +createMetadataStore():MetadataStore

Element «Object»
    +getDescription():String
    +getElementId():String
    +getElementName():String
    +getProps():Properties
    +setDescription(String description)
    +setElementId(String elementId)
    +setElementName(String elementName)
    +setProps(Properties props)

Metadata «Object»
    +getElementMap():Map
    +setElementMap(Map elementMap)
    +toXML():org.w3c.dom.Document
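
The registry idea can be sketched as two maps: one that records which elements each product type defines, and one that holds the key/value metadata per product. This is an illustration under assumptions, not the CAS code: elements are reduced to plain names, addMetadata is a hypothetical write operation that does not appear on the slide, and rejecting unregistered keys is a design choice made only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MetadataStoreSketch {

    // Registry side: which element names are defined for each product type.
    private final Map<String, Set<String>> elementsByType = new HashMap<>();
    // Key/value metadata per product id.
    private final Map<String, Map<String, String>> metadataByProduct = new HashMap<>();

    /** Elements are plain names here; the slide's Element object carries more fields. */
    void addMetadataElementToProductType(String typeId, String elementName) {
        elementsByType.computeIfAbsent(typeId, k -> new LinkedHashSet<>()).add(elementName);
    }

    List<String> getMetadataElements(String productTypeId) {
        return new ArrayList<>(elementsByType.getOrDefault(productTypeId, Collections.emptySet()));
    }

    /** Hypothetical write operation (not on the slide): rejects unregistered keys. */
    void addMetadata(String productId, String productTypeId, Map<String, String> metadata) {
        Set<String> allowed = elementsByType.getOrDefault(productTypeId, Collections.emptySet());
        for (String key : metadata.keySet()) {
            if (!allowed.contains(key)) {
                throw new IllegalArgumentException("Unknown element for " + productTypeId + ": " + key);
            }
        }
        metadataByProduct.put(productId, new HashMap<>(metadata));
    }

    /** productTypeId mirrors the slide's signature; lookup here is by product id only. */
    Map<String, String> getMetadata(String productId, String productTypeId) {
        return metadataByProduct.getOrDefault(productId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        MetadataStoreSketch store = new MetadataStoreSketch();
        store.addMetadataElementToProductType("Mp3", "Mp3Artist");
        store.addMetadataElementToProductType("Mp3", "Mp3Genre");

        Map<String, String> met = new HashMap<>();
        met.put("Mp3Artist", "50cent");
        met.put("Mp3Genre", "rap");
        store.addMetadata("1", "Mp3", met);

        System.out.println(store.getMetadataElements("Mp3"));
        System.out.println(store.getMetadata("1", "Mp3").get("Mp3Genre"));
    }
}
```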

How is this different from the existing CAS?

• Separation of concerns
    – Anything to do with data goes into the data store package
    – Anything to do with metadata goes into the metadata store package

• Modularity
    – Can have different backend implementations of standard interfaces for data stores and metadata stores
        • Lucene as a backend for metadata, or if you prefer, traditional DB backend
    – Can have multiple data stores and metadata stores per CAS

• The existing CAS lumped these two capabilities together
    – Was difficult to reason about how to pull them apart

What else do we need to do File Management?

• Need a way to transfer a product from the client to the File Management service
    – Client gives URIs of files, or collections of files, which identify References belonging to a Product

Data Transfer Architecture

DataTransfer «Interface»
    +transferProduct(Product p)

DataTransferFactory «Interface»
    +createDataTransfer():DataTransfer

Transferring files

• How does the transfer actually occur?

• You as a developer define how that happens
    – Implement the transferProduct(Product p) method
    – Can have many different types of data transfer
        • Local
            – Use native system calls, or cp
        • Remote
            – Use whatever protocol you want, XML-RPC, SOAP, WebDAV, etc.
            – Don’t use CORBA or RMI: they’re sooooo last year!
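
For the local case, a transferProduct implementation could be as small as a loop that copies each reference from its original URI to its data store URI. The sketch below uses the JDK's java.nio.file classes so that it stays self-contained; that choice, the trimmed Product and Reference classes, and the runDemo helper are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;

class LocalTransferSketch {

    /** Pairs the client-side URI with the URI chosen inside the data store. */
    static class Reference {
        final String origReference;
        final String dataStoreReference;

        Reference(String origReference, String dataStoreReference) {
            this.origReference = origReference;
            this.dataStoreReference = dataStoreReference;
        }
    }

    static class Product {
        final List<Reference> references;

        Product(List<Reference> references) {
            this.references = references;
        }
    }

    interface DataTransfer {
        void transferProduct(Product p);
    }

    /** Copies every referenced file on the local filesystem. */
    static class LocalDataTransfer implements DataTransfer {
        public void transferProduct(Product p) {
            for (Reference ref : p.references) {
                Path source = Paths.get(URI.create(ref.origReference));
                Path target = Paths.get(URI.create(ref.dataStoreReference));
                try {
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
    }

    /** Transfers one temp file and returns the content found at the destination. */
    static String runDemo() {
        try {
            Path clientDir = Files.createTempDirectory("client");
            Path repoDir = Files.createTempDirectory("repo");
            Path source = clientDir.resolve("myfile.file");
            Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));
            Path target = repoDir.resolve("GenericFile").resolve("myfile.file");

            Reference ref = new Reference(source.toUri().toString(), target.toUri().toString());
            new LocalDataTransfer().transferProduct(new Product(Collections.singletonList(ref)));
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

A remote transfer would implement the same interface and replace the copy with a protocol call.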

Translating the URIs

• Translating the URIs from the client to the File Manager presents an interesting challenge
    – For example, where should file:///home/chris/myfile.file be transferred to on the File Manager’s system?

• Leverage and extend existing CAS method
    – Existing CAS would have answered the above question with ProductTypeRepositoryPath/ProductName/VersionId/
    – Why should that be the only answer?

Versioners

• Have the concept of a Versioner interface

• Versioner is called by the File Manager before the product is transferred from the client to the File Manager system
    – Versioner uses the Product metadata, and the original product references to generate data store URIs that tell the DataTransfer implementation where to physically transfer the files for a particular Product

Versioner Architecture

Versioner «Interface»
    +createDataStoreReferences(Product product, Metadata metadata)

ProductType «Object»
    +getDescription():String
    +getName():String
    +getProductRepositoryPath():String
    +getProductTypeId():String
    +getVersioner():String
    +setDescription(String description)
    +setName(String name)
    +setProductRepositoryPath(String Path)
    +setProductTypeId(String productTypeId)
    +setVersioner(String versioner)

Association: Versioner (1) to ProductType (1)

Versioner Example

• Given an mp3 Product, with Metadata:
    – Mp3Artist: 50cent
    – Mp3Genre: rap

• And with references:
    – file:///home/chris/mp3s/gangsta-rap.mp3

Versioner Example

• Use a MusicVersioner

public class MusicVersioner implements Versioner {

    public void createDataStoreReferences(Product p, Metadata m) throws VersioningException {
        // Only the first reference is versioned in this example.
        Reference ref = (Reference) p.getProductReferences().get(0);
        String origUri = ref.getOrigReference();

        // getRepoPath and getFileName are helper methods not shown on the slide.
        String mp3RepoPath = getRepoPath("Mp3ProductTypeName");

        String dataStoreUri = mp3RepoPath + m.getElementMap().get("Mp3Genre") + "/"
                + m.getElementMap().get("Mp3Artist") + "/" + getFileName(origUri);

        ref.setDataStoreReference(dataStoreUri);
    }
}

Versioner Example

• So
    – file:///home/chris/mp3s/gangsta-rap.mp3

• …Yields
    – file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
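
The mapping above can be checked with a stripped-down, runnable version of the music versioner. It is a sketch under assumptions: the Versioner signature is simplified to take a list of references and a plain metadata map (the slides pass Product and Metadata objects), the repository path file:///path/to/mp3/repo/ is taken from the example output and assumed to end with a slash, and the file name is taken to be everything after the last "/".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MusicVersionerSketch {

    static class Reference {
        private final String origReference;
        private String dataStoreReference;

        Reference(String origReference) {
            this.origReference = origReference;
        }

        String getOrigReference() { return origReference; }
        String getDataStoreReference() { return dataStoreReference; }
        void setDataStoreReference(String ref) { this.dataStoreReference = ref; }
    }

    /** Simplified signature; the slides use (Product, Metadata). */
    interface Versioner {
        void createDataStoreReferences(List<Reference> references, Map<String, String> metadata);
    }

    /** Lays files out as repoPath/genre/artist/fileName. */
    static class MusicVersioner implements Versioner {
        private final String repoPath; // assumed to end with "/"

        MusicVersioner(String repoPath) {
            this.repoPath = repoPath;
        }

        public void createDataStoreReferences(List<Reference> references, Map<String, String> metadata) {
            for (Reference ref : references) {
                String orig = ref.getOrigReference();
                String fileName = orig.substring(orig.lastIndexOf('/') + 1);
                ref.setDataStoreReference(repoPath + metadata.get("Mp3Genre") + "/"
                        + metadata.get("Mp3Artist") + "/" + fileName);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Mp3Artist", "50cent");
        metadata.put("Mp3Genre", "rap");

        List<Reference> refs = new ArrayList<>();
        refs.add(new Reference("file:///home/chris/mp3s/gangsta-rap.mp3"));

        new MusicVersioner("file:///path/to/mp3/repo/").createDataStoreReferences(refs, metadata);
        // prints file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
        System.out.println(refs.get(0).getDataStoreReference());
    }
}
```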

The File Manager

• So, how do we put all these different generic interfaces together?

• Well, something like the following
    – A File Manager has…
        • One or more data stores, to store data to
        • One or more metadata stores, to store metadata to
        • A set of Versioners that are associated with Product Types in order to figure out how to generate the reference data store URIs for a particular product
        • A Data Transferer that moves a Product’s file from the client to the File Manager using the source URIs and the data store URIs
        • An external interface to it (e.g., XML-RPC, WebDAV, etc.)
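
How these pieces might cooperate during an ingest can be sketched in a few lines. Calling the Versioner before the transfer follows the Versioners slide; everything else is an assumption made for illustration: the single-reference ingest method, the simplified Versioner and DataTransfer signatures, the plain maps standing in for the data store and metadata store, and cataloging after the transfer. The external interface is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FileManagerSketch {

    /** Simplified: one original reference in, one data store reference out. */
    interface Versioner {
        String createDataStoreReference(String origReference, Map<String, String> metadata);
    }

    interface DataTransfer {
        void transfer(String origReference, String dataStoreReference);
    }

    private final Map<String, Versioner> versionersByProductType = new HashMap<>();
    private final Map<String, String> dataStore = new HashMap<>();                  // product name -> data store URI
    private final Map<String, Map<String, String>> metadataStore = new HashMap<>(); // product name -> metadata
    private final DataTransfer dataTransfer;

    FileManagerSketch(DataTransfer dataTransfer) {
        this.dataTransfer = dataTransfer;
    }

    void registerVersioner(String productType, Versioner versioner) {
        versionersByProductType.put(productType, versioner);
    }

    String ingest(String productType, String productName, String origReference,
                  Map<String, String> metadata) {
        Versioner versioner = versionersByProductType.get(productType);
        if (versioner == null) {
            throw new IllegalStateException("No versioner for product type " + productType);
        }
        // 1. Decide where the file should live.
        String dataStoreReference = versioner.createDataStoreReference(origReference, metadata);
        // 2. Move the file there.
        dataTransfer.transfer(origReference, dataStoreReference);
        // 3. Record the product and its metadata separately.
        dataStore.put(productName, dataStoreReference);
        metadataStore.put(productName, new HashMap<>(metadata));
        return dataStoreReference;
    }

    String getDataStoreReference(String productName) {
        return dataStore.get(productName);
    }

    Map<String, String> getMetadata(String productName) {
        return metadataStore.get(productName);
    }

    public static void main(String[] args) {
        List<String> transferLog = new ArrayList<>();
        FileManagerSketch fm = new FileManagerSketch(
                (orig, dest) -> transferLog.add(orig + " -> " + dest)); // records instead of copying
        fm.registerVersioner("Mp3", (orig, met) ->
                "file:///path/to/mp3/repo/" + met.get("Mp3Genre") + "/"
                        + orig.substring(orig.lastIndexOf('/') + 1));

        Map<String, String> met = new HashMap<>();
        met.put("Mp3Genre", "rap");
        fm.ingest("Mp3", "gangsta-rap", "file:///home/chris/mp3s/gangsta-rap.mp3", met);
        System.out.println(transferLog.get(0));
    }
}
```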

What’s implemented so far?

• The basic components of the architecture

• Several default implementations of the interfaces
    – javax.sql.DataSource based implementations of DataStore and MetadataStore
        • Uses Apache’s DBCP for connection pooling
    – Local Data Transfer using Apache’s commons-io component that can handle hierarchical product structures, as well as flat product structures
    – Several versioners, including one that versions Products using the existing CAS approach of ProductTypeRepositoryPath/ProductName/Version, along with one that versions a product’s references based on production date time
    – An external interface based on Apache’s XML-RPC
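
The two versioning policies mentioned above differ only in how the target path is assembled. The sketch below shows one plausible shape for each; it is not the shipped code. The method names are hypothetical, and both the ISO-8601 input format and the yyyy/MM/dd directory layout for the date-based policy are assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class VersioningPoliciesSketch {

    /** The existing CAS layout: ProductTypeRepositoryPath/ProductName/Version/fileName. */
    static String basicReference(String repoPath, String productName, String version,
                                 String fileName) {
        return repoPath + "/" + productName + "/" + version + "/" + fileName;
    }

    /** Date-based layout; the yyyy/MM/dd directory scheme is an assumption. */
    static String dateTimeReference(String repoPath, String productionDateTime,
                                    String fileName) {
        // Expects ISO-8601 local date-time text such as 2006-03-15T10:30:00.
        LocalDateTime when = LocalDateTime.parse(productionDateTime);
        String datePath = when.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"));
        return repoPath + "/" + datePath + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(basicReference("file:///repo/Mp3", "gangsta-rap", "1", "gangsta-rap.mp3"));
        System.out.println(dateTimeReference("file:///repo/Mp3", "2006-03-15T10:30:00", "gangsta-rap.mp3"));
    }
}
```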

What needs to be done?

• A lot!
    – Check out http://oodt.jpl.nasa.gov/vc/, and log in with your JPL Username and Password. Navigate to “SVN”, and check out the cas-filemgr component.
    – Modify the code
    – Look for bugs
    – Contribute!

• I find new bugs every day
    – Feel free to talk to me about it
    – Create issues in JIRA (http://oodt.jpl.nasa.gov/jira/)
        • Bug Fixes, RFIs, new features, you name it!

• Be sure to check out the apidocs
    – You can build these yourself by checking out cas-filemgr from our SVN repository, and then typing: maven site
    – Or you can visit: http://terra.jpl.nasa.gov/~mattmann/oco/javadoc/cas-filemgr/

Questions?