U.S. Department of the Interior U.S. Geological Survey Information Technology Exchange Meeting May...
U.S. Department of the Interior
U.S. Geological Survey
Information Technology Exchange Meeting
May 24 – 28, 2010
A New Decade in Support of Science
Thinking differently about data management
“Adventures in support of ‘project’ data”
Steve Tessler, NJ WSC
Brian Reece, TX WSC
Save a Process, not Intermediate Data
• Many datasets need to be converted from format ‘A’ to format ‘B’.
• Loading applications are one example of a conversion or transformation tool, but researchers often have to convert ‘structured’ data to one or more separate formats for specific uses. They frequently save every intermediate step as a separate file or table, creating versioning nightmares and archival backaches… you know the drill…
Data management is like tending a garden – care today means a good harvest tomorrow.
The Staging-Area approach to Data Handling
• Here’s a simple technique for transforming data from ‘format A’ to ‘format B’, one that semi-skilled data handlers can learn and use on their own
• I use MS Access for this, but the approach can be implemented with other DBMSs or toolsets
• At a minimum, all it takes is an ActionSet table, one function, some Source data, and a Target structure. Let me illustrate with a more complex example…
Repeating Groups? We don’t need no stinkin’ repeating groups!
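The four ingredients listed above (an ActionSet table, one function, Source data, a Target structure) can be sketched outside MS Access. The following is only an illustration in Python with SQLite, not the presenters' Access implementation. The column names (`seq`, `description`, `sql_text`, `enabled`), the function name `run_action_set`, the `stop_after` argument, and the toy Source and Target tables are all assumptions made for this example.

```python
import sqlite3


def run_action_set(conn, stop_after=None):
    """Run enabled ActionSet steps in seq order; return the (seq, description) pairs run.

    stop_after is a hypothetical option: halt once that step has run so the
    staging tables can be inspected.
    """
    steps = conn.execute(
        "SELECT seq, description, sql_text FROM ActionSet "
        "WHERE enabled = 1 ORDER BY seq"
    ).fetchall()
    done = []
    for seq, description, sql_text in steps:
        conn.execute(sql_text)  # each step is one stored 'action query'
        done.append((seq, description))
        if stop_after is not None and seq >= stop_after:
            break
    conn.commit()
    return done


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source ('format A'): a repeating group of yearly value columns
    CREATE TABLE Source (site TEXT, val_2008 REAL, val_2009 REAL);
    INSERT INTO Source VALUES ('A', 1.5, 2.5), ('B', 3.0, NULL);
    -- Target ('format B'): one row per site and year
    CREATE TABLE Target (site TEXT, year INTEGER, value REAL);
    -- ActionSet: the ordered, described steps (assumed layout)
    CREATE TABLE ActionSet (
        seq INTEGER PRIMARY KEY,
        description TEXT,
        sql_text TEXT,
        enabled INTEGER DEFAULT 1
    );
""")
conn.executemany(
    "INSERT INTO ActionSet (seq, description, sql_text) VALUES (?, ?, ?)",
    [
        (10, "Empty the target so the run is repeatable",
         "DELETE FROM Target"),
        (20, "Append 2008 values",
         "INSERT INTO Target SELECT site, 2008, val_2008 "
         "FROM Source WHERE val_2008 IS NOT NULL"),
        (30, "Append 2009 values",
         "INSERT INTO Target SELECT site, 2009, val_2009 "
         "FROM Source WHERE val_2009 IS NOT NULL"),
    ],
)

done = run_action_set(conn)
rows = conn.execute(
    "SELECT site, year, value FROM Target ORDER BY site, year"
).fetchall()
```

Because the steps live in a table and are ordered by `seq`, inserting a new step between 10 and 20 needs no renaming, and the `description` column doubles as documentation. Rerunning the function rebuilds Target from the untouched Source, so no intermediate copies need to be kept.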
The Staging-Area approach to Data Handling
Benefits
• No tools to buy or new languages to learn (XSLT)
• Source data are never changed and no intermediates are saved, and target tables meet the ‘format’ needs of the researcher (or replace ‘bad’ initial structures)
• Creates a logical ‘one button’ solution based on sequenced ‘action queries’ that users understand and can modify on their own
• No more ‘numbered’ query names that need to be renumbered
• Very easy to halt the sequence to check intermediate steps
• The process is self-documenting (descriptions in ActionSet table)
Prevent ‘mystery data’ by giving all tables and fields good definitions
Minimum Documentation for Project Datasets
We’ve talked this week about Metadata as a requirement for documenting and sharing project-level datasets. Metadata can take many forms, but all are designed to inform a potential user about the nature and content of the dataset.
Here is a list of the other things I wish I got along with every dataset I was given to work with…
Data mismanagement is like tending a garden – dirt everywhere!
Minimum Documentation for Project Datasets
• A basic Data Dictionary – table and field definitions
• Descriptions of Reference/Domain table items
• An Entity-Relationship diagram
• A ‘Loading Order’ report
• A Data-Mapping spreadsheet (if mapped or moved from A to B)
Always start with a good data model.
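One item in the list above, the ‘Loading Order’ report, can often be derived from the data model itself. The sketch below assumes such a report lists tables so that every table comes after the tables it references through foreign keys; the slides do not define the report, so that reading is an assumption. It uses Python with SQLite rather than MS Access, and the function name `loading_order` and the demo tables are hypothetical.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def loading_order(conn):
    """Return table names ordered so referenced (parent) tables come first."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    graph = {}
    for table in tables:
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        # Column 2 of each row is the referenced table; skip self-references.
        graph[table] = {fk[2] for fk in fks if fk[2] != table}
    # static_order() yields each table after all the tables it depends on.
    return list(TopologicalSorter(graph).static_order())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical project model: two reference tables and two data tables
    CREATE TABLE Site (site_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Parameter (param_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE Sample (
        sample_id INTEGER PRIMARY KEY,
        site_id INTEGER REFERENCES Site (site_id)
    );
    CREATE TABLE Result (
        result_id INTEGER PRIMARY KEY,
        sample_id INTEGER REFERENCES Sample (sample_id),
        param_id INTEGER REFERENCES Parameter (param_id),
        value REAL
    );
""")

order = loading_order(conn)
for position, table in enumerate(order, start=1):
    print(position, table)
```

Tables that reference each other in a cycle have no valid loading order, and `TopologicalSorter` raises `graphlib.CycleError` in that case, which is itself a useful finding about the model.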
Just kidding, sort of
• Uninstall Excel
• Excel surcharge
• Security features prevent people from running macros, and the login screen reminds us this is a govt. computer; why not a reminder that Excel isn’t a data management tool?
• Excel rules of behavior annual test?
A clue you need a data management plan
5/9/2009 8:51:11 File Name/Path too Long - Could not write file: \\igskiacwgsnas\PeerSync\Houston\Logger_Data\MATT-D-DRIVE\Matt\USGS\BACK UP\Inflow Pieces\Feb - June 08\3 17 08\Inflow\Other\Inflow bkup\Inflow\Inflow\data\all other\misc\histdata\EFork\EFork SPLUS\ESTREND\TrenTstRes\Cen Const\NH3 OrgN wf\dif dtval p sl.xls
Another clue
This_is_a_really long filename_created_on_May 25, 2010_by_bdreece.pdf
Think abstraction
• You are generating digital objects, and eventually run out of filenames / directories