U.S. Department of the Interior U.S. Geological Survey Information Technology Exchange Meeting May 24 – 28, 2010 A New Decade in Support of Science Thinking differently about data management “Adventures in support of ‘project’ data” Steve Tessler, NJ WSC Brian Reece, TX WSC

Page 1:

U.S. Department of the Interior
U.S. Geological Survey

Information Technology Exchange Meeting
May 24 – 28, 2010

A New Decade in Support of Science

Thinking differently about data management

“Adventures in support of ‘project’ data”

Steve Tessler, NJ WSC
Brian Reece, TX WSC

Page 2:

Save a Process, not Intermediate Data

• Many datasets need to be converted from format ‘A’ to format ‘B’.

• Loading applications are one example of a conversion or transformation tool, but researchers often have to convert ‘structured’ data to one or more separate formats for specific uses, and they frequently save every intermediate step as a separate file or table -- creating versioning nightmares and archival backaches… you know the drill…

Data management is like tending a garden – care today means a good harvest tomorrow.

Page 3:

The Staging-Area approach to Data Handling

• Here’s a simple technique for transforming data from ‘format A’ to ‘format B’ – that semi-skilled data handlers can learn and use on their own

• I use MS Access for this, but the approach can be implemented with other DBMSs or toolsets

• At a minimum all it takes is an ActionSet table, one function, some Source data, and a Target structure – Let me illustrate with a more complex example….

Repeating Groups? We don’t need no stinkin’ repeating groups!

Page 4:

Page 5:

Page 6:

The Staging-Area approach to Data Handling

Page 7:

The Staging-Area approach to Data Handling

Benefits

• No tools to buy or new languages to learn (XSLT)

• Source data are never changed nor intermediates saved, and target tables meet the ‘format’ needs of the researcher (or replace ‘bad’ initial structures)

• Creates a logical ‘one button’ solution based on sequenced ‘action queries’ that users understand and can modify on their own

• No more ‘numbered’ query names – that need to be renumbered

• Very easy to Halt the sequence to check intermediate steps

• The process is self-documenting (descriptions in ActionSet table)
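The staging-area idea on these slides can be sketched in a few lines. The slides use MS Access; the sketch below swaps in Python with SQLite purely for illustration, and every name in it (the `ActionSet` columns `seq`, `enabled`, `description`, `sql_text`, the `Source` and `Target` layouts, and the `run_action_set` function) is an assumption rather than the presenters' actual design. It shows one function walking a table of described, sequenced action queries to unpivot a repeating group from an untouched source into a target table, with an optional halt point.

```python
import sqlite3


def run_action_set(conn, stop_after=None):
    """Run the enabled ActionSet queries in sequence order.

    If stop_after is given, halt once that sequence number has run,
    so the state of the target can be inspected mid-process.
    Returns the (seq, description) pairs that were executed.
    """
    steps = conn.execute(
        "SELECT seq, description, sql_text FROM ActionSet "
        "WHERE enabled = 1 ORDER BY seq"
    ).fetchall()
    executed = []
    for seq, description, sql_text in steps:
        conn.execute(sql_text)
        executed.append((seq, description))
        if stop_after is not None and seq >= stop_after:
            break
    conn.commit()
    return executed


def build_demo():
    """Create a hypothetical source with a repeating group (temp_1, temp_2)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Source (site TEXT, temp_1 REAL, temp_2 REAL);
        INSERT INTO Source VALUES ('A', 10.5, 11.0), ('B', 9.0, NULL);
        CREATE TABLE Target (site TEXT, visit INTEGER, temp REAL);
        CREATE TABLE ActionSet (
            seq INTEGER, enabled INTEGER, description TEXT, sql_text TEXT
        );
    """)
    # Gaps in seq (10, 20, 30) leave room to add steps without renumbering.
    actions = [
        (10, 1, "Empty the target so the run is repeatable",
         "DELETE FROM Target"),
        (20, 1, "Unpivot the first reading",
         "INSERT INTO Target SELECT site, 1, temp_1 FROM Source "
         "WHERE temp_1 IS NOT NULL"),
        (30, 1, "Unpivot the second reading",
         "INSERT INTO Target SELECT site, 2, temp_2 FROM Source "
         "WHERE temp_2 IS NOT NULL"),
    ]
    conn.executemany("INSERT INTO ActionSet VALUES (?, ?, ?, ?)", actions)
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for seq, description in run_action_set(demo):
        print(seq, description)
    print(demo.execute("SELECT * FROM Target ORDER BY site, visit").fetchall())
```

Because the target is rebuilt from the source on every run, nothing intermediate needs to be saved; calling `run_action_set(conn, stop_after=20)` halts after the first unpivot step for inspection.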

Prevent ‘mystery data’ by giving all tables and fields good definitions

Page 8:

Minimum Documentation for Project Datasets

We’ve talked this week about Metadata as a requirement for documenting and sharing project-level datasets. Metadata can take many forms but all are designed to inform a potential user about the nature and content of the dataset.

Here is a list of the other things I wish I got along with every dataset I was given to work with…

Data mismanagement is like tending a garden – dirt everywhere!

Page 9:

Minimum Documentation for Project Datasets

A basic Data Dictionary – table and field definitions

Descriptions of Reference/Domain table items

An Entity-Relationship diagram

A ‘Loading Order’ report

A Data-Mapping spreadsheet (if mapped or moved from A to B)
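One way to picture a ‘Loading Order’ report: list the tables so that every parent (reference or domain) table is loaded before the tables that point to it. The sketch below is only an illustration using Python's standard `graphlib` module; the table names (`Site`, `Parameter`, `Sample`, `Result`) and the `loading_order` function are hypothetical and do not come from the slides.

```python
from graphlib import TopologicalSorter


def loading_order(parents_by_table):
    """Return table names ordered so each table follows all of its parents.

    parents_by_table maps a table name to the set of tables it references
    through foreign keys. A circular reference raises graphlib.CycleError.
    """
    return list(TopologicalSorter(parents_by_table).static_order())


# Hypothetical project schema: two reference tables and two data tables.
schema = {
    "Site": set(),
    "Parameter": set(),
    "Sample": {"Site"},
    "Result": {"Sample", "Parameter"},
}

if __name__ == "__main__":
    for position, table in enumerate(loading_order(schema), start=1):
        print(position, table)
```

The same dependency information is what an Entity-Relationship diagram shows graphically, so the two documents can be checked against each other.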

Always start with a good data model.

Page 10:

Minimum Documentation for Project Datasets

Page 11:

Minimum Documentation for Project Datasets

Page 12:

Minimum Documentation for Project Datasets

Page 13:

Just kidding, sort of

• Uninstall Excel

• Excel surcharge

• Security features already prevent people from running macros, and the login screen reminds us this is a govt. computer; why not a reminder that Excel isn’t a data management tool?

• Excel rules of behavior annual test?

Page 14:

A clue you need a data management plan

5/9/2009 8:51:11 File Name/Path too Long - Could not write file: \\igskiacwgsnas\PeerSync\Houston\Logger_Data\MATT-D-DRIVE\Matt\USGS\BACK UP\Inflow Pieces\Feb - June 08\3 17 08\Inflow\Other\Inflow bkup\Inflow\Inflow\data\all other\misc\histdata\EFork\EFork SPLUS\ESTREND\TrenTstRes\Cen Const\NH3 OrgN wf\dif dtval p sl.xls

Page 15:

Another clue

This_is_a_really long filename_created_on_May 25, 2010_by_bdreece.pdf

Page 16:

Think abstraction

• You are generating digital objects, and you will eventually run out of filenames and directories

Page 17: