Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review
description
Transcript of Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review
Pan-STARRS PS1 Published Science Products Subsystem
Critical Design Review
November 5-6, 2007
Honolulu
slide 2
CDR – Day 2
slide 3
Topics: November 6 / Day 2
• Welcome back (Heasley)
• ODM Continued (JHU team)
• The PSPS Data Retrieval Layer (Referentia team)
• Lunch
• The Web Based Interface (Heasley)
• Pan-STARRS External Interfaces (Heasley)
• PSPS Test Plan (Heasley)
• PSPS Schedule (Heasley)
• System Level Risk Assessment (Heasley)
• Executive Session (Committee only)
• Recap Meeting of Review Panel with Subsystem and component leads
• Adjourn
slide 4
The Object Data Manager System (Continued)
The Johns Hopkins Team
slide 5
The PSPS Data Retrieval Layer
The Referentia Team
slide 6
Pan-STARRS PS1 Published Science Products Subsystem
Critical Design Review
Data Retrieval Layer (DRL)
Nov 5-6, 2007
Referentia Systems, Inc
Matt Shawver, [email protected], 808-423-1900 x111
Kenn Yuen, [email protected]
Richmond, [email protected]
Knight, [email protected]
slide 7
Outline
• Software Architecture
• Requirements and Implementation Plans
• Key Design Modifications
• Test Plan
• Development Schedule
• DRL Development Status
• Demo
slide 8
High Level Software Architecture
slide 9
DRL Software Architecture
DM Adaptor
Web Service Interface
Login / Session Manager
Tomcat Security
Web Based Interface
Java Web Service Proxy
Generic Web Service Client
Result Set Persistence(Java Caching System)
Query Manager Adaptor
PostgreSQL JDBC Adaptor
MySQL JDBC Adaptor
JHU Query Manager
PostgreSQL Driver
MySQL Driver
Data Managers
Test Web Based Interface
slide 10
DRL Requirements
• Query Analysis
• Query Queuing and Execution
• Result Caching
• Result Retrieval
• Administrative
– Performance Monitoring
– User Administration
– Logging
• Support multiple Data Managers
– JHU Query Manager
– MySQL
– SQL Server
– PostgreSQL
slide 11
Req: Query Analysis
From previous design requirements:
• Syntax validation
• Current DM Resource Load
• Query processing time estimate
• Schema information
slide 12
Query Analysis Implementation
• For syntax validation, use the SQL Server PARSEONLY command
• For performance status, use the SQL Server sp_monitor procedure
• Highly database-implementation dependent
• Exact prediction of query time is impossible; instead, show the query execution plan
• Schema information will be retrieved by querying database metadata views and functions
• For ODM, use Query Manager functionality when available
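The PARSEONLY validation step can be illustrated with a small sketch. This is not the DRL's Java implementation; it is a Python stand-in where `run` mocks the database execute call, so only the wrapping logic is real.

```python
# Sketch of the DRL syntax-validation step, assuming a SQL Server backend.
# `run` stands in for a JDBC/ODBC execute call; it is mocked here so the
# sketch is self-contained.

def validate_syntax(sql, run):
    """Ask the server to parse (but not execute) the query.

    On SQL Server this is done by wrapping the statement in
    SET PARSEONLY ON / OFF; any syntax error surfaces as an exception.
    """
    try:
        run("SET PARSEONLY ON; " + sql + "; SET PARSEONLY OFF;")
        return (True, None)
    except Exception as exc:
        return (False, str(exc))

# Mock executor: "parses" by rejecting batches without a SELECT keyword.
def fake_run(batch):
    if "SELECT" not in batch.upper():
        raise ValueError("Incorrect syntax")

ok, err = validate_syntax("SELECT objID FROM Objects", fake_run)
bad, err2 = validate_syntax("FROM Objects", fake_run)
```

Because the statement is never executed, validation is cheap even for queries that would run for hours.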
slide 13
Req: Query Queuing and Execution
From previous design requirements:
• Query any of the Data Managers
• Provide status and progress information
• Provide priority level with validation status results
• Set query priority based on validation priority level
slide 14
Query Queuing and Execution Implementation
• Issue: database query execution plans are not always accurate
• Alternative implementation: treat all queries the same at first
• Short, medium, and long queues each have their own connections allocated
• If a short or medium query takes longer than a certain amount of time, it will be moved to a longer queue
• Queue sizes and expiration times will be user configurable
• If the long queue runs out of query slots, the most recent query will be cancelled and restarted when a slot becomes available
• For ODM, use Query Manager queuing functionality (user chooses which queue to use)
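The promotion rule for the short/medium/long queues can be sketched as follows. The queue names come from the design above; the time budgets are illustrative placeholders (the real values are user configurable).

```python
# Minimal sketch of the queue-promotion rule: every queue has its own time
# budget, and a running query that exceeds its budget is moved to the
# next-longer queue instead of being killed. Thresholds are illustrative.

QUEUES = [("short", 60), ("medium", 600), ("long", None)]  # seconds; None = unlimited

def promote(queue_name, elapsed_seconds):
    """Return the queue a running query should occupy after `elapsed_seconds`."""
    names = [q[0] for q in QUEUES]
    i = names.index(queue_name)
    while QUEUES[i][1] is not None and elapsed_seconds > QUEUES[i][1]:
        i += 1
    return QUEUES[i][0]

assert promote("short", 10) == "short"    # within budget, stays put
assert promote("short", 90) == "medium"   # exceeded 60 s, promoted once
assert promote("short", 7200) == "long"   # promoted past medium as well
```

Treating all queries identically at submission and promoting by observed runtime sidesteps the unreliability of execution-plan cost estimates.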
slide 15
Req: Result Caching
From previous design requirements:
• Query result sets are stored in the DRL Cache until they have been retrieved by a PDC
• Purge retrieved result sets if space is needed
slide 16
Result Caching Implementation
• Maintain results in the result set cache as long as possible to allow repeated retrieval of results
• With a large enough cache (terabyte), results should typically be held for a week or more
• Link to past results via query history
• Performance of the result set cache is critical for PSPS responsiveness
– Hybrid memory / disk cache (LRU for memory and disk)
– In-memory index for fast disk retrieval
– Retrieval of partial results
– Efficiently support writing and reading multiple concurrent result sets
slide 17
Result Caching (continued)
Java Caching System (JCS) Implementation
• Web server caching system
• Uses Java serialization
• In-memory storage, with swapping to an indexed file on disk
• Built-in capability for distributing the cache across multiple machines (untested)
• Modified JCS to support synchronous puts when memory is full (wait for space to be freed via disk write)
• Store a Result Set as a list of objects, each made up of a block of rows
• Support many result sets (each result set can use as little as one block of rows in memory)
• Adding memory speeds up the cache
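The block-of-rows caching behaviour described above can be sketched in a few lines. This is a toy Python stand-in for the modified JCS, not its actual API: result sets are keyed by (query, block number), an LRU policy governs memory, and evicted blocks spill to disk and fault back in on demand.

```python
# Toy sketch of the result-set cache: blocks of rows, LRU in memory,
# eviction to disk, transparent fault-in on read. Class and method names
# are invented for illustration.
import collections, os, pickle, tempfile

class BlockCache:
    def __init__(self, max_blocks_in_memory):
        self.max = max_blocks_in_memory
        self.mem = collections.OrderedDict()   # key -> block, in LRU order
        self.disk_dir = tempfile.mkdtemp()
        self.on_disk = {}                      # key -> file path

    def put(self, key, block):
        self.mem[key] = block
        self.mem.move_to_end(key)
        while len(self.mem) > self.max:        # evict least-recently-used block
            old_key, old_block = self.mem.popitem(last=False)
            path = os.path.join(self.disk_dir, str(abs(hash(old_key))))
            with open(path, "wb") as f:
                pickle.dump(old_block, f)      # serialize, like JCS does
            self.on_disk[old_key] = path

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        if key in self.on_disk:                # fault the block back in
            with open(self.on_disk[key], "rb") as f:
                block = pickle.load(f)
            self.put(key, block)
            return block
        return None

cache = BlockCache(max_blocks_in_memory=2)
cache.put(("q1", 0), [(1, "a"), (2, "b")])
cache.put(("q1", 1), [(3, "c")])
cache.put(("q2", 0), [(4, "d")])               # evicts ("q1", 0) to disk
```

Because each result set occupies memory only block by block, many concurrent result sets can coexist, and adding memory simply raises `max_blocks_in_memory`.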
slide 18
Req: Result Retrieval
From previous design requirements:
• The PS1 DRL shall return query results in response to a query request.
slide 19
Result Set Retrieval Implementation
• Don’t slow down fast queries
– Return results immediately if the query is very fast
• Enable incremental results for queries with large data volumes
• Status updates with number of rows retrieved
• Execution status if supported by the database
• Support streaming CSV file download
– Stream the file directly from the cache rather than creating it on disk
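Streaming the CSV straight from the cache, rather than writing a file first, can be sketched with a generator. This is an illustrative Python analogue of the idea, with invented names; the point is that the first bytes go out before the last block is fetched.

```python
# Sketch of streaming a cached result set as CSV without materialising a
# file on disk: row blocks are pulled one at a time and encoded
# incrementally.
import csv, io

def stream_csv(header, blocks):
    """Yield CSV text chunk by chunk from an iterable of row blocks."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    yield buf.getvalue()
    for block in blocks:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerows(block)
        yield buf.getvalue()

blocks = [[(1, 10.5), (2, 11.0)], [(3, 9.8)]]
payload = "".join(stream_csv(["objID", "mag"], blocks))
```

A web framework would hand each yielded chunk to the HTTP response as it is produced, so memory use stays bounded by one block regardless of result-set size.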
slide 20
Req: Performance Monitoring
From previous design requirements: Allow administrators to monitor the performance of the DRL functions at configurable levels of detail
• I/O statistics
• CPU statistics
• Memory statistics
• Process statistics
slide 21
Performance Monitoring Implementation
• JMX over RMI to provide a management interface to all JVM information collected at runtime
– Does not provide CPU information
• Use a cross-platform third-party library if more detailed information is required
– YourKit (http://www.yourkit.com) is one good third-party option
– Tradeoff: non-JVM profiling libraries incur overhead
• Provide user-configurable logs to a database to store historical information
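The "configurable level of detail" idea can be shown with a small sketch. In the real DRL the numbers would come from JMX over RMI; here the stat collectors are stubbed, and the level names and metric names are invented for illustration.

```python
# Sketch of level-of-detail monitoring: snapshot a configurable set of
# runtime statistics with a timestamp, for the administrator's history log.
import time

COLLECTORS = {
    "basic":    ["heap_used"],
    "detailed": ["heap_used", "threads", "open_sessions"],
}

def snapshot(level, read_stat):
    """Collect one timestamped sample at the requested detail level."""
    return {"t": time.time(),
            **{name: read_stat(name) for name in COLLECTORS[level]}}

# Stubbed stat source standing in for the JMX connection.
fake_stats = {"heap_used": 512, "threads": 40, "open_sessions": 7}
sample = snapshot("detailed", fake_stats.__getitem__)
```

Each sample could then be appended to the historical log database described above, with the detail level chosen per deployment.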
slide 22
Req: User Administration
From previous design requirements:
• The PS1 DRL shall provide a computer security system to protect the DRL and PSPS DM Components from unauthorized access.
• The PS1 DRL shall provide and authenticate at least two levels of access.
• The PS1 DRL shall provide and authenticate privileged access for use of the private Administrative API.
• The PS1 DRL shall provide and authenticate standard access for use of the public API.
slide 23
User Administration Implementation
Initial plan: Tomcat Realm JDBC-based security with an in-process database
• Straightforward
• Independent of other components
• Allows the administrator to create and modify user accounts through the web service
• Allows association of additional information with a user account
– Role
– Query log
– Available result sets
– Running queries
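The two access levels in the requirements (standard and privileged) can be sketched as follows. This is a Python toy standing in for the Tomcat Realm / in-process database, with invented names; passwords are stored salted and hashed rather than in the clear.

```python
# Toy sketch of two-level access: standard users may call the public API,
# administrators additionally the private one. The dict stands in for the
# account table in the in-process database.
import hashlib, os

accounts = {}  # username -> (salt, password_hash, role)

def add_user(name, password, role):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    accounts[name] = (salt, digest, role)

def authenticate(name, password):
    """Return the user's role on success, None on failure."""
    if name not in accounts:
        return None
    salt, digest, role = accounts[name]
    if hashlib.sha256(salt + password.encode()).hexdigest() == digest:
        return role
    return None

def may_call_admin_api(name, password):
    return authenticate(name, password) == "admin"

add_user("jane", "s3cret", "standard")
add_user("ops", "tr1cky", "admin")
```

The role column is exactly the "additional information" associated with an account above; query logs and result-set lists would hang off the same record.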
slide 24
Req: Logging
From previous design requirements: Log major system events
• Query events
• Unsuccessful authentication attempts
• Server restarts
• Any errors
slide 25
Logging Implementation
• Log results via JDBC to an in-process database
• Move to an external database if the DRL is clustered in the future
• Logs linked to user accounts (stored in the same database)
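Event logging to an in-process database can be sketched with SQLite standing in for the embedded Java database (the schema and event names here are illustrative, not from the design documents).

```python
# Sketch of event logging to an in-process database: each major event is
# one row, linked to a user account by name so logs and accounts share a
# store.
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE event_log (
                ts REAL, user TEXT, kind TEXT, detail TEXT)""")

def log_event(user, kind, detail=""):
    db.execute("INSERT INTO event_log VALUES (?, ?, ?, ?)",
               (time.time(), user, kind, detail))

log_event("jane", "query_submitted", "SELECT ... FROM Objects")
log_event(None, "auth_failure", "bad password for user 'jane'")
log_event(None, "server_restart")

failures = db.execute(
    "SELECT COUNT(*) FROM event_log WHERE kind = 'auth_failure'").fetchone()[0]
```

Keeping the log relational makes the administrator queries (failed logins per user, query history per account) one-liners.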
slide 26
Key Modifications: DRL – DM ICD Changes
For non-ODM Data Managers, the DRL should use JDBC directly rather than RMI and MBeans, for performance and flexibility reasons
• JDBC is optimized for transfer of result set data
• JDBC already abstracts much of the database implementation details
• Eliminate the RMI step, increase performance, and reduce complexity for database developers
• Use database security for the DM rather than custom J2EE security
slide 27
Driver Specifics
• Performance
– Result set batching
• Data types
• Schema information retrieval
• Performance information retrieval
slide 28
Key Modifications: Caching Changes
Instead of purging results as soon as they are retrieved, associate results with the query history and keep them around as long as possible
slide 29
Key Modifications: Session Management
• Connection and data persistence across web service calls
• Get a UUID back on login to identify the session
– UUID generator security (Java randomUUID for cryptographically strong randomness)
• Web services don’t usually save state
– In this case, the UUID is tied to the JHU Query Manager session
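The session scheme can be sketched in a few lines: login returns a UUID from a cryptographically strong generator (Python's `uuid4`, which draws from `os.urandom`, is the analogue of Java's `randomUUID`), and later stateless calls present the UUID to recover the server-side session. The session contents are invented for illustration.

```python
# Sketch of UUID-keyed session management across stateless web service
# calls.
import uuid

sessions = {}  # session id -> per-session state

def login(username):
    sid = str(uuid.uuid4())   # cryptographically strong random UUID
    sessions[sid] = {"user": username, "results": []}
    return sid

def lookup(sid):
    """Recover the session for a web service call, or None if unknown."""
    return sessions.get(sid)

sid = login("jane")
state = lookup(sid)
```

Because the UUID space is effectively unguessable, knowing a valid session id is itself the proof of an earlier successful login.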
slide 30
Test Plan
• Initial test plan draft developed
• Includes more test details than the existing DRL test plan
• Key realization:
– Need to define minimum requirements for integration of new Data Managers
– Data Manager acceptance testing needed
• Will be updated as we continue to make design decisions and as software is implemented
slide 31
Performance Testing
Performance Critical Components
• Result Set Persistence
– Stream large result sets directly to/from disk
• HTTP for Data Transfer
– Zip data compression
• JDBC Drivers
– Optimize use of the JDBC driver
• Server Threading
– Test with many distributed clients downloading
– Connection persistence across web service calls
• Division of machines / processes
slide 32
Software Delivery
• Implementation will be provided using a version of the SDSS database as an example backend. MySQL and PostgreSQL will also be supported.
• An example WBI will be provided with the software
• Example Java and .NET client applications will also be provided
• An automated test suite will also be delivered
slide 33
Schedule
slide 34
Status
• Completed review of specification
• No DRL problems identified
• Technologies chosen for implementation
– Tomcat
– Axis2 Web Service
– Java Caching System for Result Set caching
– Microsoft JDBC SQL Server Driver
• Initial web service proof of concept developed
• Draft test plan document
slide 35
Demo
slide 36
The PSPS Web Based Interface & User Interface Clients
Jim Heasley
slide 37
The WBI
The WBI provides an interface for a human user to request published science data from the data stores via the DRL. It is one example of a Published Data Client (PDC). Note that there can be more than one PDC providing the same functionality.
The WBI provides both a Standard User API and an Administrative User API.
The WBI is in fact a combination of the infrastructure needed to connect to the DRL and some number of clients that access the PSPS data stores via the DRL.
Driving requirement – SECURITY
• Preventing unauthorized access
• Not about encrypting data for privacy!
REALITY CHECK – it’s a web server with some clients attached!
slide 38
WBI Components
slide 39
WBI Components
The WBI Software Design Description is in the SAIC-generated document PSDC-630-WBI-SDD
The WBI Components are
• Administrative User View
– The user interface for an authenticated WBI administrator
• Administrative Web Service Driver
– Programming interface that converts a method call to its mapped SOAP message and sends it to the DRL.
– There is a 1-to-1 mapping of requests handled by the Administrative Service Driver to SOAP messages defined in the DRL WSDL
– Documented in PSDC-630-DRL-PDC-Private-ICD.html
• Request Controller
– Provides stateful management of user requests that may persist longer than a user’s WBI session.
slide 40
WBI Components
The WBI Components (continued):
• Standard User View
– Provides the user interface to an authenticated non-administrative WBI user.
• Standard User Web Service Driver
– Provides a programming interface that converts a method call to its mapped SOAP message and transmits it to the DRL. It also performs the reverse function for responses/faults received from the DRL. There is a 1-to-1 mapping of requests to the SOAP messages defined in the DRL WSDL.
– Documented in PSDC-630-DLR-PDC-Public-ICD.html
• WBI Account Manager
– Responsible for authenticating users and granting access permissions to WBI functionality. Users will be identified by a user name and password, which serves as the authentication credential.
slide 41
WBI Components
The WBI Components (continued):
• WBI Log Configuration Manager
– Permits an administrator to define logs, define the level & verbosity of event reporting, and identify events reported to administrators.
• WBI Log Manager
– Initializes logs on startup as defined in a configuration file.
– Coordinates logs from multiple WBI components to ensure only the level of logging specified is done
slide 42
WBI Detailed Design
Main challenge – negotiation of the Web Services Interface to the DRL.
These web services make use of concepts outside the realm familiar to traditional scientific programmers, e.g.,
• XML
• SOAP
• WSDL
• X.509 Certificates
• Digital signatures
To simplify access for the WBI and other PDCs, the Standard Web Service Driver has been encapsulated in an optional Java-based component named PDC-Core.
Documentation of this reference implementation is provided in the SAIC-generated document PSDC-630-WBI-SDD-Addendum-A_Detailed-Design.html
slide 43
WBI User Interfaces
The components described to this point provide the low-level functionality necessary to request and return data from the DRL and the data managers connected to it. The astronomers won’t interact with them directly.
The USER INTERFACES (the web applications that use the web services provided by the WBI and DRL) are the tools with which the astronomers will interact.
As mentioned yesterday, we have followed the advice of the PDR committee and are providing access via “recycled” web applications (the SDSS Casjobs web interface, hereafter Query Manager = Qm), reused tools (from the MOPS), and a work-alike clone of another existing web app (IPAC’s Gator).
slide 44
The SDSS Casjobs Web Interface
slide 45
PS1 Casjobs Interface = Qm
slide 46
A PS1 Menu Driven Web Application
Following the PDR, I developed a prototype of a menu driven web application for accessing the tables in the PS1 database, modeled on IPAC’s Infrared Science Archive Gator interface.
This application was developed using PHP, a server-side HTML embedded scripting language. There are PHP APIs available for most major databases.
The user interface allows generating SQL commands from scratch in a roll-your-own window or automated SQL generation from check box selection of database attributes and user specified spatial constraints.
The interface is configured using information stored in a MySQL database. This allows easy modification of schema, help files, etc.
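The automated SQL generation described above (check-box column selection plus a user-specified spatial constraint) can be sketched as follows. The actual prototype is PHP; this is an illustrative Python equivalent, and the table and column names are placeholders.

```python
# Sketch of building a query from check-box selections plus a spatial
# constraint, in the spirit of the Gator-style interface. A naive RA/Dec
# box stands in for a proper cone search.
def build_query(table, columns, ra=None, dec=None, radius_deg=None):
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if ra is not None:
        sql += (" WHERE ra BETWEEN {} AND {}"
                " AND dec BETWEEN {} AND {}").format(
                    ra - radius_deg, ra + radius_deg,
                    dec - radius_deg, dec + radius_deg)
    return sql

q = build_query("Objects", ["objID", "raMean", "decMean"],
                ra=180.0, dec=-1.5, radius_deg=0.5)
```

Driving the column list and table names from a configuration database, as the prototype does with MySQL, means schema changes never touch this code.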
slide 47
slide 48
[Module diagram: Menu Driven Web Interface Modules. The Main Window links to the sqlpage, catalogs, collections, schema, makeMenu, glossary, generateSQL, and submitSQL modules, supporting both menu driven queries and roll-your-own SQL queries.]
slide 50
Crocodile Demo
This first demonstration shows Crocodile set up to use the 2MASS point source and extended source databases, the USNO UCAC astrometric catalog, the UCAC bright star supplemental catalog, the USNO-B catalog, and the Tycho 2 catalog. (Only 1% of the 2MASS PSC is implemented for the demo, along with the 2MASS XSC and the UCAC bright star supplement.)
This second demonstration shows an implementation of the Crocodile user interface configured to use an early version of the Pan-STARRS schema. There’s no back end database attached to this demo.
slide 51
MOPS Tools
Within the MOPS subsystem the group (in particular Larry Denneau) has developed an extensive set of software tools for interacting with the MOPS DB. As the SSDM will be a copy (and hot spare) of the MOPS DB, these tools can be interfaced to the WBI to provide access to the SSDM for use by astronomers without impacting the MOPS functional DB.
As Larry noted yesterday, the MOPS tools have been developed in PERL.
The next 3 slides are screen shots of
• A summary page of a single night’s coverage with links to additional information
• A section of the tracklet and linking efficiency page for a lunation
• An a (semimajor axis) vs. e (eccentricity) plot of orbits discovered by MOPS
slide 52
MOPS Tools
slide 53
MOPS Tools
slide 54
MOPS Tools
slide 55
WBI Hardware Plan
The anticipated hardware configuration for the WBI component of the PSPS includes:
• 2 servers to provide a full Windows common infrastructure
• 2 web servers to host the WBI components and possibly copies of the DRL software
• A network switch (over the entire PSPS) that provides load balancing for the web servers
• Mass storage (~several TBytes) to store output from the various DMs
Software implementation will include
• Common infrastructure for Windows configuration
• Windows 2003 servers
• IIS web server software
• Windows automated patch installation software
slide 56
WBI Risk Assessment
Overall Risk Assessment for this subsystem is LOW! Because
– The APIs are very well defined and use well-known (in the computer community) software definitions.
– We are recycling and reusing existing software and creating a clone of a successful scientific database interface
– It’s only a damn web server system—it’s NOT rocket science.
slide 57
PSPS External Interfaces
Jim Heasley
slide 58
PSPS External Interfaces
The PS1 PSPS will (initially) receive data from the two data managers being developed under the PS1 construction program: the ODM & MOPS. The external interfaces defined for these two subsystems will provide the basic template for connecting future databases and data stores (e.g., image archives) to the PSPS.
Actual data transfer will be done via the Data Store mechanism described yesterday morning.
For each data store interfaced to a DM we define:
• An Interface Requirements Specification (IRS) that tells what we must transfer.
• An Interface Control Document (ICD) that tells how the transfer takes place and the details of what is actually transferred.
slide 59
PSPS External Interfaces
We have documented these interfaces as follows:
• IPP-PSPS
– PSDC 930-006: The Interface Requirements Specification
– PSDC 940-006: The Interface Control Document
• MOPS-PSPS
– PSDC 930-007: The Interface Requirements Specification
– PSDC 940-007: The Interface Control Document
For brevity, in this section I will only discuss the interfacing between the PSPS and the IPP because
• The MOPS-PSPS interfacing is much simpler in that we are moving a dump of the MySQL database held within the MOPS.
• The IPP-PSPS interface is more complicated than that for MOPS-PSPS
slide 60
The PS1 IPP-PSPS IRS
Defined IPP Data Products
• Predefined Objects determined by the IPP during commissioning and testing
• Sky Cell Definitions
• Camera Configuration Information
• Detection Attributes common to P2, P4Σ, P4Δ, and image stack sources
• Detection Attributes common to P2, P4Σ, P4Δ sources when PSF fitting is insufficient
• Attributes for non-PSF sources in image stacks for sources that are not well described by a PSF fit
• P2, P4Σ, P4Δ photometry transformation metadata
• Frame Descriptor Metadata (per exposure)
• P2 Image Metadata (per OTA per exposure)
slide 61
The PS1 IPP-PSPS IRS
Defined IPP Data Products (continued)
• P4Σ, P4Δ, Image Stack Metadata
• Definitions of alternative source fitting models used on the P2, P4Σ, P4Δ sources that don’t fit the PSF
• List of Survey Programs
• List of Photo-Z recipes
Defined PSPS (derived) Data Products
• Object Attributes/Statistics
IPP Data Publication Process
PSPS Communication Process
slide 62
The PS1 IPP-PSPS ICD
The IPP-PSPS ICD describes the details of the publication process for data moving from the IPP to the PSPS.
The IPP defines a one-time transfer of catalog objects from IPP to PSPS to initialize the objects in the ODM. This is done to jump start the ODM and the detection-to-object correlation process.
The PSPS publication cycle consists of a job, which itself consists of a set of batches of FITS files and a batch manifest file.
The monthly publication process has 4 stages:
• Verification that the FITS files match the batch manifest. If not, send a Batch Manifest Error Notification.
• Verification of the integrity of the FITS files via checksums. If not, send a Batch Corruption Error Notification.
• Verification that the content of the FITS files matches the manifest description. If not, send a Batch Content Error Notification.
• Detailed verification of every datum in each FITS file. If not, send a Batch Verification Error Notification.
When the IPP has shipped all the batches for a job it sends the PSPS a job manifest file.
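The first two verification stages can be sketched as follows. The manifest layout, file names, and use of MD5 are illustrative assumptions; the error names are the notifications defined in the ICD.

```python
# Sketch of the first two publication-verification stages: check that the
# shipped files match the batch manifest, then check their integrity
# against the manifest checksums.
import hashlib

def verify_batch(manifest, files):
    """manifest: {name: md5 hex}; files: {name: bytes}.
    Return the error notification kind, or None if both stages pass."""
    if set(manifest) != set(files):
        return "Batch Manifest Error"
    for name, expected_md5 in manifest.items():
        if hashlib.md5(files[name]).hexdigest() != expected_md5:
            return "Batch Corruption Error"
    return None

data = b"P2 detections"
manifest = {"p2_001.fits": hashlib.md5(data).hexdigest()}
ok = verify_batch(manifest, {"p2_001.fits": data})
missing = verify_batch(manifest, {})
corrupt = verify_batch(manifest, {"p2_001.fits": b"garbled"})
```

The content and per-datum stages would follow the same pattern, each mapping a failure to its own notification back to the IPP.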
slide 63
The PS1 IPP-PSPS ICD
If any error notification is sent the appropriate correction process for IPP is defined by the ICD.
The ICD also describes the situations that could give rise to a broken publication cycle and the actions to be taken to resynchronize the publication process should that occur.
Note: times for the publication cycle are based on an old (SAIC) conservative ingest rate. We now expect to do this much faster and may be able to publish on a faster time scale.
Data transfer from IPP to PSPS is assumed to occur in a trusted environment. No authentication measures are specified.
Data moves as a push from IPP to PSPS. The interface between the systems is supplemented by event notifications via email from PSPS to the IPP.
slide 64
The PS1 IPP-PSPS ICD
The following notifications from PSPS to IPP are defined:• Batch Manifest Error
• Batch Corruption Error
• Batch Received
• Batch Refused Error (can’t accept data)
• Batch Content Error
• Batch Verification Error
• Job Manifest Error (manifest lists batches not found)
• Job Verification Error (data from different batches inconsistent)
• Job Verification Successful
• Job Ingest Failure (data verified but PSPS can’t ingest it)
• Job Publication
slide 65
The PS1 IPP-PSPS ICD
Manifest files are simple XML files describing the appropriate structure (e.g., batch or job).
The ICD describes the structure and format of the FITS files used to transmit the IPP data products.
The ICD describes in detail the FITS “cards” used in the Header Data Units (HDU) for each component of the FITS files used to transfer the IPP data products.
The data attributes themselves are included in the FITS files as binary FITS tables as defined in the appropriate HDU.
slide 66
The PSPS Test Plan
Jim Heasley
slide 67
Test Plan Overview
SAIC designed a PSPS Software Test Plan (STP) that is presented in PSDC-630-STP. While the document was prepared at a time when SAIC was the primary support for the PSPS development (and hence is worded to reflect their expectation to be doing the code development around an Oracle database system and the subsequent testing), the test plan itself is very generic and database independent.
As with the previous discussion of the PSPS external interfaces, we will concentrate here on the testing of the ODM.
The primary subsections of the STP cover
• Test environment
• Test identification
• Test schedule
slide 68
PSPS Software Test Environment
slide 69
Software Test Environment
The software test environment is designed to support:
• Design validation
• Component testing
• Integration testing
• Deployment testing
PSPS Test Software will include the following tools:
• DM simulator
• DRL simulator
• IPP simulator
• MOPS simulator
• WBI simulator
• WBI user simulator
slide 70
Test Data from IPP
The IPP will need to provide test data sets for all the data coming to the PSPS as defined in the ICD. In particular, for the ODM we will need samples of:
• Alternate Source Model Fits
• Camera configuration
• Image stack (cumulative sky) data
• P2 Frame data
• P4Δ high significance data
• P4Δ low significance data
• Photometry transformation information
• Predefined objects from IPP
• Sky cell definitions
• Survey program definitions
slide 71
Test Data Sets
The STP defines test data sets that are jobs and batches of the IfA provided data sets that are used to evaluate the ingest mode of the ODM. In these tests one or more of the sample data sets are corrupted to test whether the software can identify the data values that are inappropriate or out of range.
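The corruption tests described above can be sketched simply: take a clean sample, flip one value out of range, and confirm the validation flags exactly the bad datum. The attribute names and limits below are illustrative, not the ODM's actual ingest rules.

```python
# Sketch of range validation for deliberately corrupted test data sets.
LIMITS = {"mag": (-5.0, 35.0), "ra": (0.0, 360.0), "dec": (-90.0, 90.0)}

def find_bad_values(rows):
    """Return (row index, attribute) for every out-of-range value."""
    bad = []
    for i, row in enumerate(rows):
        for attr, value in row.items():
            lo, hi = LIMITS[attr]
            if not (lo <= value <= hi):
                bad.append((i, attr))
    return bad

clean = [{"mag": 21.3, "ra": 150.1, "dec": -2.4}]
corrupted = [dict(clean[0], dec=412.0)]   # deliberately out of range
```

A test case passes only if the validator reports the corrupted datum and nothing else, which is exactly the behaviour the STP data sets are built to probe.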
slide 72
Test Identification
Test Levels
• Design validation
– Tests to be performed as part of the development process during the design and code/unit test phase
• Component testing
– Tests performed to verify each component meets specifications
• Integration testing
– Tests conducted by JHU, Referentia, and IfA to confirm the PSPS components work with each other
• Deployment testing
– Tests to verify PSPS works with IPP and MOPS
slide 73
Planned Tests
Test cases describe prerequisite conditions, inputs, expected results, conditions for success, assumptions, constraints, and outputs to a test log when a test procedure is performed. One or more test cases are defined in the Software Test Description for every test (PSDC-630-STD).
The test cases are organized by component/subsystem as follows:• WBI• DRL• DM• PSPS
slide 74
Real World Testing
Fact of life: the development timelines for the PSPS components differ; in particular, the more complex ODM will be finished later than the others.
The good news: we already have a working MOPS DB and Perl-based queries, and the DRL is making excellent progress.
So, I plan to have the first “full” integration testing of the PSPS done before the ODM’s completion by doing end-to-end testing using the WBI MOPS client, the DRL, and the SSDM (MOPS clone).
JHU can provide a MyBestPS1 minisystem for testing purposes at Referentia, and network access to the ODM prototype for testing and integration purposes.
slide 75
The PSPS Implementation Schedule
Jim Heasley
slide 76
Schedule
The three components of the PSPS have rather different time scales for completion.
The driving factor for the PSPS implementation schedule is the ODM work at JHU. You’ve already heard from that team that we anticipate a complete system design to be available and tested by September 2008.
The DRL development work by Referentia is due to be completed by April 1, 2008.
No detailed schedule is available for the WBI infrastructure software (as Mr. Holmberg hasn’t started work yet). As shown in the previous sections, we already have working prototypes of all three web access clients we intend to implement.
slide 77
Schedule – Target Milestones
Jan. 2008 - Define hardware for Yr 1 configuration
Jan. 2008 - New software engineers start @ IfA, JHU
Apr. 2008 - Referentia finishes DRL programming
Aug. 2008 - JHU finishes ODM design
Sep. 2008 - Hardware ordered for PSPS
Oct. 2008 - Install hardware at site
Nov. 2008 - System integration
Dec. 2008 - PSPS ORR
slide 78
Risk Evaluation & Mitigation Plan
Jim Heasley
slide 79
Risk
Paraphrasing an American “philosopher”:
There are the things you know you know,
there are the things that you know you don’t know,
And there are the things that you don’t know that you don’t know.
D. Rumsfeld
slide 80
Risk
In this section we want to show that
• We’ve maximized those things in the first category.
• Figured out what we need to do about things in the second category.
• And hopefully minimized those in the third category!
slide 81
Inter-Subsystem Risks
Telescope, Camera, and Observatory and Telescope Instrumentation Software Subsystems
• No direct risk to PSPS schedule or cost
• Any delay on their side gives schedule relief to PSPS integration
• Overall risk to PSPS - LOW
slide 82
Inter-Subsystem Risks
Moving Object Processing Software (MOPS)
• The Solar System Data Manager (SSDM) in PSPS will import much simpler data products than those provided by the IPP.
• The SSDM will be a clone of the MOPS internal DM. MOPS decides what it exports.
• Maturity of the MOPS-PSPS interface does not pose significant risk to PSPS development.
• Overall risk to PSPS - LOW
slide 83
Inter-Subsystem Risks
Image Processing Pipeline (IPP)
• The reality requirement: PSPS must ingest the data products from the IPP at the nominal monthly processing rate… whatever it turns out to be
• IPP is the stressing data source for the PS1 PSPS
• IPP is ahead of PSPS in the design/review process
– Some fine tuning of the data attributes and file formats may be needed to reach a final configuration
• The good news is that the IPP will be processing data for ~6 months before it is “ready” for delivery to the PSPS, so we will have ample opportunity to test with both simulated and real data.
Risk Level – LOW, provided we are able to iterate/refine issues with the IPP.
slide 84
Intra-Subsystem Risks
WBI – LOW DRL – LOW SSDM – LOW ODM –LOW to MODERATE
slide 85
General Areas of Risk
Science Rules
• Over the course of these presentations some points have been left unspecified on purpose, e.g., radius size for detection-object correlation, number of detections to create an object, etc. These will ultimately be determined by experimentation with real data from the IPP.
• We believe this is a LOW risk issue provided the IPP can provide us the appropriate data before system deployment, so that we can conduct the appropriate experiments and deduce the rules.
Short time scale versions
• This is the issue of getting the same answer as you did from an old query in a database which is constantly being updated (and hopefully improved)!
• As noted previously, we hope to deal with this by providing snapshots of the object table and by providing adequate information in the schema so that one could recover the state the database was in when the query was made.
• Issues generating risk are: how often are the snapshots made, how long do they persist, and how easy and/or fast does the process of rerunning an old query have to be? Risks here involve the cost of storage for the snapshots and perhaps programming effort.
• Risk is LOW unless “unreasonable” demands are placed on the ODM.
slide 86
General Areas of Risk
Major data recalibrations and versions
• At the end of the 1st year of operation the PS1 team intends to develop new astrometric and photometric calibrations derived from the data in hand at that time.
• Such new calibrations can potentially change every derived number in the database for both detections and objects.
• Experience with SDSS indicates that astronomers will also want to keep the “DR1” of the database around, as they will have written papers based upon that calibration. Full versioning of the old tables poses a COST RISK.
• It may be feasible to solve this problem by providing correction tables for new calibrations and presenting the different calibrations to users as database views. While this should work in principle, it has not been tested in practice and hence represents a MODERATE risk.
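The "correction table plus view" idea can be demonstrated with a small sketch, here using SQLite in place of the ODM's SQL Server: the stored photometry stays at the DR1 calibration, a per-object correction table holds the recalibration offsets, and a view presents the new calibration without duplicating the data. All names and values are illustrative.

```python
# Sketch of presenting a recalibration as a view over a correction table,
# rather than versioning the full detection table.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE detections (objID INTEGER, mag REAL);          -- DR1 values
  CREATE TABLE recal_v2  (objID INTEGER, mag_offset REAL);    -- new calibration
  CREATE VIEW detections_v2 AS
    SELECT d.objID, d.mag + r.mag_offset AS mag
    FROM detections d JOIN recal_v2 r ON d.objID = r.objID;
  INSERT INTO detections VALUES (1, 20.00), (2, 18.50);
  INSERT INTO recal_v2  VALUES (1, -0.05), (2, 0.10);
""")
v2 = dict(db.execute("SELECT objID, mag FROM detections_v2"))
```

Storage cost is one small offset table per calibration instead of a full copy of the detection tables, which is exactly why the approach mitigates the COST RISK above; the untested part is its query performance at survey scale.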
slide 87
General Areas of Risk
Dealing with crawlers and value added data generators
• The IPP and PSPS have now agreed that the value added products will be calculated by teams from the science consortium and not be performed within the database.
• We have three potential schemes available to serve these products back to users
– By providing tables in the ODM into which user-provided data could be loaded if it is deemed especially desirable to serve it this way. These data would be loaded on a schedule that coincides with regular ODM updates.
– As client web interfaces and DMs attached to the DRL.
– PS1 Consortium clients could have their own MyDB space associated with the ODM (either within the project storage area or as add-on storage paid for by the client) and their tables could be exported to other PSPS ODM users via the Qm group mechanism.
• Overall risk to PSPS is LOW (after initial efforts to get the process going).
slide 88
General Areas of Risk
Geoplexing
• We need to be able to distribute incremental updates of the ODM to other sites off Maui in order to
– Maintain a physically distinct (and hence safe) backup of the ODM in case of catastrophic events at the primary ODM site.
– Have a mechanism to provide data to PS1 Consortium partners for their ODM mirrors.
• This process is ultimately limited by the bandwidth available between Maui and the alternate sites.
• Risk level is LOW to MODERATE for providing regular incremental updates, but HIGH if one needed to recreate the primary site from one of the remote ones.
slide 89
General Areas of Risk
Location, Location, Location!
• The site for the PSPS has not been determined, with both the IfA buildings on Maui and Oahu being possible choices.
• Strictly speaking this is an operational issue (with some issues that involve the PS1 Consortium) that doesn’t need to be addressed at CDR, but this uncertainty does have implications for the design!
• Risk assessment – MODERATE to HIGH?
slide 90
General Areas of Risk
Staffing
• Software development staffing
– We have adequate personnel to deal with the WBI and DRL development. The ODM situation is tighter but will be helped by the addition of two FTE provided by the PS Project in January.
– Risk – LOW to MODERATE
• Operational staffing
– To date there has been no clear discussion of the requirements for an operational staff to install or administer the PSPS system. This is largely because until the completion of the CDR we do not know exactly what we’re planning for.
– The JHU team advises that we need to have 2 people trained to perform the hardware installation and system admin functions.
– Risk – MODERATE, as it’s a budget issue, but it is unclear to me whether it belongs to the Project or the Consortium (or both).
slide 91
General Areas of Risk
PSPS stability / troubleshooting / testing / upgrades
• Good engineering practice dictates that one does not perform stability testing, troubleshooting, or upgrades on a production system.
• We believe that to address the problems that are almost certain to arise with a new system like the ODM, we will require a separate test system for this purpose.
• This system was not anticipated in the original PSPS plan but should fit within the maximum budget available to the subsystem. Without this test system the overall risk to PSPS would be HIGH.
slide 93
General Areas of Risk
Budget
• My concern here is not what it was at PDR, where it was unclear what the Project was willing to commit to PSPS development. Rather, it is a realistic worry that if some other critical subsystem suffers a catastrophic problem, funds targeted for PSPS will be diverted to that purpose. At that point we would have to cut back what we want to do with the PSPS system.
• Risk level ???
slide 94
In Conclusion
slide 95
Does Our Design Meet the Top Level Design Requirements?
3.3.0.1 The PSPS shall be able to ingest a total of 1.5×10^11 P2 detections, 8.3×10^10 cumulative sky detections, and 5.5×10^9 celestial objects together with their linkages.
• Yes. Prototyping efforts for the ODM show this target can be achieved.
3.3.0.2 The PSPS shall be able to ingest the observational metadata for up to a total of 1,050,000 observations.
• Yes. Prototyping efforts for the ODM show this target can be achieved.
3.3.0.3 The PS1 PSPS shall be capable of archiving up to ~100 terabytes of data (TBR).
• Yes. Prototyping efforts for the ODM show this target can be achieved.
3.3.0.4 The PSPS shall archive the PS1 data products.
• Yes. The design provides sufficient redundancy that the products can not only be loaded into the database but also be reliably maintained in case of hardware failures.
slide 96
Does Our Design Meet the Top Level Design Requirements?
3.3.0.5 The PSPS shall possess a computer security system to protect potentially vulnerable subsystems from malicious external actions.
• Yes. Multiple layers of security are built in via the WBI, the DRL, and the Data Stores.
3.3.0.6 The PSPS shall provide end-users access to detections of objects in the Pan-STARRS databases.
• Yes. Multiple WBI interfaces are available to provide interactive user access as well as machine-based access by Published Data Clients.
3.3.0.7 The PSPS shall provide end-users access to the cumulative stationary sky images generated by Pan-STARRS.
• Yes. For PS1 the PSPS will serve as a front end for requests for images from the IPP.
3.3.0.8 The PSPS shall provide end-users with the metadata required to interpret the observational legacy and processing history of the Pan-STARRS data products.
• Yes. The relevant metadata are transferred from the IPP along with the data products, and the schema of the ODM provides for their access.
slide 97
Does Our Design Meet the Top Level Design Requirements?
3.3.0.9 The PSPS shall provide end-users with Pan-STARRS detections of objects in the Solar System for which attributes can be assigned.
• Yes. These are contained in the MOPS DB, and the SSDM component of the PSPS is a “hot spare” of the former.
3.3.0.10 The PSPS shall provide end-users with derived Solar System objects deduced from Pan-STARRS attributed observations and observations from other sources.
• Yes. These are contained in the MOPS DB, and the SSDM component of the PSPS is a “hot spare” of the former.
3.3.0.11 The PSPS shall provide the capability for end-users to construct queries to search the Pan-STARRS data products over space and time to examine magnitudes, colors, and proper motions.
• Yes. This will be possible through multiple mechanisms attached to the DRL through several web clients.
3.3.0.12 The PSPS shall provide a mass storage system with a reliability requirement of 99.9% (TBR).
• Yes. The ODM design provides sufficient redundancy (based on experience with the SDSS database) to reach this level of reliability.
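To make requirement 3.3.0.11 concrete, a query over space and time selecting magnitudes and colors might look like the following sketch. This is illustrative only: the table and column names (`objects`, `ra`, `dec`, `mjd`, `g_mag`, `r_mag`) are hypothetical stand-ins, not the actual ODM schema, and a real PSPS user would submit such SQL through the DRL and Query Manager rather than a local SQLite file.

```python
# Hypothetical sketch of a space/time query of the kind 3.3.0.11 requires.
# Schema and column names are invented for illustration, NOT the ODM schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        obj_id INTEGER PRIMARY KEY,
        ra     REAL,  -- right ascension, degrees
        dec    REAL,  -- declination, degrees
        mjd    REAL,  -- epoch of observation (Modified Julian Date)
        g_mag  REAL,  -- g-band magnitude
        r_mag  REAL   -- r-band magnitude
    )
""")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 150.10,  2.20, 54400.5, 19.2, 18.7),  # in box, in time window
        (2, 150.12,  2.21, 54410.5, 21.5, 21.0),  # in box, outside window
        (3, 210.00, -5.00, 54400.5, 17.0, 16.4),  # outside spatial box
    ],
)

# Box search over position and epoch, returning magnitude and g-r color.
hits = conn.execute("""
    SELECT obj_id, g_mag, ROUND(g_mag - r_mag, 2) AS g_r_color
    FROM objects
    WHERE ra  BETWEEN 150.0  AND 150.2
      AND dec BETWEEN   2.0  AND   2.5
      AND mjd BETWEEN 54395.0 AND 54405.0
      AND g_mag < 20.0
""").fetchall()
print(hits)  # -> [(1, 19.2, 0.5)]
```

In the deployed system the DRL's adaptors (JHU Query Manager, MySQL, PostgreSQL) would route an equivalent query to the appropriate Data Manager; spatial selections at ODM scale would additionally rely on the database's indexing rather than a plain BETWEEN scan.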
slide 98
Does Our Design Meet the Top Level Design Requirements?
3.3.0.13 The PSPS baseline configuration should accommodate future additions of databases (i.e., be expandable).
• Yes. The basic design of the DRL provides for the addition of both new published data clients and data stores. Further, we have multiple approaches for incorporating “value added” data products within the PSPS design.
slide 99
Does the PSPS Design Meet the CDR Criteria?
Has the design we've presented matured to the CDR level?
• We believe this is the case.
• Indeed, all other PS1 subsystems that passed CDR have experienced continuing development. That's not an excuse, just a statement of fact.
Is the design reliable and maintainable?
• In our opinion, yes.
Is the hardware within the scope of the project budget?
• The current design uses generally available commodity hardware. We continue to pursue leveraged hardware acquisitions with interested vendors.
slide 101
Does the PSPS Design Meet the CDR Criteria?
Is the test plan well conceived?
• Yes.
Are the interface designs mature?
• The interfaces to other PS1 subsystems are stable and have changed little since PDR. That doesn't mean that some “tweaking” won't be needed.
• The Data Store component is already used by other PS1 subsystems.
Have the risks been identified and mitigation strategies spelled out?
• Yes, we believe so.
• Also note that none of the risks we have identified is fatal to the basic design or causes us to miss any requirement. These risks relate in many ways to convenience for PSPS users.
slide 102
Finally
Hopefully the very bright folks I've been working with at JHU and Referentia (and previously at SAIC) have found solutions for the hard problems and anticipated the ones that might become hard.
Now, one of your tasks as the CDR panel is to help us by telling us what we don’t know that we don’t know.
And so, for patiently listening to us for the past 2 days
slide 103
Mahalo!