
Managing Large Amounts of Mass Spectrometry Data Using Agilent OpenLAB ECM

Technical Overview

Abstract

Over the past few decades, the amount of data generated in mass spectrometry laboratories has increased exponentially due to the fact that newer instruments are processing many more samples in the same amount of time and governments have expanded their regulatory requirements for electronic data management in certain industry sectors. The complexity and cost associated with data storage have also increased, adding to the challenges that laboratory managers and researchers face as they try to find a way to manage their data. OpenLAB ECM offers an effective solution that simplifies data management, saves time, and reduces costs while addressing data archival, data sharing, application integration, and regulatory requirements in the laboratory and around the globe. This technical overview describes key aspects of the OpenLAB ECM solution and provides a range of system recommendations that suit the varying data requirements of today’s mass spectrometry laboratories.

Introduction

Over the past few decades, the amount of data generated each day in mass spectrometry laboratories has continually increased. Researchers and laboratory managers must collect and store very large amounts of data every day, both because more samples are processed each day and because regulatory requirements are stringent.

Increasing sample numbers

Laboratories are constantly looking for instruments and systems that process more samples in less time to improve efficiency and productivity in discovery, development, and manufacturing. As a result, instrument manufacturers are developing high-speed analytical instruments and data systems which generate reliable data within shorter periods of time. These improvements, while increasing productivity, have also increased the amount of data generated each day; for example, today’s mass spectrometers easily generate several gigabytes (GB) of data every day.

Regulatory compliance

In addition, some industries must adhere to laboratory regulations such as US FDA 21 CFR Part 11, EU Annex 11, and SFDA. These regulations define how electronic records must be handled in the laboratory. They require that data integrity is maintained and that data is kept secure and traceable for much longer periods of time. They also require that additional data be generated and maintained in the form of electronic signatures, audit trails, and instrument qualification documentation. These requirements have added to the overall need to manage large amounts of data in the laboratory.

Increasing data storage costs

Data generation is also increasing with advances in technology: a single-quadrupole (SQ) mass spectrometer in a high-throughput laboratory can generate approximately 250 files of 5 MB each per day (1.25 GB/day); a high-end triple-quadrupole (QQQ) LC/MS system operating at high throughput generates about 1000 files of 1 MB each per day (1 GB/day); and a quadrupole time-of-flight (Q-TOF) LC/MS system generates single files up to 20 MB in size (20 GB/day). This means that the data generated by running one Q-TOF at high throughput for one day would fill four DVDs.
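The daily totals quoted above follow from multiplying the number of files by the file size. The short Python sketch below reproduces that arithmetic. It assumes decimal units (1 GB = 1000 MB) and a 4.7 GB single-layer DVD; neither assumption is stated in this overview.

```python
MB_PER_GB = 1000       # assumption: decimal units
DVD_CAPACITY_GB = 4.7  # assumption: single-layer DVD


def daily_volume_gb(files_per_day: int, file_size_mb: float) -> float:
    """Daily data volume in GB for a given file count and file size."""
    return files_per_day * file_size_mb / MB_PER_GB


def full_dvds(volume_gb: float) -> int:
    """Number of DVDs that the given volume fills completely."""
    return int(volume_gb // DVD_CAPACITY_GB)


sq_gb_per_day = daily_volume_gb(250, 5)    # single quadrupole: 1.25 GB/day
qqq_gb_per_day = daily_volume_gb(1000, 1)  # triple quadrupole: 1.0 GB/day
qtof_dvds_per_day = full_dvds(20)          # Q-TOF at 20 GB/day: 4 full DVDs
```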

Such massive quantities of data require powerful workstations for processing and faster, more expensive disk drives for storage. A storage configuration with a 500 GB or 1 TB RAID array of SCSI disks on such a workstation allows laboratories to keep only a few months’ worth of data. That is not nearly enough to comply with industry and government regulations. As a result, laboratories are investing in more efficient networking infrastructure and offline storage mechanisms that allow analysts to free up workstation space by moving data over the network to offline storage.

The Solution: Agilent OpenLAB ECM

The OpenLAB solution

OpenLAB ECM is part of the Agilent OpenLAB Suite, a well-integrated solution for a multitude of data management needs in today’s scientific laboratories. The OpenLAB Laboratory Software Suite includes the following components:

• OpenLAB ECM, described here.

• The Business Process Management (BPM) add-on for OpenLAB ECM, which brings powerful process automation capabilities to improve the efficiency and productivity of laboratory operations and workflows.

• The Intelligent Reporter add-on for OpenLAB ECM, a cross-sequence, cross-technique, multivendor reporting system that brings you state-of-the-art capabilities for reporting scientific data.

• OpenLAB ELN, the well-integrated and highly adaptable Electronic Lab Notebook (ELN) that helps document and organize your experiments while providing IP protection.

OpenLAB ECM

OpenLAB Enterprise Content Manager (ECM) is a software solution that helps you make better decisions faster than ever. By providing a secure, central repository and rich content services, OpenLAB ECM allows you to create, manage, collaborate, archive, and re-use all of your business-critical information with ease. OpenLAB ECM manages raw data and human-readable documents of any type, from any supplier, and its simple web-based interface drastically reduces the learning curve for new users. It is a highly scalable system for the scientific world that can start as a data management solution for a single workgroup and easily scale into a multisite, multicontinent solution for the entire enterprise.

OpenLAB ECM comes with out-of-the-box compatibility with leading storage solutions that are based on Windows Shares (CIFS protocol), from vendors such as NetApp, EMC, IBM, and HP. In addition, OpenLAB ECM has a published API that can be used to easily interface with other existing systems in the laboratory, such as a laboratory information management system (LIMS) or an enterprise resource planning (ERP) system.

Architecture

OpenLAB ECM is built on very simple architectural principles. Files are stored on one or more external NAS devices and OpenLAB ECM indexes, organizes, and keeps track of the files. In addition, the system makes use of a database for information such as folder structures, links to files in storage locations, security configurations, metadata, and indexes. ECM has three main components as shown in Figure 1:

• The ECM Web Application, which provides a user-interaction interface that displays a visualization of folders and files, and allows users to access the system using Internet Explorer.

• The File Transfer Service, which uploads files into the system and transfers them to an appropriate storage location. It also handles retrieving files from OpenLAB ECM and moving files between storage locations.

• The ECM Application Server, which is a file filtering service that scans through the uploaded file and filters key pieces of metadata. It extracts metadata from files based on their data type using multiple filters. OpenLAB ECM comes with filters for most popular data systems and exposes an SDK for extending its reach to other formats.

Figure 1. OpenLAB ECM high-level architecture.
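The division of labor described above (storage holds the files, a database holds the index, and type-specific filters extract metadata on upload) can be illustrated with a small Python sketch. This is not OpenLAB ECM code or its API. The `Repository` class, the `text_filter` function, and the `Key=Value` file layout are hypothetical stand-ins chosen only to show the flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A filter turns file content into key/value metadata (hypothetical interface).
MetadataFilter = Callable[[bytes], Dict[str, str]]


def text_filter(content: bytes) -> Dict[str, str]:
    """Toy filter: read 'Key=Value' lines from a text file."""
    metadata: Dict[str, str] = {}
    for line in content.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


@dataclass
class Repository:
    # Stands in for an external storage location (path -> file content).
    storage: Dict[str, bytes] = field(default_factory=dict)
    # Stands in for the database (path -> extracted metadata).
    database: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # One filter per file extension.
    filters: Dict[str, MetadataFilter] = field(default_factory=dict)

    def upload(self, path: str, content: bytes) -> None:
        # File transfer step: place the file in storage.
        self.storage[path] = content
        # Filtering step: pick a filter by extension and index its output.
        extension = path.rsplit(".", 1)[-1].lower()
        extract = self.filters.get(extension)
        self.database[path] = extract(content) if extract else {}

    def find(self, key: str, value: str) -> List[str]:
        """Return paths whose indexed metadata has key == value."""
        return sorted(
            path for path, meta in self.database.items() if meta.get(key) == value
        )
```

A search in this sketch touches only the index, never the stored files, which is the property that lets the storage tier grow independently of the database.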


Scalability and Availability

The OpenLAB ECM architecture can scale all the way from an all-in-one OpenLAB ECM Workgroup server to a multiserver OpenLAB ECM Enterprise system. In an OpenLAB ECM Workgroup edition installation (Figure 2), all components are deployed on a single server.

Figure 2. All-in-one solution for the small laboratory.

OpenLAB ECM Workgroup (A Single Server Solution)

For the enterprise-level solution (Figure 3), ECM has the added ability to span components across multiple servers, increasing performance while providing redundancy at the same time. OpenLAB ECM Enterprise is capable of handling several million files.

This architecture removes single points of failure in the system while bringing maximum availability. Figure 3 shows a single database; however, Oracle and SQL Server support clustering and replication mechanisms that provide maximum availability for the database component.

Figure 3. Scalable architecture with components spanning multiple servers.

• ECM Web Servers serving one or more accounts and adding redundancy

• File Transfer Servers serving one or more accounts, increasing performance and adding redundancy

• ECM Application Servers serving one or more accounts, increasing performance and adding redundancy

• Several Storage Locations providing highly scalable storage capacity

• Compatibility with Oracle or SQL Server databases provides industry standard scalability at the database level


Support for Distributed Systems

Another key capability in the OpenLAB ECM Enterprise Edition is its ability to optimally serve geographically distributed users (Figure 4). OpenLAB ECM can be used to create separate accounts for each location, each with local storage, web servers, file transfer servers, and application servers. Actual files are moved over the WAN only when a download or upload attempt is made across accounts; local account operations do not require the files to be moved over the WAN. Additionally, the file transfer server’s caching capabilities reduce network traffic even further. In a distributed installation, users who log on to different accounts through the same web server are still able to perform cross-account searches and retrieve files from other accounts.

Figure 4. Configuration for geographically-distributed users.
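The transfer rule described above can be sketched in a few lines of Python: a file crosses the WAN only when it belongs to a different account, and a local cache avoids repeating that transfer. The `FileTransferServer` class is hypothetical, and the caching policy shown (keep every file fetched once) is an assumption made for illustration; this overview does not describe how the actual cache behaves.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FileTransferServer:
    """Hypothetical model of one site's file transfer server."""

    account: str
    cache: Set[str] = field(default_factory=set)
    wan_transfers: int = 0

    def download(self, file_account: str, file_id: str) -> str:
        """Return where the file was served from: 'local', 'cache', or 'wan'."""
        if file_account == self.account:
            return "local"   # same account: served from local storage
        if file_id in self.cache:
            return "cache"   # fetched earlier: no WAN traffic
        self.wan_transfers += 1
        self.cache.add(file_id)  # assumed policy: cache after first fetch
        return "wan"
```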


Archiving data from Agilent MassHunter to ECM

Interactive archival

MassHunter studies and batches can be archived into OpenLAB ECM directly from MassHunter Quant. Scientists can archive completed LC/MS studies and GC/MS batches using an action menu (Figure 5).

Figure 5. Archive into ECM with an action menu in MassHunter Quant.

This menu invokes the ECM Send To tool and passes the necessary parameters to it (Figure 6).

Figure 6. The OpenLAB ECM Send To tool.

In ECM Send To, Quant users select the desired destination location, choose the MassHunter Study Profile, and click the Upload button. The Send To tool then processes the study/batch folder and presents a preview of what it will do next.

The preview dialog (Figure 7) shows which folders will be created and which files will be uploaded. Subfolders in the study are uploaded as SSIZIP files, and files in the study folder are uploaded as is. If you do not want to upload a certain part of the study, you can deselect it in the preview dialog.

Figure 7. The OpenLAB ECM Send To tool preview dialog.
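The rule behind the preview (subfolders become SSIZIP archives, top-level files go up unchanged, and deselected items are skipped) can be sketched as follows. This is an illustration only, not the Send To tool itself. The function name, the action labels, and the convention of naming each archive after its folder are assumptions.

```python
from pathlib import Path
from typing import FrozenSet, List, Tuple


def preview_upload(
    study_folder: Path, deselected: FrozenSet[str] = frozenset()
) -> List[Tuple[str, str]]:
    """List (upload name, action) pairs for the direct children of a study folder."""
    plan: List[Tuple[str, str]] = []
    for entry in sorted(study_folder.iterdir()):
        if entry.name in deselected:
            continue  # the user unchecked this item in the preview
        if entry.is_dir():
            # Assumed naming: archive is named after the subfolder.
            plan.append((entry.name + ".SSIZIP", "zip subfolder"))
        elif entry.is_file():
            plan.append((entry.name, "upload as is"))
    return plan
```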

OpenLAB ECM Send To is a general ECM utility that is available on the ECM downloads page. It can be installed on any ECM client machine. This utility integrates into the Microsoft Windows Explorer Send To menu and provides a simple way to send any folder or file right from Windows Explorer. The Send To function provides several prebuilt profiles or configurations specific to various use cases and data systems. Users can create custom profiles as well. The Send To tool has two profiles specific to MassHunter: 04. Upload a MassHunter Study or Batch – Standard Profile and 05. Upload a MassHunter Data Folder – Standard Profile. The former is used from MassHunter Quant to archive studies and batches into ECM. The latter can be used to archive individual data folders into ECM.

The ECM Send To tool can also be used to archive studies and batches directly from Windows Explorer (Figure 8). With this approach, you do not need to open the studies in MassHunter, and you can select multiple studies at the same time and upload them into ECM.

Figure 8. The OpenLAB ECM Send To capability in Windows Explorer.


Automated archival

Users who wish to avoid any user interaction and completely automate the archiving of data from MassHunter workstations into ECM can use ECM Scheduler (Figure 9). ECM Scheduler is also an ECM add-in that can be installed from the ECM downloads page and configured on MassHunter workstations.

The scheduler service on the MassHunter workstation needs to be configured to watch the MassHunter data folder. Once configured, the scheduler monitors the MassHunter folder and uploads any new files that are saved there.

Figure 9. Scheduler options for uploading MassHunter data.

The general add-in should be used for uploading MassHunter data. Users can use the add-in to specify the path as well as the file/folder specifications (Figure 10), and a schedule for when the folders should be scanned. A typical schedule for sending data to ECM is shown in Figure 11.

Figure 10. Mapping MassHunter folders to ECM using Winmapper.

Figure 11. Typical schedule for sending data to ECM.

In an alternate configuration, the scheduler can be installed on a remote machine (Figure 12). In this mode, the scheduler polls the MassHunter workstation for new data. ECM software does not need to be installed on the MassHunter workstation; only the root data folder needs to be shared so that the scheduler can access the data. Note that this configuration has slightly lower performance, since there is an extra hop on the network for the MassHunter files before they reach ECM.

Figure 12. ECM Scheduler polling multiple MassHunter workstations.
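The monitoring behavior described in this section amounts to a periodic scan that uploads whatever is new since the last pass. The Python sketch below shows one such pass. It is a generic polling illustration, not ECM Scheduler's implementation: `scan_once`, the modification-time check, and the `upload` callback are all assumptions.

```python
from pathlib import Path
from typing import Callable, Dict, List


def scan_once(
    data_root: Path,
    pattern: str,
    seen: Dict[Path, float],
    upload: Callable[[Path], None],
) -> List[Path]:
    """Upload files under data_root that are new or changed since the last scan."""
    uploaded: List[Path] = []
    for path in sorted(data_root.rglob(pattern)):
        if not path.is_file():
            continue
        modified = path.stat().st_mtime
        if seen.get(path) != modified:
            upload(path)           # hypothetical hand-off to the archive
            seen[path] = modified  # remember what has already been sent
            uploaded.append(path)
    return uploaded
```

Calling `scan_once` on a timer corresponds to the schedule, and `pattern` plays the role of the file/folder specification. In the remote configuration, `data_root` would be the shared root data folder on the workstation, so every file is read over the network before it is uploaded, which is the extra hop mentioned above.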

Indexing and Searching MassHunter Data

A file stored on a local or network drive is typically found by browsing through folders or by searching for a file name or modified date. However, when you need to look for a study or a batch and do not know the file names or other general information about the files, finding them can take much longer. OpenLAB ECM looks into the MassHunter files, extracts key scientific metadata, and then tags and indexes this information for searching (Figure 13). As a result, OpenLAB ECM users can find their data quickly and easily using sample-specific information contained in the MassHunter files, such as sample ID, operator name, or compound. The MassHunter-specific keys that ECM is capable of extracting are shown in Table 1.

Figure 13. The application server extracts metadata as files are stored into ECM.

Table 1. MassHunter filter keys

Sample: Acquisition Time, Balance Override, Barcode, Comment, Data File, Dilution, Equilb. Time (min), Inj Vol (µl), Instrument Name, Level Name, Method, Acq Method, Method Type, Operator Name, Override DA Method, DA Method, Plate Code, Plate Position, Rack Code, Rack Position, Sample ID, Sample Name, Sample Position, Sample Type, Wt/Vol, Run Completed, Locked Run

Sequence: Sequence Name, Sequence Comment Field

Devices: Instrument Type

Batch Results: Batch File Name, Batch Analysis Time, Analyst Name, Reporter Name, Report Date/Time

Batch Samples: Quant Method, Sample Analyst Date Time, Data File Name

Study: Study Name, Submitter, Start Date, Type


OpenLAB ECM has an easy-to-use query tool that can be used to perform simple searches on one or more general terms and conditions (Figure 14). This query tool also allows users to perform complex queries based on specific keys (Figure 15).

Figure 14. Quick search using a compound name and operator name.

Figure 15. Advanced search query to find studies or batches based on Operator and Start Date.
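The difference between the two search styles can be illustrated with a small Python sketch: a quick search matches free-text terms against any indexed value, while an advanced search combines conditions on specific keys. This is not the ECM query tool or its syntax. The helper names are hypothetical, and the records in the usage example are invented sample data that borrow the Operator Name and Start Date keys from Table 1.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def equals(key: str, value: object) -> Condition:
    """Condition: the record's value for key equals value."""
    return lambda record: record.get(key) == value


def on_or_after(key: str, day: date) -> Condition:
    """Condition: the record's date for key is on or after day."""
    def check(record: Record) -> bool:
        value = record.get(key)
        return isinstance(value, date) and value >= day
    return check


def quick_search(records: Iterable[Record], *terms: str) -> List[Record]:
    """Keep records in which every term appears in some value (case-insensitive)."""
    hits: List[Record] = []
    for record in records:
        values = [str(value).lower() for value in record.values()]
        if all(any(term.lower() in value for value in values) for term in terms):
            hits.append(record)
    return hits


def advanced_search(records: Iterable[Record], *conditions: Condition) -> List[Record]:
    """Keep records that satisfy every key-specific condition."""
    return [r for r in records if all(cond(r) for cond in conditions)]
```

For example, with invented records:

```python
records = [
    {"Study Name": "Study A", "Operator Name": "alice", "Start Date": date(2012, 3, 1)},
    {"Study Name": "Study B", "Operator Name": "bob", "Start Date": date(2012, 6, 1)},
    {"Study Name": "Study C", "Operator Name": "alice", "Start Date": date(2012, 9, 1)},
]
recent_by_alice = advanced_search(
    records, equals("Operator Name", "alice"), on_or_after("Start Date", date(2012, 6, 1))
)
```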

Note: OpenLAB ECM is also capable of storing and indexing text and office files such as Microsoft Excel reports, Word documents, and PDF documents.

Retrieving and Restoring Data

When MassHunter studies or batches are uploaded into ECM, subfolders under the study/batch folder are zipped prior to the upload. These are special zip files that have the “*.SSIZIP” extension and keep file timestamps intact when unzipped.

OpenLAB ECM 3.4.1 SP1 has improved the Retrieve All functionality, which allows users to pick the study/batch folder in the ECM UI and download all included files. As part of the download, users can also have the SSIZIP files unzipped (Figure 16). This allows users to very easily restore the original data in exactly the way it was captured on the workstation prior to its upload to ECM.

Figure 16. Retrieve All with unzip functionality.
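To see why timestamp preservation needs explicit handling, consider an ordinary zip archive: it records each file's modification time, but a plain extraction stamps the restored files with the time of extraction. The Python sketch below restores the recorded times after extracting. It uses the standard zip format as a stand-in, because the internal layout of SSIZIP files is not described here; the function names are hypothetical.

```python
import os
import time
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> None:
    """Zip every file under folder; ZipFile.write records each file's mtime."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(folder).as_posix())


def unzip_with_timestamps(archive: Path, dest: Path) -> None:
    """Extract the archive, then reset each file's mtime from its zip entry."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = Path(zf.extract(info, dest))
            if info.is_dir():
                continue
            # date_time is a local-time tuple with 2-second resolution.
            restored = time.mktime(info.date_time + (0, 0, -1))
            os.utime(target, (restored, restored))
```

Standard zip entries store times with two-second resolution, so a round trip through this sketch can shift a timestamp by up to about two seconds.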

Integrate Software Across your Business Enterprise

OpenLAB ECM is a centralized data and report repository that can be integrated with your LIMS, ELN, and ERP systems. OpenLAB ECM can also be tightly integrated with many Agilent software solutions, including OpenLAB CDS (both ChemStation and EZChrom editions), ChemStation, and MSD Productivity ChemStation, making it a seamless part of existing laboratory workflows with little additional training. In addition, OpenLAB ECM integrates smoothly with software from Microsoft and Adobe: integrated drop-down menus and function keys located within OpenLAB CDS, Microsoft Word, Excel, and PowerPoint, as well as within Adobe Acrobat, can be used to retrieve and store files directly to and from OpenLAB ECM.

Implementing OpenLAB ECM

Software requirements

Software                  Version
OpenLAB ECM               Version 3.4.1 SP1
Agilent MassHunter Quant  Version B.0.5.01
Operating System          Windows 2008 R2 (64-bit), Windows 7 (64-bit)
Database                  SQL Server 2008 R2 or Oracle 11g

Hardware recommendations

The following configuration is recommended for a system where a smaller amount of data (2 MB/minute) is generated. An example use case would be an LC/QQQ being used in a pharmaceutical environment (2 compounds + ISTD). The following system would be optimal for storing 2 TB of data.

Component    Description
CPU Type     1 x Intel Xeon E5410 2.33 GHz (4 Core)
Memory size  4 GB (PC2-6400)
HDD          OS & Database: 2 x 250 GB 15K RPM SAS, RAID-1
             Internal Cache & File Storage: 3 x 2 TB 7.2K RPM SATA, RAID-5
Network      1 GB

The following configuration is recommended for a system where a larger amount of data (25 MB/minute) is generated. An example use case would be an LC/QQQ being used for multiresidue analysis. Such an environment is assumed to generate about 7 to 8 TB of data each year.

Component    Description
CPU Type     1 x Intel Xeon E5650 2.7 GHz (6 Core)
Memory size  24 GB (PC2-6400)
HDD          OS & Database & Cache: 2 x 350 GB 15K RPM SAS, RAID-1 (RAID-10 preferred)
             External NAS Storage: 10 TB
Network      1 GB

To handle even higher usage cases, such as when LC/TOF/Q-TOF devices are used, external NAS storage will need to be increased. For example, if a lab produces 100 MB/minute (30 TB/year), the external NAS storage capacity must be at least 30 TB to store one year’s worth of data.
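The sizing figures in this section can be checked with a short calculation. The Python sketch below converts a data rate into a yearly volume. The figure of 300,000 acquisition minutes per year is an assumption inferred from the numbers quoted here (it is the value at which 100 MB/minute gives 30 TB/year and 25 MB/minute gives 7.5 TB/year, using decimal units); this overview does not state the instrument duty cycle.

```python
MB_PER_TB = 1_000_000                   # assumption: decimal units
ACQUISITION_MINUTES_PER_YEAR = 300_000  # assumption inferred from the quoted figures


def yearly_volume_tb(
    rate_mb_per_minute: float, minutes: int = ACQUISITION_MINUTES_PER_YEAR
) -> float:
    """Data volume per year, in TB, for a given acquisition rate."""
    return rate_mb_per_minute * minutes / MB_PER_TB


def years_of_capacity(capacity_tb: float, rate_mb_per_minute: float) -> float:
    """How many years of data a given storage capacity can hold."""
    return capacity_tb / yearly_volume_tb(rate_mb_per_minute)


low = yearly_volume_tb(2)      # about 0.6 TB/year
medium = yearly_volume_tb(25)  # 7.5 TB/year
high = yearly_volume_tb(100)   # 30 TB/year
```

Under the same assumption, `years_of_capacity(2, 2)` suggests that the 2 TB configuration holds a little over three years of data at 2 MB/minute.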

Conclusions

Users in modern high-throughput laboratories face a number of productivity and data management issues: spending an excessive amount of time searching for and retrieving current as well as archived information; needing to share information and collaborate on data; and identifying reliable, cost-effective data archiving for the long term.

Agilent OpenLAB ECM provides a well-integrated solution for these issues. Users have the choice of pushing data into ECM or having it automatically pulled in from the MassHunter workstations. The system extracts MassHunter-specific metadata and indexes it so that, once the data is archived, users can quickly find the files by querying on MassHunter-specific keys. OpenLAB ECM also makes it very easy for users to restore data in exactly the way it was prior to the upload to ECM.

OpenLAB ECM’s architecture is such that it can easily scale all the way from a single-server workgroup solution to a multiserver, multicontinent, enterprise-wide system that allows you to create, manage, collaborate, archive, and re-use all of your laboratory data and other business-critical information with greater efficiency.

www.agilent.com/chem/openlabecm

This information is subject to change without notice.

© Agilent Technologies, Inc., 2012
Printed in the USA, October 26, 2012
5991-1098EN