International Workshop on Science Gateways
Monday 19 October 2015, Brisbane
PLEASE NOTE: Registration for this workshop is via http://doodle.com/4y8hyxe7b2rwfgxs.
There is no registration option via the eResearch Conference Registration Site.

The Role of Science Gateways
9:00   Keynote: Nancy Wilkins-Diehr (by videoconference) – Science Gateways: The Importance of Building Community
9:30   Nigel Ward, Glenn Moloney – Reflections on the NeCTAR Virtual Laboratory Program
9:50   Richard O. Sinnott, Christopher Bayliss, Andrew Bromage, Gerson Galang, Yikai Gong, Philip Greenwood, Glenn Jayaputera, Davis Marques, Luca Morandini, Ghazal Nogoorani, Marcos Nino-Ruiz, Hossein Pursultani, Rosana Rabanal, Muhammad Sarwar, William Voorsluys, Ivo Widjaja – The Collaborative Urban Research Environment for Australia
10:10  Wojtek James Goscinski – The Characterisation Virtual Laboratory
10:30  Morning tea

Science Gateway Experiences Part 1
10:50  Richard O. Sinnott, Jemie Effendy, Stephan Gloeckner, Anthony Stell – Beyond a Disease Registry: An Integrated Virtual Environment for Adrenal Cancer Research
11:10  Michelle Barker – A Science Gateway for Malaria: Successes and Challenges
11:30  Uwe Rosebrock, Peter Oke, Roger Proter, Gary Carroll, Simon Pigot, Xiao (Ming) Fu – The Marine Virtual Laboratory – ocean modeling made easy
11:50  Aurel F. Moise, Tim Pugh, Martin Dix, Bertrand Timbal – The Australian Climate and Weather Science Virtual Laboratory (CWSLab)
12:10  Discussion
12:20  Lunch

Science Gateway Experiences Part 2
1:10   Ian M. Atkinson, Jeremy Vanderwal, Daniel Baird, Andrew Krockenberger, Nigel Bajima, Scott Mills, Nigel G. Sim – An Integrated Sensor Network and Research Data Management System for the Daintree Rainforest Observatory
1:30   Siddeswara Guru, Hoang Anh Nguyen, Shilo Banihit, Matthew Mulholland, Kim Olsson, Tim Clancy – Development of cloud-based virtual desktop environment for synthesis and analysis for ecosystem science community
1:50   David Abramson, Hoang Nguyen – Workflow driven Science Gateways
2:10   Sandra Gesing – Developing Science Gateways: Current Solutions and Future Challenges
2:40   Afternoon tea

The Future of Science Gateways
3:00   Panel discussion (Chair: Rhys Francis)
4:30   End
eResearch Australasia Conference | Melbourne – Australia | 27 – 31 October 2014
Science Gateways: The Importance of Building Community
Nancy Wilkins-Diehr
San Diego Supercomputer Center
Science gateways, or portals, provide researchers with online access to data, computational tools, and resources. They help researchers collaborate, discover, organise, analyse, visualise and engage – in short, they enable research in bold new ways. Recent research coordinated by sciencegateways.org, involving nearly 5,000 members of the research community, illustrates the importance of science gateways for researchers and educators, and the opportunities for growth. The research measured the extent and characteristics of the gateway community (reliance on gateways, nature of existing resources) in order to understand what services and support would be useful for gateway builders and users. Sciencegateways.org provides a range of opportunities for gateway developers to share their experiences. While funding challenges remain for gateways, the USA's National Science Foundation recognises the importance of science gateways, and advances are being made by highlighting how small investments can benefit many researchers – underscoring the importance of building community.
ABOUT THE AUTHOR(S)
Nancy Wilkins-Diehr is Associate Director at the San Diego Supercomputer Center and co-director of XSEDE's Extended Collaborative Support program. She has been involved in science gateways and their interfaces to high-performance computing since 2005. Nancy received her Bachelor's degree from Boston College in Mathematics and Philosophy and her Master's degree in Aerospace Engineering from San Diego State University.
eResearch Australasia Conference | Brisbane – Australia | 19 – 23 October 2015
Reflections on the NeCTAR Virtual Laboratory Program
Nigel Ward, Glenn Moloney
1 The University of Melbourne, Melbourne, Australia, [email protected]
2 The University of Melbourne, Melbourne, Australia, [email protected]
The NeCTAR (National eResearch Collaboration Tools and Resources) project [1] has established eleven Virtual Laboratories that provide rich domain-oriented online environments connecting Australian researchers to facilities, data repositories and computational tools on a national scale. This presentation will provide an overview of the NeCTAR Virtual Laboratories, describe their usage and successes, and reflect on how NeCTAR's project approach supported and hindered that success.
THE NECTAR VIRTUAL LABORATORIES
The NeCTAR Virtual Laboratories provide rich domain-oriented online environments that draw together research data, models, analysis tools and workflows to support collaborative research across institutional and discipline boundaries. They include:
• Genomics Virtual Laboratory [2], which aims to "take the information technology out of bioinformatics", providing biologists with easy access to a suite of genomics tools and resources through a web portal;
• MARVL Marine Virtual Laboratory [3], supporting forecasting and planning for marine and coastal environments by providing access to ocean observations and ocean models that had previously been difficult for users to set up;
• Virtual Geophysics Laboratory [4], providing geophysicists with easy access to geophysics data, workflows, simulations, software tools and computational infrastructure;
• Climate and Weather Science Laboratory [5], supporting use of the ACCESS weather model, reproducible climate and weather analysis workflows, visualisations, and access to climate data;
• Characterisation Virtual Laboratory [6], providing a data management and workflow environment for scientists who use advanced imaging techniques in Neuroimaging, Structural Biology, Energy Materials (X-ray), and Energy Materials (Atom Probe);
• All Sky Virtual Observatory [7], housing cosmological simulations and tools that allow astronomers to observe each virtual universe as if it were real, along with an environment for the hosting, analysis, and exploration of data from the SkyMapper Southern-Sky Survey;
• Humanities Networked Infrastructure (HuNI) [8], which combines information from 30 of Australia's most significant cultural datasets and allows researchers to assert relationships among data;
• Industrial Ecology Lab [9], supporting modelling of environmental impacts through complex inter-industry supply-chain networks;
• Biodiversity and Climate Change Virtual Laboratory [10], supporting experiments in species distribution modelling (including projection onto future climate layers), species trait modelling, biodiversity modelling, and ensemble statistical distribution modelling;
• Endocrine Virtual Laboratory (endoVL) [11], hosting targeted disease registries that help researchers and clinicians gather large enough cohorts of patients with the rarer endocrine conditions to conduct studies or clinical trials;
• Alveo [12], which brings together data collections, analysis tools, and workflows in a common environment, allowing human communication scientists to study speech, language, text, and music on a large scale.
REFLECTIONS ON SUCCESS
From a project management perspective, the NeCTAR Virtual Laboratory projects carried significant risk: they all involved cross-institutional collaboration, and all involved integration of infrastructure controlled by other organisations. Despite these risks, they are all successfully operating infrastructure, all reporting strong uptake and utilisation by their respective research communities, and their governance committees are asserting delivery against research stakeholder expectations. How did NeCTAR and these sub-projects successfully deliver research value in spite of their inherent risks? And where did it go wrong?
What went well
The NeCTAR Request for Proposals process aimed to maximise the long-term research benefits of the infrastructure by favouring proposals that:
• addressed a real research need within an existing research community;
• involved collaborative partnerships between researchers, infrastructure developers and infrastructure operators;
• leveraged existing national investments in computation, storage, data, trust, networks, and instruments;
• included co-investment to cover operational costs beyond the development phase:
  o all projects committed to operate infrastructure for at least six months; some committed to operate for longer, and many have since successfully sourced additional investment to sustain their infrastructure beyond these initial commitments.
All NeCTAR sub-project governance groups contain researchers responsible for ensuring the projects deliver value to their target research domain. These research participants are also expected to act as champions and advocates for the sub-project within their research domains, supporting uptake and utilisation by the broader research community.

The most successful Virtual Laboratory projects involved strong partnerships between researchers and software developers. The researchers contributed their vision, domain knowledge and bespoke tools to the project, while the software developers helped make the tools more robust and widely available, often moving them from stand-alone tools running on a researcher's desktop into online research Software-as-a-Service offerings. Given the novelty of Software-as-a-Service as a research infrastructure delivery model, NeCTAR actively fostered knowledge exchange across the software development teams supporting the Virtual Laboratories.

NeCTAR required its sub-projects to regularly report on progress against milestones, expenditure, co-investment, risks, measures of uptake, and communication activities. Having projects regularly report on issues and risks, both to NeCTAR and at sub-project governance meetings, was useful in setting NeCTAR's own risk management and coordination agendas.

NeCTAR supported a change control process that allowed projects to respond to opportunities for delivering research value not originally envisaged in their proposals. These processes ensured that ongoing changes in sub-project implementation were fully understood and agreed between the project owners, the research community, and NeCTAR.

NeCTAR required the involvement of research end-users in acceptance testing of agreed sub-project deliverables. Many projects found that writing acceptance criteria ahead of time, and involving researchers in sign-off on delivery, was useful in managing stakeholder expectations.
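Pre-agreed acceptance criteria of this kind lend themselves to being captured as explicit, checkable statements. The sketch below is purely illustrative – the criteria, field names and function are invented for this example and are not NeCTAR's actual acceptance tests:

```python
def check_acceptance(deliverable, criteria):
    """Evaluate a deliverable against pre-agreed acceptance criteria.

    Each criterion is a (description, predicate) pair; returns the list
    of failed criterion descriptions (an empty list means sign-off).
    """
    return [desc for desc, predicate in criteria if not predicate(deliverable)]

# Hypothetical criteria agreed with research end-users before delivery.
criteria = [
    ("supports federated login", lambda d: d["federated_login"]),
    ("hosts at least 10 datasets", lambda d: d["datasets"] >= 10),
    ("uptime over 99%", lambda d: d["uptime_pct"] > 99.0),
]

deliverable = {"federated_login": True, "datasets": 12, "uptime_pct": 98.5}
print(check_acceptance(deliverable, criteria))  # → ['uptime over 99%']
```

Writing the predicates down before development starts plays the same role as the NeCTAR sign-off process: both parties agree in advance on what "done" means.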
What didn’t go so well Despite the success of the NeCTAR Virtual Laboratories, there were of course instances where projects were not able to meet research community expectations.
• All of the NeCTAR Virtual Laboratories delivered late. This is perhaps not surprising given they involved complex integrations and short timelines for delivery (2 years for the stage 1 projects, 18 months for stage 2).
• Some projects failed to continuously deliver infrastructure during their software development phase, which disenfranchised some of their research stakeholders. Delays in the delivery or stability of third-party research infrastructure needed by the Virtual Laboratories further compounded these timeline and engagement pressures.
• Unfortunately, the relationship between the researchers and software developers collapsed in a few projects, leading to difficult governance conversations about changing collaborators mid-project, serious delays to project delivery, and reductions in scope.
• While all Virtual Laboratories used the NeCTAR change control process, they universally found it laborious, expressing a desire for a change process that involves governance groups rather than lawyers.
CONCLUSION
Despite its inherent risks, the NeCTAR Virtual Laboratory program has been an outstanding success. All of the NeCTAR Virtual Laboratories are operating rich domain-oriented online environments that draw together research data, models, analysis tools and workflows to support collaborative research across institutional and discipline boundaries. All are reporting strong uptake and utilisation by their respective research communities, and many have found extra funds to continue to operate beyond their initial commitments. We hope that this brief overview of the NeCTAR Virtual Laboratory Program provides inspiration for others to pursue the creation and operation of similar domain-oriented online research environments.
REFERENCES
1. National eResearch Collaboration Tools and Resources project. Available from http://nectar.org.au/, accessed 8 June 2015.
2. Genomics Virtual Laboratory. Available from https://genome.edu.au/, accessed 8 June 2015.
3. MARVL Marine Virtual Laboratory. Available from https://portal.marvl.org.au/, accessed 8 June 2015.
4. Virtual Geophysics Laboratory. Available from http://vgl.auscope.org/, accessed 8 June 2015.
5. Climate and Weather Science Laboratory. Available from http://cwslab.nci.org.au/, accessed 8 June 2015.
6. Characterisation Virtual Laboratory. Available from https://www.massive.org.au/cvl, accessed 8 June 2015.
7. All Sky Virtual Observatory. Available from http://www.asvo.org.au/, accessed 8 June 2015.
8. Humanities Networked Infrastructure. Available from https://huni.net.au/, accessed 8 June 2015.
9. Industrial Ecology Lab. Available from http://www.isa.org.usyd.edu.au/ielab/ielab.shtml, accessed 8 June 2015.
10. Biodiversity and Climate Change Virtual Laboratory. Available from http://www.bccvl.org.au/, accessed 8 June 2015.
11. Endocrine Virtual Laboratory. Available from https://endovl.org.au/, accessed 8 June 2015.
12. Alveo. Available from http://alveo.edu.au/, accessed 8 June 2015.
ABOUT THE AUTHOR(S)
Nigel Ward is Deputy Director (Software Infrastructure) at the NeCTAR (National eResearch Collaboration Tools and Resources) project, where he primarily co-ordinates projects developing cloud-based software tools for the Australian research community. Nigel is based at the University of Queensland, and before joining NeCTAR managed projects within the UQ ITEE eResearch Group aimed at improving research capability through the provision of IT infrastructure. In previous roles he worked on interoperability and standards for research and learning technologies in the Higher Education sector. Glenn Moloney is Director of the NeCTAR (National eResearch Collaboration Tools and Resources) project.
The Collaborative Urban Research Environment for Australia
Richard O. Sinnott, Christopher Bayliss, Andrew Bromage, Gerson Galang, Yikai Gong, Philip Greenwood, Glenn Jayaputera, Davis Marques, Luca Morandini, Ghazal Nogoorani, Marcos Nino-Ruiz, Hossein Pursultani, Rosana Rabanal, Muhammad Sarwar, William Voorsluys, Ivo Widjaja
(and the AURIN Network) Department of Computing and Information Systems
University of Melbourne Melbourne, Australia
Presenter Name: Richard Sinnott
ABSTRACT
The federally funded Australian Urban Research Infrastructure Network (AURIN) project (www.aurin.org.au) began in July 2010. AURIN was tasked with developing a secure, web-based virtual environment and underpinning e-Infrastructure offering seamless, secure access to diverse, distributed and extremely heterogeneous data sets from numerous agencies across Australia, together with an extensive portfolio of targeted analytical and visualisation tools. This is being provisioned for Australia-wide urban and built environment researchers – themselves a highly heterogeneous collection of research communities with diverse demands – through a unified, researcher- and provider-driven collaboration environment. The AURIN platform incorporates extensive features that support the research community in gaining access to a wide array of distributed data sets, and that allow data providers to define their own access and usage demands on their data sets, including highly sensitive data sets. This paper describes these demands and how the e-Infrastructure has been designed and implemented to accommodate this diversity of requirements, from both the user/researcher perspective and the data provider perspective. The utility of the e-Infrastructure is demonstrated through a range of scenarios reflecting the inter-disciplinary urban research now possible, with specific focus on hitherto challenging (impossible!) scenarios that demand utmost security in accessing sensitive (unit-level) data sets and commercially sensitive data.
INTRODUCTION
The Australian Urban Research Infrastructure Network (AURIN) project (www.aurin.org.au) is a major national project that commenced formally in July 2010. AURIN initially received $20 million of funding from the Australian Government Department of Industry for the 'establishment of facilities to enhance the understanding of urban resource use and management' [1]. In 2013, the project received a further $4m to extend (harden!) the facilities. In particular, the AURIN project was tasked with providing urban and built environment researchers with a state-of-the-art research infrastructure – an e-Infrastructure – offering seamless and secure access to data and tools for interrogating a wide array of distributed data sets from diverse agencies, to support a portfolio of research activities reflecting the diversity of the urban and built environment research agenda. This is being provisioned through a unified urban collaborative environment offering a complete lab-in-a-browser experience. Key to AURIN is that it provides access to data from the definitive urban data providers across Australia. At present AURIN makes over 1800 data sets from 70 major agencies available, including organisations such as the Australian Bureau of Statistics, VicRoads and VicHealth, amongst many others. The basic early functionality of the AURIN platform was described in [2,3]; the way in which it has been designed and developed using agile technologies is described in [4]; and the use of the platform is described in [5-7]. In the last year, further work has been undertaken to extend the platform to give access to more data sets and to scale the systems for increased numbers of researchers and their workloads/demands.
Many of the AURIN data sets are highly sensitive, both commercially and because they relate to individuals with explicit confidentiality requirements. To tackle this, AURIN has developed a flexible and fine-grained security model by extending Australian Access Federation authentication with more advanced authorisation capabilities. This paper describes these solutions and how they can be used to restrict access to data sets. Furthermore, many of the data sets demand that much finer-grained access and usage scenarios be supported, meeting the explicit demands of data providers. We describe these solutions and how unit-level data can now be utilised by exploiting advanced privacy-driven geospatial data aggregation techniques.
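The general idea behind privacy-driven aggregation can be illustrated with a minimal sketch: unit-level records are rolled up to coarser geographic regions, and any region whose count falls below a disclosure threshold is suppressed before release. This is a generic small-cell-suppression illustration only – the function, record schema, region codes and threshold here are assumptions for the example, not AURIN's actual (considerably more sophisticated) techniques:

```python
from collections import Counter

def aggregate_with_suppression(records, min_cell_size=5):
    """Roll unit-level records up to per-region counts, suppressing
    (replacing with None) any region below the disclosure threshold.

    records: iterable of dicts with a 'region' key (hypothetical schema).
    """
    counts = Counter(r["region"] for r in records)
    return {region: (n if n >= min_cell_size else None)
            for region, n in counts.items()}

# Hypothetical unit-level data: only cells with >= 5 records are released.
units = [{"region": "SA2-201"}] * 7 + [{"region": "SA2-202"}] * 3
print(aggregate_with_suppression(units))  # → {'SA2-201': 7, 'SA2-202': None}
```

The point of the sketch is that researchers only ever see aggregates that satisfy the provider's release rule; the unit-level records themselves never leave the secure environment.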
Key to the success of AURIN, or indeed any major research infrastructure, is uptake and adoption by the research community for which it is intended. Since the project started, the AURIN platform has been accessed and used over 35,000 times, with increasing numbers of users coming from non-academic domains including government and industry. Figure 1 shows the access and usage statistics (data provided by the Australian Access Federation) since the release of the Beta-5 version of the platform in September 2014.
Figure 1: AURIN Access and Usage Statistics (September 2014 – May 2015)
This talk will cover all of these aspects and plans for the future including new research domains that will build upon the AURIN platform.
REFERENCES
1. AURIN Final Project Plan, http://aurin.org.au/resources/final-project-plan
2. R.O. Sinnott, G. Galang, M. Tomko, R. Stimson, Towards an e-Infrastructure for Urban Research Across Australia, IEEE e-Science Conference, Stockholm, Sweden, December 2011.
3. R.O. Sinnott, C. Bayliss, G. Galang, P. Greenwood, G. Koetsier, D. Mannix, L. Morandini, M. Nino-Ruiz, C. Pettit, M. Tomko, M. Sarwar, R. Stimson, W. Voorsluys, I. Widjaja, A Data-driven Urban Research Environment for Australia, IEEE e-Science Conference, Chicago, USA, October 2012.
4. R.O. Sinnott, C. Bayliss, L. Morandini, M. Tomko, Tools and Processes to Support the Development of a National Platform for Urban Research: Lessons (Being) Learnt from the AURIN Project, 11th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2013), Adelaide, South Australia, January 2013.
5. C. Pettit, et al., Building an e-infrastructure to support urban and built environment research in Australia: a lens-centric view, Surveying & Spatial Sciences Conference 2013, Canberra, Australia, April 2013.
6. R.O. Sinnott, C. Bayliss, A. Bromage, G. Galang, G. Grazioli, P. Greenwood, G. Macauley, L. Morandini, M. Nino-Ruiz, C. Pettit, M. Tomko, M. Sarwar, R. Stimson, W. Voorsluys, I. Widjaja, The Australian Urban Research Gateway, Journal of Concurrency and Computation: Practice and Experience, April 2014, doi: 10.1002/cpe.3282.
7. C. Pettit, R. Stimson, J. Barton, X. Goldie, R.O. Sinnott, T. Kvan, The Australian Urban Intelligence Network supporting Smart Cities, CUPUM 2015 Conference book on Smart Cities and Planning Support Systems, eds: S. Geertman, J. Stillwell, J. Ferreira, R. Goodspeed, February 2015.
ABOUT THE AUTHORS
Professor Richard O. Sinnott is the Director of eResearch at the University of Melbourne and Chair of Applied Computing Systems. In these roles he is responsible for all aspects of eResearch (research-oriented IT development) at the University. He has been lead software engineer/architect on an extensive portfolio of national and international projects, with specific focus on those research domains requiring finer-grained access control (security). He is technical lead for the AURIN project. Christopher Bayliss is a software engineer within AURIN with focus on security; Andrew Bromage is a software (data) engineer within AURIN; Gerson Galang is a software engineer within AURIN with focus on data clients; Yikai Gong is a PhD candidate at the University of Melbourne; Philip Greenwood is a software engineer within AURIN with focus on workflow tools; Glenn Jayaputera is the implementation project manager for the AURIN technical team; Davis Marques is a software engineer within AURIN with focus on the portal user interface; Luca Morandini is the AURIN data architect; Ghazal Nogoorani is a software engineer within AURIN with focus on the portal user interface; Marcos Nino-Ruiz is a software engineer within AURIN with focus on data clients; Hossein Pursultani is a software engineer within AURIN with focus on the supporting infrastructure and the continuous build environment; Rosana Rabanal is a software engineer within AURIN with focus on the supporting infrastructure and the continuous build environment; Muhammad Sarwar is a software engineer within AURIN with focus on the supporting middleware/business logic; William Voorsluys is a software engineer within AURIN with focus on the workflow environment; Ivo Widjaja is a software engineer within AURIN with focus on the portal user interface.
The Characterisation Virtual Laboratory
Wojtek James Goscinski
Monash University, [email protected]
DEMONSTRATION ABSTRACT
The Characterisation Virtual Laboratory (CVL – www.massive.org.au/cvl) is a collaborative NeCTAR-funded project to develop online environments for researchers using advanced imaging techniques, and to demonstrate the impact of connecting national instruments with computing and data storage infrastructure. The CVL project has three major goals:
1. To integrate Australia’s imaging equipment with specialised HPC and cloud capabilities.
More than 450 registered researchers have used and benefited from the technology that has been developed by the CVL project, providing them with an easier mechanism to capture instrument data and process that data on centralised cloud and HPC infrastructure, including MASSIVE and NCI.
2. To provide scientists with a common cloud-based environment for analysis and collaboration.
The CVL has been deployed across NeCTAR federated clouds at the University of Melbourne, Monash University, and QCIF. CVL technology has been used to provide easier access to HPC facilities at MASSIVE, NCI and Central Queensland University.
3. To produce four exemplar platforms, called Workbenches, for multi-modal or large-scale imaging in Neuroimaging, Structural Biology, Energy Materials (X-ray), and Energy Materials (Atom Probe).
The CVL environment now contains 103 tools for specialised data analysis and visualisation in Workbenches. Over 20 imaging instruments have been integrated so that data automatically flows into the cloud for management and analysis.
The technology developed under the CVL provides simple access to centralised processing, analysis and visualisation software, and to HPC infrastructure, for newcomers and inexperienced HPC users.
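The instrument-to-cloud data flow described above can be pictured with a minimal sketch: when an instrument writes a new capture file, an ingestion step records a checksum and basic metadata so the data can be managed and processed centrally. Everything here – the function name, the metadata fields, the instrument identifier – is an illustrative assumption, not the CVL's actual ingestion API:

```python
import hashlib
import os
import time

def ingest_capture(path, instrument_id):
    """Register a newly captured instrument file for central processing:
    compute a SHA-256 checksum and return a metadata record
    (hypothetical schema for illustration)."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    return {
        "instrument": instrument_id,
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256.hexdigest(),
        "captured_at": time.time(),
    }

# Example: register a capture from a hypothetical microscope.
with open("capture_001.raw", "wb") as f:
    f.write(b"example image data")
record = ingest_capture("capture_001.raw", instrument_id="microscope-01")
print(record["filename"], record["size_bytes"])
```

Recording a checksum at capture time also lets the downstream cloud storage verify that nothing was corrupted in transit – one reason automated ingestion pipelines of this kind are preferred over manual copying.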
The CVL is a NeCTAR-funded collaboration between Monash University, Australian Microscopy & Microanalysis Research Facility (AMMRF), Australian Nuclear Science and Technology Organisation (ANSTO), Australian Synchrotron, National Imaging Facility (NIF), Australian National University, The University of Sydney, and The University of Queensland.
Beyond a Disease Registry: An Integrated Virtual Environment for Adrenal Cancer Research
Richard O. Sinnott, Jemie Effendy, Stephan Gloeckner, Anthony Stell Department of Computing and Information Systems
University of Melbourne Melbourne, Australia
Presenter Name: Richard Sinnott
ABSTRACT
Many biomedical research collaborations are focused on the establishment of web-based databases that capture phenotypic and, in some cases, genotypic information targeted to specific diseases – so-called disease registries. Such resources are often used for clinical matchmaking and allow information on patients and patient disorders to be shared by clinicians with wider biomedical research communities outside of a given hospital setting, and potentially with patients and/or patient advisory groups. Whilst addressing aspects of clinical collaboration by making (targeted) biomedical data accessible, such registries are really only a starting point for what can be achieved to support biomedical research collaborations. In particular, registries should ideally be augmented with a portfolio of additional service offerings that facilitate secure research collaborations: bio-banking and bio-sample data tracking capabilities; support for feasibility analysis on clinical trials and studies; seamless data transfer to/from clinical trials; and search and analytical capabilities in a user-driven research environment. Such a feature-rich, Internet-based virtual research environment (VRE) has been established as part of the European Union funded ENS@T-CANCER (www.ensat-cancer.eu) project, which has a particular focus on supporting research into four primary types of adrenal tumours. This paper provides an overview of the ENS@T-CANCER VRE, outlining its core capabilities and how it has galvanised previously largely fragmented, country-specific database and registry efforts. The ENSAT-CANCER VRE is now globally adopted, with 70 major cancer centres around the world using the VRE and over 20 major international multi-centre clinical trials now fully supported and integrated into the platform. This VRE provides the basis for the recently funded Horizon 2020 ENSAT-HT program.
This talk describes the platform and the lessons being learned in its development.
INTRODUCTION
A ubiquitous problem in undertaking biomedical research is access to clinical/biomedical data, and especially access to and sharing of data across organisational and national boundaries. These challenges are multi-faceted and comprise information governance, ethics and privacy challenges; human/social and organisational factors; and a variety of technical implementation issues that must be overcome. On the latter challenge, a body of work on the realisation of solutions for secure, web-based biomedical data sharing now exists [1-3]. For many clinical/biomedical collaborations this takes the form of support for targeted disease registries [4,5]. Such systems are used to aggregate clinical/phenotypic data that can be used to coalesce the common understanding and treatment/management of patients with those particular diseases at a clinical level, and they provide a model for the potential sharing of physical biosamples of patients with a specific phenotype, subject to ethics and the agreement of clinicians, patients and indeed the organisations involved. These biosamples can then be used for a range of bioinformatics analysis and -omics research.
One prime example of a disease registry is the international disorders of sex development registry (I-DSD, www.i-dsd.org). The I-DSD registry includes extensive phenotypic information on over 1200 patients with rare disorders of sex development. This system has been adopted on a global scale and captures best practice in disease registries. I-DSD provides the critical mass of 'standardised' patient data that has, for the first time, allowed research into disorders of sex development to be undertaken in a systematic and statistically relevant manner, through access to large-scale phenotypic data sets covering a spectrum of DSD manifestations. However, whilst essential for inter-organisational research collaborations, such a web-based disease-focused registry really represents only a starting point for what can be achieved as a collaboration platform.
The ENS@T-CANCER project (www.ensat-cancer.eu) has taken the basic idea of a web-based disease registry to a new technological level, supporting a complete virtual research environment (VRE) for integrated biomedical research into adrenal tumours. The primary focus of this paper is to describe the core features of the ENS@T-CANCER VRE and illustrate the way in which they collectively provide a step change in biomedical research capabilities for the international adrenal tumour research community. Some of these features were originally presented in [6]; however,
the platform has evolved to include a range of more advanced features to support biomedical research. These include VRE data access and usage tracking and their temporal visualisation; logging analysis for access and usage statistics, exploiting cloud-based big data technologies; mobile applications; and support for a range of biosample labelling and tracking capabilities. This talk will cover these capabilities and how they support the scientific research.
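As a rough illustration of the kind of log analysis involved, the sketch below tallies accesses per calendar month from simple timestamped access records. The record format is invented for this example – the actual VRE logs, and the cloud-based big data tooling used to process them at scale, are of course far richer:

```python
from collections import defaultdict

def monthly_access_counts(log_lines):
    """Count accesses per calendar month from lines of the
    (hypothetical) form 'ISO-timestamp user action'."""
    counts = defaultdict(int)
    for line in log_lines:
        timestamp, user, action = line.split()
        month = timestamp[:7]  # 'YYYY-MM' prefix of the ISO timestamp
        counts[month] += 1
    return dict(counts)

# Hypothetical access records for two users of the VRE.
logs = [
    "2015-04-02T09:15:00 clinicianA view_patient",
    "2015-04-02T09:20:00 clinicianA upload_biosample",
    "2015-05-11T14:02:00 researcherB run_query",
]
print(monthly_access_counts(logs))  # → {'2015-04': 2, '2015-05': 1}
```

Grouping by other fields (user, action, centre) follows the same pattern and feeds the temporal visualisations mentioned above.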
The current status of the ENSAT-‐CANCER VRE is shown in Figure 1.
Figure 1: ENSAT-CANCER Status (May 2015)
REFERENCES
1. Ahmed, S.F., Rodie, M., Jiang, J., Sinnott, R.O., The European DSD Registry – A Virtual Research Environment, International Journal on Sexual Development, special issue "New concepts for human disorders of sex development", Sex Dev. 2010; 4:192-198, doi: 10.1159/000313434.
2. Sinnott, R.O., Stell, A.J., Jiang, J., Classifying Architectural Data Sharing Models for e-Health Collaborations, Proceedings of the International HealthGrid Conference, Bristol, UK, June 2011.
3. Stell, A.J., Sinnott, R.O., Jiang, J., Donald, R., Chambers, I., Citerio, G., Enblad, P., Gregson, B., Howells, T., Kiening, K., Nilsson, P., Ragauskas, A., Sahuquillo, J., Piper, I., Federating Distributed Clinical Data for the Prediction of Adverse Hypotensive Events, Philosophical Transactions of the Royal Society A, July 2009, 367:2679-2690.
4. Bellgard, M.I., Macgregor, A., Janon, F., Harvey, A., O'Leary, P., Hunter, A., Dawkins, H., A modular approach to disease registry design: Successful adoption of an internet-based rare disease registry, Human Mutation.
5. Kleophas, W., Bieber, B., Robinson, B.M., Duttlinger, J., Fliser, D., Lonnemann, G., Reichel, H. (2012), Implementation and first results of a German Chronic Kidney Disease Registry, Clinical Nephrology.
6. Stell, A.J., Sinnott, R.O., Jiang, J., Enabling Secure, Distributed Collaborations for Adrenal Tumor Research, Proceedings of the International HealthGrid Conference, Paris, France, June 2010.
ABOUT THE AUTHORS
Professor Richard O. Sinnott is the Director of eResearch at the University of Melbourne and Chair of Applied Computing Systems. In these roles he is responsible for all aspects of eResearch (research-oriented IT development) at the University. He has been lead software engineer/architect on an extensive portfolio of national and international projects, with a specific focus on research domains requiring finer-grained access control (security).
Stephan Gloeckner is a PhD candidate at the University of Melbourne. His research is in auditing and quality assurance of data collected in biomedical research settings. His PhD is joint between the University of Birmingham (UK) in Medicine and the University of Melbourne (Computing).
Jemie Effendy is a software engineer within the Melbourne eResearch Group at the University of Melbourne. His research focus is on technologies for big data processing, with a specific focus on their application to security-oriented data domains.
Anthony Stell is a senior software engineer within the Melbourne eResearch Group at the University of Melbourne. He is the primary software developer for the ENS@T-CANCER VRE. He has previously worked on a range of other clinical research collaboration platforms, including clinical trials systems in the brain trauma and disorders of sex development domains, amongst many others.
A Science Gateway for Malaria: Successes and Challenges
Michelle Barker
James Cook University, Cairns, Australia, [email protected]
Abstract for lightning talk.
DESCRIPTION
The Vector-Borne Disease Network (www.vecnet.org) is a science gateway for the malaria community. VecNet has received over US$7 million in funding over four years, most of which has been spent on development. VecNet provides free online access to simulation modelling tools and data to increase the use of modelling in policy and funding decisions. Simulation models use existing data to predict malaria intervention outcomes. Their use within a project can reduce costs and save time by demonstrating the likely impacts of interventions before resources are committed. VecNet also provides user-friendly interfaces to data and information storage, and these can be linked to the simulation modelling interfaces. The VecNet tools offer accessible, transparent and comprehensible information and simulation modelling programs, allowing users who may not usually use models to ask "what-if" questions, exploring combinations of vector- and drug-based interventions to determine the optimal mix for specific geographic areas. This talk will offer insights from the VecNet experience into success factors in building science gateways, identifying enablers and challenges in relation to the common tensions experienced by science gateways identified by Wilkins-Diehr and Lawrence (2010):
1. Funding: Development vs. Operations
2. Project Goals: Research vs. Production
3. Tools: Standardised vs. Open-Source vs. Custom
4. Community Engagement: Delivering What the Users Want
5. Rewards & Recognition: Traditional vs. New
Both community engagement and rewards and recognition are key focuses of the program's current phase of development. There will be discussion of the approaches being utilised to engage with different parts of the community and encourage usage, particularly the role of early adopters.
Nancy Wilkins-Diehr and Katherine A. Lawrence, 2010, Opening science gateways to future success: The challenges of gateway sustainability, Gateway Computing Environments Workshop, IEEE. http://users.sdsc.edu/~wilkinsn/GCE10_Wilkins-Diehr_Lawrence.pdf
The Marine Virtual Laboratory – ocean modelling made easy
Lightning Talk
Uwe Rosebrock
CSIRO Ocean & Atmosphere Flagship, Hobart, [email protected]
DESCRIPTION
Ocean models are routinely applied to predict the past, present, and future state of the ocean to better understand ocean dynamics. Many applications require high-resolution ocean models to produce detailed analyses. For these applications it is common to configure and run regional models, where the domain extends only a few hundred kilometres or less. Historically, regional ocean models have been time-consuming to configure, requiring an expert modeller to configure a grid, carefully setting the spatial extent of the model domain and the model resolution. The modeller then gathers data from multiple sources for the initial set-up and for input during the simulation time, obtains observation data for validation or data assimilation, and finally manipulates the variety of data to match the requirements of the model code he or she applies.
The Australian Marine Virtual Laboratory (MARVL) is a new development in modelling frameworks for researchers in Australia. MARVL makes use of a Java-based control system named TRIKE, which CSIRO has been developing for some years. It allows a non-specialist modeller to automate many of the model-preparation steps, bringing the researcher faster to the stage of simulation and analysis. Currently MARVL is configured for several different hydrodynamic models (MOM4, ROMS, SHOC) and wave models (WaveWatch III, SWAN), and offers initial and boundary conditions from a variety of regional or global ocean and atmospheric models. It furthermore provides bathymetry and masking of the domain where needed, together with the observations available through IMOS. MARVL has been applied in a number of case studies around Australia, ranging in scale from locally confined estuaries to the Tasman Sea between Australia and New Zealand.
The underlying infrastructure will be described at a technical level, challenges and opportunities highlighted, and an example of its use given.
The Australian Climate and Weather Science Virtual Laboratory (CWSLab)
Aurel F. Moise1, Tim Pugh1, Martin Dix2, Bertrand Timbal1 1Bureau of Meteorology Research and Development, Melbourne, Australia, [email protected]
2CSIRO Ocean and Atmosphere Flagship, Aspendale, Australia, [email protected]
Aurel Moise
ABSTRACT
The presentation will provide an overview of the second phase of the NeCTAR-funded Australian Climate and Weather Science Laboratory (CWSLab) project, which builds research infrastructure, services, tools, and repositories for the climate and weather community and the Centre for Australian Weather and Climate Research (CAWCR) at the National Computational Infrastructure (NCI) petascale facility at the Australian National University. The CWSLab leverages and integrates existing infrastructure to support an intrinsically complex Earth-System Simulator that allows scientists to simulate and analyse climate and weather phenomena.
During the second phase, the project developed a virtual laboratory and web portal called "The Climate and Weather Science Laboratory". This laboratory utilises and integrates the Australian Community Climate and Earth System Simulator (ACCESS) infrastructure to support coupled and uncoupled model simulations of climate and weather phenomena.
Through the proposed integration and enhancement of existing community software such as ACCESS, the laboratory produces an integrated facility for climate and weather process studies in areas such as weather prediction and extreme events, atmosphere-ocean-land-ice interactions, climate variability and change, greenhouse gases, water cycles, and carbon cycles. Additionally, the laboratory provides a facility for the analysis of climate simulations, which will assist in assessments of Australian climate change and contribute to future assessment reports of the United Nations Intergovernmental Panel on Climate Change (IPCC).
The virtual laboratory is a community project to establish an integrated national facility for research in the climate and weather sciences. It complements and leverages the Australian Super Science initiative investments in computational and storage infrastructure at the ANU/NCI facility, and the strong collaboration in place between the Australian National University (nci.org.au), the Australian Bureau of Meteorology (www.bom.gov.au), the CSIRO (www.csiro.au/cmar), the Collaboration for Australian Weather and Climate Research (www.cawcr.gov.au), and the Australian Research Council's Centre of Excellence for Climate System Science (www.climatescience.org.au).
THE SECOND PHASE OF THE CWSLab
Through the proposed integration and enhancement of existing community software such as ACCESS and VisTrails, the second phase of the CWSLab will produce an integrated facility for climate and weather process studies in areas such as weather prediction and extreme events, atmosphere-ocean-land-ice interactions, climate variability and change, greenhouse gases, water cycles, and carbon cycles. To showcase this capability, two prototype services have been created: the ACCESS Climate Model Metrics Tool and the Climate Data Analysis Tool.
THE ACCESS CLIMATE MODEL METRICS TOOL
This tool provides enhancements to the existing ACCESS climate model simulation service. A new package of climate model evaluation metrics from the World Climate Research Programme (WCRP) is being integrated into the ACCESS modelling environment for routine verification of climate model performance. Furthermore, the runtime
environment is being enhanced to capture provenance information and perform model initialisation data processing from meteorological and reanalysis data collections in the RDSI data storage system. Finally, standard ACCESS model experiments are being developed, with associated documentation and testing, for Transpose-AMIP modelling and a high-resolution regional atmospheric model to be used by the science community. "Transpose-AMIP modelling" refers to a capability to initialise and run climate models in a manner similar to that used for numerical weather prediction, according to protocols that facilitate comparison to observations, resulting in an enhanced model evaluation capability.
THE CLIMATE DATA ANALYSIS TOOL
This tool provides enhancements to VisTrails (an open-source system that supports data exploration and visualisation) to build a workflow management system supporting the analysis of climate model output data, in this case statistical downscaling of climate models. The statistical downscaling method (BoM-SDM), based on weather analogues, was developed by the Bureau of Meteorology and has been used extensively in recent Australian scientific climate projects (e.g. ACCSP, NRM, SEACI, VicCI). Currently the BoM-SDM uses climate experiments from the most recent global climate model simulations (CMIP5) and other data sources published internationally or derived locally.
The VisTrails package and workflow allow the scientific community to easily access existing statistical downscaling results from CMIP5 climate simulations. In the future the package will support additional downscaling approaches, enabling comparative assessments of downscaling methods and data products. The workflow provides runtime support for the execution of the downscaling methods based on the user's selection of variables, climate model and experiment, and the temporal range for a given geographical location. The end product is statistically downscaled information from CMIP5 climate models tailored to the user's needs. Several additional options are provided depending on user interest, such as future greenhouse gas emission scenarios, the number of climate models to be included in the analysis, and the choice of future time slice of interest. This constitutes the first level of service and delivers data quickly by relying on pre-computed meteorological analogues; a second level will allow users to change parameters within the SDM itself to generate new outputs. Future development of the VisTrails downscaling package will allow the generation of downscaled results using other downscaling approaches.
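The selection-driven first level of service described above can be sketched in a few lines. This is a hedged illustration only: the names `DownscalingRequest` and `run_downscaling` are hypothetical, not part of VisTrails or the BoM-SDM, and the "analogue store" is stubbed out so the control flow is visible.

```python
"""Illustrative sketch (not the actual VisTrails/BoM-SDM API) of a
parameter-driven downscaling request of the kind described above."""
from dataclasses import dataclass


@dataclass
class DownscalingRequest:
    variable: str        # e.g. a CMIP5 variable name such as "tasmax" (assumed)
    model: str           # CMIP5 global climate model name
    experiment: str      # e.g. an emission scenario such as "rcp85"
    years: tuple         # temporal range (start_year, end_year)
    location: tuple      # (latitude, longitude) of interest


def run_downscaling(req: DownscalingRequest) -> dict:
    """First level of service: answer quickly from pre-computed
    meteorological analogues rather than re-running the SDM."""
    # A real system would query an analogue store here; this stub just
    # echoes the request so the parameter flow is explicit.
    return {
        "source": "pre-computed analogues",
        "variable": req.variable,
        "model": req.model,
        "experiment": req.experiment,
        "period": req.years,
        "location": req.location,
    }


req = DownscalingRequest("tasmax", "ACCESS1-0", "rcp85", (2030, 2059), (-37.8, 145.0))
result = run_downscaling(req)
print(result["source"])  # pre-computed analogues
```

The second level of service would extend `run_downscaling` to accept SDM-internal parameters and trigger a fresh model run instead of an analogue lookup.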
ABOUT THE AUTHOR(S)
Dr. Aurel Moise is a Senior Research Scientist in the Bureau of Meteorology Research and Development section. As a climate scientist, he has led several projects on the analysis of global and regional climate simulations as part of the Australian Climate Change Science Program. Since early 2015 he has been the Project Leader of the NeCTAR CWSLab project, a community project to establish an integrated national facility for research in the climate and weather sciences that complements and leverages the Australian Super Science initiative investments in computational and storage infrastructure at the ANU/NCI facility, and the strong collaboration in place between the Australian National University, the Australian Bureau of Meteorology, the CSIRO, the Collaboration for Australian Weather and Climate Research, and the Australian Research Council's Centre of Excellence for Climate System Science.
Tim Pugh is a scientific programmer and Senior HPC Scientist with the Bureau of Meteorology, specialising in computational fluid dynamics, parallel computing and application development, and internet-based information technology and data services. He is currently the Supercomputer Programme Director for the new supercomputing facility at the Bureau of Meteorology.
Dr. Martin Dix is the leader of the ACCESS Climate Model Systems team within the Earth System Modelling Program of CSIRO. Prior to this he worked on the development and analysis of the CSIRO global climate models and on regional model (CCAM) development. His research interests range from climate sensitivity to computational techniques. Dr. Dix leads the work creating the ACCESS Climate Model Metrics Tool.
Dr. Bertrand Timbal has worked in climate change research since the early 1990s. He is currently a Research Scientist within the Bureau of Meteorology Research and Development section. His research aims to develop techniques to translate climate change information from climate models to smaller scales, in order to provide useful information for climate change impact studies as well as for detection and attribution of ongoing observed changes. Dr. Timbal has published about 100 peer-reviewed, publicly available papers, many focusing on understanding rainfall variability on all timescales across south-eastern Australia. He was a theme leader during the South Eastern Australia Climate Initiative (SEACI) and is now a project leader in the Victorian Climate Initiative (VicCI). Dr. Timbal leads the work creating the Climate Data Analysis Tool.
An Integrated Sensor Network and Research Data Management System for the Daintree Rainforest Observatory
Ian M. Atkinson1, Jeremy VanDerWal1, Daniel Baird1, Andrew Krockenberger1, Nigel Bajima1, Scott Mills1, Nigel G. Sim2
1 James Cook University, Townsville, Australia, [email protected] 2 CoastalCOMS Ltd., Gold Coast, Australia, [email protected]
Ian Atkinson
OVERVIEW A fully integrated environmental monitoring network has been established in the Daintree Rainforest that combines over 400 fixed and moveable sensors, high and low data rates with full metadata description, linkage to Research Data Australia, storage in the RDS infrastructure and a range of portal interfaces for use by field technicians, researchers and the general public. This environment will also be adapted for an island research station.
INTRODUCTION
The Daintree Rainforest Observatory (DRO) is a premier ecological monitoring site located in lowland tropical rainforest around 140 km north of the city of Cairns in North Queensland, Australia. The Daintree rainforest has the highest biodiversity anywhere in Australia and offers access to unique Gondwanan flora. In 1988 the rainforests in which the DRO is situated were declared the Wet Tropics World Heritage Area. This is one of the few areas in the world where rainforest directly meets shorelines with coral reef, and it is unique in having two World Heritage Areas sit side by side. The DRO site is flanked to the west by coastal ranges rising to more than 1400 m (4600 ft) and by the Coral Sea to the east. The DRO is a 20 ha site that started life as the Australian Canopy Crane, constructed in 1998, and has been collecting long-term observational records as well as conducting explicit research experiments since that time [1]. An AU$10M expansion to the observatory from 2013-2015 enhanced the laboratory facilities, eco-sensing technologies and amenities of the DRO, and it is now a world-class research and education facility. However, the remote location and terrain mean that, for now, only low-speed, low-quota Internet access is available from the DRO site, so creative and dedicated cyberinfrastructure is required. To support the long-term monitoring and specific experiments conducted at the DRO, a range of eco-sensing instruments has been deployed across the site, with specific emphasis on the areas beneath the coverage of the 47 m canopy crane, which sweeps an area of ~1 ha.
To provide access to the instruments, sensors and existing data collections of the DRO, as well as simple in-field maintainability, local and remote data mirroring, and data collection description and discovery from Research Data Australia (RDA; http://www.rda.edu.au), an end-to-end data management environment has been developed: the DRO-DMS. This same core data system also facilitates access by school groups and supports public education and outreach activities.
DRO SENSOR NETWORKS
A wide variety of sensors has been deployed on the DRO site, and these are being constantly extended and upgraded. The current deployments are summarised in Table 1 below, and range from very low data rate devices through to streaming HD cameras. While the main study site is remote from the main laboratory, we have interconnected the major locations via armoured fibre-optic cable to ensure local Wi-Fi and XBee wireless networks remain uncongested. The low light of the rainforest floor has motivated innovative power solutions as well as the development of very-low-power electronics to conserve power where possible. As well as automated sensor collection, manual data collection is supported by bespoke mobile data input systems that synchronise recorded information directly into the core DRO Data Management System. Currently there are ~400 sensors in active duty.
Table 1: Sensors Currently Deployed on DRO Site
Sensor Type                      Number  Data Type               Frequency                          Data Volume p.a.
HD camera                        6       HD video/stills         2 min video/hr; still image/5 min  ~4000 GB
Tree dendrometer (ICT Systems)   60      Numeric (uncalibrated)  15 min                             2 GB
Tree sapflow (ICT Systems)       60      Numeric (uncalibrated)  15 min                             2 GB
Temp./rel. humidity              240     Numeric                 5 min                              100 GB
Soil moisture pits               10      Numeric                 15 min                             1 GB
Meteorological                   3       Numeric                 5 min / 200 Hz                     500 GB
Leaf traps                       20      Manual count            Weekly                             1 GB
In addition, sensor streams related to isotopic stream-water composition, light intensity, high-frequency flux networks, building performance and LIMS data are being integrated into the core platform. Operational maintenance of sensors in a rainforest is a complex and demanding activity, and it is essential that the data ingestion system can either transparently account for the replacement or recalibration of sensors, or that calibration, additions and relocation of devices can be easily configured in the field without error.
DRO-DATA MANAGEMENT SYSTEM
The DRO-DMS has at its core an instance of the CoastalCOMS platform [2]. This is an integrated digital asset management environment optimised for streaming data acquisition, video analytics, event detection and metadata management. The operation of the DRO-DMS is controlled via a web interface. Easy-to-configure ingestors ensure data from any sensor type can be accounted for, and data calibration/post-analysis can be triggered via the built-in event/analytics service. An important feature is that all devices are described from the outset with sufficient metadata to generate an RDA-compliant metadata record, so all data generated is discoverable via RDA. Poor internet access from the DRO site means that, on a weekly cycle, data stored in the DRO-DMS is physically transferred to JCU Cairns, where it is ingested into the mainline JCU DRO-DMS, operating as part of the Tropical Data Hub service [3], from which it can be served more widely. The DMS seamlessly ingests and indexes new records and accounts for any duplicate data records. Data can be downloaded from the DRO-DMS web interface for further analysis or visualised within the portal. An innovative 'Minecraft'-like data visualisation and access tool was also developed.
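The duplicate-absorbing ingestion behaviour described above (weekly physical transfers can re-deliver records already held) can be sketched as follows. This is a minimal illustration, not the CoastalCOMS API: the `Ingestor` class and its `(sensor_id, timestamp)` record key are assumptions made for the example.

```python
"""Illustrative sketch of duplicate-safe weekly batch ingestion.
Record identity is assumed to be (sensor_id, timestamp)."""


class Ingestor:
    def __init__(self):
        self.records = {}  # (sensor_id, timestamp) -> value

    def ingest(self, batch):
        """Ingest a batch of (sensor_id, timestamp, value) tuples.
        Duplicates (e.g. records re-delivered by a repeated physical
        transfer) are detected by key and silently skipped."""
        added = 0
        for sensor_id, timestamp, value in batch:
            key = (sensor_id, timestamp)
            if key not in self.records:
                self.records[key] = value
                added += 1
        return added


dms = Ingestor()
week1 = [("dendro-01", "2015-06-01T00:00", 12.3),
         ("dendro-01", "2015-06-01T00:15", 12.4)]
print(dms.ingest(week1))  # 2 new records indexed
print(dms.ingest(week1))  # 0 - the duplicate batch is absorbed
```

Keying on sensor and timestamp is what lets the mainline DMS "seamlessly ingest" a weekly transfer without worrying about overlap with the previous one.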
Figure 1: DRO-‐DMS
REFERENCES
1. Active DRO research projects. Available from https://research.jcu.edu.au/dro/research/research-projects, accessed 8 June 2015.
2. CoastalCOMS core platform. Available from http://www.envirocoms.com.au/, accessed 8 June 2015.
3. Tropical Data Hub. Available from http://tropicaldatahub.org.au, accessed 8 June 2015.
ABOUT THE AUTHORS
Ian Atkinson is a Director of the eResearch Centre at James Cook University. His PhD studies were in chemical physics, but nearly 20 years ago he moved from experimental science into computational chemistry and high-performance computing. He has a long-standing interest in eResearch methods, tools, scientific data management and user interfaces for HPC tools. He is also actively involved in researching new systems and software that connect the physical and virtual worlds, particularly focusing on environmental monitoring with sensor networks. These include the development of the Tropical Data Hub; involvement in "The Digital Homestead", evaluating how modern information and communication technologies (ICT) such as wireless sensor networks (WSNs), data analytics and rural connectivity could support greater profitability for the northern beef industry; and a range of 'reef-to-rainforest' biodiversity, climate and other environmental monitoring projects.
A/Prof. Jeremy VanDerWal is a spatial ecologist, a Senior Research Fellow at the Centre for Tropical Biodiversity and Climate Change, and the Deputy Director of the eResearch Centre at James Cook University. His research is focussed on assessing the potential impacts of past, present and future climate on the distribution and abundance of species. Much of his research explores ecological theories with applied aspects. Dr VanDerWal is interested in ensuring that science is not just 'theoretical' but rather is used to engage and inform a wide variety of end users.
Daniel Baird is a software engineer and user experience designer at the eResearch Centre at James Cook University. He holds degrees in psychology and computer science and uses both to create data engagement experiences at the eResearch Centre.
In the past Daniel has perpetrated software in C++, Delphi, Java and Swing, PHP, Microsoft Access, Crystal Reports, ColdFusion, Ruby, and web technologies across the retail, higher education and corporate sectors, and co-‐founded the wiki hosting site http://tiddlyspot.com.
David Beitey is the online technologies manager for the eResearch Centre at JCU. Working closely with researchers and other groups alike, David is extremely passionate about free and open-source software and all aspects of its security. David performs development and operations for the majority of eResearch services, where most code produced is open-sourced. He also provides support for services like JCU's research portfolio and research bait, and a variety of other IT-related tasks, such as operating the VisLab 3D visualisation room. His work takes him far and wide, delivering a high-quality end-to-end service for researchers inside and outside of JCU.
Andrew Krockenberger
Nigel Bajima
Scott Mills
Nigel G. Sim
Development of a cloud-based virtual desktop environment for synthesis and analysis for the ecosystem science community
Siddeswara Guru1, Hoang Anh Nguyen2, Shilo Banihit3, Matthew Mulholland3, Kim Olsson3, Tim Clancy1
1 Terrestrial Ecosystem Research Network, University of Queensland, St Lucia, Australia, [email protected],
[email protected] 2 Research Computing Centre, University of Queensland, St Lucia, Australia, [email protected]
3Queensland Cyber Infrastructure Foundation, University of Queensland, St Lucia, Australia
Siddeswara Guru
INTRODUCTION
Current scientific experiments are becoming increasingly complicated. These experiments often consist of multiple models, analytical tools and data stores. Furthermore, the computational components of an experiment can be executed in parallel and/or on a distributed computing infrastructure. This makes the creation and execution of these experiments challenging, and due to the vast variety of software and processes used, it is difficult to capture all the procedures and processes utilised in every step of an experiment so as to make the experiment transferable, shareable and reproducible.
Scientific workflow technology has become popular by providing a high-level environment that can automate, manage and execute the various steps of scientific research, with an ability to store and track provenance information. Scientific workflows provide a powerful unifying platform that allows scientists to build arbitrarily complicated applications by combining predefined components [1], which may be implemented in different programming languages. Once a workflow is built, it can be re-used and re-executed with minimal effort. These intrinsic capabilities of a workflow system, together with provenance-tracking functionality, improve the reproducibility of experiments and encourage sharing of experimental processes and results. Workflow systems offer a broad range of components that perform tasks ranging from acquiring data from sensors, querying databases, data mining and visualisation, through to the execution of arbitrary applications.
Workflows are widely used in most data-intensive scientific domains. Large-scale experiments often require distributed computing resources for computation and storage. In the Australian context, researchers most often use the national research platforms, the National eResearch Collaboration Tools and Resources (NeCTAR) and the Research Data Storage Infrastructure (RDSI), to obtain compute and storage cloud resources.
However, both RDSI and NeCTAR provision resources in an Infrastructure as a Service (IaaS) model. Scientists need to apply for resources, build and maintain a platform, and run experiments on that platform. This requires significant system administration skills, which domain scientists most often do not have. An alternative approach is therefore needed, in which researchers get access to distributed computing infrastructure that is not cumbersome to set up, access and manage.
In this paper, we present the development of the Collaborative Environment for Ecosystem Science Research and Analysis (CoESRA), which provides a web-based virtual desktop environment supporting analysis and synthesis using scientific workflows. CoESRA is built on the NeCTAR and RDSI cloud infrastructure, enabling users to access a ready-to-use Linux desktop environment through a web browser. One of the motivations for the development of CoESRA is to provide scientists with a desktop environment that can leverage analysis tools and distributed computing infrastructure to perform data- and compute-intensive experiments. The desktop environment is made accessible from a web browser to lower any impediments to accessing and using the virtual desktop environment.
SYSTEM OVERVIEW
CoESRA is a web-enabled virtual desktop environment running on cloud infrastructure. A user can access a virtual desktop environment and use it to build, execute and share workflow-based scientific analysis and synthesis activities. The high-level CoESRA system architecture is shown in Figure 1. The system has the following functionalities:
• User registration and creation of user accounts in the system,
• Creation of and access to virtual desktops for users, equipped with tools such as Kepler, RStudio, Python and Nimrod,
• Access to storage,
• Management of user access to virtual desktops,
• Management of virtual desktop instances,
• Ability to publish workflows as service records in the Australian National Data Service (ANDS) Research Data Australia (RDA).
Figure 1: Architecture of CoESRA System
SYSTEM OPERATION
User Registration and System Access
A user needs to register with the system to access a virtual desktop (VD). The system supports the Australian Access Federation (AAF) as a login mechanism and LDAP for internal authentication. Once a user registers, a user-provisioning process starts, which includes creating an LDAP entry for the user, creating a user home folder on an NFS server (cloud-based storage), and updating the user's information in a database. Once all these steps have been performed successfully, the user receives a system-generated email confirming that the registration process is complete. When a registered user logs in to the system using AAF, a VD session can be requested from the CoESRA system. The DaaS Admin Service (Figure 1) looks for a free VD in the pool and assigns that VD to the requesting user. Once a VD is occupied, it cannot be reassigned until the user releases it back to the pool.
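The four-step provisioning sequence above (LDAP entry, NFS home folder, database record, confirmation email) can be sketched as a simple ordered pipeline. This is a hedged illustration: the function and the stub services are placeholders invented for the example, not the actual CoESRA implementation.

```python
"""Illustrative sketch of the CoESRA-style user-provisioning sequence.
All service objects here are stand-ins, not real LDAP/NFS/mail clients."""


class Recorder:
    """Minimal stand-in that just records which provisioning step ran."""
    def __init__(self):
        self.steps = []

    def add_entry(self, user):       self.steps.append("ldap")
    def create_home(self, user):     self.steps.append("home")
    def save_user(self, user, mail): self.steps.append("db")
    def send(self, mail, subject):   self.steps.append("email")


def provision_user(username, email, ldap, storage, db, mailer):
    """Run the provisioning steps in order; if any step raises, the
    confirmation email is never sent, mirroring the 'once all these
    steps are performed successfully' behaviour described above."""
    ldap.add_entry(username)                      # 1. LDAP entry for the user
    storage.create_home(username)                 # 2. home folder on the NFS server
    db.save_user(username, email)                 # 3. user record in the database
    mailer.send(email, "Registration complete")   # 4. system-generated confirmation


svc = Recorder()
provision_user("jsmith", "j.smith@example.edu", svc, svc, svc, svc)
print(svc.steps)  # ['ldap', 'home', 'db', 'email']
```

Keeping the email as the final step guarantees the user is only told the registration is complete when every earlier step has succeeded.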
Virtual Desktop Environment
The VD runs on CentOS 6.5 and is pre-configured with the Kepler scientific workflow system and related modules, the programming languages R and Python, an environment like RStudio, and the distributed computing tool Nimrod [2]. Kepler is used for its strong support in the ecology domain and its large number of reusable components. A VD is accessible via a web browser using Guacamole and the Remote Desktop Protocol. To prevent users from keeping VDs indefinitely, when a user requests access to a virtual desktop it is assigned for a fixed duration (48 hours), and an email is sent 6 hours before the session expires. A user can request a session extension by emailing the CoESRA system administrator; otherwise, the session is terminated and the VD is released back to the pool. A VD has access to common storage for sharing workflows and data, which promotes an informal way of sharing and collaboration. A new feature has been added to the Kepler graphical user interface to create a RIF-CS service record for workflows, making them discoverable from ANDS-RDA. This provides additional functionality to share workflows on MyExperiment.org as well as on ANDS-RDA. All virtual desktops are mapped to a centralised provenance database to store and query the history of workflow runs performed by users. Users also have the flexibility to distribute jobs using Nimrod on a dedicated cluster as well as locally in a VD.
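The fixed-duration lease described above (48-hour sessions, warning email 6 hours before expiry) amounts to classifying each session by elapsed time. A minimal sketch, with illustrative names and the times expressed as plain hour counts for clarity:

```python
"""Illustrative sketch of the VD session lease policy described above.
LEASE_HOURS and WARNING_HOURS match the figures in the text; the
function name and return values are assumptions for the example."""

LEASE_HOURS = 48     # fixed session duration
WARNING_HOURS = 6    # warning email lead time before expiry


def lease_state(hours_elapsed):
    """Classify a session by how long the VD has been held."""
    if hours_elapsed >= LEASE_HOURS:
        return "expired"   # terminate session, return VD to the pool
    if hours_elapsed >= LEASE_HOURS - WARNING_HOURS:
        return "warn"      # send the expiry-warning email
    return "active"        # nothing to do yet


print(lease_state(10))  # active
print(lease_state(43))  # warn
print(lease_state(48))  # expired
```

A periodic task would evaluate `lease_state` for each occupied VD; an administrator-granted extension would simply reset the elapsed time.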
Virtual Desktop Pool
All free VDs in CoESRA are kept in a Virtual Desktop Pool (VDP). The pool is resized according to the number of VD requests. The load balancer (Figure 1) keeps the number of free VDs within a configured range: as soon as the number of free VDs falls below the minimum threshold, the load balancer launches virtual machines to create new VDs; similarly, it deletes VDs when the number of free VDs rises above the maximum threshold.
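The load balancer's threshold logic can be sketched as below. The threshold values and function names are assumptions for illustration; the abstract does not state the actual thresholds.

```python
# Minimal sketch of the pool-rebalancing rule: launch VDs when the free count
# drops below the minimum threshold, delete VDs when it exceeds the maximum.
# Threshold values are hypothetical.

MIN_FREE, MAX_FREE = 2, 5

def rebalance(free_vds, launch, delete):
    """Launch or delete VDs so the free count stays within [MIN_FREE, MAX_FREE]."""
    while free_vds < MIN_FREE:
        launch()          # e.g. boot a new virtual machine and add it to the pool
        free_vds += 1
    while free_vds > MAX_FREE:
        delete()          # e.g. tear down an idle virtual machine
        free_vds -= 1
    return free_vds

launched, deleted = [], []
print(rebalance(0, lambda: launched.append(1), lambda: deleted.append(1)))  # 2
print(len(launched))                                                        # 2
print(rebalance(8, lambda: launched.append(1), lambda: deleted.append(1)))  # 5
```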
REFERENCES
[1] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, "Scientific workflow management and the Kepler system," Concurrency and Computation: Practice and Experience, vol. 18, pp. 1039–1065, 2006.
[2] D. Abramson, C. Enticott, and I. Altintas, "Nimrod/K: Towards massively parallel dynamic Grid workflows," in Proc. 2008 ACM/IEEE Conference on Supercomputing (SC 2008), 2008, pp. 1–11.
eResearch Australasia Conference | Brisbane – Australia | 19 – 23 October – 2015
ABOUT THE AUTHORS
Siddeswara Guru is the Data Integration and Synthesis Coordinator for the Terrestrial Ecosystem Research Network. His research area is scientific data management. He has a PhD from the University of Melbourne and an MBA from the University of Tasmania. Previously, he worked at CSIRO and the Australian Ocean Data Network development office.
Mr Hoang Anh Nguyen is a systems programmer at the Research Computing Centre, the University of Queensland. He received a Bachelor's degree with Honours in Software Engineering from Monash University. He is completing a PhD on designing new ways to interact with Kepler workflows running behind a science gateway.
Prof. Tim Clancy is the Director of the Terrestrial Ecosystem Research Network. Prior to this, he managed the Forest Resources Management Section of the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) and led the organisation's Land and Forests Theme. Among his responsibilities was reporting on national forest, land use, land management and vegetation data.
eResearch Australasia Conference | Melbourne – Australia | 27 -‐ 31 October -‐ 2014
The WorkWays Problem Solving Environment David Abramson, Hoang Nguyen
University of Queensland
Science gateways allow computational scientists to interact with a complex mix of mathematical models, software tools and techniques, and high-performance computers. Accordingly, various groups have built high-level problem-solving environments that allow these to be combined freely. In this talk, we introduce an interactive workflow-based science gateway called WorkWays. WorkWays integrates different domain-specific tools and is at the same time flexible enough to support user input, so that users can monitor and steer simulations as they execute. A benchmark design experiment is used to demonstrate WorkWays.
ABOUT THE AUTHOR(S)
Professor David Abramson has been involved in computer architecture and high-performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT and Monash University. Most recently at Monash, he was the Director of the Monash e-Education Centre, Deputy Director of the Monash e-Research Centre and a Professor of Computer Science in the Faculty of Information Technology. He held an Australian Research Council Professorial Fellowship from 2007 to 2011. He has worked on a variety of HPC middleware components, including the Nimrod family of tools and the Guard relative debugger. Professor Abramson is currently the Director of the Research Computing Centre at the University of Queensland. He is a Fellow of the Association for Computing Machinery (ACM) and the Australian Academy of Technological Sciences and Engineering (ATSE), and a Senior Member of the IEEE.
Mr Nguyen is a PhD student in the School of Information Technology and Electrical Engineering at the University of Queensland.
Developing Science Gateways:
Current Solutions and Future Challenges Sandra Gesing
University of Notre Dame, Notre Dame, USA, [email protected]
CURRENT SOLUTIONS
Over the last 10 years, the research area of science gateways has grown extensively and the usage of science gateways has increased markedly. This is evident in publications such as special issues on science gateways, and in reports from providers of distributed infrastructures that last year, for the first time, their resources were used more via science gateways than via the command line. Quite a few mature and widely used science gateway frameworks (e.g., Galaxy, WS-PGRADE) and APIs (e.g., Apache Airavata, Agave) have evolved, which provide developers with building blocks for efficiently implementing science gateways. They address the challenges developers face for each science gateway, from intuitive user interfaces through security features to distributed job, data and workflow management.
FUTURE CHALLENGES
The pace of development of web-based technologies and agile web frameworks is steadily increasing, as is the flexibility with which the Internet can be used. To support and integrate such novel developments, science gateway frameworks need a short release cycle, enabled by a modular architecture and concepts such as micro-services. The underlying distributed infrastructures are also further evolving, with cloud technologies, lightweight containers such as Docker, and cutting-edge accelerated architectures on the hardware side. Future approaches will include not only the extension of science gateways to such new IT technologies but also the integration of laboratory data sources such as telescopes, for example the Square Kilometre Array (SKA), which will create data rates at the exascale. In particular, the amount of data in general, whether created via computational methods or in labs, will demand new capabilities from science gateway frameworks and APIs. The lightning talk will give a brief overview of existing solutions and conclude with a discussion of future challenges.