Speeding Science Solutions for Data Curation from Microsoft (Research)
Lee Dirks, Director, Education & Scholarly Communication, External Research Division

  • date post

    26-Mar-2015

  • Slide 1

Speeding Science Solutions for Data Curation from Microsoft (Research)
Lee Dirks
Director, Education & Scholarly Communication
External Research Division
Microsoft Corporation

  • Slide 2

Division within Microsoft Research focused on partnerships between academia, industry and government to advance computer science, education, and research in fields that rely heavily upon advanced computing
Supporting groundbreaking research to help advance human potential and the wellbeing of our planet
Developing advanced technologies and services to support every stage of the research process
Microsoft External Research is committed to interoperability and to providing open access, open tools, and open technology

  • Slide 3

Mission: Optimize and extend Microsoft software to meet the specific needs of the academic community
Our approach: Conduct applied projects to enhance academic productivity by evolving Microsoft's scholarly communication offerings
Microsoft External Research is uniquely positioned to drive this initiative across Microsoft

  • Slide 4

This work is licensed under a Creative Commons Attribution 3.0 United States License.

The Scholarly Communication Lifecycle (diagram)
Stages: Data Collection, Research & Analysis; Authoring; Publication & Dissemination; Storage, Archiving & Preservation; Collaboration; Discoverability
Products and technologies shown: SharePoint; LiveMeeting; Office Live; Office OpenXML; XPS Format; SQL Server & Entity Framework; Rights Management; Data Protection Manager; Office 2010 (Word, PowerPoint, Excel, OneNote); Tablet PC/UMPC; Word 2010 + PowerPoint 2010; WPF & Silverlight; Sea Dragon / PhotoSynth / Deep Zoom; Excel 2010; Windows Server HPC; Astoria / Pop Fly; FAST; MSR Academic Search; Bookweb; SharePoint 2010

  • Slide 5

Interoperability is essential
    Actively lobby and drive for consensus around technical standards and standardized protocols proactively adopted by the community; enable broad community engagement
    Customers have told Microsoft that interoperability is OUR responsibility
Leverage existing community protocols, practices, guidelines, etc.
    Example: metadata conventions / taxonomies / ontologies, a traditional strength for libraries and a critical component in enabling Web 2.0
Optimize for data-driven research
    To both data (scientific) and to information (scholarly publications)
    Reproducible research + computational science
    Properly document / annotate scholarly output
Data preservation (and provenance) should be baseline
    Documentation of the data's provenance
    Preservation needs to be like accessibility features, i.e., assumed as required
Semantic knowledge discovery & social networking
    Harnessing collective intelligence must be a consideration, since accessing research is a core step in the life-cycle
Enable knowledge discovery
    Optimize for Web 2.0 scenarios and allow end-users/experts to find things more easily

  • Slide 6

Open Science, Open Access, Open Source, Open Data
http://www.microsoft.com/interop/
In order to help catalyze and facilitate the growth of advanced CI, a critical component is the adoption of open access policy for data, publications and software.
    NSF Advisory Committee on Cyberinfrastructure (ACCI)
Microsoft Interoperability Principles
    Open Connections to Microsoft Products
    Support for Standards
    Data Portability
    Open Engagement

  • Slide 7

DataCite is an international consortium to establish easier access to scientific research data on the Internet, increase acceptance of research data as legitimate, citable contributions to the scientific record, and to support data archiving that will permit results to be verified and re-purposed for future study.

The Open Planets Foundation has been established to provide practical solutions and expertise in digital preservation, building on the €15 million investment made by the European Union and Planets consortium. OPF members benefit from the Planets results, new developments and the growing OPF community that includes experts at some of the most prestigious research, technology and memory institutions in Europe.

The Confederation of Open Access Repositories (COAR) is a not-for-profit association of repository initiatives launched in October 2009. It aims to enhance the visibility and application of research outputs through global networks of Open Access digital repositories.

The Coalition for Networked Information (CNI) is an organization dedicated to supporting the transformative promise of networked information technology for the advancement of scholarly communication and the enrichment of intellectual productivity. Membership includes some 200 institutions representing higher education, publishing, network and telecommunications, information technology, and libraries and library organizations.

ICSTI, the International Council for Scientific and Technical Information, offers a unique forum for interaction between organizations that create, disseminate and use scientific and technical information. ICSTI's mission cuts across scientific and technical disciplines, as well as international borders, to give member organizations the benefit of a truly global community.

CrossRef is a not-for-profit membership association whose mission is to enable easy identification and use of trustworthy electronic content by promoting the cooperative development and application of a sustainable infrastructure. CrossRef's general purpose is to promote the development and cooperative use of new and innovative technologies to speed and facilitate scholarly research.
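
To make DataCite's goal of "citable" research data concrete, the following is a minimal sketch of rendering a dataset description as a citation string. It is not taken from the slides: the `DatasetRecord` type, the `format_citation` helper, the "Creators (Year): Title. Publisher. Identifier" layout, and every value in the example record are hypothetical, and it assumes the dataset is identified by a DOI resolved through https://doi.org/.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical, minimal description of a published dataset."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare identifier (assumed to be a DOI), without a URL prefix


def format_citation(record: DatasetRecord) -> str:
    """Render 'Creators (Year): Title. Publisher. https://doi.org/<doi>'."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}): {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Placeholder values only; this is not a real dataset or a real DOI.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2010,
    title="Example Ocean Temperature Measurements",
    publisher="Example Data Archive",
    doi="10.1234/example.5678",
)
print(format_citation(example))
```

Keeping the identifier as a separate field, rather than baking it into free text, is one way to let the same record feed both a human-readable citation and a machine-resolvable link.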

  • Slide 9

Source code and binary: http://GenepatternWordAddin.codeplex.com
Services: Connects to GenePattern database
Data: Resulting data (and provenance) stored within Word document
Data: Control and execute query pipelines into GenePattern
Relationships: Inline graphics are synchronized to dataset

  • Slide 10

Intent: Insert Creative Commons licenses from within Office 2007
Relationships: License information stored as RDF XML within the document OOXML
Source code and binary: http://ccaddin2007.codeplex.com
Services: Integrates with Creative Commons Web API to create new licenses

  • Slide 11

Phil Bourne, Lynn Fink, John Wilbanks
Source code and binary: http://research.microsoft.com/ontology/
Relationships: Ontology browser
Intent: Term recognition & disambiguation
Services: Ontology download web service

  • Slide 12

Binary (version 2.0): http://research.microsoft.com/authoring/
Relationships: ORE Resource Map creation
Structure: Read, convert, and author NLM XML documents
Structure: Client-side XML validation
Services: Repository deposit via SWORD
Relationships: Citation lookup and reference management

  • Slide 14

Peter Murray-Rust, Joe Townsend, Jim Downing
Available soon: http://research.microsoft.com/chem4word/
Relationships: Navigate and link referenced chemistry
Data: Semantics stored in Chemistry Markup Language
Intent: Recognizes chemical dictionary and ontology terms
Author/edit 1D and 2D chemistry. Change chemical layout styles.
Intelligence: Verifies validity of authored chemistry

  • Slide 15

Organize collection of individual workflow activities
Author, Execute and Monitor Workflows
Available now: http://research.microsoft.com/collaboration/tools/trident.aspx
Compose and modify workflows via drag & drop canvas
View data products, performance metrics, and provenance data

  • Slide 17

The Windows Azure platform offers a flexible, familiar environment for developers to create cloud applications and services. With Windows Azure, you can shorten your time to market and adapt as demand for your service grows. Windows Azure offers a platform that is easily implemented alongside your current environment.

Windows Azure platform offerings:
    Windows Azure: operating system as an online service
    Microsoft SQL Azure: fully relational cloud database solution
    Windows Azure platform AppFabric: connects cloud services and on-premises applications
    Microsoft Codename Dallas: information marketplace for data and web services

  • Slide 18

Microsoft "Dallas" is a service allowing developers and information workers to easily discover, purchase, and manage premium data subscriptions in the Windows Azure pla