aip_developer_overview_icar_2014
araport.org
Arabidopsis Information Portal: A new approach to data sharing and cooperative development
Matt Vaughn
Director, Life Sciences Computing
Texas Advanced Computing Center
SAVE THE DATE
• First AIP Developer Workshop
• Nov 5-6, 2014 in Austin, TX @ Texas Advanced Computing Center
• Come learn and do with the AIP team
  – Interactive Science Apps development
  – Data API development
  – General hacking and good times
• Email [email protected] if interested
Overview
• Rationale for the AIP
• Strategic objectives
• Science Apps Architecture
• Data API Architecture
• How you can participate
The Rationale for AIP
• Loss of TAIR as a publicly funded shared resource for data mining and basic bioinformatics
• Centralization as a key contributing factor
  – Loading of new data into database
  – Development of new user experience
  – Curation and annotation
  – Community support mission
• AIP is designed to be de-centralized
IAIC Workshop Design
AIP Proposed Architecture
The AIP Strategy (1)
• Objectives
  – Develop a community web resource
    • Sustainably fundable and community-extensible
    • Hosts diverse analysis & visualization tools + user data spaces
  – Support federation to integrate diverse data sets from distributed data sources
  – Maintain the Col-0 gold standard annotation
• Methods
  – Assimilate TAIR10 data
  – Host an Arabidopsis InterMine
  – Develop a strategy to allow federation
  – Offer and consume well-designed RESTful web services
  – Interoperate with iPlant (and other projects) wherever possible
The AIP Strategy (2)
• Key Design Decisions
  – Centralized (but powerful) data warehousing capability PLUS infrastructure enabling data federation
  – JBrowse as a genome browser platform
  – WebApollo + Tripal for community annotation
  – App store model for graphical data interfaces (complete with 3rd party developer path)
  – Data store model for data sources
  – Accessible languages and frameworks
  – Secure & modern single sign-on
  – Web service access to Arabidopsis data for powerful bioinformatics
  – Geo-replication and high availability
  – Code re-use from other projects wherever possible
  – Full code release in real time via GitHub
Araport Bill of Materials
• AIP is currently built using
  – InterMine*
  – JBrowse 1.11.3*
  – Drupal 7.25*
    • Developer-oriented content management system
  – Angular.js, Bootstrap.js and other web toolkits
  – Agave Software as a Service platform
    • Developed by the iPlant Collaborative
    • Bulk data, metadata, authentication, HPC app & job management, notifications & events, and more
    • OAuth2 single sign-on
  – Internally-developed API manager
*With extensive customization
Araport Architecture
[Diagram labels: CLI clients, scripts, 3rd party applications; Agave Enterprise Service Bus; Agave Services (apps, meta, files, profile, jobs, systems); Physical resources (HPC | Files | DB); ADAMA (manage, enroll); AIP & 3rd party data providers]
• API Mediators
  – Simple proxy
  – Mediator
  – Aggregator
  – Filter
• Services provided
  – Single sign-on
  – Throttling
  – Unified logging
  – API versioning
  – Automatic HTTPS
• Provider interface types: REST*, REST-like, SOAP, POX, Cambrian CGI
What is a Science App?
– Written in HTML/CSS/JavaScript
  • Uses standard frameworks
– Presented via web browser
  • Query or Analyze, Present, Persist
– Developed by AIP and the community
  • Deployed in AIP “app store”
  • Choose which ones you want installed in your Araport “dashboard”
– Uses AIP Data Architecture
  • Data services: Local and remote query/retrieval
  • Data integration and aggregation services
  • Computation services
How are Science Apps Developed?
• Objective: 3rd party developers can create a fully functional Araport science app on their local dev machine, then make it available for all Araport users
• We make use of Node.js, Yeoman, Grunt to begin accomplishing this
Setting up a Science App Skeleton
$ mkdir icar_test_app
$ cd icar_test_app
$ npm install generator-aip-science-app
$ yo aip-science-app
$ grunt
1. Develop locally
2. Iterate, test, refine
3. Package up with “grunt dist” and send to the Araport team for hosting.
Media and script dependencies are either provided by AIP directly (common ones) or bundled up in your package.
IntAct Viewer
https://github.com/Arabidopsis-Information-Portal/InteractionScienceApp
• Clone from git repo
• Launch locally
• Dig around internals and learn how it works
• Make improvements
• Submit them back to us
ADAMA: Araport DAta Mediator API
Objectives: Facile development by end users; simple, secure deployment to AIP systems; reasonable performance
• Docker.io for packaging
• Ultra-portable dev environment
• Wide language support
  – Python, JavaScript, Lua, Java
• Implicit security model
• Scales horizontally for performance
• A Data API is a package of metadata + a Docker file registered with a central arbiter service
• Also used for services written natively for AIP
[Diagram labels: AGAVE, API MANAGER]
https://github.com/waltermoreira/apim
Data API Design Details (1)
• 100% RESTful services
• Queries are JSON objects (conforming to a JSON schema)
• Providers REGISTER their services with Araport
• Science Apps access the services using Araport as a proxy
• Solves cross-domain scripting issues that complicate web app development
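As a rough illustration of "queries are JSON objects conforming to a schema", the sketch below checks a query against a tiny hand-written schema. The field names (locus, limit) and the schema layout are assumptions made for this example, not the actual Araport query schema, and a real deployment would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical, minimal schema: field name -> (expected Python type, required?)
QUERY_SCHEMA = {
    "locus": (str, True),
    "limit": (int, False),
}


def validate_query(raw_json, schema=QUERY_SCHEMA):
    """Parse a JSON query and return a list of problems (empty if valid)."""
    try:
        query = json.loads(raw_json)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(query, dict):
        return ["query must be a JSON object"]
    problems = []
    # Check every field the schema knows about.
    for field, (expected_type, required) in schema.items():
        if field not in query:
            if required:
                problems.append("missing required field: " + field)
        elif not isinstance(query[field], expected_type):
            problems.append("wrong type for field: " + field)
    # Reject fields the schema does not describe.
    for field in query:
        if field not in schema:
            problems.append("unknown field: " + field)
    return problems
```

A proxy could run a check like this before forwarding a query, so malformed requests never reach the remote provider.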
Data API Design Details (2)
• To enroll a new 3rd party data service in API Manager
  – Specify the mapping between AIP’s query fields and your service’s fields
  – Map common query terms to minimal controlled vocabulary
  – Describe all service-specific parameters
  – Describe the outputs of your data service using some standard language
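The field mapping described above can be pictured as a simple key-renaming step. This is a minimal sketch; the provider-side names (agi, max_rows) are invented for illustration and do not come from any enrolled service.

```python
# Hypothetical mapping from AIP query fields to one provider's parameter names.
INPUT_KEY_MAP = {"locus": "agi", "limit": "max_rows"}


def map_keys(record, key_map):
    """Rename keys per key_map; keys without a mapping pass through unchanged."""
    return {key_map.get(key, key): value for key, value in record.items()}


def invert(key_map):
    """Build the reverse map, for renaming provider output back to AIP names."""
    return {provider: aip for aip, provider in key_map.items()}
```

For example, map_keys({"locus": "AT1G01010"}, INPUT_KEY_MAP) yields {"agi": "AT1G01010"}, and the inverted map turns provider output back into AIP field names.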
Data API Details (3)
When simple field:field mapping isn’t enough:
• Code-based transformations can be specified via
  – Python
  – Java
  – Ruby
  – JavaScript
• This code runs on Araport API servers – no need for you to host additional services to work with us!
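A code-based transformation might look something like the sketch below, which expands a single position string into the separate fields a provider could expect. The function name input_transform, the "Chr1:1000-2000" position format, and the output field names are all assumptions for illustration; the actual adapter interface is defined in the ADAMA docs.

```python
import re

# Assumed position format: chromosome, colon, start-end (e.g. "Chr1:1000-2000").
POSITION_PATTERN = re.compile(r"^(Chr[1-5CM]):(\d+)-(\d+)$")


def input_transform(query):
    """Expand a 'position' string into chromosome/start/end fields."""
    match = POSITION_PATTERN.match(query["position"])
    if match is None:
        raise ValueError("unrecognized position: %r" % query["position"])
    chromosome, start, end = match.groups()
    if int(start) > int(end):
        raise ValueError("start is greater than end")
    return {"chromosome": chromosome, "start": int(start), "end": int(end)}
```

This kind of reshaping cannot be expressed as a one-to-one key map, which is the case the slide describes.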
Data API Details (4)
• Results returned in a standard JSON format*
  – status, message, result
• Result is an array of JSON objects
• These objects conform to specific JSON schemas
  – drafts on AIP GitHub soon for comment
*Unless there’s an operational reason not to
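The status/message/result envelope can be sketched as a small wrapper. The three keys come from the slide; the specific status strings ("success", "error") are assumptions, since the slide does not list the allowed values.

```python
import json


def make_response(result=None, status="success", message=""):
    """Wrap provider records in a status/message/result envelope."""
    envelope = {
        "status": status,      # assumed values: "success" or "error"
        "message": message,
        "result": list(result or []),  # always an array of JSON objects
    }
    return json.dumps(envelope, sort_keys=True)
```

Keeping result an array even on failure means a consuming Science App can iterate over it without special-casing errors.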
Data API Details (5)
• Data APIs will implement extra utility functions
  – Count: How many records found?
  – Pagination: Return only subsets
  – Help: Return a usage page
  – Convert: JSON (native), XML, CSV, etc.
• Result: 3rd party Arabidopsis data APIs are centrally discoverable and usable
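Three of the utility functions above (count, pagination, and conversion to CSV) are easy to sketch over a list of flat result records. The function names and the 1-indexed paging convention are assumptions for this example, not the Araport API.

```python
import csv
import io


def count(records):
    """Count: how many records were found."""
    return len(records)


def paginate(records, page, page_size):
    """Pagination: return one 1-indexed page of records."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return records[start:start + page_size]


def to_csv(records):
    """Convert: flat dicts to CSV text, columns are the sorted union of keys."""
    columns = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Because these operate on the standard result array rather than on any provider-specific format, they could be implemented once and shared by every enrolled service.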
Araport Data API Store
$ curl -X GET -k -v -L -b cookies "https://api.araport.org/store/site/blocks/api/listing/ajax/list.jag?action=getAllPublishedAPIs"
{ "apis": [
  {"name":"InteractionBrowser",
   "provider":"vaughn",
   "version":"pr2-0.1",
   "context":"/data/BioAnalyticResource/interactionBrowser",
   "status":"Deployed",
   "thumbnailurl":"images/api-default.png",
   "visibility":null,
   "visibleRoles":null,
   "description":"InteractionBrowser",
   "apiOwner":"vaughn",
   "isAdvertiseOnly":false},
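A client could consume a listing like the one above as sketched here. The sample string reuses fields from the slide, but the slide's output is truncated, so the closing brackets are an assumption added only to make the sample parseable; deployed_contexts is a hypothetical helper name.

```python
import json

# Sample shaped like the listing above. The closing brackets are assumed,
# because the output shown on the slide is cut off.
SAMPLE_LISTING = """
{"apis": [
  {"name": "InteractionBrowser", "provider": "vaughn", "version": "pr2-0.1",
   "context": "/data/BioAnalyticResource/interactionBrowser",
   "status": "Deployed"}
]}
"""


def deployed_contexts(listing_json):
    """Map API name -> context path for every API whose status is 'Deployed'."""
    listing = json.loads(listing_json)
    return {
        api["name"]: api["context"]
        for api in listing.get("apis", [])
        if api.get("status") == "Deployed"
    }
```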
ADAMA Quickstart
Objective: 3rd party developers can create a fully functional Araport web service on their local dev machine, then make it available as a production service for all Araport users.
• Docs: http://rawgit.com/waltermoreira/apim/master/docs/index.html
• Runnable ADAMA VM (for local development): ETA end of August 2014
• Develop service adapters right in an IPython notebook!
• We’re ready to start having developers kick the tires on Data API development – shoot us an email
Summary
• Next-generation MOD allowing community participation in its development
• Powerful interactive query and analysis functions available today
• Developing a data federation model
• New data sets and functions coming at a quick pace
• Be on the lookout for participation opportunities
Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, Project Manager
Jason Miller, Co-PI, JCVI Technical Lead
Erik Ferlanti, Software Engineer
Vivek Krishnakumar, Bioinf. Engineer
Svetlana Karamycheva, Bioinf. Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Portal Engineer
Rion Dooley, API Engineer
Matt Hanlon, Portal Engineer
Maria Kim, Bioinf. Engineer
Ben Rosen, Bioinf. Analyst
Joe Stubbs, API Engineer
Walter Moreira, API Engineer
Questions?
Araport architecture (2): API Manager + Enterprise Service Bus
[Diagram labels: Consumer Applications; Secure, rationalized REST services; Mediators (Simple Proxy, Cache, XML-to-JSON, SOAP-to-REST, CGI-to-REST, Throttle); ThaleMine, Data integration, other services; Legacy API A; Legacy API B; REST API C]
• Single sign-on
• Throttling
• Unified logging
• API versioning
• Mediation and translation
• Dev-friendly interfaces
• Rationalized REST for consumer apps
Science Objectives
• Make more, varied data available to the Arabidopsis (and other) communities within a unified user experience
• Enhance the innate value of data by offering enhanced search, retrieval, and display capabilities
• Facilitate analysis of user data
• Enable community participation in functional annotation
Technical Objectives
• Deploy a responsive, flexible community-extensible system
• Provide APIs everywhere!
• Promote and facilitate data integration
• Enable language- and region-specific presentation of scientific content
• Meet mobile computing on its own terms
Local vs. Data-driven Apps
• Local (example: Photoshop Express): Resources are local and inherently offline. Operating on local data using local computing.
• Data-driven (example: KAYAK Pro): Resources are cloud-based and inherently online. Multiple data streams integrated, queried, presented in context of broader objective.
Araport Bill of Materials
• Araport is currently built using
  – Drupal 7.25
    • Developer-oriented content management system
  – Bootstrap.js and some other JavaScript toolkits
  – InterMine (with modifications)
  – Bioinformatics infrastructure + misc. other bits
  – Agave 2.0 Software as a Service platform
    • Developed by iPlant Collaborative project
    • Bulk data, metadata, authentication, HPC app and job management, notifications & events, and more
    • OAuth2 out of the box
    • Enterprise service bus (ESB) architecture
    • http://agaveapi.co/
Araport APIM Architecture (1)
[Diagram labels: Agave wso2 interface; Araport API Manager (Enroll, Manage); JSON Query, JSON Response; per-service adapter stages (Listen, Input Key Map, Input Transform, Send, Listen, Output Transform, Output Key Map, Respond); Cache (Technology TBD); ElasticSearch; Remote Services (POLYMORPH CGI with Form input and CSV output, SNP by Locus REST, Indel by Position REST)]
Araport Architecture: Use Cases (1)
• 1001 Genomes POLYMORPH tools
  – Provides variation data via locus or positional search
  – Total of seven variant types available for search
  – Search parameterization depends a lot on variant type
  – Example of a plain-text CGI service
  – Returns results as CSV with named columns
• Objective: Transform into a RESTful API that expects and returns rationalized JSON
http://polymorph.weigelworld.org
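The core of the objective above, turning CSV with named columns into JSON-ready records, can be sketched with the standard library. The sample columns are hypothetical; the real POLYMORPH output columns are not shown in these slides.

```python
import csv
import io


def csv_to_records(csv_text):
    """Turn CSV with a header row into a list of dicts keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


# Hypothetical columns and values, for illustration only.
SAMPLE_CSV = "locus,position,variant_type\nAT1G01010,3631,SNP\n"
```

Note that every value comes back as a string; an output transform would still need to convert numeric columns such as position before the records match a typed JSON schema.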
Araport Architecture: Use Cases (2)
• ThaleMine
  – Has native REST interface for general queries
  – Has templates which can form basis of specific services
• Objective: Offer both InterMine-native and AIP-conformant interfaces as Data APIs
• Current path
  – Enroll native services in our APIM
  – Develop template-based AIP-conformant services
http://polymorph.weigelworld.org
Data APIs: Getting Started
Service          | Queries         | Notes
BAR eFP          | Locus           |
BAR Expressologs | Locus           |
BAR Interactions | Locus           |
COGe             | Position        | Special case – output transform only
NASC $SERVICE    | Locus           | SOAP based but may be offline permanently
OrthologFinder   | Locus           | Based on a ThaleMine template
POLYMORPH        | Locus, Position | Actually seven CGI services
SUBA3            | Locus           |
Compiling example queries, parameter mapping and description, and ideal results for use in implementing the system
Developing a Data API
• In order, we prefer that you have ready
  – Well-documented REST
  – Moderately well-documented REST
  – SOAP services (plus WSDL or WADL)
  – Plain Old XML
  – Plaintext CGI
  – HTML CGI
  – No web services at all
• Work with us to enroll your services as a data source. This will involve a minor amount of coding.
Computational App Model (1)
[Diagram labels: Host OS; Docker.io; Container (Centos 6.4, custom-repo); Host file systems (/scratch/, database); araport-compute-00; araport-storage-00; Host FS (250 GB); TACC Corral (PB+); sftp; Agave apps, data, jobs; REST API x JSON objects]
Science Apps: Grid View
• Current Scheme
  – 2-3 column view w/ draggable apps
  – Apps are normal, full-size, or collapsed
  – Single app screen
• Later in 2014
  – N x X grid scheme implementing resizable app “tiles” like one sees in Android or Win8.x
  – App SDK libraries will have “help” for enabling resizable design
  – Multiple app screens
Data API Details (2)
• For service-specific parameters
  – Provide human-readable names mapped to original parameter names
  – Offer minimal descriptive text
  – Specify validation
    • Cardinality
    • Pattern validator (regex)
    • Type (number, string, etc.)
  – Indicate whether required
  – Indicate whether they should be visible in a UI
  – Specify reasonable default values
• Seems familiar?
  – This approach is used to abstract command line apps
  – Allows automatic generation of minimally functional UI
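A parameter description along these lines could be held as plain data and applied generically. In this sketch the dictionary layout, the parameter names, and the locus regex are assumptions for illustration; cardinality checks are left out for brevity.

```python
import re

# Hypothetical descriptions of two service-specific parameters.
PARAMETERS = {
    "locus": {
        "label": "AGI locus identifier",
        "type": str,
        "pattern": r"^AT[1-5CM]G\d{5}$",
        "required": True,
        "visible": True,
        "default": None,
    },
    "max_rows": {
        "label": "Maximum rows",
        "type": int,
        "pattern": None,
        "required": False,
        "visible": False,
        "default": 100,
    },
}


def apply_parameters(supplied, spec=PARAMETERS):
    """Fill in defaults and validate supplied values; return (values, errors)."""
    values, errors = {}, []
    for name, rules in spec.items():
        if name not in supplied:
            if rules["required"]:
                errors.append(name + " is required")
            elif rules["default"] is not None:
                values[name] = rules["default"]
            continue
        value = supplied[name]
        if not isinstance(value, rules["type"]):
            errors.append(name + " has the wrong type")
        elif (rules["pattern"] and isinstance(value, str)
              and not re.match(rules["pattern"], value)):
            errors.append(name + " does not match the expected pattern")
        else:
            values[name] = value
    return values, errors
```

The same description carries the label and visible flags, which is the information a generated form would need to render a minimally functional UI.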
Data APIs: Response types (1)
• locus_relationship – pairwise relationship between A and B
  – Directionality
  – Type
  – Array of scores (weights, etc.)
• sequence_feature – positional attribute
  – Extension of GFF model plus
    • Build
    • Attributes array
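One way to picture these two response types is as small record classes. The field names below are guesses based on the bullet points and on standard GFF columns; the actual JSON schemas were still in draft, so treat this only as a sketch.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LocusRelationship:
    """Pairwise relationship between locus A and locus B (assumed field names)."""
    locus_a: str
    locus_b: str
    directed: bool
    relation_type: str
    scores: List[Dict[str, float]] = field(default_factory=list)


@dataclass
class SequenceFeature:
    """Positional attribute: GFF-style columns plus build and attributes."""
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    strand: str
    build: str = ""
    attributes: List[Dict[str, str]] = field(default_factory=list)
```

Calling asdict() on an instance gives a plain dictionary that can be placed directly in the result array of a response.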
Data APIs: Response types (2)
• locus_feature – key-value attributes per locus
  – Optional controlled vocabulary* for keys
  – Support for both slots and arrays
• raw – for returning images or other binary formats
  – Source and other metadata carried in X-headers instead of JSON result
  – Outbound transformation still supported
  – Not a preferred response mode
• text – returning either native service response or a non-conformant JSON document
  – Source and other metadata carried in X-headers instead of JSON result
  – Not a preferred response mode
Data API Details (6)
• Transparent caching will compensate for transient remote service failures
• Automatic indexing of certain response types via ElasticSearch, allowing for sophisticated global search
  – ElasticSearch allows us to index everything we “know about” and return it quickly
  – iPlant uses it to live-index >700 TB user data
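The idea of caching that compensates for transient remote failures can be sketched as a wrapper that remembers the last good response per query. This is an in-memory illustration only; the class name FallbackCache is hypothetical, and it says nothing about the cache technology Araport would actually use.

```python
class FallbackCache:
    """Serve the last good response for a key when the remote call fails."""

    def __init__(self, fetch):
        self._fetch = fetch       # callable: key -> response (may raise)
        self._last_good = {}      # key -> most recent successful response

    def get(self, key):
        try:
            response = self._fetch(key)
        except Exception:
            # Remote failure: fall back to a stored response if one exists.
            if key in self._last_good:
                return self._last_good[key]
            raise
        self._last_good[key] = response
        return response
```

A query that has succeeded once keeps working through an outage, while a query never seen before still surfaces the error to the caller.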
Developing an app
• Understand and document the user stories you’re addressing with your app
• Identify all requisite data sources AND
• Help us prepare them as Data APIs
  – This may involve coding
• Understand the data integration or aggregation needs of your app
  – This may involve coding
• Develop the user interface(s) for your app using our tool kits and suggested practices
  – This will involve coding.
  – But you will learn tools like jQuery, Bootstrap, & D3 and will thus be eminently employable!