ChemConnect: A use case example using cloud services
Goals of talk
• Brief introduction to infrastructures and clouds
• My experience/use of Google Cloud Platform
• ChemConnect
Cloud Computing
What is a ‘cloud’ and why is it useful?
Cloud Computing
COMPUTER NETWORK
STORAGE (DATABASE)
SERVERS
SERVICES
APPLICATIONS
Adapted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance
• Shared pool of configurable computing resources
• On-demand network access
• Provisioned by the Service Provider
What is Cloud Computing?
• Cloud Computing is a general term used to describe a new class of network-based computing that takes place over the Internet,
  • basically a step up from Utility Computing
  • a collection of integrated and networked hardware, software and Internet infrastructure (called a platform).
• Using the Internet for communication and transport, it provides hardware, software and networking services to clients
• These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).
Cloud Summary
• Cloud computing is an umbrella term used to refer to Internet-based development and services
• A number of characteristics define cloud data, applications, services and infrastructure:
  • Remotely hosted: Services or data are hosted on remote infrastructure.
  • Ubiquitous: Services or data are available from anywhere.
  • Commercialized: The result is a utility computing model similar to that of traditional utilities, like gas and electricity: you pay for what you use!
Do you Use the Cloud?
Cloud Flavors?
• SaaS – Software as a Service
• IaaS – Infrastructure as a Service
• PaaS – Platform as a Service
• DaaS – Desktop as a Service
Cloud Service Models
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Google App Engine
SalesForce CRM
LotusLive
Adapted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance
Cloud Architecture
Cloud Platform for ChemConnect
https://cloud.google.com/
Others exist (another popular choice)
Why this one?
ChemConnect is based on several Google services (and philosophies)
Project Connected to Google Account
Services Provided
These are the types of services provided by Google as a cloud service provider
For ChemConnect the services of interest are:
To run the Java-based website (the ‘App’)
The ‘NoSQL’ database (for large amounts of information)
Storage (data files)
App Engine
Programming Support
API: Application Programming Interfaces
Monitoring Services
Raw Database Information
ChemConnect: client-server Structure
User interface on browser, tablet or phone
(adjustable for each)
Generates Interface
ChemConnect
Computing and Responses
SERVER
CLIENT
Application Environments
Example:
ChemConnect is written in Java
Eclipse:
Uses a ‘standard’ (open-source) environment to write code
Local debug and then deploy to Google Cloud
Development Cycle:
Google Cloud
The community
Local Environment
Testing feedback
Local Deploy
Deploy to Cloud
Local client Interface
Web client Interface
Can’t get something for nothing:
Quotas (for ‘small’ applications)
Make the immense amount of data in the combustion community not only available but searchable
ChemConnect
Not restricted to ‘accepted’ published data
Recognize interdependencies between data
Database as an analytical tool
Fine-grained
• Data is the backbone of modern scientific research
• Exchange of data is paramount to successful interaction between research groups
Motivation
Publications and conferences
Data exchanged between researchers (email, etc.)
Virtual Research Environment
paper
Data files
Clouds (infrastructures)
Key Concept: Meta-Data
Keywords specifying
Data Type
Data Source (origin, time, place, etc.)
Data Qualifications (sharing, quality, etc.)
Data relationships to other data (ontologies)
Ontologies
Purpose: Defining interrelationships between data objects
Source: Semantic Web Concepts
Motivation: Large body of research in discovering relationships
RDF: Resource Description Framework
Subject: The subject of the description
Predicate: The description of the relationship between subject and object
Object: The object of the description
Subject --Predicate--> Object
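As a minimal sketch of the subject/predicate/object idea, a triple can be held as a small named tuple. ChemConnect itself is written in Java; Python is used here only for brevity. The `Triple` type and the `objects_of` helper are illustrative assumptions, not ChemConnect code. The two statements are taken from the CHEMKIN relationship example in this talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# Two statements from the CHEMKIN relationship example in this talk.
triples = [
    Triple("Mech-butane-2011", "hasSpecies", "c2h5"),
    Triple("c2h4o2h", "isIsomer", "c2h5o2"),
]


def objects_of(store, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [t.object for t in store
            if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "Mech-butane-2011", "hasSpecies"))
```

A flat list is enough to show the shape of the data; a real store would index triples by subject and object.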
Relationships (example from CHEMKIN mechanism)
Object                  Relationship         Object
Mech-butane-2011        hasReaction          c2h5+o2 = c2h5o2
Mech-butane-2011        hasSpecies           c2h5
c2h5o2 = c2h4o2h        hasReactant          c2h4o2h
c2h5o2 = c2h4o2h        hasProduct           c2h4o2h
c2h4o2h                 isIsomer             c2h5o2
c2h4o2h                 hasStandardEnthalpy  -276.51 kJ/mol
c2h5                    hasProduct           c2h5o2
c2h5                    hasProduct           c2h4o2h
c2h5o2 = c2h4o2h        subMechanism         C2
c2h5o2 = c2h4o2h        subMechanism         C2H5O2
c2h5 + o2 = c2h5o2      followedBy           c2h5o2 = c2h4o2h
Connecting ‘unrelated’ data
Passive Connection:
Don’t need to know which structures you want to connect to
If they share an RDF subject or an RDF object
Then they are connected!!
Keyword: Passive
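The passive connection can be sketched as a set intersection: two sources loaded independently are joined by any node that appears in both. This is an illustration in Python, not ChemConnect's Java code. The function names and the split into two sources are assumptions, and the first `hasSpecies` row is hypothetical.

```python
def nodes(triples):
    """Collect every subject and object that appears in a set of triples."""
    found = set()
    for subject, _predicate, obj in triples:
        found.add(subject)
        found.add(obj)
    return found


def passive_links(source_a, source_b):
    """Nodes that occur in both sources; each one joins the two graphs."""
    return nodes(source_a) & nodes(source_b)


# Hypothetical split: a mechanism file and a thermodynamics file,
# loaded without any knowledge of each other.
mechanism = [
    ("Mech-butane-2011", "hasSpecies", "c2h5o2"),  # hypothetical row
    ("Mech-butane-2011", "hasSpecies", "c2h5"),
]
thermo = [
    ("c2h4o2h", "isIsomer", "c2h5o2"),
    ("c2h4o2h", "hasStandardEnthalpy", "-276.51 kJ/mol"),
]

print(passive_links(mechanism, thermo))
```

Neither source names the other; the shared node `c2h5o2` is what links them.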
Role of data standards
In one sense, standards are only important for the initial parsing of the data and maybe outputting the data
But not within the database itself
If new standards come up, they can supplement the data
(thinking of the keys, identifiers, meta-data keys, DOIs, etc.)
Large network of interconnections
Each ‘bond’ is an RDF
Restructuring of data
[Figure: blocks of data are split into individual pieces of data (data elements, with tags/descriptions), which form a network of interconnected data]
Consequence (example)
[Figure: network of data nodes]
Relationships are established between previously independent data sets/elements
Semantic Web: Relationships within the net
[Figure: RDF graph describing a book]
Nodes: http://…isbn/000651409X; The Glass Palace; 2000; London; Harper Collins; Ghosh, Amitav; http://www.amitavghosh.com
Predicates: a:title, a:year, a:city, a:p_name, a:name, a:homepage, a:author, a:publisher
Author URL
Origin (and development of idea)
Semantic Web: Relationships within the net
Adds ‘meaning’ to the independent sources of information
Gives ‘relationships’ between the pieces of information
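Semantic Web: Merge relationships
[Figure: a French-language RDF graph and an English-language RDF graph that share the node http://…isbn/000651409X]
French graph nodes: http://…isbn/2020386682; http://…isbn/000651409X; Le palais des miroirs; Ghosh, Amitav; Besse, Christianne
French graph predicates: f:original, f:nom, f:traducteur, f:auteur, f:titre
English graph nodes: http://…isbn/000651409X; The Glass Palace; 2000; London; Harper Collins; Ghosh, Amitav; http://www.amitavghosh.com
English graph predicates: a:title, a:year, a:city, a:p_name, a:name, a:homepage, a:author, a:publisher
Common URL!
Connecting sets of Concepts
French Language
English Language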
Semantic Web: Creating new relationships
[Figure: the French and English RDF graphs merged at the shared node http://…isbn/000651409X]
French graph nodes: http://…isbn/2020386682; Le palais des miroirs; Ghosh, Amitav; Besse, Christianne
French graph predicates: f:original, f:nom, f:traducteur, f:auteur, f:titre
English graph nodes: The Glass Palace; 2000; London; Harper Collins; Ghosh, Amitav; http://www.amitavghosh.com
English graph predicates: a:year, a:city, a:p_name, a:name, a:homepage, a:author, a:publisher
Two independent data sources (who did not know about each other)
Become connected
Passive
Fine-Grained Information
Extraction of all the bits of information within the data object
CHEMKIN model:
Extract set of molecules (with isomer, thermodynamic data)
Extract set of reactions (with ‘isomer’, kinetic data)
Extract relationships between
  molecules and molecules (related through reactions)
  molecules and reactions (reactants, products, etc.)
  reactions and reactions (reaction network information)
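A rough sketch of the extraction step, in Python rather than ChemConnect's Java: one reaction string is broken into fine-grained triples. The function name is hypothetical. The parser is an assumption that handles only the simple `a+b = c` form; real CHEMKIN input (arrows such as `<=>`, third bodies, stoichiometric coefficients, rate parameters) needs a fuller parser.

```python
def reaction_triples(mechanism, reaction):
    """Turn one 'a+b = c+d' reaction string into fine-grained triples.

    Assumption: plain species names joined by '+' on each side of one '='.
    """
    left, right = reaction.split("=")
    reactants = [s.strip() for s in left.split("+")]
    products = [s.strip() for s in right.split("+")]

    triples = [(mechanism, "hasReaction", reaction)]
    for species in reactants:
        triples.append((mechanism, "hasSpecies", species))
        triples.append((reaction, "hasReactant", species))
    for species in products:
        triples.append((mechanism, "hasSpecies", species))
        triples.append((reaction, "hasProduct", species))
    return triples


for triple in reaction_triples("Mech-butane-2011", "c2h5+o2 = c2h5o2"):
    print(triple)
```

Even this single reaction yields seven triples, which hints at how quickly a full mechanism grows.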
Other Sources:
Automatic Generation: Mechanism with the information as above, plus 2D-structure, reaction class information, substructure information
Thermodynamic Calculators: more thermodynamic information (plus 2D-structures)
Have to have database capacity to store this immense amount of info
To be demonstrated today
Linking data/models
[Figure: Chemkin Model I, Chemkin Model II, an automatically generated CHEMKIN model, a 2-D structure and computational chemistry calculations, linked through one species]
Species names in the figure: 1-Butyl-3-hydroperoxide; C4H11O2; ch2ch2ch(ooh)ch3; 1-c4hh8-3-ooh
Relationships in the figure: hasSpecies (model to species), isIsomer (between species names), hasThermo (species to Thermo)
RDF: Resource Description Framework
Snapshot from query interface
UCSanDiego#NaturalGas  IsA  Mechanism
UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3  MechanismReaction  UCSanDiego#NaturalGas
UCSanDiego#NaturalGas#N-C3H7  IsAsReactant  UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3
Names specific to the mechanism
Predicate relating items
Connection to other mechanisms
Mechanism
Reaction in mechanism
Molecule in reaction
Simple Species Name
Isomer
GRI#GRI-3.0#C3H7: Species in another mechanism
Database of RDF connections
Datastore
Extremely large amount of information
Needs another technology
(even a small CHEMKIN mechanism translates to megabytes of information)
Other options
Database as Analytic Device
Traversing through the network of information is a tool to ‘analyze’ and extract more/new information
How a species reacts?
Species (Isomer)
asReactant: SetOf Reactions
asProduct: SetOf Reactions
Not just from one mechanism, but from all cataloged mechanisms
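The "how does a species react" query can be sketched as one pass over the triple store, collecting reactions by role. This is an illustration in Python, not ChemConnect's Java code. The `MechB` entries are hypothetical and the function name is an assumption; only the UCSanDiego reaction label follows the naming shown in this talk.

```python
from collections import defaultdict


def reactions_of(triples, species):
    """Group the reactions that consume or produce `species`."""
    roles = defaultdict(set)
    for subject, predicate, obj in triples:
        if obj != species:
            continue
        if predicate == "hasReactant":
            roles["asReactant"].add(subject)
        elif predicate == "hasProduct":
            roles["asProduct"].add(subject)
    return dict(roles)


# One store holding reactions from two mechanisms (MechB is hypothetical).
store = [
    ("UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3", "hasReactant", "n-c3h7"),
    ("UCSanDiego#NaturalGas#n-c3h7=c2h4+ch3", "hasProduct", "c2h4"),
    ("MechB#c3h8+h=n-c3h7+h2", "hasProduct", "n-c3h7"),
    ("MechB#n-c3h7=c2h4+ch3", "hasReactant", "n-c3h7"),
]

print(reactions_of(store, "n-c3h7"))
```

Because every mechanism lives in the same store, the result spans all cataloged mechanisms without extra work.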
Database as analytic device
Search Path (in interface)
Reaction information
Collecting information to a ‘cart’ (building a mechanism)
Reaction pathways
Database as analytic device
[Figure: chain of Species --isAReactant--> Reaction --isAProduct--> Species, repeated over three reactions]
Establishes a further relationship between two species
Could even supplement the database:
Species1 PathTo Species2
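One way to find such a `PathTo` relationship is a breadth-first search over the reactions, sketched here in Python (ChemConnect itself is Java). The function name and the dictionary layout are assumptions; the two reactions come from the CHEMKIN example earlier in this talk, and each is treated as running left to right only.

```python
from collections import deque


def path_to(reactions, start, goal):
    """Breadth-first search for a shortest chain of reactions.

    `reactions` maps a reaction label to (reactants, products).
    Returns the list of reaction labels, or None if no path exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        species, path = queue.popleft()
        if species == goal:
            return path
        for label, (reactants, products) in reactions.items():
            if species not in reactants:
                continue
            for product in products:
                if product not in seen:
                    seen.add(product)
                    queue.append((product, path + [label]))
    return None


reactions = {
    "c2h5+o2 = c2h5o2": ({"c2h5", "o2"}, {"c2h5o2"}),
    "c2h5o2 = c2h4o2h": ({"c2h5o2"}, {"c2h4o2h"}),
}

print(path_to(reactions, "c2h5", "c2h4o2h"))
```

A found path could be written back to the store as a new `PathTo` triple.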
Species between Mechanisms
Database as analytic device
Species are labels:
Only know atomic composition (NASA polynomial)
Not structure
[Figure: two CHEMKIN mechanisms; C3H7, N-C3H7 and i-C3H7, each with its Reactions (asProduct) and Reactions (asReactant)]
Compare reactions (species as isomers)
The set with the most similarities wins
Species between Mechanisms
Database as analytic device
[Figure: C3H7 and N-C3H7, each with its Reactions (asProduct) and Reactions (asReactant)]
The set with the most similarities wins
A new relationship can be established
For the cautious:
The relationship can be qualified with a probability (related to degree of matching)
For more certainty:
One can extend the comparison through a larger network (path through two or more reactions)
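The matching step might be sketched as follows, in Python rather than ChemConnect's Java. The Jaccard overlap used as the "degree of matching" is an assumption; the talk does not say which measure ChemConnect uses. The reaction signatures are hypothetical, with `X` standing for the species being compared.

```python
def jaccard(a, b):
    """Overlap of two sets as a fraction between 0 and 1."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(unknown, candidates):
    """Pick the candidate whose reaction set overlaps most with `unknown`.

    Returns (name, score); the score can qualify the new relationship.
    """
    scores = {name: jaccard(unknown, rx) for name, rx in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical reaction signatures for the label C3H7 in one mechanism
# and for two candidate isomers in another.
c3h7 = {"X = c2h4+ch3", "c3h8+h = X+h2", "X+o2 = c3h6+ho2"}
candidates = {
    "N-C3H7": {"X = c2h4+ch3", "c3h8+h = X+h2", "X+h = c3h8"},
    "i-C3H7": {"X+o2 = c3h6+ho2", "X = c3h6+h"},
}

print(best_match(c3h7, candidates))
```

Extending the signatures to paths of two or more reactions would tighten the match, at the cost of more traversal.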
Species between Mechanisms: One step further
If one of the mechanisms is automatically generated
Then have the 2D structure
The species goes from a ‘label’ to a Species with a structure
(can be further classified with substructures)
Database as analytic device
Data input
Look and feel
In the background
Account Sign in:
Query: Which data do you have access to
Data input: How is your data shared
Security
Inhibit hacking
Social media concepts: groups
Each data point has sharing and ownership parameters
In the background
Transactions:
How, by whom and when was the data entered (or analysed)
How was the database used: which queries
Why?
Have to filter which query results are shown and order them
Both personal and in general
General Field (computer science): Recommendation Systems
Each Google search (from different people) gives different results
eCommerce sites use this to
Future directions
Some basic functionality is present:
Reading in CHEMKIN mechanisms from many sources
Management of RDFs
Simple Query (single keyword search)
Data Sources:
Automatically generated mechanisms (mechanism)
Data behind automatic generation (reaction classes, 2-D (sub)structures)
Independent thermodynamic data
Computational chemistry results
Query
More complex searches
multiple keywords
interpretation/preprocessing of keyword expression before search
Ordering and filtering results (passive and with check boxes)
To be continued: Demonstration
See you there!
If the gods of the internet (and the demon, the ‘demo effect’) allow, you can try it out