Posted on 10 May 2015
Human Variome Project Country Node Development Workshop
Timothy D. Smith, tim@variome.org
@tim_d_smith
Purpose of Today
• An interactive discussion about HVP Country Nodes
– What they are
– What they do
– How they do it
• Get some feedback from you on what Country Nodes need to be
– I’ll be putting you on the spot at certain points
• Hopefully inspire some of you to start a Node in your own country
Outline
• How to initiate an HVP Country Node
• The role of the Node within the country
• Genetics Capacity within the country or region
• Data Collection: who, what, where and how
• Ethical issues, Access policies and Data Ownership
• Data Model
Here’s my problem
• There is no “official” definition of what a Country Node is
– We’re not quite there yet
• No “instruction manual” for how to build one
– Not yet, anyway
Working Definition
An HVP Country Node is an electronic repository of information on genetic variations discovered in patients residing in a specific
country or belonging to a specific population. Ideally the repository contains information on genetic variations discovered during
both routine diagnostic testing and during research. The repository is managed locally by a committee or organisation that has
sufficient representation of stakeholder groups and the backing or support of the country’s human genetics society or similar
professional body. This committee would be responsible for ensuring the sustainability of the repository and compliance with
local laws, regulations and ethics requirements, as well as determining policies for the repository (e.g. data access policy, data
retention policy, curation policy, etc.). The government need not be directly involved in the operation or financing of the
repository, although such involvement is desirable. At the very least, the Ministry of Health should be aware of the Node, its
operations, benefits to the local health system and its relationship to the international Human Variome Project. The minimum
level of information that should be collected on each variant is that described in AlAama et al. (2011), and the repository should include
information on variants considered to be non-pathogenic as well as those affecting function (i.e. pathogenic). In addition, the
repository should capture all instances of each variation that are reported, not just the first case.
• From R05-2012: HVP Country Nodes: a partial definition. A report of the Country Node Development Workshop, Paris, 2012, available at http://www.humanvariomeproject.org/
It’s actually pretty simple
“An HVP Country Node is an electronic repository of information on genetic variations discovered in patients residing in a specific country or belonging to a specific population.”
“Ideally the repository contains information on genetic variations discovered during both routine diagnostic testing and during research.”
HVP Country Node
• Repositories of variation within a country
• Service with in-country benefits
– Diagnostic labs
– Clinics
– Policy making and healthcare delivery planning
– Registries?
• Plays a part in global collection efforts
[Figure: the HVP Country Node within the global collection architecture]
An Organisation
“The repository is managed locally by a committee or organisation
that has sufficient representation of stakeholder groups and the
backing or support of the country’s human genetics society or
similar professional body.”
• The first step in establishing a Country Node is not to build the
database, but to build the organisation
An Organisation
• Need to decide what that organisation looks like
– A local issue
• Must be suitably representative
– All stakeholder groups
• What does this organisation do?
– Will determine what the organisation looks like.
“This [organisation] would be responsible for ensuring the
sustainability of the repository and compliance with local laws,
regulations and ethics requirements, as well as determining policies
for the repository (e.g. data access policy, data retention policy,
curation policy, etc.)”
Decisions to make - establishment
• How the development and operation of the Node will be funded;
• Where the Node will collect data from and how collection will be
achieved;
• What information the Node will collect on each variant; and
• How the collected data will be made available and to whom.
Decisions to make - running
• How operational and managerial decisions will be handled during the development and continuing operations of the Node.
A robust and representative organisation is required to make these
decisions.
Organisation Types
• Consortium of interested individuals
• University Department
• Human Genetics Society
• Government
• Other
Stakeholders
• Diagnostic laboratories
• Medical geneticists
• Genetic Counsellors
• Researchers
• Ethicists
• Officials from the Ministry of Health and Ministry of Science and Technology
• Genetics societies
• Professional Bodies in charge of certifying labs and medical geneticists
• Patients
Funding & Sustainability
• The funding requirements of the Node fall into three categories:
– supporting the organisational framework in an ongoing fashion;
– maintaining the operations of the technical systems and infrastructure; and
– developing new technical systems and infrastructure in response to user needs.
Funding & Sustainability
• The organisational component is likely to be the highest recurring cost
• Recurring technical costs are generally low
• Most likely sources of funds to support organisational
framework:
– Government
– University
– Professional society
Human Variome Project ICO
• We can help…
– Human Variome Project/China Country Development Programme
– Providing access to the economic and public health evidence for the importance of HVP Country Nodes;
– Assisting local organisations in their efforts to generate funds; and
– Working with UNESCO and WHO to improve knowledge of genetic disorders and their economic and public health impacts within the science and health ministries of member states.
[Diagram, Initiation stage: identify stakeholders; hold a meeting; build an organisation; make initial decisions on the role of the Node, funding, data collection and the data model]
Role of the Node
• Service for diagnostic laboratories
• Service for clinicians/genetic counsellors
• Service for medical researchers
• Source of data for the Human Variome Project
• Source of statistics for health service delivery planning
Role of the Node
• Driving activity around medical genetics and genomics
– Education (public, CME, etc.)
– ELSI
– Public funding of genetic testing and genetic health care services
– Submission to databases as part of the licensing requirements of diagnostic laboratories
• Acting as a “spokesperson” for the Human Variome Project within the country on issues of importance to the international Project
• Decide on the role of the Node early, as every choice that needs
to be made will be informed by this decision.
Genetics Capacity
• What tests are available?
• Where are these tests performed?
• What data is generated during these tests?
• What data is shared?
– Locally, nationally, internationally
• Who pays for the tests?
• How will things change in 1, 5, 10 years?
Genetics Capacity
• The Node will need to know all this
– Or, at the very least, be able to find the answers
• The Human Variome Project would like to know this information
as well
– Currently compiling the HVP Country Node Baseline Report
• Capacity will shape what role the Node performs
[Diagram, Initiation stage: identify stakeholders; hold a meeting; build an organisation; survey labs and clinics; appreciate capacity; make initial decisions on the role of the Node, funding, data collection and the data model]
Where data comes from
• Molecular data
– Labs – Research and/or Diagnostic
– Clinics/Genetic Counsellors
– Literature
• Clinical Data
– Labs
– Clinics/Genetic Counsellors
– Patients (Registries)
Data Types
• Classified variants
• Unclassified variants
• Benign variants
• NGS/Incidental findings
• Negative results
Depends on…
• Ethical, legal and social restrictions (difficult to change)
– Government/regulatory bodies
– Professional codes of practice
• What data sources are willing to share (political)
• What data is able to be collected (technical)
• How useful the data will be to the end-users of the system (non-negotiable)
How will collection be achieved
• Technical
– Electronic or Paper
– Manual or automatic
• Process
– At what point in the pipeline
• Do collection activities differ by data source category
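To illustrate the "manual or automatic" choice, here is a minimal sketch of automatic collection, assuming each laboratory can export its results as CSV. The laboratory IDs, column names and Node field names are hypothetical. Each site gets its own column mapping, so a lab can keep its existing export format while the Node receives records in one consistent shape.

```python
import csv
import io

# Hypothetical per-laboratory column mappings: each lab's export names its
# columns differently, so a small per-site mapping translates them onto the
# Node's own (also hypothetical) field names.
SITE_MAPPINGS = {
    "LAB001": {"Gene": "gene_accession", "Mutation": "variant_hgvs", "Class": "pathogenicity"},
    "LAB002": {"Transcript": "gene_accession", "HGVS_c": "variant_hgvs", "Plon": "pathogenicity"},
}


def collect_from_export(lab_id: str, csv_text: str) -> list:
    """Convert one laboratory's CSV export into Node records without manual re-keying."""
    mapping = SITE_MAPPINGS[lab_id]  # raises KeyError for labs with no mapping yet
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {node_field: row[lab_column].strip()
                  for lab_column, node_field in mapping.items()}
        record["lab_id"] = lab_id  # keep provenance with every record
        records.append(record)
    return records
```

Two labs with different column layouts, e.g. `"Gene,Mutation,Class\n..."` and `"Plon,Transcript,HGVS_c\n..."`, both come out with the same Node field names.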
ELSI
The Organisation operating the Node will be responsible for
ensuring the data is collected, stored and used in a manner that
meets all local ethical, legal and social requirements.
Local organisations are best placed to determine how to do this
Need to develop
• Collection policy
• Collection Agreement
• Access policy
• Data ownership/IP policy
– Public/private labs
Collection Policy/Agreement
• What is collected
– Identifiability issues
– Phenotype
• How is it collected
• How is it stored
• When is it collected
• Why is it collected
Access Policy
• Depends on the role
• Who can access data locally and for what purpose
– Controlled Access?
• What use classes
• Who can access data internationally
• What data can be shared and who with
– This is very important
– specific data elements that must be shared are yet to be specified by the
International Confederation of Countries Advisory Council
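Because access depends on the role of the Node, it can help to write the policy down as data rather than bury it in code. The sketch below assumes two hypothetical use classes ("diagnostic" and "research"), a local/international scope and a controlled-access approval flag; the element lists are placeholders, since the elements that must be shared internationally have not yet been specified.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    LOCAL = "local"
    INTERNATIONAL = "international"


# Hypothetical policy table: (use class, scope) -> data elements that may be released.
POLICY = {
    ("diagnostic", Scope.LOCAL): {"gene_accession", "variant_hgvs", "pathogenicity", "disease", "age_at_test"},
    ("research", Scope.LOCAL): {"gene_accession", "variant_hgvs", "pathogenicity", "disease"},
    ("research", Scope.INTERNATIONAL): {"gene_accession", "variant_hgvs", "pathogenicity"},
}


@dataclass(frozen=True)
class AccessRequest:
    use_class: str
    scope: Scope
    approved: bool  # controlled access: has the Node's access committee approved this requester?


def release(record: dict, request: AccessRequest) -> dict:
    """Return only the elements the policy allows for this request (default: nothing)."""
    if not request.approved:
        return {}
    allowed = POLICY.get((request.use_class, request.scope), set())
    return {key: value for key, value in record.items() if key in allowed}
```

Anything not listed is withheld, so a new use class or scope receives no data until the organisation decides otherwise.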
Ownership/IP
• Who ‘owns’ the data submitted to a Node?
– The lab or the Node
• Private/commercial labs
– License terms
– Withdrawal from the agreement
• Data already shared internationally
Remember
• The issues of what data sources to collect from, ownership of
data and access rights are interconnected
• Permission to collect from certain sources may only be granted if
the Node agrees to certain access rights and ownership
provisions
• Need to be determined before technical development begins
[Diagram, Initiation stage: identify stakeholders; hold a meeting; build an organisation; survey labs and clinics; appreciate capacity; make initial decisions on the role of the Node, funding, data collection and the data model. Mature as an Organisation stage: data access policy; ownership/IP policy; sign up collection sources]
Data Model
• what data is stored in the database
• how each data element is related to each other element
• what format each element can be stored in
• which elements are mandatory
• how much intervention will be made during and after submission by human
beings to:
– standardise format
– assess data quality
– correct errors
– add additional information or combine records
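The relationships in a data model matter as much as the fields. A minimal sketch, assuming a variant is identified by its accession-plus-version and HGVS name, and that each report is stored as a separate observation so every instance is kept, not just the first case. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One reported instance of a variant (hypothetical fields)."""
    lab_id: str
    patient_code: str  # de-identified code, never a real patient identifier
    test_date: str     # ISO 8601 date, e.g. "2014-03-01"


class VariantStore:
    """One variant (accession + HGVS name) relates to many observations."""

    def __init__(self) -> None:
        self._observations = defaultdict(list)

    def add(self, accession: str, hgvs: str, observation: Observation) -> None:
        # Every report is appended; earlier reports of the same variant are kept.
        self._observations[(accession, hgvs)].append(observation)

    def instances(self, accession: str, hgvs: str) -> list:
        # .get() avoids creating an empty entry for unknown variants.
        return list(self._observations.get((accession, hgvs), []))

    def distinct_patients(self, accession: str, hgvs: str) -> int:
        return len({o.patient_code for o in self.instances(accession, hgvs)})
```

Keeping observations separate from the variant key is what allows later counting by patient, laboratory or date without re-submitting data.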
Curation
• Australia
– No curation
– Relies on a combination of automated processes, the data model and data submitters
– Only allows submission by registered diagnostic laboratories via software tools that the Australian Node provides
– Only accepts submissions of data that have been recorded in a pathology report
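In the Australian model, rules applied at submission time take the place of human curation. A minimal sketch of such screening follows; the register of laboratory IDs, the `pathology_report_ref` field and the choice of mandatory fields are assumptions made for illustration, not the Australian Node's actual implementation.

```python
# Hypothetical register of diagnostic laboratories allowed to submit.
REGISTERED_LABS = {"LAB001", "LAB002"}

# Hypothetical mandatory fields checked at submission time.
MANDATORY = ("gene_accession", "variant_hgvs", "pathogenicity")


def screen_submission(submission: dict) -> list:
    """Return the reasons a submission should be rejected (an empty list means accept)."""
    reasons = []
    if submission.get("lab_id") not in REGISTERED_LABS:
        reasons.append("not submitted by a registered diagnostic laboratory")
    if not submission.get("pathology_report_ref"):
        reasons.append("not linked to a pathology report")
    for field in MANDATORY:
        if submission.get(field) in (None, ""):
            reasons.append(f"missing mandatory field: {field}")
    return reasons
```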
Describing “this change”
• Protein, DNA, RNA level
• Coding DNA or genomic DNA
• rs#
• HGVS Nomenclature
• Old school methods
• Bespoke methods (BIC)
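Because the same change can be written in many ways, a Node needs to decide which forms it accepts. As a rough illustration, here is a deliberately narrow validator for HGVS coding DNA substitutions on a versioned RefSeq transcript. It is our own sketch, not an HGVS tool; a real Node would use a complete HGVS parser or a checking service such as Mutalyzer.

```python
import re
from typing import Optional

# Deliberately simplified: a versioned RefSeq transcript followed by a
# coding-DNA substitution. Accepts e.g. c.451C>T, c.-14G>A, c.*12A>G and
# intronic offsets such as c.123+1G>A. Deletions, duplications, insertions,
# delins and protein- or genomic-level descriptions are NOT handled.
CDNA_SUBSTITUTION = re.compile(
    r"(?P<accession>NM_\d+\.\d+):c\."
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


def parse_cdna_substitution(name: str) -> Optional[dict]:
    """Split a simple HGVS cDNA substitution into its parts, or return None."""
    match = CDNA_SUBSTITUTION.fullmatch(name.strip())
    if match is None or match["ref"] == match["alt"]:
        return None  # unparseable here, or reference equals alternate (no change)
    return match.groupdict()
```

Requiring the version in the accession reflects the slides' point that a variant name is only meaningful relative to a specific reference sequence.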
Identifying “this gene”
• HGNC name: MC1R, melanocortin 1 receptor, HGNC:6929
• Synonyms: MSH-R
• OrphaNet ID: ORPHA139778
• OMIM: 155555
• Entrez Gene: 4157
• Sequence Accession: NM_002386
• Chromosomal Location: 16q24.3
• Coordinates: 16:89,984,286 - 89,987,384
Describing what it “means”
• Simple
• Free text
• Coding
– ICD-10 (or ICD-9)
– SNOMED
• Ontologies
– HPO
– GO
– PATO
• 20,000 genes = 20,000 disorders
• All different
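One way to keep coded and free-text phenotype descriptions side by side is to store each term together with its coding system. The sketch below handles only HPO identifiers, WHO-style ICD-10 codes and free text (SNOMED, GO and PATO are left out for brevity), and its patterns check the shape of an identifier, not whether the term actually exists.

```python
import re
from dataclasses import dataclass

# Format checks only: a well-formed code may still not exist in the terminology.
CODE_FORMATS = {
    "HPO": re.compile(r"HP:\d{7}"),
    "ICD-10": re.compile(r"[A-Z]\d{2}(?:\.\d{1,2})?"),
}


@dataclass(frozen=True)
class PhenotypeTerm:
    system: str  # "HPO", "ICD-10" or "free-text"
    value: str   # the code, or the free-text description itself


def is_well_formed(term: PhenotypeTerm) -> bool:
    if term.system == "free-text":
        return bool(term.value.strip())  # any non-blank description is accepted
    pattern = CODE_FORMATS.get(term.system)
    return pattern is not None and pattern.fullmatch(term.value) is not None
```

Recording the system alongside each value keeps free text usable by humans while coded terms remain usable by computers.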
Identifying “patient X”
• Ethical, legal and social issues
• Global context
• Patient ID
• Assigned by LSDB
• Useful?
• The Human Variome Project ICCAC will set the standard for minimum content
• Each Node represented on the Council will have input into this process
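If patient IDs are assigned outside the laboratory, one option (our assumption, not an HVP recommendation) is a keyed hash of the laboratory's own patient identifier, so that repeat reports about the same patient can be linked without the Node holding the original ID. A minimal sketch using HMAC-SHA256; the `P-` prefix and the 16-character truncation are arbitrary choices.

```python
import hashlib
import hmac


def pseudonymous_patient_code(lab_id: str, local_patient_id: str, secret_key: bytes) -> str:
    """Derive a stable, de-identified patient code with HMAC-SHA256.

    The same lab + patient always yields the same code, so repeat reports
    can be linked; without the secret key the code cannot be recomputed
    from, or traced back to, the local identifier.
    """
    message = f"{lab_id}:{local_patient_id}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    # Truncated to 64 bits; collisions are very unlikely at national scale.
    return "P-" + digest[:16]
```

Who holds the key is itself an ELSI decision. Note that this sketch gives the same person different codes at different laboratories; linking across laboratories would need a shared identifier, with the ethical and legal questions that raises.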
Data Model
Quality
• Once standards exist you can measure
• Transition from research grade to clinical grade
• These resources ultimately need to be useful as clinical decision
support tools
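Once a minimum-content standard exists, one simple measure is completeness: the fraction of records that actually carry each required element. A minimal sketch, with hypothetical field names standing in for the eventual standard.

```python
from typing import Dict, Iterable, Mapping, Sequence

# Hypothetical field names standing in for whatever minimum-content standard is adopted.
MINIMUM_FIELDS = ("gene_accession", "variant_hgvs", "pathogenicity", "classification_date")


def completeness(records: Iterable[Mapping[str, object]],
                 fields: Sequence[str] = MINIMUM_FIELDS) -> Dict[str, float]:
    """Fraction of records with a non-empty value for each required field."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / len(records)
        for field in fields
    }
```

Tracked over time, per field and per submitting laboratory, a measure like this is one way to show movement from research grade towards clinical grade.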
AlAama et al. (2011)
• Gene Name—described in the form of both the HUGO Nomenclature
Committee approved gene name and a sequence accession number and
version number.
• Variant Name—written as HGVS nomenclature (http://www.hgvs.org/mutnomen/).
• Pathogenicity—classified as five levels of pathogenicity (see Plon et al., 2008).
• Test date—the date that the results were produced.
• Patient ID—a deidentified code which is unique to a patient.
• Patient Age—the age of the patient when tested.
• Patient Gender.
• Submission date.
• Disease associated with the mutation—if diagnosed.
• Lab Operator ID—a code that identifies the operator who uploaded the data.
• Laboratory Name/ID.
• Country/Region Name/ID—if a regional repository is used.
• Level of consent obtained.
• Can the patient be recontacted for other studies?
• Can clinical and/or molecular data be used for statistical analyses (with options for
local laboratory, country, and/or international)?
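The five pathogenicity levels of Plon et al. (2008) fit naturally into an ordered enumeration. The sketch also maps a posterior probability of pathogenicity onto a class, using the probability bands as we understand them from that paper (above 0.99; 0.95 to 0.99; 0.05 to 0.949; 0.001 to 0.049; below 0.001); verify the bands against the paper before relying on them. The member names are our own.

```python
from enum import IntEnum


class PlonClass(IntEnum):
    """Five-level pathogenicity classification (member names are our own)."""
    NOT_PATHOGENIC = 1          # not pathogenic / no clinical significance
    LIKELY_NOT_PATHOGENIC = 2   # likely not pathogenic / little clinical significance
    UNCERTAIN = 3
    LIKELY_PATHOGENIC = 4
    DEFINITELY_PATHOGENIC = 5


def classify(posterior: float) -> PlonClass:
    """Map a posterior probability of pathogenicity to a class (bands as assumed above)."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    if posterior > 0.99:
        return PlonClass.DEFINITELY_PATHOGENIC
    if posterior >= 0.95:
        return PlonClass.LIKELY_PATHOGENIC
    if posterior >= 0.05:
        return PlonClass.UNCERTAIN
    if posterior >= 0.001:
        return PlonClass.LIKELY_NOT_PATHOGENIC
    return PlonClass.NOT_PATHOGENIC
```

Using `IntEnum` means a stored value such as `3` compares equal to `PlonClass.UNCERTAIN`, which keeps the database column a plain integer.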
Australian Node – Phase 1
• Mandatory
– Gene Accession # and version
– Variant Name (as HGVS nomenclature)
– Pathogenicity Classification (Plon)
– Date of classification
• Optional
– Test Details: date of test, method, sample type, start, stop
– Age (at test)
– Disease
– Misc.: sample stored, pedigree available
Australian Node – Phase 2
• Mandatory
– Gene Accession # and version
– Variant Name (as HGVS nomenclature)
– Pathogenicity Classification (Plon) + justification
– Date of classification
– Disease
– Reason for Test: predictive, carrier or diagnostic
• Optional
– Test Details: date of test, method (inc. NGS platform), sample type, sampling date, start, stop
– Interpretation Method (deviation from standard)
– Age (at test)
– Misc.: sample stored
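The Phase 2 element list can be expressed as a record type whose mandatory fields have no defaults, so an incomplete submission cannot even be constructed. A minimal sketch; the field names and the extra check on age are hypothetical, and only a subset of the optional elements is shown.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class ReasonForTest(Enum):
    PREDICTIVE = "predictive"
    CARRIER = "carrier"
    DIAGNOSTIC = "diagnostic"


@dataclass
class Phase2Submission:
    # Mandatory elements (no defaults, so they must be supplied)
    gene_accession: str            # accession plus version, e.g. "NM_002386.3"
    variant_hgvs: str
    plon_class: int                # 1-5
    justification: str
    classification_date: date
    disease: str
    reason_for_test: ReasonForTest
    # Optional elements (a subset)
    test_date: Optional[date] = None
    method: Optional[str] = None   # including NGS platform, if used
    sample_type: Optional[str] = None
    age_at_test: Optional[int] = None
    sample_stored: Optional[bool] = None

    def problems(self) -> List[str]:
        """List rule violations; an empty list means the submission passes."""
        issues = []
        if "." not in self.gene_accession:
            issues.append("accession has no version number")
        if self.plon_class not in range(1, 6):
            issues.append("pathogenicity class must be 1-5")
        if not self.justification.strip():
            issues.append("classification justification is required")
        if not self.disease.strip():
            issues.append("disease is required")
        if not isinstance(self.reason_for_test, ReasonForTest):
            issues.append("reason for test must be predictive, carrier or diagnostic")
        # Hypothetical sanity check on an optional element.
        if self.age_at_test is not None and self.age_at_test < 0:
            issues.append("age at test cannot be negative")
        return issues
```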
Software & Deployed Infrastructure
• DBMS
– Implements data model
– Allows querying of the data
• User interface
– What the user sees
– Access control
• Collection tools
A Node is a living thing
• All technical and organisational elements will require updating
– Needs change
– Laws change
– Technology changes
– HVP Standards & Guidelines
• Role of the Node Organisation is to manage this change
[Diagram, Initiation stage: identify stakeholders; hold a meeting; build an organisation; survey labs and clinics; appreciate capacity; make initial decisions on the role of the Node, funding, data collection and the data model. Mature as an Organisation stage: data access policy; ownership/IP policy; sign up collection sources. Technical Development stage: data model; DBMS; UI; collection tools]
Help is available
• AlAama, J., Smith, T. D., Lo, A., Howard, H., Kline, A., Lange, M., Kaput, J., et al. (2011). Initiating a Human Variome Project Country Node. Human Mutation, 32(5), 501–6. doi:10.1002/humu.21463
• Cotton, R. G. H., Al Aqeel, A. I., Al-Mulla, F., Carrera, P., Claustres, M., Ekong, R., Hyland, V. J., et al. (2009). Capturing all disease-causing mutations for clinical and research use: toward an effortless system for the Human Variome Project. Genetics in Medicine, 11(12), 843–9. doi:10.1097/GIM.0b013e3181c371c5
• Patrinos, G. P. (2006). National and ethnic mutation databases: recording populations’ genography. Human Mutation, 27(9), 879–87. doi:10.1002/humu.20376
• Patrinos, G. P., Al Aama, J., Al Aqeel, A., Al-Mulla, F., Borg, J., Devereux, A., Felice, A. E., et al. (2011). Recommendations for genetic variation data capture in developing countries to ensure a comprehensive worldwide data collection. Human Mutation, 32(1), 2–9. doi:10.1002/humu.21397
Solutions
• HGVS nomenclature
• Open Source DBMS products
– LOVD, UMD, MutBase
– Ethnos, Australian Node
• Minimum Content - recommendations
• Ethics
• Submission Forms
• Forum for discussion, debate, consensus
Australian Node
• HVP Portal (v1.0, r512) - A web application which features the basic interface for browsing and querying an HVP node.
– Open source – MIT License
– Python/Django
• HVP Exporter (v1.0, r512) - Basic HVP exporting tool for laboratories. Features a simple GUI and error-checking interface, a plug-in architecture for customisation between sites, and common libraries for working with MS Access and MS Excel data sources
– Open source – MIT License
– .NET C#, Python/IronPython
• HVP Importer (v1.0, r512) - A series of tools and web services that receive, decrypt and process information submitted by laboratories using the standard transaction XML format
– Open source – MIT License
– Python
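For a sense of what the import step involves, the sketch below parses a submission into Node records. The XML layout is entirely hypothetical; the actual HVP transaction XML format is not reproduced here, and decryption is assumed to have happened already.

```python
import xml.etree.ElementTree as ET

# Hypothetical transaction layout (NOT the real HVP transaction XML format).
EXAMPLE = """
<submission lab="LAB001">
  <variant accession="NM_002386.3">
    <hgvs>c.451C&gt;T</hgvs>
    <pathogenicity>3</pathogenicity>
  </variant>
</submission>
"""


def parse_transaction(xml_text: str) -> list:
    """Turn one already-decrypted submission document into a list of records."""
    root = ET.fromstring(xml_text.strip())
    lab_id = root.get("lab")
    records = []
    for variant in root.iter("variant"):
        records.append({
            "lab_id": lab_id,
            "gene_accession": variant.get("accession"),
            "variant_hgvs": (variant.findtext("hgvs") or "").strip(),
            "pathogenicity": int(variant.findtext("pathogenicity", default="0")),
        })
    return records
```

Records produced this way could then go through the same submission-time checks as any other source before being stored.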
Still work to do
• Minimum content
• Infrastructure to enable sharing
• Attribution
– ELSI
– Incentive for submission
• Phenotype description
– Useful – humans and computers
– Language differences
• Describing ethnicity of patients