E-Rare
ERA-Net for research programmes on rare diseases
Instrument: Coordination Action
Start date: 1st June 2006
Duration: 48 months
Strategy paper on platforms for rare disease research
Next Generation Sequencing platforms
May 2010
Project funded by the European Commission within the Sixth Framework Programme (2006-2010)
Authors
Joke Lievens
Sophie Koutouzov
GIS Institut des Maladies Rares – INSERM
Mailing address: 96 rue Didot, 75014 Paris, France
Tel: +33 (0)1 58 14 22 84/81
e-mail: jlievens@gis-maladiesrares.net
Table of contents
Introduction
  Importance of technology for research on rare diseases
  Technology Platforms
    What?
    How are technology platforms financed?
    Activities
  Objectives
Which platforms do RD researchers need?
  Focusing on strategic priorities and bottlenecks
  Method
  Results
    Response Rate
    Respondent’s expertise
    Platforms used today
    Bottlenecks identified
    Strategic Platforms
  Conclusions
Next generation sequencing platforms
  Aims and Approach
  Method
  Results
    Respondents and Response Rate
    NGS equipment in responding laboratories
    Applications
    Bottlenecks for implementing new applications
    NGS laboratories conduct own research
    Access to the NGS laboratories
    Number of RD projects conducted in NGS laboratories
    Bottlenecks for external researchers to access NGS technology
    Personnel
    Data analysis
    End-user training
    Networking initiatives
    Charge
  Conclusions
Recommendations
  Which technology platforms are needed?
  Improving access to NGS technologies and integrating NGS laboratories in national and transnational programmes on RD
    Complexity of data analysis or bioinformatics
    Personnel
    Funding
    End-user training
    Computing power / data storage
    Access
“Plateforme Mutations”: Case study of an NGS funding initiative
  A public-private partnership
  Services offered
  Calls for projects
  First conclusions
Annex 1: Questionnaire - Which platforms do researchers need?
Annex 2: Questionnaire - Next generation sequencing platforms
Annex 3: Newsletter to thank participants to Questionnaire - Next generation sequencing platforms
Introduction
Importance of technology for research on rare diseases
Technologies and techniques used in research and diagnostic laboratories evolve at a great pace, and being at the forefront of a field goes hand in hand with keeping abreast of the latest of them. This holds for all research fields, and in research on rare diseases the impact of technological advances can be especially large.
Research on rare diseases (RD) is generally characterized by few and scattered resources. Any technological advance that, for example, requires less sample, lowers the cost per sample or reduces hands-on time therefore leaves more of these much-needed resources available to advance research. Diagnosis is often very difficult for RD patients, but the DNA analysis techniques now being refined and adopted by diagnostic laboratories have greatly simplified the task for diseases with known DNA defects. In addition, the availability of biobanks and sample collections from very well characterized patient cohorts makes certain rare diseases highly interesting “test cases” for new technologies and techniques.
Technology Platforms
“Technology platforms” is a very generic term, used in various contexts. A clear delineation of the technology platforms considered here is therefore crucial for a good understanding of the objectives of this paper and of the strategic recommendations issued at the end. This section situates the platforms studied and describes the various forms that technology platforms take, which prove to be quite heterogeneous not only between but also within European countries.
What?
Technology platforms can generally be defined as “the association of equipment, know-how and human capacities at the same site with the aim of offering high-level technological support to a community of users” (GIS-IBiSA website http://www.ibisa.net/charte.php accessed on Sep 23, 2009).
For this paper, the technology platforms studied included both not-for-profit organisations and commercial ones (privately owned companies offering services to clients from public or private institutions). The emphasis was on not-for-profit structures, as it became clear that the research community studied uses these more readily.
How are technology platforms financed?
Not-for-profit technology platforms are financed through one or more of the following resources:
- Infrastructure investments of institutions like universities, hospitals, research institutes
- Infrastructure development grants from a regional or national government
- Research grants
- Charity funds
In a classic scheme, the infrastructure development funds used for the initial set-up of these technology platforms gradually give way to financial support through grants for research projects or for projects that drive the development of new technologies, and the platforms then continue to operate on a cost-recovery basis.
Activities
The activities of technology platforms fall under one or more of the following categories:
- Customer support/consultancy services: e.g. to give advice on procedures, methods and resources.
- Technical services and assistance
- Research-based collaboration: the platform enters as a partner. Funding for collaborative research performed by platform partners must be provided unless otherwise agreed. The results of joint research shall be published with joint authorship.
- Training
Objectives
The work undertaken during the E-Rare project aimed to facilitate links between scientists working in the field of rare diseases and technology platforms (high throughput genotyping, molecular screening, proteomics, animal models, stem cell institutes etc.) that could provide resources and/or expertise to those teams. The final goal was to explore potential routes for integrating these technology platforms into a European transnational funding programme for research on rare diseases.
The objectives of this paper are:
- To provide an overview of the organization and functioning of technology platforms and types of technologies
- To provide insight into RD researchers’ needs for technology platforms and to study the roadblocks that hinder researchers from making use of the expertise and resources at technology platforms
- To provide a detailed analysis of next generation sequencing platforms today and their involvement in RD research
- To formulate recommendations for optimizing the use of technology platforms, more specifically next generation sequencing platforms, in RD research and enabling RD researchers to fully exploit the expertise and resources that are being made available at technology platforms throughout Europe
- To provide a case study of a technology platform dedicated to RD
Which platforms do RD researchers need?
Focusing on strategic priorities and bottlenecks
Research on rare diseases comprises all areas of expertise, ranging from epidemiology and genetics to therapeutic research, and thus no technology platform can be considered “rare disease specific”. However, to pinpoint potential gaps and difficulties encountered by this community when using technology platforms, we chose to focus on those technologies that the RD community considers strategic for the development and achievements of RD research.
Researchers involved in projects sponsored by RD funding programmes were surveyed by means of a short qualitative questionnaire. The questionnaire was aimed at determining which technology platforms they were using, the bottlenecks they experienced and which technology platforms will play a strategic role for RD research in the coming years.
The results served as input for the in-depth study of 2 types of technology platforms: next generation sequencing and high-throughput small molecule screening platforms (Deliverable D5.5 Part B).
Method
A short web-based questionnaire was developed. It contained 4 multiple-choice questions; multiple answers were allowed except for the last question, where the number of answers was limited to 2 (Annex 1). The link to the questionnaire, together with an invitation to participate, was sent in May 2009 to 578 researchers who had applied to RD research funding programmes. More than 94% of the target group (545 researchers) had applied to the E-Rare joint transnational call 2009, 9 had applied to the E-Rare joint transnational call 2007, 9 had applied in 2009 to the French programme “Plateforme Mutations”, which is focused on next generation sequencing, and the remaining 15 had applied to a French CGH-Array funding programme. The questionnaire was anonymous. Registration of the IP address allowed us to verify that respondents participated only once. One reminder was sent 10 days after the first invitation. The questionnaire was closed, and the answers analysed, 28 days after the first invitation.
Results
Response Rate
One hundred and fifty-three (153) responses were collected, which equals an overall response rate of 26%. Respondents mostly worked in Germany (45), France (34), Italy (24), Spain (18) and the Netherlands (12) (Figure 1). Twenty (20) respondents were working in countries that were less represented: Austria (3), Greece (4), Israel (4), Portugal (6), Turkey (2) and the U.K. (1). For analysis, answers from these less represented countries were grouped together in the category “Other countries”. Response rates varied per country: France 24%, Germany 28%, Italy 32%, Spain 56%, the Netherlands 16% and “Other countries” 21%.
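As an illustrative cross-check of the figures above, the overall response rate can be recomputed from the counts reported in the text. This is a minimal sketch and not part of the original analysis: the variable names are hypothetical, and because the number of invitations sent per country is not given in the text, only the overall rate is derived.

```python
# Counts as reported in the text; all names below are illustrative assumptions.
invited_by_programme = {
    "E-Rare joint transnational call 2009": 545,
    "E-Rare joint transnational call 2007": 9,
    "Plateforme Mutations 2009": 9,
    "French CGH-Array programme": 15,
}
responses_by_country = {
    "Germany": 45,
    "France": 34,
    "Italy": 24,
    "Spain": 18,
    "Netherlands": 12,
    "Other countries": 20,  # Austria 3, Greece 4, Israel 4, Portugal 6, Turkey 2, U.K. 1
}

total_invited = sum(invited_by_programme.values())
total_responses = sum(responses_by_country.values())
overall_rate = total_responses / total_invited

print(f"Invited: {total_invited}, responses: {total_responses}")
print(f"Overall response rate: {overall_rate:.0%}")
```

The per-country response rates quoted in the text could be checked in the same way once the number of invitations per country (shown in Figure 1) is available.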
Figure 1. Number of questionnaires sent and responses received per country.
Respondent’s expertise
Overall, a majority of respondents (61%) indicated that they were researchers, a very small group (4%) were working as clinicians only and 35% had both research and clinical activities. Per-country analysis, as illustrated in Figure 2, showed that this distribution of activities was roughly identical for all countries, except for France, where only very few respondents also had clinical activities. As shown in the left panel of Figure 3, 41% and 32% of respondents had expertise in genetics and pathophysiology, respectively. Other areas of expertise were diverse and differed by country, as illustrated in the right panel. Of note, in all countries few respondents were involved in clinical research/trials.
Figure 2. Activity domain of respondents.
In summary, the composition of the population that participated in the survey reflects the heterogeneous backgrounds of researchers active in the field of rare diseases. About one third of the participants have clinical experience. There seem to be small country-by-country differences in research expertise, which may need to be taken into account when it comes to the use of specific technology platforms.
Platforms used today
To situate the use of technology platforms in research on rare diseases, researchers were asked which of 8 platforms they were currently using, and they could indicate other platforms in a free text field. Microarray facilities were by far the most widely used technology platforms: 47% of all participants were using them (Figure 4). Three other types of technology platforms were each used by almost 1 in 5 researchers: facilities for the creation of animal models, proteomics facilities and imaging facilities. Other technology platforms were used sporadically. Remarkably, 25% of researchers did not use any type of technology platform.
Figure 3. Area of expertise for all countries combined (left panel) and distribution of expertise per country (right panel)*.
*Multiple answers were allowed.
Per-country analysis (Figure 5) indicates that more French and Dutch respondents made use of at least one technology platform, in particular microarray and animal model platforms. Use of proteomics platforms varies from one country to another, with about one in four respondents from Germany, the Netherlands and “Other countries” using proteomics, while this rate was much lower for the rest of the countries analysed. It should be noted that the Italian sample population includes mostly researchers working in large not-for-profit research institutes, and thus results for Italy may be biased, as a large part of RD research in this country is also performed in university hospitals.
Figure 4. Technology platform use (May 2009)
Bottlenecks identified
For the question “What are today the bottlenecks you encounter in using high/medium throughput technology platforms”, respondents were presented with 6 predefined items plus a button for free text and were free to indicate as many items as they thought necessary. Figure 6 shows the results for all countries combined (left panel) and the distribution of answers per country (right panel).
Figure 5. Technology platform use per country
Responses to this question were remarkably unanimous: more than half (52%) of the respondents indicated “complexity of data management/bioinformatics” to be a bottleneck. “Difficult access conditions” and “absence of training” were mentioned about half as frequently, by 22% and 19% of the respondents, respectively. Results were similar for all countries analysed, except for training, which was not considered a bottleneck in the Netherlands. Remarkably, 3 of the 6 predefined items were never pinpointed as bottlenecks: “Lack of information/Visibility”, “High cost” and “Scarcity of technology platforms”.
Strategic Platforms
Finally, respondents were presented with the same 8 platforms as above and were asked to “Choose 2 technology platforms [they considered] to be of strategic importance for [their] research in the coming years”. The item “other” allowed them to define other platforms by means of a free text field. As shown in Figure 7, three technology platforms were chosen most frequently: Second generation sequencing (52 times), the Creation of animal models (51 times) and Microarrays (47 times).
Figure 7. Strategic technology platforms*
A per-country analysis revealed a number of country-specific differences, as shown in Figure 8. Second generation sequencing platforms are clearly considered to be strategic in France, Italy and the Netherlands, whereas they are not considered strategic in “Other countries”. This might be explained by the fact that the group “Other countries” mainly consists of respondents from countries where this technology is as yet largely unavailable (e.g. Turkey, Portugal, Greece...) and thus might be less known.
Creation of animal models is considered to be a strategic platform by researchers from all countries except Italy. Italian researchers give priority to second generation sequencing and bioinformatics platforms.
*Frequency = absolute number of times a certain platform was chosen
Figure 8. Strategic technology platforms per country
With regard to microarrays, opinions diverge: researchers from Germany and “Other countries” consider these platforms to be the most strategic, whereas for French researchers microarrays figure among the least strategic platforms. A possible explanation is that, since microarrays are already widely adopted by French researchers, they are less likely to be seen as a platform that will be “strategic” for the coming years.
Least indicated as strategic are drug screening platforms, stem cell technology platforms and “Other” platforms. To put this into perspective: 61% of the respondents do not have clinical experience, and that seems to be reflected in the choices of platforms above. Indisputably, therapeutic research for rare diseases, such as drug discovery and the development of innovative therapies, is extremely important and needs to be encouraged, and platforms such as drug screening or stem cell facilities have an important role to play in such research.
Conclusions
Overall, three technologies stand out as a strategic priority for researchers in the RD field: second generation sequencing (further indicated as next generation sequencing since this technology is evolving very rapidly), creation of animal models and microarrays. In contrast to the first 2, the use of microarrays is widespread: 47% of respondents already use microarrays today, versus 22% and 8% for animal model and second generation sequencing platforms, respectively.
We have identified a few factors that keep RD researchers from making full use of the available technology facilities today. The major hurdle is bioinformatics, or the complexity of data management associated with high-throughput technologies. In second place, access conditions and a lack of training for end users also impede collaborations with technology platforms.
It should be kept in mind that these results reflect the needs and views of a research community that is predominantly involved in basic research, with only 1 in 6 and 1 in 5 respondents indicating preclinical drug development and clinical research/studies, respectively, as their main area of expertise.
Next generation sequencing platforms
Aims and Approach
Of the 153 researchers surveyed in the first questionnaire, 13 were using Next Generation Sequencing (NGS) facilities, with a large majority working with not-for-profit laboratories: 11 researchers versus 2 using the NGS services of a company. Because NGS platforms were identified as highly strategic but scarcely used, it was decided to study in more detail the laboratories that have NGS technology and the hurdles for accessing this technology. This study, focusing on laboratories performing NGS, allowed us to explore topics such as the nature and level of collaborations and access conditions in detail, and to formulate recommendations for RD research funding accordingly.
In recent years, European initiatives on research infrastructures, or aimed at gathering data on technology platforms, have multiplied; examples are the EC-funded ERA-Instruments, EATRIS and ELIXIR projects. More specifically in the area of rare diseases, Orphanet has obtained financing for the FP7 project “RDPlatform”, which includes, among other things, the development of an additional “Technology & Know-How” tab in the Orphanet database. This amounts to establishing a more or less comprehensive database of technology platforms that can be used by RD researchers.
Several of the initiatives mentioned above also plan to look at certain aspects of the NGS landscape within the European Research Area. Therefore, to avoid overlap and to create synergies, E-Rare set up a collaboration with 3 partners to jointly study NGS laboratories. These partners were 1) ERA-Instruments (ERA-NET on funding of life science research infrastructure); 2) gENVADIS (European network of medical research laboratories using NGS) and 3) RDPlatform (FP7 support action creating a set of tools for European RD researchers).
The common goal of the 4 partner organisations was two-fold: 1) to create a database of laboratories that are performing NGS throughout the European Research Area (see Deliverable 5.3 Part A: Catalogue of Next generation sequencing platforms) and 2) to establish a “state of the art” of NGS laboratories covering the partners’ fields of interest, e.g. sequencers, applications, personnel, accessibility for “external” researchers, (transnational) collaborations etc.
Method
An online questionnaire was developed to gather all data. Because of the varied interests of the 4 partners, the questionnaire was quite extensive (about 45 questions with dynamic answer options; Annex 2). The questionnaire contained 7 sections covering different topics:
Section 1: General Information: contact details, website…
Section 2: Laboratory activities and technology: sequencing, data analysis, user training…
Section 3: Personnel: number, constraints…
Section 4: Access: research domains, type of projects…
Section 5: Collaborations: participation in large sequencing initiatives, national networks…
Section 6: Financial resources
Section 7: Public display agreement
Section 1 was the only mandatory section. Questions were mostly matrix-type or multiple-choice. Free text fields for comments were provided throughout the questionnaire.
In order to achieve a high-quality and informative database, it was essential to reach as many NGS laboratories as possible. A comprehensive list of contact e-mail addresses of laboratories performing NGS throughout the ERA was compiled based on 1) data from national funding agencies; 2) data from national networking/technology initiatives (e.g. IBiSA in France: http://www.ibisa.net); 3) web-based research; and 4) personal contacts of members of the gENVADIS consortium. In October 2009, one hundred and thirty-two (132) laboratories were invited by e-mail to participate in the survey. A reminder was sent in November 2009. Throughout the following months laboratories were reminded by personal mail and telephone contacts until the questionnaire was finally closed in February 2010. At the beginning of April 2010 a Newsletter (Annex 3) was sent to thank the participants and present the first results to them.
Before analysis, it was ensured that there were no multiple responses from the same laboratory or sequencing facility. Out of 76 respondents, 25 did not provide an answer to all questions. The information they did provide however is included in the final dataset. Throughout the analysis it is indicated how many laboratories responded to each question.
Results
Respondents and Response Rate
Seventy-six (76) laboratories from 13 countries within the European Research Area participated in the questionnaire and filled out at least section 1, which was the only mandatory section (response rate = 58%). Fifty-one respondents from 12 countries filled out all sections (Figure 9).
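The response figures reported for the NGS questionnaire can be cross-checked with the same kind of arithmetic. The sketch below is illustrative only and uses hypothetical variable names; the counts (132 invited laboratories, 76 respondents, 25 incomplete responses) are the ones stated in the text.

```python
# Counts as reported in the text; names are illustrative assumptions.
invited_labs = 132          # laboratories invited in October 2009
responding_labs = 76        # filled out at least the mandatory section 1
incomplete_responses = 25   # did not answer all questions

complete_responses = responding_labs - incomplete_responses
response_rate = responding_labs / invited_labs
complete_share = complete_responses / responding_labs

print(f"Response rate: {response_rate:.0%}")
print(f"Complete questionnaires: {complete_responses} "
      f"({complete_share:.0%} of respondents)")
```

Subtracting the incomplete responses from the total gives the fifty-one fully completed questionnaires mentioned above.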
Figure 9. Response rate and responding countries
Figure 10. Participating NGS laboratories
About two-thirds of the responding laboratories were academic research groups or sequencing facilities. Regional/National sequencing institutes, non-profit research organisations and companies each represent about a fifth of all respondents (Figure 10). This distribution remains almost identical for all questions throughout the questionnaire.
The responding NGS laboratories that are not privately owned are run primarily with national or regional funding, followed by funding from the host institution (universities, university hospitals, research institutes…) and by researchers paying for services, which ranks slightly ahead of European funding.
Between 1 and 3 years is the period for which two thirds of the respondents have been performing NGS. One third only started in the past year, while the other third has been carrying out NGS for more than 3 years. A per country analysis of countries with at least 5 responses indicates that laboratories with the longest experience (>3 years NGS activity) are mostly from Germany, while responding laboratories from Spain often are more novice in NGS (<1 year NGS activity).
NGS equipment in responding laboratories
Leaving aside the Wellcome Trust’s Sanger Institute in Hinxton, where 39 sequencers are operational, the NGS laboratories in this survey have on average 1.5 sequencers. Roche’s 454 Genome Sequencer is the most widespread technology (41 out of 66 responding laboratories) and most of its users have access to only 1 machine. Illumina’s Genome Analyser is also widely used (28/66), and in contrast to the 454 Genome Sequencer almost half of its users have more than 1 machine. At the time of the survey, Helicos’ Heliscope and Illumina’s HiSeq sequencers were each found in only 1 of the surveyed laboratories (Figure 11).
Seventeen laboratories (17/66) indicate that they intend to acquire additional NGS equipment in the short to medium term (median 6 months). Only 1 of these is planning to buy more than 1 sequencer. At the time of the survey Illumina’s GA was the technology of choice for most laboratories, but this will likely evolve quickly as newer technologies have made (e.g. Illumina HiSeq) or are about to make it to
Figure 11. NGS technologies in use (February 2010)*
*The colour code indicates how many sequencers of a specific type are present in the same laboratory
Applications
The great majority of laboratories use their next generation sequencers for transcriptome analysis (49/68) and whole genome sequencing (48/68) or mutation detection (45/68 with amplicon sequencing; 42/68 with enrichment strategies). Epigenetic or metagenomic studies are currently less frequent, but still more than 1 in 3 laboratories (24/68) reportedly have experience with these applications (Figure 12).
Not surprisingly, the number of applications that are operational within a laboratory is correlated with the length of time that laboratory has been performing NGS (Figure 13).
Bottlenecks for implementing new applications
Respondents indicate that the number of personnel for data analysis (i.e. bioinformaticians) and funds, for equipment as well as consumables, are the main bottlenecks that keep them from developing and implementing new sequencing applications in their lab (Figure 14). The number of technical personnel is also a limitation, but most respondents qualify this as a minor bottleneck. The development of new applications does not appear to be limited by the necessary training of personnel or the number of sequencers available.
A per-country analysis shows that certain bottlenecks are perceived differently in different countries. In Italy and the U.K., for example, only 1 in 5 respondents indicates that funds are a major bottleneck, while in other countries more than twice that proportion perceives funds as a major bottleneck. For Italy, the number of personnel to conduct data analysis is clearly the main impediment for adopting new applications.
Figure 13. Number of applications
*Respondents were asked to attribute a « score » to each point (major, minor or no bottleneck). The vertical axis indicates the frequency of the answer.
*Multiple responses were allowed
Figure 14. Bottlenecks for introducing new applications*
NGS laboratories conduct own research
Among the most represented countries (at least 5 respondents), relatively more French laboratories were founded specifically to perform large-scale sequencing (9/15), whereas among Dutch and German respondents such laboratories are the exception (1/11 and 2/14 respectively). This may well reflect a true difference in research funding policies between France on the one hand, where sequencing efforts have been centralized, and Germany and the Netherlands on the other hand, where NGS equipment has been acquired by individual laboratories.
Overall, the large majority of responding laboratories are active in basic, medical or genomic research. Almost half of the respondents indicate that technology development is also one of their main activities (Figure 15).
Access to the NGS laboratories
Who uses NGS equipment?
Laboratories have typically been equipped with NGS technology with the aim of offering high-level sequencing support to a more or less wide community of users. The largest group, i.e. half of the laboratories, say they are open to any researcher, whether from academia or from a company (27/54) (Figure 16, left panel). In the other half of the responding laboratories, however, access is limited to specific user groups. Overall, 1 in 4 is open to academic researchers only (13/54). Some laboratories a priori do not deliver NGS services to researchers from outside the host institution (6/54) or from another country (7/54). Access to the NGS technology in the responding German laboratories appears to be more restricted: only 1 in 7 is, in theory, “open to all” versus more than 1 in 2 in France or the Netherlands. To be able to appreciate whom the NGS equipment in the survey really serves, we then asked respondents to estimate the relative share of each user group. It appears that in practice the large majority of users are researchers from within the host institution or academic researchers from the same country (Figure 16, right panel). About 2 in 3 of the responding laboratories declare that they have few or no academic researchers from another country as clients. Thus, in practice, there appears to be no or only very limited transnational opening of NGS laboratories. The majority of NGS laboratories (27/46) do not have companies among their customers.
Figure 15. Main research focus of respondents*
Figure 16. Access to the NGS laboratory*
*Please note that a response was not obligatory and respondents could indicate multiple user groups as a “majority”.
*Multiple answers were allowed
Collaborations or service?
About two thirds of the NGS research laboratories (32/54) are set up to perform sequencing not only in the framework of research collaborations, but also as a service in exchange for a fee (Figure 17). About half of the time, sequencing projects, be they research collaborations or projects carried out as a paid service, will undergo some sort of selection procedure (e.g. selection by a scientific committee, call for projects…).
Consistent with what was said above, ¾ of the NGS laboratories declare that the majority of projects they actually conduct are internal research projects and collaborations. Of the laboratories that offer the possibility of performing sequencing as a paid service (32/54), a little more than half say that paid services represent the majority of their work. For one quarter, paid services account for only a minority or few of the sequencing projects they do, and the remaining laboratories have never actually provided paid sequencing services to clients.
In summary, while ¾ of NGS laboratories are set up to allow access to academic users from any country, few researchers cross national borders to have their samples sequenced. The research going on in NGS laboratories is mostly “internal” research or research conducted in collaboration with academic users from the same country.
Number of RD projects conducted in NGS laboratories
We wanted to quantify the number of RD projects carried out in the responding laboratories. About half of the responding NGS laboratories have never conducted a project on RD. Of the laboratories that did RD research in the 3-year period between January 2007 (when NGS became available) and February 2010, more than half conducted 1 to 5 projects. Six NGS laboratories did 6-10 projects and 4 conducted more than 20 projects on RD (Figure 18). These last 4 NGS laboratories (Exeter Sequencing Service/University of Exeter, Genoscope CEA/CEA Evry, Plateforme Mutations/CEA Evry and Centre for Genomic Research/University of Liverpool) together carried out more than half of all 220 projects on RD. When considering these numbers, it should be taken into account that RD projects are not particularly long (order of magnitude: a few months), because they usually concern only a limited number of samples and a target region in the order of megabases (Mb).
Figure 17. Collaborations or paid service*
Twenty-four of the 220 RD projects can be qualified as “transnational” in the sense that they involved scientists from different European countries, and 12 of these were carried out by 1 NGS laboratory (Exeter Sequencing Service/University of Exeter). Thus, sequencing for RD research seems to be concentrated in a few laboratories in Europe, and transnational collaborations for NGS in this area of research are not common. In conclusion, only a limited share of NGS laboratories’ resources appears to be dedicated to RD research.
Bottlenecks for external researchers to access NGS technology
Next, we assessed potential roadblocks encountered by researchers without NGS equipment when trying to access NGS technology, either through collaboration or as a service. From the NGS laboratories’ point of view there are 2 major bottlenecks to performing research for « third parties »: the number of bioinformaticians and the number of technical personnel at their disposal. Other important bottlenecks are the development of data analysis methods, the large amount of time spent on internal projects and the lack of funding behind the “third parties’” research proposals. Instrument capacity seems to be a bottleneck in about half of the responding laboratories, while it does not hinder the advancement of research projects for the other half.
Concerning RD projects in particular, NGS laboratories indicate that a lack of funding of the RD teams is the main bottleneck specific to this type of research. The potential difficulty of statistical analysis and the lack of funding for RD projects at the NGS laboratory are also considered bottlenecks, albeit minor ones.
Overall, the major roadblocks encountered by NGS laboratories seem to be situated in the area of personnel and the process of data analysis. In order to better understand the difficulties encountered by NGS laboratories, further analysis focuses on these topics.
Personnel
Personnel dedicated to NGS activities
The median number of personnel in the responding laboratories (57 respondents) is 6 in total: 2 technical staff, 2 bioinformaticians, 1 “other scientific personnel”, 0 students/post-doctoral students, 0 administrative personnel and 1 other staff member, mostly for server/system support or sales support. Figure 19 illustrates the distribution of the number of personnel per country.
Figure 18. Number of NGS laboratories with RD projects (January 2007 – February 2010)
Figure 19. Total number of personnel per laboratory*
*the line indicates the median
Lack of personnel
Across all countries, a majority of laboratories claim to lack bioinformaticians (46/57), consistent with the results shown in Figure 14. Respondents were asked to indicate the reasons: half of them attribute this to a lack of qualified applicants and the other half to a lack of funds. The situation is clearly different for technical personnel (lacking in 33/57 laboratories), where a lack of funds is the reason most often cited (26/33). This picture can differ locally, as illustrated in Figure 20. As one respondent rightly remarks, it should be kept in mind that the personnel bottleneck is also related to the fact that certain lab tasks (e.g. library prep, target enrichment) cannot be automated for the moment. About 1 in 4 respondents also expresses a need for other scientific personnel (17/57) or students/post-docs (14/57).
Potential solutions
Respondents were presented with a list of potential solutions to the lack of applicants and were asked to pick the options they considered effective. Across all job categories, improving the financial aspect (salary/grant) and orienting students to specialized courses/organising postgraduate courses were the 2 preferred solutions (Figure 21). In the free text field the need for specific, in-house training for bioinformaticians was brought up, as well as the problem of visibility and
Figure 20. Reasons why personnel is lacking*
*Only countries with at least 4 respondents are included in this figure
N° Responses : 57
*A response was optional, the number between brackets indicates the number of times the option has been ticked
Data analysis
Infrastructure and personnel involved in NGS data analysis
Respondents were asked which infrastructure and personnel for data analysis they had for NGS in their laboratory. Almost all laboratories have servers and a system engineer (60/65) and personnel for development or customization of software tools for data analysis (58/65). About 1 in 4 does not have commercial software (available in 49/65) or personnel for training end users in data analysis (available in 45/65).
The infrastructure and personnel for data analysis are often not available to users from outside the laboratory. As shown in Figure 22, clear differences exist between countries with respect to the organisation of data analysis for external users (only countries with at least 4 responding laboratories were taken into account). It appears that German NGS laboratories are not set up to deliver NGS data to researchers external to the facility, whereas in France more than half of the facilities do dedicate infrastructure and/or personnel to NGS data analysis for external users. Laboratories that dedicate resources to third party data analysis most frequently do so through the training of end users in data analysis.
Analysis is done ad hoc
There are no standard or common data analysis procedures for data delivered to third parties. The depth of data analysis depends on the NGS laboratory and the specific NGS application. For any specific application, about a third of laboratories will provide external users with the data together with analysis software for the specific application. Another third will deliver the data in genome browser format (except for mutation detection or structural variation detection, where the data will often be a list of variations with respect to a reference sequence).
Bottlenecks for data analysis
About half of the responding laboratories indicate that the number of personnel for data analysis (30/60 respondents) and the number of personnel for development or customization of analysis software tools (27/60 respondents) are major bottlenecks for data analysis (Figure 23). The two almost always go together. Of note, these personnel needs may be somewhat less pressing in the Netherlands, where about 1 in 2 responding laboratories indicates these are no bottleneck, and in Belgium, where respondents consider them a minor bottleneck only. The limited capabilities of end users to analyse data autonomously are also considered an important bottleneck for data analysis by the NGS laboratories (major bottleneck for 24/60 respondents), followed by the number of personnel for the training of end users in data analysis (major bottleneck for 17/60 respondents). All responding laboratories in Italy (6) and the U.K. (5) consider data storage and data management as a bottleneck.
Figure 22. Availability of data analysis infrastructure/personnel for external users*
*Only countries with at least 4 respondents were included in this figure
Figure 23. Bottlenecks for the analysis of NGS data.
How to facilitate data analysis?
When asked to indicate which potential solutions to the bottlenecks for data analysis are effective, respondents most frequently choose increasing collaborations with other NGS laboratories to develop know-how (42/60) and hiring more personnel for data analysis (41/60) (Figure 24). These are followed by solutions at the level of the end users, solutions at the analysis software level and more collaboration with technology providers. Remarkably, cloud computing and personnel dedicated to end-user training are chosen by only 1 in 5 respondents. Solutions brought up by the respondents in the free text field fell into the categories of end-user training or an increase in server capacity.
Consistent with the diverging opinions on certain bottlenecks, opinions also differ on the effectiveness of certain solutions:
• In Belgium, none of the 4 respondents believes that hiring more personnel for data analysis or software development would be an effective solution.
• Corresponding to their specific data storage and server management needs, 4/5 of respondents from the U.K. believe that hiring more personnel for data storage and server management is an effective solution for facilitating data analysis, versus around 1/5 for other countries.
End-user training
Out of 62 responding laboratories, 23 organise end-user training. Topics range from technology or specific applications (13), data analysis (13), specific bioinformatic tools (13) and experiment design (10) to genetics and genomics of specific organisms (2). In the group of 7 countries with at least 4 respondents, the number of courses appears to be particularly low in Germany (1/10) and Belgium (0/4).
Figure 24. Effective ways to tackle the bottlenecks for data analysis.
Networking initiatives
More than half of the responding laboratories (33/49) are engaged in some type of networking initiative, mostly national federations or partnering networks (24/49). Fewer than a third (15/49) are active in European initiatives such as ELIXIR, BBMRI etc.
Some interesting differences emerge when the data are analysed by country (Figure 25). Only countries with at least 4 participating NGS laboratories were considered for this analysis, i.e. Germany, France, the Netherlands, Spain, Italy and the United Kingdom. Overall, participation in any type of networking initiative is less frequent in Italy and Spain, while in the United Kingdom, on the contrary, all 4 responding laboratories are involved in some type of collaborative initiative. Except in Italy and Spain, at least half of the NGS laboratories are engaged in national federations or partnering networks. Participation in EU initiatives seems to be rather low in France compared to the other countries, but we cannot rule out that this might be due to differences in interpretation of the term « European initiatives ». In its strictest sense this could be interpreted as large-scale infrastructure initiatives such as EATRIS, BBMRI, etc. only, while in a broader sense it can include all research consortia supported by the EC under FP7.
Involvement in one or more networking initiatives did not seem to be correlated with the number of personnel working in the laboratory.
Figure 25. Involvement in networking initiatives*
*The number between brackets indicates for each country the total number of laboratories that responded to the question.
Conclusions
Today only a minority of NGS facilities throughout Europe function as a “platform” or service provider stricto sensu. Facilities with NGS equipment are also pursuing their own research goals, and a large volume of samples is sequenced in the framework of collaborations. Although in theory most NGS laboratories are open to all academic researchers, they mostly serve researchers from within their own host institution or country. Few samples appear to cross national borders to be sequenced. Altogether, only a few RD projects have been conducted in the responding laboratories since the beginning of NGS, and this RD research has been concentrated in a few laboratories.
The number of sequencing projects is limited by the tremendous effort, in terms of time and money, required for data analysis and for preparation of the samples. The NGS field lacks bioinformaticians and, in this competitive context, attractive salaries are thought to make the difference. There is a strong need for training of bioinformaticians and technical personnel, but also of end users. As procedures for data analysis are in constant development and customization is the rule rather than the exception, collaborations between NGS laboratories, but also with end users and with technology and software providers, are a necessity.
Figure 26. Type of charge
Charge
Most academic laboratories/local sequencing facilities operate on a cost recovery basis, i.e. academic customers are charged for consumables cost only, or for consumables and part of the personnel cost. Industrial customers will however pay an extra service fee. A few academic laboratories/local sequencing groups provide the possibility to apply for a grant to conduct the NGS project at no charge. Figure 26 illustrates the ways in which different organisations charge their customers.
Recommendations
Which technology platforms are needed?
The survey conducted among 153 RD researchers points to the strategic role of technologies such as next generation sequencing, the creation of animal models and microarray analysis. The results indicate that the entry barrier to these strategic technologies would be lowered if RD researchers could benefit from assistance with the complex task of managing and analysing the huge amount of data produced by high throughput technologies. Less stringent access conditions and more technology training would also encourage RD researchers to explore these technological possibilities. How these needs can be addressed by funding agencies is explained in more detail below.
Limitations of the findings: the 153 RD researchers surveyed in the framework of the E-Rare programme are working in academic laboratories and are mainly involved in genetics and physiopathology research; 39% of the respondents have clinical activities.
Improving access to NGS technologies and integrating NGS laboratories in national and transnational programmes on RD
As can be learned from the combined results of the surveys above, researchers and NGS laboratory heads/managers point in the same direction regarding the main bottlenecks for accessing and making optimal use of existing NGS facilities, notably the difficulty of data analysis and the lack of personnel/researchers with NGS experience.
The analysis above already mentions potential solutions for a number of bottlenecks identified by NGS laboratories. Here we propose, based on the respondents’ input, several potential “avenues” for funding agencies to improve the opening of NGS laboratories and render NGS more accessible, especially for RD researchers. These recommendations are organised per topic as in the analysis above, although it should be clear that a lot of these topics are intertwined, e.g. lack of personnel and lack of funds.
Complexity of data analysis or bioinformatics
Storing, treating and standardising the tremendous amount of data generated by high throughput technologies such as NGS is one of the greatest challenges in the research field today. A number of EU countries (13 at the time of writing), through the ELIXIR project (European Life Sciences Infrastructure for Biological Information), are reflecting on the creation of a sustainable infrastructure for biological information, thus encompassing NGS data. The UK, Denmark, Sweden and Finland have already committed funds for the future pan-European infrastructure. There is no doubt that this infrastructure will also play a major role in developing and disseminating data analysis methods for NGS. Funding at the level of the NGS laboratories or researchers may very well complement such high-level commitments.
Several strategies could be envisaged, aiming at:
1. Stimulating the development of NGS data management and analysis methods at NGS laboratories
• Stimulate collaborations (e.g. through grants) between NGS laboratories: this was the preferred scenario of NGS laboratories.
• Stimulate collaborations (e.g. through grants) between NGS laboratories and software providers
• Grants to hire bioinformaticians
• Training grants for bioinformaticians
Specifically for rare diseases
• Grants to hire bioinformaticians dedicated to RD projects
• Stimulate development/implementation of a new technology/application in an NGS laboratory by funding an RD project as a test case
• Grants for the organisation of a meeting/seminar dedicated to RD topics
• Promote the E-Rare call towards bioinformatic research groups in order to encourage them to set up collaborations with RD groups and submit a proposal
2. Improving/developing NGS data analysis know-how within RD research groups
• PhD/post-doctoral scholarships for bioinformaticians with RD research projects
• Grants to hire bioinformaticians within RD research groups
• Stimulate proposals with bioinformatic goals within the E-Rare call and properly evaluate projects from a bioinformatics angle
• Bioinformatics summer school grants
• Grants for the organisation of an NGS meeting/seminar
• Technology/infrastructure grants for RD research groups to invest in bioinformatic tools
Personnel
The NGS facilities in this survey typically employ 6 people, including 2 bioinformaticians and 2 technical staff. There are typically no students or post-docs on the payroll. In most laboratories (46/57) there is a shortage of bioinformaticians and in more than half (33/57) there is a shortage of technical personnel. This is attributed not only to a lack of financial means, but also to the challenging task of finding the right applicants.
Several strategies could be envisaged, aiming to establish:
1. More attractive working conditions
• Grants to hire bioinformaticians/technical personnel
• Possibility of offering bioinformaticians salaries competitive with those in industry
• Training/mobility grants for bioinformaticians
• Training/mobility grants for technical personnel
2. Increasing number of scientists and technicians specialised in NGS
• PhD/post-doctoral scholarships
• Grants to organise NGS courses
Funding
Next generation sequencing technologies have dramatically lowered the cost per base sequenced. However, NGS today remains a very costly technology, not only in terms of the initial investment in equipment, but also in terms of consumables and hands-on time of skilled personnel.
In addition to sufficient structural means, for example for hiring trained personnel or organizing training for end users, a number of specific/one-off funding initiatives could also be envisaged:
Specifically for rare diseases
• Grants for (transnational) RD research proposals using NGS technology: during evaluation the elevated costs of NGS consumables should be taken into account
• Grants for developing/implementing new NGS technology/applications in the framework of an RD research project
End-user training
Both researchers and NGS laboratory heads/managers are convinced of the importance of end-user training. Not only would better knowledge of the NGS technology lower the entry barrier to the NGS laboratory but, according to NGS laboratory heads, it would also facilitate the challenging task of data analysis, since it would be easier to share the analysis between the NGS laboratory and the end user (researcher).
Specific/one-off funding initiatives could be envisaged in order to improve researchers’ knowledge of NGS:
Specifically for rare diseases
• Grants for RD researchers to attend technology meetings/summer schools
• Grants for RD researchers to attend specific training sessions organized by platforms
• Support for platforms (network) to organize (international) training sessions for RD groups
Computing power / data storage
The NGS laboratories in the UK and in Italy particularly express the need for computing power/data storage infrastructure, whereas NGS laboratories in other countries do not (or not yet?) seem to be confronted with this problem. Other European initiatives such as ELIXIR and EGI (www.elixir-europe.org) indeed anticipate an explosive need for computing power generated by the adoption of high throughput technologies in various domains. These initiatives may also propose ways to optimize/pool computing power, for example for treating NGS data.
Access
Today, access to NGS technology is limited because NGS laboratories only have limited personnel to handle samples and analyse the data. In addition, a number of research projects cannot be carried out because they do not have sufficient financial support for NGS analyses. Potential solutions to these bottlenecks have been discussed above.
A potential route for maximizing access to NGS technology is the opening of NGS laboratories to projects from researchers abroad. This is hardly ever done today. To achieve more international cooperation one could envisage:
• Wide dissemination of information on NGS laboratories throughout Europe: see e-catalogue on E-Rare website (www.e-rare.eu)
Specifically for rare diseases
• Grants for (transnational) RD research proposals using NGS technology
“Plateforme Mutations”: Case study of an NGS funding initiative
A public-private partnership
In January 2009, the GIS-Institut des Maladies Rares (a French funding body and E-Rare partner), INSERM (the French National Institute for Health and Medical Research) and the Centre National de Séquençage/Genoscope (a French publicly funded NGS facility) together created the “Plateforme Mutations”, a platform for the discovery of mutations implicated in human monogenic diseases (hence most often rare diseases). The next generation sequencing technologies available at this platform (Roche 454 Ti, Illumina GA, SOLiD and, expected in 2011, Illumina HiSeq 2000) enable the identification of allelic variants that are difficult to identify with classical approaches (small number of patients, locus heterogeneity, isolated cases, difficulty in determining the mode of inheritance, etc.).
Each partner contributes significant resources:
GIS-Institut des Maladies Rares: employs 2 bioinformaticians working full-time for the platform (approx. 260 k€) and finances part or all of the consumables for each project with charity funds from AFM (l’Association Française contre les Myopathies, France’s largest patient organisation in the field of rare diseases)
INSERM: acquired a Roche 454 Ti sequencer (approx. 350 k€)
Centre National de Séquençage/Genoscope: puts sequencers, lab space, technical and coordinating personnel (3-4 persons) at the disposal of the platform
Services offered
The platform’s team first assists the RD teams with the experimental design, then performs sequence enrichment (if necessary), library construction and sequencing. Current technology at the platform makes it possible to sequence regions of interest of up to 5 Mb and to perform whole exome sequencing.
An essential part of the platform’s services is the bioinformatic analysis of the data generated. Analysis pipelines have been developed to facilitate detection and annotation of polymorphisms. An analysis tool for filtering polymorphisms according to several criteria is put at the disposal of the RD teams, as well as an interface for visualising the results with links to annotation databases. In addition, since many RD teams do not have bioinformatic experience in-house, the platform’s bioinformaticians train the RD teams in data analysis and closely assist them with it.
Calls for projects
Since April 2009, the platform has launched 4 calls for proposals per year for French teams with a research project on monogenic diseases. A scientific committee, composed of 11 representatives of the founding institutes and associated stakeholders, performs project selection and evaluates the cost.
Over the first 6 calls, 68 proposals were received and 44 were selected for sequencing at the platform. The total cost of consumables was calculated at approx. 1 M€, 83% of which was financed by GIS through charity funds.
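The figures quoted for the first 6 calls can be turned into a few derived ratios. The short Python sketch below is only an illustration: the function name and the derived quantities (selection rate, GIS contribution in euros, average consumables cost per selected project) are not part of the original report, and because the total of 1 M€ is itself approximate, the results should be read as orders of magnitude.

```python
def summarise_calls(received: int, selected: int,
                    consumables_eur: int, gis_percent: int) -> dict:
    """Derive simple ratios from the call figures quoted in the text."""
    return {
        # Share of submitted proposals that were selected for sequencing.
        "selection_rate": selected / received,
        # Part of the consumables bill covered by GIS (integer arithmetic).
        "gis_contribution_eur": consumables_eur * gis_percent // 100,
        # Average consumables cost per selected project (hypothetical metric).
        "consumables_per_project_eur": consumables_eur / selected,
    }


# Inputs taken from the text: 68 proposals, 44 selected, approx. 1 M EUR, 83%.
summary = summarise_calls(received=68, selected=44,
                          consumables_eur=1_000_000, gis_percent=83)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions, roughly two thirds of the proposals were selected and the consumables cost per selected project is in the low tens of thousands of euros.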
First conclusions
The platform aims to handle approx. 40-50 RD projects per year, and resources need to be secured to keep the turnaround time within a limit of 90 days (final data and customised bioinformatics analysis). Also, since NGS technology is evolving very fast, the platform’s team needs to explore new applications and build new tools. The data storage capacity necessary to keep up with all projects is estimated at 9 TB.
Acknowledgements
The authors wish to thank Vincent Meyer, Gabor Gyapay and Marc Wessner (Genoscope/CEA, France) for their helpful contribution. The authors explicitly thank Terry Vrijenhoek (gENVADIS), Benoît Dardelet and Marie-Denise Breton (ERA-Instruments) for the fruitful collaboration that led to this paper. The authors also wish to acknowledge the contribution of the whole E-Rare consortium and in particular Ralph Schuster (PT-DLR) during the elaboration of this paper.