Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments
Transcript of Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments
- 1. Bridging the Gap: Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments. March 2017, NEREN Seminar. (Photo credit: Aaron Gardner)
- 2. I'm Chris. I'm an infrastructure geek (and failed scientist). I work for the BioTeam. @chris_dag (Photo credit: Cindy Jessel)
- 3. www.BioTeam.net: Independent consulting shop. Run by scientists forced to learn IT to get science done. Virtual company with nationwide staff. 15+ years bridging the gap between hardcore science, HPC & IT. Honest. Objective. Vendor & technology agnostic. We are hiring :)
- 4. Content Warning: I am not an expert or a thought leader. I try to speak honestly about what I see, do and experience on the ground as an IT worker. My views are biased by the types of work I perform, so filter my words through your own expertise. I'm worried about time, so I may skip slides; the full PDF of the slide deck will be available.
- 5. Q1 2017 Current State: Commercial LifeSci Research Computing
- 6. 01: Science Evolves Faster Than IT. The rate of scientific innovation is incredible, and lab-side instruments are innovating just as fast. Scientific and instrument requirements change far faster than IT organizations can build, rebuild or refresh complex infrastructure. While the science world changes month to month, even the best-funded, most aggressive shops can only refresh large installations every ~2 years. Most refresh on 3-4 year cycles. Gulp!
- 7. 02: We've lost the centralization battle. Old way: centralize all HPC and research computing functions into a single-site, centrally managed & supported environment, and bring the users and the data to that shared environment. This no longer works as well as it used to:
  - Terabyte-scale instruments have diffused EVERYWHERE and will continue to pop up everywhere
  - Building/campus LANs can't support tera/peta-scale data movement
  - It does not address external collaborators or data sources well
- 8. 03: Petabytes for free. There are petabytes of very interesting open-access data available for free on the internet. There are many valid business and scientific reasons why a research computing user would want to bring some of this data in-house to support new or existing research programs, but:
  - Massive technical challenges (ingest, trash-tier storage, etc.)
  - Massive organizational challenges: it takes a ton of work and resources to host peta-scale free data
  - Organizations are struggling to build governance/approval models tied to actual business or scientific goals
- 9. 04: Userbase now spans the enterprise. Life was a lot easier when the only users of research computing were scientists and R&D organizations. It was easy to build domain expertise and to bias our infrastructure toward power and capability over 99.99% uptime. Researchers will tolerate occasional downtime if the payoff is faster systems or bigger storage. It is much harder when the full enterprise needs data-intensive science, and those pesky corporate types want SLAs and 24x7 support :) Userbase diversity is incredible: manufacturing, process optimization, commercial operations, sales operations, compliance, risk management, etc. That diversity is far, far harder to support, train, enable and mentor.
- 10. 05: Data Types Getting Weird. We are very good at handling terabytes and petabytes of static structured or unstructured data. Storage tech and operational practices for this have evolved over DECADES. Ingesting, storing and computing against data streams requires entirely new tech, skills and infrastructure, for example:
  - Sensor telemetry from bioreactors in manufacturing
  - Environmental sensor data streams from greenhouses
  - Website clickstream and advertising metrics from Commercial Ops
  - etc.
- 11. 06: Our Networks Suck.
  - Enterprise network architectures are optimized for lots of small concurrent traffic flows. They have issues with "elephant flows", where a single network flow may be using 1 Gbps, 10 Gbps or 40 Gbps of bandwidth to move a big data file
  - Our network cores can barely handle 10gig when they should be running at 40gig and 100gig so they can deliver 10gig to top-of-rack trivially
  - Our building-to-building and lab-to-lab links are woefully undersized
  - Our connections to the outside world are woefully undersized
  - The cost of Cisco networking at 40 Gbps and higher is simply ludicrous
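To make the elephant-flow problem concrete, the arithmetic below estimates how long one big file transfer takes at the link speeds mentioned above. This is a minimal sketch under idealized assumptions: decimal units (1 TB = 10^12 bytes), a dedicated link, and perfect line rate. `efficiency` is a hypothetical fudge factor for protocol overhead, disk limits and the like, not a measured value.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a link of link_gbps gigabits/s.

    efficiency is a hypothetical derating factor in (0, 1]; 1.0 means
    the transfer runs at full line rate with no overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8  # bytes -> bits
    return bits / (link_gbps * 1e9 * efficiency)


TB = 1e12  # decimal terabyte

for gbps in (1, 10, 40, 100):
    secs = transfer_seconds(TB, gbps)
    print(f"1 TB over {gbps:>3} Gbps: {secs:>6.0f} s ({secs / 3600:.2f} h)")
```

Even at perfect line rate, 1 TB takes 8,000 s (about 2.2 hours) on a 1 Gbps path and 800 s on 10 Gbps. One petabyte over 10 Gbps takes roughly 9.3 days, which is why campus links and undersized cores become the bottleneck for instrument data.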
- 12. 07: Our Firewalls Suck. We are stuck with legacy models and operational assumptions ("Yes, we can do deep packet inspection on EVERYTHING" and "Yeah, it makes total sense to only put a firewall at the perimeter of our network"). That $90,000 firewall advertised as "10gig ready" can't actually handle a large scientific data transfer. Inside the box, the vendor is aggregating 10x cheap 1gig network paths and calling it 10gig. Feed it a single file transfer stream @ 10 Gbps and watch it thrash and drop throughput by 90%.
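The 90% figure follows from how such boxes typically spread traffic. Each flow is pinned to one internal path by hashing its addresses and ports, so a single flow can never exceed one 1gig path, while thousands of small flows spread out nicely. The toy model below is a sketch under stated assumptions. The ten 1 Gbps members, the CRC32 hash of the 5-tuple, and even sharing of a member link are all hypothetical simplifications, not any vendor's actual design. The IP addresses come from documentation/private ranges.

```python
import zlib
from collections import Counter

# Hypothetical "10 gig" box: ten 1 Gbps internal paths, one path chosen per flow.
MEMBER_LINKS = 10
MEMBER_GBPS = 1.0

Flow = tuple[str, str, int, int, str]  # (src_ip, dst_ip, src_port, dst_port, proto)


def pick_link(flow: Flow) -> int:
    """Per-flow hashing: every packet of a given flow lands on the same member path."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % MEMBER_LINKS


def per_flow_gbps(flows: list[Flow]) -> dict[Flow, float]:
    """Best-case rate for each flow; flows hashed to the same path split its 1 Gbps evenly."""
    load = Counter(pick_link(f) for f in flows)
    return {f: MEMBER_GBPS / load[pick_link(f)] for f in flows}


# One big scientific file transfer: capped at a single 1 Gbps member path,
# i.e. a 90% shortfall against the 10 Gbps the stream wanted.
elephant = [("10.0.0.5", "192.0.2.10", 50000, 22, "tcp")]
print(per_flow_gbps(elephant))

# Many small business flows: spread across members, so the aggregate approaches 10 Gbps.
mice = [(f"10.0.1.{i % 250}", "192.0.2.20", 40000 + i, 443, "tcp") for i in range(1000)]
print(round(sum(per_flow_gbps(mice).values()), 2))
```

This is also why a box can look fine on aggregate-throughput benchmarks built from many parallel streams and still fail a single-stream transfer test.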
- 13. Summarizing our key challenges: What keeps us from the collaborative computing promised land?
- 14. Collaborative Research: Key Challenges
  - Network speeds: internal & external
  - Deploying ScienceDMZ architectures to take data-intensive science load off of networks built for business users
  - Network security methods: core & edge
  - Federated identity management
  - Obtaining the domain expertise required to enable, mentor and fully support the massively expanding class of collaborative researchers who need sophisticated compute and analytics
- 15. OK, dude. All your challenges are tech-related. What about the human side of research facilitation?
- 16. Collaborative Research Challenges: Human Factors
  - Wishful thinking rather than critical thinking about what the organization REALLY wants to encourage. We see a lot of "build the database/catalog/warehouse/repository/lake/commons and they will come" pitches with zero support for follow-through.
  - Collab/research facilitators with enough seniority to be thinking: Where are the collaborative opportunities? How do they align with the business needs? What data is actually useful to others?
- 17. Collaborative Research Challenges: Human Factors, 2
  - The BIGGEST ISSUE OF ALL: What's in it personally for the collaborating parties? Does this get them promoted or published, solve their research problem, answer their burning questions, etc.? Or does it detract from these things by taking time away from activities more beneficial to the org or the person?
  - Does the system support or inhibit collaboration through mechanisms like budget allocations, staffing, approval processes, etc.?
  - Org charts, corporate culture and operating models can either encourage or stifle any collaborative efforts that may exist.
  - h/t Simon Twigger!
- 18. Collaborative Research Challenges: Human Factors, 3
  - Research facilitators in industry: someone needs to be out there learning about the silos of excellence and seeing the opportunities for collaboration. Some scientists are too heads-down in their own area to see beyond immediate needs. Having a human make this happen could be huge, and way more effective than all the technological solutions we usually throw at this problem.
  - Impedance mismatch: a real issue. We need something like an eHarmony for matchmaking between collaborators with the same motivation levels!
  - h/t Simon Twigger!
- 19. Collaborative Research Computing: Internal. Supporting internal efforts in commercial pharma/biotech
- 20. Facilitating Internal Collaboration
  - Harder than multi-party collaboration in some ways
  - Few companies incentivize or otherwise actively encourage collaboration across departmental boundaries. Where they do encourage it, it is often just empty talk. When it comes to performance reviews, HR and local management, the reality on the ground may be different.
  - Talk is cheap. Taking steps to encourage, track and reward people is not.
  - The other main issue is impedance mismatch between potential collaborators. Two groups that wish to collaborate often have different timeframes, interest levels and available resources. It is tough to find perfect alignment.
- 21. Internal Collaboration: How we do it (1)
  - Regular HPC/computing training classes where all are welcome and attendees span various business units. Serendipitous opportunities abound.
  - Mailing lists, Slack, etc. for consumers of research computing services to actively communicate, share code and offer troubleshooting assistance
  - Road shows and lunch-and-learn sessions with a rotating cast of speakers, delivered across multiple sites. Speakers are often users/consumers with great stories and data to talk about.
  - Having most apps and data sets on a large single-namespace storage system makes collaboration easier for all comers
  - A private GitLab or other code hosting portal for users to share code and tooling also helps
- 22. Internal Collaboration: How we do it (2)
  - Publishing data catalogs so people understand what is available for use and exploration is very helpful. It does not have to be complex; even a simple wiki or web page can work.
  - Research facilitators who can embed with departments or groups for week-long or month-long periods are very useful at:
    - driving new use cases and collaborations
    - collecting valuable domain knowledge needed for long-term support of users and departments
    - dissolving barriers between IT and the people asking interesting questions
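As an illustration of how low the bar for a "simple web page" catalog can be, the sketch below renders a list of dataset records into one static HTML table. The `Dataset` fields (name, owner, location, description), the page layout and the example entries are hypothetical. They are one reasonable minimum, not a prescribed schema.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Dataset:
    name: str
    owner: str
    location: str      # hypothetical: a path on shared storage or a URL
    description: str


def catalog_html(datasets: list[Dataset]) -> str:
    """Render a bare-bones static HTML catalog: one table row per dataset, sorted by name."""
    header = "<tr><th>Dataset</th><th>Owner</th><th>Location</th><th>Description</th></tr>"
    rows = []
    for d in sorted(datasets, key=lambda d: d.name.lower()):
        # escape() keeps free-text fields from breaking the page's markup
        cells = "".join(f"<td>{escape(v)}</td>" for v in (d.name, d.owner, d.location, d.description))
        rows.append(f"<tr>{cells}</tr>")
    body = "\n".join([header, *rows])
    return f"<html><body><h1>Data Catalog</h1>\n<table>\n{body}\n</table>\n</body></html>"


if __name__ == "__main__":
    # Hypothetical example entries, for illustration only
    catalog = [
        Dataset("Public reference genomes", "Research IT", "/shared/ref/genomes/", "Mirrored open-access data"),
        Dataset("Bioreactor telemetry", "Manufacturing", "/shared/mfg/bioreactor/", "Sensor streams from manufacturing"),
    ]
    print(catalog_html(catalog))
```

The output can be dropped onto any internal web server or pasted into a wiki. The value comes from people knowing the data exists, not from the tooling.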
- 23. Internal Collaboration: Challenges
  - Fighting for permission to deploy real, useful collaboration tools vs. management who just keep saying "SharePoint, SharePoint, SharePoint"
  - The new crop of potential collaborators may sit at sites not previously covered by research computing infrastructure or support resources
  - As data types and tooling get more diverse and more complex, it is a constant battle to retain the internal IT domain knowledge necessary to help compute consumers be successful in thei