Software Practicals Summer Semester 2019

Database Systems Research Group, Heidelberg University
April 17, 2019

Page 4

● Overview of topics (today)
  ○ send application for a topic by Tuesday, April 23, 13:00
  ○ assignment of topics by April 26

● First milestone (mid/end May)
  ○ prototype/part of software
  ○ summary of research (literature and related systems/tools)
  ○ further milestones in agreement with supervisor

● End of practical (mid/end July)
  ○ code in local GitLab
  ○ report / documentation as local wiki document
  ○ presentation/demo of practical and software (10-15 minutes)

Page 5

● Application
  ○ by email directly to supervisor
  ○ brief list of relevant courses / prior knowledge
  ○ schedule and milestones for the practical
  ○ group work is not possible
  ○ application is binding (don’t apply if you don’t want to do the practical)

● Deadlines
  ○ presentation: planned for third week in July 2019
  ○ report & GitLab upload: by August 10, 2019
  ○ no extension possible
  ○ not finished = failed (grade 5.0)

Page 6

● Credit points (Leistungspunkte)
  ○ Beginners Practical (IAP, 6 ECTS) [Bachelor students]
    ■ workload: 180 h (~1 ½ days/week)
  ○ Advanced Practical (IFP, 8 ECTS / 6 ECTS)
    ■ workload: 240 h (~2 days/week)

● Grading based on
  ○ code (readability, structure, functionality)
  ○ documentation (README, comments)
  ○ commitment and self-reliance
  ○ cool ideas!!

● IMPORTANT
  ○ talk to / communicate with your advisor


Page 10

Given:
1. Doctoral letters written in German (semi-structured)
2. Medical vocabulary (e.g., Unified Medical Language System, UMLS)

Tasks:
• Build pipeline that identifies and manages medical named entities
• Manage and allow querying named entities in database

Subtasks:
• Extend existing information extraction pipeline
• Develop GUI components for querying medical named entities

Languages / Tools:
• Python; MongoDB; Django/Flask
• UMLS, https://www.nlm.nih.gov/research/umls/
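The entity-identification task above could start from something as simple as dictionary matching. The following is a minimal sketch under stated assumptions: the three-term vocabulary, the placeholder concept IDs, and the function name `find_entities` are all hypothetical and stand in for UMLS and the existing pipeline, which are not shown in these slides.

```python
import re

# Hypothetical mini-vocabulary: surface form -> placeholder concept ID.
# A real run would load terms and identifiers from UMLS instead.
VOCABULARY = {
    "diabetes mellitus": "CONCEPT-1",
    "hypertonie": "CONCEPT-2",
    "metformin": "CONCEPT-3",
}


def find_entities(text, vocabulary=VOCABULARY):
    """Return entity dicts (term, concept, start, end) sorted by start offset."""
    entities = []
    for term, concept in vocabulary.items():
        # Whole-word, case-insensitive match of the vocabulary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append({
                "term": match.group(0),
                "concept": concept,
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(entities, key=lambda e: e["start"])


# Made-up example sentence, not taken from real data.
letter = "Diagnose: Diabetes mellitus Typ 2, arterielle Hypertonie. Therapie: Metformin."
entities = find_entities(letter)
for entity in entities:
    print(entity)
```

Each result is a plain dict, so it could be stored as one document per entity mention in a document store such as MongoDB; how the existing pipeline actually stores entities is not specified here.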

Page 11

Given:
1. Medical term co-occurrence network extracted from doctoral letters written in German (and managed in MongoDB)
2. Medical named entities and medical vocabulary

Tasks:
• Adapt and extend construction of co-occurrence networks
• Web-based querying and visualization of co-occurrence networks

Subtasks:
• Consolidate extraction pipeline for co-occurrence networks
• Develop GUI components for graph querying

Languages / Tools:
• Python; MongoDB; Django/Flask
• UMLS, https://www.nlm.nih.gov/research/umls/
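To illustrate what constructing a co-occurrence network involves, here is a minimal sketch. It assumes that two terms co-occur when they appear in the same letter; the window actually used by the existing pipeline is not stated in the slides, and the function name `cooccurrence_edges` and the sample term lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for each unordered term pair, the number of documents containing both."""
    edges = Counter()
    for terms in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical term lists, one per letter.
letters = [
    ["diabetes", "metformin", "hypertonie"],
    ["diabetes", "metformin"],
    ["hypertonie", "ramipril"],
]
edges = cooccurrence_edges(letters)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list is the kind of structure a web front end could query and draw as a graph.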

Page 12

Given:
1. Website with information about voting behavior
2. Lists of politicians and parties

Tasks:
• Extract information about politicians, topics, and votes from https://www.bundestag.de/abstimmung
• Develop Web-based visualization and query framework

Subtasks:
• Extract information from the website and manage it in a database
• Develop GUI components for politician/topic-centric querying

Languages / Tools:
• Python; MongoDB/Solr; Django/Flask

Page 13

Given:
1. German legal texts (as XML files for, e.g., BGB, StPO, ZPO)
2. Machine Learning frameworks

Tasks:
• Develop pipeline(s) to compute and manage language models for collections of legal texts
• Evaluation and comparison with existing word embeddings

Subtasks:
• Extract legal texts from www.gesetze-im-internet.de/
• Apply Machine Learning pipeline to collections of legal texts

Languages / Tools:
• Python; scikit-learn/TensorFlow
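One possible way to compare a domain-specific model with existing word embeddings is to check how much the nearest neighbours of a word differ between the two. The sketch below uses only the standard library; the two-dimensional vectors are toy values invented for illustration, and the function names are hypothetical. The slides do not prescribe this particular evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, embeddings, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


def neighbour_overlap(word, model_a, model_b, k=2):
    """Jaccard overlap of the k-nearest-neighbour sets of `word` in two models."""
    a = set(nearest(word, model_a, k))
    b = set(nearest(word, model_b, k))
    return len(a & b) / len(a | b)


# Toy vectors, invented for illustration only.
legal_model = {"vertrag": [1.0, 0.0], "kaufvertrag": [0.9, 0.1],
               "miete": [0.7, 0.7], "strafe": [0.0, 1.0]}
general_model = {"vertrag": [1.0, 0.0], "kaufvertrag": [0.8, 0.2],
                 "miete": [0.1, 0.9], "strafe": [0.6, 0.4]}

print(nearest("vertrag", legal_model))
print(nearest("vertrag", general_model))
print(neighbour_overlap("vertrag", legal_model, general_model))
```

A low overlap for many words would suggest that the domain-specific model captures usage that the general-purpose embeddings do not.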

Page 14

Given:
1. German doctoral letters (semi-structured)
2. Machine Learning frameworks

Tasks:
• Develop pipeline(s) to compute and manage language models for collections of doctoral letters
• Evaluation and comparison with existing word embeddings

Subtasks:
• Develop and apply Machine Learning pipeline on collections of medical texts

Languages / Tools:
• Python; SciKit-Learn/Tensorflow

Page 15

Given:
1. Hypergraph/Graph Document Model
2. Relational Implementation (PostgreSQL) as reference

Tasks:
• Propose and implement a schematic model in a graph database
• Evaluate performance on a set of predefined query types

Subtasks:
• “Translate” queries from SQL to product-specific query languages
• Beginners Practical (BP): Neo4j only; Advanced Practical (AP): may extend this to other frameworks as well

Languages / Tools:
• SQL; Neo4j, OrientDB, ArangoDB, Dgraph, MongoDB, ...
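As an illustration of what "translating" a query could look like, the sketch below pairs a relational join with a Cypher pattern match. Everything about the schema is an assumption: the `document` and `cites` tables, the `Document` label, and the `CITES` relationship are hypothetical and are not the document model referred to above. SQLite from the Python standard library stands in for PostgreSQL so that the SQL side can be run; the Cypher string is shown for comparison only and is not executed.

```python
import sqlite3

# Hypothetical two-table schema standing in for the relational reference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE cites (src INTEGER REFERENCES document(id),
                        dst INTEGER REFERENCES document(id));
    INSERT INTO document VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO cites VALUES (1, 2), (1, 3), (2, 3);
""")

# Relational version: the edge is a join table, traversed with two joins.
SQL_QUERY = """
    SELECT target.title
    FROM document AS source
    JOIN cites ON cites.src = source.id
    JOIN document AS target ON target.id = cites.dst
    WHERE source.title = ?
    ORDER BY target.title
"""

# Cypher counterpart for Neo4j (not executed here): the join table
# becomes a relationship that is matched directly as a pattern.
CYPHER_QUERY = """
    MATCH (source:Document {title: $title})-[:CITES]->(target:Document)
    RETURN target.title
    ORDER BY target.title
"""

rows = [title for (title,) in conn.execute(SQL_QUERY, ("A",))]
print(rows)
```

The performance comparison would then run both formulations on the same data and query types.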

Page 16

Given:
1. Relational Document Model in PostgreSQL
2. Set of “Standard Queries”

Tasks:
• Find optimal execution pattern and potential improvements (including Postgres settings)
• Investigate optimal SQL execution plan

Subtasks:
• Learn details about PostgreSQL internals and query planner
• Find bottlenecks in execution plans

Languages / Tools:
• SQL/PostgreSQL (at low level, this is C code)

Page 17

Given:
1. News Extraction Pipeline
2. Time-Varying Graph Explorer

Tasks:
• Extract articles (adapt sample code)
• Use Ambiverse to link entities
• Implement live view in TVG Explorer

Subtasks:
• Decide on intermediate representation (DB / in-memory?)

Languages / Tools:
• Python, HTML, JavaScript, (MongoDB), ...

Page 18

Given:
1. News Extraction Pipeline
2. Measures to rate importance of entities in news articles [1]

Tasks:
• Implement browser-based visualization of current news based on the News Extraction Pipeline

Subtasks:
• Get familiar with the paper and existing code
• Decide on suitable graph visualization framework (vis.js?)

Languages / Tools:
• Java or Python, HTML, JavaScript, ...

Page 19

Given:
1. Existing code to track communities [1] over multiple snapshots
2. Time-Varying Graph Explorer

Tasks:
• Fix performance bottlenecks in existing code
• Port / Reimplement visualization in TVG Explorer

Subtasks:
• Measure performance bottlenecks
• Decide on suitable replacement algorithms

Languages / Tools:
• Python, HTML, JavaScript
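Measuring bottlenecks in Python code can be done with the standard-library profiler. The following is a minimal sketch, not the existing community-tracking code: `slow_overlap` and `fast_overlap` are hypothetical stand-ins for a bottleneck and its replacement, and `profile` is an illustrative helper built on `cProfile` and `pstats`.

```python
import cProfile
import io
import pstats


def slow_overlap(snapshot_a, snapshot_b):
    """Deliberately quadratic stand-in for a bottleneck: list membership tests."""
    return [node for node in snapshot_a if node in snapshot_b]


def fast_overlap(snapshot_a, snapshot_b):
    """Same result, using a set for constant-time membership tests."""
    members = set(snapshot_b)
    return [node for node in snapshot_a if node in members]


def profile(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


# Hypothetical node IDs of two graph snapshots.
a = list(range(0, 2000, 2))
b = list(range(0, 2000, 3))
slow_result, slow_report = profile(slow_overlap, a, b)
fast_result, fast_report = profile(fast_overlap, a, b)
print(slow_report)
print(fast_report)
```

Comparing the two reports shows where the time goes before and after the change, which is the kind of evidence needed when deciding on a replacement algorithm.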