Co-Evolving with the Open Source Eco-System | AnacondaCON 2017

Clover co-evolves with open source

Star of Bethlehem Orchid - 1862

Darwin Moth - 1903

Open Source

Cron job until it hurts you

The new data era…….tada!

Picking Airflow

There’s a multitude of reasons why complex pieces of software are not developed using drag and drop tools: it’s that ultimately code is the best abstraction there is for software... Code allows for arbitrary levels of abstractions, allows for all logical operation in a familiar way, integrates well with source control, is easy to version and to collaborate on…

The abstractions exposed by traditional ETL tools are off-target. Sure, there’s a need to abstract the complexity of data processing, computation and storage. But I would argue that the solution is not to expose ETL primitives (like source/target, aggregations, filtering) in a drag-and-drop fashion. The abstractions needed are of a higher level.

For example, one abstraction needed in a modern data environment is the configuration for the experiments in an A/B testing framework: what are all the experiments? What are the related treatments? What percentage of users should be exposed? What are the metrics that each experiment expects to affect? When does the experiment take effect?

classify:
  source_folders: ['SFTP2', 'SFTP_TMGUSER']
  classifier:
    regex:
      source: '^EFTO\.RH5141\.HCCMODD.*\.D(?P<date>\d{6})\.T(?P<time>\d{6})\d.*$'
      target: 'hccmodd_d\g<date>_t\g<time>.cbl'
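The classifier pairs a Python-style regex (named groups) with a `\g<name>` replacement template. A minimal sketch of how such a rule could be applied, assuming Python's `re` module does the matching; the `classify` function and the sample filename are hypothetical, not taken from the framework:

```python
import re
from typing import Optional

# Pattern and template copied from the classify spec above.
SOURCE_PATTERN = re.compile(
    r'^EFTO\.RH5141\.HCCMODD.*\.D(?P<date>\d{6})\.T(?P<time>\d{6})\d.*$')
TARGET_TEMPLATE = r'hccmodd_d\g<date>_t\g<time>.cbl'


def classify(filename: str) -> Optional[str]:
    """Hypothetical helper: return the normalized name for a matching file,
    or None when the file does not belong to this spec."""
    match = SOURCE_PATTERN.match(filename)
    if match is None:
        return None
    # Match.expand() fills the \g<date> and \g<time> references in the template.
    return match.expand(TARGET_TEMPLATE)


# Hypothetical incoming file name from one of the SFTP source folders.
print(classify('EFTO.RH5141.HCCMODD.H1234.D170315.T0930152'))
```

Renaming to a canonical pattern at this step is what lets the parse step below rely on a single `strptime` format for every file.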

parse:
  filename_strptime_format: 'hccmodd_d%y%m%d_t%H%M%S.cbl'
  parser:
    copybook:
      record_type: {start: 0, end: 1}
      records:
        - id: '1'
          name: header
          columns:
            - record_type: {start: 0, end: 1, type: string}
            - contract: {start: 1, end: 6, type: string}
            - run_date: {start: 6, end: 14, type: date, format: '%Y%m%d'}
            - payment_date: {start: 14, end: 20, type: date, format: '%Y%m'}
        - id: '3'
          name: trailer
          columns:
            - record_type: {start: 0, end: 1, type: string}
            - contract: {start: 1, end: 6, type: string}
            - record_count: {start: 6, end: 15, type: integer}
        - id: 'A'
          name: detail_record_a
          columns:
            - record_type: {start: 0, end: 1, type: string}
            - health_insurance_claim_account_number: {start: 1, end: 13, type: string}
            - beneficiary_last_name: {start: 13, end: 25, type: string}
            - beneficiary_first_name: {start: 25, end: 32, type: string}
            - beneficiary_initial: {start: 32, end: 33, type: string}
            - date_of_birth: {start: 33, end: 41, type: date, format: '%Y%m%d'}
            - sex: {start: 41, end: 42, type: enum, format: {'0': Unknown, '1': Male, '2': Female}}
            - social_security_number: {start: 42, end: 51, type: string}
            - age_group_female_00_34: {start: 51, end: 52, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_35_44: {start: 52, end: 53, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_45_54: {start: 53, end: 54, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_55_59: {start: 54, end: 55, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_60_64: {start: 55, end: 56, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_65_69: {start: 56, end: 57, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_70_74: {start: 57, end: 58, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_75_79: {start: 58, end: 59, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_80_84: {start: 59, end: 60, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_85_89: {start: 60, end: 61, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_90_94: {start: 61, end: 62, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_female_95_gt: {start: 62, end: 63, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_00_34: {start: 63, end: 64, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_35_44: {start: 64, end: 65, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_45_54: {start: 65, end: 66, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_55_59: {start: 66, end: 67, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_60_64: {start: 67, end: 68, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_65_69: {start: 68, end: 69, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_70_74: {start: 69, end: 70, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_75_79: {start: 70, end: 71, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_80_84: {start: 71, end: 72, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_85_89: {start: 72, end: 73, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_90_94: {start: 73, end: 74, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
            - age_group_male_95_gt: {start: 74, end: 75, type: boolean, format: {true_values: ['1'], false_values: ['0']}}
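The copybook spec describes a fixed-width file: the first character picks the record layout, and each column is a `[start, end)` slice converted by its declared type. A minimal sketch of that idea, assuming the offsets behave like Python slices; `RECORDS`, `convert` and `parse_line` are hypothetical names, and only the header and trailer layouts are transcribed here:

```python
from datetime import datetime

# Hypothetical in-memory form of part of the copybook spec above:
# record id -> (record name, [(column, start, end, type, format), ...]).
RECORD_TYPE = (0, 1)
RECORDS = {
    '1': ('header', [
        ('record_type', 0, 1, 'string', None),
        ('contract', 1, 6, 'string', None),
        ('run_date', 6, 14, 'date', '%Y%m%d'),
        ('payment_date', 14, 20, 'date', '%Y%m'),
    ]),
    '3': ('trailer', [
        ('record_type', 0, 1, 'string', None),
        ('contract', 1, 6, 'string', None),
        ('record_count', 6, 15, 'integer', None),
    ]),
}


def convert(raw, col_type, fmt):
    """Convert one sliced field according to its declared column type."""
    if col_type == 'string':
        return raw.strip()
    if col_type == 'integer':
        return int(raw)
    if col_type == 'date':
        return datetime.strptime(raw, fmt).date()
    if col_type == 'enum':
        return fmt[raw]
    if col_type == 'boolean':
        if raw in fmt['true_values']:
            return True
        if raw in fmt['false_values']:
            return False
        raise ValueError(f'unexpected boolean value: {raw!r}')
    raise ValueError(f'unknown column type: {col_type}')


def parse_line(line):
    """Choose the layout from the record_type slice, then slice every column."""
    start, end = RECORD_TYPE
    name, columns = RECORDS[line[start:end]]
    return name, {col: convert(line[s:e], t, f) for col, s, e, t, f in columns}


print(parse_line('1H123420170315201703'))
```

Note that a `'%Y%m'` date such as `payment_date` has no day component, so `strptime` resolves it to the first of the month.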

Ingest

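# Airflow 1.x-style import. The _classify_task, _parse_task, _pg_load_task and
# _catalog_task helpers and the CLASSIFIED_BUCKET / PARSED_BUCKET constants are
# defined elsewhere in the ingest framework and are not shown on the slide.
from airflow import operators


def _single_spec_tasks(dag, spec, upstream, pg_schema_task):
    """Wire classify -> parse -> load/catalog tasks for one spec and return
    a task that completes once all of them have finished."""
    classify_task = _classify_task(dag, spec)
    classify_task.set_upstream(upstream)

    classify_catalog_task = _catalog_task(
        dag, CLASSIFIED_BUCKET, spec.name)
    classify_catalog_task.set_upstream(classify_task)

    parse_task = _parse_task(dag, spec)
    parse_task.set_upstream(classify_task)

    # Loading needs both the parsed output and the Postgres schema.
    pg_load_task = _pg_load_task(dag, spec)
    pg_load_task.set_upstream([pg_schema_task, parse_task])

    parse_catalog_task = _catalog_task(
        dag, PARSED_BUCKET, spec.name)
    parse_catalog_task.set_upstream(parse_task)

    # No-op join point so callers can depend on a single task per spec.
    finished_task = operators.DummyOperator(
        task_id='finished_{}'.format(spec.name), dag=dag)
    finished_task.set_upstream([
        classify_catalog_task, parse_catalog_task, pg_load_task])

    return finished_task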
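In Airflow, `a.set_upstream(b)` means `b` must finish before `a` runs. To see the execution order those calls imply without an Airflow install, here is a sketch that records the same edges as a plain dict and orders them with the standard library's `graphlib` (Python 3.9+). Apart from the `finished_{}` pattern, the task ids are hypothetical stand-ins, and `upstream` / `pg_schema` represent the tasks passed in as arguments:

```python
from graphlib import TopologicalSorter


def single_spec_edges(spec_name):
    """Return {task_id: set of upstream task_ids}, mirroring the
    set_upstream() calls in _single_spec_tasks."""
    classify = f'classify_{spec_name}'
    classify_catalog = f'classify_catalog_{spec_name}'
    parse = f'parse_{spec_name}'
    pg_load = f'pg_load_{spec_name}'
    parse_catalog = f'parse_catalog_{spec_name}'
    finished = f'finished_{spec_name}'
    return {
        classify: {'upstream'},
        classify_catalog: {classify},
        parse: {classify},
        pg_load: {'pg_schema', parse},
        parse_catalog: {parse},
        finished: {classify_catalog, parse_catalog, pg_load},
    }


# static_order() yields each task only after all of its upstream tasks.
order = list(TopologicalSorter(single_spec_edges('hccmodd')).static_order())
print(order)
```

Because `finished_hccmodd` depends, directly or transitively, on every other task, it always comes last, which is why returning it gives callers one handle per spec.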

File exports

database: dwh_db

source:
  sql:
    file: ../populate_grievances.sql
    parameters:
      quarter_start_date: '2016-04-01'
      medicare_part: part_c

validation:
  queries:
    - validate_required_fields: {file: ../validate_required_fields.sql}

write:
  filename:
    value: 'CLOVER_GRIEVANCES_PART_C_Q2_2016.TXT'
  writer:
    csv:
      header: false
      delimiter: "\t"
      newline: "\n"
      columns:
        - contract_number: {type: string, validators: [len: {operator: '==', value: 5}]}
        - tot_griev_tot_num: {type: integer, max_length: 12}
        - tot_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - num_expedited_griev_tot_num: {type: integer, max_length: 12}
        - num_expedited_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - enrollment_disenrollment_griev_tot_num: {type: integer, max_length: 12}
        - enrollment_disenrollment_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - plan_bene_griev_tot_num: {type: integer, max_length: 12}
        - plan_bene_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - access_griev_tot_num: {type: integer, max_length: 12}
        - access_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - marketing_griev_tot_num: {type: integer, max_length: 12}
        - marketing_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - customer_serv_griev_tot_num: {type: integer, max_length: 12}
        - customer_serv_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - org_determ_griev_tot_num: {type: integer, max_length: 12}
        - org_determ_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - quality_care_griev_tot_num: {type: integer, max_length: 12}
        - quality_care_griev_timely_notice_given_num: {type: integer, max_length: 12}
        - cms_issue_griev_tot_num: {type: integer, max_length: 12}
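The `write` section declares a headerless, tab-delimited, `\n`-terminated file with per-column checks. A minimal sketch of how such a writer might enforce those checks with Python's `csv` module; `COLUMNS`, `validate_row` and `write_export` are hypothetical, only three columns are transcribed, and mapping `newline` to `csv.writer`'s `lineterminator` is an assumption about the framework:

```python
import csv
import io

# Hypothetical subset of the export spec above: (column, type, rules).
COLUMNS = [
    ('contract_number', 'string', {'len_equals': 5}),
    ('tot_griev_tot_num', 'integer', {'max_length': 12}),
    ('tot_griev_timely_notice_given_num', 'integer', {'max_length': 12}),
]


def validate_row(row):
    """Check one row against COLUMNS and return its fields as strings."""
    fields = []
    for name, col_type, rules in COLUMNS:
        value = row[name]
        if col_type == 'integer' and not isinstance(value, int):
            raise ValueError(f'{name}: expected an integer, got {value!r}')
        text = str(value)
        if 'len_equals' in rules and len(text) != rules['len_equals']:
            raise ValueError(f'{name}: length must be {rules["len_equals"]}')
        if 'max_length' in rules and len(text) > rules['max_length']:
            raise ValueError(f'{name}: longer than {rules["max_length"]} chars')
        fields.append(text)
    return fields


def write_export(rows, out):
    """Write tab-delimited, newline-terminated rows with no header row."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in rows:
        writer.writerow(validate_row(row))


buf = io.StringIO()
write_export([{'contract_number': 'H1234', 'tot_griev_tot_num': 10,
               'tot_griev_timely_notice_given_num': 9}], buf)
print(repr(buf.getvalue()))
```

Validating each row before it is written means a bad value aborts the export instead of producing a file that a regulator's fixed rules would reject.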

Campaigns

name: [REDACTED] Screening
uuid: [REDACTED]

splits:
  - name: Holdout
    description: Members that should not show up in the list
    allocation: 2
    control: true
  - name: Active
    description: Members that we're trying to call
    allocation: 8
spreadsheet:
  id: [REDACTED]
  write_to: Member Info
  read_from: State

timeline:
  start: [REDACTED]
  ops_end: [REDACTED]
  data_end: [REDACTED]

queries:
  eligibility:
    file: eligibility.sql
  success:
    file: success.sql
  reference:
    file: reference.sql
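The `splits` block allocates members between a Holdout control group and an Active group. The slide does not say how members are assigned; one common approach, shown here as a sketch under that assumption, treats `allocation` as a relative weight (2 of 10 in Holdout) and hashes the campaign and member ids so each member always lands in the same split. `assign_split` and the `'example-campaign'` id are hypothetical:

```python
import hashlib

# Splits from the campaign spec above; allocation read as a relative weight.
SPLITS = [('Holdout', 2), ('Active', 8)]


def assign_split(campaign_uuid, member_id, splits=SPLITS):
    """Deterministically place a member in one split, weighted by allocation."""
    total = sum(weight for _, weight in splits)
    key = f'{campaign_uuid}:{member_id}'.encode('utf-8')
    # Hashing (campaign, member) yields a stable, roughly uniform bucket.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % total
    for name, weight in splits:
        if bucket < weight:
            return name
        bucket -= weight
    raise AssertionError('unreachable: bucket is always < total')


counts = {'Holdout': 0, 'Active': 0}
for member_id in range(10_000):
    counts[assign_split('example-campaign', member_id)] += 1
print(counts)
```

Keying the hash on the campaign id keeps assignments stable across reruns of the same campaign while letting a new campaign reshuffle members.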

1. Custom code (high technical difficulty)
2. Iterate (moderate technical difficulty)
3. If not <understand problem>: goto 2
4. Abstract problem to declarative specification (high technical difficulty)
5. Make a new specification (low technical difficulty)
6. If not <solved healthcare>: goto 5

Pipeline development flow

Side effect

The Kingpin of corporate software

Notebooks to the rescue

Open Source

• SQLAlchemy Temporal

• Ingest Framework

• CLI Tool for Airflow

https://github.com/CloverHealth/temporal-sqlalchemy

Two universes vs

Do we make data accessible by moving the data closer to the humans, or the humans closer to the data? Moving people toward the data has a few positive externalities, including the organization-wide ability to create faster, more programmatic output. If everyone across the company is writing little programs to do more work faster (and more consistently), we’re making good on the premise of Clover as a business that leverages technology across the org. ~ Clare Corthell
