The Process
1. Productize
- Compelling data products
- Innovation pipeline
2. Ruggedize
- Toolchain: RStudio, devtools, GitHub, Travis CI, Docker
- Strong testing
- Production-ready architecture
3. Assimilate
- Command line tools
- Make it into HTTP APIs
- Make it into Docker containers
Step 1: Productize
Internal Products:
- Ad-hoc Analyses
- Internal Dashboards
- Automated reports
- Rapid Prototyping
External Products:
- End-user data products
- Backend services
Step 2: Ruggedize
1. Create reproducible architecture
2. Set up strong testing & CI
3. Separate Production and Dev
4. Set up monitoring & reporting
Case Study: HB Architecture
- RStudio
- Containerized Architecture
- Continuous Integration
- Multiple Environments
- Notifications/Monitoring
Data Architecture
elasticsearch:
  image: elasticsearch
shiny-server:
  image: shiny
  ports:
    - "443:443"
  links:
    - elasticsearch
etl:
  image: etl
  volumes:
    - .:/data
etl-data:
  image: etl-data
[Figure: architecture diagram with components ETL, Shiny Server, Elastic, ETL data, rAPI, SQL, S3, Web]
Docker Compose Containers
[Figure: Docker Compose + Containers = RStudio Server]
Environments
Production
- www.dataproduct.com
- internal-dashboards.com
Staging
- staging-www.dataproduct.com
- staging-internal-dashboards.com
[Figure: each environment runs its own copy of the stack: ETL, Shiny Server, Elastic, data volume, SQL, S3]
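One way to keep Production and Staging apart on Docker Compose is to give each environment its own project name and an override file. The sketch below only prints the command for a given environment; the override file names (docker-compose.staging.yml, docker-compose.production.yml) are hypothetical and not taken from the slides.

```shell
#!/bin/sh
# Print (not run) the docker-compose command that starts one environment.
# Assumption: a shared docker-compose.yml plus one hypothetical override
# file per environment, e.g. docker-compose.staging.yml.
compose_cmd() {
  target="$1"  # environment name: staging or production
  echo "docker-compose -p ${target} -f docker-compose.yml -f docker-compose.${target}.yml up -d"
}

compose_cmd staging
compose_cmd production
```

Piping the output to sh would execute the commands; printing first makes it easy to review what will run against which environment.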
Continuous Integration
[Figure: commit to GitHub -> Travis CI -> Success! -> latest-stable tag]
- Production: pull latest-stable
- Staging: pull latest-stable
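The latest-stable flow can be sketched as two Docker commands run after a successful build. This is a minimal illustration, not the CI configuration from the talk; the registry host, image name, and build tag are hypothetical.

```shell
#!/bin/sh
# Print (not run) the Docker commands that promote a tested build
# to the latest-stable tag. Pipe the output to sh to execute it.
promote_cmds() {
  image="$1"      # hypothetical image, e.g. registry.example.com/etl
  build_tag="$2"  # hypothetical tag of the build that passed CI
  echo "docker tag ${image}:${build_tag} ${image}:latest-stable"
  echo "docker push ${image}:latest-stable"
}

promote_cmds registry.example.com/etl build-42
```

An environment would then pick up the promoted image with `docker pull registry.example.com/etl:latest-stable`.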
Docker Registry/Rolling Back
- Changes Deployed to Prod: Save Versioned Image (ETL, data volume) to Docker Registry
- Danger! Need to Rollback!: Load Older Image (ETL, data volume) from Docker Registry
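Saving a versioned image and rolling back to an older one could look roughly like the following. The registry host and version strings are hypothetical, and the sketch assumes the Compose service and its image share one name (as with `etl` in the compose file above).

```shell
#!/bin/sh
# Print (not run) the commands for saving a versioned image and for
# rolling back. Arguments: image name, registry host, version.
# All three values are hypothetical placeholders.
save_cmds() {
  echo "docker tag $1 $2/$1:$3"
  echo "docker push $2/$1:$3"
}

rollback_cmds() {
  echo "docker pull $2/$1:$3"
  # Retag the older image under the name the compose file refers to.
  echo "docker tag $2/$1:$3 $1"
  # Recreate the service so it runs the retagged image.
  echo "docker-compose up -d $1"
}

save_cmds etl registry.example.com 1.4.0
rollback_cmds etl registry.example.com 1.3.0
```

Keeping every deployed version in the registry is what makes the rollback a pull and a retag rather than a rebuild.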
Assimilate (contd)
- HTTP APIs
  - OpenCPU, rapier
- Docker containers
  - Rocker
- Command line tools
  - Rscript, littler, docopt
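As a rough illustration of the command line route, the snippet below writes a small executable R script with an Rscript shebang. The file name summarize.R and its behavior are hypothetical, and it uses base R's commandArgs rather than docopt to avoid an extra dependency.

```shell
#!/bin/sh
# Create a hypothetical command line tool backed by R.
cat > summarize.R <<'EOF'
#!/usr/bin/env Rscript
# Print the mean of the numbers passed as command line arguments.
args <- commandArgs(trailingOnly = TRUE)
cat(mean(as.numeric(args)), "\n")
EOF
chmod +x summarize.R

# Usage (requires R to be installed):
#   ./summarize.R 1 2 3
```

With docopt, the same script would declare its arguments in a usage string and parse them from there instead of reading commandArgs directly.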
Interviewing
● What we want
  ○ Problem solving ability
  ○ Typical question: how would you approach a problem we are currently working on?
● What others want
  ○ It depends! :)
  ○ Orgs with DS will ask you the standard DS questions
Interviewing (contd)
Orgs without DS will ask you about
1. Search / recommendations
2. SQL
3. Data engineering (how would you use Hadoop to …?)
4. Software development
Orgs without DS will evaluate you as a
1. Subject matter expert (search/recs)
2. Software engineer
3. DBA / SQL analyst
4. Product manager
Be prepared! (study ‘Cracking the Interview’ or similar)