Clean code in Jupyter notebooks

Transcript of Clean code in Jupyter notebooks

Page 1: Clean code in Jupyter notebooks

@KNerush @Volodymyrk

Clean Code
In Jupyter notebooks, using Python

5th of July, 2016

Page 2: Clean code in Jupyter notebooks

Volodymyr (Vlad) Kazantsev

Head of Data @ product madness

Product Manager

MBA @LBS

Graphics programming

Writes code for money since 2002

Math degree

Kateryna (Katya) Nerush

Mobile Dev @ Octopus Labs

Dev Lead in Finance

Data Engineer

Web Developer

Writes code for money since 2003

CS degree

Page 3: Clean code in Jupyter notebooks

Why do we end up with messy IPython notebooks?

[Venn diagram: Coding, Stats, Business]

Page 4: Clean code in Jupyter notebooks

Who are Data Scientists, really?

[Venn diagram: Coding, Stats, Business]

“In a nutshell, coding is telling a computer to do something using a language it understands.”

Data Science with Python

Page 5: Clean code in Jupyter notebooks

It is not going to production anyway!

Page 6: Clean code in Jupyter notebooks

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Martin Fowler, 1999

WTF! How am I supposed to validate this?

Sorry, but how can I calculate 7-day retention?

Page 7: Clean code in Jupyter notebooks

From Prototype to ... The Data Science Spiral

Ideas & Questions

Data Analysis

Insights

Impact

Page 8: Clean code in Jupyter notebooks

You do it for your own good...

Re-run all A/B test analyses for the past months, by tomorrow

Ideas & Questions

Data Analysis

Insights

Impact

Page 9: Clean code in Jupyter notebooks

Part 2
What can Data Scientists learn from Software Engineers?

Page 10: Clean code in Jupyter notebooks

Robert C. Martin, a.k.a. “Uncle Bob”

https://cleancoders.com/

Page 11: Clean code in Jupyter notebooks

“Clean Code”?

Pleasingly graceful and stylish in appearance or manner
- Bjarne Stroustrup, inventor of C++

Clean code reads like well-written prose
- Grady Booch, creator of UML

.. each routine turns out to be pretty much what you expected
- Ward Cunningham, inventor of Wiki and XP

Page 12: Clean code in Jupyter notebooks

One does not simply start writing clean code..

First make it work, then make it right, then make it fast and small
- Kent Beck, co-inventor of XP and TDD

Leave the campground cleaner than you found it
- The Boy Scouts of America, applied to programming by Uncle Bob

Ron Jeffries, author of Extreme Programming Installed:

- Runs all the tests

- Contains no duplicate code

- Expresses all ideas...

- Minimizes classes and methods

Page 13: Clean code in Jupyter notebooks

I'm not a great programmer; I'm just a good programmer with great habits.
- Kent Beck

Page 14: Clean code in Jupyter notebooks

“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton

long_descriptive_names
Avoid: x, i, stuff, do_blah()

Pronounceable and searchable
revenue_per_payer vs. arpdpu

Avoid encodings, abbreviations, prefixes, suffixes.. if possible
bonus_points_on_iphone vs. cns_crm_dip

Add meaningful context
daily_revenue_per_payer

Don’t be lazy. Spend time naming and renaming things.
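
A small sketch of these naming rules in Python. The payment records and both function bodies are made up for illustration; the two functions compute the same thing, and only the names differ.

```python
# Hypothetical data: (user_id, revenue) pairs for one day.
payments = [("u1", 4.99), ("u2", 0.0), ("u3", 9.99), ("u1", 1.99)]


# Hard to read: what are x, s, p, and arpdpu?
def arpdpu(x):
    s = {}
    for i in x:
        s[i[0]] = s.get(i[0], 0) + i[1]
    p = [v for v in s.values() if v > 0]
    return sum(p) / len(p)


# Easier to read: names are pronounceable, searchable, and carry context.
def daily_revenue_per_payer(daily_payments):
    revenue_by_user = {}
    for user_id, revenue in daily_payments:
        revenue_by_user[user_id] = revenue_by_user.get(user_id, 0) + revenue
    payer_revenues = [r for r in revenue_by_user.values() if r > 0]
    return sum(payer_revenues) / len(payer_revenues)
```

Both versions assume at least one paying user; an empty day would need its own handling.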

Page 15: Clean code in Jupyter notebooks

“each routine turns out to be pretty much what you expected” - Ward Cunningham

Small

Do one thing

One Level of Abstraction

Have only a few arguments (one is best)

This matters less in Python, thanks to named (keyword) arguments.
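
One way these rules might look in practice, as a minimal sketch. All function names are invented for the example; each function does one thing, and the keyword-only arguments keep a multi-argument call readable.

```python
def parse_amounts(raw_values):
    """Convert raw strings to floats, skipping blank entries."""
    return [float(value) for value in raw_values if value.strip()]


def total_revenue(amounts):
    """Sum the amounts; nothing else."""
    return sum(amounts)


def format_report(total, *, currency="USD", decimals=2):
    """Render the total; keyword-only arguments are named at the call site."""
    return f"Total revenue: {total:.{decimals}f} {currency}"


amounts = parse_amounts(["10.5", " ", "4.5"])
report = format_report(total_revenue(amounts), currency="GBP", decimals=1)
print(report)  # Total revenue: 15.0 GBP
```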

Page 16: Clean code in Jupyter notebooks

Use good names

Avoid obvious comments.

Dead Commented-out Code

ToDo, licenses, history, markup for documentation and other nonsense

But there are exceptions..

“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
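
A hedged illustration of that advice in Python. The loyalty rule (365 days, 10% discount) and all names are invented for the example.

```python
from datetime import date


# Before: the comment has to explain what the condition means.
def discount_before(signup_date, today):
    # user has been with us for at least a year
    if (today - signup_date).days >= 365:
        return 0.10
    return 0.0


# After: the function name says it, so the comment becomes superfluous.
def is_loyal_customer(signup_date, today):
    return (today - signup_date).days >= 365


def discount_after(signup_date, today):
    return 0.10 if is_loyal_customer(signup_date, today) else 0.0
```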

Page 17: Clean code in Jupyter notebooks

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

Page 18: Clean code in Jupyter notebooks

// sometimes I believe compiler ignores all my comments

Page 19: Clean code in Jupyter notebooks

/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

Page 20: Clean code in Jupyter notebooks

“Long functions are where classes are trying to hide” - Robert C. Martin

Small

Do one thing

SOLID, Design Patterns, etc.

Page 21: Clean code in Jupyter notebooks

Code conventions

A team should produce code in one style, as if it were written by one person

Team conventions take precedence over language conventions, which take precedence over personal ones

Automate style formatting

Page 22: Clean code in Jupyter notebooks

Part 3
How to write Clean Code in Python?

(i.e. this is not Java)

Page 23: Clean code in Jupyter notebooks

PEP 8 -- Style Guide for Python Code

https://www.python.org/dev/peps/pep-0008/

● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions

Example:

Good:

    foo = long_function_name(var_one, var_two,
                             var_three, var_four)

Bad:

    foo = long_function_name(var_one, var_two,
        var_three, var_four)

Page 24: Clean code in Jupyter notebooks

Google Python Style Guide

https://google.github.io/styleguide/pyguide.html

Page 25: Clean code in Jupyter notebooks

My favourite!

This is not Java or C++

Functions are first-class objects

Duck-typing as an interface

No setters/getters

Itertools, zip, enumerate

etc.
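
A short sketch of the idioms listed above. The classes and data are invented for the example.

```python
from itertools import chain


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


class Account:
    """Read access looks like a plain attribute; no get_balance() call needed."""

    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):
        return self._balance


def apply_to_all(func, items):
    # Functions are first-class objects: func is passed like any other value.
    return [func(item) for item in items]


# Duck typing: anything with a speak() method works, no shared base class.
sounds = apply_to_all(lambda thing: thing.speak(), [Duck(), Robot()])

# zip and enumerate instead of index arithmetic.
names = ["alice", "bob"]
scores = [90, 85]
ranked = [f"{position}. {name}: {score}"
          for position, (name, score) in enumerate(zip(names, scores), start=1)]

# itertools.chain walks several iterables as if they were one.
all_scores = list(chain(scores, [70]))
```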

Page 26: Clean code in Jupyter notebooks

Part 4
How to write Clean Python Code in Jupyter Notebook?

Page 27: Clean code in Jupyter notebooks

Typical structure of the ipynb

1. Imports

2. Get Data

3. Transform Data

4. Modelling

5. Visualisation

6. Making sense of the data

Page 28: Clean code in Jupyter notebooks

How big should a notebook file be?

Page 29: Clean code in Jupyter notebooks

How big should a notebook file be?

Hypothesis - Data - Interpretation

Page 30: Clean code in Jupyter notebooks

Keep your notebooks small!

(4-10 cells each)

Page 31: Clean code in Jupyter notebooks

Tip 1: break a fat notebook into many small ones

Example:

1_data_preparation.ipynb

    df.to_pickle('clean_data_1.pkl')

2_linear_model.py

    df = pd.read_pickle('clean_data_1.pkl')

3_ensemble.py

    df = pd.read_pickle('clean_data_1.pkl')
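
The hand-off pattern itself, sketched with only the standard library so it runs without pandas. With a DataFrame, `df.to_pickle` and `pd.read_pickle` play the same two roles. The row contents and the temporary directory are assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the directory the notebooks share.
work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

# --- last cell of the data preparation notebook ---
clean_rows = [{"user_id": 1, "revenue": 4.99}, {"user_id": 2, "revenue": 0.0}]
with open(clean_data_path, "wb") as handle:
    pickle.dump(clean_rows, handle)

# --- first cell of a downstream notebook ---
with open(clean_data_path, "rb") as handle:
    rows = pickle.load(handle)
```

Each downstream notebook then starts from the same cleaned data instead of repeating the preparation steps.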

Page 32: Clean code in Jupyter notebooks

Tip 2: shared library

Data access

Common plotting functionality

Report generation

Misc. utils

acme_data_utils/
    Data_access.py
    plotting.py
    setup.py
    tests/

Page 33: Clean code in Jupyter notebooks

Tip 3: Don’t just be Pythonic. Be IPythonic

Don’t hide the “secret sauce” inside an imported module

BAD:

Good:
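
One possible reading of this tip, as a hypothetical sketch: the shared library handles plumbing such as loading data, while the calculation the analysis is actually about stays visible in the notebook cell. `load_daily_payments` and `run_full_analysis` are invented names, not part of any real package.

```python
# Stand-ins for functions that would live in a shared library (hypothetical).
def load_daily_payments():
    """Plumbing: fine to import, the reader does not need the details."""
    return [("u1", 4.99), ("u2", 0.0), ("u3", 9.99)]


def run_full_analysis():
    """Hides the interesting logic: the notebook reader only sees a number."""
    payments = load_daily_payments()
    payers = [amount for _, amount in payments if amount > 0]
    return len(payers) / len(payments)


# BAD notebook cell: the "secret sauce" is out of sight.
payer_share_hidden = run_full_analysis()

# Good notebook cell: imported plumbing, visible analysis.
payments = load_daily_payments()
payers = [amount for _, amount in payments if amount > 0]
payer_share_visible = len(payers) / len(payments)
```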

Page 34: Clean code in Jupyter notebooks

Clean code reads like well-written prose
- Grady Booch

Page 35: Clean code in Jupyter notebooks

A good Jupyter notebook reads like well-written prose

Page 36: Clean code in Jupyter notebooks

How big should one Cell be?

Page 37: Clean code in Jupyter notebooks

Tip 4: each cell should have one logical output

One “idea - execution - output” triplet per cell

Import Cell: expected output is no import errors

CMD+SHIFT+P

Page 38: Clean code in Jupyter notebooks

Tip 5: write tests .. in jupyter notebooks

https://pypi.python.org/pypi/pytest-ipynb
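
Independently of any plugin, a test cell can be as simple as a few assert statements placed right after the function they check. A minimal sketch; the function and its retention definition are assumptions made for the example.

```python
def seven_day_retention(installs, active_on_day_7):
    """Share of installed users who were active on day 7 (one possible definition)."""
    installed = set(installs)
    if not installed:
        return 0.0
    return len(installed & set(active_on_day_7)) / len(installed)


# Test cell: silent when everything passes, loud when something breaks.
assert seven_day_retention(["a", "b", "c", "d"], ["a", "c"]) == 0.5
assert seven_day_retention([], ["a"]) == 0.0
assert seven_day_retention(["a"], ["b"]) == 0.0
print("all retention tests passed")
```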

Page 39: Clean code in Jupyter notebooks

Tip 6: ..to the cloud

Page 40: Clean code in Jupyter notebooks

Code Smells .. in ipynb

- Cells can’t be executed in order (with Run All and Restart & Run All)

- Prototype (check ideas) code is mixed with “analysis” code

- Debugging cells

- Copy-paste cells

- Duplicate code (in general)

- Multiple notebooks that re-implement the same function

Page 41: Clean code in Jupyter notebooks

Tip 7: Run notebook from another notebook!

analysis.ipynb
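
Inside IPython, `%run analysis.ipynb` is one way to do this. To show roughly what that amounts to without IPython, the sketch below reads an .ipynb file (which is JSON) and executes its code cells, in order, in one shared namespace. The helper name `run_notebook_code` and the demo notebook are made up, and the sketch simply skips magics and shell lines, which plain Python cannot run.

```python
import json
import tempfile
from pathlib import Path


def run_notebook_code(notebook_path, namespace=None):
    """Execute the code cells of an .ipynb file, in order, in one namespace."""
    namespace = {} if namespace is None else namespace
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source is often stored as a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes; plain exec() cannot run them.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        exec("\n".join(lines), namespace)
    return namespace


# Build a tiny stand-in for analysis.ipynb so the sketch is self-contained.
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["%matplotlib inline\n", "revenue = [10, 20, 30]\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["total_revenue = sum(revenue)"]},
    ],
}
notebook_path = Path(tempfile.mkdtemp()) / "analysis.ipynb"
notebook_path.write_text(json.dumps(demo_notebook), encoding="utf-8")

results = run_notebook_code(notebook_path)
print(results["total_revenue"])  # 60
```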

Page 42: Clean code in Jupyter notebooks

Make a Data Product from notebooks!

Page 43: Clean code in Jupyter notebooks

Summary: How to organise a Jupyter project

1. A notebook should have one Hypothesis-Data-Interpretation loop

2. Make a multi-project utils library

3. A good Jupyter notebook reads like well-written prose

4. Each cell should have one and only one output

5. Write tests in notebooks

6. Deploy a shared Jupyter server

7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
