p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d...

32
Count plots and bar plots INTRODUCTION TO SEABORN Erin Case Data Scientist

Transcript of p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d...

Page 1: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Count plots and barplots

I N T R O D U C T I O N TO S E A B O R N

Erin CaseData Scientist

Page 2: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Categorical plotsExamples: count plots, bar plots

Involve a categorical variable

Comparisons between groups

Page 3: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

catplot()Used to create categorical plots

Same advantages of relplot()

Easily create subplots with col= and row=

Page 4: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

countplot() vs. catplot()

import matplotlib.pyplot as plt

import seaborn as sns

sns.countplot(x="how_masculine",

data=masculinity_data)

plt.show()

Page 5: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

countplot() vs. catplot()

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="how_masculine",

data=masculinity_data,

kind="count")

plt.show()

Page 6: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Changing the orderimport matplotlib.pyplot as plt

import seaborn as sns category_order = ["No answer",

"Not at all",

"Not very",

"Somewhat",

"Very"] sns.catplot(x="how_masculine",

data=masculinity_data,

kind="count",

order=category_order) plt.show()

Page 7: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Bar plotsDisplays mean of quantitative variable per

category

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="day",

y="total_bill",

data=tips,

kind="bar")

plt.show()

Page 8: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Con�dence intervalsLines show 95% con�dence intervals for the

mean

Shows uncertainty about our estimate

Assumes our data is a random sample

Page 9: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Turning off con�dence intervals

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="day",

y="total_bill",

data=tips,

kind="bar",

ci=None)

plt.show()

Page 10: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Changing the orientation

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="total_bill",

y="day",

data=tips,

kind="bar")

plt.show()

Page 11: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Let's practice!I N T R O D U C T I O N TO S E A B O R N

Page 12: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Creating a box plotI N T R O D U C T I O N TO S E A B O R N

Erin CaseData Scientist

Page 13: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

What is a box plot?Shows the distribution of quantitative data

See median, spread, skewness, and outliers

Facilitates comparisons between groups

Page 14: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

How to create a box plot

import matplotlib.pyplot as plt

import seaborn as sns

g = sns.catplot(x="time",

y="total_bill",

data=tips,

kind="box")

plt.show()

Page 15: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Change the order of categories

import matplotlib.pyplot as plt

import seaborn as sns

g = sns.catplot(x="time",

y="total_bill",

data=tips,

kind="box",

order=["Dinner",

"Lunch"])

plt.show()

Page 16: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Omitting the outliers using `sym`

import matplotlib.pyplot as plt

import seaborn as sns

g = sns.catplot(x="time",

y="total_bill",

data=tips,

kind="box",

sym="")

plt.show()

Page 17: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Changing the whiskers using `whis`By default, the whiskers extend to 1.5 * the interquartile range

Make them extend to 2.0 * IQR: whis=2.0

Show the 5th and 95th percentiles: whis=[5, 95]

Show min and max values: whis=[0, 100]

Page 18: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Changing the whiskers using `whis`

import matplotlib.pyplot as plt

import seaborn as sns

g = sns.catplot(x="time",

y="total_bill",

data=tips,

kind="box",

whis=[0, 100])

plt.show()

Page 19: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Let's practice!I N T R O D U C T I O N TO S E A B O R N

Page 20: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Point plotsI N T R O D U C T I O N TO S E A B O R N

Erin CaseData Scientist

Page 21: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

What are point plots?Points show mean of quantitative variable

Vertical lines show 95% con�dence intervals

Page 22: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Line plot: average level of nitrogen dioxide over

time

Point plot: average restaurant bill, smokers vs.

non-smokers

Page 23: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Point plots vs. line plotsBoth show:

Mean of quantitative variable

95% con�dence intervals for the mean

Differences:

Line plot has quantitative variable (usually time) on x-axis

Point plot has categorical variable on x-axis

Page 24: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Point plots vs. bar plotsBoth show:

Mean of quantitative variable

95% con�dence intervals for the mean

Page 25: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Point plots vs. bar plots

Page 26: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Creating a point plot

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="age",

y="masculinity_important",

data=masculinity_data,

hue="feel_masculine",

kind="point")

plt.show()

Page 27: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Disconnecting the points

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="age",

y="masculinity_important",

data=masculinity_data,

hue="feel_masculine",

kind="point",

join=False)

plt.show()

Page 28: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Displaying the median

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="smoker",

y="total_bill",

data=tips,

kind="point")

plt.show()

Page 29: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Displaying the median

import matplotlib.pyplot as plt

import seaborn as sns

from numpy import median

sns.catplot(x="smoker",

y="total_bill",

data=tips,

kind="point",

estimator=median)

plt.show()

Page 30: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Customizing the con�dence intervals

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="smoker",

y="total_bill",

data=tips,

kind="point",

capsize=0.2)

plt.show()

Page 31: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

INTRODUCTION TO SEABORN

Turning off con�dence intervals

import matplotlib.pyplot as plt

import seaborn as sns

sns.catplot(x="smoker",

y="total_bill",

data=tips,

kind="point",

ci=None)

plt.show()

Page 32: p l o ts C o u n t p l o ts a n d b a r · 2019-07-01 · By d efau lt , t h e wh is kers ext en d t o 1.5 * t h e in t erq u art ile ran ge M ake t h em ext en d t o 2.0 * I Q R:

Let's practice!I N T R O D U C T I O N TO S E A B O R N