JOINING DATA IN R WITH DPLYR

Binds

● rbind()

● cbind()

● bind_rows()

● bind_cols()

bind_rows()

Tables to combine:

    > band1
        name   surname
    1   John    Lennon
    2   Paul McCartney
    3 George  Harrison
    4  Ringo     Starr

    > band2
         name  surname
    1    Mick   Jagger
    2   Keith Richards
    3 Charlie    Watts
    4  Ronnie     Wood

Result:

    > bind_rows(band1, band2)
         name   surname
    1    John    Lennon
    2    Paul McCartney
    3  George  Harrison
    4   Ringo     Starr
    5    Mick    Jagger
    6   Keith  Richards
    7 Charlie     Watts
    8  Ronnie      Wood
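The bind_rows() slide shows only printed output. A minimal sketch for reproducing it follows, assuming dplyr is installed. Building band1 and band2 with data.frame() is an assumption made for illustration; the slides do not show how the tables were created.

```r
library(dplyr)

# Hypothetical construction of the two tables shown on the slide.
band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
band2 <- data.frame(
  name = c("Mick", "Keith", "Charlie", "Ronnie"),
  surname = c("Jagger", "Richards", "Watts", "Wood"),
  stringsAsFactors = FALSE
)

# Stack band2 underneath band1; columns are matched by name.
bands <- bind_rows(band1, band2)
print(bands)
```

Both tables share the same column names, so the result has eight rows and the original two columns.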

bind_cols()

    > band1
        name   surname
    1   John    Lennon
    2   Paul McCartney
    3 George  Harrison
    4  Ringo     Starr

    > plays1
      instrument born
    1     Guitar 1940
    2       Bass 1942
    3     Guitar 1943
    4      Drums 1940

    > bind_cols(band1, plays1)
        name   surname instrument born
    1   John    Lennon     Guitar 1940
    2   Paul McCartney       Bass 1942
    3 George  Harrison     Guitar 1943
    4  Ringo     Starr      Drums 1940
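A minimal sketch of the bind_cols() call above, assuming dplyr is installed. As before, the data.frame() construction of band1 and plays1 is an assumption for illustration. Note that bind_cols() pairs rows purely by position, so both tables must already be in the same row order.

```r
library(dplyr)

# Hypothetical construction of the tables shown on the slide.
band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
plays1 <- data.frame(
  instrument = c("Guitar", "Bass", "Guitar", "Drums"),
  born = c(1940, 1942, 1943, 1940),
  stringsAsFactors = FALSE
)

# Place the columns of plays1 to the right of band1, row by row.
band_info <- bind_cols(band1, plays1)
print(band_info)
```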

Benefits of bind_rows() and bind_cols()

● Faster than rbind() and cbind()

● Return a tibble

● Can handle lists of data frames

● bind_rows() takes an .id argument that labels each row with its source table

bind_rows()

    > band1
        name   surname
    1   John    Lennon
    2   Paul McCartney
    3 George  Harrison
    4  Ringo     Starr

    > band2
         name  surname
    1    Mick   Jagger
    2   Keith Richards
    3 Charlie    Watts
    4  Ronnie     Wood

    > bind_rows(Beatles = band1, Stones = band2, .id = "band")
         band    name   surname
    1 Beatles    John    Lennon
    2 Beatles    Paul McCartney
    3 Beatles  George  Harrison
    4 Beatles   Ringo     Starr
    5  Stones    Mick    Jagger
    6  Stones   Keith  Richards
    7  Stones Charlie     Watts
    8  Stones  Ronnie      Wood

Beatles =, Stones = : label names for new column

.id = "band" : column name for new column
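The same labelling works when the tables arrive as a named list, which ties together the "lists of data frames" and ".id" benefits. A minimal sketch, assuming dplyr is installed; the shortened two-row tables are hypothetical stand-ins for band1 and band2.

```r
library(dplyr)

# Hypothetical, shortened versions of the two band tables.
beatles <- data.frame(name = c("John", "Paul"), stringsAsFactors = FALSE)
stones <- data.frame(name = c("Mick", "Keith"), stringsAsFactors = FALSE)

# The list names become the labels; .id names the new column.
band_list <- list(Beatles = beatles, Stones = stones)
labelled <- bind_rows(band_list, .id = "band")
print(labelled)
```

Keeping the tables in a named list is convenient when their number is not known in advance, for example when they come from reading several files.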

Let’s practice!

Build a better data frame

● data.frame()

● as.data.frame()

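● data_frame() (deprecated in recent tibble releases; use tibble())

● as_data_frame() (deprecated in recent tibble releases; use as_tibble())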

data.frame() defaults

● Changes strings to factors (the default before R 4.0.0)

● Adds row names

● Changes unusual column names
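The three defaults listed above can be observed with base R alone. In this sketch, stringsAsFactors = TRUE is passed explicitly because the default differs between R versions; the column name `first name` is a hypothetical example of an "unusual" name.

```r
# A column name containing a space, plus explicit string-to-factor conversion.
df <- data.frame(
  `first name` = c("John", "Paul"),
  stringsAsFactors = TRUE
)

names(df)       # the space is replaced, giving "first.name"
class(df[[1]])  # "factor", because strings were converted
rownames(df)    # automatic row names "1" and "2"

# Both conversions can be switched off explicitly.
df_plain <- data.frame(
  `first name` = c("John", "Paul"),
  stringsAsFactors = FALSE,
  check.names = FALSE
)
names(df_plain) # "first name" is kept as written
```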

data_frame()

    > data_frame(
    +   Beatles = c("John", "Paul", "George", "Ringo"),
    +   Stones = c("Mick", "Keith", "Charlie", "Ronnie"),
    +   Zeppelins = c("Robert", "Jimmy", "John Paul", "John")
    + )
    # A tibble: 4 × 3
      Beatles  Stones Zeppelins
        <chr>   <chr>     <chr>
    1    John    Mick    Robert
    2    Paul   Keith     Jimmy
    3  George Charlie John Paul
    4   Ringo  Ronnie      John

data_frame()

data_frame() will not…

● Change the data type of vectors (e.g. strings to factors)

● Add row names

● Change column names

● Recycle vectors greater than length one
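The recycling rule is the easiest of these to check directly. A minimal sketch, assuming the tibble package is installed; tibble() is used here in place of data_frame(), which recent tibble releases deprecate, and the column names id and flag are hypothetical.

```r
library(tibble)

# data.frame() silently recycles the length-2 vector to fill four rows.
recycled <- data.frame(id = 1:4, flag = c("a", "b"))

# tibble() refuses to recycle a length-2 vector and raises an error.
attempt <- tryCatch(
  tibble(id = 1:4, flag = c("a", "b")),
  error = function(e) "error"
)

# Length-one values are still recycled.
constant <- tibble(id = 1:4, flag = "a")
```

The stricter rule turns a silent and often unintended repetition into an explicit failure.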

data_frame()

    > data_frame(
    +   numbers = 1:5,
    +   squares = numbers ^ 2
    + )
    # A tibble: 5 × 2
      numbers squares
        <int>   <dbl>
    1       1       1
    2       2       4
    3       3       9
    4       4      16
    5       5      25

● Returns a tibble

● Evaluates arguments lazily, in order
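Evaluating arguments in order is what lets squares refer to numbers in the same call. The sketch below contrasts this with base data.frame(), assuming the tibble package is installed and that no object called numbers exists in the calling environment. tibble() stands in for the deprecated data_frame().

```r
library(tibble)

# Later columns can refer to earlier ones because arguments are evaluated in order.
tbl <- tibble(
  numbers = 1:5,
  squares = numbers ^ 2
)

# data.frame() evaluates each argument in the calling environment,
# so `numbers` is not found there and the call fails.
base_attempt <- tryCatch(
  data.frame(numbers = 1:5, squares = numbers ^ 2),
  error = function(e) "error"
)
```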

as_data_frame()
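The slide gives no example for as_data_frame(). As a hedged illustration of the idea, the sketch below converts an existing named list into a tibble with as_tibble(), the current replacement for as_data_frame(). It assumes the tibble package is installed, and the list contents are hypothetical.

```r
library(tibble)

# A hypothetical named list whose elements all have the same length.
band_list <- list(
  name = c("John", "Paul", "George", "Ringo"),
  born = c(1940, 1942, 1943, 1940)
)

# Convert the list to a tibble; strings stay character vectors.
band_tbl <- as_tibble(band_list)
print(band_tbl)
```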

Let’s practice!

Working with data types

    > 1 + 1
    [1] 2

    > "one" + "one"
    Error in "one" + "one" : non-numeric argument to binary operator

Character + Number -> Character? Number?

Atomic data types

● Logical

    > typeof(TRUE)
    [1] "logical"

● Character (i.e. string)

    > typeof("hello")
    [1] "character"

● Double (i.e. numeric w/ decimal)

    > typeof(3.14)
    [1] "double"

● Integer (i.e. numeric w/o decimal)

    > typeof(1L)
    [1] "integer"

● Complex

    > typeof(1 + 2i)
    [1] "complex"

● Raw

    > typeof(raw(1))
    [1] "raw"

Classes

    > x <- c(1L, 2L, 3L, 2L)
    > x
    [1] 1 2 3 2
    > typeof(x)
    [1] "integer"
    > class(x)
    [1] "integer"

    > attributes(x) <- list(class = "factor", levels = c("A", "B", "C", "D"))
    > x
    [1] A B C B
    Levels: A B C D
    > typeof(x)
    [1] "integer"
    > class(x)
    [1] "factor"

    1L = A
    2L = B
    3L = C
    4L = D

Let’s practice!

dplyr's coercion rules

Character + Number -> Character? Number?

● Character (string) combined with Integer, Double, or Logical gives Character. Coerce with as.character().

● Double combined with Integer or Logical gives Double (TRUE -> 1, FALSE -> 0). Coerce with as.numeric().

● Integer combined with Logical gives Integer (TRUE -> 1, FALSE -> 0). Coerce with as.integer().
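The same hierarchy can be observed in base R, where c() coerces mixed inputs to the most general type present. This sketch uses only base functions; the variable names are hypothetical.

```r
# Logical mixed with integer becomes integer (TRUE -> 1).
int_mix <- c(TRUE, 2L)
typeof(int_mix)

# Integer mixed with double becomes double.
dbl_mix <- c(1L, 2.5)
typeof(dbl_mix)

# Anything mixed with a string becomes character.
chr_mix <- c(1.5, "a")
typeof(chr_mix)

# Explicit coercion with the functions named on the slide.
as.integer(c(TRUE, FALSE))
as.numeric(c(TRUE, FALSE))
as.character(c(1L, 2L))
```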

factors

    # x is a factor
    > x
    [1] A B C B
    Levels: A B C D

    # How is x stored?
    > unclass(x)
    [1] 1 2 3 2
    attr(,"levels")
    [1] "A" "B" "C" "D"

    > as.character(x)
    [1] "A" "B" "C" "B"
    > as.numeric(x)
    [1] 1 2 3 2

factors

    # y is a factor
    > y <- factor(c(5, 6, 7, 6))
    > y
    [1] 5 6 7 6
    Levels: 5 6 7

    > unclass(y)
    [1] 1 2 3 2
    attr(,"levels")
    [1] "5" "6" "7"

    > as.character(y)
    [1] "5" "6" "7" "6"
    > as.numeric(y)
    [1] 1 2 3 2
    > as.numeric(as.character(y))
    [1] 5 6 7 6

dplyr's coercion behavior

● dplyr functions will not automatically coerce data types

● Instead, they return an error

● They expect you to coerce the data manually

● Exception: factors

● dplyr converts factors whose levels do not match to strings

● It gives a warning message when it does so
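The manual coercion the slide asks for can be sketched as follows, assuming dplyr is installed. The tables sales_a and sales_b and their columns are hypothetical. Exact error and warning behavior may differ between dplyr versions, so the sketch shows only the explicit conversion, which is safe in any version.

```r
library(dplyr)

# Hypothetical tables whose id columns have different types.
sales_a <- data.frame(id = c(1, 2), amount = c(10.5, 20), stringsAsFactors = FALSE)
sales_b <- data.frame(id = c("3", "4"), amount = c(5, 7.25), stringsAsFactors = FALSE)

# id is double in sales_a and character in sales_b.
# Convert explicitly so both columns are character before binding.
sales_a$id <- as.character(sales_a$id)

combined <- bind_rows(sales_a, sales_b)
print(combined)
```

Choosing the target type yourself also documents the intent: here id is an identifier, so character is a reasonable choice even though some values look numeric.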

Let’s practice!