Post on 10-May-2015
Collections in Clojure
Jan Herich
2014-03-19 Mon
Jan Herich Collections in Clojure 2014-03-19 Mon 1 / 23
Outline
1 Basic Clojure collection types
2 Persistent characteristics of Clojure collections
3 Sequence abstraction and laziness
4 Reducers - better performance and parallelism
Basic Clojure collection types
Lists
List data structure implemented as an ordinary single-linked list
Lists are special because they are used to compose Clojure programs
Unquoted lists are treated as function calls by the Clojure environment
;; list literal representation
'(1 2 :id (3 4) "name")
;; unquoted list interpreted
;; as function call
(= (+ 1 2) 3)
;; get the first element
(= (peek '(1 2 3)) 1)
;; new list from old one
(= (pop '(1 2 3)) '(2 3))
(= (conj '(3 2 1) 4) '(4 3 2 1))
Sets
Sets are collections of unique elements
As every collection in Clojure, sets can be heterogeneous
Fast membership test
;; set literal representation
#{1 :id :type "name"}
;; testing membership
(= true (contains? #{1 2} 2))
;; new set from old one
(= (disj #{1 2 3} 2) #{1 3})
(= (conj #{1 3} 2) #{1 2 3})
Maps
Maps are a basic construct for holding structured information
Default implementation uses a well-known hash-map mechanism
Fast look-up
;; map literal representation
{:id 1 :name "John"}
;; optional comma delimiters
{:id 1, :name "John"}
;; lookup
(= (get {:id 1 :name "John"} :id) 1)
;; new map from old one
(= (dissoc {:id 1 :name "John"} :name) {:id 1})
(= (assoc {:id 1} :name "John") {:id 1 :name "John"})
Vectors
Vector is the right structure for ordered data where random look-up is necessary
Fast look-up by index
Maintains ordering of elements
;; vector literal representation
[1 2 3 4 5]
;; lookup by zero based index
(= (get [1 2 3] 2) 3)
;; new vector from old one
(= (subvec [1 2 3 4 5] 2) [3 4 5])
(= (conj [1 2 3] 4) [1 2 3 4])
(= (assoc [1 3] 0 2) [2 3])
Persistent characteristics of Clojure collections
Non-destructive updates
All Clojure persistent collections support functional, non-destructive updates, instead of in-place mutation of data
To guarantee that updates with such semantics will be fast and memory efficient, it’s obvious that simple defensive copying won’t work
Luckily, there is a technique called structural sharing, which can help us
Example of structural sharing
[Figure: structural sharing, shown in two panels: "Before update" and "After update"]
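Structural sharing is easiest to observe on a list, where adding an element reuses the whole old list as the tail of the new one. The sketch below is only an illustration of the idea (the names old-list and new-list are made up here); vectors and maps share structure through trees rather than a single tail.

```clojure
;; old-list and new-list are illustrative names, not from the slides
(def old-list '(2 3))
(def new-list (conj old-list 1))   ; prepend 1; old-list itself is untouched

old-list                           ; still (2 3)
new-list                           ; (1 2 3)

;; the tail of the new list is the very same object as the old list,
;; so nothing was copied
(identical? (rest new-list) old-list)
```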
Sequence abstraction and laziness
Sequence as a powerful abstraction for collections
A sequence is a logical list: a persistent and immutable view of the collection
All core Clojure collections provide sequence implementations
Most core Clojure transformation functions for manipulating collections, like filter or map, are defined in terms of sequences
This is very handy when composing collection transformations
Sequences explained
You can call seq on any Clojure collection, which yields a sequence implementation appropriate to the collection. This implementation provides the following basic guarantees (which are defined in terms of the ISeq interface under the hood):
;; Returns the first item in the collection. Calls seq
;; on its argument. If coll is nil, returns nil
(first coll)
;; Returns a sequence of the items after the first.
;; Calls seq on its argument. If there are no more items,
;; returns a logical sequence for which seq returns nil
(rest coll)
;; Returns a new seq where item is the first element
;; and seq is the rest
(cons item seq)
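A few concrete calls may make these guarantees easier to picture. This is a small sketch using only core functions; the sample values are arbitrary.

```clojure
(seq [1 2 3])        ; a vector viewed as the sequence (1 2 3)
(seq #{})            ; nil, because an empty collection has no sequence
(first {:id 1})      ; a map yields its entries: [:id 1]
(first nil)          ; nil
(rest [1])           ; the empty sequence ()
(seq (rest [1]))     ; nil, the usual "no more items" test
(cons 0 [1 2 3])     ; (0 1 2 3)
```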
How Clojure leverages sequences
As already mentioned, many Clojure functions are defined in terms of sequences. For example, have a look at this greatly simplified map implementation:
(defn map [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s)) (map f (rest s)))))
This enables the map function to operate on any collection which satisfies the sequence interface, because the map function calls seq on its second (coll) argument. Notice that map returns a sequence as well, with the consequence that functions operating on sequences can be easily composed together.
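To try the simplified definition without shadowing clojure.core/map, it can be given another name. The sketch below uses the hypothetical name simple-map; like the simplified version it is eager and recursive, so it is only meant for small inputs.

```clojure
;; simple-map is a hypothetical name, chosen so that clojure.core/map
;; is not shadowed
(defn simple-map [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s)) (simple-map f (rest s)))))

(simple-map inc [1 2 3])                    ; works on a vector
(simple-map inc '(1 2 3))                   ; works on a list
(simple-map key {:id 1})                    ; works on map entries
(simple-map inc (simple-map inc [1 2 3]))   ; the result is a sequence, so calls compose
```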
Composing collection transformations
;; filter countries, calculate densities and sort them
(->> '({:code "SK" :area 49035 :population 5415949}
       {:code "CZ" :area 78866 :population 10513209}
       {:code "AT" :area 83855 :population 8414638}
       {:code "HU" :area 93030 :population 9908798})
     (filter (fn [country]
               (> (get country :area) 80000)))
     (map (fn [country]
            (assoc country :density
                   (double (/ (get country :population)
                              (get country :area))))))
     (sort-by (fn [country]
                (get country :density))))
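The same pipeline can be written more compactly, because keywords act as lookup functions on maps. This is a sketch of one possible variant; countries, density and dense-codes are names introduced here for illustration, and the final step extracts only the country codes so the ordering is easy to see.

```clojure
;; same data as in the pipeline above
(def countries
  [{:code "SK" :area 49035 :population 5415949}
   {:code "CZ" :area 78866 :population 10513209}
   {:code "AT" :area 83855 :population 8414638}
   {:code "HU" :area 93030 :population 9908798}])

;; population per unit of area, as a double
(defn density [country]
  (double (/ (:population country) (:area country))))

(def dense-codes
  (->> countries
       (filter #(> (:area %) 80000))          ; keep AT and HU
       (map #(assoc % :density (density %)))  ; add the :density key
       (sort-by :density)                     ; ascending by density
       (map :code)))

dense-codes   ; => ("AT" "HU")
```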
Laziness
As it turns out, it’s very easy to express infinite sequences, just by defining some recursive relations between sequence elements
Clojure gives us many functions for infinite sequences, such as iterate
;; infinite stream of ascending numbers from zero
(iterate inc 0)
;; to avoid blocking the consuming thread, use take
(take 10 (iterate inc 0))
To be able to express such infinite sequences, we need to express laziness
In fact, most Clojure core functions (for example map) are defined as lazy so they can consume and produce lazy sequences
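Besides iterate, a few other core functions produce infinite sequences. The sketch below always bounds them with take, since printing an unbounded sequence at the REPL would never finish.

```clojure
(take 5 (iterate #(* 2 %) 1))     ; => (1 2 4 8 16)
(take 3 (repeat :x))              ; => (:x :x :x)
(take 5 (cycle [1 2]))            ; => (1 2 1 2 1)
(take 5 (filter even? (range)))   ; => (0 2 4 6 8)
```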
How to express laziness in Clojure
;; define Fibonacci numbers as a lazy sequence with
;; the help of the lazy-seq macro
(defn fib [a b]
  (cons a (lazy-seq (fib b (+ a b)))))
;; consume first ten numbers from sequence
(take 10 (fib 0 1))
;; map is lazy as well
(take 10 (map (fn [x] (* 3 x)) (fib 0 1)))
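One way to see that map really is lazy here is to count how often the mapped function runs. The sketch below repeats the fib definition so it is self-contained; calls, triple and tripled are hypothetical names used only to observe evaluation.

```clojure
(defn fib [a b]
  (cons a (lazy-seq (fib b (+ a b)))))

;; calls is a hypothetical counter used only to observe evaluation
(def calls (atom 0))

(defn triple [x]
  (swap! calls inc)
  (* 3 x))

(def tripled (map triple (fib 0 1)))   ; nothing is computed yet
@calls                                 ; 0
(doall (take 3 tripled))               ; forces three elements: (0 3 3)
@calls                                 ; 3
```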
Reducers - better performance and parallelism
Reducers, or another useful collection abstraction
Why another abstraction if we already have sequences?
1 Laziness is great when we need it, but not always
2 Sequence is fundamentally serial
3 Those two points are problems if we want a high-performing solution which can easily exploit parallelism
Therefore, we need to find some new notion of collection, an even simpler one than the sequence abstraction
The new, minimalist notion of collection is something which is reducible
How is reducible defined
It’s important to understand the reduce function:
;; this is a simplified definition of reduce
(defn reduce [f init coll]
  (if-let [s (seq coll)]
    (reduce f (f init (first s)) (rest s))
    init))
;; this is how we call reduce with a reducing function
(reduce (fn [accumulator item]
          (* accumulator item))
        1
        '(1 2 3 4 5 6 7))
A reducible is something which can reduce itself; we are not interested in the actual mechanism
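To make "something which can reduce itself" concrete, here is a minimal sketch of an object that is not a sequence at all but still works with the real reduce, by implementing the CollReduce protocol from clojure.core.protocols. The helper countdown is hypothetical, and for brevity it ignores early termination via reduced.

```clojure
(require '[clojure.core.protocols :as p])

;; countdown is a hypothetical helper: it returns an object that knows
;; how to reduce itself over n, n-1, ..., 1 without building a sequence
(defn countdown [n]
  (reify p/CollReduce
    ;; no initial value given: seed the reduction with (f)
    (coll-reduce [this f]
      (p/coll-reduce this f (f)))
    (coll-reduce [this f init]
      (loop [acc init, i n]
        (if (pos? i)
          (recur (f acc i) (dec i))
          acc)))))

(reduce + 0 (countdown 4))   ; => 10
(reduce * 1 (countdown 5))   ; => 120
```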
Digging deeper into reducers
Reducers are about transformation of reducing functions
;; new simplified definition of map
(defn mapping [f]
  (fn [f1]
    (fn [accumulator item]
      (f1 accumulator (f item)))))
The reducers library offers alternatives to sequence functions defined similarly to mapping above: as higher-order functions which transform the reducing step to include the logic of mapping, filtering, etc.
What’s particularly nice is that those functions consist only of the core logic of their operations
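Filtering can be expressed in the same style. The sketch below repeats mapping and adds a hypothetical filtering counterpart, written by analogy and not taken from the library source; because both return functions of a reducing function, they compose with comp.

```clojure
(defn mapping [f]
  (fn [f1]
    (fn [accumulator item]
      (f1 accumulator (f item)))))

;; filtering is a hypothetical counterpart written by analogy with mapping:
;; items that fail pred leave the accumulator unchanged
(defn filtering [pred]
  (fn [f1]
    (fn [accumulator item]
      (if (pred item)
        (f1 accumulator item)
        accumulator))))

;; sum of the even numbers
(reduce ((filtering even?) +) 0 [1 2 3 4])                   ; => 6

;; keep the even numbers, increment them, then sum
(reduce ((comp (filtering even?) (mapping inc)) +) 0 [1 2 3 4])   ; => 8
```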
Applying reducers
If we keep the definition of mapping from the previous slide, our code would be a little strange
;; our sequence based code
(reduce + 0 (map (fn [x] (* x 3)) '(1 2 3)))
;; and equivalent reducers based code
(reduce ((mapping (fn [x] (* x 3))) +) 0 '(1 2 3))
Luckily, we are in LISP land, so the reducers library handles such details with the help of macros and we are working with functions which have the same shape as before
;; require reducers library
(require '[clojure.core.reducers :as r])
;; use it
(reduce + 0 (r/map (fn [x] (* x 3)) '(1 2 3)))
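The three formulations can be checked against each other. The sketch below repeats the mapping definition so it runs on its own; times-three is a name introduced here for readability.

```clojure
(require '[clojure.core.reducers :as r])

(defn mapping [f]
  (fn [f1]
    (fn [accumulator item]
      (f1 accumulator (f item)))))

;; times-three is an illustrative helper name
(defn times-three [x] (* x 3))

(reduce + 0 (map times-three '(1 2 3)))          ; sequence based   => 18
(reduce ((mapping times-three) +) 0 '(1 2 3))    ; hand-rolled      => 18
(reduce + 0 (r/map times-three '(1 2 3)))        ; reducers library => 18
```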
What we gain and what we lose
Reducers are faster and more memory efficient than their sequence based counterparts, especially when more transformations are chained (have a look at the example in Composing collection transformations), because no intermediate sequences are produced
This is because composing reducer functions merely creates a recipe for a future reduction; no work is done until reduce is called
We lose laziness in the process, so we can’t write this expression with reducers anymore
(take 10 (r/map (fn [x] (* 3 x)) (fib 0 1)))
(this fails at runtime because, unlike normal map, r/map doesn’t return a sequence)
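A short sketch of the "recipe" idea: the value returned by chained reducer functions is not a sequence, and the work happens only when it is reduced, for example by reduce or into. The name recipe is made up for this illustration.

```clojure
(require '[clojure.core.reducers :as r])

;; recipe is an illustrative name; no work is done when it is defined
(def recipe (r/filter even? (r/map inc [1 2 3 4])))

(seq? recipe)         ; false, so sequence functions such as take do not apply
(into [] recipe)      ; the reduction runs here: [2 4]
(reduce + 0 recipe)   ; and again here: 6
```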
Enter parallelism
With reducers, core collection operations are freed from laziness and representation, but we are still stuck with the reduce function, which is serial as well
But we can parallelize reduction by using independent sub-reductions and combining their results
There is a function which does just that: fold
fold takes a combining function, a reducing function and a collection, and returns the result of combining the results of reducing sub-segments of the collection, potentially in parallel
Fold example
(require '[clojure.core.reducers :as r])
;; we use the same combine and reduce function
(r/fold + + [1 2 3 4 5 6])
;; when this is the case, it's enough to supply
;; just the reducing function and fold will use it
;; to combine the sub-reductions
(r/fold + [1 2 3 4 5 6])
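The combining and reducing functions do not have to be the same. The sketch below counts elements with a reducing function that increments per item, while + merges the partial counts and, called with no arguments, supplies the seed 0. The vector numbers is sample data chosen here, and the partition size 1000 is an arbitrary value.

```clojure
(require '[clojure.core.reducers :as r])

;; numbers is sample data for this sketch
(def numbers (vec (range 10000)))

;; count the elements: inc per item, + to merge partial counts
(r/fold + (fn [acc _] (inc acc)) numbers)   ; => 10000

;; an optional first argument sets the partition size (the default is 512)
(r/fold 1000 + + numbers)                   ; => 49995000

;; fold also accepts the result of other reducer functions
(r/fold + (r/map inc numbers))              ; => 50005000
```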
Conclusion
fold will take advantage of collections which are amenable to parallel subdivision; ideal candidates are trees, such as Clojure vectors and maps
Parallel implementations of fold for those collections are based upon the Java Fork/Join framework
If the underlying collection is not suited for parallel subdivision (as is the case with a sequence), fold just devolves into reduce