
Deep Learning for the Working Category Theorist

Jade Master, University of California Riverside

jmast003@ucr.edu

February 12, 2019


Overview

1 Deep Learning With Commutative Diagrams

2 Compositional Learning Algorithms


A good intro to deep learning from a categorical perspective can be found in Jake Bian’s Deep Learning on C∞ Manifolds [1]. Deep learning solves the following problem.

Find a function f : X → Y making the triangle

        D
   π1 ↙   ↘ π2
    X ──f──→ Y

commute, where D ⊆ X × Y and π1 and π2 are the canonical projections.

Commuting approximately is enough. We don’t want to overfit.

Any old function won’t do! f should extrapolate the data from D.


To get more of this extrapolating flavor we equip Y with a metric (maybe without some axioms)

L : Y × Y → R+

and measure the error of f via

E(f) = ∑_{(x, y) ∈ D} L(f(x), y)

We also require f to be within some restricted class of functions between X and Y (e.g. linear, smooth). This allows us to impose some external assumptions about the way the relationship between X and Y should behave.


For example we can require f to be smooth, so that f represents some physical, differentiable relationship. In this case, minimizing E can be framed as a problem in variational calculus. Physicists and differential geometers know how to solve this using the Euler-Lagrange equation.

Unfortunately we are interested in the case when D and the dimension of X are very large. So this would require solving a massive partial differential equation. This is not practical.


Let’s stick with “smooth” as our class of functions. To make things more manageable let’s consider a family of functions parameterized by some manifold Σ. Our class of functions can be summarized by a map

α : Σ × X → Y

which is smooth in each variable. The key idea of deep learning is to minimize the error E one element of D at a time, obtaining a sequence of functions which converges to a minimum of E.


That is, find a path on Σ,

γ : [0, 1] → Σ

such that the induced 1-parameter family of functions

f_t = α(γ(t), −) : X → Y

converges to a minimum of the functional E. One way to obtain such a curve is to:

Evaluate the derivatives of E, giving a vector field dE on Σ.

Numerically integrate −dE starting from an initial parameter λ0. Because we’re doing this numerically we’ll use a finite sequence of functions rather than a smooth family.

Let µ : TΣ → Σ be the integrator. For a fixed step size this function advances a parameter in the direction of a given tangent vector.
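A minimal sketch of this recipe in code (the names mu, dE, and descend are mine, and the finite-difference derivative is only a stand-in for honest differentiation on Σ):

import numpy as np

def mu(param, tangent, h=0.1):
    """Integrator mu : T(Sigma) -> Sigma: advance the parameter along a tangent vector."""
    return param + h * tangent

def dE(E, param, eps=1e-6):
    """Central finite-difference estimate of the derivative of E at a parameter."""
    grad = np.zeros_like(param)
    for i in range(param.size):
        bump = np.zeros_like(param)
        bump[i] = eps
        grad[i] = (E(param + bump) - E(param - bump)) / (2 * eps)
    return grad

def descend(E, lam0, steps=200):
    """Numerically integrate -dE starting from the initial parameter lam0."""
    lam = lam0
    for _ in range(steps):
        lam = mu(lam, -dE(E, lam))
    return lam

# Example: minimizing E(lam) = |lam - 3|^2 drives lam toward 3.
print(descend(lambda lam: np.sum((lam - 3.0) ** 2), np.zeros(2)))

In deep learning proper, E would be evaluated on one element of D at a time, giving the stochastic version of this loop.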


Neural Networks

Neural networks are a special case of this. Let X and Y be vector spaces with dimension m and n respectively. It’s natural to choose linear maps as our class of functions. This gives “the line of best fit” which is a relatively crude tool. Instead we choose an intermediate vector space V with dimension k and a nonlinear map σ : V → V. The class of maps we will consider will be of the form

X --a0--> V --σ--> V --a1--> Y

where a0 and a1 are linear maps. In this case

Σ = GL(m, k) × GL(k, n)

This is called a one-layer neural network.
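A minimal sketch of this class of maps, viewed as a single parameterized map α : Σ × X → Y (taking σ to be componentwise tanh is my own choice; the slides leave the nonlinearity abstract):

import numpy as np

def sigma(v):
    """The fixed nonlinear map V -> V; componentwise tanh is just one possible choice."""
    return np.tanh(v)

def alpha(params, x):
    """One-layer network: the composite X --a0--> V --sigma--> V --a1--> Y,
    where params = (a0, a1) is a point of Sigma, a pair of matrices."""
    a0, a1 = params
    return a1 @ sigma(a0 @ x)

# Example with m = 4, k = 8, n = 2.
rng = np.random.default_rng(0)
params = (rng.normal(size=(8, 4)), rng.normal(size=(2, 8)))
print(alpha(params, rng.normal(size=4)))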


This can be generalized to multiple layers. Indeed, choosing a finite sequence of vector spaces and nonlinear maps {Vi, σi} you can consider functions of the form

X --a0--> V0 --σ0--> V0 --a1--> V1 --> ⋯ --> Vn --σn--> Vn --an--> Y

for your training algorithm. This is a multiple layer neural network.


Backprop as a Functor [2] introduces a categorical framework for building and training large neural networks as a sequence of smaller networks. First we need to abstractify the above discussion.

Definition

Let A and B be sets. A supervised learning algorithm from A to B is a tuple (P, I, U, r) where P is a set and I, U and r are functions of the following form:

I : P × A → B

U : P × A × B → P

r : P × A × B → A

The request function seems mysterious at first. It allows you to compose learning algorithms. Given training data (x, y) ∈ X × Y and (y, z) ∈ Y × Z we need to synthesize this into a pair (x, z) ∈ X × Z. The request function does this by allowing the data to flow backwards. Given an input and a desired output, the request function tells you what the input should have been to get a better result.
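Here is a minimal sketch of this definition as a data structure (the field and method names are mine); the parameter set P is represented implicitly by whatever values param is allowed to take:

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Learner:
    """A supervised learning algorithm (P, I, U, r) from A to B."""
    param: Any                               # a current point of the parameter set P
    implement: Callable[[Any, Any], Any]     # I : P x A -> B
    update: Callable[[Any, Any, Any], Any]   # U : P x A x B -> P
    request: Callable[[Any, Any, Any], Any]  # r : P x A x B -> A

    def train_on(self, a, b):
        """Absorb one training pair (a, b) by moving to the updated parameter."""
        self.param = self.update(self.param, a, b)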


Composition of Learning Algorithms

Given learners (P, I, U, r) : A → B and (Q, J, V, s) : B → C we construct a new learner from A to C with parameter set P × Q. Let’s use string diagrams!

We can draw implement functions like

and the update and request functions as


Composition of implementations is straightforward

but composition of update and request is a little more complicated.
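A sketch of the composite in code, reusing the Learner class from the earlier sketch; the formulas follow my reading of Backprop as a Functor, where the second learner’s request supplies the training output for the first:

def compose(f, g):
    """Composite of learners f : A -> B and g : B -> C, with parameter set P x Q."""
    def implement(pq, a):
        p, q = pq
        return g.implement(q, f.implement(p, a))

    def update(pq, a, c):
        p, q = pq
        b = f.implement(p, a)
        # g's request turns (b, c) into "what b should have been",
        # which is exactly the training datum that f needs.
        return (f.update(p, a, g.request(q, b, c)), g.update(q, b, c))

    def request(pq, a, c):
        p, q = pq
        b = f.implement(p, a)
        return f.request(p, a, g.request(q, b, c))

    return Learner((f.param, g.param), implement, update, request)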


Two learners are equivalent if there is a bijection between their sets of parameters commuting with the implement, update and request functions.

Definition

Let Learn be the category where the objects are sets, the morphisms are equivalence classes of learners, and composition is defined as above.

Theorem

Learn is a symmetric monoidal category.

On objects the monoidal product is given by cartesian product. The monoidal product is given on morphisms as follows.


For implement functions

and for update and request
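A sketch of this monoidal product on morphisms, again reusing the Learner class; the two learners simply act in parallel on the two components of each pair (this componentwise description is my reading of the construction):

def tensor(f, g):
    """Monoidal product of learners f : A -> B and g : C -> D, a learner from A x C to B x D."""
    def implement(pq, ac):
        (p, q), (a, c) = pq, ac
        return (f.implement(p, a), g.implement(q, c))

    def update(pq, ac, bd):
        (p, q), (a, c), (b, d) = pq, ac, bd
        return (f.update(p, a, b), g.update(q, c, d))

    def request(pq, ac, bd):
        (p, q), (a, c), (b, d) = pq, ac, bd
        return (f.request(p, a, b), g.request(q, c, d))

    return Learner((f.param, g.param), implement, update, request)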


So far these learning algorithms are too general and abstract. We can impose extra structure on them via symmetric monoidal functors.

Definition

Let Para be the category where

objects are Euclidean spaces and,

a morphism from R^n to R^m is a differentiable map f : P × R^n → R^m for some Euclidean space P, taken up to equivalence.

The following theorem gives a symmetric monoidal functor which imposes the structure of a gradient descent learning algorithm on every morphism in Para.


Theorem

Fix a real number h > 0 and a differentiable error function

e : R × R → R

such that ∂e/∂x(z, −) is invertible for all z in R. Then there is a symmetric monoidal functor

L : Para → Learn

which is the identity on objects and sends a parameterized function I : P × A → B to the learner (P, I, U_I, r_I) defined by

U_I(p, a, b) := p − h ∇_p E_I(p, a, b)

and

r_I(p, a, b) := f_a(∇_a E_I(p, a, b))

where E_I(p, a, b) := ∑_j e(I(p, a)_j, b_j) and f_a is the componentwise application of the inverse of ∂e/∂x(a_i, −) for each i. Here i ranges over the dimension of A and j ranges over the dimension of B.
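As a sketch of how L acts on a parameterized function, specialized to the quadratic error e(x, y) = ½(x − y)², for which the request simplifies to a − ∇_a E_I, and reusing the Learner class from before; the finite-difference gradient is only a stand-in for real differentiation:

import numpy as np

def grad(fun, v, eps=1e-6):
    """Finite-difference gradient of a scalar-valued function with respect to the vector v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        bump = np.zeros_like(v)
        bump[i] = eps
        g[i] = (fun(v + bump) - fun(v - bump)) / (2 * eps)
    return g

def L(I, p0, h=0.01):
    """Send a parameterized function I : P x A -> B to the learner (P, I, U_I, r_I),
    using the quadratic error e(x, y) = 0.5 * (x - y)**2."""
    def E(p, a, b):
        return 0.5 * np.sum((I(p, a) - b) ** 2)

    def update(p, a, b):
        return p - h * grad(lambda q: E(q, a, b), p)

    def request(p, a, b):
        # With quadratic error, applying the inverse of de/dx(a_i, -) componentwise
        # to grad_a E_I works out to a - grad_a E_I(p, a, b).
        return a - grad(lambda x: E(p, x, b), a)

    return Learner(p0, I, update, request)

# Example: a learner for the elementwise-scaling map I(p, a) = p * a.
learner = L(lambda p, a: p * a, np.ones(3))
learner.train_on(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))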


Remark: Let’s compare to what we had before: E(p, a, b) = L(I(p, a), b), U_I is the composite up to the second Σ, and the request function is not present.


We define a category where the morphisms give the shape of a neural network.

Definition

Let List be a category where

the objects are natural numbers and,

a morphism m → n is a list of natural numbers (a0, a1, . . . , ak) with a0 = m and ak = n

composition is given by concatenation and this category is symmetric monoidal using the + operation in N.

The idea is that a morphism (a0, a1, . . . , ak) represents a neural network with k − 1 hidden layers and number of neurons given by the ai.


Theorem

Let σ : R → R be a differentiable function. Then there is a symmetric monoidal functor

F : List → Para

which

sends a natural number n to the vector space R^n and,

sends a list (a0, a1, . . . , ak) to the parameterized map

GL(a0, a1) × GL(a1, a2) × ⋯ × GL(ak−1, ak) × R^m → R^n

which is the alternating composite of the linear maps given by the parameters and the extension of σ to the corresponding vector spaces.

Note that this is the same construction described in Deep Learning on C∞ Manifolds [1].
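A sketch of what F does to a list of layer widths, reusing the componentwise sigma from the earlier one-layer sketch; representing a parameter point as a plain list of matrices (rather than anything group-like) is my simplification:

import numpy as np

def sigma(v):
    """The fixed nonlinearity R -> R, extended componentwise; tanh is just one choice."""
    return np.tanh(v)

def F(widths):
    """Send a list (a0, ..., ak) to a parameterized map R^a0 -> R^ak: alternately
    apply a linear map given by the parameters and the extension of sigma."""
    def apply(params, x):
        *hidden, last = params
        for matrix in hidden:
            x = sigma(matrix @ x)   # linear map into the next layer, then the nonlinearity
        return last @ x             # the final linear map, with no nonlinearity after it

    def init(rng):
        return [rng.normal(size=(b, a)) for a, b in zip(widths, widths[1:])]

    return apply, init

# Example: the morphism (4, 8, 8, 2), a network with two hidden layers of 8 neurons.
apply, init = F([4, 8, 8, 2])
rng = np.random.default_rng(0)
print(apply(init(rng), rng.normal(size=4)))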


What’s all this good for?

Now we have a machine for building neural networks compositionally.

List --F--> Para --L--> Learn

This is known... but now we’ve generalized it and described it with category theory!


If you’re one of those folks that loves string diagrams then you’re in luck. This theorem gives a framework for building neural networks, neuron by neuron, using string diagrams. First, fixing a step size and an error function we can use the passage

FVect → Para → Learn

to induce a bimonoid structure on Learn. For example using quadratic error gives the following.


There’s a scalar multiplication learner λ : R → R,

a bias learner β : 1 → R,

and an activation learner σ : R → R.

I didn’t describe biases in the previous theorem. But they can be added; doing so adds a parameter of constant weights to your approximation function.
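A sketch of these three elementary learners, written out by hand with quadratic error and step size h, again reusing the Learner class; the explicit formulas are mine, obtained by specializing the gradient descent construction, not copied from the slides:

import numpy as np

h = 0.01  # step size

def scalar_learner(w0):
    """Scalar multiplication learner R -> R: I(w, x) = w * x."""
    return Learner(
        w0,
        lambda w, x: w * x,
        lambda w, x, y: w - h * (w * x - y) * x,  # gradient descent on 0.5 * (w*x - y)**2
        lambda w, x, y: x - (w * x - y) * w,      # request: x minus dE/dx
    )

def bias_learner(b0):
    """Bias learner 1 -> R: the input is the unique point (), the output is the parameter."""
    return Learner(
        b0,
        lambda b, _unit: b,
        lambda b, _unit, y: b - h * (b - y),
        lambda b, _unit, y: (),                   # nothing to request from the one-point set
    )

def activation_learner(sigma=np.tanh, dsigma=lambda x: 1 - np.tanh(x) ** 2):
    """Activation learner R -> R: no parameters, it just applies sigma."""
    return Learner(
        (),                                       # trivial parameter set
        lambda _p, x: sigma(x),
        lambda _p, x, y: (),
        lambda _p, x, y: x - (sigma(x) - y) * dsigma(x),
    )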


Layers of your neural network can be built by composing and tensoring these elementary learners:


Conclusion

Some learning algorithms like regression analysis don’t learn one data point at a time. Regression takes the whole data set into account at each successive approximation. It would be nice to have a model which takes this into account.

Can this be made more general?

The data of a neural network is a representation of a linear quiver. Can we use representation theorems to classify them?


References

[1] Jake Bian (2017). Deep Learning on C∞ Manifolds. Available at http://jake.run/pdf/geom-dl.pdf

[2] Brendan Fong, David I. Spivak, and Remy Tuyeras (2017). Backprop as a Functor: A compositional perspective on supervised learning. Available at https://arxiv.org/abs/1711.10455
