Machine Learning Methods: Applications in Biology and ...

Machine Learning Methods: Applications in Biology and Computer Vision

Mario Banuelos, California State University, Fresno

Twitter: mbanuelos22 (http://twitter.com/mbanuelos22)

Web: http://www.mbgmath.com

Email: [email protected]

Outline

1 Introduction & Background

2 Neural Networks and Computer Vision

3 Detecting Genomic Variation

4 Conclusions

M. Banuelos (Fresno State), [email protected], February 18, 2019

Introduction & Background


Applications of Machine Learning

Self-driving cars

Voice Assistants

Object Classification/Detection


Artificial Intelligence vs. Machine Learning

Machine Learning: Machine learning is the scientific study of algorithms and statistical models that computers use to learn patterns from data.


Simple Linear Regression

- Models the relationship between two variables, a predictor $X$ and a quantitative response $Y$, by assuming a linear relationship:

$$E(Y \mid X = x) = \beta_0 + \beta_1 x$$

- The goal is to find the parameters $(\beta_0, \beta_1)$ that minimize the error between the data and the line.

- The coefficients $\beta_0$ and $\beta_1$ are computed using least squares, giving the ordinary least squares (OLS) line.


Linear Regression (cont.)

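- We begin by choosing a metric to compare our predictions $\hat{y}_i$ to the data $y_i$. This loss (also called error or cost) function may be defined as

$$L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

- Then we find the $\beta$s by setting both partial derivatives to zero:

$$\frac{\partial L(\beta_0, \beta_1)}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0,$$

$$\frac{\partial L(\beta_0, \beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$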
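Solving the two equations above simultaneously gives the familiar closed form $\beta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$. A minimal Python sketch of that solution follows; the function name `ols_fit` and the toy data are illustrative, not from the slides.

```python
def ols_fit(xs, ys):
    """Return (beta0, beta1) minimizing sum_i (y_i - beta0 - beta1 * x_i)^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Solve the two normal equations: first for beta1, then for beta0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Data lying exactly on the line y = 1 + 2x
b0, b1 = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```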


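Or, Gradient Descent

We can also iteratively update our guess $\beta_0^{\text{old}}$, with learning rate (step size) $\eta$, as

$$\beta_0^{\text{new}} = \beta_0^{\text{old}} - \eta \left. \frac{\partial L(\beta_0, \beta_1)}{\partial \beta_0} \right|_{\beta_0 = \beta_0^{\text{old}}},$$

and update $\beta_1$ in the same way.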
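A minimal sketch of the gradient-descent update applied to both coefficients at once, assuming a fixed learning rate `eta` and step count (both illustrative choices, not from the slides). The gradients are the partial derivatives of $L(\beta_0, \beta_1)$ from the least-squares slide.

```python
def gradient_descent(xs, ys, eta=0.01, steps=5000):
    """Fit y ~ beta0 + beta1 * x by gradient descent on the squared-error loss."""
    beta0, beta1 = 0.0, 0.0  # initial guess
    for _ in range(steps):
        residuals = [y - beta0 - beta1 * x for x, y in zip(xs, ys)]
        # Partial derivatives of L(beta0, beta1) evaluated at the current (old) betas
        grad0 = -2 * sum(residuals)
        grad1 = -2 * sum(x * r for x, r in zip(xs, residuals))
        # Simultaneous update: both gradients were computed from the old values
        beta0 -= eta * grad0
        beta1 -= eta * grad1
    return beta0, beta1


b0, b1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(b0, 4), round(b1, 4))  # 1.0 2.0
```

If `eta` is too large the iterates diverge instead of converging; on this toy data they approach the same answer as the closed-form least-squares solution.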


Moving to Classification

Assume we want to classify the following images.

How would you separate the headphones from the glasses?


Logistic Regression

A few key differences . . .

- The loss function becomes the cross-entropy (log) loss

$$L = -\big(y \log(p) + (1 - y)\log(1 - p)\big),$$

where $p$ is the predicted probability of belonging to the class $y = 1$.



- We model the log-odds as a linear function of the predictor,

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X,$$

where $p$ is the probability that $y = 1$.
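To connect the two differences above: inverting the log-odds gives $p = 1/(1 + e^{-(\beta_0 + \beta_1 X)})$, and for the cross-entropy loss the derivative with respect to $z = \beta_0 + \beta_1 X$ reduces to $p - y$. The sketch below fits a one-predictor logistic regression by gradient descent using those facts; the function names, learning rate, step count, and toy data are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Inverse of the log-odds: p = 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(y, p):
    """L = -(y log(p) + (1 - y) log(1 - p)) for a single observation."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fit_logistic(xs, ys, eta=0.1, steps=20_000):
    """Fit log(p / (1 - p)) = beta0 + beta1 * x by gradient descent on the mean cross-entropy."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # dL/dz = p - y for each observation, where z = beta0 + beta1 * x
        errs = [sigmoid(beta0 + beta1 * x) - y for x, y in zip(xs, ys)]
        beta0 -= eta * sum(errs) / n
        beta1 -= eta * sum(e * x for e, x in zip(errs, xs)) / n
    return beta0, beta1


xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the fitted coefficients stay finite
b0, b1 = fit_logistic(xs, ys)
mean_loss = sum(cross_entropy(y, sigmoid(b0 + b1 * x)) for x, y in zip(xs, ys)) / len(xs)
print(b0, b1, mean_loss)
```

The starting point $\beta_0 = \beta_1 = 0$ predicts $p = 0.5$ everywhere, with mean loss $\log 2$; the fitted model should do strictly better on this data.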


Neural Networks and Computer Vision


Logistic Regression as a Neural Network

Neural Network: A class of machine learning algorithms that take input and pass it through interconnected nodes to predict a response.

Figure: An example of a perceptron.

The Sigmoid Function

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
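Read as a neural network, logistic regression is a single node: an affine combination of the inputs passed through $\sigma$. A short sketch with hypothetical weights; the sigmoid is written in a numerically stable form so that `math.exp` does not overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x}), split by sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def perceptron(inputs, weights, bias):
    """One node: affine combination of the inputs followed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


print(sigmoid(0.0))  # 0.5
print(perceptron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5, because z = 0.5 - 0.5 = 0
```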


Deep Learning - Neural Networks

Definition: A class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units (neurons), i.e., affine transformations followed by nonlinear transformations.

Note: σ(·) will represent nonlinear transformations.

Figure: A network with inputs $x_1, x_2$; Hidden Layer 1 ($h_{11}, h_{12}$); Hidden Layer 2 ($h_{21}, h_{22}$); and a sigmoid output $y$.
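The architecture in the figure (two inputs, two hidden layers of two units each, one sigmoid output) can be sketched as a forward pass that alternates affine maps and $\sigma(\cdot)$. The weights below are random placeholders rather than trained values, and the names `dense`, `forward`, `random_layer`, and `params` are hypothetical.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs, weights, biases):
    """Affine map W x + b followed by the sigmoid, one output per row of W."""
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def forward(x, params):
    h1 = dense(x, *params["hidden1"])    # (h11, h12)
    h2 = dense(h1, *params["hidden2"])   # (h21, h22)
    (y,) = dense(h2, *params["output"])  # single sigmoid output
    return y


rng = random.Random(0)


def random_layer(n_in, n_out):
    """Placeholder weights drawn from N(0, 1); biases start at zero."""
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


params = {
    "hidden1": random_layer(2, 2),
    "hidden2": random_layer(2, 2),
    "output": random_layer(2, 1),
}
print(forward([0.3, -1.2], params))  # some probability strictly between 0 and 1
```

Training would adjust the weights and biases, for example by gradient descent on the cross-entropy loss; that step is omitted here.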


Applications in the Medical Field


Computer Vision

Figure: MNIST Images

Figure: Corrupted MNIST Images


Computer Vision – but harder


Seeing Numbers as a Computer

Figure: Two images showing that a black-and-white (grayscale) image is just a matrix of numbers between 0 and 255, with 0 representing white and 255 representing black.

Question: If we were interested in classifying digits as either an 8 or a 0, how could we incorporate this into a logistic regression model?
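One way to answer the question, sketched under assumptions: flatten each pixel matrix into a vector, treat every pixel intensity as a predictor, and use one logistic-regression coefficient per pixel, with $y = 1$ for an 8 and $y = 0$ for a 0. The coefficients below are untrained placeholders, and `flatten` and `prob_is_eight` are hypothetical helper names.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def flatten(image):
    """Stack the rows of an m x n pixel matrix (values 0-255) into one vector scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


def prob_is_eight(image, beta0, betas):
    """Logistic regression with one coefficient per pixel; y = 1 means the digit is an 8."""
    x = flatten(image)
    if len(x) != len(betas):
        raise ValueError("need exactly one coefficient per pixel")
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(betas, x)))


# Toy 3 x 3 "image"; an MNIST digit is 28 x 28, which gives 784 predictors
image = [[0, 255, 0],
         [255, 0, 255],
         [0, 255, 0]]
betas = [0.0] * 9  # hypothetical, untrained coefficients
print(prob_is_eight(image, 0.0, betas))  # 0.5 with all-zero coefficients
```

Fitting the 785 parameters (784 pixel coefficients plus the intercept) would use the same cross-entropy loss as before.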


Decision Boundaries are often Nonlinear

- Logistic regression has a linear decision boundary, so on such problems it can perform worse than even a neural network with only 1 hidden layer.

- These tools generalize to natural language processing, biology, societal applications, and many other fields.

- But . . . more mathematicians are needed to design studies and interpret the results.
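A classic illustration of a nonlinear decision boundary (not taken from the slides) is XOR: no line separates the inputs $(0,1), (1,0)$ from $(0,0), (1,1)$, yet a network with one hidden layer can represent it. The sketch below uses hand-picked weights rather than trained ones, and adds a coarse grid search showing that no linear rule $\beta_0 + \beta_1 x_1 + \beta_2 x_2 > 0$ on the grid gets all four points right.

```python
import math
from itertools import product


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def xor_net(x1, x2):
    """One hidden layer with hand-picked (not trained) weights."""
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # roughly "x1 OR x2"
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # roughly "NOT (x1 AND x2)"
    return sigmoid(20 * h1 + 20 * h2 - 30)  # roughly "h1 AND h2"


for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, round(xor_net(x1, x2)))  # last column: 0, 1, 1, 0

# Search a grid of linear rules b0 + b1*x1 + b2*x2 > 0 for one that reproduces XOR
grid = [i / 2 for i in range(-10, 11)]
separators = [
    (b0, b1, b2)
    for b0, b1, b2 in product(grid, repeat=3)
    if all((b0 + b1 * a + b2 * b > 0) == bool(a ^ b)
           for a, b in product([0, 1], repeat=2))
]
print(len(separators))  # 0
```

The grid search is only an illustration; the impossibility holds for all real coefficients, since the two positive cases force $2\beta_0 + \beta_1 + \beta_2 > 0$ while the two negative cases force it to be at most 0.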


Detecting Genomic Variation


Genomic Variation within Species


DNA Sequencing

1. An unknown genome is chosen.

2. The unknown genome is fragmented.

3. The ends of the fragments are sequenced.

4. The ends are aligned to a reference genome.
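The four steps can be mimicked with a toy simulation under strong simplifying assumptions: a random reference string stands in for the genome, the sample genome is identical to the reference, fragments have a fixed length, both ends are read on the same strand (real paired-end reads come from opposite strands), and alignment is exact string matching. All names and parameter values here are hypothetical.

```python
import random


def simulate_read_pairs(genome, n_fragments, frag_len, read_len, rng):
    """Steps 2-3 (toy): cut fragments at uniform random positions and keep only their two ends."""
    pairs = []
    for _ in range(n_fragments):
        start = rng.randrange(len(genome) - frag_len + 1)
        fragment = genome[start:start + frag_len]
        pairs.append((fragment[:read_len], fragment[-read_len:]))
    return pairs


def align_pair(reference, left, right):
    """Step 4 (toy): exact string match of each end; returns the implied span on the reference."""
    i, j = reference.find(left), reference.find(right)
    if i < 0 or j < 0:
        return None  # an end did not align
    return i, j + len(right)


rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(2_000))
sample = reference  # Step 1: here the "unknown" genome is identical to the reference
pairs = simulate_read_pairs(sample, n_fragments=5, frag_len=300, read_len=25, rng=rng)
spans = [align_pair(reference, left, right) for left, right in pairs]
print([end - start for start, end in spans])  # implied fragment lengths on the reference
```

When the sample genome differs from the reference, for example by a deletion between the two ends, the implied span no longer matches the fragment length; that mismatch is one kind of signal variant detection can look for.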


Probabilistic Generative Model of Sequencing

Lander-Waterman statistics: genome sequencing can be modeled as a Poisson process.

- Assumes independence (fragments are sampled independently of one another).

- Assumes the number of fragments in a region follows a Poisson distribution.



Assume G is the genome length and L the fragment length.




Then, for an interval $[x, x + L]$,

$$P(\text{observing a given fragment in the interval}) = \frac{L}{G}.$$
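A quick Monte Carlo check of the $L/G$ formula, assuming fragment start positions are uniform on $[0, G)$ and reading "observing a fragment in the interval" as "the fragment's start falls in $[x, x + L)$". The genome length, fragment length, interval position, and the function name are illustrative.

```python
import random


def estimate_interval_probability(G, L, x, trials=200_000, seed=0):
    """Monte Carlo estimate of P(a uniformly placed fragment start falls in [x, x + L))."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if x <= rng.uniform(0, G) < x + L)
    return hits / trials


G, L = 10_000, 500
print(estimate_interval_probability(G, L, x=2_000), L / G)  # both close to 0.05
```

Edge effects near the ends of the genome are ignored here, as they are in the formula.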


Binomial Distribution

What is the probability of drawing exactly one red marble in 2 trials (with replacement)?








With $P(\text{red}) = 0.4$:

$$2 \times P(\text{red})\,P(\text{not red}) = 2 \times (0.4)(0.6) = 0.48$$






What is the probability of drawing exactly one red marble in $n$ trials?

$$\binom{n}{1} P(\text{red})\,P(\text{not red})^{n-1} = n \times (0.4)(0.6)^{n-1}.$$
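Both marble calculations follow from one function. The brute-force version enumerates every red/not-red sequence, which shows where the factor $\binom{n}{1} = n$ comes from: there are $n$ positions for the single red draw. Function names are hypothetical, and $P(\text{red}) = 0.4$ matches the two-trial example.

```python
from itertools import product
from math import comb, prod


def prob_exactly_one(n, p_red):
    """C(n, 1) * P(red) * P(not red)^(n - 1), drawing with replacement."""
    return comb(n, 1) * p_red * (1 - p_red) ** (n - 1)


def prob_exactly_one_brute_force(n, p_red):
    """Sum the probability of every red/not-red sequence that contains exactly one red."""
    total = 0.0
    for seq in product([True, False], repeat=n):  # True = red
        if sum(seq) == 1:
            total += prod(p_red if red else 1 - p_red for red in seq)
    return total


print(round(prob_exactly_one(2, 0.4), 10))  # 0.48, matching the two-trial calculation
for n in range(1, 7):
    assert abs(prob_exactly_one(n, 0.4) - prob_exactly_one_brute_force(n, 0.4)) < 1e-12
```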






For observing $k$ fragments in a fixed interval, with $n$ total fragments, we have

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k},$$

i.e., $X \sim B(n, p)$, where $p$ is the probability of success (here, a fragment landing in the interval). This distribution has mean $np$. Let $\lambda = np$, so $p = \lambda / n$.
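A direct transcription of the binomial probability mass function, with a numerical check that the probabilities sum to 1 and that the mean equals $np$. The values $n = 50$ and $p = 0.05$ are hypothetical (for instance, $p = L/G$ for a 500 bp interval in a 10,000 bp genome).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical numbers: n = 50 fragments, each landing in the interval with probability p = 0.05
n, p = 50, 0.05
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
print(sum(pmf), mean, n * p)  # sum is about 1 and mean is about n * p = 2.5
```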


From Binomial to Poisson

Sequencing involves DNA duplication and therefore produces a large number $n$ of fragments, so we let $n \to \infty$ while holding $\lambda = np$ fixed:

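$$\begin{aligned}
\lim_{n\to\infty} B(n, p) &= \lim_{n\to\infty} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \lim_{n\to\infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lambda^k \lim_{n\to\infty} \frac{n!}{k!\,(n-k)!} \left(\frac{1}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!} \lim_{n\to\infty} \frac{n \cdot (n-1) \cdots (n-k+1)}{n^k} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \\
&= \frac{\lambda^k}{k!} \lim_{n\to\infty} \underbrace{\frac{n \cdot (n-1) \cdots (n-k+1)}{n^k}}_{\to 1} \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\to e^{-\lambda}} \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{\to 1} \\
&= \frac{\lambda^k}{k!} e^{-\lambda} = \text{Poisson}(\lambda)
\end{aligned}$$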
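A quick numerical check of the limit above, assuming illustrative values $\lambda = 2.5$ and $k = 3$ (not from the slides): with $p = \lambda / n$, the gap between the binomial and Poisson probabilities should shrink as $n$ grows.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """lambda^k e^{-lambda} / k!"""
    return lam**k * exp(-lam) / factorial(k)


lam, k = 2.5, 3  # hypothetical values for lambda and k
gaps = []
for n in (10, 100, 1_000, 10_000):
    # Hold lambda = n p fixed while n grows, as in the derivation
    gap = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)
```

The gap falls by roughly a factor of 10 for each tenfold increase in $n$, consistent with an error of order $1/n$.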









Generalized Variant Detection Framework

To detect genomic variants, we seek to

$$\text{maximize } P(\text{data} \mid \text{DNA sequencing assumptions}) \quad \text{subject to relatedness constraints},$$

where

- sequencing assumptions result in different probabilistic models (which can be nonconvex), and

- constraints result in different feasible regions (depending on the family structure).


Maximum Likelihood Approach

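$$P(\vec{y} \mid \text{coverage } \lambda) = \prod_{i=1}^{N} P(y_i \mid \lambda),$$

where each observation $y_i$ comes either from a real SV or from an error:

- Real SV: $y_i \sim \text{Poisson}(\lambda)$, so $P(y_i \mid \lambda) = \dfrac{e^{-\lambda} \lambda^{y_i}}{y_i!}$.

- Error: the observation occurs with probability $P(\text{error})$.

Using convex optimization, we can maximize $P(\vec{y} \mid \lambda)$ for each location $i$!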
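A minimal sketch of the Poisson part of this likelihood, assuming all observations share one coverage $\lambda$ and ignoring the error component, sparsity penalty, and relatedness constraints. Setting the derivative of the log-likelihood $\sum_i (y_i \log\lambda - \lambda - \log y_i!)$ to zero gives $\hat{\lambda} = \bar{y}$; the negative log-likelihood has second derivative $\sum_i y_i / \lambda^2 \ge 0$, so it is convex in $\lambda$. The counts and function names are hypothetical.

```python
from math import lgamma, log


def poisson_log_likelihood(ys, lam):
    """log prod_i e^{-lam} lam^{y_i} / y_i!  (requires lam > 0); lgamma(y + 1) = log(y!)."""
    return sum(y * log(lam) - lam - lgamma(y + 1) for y in ys)


def poisson_mle(ys):
    """Setting the derivative of the log-likelihood to zero gives lam_hat = mean(y)."""
    return sum(ys) / len(ys)


ys = [3, 5, 4, 6, 2]  # hypothetical read counts at five locations
lam_hat = poisson_mle(ys)
print(lam_hat)  # 4.0
for lam in (3.0, 3.9, lam_hat, 4.1, 5.0):
    print(lam, poisson_log_likelihood(ys, lam))  # largest at lam_hat
```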


Sparsity Penalty


Inheritance of Structural Variants (SVs)


Method Testing and Validation

- We consider both simulated data and real data from the 1000 Genomes Project.

- Results are compared against enforcing sparsity without relatedness constraints.

- ROC curves are used to compare both methods.


Method Testing and Validation - ROC Curves

Image Source: Wikimedia Commons
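An ROC curve plots the true positive rate against the false positive rate as the detection threshold is swept. The sketch below computes those points and the area under the curve by the trapezoid rule; the scores, labels, and function names are hypothetical, and a site counts as called when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping a threshold over the scores.

    labels: 1 for a true variant, 0 otherwise; a site is called when score >= threshold.
    """
    P = sum(labels)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need both positive and negative examples")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
pts = roc_points(scores, labels)
print(pts[-1], auc(pts))  # (1.0, 1.0) 0.75
```

A method whose curve lies above another's detects more true positives at every false positive rate; an AUC of 0.5 corresponds to random guessing.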

1000 Genomes Data

- Real data are from the 1000 Genomes Project: a CEU father-mother-daughter trio sequenced at low (∼4X) coverage.

- Observations of possible SVs are obtained from a variant detection method.

- Experimentally validated deletions longer than 250 bp are considered true signal.

Source: D. M. Altshuler, E. S. Lander, L. Ambrogio, T. Bloom, K. Cibulskis, T. J. Fennell, S. B. Gabriel, D. B. Jaffe, E. Shefler, C. L. Sougnez, et al., "A map of human genome variation from population-scale sequencing," Nature, vol. 467, no. 7319, pp. 1061–1073, 2010.


1000 Genomes Data Results


Applications to Real Data - 1 Parent, 1 Child Model

Figure: ROC Curves for 1KG Parent Signals, plotting True Positives (0 to 1,500) against Novel Deletions (0 to 5 × 10^4) for the Neg. Binomial and Thresholding methods.


Conclusions


Conclusions

- There are plenty of areas where you can apply your mathematics and statistics backgrounds (this is just a sample).

- I like to work with data. I like to work with low-quality data.

- If you are around next semester and this is interesting, shoot me an email.



