Intro to NN & FL

Neural Networks & Fuzzy Logic: Introduction
Aleksandar Rakić, rakic@etf.rs


ETF Beograd


Neural Networks

[Figure: a network whose input units (values 1, 20, 37, 10, 1, 1) are connected through adjustable weights to output units (values 0 0 0 1 0 0 0)]


Definition & Area of Application

Neural Networks (NN) are:
- mathematical models that resemble nonlinear regression models, but can also model nonlinearly separable spaces;
- knowledge acquisition tools that learn from examples.

Neural Networks are used for:
- pattern recognition (objects in images, voice, medical diagnosis of diseases, etc.);
- exploratory analysis (data mining);
- predictive modelling and control.


Biological Analogy


Perceptrons

[Figure: input units i connected directly to output units j]

- Input to unit i: $a_i$ = measured value of variable i
- Input to unit j: $a_j = \sum_i w_{ij} a_i$
- Output of unit j: $o_j = f(a_j)$


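Example: Logical AND function with NN

[Figure: inputs x1 and x2 connected with weights w1 and w2 to a single output unit y with threshold θ = 0.5]

Step activation: $y = f(a) = 1$ for $a > \theta$, and $y = f(a) = 0$ for $a \le \theta$, where $a = x_1 w_1 + x_2 w_2$.

x1  x2  y = f(x1 w1 + x2 w2)
0   0   f(0·w1 + 0·w2) = 0
0   1   f(0·w1 + 1·w2) = 0
1   0   f(1·w1 + 0·w2) = 0
1   1   f(1·w1 + 1·w2) = 1

Some possible values for w1 and w2:

w1    w2
0.20  0.35
0.20  0.40
0.25  0.30
0.40  0.20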
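The truth table and weight pairs above can be checked mechanically. This Python sketch evaluates the threshold unit $y = f(x_1 w_1 + x_2 w_2)$ with θ = 0.5 for each listed weight pair; the function names are hypothetical, chosen for illustration.

```python
THETA = 0.5  # threshold from the slide

def perceptron_output(x1, x2, w1, w2, theta=THETA):
    a = x1 * w1 + x2 * w2           # input to the unit: weighted sum
    return 1 if a > theta else 0    # step activation: 1 for a > theta, 0 for a <= theta

AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
WEIGHT_PAIRS = [(0.20, 0.35), (0.20, 0.40), (0.25, 0.30), (0.40, 0.20)]

def implements_and(w1, w2):
    """True if the unit reproduces every row of the AND truth table."""
    return all(perceptron_output(x1, x2, w1, w2) == y for (x1, x2), y in AND_TABLE)

for w1, w2 in WEIGHT_PAIRS:
    print(w1, w2, implements_and(w1, w2))  # each line ends with True
```

A pair such as (0.25, 0.25) fails, because 0.25 + 0.25 = 0.5 is not strictly greater than θ; the strict inequality $a > \theta$ matters.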


Example: Perceptrons (NN) in Medical Diagnostics

[Figure: input units (Cough, Headache) connected through weights to output units (No disease, Pneumonia, Flu, Meningitis)]

Δ rule: change the weights to decrease the error, where the error is the difference between what we wanted and what we got.
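A minimal Python sketch of the Δ (delta) rule for these four output units, assuming binary inputs (cough, headache), step activations with a bias weight, and the input-to-diagnosis mapping shown on the Linear Separation slide. The data layout, learning rate and function names are assumptions for illustration. Each weight changes by learning rate × error × input, with error = target - output.

```python
# Hypothetical training cases: (cough, headache) -> target for each output unit,
# in the order (no disease, pneumonia, flu, meningitis).
CASES = [
    ((0, 0), (1, 0, 0, 0)),  # no cough, no headache -> no disease
    ((1, 0), (0, 1, 0, 0)),  # cough, no headache    -> pneumonia
    ((1, 1), (0, 0, 1, 0)),  # cough, headache       -> flu
    ((0, 1), (0, 0, 0, 1)),  # no cough, headache    -> meningitis
]
LEARNING_RATE = 1  # assumed; with step units and zero initial weights it only rescales the weights

def step(a):
    return 1 if a > 0 else 0

def predict(weights, x):
    """weights[k] = [bias, w_cough, w_headache] for output unit k."""
    inputs = (1,) + tuple(x)  # the constant 1 feeds the bias weight
    return [step(sum(w * xi for w, xi in zip(wk, inputs))) for wk in weights]

def train_delta_rule(cases, n_outputs=4, max_epochs=100):
    weights = [[0, 0, 0] for _ in range(n_outputs)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in cases:
            got = predict(weights, x)
            inputs = (1,) + tuple(x)
            for k in range(n_outputs):
                error = target[k] - got[k]  # what we wanted minus what we got
                if error != 0:
                    mistakes += 1
                    for i, xi in enumerate(inputs):
                        weights[k][i] += LEARNING_RATE * error * xi  # delta rule update
        if mistakes == 0:  # an epoch without errors: stop
            return weights, epoch
    return weights, max_epochs

weights, epochs = train_delta_rule(CASES)
print("epochs:", epochs)
for x, target in CASES:
    print(x, predict(weights, x), list(target))
```

Training is guaranteed to stop here because each diagnosis occupies one corner of the (cough, headache) square, and a single corner is always linearly separable from the other three.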


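Linear Separation

[Figure: the four combinations of the two inputs as corners of a square: 00 No disease (no cough, no headache), 10 Pneumonia (cough, no headache), 01 Meningitis (no cough, headache), 11 Flu (cough, headache); a straight line separates "No treatment" from "Treatment". A second panel shows the eight corners of a cube, 000 to 111, for three binary inputs]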


Nonlinear Separation

- Linear activation (linear discriminant): $Y = aX + b$
- Nonlinear activation (logistic regression): $Y = \frac{1}{1 + e^{-(aX + b)}}$


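Multilayered Perceptrons

[Figure: a perceptron (input units i connected directly to output units) beside a multilayered perceptron (input units i → hidden units j → output units k)]

- Input to unit i: $a_i$ = measured value of variable i
- Input to hidden unit j: $a_j = \sum_i w_{ij} a_i$
- Output of hidden unit j: $o_j = \frac{1}{1 + e^{-(a_j + \theta_j)}}$
- Input to output unit k: $a_k = \sum_j w_{jk} o_j$
- Output of output unit k: $o_k = \frac{1}{1 + e^{-(a_k + \theta_k)}}$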
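The forward pass defined by these formulas can be written directly. In this Python sketch the nested-list weight layout and the function names are assumptions: `w_ih[i][j]` holds $w_{ij}$, `w_ho[j][k]` holds $w_{jk}$, and the thresholds θ are added to the weighted sums exactly as in the formulas above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a_in, w_ih, theta_h, w_ho, theta_o):
    """Return (hidden outputs o_j, output-unit outputs o_k) for one input vector."""
    # Hidden units: a_j = sum_i w_ij * a_i,  o_j = 1 / (1 + e^-(a_j + theta_j))
    o_hidden = [
        sigmoid(sum(a_in[i] * w_ih[i][j] for i in range(len(a_in))) + theta_h[j])
        for j in range(len(theta_h))
    ]
    # Output units: a_k = sum_j w_jk * o_j,  o_k = 1 / (1 + e^-(a_k + theta_k))
    o_out = [
        sigmoid(sum(o_hidden[j] * w_ho[j][k] for j in range(len(o_hidden))) + theta_o[k])
        for k in range(len(theta_o))
    ]
    return o_hidden, o_out

# Hand-checkable example: one input, one hidden unit, one output unit.
hidden, out = forward([1.0], [[2.0]], [-2.0], [[4.0]], [-2.0])
print(hidden, out)  # [0.5] [0.5]: each weighted sum plus threshold is 0, and sigmoid(0) = 0.5
```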


Example: Multilayer NN for Diagnosis of Abdominal Pain

[Figure: six input units connected through adjustable weights to seven output units]

Inputs: Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1, Pain Duration = 1
Outputs: Appendicitis = 0, Diverticulitis = 0, Perforated Duodenal Ulcer = 0, Non-specific Pain = 1, Cholecystitis = 0, Small Bowel Obstruction = 0, Pancreatitis = 0


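Regression vs. Neural Networks

[Figure: two diagrams with output Y. Regression: Y is linked to the inputs X1, X2, X3 and to each interaction term X1X2, X1X3, X2X3, X1X2X3. Neural network: Y is linked to the inputs X1, X2, X3 through the network]

Regression: $2^3 - 1$ possible combinations, $Y = aX_1 + bX_2 + cX_3 + dX_1X_2 + \dots$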
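The count $2^3 - 1$ comes from taking every non-empty subset of {X1, X2, X3} as a term. A short Python sketch (the function name is hypothetical) enumerates them:

```python
from itertools import combinations

def regression_terms(variables):
    """Main effects plus every interaction product of the given input variables."""
    terms = []
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            terms.append("".join(combo))
    return terms

terms = regression_terms(["X1", "X2", "X3"])
print(terms)       # ['X1', 'X2', 'X3', 'X1X2', 'X1X3', 'X2X3', 'X1X2X3']
print(len(terms))  # 7, i.e. 2**3 - 1
```

Because the count grows as $2^n - 1$, specifying every interaction term explicitly becomes impractical as inputs are added, which is presumably the contrast the slide draws with a neural network.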

Jargon Pseudo-Correspondence

- Independent variable = input variable
- Dependent variable = output variable
- Coefficients = weights
- Estimates = targets
- Cycles = epochs


Logistic Regression Model

- Inputs (independent variables $x_1, x_2, x_3$): Age = 34, Gender = 1, Stage = 4
- Coefficients $a, b, c$: .5, .4, .8
- Weighted sum: $\Sigma = 34 \times .5 + 1 \times .4 + 4 \times .8 = 20.6$
- Output (dependent variable, prediction): probability of being alive = 0.6
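The weighted sum above can be reproduced in Python, with the logistic function applied to turn it into a probability. This sketch uses only the three terms shown on the slide; a fitted model would normally also include an intercept, which the slide does not give. Note that $1/(1 + e^{-20.6})$ is essentially 1, so the displayed output of 0.6 appears to be an illustrative value rather than the logistic of 20.6.

```python
import math

inputs = {"age": 34, "gender": 1, "stage": 4}            # values used in the slide's sum
coefficients = {"age": 0.5, "gender": 0.4, "stage": 0.8}

weighted_sum = sum(coefficients[k] * inputs[k] for k in inputs)
probability = 1.0 / (1.0 + math.exp(-weighted_sum))      # logistic link

print(round(weighted_sum, 1))  # 20.6
print(probability)
```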


Neural Network Model

[Figure: input variables Age, Gender and Stage (input values 34, 2, 4) feed, through a first set of weights, a hidden layer of summation (Σ) units, which feeds, through a second set of weights, an output Σ unit; output variable (prediction): probability of being alive = 0.6]

Activation functions:
- Linear
- Threshold or step function
- Logistic, sigmoid, squash
- Hyperbolic tangent
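The four activation functions listed above are one-liners in Python. The names are hypothetical; the step function uses the same convention as the AND example (output 1 only when the input exceeds the threshold).

```python
import math

def linear(a):
    return a                              # unbounded output

def step(a, theta=0.0):
    return 1.0 if a > theta else 0.0      # threshold unit: 1 for a > theta, else 0

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))     # "sigmoid" or "squash": output in (0, 1)

def hyperbolic_tangent(a):
    return math.tanh(a)                   # output in (-1, 1)

for a in (-2.0, 0.0, 2.0):
    print(a, linear(a), step(a), round(logistic(a), 3), round(hyperbolic_tangent(a), 3))
```

The logistic and hyperbolic tangent are closely related: $\tanh(a) = 2\,\sigma(2a) - 1$, so they differ mainly in output range.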


Learning: Hidden Units and Backpropagation
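Backpropagation computes the error gradient for the hidden-unit weights by passing the output errors back through the weights $w_{jk}$. The Python sketch below does this for the sigmoid network of the Multilayered Perceptrons slide, assuming the error $E = \frac{1}{2}\sum_k (t_k - o_k)^2$ for one training case (the factor 1/2 only simplifies the derivative). The network size, data and names are hypothetical; a finite-difference estimate is used to confirm one gradient. A gradient-descent step then changes each weight by $-\eta\,\partial E/\partial w$.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, th_h, w_ho, th_o):
    """o_j = sigmoid(sum_i w_ij x_i + theta_j); o_k = sigmoid(sum_j w_jk o_j + theta_k)."""
    o_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))) + th_h[j])
           for j in range(len(th_h))]
    o_o = [sigmoid(sum(o_h[j] * w_ho[j][k] for j in range(len(o_h))) + th_o[k])
           for k in range(len(th_o))]
    return o_h, o_o

def error(x, t, params):
    """E = 1/2 * sum_k (t_k - o_k)^2 for one training case."""
    _, o = forward(x, *params)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

def backprop(x, t, w_ih, th_h, w_ho, th_o):
    """Return dE/dw_ij, dE/dtheta_j, dE/dw_jk, dE/dtheta_k."""
    o_h, o_o = forward(x, w_ih, th_h, w_ho, th_o)
    # Output units: delta_k = (o_k - t_k) * o_k * (1 - o_k)
    d_o = [(o_o[k] - t[k]) * o_o[k] * (1 - o_o[k]) for k in range(len(o_o))]
    # Hidden units: output deltas propagated back through w_jk, times the sigmoid slope
    d_h = [o_h[j] * (1 - o_h[j]) * sum(w_ho[j][k] * d_o[k] for k in range(len(d_o)))
           for j in range(len(o_h))]
    g_w_ih = [[x[i] * d_h[j] for j in range(len(d_h))] for i in range(len(x))]
    g_w_ho = [[o_h[j] * d_o[k] for k in range(len(d_o))] for j in range(len(o_h))]
    return g_w_ih, d_h, g_w_ho, d_o

# Random 3-2-1 network and one hypothetical training case
rng = random.Random(0)
w_ih = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
th_h = [rng.uniform(-1, 1) for _ in range(2)]
w_ho = [[rng.uniform(-1, 1)] for _ in range(2)]
th_o = [rng.uniform(-1, 1)]
x, t = [0.5, -1.0, 2.0], [1.0]

g_w_ih, g_th_h, g_w_ho, g_th_o = backprop(x, t, w_ih, th_h, w_ho, th_o)

# Finite-difference check of one weight: (E(w + eps) - E(w - eps)) / (2 eps)
eps = 1e-6
w_plus = [row[:] for row in w_ih]
w_plus[0][1] += eps
w_minus = [row[:] for row in w_ih]
w_minus[0][1] -= eps
numeric = (error(x, t, (w_plus, th_h, w_ho, th_o))
           - error(x, t, (w_minus, th_h, w_ho, th_o))) / (2 * eps)
print(g_w_ih[0][1], numeric)  # the two values should agree closely
```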


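Minimizing the Error

[Figure: error surface as a function of a weight w. From the initial error at w_initial, a negative derivative gives a positive change in w, and training moves down the surface to the final error at a local minimum, w_trained]

Error functions:
- Mean Squared Error (for most problems): $E = \frac{1}{n}\sum (t - o)^2$
- Cross Entropy Error (for dichotomous or binary outcomes): $E = -\sum \left[ t \ln o + (1 - t) \ln (1 - o) \right]$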
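Both error functions can be computed in a few lines of Python. The function names and example numbers are hypothetical; outputs are clamped away from 0 and 1 only so that the logarithm stays finite.

```python
import math

def mean_squared_error(targets, outputs):
    """MSE = sum (t - o)^2 / n."""
    n = len(targets)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n

def cross_entropy_error(targets, outputs, eps=1e-12):
    """Binary cross entropy: -sum [t ln o + (1 - t) ln (1 - o)]."""
    total = 0.0
    for t, o in zip(targets, outputs):
        o = min(max(o, eps), 1 - eps)  # keep log() finite
        total += t * math.log(o) + (1 - t) * math.log(1 - o)
    return -total

targets = [1, 0, 1]
outputs = [0.9, 0.2, 0.6]
mse = mean_squared_error(targets, outputs)
ce = cross_entropy_error(targets, outputs)
print(round(mse, 4), round(ce, 4))  # 0.07 0.8393
```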


Implementation of Learning: Gradient descent & Minima

[Figure: error curve with a local minimum and the global minimum]
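Which minimum gradient descent reaches depends on where it starts. This Python sketch uses a hypothetical one-dimensional error function, $E(w) = w^4 - 3w^2 + w$ (not from the slides), which has a local minimum near $w \approx 1.13$ and a global minimum near $w \approx -1.30$; the learning rate and number of epochs are assumptions.

```python
def error(w):
    """Hypothetical 1-D error surface with a local and a global minimum."""
    return w**4 - 3 * w**2 + w

def derivative(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w_initial, learning_rate=0.01, epochs=1000):
    w = w_initial
    for _ in range(epochs):
        w -= learning_rate * derivative(w)  # step against the derivative
    return w

w_right = gradient_descent(2.0)   # settles in the local minimum near w = 1.13
w_left = gradient_descent(-2.0)   # settles in the global minimum near w = -1.30
print(w_right, error(w_right))
print(w_left, error(w_left))
```

Starting from w = 2.0, descent stops in the local minimum even though a lower error exists elsewhere; nothing in the update rule itself can detect this.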


Implementation of Learning: Problem of Overfitting

[Figure: left, CHD versus age, contrasting the real model with an overfitted model; right, error versus epochs for the training set and the holdout set]


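Implementation of Learning: Problem of Overfitting

[Figure: total sum of squares (tss) versus epochs for a = test set and b = training set; the stopping criterion is placed where the test-set error reaches its minimum, min(Δtss), and training beyond it gives an overfitted model]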
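A Python sketch of this stopping criterion: keep the epoch with the lowest test (holdout) error. The `patience` parameter, which ends the scan after a run of epochs without improvement, is an added assumption commonly used in practice and is not on the slide; the error values are made up.

```python
def early_stopping_epoch(holdout_errors, patience=5):
    """Return (epoch, error) of the lowest holdout error, scanning epochs in order
    and stopping once the error has not improved for `patience` epochs."""
    best_epoch, best_error, epochs_without_improvement = None, float("inf"), 0
    for epoch, err in enumerate(holdout_errors, start=1):
        if err < best_error:
            best_epoch, best_error = epoch, err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training: the holdout error has stopped improving
    return best_epoch, best_error

# Hypothetical holdout errors per epoch (made-up numbers, for illustration only)
holdout = [0.92, 0.74, 0.58, 0.49, 0.45, 0.46, 0.49, 0.53, 0.58, 0.64]
print(early_stopping_epoch(holdout, patience=3))  # (5, 0.45)
```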


Parameter Estimation

Logistic regression:
- models just one function
- maximum likelihood
- fast
- optimizations: Fisher scoring, Newton-Raphson

Neural network:
- models several functions
- backpropagation
- iterative, slow
- optimizations: Quickprop, scaled conjugate gradient descent, adaptive learning rate


What Do You Want? Insight versus Prediction

Insight into the model:
- explain the importance of each variable
- assess model fit to existing data

Accurate predictions:
- make a good estimate of the real probability
- assess model predictions on new data


Model Selection: Finding Influential Variables

Logistic regression:
- forward, backward, stepwise, arbitrary
- all combinations
- relative risk

Neural network:
- weight elimination
- Automatic Relevance Determination
- relevance


Regression Diagnostics: Finding Influential Observations

Logistic regression:
- analysis of residuals
- Cook's distance
- deviance
- difference in coefficients when a case is left out

Neural network:
- ad hoc


How Accurate Are Predictions?

Construct training and test sets, or bootstrap, to assess unbiased error. Assess:
- Discrimination: how well the model separates alive and dead
- Calibration: how close the estimates are to the real probability
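A minimal Python sketch of the two assessments, assuming outcomes coded 1 = alive and 0 = dead and made-up predictions. Discrimination is measured here with the c-index (for a binary outcome, equal to the area under the ROC curve), and calibration by comparing the mean predicted probability with the observed rate within groups sorted by prediction, in the spirit of Hosmer-Lemeshow grouping. Both are common choices named here as substitutes, not methods stated on the slide.

```python
def c_index(probabilities, outcomes):
    """Discrimination: fraction of (alive, dead) pairs in which the alive case
    gets the higher predicted probability (ties count 1/2)."""
    pos = [p for p, y in zip(probabilities, outcomes) if y == 1]
    neg = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p1 in pos:
        for p0 in neg:
            if p1 > p0:
                concordant += 1
            elif p1 == p0:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def calibration_table(probabilities, outcomes, n_groups=2):
    """Calibration: per group of cases sorted by prediction,
    (mean predicted probability, observed event rate)."""
    pairs = sorted(zip(probabilities, outcomes))
    size = len(pairs) / n_groups
    table = []
    for g in range(n_groups):
        group = pairs[round(g * size):round((g + 1) * size)]
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(y for _, y in group) / len(group)
        table.append((mean_pred, observed))
    return table

probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # hypothetical predictions
alive = [0, 0, 1, 0, 1, 0, 1, 1]                  # hypothetical outcomes
print(c_index(probs, alive))  # 0.8125
print(calibration_table(probs, alive))
```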


Unbiased Evaluation: Training and Test Sets

- Training set: used to build the model (may include a holdout set to control for overfitting)
- Test set: left aside for evaluation purposes
- Ideal: yet another validation data set, from a different source, to test whether the model generalizes to other settings
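A Python sketch of these splits, plus the bootstrap alternative mentioned on the previous slide. The split fractions, the seed and the function names are hypothetical. In the bootstrap variant, the cases never drawn into the resample ("out-of-bag" cases) can serve as the test set.

```python
import random

def split_data(cases, test_fraction=0.25, holdout_fraction=0.2, seed=0):
    """Shuffle, set aside a test set, then carve a holdout set out of the training data."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train_all = shuffled[:n_test], shuffled[n_test:]
    n_holdout = int(len(train_all) * holdout_fraction)
    holdout, train = train_all[:n_holdout], train_all[n_holdout:]
    return train, holdout, test

def bootstrap_sample(cases, seed=0):
    """Draw len(cases) cases with replacement; undrawn cases form an out-of-bag test set."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(cases)) for _ in cases]
    chosen = set(idx)
    sample = [cases[i] for i in idx]
    out_of_bag = [c for i, c in enumerate(cases) if i not in chosen]
    return sample, out_of_bag

train, holdout, test = split_data(list(range(100)))
print(len(train), len(holdout), len(test))  # 60 15 25
```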


Evaluation of NN


More Examples: ECG Interpretation

[Figure: network with inputs QRS amplitude, R-R interval, QRS duration, AVF lead, S-T elevation and P-R interval, and outputs SV tachycardia, ventricular tachycardia, LV hypertrophy, RV hypertrophy and myocardial infarction]


More Examples: Thyroid Diseases

[Figure: two networks in sequence. First network: patient data (clinical finding 1, ..., TSH, T4U, T3, TT4, TBG) → hidden layer (5 or 10 units) → partial diagnoses (normal, hypothyroidism, hyperthyroidism, other conditions); patients who will be evaluated further go to the second network. Second network: patient data (clinical finding 1, ..., TSH, T4U) plus additional input → hidden layer (5 or 10 units) → final diagnoses (normal, primary hypothyroidism, compensated hypothyroidism, secondary hypothyroidism, other conditions)]


Expert Systems and Neural Nets


Fuzzy Logic


Definition

Experts rely on common sense when they solve problems. How can we represent, in a computer, expert knowledge that uses vague and ambiguous terms? Fuzzy logic is not logic that is fuzzy, but logic that is used to describe fuzziness. Fuzzy logic is the theory of fuzzy sets, sets that calibrate vagueness. Fuzzy logic is based on the idea that all things admit of degrees. Temperature, height, speed, distance, beauty: all come on a sliding scale. "The motor is running really hot." "Tom is a very tall guy."


Definition

Many decision-making and problem-solving tasks are too complex to be understood quantitatively; however, people succeed by using knowledge that is imprecise rather than precise. Fuzzy set theory resembles human reasoning in its use of approximate information and uncertainty to generate decisions. It was specifically designed to represent uncertainty and vagueness mathematically and to provide formalized tools for dealing with the imprecision intrinsic to many engineering and decision problems in a more natural way.

Boolean logic uses sharp distinctions. It forces us to draw lines between members of a class and non-members. For instance, we may say that Tom is tall because his height is 181 cm. If we drew a line at 180 cm, we would find that David, who is 179 cm, is small. Is David really a small man, or have we just drawn an arbitrary line in the sand?


Bit of History

Fuzzy, or multi-valued, logic was introduced in the 1930s by Jan Łukasiewicz, a Polish philosopher. While classical logic operates with only two values, 1 (true) and 0 (false), Łukasiewicz introduced a logic that extended the range of truth values to all real numbers in the interval between 0 and 1. For example, the possibility that a man 181 cm tall is really tall might be set to a value of 0.86: it is likely that the man is tall. This work led to an inexact reasoning technique often called possibility theory. In 1965 Lotfi Zadeh published his famous paper "Fuzzy sets". Zadeh extended the work on possibility theory into a formal system of mathematical logic and introduced a new concept for applying natural language terms. This new logic for representing and manipulating fuzzy terms was called fuzzy logic.


Why Fuzzy Logic?

Why fuzzy? As Zadeh said, the term is concrete, immediate and descriptive; we all know what it means. However, many people in the West were repelled by the word "fuzzy", because it is usually used in a negative sense.

Why logic? Fuzziness rests on fuzzy set theory, and fuzzy logic is just a small part of that theory. The term "fuzzy logic" is used in two senses:
- Narrow sense: fuzzy logic is a branch of fuzzy set theory which, as logical systems do, deals with the representation of knowledge and inference from it. Unlike other logical systems, fuzzy logic deals with imprecise or uncertain knowledge. In this narrow, and perhaps correct, sense, fuzzy logic is just one of the branches of fuzzy set theory.
- Broad sense: fuzzy logic is used synonymously with fuzzy set theory.


Fuzzy Applications

The theory of fuzzy sets and fuzzy logic has been applied to problems in a variety of fields: taxonomy, topology, linguistics, logic, automata theory, game theory, pattern recognition, medicine, law, decision support, information retrieval, etc. More recently, fuzzy machines have been developed, including automatic train control, tunnel digging machinery, washing machines, rice cookers, vacuum cleaners, air conditioners, etc.


Fuzzy Applications

Advertisement:

"Extraklasse Washing Machine - 1200 rpm. The Extraklasse machine has a number of features which will make life easier for you.

Fuzzy Logic detects the type and amount of laundry in the drum and allows only as much water to enter the machine as is really needed for the loaded amount. And less water will heat up quicker - which means less energy consumption.

Foam detection: Too much foam is compensated by an additional rinse cycle: If Fuzzy Logic detects the formation of too much foam in the rinsing spin cycle, it simply activates an additional rinse cycle. Fantastic!

Imbalance compensation: In the event of imbalance, Fuzzy Logic immediately calculates the maximum possible speed, sets this speed and starts spinning. This provides optimum utilization of the spinning time at full speed [...]

Washing without wasting - with automatic water level adjustment: Fuzzy automatic water level adjustment adapts water and energy consumption to the individual requirements of each wash programme, depending on the amount of laundry and type of fabric [...]"


More Definitions

Fuzzy logic is a set of mathematical principles for knowledge representation based on degrees of membership. Unlike two-valued Boolean logic, fuzzy logic is multi-valued. It deals with degrees of membership and degrees of truth.

Fuzzy logic uses the continuum of logical values between 0 (completely false) and 1 (completely true). Instead of just black and white, it employs the spectrum of colours, accepting that things can be partly true and partly false at the same time.

[Figure: (a) Boolean logic, with truth values 0 and 1 only; (b) multi-valued logic, with truth values along the continuum 0, 0.2, 0.4, 0.6, 0.8, 1]


Fuzzy Sets

The concept of a set is fundamental to mathematics. However, our own language is also the supreme expression of sets. For example, "car" indicates the set of cars. When we say "a car", we mean one out of the set of cars. The classical example in fuzzy sets is "tall men". The elements of the fuzzy set "tall men" are all men, but their degrees of membership depend on their height.

Degree of membership in the set of tall men:

Name     Height, cm   Crisp   Fuzzy
Chris    208          1       1.00
Mark     205          1       1.00
John     198          1       0.98
Tom      181          1       0.82
David    179          0       0.78
Mike     172          0       0.24
Bob      167          0       0.15
Steven   158          0       0.06
Bill     155          0       0.01
Peter    152          0       0.00
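The crisp column follows from drawing the line at 180 cm, as in the Tom and David example. The fuzzy column has no formula on the slide, so this Python sketch assumes piecewise-linear interpolation between the tabulated points to obtain a membership value for any height; names are hypothetical.

```python
# Heights (cm) and fuzzy degrees of membership in "tall men", copied from the table above.
TALL_MEN = [
    ("Peter", 152, 0.00), ("Bill", 155, 0.01), ("Steven", 158, 0.06),
    ("Bob", 167, 0.15), ("Mike", 172, 0.24), ("David", 179, 0.78),
    ("Tom", 181, 0.82), ("John", 198, 0.98), ("Mark", 205, 1.00), ("Chris", 208, 1.00),
]

def crisp_tall(height_cm, line_cm=180):
    """Characteristic function: 1 above the 180 cm line, 0 otherwise."""
    return 1 if height_cm > line_cm else 0

def fuzzy_tall(height_cm):
    """Membership by linear interpolation between tabulated points (an assumption);
    clamped to the first and last tabulated values outside the table's range."""
    points = [(h, mu) for _, h, mu in TALL_MEN]
    if height_cm <= points[0][0]:
        return points[0][1]
    if height_cm >= points[-1][0]:
        return points[-1][1]
    for (h0, m0), (h1, m1) in zip(points, points[1:]):
        if h0 <= height_cm <= h1:
            return m0 + (m1 - m0) * (height_cm - h0) / (h1 - h0)

for name, height, _ in TALL_MEN:
    print(f"{name:7s} {height} crisp={crisp_tall(height)} fuzzy={fuzzy_tall(height):.2f}")
print(fuzzy_tall(180))  # about 0.80, halfway between David (0.78) and Tom (0.82)
```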


Crisp vs. Fuzzy Sets

The x-axis represents the universe of discourse: the range of all possible values applicable to a chosen variable. In our case, the variable is the man's height. According to this representation, the universe of men's heights consists of all tall men. The y-axis represents the membership value of the fuzzy set. In our case, the fuzzy set of tall men maps height values into corresponding membership values.

[Figure: "Tall Men" as a crisp set and as a fuzzy set; degree of membership (0.0 to 1.0) versus height, cm (150 to 210)]


A Fuzzy Set has Fuzzy Boundaries

Let X be the universe of discourse and its elements be denoted as x. In classical set theory, a crisp set A of X is defined by the function $f_A(x)$, called the characteristic function of A:

$f_A(x): X \to \{0, 1\}$, where $f_A(x) = 1$ if $x \in A$ and $f_A(x) = 0$ if $x \notin A$.

For any element x of universe X, the characteristic function $f_A(x)$ is equal to 1 if x is an element of set A, and is equal to 0 if x is not an element of A.

In fuzzy theory, a fuzzy set A of universe X is defined by the function $\mu_A(x)$, called the membership function of set A:

$\mu_A(x): X \to [0, 1]$, where $\mu_A(x) = 1$ if x is totally in A; $\mu_A(x) = 0$ if x is not in A; $0 < \mu_A(x) < 1$ if x is partly in A.

For any element x of universe X, the membership function $\mu_A(x)$ equals the degree to which x is an element of set A.


Fuzzy Set Representation

First, we determine the membership functions. In our tall men example, we can obtain fuzzy sets of tall, short and average men. The universe of discourse, the men's heights, consists of three sets: short, average and tall men. As you will see, a man who is 184 cm tall is a member of the average men set with a degree of membership of 0.1, and at the same time he is also a member of the tall men set with a degree of 0.4.

[Figure: short, average and tall men as crisp sets and as fuzzy sets; degree of membership (0.0 to 1.0) versus height, cm (150 to 210)]
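The 184 cm example can be reproduced with trapezoidal membership functions. The slide does not give the curves' breakpoints, so those in this Python sketch are hypothetical, chosen so that a 184 cm man is "average" to degree 0.1 and "tall" to degree 0.4, as stated above.

```python
def trapezoid(x, a, b, c, d):
    """0 outside [a, d], rising linearly from a to b, 1 on [b, c], falling linearly from c to d."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical breakpoints in cm over the universe of discourse 150-210.
def short(h):
    return trapezoid(h, 150, 150, 160, 170)

def average(h):
    return trapezoid(h, 160, 170, 175, 185)

def tall(h):
    return trapezoid(h, 180, 190, 210, 210)

print(short(184), average(184), tall(184))  # 0.0 0.1 0.4
```

With these choices a height between 180 and 185 cm belongs partly to both "average" and "tall", which is the overlap that crisp sets cannot express.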


Linguistic Variables and Inference

At the root of fuzzy set theory lies the idea of linguistic variables. A linguistic variable is a fuzzy variable. For example, the statement "John is tall" implies that the linguistic variable John takes the linguistic value tall. In fuzzy expert systems, linguistic variables are used in fuzzy rules. For example:

IF wind is strong THEN sailing is good
IF project_duration is long THEN completion_risk is high
IF speed is slow THEN stopping_distance is short
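One common way to evaluate such rules is Mamdani-style inference, named here as a swapped-in illustration rather than a method from the slides: each rule fires to the degree its IF part is true, its THEN set is clipped at that degree (min), the clipped sets are combined (max), and a crisp answer is taken as the centroid. This Python sketch uses the speed rule above plus a hypothetical companion rule, "IF speed is fast THEN stopping_distance is long"; all membership breakpoints are assumptions.

```python
def trapezoid(x, a, b, c, d):
    """0 outside [a, d], rising linearly from a to b, 1 on [b, c], falling linearly from c to d."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical fuzzy sets; all breakpoints are assumptions for illustration.
def speed_is_slow(kmh):
    return trapezoid(kmh, 0, 0, 30, 60)

def speed_is_fast(kmh):
    return trapezoid(kmh, 40, 80, 200, 200)

def distance_is_short(m):
    return trapezoid(m, 0, 0, 20, 50)

def distance_is_long(m):
    return trapezoid(m, 30, 70, 150, 150)

def stopping_distance(speed_kmh, step=0.5, max_m=150):
    # IF speed is slow THEN stopping_distance is short   (rule from the slide)
    slow = speed_is_slow(speed_kmh)
    # IF speed is fast THEN stopping_distance is long    (hypothetical companion rule)
    fast = speed_is_fast(speed_kmh)
    numerator = denominator = 0.0
    for i in range(int(max_m / step) + 1):
        d = i * step
        # Clip each consequent at its rule's firing strength (min), combine rules with max
        mu = max(min(slow, distance_is_short(d)), min(fast, distance_is_long(d)))
        numerator += d * mu
        denominator += mu
    # Centroid defuzzification: membership-weighted average of the distances
    return numerator / denominator if denominator > 0 else None

for speed in (20, 50, 100):
    print(speed, round(stopping_distance(speed), 1))
```

At 50 km/h both rules fire partially, so the answer lies between the "short" and "long" distances instead of jumping from one to the other.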