Transcript of new AI


    STEPS OF PATTERN RECOGNITION


    PARAMETER ESTIMATION

Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements.

In estimation theory, it is assumed the measured data is random, with a probability distribution that depends on the parameters of interest. For example, in electrical communication theory, the measurements which contain information regarding the parameters of interest are often associated with a noisy signal. Without randomness, or noise, the problem would be deterministic and estimation would not be needed.
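A minimal illustration of this setup (the numbers below are assumptions, not from the slides): noisy measurements of an unknown constant, with the sample mean acting as the estimator.

import random

theta_true = 5.0  # unknown parameter; known here only because we simulate the data
measurements = [theta_true + random.gauss(0.0, 1.0) for _ in range(100)]  # noisy observations

theta_hat = sum(measurements) / len(measurements)  # a simple estimator: the sample mean
print("estimate of theta:", theta_hat)             # close to 5.0, but not exact, because of the noise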

We could design an optimal classifier if we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi). Unfortunately, in pattern recognition applications we rarely if ever have this kind of complete knowledge about the probabilistic structure of the problem. In a typical case we merely have some vague, general knowledge about the situation, together with a number of design samples or training data, that is, particular representatives of the patterns we want to classify. The problem, then, is to find some way to use this information to design or train the classifier.
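To make the role of these quantities concrete, here is a minimal sketch of the optimal rule (the priors and densities below are assumed purely for illustration): assign x to the class ωi that maximizes P(ωi) p(x|ωi).

import math

def gauss_pdf(x, mu, sigma):
    # univariate Gaussian density, used here as an assumed class-conditional p(x|w)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

priors = {'w1': 0.6, 'w2': 0.4}                        # assumed P(w_i)
densities = {'w1': lambda x: gauss_pdf(x, 0.0, 1.0),   # assumed p(x|w_1)
             'w2': lambda x: gauss_pdf(x, 2.0, 1.0)}   # assumed p(x|w_2)

def classify(x):
    # pick the class with the largest value of P(w) * p(x|w)
    return max(priors, key=lambda w: priors[w] * densities[w](x))

print(classify(0.3), classify(1.8))  # -> w1 w2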

    One approach to this problem is to use the samples to estimate the unknown probabilities and probability

    densities, and to use the resulting estimates as if they were the true values. In typical supervised pattern

    classification problems, the estimation of the prior probabilities presents no serious difficulties. However,

    estimation of the class-conditional densities is quite another matter.

    The problem of parameter estimation is a classical one in statistics, and it can be approached in several ways. We

    shall consider two common and reasonable procedures, maximum likelihood estimation and Bayesian estimation.

    Although the results obtained with these two procedures are frequently nearly identical, the approaches are

    conceptually quite different.

Two criteria help in choosing between them. One is computational complexity, and here maximum likelihood methods are often to be preferred, since they require merely differential calculus techniques or a gradient search for θ̂, rather than the possibly complex multidimensional integration needed in Bayesian estimation. The other is interpretability: in many cases the maximum likelihood solution will be easier to interpret and understand, since it returns the single best model from the set the designer provided (and presumably understands). In contrast, Bayesian methods give a weighted average of models (parameters), often leading to solutions more complicated and harder to understand than those provided by the designer. The Bayesian approach reflects the remaining uncertainty in the possible models.


    Maximum likelihood

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters.

The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable, due to cost or time constraints, to measure the height of every single giraffe in a population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model).
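For the Gaussian case this has a closed form: the MLE of the mean is the sample mean and the MLE of the variance is the 1/n (biased) sample variance. A minimal sketch, with made-up height values:

heights = [4.6, 5.1, 4.9, 5.3, 4.8, 5.0, 5.2, 4.7]  # hypothetical giraffe heights in metres

n = len(heights)
mu_mle = sum(heights) / n                               # MLE of the mean: the sample mean
var_mle = sum((h - mu_mle) ** 2 for h in heights) / n   # MLE of the variance: divide by n, not n - 1

print("MLE mean:", mu_mle)
print("MLE variance:", var_mle)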

Parameters are fixed but unknown.

The best parameters are obtained by maximizing the probability of obtaining the samples observed.

Has good convergence properties as the sample size increases.

Simpler than alternative techniques.

    The General Principle

Suppose that we separate a collection of samples according to class, so that we have c sets, D1, ..., Dc, with the samples in Dj having been drawn independently according to the probability law p(x|ωj). We say such samples are i.i.d. (independent and identically distributed) random variables. We assume that p(x|ωj) has a known parametric form, and is therefore determined uniquely by the value of a parameter vector θj. For example, we might have p(x|ωj) ~ N(μj, Σj), where θj consists of the components of μj and Σj. To show the dependence of p(x|ωj) on θj explicitly, we write p(x|ωj) as p(x|ωj, θj). Our problem is to use the information provided by the training samples to obtain good estimates for the unknown parameter vectors θ1, ..., θc associated with each category.

To simplify treatment of this problem, we shall assume that samples in Di give no information about θj if i ≠ j; that is, we shall assume that the parameters for the different classes are functionally independent. This permits us to work with each class separately, and to simplify our notation by deleting indications of class distinctions. With this assumption we thus have c separate problems of the following form: use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.

In other words, we use the n training samples in a class to estimate θ. Suppose that D contains n independently drawn samples, x1, x2, ..., xn. Because the samples were drawn independently, the likelihood of θ with respect to the whole set factors as shown below.
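A compact statement of that factorization (the standard i.i.d. likelihood; the log-likelihood form is what one actually maximizes in practice):

p(D \mid \theta) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \ln p(D \mid \theta) = \arg\max_{\theta} \sum_{k=1}^{n} \ln p(\mathbf{x}_k \mid \theta)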


The ML estimate of θ is, by definition, the value θ̂ that maximizes p(D|θ). Intuitively, it is the value of θ that best agrees with the actually observed training samples.
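A minimal numerical sketch of this idea (the data values are assumed, and the variance is fixed at 1 so that only the mean is estimated): evaluate the log-likelihood over a grid of candidate values of θ and keep the best one.

import math

data = [1.8, 2.3, 2.0, 1.6, 2.4]  # assumed sample

def log_likelihood(theta):
    # log p(D|theta) for a Gaussian with mean theta and unit variance
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in data)

candidates = [i / 100 for i in range(401)]   # theta in [0, 4] in steps of 0.01
theta_hat = max(candidates, key=log_likelihood)
print("theta_hat =", theta_hat)              # 2.02, the sample mean, as theory predicts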

The maximum-likelihood estimator has essentially no optimal properties for finite samples.[3] However, the maximum-likelihood estimator possesses a number of attractive asymptotic properties for many problems; these asymptotic properties include:

Consistency: the estimator converges in probability to the value being estimated (illustrated by the small simulation after this list).

Asymptotic normality: as the sample size increases, the distribution of the MLE tends to the Gaussian distribution with mean θ and covariance matrix equal to the inverse of the Fisher information matrix.

Efficiency: it achieves the Cramér–Rao lower bound when the sample size tends to infinity. This means that no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE.

Second-order efficiency after correction for bias.
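A small simulation of the consistency property (the true mean and noise level are assumed, purely illustrative): the MLE of a Gaussian mean, i.e. the sample mean, moves toward the true value as the sample size grows.

import random

random.seed(0)
true_mean = 3.0
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    mle = sum(sample) / n        # MLE of the mean for a Gaussian
    print(n, round(mle, 3))      # estimates approach 3.0 as n increases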

    Applications

    Maximum likelihood estimation is used for a wide range of statistical models, including:

linear models and generalized linear models;

exploratory and confirmatory factor analysis;

structural equation modeling;

many situations in the context of hypothesis testing and confidence interval formation;

discrete choice models.

These uses arise across applications in a widespread set of fields, including:

    communication systems;

    econometrics;

    data modeling in nuclear and particle physics;

    magnetic resonance imaging;


    INTRODUCTION OF CLUSTER TECHNIQUES


A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.

In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.

We continue to assume that at every time step t the system is in a state ω(t), but now we also assume that it emits some (visible) symbol v(t). While sophisticated Markov models allow for the emission of continuous functions (e.g., spectra), we will restrict ourselves to the case where a discrete symbol is emitted. As with the states, we define a particular sequence of such visible symbols as V^T = {v(1), v(2), ..., v(T)}, and thus we might have V^6 = {v5, v1, v1, v5, v2, v3}.

Our model is then that in any state ωj(t) we have a probability of emitting a particular visible symbol vk(t). We denote this probability P(vk(t)|ωj(t)) = bjk. Because we have access only to the visible symbols, while the states ωi are unobservable, such a full model is called a hidden Markov model.
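A minimal generative sketch of this description (the two states, two symbols, and all probabilities below are hypothetical, not from the text): at each step the model emits a visible symbol according to bjk for the current hidden state, then moves to the next hidden state according to the transition probabilities.

import random

A = {'s1': {'s1': 0.8, 's2': 0.2},   # hidden-state transition probabilities a_ij
     's2': {'s1': 0.3, 's2': 0.7}}
B = {'s1': {'v1': 0.9, 'v2': 0.1},   # emission probabilities b_jk = P(v_k | s_j)
     's2': {'v1': 0.2, 'v2': 0.8}}

def sample(dist):
    # draw one key from a {outcome: probability} dictionary
    return random.choices(list(dist), weights=list(dist.values()))[0]

state = 's1'
visible = []
for t in range(6):
    visible.append(sample(B[state]))   # emit v(t) from the current hidden state
    state = sample(A[state])           # move to the next hidden state
print(visible)                          # only this sequence is observable; the states stay hidden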


    A concrete example

Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM).

states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}

emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely Bob is to perform a certain activity on each day. If it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk.
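To show what Alice can actually do with this model, here is a minimal Viterbi sketch (not part of the original example) that uses the dictionaries above to recover the most likely weather sequence for a given list of Bob's activities.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # prob[s] = probability of the best state sequence ending in s; path[s] = that sequence
    prob = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            new_path[s] = path[best_prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return prob[best], path[best]

print(viterbi(['walk', 'shop', 'clean'], states,
              start_probability, transition_probability, emission_probability))
# -> (0.01344, ['Sunny', 'Rainy', 'Rainy'])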


Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge. The KR can be made to be independent of the underlying knowledge model or knowledge base system (KBS), such as a semantic network.

Knowledge representation research involves analysis of how to reason accurately and effectively and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the KR and to create new KR sentences. Logic is used to supply formal semantics of how reasoning functions should be applied to the symbols in the KR system. Logic is also used to define how operators can process and reshape the knowledge. Examples of operators and operations include negation, conjunction, adverbs, adjectives, quantifiers and modal operators. The logic supplies the interpretation theory. These elements (symbols, operators, and interpretation theory) are what give sequences of symbols meaning within a KR.

A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it.

It is a set of ontological commitments, i.e., an answer to the question: in what terms should I think about the world?

It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.

It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.

It is a medium of human expression, i.e., a language in which we say things about the world.

    Some issues that arise in knowledge representation from an AI perspective are:

How do people represent knowledge? What is the nature of knowledge? Should a representation scheme deal with a particular domain or should it be general purpose? How expressive is a representation scheme or formal language? Should the scheme be declarative or procedural?

Characteristics

A good knowledge representation covers six basic characteristics:

Coverage, which means the KR covers a breadth and depth of information. Without a wide coverage, the KR cannot determine anything or resolve ambiguities.

Understandable by humans. KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals; a minimal sketch of such a hierarchy is given after this list). It should also have simple primitives that combine in complex forms.

Consistency. If John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.

Efficiency.

Easiness for modifying and updating.

Support of the intelligent activity which uses the knowledge base.
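A tiny sketch of the class-hierarchy idea mentioned above (the facts and helper function below are hypothetical): an explicit is-a hierarchy plus one simple rule lets the system infer facts, such as "polar bears are animals", that were never stated directly.

is_a = {'polar bear': 'bear', 'bear': 'animal', 'animal': 'living thing'}  # stated knowledge

def is_kind_of(x, y):
    # inference rule: follow is-a links upward until y is reached or the chain ends
    while x in is_a:
        x = is_a[x]
        if x == y:
            return True
    return False

print(is_kind_of('polar bear', 'animal'))  # True, inferred from two stated links
print(is_kind_of('bear', 'polar bear'))    # False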
