Transcript of new AI


    STEPS OF PATTERN RECOGNITION


    PARAMETER ESTIMATION

Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements.

In estimation theory, it is assumed the measured data is random, with a probability distribution that depends on the parameters of interest. For example, in electrical communication theory, the measurements which contain information regarding the parameters of interest are often associated with a noisy signal. Without randomness, or noise, the problem would be deterministic and estimation would not be needed.
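A minimal illustration of this setup (the numbers below are assumptions, not from the slides): noisy measurements of an unknown constant, with the sample mean acting as the estimator.

import random

theta_true = 5.0  # unknown parameter; known here only because we simulate the data
measurements = [theta_true + random.gauss(0.0, 1.0) for _ in range(100)]  # noisy observations

theta_hat = sum(measurements) / len(measurements)  # a simple estimator: the sample mean
print("estimate of theta:", theta_hat)             # close to 5.0, but not exact, because of the noise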

We could design an optimal classifier if we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi). Unfortunately, in pattern recognition applications we rarely if ever have this kind of complete knowledge about the probabilistic structure of the problem. In a typical case we merely have some vague, general knowledge about the situation, together with a number of design samples or training data, that is, particular representatives of the patterns we want to classify. The problem, then, is to find some way to use this information to design or train the classifier.
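To make the role of these quantities concrete, here is a minimal sketch of the optimal rule (the priors and densities below are assumed purely for illustration): assign x to the class ωi that maximizes P(ωi) p(x|ωi).

import math

def gauss_pdf(x, mu, sigma):
    # univariate Gaussian density, used here as an assumed class-conditional p(x|w)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

priors = {'w1': 0.6, 'w2': 0.4}                        # assumed P(w_i)
densities = {'w1': lambda x: gauss_pdf(x, 0.0, 1.0),   # assumed p(x|w_1)
             'w2': lambda x: gauss_pdf(x, 2.0, 1.0)}   # assumed p(x|w_2)

def classify(x):
    # pick the class with the largest value of P(w) * p(x|w)
    return max(priors, key=lambda w: priors[w] * densities[w](x))

print(classify(0.3), classify(1.8))  # -> w1 w2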

    One approach to this problem is to use the samples to estimate the unknown probabilities and probability

    densities, and to use the resulting estimates as if they were the true values. In typical supervised pattern

    classification problems, the estimation of the prior probabilities presents no serious difficulties. However,

    estimation of the class-conditional densities is quite another matter.

    The problem of parameter estimation is a classical one in statistics, and it can be approached in several ways. We

    shall consider two common and reasonable procedures, maximum likelihood estimation and Bayesian estimation.

    Although the results obtained with these two procedures are frequently nearly identical, the approaches are

    conceptually quite different.

Two criteria help in choosing between them. One is computational complexity, and here maximum likelihood methods are often to be preferred, since they require merely differential calculus techniques or a gradient search for θ̂, rather than the possibly complex multidimensional integration needed in Bayesian estimation. The other is interpretability: in many cases the maximum likelihood solution will be easier to interpret and understand, since it returns the single best model from the set the designer provided (and presumably understands). In contrast, Bayesian methods give a weighted average of models (parameters), often leading to solutions more complicated and harder to understand than those provided by the designer. The Bayesian approach reflects the remaining uncertainty in the possible models.


    Maximum likelihood

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters.

The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable, due to cost or time constraints, to measure the height of every single giraffe in a population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model).
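For the Gaussian case this has a closed form: the MLE of the mean is the sample mean and the MLE of the variance is the 1/n (biased) sample variance. A minimal sketch, with made-up height values:

heights = [4.6, 5.1, 4.9, 5.3, 4.8, 5.0, 5.2, 4.7]  # hypothetical giraffe heights in metres

n = len(heights)
mu_mle = sum(heights) / n                               # MLE of the mean: the sample mean
var_mle = sum((h - mu_mle) ** 2 for h in heights) / n   # MLE of the variance: divide by n, not n - 1

print("MLE mean:", mu_mle)
print("MLE variance:", var_mle)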

Parameters are fixed but unknown.

The best parameters are obtained by maximizing the probability of obtaining the samples observed.

Has good convergence properties as the sample size increases.

Simpler than alternative techniques.

    The General Principle

Suppose that we separate a collection of samples according to class, so that we have c sets, D1, ..., Dc, with the samples in Dj having been drawn independently according to the probability law p(x|ωj). We say such samples are i.i.d. (independent and identically distributed) random variables. We assume that p(x|ωj) has a known parametric form, and is therefore determined uniquely by the value of a parameter vector θj. For example, we might have p(x|ωj) ~ N(μj, Σj), where θj consists of the components of μj and Σj. To show the dependence of p(x|ωj) on θj explicitly, we write p(x|ωj) as p(x|ωj, θj). Our problem is to use the information provided by the training samples to obtain good estimates for the unknown parameter vectors θ1, ..., θc associated with each category.

To simplify treatment of this problem, we shall assume that samples in Di give no information about θj if i ≠ j; that is, we shall assume that the parameters for the different classes are functionally independent. This permits us to work with each class separately, and to simplify our notation by deleting indications of class distinctions. With this assumption we thus have c separate problems of the following form: use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.

In other words, we use the n training samples in a class to estimate θ. Suppose that D contains n independently drawn samples, x1, x2, ..., xn. Because the samples were drawn independently, the likelihood of θ with respect to the whole set factors as shown below.
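A compact statement of that factorization (the standard i.i.d. likelihood; the log-likelihood form is what one actually maximizes in practice):

p(D \mid \theta) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \ln p(D \mid \theta) = \arg\max_{\theta} \sum_{k=1}^{n} \ln p(\mathbf{x}_k \mid \theta)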


The ML estimate of θ is, by definition, the value θ̂ that maximizes p(D|θ). Intuitively, it is the value of θ that best agrees with the actually observed training samples.
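A minimal numerical sketch of this idea (the data values are assumed, and the variance is fixed at 1 so that only the mean is estimated): evaluate the log-likelihood over a grid of candidate values of θ and keep the best one.

import math

data = [1.8, 2.3, 2.0, 1.6, 2.4]  # assumed sample

def log_likelihood(theta):
    # log p(D|theta) for a Gaussian with mean theta and unit variance
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in data)

candidates = [i / 100 for i in range(401)]   # theta in [0, 4] in steps of 0.01
theta_hat = max(candidates, key=log_likelihood)
print("theta_hat =", theta_hat)              # 2.02, the sample mean, as theory predicts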

The maximum-likelihood estimator has essentially no optimal properties for finite samples.[3] However, the maximum-likelihood estimator possesses a number of attractive asymptotic properties for many problems; these asymptotic properties include:

Consistency: the estimator converges in probability to the value being estimated (illustrated by the small simulation after this list).

Asymptotic normality: as the sample size increases, the distribution of the MLE tends to the Gaussian distribution with mean θ and covariance matrix equal to the inverse of the Fisher information matrix.

Efficiency: it achieves the Cramér–Rao lower bound when the sample size tends to infinity. This means that no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE.

Second-order efficiency after correction for bias.
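A small simulation of the consistency property (the true mean and noise level are assumed, purely illustrative): the MLE of a Gaussian mean, i.e. the sample mean, moves toward the true value as the sample size grows.

import random

random.seed(0)
true_mean = 3.0
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    mle = sum(sample) / n        # MLE of the mean for a Gaussian
    print(n, round(mle, 3))      # estimates approach 3.0 as n increases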

    Applications

    Maximum likelihood estimation is used for a wide range of statistical models, including:

linear models and generalized linear models;

exploratory and confirmatory factor analysis;

structural equation modeling;

many situations in the context of hypothesis testing and confidence interval formation;

discrete choice models.

These uses arise across applications in a widespread set of fields, including:

    communication systems;

    econometrics;

    data modeling in nuclear and particle physics;

    magnetic resonance imaging;


    INTRODUCTION OF CLUSTER TECHNIQUES


A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.

In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.

We continue to assume that at every time step t the system is in a state ω(t), but now we also assume that it emits some (visible) symbol v(t). While sophisticated Markov models allow for the emission of continuous functions (e.g., spectra), we will restrict ourselves to the case where a discrete symbol is emitted. As with the states, we define a particular sequence of such visible symbols as V^T = {v(1), v(2), ..., v(T)}, and thus we might have V^6 = {v5, v1, v1, v5, v2, v3}.

Our model is then that in any state ωj(t) we have a probability of emitting a particular visible symbol vk(t). We denote this probability P(vk(t)|ωj(t)) = bjk. Because we have access only to the visible symbols, while the states ωi are unobservable, such a full model is called a hidden Markov model.
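A minimal generative sketch of this description (the two states, two symbols, and all probabilities below are hypothetical, not from the text): at each step the model emits a visible symbol according to bjk for the current hidden state, then moves to the next hidden state according to the transition probabilities.

import random

A = {'s1': {'s1': 0.8, 's2': 0.2},   # hidden-state transition probabilities a_ij
     's2': {'s1': 0.3, 's2': 0.7}}
B = {'s1': {'v1': 0.9, 'v2': 0.1},   # emission probabilities b_jk = P(v_k | s_j)
     's2': {'v1': 0.2, 'v2': 0.8}}

def sample(dist):
    # draw one key from a {outcome: probability} dictionary
    return random.choices(list(dist), weights=list(dist.values()))[0]

state = 's1'
visible = []
for t in range(6):
    visible.append(sample(B[state]))   # emit v(t) from the current hidden state
    state = sample(A[state])           # move to the next hidden state
print(visible)                          # only this sequence is observable; the states stay hidden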


    A concrete example

Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM).

states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}

emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely Bob is to perform a certain activity on each day. If it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk.
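To show what Alice can actually do with this model, here is a minimal Viterbi sketch (not part of the original example) that uses the dictionaries above to recover the most likely weather sequence for a given list of Bob's activities.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # prob[s] = probability of the best state sequence ending in s; path[s] = that sequence
    prob = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            new_path[s] = path[best_prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return prob[best], path[best]

print(viterbi(['walk', 'shop', 'clean'], states,
              start_probability, transition_probability, emission_probability))
# -> (0.01344, ['Sunny', 'Rainy', 'Rainy'])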


Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge. The KR can be made to be independent of the underlying knowledge model or knowledge base system (KBS), such as a semantic network.

Knowledge representation research involves analysis of how to reason accurately and effectively and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the KR and to create new KR sentences. Logic is used to supply formal semantics of how reasoning functions should be applied to the symbols in the KR system. Logic is also used to define how operators can process and reshape the knowledge. Examples of operators and operations include negation, conjunction, adverbs, adjectives, quantifiers and modal operators. The logic supplies the interpretation theory. These elements (symbols, operators, and interpretation theory) are what give sequences of symbols meaning within a KR.

A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it.

It is a set of ontological commitments, i.e., an answer to the question: in what terms should I think about the world?

It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.

It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.

It is a medium of human expression, i.e., a language in which we say things about the world.

    Some issues that arise in knowledge representation from an AI perspective are:

How do people represent knowledge? What is the nature of knowledge? Should a representation scheme deal with a particular domain or should it be general purpose? How expressive is a representation scheme or formal language? Should the scheme be declarative or procedural?

Characteristics

A good knowledge representation covers six basic characteristics:

Coverage, which means the KR covers a breadth and depth of information. Without a wide coverage, the KR cannot determine anything or resolve ambiguities.

Understandable by humans. KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals; a minimal sketch of such a hierarchy is given after this list). It should also have simple primitives that combine in complex forms.

Consistency. If John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.

Efficiency.

Easiness for modifying and updating.

Support of the intelligent activity which uses the knowledge base.
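A tiny sketch of the class-hierarchy idea mentioned above (the facts and helper function below are hypothetical): an explicit is-a hierarchy plus one simple rule lets the system infer facts, such as "polar bears are animals", that were never stated directly.

is_a = {'polar bear': 'bear', 'bear': 'animal', 'animal': 'living thing'}  # stated knowledge

def is_kind_of(x, y):
    # inference rule: follow is-a links upward until y is reached or the chain ends
    while x in is_a:
        x = is_a[x]
        if x == y:
            return True
    return False

print(is_kind_of('polar bear', 'animal'))  # True, inferred from two stated links
print(is_kind_of('bear', 'polar bear'))    # False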
