AI 2Marks questions

R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy

QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE

UNIT – I

PART -A (2 Marks)

PROBLEM SOLVING 1. What is artificial intelligence?

The exciting new effort to make computers think: machines with minds, in the full and literal sense. Artificial intelligence systematizes and automates intellectual tasks and is therefore potentially relevant to any sphere of human intellectual activity.

2. List down the characteristics of intelligent agent.

Intelligent Agents are autonomous because they function without requiring that the Console

or Management Server be running.

An Agent that services a database can run when the database is down, allowing the Agent to

start up or shut down the database.

The Intelligent Agents can independently perform administrative job tasks at any time,

without active participation by the administrator.

Similarly, the Agents can autonomously detect and react to events, allowing them to monitor

the system and execute a fixit job to correct problems without the intervention of the

administrator.

3. What do you mean by local maxima with respect to search technique?

A local maximum is a peak in the state-space landscape that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing search techniques that reach the vicinity of a local maximum are drawn upwards towards the peak but are then stuck with nowhere else to go.

4. Define Turing test.

The Turing test proposed by Alan Turing was designed to provide a satisfactory operational

definition of intelligence. Turing defined intelligent behavior as the ability to achieve human-level

performance in all cognitive tasks, sufficient to fool an interrogator.

5. List the capabilities that a computer should possess for passing a Turing Test.

The capabilities that a computer should possess for passing a Turing Test are,

Natural Language Processing;

Knowledge Representation;

Automated Reasoning;

Machine Learning.


6. Define an agent.

An agent is anything that can be viewed as perceiving its environment through Sensors and acting

upon the environment through effectors.

7. Define rational agent.

A rational agent is one that does the right thing. Here, the right thing is one that will cause the agent to be more successful. That leaves us with the problem of deciding how and when to evaluate the agent's success.

8. Define an Omniscient agent.

An omniscient agent knows the actual outcome of its action and can act accordingly; but

omniscience is impossible in reality.

9. What are the factors that a rational agent should depend on at any given time?

The factors that a rational agent should depend on at any given time are,

The performance measure that defines criterion of success;

Agent’s prior knowledge of the environment;

The actions that the agent can perform;

The agent’s percept sequence to date.

10. List the measures to determine agent’s behavior.

The measures to determine agent’s behavior are,

Performance measure,

Rationality,

Omniscience, Learning and Autonomy.

11. List the various types of agent programs.

The various types of agent programs are,

Simple reflex agent program;

Agents that keep track of the world;

Goal based agent program;

Utility based agent program.

12. List the components of a learning agent?

The components of a learning agent are,

Learning element;

Performance element;

Critic;

Problem generator.

13. List out some of the applications of Artificial Intelligence.

Some of the applications of Artificial Intelligence are,

Autonomous planning and scheduling;


Game playing;

Autonomous control;

Diagnosis;

Logistics planning;

Robotics.

14. What is depth-limited search?

Depth-limited search avoids the pitfalls of DFS by imposing a cutoff on the maximum depth of a path. This cutoff can be implemented by a special depth-limited search algorithm or by using the general search algorithm with operators that keep track of the depth.

15. Define breadth-first search.

The breadth-first search strategy is a simple strategy in which the root-node is

expanded first, and then all the successors of the root node are expanded, then their successors

and so on. It is implemented using TREE-SEARCH with an empty fringe that is a FIFO queue,

assuring that the nodes that are visited first will be expanded first.

16. Define problem formulation.

Problem formulation is the process of deciding what actions and states to consider

for a goal that has been developed in the first step of problem solving.

17. List the four components of a problem?

The four components of a problem are,

An initial state;

Actions;

Goal test;

Path cost.

18. Define iterative deepening search.

Iterative deepening is a strategy that sidesteps the issue of choosing the best depth limit by trying all possible depth limits: first depth 0, then depth 1, then depth 2, and so on.

19. Mention the criteria for the evaluation of a search strategy.

The criteria for the evaluation of a search strategy are,

Completeness;

Time complexity;

Space complexity;

Optimality.

20. Define the term percept.

The term percept refers to the agent's perceptual inputs at any given instant. An agent's percept sequence is the complete history of everything that the agent has perceived.


21. Define Constraint Satisfaction Problem (CSP).

A constraint satisfaction problem is a special kind of problem that satisfies some additional structural properties beyond the basic requirements for problems in general. In a CSP, the states are defined by the values of a set of variables and the goal test specifies a set of constraints that the values must obey.

22. List some of the uninformed search techniques.

Some of the uninformed search techniques are,

Breadth-First Search (BFS);

Depth-First Search (DFS);

Uniform Cost Search;

Depth Limited Search;

Iterative Deepening Search;

Bidirectional Search.

PART- B

1. Explain Agents in detail.

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

Percept

We use the term percept to refer to the agent's perceptual inputs at any given instant.

Percept Sequence

An agent’s percept sequence is the complete history of everything the agent has ever

perceived.

Agent function

Mathematically speaking, we say that an agent's behavior is described by the agent function that maps any given percept sequence to an action.

Properties of task environments

Fully observable vs. partially observable

Deterministic vs. stochastic

Episodic vs. sequential

Static vs. dynamic

Discrete vs. continuous

Single agent vs. multiagent

Fully observable vs. partially observable:

If an agent's sensors give it access to the complete state of the environment at each point

in time, then we say that the task environment is fully observable.


A task environment is effectively fully observable if the sensors detect all aspects that are

relevant to the choice of action.

An environment might be partially observable because of noisy and inaccurate sensors or

because parts of the state are simply missing from the sensor data.

Deterministic vs. stochastic:

If the next state of the environment is completely determined by the current state and the

action executed by the agent, then we say the environment is deterministic; otherwise, it is stochastic.

Episodic vs. sequential:

In an episodic task environment, the agent's experience is divided into atomic episodes.

Each episode consists of the agent perceiving and then performing a single action.

Crucially, the next episode does not depend on the actions taken in previous episodes.

For example, an agent that has to spot defective parts on an assembly line bases each

decision on the current part, regardless of previous decisions.

In sequential environments, on the other hand, the current decision could affect all

future decisions. Chess and taxi driving are sequential.

Discrete vs. continuous:

The discrete/continuous distinction can be applied to the state of the environment, to the

way time is handled, and to the percepts and actions of the agent.

For example, a discrete-state environment such as a chess game has a finite number of

distinct states.

Chess also has a discrete set of percepts and actions.

Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time.

Taxi-driving actions are also continuous (steering angles, etc.).

Single agent vs. multiagent:

An agent solving a crossword puzzle by itself is clearly in a single-agent environment,

whereas an agent playing chess is in a two-agent environment. As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic, continuous, and multiagent.


Examples of task environments and their characteristics.

2. Explain uninformed search strategies.

Uninformed search strategies have no additional information about states beyond that provided in the problem definition. Strategies that know whether one non-goal state is "more promising" than another are called informed search or heuristic search strategies.

There are six uninformed search strategies, as given below.

Breadth-first search;

Uniform-cost search;

Depth-first search;

Depth-limited search;

Iterative deepening search;

Bidirectional search.

BREADTH-FIRST SEARCH

Breadth-first search is a simple strategy in which the root node is expanded

first, then all successors of the root node are expanded next, then their

successors, and so on.

In general, all the nodes are expanded at a given depth in the search tree

before any nodes at the next level are expanded.

Breadth-first search is implemented by calling TREE-SEARCH with an empty fringe that is a first-in-first-out (FIFO) queue, assuring that the nodes that are visited first will be expanded first. In other words, calling TREE-SEARCH(problem, FIFO-QUEUE()) results in breadth-first search.

The FIFO queue puts all newly generated successors at the end of the queue, which means that shallow nodes are expanded before deeper nodes.

Breadth-first searches on a simple binary tree. At each stage, the node to be expanded next is

indicated by a marker.

Properties of breadth-first-search


Time and memory requirements for breadth-first search. The numbers shown assume branching factor b = 10; 10,000 nodes/second; 1,000 bytes/node.

Time complexity for BFS

Assume every state has b successors. The root of the search tree generates b nodes at the first level, each of which generates b more nodes, for a total of b^2 at the second level. Each of these generates b more nodes, yielding b^3 nodes at the third level, and so on. Now suppose that the solution is at depth d. In the worst case, we would expand all but the last node at level d, generating b^(d+1) - b nodes at level d+1. Then the total number of nodes generated is

b + b^2 + b^3 + … + b^d + (b^(d+1) - b) = O(b^(d+1)).

Every node that is generated must remain in memory, because it is either part of the fringe or is an ancestor of a fringe node. The space complexity is, therefore, the same as the time complexity.
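To make the strategy concrete, here is a minimal Python sketch of breadth-first search (an illustration, not part of the original notes); the successors and goal_test callables are an assumed interface invented for the example.

from collections import deque

def breadth_first_search(start, goal_test, successors):
    # FIFO queue of paths: shallow nodes are expanded before deeper nodes.
    if goal_test(start):
        return [start]
    fringe = deque([[start]])
    visited = {start}                  # repeated-state checking
    while fringe:
        path = fringe.popleft()        # shallowest path first
        for s in successors(path[-1]):
            if s in visited:
                continue
            if goal_test(s):
                return path + [s]
            visited.add(s)
            fringe.append(path + [s])
    return None                        # no solution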

UNIFORM-COST SEARCH:

Instead of expanding the shallowest node, uniform-cost search expands the

node n with the lowest path cost. Uniform-cost search does not care about the number of steps a

path has, but only about their total cost.
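A hedged sketch of uniform-cost search follows (illustrative only); here successors(state) is assumed to yield (next_state, step_cost) pairs.

import heapq
from itertools import count

def uniform_cost_search(start, goal_test, successors):
    tie = count()                      # tie-breaker so states are never compared
    fringe = [(0, next(tie), start, [start])]   # ordered by path cost g, not depth
    best_g = {start: 0}
    while fringe:
        g, _, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return g, path             # the lowest-cost node is expanded first
        for s, step in successors(state):
            if g + step < best_g.get(s, float("inf")):
                best_g[s] = g + step
                heapq.heappush(fringe, (g + step, next(tie), s, path + [s]))
    return None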


Properties of Uniform-cost-search

2 DEPTH-FIRST-SEARCH

Depth-first-search always expands the deepest node in the current fringe of

the search tree. The progress of the search is illustrated in figure 1.31. The search proceeds

immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the fringe, so the search "backs up" to the next shallowest node that still has unexplored successors.


Depth-first-search on a binary tree. Nodes that have been expanded and

have no descendants in the fringe can be removed from the memory; these are shown in

black. Nodes at depth 3 are assumed to have no successors and M is the only goal node.

This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO) queue, also known as a stack. Depth-first search has very modest memory requirements. It needs to store only a single path from the root to a leaf node, along with the remaining unexpanded sibling nodes for each node on the path. Once a node has been expanded, it can be removed from memory as soon as its descendants have been fully explored. For a state space with branching factor b and maximum depth m, depth-first search requires storage of only b·m + 1 nodes.
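A minimal recursive sketch of depth-first search (an illustration, under the same assumed interface as the earlier sketches):

def depth_first_search(state, goal_test, successors, path=None):
    # Expands the deepest node first; memory holds only the current path.
    path = (path or []) + [state]
    if goal_test(state):
        return path
    for s in successors(state):
        if s not in path:              # avoid cycles along the current path
            result = depth_first_search(s, goal_test, successors, path)
            if result is not None:
                return result
    return None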


Using the same assumptions as Figure 1.15, and assuming that nodes at the same depth as the goal node have no successors, we find that depth-first search would require 118 kilobytes instead of 10 petabytes, a factor of 10 billion times less space.

Drawback of Depth-first-search

The drawback of depth-first search is that it can make a wrong choice and get stuck going down a very long (or even infinite) path when a different choice would lead to a solution near the root of the search tree. For example, depth-first search will explore the entire left subtree even if node C is a goal node.

3 DEPTH-LIMITED-SEARCH:

The problem of unbounded trees can be alleviated by supplying depth-first search with a predetermined depth limit l. That is, nodes at depth l are treated as if they have no successors. This approach is called depth-limited search. The depth limit solves the infinite-path problem.

Depth-limited search will be incomplete if we choose l < d, and non-optimal if we choose l > d. Its time complexity is O(b^l) and its space complexity is O(b·l). Depth-first search can be viewed as a special case of depth-limited search with l = ∞. Sometimes, depth limits can be based on knowledge of the problem. For example, on the map of Romania there are 20 cities. Therefore, we know that if there is a solution, it must be of length 19 at the longest, so l = 19 is a possible choice. However, it can be shown that any city can be reached from any other city in at most 9 steps. This number, known as the diameter of the state space, gives us a better depth limit.

Depth-limited-search can be implemented as a simple modification to the

general tree-search algorithm or to the recursive depth-first-search algorithm. It can be noted that

the above algorithm can terminate with two kinds of failure: the standard failure value indicates no solution; the cutoff value indicates no solution within the depth limit. Depth-limited search = depth-first search with depth limit l; it returns cutoff if any path is cut off by the depth limit.
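The two failure values can be distinguished with a sentinel, as in this hedged sketch (illustrative, assuming the same interface as the sketches above):

CUTOFF = object()                      # sentinel: no solution within the depth limit

def depth_limited_search(state, goal_test, successors, limit):
    if goal_test(state):
        return [state]
    if limit == 0:
        return CUTOFF                  # nodes at depth l are treated as leaf nodes
    cutoff_occurred = False
    for s in successors(state):
        result = depth_limited_search(s, goal_test, successors, limit - 1)
        if result is CUTOFF:
            cutoff_occurred = True
        elif result is not None:
            return [state] + result
    return CUTOFF if cutoff_occurred else None   # None = standard failure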


4. ITERATIVE DEEPENING DEPTH-FIRST SEARCH:

Iterative deepening search (or iterative-deepening depth-first search) is a general strategy, often used in combination with depth-first search, that finds the best depth limit. It does this by gradually increasing the limit (first 0, then 1, then 2, and so on) until a goal is found. This will occur when the depth limit reaches d, the depth of the shallowest goal node. The algorithm is shown in Figure 1.16.

Iterative deepening combines the benefits of depth-first and breadth-first search. Like depth-first search, its memory requirements are modest: O(b·d) to be precise. Like breadth-first search, it is complete when the branching factor is finite and optimal when the path cost is a nondecreasing function of the depth of the node.
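Building on the depth-limited sketch above, iterative deepening is just a loop over increasing limits (again illustrative, reusing depth_limited_search and CUTOFF from that sketch):

from itertools import count

def iterative_deepening_search(start, goal_test, successors):
    for limit in count():              # depth 0, then 1, then 2, and so on
        result = depth_limited_search(start, goal_test, successors, limit)
        if result is not CUTOFF:
            return result              # a solution path, or None (no solution)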


Four iterations of iterative deepening search on a binary tree.


Iterative deepening search is not as wasteful as it might seem.

Figure: iterative deepening search from start node S on a small tree, shown for depth limits 0, 1 and 2.

Properties of iterative deepening search


In general, iterative deepening is the preferred uninformed search method when there is a large search space and the depth of the solution is not known.

5 BIDIRECTIONAL SEARCH:

The idea behind bidirectional search is to run two simultaneous searches, one forward from the initial state and the other backward from the goal, stopping when the two searches meet in the middle (Figure 1.18).

The motivation is that b^(d/2) + b^(d/2) is much less than b^d; or, in the figure, the area of the two small circles is less than the area of one big circle centered on the start and reaching to the goal.

A schematic view of a bidirectional search that is about to succeed, when a branch from the start node meets a branch from the goal node.

6 COMPARING UNINFORMED SEARCH STRATEGIES:

Figure 1.19 Comparing Uninformed Search Strategies


Evaluation of search strategies. b is the branching factor; d is the depth of the shallowest solution; m is the maximum depth of the search tree; l is the depth limit. Superscript caveats are as follows: (a) complete if b is finite; (b) complete if step costs ≥ ε for some positive ε; (c) optimal if step costs are all identical; (d) if both directions use breadth-first search.

3. Explain informed search strategies.

An informed search strategy is one that uses problem-specific knowledge beyond the definition of the problem itself. It can find solutions more efficiently than an uninformed strategy.

Best-first search;

Heuristic function;

Greedy-Best First Search(GBFS);

A* search;

Memory Bounded Heuristic Search.

INFORMED (HEURISTIC) SEARCH STRATEGIES:


Best-first search

Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is selected for expansion based on an evaluation function f(n). The node with the lowest evaluation is selected for expansion, because the evaluation measures the distance to the goal.

This can be implemented using a priority-queue, a data structure that will maintain the

fringe in ascending order of f-values.

Heuristic functions:

A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on available information, in order to decide which branch to follow during a search.

The key component of a best-first search algorithm is a heuristic function, denoted h(n):

h(n) = estimated cost of the cheapest path from node n to a goal node.

For example, in Romania, one might estimate the cost of the cheapest path from Arad to

Bucharest via a straight-line distance from Arad to Bucharest.

Heuristic function is the most common form in which additional knowledge is imparted

to the search algorithm.

2.1 GREEDY BEST-FIRST SEARCH:

Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly.

It evaluates the nodes by using the heuristic function f(n) = h(n).

Taking the example of Route-finding problems in Romania, the goal is to reach

Bucharest starting from the city Arad. We need to know the straight-line distances to Bucharest

from various cities as shown in Figure 2.1. For example, the initial state is In (Arad), and the

straight line distance heuristic hSLD (In (Arad)) is found to be 366.


Using the straight-line distance heuristic hSLD, the goal state can be reached faster.

Figure 2.1 Values of hSLD: straight-line distances to Bucharest


Stages in greedy best-first search for Bucharest using the straight-line distance heuristic hSLD. Nodes are labeled with their h-values.

Figure 2.2 shows the progress of greedy best-first search using hSLD to find a path from Arad to

Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to

Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras,


because it is closest. Fagaras in turn generates Bucharest, which is the goal.


Properties of greedy search:

Complete: No; it can get stuck in loops. It is complete in a finite space with repeated-state checking.

Time: O(b^m), but a good heuristic can give dramatic improvement.

Space: O(b^m); it keeps all nodes in memory.

Optimal: No.

Greedy best-first search is not optimal, and it is incomplete. The worst-case time and space complexity is O(b^m), where m is the maximum depth of the search space.

2.2 A* SEARCH:

A* search is the most widely used form of best-first search. The evaluation function f(n) is obtained by combining

g(n) = the cost to reach the node,
h(n) = the estimated cost to get from the node to the goal:
f(n) = g(n) + h(n).

A* search is both complete and optimal. A* is optimal if h(n) is an admissible heuristic, that is, provided that h(n) never overestimates the cost to reach the goal. An obvious example of an admissible heuristic is the straight-line distance hSLD that we used in getting to Bucharest; it cannot be an overestimate. The progress of an A* tree search for Bucharest is shown in Figure 2.2.
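The two evaluation functions can share one tree-search skeleton, as in this hedged sketch (illustrative; use_g is an invented switch, and no repeated-state checking is done, matching tree search):

import heapq
from itertools import count

def best_first_search(start, goal_test, successors, h, use_g=True):
    # use_g=False gives greedy best-first search (f = h);
    # use_g=True gives A* search (f = g + h).
    tie = count()
    fringe = [(h(start), next(tie), 0, start, [start])]
    while fringe:
        f, _, g, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return g, path
        for s, step in successors(state):
            g2 = g + step
            f2 = (g2 if use_g else 0) + h(s)
            heapq.heappush(fringe, (f2, next(tie), g2, s, path + [s]))
    return None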

Recursive Best-first Search (RBFS):

Recursive best-first search is a simple recursive algorithm that attempts to mimic the

operation of standard best-first search, but using only linear space.

Its structure is similar to that of recursive depth-first search, but rather than continuing indefinitely down the current path, it keeps track of the f-value of the best alternative path available from any ancestor of the current node.

If the current node exceeds this limit, the recursion unwinds back to the alternative path.

As the recursion unwinds, RBFS replaces the f-value of each node along the path with the

best f-value of its children.

The pseudocode below shows how RBFS reaches Bucharest:

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution or failure
    return RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f_limit) returns a solution or failure and a new f-cost limit
    if GOAL-TEST[problem](STATE[node]) then return node
    successors ← EXPAND(node, problem)
    if successors is empty then return failure, ∞
    for each s in successors do
        f[s] ← max(g(s) + h(s), f[node])
    repeat
        best ← the lowest f-value node in successors
        if f[best] > f_limit then return failure, f[best]
        alternative ← the second-lowest f-value among successors
        result, f[best] ← RBFS(problem, best, min(f_limit, alternative))
        if result ≠ failure then return result


Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each recursive call is shown on top of each current node. (a) The path via Rimnicu Vilcea is followed until the current best leaf (Pitesti) has a value that is worse than the best alternative path (Fagaras).

(b) The recursion unwinds and the best leaf value of the forgotten subtree (417) is backed up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450.


(c) The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed up to Fagaras; then Rimnicu Vilcea is expanded. This time, because the best alternative path (through Timisoara) costs at least 447, the expansion continues to Bucharest.

RBFS Evaluation:

RBFS is somewhat more efficient than IDA*, but it still suffers from excessive node regeneration (mind changes). Like A*, it is optimal if h(n) is admissible. Its space complexity is O(bd), whereas IDA* retains only a single number (the current f-cost limit). Its time complexity is difficult to characterize: it depends on the accuracy of h(n) and on how often the best path changes. Both IDA* and RBFS suffer from using too little memory.

2 HEURISTIC FUNCTIONS:

A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on available information, in order to decide which branch to follow during a search.

A typical instance of the 8-puzzle.

The solution is 26 steps long.


The 8-puzzle:

The 8-puzzle is an example of Heuristic search problem. The object of the puzzle is to

slide the tiles horizontally or vertically into the empty space until the configuration matches the

goal configuration (Figure 2.4)

The average cost for a randomly generated 8-puzzle instance is about 22 steps. The

branching factor is about 3. (When the empty tile is in the middle, there are four possible moves;

when it is in the corner there are two; and when it is along an edge there are three).

This means that an exhaustive search to depth 22 would look at about 3^22 ≈ 3.1 × 10^10 states. By keeping track of repeated states, we could cut this down by a factor of about 170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. This is a manageable number, but the corresponding number for the 15-puzzle is roughly 10^13.

If we want to find the shortest solutions by using A*, we need a heuristic function that

never overestimates the number of steps to the goal.

The two commonly used heuristic functions for the 8-puzzle are:

h1 = the number of misplaced tiles.

For Figure 2.4, all eight tiles are out of position, so the start state would have h1 = 8. h1 is an admissible heuristic.

h2 = the sum of the distances of the tiles from their goal positions. This is called the city-block distance or Manhattan distance.

h2 is admissible, because all any move can do is move one tile one step closer to the goal. Tiles 1 to 8 in the start state give a Manhattan distance of

h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.

Neither of these overestimates the true solution cost, which is 26.
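Both heuristics are easy to compute; here is an illustrative Python sketch, with states written as length-9 tuples read row by row and 0 standing for the blank (a representation assumed for the example):

def h1(state, goal):
    # Number of misplaced tiles; the blank is not counted.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    # Manhattan distance: sum over tiles of |row offset| + |column offset|.
    pos = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        gr, gc = pos[tile]
        total += abs(i // 3 - gr) + abs(i % 3 - gc)
    return total

# Assuming Figure 2.4 shows the standard instance:
# start = (7, 2, 4, 5, 0, 6, 8, 3, 1); goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
# then h1(start, goal) == 8 and h2(start, goal) == 18, matching the text.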

The Effective Branching factor:

One way to characterize the quality of a heuristic is the effective branching factor b*. If the total number of nodes generated by A* for a particular problem is N, and the solution depth is d, then b* is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes. Thus,

N + 1 = 1 + b* + (b*)^2 + … + (b*)^d.


For example, if A* finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92.

A well-designed heuristic would have a value of b* close to 1, allowing fairly large problems to be solved.
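Since the defining equation has no closed form for b*, it can be solved numerically, as in this hedged sketch (illustrative only):

def effective_branching_factor(N, d):
    # Solve N + 1 = 1 + b* + (b*)**2 + ... + (b*)**d for b* by bisection,
    # i.e. find b such that b + b^2 + ... + b^d = N.
    def generated(b):
        return sum(b ** i for i in range(1, d + 1))
    lo, hi = 1.0, float(N)
    for _ in range(100):
        mid = (lo + hi) / 2
        if generated(mid) < N:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# effective_branching_factor(52, 5) is approximately 1.92, as in the example above.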

To test the heuristic functions h1 and h2, 1200 random problems were generated with solution lengths from 2 to 24 and solved with iterative deepening search and with A* search using both h1 and h2. Figure 2.5 gives the average number of nodes expanded by each strategy and the effective branching factor.

The results suggest that h2 is better than h1, and far better than using iterative deepening search. For a solution length of 14, A* with h2 is 30,000 times more efficient than uninformed iterative deepening search.

Comparison of search costs and effective branching factors for the ITERATIVE-DEEPENING-SEARCH and A* algorithms with h1 and h2. Data are averaged over 100 instances of the 8-puzzle, for various solution lengths.

Inventing admissible heuristic functions:

Relaxed problems:

A problem with fewer restrictions on the actions is called a relaxed problem.

The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem.


If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.

If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.

4. Explain about Local Search Algorithms And Optimization Problems.

In many optimization problems, the path to the goal is irrelevant; the goal state itself is

the solution

For example, in the 8-queens problem, what matters is the final configuration of queens,

not the order in which they are added.

In such cases, we can use local search algorithms. They operate using a single current

state (rather than multiple paths) and generally move only to neighbors of that state.

The important applications of these classes of problems are

(a) Integrated-circuit design,
(b) Factory-floor layout,
(c) Job-shop scheduling,
(d) Automatic programming,
(e) Telecommunications network optimization,
(f) Vehicle routing, and
(g) Portfolio management.

Key advantages of Local Search Algorithms:

(1) They use very little memory – usually a constant amount;

(2) They can often find reasonable solutions in large or infinite (continuous) state spaces for

which systematic algorithms are unsuitable.

OPTIMIZATION PROBLEMS:

In addition to finding goals, local search algorithms are useful for solving pure

optimization problems, in which the aim is to find the best state according to an objective

function.


State Space Landscape

A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function).

If elevation corresponds to cost, then the aim is to find the lowest valley, a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak, a global maximum.

Local search algorithms explore this landscape. A complete local search algorithm

always finds a goal if one exists; an optimal algorithm always finds a global

minimum/maximum.

Figure 2.6 A one dimensional state space landscape in which elevation corresponds to

the objective function. The aim is to find the global maximum. Hill climbing search

modifies the current state to try to improve it, as shown by the arrow. The various

topographic features are defined in the text.

1. Hill-climbing search:

The hill-climbing search algorithm, shown in Figure 2.7, is simply a loop that continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value.


function HILL-CLIMBING(problem) returns a state that is a local maximum
    inputs: problem, a problem
    local variables: current, a node
                     neighbor, a node
    current ← MAKE-NODE(INITIAL-STATE[problem])
    loop do
        neighbor ← a highest-valued successor of current
        if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
        current ← neighbor

Figure 2.7 The hill-climbing search algorithm (steepest ascent version), which is the

most basic local search technique. At each step the current node is replaced by the best

neighbor; the neighbor with the highest VALUE. If the heuristic cost estimate h is used, we

could find the neighbor with the lowest h.

Hill-climbing is sometimes called greedy local search because it grabs a good neighbor state

without thinking ahead about where to go next. Greedy algorithms often perform quite well.
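A minimal Python rendering of the steepest-ascent loop above (illustrative; the value and neighbors callables are assumed interfaces for the example):

def hill_climbing(initial, value, neighbors):
    current = initial
    while True:
        # Pick the highest-valued successor of the current state.
        neighbor = max(neighbors(current), key=value, default=current)
        if value(neighbor) <= value(current):
            return current             # a peak: possibly only a local maximum
        current = neighbor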

Problems with hill-climbing:

Hill-climbing often gets stuck for the following reasons:

Local maxima: a local maximum is a peak that is higher than each of its neighboring

states, but lower than the global maximum. Hill-climbing algorithms that reach the

vicinity of a local maximum will be drawn upwards towards the peak, but will then be

stuck with nowhere else to go

Ridges: a ridge is shown in Figure 2.8. Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.

Plateaux: A plateau is an area of the state space landscape where the evaluation function

is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder,

from which it is possible to make progress.


Figure 2.8 Illustration of why ridges cause difficulties for hill climbing. The grid of states (dark circles) is superimposed on a ridge rising from left to right, creating a sequence of local maxima that are not directly connected to each other. From each local maximum, all the available options point downhill.

Hill-climbing variations

Stochastic hill-climbing: Random selection among the uphill moves. The selection

probability can vary with the steepness of the uphill move.

First-choice hill-climbing: implements stochastic hill climbing by generating successors randomly until one better than the current state is found.

Random-restart hill-climbing: Tries to avoid getting stuck in local maxima.

2. Simulated annealing search

A hill-climbing algorithm that never makes "downhill" moves towards states with lower value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum. In contrast, a purely random walk, that is, moving to a successor chosen uniformly at random from the set of successors, is complete but extremely inefficient.

Simulated annealing is an algorithm that combines hill-climbing with a random walk in

some way that yields both efficiency and completeness.

Figure 2.9 shows the simulated annealing algorithm. It is quite similar to hill climbing; instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1. The probability decreases exponentially with the "badness" of the move, the amount ΔE by which the evaluation is worsened.

Simulated annealing was first used extensively to solve VLSI layout problems in the

early 1980s. It has been applied widely to factory scheduling and other large-scale optimization

tasks.

Figure 2.9 The simulated annealing search algorithm, a version of stochastic hill climbing

where some downhill moves are allowed.
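An illustrative Python sketch of the algorithm in Figure 2.9 (the schedule, value and neighbors interfaces are assumptions for the example; neighbors must return a non-empty sequence):

import math
import random
from itertools import count

def simulated_annealing(start, value, neighbors, schedule):
    current = start
    for t in count(1):
        T = schedule(t)                # temperature decreases over time
        if T <= 0:
            return current
        nxt = random.choice(neighbors(current))
        delta_e = value(nxt) - value(current)
        # Always accept an improving move; accept a worsening move with
        # probability e^(delta_e / T), which shrinks as the move gets worse.
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt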

3. Genetic algorithms:

A Genetic algorithm (or GA) is a variant of stochastic beam search in which successor

states are generated by combining two parent states, rather than by modifying a single state.

Like beam search, genetic algorithms begin with a set of k randomly generated states, called the population.

Each state, or individual, is represented as a string over a finite alphabet, most commonly a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 × log2 8 = 24 bits.


Figure 2.10 The genetic algorithm. The initial population in (a) is ranked by the fitness

function in (b), resulting in pairs for mating in (c). They produce offspring in (d), which are

subjected to mutation in (e).

Figure 2.10 shows a population of four 8-digit strings representing 8-queen states

Fitness function should return higher values for better states.

Cross over: for each pair to be mated, a crossover point is randomly chosen from the positions in the string.

Offspring: Created by crossing over the parent strings at the crossover point.

function GENETIC_ALGORITHM(population, FITNESS_FN) returns an individual
    inputs: population, a set of individuals
            FITNESS_FN, a function which determines the quality of the individual
    repeat
        new_population ← empty set
        loop for i from 1 to SIZE(population) do
            x ← RANDOM_SELECTION(population, FITNESS_FN)
            y ← RANDOM_SELECTION(population, FITNESS_FN)
            child ← REPRODUCE(x, y)
            if (small random probability) then child ← MUTATE(child)
            add child to new_population
        population ← new_population
    until some individual is fit enough or enough time has elapsed
    return the best individual

Figure 2.11 Genetic algorithms. The algorithm is the same as the one diagrammed in Figure 2.10.
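The REPRODUCE and MUTATE steps can be sketched in Python as follows (illustrative, for 8-digit strings like the 8-queens states above; the alphabet parameter is an assumption for the example):

import random

def reproduce(x, y):
    # Single-point crossover: split both parents at a random point and splice.
    c = random.randrange(1, len(x))
    return x[:c] + y[c:]

def mutate(child, alphabet="12345678"):
    # Replace one randomly chosen position with a random symbol.
    i = random.randrange(len(child))
    return child[:i] + random.choice(alphabet) + child[i + 1:]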

Local search in continuous spaces:

We have considered algorithms that work only in discrete environments, but real-world environments are continuous. Local search in continuous spaces amounts to maximizing a continuous objective function in a multi-dimensional vector space. This is hard to do in general and often resists a closed-form solution. One remedy is to discretize the space near each state and apply a discrete local search strategy (e.g., stochastic hill climbing or simulated annealing).

Online search interleaves computation and action: compute, act, observe, compute, and so on. Online search is good for dynamic, semi-dynamic and stochastic domains, where offline search would yield exponentially many contingencies. Online search is necessary for exploration problems, in which the states and actions are unknown to the agent; the agent uses its actions as experiments to determine what to do.

5. Explain CSP in detail.

A constraint satisfaction problem is a special kind of problem that satisfies some additional structural properties beyond the basic requirements for problems in general. In a CSP, the states are defined by the values of a set of variables and the goal test specifies a set of constraints that the values must obey.

CSP can be viewed as a standard search problem as follows:

Initial state: the empty assignment {}, in which all variables are

unassigned.

Successor function: a value can be assigned to any unassigned variable,

provided that it does not conflict with previously assigned variables.

Goal test: the current assignment is complete.

Path cost: a constant cost for every step.

Varieties of CSPs:

Discrete variables.

CSPs with continuous domains.

Varieties of constraints :

Unary constraints involve a single variable.

Binary constraints involve pairs of variables.


Higher order constraints involve 3 or more variables.

A Constraint Satisfaction Problem (or CSP) is defined by a set of variables, X1, X2, …, Xn, and a set of constraints, C1, C2, …, Cm. Each variable Xi has a nonempty domain Di of possible values. Each constraint Ci involves some subset of the variables and specifies the allowable combinations of values for that subset.

A state of the problem is defined by an assignment of values to some or all of the variables, {Xi = vi, Xj = vj, …}.

An assignment that does not violate any constraints is called a consistent or legal

assignment. A complete assignment is one in which every variable is mentioned, and a solution

to a CSP is a complete assignment that satisfies all the constraints.

Some CSPs also require a solution that maximizes an objective function.

Example for Constraint Satisfaction Problem:

The map of Australia shows each of its states and territories. We are given the task of coloring each region either red, green, or blue in such a way that no neighboring regions have the same color.

To formulate this as CSP, we define the variable to be the regions: WA, NT, Q, NSW, V,

SA, and T.

The domain of each variable is the set {red, green, blue}. The constraints require neighboring regions to have distinct colors; for example, the allowable combinations for WA and NT are the pairs {(red,green), (red,blue), (green,red), (green,blue), (blue,red), (blue,green)}. The constraint can also be represented more succinctly as the inequality WA ≠ NT, provided the constraint satisfaction algorithm has some way to evaluate such expressions.

There are many possible solutions, such as

{WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red}.

It is helpful to visualize a CSP as a constraint graph, as shown in Figure 2.13.

The nodes of the graph correspond to the variables of the problem and the arcs correspond to constraints.


Figure 2.13 The map coloring problem represented as a constraint graph.

CSP can be viewed as a standard search problem as follows:

Initial state: the empty assignment {}, in which all variables are unassigned.

Successor function: a value can be assigned to any unassigned variable, provided that it

does not conflict with previously assigned variables.

Goal test: the current assignment is complete.


Path cost: a constant cost (E.g., 1) for every step.

Every solution must be a complete assignment and therefore appears at depth n if there

are n variables.

Varieties of CSPs

(i) Discrete variables

a. Finite domains:

The simplest kind of CSP involves variables that are discrete and have finite domains. Map-coloring problems are of this kind. The 8-queens problem can also be viewed as a finite-domain CSP, where the variables Q1, Q2, …, Q8 are the positions of the queens in columns 1, …, 8, and each variable has the domain {1, 2, 3, 4, 5, 6, 7, 8}.

If the maximum domain size of any variable in a CSP is d, then the number of possible complete assignments is O(d^n), that is, exponential in the number of variables. Finite-domain CSPs include Boolean CSPs, whose variables can be either true or false.

b. Infinite domains:

Discrete variables can also have infinite domains, for example, the set of integers or the set of strings. With infinite domains, it is no longer possible to describe constraints by enumerating all allowed combinations of values. Instead, a constraint language of algebraic inequalities is used, such as StartJob1 + 5 ≤ StartJob3.

(ii) CSPs with continuous domains

CSPs with continuous domains are very common in the real world. For example, in the field of operations research, the scheduling of experiments on the Hubble Telescope requires very precise timing of observations;

The start and finish of each observation and maneuver are continuous-valued variables

that must obey a variety of astronomical, precedence and power constraints.

The best known category of continuous-domain CSPs is that of linear programming

problems, where the constraints must be linear inequalities forming a convex region.


Varieties of constraints:

(i) Unary constraints involve a single variable.
Example: SA ≠ green

(ii) Binary constraints involve pairs of variables.
Example: SA ≠ WA

(iii) Higher-order constraints involve 3 or more variables.
Example: cryptarithmetic puzzles.

Figure 2.14 Cryptarithmetic problems. (a) Each letter stands for a distinct digit; the aim is to find a substitution of digits for letters such that the resulting sum is arithmetically correct, with the added restriction that no leading zeros are allowed. (b) The constraint hypergraph for the cryptarithmetic problem, showing the Alldiff constraint as well as the column addition constraints. Each constraint is a square box connected to the variables it contains.

5 BACKTRACKING SEARCH FOR CSPS

The term backtracking search is used for depth-first search that chooses values for one

variable at a time and backtracks when a variable has no legal values left to assign.
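A hedged Python sketch of such a backtracking search, applied to the map-coloring CSP used in this section (the consistent-checking interface is an assumption for the example):

def backtracking_search(variables, domains, consistent, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment              # complete assignment: a solution
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtracking_search(variables, domains, consistent, assignment)
            if result is not None:
                return result
            del assignment[var]        # no legal values below: backtrack
    return None

# Map coloring from the text: neighboring regions must have different colors.
NEIGHBORS = {"WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
             "SA": {"WA", "NT", "Q", "NSW", "V"}, "Q": {"NT", "SA", "NSW"},
             "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set()}

def different_colors(var, value, assignment):
    return all(assignment.get(n) != value for n in NEIGHBORS[var])

solution = backtracking_search(list(NEIGHBORS),
                               {v: ["red", "green", "blue"] for v in NEIGHBORS},
                               different_colors)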


Figure A simple backtracking algorithm for constraint satisfaction problem.

Figure Part of search tree generated by simple backtracking for the map coloring

problem.

Any backtracking search should answer the following questions:

1. Which variable should be assigned next, and in what order should its values be tried?

2. What are the implications of the current variable assignments for the other unassigned variables?

3. When a path fails, that is, a state is reached in which a variable has no legal values, can the search avoid repeating this failure in subsequent paths?


Variable and value ordering

Choosing the variable with the fewest legal values is called the minimum-remaining-values (MRV) heuristic. It is also called the most-constrained-variable or fail-first heuristic.

Degree heuristic: if a tie occurs among the most constrained variables, then the most constraining variable is chosen, i.e., choose the variable with the most constraints on the remaining variables.

Once a variable has been selected, choose the least-constraining value: the one that rules out the fewest values in the remaining variables.

PROPAGATING INFORMATION THROUGH CONSTRAINTS:

So far our search algorithm considers the constraints on a variable only at the time that

the variable is chosen by SELECT-UNASSIGNED-VARIABLE. But by looking at some of the

constraints earlier in the search, or even before the search has started, we can drastically reduce

the search space.

1. Forward checking

One way to make better use of constraints during search is called forward checking.

Whenever a variable X is assigned, the forward-checking process looks at each unassigned variable Y that is connected to X by a constraint and deletes from Y's domain any value that is inconsistent with the value chosen for X. The following figure shows the progress of a map-coloring search with forward checking.


Figure 2.17 Forward checking

2. Constraint propagation:

Although forward checking detects many inconsistencies, it does not detect all of them.

Constraint propagation is the general term for propagating the implications of a constraint

on one variable onto other variables.

Figure Constraint propagation
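The forward-checking step described above can be sketched in Python for the map-coloring (inequality) constraint (an illustration; domains is assumed to map each variable to a list of remaining values):

def forward_check(domains, var, value, neighbors):
    # After assigning var = value, delete that value from each neighbor's
    # domain and report failure as soon as any domain becomes empty.
    new_domains = {v: list(d) for v, d in domains.items()}
    new_domains[var] = [value]
    for n in neighbors[var]:
        if value in new_domains[n]:
            new_domains[n].remove(value)
            if not new_domains[n]:
                return None            # inconsistency detected early
    return new_domains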


5 LOCAL SEARCH FOR CSPS

6 THE STRUCTURE OF PROBLEMS

Problem Structure

Independent Subproblems

Figure Independent Subproblems


Tree-Structured CSPs

Figure Tree-Structured CSPs


QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE

UNIT – II

PART -A (2 Marks)

LOGICAL REASONING

1. What factors determine the selection of forward or backward reasoning

approach for an AI problem?

A search procedure must find a path between the initial and goal states. There are two directions in which a search process could proceed; the choice depends on the problem. Reason forward from the initial states: begin by forming the root of the search tree from the initial state. Generate the next level of the tree by finding all the rules whose left sides match the root node, and use their right sides to generate the siblings. Repeat the process until a configuration that matches the goal state is generated. Reason backward from the goal states: begin with the goal configuration at the root of the search tree and generate the next level by finding all the rules whose right sides match it, working back toward the initial state.

2. What are the limitations in using propositional logic to represent the

knowledge base?

Formalise the following English sentences:

Al is small

Ted is small

Someone is small

Everyone is small

No-one is not small

Propositional logic would represent each of these as a different proposition, so the five propositions might be represented by P, Q, R, S and T.

What this representation is missing is the similarity between the propositions: they are all concerned with the relation 'small'.

Predicate logic allows relations and quantification (which allows the representation of English descriptors like someone, everyone and no-one).

3. What are Logical agents?

Logical agents apply inference to a knowledge base to derive new information and

make decisions.

4.What is first-order logic?

The first-order logic is sufficiently expressive to represent a good deal of our

commonsense knowledge. It also either subsumes or forms the foundation of many other

representation languages.


5. What is a symbol?

The basic syntactic elements of first-order logic are the symbols. It stands for

objects, relations and functions.

6. What are the types of Quantifiers?

The types of Quantifiers are,

Universal Quantifiers;

Existential Quantifiers.

7. What are the three kinds of symbols?

The three kinds of symbols are,

Constant symbols standing for objects;

Predicate symbols standing for relations;

Function symbols standing for functions.

8. What is Logic?

Logic is one which consists of,

A formal system for describing states of affairs, consisting of (a) syntax and (b) semantics;

Proof theory: a set of rules for deducing the entailments of a set of sentences.

9. Define a Sentence?

Each individual representation of facts is called a sentence. The sentences are

expressed in a language called as knowledge representation language.

10. Define a Proof.

A sequence of applications of inference rules is called a proof. Finding a proof is exactly like finding a solution to a search problem. If the successor function is defined to generate all possible applications of inference rules, then the search algorithms can be applied to find proofs.

11. Define Interpretation

Interpretation specifies exactly which objects, relations and functions are referred to by the constant, predicate, and function symbols.

12. What are the three levels in describing knowledge based agent?

The three levels in describing knowledge based agent

Logical level;

Implementation level;

Knowledge level or epistemological level.


13. Define Syntax?

Syntax is the arrangement of words. The syntax of a knowledge representation language describes the possible configurations that can constitute sentences; that is, the syntax of the language describes how to make sentences.

14. Define Semantics

The semantics of the language defines the truth of each sentence with respect to

each possible world. With this semantics, when a particular configuration exists within an agent,

the agent believes the corresponding sentence.

15. Define the Modus Ponens rule in propositional logic.

Modus Ponens is the standard pattern of inference that can be applied to derive chains of conclusions that lead to the desired goal: from an implication α ⇒ β and its premise α, the conclusion β can be inferred.

16. Define a knowledge Base.

Knowledge base is the central component of knowledge base agent and it is

described as a set of representations of facts about the world.

17. Define an inference procedure.

An inference procedure reports whether or not a sentence α is entailed by a knowledge base, given the knowledge base and the sentence α. An inference procedure i can be described by the set of sentences that it can derive.

If i can derive α from the knowledge base, we write KB ⊢i α, pronounced "α is derived from KB by i" or "i derives α from KB".

18. What are the basic Components of propositional logic?

The basic components of propositional logic are,

Logical constants (True, False);

Propositional symbols (P, Q);

Logical connectives (∧, ∨, ¬, ⇒, ⇔).

19. Define the AND-Elimination rule in propositional logic.

The AND-Elimination rule states that from a given conjunction it is possible to infer any of the conjuncts: from α1 ∧ α2 ∧ … ∧ αn, infer αi.

20. Define AND-Introduction rule in propositional logic.

The AND-Introduction rule states that from a list of sentences we can infer their conjunction: from α1, α2, …, αn, infer α1 ∧ α2 ∧ … ∧ αn.


21. What is forward chaining?

A deduction to reach a conclusion from a set of antecedents is called forward

chaining. In other words, the system starts from a set of facts, and a set of rules, and tries to find

the way of using these rules and facts to deduce a conclusion or come up with a suitable course

of action.

22. What is backward chaining?

In backward chaining, we start from a conclusion, which is the hypothesis we wish

to prove, and we aim to show how that conclusion can be reached from the rules and facts in the

data base.

PART - B

1. Explain in detail about knowledge engineering process in FOL.

KNOWLEDGE REPRESENTATION

Intelligent agents need knowledge about the world in order to reach good

decisions.

Knowledge is contained in agents in the form of sentences in a knowledge

representation language that are stored in knowledge base.

Logic is the formal systematic study of the principles of valid inference and

correct reasoning.

A system of inference rules and axioms allows certain formulas to be derived,

called theorems: which may be interpreted as true propositions.

Knowledge representation languages should be declarative, compositional,

expressive, context-independent, and unambiguous.

FIRST ORDER LOGIC:

Syntax

Let us first introduce the symbols, or alphabet, being used. Beware that there are all sorts

of slightly different ways to define FOL.

Alphabet

Logical Symbols: These are symbols that have a standard meaning, like: AND, OR,

NOT, ALL, EXISTS, IMPLIES, IFF, FALSE, =.


Non-Logical Symbols: divided in:

Constants:

Predicates: 1-ary, 2-ary, n-ary. These are usually just identifiers.

Functions: 0-ary, 1-ary, 2-ary, n-ary. These are usually just identifiers. 0-ary

functions are also called individual constants.

Where predicates return true or false, functions can return any value.

Variables: Usually an identifier.

One needs to be able to distinguish the identifiers used for predicates, functions, and

variables by using some appropriate convention, for example, capitals for function and predicate

symbols and lower cases for variables.

Terms

A Term is either an individual constant (a 0-ary function), or a variable, or an n-ary

function applied to n terms: F(t1 t2 ..tn)

[We will use both the notation F(t1 t2 ..tn) and the notation (F t1 t2 .. tn)]

Atomic Formulae

An Atomic Formula is either FALSE or an n-ary predicate applied to n terms: P(t1 t2 ..

tn). In the case that "=" is a logical symbol in the language, (t1 = t2), where t1 and t2 are terms, is

an atomic formula.

Literals

A Literal is either an atomic formula (a Positive Literal), or the negation of an atomic

formula (a Negative Literal). A Ground Literal is a variable-free literal.

Clauses

A Clause is a disjunction of literals. A Ground Clause is a variable-free clause. A Horn

Clause is a clause with at most one positive literal. A Definite Clause is a Horn Clause with

exactly one positive Literal.

Notice that implications are equivalent to Horn or Definite clauses:

(A IMPLIES B) is equivalent to ( (NOT A) OR B)


(A AND B IMPLIES FALSE) is equivalent to ((NOT A) OR (NOT B)).

Formulae

A Formula is either:

• an atomic formula, or

• a Negation, i.e. the NOT of a formula, or

• a Conjunctive Formula, i.e. the AND of formulae, or

• a Disjunctive Formula, i.e. the OR of formulae, or

• an Implication, that is a formula of the form (formula1 IMPLIES formula2), or

• an Equivalence, that is a formula of the form (formula1 IFF formula2), or

• a Universally Quantified Formula, that is a formula of the form (ALL variable formula).

We say that occurrences of variable are bound in formula [we should be more precise].

or

• an Existentially Quantified Formula, that is a formula of the form (EXISTS variable

formula). We say that occurrences of variable are bound in formula [we should be more

precise].

An occurrence of a variable in a formula that is not bound is said to be free. A formula

where all occurrences of variables are bound is called a closed formula, one where all variables

are free is called an open formula.

A formula that is a conjunction of clauses is said to be in Clausal Form. We shall see that there is a sense in which every formula is equivalent to a clausal form.

Often it is convenient to refer to terms and formulae with a single name. Form or

Expression is used to this end.

Substitutions

Given a term s, the result [substitution instance] of substituting a term t in s for a variable x,

s[t/x], is:

t, if s is the variable x

y, if s is the variable y different from x


F(s1[t/x] s2[t/x] .. sn[t/x]), if s is F(s1 s2 .. sn).

Given a formula A, the result (substitution instance) of substituting a term t in A for a

variable x, A[t/x], is:

FALSE, if A is FALSE,

P(t1[t/x] t2[t/x] .. tn[t/x]), if A is P(t1 t2 .. tn),

(B[t/x] AND C[t/x]) if A is (B AND C), and similarly for the other connectives,

(ALL x B) if A is (ALL x B), (similarly for EXISTS),

(ALL y B[t/x]), if A is (ALL y B) and y is different from x (similarly for EXISTS).

The substitution [t/x] can be seen as a map from terms to terms and from formulae to

formulae. We can define similarly [t1/x1 t2/x2 .. tn/xn], where t1 t2 .. tn are terms and x1 x2 .. xn

are variables, as a map, the [simultaneous] substitution of x1 by t1, x2 by t2, .., of xn by tn. [If

all the terms t1 .. tn are variables, the substitution is called an alphabetic variant, and if they are

ground terms, it is called a ground substitution.] Note that a simultaneous substitution is not the

same as a sequential substitution.
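As an illustration, the definition above can be turned into a short Python sketch (a hypothetical representation, not from the original text: constants and variables as strings, an n-ary application F(t1 .. tn) as the tuple ('F', t1, .., tn)). Because every variable is looked up in the original map, the substitution is simultaneous rather than sequential.

    def is_variable(t):
        # Convention from the text: lower-case identifiers are variables.
        return isinstance(t, str) and t[0].islower()

    def subst(theta, s):
        # Apply the simultaneous substitution theta (a dict {var: term}) to term s.
        if is_variable(s):
            return theta.get(s, s)            # t if s is a substituted variable, else s
        if isinstance(s, tuple):              # compound term F(s1 .. sn)
            return (s[0],) + tuple(subst(theta, a) for a in s[1:])
        return s                              # a constant

    # Example: F(x, G(y))[A/x, B/y] = F(A, G(B))
    print(subst({'x': 'A', 'y': 'B'}, ('F', 'x', ('G', 'y'))))
    # -> ('F', 'A', ('G', 'B'))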

2. SYNTAX AND SEMANTICS OF FOL:

FOL have objects in them.

The domain of a model is the set of objects it contains; these objects are sometimes called

as domain elements

SEMANTICS :

Before we can continue in the "syntactic" domain with concepts like Inference Rules and

Proofs, we need to clarify the Semantics, or meaning, of First Order Logic.

An L-Structure or Conceptualization for a language L is a structure M= (U,I), where:

• U is a non-empty set, called the Domain, or Carrier, or Universe of Discourse of M, and

• I is an Interpretation that associates to each n-ary function symbol F of L a map

I(F): UxU..xU -> U

and to each n-ary predicate symbol P of L a subset of UxU..xU.


The set of functions (predicates) so introduced form the Functional Basis (Relational Basis) of

the conceptualization.

Given a language L and a conceptualization (U,I), an Assignment is a map from the variables of

L to U. An X-Variant of an assignment s is an assignment that is identical to s everywhere

except at x where it differs.

Given a conceptualization M=(U,I) and an assignment s it is easy to extend s to map each term t

of L to an individual s(t) in U by using induction on the structure of the term.

Then

• M satisfies a formula A under s if

o A is atomic, say P(t1 .. tn), and (s(t1) ..s(tn)) is in I(P).

o A is (NOT B) and M does not satisfy B under s.

o A is (B OR C) and M satisfies B under s, or M satisfies C under s. [Similarly for all

other connectives.]

o A is (ALL x B) and M satisfies B under all x-variants of s.

o A is (EXISTS x B) and M satisfies B under some x-variants of s.

• Formula A is satisfiable in M iff there is an assignment s such that M satisfies A under s.

• Formula A is satisfiable iff there is an L-structure M such that A is satisfiable in M.

Formula A is valid or logically true in M iff M satisfies A under any s. We then say that M

is a model of A.

• Formula A is Valid or Logically True iff for any L-structure M and any assignment s, M

satisfies A under s.

Some of these definitions can be made relative to a set of formulae GAMMA:

• Formula A is a Logical Consequence of GAMMA in M iff M satisfies A under any s that

also satisfies all the formulae in GAMMA.

• Formula A is a Logical Consequence of GAMMA iff for any L-structure M, A is a

logical consequence of GAMMA in M. At times instead of "A is a logical consequence

of GAMMA" we say "GAMMA entails A".


USING FIRST ORDER LOGIC:

We say that formulae A and B are (logically) equivalent if A is a logical consequence of {B} and B is a logical consequence of {A}. An Inference Rule is a rule for obtaining a new formula [the consequence] from a set of given formulae [the premises].

A most famous inference rule is Modus Ponens:

{A, NOT A OR B}

B

For example:

{Sam is tall, if Sam is tall then Sam is unhappy}

Sam is unhappy

When we introduce inference rules we want them to be Sound, that is, we want the consequence

of the rule to be a logical consequence of the premises of the rule. Modus Ponens is sound. But

the following rule, called Abduction, is not:

{B, NOT A OR B}
A

For example:

{John is wet, if it is raining then John is wet}
It is raining

This gives us a conclusion that is usually, but not always, true [John takes a shower even when it is not raining].

A Logic or Deductive System is a language, plus a set of inference rules, plus a set of logical

axioms [formulae that are valid].

A Deduction or Proof or Derivation in a deductive system D, given a set of formulae

GAMMA, is a sequence of formulae B1 B2 .. Bn such that:

• for all i from 1 to n, Bi is either a logical axiom of D, or an element of GAMMA, or is

obtained from a subset of {B1 B2 .. Bi-1} by using an inference rule of D.

In this case we say that Bn is Derived from GAMMA in D, and in the case that GAMMA is

empty, we say that Bn is a Theorem of D.

Soundness, Completeness, Consistency, Satisfiability

A Logic D is Sound iff for all sets of formulae GAMMA and any formula A:

• if A is derived from GAMMA in D, then A is a logical consequence of GAMMA


A Logic D is Complete iff for all sets of formulae GAMMA and any formula A:

• If A is a logical consequence of GAMMA, then A can be derived from GAMMA in D.

A Logic D is Refutation Complete iff for all sets of formulae GAMMA and any formula A:

• If A is a logical consequence of GAMMA, then the union of GAMMA and {NOT A} is inconsistent.

Note that if a Logic is Refutation Complete then we can enumerate all the logical consequences of GAMMA and, for any formula A, we can reduce the question of whether A is a logical consequence of GAMMA to the question of whether the union of GAMMA and {NOT A} is consistent. We will work with logics that are both Sound and Complete, or at least Sound and Refutation Complete.

KNOWLEDGE ENGINEERING IN FIRST-ORDER LOGIC:-

Knowledge engineering is the general process of knowledge base construction. A knowledge engineer is someone who investigates a particular domain, learns what concepts are important in that domain, and creates a formal representation of the objects and relations in the domain.

Types of knowledge bases:-

Two types:-

Special

General

THE KNOWLEDGE ENGINEERING PROCESS:-

1) Identify the task

2) Assemble the relevant knowledge

3) Decide on a vocabulary of predicates, functions and constants

4) Encode general knowledge about the domain

5) Encode a description of the specific problem instance

6) Pose queries to the inference procedure and get answers

7) Debug the knowledge base

1) IDENTIFY THE TASK:-


The knowledge engineer must delimit the range of questions that the knowledge base will support and the kinds of facts that will be available for each specific problem instance. For example, will the relevant facts include the current location?

2) ASSEMBLE THE RELEVANT KNOWLEDGE:-

The knowledge engineer might already be an expert in the domain, or might need to work with real experts to extract what they know – a process called knowledge acquisition.

For a real domain the issue of relevance can be quite difficult; for example, a system for simulating VLSI designs might or might not need to take into account stray capacitances and skin effects.

3) DECIDE ON A VOCABULARY OF PREDICATES, FUNCTION AND CONSTANTS:-

The important domain-level concepts are translated into logic-level names.

This involves many questions of knowledge engineering style.

Like programming style, this can have a significant impact on the eventual success of the project.

Once the choices have been made, the result is a vocabulary that is known as the ontology of the domain.

Ontology means a particular theory of the nature of being or existence; it determines what kinds of things exist but does not determine their specific properties and interrelationships.

4) ENCODE GENERAL KNOWLEDGE ABOUT THE DOMAIN:-

The knowledge engineer writes down the axioms for all the vocabulary terms.

This pins down the meaning of the terms, enabling the expert to check the content.

Often this step reveals misconceptions or gaps in the vocabulary that must be fixed by returning to step 3 and iterating through the process.

5) ENCODE A DESCRIPTION OF THE SPECIFIC PROBLEM INSTANCE:-

This involves writing simple atomic sentences about instances of concepts that are already part of the ontology.

For a logical agent, problem instances are supplied by the sensors, whereas a "real" knowledge base is supplied with additional sentences in the same way that traditional programs are supplied with input data.


All gates have one output terminal. Circuits, like gates, have input and output terminals.

To reason about functionality and connectivity, we do not need to talk about the wires themselves, the paths the wires take, or the junctions where two wires come together: one output terminal is connected to another input terminal without having to mention the wire that actually connects them.

REPRESENTATION OF GATES:-

A gate must be distinguished from other gates by naming it with a constant, such as X1 or X2.

Ways to represent gates:-

Function: Type(X1) = XOR
Binary predicate: Type(X1, XOR)
Several individual type predicates: XOR(X1)

The function type avoids the need for axioms stating that each individual gate can have

only one type.

REPRESENTATION OF TERMINALS:-

A gate of circuit can have one or more input terminals and one or more output terminals

Each terminal could be named with a constant: thus gate X1 could have terminals named X1In1, X1In2 and X1Out1.

Names of this kind, however, generate long compound names; it is better to name a terminal using a function, like In(1, X1) to denote the first input terminal of gate X1. A similar function Out is used for output terminals.

REPRESENTATION OF CONNECTIVITY:-

The connectivity between the gates can be represented by the predicate Connected, e.g., Connected(Out(1, X1), In(1, X2)).

REPRESENTATION OF SIGNALS:-

To know whether a signal is on or off, use a unary predicate On, which is true when the signal at a terminal is on.


For answering questions like "What are all the possible values of the signals at the output terminals of circuit C1?", introduce two signal values, 1 and 0, and a function Signal that takes a terminal as argument and denotes the signal value for that terminal.

3. Discuss in detail about unification and lifting.

LIFTING:-

Generalized Modus Ponens is a lifted version of Modus Ponens – it raises Modus Ponens from propositional to first-order logic. The key advantage of lifted inference rules over propositionalization is that they make only those substitutions that are required to allow particular inferences to proceed.

UNIFICATION:-

Lifted inference rules require finding substitutions that make different logical expressions look identical. This process is called unification and is a key component of all first-order inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for them if one exists:

UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)

STANDARDIZING APART:-

Name clashes can be avoided by standardizing apart one of the two sentences being unified, which means renaming its variables. For example, we can rename x in Knows(x, Elizabeth) to Z17 (a new variable name) without changing its meaning. Now the unification will work:

UNIFY(Knows(John, x), Knows(Z17, Elizabeth)) = {x/Elizabeth, Z17/John}

MOST GENERAL UNIFIER:-

For every unifiable pair of expressions, there is a single most general unifier (MGU) that is unique up to renaming of variables. In this case, it is {y/John, x/z}.

Occur check:-

When matching a variable against a complex term, one must check whether the variable itself occurs inside the term; if it does, the match fails because no consistent unifier can be constructed. This so-called occur check makes the complexity of the entire algorithm quadratic in the size of the expressions being unified.


THE UNIFICATION ALGORITHM:-

function UNIFY(x, y, θ) returns a substitution to make x and y identical
  inputs: x, a variable, constant, list, or compound
          y, a variable, constant, list, or compound
          θ, the substitution built up so far (optional, defaults to empty)

  if θ = failure then return failure
  else if x = y then return θ
  else if VARIABLE?(x) then return UNIFY-VAR(x, y, θ)
  else if VARIABLE?(y) then return UNIFY-VAR(y, x, θ)
  else if COMPOUND?(x) and COMPOUND?(y) then
    return UNIFY(ARGS[x], ARGS[y], UNIFY(OP[x], OP[y], θ))
  else if LIST?(x) and LIST?(y) then
    return UNIFY(REST[x], REST[y], UNIFY(FIRST[x], FIRST[y], θ))
  else return failure

function UNIFY-VAR(var, x, θ) returns a substitution
  inputs: var, a variable
          x, any expression
          θ, the substitution built up so far

  if {var/val} ∈ θ then return UNIFY(val, x, θ)
  else if {x/val} ∈ θ then return UNIFY(var, val, θ)
  else if OCCUR-CHECK?(var, x) then return failure
  else return add {var/x} to θ

WORKING:-

The algorithm recursively explores the two expressions simultaneously, "side by side", building up a unifier along the way; it fails if two corresponding points in the structures do not match. There is one expensive step: the occur check.
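To make the recursion concrete, here is a minimal Python sketch of UNIFY and UNIFY-VAR, using the same hypothetical tuple representation of terms as in the earlier substitution sketch; a substitution is a dict and failure is reported as None. It mirrors the pseudocode above, including the occur check.

    def is_variable(t):
        return isinstance(t, str) and t[0].islower()

    def occurs_in(var, t):
        # Occur check: does var appear anywhere inside term t?
        return var == t or (isinstance(t, tuple) and any(occurs_in(var, a) for a in t))

    def unify(x, y, theta=None):
        # Return a substitution (dict) making x and y identical, or None on failure.
        if theta is None:
            theta = {}
        if x == y:
            return theta
        if is_variable(x):
            return unify_var(x, y, theta)
        if is_variable(y):
            return unify_var(y, x, theta)
        if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            for xi, yi in zip(x, y):          # operator and arguments, side by side
                theta = unify(xi, yi, theta)
                if theta is None:
                    return None
            return theta
        return None

    def unify_var(var, t, theta):
        if var in theta:
            return unify(theta[var], t, theta)
        if is_variable(t) and t in theta:
            return unify(var, theta[t], theta)
        if occurs_in(var, t):                 # the occur check
            return None
        new_theta = dict(theta)
        new_theta[var] = t
        return new_theta

    print(unify(('Knows', 'John', 'x'), ('Knows', 'y', 'Elizabeth')))
    # -> {'y': 'John', 'x': 'Elizabeth'}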

STORAGE AND RETRIEVAL:-

STORE(s) stores a sentence s into the knowledge base.

FETCH(q) returns all unifiers such that the query q unifies with some sentence in the knowledge base.

Predicate indexing:-

It is a simple scheme.

It puts all the ‘knows’ facts in one bucket and all the Brother – facts in another. The


buckets can be stored in a hash table – for efficient access.

Predicate indexing is useful when there are many predicate symbols but only a few

clauses for each symbol.

SUBSUMPTION LATTICE:-

Employs(AIMA.org, Richard)   Does AIMA.org employ Richard?
Employs(x, Richard)          Who employs Richard?
Employs(AIMA.org, y)         Whom does AIMA.org employ?
Employs(x, y)                Who employs whom?

These queries form a subsumption lattice. A sentence with repeated constants has a slightly different lattice.
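Predicate indexing itself is little more than a hash table of buckets. A minimal sketch (reusing the unify function from the earlier sketch; names are hypothetical):

    from collections import defaultdict

    kb_index = defaultdict(list)      # predicate symbol -> bucket of stored facts

    def store(sentence):
        # Index each fact under its predicate symbol, e.g. 'Employs'.
        kb_index[sentence[0]].append(sentence)

    def fetch(query):
        # Scan only the query's bucket; return a unifier per matching fact.
        return [theta for fact in kb_index[query[0]]
                if (theta := unify(query, fact)) is not None]

    store(('Employs', 'AIMA.org', 'Richard'))
    print(fetch(('Employs', 'x', 'Richard')))     # who employs Richard?
    # -> [{'x': 'AIMA.org'}]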

4. Explain in detail about forward and backward chaining with example.

Efficient forward chaining;

Incremental forward chaining;

Backward chaining;

FORWARD CHAINING:-

A forward chaining algorithm starts with the atomic sentences in the knowledge base and applies Modus Ponens in the forward direction, adding new atomic sentences, until no further inferences can be made.

FIRST ORDER DEFINITE CLAUSES:-

First-order definite clauses are disjunctions of literals of which exactly one is positive. They closely resemble propositional definite clauses.

The following are first-order definite clauses:

King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(y)

First-order literals can include variables, in which case those variables are assumed to be universally quantified.

DATALOG:-


The knowledge base contains no function symbols and is therefore an instance of the class of datalog knowledge bases – that is, sets of first-order definite clauses with no function symbols.

SIMPLE FORWARD CHAINING ALGORITHM:-

function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration

  repeat until new is empty
    new ← {}
    for each sentence r in KB do
      (p1 ∧ … ∧ pn ⇒ q) ← STANDARDIZE-APART(r)
      for each θ such that SUBST(θ, p1 ∧ … ∧ pn) = SUBST(θ, p1′ ∧ … ∧ pn′)
            for some p1′, …, pn′ in KB
        q′ ← SUBST(θ, q)
        if q′ is not a renaming of some sentence already in KB or new then
          add q′ to new
          φ ← UNIFY(q′, α)
          if φ is not fail then return φ
    add new to KB
  return false

WORKING:-

Starting from the known facts, it triggers all the rules whose premises are satisfied, adding their conclusions to the known facts. The process repeats until the query is answered or no new facts are added. Notice that a fact is not "new" if it is just a renaming of a known fact; one sentence is a renaming of another if they are identical except for the names of the variables.

EXAMPLE:-

The crime problem can be used to show how FOL-FC-ASK works. The implication sentences are rules 3, 6, 7 and 8. Two iterations are required:

On the first iteration, rule 3 has unsatisfied premises.
Rule 6 is satisfied with {x/M1}, and Sells(West, M1, Nono) is added.
Rule 7 is satisfied with {x/M1}, and Weapon(M1) is added.
Rule 8 is satisfied with {x/Nono}, and Hostile(Nono) is added.

On the second iteration, rule 3 is satisfied with {x/West, y/M1, z/Nono}, and Criminal(West) is added.

Figure: proof tree generated by forward chaining.


FIXED POINT:-

Notice that no new inferences are possible at this point because every sentence that

could be concluded by forward chaining is already contained explicitly in the KB. Such a

knowledge base is called a fixed point of the inference process.
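The behaviour of the loop can be seen in a deliberately simplified Python sketch in which the crime-problem rules have already been instantiated to ground atoms, so set membership replaces unification (FOL-FC-ASK proper computes the substitutions itself; the rule encoding below is hypothetical):

    def forward_chain(facts, rules, query):
        # rules: list of (premises, conclusion) pairs over ground atoms (strings).
        facts = set(facts)
        while True:
            new = set()
            for premises, conclusion in rules:
                if conclusion not in facts and all(p in facts for p in premises):
                    new.add(conclusion)        # premises satisfied: fire the rule
            if not new:
                return query in facts          # fixed point reached
            facts |= new

    facts = ['American(West)', 'Missile(M1)', 'Owns(Nono, M1)', 'Enemy(Nono, America)']
    rules = [
        (['Missile(M1)', 'Owns(Nono, M1)'], 'Sells(West, M1, Nono)'),
        (['Missile(M1)'], 'Weapon(M1)'),
        (['Enemy(Nono, America)'], 'Hostile(Nono)'),
        (['American(West)', 'Weapon(M1)', 'Sells(West, M1, Nono)', 'Hostile(Nono)'],
         'Criminal(West)'),
    ]
    print(forward_chain(facts, rules, 'Criminal(West)'))   # -> True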

BACKWARD CHAINING:-

These algorithms work backward from the goal, chaining through rules to find known facts that support the proof.

BACKWARD CHAINING ALGORITHM:-

function FOL-BC-ASK(KB, goals, θ) returns a set of substitutions
  inputs: KB, a knowledge base
          goals, a list of conjuncts forming a query (θ already applied)
          θ, the current substitution, initially the empty substitution {}
  local variables: answers, a set of substitutions, initially empty

  if goals is empty then return {θ}
  q′ ← SUBST(θ, FIRST(goals))
  for each sentence r in KB where STANDARDIZE-APART(r) = (p1 ∧ … ∧ pn ⇒ q)
        and θ′ ← UNIFY(q, q′) succeeds
    new-goals ← [p1, …, pn | REST(goals)]
    answers ← FOL-BC-ASK(KB, new-goals, COMPOSE(θ′, θ)) ∪ answers
  return answers

WORKING:-

FOL-BC-ASK algorithm is called with a list of goals containing a single element,

the original query and returns the set of all substitutions satisfying the query.

The algorithm takes the first goal in the list and finds every clause in the knowledge


base whose positive literal or head unifies with the goal.

Each such clause creates a new recursive call in which the premise or body of the

clause is added to the goal stack.

TO PROVE THAT WEST IS A CRIMINAL THE FOLLOWING STEPS ARE FOLLOWED:-

(i) The tree should be read depth first, left to right.
(ii) To prove Criminal(West), prove the four conjuncts below it.
(iii) Some of these are in the knowledge base, and others require further backward chaining.
(iv) The binding for each successful unification is shown next to the corresponding subgoal.
(v) Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z), z is already bound to Nono.

DISADVANTAGES:-

(i) Since it is clearly a depth-first search algorithm, its space requirements are linear in the size of the proof.
(ii) It suffers from problems with repeated states and incompleteness.
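For the same ground rule base used in the forward-chaining sketch, backward chaining can be sketched as a recursive depth-first procedure (again a simplified, ground version, not FOL-BC-ASK itself; the depth guard is a crude stand-in for proper repeated-state handling):

    def backward_chain(goal, facts, rules, depth=0):
        # Work backward from the goal; rules are (premises, conclusion) pairs.
        if goal in facts:
            return True
        if depth > 20:       # guard against repeated states / infinite descent
            return False
        for premises, conclusion in rules:
            if conclusion == goal and all(
                    backward_chain(p, facts, rules, depth + 1) for p in premises):
                return True
        return False

    # Reusing facts and rules from the forward-chaining sketch:
    print(backward_chain('Criminal(West)', set(facts), rules))   # -> True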

LOGIC PROGRAMMING:-

Logic programming is a technology that comes fairly close to the ideal that systems should be constructed by expressing knowledge in a formal language.

EFFICIENT FORWARD CHAINING:-

The forward chaining algorithm above was designed for ease of understanding rather than for efficiency of operation. There are three possible sources of complexity.

First, the "inner loop" of the algorithm involves finding all possible unifiers such that the premise of a rule unifies with a suitable set of facts in the knowledge base. This is often called pattern matching and can be very expensive.

Second, the algorithm rechecks every rule on every iteration to see whether its premises are satisfied, even if very few additions were made to the knowledge base on that iteration.

Finally, the algorithm might generate many facts that are irrelevant to the goal.

5. What is resolution? Explain it in detail.

The resolution inference rule.

RESOLUTION

Completeness Theorem:-

For first-order logic, any entailed sentence has a finite proof.

Incompleteness Theorem:-

The theorem states that a logical system that includes the principle of induction – without which very little of discrete mathematics can be constructed – is necessarily incomplete.

Conjunctive Normal Form for First-Order Logic (CNF):-

That is, a conjunction of clauses, where each clause is a disjunction of literals; literals can contain variables, which are assumed to be universally quantified. For example,

∀x American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)

becomes, in CNF,

¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)

Every sentence of first-order logic can be converted into an inferentially equivalent CNF sentence.

The procedure for conversion to CNF, very similar to the propositional case, is given below. We will illustrate the procedure by translating the sentence "Everyone who loves all animals is loved by someone", or

∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]

THE STEPS ARE:-

Eliminate implications:-

∀x ¬[∀y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃y Loves(y, x)]

Move ¬ inwards:-

In addition to the usual rules for negated connectives, we need rules for negating the quantifiers; thus we have:

¬∀x p becomes ∃x ¬p
¬∃x p becomes ∀x ¬p

Our sentence goes through the following transformations:

∀x [∃y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
∀x [∃y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]

Standardize variables:-

For sentences like (∀x P(x)) ∨ (∃x Q(x)) that use the same variable name twice, rename one of the variables. Thus, we have:

∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]

Skolemize:- Skolemization is the process of removing existential quantifiers by elimination.

In the simple case, it is just like the Existential Instantiation rule: translate ∃x P(x) into P(A), where A is a new constant. If we apply this rule naively, we get

∀x [Animal(A) ∧ ¬Loves(x, A)] ∨ Loves(B, x)

which has the wrong meaning entirely. Thus we want the Skolem entities to depend on x:

∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)

Here F and G are Skolem functions. The general rule is that the arguments of the Skolem function are all the universally quantified variables in whose scope the existential quantifier appears.

Drop universal quantifiers:-

At this point, all remaining variables must be universally quantified. Moreover, the sentence is equivalent to one in which all the universal quantifiers have been moved to the left. We can therefore drop the universal quantifiers and distribute ∨ over ∧:

[Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]
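The propositional core of the first two steps can be sketched mechanically in Python over a small expression tree (a hypothetical representation: formulas as nested tuples such as ('=>', p, q); quantifier handling and Skolemization are omitted for brevity):

    def eliminate_implications(f):
        # Formulas as tuples: ('=>', A, B), ('and', ...), ('or', ...), ('not', A).
        if isinstance(f, tuple):
            if f[0] == '=>':                   # (A => B)  becomes  (~A v B)
                a, b = eliminate_implications(f[1]), eliminate_implications(f[2])
                return ('or', ('not', a), b)
            return (f[0],) + tuple(eliminate_implications(x) for x in f[1:])
        return f

    def move_not_inwards(f):
        if isinstance(f, tuple) and f[0] == 'not' and isinstance(f[1], tuple):
            a = f[1]
            if a[0] == 'not':                  # ~~A  becomes  A
                return move_not_inwards(a[1])
            if a[0] == 'and':                  # ~(A ^ B)  becomes  (~A v ~B)
                return ('or',) + tuple(move_not_inwards(('not', x)) for x in a[1:])
            if a[0] == 'or':                   # ~(A v B)  becomes  (~A ^ ~B)
                return ('and',) + tuple(move_not_inwards(('not', x)) for x in a[1:])
            return f
        if isinstance(f, tuple) and f[0] != 'not':
            return (f[0],) + tuple(move_not_inwards(x) for x in f[1:])
        return f

    print(move_not_inwards(eliminate_implications(('=>', 'P', ('=>', 'Q', 'R')))))
    # -> ('or', ('not', 'P'), ('or', ('not', 'Q'), 'R'))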

LEARNING HEURISTICS FROM EXPERIENCE:-

A heuristic function h(n) is supposed to estimate the cost of a solution beginning from the state at node n.

Learning can be done in two ways:-

A relaxed problem can be devised for which an optimal solution can be found easily; the other way is to learn from experience.

Inductive learning:-

Inductive learning algorithms can be used to construct a function h(n) that can predict solution costs for other states that arise during search.

The resolution inference rule:-

The resolution rule for first-order clauses is simply a lifted version of the propositional resolution rule. Propositional literals are complementary if one is the negation of the other; first-order literals are complementary if one unifies with the negation of the other. Thus, we have:

l1 ∨ … ∨ lk,    m1 ∨ … ∨ mn
--------------------------------------------------------------------
SUBST(θ, l1 ∨ … ∨ li−1 ∨ li+1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mj−1 ∨ mj+1 ∨ … ∨ mn)

where UNIFY(li, ¬mj) = θ.
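At the ground level, where the substitution is trivial, a single resolution step can be sketched in Python (a hypothetical representation: clauses as frozensets of string literals, with '~' marking negation):

    def negate(lit):
        # Literals are strings; '~' marks negation.
        return lit[1:] if lit.startswith('~') else '~' + lit

    def resolve(c1, c2):
        # Return all resolvents of two ground clauses (frozensets of literals).
        resolvents = []
        for lit in c1:
            if negate(lit) in c2:              # complementary pair found
                resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
        return resolvents

    c1 = frozenset({'~American(West)', '~Hostile(Nono)', 'Criminal(West)'})
    c2 = frozenset({'Hostile(Nono)'})
    print(resolve(c1, c2))
    # one resolvent: the clause { ~American(West), Criminal(West) }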

Completeness of Resolution:-

This section gives a completeness proof of resolution. The basic structure of the proof is given below:

1. First, we observe that if S is unsatisfiable, then there exists a particular set of ground instances of the clauses of S such that this set is also unsatisfiable (Herbrand's theorem).
2. We then appeal to the ground resolution theorem.
3. We then use a lifting lemma.

Structure of a completeness proof for resolution:

Any set of sentences S is representable in clausal form.
Assume S is unsatisfiable, and in clausal form.
(Herbrand's theorem) Some set S′ of ground instances is unsatisfiable.
(Ground resolution theorem) Resolution can find a contradiction in S′.
(Lifting lemma) There is a resolution proof for the contradiction in S′.

To carry out the first step, we need three new concepts:-

1. Herbrand universe
2. Saturation
3. Herbrand base


QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE

UNIT – III

PART -A (2 Marks)

PLANNING

1. Define partial order planner.

Basic Idea

– Search in plan space and use least commitment, when possible

• Plan Space Search

– Search space is set of partial plans

– Plan is tuple <A, O, B>

• A: Set of actions, of the form (ai : Opj)

• O: Set of orderings, of the form (ai < aj)

• B: Set of bindings, of the form (vi = C), (vi ≠ C), (vi = vj) or (vi ≠ vj)

– Initial plan:

• <{start, finish}, {start < finish}, {}>

• start has no preconditions; Its effects are the initial state

• finish has no effects; Its preconditions are the goals

2. What are the differences and similarities between problem solving and

planning?

We put these two ideas together to build planning agents. At the most abstract level, the task of planning is the same as problem solving. Planning can be viewed as a type of problem solving in which the agent uses beliefs about actions and their consequences to search for a solution over the more abstract space of plans, rather than over the space of situations.

3. Define state-space search.

The most straightforward approach is to use state-space search. Because the

descriptions of actions in a planning problem specify both preconditions and effects, it is

possible to search in either direction: either forward from the initial state or backward from the

goal

4. What are the types of state-space search?

The types of state-space search are,

Forward state space search;

Backward state space search.

5.What is Partial-Order Planning?

A set of actions that make up the steps of the plan. These are taken from the set of

actions in the planning problem. The “empty” plan contains just the Start and Finish actions.


Start has no preconditions and has as its effect all the literals in the initial state of the planning

problem. Finish has no effects and has as its preconditions the goal literals of the planning

problem.

6. What are the advantages and disadvantages of Partial-Order Planning?

Advantage: Partial-order planning has a clear advantage in being

able to decompose problems into sub problems.

Disadvantage: Disadvantage is that it does not represent states

directly, so it is harder to estimate how far a partial-order plan is

from achieving a goal.

7. What is a Planning graph?

A Planning graph consists of a sequence of levels that correspond to time steps in

the plan where level 0 is the initial state. Each level contains a set of literals and a set of actions.

8. What is Conditional planning?

Conditional planning is also known as contingency planning. Conditional planning deals with incomplete information by constructing a conditional plan that accounts for each possible situation or contingency that could arise.

9. What is action monitoring?

The process of checking the preconditions of each action as it is executed, rather

than checking the preconditions of the entire remaining plan. This is called action monitoring.

10. Define planning.

Planning can be viewed as a type of problem solving in which the agent uses

beliefs about actions and their consequences to search for a solution.

11. List the features of an ideal planner?

The features of an ideal planner are,

The planner should be able to represent the states, goals and

actions;

The planner should be able to add new actions at any time;

The planner should be able to use Divide and Conquer method for

solving very big problems.

12. What are the components that are needed for representing an action?

The components that are needed for representing an action are,

Action description;

Precondition;

Effect.

13. What are the components that are needed for representing a plan?

The components that are needed for representing a plan are,

A set of plans steps;

A set of ordering constraints;

A set of variable binding constraints;


A set of causal link protections.

14. What are the different types of planning?

The different types of planning are,

Situation space planning;

Progressive planning;

Regressive planning;

Partial order planning;

Fully instantiated planning.

15. Define a solution.

A solution is defined as a plan that an agent can execute and that guarantees the

achievement of goal.

16. Define complete plan and consistent plan.

A complete plan is one in which every precondition of every step is achieved by

some other step.

A consistent plan is one in which there are no contradictions in the ordering or

binding constraints.

17. What are Forward state-space search and Backward state-space search?

Forward state-space search: It searches forward from the initial

situation to the goal situation.

Backward state-space search: It searches backward from the goal

situation to the initial situation.

18. What is Induction heuristics? What are the different types of induction heuristics?

Induction heuristics is a method, which enable procedures to learn descriptions

from positive and negative examples.

There are two different types of induction heuristics. They are:

Require-link heuristics.

Forbid-link heuristics.

19. Define Reification.

The process of treating something abstract and difficult to talk about as though it

were concrete and easy to talk about is called as reification.

20. What is reified link?

The elevation of a link to the status of a describable node is a kind of reification.

When a link is so elevated then it is said to be a reified link.

21. Define action monitoring.

The process of checking the preconditions of each action as it is executed, rather

than checking the preconditions of the entire remaining plan. This is called action monitoring.

22. What is meant by Execution monitoring?

Execution monitoring is related to conditional planning in the following way. An

agent that builds a plan and then executes it while watching for errors is, in a sense, taking into

account the possible conditions that constitute execution errors.


PART - B

1. Explain partial order planning.

SIMPLE PLANNING AGENT

The agent first generates a goal to achieve and then constructs a plan to achieve it from the current state.

PROBLEM SOLVING TO PLANNING

Representation Using Problem Solving Approach

Forward search
Backward search
Heuristic search

Representation Using Planning Approach

STRIPS – the Stanford Research Institute Problem Solver.

Representation for states and goals
Representation for plans
Situation space and plan space
Solutions

Why Planning ?

Intelligent agents must operate in the world. They are not simply passive reasoners (knowledge representation, reasoning under uncertainty) or problem solvers (search); they must also act on the world.

We want intelligent agents to act in "intelligent ways": taking purposeful actions, predicting the expected effects of such actions, and composing actions together to achieve complex goals.

E.g., if we have a robot, we want the robot to decide what to do and how to act to achieve our goals.


Planning Problem

How to change the world to suit our needs.

Critical issue: we need to reason about what the world will be like after doing a few actions, not just what it is like now.

GOAL: Craig has coffee.
CURRENTLY: robot in mailroom, has no coffee, coffee not made, Craig in office, etc.
TO DO: go to lounge, make coffee.

PARTIAL ORDER PLANNING

Partial-Order Planning Algorithms

Partially Ordered Plan

• Plan

• Steps

• Ordering constraints

• Variable binding constraints

• Causal links

• POP Algorithm

• Make initial plan

• Loop until the plan is complete

– Select a subgoal

– Choose an operator

– Resolve threats

Choose Operator

• Choose operator(c, Sneeds)


• Choose a step S from the plan or a new step S by

instantiating an operator that has c as an effect

• If there’s no such step, Fail

• Add causal link S →c Sneeds

• Add ordering constraint S < Sneeds

• Add variable binding constraints if necessary

• Add S to steps if necessary

Nondeterministic choice

• Choose – pick one of the options arbitrarily

• Fail – go back to most recent non-deterministic choice and

try a different one that has not been tried before

Resolve Threats

• A step S threatens a causal link Si →c Sj iff ¬c ∈ effects(S) and it is possible that Si < S < Sj

• For each threat

• Choose

–Promote S : S < Si < Sj

–Demote S : Si < Sj < S

• If resulting plan is inconsistent, then Fail

Threats with Variables

If c has variables in it, things are kind of tricky.

• S is a threat if there is any instantiation of the

variables that makes ¬ c ∈ effects(S)

•We could possibly resolve the threat by adding a

negative variable binding constraint, saying that

two variables or a variable and a constant

cannot be bound to one another


• Another strategy is to ignore such threats until the very end, hoping that the variables will become bound and make things easier to deal with.
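A sketch of how the ⟨A, O, B⟩ representation and the promote/demote choices might look as data structures (hypothetical and heavily simplified: real POP also maintains open preconditions and variable bindings):

    class PartialPlan:
        def __init__(self):
            self.steps = {'start', 'finish'}            # A
            self.orderings = {('start', 'finish')}      # O: (before, after) pairs
            self.links = set()                          # causal links (Si, c, Sj)

        def before(self, a, b):
            # Does a < b follow from the ordering constraints (transitive closure)?
            frontier, seen = [a], set()
            while frontier:
                s = frontier.pop()
                for (x, y) in self.orderings:
                    if x == s and y not in seen:
                        if y == b:
                            return True
                        seen.add(y)
                        frontier.append(y)
            return False

        def resolve_threat(self, threat, link):
            # Try promotion (threat < Si), then demotion (Sj < threat), as in the text.
            producer, cond, consumer = link
            for before_step, after_step in ((threat, producer), (consumer, threat)):
                if not self.before(after_step, before_step):   # keeps O acyclic
                    self.orderings.add((before_step, after_step))
                    return True
            return False                                       # inconsistent plan: Fail

    plan = PartialPlan()
    plan.steps |= {'goto_lounge', 'make_coffee'}
    plan.orderings |= {('start', 'goto_lounge'), ('goto_lounge', 'make_coffee'),
                       ('make_coffee', 'finish')}
    print(plan.before('start', 'make_coffee'))                 # -> True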

2. Discuss about planning graphs in detail.

Planning graphs for heuristic estimation;

The GRAPHPLAN algorithm;

Termination of GRAPHPLAN.

3. Explain planning with State-Space Search in detail.


LIFTING LEMMA:-

The lifting lemma lifts a proof step from ground clauses up to general first-order clauses. In order to prove this basic lifting lemma, Robinson had to invent unification and derive all of the properties of most general unifiers.

Dealing with Equality:-

∀x x = x
∀x,y x = y ⇒ y = x
∀x,y,z x = y ∧ y = z ⇒ x = z
∀x,y x = y ⇒ (P1(x) ⇔ P1(y))
∀x,y x = y ⇒ (P2(x) ⇔ P2(y))
∀w,x,y,z w = y ∧ x = z ⇒ (F1(w,x) = F1(y,z))
∀w,x,y,z w = y ∧ x = z ⇒ (F2(w,x) = F2(y,z))

Demodulation:-

For any terms x, y and z, where UNIFY(x, z) = θ and m1 ∨ … ∨ mn[z] is a literal containing z:

x = y,    m1 ∨ … ∨ mn[z]
-------------------------
m1 ∨ … ∨ mn[SUBST(θ, y)]

Paramodulation:-

For any terms x, y and z, where UNIFY(x, z) = θ:

l1 ∨ … ∨ lk ∨ x = y,    m1 ∨ … ∨ mn[z]
----------------------------------------
SUBST(θ, l1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mn[y])

Equational Unification:-

Equational unification of this kind can be done with efficient algorithms designed for the particular axioms used.

Resolution Strategies:-

Unit preference
Set of support
Input resolution
Subsumption


THEOREM PROVERS:-

We describe the theorem prover OTTER (Organized Techniques For thermo –

proving and Effective Research), the we must divided the knowledge into form

parts,

A set of clauses known as the set of support

A set of unable axioms

A set of equation known as rewrite or demodulations.

A set of parameters and clauses that define the control strategy.

SKETCH OF THE OTTER THEOREM PROVER:-

procedure OTTER(sos, usable)
  inputs: sos, a set of support – clauses defining the problem
          usable, background knowledge potentially relevant to the problem

  repeat
    clause ← the lightest member of sos
    move clause from sos to usable
    PROCESS(INFER(clause, usable), sos)
  until sos = [] or a refutation has been found

function INFER(clause, usable) returns clauses
  resolve clause with each member of usable
  return the resulting clauses after applying FILTER

procedure PROCESS(clauses, sos)
  for each clause in clauses do
    clause ← SIMPLIFY(clause)
    merge identical literals
    discard clause if it is a tautology
    sos ← [clause | sos]
    if clause has no literals then a refutation has been found
    if clause has one literal then look for unit refutation

Extending Prolog:-

An alternative way to build a theorem prover is to start with a Prolog compiler and extend it to get a sound and complete reasoner.

Theorem Provers as Assistants:-

Proof checker


Socratic reasoner

(1) What is backward chaining? Explain with an example.

Forward chaining applies a set of rules and facts to deduce whatever conclusions can be derived. In backward chaining, we start from a conclusion, which is the hypothesis we wish to prove, and we aim to show how that conclusion can be reached from the rules and facts in the database.

The conclusion we are aiming to prove is called a goal, and reasoning in this way is known as goal-driven reasoning.

Backward chaining example

Fig: Proof tree constructed by backward chaining to prove that West is a criminal.

Note:
(a) To prove Criminal(West), we have to prove the four conjuncts below it.
(b) Some of them are in the knowledge base, and others require further backward chaining.

(2) Explain conjunctive normal form for first-order logic with an example.


Every sentence of first-order logic can be converted into an inferentially equivalent CNF sentence. In particular, the CNF sentence is unsatisfiable just when the original sentence is unsatisfiable, so we have a basis for doing proofs by contradiction on CNF sentences.

In the conversion we have to eliminate existential quantifiers. We will illustrate the procedure by translating the sentence "Everyone who loves all animals is loved by someone," or

∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]

Ontology refers to organizing everything in the world into a hierarchy of categories.

Representing abstract concepts such as Actions, Time, Physical Objects, and Beliefs is called ontological engineering.


How categories are useful in Knowledge representation?

CATEGORIES AND OBJECTS

The organization of objects into categories is a vital part of knowledge representation. Although

interaction with the world takes place at the level of individual objects, much reasoning

takes place at the level of categories.

What is taxonomy?

Subclass relations organize categories into a taxonomy, or taxonomic hierarchy. Taxonomies have been used explicitly for centuries in technical fields. For example, systematic biology aims to provide a taxonomy of all living and extinct species; library science has developed a taxonomy of all fields of knowledge, encoded as the Dewey Decimal system; and tax authorities and other government departments have developed extensive taxonomies of occupations and commercial products. Taxonomies are also an important aspect of general commonsense knowledge.

First-order logic makes it easy to state facts about categories, either by relating objects

to categories or by quantifying over their members:
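Typical statements of each kind (illustrative examples):

BB9 ∈ Basketballs – an object is a member of a category
Basketballs ⊂ Balls – one category is a subclass of another
∀x (x ∈ Basketballs ⇒ Spherical(x)) – all members of a category share certain properties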


What is physical composition?

Explain the Ontology of Situation calculus.

Situations are logical terms consisting of the initial situation (usually called S0) and all situations that are generated by applying an action to a situation. The function Result(a, s) (sometimes called Do) names the situation that results when action a is executed in situation s. Figure 10.2 illustrates this idea.

Fluents are functions and predicates that vary from one situation to the next, such as the location of the agent or the aliveness of the wumpus. The dictionary says a fluent is something that flows; in this use, it means flowing or changing across situations. By convention, the situation is always the last argument of a fluent. For


example, ¬Holding(G1, S0) says that the agent is not holding the gold G1 in the initial situation S0. Age(Wumpus, S0) refers to the wumpus's age in S0.

Atemporal or eternal predicates and functions are also allowed. Examples include the predicate Gold(G1) and the function LeftLegOf(Wumpus).

(3) What is event calculus?

Time and event calculus

Situation calculus works well when there is a single agent performing instantaneous, discrete actions. When actions have duration and can overlap with each other, situation calculus becomes somewhat awkward. Therefore, we will cover those topics with an alternative formalism known as event calculus, which is based on points in time rather than on situations.

(The terms "event" and "action" may be used interchangeably. Informally, "event" connotes a wider class of actions, including ones with no explicit agent. These are easier to handle in


event calculus than in situation calculus.)

In event calculus, fluents hold at points in time rather than at situations, and the calculus is designed to allow reasoning over intervals of time. The event calculus axiom says that a fluent is true at a point in time if the fluent was initiated by an event at some time in the past and was not terminated by an intervening event. The Initiates and Terminates relations play a role similar to the Result relation in situation calculus: Initiates(e, f, t) means that the occurrence of event e at time t causes fluent f to become true, while Terminates(e, f, t) means that f ceases to be true. We use Happens(e, t) to mean that event e happens at time t.
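The axiom can be made concrete with a tiny Python sketch over a list of Happens facts (the events, fluents and times here are hypothetical):

    INITIATES = {'MakeCoffee': {'HasCoffee'}}      # event -> fluents it initiates
    TERMINATES = {'DrinkCoffee': {'HasCoffee'}}    # event -> fluents it terminates
    happens = [('MakeCoffee', 1), ('DrinkCoffee', 5)]   # Happens(e, t) facts

    def holds_at(fluent, t):
        # Fluent holds at t if initiated at some earlier time and not
        # terminated by any intervening event (the event calculus axiom).
        return any(
            t1 < t and fluent in INITIATES.get(e1, set())
            and not any(t1 < t2 < t and fluent in TERMINATES.get(e2, set())
                        for (e2, t2) in happens)
            for (e1, t1) in happens)

    print(holds_at('HasCoffee', 3))   # -> True
    print(holds_at('HasCoffee', 7))   # -> False (terminated at time 5)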

(4) What are semantic networks?

Semantic networks are capable of representing individual objects, categories of objects, and relations among objects. Object or category names are represented in ovals and are connected by labeled arcs.

Semantic network example


QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE

UNIT – IV

PART -A (2 Marks)

UNCERTAIN KNOWLEDGE AND REASONING

1. List down two applications of temporal probabilistic models.

A suitable way to deal with this problem is to identify a temporal causal model that may

effectively explain the patterns observed in the data. Here we will concentrate on

probabilistic models that provide a convenient framework to represent and manage

underspecified information; in particular, we will consider the class of Causal Probabilistic

Networks (CPN).

2. Define Dempster-Shafer theory.

The Dempster–Shafer theory (DST) is a mathematical theory of evidence. It allows one to

combine evidence from different sources and arrive at a degree of belief (represented by a

belief function) that takes into account all the available evidence. The theory was first

developed by Arthur P. Dempster and Glenn Shafer

3. Define Uncertainty.

Uncertainty means that many of the simplifications that are possible with deductive

inference are no longer valid.

4. State the reasons why first-order logic fails to cope with a domain like medical diagnosis.

Three reasons:

Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we may be uncertain about a particular case because all the needed facts may not be available.

5. What is the need for probability theory in uncertainty?

Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance. Probability statements are made with respect to the available evidence, so they do not have quite the same kind of semantics as logical sentences.

6. What is the need for utility theory in uncertainty?

Utility theory says that every state has a degree of usefulness, or utility, to an agent, and that the agent will prefer states with higher utility. Utility theory is used to represent and reason with preferences.

7. What is called decision theory?

Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions called decision theory:

Decision theory = probability theory + utility theory.

8. Define conditional probability.

Once the agent has obtained some evidence concerning the previously unknown propositions making up the domain, conditional or posterior probabilities, written with the notation P(A|B), are used. Importantly, P(A|B) can only be used when B is all that is known.

9. When is a probability distribution used?

If we want the probabilities of all the possible values of a random variable, a probability distribution is used.

E.g.: P(Weather) = ⟨0.7, 0.2, 0.08, 0.02⟩. This type of notation simplifies many equations.

10. What is an atomic event?

An atomic event is an assignment of particular values to all variables, in other

words, the complete specifications of the state of domain.

11. Define joint probability distribution.

A joint probability distribution completely specifies an agent's probability assignments to all propositions in the domain. The joint probability distribution P(X1, X2, …, Xn) assigns probabilities to all possible atomic events, where X1, X2, …, Xn are the variables.

12. What is meant by belief network?

A belief network is a graph in which the following holds

A set of random variables

A set of directive links or arrows connects pairs of nodes.

The conditional probability table for each node

The graph has no directed cycles.

13. What are called polytrees?

Singly connected networks, in which at most one undirected path exists between any two nodes, are known as polytrees; certain inference algorithms work only on polytrees.

14. What is a multiple connected graph?

A multiple connected graph is one in which two nodes are connected by more than

one path.

15. List the three basic classes of algorithms for evaluating multiply connected graphs.

The three basic classes of algorithms for evaluating multiply connected graphs


Clustering methods;

Conditioning methods;

Stochastic simulation methods.

16. What is called as principle of Maximum Expected Utility (MEU)?

The basic idea is that an agent is rational if and only if it chooses the action that

yields the highest expected utility, averaged over all the possible outcomes of the action.

This is

known as MEU

17. What is meant by deterministic nodes?

A deterministic node has its value specified exactly by the values of its parents, with

no uncertainty.

18. What are all the uses of a belief network?

The uses of a belief network are,

Making decisions based on probabilities in the network and on the

agent's utilities;

Deciding which additional evidence variables should be observed in

order to gain useful information;

Performing sensitivity analysis to understand which aspects of the

model have the greatest impact on the probabilities of the query

variables (and therefore must be accurate);

Explaining the results of probabilistic inference to the user.

19. What is called as Markov Decision problem?

The problem of calculating an optimal policy in an accessible, stochastic

environment with a known transition model is called a Markov Decision Problem (MDP).

20. Define Dynamic Belief Network.

A Belief network with one node for each state and sensor variable for each time

step is called a Dynamic Belief Network (DBN).

21. Define Dynamic Decision Network?

A decision network is obtained by adding utility nodes and decision nodes for actions to a
DBN. A DDN calculates the expected utility of each decision sequence.

PART - B

1. Explain about Probabilistic Reasoning.


• The students should understand the role of uncertainty in knowledge representation

• Students should learn the use of probability theory to represent uncertainty

• Students should understand the basic of probability theory, including

o Probability distributions
o Joint probability
o Marginal probability
o Conditional probability
o Independence
o Conditional independence

• Should learn inference mechanisms in probability theory including

o Bayes rule
o Product rule

• Should be able to convert natural language statements into probabilistic statements

and apply inference rules

• Students should understand Bayesian networks as a data structure to represent

conditional independence

• Should understand the syntax and semantics of Bayes net

• Should understand inferencing mechanisms in Bayes net

• Should understand efficient inferencing techniques like variable ordering

• Should understand the concept of d-separation

• Should understand inference mechanism for the special case of polytrees

• Students should have idea about approximate inference techniques in Bayesian

networks

At the end of this lesson the student should be able to do the following:

• Represent a problem in terms of probabilistic statements

• Apply Bayes rule and product rule for inferencing

• Represent a problem using Bayes net

• Perform probabilistic inferencing using Bayes net.

Probabilistic Reasoning

Using logic to represent and reason we can represent knowledge about the world with

facts and rules, like the following ones:

bird(tweety).

fly(X) :- bird(X).

We can also use a theorem-prover to reason about the world and deduce new facts about
the world, e.g.,

?- fly(tweety).

Yes

However, this often does not work outside of toy domains - non-tautologous certain

rules are hard to find.

A way to handle knowledge representation in real problems is to extend logic by using

certainty factors.

In other words, replace


IF condition THEN fact

with

IF condition with certainty x THEN fact with certainty f(x)

Unfortunately, we cannot really adapt logical inference to probabilistic inference, since the

latter is not context-free.

Replacing rules with conditional probabilities makes inferencing simpler.

Replace

smoking -> lung cancer

or

lotsofconditions, smoking -> lung cancer

with

P(lung cancer | smoking) = 0.6

Uncertainty is represented explicitly and quantitatively within probability theory, a

formalism that has been developed over centuries.

A probabilistic model describes the world in terms of a set S of possible states - the

sample space. We don’t know the true state of the world, so we (somehow) come up with

a probability distribution over S which gives the probability of any state being the true

one. The world is usually described by a set of variables or attributes.

Consider the probabilistic model of a fictitious medical expert system. The ‘world’ is

described by 8 binary valued variables:

Visit to Asia? A

Tuberculosis? T

Either tub. or lung cancer? E

Lung cancer? L

Smoking? S

Bronchitis? B

Dyspnoea? D

Positive X-ray? X

We have 2^8 = 256 possible states or configurations and so 256 probabilities to find.

2. Explain about Review of Probability Theory.

The primitives in probabilistic reasoning are random variables. Just like primitives in

Propositional Logic are propositions. A random variable is not in fact a variable, but a


function from a sample space S to another space, often the real numbers.

For example, let the random variable Sum (representing outcome of two die throws) be

defined thus:

Sum(die1, die2) = die1 +die2

Each random variable has an associated probability distribution determined by the

underlying distribution on the sample space

Continuing our example : P(Sum = 2) = 1/36,

P(Sum = 3) = 2/36, . . . , P(Sum = 12) = 1/36
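This distribution is easy to reproduce by brute force. The short Python sketch below is an illustrative addition (not part of the original notes); it enumerates the 36 equally likely outcomes of the two dice and counts each value of Sum:

from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two dice and count
# how often each value of Sum = die1 + die2 occurs.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
p_sum = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

print(p_sum[2], p_sum[3], p_sum[12])   # 1/36, 1/18 (= 2/36), 1/36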

Consider the probabilistic model of the fictitious medical expert system mentioned
before. The sample space is described by 8 binary valued variables.

Visit to Asia? A

Tuberculosis? T

Either tub. or lung cancer? E

Lung cancer? L

Smoking? S

Bronchitis? B

Dyspnoea? D

Positive X-ray? X

There are 2^8 = 256 events in the sample space. Each event is determined by a joint

instantiation of all of the variables.

S = {(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f),

(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t), . . .

(A = t, T = t,E = t,L = t, S = t,B = t,D = t,X = t)}

Since S is defined in terms of joint instantiations, any distribution defined on it is called a
joint distribution. All underlying distributions will be joint distributions in this module. The
variables {A,T,E,L,S,B,D,X} are in fact random variables, which 'project' values.

L(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f) = f

L(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t) = f

L(A = t, T = t,E = t,L = t, S = t,B = t,D = t,X = t) = t

Each of the random variables {A,T,E,L,S,B,D,X} has its own distribution, determined by

the underlying joint distribution. This is known as the marginal distribution. For example,

the distribution for L is denoted P(L), and this distribution is defined by the two

probabilities P(L = f) and P(L = t). For example,

P(L = f)

= P(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f)

+ P(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t)

+ P(A = f, T = f,E = f,L = f, S = f,B = f,D = t,X = f)

. . .


+ P(A = t, T = t, E = t, L = f, S = t, B = t, D = t, X = t)

P(L) is an example of a marginal distribution.

Consider a joint distribution over two binary-valued variables A and B (a 2 x 2 table of probabilities).

We get the marginal distribution over B by simply adding up the different possible values
of A for any value of B (putting the result in the "margin").

In general, given a joint distribution over a set of variables, we can get the marginal

distribution over a subset by simply summing out those variables not in the subset.

In the medical expert system case, we can get the marginal distribution over, say, A and D by
simply summing out the other variables.
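The summing-out operation itself is mechanical. Here is a minimal Python sketch (an added illustration; the joint table and variable names are assumed, not taken from the expert system above):

from itertools import product

# A hypothetical joint distribution over three binary variables (A, B, D),
# stored as a table mapping each atomic event to its probability.
joint = {}
for a, b, d in product([False, True], repeat=3):
    joint[(a, b, d)] = (0.3 if a else 0.7) * (0.6 if b else 0.4) * 0.5

def marginal(joint, keep):
    # Sum out every variable whose position is not listed in `keep`.
    out = {}
    for event, p in joint.items():
        key = tuple(event[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Marginal over (A, D): sum out B (position 1).
print(marginal(joint, keep=(0, 2)))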

However, computing marginals is not always an easy task. For example,

P(A = t,D = f)

= P(A = t, T = f,E = f,L = f, S = f,B = f,D = f,X = f)

+ P(A = t, T = f,E = f,L = f, S = f,B = f,D = f,X = t)

+ P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = f)

+ P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = t)

. . .

+ P(A = t, T = t, E = t, L = t, S = t, B = t, D = f, X = t)

This has 64 summands, each of whose values needs to be estimated from empirical data.
For the estimates to be of good quality, each of the instances that appear in the summands
should appear a sufficiently large number of times in the empirical data. Often such a large
amount of data is not available.

However, computation can be simplified for certain special but common conditions. This

is the condition of independence of variables.


Two random variables A and B are independent iff

P(A,B) = P(A)P(B)

i.e. can get the joint from the marginals

This is quite a strong statement: It means for any value x of A and any value y of B

P(A = x,B = y) = P(A = x)P(B = y)

Note that the independence of two random variables is a property of the underlying
probability distribution; we can have two variables that are independent under one
distribution and dependent under another.

Conditional probability is defined as:

P(A|B) = P(A,B) / P(B)

It means for any value x of A and any value y of B,

P(A = x | B = y) = P(A = x, B = y) / P(B = y)

If A and B are independent then

P(A|B) = P(A)

Conditional probabilities can represent causal relationships in both directions:

From cause to (probable) effects

From effect to (probable) cause

3. Explain about Probabilistic Inference Rules.


Two rules in probability theory are important for inferencing, namely, the product rule

and the Bayes' rule.

Here is a simple example, of application of Bayes' rule.

Suppose you have been tested positive for a disease; what is the probability that you

actually have the disease?

It depends on the accuracy and sensitivity of the test, and on the background (prior)

probability of the disease.

Let P(Test=+ve | Disease=true) = 0.95, so the false negative rate,

P(Test=-ve | Disease=true), is 5%.

Let P(Test=+ve | Disease=false) = 0.05, so the false positive rate is also 5%.

Suppose the disease is rare: P(Disease=true) = 0.01 (1%).

Let D denote Disease and "T=+ve" denote the positive Test.

Then,

P(T=+ve|D=true) * P(D=true)

P(D=true|T=+ve) = ------------------------------------------------------------

P(T=+ve|D=true) * P(D=true)+ P(T=+ve|D=false) * P(D=false)


= (0.95 * 0.01) / (0.95*0.01 + 0.05*0.99) = 0.161

So the probability of having the disease given that you tested positive is just 16%. This

seems too low, but here is an intuitive argument to support it. Of 100 people, we expect

only 1 to have the disease, but we expect about 5% of the rest (about 5 people) to test positive. So

of the 6 people who test positive, we only expect 1 of them to actually have the disease;

and indeed 1/6 is approximately 0.16.

In other words, the reason the number is so small is that you believed that this is a rare

disease; the test has made it 16 times more likely you have the disease, but it is still

unlikely in absolute terms. If you want to be "objective", you can set the prior to uniform

(i.e. effectively ignore the prior), and then get

P(T=+ve|D=true) * P(D=true)

P(D=true|T=+ve) = ------------------------------------------------------------

P(T=+ve)

= (0.95 * 0.5) / (0.95*0.5 + 0.05*0.5) = 0.475 / 0.5 = 0.95

This, of course, is just the true positive rate of the test. However, this conclusion relies on

your belief that, if you did not conduct the test, half the people in the world have the

disease, which does not seem reasonable.

A better approach is to use a plausible prior (eg P(D=true)=0.01), but then conduct

multiple independent tests; if they all show up positive, then the posterior will increase.

For example, if we conduct two (conditionally independent) tests T1, T2 with the same

reliability, and they are both positive, we get

P(T1=+ve|D=true) * P(T2=+ve|D=true) * P(D=true)

P(D=true|T1=+ve,T2=+ve) = ------------------------------------------------------------

P(T1=+ve, T2=+ve)

= (0.95 * 0.95 * 0.01) / (0.95*0.95*0.01 + 0.05*0.05*0.99) = 0.009 / 0.0115 = 0.7826

The assumption that the pieces of evidence are conditionally independent is called the

naive Bayes assumption. This model has been successfully used for many applications,
including classifying email as spam (D=true) or not (D=false) given the presence of
various key words (Ti=+ve if word i is in the text, else Ti=-ve). It is clear that the words

are not independent, even conditioned on spam/not-spam, but the model works

surprisingly well nonetheless.


In many problems, complete independence of variables does not exist, though many
variables are conditionally independent.

X and Y are conditionally independent given Z iff

P(X, Y | Z) = P(X | Z) P(Y | Z)

In full: X and Y are conditionally independent given Z iff for any instantiation x, y, z of
X, Y, Z we have

P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)

An example of conditional independence:

Consider the following three Boolean random variables:

LeaveBy8, GetTrain, OnTime

Suppose we can assume that:

P(OnTime | GetTrain, LeaveBy8) = P(OnTime | GetTrain)

but NOT P(OnTime | LeaveBy8) = P(OnTime)

Then, OnTime is dependent on LeaveBy8, but independent of LeaveBy8 given GetTrain.

We can represent P(OnTime | GetTrain, LeaveBy8) = P(OnTime | GetTrain)

graphically by: LeaveBy8 -> GetTrain -> OnTime

Inferencing in Bayesian Networks

10.5.5.1 Exact Inference

The basic inference problem in BNs is described as follows:

Given

1. A Bayesian network BN

2. Evidence e - an instantiation of some of the variables in BN (e can be empty)

3. A query variable Q

Compute P(Q|e) - the (marginal) conditional distribution over Q

Given what we do know, compute distribution over what we do not. Four categories of


inferencing tasks are usually encountered.

1. Diagnostic Inferences (from effects to causes)

Given that John calls, what is the probability of burglary? i.e. Find P(B|J)

2. Causal Inferences (from causes to effects)

Given Burglary, what is the probability that

John calls, i.e. P(J|B)

Mary calls, i.e. P(M|B)

3. Intercausal Inferences (between causes of a common event)

Given alarm, what is the probability of burglary? i.e. P(B|A)

Now given Earthquake, what is the probability of burglary? i.e. P(B|A,E)

4. Mixed Inferences (some causes and some effects known)

Given John calls and no earthquake, what is the probability of Alarm, i.e.

P(A|J,~E)

We will demonstrate below the inferencing procedure for BNs. As an example consider

the following linear BN without any a priori evidence.

Consider computing all the marginals (with no evidence) in a chain such as A -> B -> C -> D.
P(A) is given, and

P(B) = Σ_a P(B | A = a) P(A = a)

We don't need any conditional independence assumption for this.

For example, suppose A, B are binary; then we have

P(B = t) = P(B = t | A = t) P(A = t) + P(B = t | A = f) P(A = f)

Now, in the same way,

P(C) = Σ_b P(C | B = b) P(B = b)

P(B) (the marginal distribution over B) was not given originally. . . but we just computed

it in the last step, so we’re OK (assuming we remembered to store P(B) somewhere).

If C were not independent of A given B, we would have a CPT for P(C|A,B), not
P(C|B). Note that we had to wait for P(B) before P(C) was calculable.

If each node has k values, and the chain has n nodes, this algorithm has complexity
O(nk^2). Summing over the joint has complexity O(k^n).


Complexity can be reduced by more efficient summation by “pushing sums into

products”.

Dynamic programming may also be used for the problem of exact inferencing in the

above Bayes Net. The steps are as follows:

1. We first compute f1(B) = Σ_a P(A = a) P(B | A = a).

2. f1(B) is a function representable by a table of numbers, one for each possible value of
B.

3. Here, f1(B) is just the marginal P(B).

4. We then use f1(B) to calculate f2(C) = Σ_b f1(b) P(C | b) by summation over B.

This method of solving a problem (ie finding P(D)) by solving subproblems and storing

the results is characteristic of dynamic programming.
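The same forward computation can be written out as a short Python sketch (added for illustration; the CPT numbers are assumed, and the chain is taken to be A -> B -> C -> D as above):

# Each CPT maps a parent value to a distribution over the child's values.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}
p_d_given_c = {True: {True: 0.7, False: 0.3}, False: {True: 0.1, False: 0.9}}

def push_forward(p_parent, cpt):
    # f(child) = sum over parent values of P(parent) * P(child | parent)
    out = {True: 0.0, False: 0.0}
    for parent, pp in p_parent.items():
        for child, pc in cpt[parent].items():
            out[child] += pp * pc
    return out

p_b = push_forward(p_a, p_b_given_a)   # f1(B) = P(B), stored for reuse
p_c = push_forward(p_b, p_c_given_b)   # f2(C) = P(C)
p_d = push_forward(p_c, p_d_given_c)
print(p_d)   # O(n k^2) work instead of O(k^n)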

The above methodology may be generalized. We eliminated variables starting from the
root, but we don't have to; we might also have run the elimination from the other end of
the chain, producing intermediate tables such as f1(C,E).


The following points are to be noted about the above algorithm. The algorithm computes

intermediate results which are not individual probabilities, but entire tables such as

f1(C,E). It so happens that f1(C,E) = P(E|C) but we will see examples where the

intermediate tables do not represent probability distributions.

Dealing with Evidence

Dealing with evidence is easy. Suppose {A,B,C,D,E} are all binary and we want

P(C|A = t,E = t). Computing P(C,A = t,E = t) is enough—it’s a table of numbers, one for

each value of C. We need to just renormalise it so that they add up to 1.

It was noticed from the above computation that conditional distributions are basically just

normalised marginal distributions. Hence, the algorithms we study are only concerned

with computing marginals. Getting the actual conditional probability values is a trivial

“tidying-up” last step.

Now let us concentrate on computing P(C, A = t, E = t).

It can be done by plugging in the observed values for A and E and summing out B and D.


We don’t really care about P(A = t), since it will cancel out.

Now let us see how evidence-induced independence can be exploited. In such cases,
clever variable elimination would jump straight to the simplified expression. Choosing an
optimal order of variable elimination leads to a large amount of computational saving.
However, finding the optimal order is a hard problem.

10.5.5.1.1 Variable Elimination

For a Bayes net, we can sometimes use the factored representation of the joint probability

distribution to do marginalization efficiently. The key idea is to "push sums in" as far as

possible when summing (marginalizing) out irrelevant terms, e.g., for the water sprinkler
network (Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler and Rain -> WetGrass):

P(W = w) = Σ_c Σ_s Σ_r P(C = c) P(S = s | C = c) P(R = r | C = c) P(W = w | S = s, R = r)
         = Σ_c P(C = c) Σ_s P(S = s | C = c) Σ_r P(R = r | C = c) P(W = w | S = s, R = r)

Notice that, as we perform the innermost sums, we create new terms, which need to be
summed over in turn, e.g.,

P(W = w) = Σ_c P(C = c) Σ_s P(S = s | C = c) T1(c, w, s)

where

T1(c, w, s) = Σ_r P(R = r | C = c) P(W = w | S = s, R = r)

Continuing this way,

P(W = w) = Σ_c P(C = c) T2(c, w)

where

T2(c, w) = Σ_s P(S = s | C = c) T1(c, w, s)

In a nutshell, the variable elimination procedure repeats the following steps.

1. Pick a variable Xi

2. Multiply all expressions involving that variable, resulting in an expression f over a

number of variables (including Xi)

3. Sum out Xi, i.e. compute and store f'(Y1, ..., Yk) = Σ_xi f(xi, Y1, ..., Yk)

For the multiplication, we must compute a number for each joint instantiation of all

variables in f, so complexity is exponential in the largest number of variables

participating in one of these multiplicative subexpressions.

If we wish to compute several marginals at the same time, we can use Dynamic

Programming to avoid the redundant computation that would be involved if we used

variable elimination repeatedly.
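As a concrete illustration of "pushing sums in", the following Python sketch computes P(WetGrass = true) for the sprinkler network; the CPT numbers are illustrative assumptions, not values given in these notes:

P_c = {True: 0.5, False: 0.5}                                          # P(C)
P_s = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}  # P(S|C)
P_r = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}  # P(R|C)
P_w = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}                        # P(W=t|S,R)

def p_wet():
    total = 0.0
    for c in (True, False):
        t2 = 0.0
        for s in (True, False):
            # innermost sum: T1(c, w, s) = sum_r P(r|c) P(w|s,r)
            t1 = sum(P_r[c][r] * P_w[(s, r)] for r in (True, False))
            t2 += P_s[c][s] * t1          # T2(c, w) = sum_s P(s|c) T1(c, w, s)
        total += P_c[c] * t2
    return total

print(p_wet())   # P(WetGrass = true)

Conditioning on evidence works the same way: plug in the observed values instead of summing over them, then renormalise at the end.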

Exact inferencing in a general Bayes net is a hard problem. However, for networks with
some special topologies, efficient inferencing techniques exist. We discuss one such
technique for a class of networks called poly-trees.

Inferencing in Poly-Trees

A poly-tree is a graph where there is at most one undirected path between any pair of
nodes. The inferencing problem in poly-trees may be stated as follows.

U: U1 … Um, parents of node X

Y: Y1 … Yn, children of node X


X: Query variable
E: Evidence variables (whose truth values are known)
Objective: compute P(X | E)

E+_X is the set of causal support for X, comprising the evidence variables above X connected
through its parents, which are known.
E-_X is the set of evidential support for X, comprising the evidence variables below X connected
through its children.

In order to compute P(X | E) we have

P(X|E) = P(X | E+_X, E-_X)

       = P(E-_X | X, E+_X) P(X | E+_X) / P(E-_X | E+_X)

Since X d-separates E+_X from E-_X we can simplify the numerator as

P(X|E) = α P(E-_X | X) P(X | E+_X)

where 1/α is the constant representing the denominator.

These simplifications rely on conditional independence relations: if the parents are known, X is
conditionally independent from all other nodes in the causal support set; similarly, given the
children, X is independent from all other variables in the evidential support set.

Approximate Inferencing in Bayesian Networks

Many real models of interest have a large number of nodes, which makes exact inference
very slow. (Exact inference is NP-hard in the worst case.) We must therefore resort to

approximation techniques. Unfortunately, approximate inference is #P-hard, but we can

nonetheless come up with approximations which often work well in practice. Below is a

list of the major techniques.

Variational methods. The simplest example is the mean-field approximation, which

exploits the law of large numbers to approximate large sums of random variables by their

means. In particular, we essentially decouple all the nodes, and introduce a new

parameter, called a variational parameter, for each node, and iteratively update these

parameters so as to minimize the cross-entropy (KL distance) between the approximate

and true probability distributions. Updating the variational parameters becomes a proxy

for inference. The mean-field approximation produces a lower bound on the likelihood.

More sophisticated methods are possible, which give tighter lower (and upper) bounds.


Sampling (Monte Carlo) methods. A popular approach in high dimensions is Markov Chain
Monte Carlo (MCMC), which includes as special cases Gibbs sampling and the
Metropolis-Hastings algorithm.

Bounded cutset conditioning. By instantiating subsets of the variables, we can break loops

in the graph. Unfortunately, when the cutset is large, this is very slow. By instantiating only a

subset of values of the cutset, we can compute lower bounds on the probabilities of interest.

Alternatively, we can sample the cutsets jointly, a technique known as block Gibbs sampling.

Parametric approximation methods. These express the intermediate summands in a simpler

form, e.g., by approximating them as a product of smaller factors. "Minibuckets" and the

Boyen-Koller algorithm fall into this category.


QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE

UNIT – V

PART -A (2 Marks)

LEARNING

1. Explain the concept of learning from example.

Learning from examples means inducing a general rule or function from a set of observed
input/output pairs, so that the learner can predict the correct output for examples it has not yet seen.

2. What is meant by learning?

Learning is a goal-directed process of a system that improves the knowledge or the

Knowledge representation of the system by exploring experience and prior knowledge.

3. How statistical learning method differs from reinforcement learning method?

Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize

a numerical reward signal. The learner is not told which actions to take, as in most forms of machine

learning, but instead must discover which actions yield the most reward by trying them. In the most

interesting and challenging cases, actions may affect not only the immediate reward but also the

next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error

search and delayed reward--are the two most important distinguishing features of reinforcement

learning.

4. Define informational equivalence and computational equivalence.

Informational equivalence: a transformation from one representation to another causes no
loss of information; they can be constructed from each other.

Computational equivalence: the same information and the same inferences are achieved
with the same amount of effort.

5. Define knowledge acquisition and skill refinement.

knowledge acquisition (example: learning physics)—learning new

symbolic information coupled with the ability to apply that information in

an effective manner

skill refinement (example: riding a bicycle, playing the piano)—occurs at

a subconscious level by virtue of repeated practice

6. What is Explanation-Based Learning?

In Explanation-Based Learning, the background knowledge is sufficient to explain the
hypothesis. The agent does not learn anything factually new from the instance. It extracts


general rules from single examples by explaining the examples and generalizing the explanation.

7. Define Knowledge-Based Inductive Learning.

Knowledge-Based Inductive Learning finds inductive hypotheses that explain a set
of observations with the help of background knowledge.

8. What is truth preserving?

An inference algorithm that derives only entailed sentences is called sound or truth

preserving.

9. Define Inductive learning. How the performance of inductive learning algorithms can be

measured?

Learning a function from examples of its inputs and outputs is called inductive

learning.

It is measured by their learning curve, which shows the prediction accuracy as a

function of the number of observed examples.

10. List the advantages of Decision Trees

The advantages of Decision Trees are,

It is one of the simplest and yet most successful forms of learning algorithm.

It serves as a good introduction to the area of inductive learning and is

easy to implement.

11. What is the function of Decision Trees?

A decision tree takes as input an object or situation described by a set of properties, and
outputs a yes/no decision. Decision trees therefore represent Boolean functions.

12. List some of the practical uses of decision tree learning.

Some of the practical uses of decision tree learning are,

Designing oil platform equipment

Learning to fly

13.What is the task of reinforcement learning?

The task of reinforcement learning is to use rewards to learn a successful agent

function.

14. Define Passive learner and Active learner.

A passive learner watches the world going by, and tries to learn the utility of being

in various states.

An active learner acts using the learned information, and can use its problem

generator to suggest explorations of unknown portions of the environment.


15. State the factors that play a role in the design of a learning system.

The factors that play a role in the design of a learning system are,

Learning element

Performance element

Critic

Problem generator

16. What is memoization?

Memoization is used to speed up programs by saving the results of computation.
The basic idea is to accumulate a database of input/output pairs; when the function is called, it
first checks the database to see if it can avoid solving the problem from scratch.
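A minimal sketch of this idea in Python (an added illustration; the decorator and function names are ours):

def memoize(f):
    table = {}                      # the accumulated input/output database
    def wrapper(*args):
        if args not in table:       # check the database first
            table[args] = f(*args)  # only solve from scratch when necessary
        return table[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))   # fast, because subproblems are looked up, not re-solved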

17. Define Q-Learning.

The agent learns an action-value function giving the expected utility of taking a

given action in a given state. This is called Q-Learning.

18. Define supervised learning & unsupervised learning.

Any situation in which both inputs and outputs of a component can be perceived is

called supervised learning.

Learning when there is no hint at all about the correct outputs is called

unsupervised learning.

19. Define Bayesian learning.

Bayesian learning simply calculates the probability of each hypothesis, given the

data, and makes predictions on that basis. That is, the predictions are made by using all the

hypotheses, weighted by their probabilities, rather than by using just a single “best” hypothesis.

20. What is utility-based agent?

A utility-based agent learns a utility function on states and uses it to select actions

that maximize the expected outcome utility.

21. What is reinforcement learning?

Reinforcement learning refers to a class of problems in machine learning which

postulate an agent exploring an environment in which the agent perceives its current state and

takes actions. The environment, in return, provides a reward (which can be positive or negative).

Reinforcement learning algorithms attempt to find a policy for maximizing cumulative reward
for the agent over the course of the problem.

22. What is the important task of reinforcement learning?

The important task of reinforcement learning is to use rewards to learn a successful

agent function.


PART - B

1. Explain about Learning From Observations.

FORMS OF LEARNING:

A learning agent can be thought of as containing a ‘Performance element’ that decides

what actions to take and a ‘learning element’ that modifies the performance element so that it

makes better decisions.

The design of a learning element is affected by three major issues:-

Which ‘components’ of the performance element are to be learned?

What ‘feedback’ is available to learn these components?

What ‘representation’ is used for the components?

The components of the agents include the following:-

A direct mapping from conditions on the current state to actions.

A means to infer relevant properties of the world from the percept sequence.

Information about the way the world evolves and about the results of possible actions the

agent can take.

Utility information indicating the desirability of world status.

Action-value information indicating the desirability of actions.

Goals that describe classes of states whose achievement maximizes the agent's utility.

Each of the components can be learned from appropriate feedback. Consider, for

example, an agent training to become a taxi driver. Every time the instructor shouts "Brake!" the
agent can learn a condition-action rule for when to brake (component 1).

By seeing many camera images that it is told contain buses, it can learn to recognize
them.

By trying actions and observing the results, for example, braking hard on a wet road, it
can learn the effects of the action.

Then, when it receives no tip from passengers who have been thoroughly shaken up
during the trip, it can learn a useful component of its overall utility function.

The type of feedback available for learning is usually the most important factor in
determining the nature of the learning problem that the agent faces.


The field of machine learning usually distinguishes three cases: supervised,
unsupervised, and reinforcement learning.

The problem of supervised learning involves learning a function from examples of its
inputs and outputs.

Cases (1), (2) and (3) are all instances of supervised learning problems.

In (1), the agent learns a condition-action rule for braking; this is a function from states to
a Boolean output (to brake or not to brake).

In (2), the agent learns a function from images to a Boolean output(whether the images

contain a bus).

In (3), the theory of braking is a function from states and braking actions to, say,
stopping distance in feet.

The problem of ‘unsupervised learning’ involves learning patterns in the input when no
specific output values are supplied.

For example, a taxi agent might gradually develop a concept of “good traffic days” and

“bad traffic days” without ever being given labeled examples of each.

A purely unsupervised learning agent cannot learn what to do, because it has no
information as to what constitutes a correct action or a desirable state.

The problem of ‘reinforcement learning’ is the most general of the three categories. Rather

than being told what to do by a teacher, a reinforcement learning agent must learn from

reinforcement. (Reinforcement learning typically includes the sub problem of learning

how the environment works)

The representation of the learned information also plays a very important role in
determining how the learning algorithm must work.

The last major factor in the design of learning systems is the availability of prior
knowledge. The majority of learning research in AI, computer science, and psychology has
studied the case in which the agent begins with no knowledge at all about what it is trying
to learn.

Inductive learning:

An algorithm for deterministic supervised learning is given as input the value of
the unknown function for particular inputs, and must try to recover the unknown function
or something close to it.

We say that an example is a pair (x,f(x)), where x is the input and f(x) is the output of the


function applied to x.

The task of pure inductive inference (or induction) is this,

Given a collection of examples of ‘f’, return a function ‘h’ that approximates ‘f’.

The function ‘h’ is called a hypothesis. The reason that learning is difficult, from a

conceptual point of view, is that it is not easy to tell whether any particular ‘h’ is a good

approximation of ‘f’. (A good hypothesis will generalize well, that is, will predict unseen
examples correctly. This is the fundamental problem of induction.)

LEARNING FROM DECISION TREES:

Decision tree induction is one of the simplest and yet most successful forms of

learning algorithm.

Decision tree as performance elements:-

A decision tree takes as input an object or situation described by a set of attributes and
returns a 'decision', the predicted output value for the input.

The input attributes can be discrete or continuous. For now, we assume discrete inputs.
The output values can also be discrete or continuous; learning a discrete-valued function
is called classification, and learning a continuous-valued function is called regression.

We will concentrate on Boolean classification, wherein each example is classified as true
(positive) or false (negative).

A decision tree reaches its decision by performing a sequence of tests. Each internal node
in the tree corresponds to a test of the value of one of the properties, and the branches
from the node are labeled with the possible values of the test.

Each leaf node in the tree specifies the value to be returned if that leaf is reached.



Example:-

The problem of whether to wait for a table at a restaurant. The aim here is to learn a

definition for the goal predicate ‘will wait’.

[Figure 4.1: A decision tree for deciding whether to wait for a table. The root tests Patrons
(None / Some / Full); the Full branch tests WaitEstimate (>60, 30-60, 10-30, 0-10), followed
by further tests on Alternate, Hungry, Reservation, Fri/Sat, Bar, and Raining.]

We will see how to automate this task; for now, let's suppose we decide on the following list of
attributes:

1. Alternate: Whether there is a suitable alternative restaurant nearby.

2. Bar: Whether the restaurant has a comfortable bar area to wait in.

3. Fri/Sat: True on Fridays & Saturdays.


4. Hungry: Whether we are hungry.

5. Patrons: How many people are in the restaurant (values are None, Some & Full).

6. Price: The restaurants price range ($,$$,$$$).

7. Raining: Whether it is raining outside.

8. Reservation: Whether we made a reservation.

9. Type: The kind of restaurant (French, Italian, Thai or Burger).

10. WaitEstimate: The wait estimated by the host (0-10, 10-30, 30-60, >60).

The decision tree usually used by one of us (SR) for this domain is given in Figure 4.1.
The tree does not use the 'Price' and 'Type' attributes, in effect considering them to be
irrelevant. Examples are processed by the tree starting at the root and following the appropriate
branch until a leaf is reached. For instance, an example with Patrons=Full and WaitEstimate=0-10
will be classified as positive.

2. Explain about Ensemble Learning.

The idea of ensemble learning methods is to select a whole collection of hypotheses from
the hypothesis space and combine their predictions.

Methods are:-

1. Boosting

2. Bagging

Motivation:-

The motivation for ensemble learning is simple. Consider an ensemble (collection) of
5 hypotheses and suppose that their predictions are combined using simple majority voting.

For the ensemble to misclassify a new example, at least three of the five hypotheses have to
misclassify it. The hope is that this is much less likely than a misclassification by a single
hypothesis. Furthermore, suppose we assume that the errors made by each hypothesis are
independent.

1. In that case, if 'p' is small, then the probability of a misclassification occurring is very small.
Now, obviously, if the hypotheses are at least a little bit different, thereby reducing the
correlation between their errors, then ensemble learning can be very useful.

2. The ensemble idea is a generic way of enlarging the hypothesis space. That is, the ensemble itself
can be thought of as a hypothesis, and the new hypothesis space as the set of all possible
ensembles constructible from hypotheses in the original space.


If the original hypothesis space allows for a simple and efficient learning algorithm, then

the ensemble method provides a way to learn a much more expressive class of hypotheses

without much additional computational or algorithmic complexity.

Boosting:-

Boosting is the most widely used ensemble method. To understand how it works, it is
necessary to understand the concept of a weighted training set.

Weighted training set:-

In a weighted training set, each example has an associated weight wj≥0. The higher the

weight of an example, the higher is the importance attached to it during the learning of a

hypothesis.

Working:-

Boosting starts with wj=1 for all the examples.

From this set, it generates the first hypothesis h1. This hypothesis will classify some of

the training examples correctly and some incorrectly.

The hypothesis can be made better by increasing the weight of misclassified examples

while decreasing the weight of the correctly classified examples.

From this new weighted training set, hypothesis h2 is generated.

The process continues until 'M' hypotheses have been generated, where M is an input to
the boosting algorithm.

The final ensemble hypothesis is a weighted-majority combination of all the M
hypotheses, each weighted according to how well it performed on the training set.

ADABOOST algorithm:-

ADABOOST is one of the most commonly used boosting algorithms, among many variants.

It has an important property: if the input learning algorithm 'L' is a weak learning
algorithm, meaning that L always returns a hypothesis whose weighted error on the
training set is slightly better than random guessing, then ADABOOST will return a
hypothesis that classifies the training data perfectly for a large enough number of
hypotheses M.
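The following compact Python sketch shows the boosting loop described above, using one-feature threshold "stumps" as the weak learner L; the dataset, the stump form, and the hypothesis-weight formula are illustrative assumptions in the style of ADABOOST, not code from these notes:

import math

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [+1, +1, -1, -1, +1, -1]

def best_stump(w):
    # Return (error, threshold, sign) minimising weighted error on X, y.
    best = None
    for thr in X:
        for sign in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (sign if xi <= thr else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

w = [1.0 / len(X)] * len(X)            # boosting starts with equal weights
ensemble = []
for _ in range(5):                     # M = 5 hypotheses
    err, thr, sign = best_stump(w)
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))   # hypothesis weight
    ensemble.append((alpha, thr, sign))
    for i, (xi, yi) in enumerate(zip(X, y)):
        h = sign if xi <= thr else -sign
        w[i] *= math.exp(-alpha * yi * h)  # raise weight of misclassified examples
    total = sum(w)
    w = [wi / total for wi in w]           # renormalise the weights

def predict(x):
    vote = sum(a * (s if x <= t else -s) for a, t, s in ensemble)
    return +1 if vote >= 0 else -1

print([predict(x) for x in X])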


LOGICAL FORMULATION OF LEARNING

Expressions of Decision tree:-

Any particular decision tree hypothesis for the 'WillWait' goal predicate can be seen as
an assertion of the form,

∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ ... ∨ Pn(s)),

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root
of the tree to a leaf with a positive outcome.

The decision tree is really describing a relationship between 'WillWait' and some logical
combination of attribute values.

We cannot use a decision tree to represent tests that refer to two or more different objects,
for example,

∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p)

Decision trees are fully expressive within the class of propositional languages; that is, any
Boolean function can be written as a decision tree.

This can be done trivially by having each row in the truth table for the function
correspond to a path in the tree.

This would yield an exponentially large decision tree representation because the truth
table has exponentially many rows. Clearly, decision trees can represent many functions
with much smaller trees.

For some kinds of functions, however, this is a real problem. For example, if the function
is the parity function, which returns 1 if and only if an even number of inputs are 1, then an
exponentially large decision tree will be needed.

It is also difficult to represent a majority function, which returns ‘1’ if more than half of

its inputs are 1.

The truth table has 2^n rows, because each input case is described by n attributes. We can
consider the 'answer' column of the table as a 2^n-bit number that defines the function.

If it takes 2^n bits to define the function, then there are 2^(2^n) different functions on n
attributes. For example, with just six Boolean attributes, there are 2^(2^6) =
18,446,744,073,709,551,616 different functions to choose from.


Example | Alt | Bar | Fri | Hun | Patrons | Price | Rain | Res | Type    | Est   | WillWait
X1      |  Y  |  N  |  N  |  Y  | Some    | $$$   |  N   |  Y  | French  | 0-10  | Y
X2      |  Y  |  N  |  N  |  Y  | Full    | $     |  N   |  N  | Thai    | 30-60 | N
X3      |  N  |  Y  |  N  |  N  | Some    | $     |  N   |  N  | Burger  | 0-10  | Y
X4      |  Y  |  N  |  Y  |  Y  | Full    | $     |  Y   |  N  | Thai    | 10-30 | Y
X5      |  Y  |  N  |  Y  |  N  | Full    | $$$   |  N   |  Y  | French  | >60   | N
X6      |  N  |  Y  |  N  |  Y  | Some    | $$    |  Y   |  Y  | Italian | 0-10  | Y
X7      |  N  |  Y  |  N  |  N  | None    | $     |  Y   |  N  | Burger  | 0-10  | N
X8      |  N  |  N  |  N  |  Y  | Some    | $$    |  Y   |  Y  | Thai    | 0-10  | Y
X9      |  N  |  Y  |  Y  |  N  | Full    | $     |  Y   |  N  | Burger  | >60   | N
X10     |  Y  |  Y  |  Y  |  Y  | Full    | $$$   |  N   |  Y  | Italian | 10-30 | N
X11     |  N  |  N  |  N  |  N  | None    | $     |  N   |  N  | Thai    | 0-10  | N
X12     |  Y  |  Y  |  Y  |  Y  | Full    | $     |  N   |  N  | Burger  | 30-60 | Y

Inducing decision tree from example:-

An example for a Boolean decision tree consists of a vector of input attributes, X, and a
single Boolean output value y. A set of examples (x1,y1), ..., (x12,y12) is shown in the
table above (Figure 4.2).



Figure 4.2 Inducing decision tree from example

The positive examples are the ones in which the goal 'WillWait' is true (x1, x3, ...); the
negative examples are the ones in which it is false (x2, x5, ...). The complete set of
examples is called the training set.

Trivial tree:-

The problem of finding a decision tree that agrees with the training set might seem difficult,
but in fact there is a trivial solution: construct a trivial tree.

Construct a decision tree that has one path to a leaf for each example, where the path tests
each attribute in turn and follows the classification of the example.

Problem with trivial tree:-

It just memorizes the observations.

It does not extract any pattern from the examples, so it cannot be expected to be able to
extrapolate to examples it has not seen.

The above Figure 4.2 shows how the algorithm gets started: 12 training examples are
given, which are classified into positive and negative sets. Then the attribute to use for the
first test in the tree is decided. Type, for example, is a poor first test, because it leaves us
with four possible outcomes, each of which has the same number of positive and negative
examples.

On the other hand, Figure 4.2 shows that Patrons is a fairly important attribute.

There are four cases to consider for the recursive problem:-

1) If there are some positive and some negative examples, then choose the best attribute to split
them. Figure 4.2 (h) shows 'Hungry' being used to split the remaining examples.

2) If all the remaining examples are positive (or all negative), then we are done: we can answer
Yes or No. Figure 4.2 (a) shows examples of this in the None and Some cases.

3) If there are no examples left, it means that no such example has been observed, and a
default value calculated from the majority classification at the node's parent is returned.


4) If there are no attributes left, but both positive and negative examples, then these examples
have exactly the same description but different classifications.

The decision-tree-learning algorithm:-

function DECISION-TREE-LEARNING(examples, attribs, default) returns a decision tree
  inputs: examples, attribs, default

  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attribs is empty then return MAJORITY-VALUE(examples)
  else
      best <- CHOOSE-ATTRIBUTE(attribs, examples)
      tree <- a new decision tree with root test best
      m <- MAJORITY-VALUE(examples)
      for each value vi of best do
          examples_i <- {elements of examples with best = vi}
          subtree <- DECISION-TREE-LEARNING(examples_i, attribs - best, m)
          add a branch to tree with label vi and subtree subtree
      return tree
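For concreteness, here is a direct Python rendering of the pseudocode above (an added sketch; examples are taken to be (attribute-dict, label) pairs, and the attribute-selection heuristic is passed in as a function):

from collections import Counter

def majority_value(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attribs, default, choose_attribute):
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all same classification
        return labels.pop()
    if not attribs:
        return majority_value(examples)
    best = choose_attribute(attribs, examples)
    tree = {best: {}}
    m = majority_value(examples)
    for v in {ex[best] for ex, _ in examples}:   # one branch per observed value
        exs = [(ex, label) for ex, label in examples if ex[best] == v]
        rest = [a for a in attribs if a != best]
        tree[best][v] = decision_tree_learning(exs, rest, m, choose_attribute)
    return tree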

The final tree produced by the algorithm applied to the 12-example data set is shown in Figure

4.2.

Choosing attribute test:-

To choose an attribute test, the idea is to pick the attribute that goes as far as possible toward
providing an exact classification of the examples. A perfect attribute divides the examples into
sets that are all positive or all negative.

Measure to find 'fairly good' and 'really useless' attributes:-

In general, if the possible answers vi have probabilities P(vi), then the information content I of
the actual answer is given by,

I(P(v1), ..., P(vn)) = Σi −P(vi) log2 P(vi)

To check this equation for the tossing of a fair coin:

I(1/2, 1/2) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit


Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A)

Gain(Patrons) = 1 − [(2/12)I(0,1) + (4/12)I(1,0) + (6/12)I(2/6,4/6)] ≈ 0.541 bits

Gain(Type) = 1 − [(2/12)I(1/2,1/2) + (2/12)I(1/2,1/2) + (4/12)I(2/4,2/4) + (4/12)I(2/4,2/4)] = 0

Assessing the performance of the learning algorithm:-

A learning algorithm is good if it produces hypotheses that do a good job of predicting the
classifications of unseen examples. Prediction quality can be estimated in advance, or it can
be estimated after the fact.

1) Collect a large set of example.

2) Divide it into two disjoint sets, the training set and the test set.

3) Apply the learning algorithm to training set, generating a hypothesis ‘h’.

4) Measure the percentage of examples in the test set that are correctly classified by ‘h’.

5) Repeat step 1 to 4 for different size of training sets and different randomly selected

training sets of each size.

Noise and over fitting:-

The problem of finding meaningless “regularity” in the data whenever there is a large set
of possible hypotheses is called overfitting.

Solution:-

Decision tree pruning

Cross validation

3. Explain about KNOWLEDGE-IN-LEARNING:-

Prior knowledge can help an agent to learn from new experiences.

Logical formulation of learning:-

Inductive learning is the process of finding a hypothesis that agrees with the
observed examples.

The hypothesis is represented by a set of logical sentences.

Prior knowledge, example descriptions, and classifications are themselves logical sentences.

Thus, logical inference aids learning.

Examples and hypothesis:-


Consider the restaurant learning problem again, i.e., learning a rule for deciding whether to
wait for a table. The example object is described by a logical sentence; the attributes become
unary predicates.


Example | Alternate | Bar | Fri | Hungry | Patrons | Price | Rain | Res | Est  | WillWait
X1      |     Y     |  N  |  N  |   Y    |  Some   |  $$$  |  N   |  Y  | 0-10 | Yes

Figure 4.3 Knowledge in learning

D1(X1) = Alternate(X1) ∧ ¬Bar(X1) ∧ ¬Fri/Sat(X1) ∧ Hungry(X1) ∧ Patrons(X1, Some) ∧
Price(X1, $$$) ∧ ¬Rain(X1) ∧ Reservation(X1) ∧ ...

where Di is a logical expression taking a single argument.

Classification of the object is given by the sentence WillWait(X1); in generic notation,

Q(X1) if the example is positive.

¬Q(X1) if the example is negative.

The complete training set is just the conjunction of all description and classification

sentences.

Each hypothesis proposes an expression, called a candidate definition, of the goal
predicate. With Ci as the candidate definition, the hypothesis Hi is a sentence of the form

∀x Q(x) ⇔ Ci(x)

For a decision tree, the candidate definition says the goal predicate holds iff one of the paths of
the tree leading to a positive leaf is satisfied.

The decision tree induced from the training set is,

∀r WillWait(r) ⇔ Patrons(r, Some)
    ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, French))
    ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Thai) ∧ Fri/Sat(r))
    ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Burger))

It is a disjunction over the branches whose leaf output is true.


Current best hypothesis search:-

The idea is to maintain a single hypothesis, and to adjust it as new examples arrive in order
to maintain consistency. This approach dates back to John Stuart Mill (1843).

Generalization:-

The hypothesis says the example should be negative, but it is actually positive. The
hypothesis has to be extended to include this example. This is called generalization.

Specialization:-

The hypothesis says that the new example is positive, but it is actually negative. The
hypothesis must be restricted to exclude this example. This is called specialization. The
current-best learning approach underlies many machine-learning algorithms.

4. Explain about Explanation – Based Learning

(EBL):- Definition:-

When an agent can utilize a worked example of a problem as a problem-solving method,

the agent is said to have the capability of Explanation-based learning (EBL).

Advantage:-

A deductive mechanism, it requires only a single training example.

The EBL algorithm accepts four kinds of inputs:

1. A training example: what the learner sees in the world.

2. A goal concept: a high-level description of what the program is to learn.

3. An operational criterion: a description of which concepts are usable.

4. A domain theory: a set of rules that describe relationships between objects and actions in
the domain.

The entailment constraints satisfied by EBL are,

Hypothesis ∧ Descriptions ⊨ Classifications

Background ⊨ Hypothesis

Extracting rules from example:-

It is a method for extracting general rules from individual observations.

The idea is to construct an explanation of the observation using prior knowledge.

Consider the rule Derivative(X^2, X) = 2X, and suppose we want to simplify 1 × (0 + X).
The knowledge base includes the following rules:

Rewrite(u, v) ∧ Simplify(v, w) ⇒ Simplify(u, w)

Primitive(u) ⇒ Simplify(u, u)

ArithmeticUnknown(u) ⇒ Primitive(u)

Number(u) ⇒ Primitive(u)

Rewrite(1 × u, u)

Rewrite(0 + u, u)

EBL process working:-

1. Construct a proof that the goal predicate applies to the example using the available
background knowledge.

2. In parallel, construct a generalized proof tree for the variabilized goal using the same
inference steps as in the original proof.

3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and whose
right-hand side is the variabilized goal.

4. Drop any conditions that are true regardless of the values of the variables in the goal.

Improving efficiency:-

1. Adding large numbers of derived rules increases the branching factor in the search space,
so the number of rules must be kept down.

2. Derived rules must offer a significant increase in speed.

3. Derived rules should be as general as possible, so that they apply to the largest possible set
of cases.


5. Explain about Learning Using Relevance Information.

Relevance-based learning uses prior knowledge in the form of relevance information.

Eg:- a traveler in Brazil concludes that all Brazilians speak Portuguese.

The entailment constraints are given by,

Hypothesis ∧ Descriptions ⊨ Classifications

Background ∧ Descriptions ∧ Classifications ⊨ Hypothesis

Expressing the above example in FOL:

Nationality(x, n) ∧ Nationality(y, n) ∧ Language(x, l) ⇒ Language(y, l)

If x and y have the same nationality and x speaks language l, then y also speaks it.

Nationality(Fernando, Brazil) ∧ Language(Fernando, Portuguese)

entails the following sentence:

Nationality(x, Brazil) ⇒ Language(x, Portuguese)

The relevance here is that, given nationality, language is fully determined. Such sentences are
called functional dependencies or determinations, written

Nationality(x, n) ≻ Language(x, l)

INDUCTIVE LOGIC PROGRAMMING:-

Knowledge-based inductive learning (KBIL) finds inductive hypotheses that explain a set of
observations with the help of background knowledge.

ILP techniques perform KBIL on knowledge that is expressed in first-order logic.

ILP has gained popularity for three reasons:

1. It offers a rigorous approach to the general knowledge-based inductive learning problem.

2. It offers complete algorithms for inducing general first-order theories from examples.

3. It produces hypotheses that are easy for humans to read.

Ex:-

Consider the problem of learning family relationships. The descriptions consist of Mother,
Father, and Married relations, and Male and Female properties.

For a typical family tree, the corresponding descriptions are as follows:

Father(Philip, Charles)    Father(Philip, Anne) ...
Mother(Mum, Margaret)      Mother(Mum, Elizabeth) ...
Married(Diana, Charles)    Married(Elizabeth, Philip) ...
Male(Philip)               Male(Charles) ...
Female(Diana)              Female(Elizabeth) ...
Grandparent(Mum, Charles)  Grandparent(Elizabeth, Beatrice) ...
¬Grandparent(Mum, Harry)   ¬Grandparent(Spencer, Peter) ...

Statistical learning methods:-

The key concepts of statistical learning are data and hypotheses.

Data:-

Data are evidence of some or all of the random variables describing the domain.


Hypothesis:-

The hypotheses are probabilistic theories of how the domain works, including
logical theories as a special case.

Example: candy comes in two flavors, cherry and lime.

Bayesian learning:

It calculates the probability of each hypothesis, given the data, and makes predictions
by using all the hypotheses.

The Bayesian view of learning is extremely powerful, providing a general solution to the
problems of noise, overfitting, and optimal prediction.

Ex:-

h1: 100% cherry
h2: 75% cherry + 25% lime
h3: 50% cherry + 50% lime
h4: 25% cherry + 75% lime
h5: 100% lime
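The Bayesian update over these five hypotheses can be sketched in a few lines of Python (an added illustration; the prior (0.1, 0.2, 0.4, 0.2, 0.1) is the one commonly used with this example and is an assumption here):

prior = [0.1, 0.2, 0.4, 0.2, 0.1]
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]      # P(lime | h1..h5)

posterior = prior[:]
for _ in range(10):                        # observe 10 lime candies in a row
    posterior = [p * q for p, q in zip(posterior, p_lime)]
    s = sum(posterior)
    posterior = [p / s for p in posterior]

print([round(p, 4) for p in posterior])    # mass shifts toward h5 (all lime)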

Learning with complete data:-

A parameter learning task involves finding the numerical parameters for a
probability model whose structure is fixed.

Data are complete when each data point contains values for every variable in
the probability model.

Complete data greatly simplify the problem of learning the parameters of a complex model.
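With complete data, maximum-likelihood parameter learning reduces to counting, as the following Python sketch shows (the observations are hypothetical):

data = [  # hypothetical complete observations of (Flavor, Wrapper)
    ("cherry", "red"), ("cherry", "red"), ("lime", "green"),
    ("cherry", "green"), ("lime", "green"), ("lime", "red"),
]

n_cherry = sum(1 for f, _ in data if f == "cherry")
theta = n_cherry / len(data)                  # ML estimate of P(Flavor = cherry)

n_red_and_cherry = sum(1 for f, w in data if f == "cherry" and w == "red")
theta1 = n_red_and_cherry / n_cherry          # ML estimate of P(Wrapper = red | cherry)

print(theta, theta1)   # 0.5 and 0.666...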

REINFORCEMENT LEARNING:-

It involves finding a balance between exploration of new knowledge and exploitation
of current knowledge.

Ex:- learning to play chess. Without a teacher providing labeled examples, as in supervised
learning,


the agent can only try some random moves.

The agent needs to know that something good has happened when it wins and
something bad when it loses. This kind of feedback is called reward or reinforcement.

In games like chess, the reinforcement is received only at the end of the
game. In ping-pong, each point scored can be considered a reward.

The task of reinforcement learning is to use observed rewards to learn an optimal policy for
the environment.

It is a sub-area of machine learning.

The basic reinforcement learning model consists of:-

1. A set of environment states s;

2. A set of action A; and

3. A set of scalar rewards, R.

It is well suited for applications like robot control, telecommunications, and games.

Various types of agents:-

1. Utility based agents:-

It learns a utility function on states, and uses it to select actions that maximize
the expected outcome utility.

2. Q-learning agent:-

It learns an action-value function, or Q-function, giving the expected utility
of taking a given action in a given state (a minimal sketch follows after this list).

3. Reflex agent:-

It learns a policy that maps directly from states to actions.
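The sketch promised above: a minimal Q-learning loop on a toy chain of states, where the only reward is at the right end. All parameters and the environment itself are illustrative assumptions, not material from these notes:

import random

N_STATES, ACTIONS = 5, (-1, +1)          # move left or right along states 0..4
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)   # reward at the goal only

for _ in range(500):                      # episodes of trial and error
    s = 0
    while s != N_STATES - 1:
        if random.random() < eps:         # explore new knowledge...
            a = random.choice(ACTIONS)
        else:                             # ...or exploit current knowledge
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])   # Q update
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))   # learned action at state 0: +1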

Types of reinforcement learning:
