kthnearest

8/3/2019 kthnearest

K Nearest and Furthest Points in M-dimensional Space

Shashidhar M. Sugur
III sem. M.Tech., Department of Studies in Computer Science,
University of Mysore, Mysore.

Abstract: In this paper, we present an algorithm to find the k nearest and furthest points from a particular point for a data set having n points in m-dimensional space. The distance measure and the neighborhood function have many applications in various fields of computing and engineering, such as artificial intelligence, image processing and pattern recognition.

Keywords: Distance measure, k-nearest-neighbors,

and k-farthest-neighbors.

1. INTRODUCTION

Given n points in m-dimensional space, we want to find the distances from a selected point P (x1, x2, ..., xm) to all the other points. The distance between P and each of the other points is computed by the distance formula.

We tag each distance from the point P and put the points, with their related distances, into an array a[1..n]. We then apply any sorting technique, in either ascending or descending order, keeping the tags attached to the points. Using the positions of the sorted points, we select the Kth nearest point and the Kth farthest point.
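The tag, sort and select procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kth_nearest_and_farthest` and the sample data are hypothetical, the Euclidean distance is assumed, and Python's built-in sort stands in for "any sorting technique".

```python
import math


def kth_nearest_and_farthest(points, p, k):
    """Return the k-th nearest and k-th farthest entries as (distance, tag) pairs.

    Tags are the 1-based positions of the points in `points`; k is 1-based.
    """
    # Tag each distance with the position of its point in the input.
    tagged = [(math.dist(p, q), tag) for tag, q in enumerate(points, start=1)]
    # Ascending sort; tuples compare by distance first, so tags stay attached.
    tagged.sort()
    return tagged[k - 1], tagged[-k]


# Hypothetical sample data, chosen only for illustration.
points = [(1, 9), (4, 4), (5, 7), (6, 5)]
nearest, farthest = kth_nearest_and_farthest(points, (3, 3), k=1)
print("nearest tag:", nearest[1], "farthest tag:", farthest[1])
```

Because the array is sorted in ascending order, the Kth nearest point sits at position K from the front and the Kth farthest at position K from the back.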

2. ALGORITHM DEVELOPMENT

To trace the operation of the algorithm, we take a random example in simple two-dimensional space and explain the mechanism of finding the Kth nearest and Kth farthest points from the selected point.

Consider six points in two-dimensional space, for instance (1,9), (2,6), (6,5), (3,3), (4,4) and (5,7), as shown in the graph below.

Note: For two-dimensional graphs the distance formula is

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
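As a quick worked check of the two-dimensional formula, take the points (3,3) and (4,4) from the list above, and then (3,3) and (1,9):

$$d = \sqrt{(4 - 3)^2 + (4 - 3)^2} = \sqrt{2} \approx 1.41$$

$$d = \sqrt{(1 - 3)^2 + (9 - 3)^2} = \sqrt{40} \approx 6.32$$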

Let the chosen point be (3,3). The remaining points are tagged as given in the table below.

Tag     1       2       3       4       5
Point   (1,9)   (2,6)   (4,4)   (5,7)   (6,5)

The algorithm then uses the formula to determine the distance d between the chosen point and each of its neighbors, and stores the distances in a one-dimensional array as shown below.

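Tag        1      2      3      4      5
Distance   6.32   3.16   1.41   4.47   3.61

The distances are sorted in ascending order by any sorting technique; in our case we have used bubble sort. The result is shown below.

PrevTag    3      2      5      4      1
NewTag     1      2      3      4      5
Distance   1.41   3.16   3.61   4.47   6.32

For k = 1, the kth nearest point is the one with previous tag 3 and the kth farthest point is the one with previous tag 1; both are fetched from the sorted array.

In m-dimensional space, the general distance formula between points x and y is

$$d = \left( \sum_{i=1}^{m} |x_i - y_i|^p \right)^{1/p}$$

where p = 2 gives the Euclidean distance used in the algorithm below.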

Algorithm: Kth-nearest and farthest.
Input: n, m, f[i][j] (i = 1..n, j = 1..m), x[i] (i = 1..m), k.
Output: k nearest and k farthest points from point x.
Method:

{
    // to find the distance array
    for (i ← 1 to n)
    {
        s ← 0
        for (j ← 1 to m)
        {
            q ← x[j] - f[i][j]
            s ← s + q*q
        }
        d[i] ← square root(s)
        c[i] ← i
    }
    // to sort the distances in ascending order, moving the tags with them
    for (i ← 1 to n-1)
    {
        for (j ← i+1 to n)
        {
            if (d[i] > d[j])
            {
                temp ← d[j]
                d[j] ← d[i]
                d[i] ← temp
                temp ← c[j]
                c[j] ← c[i]
                c[i] ← temp
            }
        }
    }
    // to display k nearest points.
    for (i ← 1 to k)
    {
        for (j ← 1 to m)
        {
            Display (f[c[i]][j])
        }
    }
    // to display k farthest points.
    for (i ← n down to (n-k+1))
    {
        for (j ← 1 to m)
        {
            Display (f[c[i]][j])
        }
    }
}
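The method above can be transcribed almost line for line into Python. The following is a minimal sketch rather than a reference implementation: it assumes 0-based indexing, returns the points instead of displaying them, and the function name `k_nearest_and_farthest` and the three-dimensional sample data are made up for illustration.

```python
import math


def k_nearest_and_farthest(f, x, k):
    """Return (k nearest points, k farthest points) of f relative to x."""
    n, m = len(f), len(x)
    d = [0.0] * n        # d[i]: distance from x to f[i]
    c = list(range(n))   # c[i]: tag (original index) attached to d[i]

    # Distance array.
    for i in range(n):
        s = 0.0
        for j in range(m):
            q = x[j] - f[i][j]
            s += q * q
        d[i] = math.sqrt(s)

    # Exchange sort in ascending order; tags are swapped with distances.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if d[i] > d[j]:
                d[i], d[j] = d[j], d[i]
                c[i], c[j] = c[j], c[i]

    nearest = [f[c[i]] for i in range(k)]
    farthest = [f[c[i]] for i in range(n - 1, n - k - 1, -1)]
    return nearest, farthest


# Hypothetical 3-dimensional sample data.
f = [(0, 0, 1), (3, 4, 0), (1, 1, 1), (6, 8, 0)]
nearest, farthest = k_nearest_and_farthest(f, (0, 0, 0), k=2)
print(nearest)
print(farthest)
```

Keeping the tag array `c` separate from the distance array `d` mirrors the pseudocode; the original point array `f` is never reordered, only indexed through the tags.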

3. A PRIORI ANALYSIS

After an analytical review of the algorithm, we arrived at the following calculated time complexity. The full step-count table is omitted because it would be too lengthy.

The time complexity for the algorithm is as follows:

Best case:

$$t = 3n^2 + 12nm + 15n$$

$$t = \Omega(n^2) \text{ for large } n$$

Worst case:

$$t = 11n^2 + 12nm + 15n + 3$$

$$t = O(n^2) \text{ for large } n$$

$$\Omega(n^2) \cap O(n^2) = \Theta(n^2)$$

The total time taken by the algorithm therefore depends on two factors: the total number of points and the size of the dimensional space.

4. PROFILING

As seen in the a priori analysis, the total time taken by the program depends on two factors: the total number of points in the space and the dimension of the space.

For this reason, the a posteriori analysis was done by first keeping the dimension constant and varying only the number of points. The resulting values are shown in the table below.

n     Best case   Worst case   Average
5     270         430          356.5833
10    690         1410         1081.833
15    1260        2970         2093.667
20    1980        5020         3582
25    2850        7650         5541.833
30    3870        10830        6807.5

Table 4.1 Values obtained by keeping the dimension constant.

The graph below was generated from the table of values obtained in the experiment.

The graph shows a quadratic relationship between the computing time and the size of the list for both the best and worst cases. The average case lies between the best and worst cases.

Lastly, we vary the number of dimensions while keeping the number of points constant. The resulting values are shown in the table below.

M     Time
2     1260
3     1440
4     1620
5     1800
6     1980
7     2160

Table 4.2 Values obtained by keeping the number of points constant.

The graph below was generated from the table of values obtained in the experiment.

We observe that the computing time grows linearly with the number of dimensions.
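The tabulated values can be cross-checked against a best-case step count. The sketch below rests on several assumptions that the text does not state outright: that the best-case count from Section 3 reads t = 3n² + 12nm + 15n, that Table 4.1 was produced with m = 2, and that Table 4.2 reports best-case values for n = 15. The function name `best_case_steps` is hypothetical.

```python
def best_case_steps(n, m):
    # Assumed reading of the best-case step count in Section 3.
    return 3 * n * n + 12 * n * m + 15 * n


# Table 4.1, best-case column (m = 2 is an inferred value).
table_4_1 = {5: 270, 10: 690, 15: 1260, 20: 1980, 25: 2850, 30: 3870}
# Table 4.2 (n = 15 is an inferred value).
table_4_2 = {2: 1260, 3: 1440, 4: 1620, 5: 1800, 6: 1980, 7: 2160}

matches_4_1 = all(best_case_steps(n, 2) == t for n, t in table_4_1.items())
matches_4_2 = all(best_case_steps(15, m) == t for m, t in table_4_2.items())
print(matches_4_1, matches_4_2)
```

Under these assumptions the count is quadratic in n for fixed m, and for fixed n = 15 it increases by 12 × 15 = 180 steps per added dimension, which is the constant difference between successive rows of Table 4.2.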

5. CONCLUSION

In conclusion, the time complexity of the algorithm depends on the sorting method used in it. Here we use bubble sort, so the time complexity is $\Theta(n^2)$, where n represents the number of points in the neighborhood. If a better sorting technique such as merge sort were used in the above algorithm, the time complexity would be $O(n \log n)$.
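To illustrate the merge sort alternative mentioned above, here is a minimal sketch that sorts (distance, tag) pairs in O(n log n) time. It is not part of the original algorithm; the function name `merge_sort` is our own, and the sample pairs are illustrative values.

```python
def merge_sort(pairs):
    """Return the (distance, tag) pairs sorted in ascending order."""
    if len(pairs) <= 1:
        return list(pairs)
    mid = len(pairs) // 2
    left = merge_sort(pairs[:mid])
    right = merge_sort(pairs[mid:])

    # Merge the two sorted halves; tags travel with their distances.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Illustrative (distance, tag) pairs.
tagged = [(6.32, 1), (1.41, 3), (4.47, 4), (3.61, 5)]
ordered = merge_sort(tagged)
print(ordered[0], ordered[-1])  # nearest and farthest entries
```

Only the sorting step changes; the distance computation still costs on the order of n × m operations, so that term remains in the overall running time.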

The space complexity depends on the size of the dimensional space. The calculated distance values are stored in a one-dimensional array whose size depends on the size of the input, along with some temporary variables.
