kthnearest


  • 8/3/2019 kthnearest


    K Nearest and Furthest Points in M-dimensional Space

    Shashidhar M. Sugur

    III sem. M.Tech. Department of Studies in Computer Science

    University of Mysore,

    Mysore.

    Abstract: In this paper, we present an algorithm to find
    the k nearest and k furthest points from a particular point
    in a data set of n points in m-dimensional space. The
    distance measure and the neighborhood function have many
    applications in various fields of computing and
    engineering, such as artificial intelligence, image
    processing and pattern recognition.

    Keywords: Distance measure, k-nearest-neighbors,
    k-farthest-neighbors.

    1. INTRODUCTION

    Given n points in m-dimensional space, we want to find the
    distances from a selected point P(x1, x2, ..., xm) to all
    other points. The distance between P and each other point
    is computed by the distance formula.

    We tag each point, and store the points together with their
    distances from P in an array a[1..n]. We then sort the array
    by distance, in either ascending or descending order, using
    any sorting technique, keeping each tag attached to its
    point. From the positions of the points in the sorted array
    we select the kth nearest point and the kth farthest point.

    2. ALGORITHM DEVELOPMENT

    To trace the operation of the algorithm, we take a small
    example in two-dimensional space and explain the mechanism
    of finding the kth nearest and kth farthest points from the
    selected point.

    Consider six points, for instance (1,9), (2,6), (6,5),
    (3,3), (4,4) and (5,7), in two-dimensional space, as shown
    in the graph below.

    Note: For two-dimensional graphs the distance formula is

    $$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

    Suppose the chosen point is (3,3). The remaining points are
    tagged as given in the table below.

    Tag      1      2      3      4      5
    Point  (1,9)  (2,6)  (4,4)  (5,7)  (6,5)

    The algorithm then uses the formula to determine the
    distance d between the chosen point and each of its
    neighbors, and stores the distances in a one-dimensional
    array as shown below.

    Tag        1     2     3     4     5
    Distance  6.32  9.06  1.41  4.47  3.61
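    The distance step can be checked with a short Python sketch.
    It assumes the two-dimensional Euclidean formula from the
    note above; the names `chosen`, `points` and `euclidean` are
    illustrative and do not come from the paper. Under this
    reading, tags 1, 3, 4 and 5 reproduce the tabulated values.
    For tag 2, the point (2,6), the formula evaluates to
    sqrt(10), about 3.16, whereas the table lists 9.06, so that
    entry may contain a transcription error in either the point
    or the distance.

```python
import math

# Chosen point and tagged neighbours, copied from the tables above.
chosen = (3, 3)
points = {1: (1, 9), 2: (2, 6), 3: (4, 4), 4: (5, 7), 5: (6, 5)}

def euclidean(p, q):
    """Two-dimensional distance: sqrt((x2 - x1)^2 + (y2 - y1)^2)."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

# Distance from the chosen point to each tagged neighbour, to 2 decimals.
distances = {tag: round(euclidean(chosen, pt), 2) for tag, pt in points.items()}

for tag, dist in distances.items():
    print(tag, dist)
# 1 6.32
# 2 3.16
# 3 1.41
# 4 4.47
# 5 3.61
```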

    The distances are sorted in ascending order using any
    sorting technique. In our case, we have used bubble
    sort. The result is shown below.

    PrevTag    3     5     4     1     2
    NewTag     1     2     3     4     5
    Distance  1.41  3.61  4.47  6.32  9.06

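    General distance formula referred to in Section 1, for two
    points x and y with m coordinates each:

    $$d = \left( \sum_{i=1}^{m} |x_i - y_i|^{p} \right)^{1/p}$$

    With p = 2 this is the Euclidean distance used by the
    algorithm.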


    For k = 1, the kth nearest point (previous tag 3) and the
    kth farthest point (previous tag 2) are fetched from the
    first and last positions of the sorted array.

    Algorithm: Kth-nearest and farthest.

    Input: n, m, f[i][j] (i = 1..n, j = 1..m), x[i] (i = 1..m), k.

    Output: k nearest and k farthest points from point x.

    Method:
    {
        // compute the distance from x to every point f[i]
        for (i ← 1 to n)
        {
            s ← 0
            for (j ← 1 to m)
            {
                q ← x[j] - f[i][j]
                s ← s + q*q
            }
            d[i] ← square root(s)
            c[i] ← i        // tag: original index of the point
        }
        // sort the distances in ascending order, moving the tags with them
        for (i ← 1 to n-1)
        {
            for (j ← i+1 to n)
            {
                if (d[i] > d[j])
                {
                    temp ← d[j]
                    d[j] ← d[i]
                    d[i] ← temp
                    temp ← c[j]
                    c[j] ← c[i]
                    c[i] ← temp
                }
            }
        }
        // display the k nearest points
        for (i ← 1 to k)
        {
            for (j ← 1 to m)
            {
                Display (f[c[i]][j])
            }
        }
        // display the k farthest points
        for (i ← n down to (n-k+1))
        {
            for (j ← 1 to m)
            {
                Display (f[c[i]][j])
            }
        }
    }
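    The method above can be sketched as runnable Python. This is
    a minimal illustration rather than the author's
    implementation: the function name `k_nearest_and_farthest`,
    the 0-based indexing, the bounds check on k, and the
    three-dimensional sample points are all assumptions made for
    the example.

```python
import math

def k_nearest_and_farthest(f, x, k):
    """Return (k nearest, k farthest) points of f relative to x.

    f is a list of n points with m coordinates each; x has m coordinates.
    Indices are 0-based here, unlike the 1-based pseudocode.
    """
    n = len(f)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Distance from x to every point; c keeps each point's original index (tag).
    d = [math.sqrt(sum((xj - pj) ** 2 for xj, pj in zip(x, p))) for p in f]
    c = list(range(n))
    # Quadratic exchange sort on d, swapping the tags in c in step.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if d[i] > d[j]:
                d[i], d[j] = d[j], d[i]
                c[i], c[j] = c[j], c[i]
    nearest = [f[c[i]] for i in range(k)]
    farthest = [f[c[i]] for i in range(n - 1, n - k - 1, -1)]
    return nearest, farthest

# Hypothetical sample data (not from the paper): four points in 3-D space.
sample = [(0, 0, 3), (1, 0, 0), (4, 0, 0), (0, 2, 0)]
nearest, farthest = k_nearest_and_farthest(sample, (0, 0, 0), 2)
print(nearest)   # [(1, 0, 0), (0, 2, 0)]
print(farthest)  # [(4, 0, 0), (0, 0, 3)]
```

    The farthest points are returned farthest-first, matching the
    order in which the pseudocode displays them.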

    3. A PRIORI ANALYSIS

    After an analytical review of the algorithm, we arrived at
    the following calculated time complexity; a step-by-step
    derivation using a table would be too lengthy to include.

    The time complexity of the algorithm is as follows:

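    Best case:

    $$t = 3n^2 + 12nm + 15n$$

    For large n, t grows as $n^2$.

    Worst case:

    $$t = 11n^2 + 12nm + \text{(lower-order terms)}$$

    For large n, t again grows as $n^2$.

    The running time is therefore both $\Omega(n^2)$ and
    $O(n^2)$, that is, $\Theta(n^2)$.

    The total time taken by the algorithm hence depends on two
    factors: the total number of points and the dimension of
    the space.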
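    The quadratic term comes from the sorting loops, which can
    be illustrated by counting their operations. The sketch
    below is an assumption-laden illustration: it counts only
    comparisons and swaps, not the full step count t used in the
    analysis, and the helper name `count_sort_steps` is
    hypothetical. Both cases perform n(n-1)/2 comparisons; the
    best case (already sorted) performs no swaps, and the worst
    case (reverse sorted) swaps on every comparison.

```python
def count_sort_steps(values):
    """Run the exchange sort on a copy of values; return (comparisons, swaps)."""
    d = list(values)
    n = len(d)
    comparisons = 0
    swaps = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if d[i] > d[j]:
                d[i], d[j] = d[j], d[i]
                swaps += 1
    return comparisons, swaps

n = 15
best = count_sort_steps(range(1, n + 1))    # distances already ascending
worst = count_sort_steps(range(n, 0, -1))   # distances strictly descending
print(best)   # (105, 0)
print(worst)  # (105, 105)
```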

    4. PROFILING

    As seen in the a priori analysis, the total time taken by
    the program depends on two factors: the total number of
    points in the space and the dimension of the space.

    For this reason, the a posteriori analysis was done first by
    keeping the dimension constant and varying only the number
    of points. The resulting values are shown in the table
    below.


    n    Best case   Worst case   Average
    5       270         430       356.5833
    10      690        1410      1081.833
    15     1260        2970      2093.667
    20     1980        5020      3582
    25     2850        7650      5541.833
    30     3870       10830      6807.5

    Table 4.1 Values by keeping constant dimensions.

    The graph below was generated from the table of values
    obtained from the experiment.

    The graph shows a quadratic relationship between the
    computing time and the size of the list for both the best
    and worst cases. The average case lies between the best
    and worst cases.
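    The quadratic trend can also be read directly from Table 4.1
    without the graph. For equally spaced n, a quadratic has
    constant second differences. The sketch below applies that
    check to the tabulated best-case and worst-case columns; the
    helper name `differences` is illustrative. The best-case
    column gives exactly constant second differences, and with a
    step of 5 in n the value 150 corresponds to a leading
    coefficient of 150 / (2 * 5^2) = 3. The worst-case column is
    only approximately constant, which may reflect rounding or
    transcription in the table.

```python
# Columns copied from Table 4.1 (n = 5, 10, ..., 30).
best = [270, 690, 1260, 1980, 2850, 3870]
worst = [430, 1410, 2970, 5020, 7650, 10830]

def differences(values):
    """Differences between consecutive entries."""
    return [b - a for a, b in zip(values, values[1:])]

second_best = differences(differences(best))
second_worst = differences(differences(worst))
print(second_best)   # [150, 150, 150, 150]
print(second_worst)  # [580, 490, 580, 550]
```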

    Lastly, we vary the dimension while keeping the number of
    points constant, as shown in the table of values below.

    M    Time
    2    1260
    3    1440
    4    1620
    5    1800
    6    1980
    7    2160

    Table 4.2 Values by keeping points constant.

    The graph below was generated from the table of values
    obtained from the experiment.

    We observe that the time is linear in the dimension M.
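    The linear relationship can be confirmed from Table 4.2
    itself. The following sketch fits a straight line through
    the first and last rows and checks that every other row lies
    on it; the variable names are illustrative.

```python
# Rows copied from Table 4.2.
ms = [2, 3, 4, 5, 6, 7]
times = [1260, 1440, 1620, 1800, 1980, 2160]

# Line through the first and last rows: time = slope * M + intercept.
slope = (times[-1] - times[0]) // (ms[-1] - ms[0])
intercept = times[0] - slope * ms[0]

# Every tabulated row should satisfy the same line if the trend is linear.
on_line = all(t == slope * m + intercept for m, t in zip(ms, times))
print(slope, intercept, on_line)  # 180 900 True
```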

    5. CONCLUSION

    In conclusion, the time complexity of the algorithm depends
    on the sorting method used in it. Here we use the bubble
    sort method, so the time complexity is $\Theta(n^2)$ in n,
    where n is the number of points in the neighborhood. If a
    better sorting technique such as merge sort were used in the
    above algorithm, the time complexity would be $n \log n$.

    The space complexity depends on the dimension of the space.
    In addition, the calculated distance values are stored in a
    one-dimensional array whose size depends on the size of the
    input, along with some temporary variables.
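    As an illustration of the faster-sort variant mentioned
    above, the sketch below replaces the quadratic sort with
    Python's built-in `sorted`, which uses Timsort, a merge-sort
    hybrid with O(n log n) worst-case comparisons. This swaps in
    a different technique from the paper's own sort; the function
    name `k_nearest_and_farthest_sorted` and the sample points
    are hypothetical.

```python
import math

def k_nearest_and_farthest_sorted(f, x, k):
    """Same result as the quadratic method, using an O(n log n) sort."""
    # Distance from x to every point of f.
    d = [math.sqrt(sum((xj - pj) ** 2 for xj, pj in zip(x, p))) for p in f]
    # Sort the tags (original indices) by distance instead of sorting d itself.
    order = sorted(range(len(f)), key=lambda i: d[i])
    nearest = [f[i] for i in order[:k]]
    farthest = [f[i] for i in order[::-1][:k]]
    return nearest, farthest

# Hypothetical sample data (not from the paper).
sample_points = [(0, 0, 3), (1, 0, 0), (4, 0, 0), (0, 2, 0)]
fast_nearest, fast_farthest = k_nearest_and_farthest_sorted(sample_points, (0, 0, 0), 2)
print(fast_nearest)   # [(1, 0, 0), (0, 2, 0)]
print(fast_farthest)  # [(4, 0, 0), (0, 0, 3)]
```

    Computing the distances still costs time proportional to nm,
    so only the sorting term improves.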
