
Thread: Help with nearest neighbor problem

1. Help with nearest neighbor problem

I'm working on code that takes in comma-separated data in which each "data point" consists of 32 values:
1. the ID of the point
2. whether the point is malignant or benign (we're classifying cancer cells)
3. the remaining 30 values, which are just data (features).
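
A minimal sketch of turning one comma-separated line into a 32-element row, assuming the layout described above (numeric ID, then `M` or `B`, then 30 numeric features). The class name `WdbcRow` and the method `parseLine` are hypothetical. The label is compared with `equals`, because `==` on Strings compares references rather than contents.

```java
/** Hypothetical helper: parses one wdbc.txt line into {id, label, 30 features}. */
class WdbcRow {
    static final int COLUMNS = 32;

    static double[] parseLine(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != COLUMNS) {
            throw new IllegalArgumentException("expected " + COLUMNS + " fields, got " + tokens.length);
        }
        double[] row = new double[COLUMNS];
        row[0] = Double.parseDouble(tokens[0].trim());     // point ID
        // Label: 1.0 for malignant ("M"), 0.0 otherwise.
        // equals() compares contents; == compares references and is not reliable here.
        row[1] = "M".equals(tokens[1].trim()) ? 1.0 : 0.0;
        for (int c = 2; c < COLUMNS; c++) {
            row[c] = Double.parseDouble(tokens[c].trim()); // the 30 features
        }
        return row;
    }

    public static void main(String[] args) {
        // Made-up line: ID 12345, malignant, features 0.5, 1.5, ..., 29.5
        StringBuilder sb = new StringBuilder("12345,M");
        for (int i = 0; i < 30; i++) sb.append(',').append(i + 0.5);
        double[] row = parseLine(sb.toString());
        System.out.println(row[0] + " " + row[1] + " " + row[31]);
    }
}
```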

Our assignment is to randomly shuffle the data (there are 569 points), take the first 80% as our known data, and, for each point in the remaining 20%, find its "nearest neighbor" using Euclidean distance (the distance formula) to classify the cell as malignant or benign.
Since we actually have the cell type of each cell we're testing, we can check for accuracy. We're supposed to run this 100 times and then print out the accuracy.
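
The "distance formula" here is the Euclidean distance over the feature columns: the square root of the sum of squared differences. A minimal sketch, assuming rows use the 32-column layout above so the features sit in columns 2 onward; `FeatureDistance` and `FIRST_FEATURE` are hypothetical names.

```java
class FeatureDistance {
    static final int FIRST_FEATURE = 2; // columns 0 and 1 are the ID and the label

    /** Euclidean distance between two rows, using only the feature columns. */
    static double euclidean(double[] a, double[] b) {
        double sumOfSquares = 0.0; // local, so every call starts from zero
        for (int c = FIRST_FEATURE; c < a.length; c++) {
            double diff = a[c] - b[c];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] p = {1, 0, 0.0, 0.0};
        double[] q = {2, 1, 3.0, 4.0};
        System.out.println(euclidean(p, q)); // 5.0 (the ID and label columns are ignored)
    }
}
```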

This is what I have so far, but it's definitely not working. It prints the accuracy as 1.0 every time, and I don't know how to fix it.

Help is much appreciated, thanks in advance.

Code:
```java
import java.io.*;
import java.util.*;
import java.math.*;

public class NearestNeighbor {

    static int sum;

    public static void main (String [] args) throws IOException
    {
        double [][] data = new double [569][32];
        File file = new File("wdbc.txt");
        BufferedReader bufRdr = new BufferedReader(new FileReader(file)); // reader over the data file
        String line = null;
        int row = 0;
        int col = 0;

        // one point per line: ID, M/B label, then 30 features
        while ((line = bufRdr.readLine()) != null && row < 569)
        {
            StringTokenizer st = new StringTokenizer(line, ",");
            while (st.hasMoreTokens())
            {
                if (col == 0)
                {
                    String x = st.nextToken();
                    data[row][0] = Double.parseDouble(x);
                    col++;
                }
                else if (col == 1)
                {
                    String type = st.nextToken();

                    // compare String contents with equals(); == compares references
                    if (type.equals("M"))
                    {
                        data[row][1] = 1.0;
                    }
                    else
                    {
                        data[row][1] = 0.0;
                    }
                    col++;
                }
                else
                {
                    String x = st.nextToken();
                    data[row][col] = Double.parseDouble(x);
                    col++; // advance so each feature lands in its own column
                }
            }
            col = 0;
            row++;
        }
        bufRdr.close();

        Classifier a = new Classifier();
        a.checkCorrectness();
        for (int i = 0; i < 100; i++)
        {
            sum = sum + a.checkCorrectness();
        }
        double accuracy = sum / 100.0; // 100.0 forces floating-point division

        System.out.println("The accuracy is " + accuracy);
    }
}
```

Code:
```java
import java.util.*;
import java.lang.Object;
import java.lang.Math;

public class Classifier {

    double number = 0;
    double minDistance = 0;
    int cell = 0;
    double [][] testData = new double [114][32];
    double [][] knownData = new double [455][32];
    double [][] knownPoints = new double [455][30];
    double [][] testPoints = new double [114][30];

    public void getData(double[][] data)
    {
        List list = (List)Arrays.asList(data);
        Collections.shuffle(list);
        boolean a = true;
        if (a)
        {
            list.subList(0,456);
            knownData = (double [][])(list.toArray(new double[455][32]));
            a = false;
        }
        else
        {
            list.subList(456, 570);
            testData = (double [][])(list.toArray(new double [114][32]));
        }
    }

    public void findDataPoints ()
    {
        for (int i = 0; i < 115; i ++)
        {
            for (int j = 0; j < 31; j++ )
            {
                testPoints[i][j] = testData [i][j+2];
            }
        }

        for (int k = 0; k < 115; k ++)
        {
            for (int l = 0; l < 31; l++ )
            {
                knownPoints[k][l] = knownData [k][l+2];
            }
        }
    }

    public double typeofClosestCell()
    {
        for (int i = 0; i < 113; i++)
        {
            for (int j = 0; j <30; j++)
            {
                number = number + Math.pow(knownPoints[i][j]-testPoints[i][j], 2);
            }
            double distance = Math.sqrt(number);
            if (distance < minDistance)
            {
                minDistance = distance;
                cell = i;
            }
        }

        return knownData[cell][1];
    }

    public int checkCorrectness()
    {
        if (this.typeofClosestCell() == testData[cell][1])
            return 1;
        else
            return 0;
    }
}
```

2. How is Classifier getting any data to work with?
Without going over most of this: a call to checkCorrectness drops to the if, reads testData[cell][1] as 0.0 (since there is no data), and compares it against typeofClosestCell, which will always return 0.0.
Since 0 == 0, checkCorrectness always returns 1.

The first thing you need to do is get data into Classifier. If that's the data from the file, then pass the 'data' array built while reading the file into Classifier's getData method. Get that working first and then see how it behaves.

3. Users who have thanked Fou-Lu for this post:

newcoder1234 (12-12-2010)

4. Hey, thanks for replying so fast ^.^

Ah, you're right, I suppose that's silly.

But it still gives an accuracy of 1.0. Did I not correct it, or is there another error?

5. There has to be another error now, so let's take a closer look.
checkCorrectness calls typeofClosestCell. This method does some math with this: `number = number + Math.pow(knownPoints[i][j]-testPoints[i][j], 2);`. Unfortunately, this will still return 0; neither knownPoints nor testPoints has been initialized by this point either. This results in checkCorrectness always returning 1 if testData[0][1] is 0 on the first run, or testData[112][1] is 0 on subsequent runs. So while this no longer guarantees that the result is 1, it still only allows it to look in two places, and I'm not sure that is your intention (though I somehow doubt it :P).
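
For comparison, a 1-nearest-neighbor lookup compares one test point against every known point, resets the running sum for each pair, and starts the minimum at positive infinity. One more detail in the posted typeofClosestCell: minDistance starts at 0, and no distance can be less than 0, so `distance < minDistance` never passes and cell stays at its initial value of 0. The sketch below assumes rows in the 32-column layout (label in column 1, features from column 2); `NearestNeighborSketch` and `classify` are hypothetical names.

```java
class NearestNeighborSketch {
    /** Returns the label (column 1) of the known row closest to the test row. */
    static double classify(double[][] known, double[] test) {
        double minDistance = Double.POSITIVE_INFINITY; // any real distance beats this
        int closest = -1;
        for (int i = 0; i < known.length; i++) {
            double sumOfSquares = 0.0;                 // reset for every known point
            for (int c = 2; c < test.length; c++) {    // feature columns only
                double diff = known[i][c] - test[c];
                sumOfSquares += diff * diff;
            }
            double distance = Math.sqrt(sumOfSquares);
            if (distance < minDistance) {
                minDistance = distance;
                closest = i;
            }
        }
        if (closest < 0) throw new IllegalArgumentException("no known points");
        return known[closest][1];
    }

    public static void main(String[] args) {
        double[][] known = {
            {1, 0.0, 0.0, 0.0},   // benign, at the origin
            {2, 1.0, 10.0, 10.0}  // malignant, far away
        };
        double[] test = {3, 0.0, 9.0, 8.0};
        System.out.println(classify(known, test)); // 1.0 (closest to the malignant row)
    }
}
```

The per-trial accuracy would then be the fraction of test rows whose `classify` result matches their own column-1 label.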
A lot of it will come down to this:
Code:
```java
public void getData(double[][] data){
    List list = (List)Arrays.asList(data);
    Collections.shuffle(list);
    boolean a = true;
    if (a)
    {
        list.subList(0,456);
        knownData = (double [][])(list.toArray(new double[455][32]));
        a = false;
    }
    else
    {
        list.subList(456, 570);
        testData = (double [][])(list.toArray(new double [114][32]));
    }
}
```
Your getData method (which, if you want, could be called from the class's constructor to save yourself a step) only ever creates the knownData array and will never create any of the others, including testData. The 'if' check is always true, because a is set to true on the line right before it.
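
One way to do the split without the flag is to shuffle once and take both halves from the same shuffled list, actually using what subList returns (subList does not modify the list; it returns a view of part of it). A sketch, assuming the 32-column rows from the file; `SplitClassifier` is a hypothetical name, and the constructor plays the role of getData. 80% of 569 rounds down to 455, matching the array sizes in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SplitClassifier {
    final double[][] knownData;
    final double[][] testData;

    /** Shuffles a copy of the rows; the first 80% become known data, the rest test data. */
    SplitClassifier(double[][] data, Random rng) {
        List<double[]> rows = new ArrayList<>(Arrays.asList(data)); // copy, so 'data' keeps its order
        Collections.shuffle(rows, rng);
        int cut = (int) (rows.size() * 0.8);                        // 569 rows -> 455
        // subList returns a view; its result has to be used, not discarded.
        knownData = rows.subList(0, cut).toArray(new double[0][]);
        testData = rows.subList(cut, rows.size()).toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = new double[569][32];
        for (int r = 0; r < data.length; r++) data[r][0] = r; // ID = row index
        SplitClassifier c = new SplitClassifier(data, new Random(42));
        System.out.println(c.knownData.length + " known, " + c.testData.length + " test");
    }
}
```

Passing a `Random` in is a design choice for repeatable runs; a new split per trial just means constructing a new object each time.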

Your NearestNeighbor class is pretty much fine, but you'll want to do something to control the 'sum' variable in there. It runs once and terminates now, but if you looped it, it would keep adding to the static sum, which makes the /100 incorrect. I'd either move it into main as a local (non-static) variable, or reset it to 0 at the end of the call. Otherwise, that class shouldn't be a problem, assuming it's properly reading in the data.
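
Besides resetting sum, the average needs floating-point division: an int divided by the int literal 100 truncates, so 97 correct out of 100 would print as 0. A sketch that keeps the running total local to the method, assuming each trial reports its own accuracy as a fraction in [0, 1]; `AccuracyAverager` and `averageAccuracy` are hypothetical names.

```java
import java.util.function.DoubleSupplier;

class AccuracyAverager {
    /** Runs 'trials' independent trials and returns the mean of their accuracies. */
    static double averageAccuracy(DoubleSupplier trial, int trials) {
        double total = 0.0;               // local, so repeated calls never carry over
        for (int t = 0; t < trials; t++) {
            total += trial.getAsDouble(); // each trial reports correct / tested
        }
        return total / trials;            // double division, no truncation
    }

    public static void main(String[] args) {
        int correct = 97, tested = 114;
        System.out.println(correct / 100);   // integer division: prints 0
        System.out.println(correct / 100.0); // prints 0.97
        System.out.println(averageAccuracy(() -> (double) correct / tested, 100));
    }
}
```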

The first thing I'd do to start debugging is put a breakpoint on the first line of the getData method, before the Collections.shuffle call. Using a debugger and comparing against the text file you read in, you can then easily confirm that the data is intact. Then step a few lines to the (double[][]) cast to make sure knownData has been populated with the new shuffled data; I've always had a heck of a time working with the Collection.toArray method myself.

Right now I can't even run this (long story: I biffed my mirror and can't get the JDK installed >.<), and even so, since this is an assignment, I can't really give you any code to work with. I did notice that you never call the findDataPoints method, which would give you a populated knownPoints array. testPoints will still be nothing but 0's, since testData isn't populated in getData.
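
Separately from the missing call, the loops in findDataPoints run i and k up to 115 and j and l up to 31, while the point arrays have 30 columns and testPoints has only 114 rows, so calling it as written would throw ArrayIndexOutOfBoundsException; the known-data loop would also copy only 115 of the 455 rows. Deriving the bounds from the arrays themselves avoids both problems. A sketch, assuming the 32-column layout; `FeatureColumns.features` is a hypothetical helper.

```java
import java.util.Arrays;

class FeatureColumns {
    static final int FIRST_FEATURE = 2; // skip ID (column 0) and label (column 1)

    /** Copies columns 2..end of every row into a new array; bounds come from the data. */
    static double[][] features(double[][] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = Arrays.copyOfRange(rows[i], FIRST_FEATURE, rows[i].length);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] rows = new double[114][32];
        double[][] f = features(rows);
        System.out.println(f.length + " x " + f[0].length); // 114 x 30
    }
}
```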
If I understand what you are doing here, you appear to be quite close; you just need to get data into those other arrays....

6. Users who have thanked Fou-Lu for this post:

newcoder1234 (12-12-2010)

7. Thank you!

Never mind, it works up to the checkCorrectness() method.
For some reason, this.typeofClosestCell() is always equal to testData[cell][1].
This is why I keep getting 1.

I can't figure out why, though.

8. Okay, so I figured it out. For some reason, in the constructor,

knownData is equal to testData, hence it would always be 1.

I don't know why, but when I convert the list into a sublist and back into an array, it somehow doesn't work...
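
A likely explanation, assuming the constructor still uses the same subList/toArray lines as the getData above: `list.subList(0, 456)` returns a view, but that result is discarded, so `list.toArray(...)` still converts the whole shuffled list. Because the array passed in (455 or 114 rows) is smaller than the list (569 rows), `toArray` allocates and returns a new array holding all 569 rows, so both arrays get the same rows in the same order and each test point can find itself at distance 0. Note also that the bounds would need to be (0, 455) and (455, 569): `subList(456, 570)` throws on a 569-element list. A small demonstration; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SubListDemo {
    public static void main(String[] args) {
        List<double[]> list = new ArrayList<>();
        for (int r = 0; r < 569; r++) list.add(new double[] {r});

        list.subList(0, 455);                               // result discarded: list is unchanged
        double[][] whole = list.toArray(new double[455][]); // too small, so a new 569-row array comes back
        System.out.println(whole.length);                   // 569

        double[][] known = list.subList(0, 455).toArray(new double[0][]);
        double[][] test = list.subList(455, list.size()).toArray(new double[0][]);
        System.out.println(known.length + " + " + test.length); // 455 + 114
    }
}
```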
