# Homework 8

Read/skim "Strong and Weak Emergence" by David Chalmers from 2006 (PDF; full citation). Answer these questions:

1. Define what Chalmers means by weak emergence.
2. Chalmers writes, "We can think of strongly emergent phenomena as being systematically determined by low-level facts without being deducible from those facts." Give an example (1-2 sentences) that may possibly satisfy this definition of strong emergence.
3. Are the NetLogo models we looked at (sheep and wolves, ants, termites) examples of strong or weak emergence? Provide a 1-2 sentence argument.

Execute the k-means algorithm by hand on the following data:

item # w x y z true label
1 1.0 1.0 0.0 1.0 A
2 0.0 0.0 1.0 0.0 B
3 2.0 2.0 0.0 2.0 A
4 0.0 0.0 1.0 1.0 B
5 2.0 2.0 0.0 2.0 A
6 0.0 2.0 1.0 1.0 B

Use $$k=2$$. Show the centroids as they change, and give the final centroids. Choose random (or not so random) starting centroid values. Finally, give the confusion matrix.

Run the k-means algorithm in Weka using this dataset: iris.arff (iris species clustering).

Choose $$k=3$$ and $$k=4$$. Give the confusion matrix for each value of $$k$$. Also report the percent of correctly classified instances for each class, for each $$k$$.

Execute the k-nearest neighbor algorithm by hand on the dataset below (same as before). Use $$k = 2$$. Classify the data point: $$<1, 0, 1, 2>$$.

item # w x y z true label
1 1.0 1.0 0.0 1.0 A
2 0.0 0.0 1.0 0.0 B
3 2.0 2.0 0.0 2.0 A
4 0.0 0.0 1.0 1.0 B
5 2.0 2.0 0.0 2.0 A
6 0.0 2.0 1.0 1.0 B

Run the k-nearest neighbor in Weka using this dataset: letter.arff (handwritten letter classification). Find a good value of $$k$$. Use 10-fold cross validation and report the accuracy and give the confusion matrix.

Explain the differences between k-means and k-nearest neighbor algorithms.

## Extra credit (+20 pts)

Play around with Weka. Report how well at least three different classification algorithms (avoid k-means and k-nn) perform on the letter.arff data with 10-fold cross validation. Collect accuracies in a table.