Machine Learning & Data Mining | CSE 450

Helpful Hints for Clustering in R

Overview

Many good resources on the Internet discuss clustering in R. This document is not meant to replace them or to discourage you from searching for them. Its purpose is to give you context and point you toward the R functions you have at your disposal.

Code Examples

Agglomerative Hierarchical Clustering


# First, compute a distance matrix (dist() uses Euclidean distance by default)
distance = dist(as.matrix(data))

# Now perform the clustering (hclust() uses complete linkage by default)
hc = hclust(distance)

# Finally, plot the dendrogram
plot(hc)
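
The dendrogram shows the full merge hierarchy, but you will often want flat cluster labels as well. The following is a minimal sketch, assuming the built-in USArrests data set stands in for data and that three clusters is a sensible cut. Both are illustrative choices, not requirements.

```r
# Assumption: the built-in USArrests data set stands in for `data`
data = USArrests

# Distance matrix and hierarchical clustering, as in the example above
distance = dist(as.matrix(data))
hc = hclust(distance)

# Cut the tree into k = 3 flat clusters (k = 3 is an arbitrary choice)
groups = cutree(hc, k = 3)

# Number of observations in each cluster
table(groups)

# Optionally, outline the 3 clusters on the dendrogram
plot(hc)
rect.hclust(hc, k = 3)
```

cutree() returns one integer label per row, so groups can be attached to the original data as an extra column.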

Scaling data (normalizing)


data_scaled = scale(data)
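
By default, scale() centers each column at zero and divides it by its standard deviation. This matters for distance-based methods, where a variable measured in large units would otherwise dominate the distances. A small sketch to check the result, again assuming the built-in USArrests data set stands in for data:

```r
# Assumption: the built-in USArrests data set stands in for `data`
data = USArrests

data_scaled = scale(data)

# Each column now has mean 0 (up to floating-point error)
round(colMeans(data_scaled), 10)

# ... and standard deviation 1
apply(data_scaled, 2, sd)

# The original column means and standard deviations are kept as attributes
attr(data_scaled, "scaled:center")
attr(data_scaled, "scaled:scale")
```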

Using k-means Clustering


# Cluster into k=5 clusters:
myClusters = kmeans(data, 5)

# Overview of the fit: cluster sizes, centers, and assignments
# (summary(myClusters) only lists the components of the object)
print(myClusters)

# Centers (mean values) of the clusters
myClusters$centers

# Cluster assignments
myClusters$cluster

# Within-cluster sum of squares (one value per cluster) and their total
myClusters$withinss
myClusters$tot.withinss


# Plotting a visual representation of k-means clusters
library(cluster)
clusplot(data, myClusters$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
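
The choice of k=5 above is arbitrary. One common heuristic for picking k is the "elbow" method: run k-means for a range of k and plot tot.withinss against k. The sketch below assumes the scaled built-in USArrests data stands in for data. The seed value, the range 1 to 10, and nstart = 10 are illustrative choices.

```r
# Assumption: the scaled built-in USArrests data set stands in for `data`
data = scale(USArrests)

# Fix the random seed so the random starting centers are reproducible
set.seed(42)

# Total within-cluster sum of squares for k = 1..10
ks = 1:10
wss = numeric(length(ks))
for (k in ks) {
  # nstart = 10 tries 10 random starts and keeps the best result
  wss[k] = kmeans(data, centers = k, nstart = 10)$tot.withinss
}

# Look for the "elbow" where adding more clusters stops helping much
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a judgment call rather than a rule, so treat the plot as a guide and check that the resulting clusters make sense for your data.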

A simple for loop in R


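# Preallocate the result vector (named `doubled` rather than `table`,
# which is also the name of a base R function)
doubled = numeric(10)
for (i in 1:10) {
  # Store twice the loop index at position i
  doubled[i] = i * 2
}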
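
Many simple loops in R can also be written without an explicit loop, because arithmetic on vectors applies to every element at once. A short sketch comparing the loop above with two equivalent forms (the variable names are illustrative):

```r
# Loop version, as above
doubled = NULL
for (i in 1:10) {
  doubled[i] = i * 2
}

# Vectorized version: the multiplication applies to every element at once
doubled_vec = (1:10) * 2

# sapply() applies a function to each element and collects the results
doubled_sapply = sapply(1:10, function(i) i * 2)

# All three produce the same values
all(doubled == doubled_vec)
all(doubled_vec == doubled_sapply)
```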

Remember, the Web is filled with many more examples and tutorials...