Machine Learning & Data Mining | CSE 450

Helpful Hints for Clustering in R


There are many good resources on the Internet that discuss clustering in R. This document is not meant to replace them or to discourage you from seeking them out. Its purpose is to give you context and point you toward the R functions you have at your disposal.

Code Examples

Agglomerative Hierarchical Clustering

# first compute a distance matrix
distance = dist(as.matrix(data))
# now perform the clustering
hc = hclust(distance)
# finally, plot the dendrogram
plot(hc)
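
The dendrogram shows the full merge hierarchy, but you often also want a concrete cluster label for each row. The following is a minimal sketch of one way to get that with cutree(). It assumes R's built-in USArrests data set stands in for your own data, and the choice of 4 groups is arbitrary and only for illustration.

```r
# Illustrative sketch: the built-in USArrests data set stands in for your data
data = USArrests

# Distance matrix and agglomerative clustering, as in the steps above
distance = dist(as.matrix(data))
hc = hclust(distance)

# Cut the tree into 4 groups (4 is an arbitrary choice for illustration)
groups = cutree(hc, k = 4)

# Count how many observations landed in each group
print(table(groups))

# Draw the dendrogram and outline the 4 groups on it
plot(hc)
rect.hclust(hc, k = 4)
```

The result of cutree() is a vector with one group number per row of the data, so it can be attached to the data frame as a new column if that is convenient.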

Scaling data (normalizing)

data_scaled = scale(data)
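
By default, scale() subtracts each column's mean and divides by its standard deviation, which keeps variables measured in large units from dominating the distance calculation. The sketch below checks that behavior. It assumes the built-in USArrests data set in place of your own numeric data.

```r
# Illustrative sketch: the built-in USArrests data set stands in for your data
data = USArrests
data_scaled = scale(data)

# After scaling, every column has mean (approximately) 0 ...
print(round(colMeans(data_scaled), 10))

# ... and standard deviation 1
print(apply(data_scaled, 2, sd))

# The original means and standard deviations are stored as attributes
print(attr(data_scaled, "scaled:center"))
print(attr(data_scaled, "scaled:scale"))
```

Note that scale() returns a matrix rather than a data frame, and it only works on numeric columns.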

Using k-means Clustering

# Cluster into k=5 clusters:
myClusters = kmeans(data, 5)

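# Summary of the clusters
summary(myClusters)

# Centers (mean values) of the clusters
myClusters$centers

# Cluster assignments
myClusters$cluster

# Within-cluster sum of squares and total sum of squares across clusters
myClusters$withinss
myClusters$tot.withinss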

Creating a 2D plot of the clustering

# clusplot() is provided by the cluster package, so load it first
library(cluster)
clusplot(data, myClusters$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)

A simple for loop in R

# start with an empty vector, then fill in one element per iteration
table = NULL
for (i in 1:10) {
  table[i] = i * 2
}
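
One place a loop like this comes in handy is when you are unsure how many clusters to ask k-means for. The sketch below is one possible approach, not the only one: it runs kmeans() for several values of k and records the total within-cluster sum of squares for each. It assumes the built-in USArrests data set (scaled) in place of your own data; the range 2 to 10, the seed, and nstart = 10 are arbitrary choices for illustration.

```r
# Illustrative sketch: the built-in USArrests data set stands in for your data
data_scaled = scale(USArrests)

# k-means starts from random centers, so fix the seed for repeatable results
set.seed(42)

# Values of k to try, and a vector to hold one result per value
ks = 2:10
wss = numeric(length(ks))

for (i in seq_along(ks)) {
  # nstart = 10 tries 10 random starts and keeps the best one
  fit = kmeans(data_scaled, centers = ks[i], nstart = 10)
  wss[i] = fit$tot.withinss
}

# Plot total within-cluster sum of squares against k
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The curve generally falls as k grows; a common rule of thumb is to look for the point where it stops dropping sharply.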

Remember, the Web is filled with many more examples and tutorials...