The colon dataset of Alon et al. (1999)

1)      Clustering algorithms:

a)      OKM

i)        Preprocessing method: SVD, 2 dimensions

Figure 2: BIC Values when applying OKM (SVD reduced to 2 dimensions) on the colon dataset

 

Figure 3: Comparison of the internal (BIC) and external (Jaccard) criteria of the colon dataset (OKM)

 

ii)       Preprocessing method: SVD, 3 dimensions

Figure 4: BIC Values when applying OKM (SVD reduced to 3 dimensions) on the colon dataset

Figure 5: BIC Values when applying OKM (2-5 clusters) on the colon dataset

 

Figure 6: Comparison of the internal (BIC) and external (Jaccard) criteria of the colon dataset (OKM)

iii)     Preprocessing method: PCA, 2 dimensions [1]

Figure 7: BIC Values when applying OKM (PCA reduced to 2 dimensions) on the colon dataset

Figure 8: Comparison of the internal (BIC) and external (Jaccard) criteria of the colon dataset (OKM)

 

iv)     Preprocessing method: PCA, 3 dimensions

Figure 9: BIC Values when applying OKM (PCA reduced to 3 dimensions) on the colon dataset

 

Figure 10: Comparison of the internal (BIC) and external (Jaccard) criteria of the colon dataset (OKM)

v)      Comparison of preprocessing methods

b)      OQC

i)        Preprocessing method: SVD, 3 dimensions

 

Figure 12: Comparison of the internal (BIC) and external (Jaccard) criteria of the colon dataset (OQC)

 

Figure 13: Comparison of the standard and optimized version of the KM and QC algorithms

 

Figure 14: Comparison of the various clustering algorithms (results according to Sharan and Shamir, 2003 and Getz et al., 2000)

 

 

Method                                                Jaccard

K-Means (raw data)                                    0.345
QC (raw data)                                         0.4
K-Means (Preprocessing & BIC)                         0.678
QC (Preprocessing & BIC)                              0.715
CLICK (Sharan & Shamir, 2003)                         0.64
CAST (Sharan & Shamir, 2003; Ben-Dor et al., 1999)    0.682
CTWC (Getz et al., 2000, and [2])                     0.508

Table 1. Comparison of clustering performance on the colon dataset
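The Jaccard scores in Table 1 compare a clustering against the known sample labels. As a minimal sketch, assuming the usual pair-counting definition (pairs grouped together in both partitions, divided by pairs grouped together in at least one), the score could be computed as follows. The function name `jaccard_index` is hypothetical and is not taken from the original analysis.

```python
from itertools import combinations


def jaccard_index(true_labels, cluster_labels):
    """Pair-counting Jaccard coefficient between two partitions.

    Assumed definition: n11 / (n11 + n10 + n01), where
      n11 = pairs placed together in both partitions,
      n10 = pairs together only in the reference labels,
      n01 = pairs together only in the clustering.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")

    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_true and same_cluster:
            n11 += 1
        elif same_true:
            n10 += 1
        elif same_cluster:
            n01 += 1

    denom = n11 + n10 + n01
    # If no pair is ever grouped together, both partitions are all
    # singletons and therefore identical; treat that as a perfect match.
    return n11 / denom if denom else 1.0


if __name__ == "__main__":
    reference = [0, 0, 0, 1]   # toy labels, not the colon data
    clustering = [0, 0, 1, 1]
    print(jaccard_index(reference, clustering))
```

A score of 1 means the clustering reproduces the reference partition exactly, and a score of 0 means no pair of samples is grouped together in both.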

 



[1] From the MATLAB documentation: “princomp centers X by subtracting off column means, but does not rescale the columns of X. To perform principal components analysis with standardized variables, that is, based on correlations, use princomp(zscore(X)).”

[2] http://www.weizmann.ac.il/physics/complex/compphys/ctwc/
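The distinction in footnote [1] between centering and standardizing can be illustrated with a small sketch. It assumes rows are samples and columns are variables, and it uses the sample standard deviation (n - 1 denominator), which is MATLAB's default for zscore. The helper names `center_columns` and `zscore_columns` are hypothetical, and the PCA step itself is omitted.

```python
from statistics import mean, stdev


def center_columns(rows):
    """Subtract each column's mean; column scales are left unchanged."""
    means = [mean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def zscore_columns(rows):
    """Subtract each column's mean and divide by its sample std. dev.

    Assumes every column has nonzero variance; a constant column would
    cause a division by zero here.
    """
    cols = list(zip(*rows))
    means = [mean(col) for col in cols]
    stds = [stdev(col) for col in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


if __name__ == "__main__":
    # Toy data: the second column has a much larger scale than the first.
    data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
    print(center_columns(data))   # second column still dominates
    print(zscore_columns(data))   # both columns now on the same scale
```

With centering alone, variables with a large numeric range dominate the leading components. After standardizing, every variable contributes with unit variance, which corresponds to a correlation-based PCA.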