Computational Complexity
·
Calculating BIC
As described by
Zhong and Ghosh, and Cadez et al. 2003, the runtime of the EM while computing
the BIC is O(KNc+KD2), where
o K
is the number of clusters
o N
is the number of elements
o c
is the number of iterations (Zhong and Ghosh, assume by empirical results that
the EM algorithm converges within 20 to 30 iterations).
o D
is the number of dimensions.
Although most
writers assume that D is very small (usually up to 3 dimensions and seldom
reaches to a dozen), and therefore exclude it from the runtime complexity
calculation, we find it relevant to emphasize its significant contribution to
the overall runtime complexity.
Figure 1 describes
the runtime (in seconds) as a function of the dataset’s dimension.

Figure 1:
Runtime of the BIC calculation as a function of the number of dimensions of the
datasets[1], and the
corresponding quadratic trendline.
· QC
Horn
and Axel (2003), showed that finding the clusters minima can be simplified to
analysis of the data points only, thus reducing the time complexity to
O(D∙N), where D is the number of dimensions after the SVD reduction and N
is the number of data points (the potential at each point is a function of all
other points). They also showed that gradient descent process involves
calculation in the order of
where
o c
is the number of iterations required for convergence (for most practical cases,
30-50 iterations were found to be sufficient).
o D
is the number of dimensions.
o N
is the number of elements
· K-Means
The
runtime for basic K-Means is O(c*K*D*N), where
o c
is the number of iterations. (c is often small, 5-10).
o K
is the number of clusters
o D
is the number of dimensions.
o N
is the number of elements
Cadez
I. V., Heckerman D., Meek C., Smyth P., White S. Model-based Clustering and
Visualization of Navigation Patterns on a Web Site. Journal of Data Mining and
Knowledge Discovery, 7 (4), pp. 399-424, 2003.
Horn,
D. and Axel,
Zhong,
S. and Ghosh, J. (2003) A unified framework for model-based clustering, Journal
of Machine Learning Research 4, 1001-1037