Cluster Radius of BIRCH
BIRCH algorithm maintains a 3 tuple feature for each of its clusters called a Characteristic Feature. If a cluster contains points,
CF &= (n,LS,SS) \\ LS &= \sum_{i=1}^n x_i \\ SS &= \sum_{i=1}^n \langle x_i,x_i\rangle As the definition implies, LS stands for Linear Sum and SS is Square Sum of the cluster points.
is very useful to quickly measure how close or far a pair of clusters are. If we decide to combine clusters into one, the combined can easily be computed from individual 's. Many other useful measures are explained in [R2]. Only the ones relevant to the problem at hand will be enumerated here.
The centroid of the cluster is given as x_0 = \frac{1}{n} \sum_{i=1}^n x_i
Radius of a cluster is defined as R_x = \left ( \frac{\sum_{i=1}^n \langle x_i-x_0, x_i-x_0 \rangle}{n}\right )^{\frac{1}{2}} This is not equivalent to average distance between all points and the centroid. That would be R_c = \frac{1}{n}\sum_{i=1}^n \langle x_i-x_0, x_i-x_0 \rangle^{\frac{1}{2}} The relationship between and is the subject of this article.
Some limits on BIRCH cluster radius
Consider the cluster in question contains points . BIRCH algorithm guaranties that the average cluster radius . The algorithms takes as one if its parameters.
Lets abstract out the basic building blocks of and by defining z_i = \langle x_i-x_0, x_i-x_0 \rangle^{\frac{1}{2}} Then and . This looks like and norm on . So lets try to connect all of them through the following inequality on norms.
\textbf{[P1]}\qquad \|z\|_p \leq \|z\|_r \leq n^{\left(\frac{1}{r}-\frac{1}{p}\right)}\|z\|_p A simple consequence of this inequality is . With this the bounds possible on the distances are \sqrt{nR_x^2} &\leq nR_c \leq \sqrt{n} \sqrt{nR_x^2} \\ \textbf{[B1]}\qquad \frac{R_x}{\sqrt{n}} &\leq R_c \leq R_x [P1] also gives us and . represents the distance of farthest point from the centroid. So what we have is \textbf{[B2]}\qquad \frac{\max{z_i}}{n} &\leq R_c \leq \max{z_i} \\ \textbf{[B3]}\qquad \frac{\max{z_i}}{\sqrt{n}} &\leq R_x \leq \max{z_i}
References
P1 This property can be proved from Holder's Inequality by assuming
R2 Zhang, T., Ramakrishnan, R. and Livny, M., 1997. BIRCH: A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2), pp.141-182.