BIRCH$^{[R2]}$ is a very cleaver clustering algorithm which is very fast in practise. At the end of its clustering, it guaranties that the average cluster radius is bounded. This is an attempt to convert this bound to a more useful one.

BIRCH algorithm maintains a 3 tuple feature for each of its clusters called a Characteristic Feature. If a cluster contains $n$ points,

CF &= (n,LS,SS) \\ LS &= \sum_{i=1}^n x_i \\ SS &= \sum_{i=1}^n \langle x_i,x_i\rangle As the definition implies, LS stands for Linear Sum and SS is Square Sum of the cluster points.

$CF$ is very useful to quickly measure how close or far a pair of clusters are. If we decide to combine $2$ clusters into one, the combined $CF$ can easily be computed from individual $CF$'s. Many other useful measures are explained in [R2]. Only the ones relevant to the problem at hand will be enumerated here.

The centroid of the cluster is given as x_0 = \frac{1}{n} \sum_{i=1}^n x_i

Radius of a cluster is defined as R_x = \left ( \frac{\sum_{i=1}^n \langle x_i-x_0, x_i-x_0 \rangle}{n}\right )^{\frac{1}{2}} This is not equivalent to average distance between all points and the centroid. That would be R_c = \frac{1}{n}\sum_{i=1}^n \langle x_i-x_0, x_i-x_0 \rangle^{\frac{1}{2}} The relationship between $R_x$ and $R_c$ is the subject of this article.

# Some limits on BIRCH cluster radius

Consider the cluster in question contains points $\{ x \}_{i=1}^n$. BIRCH algorithm guaranties that the average cluster radius $R_x \leq \epsilon$. The algorithms takes $\epsilon$ as one if its parameters.

Lets abstract out the basic building blocks of $R_x$ and $R_c$ by defining z_i = \langle x_i-x_0, x_i-x_0 \rangle^{\frac{1}{2}} Then $nR_x^2 = \sum z_i^2$ and $nR_c = \sum |z_i|$. This looks like $1$ and $2$ norm on $z$. So lets try to connect all of them through the following inequality on norms.

\textbf{[P1]}\qquad \|z\|_p \leq \|z\|_r \leq n^{\left(\frac{1}{r}-\frac{1}{p}\right)}\|z\|_p A simple consequence of this inequality is $\|z\|_2 \leq \|z\|_1 \leq \sqrt{n}\|z\|_2$. With this the bounds possible on the distances are \sqrt{nR_x^2} &\leq nR_c \leq \sqrt{n} \sqrt{nR_x^2} \\ \textbf{[B1]}\qquad \frac{R_x}{\sqrt{n}} &\leq R_c \leq R_x [P1] also gives us $\|z\|_{\infty} \leq \|z\|_1 \leq n\|z\|_{\infty}$ and $\|z\|_{\infty} \leq \|z\|_2 \leq \sqrt{n}\|z\|_{\infty}$. $\max{z_i}$ represents the distance of farthest point from the centroid. So what we have is \textbf{[B2]}\qquad \frac{\max{z_i}}{n} &\leq R_c \leq \max{z_i} \\ \textbf{[B3]}\qquad \frac{\max{z_i}}{\sqrt{n}} &\leq R_x \leq \max{z_i}

# References

P1 This property can be proved from Holder's Inequality by assuming $y=1$

R2 Zhang, T., Ramakrishnan, R. and Livny, M., 1997. BIRCH: A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2), pp.141-182.