In the post on Bayes error, we discussed what the best classifier is when the features are not enough to determine the class. We also derived that in such a situation the best classifier is h(x) = \operatorname{sign} \left( n(x) - \frac{1}{2} \right), where n(x) = P(y=+1|x). This formulation cannot be used in general, as there is no easy way to estimate n(x) for an arbitrary data distribution. But what if P(x|y) does follow a simple distribution?

Let's assume that the data is Gaussian within each class, P(x|y) = N(\mu_y,\Sigma_y) = f_y, and write p = P(y=+1). The parametric form of f_y immediately gives us a closed form for n(x) by a simple application of Bayes rule, n(x) = \frac{pf_{+1}}{p f_{+1}+(1-p)f_{-1}}, which in turn gives us a simple classifier: n(x) - \frac{1}{2} = \frac{pf_{+1}}{pf_{+1}+(1-p)f_{-1}} - \frac{1}{2}

The denominator is positive, so the left-hand side has the same sign as pf_{+1} - (1-p)f_{-1}; dividing by pf_{-1} > 0,

\operatorname{sign}\left( n(x) - \frac{1}{2} \right) = \operatorname{sign}\left( \frac{f_{+1}}{f_{-1}} - \frac{1-p}{p} \right)
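As a concrete illustration (not from the original post; the means, covariances, and prior below are made-up example values), here is a minimal Python sketch of this posterior and the resulting Bayes classifier:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up example parameters for the two class-conditional Gaussians f_{+1}, f_{-1}.
mu_pos, cov_pos = np.array([1.0, 1.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu_neg, cov_neg = np.array([-1.0, -1.0]), np.array([[2.0, 0.0], [0.0, 0.5]])
p = 0.6  # prior P(y = +1)

def posterior(x):
    """n(x) = p f_{+1}(x) / (p f_{+1}(x) + (1 - p) f_{-1}(x))."""
    f_pos = multivariate_normal.pdf(x, mean=mu_pos, cov=cov_pos)
    f_neg = multivariate_normal.pdf(x, mean=mu_neg, cov=cov_neg)
    return p * f_pos / (p * f_pos + (1 - p) * f_neg)

def bayes_classifier(x):
    """h(x) = sign(n(x) - 1/2)."""
    return 1 if posterior(x) > 0.5 else -1

print(posterior(np.array([0.5, 0.0])), bayes_classifier(np.array([0.5, 0.0])))
```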

To simplify further, we use the strictly increasing property of the \log function and write h(x) = \operatorname{sign} \left( \log \frac{f_{+1}}{f_{-1}} - \log\frac{1-p}{p} \right). Expanding the Gaussian densities (see the appendix) gives us a simpler form of the classifier, h(x) = \operatorname{sign}(x^TAx + b^Tx+c), where A = -\frac{1}{2}\left(\Sigma_{+1}^{-1} - \Sigma_{-1}^{-1}\right), b = \Sigma_{+1}^{-1}\mu_{+1} - \Sigma_{-1}^{-1}\mu_{-1}, and c = \frac{1}{2}\log\frac{|\Sigma_{-1}|}{|\Sigma_{+1}|} - \frac{1}{2}\left(\mu_{+1}^T\Sigma_{+1}^{-1}\mu_{+1} - \mu_{-1}^T\Sigma_{-1}^{-1}\mu_{-1}\right) - \log\frac{1-p}{p}.
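To make the quadratic form concrete, here is a minimal sketch (the function names are illustrative, not from the post) that builds A, b, and c from the Gaussian parameters and classifies with them:

```python
import numpy as np

def qda_params(mu_pos, cov_pos, mu_neg, cov_neg, p):
    """Build A, b, c of h(x) = sign(x^T A x + b^T x + c) from the Gaussian parameters."""
    prec_pos, prec_neg = np.linalg.inv(cov_pos), np.linalg.inv(cov_neg)
    A = -0.5 * (prec_pos - prec_neg)
    b = prec_pos @ mu_pos - prec_neg @ mu_neg
    c = (0.5 * np.log(np.linalg.det(cov_neg) / np.linalg.det(cov_pos))
         - 0.5 * (mu_pos @ prec_pos @ mu_pos - mu_neg @ prec_neg @ mu_neg)
         - np.log((1 - p) / p))
    return A, b, c

def classify(x, A, b, c):
    """h(x) = sign(x^T A x + b^T x + c)."""
    return 1 if x @ A @ x + b @ x + c > 0 else -1
```

With the example parameters from the earlier sketch, classify(x, *qda_params(mu_pos, cov_pos, mu_neg, cov_neg, p)) gives the same label as bayes_classifier(x).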

If we further assume that the class covariances are the same (\Sigma_{+1} = \Sigma_{-1} = \Sigma), then A = 0 and we get a linear classifier, h(x) = \operatorname{sign}(b^Tx+c), with b = \Sigma^{-1}(\mu_{+1}-\mu_{-1}).
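For this shared-covariance case, a corresponding sketch (again with illustrative names, assuming a single covariance matrix cov for both classes) reduces to just b and c:

```python
import numpy as np

def lda_params(mu_pos, mu_neg, cov, p):
    """Shared-covariance case: A = 0, so h(x) = sign(b^T x + c)."""
    prec = np.linalg.inv(cov)
    b = prec @ (mu_pos - mu_neg)
    c = -0.5 * (mu_pos @ prec @ mu_pos - mu_neg @ prec @ mu_neg) - np.log((1 - p) / p)
    return b, c
```

The decision boundary b^Tx + c = 0 is now a hyperplane.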

Appendix

\log \frac{f_{+1}}{f_{-1}} = \frac{1}{2}\log \frac{|\Sigma_{-1}|}{|\Sigma_{+1}|} - \frac{1}{2} \left[ (x-\mu_{+1})^T\Sigma_{+1}^{-1}(x-\mu_{+1}) - (x-\mu_{-1})^T\Sigma_{-1}^{-1}(x-\mu_{-1}) \right]

= -\frac{1}{2} x^T\left(\Sigma_{+1}^{-1}-\Sigma_{-1}^{-1}\right)x + \left(\Sigma_{+1}^{-1}\mu_{+1}-\Sigma_{-1}^{-1}\mu_{-1}\right)^Tx + \frac{1}{2}\log \frac{|\Sigma_{-1}|}{|\Sigma_{+1}|} - \frac{1}{2}\left(\mu_{+1}^T\Sigma_{+1}^{-1}\mu_{+1}-\mu_{-1}^T\Sigma_{-1}^{-1}\mu_{-1}\right)

= x^TAx + b^Tx + c + \log\frac{1-p}{p}
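A quick numerical sanity check of this expansion (not from the original post; the parameter values are arbitrary) compares it against the log-density ratio computed directly:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu_pos, mu_neg = np.array([1.0, 0.5]), np.array([-0.5, -1.0])
cov_pos = np.array([[1.0, 0.3], [0.3, 1.5]])
cov_neg = np.array([[0.8, -0.1], [-0.1, 0.6]])
x = np.array([0.7, -0.2])

# Direct log-density ratio: log f_{+1}(x) - log f_{-1}(x).
direct = (multivariate_normal.logpdf(x, mean=mu_pos, cov=cov_pos)
          - multivariate_normal.logpdf(x, mean=mu_neg, cov=cov_neg))

# Closed-form expression from the appendix.
prec_pos, prec_neg = np.linalg.inv(cov_pos), np.linalg.inv(cov_neg)
closed = (0.5 * np.log(np.linalg.det(cov_neg) / np.linalg.det(cov_pos))
          - 0.5 * ((x - mu_pos) @ prec_pos @ (x - mu_pos)
                   - (x - mu_neg) @ prec_neg @ (x - mu_neg)))

assert np.isclose(direct, closed)
```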