What is the best classifier you can build when your data is not enough to predict the labels?

The God Function

In an ideal world, everything has a reason. Every question has an unambiguous answer. The data is sufficient to explain its behaviour, such as the class it belongs to: $g(x) = y$.

In the non-ideal world, however, there is always something missing that stops us from knowing the entire truth; $g$ is beyond reach. In such cases we resort to probability: $\eta(x) = P(y = +1 \mid x)$. It simply tells us how probable it is that the data belongs to class $+1$ given the observation $x$.
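To make this concrete, here is a minimal sketch of sampling from such a distribution, with an assumed sigmoid $\eta$ (a hypothetical choice; any map into $[0, 1]$ would do):

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """Assumed conditional label probability P(y = +1 | x); the sigmoid is illustrative."""
    return 1.0 / (1.0 + np.exp(-x))

# Even for the same x, the label is only determined probabilistically:
# y = +1 with probability eta(x), and -1 otherwise.
x = rng.normal(size=5)
y = np.where(rng.random(5) < eta(x), +1, -1)
print(list(zip(np.round(x, 2), y)))
```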

If we build a classifier on such data, how good can it possibly be? This is the question the Bayes error answers.

Bayes Error

Let's say I've built a classifier $h$ to predict the class of the data: $h(x)$ is the predicted class and $y$ is the true class. Even ambiguous data needs to come from somewhere, so we assume $D$ is the joint distribution of $x$ and $y$. The error of $h$ is

$$er_D[h] = P_D[h(x) \neq y]$$

Using an old trick to convert a probability into an expectation, $P[A] = E[1(A)]$, we have

$$er_D[h] = E_{x,y}[1(h(x) \neq y)] = E_x E_{y|x}[1(h(x) \neq y)]$$

The inner expectation is easier to solve when expanded:

$$E_{y|x}[1(h(x) \neq y)] = 1(h(x) \neq +1)\, P(y = +1 \mid x) + 1(h(x) \neq -1)\, P(y = -1 \mid x)$$

which gives the final error

$$er_D[h] = E_x[1(h(x) \neq +1)\, \eta(x) + 1(h(x) \neq -1)\,(1 - \eta(x))]$$

The last equation means: if the classifier predicts $-1$ for a data point $x$, it contributes $\eta(x)$ to the error; if, on the other hand, it predicts $+1$, the contribution is $1 - \eta(x)$.
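As a sanity check, the formula can be verified numerically: for any classifier, averaging $\eta(x)$ or $1 - \eta(x)$ over the inputs (depending on the prediction) should match its empirical misclassification rate. A sketch, reusing the assumed sigmoid $\eta$ and a hypothetical threshold classifier:

```python
import numpy as np

rng = np.random.default_rng(1)

def eta(x):
    return 1.0 / (1.0 + np.exp(-x))  # same assumed eta as before

def h(x):
    return np.where(x > 0.5, +1, -1)  # an arbitrary, deliberately suboptimal classifier

n = 200_000
x = rng.normal(size=n)
y = np.where(rng.random(n) < eta(x), +1, -1)

# Empirical error: P[h(x) != y], estimated from samples.
empirical = np.mean(h(x) != y)

# The derived formula: E_x[1(h(x) != +1) eta(x) + 1(h(x) != -1) (1 - eta(x))].
formula = np.mean(np.where(h(x) == +1, 1.0 - eta(x), eta(x)))

print(f"empirical: {empirical:.4f}, formula: {formula:.4f}")  # should agree closely
```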

The best classifier would predict $-1$ when $\eta(x)$ is small and $+1$ when $\eta(x)$ is large. The minimum achievable error is then

$$er_D = E_x[\min(\eta(x), 1 - \eta(x))]$$

This error is called the Bayes error.
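Thresholding $\eta$ at $1/2$ realizes exactly this classifier, and plugging it into the error formula gives $\min(\eta(x), 1 - \eta(x))$ pointwise. A sketch, again under the assumed sigmoid $\eta$:

```python
import numpy as np

rng = np.random.default_rng(2)

def eta(x):
    return 1.0 / (1.0 + np.exp(-x))  # same assumed eta as before

def bayes_classifier(x):
    # Predict +1 exactly when +1 is the likelier label, i.e. eta(x) >= 1/2.
    return np.where(eta(x) >= 0.5, +1, -1)

# Monte Carlo estimate of the Bayes error E_x[min(eta(x), 1 - eta(x))].
x = rng.normal(size=200_000)
bayes_error = np.mean(np.minimum(eta(x), 1.0 - eta(x)))
print(f"Bayes error estimate: {bayes_error:.4f}")  # no classifier can do better
```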

References

Shivani Agarwal's lectures