UML_5: Agnostic PAC Learning

3.2.1. Agnostic PAC Learning
  • In previous articles of the UML series, we saw that the realizability assumption requires that there exist $h^\star \in \mathcal{H}$ such that $L_{(\mathcal{D},f)}(h^\star) = 0$. In practice, however, labels do not depend completely on the features. We therefore relax the realizability assumption by replacing the "target labeling function" with something more flexible: a data-labels generating distribution.

  • We redefine $\mathcal{D}$ to be a probability distribution over $\mathcal{X} \times \mathcal{Y}$. So $\mathcal{D}$ consists of two parts: a marginal distribution $\mathcal{D}_x$ over unlabeled domain points and a conditional probability $\mathcal{D}((x,y) \mid x)$ over labels for each domain point, as worked through in the sketch below.
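To make the decomposition concrete, here is a minimal Python sketch; the toy distribution and all names in it are illustrative, not from the original text. A finite joint distribution over $\mathcal{X} \times \mathcal{Y}$ is split into its marginal over $\mathcal{X}$ and the conditional probability of the label given $x$:

```python
# A minimal sketch with an illustrative toy distribution (not from the
# original text): a joint distribution D over X x Y, stored as a dict,
# decomposed into its marginal D_x and the conditional P[y = 1 | x].
joint = {(0, 0): 0.10, (0, 1): 0.20,   # D(x, y) over X = {0,1,2}, Y = {0,1}
         (1, 0): 0.25, (1, 1): 0.05,
         (2, 0): 0.15, (2, 1): 0.25}

marginal = {}                          # D_x(x) = sum_y D(x, y)
for (x, y), p in joint.items():
    marginal[x] = marginal.get(x, 0.0) + p

p_y1 = {x: joint[(x, 1)] / marginal[x] for x in marginal}  # P[y = 1 | x]

print(marginal)   # {0: 0.30..., 1: 0.30..., 2: 0.40...}
print(p_y1)       # {0: 0.66..., 1: 0.16..., 2: 0.625}
```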

  • True Error, Revised:

    $$L_{\mathcal{D}}(h) \;\stackrel{\text{def}}{=}\; \underset{(x,y)\sim\mathcal{D}}{\mathbb{P}}\left[h(x) \neq y\right] \;\stackrel{\text{def}}{=}\; \mathcal{D}\left(\{(x,y) : h(x) \neq y\}\right)$$
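With the definition pinned down, a short self-contained sketch (reusing the same illustrative toy distribution; the hypothesis chosen is arbitrary) shows how $L_{\mathcal{D}}(h)$ can be computed exactly for a finite joint distribution:

```python
# A minimal sketch, assuming the toy joint distribution from the previous
# example; the hypothesis h below is arbitrary and purely illustrative.
joint = {(0, 0): 0.10, (0, 1): 0.20,   # D(x, y) over X = {0,1,2}, Y = {0,1}
         (1, 0): 0.25, (1, 1): 0.05,
         (2, 0): 0.15, (2, 1): 0.25}

def true_error(h, joint):
    """L_D(h) = sum of D(x, y) over all pairs (x, y) with h(x) != y."""
    return sum(p for (x, y), p in joint.items() if h(x) != y)

h = lambda x: 1 if x == 0 else 0       # some candidate hypothesis
print(true_error(h, joint))            # 0.10 + 0.05 + 0.25 = 0.40
```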

  • Goal: We wish to find some hypothesis $h : \mathcal{X} \to \mathcal{Y}$ that minimizes $L_{\mathcal{D}}(h)$.

  • The Bayes Optimal Predictor:

    • Given any probability distribution $\mathcal{D}$ over $\mathcal{X} \times \{0,1\}$, the best label predicting function from $\mathcal{X}$ to $\{0,1\}$ is:

      $$f_{\mathcal{D}}(x) = \begin{cases} 1 & \text{if } \mathbb{P}[y = 1 \mid x] \geq 1/2 \\ 0 & \text{otherwise} \end{cases}$$

    We can show that $f_{\mathcal{D}}$ is optimal, meaning that for every classifier $g : \mathcal{X} \to \{0,1\}$, $L_{\mathcal{D}}(f_{\mathcal{D}}) \leq L_{\mathcal{D}}(g)$.

    Proof: Fix any $x \in \mathcal{X}$ and condition on it.

    Suppose $\mathbb{P}[y = 1 \mid x] = \alpha$, so we also have $\mathbb{P}[y = 0 \mid x] = 1 - \alpha$. If we choose $g(x) = 1$, the conditional probability of error is $1 - \alpha$. And if we choose $g(x) = 0$, the conditional probability of error is $\alpha$.

    We want the probability of error to be as small as possible. So if $\alpha \geq 1/2$, we choose $g(x) = 1$; similarly, in the opposite case we choose $g(x) = 0$. This choice is exactly $f_{\mathcal{D}}$, and since it minimizes the error conditioned on every $x$, taking the expectation over $x$ proves that $f_{\mathcal{D}}$ is optimal.
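The pointwise argument can also be checked numerically. The sketch below (same illustrative toy distribution as before) builds $f_{\mathcal{D}}$ from $\mathbb{P}[y = 1 \mid x]$ and exhaustively verifies that no deterministic predictor on this small domain achieves a lower true error:

```python
# A minimal sketch, reusing the illustrative toy distribution: construct the
# Bayes optimal predictor and verify its optimality by brute force.
from itertools import product

joint = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.25, (1, 1): 0.05,
         (2, 0): 0.15, (2, 1): 0.25}
xs = sorted({x for x, _ in joint})

def true_error(predict, joint):
    """L_D(h) for a predictor given as a dict x -> label."""
    return sum(p for (x, y), p in joint.items() if predict[x] != y)

marginal = {x: joint[(x, 0)] + joint[(x, 1)] for x in xs}
p_y1 = {x: joint[(x, 1)] / marginal[x] for x in xs}       # P[y = 1 | x]
f_bayes = {x: 1 if p_y1[x] >= 0.5 else 0 for x in xs}     # the Bayes predictor

# Enumerate all 2^|X| deterministic predictors g : X -> {0, 1}.
best = min(true_error(dict(zip(xs, bits)), joint)
           for bits in product([0, 1], repeat=len(xs)))
print(true_error(f_bayes, joint), best)   # both ~0.30: f_D attains the minimum
```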

  • Agnostic PAC Learning: A hypothesis class $\mathcal{H}$ is agnostic PAC learnable if there exist a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning algorithm with the following property:

    • For every $\epsilon, \delta \in (0,1)$ and every distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$, when running the algorithm on $m \geq m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, the algorithm returns a hypothesis $h$ such that, with probability of at least $1 - \delta$ (over the choice of the $m$ training examples):

      $$L_{\mathcal{D}}(h) \leq \min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') + \epsilon$$

      If the realizability assumption holds, then $\min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') = 0$, so agnostic PAC learning generalizes the definition of PAC learning. If it does not hold, no learner can guarantee an arbitrarily small error; agnostic PAC learning can still declare success as long as its error is not much larger than the best error achievable within $\mathcal{H}$. This contrasts with PAC learning, in which the learner is required to achieve a small absolute error, not an error relative to the best in the class.
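As a rough numerical illustration of this guarantee, here is a simulation sketch; the noisy threshold distribution, the class of threshold classifiers, and all parameters are arbitrary assumptions, not from the original text. ERM's true error approaches the best error in the class as the sample grows:

```python
# A minimal sketch, assuming an illustrative non-realizable distribution:
# x ~ Uniform[0, 1), y = 1[x >= 0.5] flipped with probability 0.1, so the
# best achievable error within H is roughly the 10% noise rate.
import random

random.seed(0)

def sample(m):
    """Draw m i.i.d. examples from the toy distribution D."""
    data = []
    for _ in range(m):
        x = random.random()
        y = 1 if x >= 0.5 else 0
        if random.random() < 0.1:       # 10% label noise: not realizable
            y = 1 - y
        data.append((x, y))
    return data

# H: threshold classifiers h_t(x) = 1[x >= t] on a coarse grid of thresholds.
H = [lambda x, t=t: 1 if x >= t else 0 for t in [i / 10 for i in range(11)]]

def error(h, S):
    """Fraction of examples in S that h misclassifies."""
    return sum(h(x) != y for x, y in S) / len(S)

S = sample(1000)                            # training set
h_erm = min(H, key=lambda h: error(h, S))   # the ERM rule over H

T = sample(100_000)                     # large fresh sample to estimate L_D
print("L_D(ERM)  ~", error(h_erm, T))   # close to the best error in H
print("best in H ~", min(error(h, T) for h in H))   # ~0.10, the noise rate
```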

Reference:

  • Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.