Notes 03-3: Overfitting


Basic Overfitting Phenomenon

Overfitting happens when a hypothesis fits the training data closely but predicts poorly on new examples.

Example: polynomial regression

Suppose we use a hypothesis space made up of many classes of functions, one class H_k for each polynomial degree k:

[Figure: the hypothesis classes H_0, H_1, H_2, ...]

Suppose we are given n = 10 training points (x_1, y_1), ..., (x_10, y_10).

[Figure: graph of the training points]

[Figure: candidate hypotheses drawn against the training points]

Since n = 10, the class H_(n-1) = H_9 (polynomials of degree at most 9) contains a function that fits all 10 training points exactly, provided the x values are distinct.
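The exact-fit claim can be checked numerically. The sketch below is only an illustration: the noisy sine data, the random seed, and the helper name training_error are assumptions made for this example and do not come from the notes. It fits polynomials of degree 1, 3, and 9 to 10 points and reports the training error of each.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Assumed toy data (not from the notes): 10 noisy samples of a sine curve.
rng = np.random.default_rng(0)
n = 10
x = np.linspace(-1.0, 1.0, n)  # distinct x values
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(n)


def training_error(degree):
    """Mean squared error on the training points of the least-squares
    polynomial of the given degree (hypothetical helper)."""
    p = Polynomial.fit(x, y, deg=degree)
    return float(np.mean((p(x) - y) ** 2))


errors = {k: training_error(k) for k in (1, 3, 9)}
for k, err in errors.items():
    print(f"degree {k}: training MSE = {err:.3e}")
```

With degree n - 1 = 9 the training error should come out at zero up to floating-point roundoff, while the lower degrees leave a visible residual.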

Which hypothesis would predict well on future examples?
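One way to explore this question is to hold out fresh examples and compare the errors. The sketch below assumes a made-up data source (a noisy sine curve) and hypothetical helper names (sample_noise, mse), none of which come from the notes. It trains on 10 points and evaluates each fitted polynomial on 200 new points drawn from the same source.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)


def sample_noise(m):
    """Gaussian label noise with standard deviation 0.2 (assumed)."""
    return 0.2 * rng.standard_normal(m)


# Assumed data source: y = sin(pi * x) + noise.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = np.sin(np.pi * x_train) + sample_noise(10)
x_test = rng.uniform(-1.0, 1.0, 200)  # "future examples"
y_test = np.sin(np.pi * x_test) + sample_noise(200)


def mse(p, x, y):
    """Mean squared error of polynomial p on the points (x, y)."""
    return float(np.mean((p(x) - y) ** 2))


results = {}
for k in (1, 3, 9):
    p = Polynomial.fit(x_train, y_train, deg=k)
    results[k] = (mse(p, x_train, y_train), mse(p, x_test, y_test))
    print(f"degree {k}: train MSE = {results[k][0]:.3e}, "
          f"test MSE = {results[k][1]:.3e}")
```

The degree-9 fit has essentially zero training error, yet its error on the new points is far from zero. The exact numbers depend on the assumed data and seed, so this is an illustration of the gap between training and future error, not a general result.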

When does empirical error minimization overfit?

Depends on the relationship between