Basic Overfitting Phenomenon
Overfitting is bad: a hypothesis can fit the training data closely yet predict poorly on new examples.
Example: polynomial regression
Suppose we use a nested family of hypothesis classes H_0 ⊆ H_1 ⊆ ..., where H_d is the set of polynomials of degree at most d.
Given n = 10 training points (x_1, y_1), ..., (x_10, y_10).

Since n = 10, the class H_{n-1} = H_9 contains a function that gives an exact fit: a degree-9 polynomial can interpolate any 10 points with distinct x values.
Which hypothesis would predict well on future examples?
- Although the hypothesis h_9 fits these 10 points exactly, it is unlikely to fit the next example well (see the sketch below).
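To make this concrete, here is a minimal sketch (assuming, for illustration, a simple quadratic-plus-noise data generating process): it fits polynomials of degree 1, 2, and 9 to n = 10 training points and compares the training error with the error on fresh examples. The degree-9 fit drives the training error to (nearly) zero but typically has much larger test error.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Assumed data generating process (for illustration only): y = x^2 + noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = sample(10)    # the n = 10 training points
x_test, y_test = sample(1000)    # fresh examples from the same process

for degree in (1, 2, 9):
    # Least-squares fit in H_degree (polynomials of degree at most `degree`).
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```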
When does empirical error minimization overfit?
Depends on the relationship between (see the sketch after this list):
- hypothesis class, H
- training sample size, n
- data generating process
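As one illustration of the sample-size dependence, here is a minimal sketch (same assumed quadratic-plus-noise process as above): the hypothesis class is held fixed at degree-9 polynomials while n grows. With n = 10 the degree-9 fit overfits badly; with larger n the same class, fit by empirical error minimization, generalizes well.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Same assumed data generating process as above: y = x^2 + noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_test, y_test = sample(5000)

# Fix the hypothesis class at H_9 and vary the training sample size n.
for n in (10, 30, 100, 1000):
    x_train, y_train = sample(n)
    coeffs = np.polyfit(x_train, y_train, deg=9)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"n = {n:4d}: test MSE = {test_mse:.4f}")
```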