Machine Learning


Machine learning is the process by which systems improve their performance with experience.

Elements of a Learning Task


  1. Items of Experience
  2. Space of Available Actions
  3. Evaluation
  4. Base Performance System
  5. Learning System

Types of Learning Problems

Fundamental Distinctions

  1. Batch versus Online Learning
  2. Learning from Complete versus Partial Feedback
  3. Passive versus Active Learning
  4. Learning in Acausal versus Causal Situations
  5. Learning in Stationary versus Nonstationary Environments

Concept Learning

The problem of inducing general functions from specific training examples is central to learning.

Concept learning acquires the definition of a general category from a sample of positive and negative training examples of that category. The method is to search through a hypothesis space for a hypothesis that best fits the given training examples.

A hypothesis space, in turn, is a predefined space of potential hypotheses, often implicitly defined by the hypothesis representation.
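As a concrete illustration (an assumed example, not from the notes) of searching a hypothesis space, the classic Find-S strategy represents each hypothesis as a conjunction of attribute values and generalizes it only as far as the positive examples force:

```python
# A toy sketch of hypothesis-space search: hypotheses are conjunctions of
# attribute values, generalized just enough to cover each positive example
# (the classic Find-S idea). The attributes and data below are invented.
def find_s(examples):
    """examples: list of (attribute_tuple, label) with label True/False."""
    h = None  # most specific hypothesis: covers nothing yet
    for x, positive in examples:
        if not positive:
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(x)  # first positive example taken verbatim
        else:
            # Generalize: replace each mismatching attribute with wildcard '?'
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return h

S = [(('sunny', 'warm', 'high'), True),
     (('rainy', 'cold', 'high'), False),
     (('sunny', 'warm', 'low'), True)]
print(find_s(S))  # -> ['sunny', 'warm', '?']
```

The returned hypothesis covers both positive examples while staying as specific as possible; the hypothesis representation (conjunctions with wildcards) implicitly defines the space being searched.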

Learning a Function from Examples


Idea: to extrapolate observed y's over all X.

Hope: to predict well on future y's given x's.

Require: there must be regularities to be found!

(Note type: batch, complete, passive (we are not choosing which x), acausal, stationary).

Many Research Communities

Traditional Statistics

Traditional Pattern Recognition

"Symbolic" Machine Learning

Neural Networks

Inductive Logic Programming

Learning Theory

Standard Learning Algorithm

Given a batch set of training data S = {(x1, y1), ..., (xt, yt)}, consider a fixed hypothesis class H and compute a function h ∈ H that minimizes empirical error.

Example: Least-Squares Linear Regression

Here, the standard learning algorithm corresponds to least squares linear regression.

X = R^n, Y = R

H = {linear functions R^n → R}
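Under these choices, the standard learning algorithm reduces to ordinary least squares. A minimal sketch with NumPy (the data, noise level, and true weights here are synthetic, for illustration only):

```python
import numpy as np

# Least-squares linear regression as empirical error minimization over
# H = {linear functions R^n -> R}: choose h in H minimizing the mean
# squared error on S = {(x_1, y_1), ..., (x_t, y_t)}.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                 # t = 20 examples, n = 3 features
true_w = np.array([1.0, -2.0, 0.5])          # assumed "true" weights
y = X @ true_w + 0.1 * rng.normal(size=20)   # noisy linear targets

# w_hat = argmin_w sum_i (w . x_i - y_i)^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

h = lambda x: x @ w_hat                      # the learned hypothesis h in H
empirical_error = np.mean((h(X) - y) ** 2)
```

Because the targets really are (nearly) linear in x, the minimizer of empirical error here also recovers weights close to the generating ones; the next section asks what happens when H is chosen too expressively.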

Choosing a Hypothesis Space

In practice, for a given problem X → Y, which hypothesis space H do we choose?

Question: since we do not know the true function f: X → Y, should we make H as "expressive" as possible and let the training data do the rest?

Answer: no!
Reason: overfitting!

Basic Overfitting Phenomenon

Overfitting: a hypothesis fits the training data very well but predicts poorly on new examples.

Example: polynomial regression

Suppose we use a nested family of hypothesis spaces H0 ⊆ H1 ⊆ ..., where Hd is the set of polynomials of degree at most d.

Given n = 10 training points (x1, y1), ..., (x10, y10).

Since n = 10, the class Hn−1 = H9 contains a polynomial that gives an exact fit to all 10 training points.

Which hypothesis would predict well on future examples?
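A quick numerical check of the exact-fit claim (the target function, noise level, and test grid are assumptions for illustration):

```python
import numpy as np

# With n = 10 training points, a degree-9 polynomial (H_9) can interpolate
# them exactly, driving empirical error to (essentially) zero, while a
# smaller class such as H_3 cannot.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=10)  # noisy targets

coef9 = np.polyfit(x, y, deg=9)   # H_9: enough freedom for exact interpolation
coef3 = np.polyfit(x, y, deg=3)   # a more restricted class H_3

train_err9 = np.mean((np.polyval(coef9, x) - y) ** 2)  # ~0: exact fit
train_err3 = np.mean((np.polyval(coef3, x) - y) ** 2)  # strictly positive

# On fresh points from the underlying function, the exact-fit hypothesis
# typically oscillates between training points and predicts worse.
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)
test_err9 = np.mean((np.polyval(coef9, x_test) - y_test) ** 2)
test_err3 = np.mean((np.polyval(coef3, x_test) - y_test) ** 2)
```

The degree-9 fit wins on empirical error by construction, yet its test error reflects the noise it has memorized, which is exactly the overfitting phenomenon in question.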

When does empirical error minimization overfit?

It depends on the relationship between the complexity of the hypothesis space H and the amount of training data.
