References:
- D. Schuurmans, Machine Learning course notes, University of Waterloo, 1999.
- T. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 20-50.
Machine learning is the process by which systems improve with experience.
Elements of a Learning Task
Representation
- Items of Experience: i ∈ I
- Space of Available Actions: a ∈ A
- Evaluation: v(a, i)
- Base Performance System: b: I → A
- Learning System: L: (i1, a1, v1), ..., (in, an, vn) → b
- Maps a training experience sequence to a base performance system.
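To make these elements concrete, here is a minimal Python sketch of a toy learning task; the even/odd task and all names in it are illustrative assumptions, not part of the notes.

```python
from typing import Callable, List, Tuple

# A toy instantiation of the elements above (the even/odd task is an
# illustrative assumption): items i in I are integers, actions a in A
# are the labels "even" / "odd".
Item = int
Action = str

# Evaluation v(a, i): 1.0 if the action is correct for the item, else 0.0.
def v(a: Action, i: Item) -> float:
    return 1.0 if a == ("even" if i % 2 == 0 else "odd") else 0.0

# A base performance system b: I -> A.
BaseSystem = Callable[[Item], Action]

# Learning system L: maps a training experience sequence
# (i1, a1, v1), ..., (in, an, vn) to a base performance system b.
def L(experience: List[Tuple[Item, Action, float]]) -> BaseSystem:
    # Remember which action earned a perfect evaluation for each parity seen.
    best = {i % 2: a for (i, a, val) in experience if val == 1.0}
    return lambda i: best.get(i % 2, "even")

b = L([(2, "even", 1.0), (3, "odd", 1.0), (4, "odd", 0.0)])
print(b(10), b(7))  # -> even odd
```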
Types of Learning Problems
Fundamental Distinctions
- These are distinctions between problems rather than between methods.
- The type of problem influences the choice of method.
- Batch versus Online Learning:
- Batch Learning
- Training phase (no responsibility for performance during learning), then testing.
- Synthesizing a base performance system.
- Pure exploration.
- Online Learning
- No separate phase.
- Learning while doing.
- Must trade off choosing actions that perform well now against gaining knowledge that helps perform well later.
- Learning from Complete versus Partial Feedback:
- Complete Feedback
- Each item of experience evaluates every action (or base system).
- Tells what best action would have been.
- Partial Feedback
- Each item of experience evaluates only a subset of possible actions (or base systems).
- Only tells you how you did.
- Creates exploration/exploitation tradeoff in online setting.
- Should we optimize reward using what is already known, or explore to learn more?
- In batch learning, exploration would be chosen, since there is no responsibility for performance during training.
- Pointwise Feedback
- Evaluates a single action.
- Optimization problem: (Controller, State) → Action.
- Passive versus Active Learning
- Passive: observation.
- Active: experimentation, exploration.
- Learning in Acausal versus Causal Situations
- Acausal: actions do not affect future experience.
- e.g., rain prediction
- Causal: actions do affect future experience.
- Learning in Stationary versus Nonstationary Environments
- Stationary: evaluation doesn't change.
- Nonstationary: evaluation changes with time.
Concept Learning
The problem of inducing general functions from specific training examples is central to learning.
Concept learning means acquiring the definition of a general category from a sample of positive and negative training examples of that category. The method is to search through a hypothesis space for the hypothesis that best fits the given training examples.
A hypothesis space, in turn, is a predefined space of potential hypotheses, often implicitly defined by the hypothesis representation.
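As an illustration of this search view (the attributes, data, and conjunction-with-wildcards hypothesis space below are assumptions for the sake of example):

```python
from itertools import product

# Training examples: attribute-value vectors with a positive/negative label.
# The attributes and data here are illustrative assumptions.
examples = [
    ({"sky": "sunny", "wind": "strong"}, True),
    ({"sky": "sunny", "wind": "weak"},   True),
    ({"sky": "rainy", "wind": "strong"}, False),
]

# Hypothesis space H: conjunctions of attribute = value tests, where
# "?" means any value is acceptable for that attribute.
H = [dict(zip(["sky", "wind"], vals))
     for vals in product(["sunny", "rainy", "?"], ["strong", "weak", "?"])]

def matches(h, x):
    return all(v == "?" or x[attr] == v for attr, v in h.items())

def fit(h):
    # Number of training examples the hypothesis classifies correctly.
    return sum(matches(h, x) == label for x, label in examples)

best = max(H, key=fit)
print(best, fit(best))  # -> {'sky': 'sunny', 'wind': '?'} 3
```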
Learning a Function from Examples
- An example of concept learning where the concepts are mathematical functions.
Given:
- Domain X: descriptions
- Domain Y: predictions
- H: hypothesis space; the set of all possible hypotheses
- h ∈ H: target hypothesis
Idea: to extrapolate observed y's over all X.
Hope: to predict well on future y's given x's.
Require: there must be regularities to be found!
(Note the problem type: batch, complete feedback, passive (we are not choosing which x's), acausal, stationary.)
Many Research Communities
- The choice of representation differs across communities.
- We prefer the maximum level of abstraction.
Traditional Statistics
- h: R^n → R
- Squared prediction error.
- h is a linear function.
Traditional Pattern Recognition
- h: R^n → {0, 1}
- Right/wrong prediction error.
- h is a linear discriminant boundary.
"Symbolic" Machine Learning
- h: {attribute-value vectors} → {0, 1}
- h is a simple boolean function (e.g., a decision tree).
Neural Networks
- h: R^n → R
- h is a feedforward neural net.
Inductive Logic Programming
- h: {term structures} → {0, 1}
- h is a simple logic program.
Learning Theory
- Postulate mathematical model of learning situations.
- Rigorously establish achievable levels of learning performance.
- Characterize/quantify value of prior knowledge and constraints.
- Identify limits/possibilities of learning algorithms.
Standard Learning Algorithm
Given a batch set of training data S = {(x1, y1), ..., (xt, yt)}, consider a fixed hypothesis class H and compute a function h ∈ H that minimizes the empirical error.
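Written out as a formula (a standard formulation; the per-example loss ℓ is notation assumed here, since the notes say only "empirical error"):

```latex
% Empirical error minimization over a fixed hypothesis class H
% (requires amsmath for \operatorname*):
\hat{h} \;=\; \operatorname*{arg\,min}_{h \in H} \; \frac{1}{t} \sum_{i=1}^{t} \ell\big(h(x_i),\, y_i\big)
% e.g. \ell(\hat{y}, y) = (\hat{y} - y)^2 for squared prediction error.
```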
Example: Least-Squares Linear Regression
Here, the standard learning algorithm corresponds to least squares linear regression.
X = R^n, Y = R
H = {linear functions R^n → R}
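A minimal sketch in NumPy (the synthetic data is an assumption): the empirical squared error over linear functions is minimized in closed form by np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (an assumption for illustration):
# x in R^2, y = 3*x1 - 2*x2 + 1 plus noise.
t, n = 50, 2
X = rng.normal(size=(t, n))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.1 * rng.normal(size=t)

# H = {linear functions R^n -> R}; append a constant column for the intercept.
A = np.hstack([X, np.ones((t, 1))])

# Least squares picks the h in H minimizing the empirical squared error.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # approximately [ 3., -2.,  1.]
```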
Choosing a Hypothesis Space
In practice, for a given problem X → Y, which hypothesis space H do we choose?
Question: since we do not know the true function f: X → Y, should we make H as "expressive" as possible and let the training data do the rest?
- e.g., quadratic functions
- e.g., high-degree polynomials
- e.g., complex neural networks
Answer: no!
Reason: overfitting!
- Underfit
- H is too restrictive: unable to express a good predictor.
- Overfit
- H is too expressive: hard to identify good candidates from bad ones.
Basic Overfitting Phenomenon
Overfitting: fitting the training data well while predicting poorly on future examples.
Example: polynomial regression
Suppose we use a hypothesis space containing polynomials of increasing degree: constants, linear functions, quadratics, and so on.
Given t = 10 training points (x1, y1), ..., (x10, y10).
Since a degree-9 polynomial has 10 free coefficients, with 10 pieces of data there is a function in H9 that gives an exact fit.
Which hypothesis would predict well on future examples?
- Although hypothesis h9 fits these 10 data points exactly, it is unlikely to fit the next piece of data well (see the sketch below).
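A minimal numerical sketch of this phenomenon (the sine target, noise level, and degrees compared are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# t = 10 noisy samples of a smooth underlying function (an assumption).
t = 10
x = np.linspace(0, 1, t)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=t)

x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, test error {test_err:.4f}")

# Typically the degree-9 fit drives the training error to ~0 (10 coefficients,
# 10 points) while its test error is far larger than the degree-3 fit's.
```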
When does empirical error minimization overfit?
Depends on the relationship between
- hypothesis class, H
- training sample size, t
- data generating process