Inductive Inference

**Reference**: T. Mitchell, 1997.

**Inductive inference** is the process of reaching a general conclusion from specific examples.

The general conclusion should apply to unseen examples.

**Inductive Learning Hypothesis**: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

__Example:__

Identified relevant attributes: x, y, z

| x | y | z |
| --- | --- | --- |
| 2 | 3 | 5 |
| 4 | 6 | 10 |
| 5 | 2 | 7 |

__Model 1:__

x + y = z

Prediction: if x = 0 and z = 0, then y = 0.

__Model 2:__

if x = 2 and z = 5, then y = 3.

if x = 4 and z = 10, then y = 6.

if x = 5 and z = 7, then y = 2.

otherwise y = 1.

Model 2 is likely overfitting.

- Good: completely consistent with the data.
- Bad: no justification in the data for the prediction that y = 1 in all other cases; not in the class of algebraic functions (but nothing was said about the class of descriptions).
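A quick sketch (Python; the function names are mine) contrasting the two models on the table above: both fit every training row, but they diverge on an unseen example, which is what makes Model 2's default rule suspect.

```python
def model_1(x, z):
    # Model 1: x + y = z, solved for y.
    return z - x

def model_2(x, z):
    # Model 2: a lookup table over the training rows, plus a default.
    table = {(2, 5): 3, (4, 10): 6, (5, 7): 2}
    return table.get((x, z), 1)  # "otherwise y = 1"

# Both models are consistent with the training data...
training = [(2, 3, 5), (4, 6, 10), (5, 2, 7)]
for x, y, z in training:
    assert model_1(x, z) == y and model_2(x, z) == y

# ...but they disagree on an unseen example:
print(model_1(3, 8))  # -> 5 (generalizes the x + y = z pattern)
print(model_2(3, 8))  # -> 1 (the unjustified default)
```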

**Inductive bias:** explicit or implicit assumption(s) about what kind of model is wanted.

Typical inductive bias:

- Prefer models that can be written concisely.
- Select the shortest one.

__Example:__

- The inductive bias of the decision tree ID3 algorithm is a preference for certain hypotheses over others (e.g., for shorter hypotheses), with no hard restriction on the hypotheses that can eventually be enumerated. This form of bias is typically called a *preference bias* (or, alternatively, a *search bias*).
- In contrast, the bias of the version space candidate-elimination algorithm is a categorical restriction on the set of hypotheses considered. This form of bias is typically called a *restriction bias* (or, alternatively, a *language bias*).
- Typically, a preference bias is more desirable than a restriction bias because it allows the learner to work within a complete hypothesis space that is assured to contain the unknown target function.
- A restriction bias that strictly limits the set of potential hypotheses is generally less desirable because it introduces the possibility of excluding the unknown target function altogether.

Some languages of interest:

- Conjunctive normal form for Boolean variables A, B, C.
  - e.g., ABC
- Disjunctive normal form.
  - e.g., A(~B)(~C) + A(~B)C + AB(~C)

- Algebraic expressions.
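As an illustration (a small sketch; the dictionary encoding is my own, not from the notes), a DNF formula can be evaluated as an OR of ANDs of literals:

```python
# A DNF formula as a list of terms; each term maps a variable to the
# sign of its literal (True = positive, False = negated).
# This encodes A(~B)(~C) + A(~B)C + AB(~C) from the example above.
dnf = [
    {"A": True, "B": False, "C": False},
    {"A": True, "B": False, "C": True},
    {"A": True, "B": True, "C": False},
]

def eval_dnf(formula, assignment):
    # True iff at least one term has all of its literals satisfied.
    return any(all(assignment[v] == sign for v, sign in term.items())
               for term in formula)

print(eval_dnf(dnf, {"A": True, "B": False, "C": False}))  # -> True
print(eval_dnf(dnf, {"A": False, "B": True, "C": True}))   # -> False
```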

__Positive and Negative Examples__

**Positive Examples**

- Are all true.

| x | y | z |
| --- | --- | --- |
| 2 | 3 | 5 |
| 2 | 5 | 7 |
| 4 | 6 | 10 |

| general | x, y, z ∈ I |
| --- | --- |
| more specific | x, y, z ∈ I⁺ |
| more specific than the first two | 1 ≤ x, y, z ≤ 11; x, y, z ∈ I |
| even more specific model | x + y = z |

**Negative Examples**

- Constrain the set of models consistent with the examples.

| x | y | z | Decision |
| --- | --- | --- | --- |
| 2 | 3 | 5 | Y |
| 2 | 5 | 7 | Y |
| 4 | 6 | 10 | Y |
| 2 | 2 | 5 | N |
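A sketch (Python; the candidate descriptions are illustrative, not from the notes) of how the negative row constrains the model set: a description survives only if it covers every Y row and excludes every N row.

```python
# Candidate descriptions, encoded as predicates over (x, y, z).
candidates = {
    "x + y = z":        lambda x, y, z: x + y == z,
    "x, y, z positive": lambda x, y, z: x > 0 and y > 0 and z > 0,
    "z > x":            lambda x, y, z: z > x,
}

# The table above: three positive examples and one negative example.
examples = [
    (2, 3, 5, True),
    (2, 5, 7, True),
    (4, 6, 10, True),
    (2, 2, 5, False),  # the negative example
]

# Keep only descriptions whose verdict matches the label on every row.
consistent = [name for name, pred in candidates.items()
              if all(pred(x, y, z) == label for x, y, z, label in examples)]
print(consistent)  # -> ['x + y = z']
```

Without the negative row, all three candidates would remain; the N row alone eliminates the two overly general ones.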

__Search for Description__

As the search proceeds, the candidate description keeps getting larger or longer.

- Finite language: the algorithm terminates.
- Infinite language: the algorithm runs
  - forever,
  - until out of memory, or
  - until a final answer is reached.

*X* = example space/instance space (the set of all possible examples)

*D* = description space (the set of descriptions definable in a language *L*)

- Each description *l* ∈ *L* corresponds to a set of examples *X*_l ⊆ *X*.

__Success Criterion__

Look for a description *l* ∈ *L* such that *l* is consistent with all observed examples.

- The description can be relaxed (generalized) to match all instances.
- Inductive bias can be expressed in the success criterion.

__Example:__

L = {x op y = z}, op ∈ {+, -, *, /}

Given a precise specification of language and data, write a program to test descriptions one by one against the examples.

- finite language: size |L|
- finite number of examples: size |X|
- running time: on the order of |L| * |X| consistency checks
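The enumeration described above can be sketched directly (a minimal Python version of the |L| * |X| loop):

```python
# L = {x op y = z} with op in {+, -, *, /}: four descriptions in total.
ops = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y if y != 0 else None,
}

examples = [(2, 3, 5), (2, 5, 7), (4, 6, 10)]

# Test each description against each example: |L| * |X| checks.
consistent = [op for op, f in ops.items()
              if all(f(x, y) == z for x, y, z in examples)]
print(consistent)  # -> ['+']
```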

__Why is Machine Learning Hard (Slow)?__

It is very difficult to specify a small finite language that contains a description of the examples.

e.g., the set of algebraic expressions on 3 variables is an infinite language.