C4.5rules Manual Page

C4.5RULES(1)

NAME

c4.5rules - form production rules from unpruned decision trees

SYNOPSIS

c4.5rules [ -f filestem ] [ -u ] [ -v verb ] [ -F siglevel ] [ -c cf ] [ -r redundancy ]
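As an illustration of the synopsis, the following Python sketch assembles an argument vector for c4.5rules. The function name build_c45rules_argv and the filestem "golf" are hypothetical and used only for this example; only the flags listed in the synopsis are emitted, and the values for siglevel, cf and redundancy are passed through unchanged in whatever units the program expects.

```python
from typing import List, Optional


def build_c45rules_argv(
    filestem: Optional[str] = None,
    unseen: bool = False,
    verbosity: Optional[int] = None,
    siglevel: Optional[float] = None,
    cf: Optional[float] = None,
    redundancy: Optional[float] = None,
) -> List[str]:
    """Assemble an argv list following the SYNOPSIS.

    Options left as None (or False) are omitted, so the program's
    own defaults apply. This helper is hypothetical, not part of C4.5.
    """
    argv = ["c4.5rules"]
    if filestem is not None:
        argv += ["-f", filestem]
    if unseen:
        argv.append("-u")
    if verbosity is not None:
        # The manual page documents verbosity levels 0 to 3.
        if not 0 <= verbosity <= 3:
            raise ValueError("verbosity must be in the range 0-3")
        argv += ["-v", str(verbosity)]
    if siglevel is not None:
        argv += ["-F", str(siglevel)]
    if cf is not None:
        argv += ["-c", str(cf)]
    if redundancy is not None:
        argv += ["-r", str(redundancy)]
    return argv


if __name__ == "__main__":
    # "golf" is a made-up filestem for illustration.
    print(" ".join(build_c45rules_argv("golf", unseen=True, verbosity=1)))
```

The resulting list could be handed to subprocess.run on a system where c4.5rules is installed; the sketch itself only builds the list and does not run the program.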

DESCRIPTION

C4.5rules reads the decision tree or trees produced by C4.5 and generates a set of production rules from each tree and from all trees together. All files read and written by C4.5 are of the form filestem.ext, where filestem is a file name stem that identifies the induction task and ext is an extension that defines the type of file. The rules program expects to find a names file defining class, attribute and attribute value names; a data file containing a set of objects, each with its class and the value of each attribute specified; an unpruned file generated by C4.5 from the data file; and (optionally) a test file containing unseen objects.
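The filestem.ext convention can be sketched in Python as a check run before invoking the program. The helper names task_files and missing_inputs are hypothetical; the extensions are the ones named in this manual page (names, data, unpruned, and optionally test).

```python
from pathlib import Path
from typing import Dict, List

# Input extensions the manual page says the rules program expects to find.
REQUIRED_EXTS = ("names", "data", "unpruned")


def task_files(filestem: str, with_test: bool = False) -> Dict[str, Path]:
    """Map each input extension to its path, filestem.ext.

    The extension is appended as text rather than via Path.with_suffix,
    so a filestem that itself contains a dot is left intact.
    """
    exts = list(REQUIRED_EXTS) + (["test"] if with_test else [])
    return {ext: Path(f"{filestem}.{ext}") for ext in exts}


def missing_inputs(filestem: str, with_test: bool = False) -> List[str]:
    """Return the expected input files that do not exist on disk."""
    return [str(p) for p in task_files(filestem, with_test).values() if not p.exists()]


if __name__ == "__main__":
    # "golf" is a made-up filestem for illustration.
    for path in missing_inputs("golf", with_test=True):
        print("missing:", path)
```

Set with_test to True only when the optional test file is to be used, which corresponds to the -u option.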

For each tree that it finds, the program generates a set of pruned rules and then sifts this set in an attempt to find its most useful subset. If more than one tree was found, all subsets are merged and the resulting composite set of rules is sifted in turn. The final set of rules is saved in a machine-readable format in a rules file. Each of the rulesets produced is then evaluated on the original training data and (optionally) on the test data.

OPTIONS

-f filestem Specify the filename stem (default DF).

-u Evaluate rulesets on unseen cases in file filestem.test.

-v verb Set the verbosity level [0-3] (default 0).

-F siglevel Invoke Fisher's significance test when pruning rules. If a rule contains a condition whose probability of being irrelevant is greater than the stated level, the rule is pruned further (default: no significance testing).

-c cf Set the confidence level used in forming the pessimistic estimate of a rule's error rate (default 25%).

-r redundancy If many irrelevant or redundant attributes are included, estimate the ratio of all attributes to "sensible" attributes (default 1).
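To give a feel for what the -c option controls, the Python sketch below computes a pessimistic error rate as the upper limit of a binomial confidence interval: given E errors among N covered cases, it finds the error probability p at which seeing E or fewer errors has probability cf. This is the textbook definition, offered as an assumption about the estimate's general form; the program itself may compute it differently, for example through an approximation. The function names are hypothetical, and cf is written here as a fraction (0.25), which is not necessarily the unit the command line expects.

```python
from math import comb


def binom_cdf(e: int, n: int, p: float) -> float:
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(e + 1))


def pessimistic_error_rate(errors: int, cases: int, cf: float = 0.25) -> float:
    """Upper confidence limit on the error rate, found by bisection.

    Solves P(X <= errors | cases, p) = cf for p. Illustrative only:
    not taken from the C4.5 source code.
    """
    if cases <= 0 or not 0 <= errors <= cases:
        raise ValueError("need 0 <= errors <= cases and cases > 0")
    if not 0.0 < cf < 1.0:
        raise ValueError("cf must lie strictly between 0 and 1")
    if errors == cases:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        # The CDF decreases as p grows; if it is still above cf,
        # the solution lies at a larger p.
        if binom_cdf(errors, cases, mid) > cf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # A rule covering 6 training cases with no errors.
    print(round(pessimistic_error_rate(0, 6), 3))
```

With no observed errors among 6 cases and cf = 0.25, the estimate is about 0.206 rather than 0. A smaller cf gives a larger, more pessimistic estimate for the same counts.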

FILES

c4.5
c4.5rules
filestem.data
filestem.names
filestem.unpruned (unpruned trees)
filestem.rules (production rules)
filestem.test (unseen data)

SEE ALSO

c4.5(1), consultr(1)

BUGS
