References:
- P. Winston, 1992.
C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:
- Avoiding overfitting the data.
  - Determining how deeply to grow a decision tree.
  - Reduced error pruning.
  - Rule post-pruning.
- Handling continuous attributes.
  - e.g., temperature
- Choosing an appropriate attribute selection measure.
- Handling training data with missing attribute values.
- Handling attributes with differing costs.
- Improving computational efficiency.
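Two of the issues listed above, choosing an attribute selection measure and handling continuous attributes, can be illustrated with a short Python sketch. It computes the gain ratio (information gain divided by split information), the measure C4.5 uses by default, and searches for a binary threshold on a numeric attribute. The function names and the six toy cases are assumptions made for illustration; using midpoints between adjacent values as candidate thresholds is a simplification, and none of this is taken from Quinlan's C source.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of a discrete attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition induced by the attribute
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary cut with the highest information gain.

    Candidate cuts are midpoints between adjacent distinct values (a simplification).
    """
    base = entropy(labels)
    total = len(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / total
        if gain > best[1]:
            best = (cut, gain)
    return best


# Illustrative toy cases (hypothetical, not taken from any linked example).
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
temperature = [85, 80, 83, 70, 68, 64]
play = ["no", "no", "yes", "yes", "yes", "yes"]

print("gain ratio of outlook:", gain_ratio(outlook, play))
print("best temperature cut:", best_threshold(temperature, play))
```

The split-information denominator penalises attributes with many distinct values, which plain information gain (as used in ID3) tends to favour.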
C4.5 is installed for use on Grendel (grendel.icd.uregina.ca), but it can also be set up on a local machine as follows:
C4.5 Release 8 Installation Instructions for UNIX
- Download the C4.5 source code.
- Decompress the archive, using either of the following:
  - Type "tar xvzf c4.5r8.tar.gz" (the z option is not universally supported), or, alternatively,
  - Type "gunzip c4.5r8.tar.gz" to decompress the gzip archive, and then type "tar xvf c4.5r8.tar" to extract the tar archive.
- Change to the ./R8/Src directory.
- Type "make all" to compile the executables.
- Put the executables into a "bin" subdirectory and include it in the path for command-line usage.
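The steps above can also be scripted. The following Python sketch mirrors them under stated assumptions: the archive is named c4.5r8.tar.gz and unpacks to R8/Src as described, "make" is available on the machine, and the built programs are called c4.5 and c4.5rules. The function names and the default list of program names are hypothetical; adjust them to whatever the build actually produces.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def unpack_and_build(archive, dest=".", build=True):
    """Extract a gzip-compressed tar archive into dest, then optionally run `make all`.

    Returns the path of the source directory (dest/R8/Src).
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Reading with mode "r:gz" does the work of gunzip and tar xvf in one step.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    src = dest / "R8" / "Src"
    if build:
        subprocess.run(["make", "all"], cwd=str(src), check=True)
    return src


def install_executables(src, bin_dir, names=("c4.5", "c4.5rules")):
    """Copy the listed programs that exist in src into bin_dir; return the names copied."""
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        program = Path(src) / name
        if program.is_file():
            shutil.copy2(str(program), str(bin_dir / name))
            copied.append(name)
    return copied
```

A typical call would be install_executables(unpack_and_build("c4.5r8.tar.gz"), Path.home() / "bin"), after which the chosen bin directory still has to be added to the PATH in the shell's start-up file.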
Manual Pages
- c4.5: using the c4.5 decision tree generator.
- verbose c4.5: interpreting output generated by c4.5.
- c4.5rules: using the c4.5 rule generator.
- verbose c4.5rules: interpreting output generated by c4.5rules.
Examples
The following examples illustrate C4.5 usage:
- Example 1 - Golf
  - A simple, detailed example of how C4.5 and C4.5rules work.
- Example 2 - Sunburn
  - The sunburn example revisited.
- Example 3 - Homonyms
  - Advanced usage and a practical application of C4.5 and C4.5rules.
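To give a feel for what a run looks like before working through the examples, the Python sketch below prepares the two input files C4.5 expects for a file stem (a .names file declaring the classes and attributes, and a .data file with one comma-separated case per line) and then invokes c4.5 and c4.5rules with the -f option if both programs are on the PATH. The attribute declarations are modelled on the golf data, but the six cases are illustrative and are not claimed to match the linked example; the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Class names on the first line, then one declaration per attribute.
NAMES = """\
Play, Don't Play.

outlook: sunny, overcast, rain.
temperature: continuous.
humidity: continuous.
windy: true, false.
"""

# Illustrative cases: attribute values in declaration order, class last.
ROWS = [
    ("sunny", 85, 85, "false", "Don't Play"),
    ("sunny", 80, 90, "true", "Don't Play"),
    ("overcast", 83, 78, "false", "Play"),
    ("rain", 70, 96, "false", "Play"),
    ("rain", 65, 70, "true", "Don't Play"),
    ("overcast", 64, 65, "true", "Play"),
]


def write_filestem(directory, stem="golf"):
    """Write <stem>.names and <stem>.data into directory; return the file stem path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / (stem + ".names")).write_text(NAMES)
    lines = [", ".join(str(field) for field in row) for row in ROWS]
    (directory / (stem + ".data")).write_text("\n".join(lines) + "\n")
    return directory / stem


def run_c45(filestem):
    """Run c4.5, then c4.5rules, on filestem.

    Returns the combined output, or None if either program is not on the PATH.
    """
    if not (shutil.which("c4.5") and shutil.which("c4.5rules")):
        return None
    outputs = []
    for program in ("c4.5", "c4.5rules"):
        done = subprocess.run([program, "-f", str(filestem)],
                              capture_output=True, text=True, check=True)
        outputs.append(done.stdout)
    return "\n".join(outputs)
```

c4.5 is run first because c4.5rules reads the unpruned tree file that c4.5 writes for the same file stem.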