# source:orange/docs/reference/rst/Orange.data.discretization.rst@9943:364085431ea7

Revision 9943:364085431ea7, 2.9 KB checked in by blaz <blaz.zupan@…>, 2 years ago (diff)

updates for discretization rst

.. py:currentmodule:: Orange.data.discretization

########################################
Data discretization (``discretization``)
########################################

.. index:: discretization

.. index::
   single: data; discretization

Continuous features in the data can be discretized by applying a single discretization method uniformly to the whole
data set. The approach considers only continuous features and replaces them in the data set with corresponding
categorical features:

.. literalinclude:: code/discretization-table.py

Discretization introduces new categorical features and computes their values according to
the chosen discretization method::

    Original data set:
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']
    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa']
    [4.7, 3.2, 1.3, 0.2, 'Iris-setosa']

    Discretized data set:
    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa']
    ['<=5.45', '(2.85, 3.15]', '<=2.45', '<=0.80', 'Iris-setosa']
    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa']

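The mapping from continuous values to interval labels can be sketched in plain Python, without Orange. This is only an illustration of the idea, not the library's implementation: the helper names ``interval_label`` and ``discretize_rows`` are hypothetical, and the cut-off points are limited to those visible in the output above (any further cut-off points the method computed are not shown there and are left out).

```python
from bisect import bisect_left


def interval_label(value, cuts):
    """Return a label such as '<=5.45', '(2.85, 3.15]' or '>3.15'.

    `cuts` must be a sorted, non-empty list of cut-off points.
    Intervals are closed on the right, matching the labels above.
    """
    i = bisect_left(cuts, value)  # number of cut-off points strictly below value
    if i == 0:
        return "<=%.2f" % cuts[0]
    if i == len(cuts):
        return ">%.2f" % cuts[-1]
    return "(%.2f, %.2f]" % (cuts[i - 1], cuts[i])


def discretize_rows(rows, cuts_by_column):
    """Replace the values of the listed columns with interval labels.

    Columns that are not keys of `cuts_by_column` (here, the class) are kept.
    """
    return [
        [interval_label(v, cuts_by_column[j]) if j in cuts_by_column else v
         for j, v in enumerate(row)]
        for row in rows
    ]


rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [4.7, 3.2, 1.3, 0.2, "Iris-setosa"],
]

# Assumption: only the cut-off points that appear in the printed output.
cuts_by_column = {0: [5.45], 1: [2.85, 3.15], 2: [2.45], 3: [0.80]}

discretized = discretize_rows(rows, cuts_by_column)
for row in discretized:
    print(row)
```

With these cut-off points the three rows receive the same labels as in the discretized data set printed above.
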
The procedure uses feature discretization classes as defined in :doc:`Orange.feature.discretization` and applies them
to the entire data set. The supported discretization methods are:

* equal width discretization, where the domain of a continuous feature is split into intervals of
  equal width (uses :class:`Orange.feature.discretization.EqualWidth`),
* equal frequency discretization, where each interval contains an equal number of data instances (uses
  :class:`Orange.feature.discretization.EqualFreq`),
* entropy-based discretization, as originally proposed by [FayyadIrani1993]_, which infers the intervals to minimize
  within-interval entropy of class distributions (uses :class:`Orange.feature.discretization.Entropy`),
* bi-modal discretization, which uses three intervals to optimize the difference between the class distribution in
  the middle interval and the distribution outside it (uses :class:`Orange.feature.discretization.BiModal`),
* fixed discretization, with user-defined cut-off points.

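The difference between equal width and equal frequency discretization is easiest to see on a skewed sample. The following is a minimal sketch in plain Python, not Orange's code: the function names are hypothetical, and placing equal frequency cut-off points midway between neighbouring sorted values is an assumption made for illustration.

```python
def equal_width_cuts(values, n):
    """Cut-off points that split [min, max] into n intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + i * width for i in range(1, n)]


def equal_freq_cuts(values, n):
    """Cut-off points that put (roughly) the same number of values in each interval.

    Each cut-off point is the midpoint between two neighbouring sorted values
    (an assumption); repeated values can make the counts unequal.
    """
    ordered = sorted(values)
    cuts = []
    for i in range(1, n):
        k = i * len(ordered) // n  # index of the first value of the next interval
        cuts.append((ordered[k - 1] + ordered[k]) / 2)
    return cuts


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 22.0]

print(equal_width_cuts(values, 3))  # [8.0, 15.0]
print(equal_freq_cuts(values, 3))   # [3.5, 6.5]
```

With a single outlier (``22.0``), the equal width cut-off points leave eight of the nine values in the first interval, while the equal frequency cut-off points put three values in each interval.
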
.. FIXME give a corresponding class for fixed discretization

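The core step of entropy-based discretization is finding the cut-off point that minimizes the weighted entropy of the class distributions on its two sides. The sketch below shows only that single step on toy data. It deliberately omits the recursive splitting and the stopping criterion of the method in [FayyadIrani1993]_, and the function names are hypothetical rather than part of Orange.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Return (cut, score) for the binary split with the lowest weighted entropy.

    Candidate cut-off points are midpoints between neighbouring distinct values.
    Returns (None, inf) if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut-off point fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


# Toy data: class "a" has low values, class "b" has high values.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]

cut, score = best_entropy_cut(values, labels)
print(cut)  # 5.0
```

Here the cut-off point at ``5.0`` separates the two classes completely, so both sides have zero entropy.
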
The script at the beginning of this page used the default discretization method (equal frequency with three
intervals). This can be changed by selecting a different discretization approach, as demonstrated below:

.. literalinclude:: code/discretization-table-method.py
    :lines: 3-5

Data discretization classes
===========================

.. .. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class

.. autoclass:: DiscretizeTable

.. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in Orange
   for Beginners tutorial shows the use of DiscretizedLearner. Other
   discretization classes from core Orange are listed in chapter on
   `categorization <../ofb/o_categorization.htm>`_ of the same tutorial. -> should put in classification/wrappers

.. [FayyadIrani1993] UM Fayyad and KB Irani. Multi-interval discretization of continuous valued
   attributes for classification learning. In Proc. 13th International Joint Conference on Artificial Intelligence, pages
   1022--1029, Chambery, France, 1993.