Timestamp:
02/07/12 19:56:08 (2 years ago)
Author:
blaz <blaz.zupan@…>
Branch:
default
Message:

Data discretization polished.

File:
1 edited

  • docs/reference/rst/Orange.data.discretization.rst

    r9943 r9963  
    11.. py:currentmodule:: Orange.data.discretization 
    22 
    3 ################################### 
     3######################################## 
    44Data discretization (``discretization``) 
    5 ################################### 
     5######################################## 
    66 
    77.. index:: discretization 
     
    1010   single: data; discretization 
    1111 
    12 Continues features in the data can be discretized using a uniform discretization method. The approach will consider 
    13 only continues features, and replace them in the data set with corresponding categorical features: 
     12Continuous features in the data can be discretized using a uniform discretization method. Discretization considers 
     13only continuous features, and replaces them in the new data set with corresponding categorical features: 
    1414 
    1515.. literalinclude:: code/discretization-table.py 
    1616 
    17 Discretization introduces new categorical features and computes their values in accordance to 
    18 a discretization method:: 
     17Discretization introduces new categorical features with discretized values:: 
    1918 
    2019    Original data set: 
     
    2827    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa'] 
    2928 
    30 The procedure uses feature discretization classes as defined in :doc:`Orange.feature.discretization` and applies them 
    31 on entire data set. The suported discretization methods are: 
     29Data discretization uses feature discretization classes from 
     30:doc:`Orange.feature.discretization` and applies them to the entire data set. The supported discretization methods are: 
    3231 
    3332* equal width discretization, where the domain of continuous feature is split to intervals of the same 
     
    4342.. FIXME give a corresponding class for fixed discretization 
    4443 
    45 The above script used the default discretization method (equal frequency with three intervals). This can be 
    46 changed while some selected discretization approach as demonstrated below: 
     44The default discretization method (equal frequency with three intervals) can be replaced with other 
     45discretization approaches as demonstrated below: 
    4746 
    4847.. literalinclude:: code/discretization-table-method.py 
    4948    :lines: 3-5 
     49 
     50Entropy-based discretization is special as it may infer new features that are constant and have only one value. Such 
     51features are redundant and provide no information about the class. By default, 
     52:class:`DiscretizeTable` removes them, in effect performing feature subset selection. The effect of removing 
     53non-informative features is also demonstrated in the following script: 
     54 
     55.. literalinclude:: code/discretization-entropy.py 
     56    :lines: 3- 
     57 
     58In the sampled data set above, three features were discretized to a constant and thus removed:: 
     59 
     60    Redundant features (3 of 13): 
     61    cholesterol, rest SBP, age 
     62 
     63.. note:: 
     64    Entropy-based and bi-modal discretization require class-labeled data sets. 
    5065 
    5166Data discretization classes 
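For readers unfamiliar with the two unsupervised methods named in the changed text, the following is a minimal standalone sketch of equal-width and equal-frequency discretization in plain Python 3. It does not use Orange and is not how ``DiscretizeTable`` is implemented. The function names, the midpoint rule for equal-frequency cut points, and the ``(a, b]`` label for middle intervals are assumptions made for illustration; only the ``<=`` and ``>`` label style is taken from the sample output in the diff.

```python
from bisect import bisect_left


def equal_width_cuts(values, n):
    """Cut points that split [min, max] into n intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / float(n)
    return [lo + step * i for i in range(1, n)]


def equal_freq_cuts(values, n):
    """Cut points that put roughly the same number of values in each interval.

    Assumption: each cut is the midpoint between the two neighbouring
    sorted values at the interval boundary.
    """
    s = sorted(values)
    cuts = []
    for i in range(1, n):
        k = i * len(s) // n
        cuts.append((s[k - 1] + s[k]) / 2.0)
    return cuts


def label(value, cuts):
    """Map a continuous value to a categorical label for its interval."""
    # bisect_left returns the first index whose cut is >= value,
    # so a value equal to a cut point falls into the lower interval.
    idx = bisect_left(cuts, value)
    if idx == 0:
        return "<=%.2f" % cuts[0]
    if idx == len(cuts):
        return ">%.2f" % cuts[-1]
    return "(%.2f, %.2f]" % (cuts[idx - 1], cuts[idx])


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    cuts = equal_freq_cuts(data, 3)
    print(cuts)
    print([label(v, cuts) for v in data])
```

With three intervals, the six sample values are split two per interval by the equal-frequency cuts, whereas the equal-width cuts depend only on the minimum and maximum of the feature.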