Changeset 7596:b394dbbbf200 in orange


Timestamp:
02/05/11 00:40:21 (3 years ago)
Author:
matija <matija.polajnar@…>
Branch:
default
Convert:
e7e928bca3f33077384e8259b455cddcb495540c
Message:

Corrections to Orange.data.feature documentation.

Location:
orange
Files:
2 edited

  • orange/Orange/data/feature.py

    r7379 r7596  
    88features are stored in descriptors contained in this module. 
    99 
    10 Feature Descriptors 
     10Feature descriptors 
    1111------------------- 
    1212 
    13 Feature descriptors can be constructed directly, using constructors, or by a 
    14 factory function :obj:`make`, which either retrieves an existing descriptor or 
    15 constructs a new one. 
     13Feature descriptors can be constructed directly, using constructors and passing 
     14attributes as parameters, or by a factory function 
     15:func:`Orange.data.feature.make`, which either retrieves an existing descriptor 
     16or constructs a new one. 
    1617 
    1718.. class:: Feature 
     
    3536    .. attribute:: getValueFrom 
    3637 
    37         A function (an instance of :obj:`Orange.core.Clasifier`) which computes a 
    38         value of the feature from values of one or more other features. This is 
    39         used, for instance, in discretization where the features describing the  
    40         discretized feature are computed from the original feature.  
     38        A function (an instance of :obj:`Orange.core.Clasifier`) which computes 
     39        a value of the feature from values of one or more other features. This 
     40        is used, for instance, in discretization where the features describing 
     41        the discretized feature are computed from the original feature.  
    4142 
    4243    .. attribute:: ordered 
     
    7576    .. method:: randomvalue() 
    7677 
    77            Return a random value of the feature 
     78           Return a random value of the feature. 
    7879        
    7980           :rtype: :class:`Orange.data.Value` 
     
    8990.. _discrete: 
    9091.. class:: Discrete 
     92 
     93    Bases: :class:`Feature` 
    9194    
    9295    Descriptor for discrete features. 
     
    9598     
    9699        A list with symbolic names for feature's values. Values are stored as 
    97         indices referring to this list. Therefore, modifying this list instantly 
    98         changes (symbolic) names of values as they are printed out or referred to 
    99         by user. 
     100        indices referring to this list. Therefore, modifying this list  
     101        instantly changes (symbolic) names of values as they are printed out or 
     102        referred to by user. 
    100103     
    101104        .. note:: 
     
    106109            really recommendable. Also, do not add values to the list by 
    107110            calling its append or extend method: call the :obj:`addValue` 
    108             method instead described below. 
     111            method instead. 
    109112 
    110113            It is also assumed that this attribute is always defined (but can 
     
    114117 
    115118            Stores the base value for the feature as an index into `values`. 
    116             This can be, for instance a "normal" value, such as "no 
     119            This can be, for instance, a "normal" value, such as "no 
    117120            complications" as opposed to abnormal "low blood pressure". The 
    118121            base value is used by certain statistics, continuization etc. 
     
    122125    .. method:: addValue 
    123126     
    124             Adds a value to values. Always call this function instead of 
     127            Add a value to values. Always call this function instead of 
    125128            appending to values. 
    126129 
     
    128131.. class:: Continuous 
    129132 
     133    Bases: :class:`Feature` 
     134 
    130135    Descriptor for continuous features. 
    131136     
     
    133138     
    134139        The number of decimals used when the value is printed out, converted to 
    135         a string or saved to a file  
     140        a string or saved to a file. 
    136141     
    137142    .. attribute:: scientificFormat 
    138143     
    139         If ``True``, the value is printed in scientific format whenever it would 
    140         have more than 5 digits. In this case, `numberOfDecimals` is ignored. 
     144        If ``True``, the value is printed in scientific format whenever it 
     145        would have more than 5 digits. In this case, `numberOfDecimals` is 
     146        ignored. 
    141147 
    142148    .. attribute:: adjustDecimals 
     
    169175.. class:: String 
    170176 
     177    Bases: :class:`Feature` 
     178 
    171179    Descriptor for features that contains strings. No method can use them for  
    172     learning; some will complain and other will silently ignore them when the  
     180    learning; some will complain and other will silently ignore them when they  
    173181    encounter them. They can be, however, useful for meta-attributes; if  
    174     instance in dataset have unique id's, the most efficient way to store them  
     182    instances in dataset have unique id's, the most efficient way to store them  
    175183    is to read them as meta-attributes. In general, never use discrete  
    176184    attributes with many (say, more than 50) values. Such attributes are  
    177     probably not of any use for learning and should be stored as string attributes. 
     185    probably not of any use for learning and should be stored as string 
     186    attributes. 
    178187 
    179188    When converting strings into values and back, empty strings are treated  
    180189    differently than usual. For other types, an empty string can be used to 
    181     denote undefined values, while :obj:`StringVariable` will take empty string as 
    182     an empty string -- that is, except when loading or saving into file. Empty 
    183     strings in files are interpreted as undefined; to specify an empty string, 
    184     enclose the string into double quotes; these get removed when the string is 
    185     loaded. 
     190    denote undefined values, while :obj:`StringVariable` will take empty string 
     191    as an empty string -- that is, except when loading or saving into file. 
     192    Empty strings in files are interpreted as undefined; to specify an empty 
     193    string, enclose the string into double quotes; these get removed when the 
     194    string is loaded. 
    186195 
    187196.. _Python: 
    188197.. class:: Python 
     198 
     199    Bases: :class:`Feature` 
    189200 
    190201    Base class for descriptors defined in Python. It is fully functional, 
     
    199210Values of features are often computed from other features, such as in 
    200211discretization. The mechanism described below usually occurs behind the scenes, 
    201 so understanding it required only for implementing specific transformations. 
    202  
    203 Monk 1 is a well-known dataset with target concept ``y := a==b`` or ``e==1``. 
    204 It can help the learning algorithm if the four-valued  
    205 attribute ``e`` with a binary attribute having values `"1"` and `"not 1"`. The 
     212so understanding it is required only for implementing specific transformations. 
     213 
     214Monk 1 is a well-known dataset with target concept ``y := a==b or e==1``. 
     215It can help the learning algorithm if the four-valued attribute ``e`` is 
     216replaced with a binary attribute having values `"1"` and `"not 1"`. The 
    206217new feature will be computed from the old one on the fly.  
    207218 
     
    219230don't care about here. If the instance's ``e`` equals ``1``, the function  
    220231returns value ``1``, otherwise it returns ``not 1``. Both are returned as  
    221 values, not plain strings .  
     232values, not plain strings. 
    222233 
    223234In most circumstances, value of ``e2`` can be computed on the fly - we can  
     
    234245to a new :obj:`Orange.data.Table`:: 
    235246 
    236     newDomain = orange.Domain([data.domain["a"], data.domain["b"], e2, data.domain.classVar]) 
    237     newData = orange.ExampleTable(newDomain, data)  
     247    newDomain = Orange.data.Domain([data.domain["a"], data.domain["b"], e2, data.domain.classVar]) 
     248    newData = Orange.data.Table(newDomain, data)  
    238249 
    239250Automatic computation is useful when the data is split onto training and  
     
    249260    :lines: 24- 
    250261 
    251 Reuse of Descriptors 
     262Reuse of descriptors 
    252263-------------------- 
    253264 
     
    260271Orange checks whether an appropriate descriptor (with the same name and, in case 
    261272of discrete features, also values) already exists and reuses it. When new 
    262 descriptors are constructed by explicitly calling the above descriptors, this 
     273descriptors are constructed by explicitly calling the above constructors, this 
    263274always creates new descriptors and thus new features, although the feature with 
    264275the same name may already exist. 
     
    283294    does not matter. The formal rule is thus that the values are compatible if ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. 
    284295 
    285 orange.data.feature.MakeStatus.NoRecognizedValues (2) 
     296Orange.data.feature.Feature.MakeStatus.NoRecognizedValues (2) 
    286297    There is a matching feature, yet it has none of the values that the new 
    287298    feature will have (this is obviously possible only if the new attribute has 
     
    294305    some from the old. 
    295306 
    296 Orange.data.feature.MakeStatus.MissingValues (1) 
    297     there is a matching feature with some of the values that the new one  
     307Orange.data.feature.Feature.MakeStatus.MissingValues (1) 
     308    There is a matching feature with some of the values that the new one  
    298309    requires, but some values are missing. This situation is neither uncommon  
    299310    nor suspicious: in case of separate training and testing data sets there may 
    300311    be values which occur in one set but not in the other. 
    301312 
    302 Orange.data.feature.MakeStatus.OK (0) 
    303     There is a perfect metch which contains all the prescribed values in the 
     313Orange.data.feature.Feature.MakeStatus.OK (0) 
     314    There is a perfect match which contains all the prescribed values in the 
    304315    correct order. The existing attribute may have some extra values, though. 
    305316 
    306 Continuous attributes can obviously have only two statuses, ``NotFound`` or ``OK``. 
    307  
    308 When loading the data using :obj:``Orange.data.Table``, Orange takes the safest  
     317Continuous attributes can obviously have only two statuses, ``NotFound`` or 
     318``OK``. 
     319 
     320When loading the data using :obj:`Orange.data.Table`, Orange takes the safest  
    309321approach and, by default, reuses everything that is compatible, that is, up to  
    310322and including ``NoRecognizedValues``. Unintended reuse would be obvious from the 
    311323feature having too many values, which the user can notice and fix. More on that  
    312 in the page on `loading data`. 
     324in the page on `loading data`. !!TODO!! 
    313325 
    314326There are two functions for reusing the attributes instead of creating new ones. 
    315327 
    316 .. function:: Orange.data.feature.make(name, type, ordered_values, onordered_values[, createNewOn]) 
     328.. function:: Orange.data.feature.make(name, type, ordered_values, unordered_values[, createNewOn]) 
    317329 
    318330    Find and return an existing feature or create a new one if none existing 
     
    320332     
    321333    The optional `createOnNew` specifies the status at which a new feature is 
    322     created. The status must be at most ``Incompatible`` since incompatible (or non-existing) features cannot be reused. If it is set lower, for instance  
     334    created. The status must be at most ``Incompatible`` since incompatible (or 
     335    non-existing) features cannot be reused. If it is set lower, for instance  
    323336    to ``MissingValues``, a new feature is created even if there exists 
    324337    a feature which only misses same values. If set to ``OK``, the function 
     
    326339     
    327340    The function returns a tuple containing a feature descriptor and the 
    328     status of the best matching feature. So, if ``createOnNew`` is set to ``MissingValues``, and there exists a feature whose status is, say, 
     341    status of the best matching feature. So, if ``createOnNew`` is set to 
     342    ``MissingValues``, and there exists a feature whose status is, say, 
    329343    ``UnrecognizedValues``, a feature would be created, while the second  
    330344    element of the tuple would contain ``UnrecognizedValues``. If, on the other 
    331345    hand, there exists a feature which is perfectly OK, its descriptor is  
    332     returned and the returned status is <code>OK</code>. The function returns no  
     346    returned and the returned status is ``OK``. The function returns no  
    333347    indicator whether the returned feature is reused or not. This can be, 
    334348    however, read from the status code: if it is smaller than the specified 
     
    343357    :type type: Orange.data.feature.Type 
    344358    :param ordered_values: a list of ordered values 
    345     :param unordered_values: a list of values, for which the order does not matter 
    346     :param createNewOn: gives condition for constructing a new feature instead of using the new one 
     359    :param unordered_values: a list of values, for which the order does not 
     360        matter 
     361    :param createNewOn: gives condition for constructing a new feature instead 
     362        of using the new one 
     363     
     364    :return_type: a tuple (:class:`Orange.data.feature.Feature`, int) 
    347365     
    348366.. function:: Orange.data.feature.retrieve(name, type, ordered_values, onordered_values[, createNewOn]) 
     
    350368    Find and return an existing feature, or ``None`` if no match is found. 
    351369     
    352     :param name: Feature name 
    353     :param type: Feature type 
     370    :param name: feature name. 
     371    :param type: feature type. 
    354372    :type type: Orange.data.feature.Type 
    355373    :param ordered_values: a list of ordered values 
    356     :param unordered_values: a list of values, for which the order does not matter 
    357     :param createNewOn: gives condition for constructing a new feature instead of using the new one 
     374    :param unordered_values: a list of values, for which the order does not 
     375        matter 
     376    :param createNewOn: gives condition for constructing a new feature instead 
     377        of using the new one 
     378 
     379    :return_type: :class:`Orange.data.feature.Feature` 
    358380     
    359381.. _`feature-reuse.py`: code/feature-reuse.py 
    360382 
    361 These following examples (from `feature-reuse.py`_) give the shown results if executed only once (in a Python session) and in this order. 
    362  
    363 :py:func:`make` can be used for construction of new features.:: 
     383These following examples (from `feature-reuse.py`_) give the shown results if 
     384executed only once (in a Python session) and in this order. 
     385 
     386:func:`Orange.data.feature.make` can be used for construction of new features. :: 
    364387     
    365388    >>> v1, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, ["a", "b"]) 
     
    367390    4 <a, b> 
    368391 
    369 No surprises here: new feature is created and the status is ``NotFound``.:: 
    370  
    371     >>> v2, s = Orange.data.feature.make("a", orange.data.Type.Discrete, ["a"], ["c"]) 
     392No surprises here: new feature is created and the status is ``NotFound``. :: 
     393 
     394    >>> v2, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, ["a"], ["c"]) 
    372395    >>> print s, v2 is v1, v1.values 
    373396    1 True <a, b, c> 
     
    375398The status is 1 (``MissingValues``), yet the feature is reused (``v2 is v1``). 
    376399``v1`` gets a new value, ``"c"``, which was given as an unordered value. It does 
    377 not matter that the new variable does not need value ``b``.:: 
    378  
    379     >>> v3, s = Orange.data.feature.make("a", orange.data.Type.Discrete, ["a", "b", "c", "d"]) 
     400not matter that the new variable does not need value ``b``. :: 
     401 
     402    >>> v3, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, ["a", "b", "c", "d"]) 
    380403    >>> print s, v3 is v1, v1.values 
    381404    1 True <a, b, c, d> 
    382405 
    383 This is similar as before, except that the new value, <code>d</code> is not among the ordered values.:: 
    384  
    385     >>> v4, s = Orange.data.feature.make("a", orange.data.Type.Discrete, ["b"]) 
     406This is similar as before, except that the new value, ``d`` is not among the 
     407ordered values. :: 
     408 
     409    >>> v4, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, ["b"]) 
    386410    >>> print s, v4 is v1, v1.values, v4.values 
    387411    3, False, <b>, <a, b, c, d> 
     
    389413The new feature needs to have ``b`` as the first value, so it is incompatible  
    390414with the existing features. The status is thus 3 (``Incompatible``), the two  
    391 features are not equal and have different lists of values.:: 
    392  
    393     >>> v5, s = Orange.data.feature.make("a", orange.data.Type.Discrete, None, ["c", "a"]) 
     415features are not equal and have different lists of values. :: 
     416 
     417    >>> v5, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, None, ["c", "a"]) 
    394418    >>> print s, v5 is v1, v1.values, v5.values 
    395419    0 True <a, b, c, d> <a, b, c, d> 
    396420 
    397421The new feature has values ``c`` and ``a``, but does not 
    398 mind about the order, so the existing attribute is ``OK``.:: 
    399  
    400     >>> v6, s = Orange.data.feature.make("a", orange.data.Type.Discrete, None, ["e"]) "a"]) 
     422mind about the order, so the existing attribute is ``OK``. :: 
     423 
     424    >>> v6, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, None, ["e"]) "a"]) 
    401425    >>> print s, v6 is v1, v1.values, v6.values 
    402426    2 True <a, b, c, d, e> <a, b, c, d, e> 
    403427 
    404 The new feature has different values than the existing (status is 2, ``NoRecognizedValues``), but the existing is reused nevertheless. Note that we 
     428The new feature has different values than the existing (status is 2, 
     429``NoRecognizedValues``), but the existing is reused nevertheless. Note that we 
    405430gave ``e`` in the list of unordered values. If it was among the ordered, the 
    406 reuse would fail.:: 
    407  
    408     >>> v7, s = Orange.data.feature.make("a", orange.data.Type.Discrete, None, 
     431reuse would fail. :: 
     432 
     433    >>> v7, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, None, 
    409434            ["f"], Orange.data.feature.make.MakeStatus.NoRecognizedValues))) 
    410435    >>> print s, v7 is v1, v1.values, v7.values 
     
    415440the same as before:: 
    416441 
    417     >>> v8, s = Orange.data.feature.make("a", orange.data.Type.Discrete, 
    418             ["a", "b", "c", "d", "e"], None, Orange.data.feature.MakeStatus.OK) 
     442    >>> v8, s = Orange.data.feature.make("a", Orange.data.Type.Discrete, 
     443            ["a", "b", "c", "d", "e"], None, Orange.data.feature.Feature.MakeStatus.OK) 
    419444    >>> print s, v8 is v1, v1.values, v8.values 
    420445    0 False <a, b, c, d, e> <a, b, c, d, e> 
  • orange/doc/Orange/rst/code/feature-getValueFrom.py

    r7338 r7596  
    2828 
    2929# Convert the training set to a new domain 
    30 newDomain = orange.Domain([data.domain["a"], data.domain["b"], e2, data.domain.classVar]) 
    31 newTrain = orange.ExampleTable(newDomain, trainData) 
     30newDomain = Orange.data.Domain([data.domain["a"], data.domain["b"], e2, data.domain.classVar]) 
     31newTrain = Orange.data.Table(newDomain, trainData) 
    3232 
    3333# Construct a tree and classify unmodified instances 
    34 tree = orange.TreeLearner(newTrain) 
     34tree = Orange.core.TreeLearner(newTrain) 
    3535for ex in testData[:10]: 
    3636    print ex.getclass(), tree(ex) 
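
The documentation changed in feature.py states the formal rule for compatible value lists as ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. The sketch below restates that rule in plain Python, independent of Orange. The helper name `values_compatible` is hypothetical and is not part of the Orange API; it only illustrates the prefix comparison and does not compute a full `MakeStatus`.

```python
def values_compatible(existing_values, ordered_values):
    """Return True if the ordered values agree with the existing ones.

    Hypothetical helper (not an Orange function): the shorter list must be
    a prefix of the longer one, as in the rule quoted in the documentation.
    """
    existing = list(existing_values)
    ordered = list(ordered_values or [])  # None means "no ordered values"
    return existing[:len(ordered)] == ordered[:len(existing)]


# Existing <a, b>, new feature requires "a" first: compatible.
print(values_compatible(["a", "b"], ["a"]))

# Existing <a, b, c>, new feature extends the list with "d": compatible.
print(values_compatible(["a", "b", "c"], ["a", "b", "c", "d"]))

# Existing <a, b, c, d>, new feature requires "b" first: incompatible.
print(values_compatible(["a", "b", "c", "d"], ["b"]))
```

The three calls mirror the `v2`, `v3` and `v4` examples in the docstring: the first two describe features that can be reused, while the third corresponds to the ``Incompatible`` case.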