Changeset 10169:1545ac019203 in orange


Timestamp:
02/11/12 22:39:06 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Fixed titles in Orange.feature.descriptor

File:
1 edited

  • docs/reference/rst/Orange.feature.descriptor.rst

    r9936 r10169  
    11.. py:currentmodule:: Orange.feature 
    22 
    3 =========================== 
    4 Descriptor (``Descriptor``) 
    5 =========================== 
     3========== 
     4Descriptor 
     5========== 
    66 
    77Data instances in Orange can contain several types of variables: 
    88:ref:`discrete <discrete>`, :ref:`continuous <continuous>`, 
    9 :ref:`strings <string>`, and :ref:`Python <Python>` and types derived from it. 
    10 The latter represent arbitrary Python objects. 
    11 The names, types, values (where applicable), functions for computing the 
    12 variable value from values of other variables, and other properties of the 
    13 variables are stored in descriptor classes derived from :obj:`Descriptor`. 
     9:ref:`strings <string>`, and :ref:`Python <Python>` and types derived 
     10from it.  The latter represent arbitrary Python objects.  The names, 
     11types, values (where applicable), functions for computing the variable 
     12value from values of other variables, and other properties of the 
     13variables are stored in descriptor classes derived from 
     14:obj:`Descriptor`. 
    1415 
    1516Orange considers two variables (e.g. in two different data tables) the 
     
    1819 
    1920Descriptors can be constructed either by calling the corresponding 
    20 constructors or by a factory function :func:`make`, which either retrieves 
    21 an existing descriptor or constructs a new one. 
     21constructors or by a factory function :func:`make`, which either 
     22retrieves an existing descriptor or constructs a new one. 
    2223 
    2324.. class:: Descriptor 
     
    3738    .. attribute:: get_value_from 
    3839 
    39         A function (an instance of :obj:`~Orange.classification.Classifier`) 
    40         that computes a value of the variable from values of one or more 
    41         other variables. This is used, for instance, in discretization, 
     40        A function (an instance of 
     41        :obj:`~Orange.classification.Classifier`) that computes a 
     42        value of the variable from values of one or more other 
     43        variables. This is used, for instance, in discretization, 
    4244        which computes the value of a discretized variable from the 
    4345        original continuous variable. 
     
    4547    .. attribute:: ordered 
    4648 
    47         A flag telling whether the values of a discrete variable are ordered. At 
    48         the moment, no built-in method treats ordinal variables differently than 
    49         nominal ones. 
     49        A flag telling whether the values of a discrete variable are 
     50        ordered. At the moment, no built-in method treats ordinal 
     51        variables differently than nominal ones. 
    5052 
    5153    .. attribute:: random_generator 
     
    5658    .. attribute:: default_meta_id 
    5759 
    58         A proposed (but not guaranteed) meta id to be used for that variable. 
    59         For instance, when a tab-delimited file contains meta attributes and 
    60         the existing variables are reused, they will have this id 
    61         (instead of a new one assigned by :obj:`Orange.feature.Descriptor.new_meta_id()`). 
     60        A proposed (but not guaranteed) meta id to be used for that 
     61        variable.  For instance, when a tab-delimited file contains meta 
     62        attributes and the existing variables are reused, they will 
     63        have this id (instead of a new one assigned by 
     64        :obj:`Orange.feature.Descriptor.new_meta_id()`). 
    6265 
    6366    .. attribute:: attributes 
    6467 
    65         A dictionary which allows the user to store additional information 
    66         about the variable. All values should be strings. See the section 
    67         about :ref:`storing additional information <attributes>`. 
     68        A dictionary which allows the user to store additional 
     69        information about the variable. All values should be 
     70        strings. See the section about :ref:`storing additional 
     71        information <attributes>`. 
    6872 
    6973    .. method:: __call__(obj) 
    7074 
    71            Convert a string, number, or other suitable object into a variable 
    72            value. 
     75           Convert a string, number, or other suitable object into a 
     76           variable value. 
    7377 
    7478           :param obj: An object to be converted into a variable value 
     
    8488    .. method:: compute_value(inst) 
    8589 
    86            Compute the value of the variable given the instance by calling 
    87            :obj:`~Descriptor.get_value_from` through a mechanism that 
    88            prevents infinite recursive calls. 
     90           Compute the value of the variable given the instance by 
     91           calling :obj:`~Descriptor.get_value_from` through a 
     92           mechanism that prevents infinite recursive calls. 
    8993 
    9094           :rtype: :class:`Orange.data.Value` 
    9195 
    9296 
    93 ``Discrete`` 
    94 ------------ 
     97Discrete variables 
     98------------------ 
    9599 
    96100.. _discrete: 
     
    133137            this function instead of appending to ``values``. 
    134138 
    135 ``Continuous`` 
    136 -------------- 
     139Continuous variables 
     140-------------------- 
    137141 
    138142.. _continuous: 
     
    184188        The range used for :obj:`randomvalue`. 
    185189 
    186 ``String`` 
    187 ---------- 
     190String variables 
     191---------------- 
    188192 
    189193.. _String: 
     
    210214    string is loaded. 
    211215 
    212 ``Python`` 
    213 ---------- 
     216Python objects as variables 
     217--------------------------- 
    214218 
    215219.. _Python: 
     
    271275.. data:: Descriptor.MakeStatus.Incompatible (3) 
    272276 
    273     There are variables with matching name and type, but their 
    274     values are incompatible with the prescribed ordered values. For example, 
    275     if the existing variable already has values ["a", "b"] and the new one 
    276     wants ["b", "a"], the old variable cannot be reused. The existing list can, 
    277     however, be extended with the new values, so searching for ["a", "b", "c"] would 
    278     succeed. Likewise a search for ["a"] would be successful, since the extra existing value 
    279     does not matter. The formal rule is thus that the values are compatible iff ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. 
     277    There are variables with matching name and type, but their values 
     278    are incompatible with the prescribed ordered values. For example, 
     279    if the existing variable already has values ["a", "b"] and the new 
     280    one wants ["b", "a"], the old variable cannot be reused. The 
     281    existing list can, however, be extended with the new values, so 
     282    searching for ["a", "b", "c"] would succeed. Likewise a search for 
     283    ["a"] would be successful, since the extra existing value does not 
     284    matter. The formal rule is thus that the values are compatible iff 
     285    ``existing_values[:len(ordered_values)] == 
     286    ordered_values[:len(existing_values)]``. 
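
The formal rule above can be tried out directly in plain Python. The
following is a minimal sketch; the function name ``values_compatible``
is hypothetical and is not part of Orange, it only restates the slice
comparison given in the text.

```python
def values_compatible(existing_values, ordered_values):
    """Check the prefix rule: the shorter list must be a prefix of the longer one."""
    return (existing_values[:len(ordered_values)]
            == ordered_values[:len(existing_values)])


existing = ["a", "b"]

# Different order in the shared prefix: the existing variable cannot be reused.
print(values_compatible(existing, ["b", "a"]))       # False
# The existing list can be extended with "c".
print(values_compatible(existing, ["a", "b", "c"]))  # True
# The extra existing value "b" does not matter.
print(values_compatible(existing, ["a"]))            # True
```
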
    280287 
    281288.. data:: Descriptor.MakeStatus.NoRecognizedValues (2) 
    282289 
    283     There is a matching variable, yet it has none of the values that the new 
    284     variable will have (this is obviously possible only if the new variable has 
    285     no prescribed ordered values). For instance, we search for a variable 
    286     "sex" with values "male" and "female", while there is a variable of the same 
    287     name with values "M" and "F" (or, well, "no" and "yes" :). Reuse of this 
    288     variable is possible, though this should probably be a new variable since it 
    289     obviously comes from a different data set. If we do decide to reuse the variable, the 
    290     old variable will get some unneeded new values and the new one will inherit 
    291     some from the old. 
     290    There is a matching variable, yet it has none of the values that 
     291    the new variable will have (this is obviously possible only if the 
     292    new variable has no prescribed ordered values). For instance, we 
     293    search for a variable "sex" with values "male" and "female", while 
     294    there is a variable of the same name with values "M" and "F" (or, 
     295    well, "no" and "yes" :). Reuse of this variable is possible, 
     296    though this should probably be a new variable since it obviously 
     297    comes from a different data set. If we do decide to reuse the 
     298    variable, the old variable will get some unneeded new values and 
     299    the new one will inherit some from the old. 
    292300 
    293301.. data:: Descriptor.MakeStatus.MissingValues (1) 
    294302 
    295     There is a matching variable with some of the values that the new one 
    296     requires, but some values are missing. This situation is neither uncommon 
    297     nor suspicious: in case of separate training and testing data sets there may 
    298     be values which occur in one set but not in the other. 
     303    There is a matching variable with some of the values that the new 
     304    one requires, but some values are missing. This situation is 
     305    neither uncommon nor suspicious: in case of separate training and 
     306    testing data sets there may be values which occur in one set but 
     307    not in the other. 
    299308 
    300309.. data:: Descriptor.MakeStatus.OK (0) 
    301310 
    302     There is a perfect match which contains all the prescribed values in the 
    303     correct order. The existing variable may have some extra values, though. 
     311    There is a perfect match which contains all the prescribed values 
     312    in the correct order. The existing variable may have some extra 
     313    values, though. 
    304314 
    305315Continuous variables can obviously have only two statuses, 
    306316:obj:`~Descriptor.MakeStatus.NotFound` or :obj:`~Descriptor.MakeStatus.OK`. 
    307317 
    308 When loading the data using :obj:`Orange.data.Table`, Orange takes the safest 
    309 approach and, by default, reuses everything that is compatible up to 
    310 and including :obj:`~Descriptor.MakeStatus.NoRecognizedValues`. Unintended reuse would be obvious from the 
    311 variable having too many values, which the user can notice and fix. More on that 
    312 in the page on :doc:`Orange.data.formats`. 
    313  
    314 There are two functions for reusing the variables instead of creating new ones. 
     318When loading the data using :obj:`Orange.data.Table`, Orange takes the 
     319safest approach and, by default, reuses everything that is compatible 
     320up to and including 
     321:obj:`~Descriptor.MakeStatus.NoRecognizedValues`. Unintended reuse 
     322would be obvious from the variable having too many values, which the 
     323user can notice and fix. More on that in the page on 
     324:doc:`Orange.data.formats`. 
     325 
     326There are two functions for reusing the variables instead of creating 
     327new ones. 
    315328 
    316329.. function:: Descriptor.make(name, type, ordered_values, unordered_values[, create_new_on]) 
    317330 
    318     Find and return an existing variable or create a new one if none of the existing 
    319     variables matches the given name, type and values. 
    320  
    321     The optional `create_new_on` specifies the status at which a new variable is 
    322     created. The status must be at most :obj:`~Descriptor.MakeStatus.Incompatible` since incompatible (or 
    323     non-existing) variables cannot be reused. If it is set lower, for instance 
    324     to :obj:`~Descriptor.MakeStatus.MissingValues`, a new variable is created even if there exists 
    325     a variable which is only missing the same values. If set to :obj:`~Descriptor.MakeStatus.OK`, the function 
    326     always creates a new variable. 
    327  
    328     The function returns a tuple containing a variable descriptor and the 
    329     status of the best matching variable. So, if ``create_new_on`` is set to 
    330     :obj:`~Descriptor.MakeStatus.MissingValues`, and there exists a variable whose status is, say, 
    331     :obj:`~Descriptor.MakeStatus.NoRecognizedValues`, a variable would be created, while the second 
    332     element of the tuple would contain :obj:`~Descriptor.MakeStatus.NoRecognizedValues`. If, on the other 
    333     hand, there exists a variable which is perfectly OK, its descriptor is 
    334     returned and the returned status is :obj:`~Descriptor.MakeStatus.OK`. The function returns no 
    335     indicator whether the returned variable is reused or not. This can be, 
    336     however, read from the status code: if it is smaller than the specified 
    337     ``create_new_on``, the variable is reused, otherwise a new descriptor has been constructed. 
    338  
    339     The exception to the rule is when ``create_new_on`` is OK. In this case, the 
    340     function does not search through the existing variables and cannot know the 
    341     status, so the returned status in this case is always :obj:`~Descriptor.MakeStatus.OK`. 
     331    Find and return an existing variable or create a new one if none 
     332    of the existing variables matches the given name, type and values. 
     333 
     334    The optional `create_new_on` specifies the status at which a new 
     335    variable is created. The status must be at most 
     336    :obj:`~Descriptor.MakeStatus.Incompatible` since incompatible (or 
     337    non-existing) variables cannot be reused. If it is set lower, for 
     338    instance to :obj:`~Descriptor.MakeStatus.MissingValues`, a new 
     339    variable is created even if there exists a variable which is only 
     340    missing the same values. If set to 
     341    :obj:`~Descriptor.MakeStatus.OK`, the function always creates a 
     342    new variable. 
     343 
     344    The function returns a tuple containing a variable descriptor and 
     345    the status of the best matching variable. So, if ``create_new_on`` 
     346    is set to :obj:`~Descriptor.MakeStatus.MissingValues`, and there 
     347    exists a variable whose status is, say, 
     348    :obj:`~Descriptor.MakeStatus.NoRecognizedValues`, a variable would 
     349    be created, while the second element of the tuple would contain 
     350    :obj:`~Descriptor.MakeStatus.NoRecognizedValues`. If, on the other 
     351    hand, there exists a variable which is perfectly OK, its 
     352    descriptor is returned and the returned status is 
     353    :obj:`~Descriptor.MakeStatus.OK`. The function returns no 
     354    indicator whether the returned variable is reused or not. This can 
     355    be, however, read from the status code: if it is smaller than the 
     356    specified ``create_new_on``, the variable is reused, otherwise a 
     357    new descriptor has been constructed. 
     358 
     359    The exception to the rule is when ``create_new_on`` is OK. In this 
     360    case, the function does not search through the existing variables 
     361    and cannot know the status, so the returned status in this case is 
     362    always :obj:`~Descriptor.MakeStatus.OK`. 
    342363 
    343364    :param name: Descriptor name 
     
    376397    NotFound <a, b> 
    377398 
    378 A new variable was created and the status is :obj:`~Descriptor.MakeStatus.NotFound`. :: 
     399A new variable was created and the status is 
     400:obj:`~Descriptor.MakeStatus.NotFound`. :: 
    379401 
    380402    >>> v2, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["a"], ["c"]) 
     
    382404    MissingValues True <a, b, c> 
    383405 
    384 The status is :obj:`~Descriptor.MakeStatus.MissingValues`, 
    385 yet the variable is reused (``v2 is v1``). ``v1`` gets a new value, 
    386 ``"c"``, which was given as an unordered value. It does 
    387 not matter that the new variable does not need the value ``b``. :: 
     406The status is :obj:`~Descriptor.MakeStatus.MissingValues`, yet the 
     407variable is reused (``v2 is v1``). ``v1`` gets a new value, ``"c"``, 
     408which was given as an unordered value. It does not matter that the new 
     409variable does not need the value ``b``. :: 
    388410 
    389411    >>> v3, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["a", "b", "c", "d"]) 
     
    398420    Incompatible, False, <b>, <a, b, c, d> 
    399421 
    400 The new variable needs to have ``b`` as the first value, so it is incompatible 
    401 with the existing variables. The status is 
    402 :obj:`~Descriptor.MakeStatus.Incompatible` and 
    403 a new variable is created; the two variables are not equal and have 
    404 different lists of values. :: 
     422The new variable needs to have ``b`` as the first value, so it is 
     423incompatible with the existing variables. The status is 
     424:obj:`~Descriptor.MakeStatus.Incompatible` and a new variable is 
     425created; the two variables are not equal and have different lists of 
     426values. :: 
    405427 
    406428    >>> v5, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, ["c", "a"]) 
     
    408430    OK True <a, b, c, d> <a, b, c, d> 
    409431 
    410 The new variable has values ``c`` and ``a``, but the order is not important, 
    411 so the existing attribute is :obj:`~Descriptor.MakeStatus.OK`. :: 
     432The new variable has values ``c`` and ``a``, but the order is not 
     433important, so the existing attribute is 
     434:obj:`~Descriptor.MakeStatus.OK`. :: 
    412435 
    413436    >>> v6, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, ["e"]) 
     
    415438    NoRecognizedValues True <a, b, c, d, e> <a, b, c, d, e> 
    416439 
    417 The new variable has different values than the existing variable (status 
    418 is :obj:`~Descriptor.MakeStatus.NoRecognizedValues`), 
    419 but the existing one is nonetheless reused. Note that we 
    420 gave ``e`` in the list of unordered values. If it was among the ordered, the 
    421 reuse would fail. :: 
     440The new variable has different values than the existing variable 
     441(status is :obj:`~Descriptor.MakeStatus.NoRecognizedValues`), but the 
     442existing one is nonetheless reused. Note that we gave ``e`` in the 
     443list of unordered values. If it was among the ordered, the reuse would 
     444fail. :: 
    422445 
    423446    >>> v7, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, 
     
    426449    Incompatible False <a, b, c, d, e> <f> 
    427450 
    428 This is the same as before, except that we prohibited reuse when there are no 
    429 recognized values. Hence a new variable is created, though the returned status is 
    430 the same as before:: 
     451This is the same as before, except that we prohibited reuse when there 
     452are no recognized values. Hence a new variable is created, though the 
     453returned status is the same as before:: 
    431454 
    432455    >>> v8, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, 
     
    435458    OK False <a, b, c, d, e> <a, b, c, d, e> 
    436459 
    437 Finally, this is a perfect match, but any reuse is prohibited, so a new 
    438 variable is created. 
     460Finally, this is a perfect match, but any reuse is prohibited, so a 
     461new variable is created. 
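
The statuses in the walkthrough above can be approximated with a small
plain-Python model. This is only a sketch of the rules as described in
this section, not Orange's implementation: the names ``match_status``
and ``is_reused`` are hypothetical, the numeric value of ``NotFound``
(4) is an assumption, and so is the default ``create_new_on`` of
``Incompatible``, which is inferred from the examples.

```python
# Status codes as listed in this section; NOT_FOUND = 4 is an assumption.
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, INCOMPATIBLE, NOT_FOUND = range(5)


def match_status(existing_values, ordered_values=None, unordered_values=None):
    """Classify how well an existing discrete variable matches a request."""
    ordered = list(ordered_values or [])
    unordered = list(unordered_values or [])
    # Prefix rule for the prescribed ordered values.
    if existing_values[:len(ordered)] != ordered[:len(existing_values)]:
        return INCOMPATIBLE
    wanted = ordered + unordered
    recognized = [v for v in wanted if v in existing_values]
    if wanted and not recognized:
        return NO_RECOGNIZED_VALUES
    if len(recognized) < len(wanted):
        return MISSING_VALUES
    return OK


def is_reused(status, create_new_on=INCOMPATIBLE):
    """A variable is reused only if its status is below create_new_on."""
    if create_new_on == OK:
        # No search is performed, so nothing can be reused.
        return False
    return status < create_new_on


# Mirrors the v2 example: "c" is missing from the existing values.
print(match_status(["a", "b"], ["a"], ["c"]) == MISSING_VALUES)          # True
# Mirrors the v4 example: "b" cannot become the first value.
print(match_status(["a", "b", "c", "d"], ["b"]) == INCOMPATIBLE)         # True
# Mirrors the v7 example: reuse is prohibited at NoRecognizedValues.
print(is_reused(NO_RECOGNIZED_VALUES, NO_RECOGNIZED_VALUES))             # False
```

Reading reuse from the status alone, as ``is_reused`` does, follows the
rule stated for :func:`make`: the function returns no separate flag, so
the comparison with ``create_new_on`` is the only indicator.
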
    439462 
    440463 
     
    443466--------------------------------------- 
    444467 
    445 Values of variables are often computed from other variables, such as in 
    446 discretization. The mechanism described below usually functions behind the scenes, 
    447 so understanding it is required only for implementing specific transformations. 
     468Values of variables are often computed from other variables, for 
     469instance in discretization. The mechanism described below usually functions behind 
     470the scenes, so understanding it is required only for implementing 
     471specific transformations. 
    448472 
    449473Monk 1 is a well-known dataset with target concept ``y := a==b or e==1``. 
     
    455479    :lines: 7-17 
    456480 
    457 The new variable is named ``e2``; we define it with a descriptor of type 
    458 :obj:`Discrete`, with appropriate name and values ``"not 1"`` and ``1`` (we 
    459 chose this order so that the ``not 1``'s index is ``0``, which can be, if 
    460 needed, interpreted as ``False``). Finally, we tell e2 to use 
    461 ``checkE`` to compute its value when needed, by assigning ``checkE`` to 
    462 ``e2.get_value_from``. 
    463  
    464 ``checkE`` is a function that is passed an instance and another argument we 
    465 do not care about here. If the instance's ``e`` equals ``1``, the function 
    466 returns value ``1``, otherwise it returns ``not 1``. Both are returned as 
    467 values, not plain strings. 
    468  
    469 In most circumstances the value of ``e2`` can be computed on the fly - we can 
    470 pretend that the variable exists in the data, although it does not (but 
    471 can be computed from it). For instance, we can compute the information gain of 
    472 variable ``e2`` or its distribution without actually constructing data containing 
    473 the new variable. 
     481The new variable is named ``e2``; we define it with a descriptor of 
     482type :obj:`Discrete`, with appropriate name and values ``"not 1"`` and 
     483``1`` (we chose this order so that the ``not 1``'s index is ``0``, 
     484which can be, if needed, interpreted as ``False``). Finally, we tell 
     485e2 to use ``checkE`` to compute its value when needed, by assigning 
     486``checkE`` to ``e2.get_value_from``. 
     487 
     488``checkE`` is a function that is passed an instance and another 
     489argument we do not care about here. If the instance's ``e`` equals 
     490``1``, the function returns value ``1``, otherwise it returns ``not 
     4911``. Both are returned as values, not plain strings. 
     492 
     493In most circumstances the value of ``e2`` can be computed on the fly - 
     494we can pretend that the variable exists in the data, although it does 
     495not (but can be computed from it). For instance, we can compute the 
     496information gain of variable ``e2`` or its distribution without 
     497actually constructing data containing the new variable. 
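
The idea of computing ``e2`` on the fly can be illustrated without
Orange. In the sketch below everything is hypothetical: instances are
plain dictionaries, ``DerivedVariable`` and ``value_of`` are stand-ins
for the descriptor and for Orange's lookup mechanism, and ``check_e``
returns plain strings rather than :class:`Orange.data.Value` objects.

```python
from collections import Counter


def check_e(instance, return_what=None):
    """Return "1" if the instance's e equals "1", otherwise "not 1"."""
    return "1" if instance["e"] == "1" else "not 1"


class DerivedVariable:
    """Hypothetical stand-in for a descriptor with get_value_from."""

    def __init__(self, name, values, get_value_from=None):
        self.name = name
        self.values = values
        self.get_value_from = get_value_from


def value_of(instance, variable):
    """Use the stored value if present, otherwise compute it on the fly."""
    if variable.name in instance:
        return instance[variable.name]
    return variable.get_value_from(instance, None)


# "not 1" comes first so that its index is 0.
e2 = DerivedVariable("e2", ["not 1", "1"], get_value_from=check_e)

data = [
    {"a": "1", "b": "1", "e": "1"},
    {"a": "1", "b": "2", "e": "3"},
    {"a": "2", "b": "2", "e": "1"},
]

# The distribution of e2 is obtained although no instance stores e2.
distribution = Counter(value_of(inst, e2) for inst in data)
print(distribution["1"], distribution["not 1"])  # 2 1
```
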
    474498 
    475499.. literalinclude:: code/variable-get_value_from.py 
    476500    :lines: 19-22 
    477501 
    478 There are methods which cannot compute values on the fly because it would be 
    479 too complex or time consuming. In such cases, the data need to be converted 
    480 to a new :obj:`Orange.data.Table`:: 
     502There are methods which cannot compute values on the fly because it 
     503would be too complex or time consuming. In such cases, the data need 
     504to be converted to a new :obj:`Orange.data.Table`:: 
    481505 
    482506    new_domain = Orange.data.Domain([data.domain["a"], data.domain["b"], e2, data.domain.class_var]) 
    483507    new_data = Orange.data.Table(new_domain, data) 
    484508 
    485 Automatic computation is useful when the data is split into training and 
    486 testing examples. Training instances can be modified by adding, removing 
    487 and transforming variables (in a typical setup, continuous variables 
    488 are discretized prior to learning, therefore the original variables are 
    489 replaced by new ones). Test instances, on the other hand, are left as they 
    490 are. When they are classified, the classifier automatically converts the 
    491 testing instances into the new domain, which includes recomputation of 
    492 transformed variables. 
     509Automatic computation is useful when the data is split into training 
     510and testing examples. Training instances can be modified by adding, 
     511removing and transforming variables (in a typical setup, continuous 
     512variables are discretized prior to learning, therefore the original 
     513variables are replaced by new ones). Test instances, on the other 
     514hand, are left as they are. When they are classified, the classifier 
     515automatically converts the testing instances into the new domain, 
     516which includes recomputation of transformed variables. 
    493517 
    494518.. literalinclude:: code/variable-get_value_from.py 