Changeset 7927:2c3b04baa4ee in orange


Timestamp:
05/22/11 11:29:14
Author:
janezd <janez.demsar@…>
Branch:
default
Convert:
03de47fd5c3cc4c05afe0b70abda36b053e8b3a5
Message:
 
File:
1 edited

  • orange/Orange/data/variable.py

    r7891 r7927  
    1111-------------------- 
    1212 
    13 Variable descriptors can be constructed directly, using constructors and passing 
    14 attributes as parameters, or by a factory function 
    15 :func:`Orange.data.variable.make`, which either retrieves an existing descriptor 
    16 or constructs a new one. 
     13Variable descriptors can be constructed either directly, using  
     14constructors and passing attributes as parameters, or by a  
     15factory function :func:`Orange.data.variable.make`, which either  
     16retrieves an existing descriptor or constructs a new one. 
    1717 
    1818.. class:: Variable 
     
    2525        variables are considered the same only if they have the same descriptor 
    2626        (e.g. even multiple variables in the same table can have the same name). 
    27         This should however be avoided since it may result in unpredictable 
    28         behaviour. 
     27        This should, however, be avoided since it may result in unpredictable 
     28        behavior. 
    2929     
    3030    .. attribute:: var_type 
     
    4444     
    4545        A flag telling whether the values of a discrete variable are ordered. At 
    46         the moment, no builtin method treats ordinal variables differently than 
    47         nominal. 
     46        the moment, no built-in method treats ordinal variables differently than 
     47        nominal ones. 
    4848     
    4949    .. attribute:: distributed 
    5050     
    51         A flag telling whether the values of this variables are distributions. 
    52         As for flag ordered, no methods treat such variables in any special 
     51        A flag telling whether the values of the variables are distributions. 
     52        As for the flag ordered, no methods treat such variables in any special 
    5353        manner. 
    5454     
     
    7373    .. method:: __call__(obj) 
    7474     
    75            Convert a string, number or other suitable object into a variable 
     75           Convert a string, number, or other suitable object into a variable 
    7676           value. 
    7777            
     
    8282    .. method:: randomvalue() 
    8383 
    84            Return a random value of the variable. 
     84           Return a random value for the variable. 
    8585        
    8686           :rtype: :class:`Orange.data.Value` 
     
    103103    .. attribute:: values 
    104104     
    105         A list with symbolic names for variable's values. Values are stored as 
     105        A list with symbolic names for variables' values. Values are stored as 
    106106        indices referring to this list. Therefore, modifying this list  
    107         instantly changes (symbolic) names of values as they are printed out or 
     107        instantly changes the (symbolic) names of values as they are printed out or 
    108108        referred to by user. 
    109109     
     
    111111         
    112112            The size of the list is also used to indicate the number of 
    113             possible values for this variable. Changing the size, especially 
    114             shrinking the list can have disastrous effects and is therefore not 
    115             really recommendable. Also, do not add values to the list by 
     113            possible values for this variable. Changing the size - especially 
     114            shrinking the list - can have disastrous effects and is therefore not 
     115            really recommended. Also, do not add values to the list by 
    116116            calling its append or extend method: call the :obj:`add_value` 
    117117            method instead. 
     
    122122    .. attribute:: base_value 
    123123 
    124             Stores the base value for the variable as an index into `values`. 
     124            Stores the base value for the variable as an index in `values`. 
    125125            This can be, for instance, a "normal" value, such as "no 
    126126            complications" as opposed to abnormal "low blood pressure". The 
    127127            base value is used by certain statistics, continuization etc. 
    128             potentially, learning algorithms. Default is -1 and means that 
     128            potentially, learning algorithms. The default is -1 which means that 
    129129            there is no base value. 
    130130     
     
    156156        Tells Orange to monitor the number of decimals when the value is 
    157157        converted from a string (when the values are read from a file or 
    158         converted by, e.g. ``inst[0]="3.14"``). The value of ``0`` means that 
    159         the number of decimals should not be adjusted, while 1 and 2 mean that 
    160         adjustments are on, with 2 denoting that no values have been converted 
    161         yet. 
    162  
    163         By default, adjustment of number of decimals goes as follows. 
     158        converted by, e.g. ``inst[0]="3.14"``):  
     159        0: the number of decimals is not adjusted automatically; 
     160        1: the number of decimals is (and has already) been adjusted; 
     161        2: automatic adjustment is enabled, but no values have been converted yet. 
     162 
     163        By default, adjustment of the number of decimals goes as follows: 
    164164     
    165165        If the variable was constructed when data was read from a file, it will  
     
    170170     
    171171        If the variable is created in a script, it will have, by default, three 
    172         decimals places. This can be changed either by setting the value 
     172        decimal places. This can be changed either by setting the value 
    173173        from a string (e.g. ``inst[0]="3.14"``, but not ``inst[0]=3.14``) or by 
    174174        manually setting the `number_of_decimals`. 
     
    183183    Bases: :class:`Variable` 
    184184 
    185     Descriptor for variables that contains strings. No method can use them for  
    186     learning; some will complain and other will silently ignore them when they  
     185    Descriptor for variables that contain strings. No method can use them for  
     186    learning; some will complain and others will silently ignore them when they  
    187187    encounter them. They can be, however, useful for meta-attributes; if  
    188     instances in dataset have unique id's, the most efficient way to store them  
     188    instances in a dataset have unique IDs, the most efficient way to store them  
    189189    is to read them as meta-attributes. In general, never use discrete  
    190190    attributes with many (say, more than 50) values. Such attributes are  
     
    194194    When converting strings into values and back, empty strings are treated  
    195195    differently than usual. For other types, an empty string can be used to 
    196     denote undefined values, while :obj:`StringVariable` will take empty string 
    197     as an empty string -- that is, except when loading or saving into file. 
     196    denote undefined values, while :obj:`StringVariable` will take empty strings 
     197    as empty strings -- except when loading or saving into file. 
    198198    Empty strings in files are interpreted as undefined; to specify an empty 
    199     string, enclose the string into double quotes; these get removed when the 
     199    string, enclose the string in double quotes; these are removed when the 
    200200    string is loaded. 
    201201 
     
    205205    Bases: :class:`Variable` 
    206206 
    207     Base class for descriptors defined in Python. It is fully functional, 
     207    Base class for descriptors defined in Python. It is fully functional 
    208208    and can be used as a descriptor for attributes that contain arbitrary Python 
    209209    values. Since this is an advanced topic, PythonVariables are described on a  
     
    215215 
    216216Values of variables are often computed from other variables, such as in 
    217 discretization. The mechanism described below usually occurs behind the scenes, 
     217discretization. The mechanism described below usually functions behind the scenes, 
    218218so understanding it is required only for implementing specific transformations. 
    219219 
     
    226226    :lines: 7-17 
    227227     
    228 The new variable is named ``e2``; we define it by descriptor of type  
     228The new variable is named ``e2``; we define it with a descriptor of type  
    229229:obj:`Discrete`, with appropriate name and values ``"not 1"`` and ``1`` (we  
    230230chose this order so that the ``not 1``'s index is ``0``, which can be, if  
     
    234234 
    235235``checkE`` is a function that is passed an instance and another argument we  
    236 don't care about here. If the instance's ``e`` equals ``1``, the function  
     236do not care about here. If the instance's ``e`` equals ``1``, the function  
    237237returns value ``1``, otherwise it returns ``not 1``. Both are returned as  
    238238values, not plain strings. 
    239239 
    240 In most circumstances, value of ``e2`` can be computed on the fly - we can  
    241 pretend that the variable exists in the data, although it doesn't (but  
     240In most circumstances the value of ``e2`` can be computed on the fly - we can  
     241pretend that the variable exists in the data, although it does not (but  
    242242can be computed from it). For instance, we can compute the information gain of 
    243243variable ``e2`` or its distribution without actually constructing data containing 
     
    254254    new_data = Orange.data.Table(new_domain, data)  
    255255 
    256 Automatic computation is useful when the data is split onto training and  
    257 testing examples. Training instanced can be modified by adding, removing  
     256Automatic computation is useful when the data is split into training and  
     257testing examples. Training instances can be modified by adding, removing  
    258258and transforming variables (in a typical setup, continuous variables  
    259259are discretized prior to learning, therefore the original variables are  
    260 replaced by new ones), while test instances are left as they  
     260replaced by new ones). Test instances, on the other hand, are left as they  
    261261are. When they are classified, the classifier automatically converts the  
    262262testing instances into the new domain, which includes recomputation of  
     
    271271----------------------------- 
    272272 
    273 All variables have a field :obj:`~Variable.attributes`. It is a dictionary 
     273All variables have a field :obj:`~Variable.attributes`, a dictionary 
    274274which can contain strings. Although the current implementation allows all 
    275275types of value we strongly advise to use only strings. An example: 
     
    277277.. literalinclude:: code/attributes.py 
    278278 
    279 The attributes can only be saved to a .tab file. They are listed in the 
     279These attributes can only be saved to a .tab file. They are listed in the 
    280280third line in <name>=<value> format, after other attribute specifications 
    281281(such as "meta" or "class"), and are separated by spaces.  
     
    285285 
    286286There are situations when variable descriptors need to be reused. Typically, the  
    287 user loads some training examples, trains a classifier and then loads a separate 
     287user loads some training examples, trains a classifier, and then loads a separate 
    288288test set. For the classifier to recognize the variables in the second data set, 
    289289the descriptors, not just the names, need to be the same.  
    290290 
    291 When constructing new descriptors for data read from a file or at unpickling, 
     291When constructing new descriptors for data read from a file or during unpickling, 
    292292Orange checks whether an appropriate descriptor (with the same name and, in case 
    293293of discrete variables, also values) already exists and reuses it. When new 
     
    296296the same name may already exist. 
    297297 
    298 The search for existing variable is based on four attributes: the variable's name, 
    299 type, ordered values and unordered values. As for the latter two, the values can  
     298The search for an existing variable is based on four attributes: the variable's name, 
     299type, ordered values, and unordered values. As for the latter two, the values can  
    300300be explicitly ordered by the user, e.g. in the second line of the tab-delimited  
    301 file, for instance to order sizes as small-medium-big. 
     301file. For instance, sizes can be ordered as small, medium, or big. 
    302302 
    303303The search for existing variables can end with one of the following statuses. 
     
    307307 
    308308Orange.data.variable.Variable.MakeStatus.Incompatible (3) 
    309     There is (or are) variables with matching name and type, but their 
     309    There are variables with matching name and type, but their 
    310310    values are incompatible with the prescribed ordered values. For example, 
    311311    if the existing variable already has values ["a", "b"] and the new one 
    312312    wants ["b", "a"], the old variable cannot be reused. The existing list can, 
    313     however be appended the new values, so searching for ["a", "b", "c"] would 
    314     succeed. So will also the search for ["a"], since the extra existing value 
    315     does not matter. The formal rule is thus that the values are compatible if ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. 
     313    however be appended with the new values, so searching for ["a", "b", "c"] would 
     314    succeed. Likewise a search for ["a"] would be successful, since the extra existing value 
     315    does not matter. The formal rule is thus that the values are compatible iff ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. 
    316316 
    317317Orange.data.variable.Variable.MakeStatus.NoRecognizedValues (2) 
     
    322322    name with values "M" and "F" (or, well, "no" and "yes" :). Reuse of this  
    323323    variable is possible, though this should probably be a new variable since it  
    324     obviously comes from a different data set. If we do decide for reuse, the  
     324    obviously comes from a different data set. If we do decide to reuse the variable, the  
    325325    old variable will get some unneeded new values and the new one will inherit  
    326326    some from the old. 
     
    340340 
    341341When loading the data using :obj:`Orange.data.Table`, Orange takes the safest  
    342 approach and, by default, reuses everything that is compatible, that is, up to  
     342approach and, by default, reuses everything that is compatible up to  
    343343and including ``NoRecognizedValues``. Unintended reuse would be obvious from the 
    344344variable having too many values, which the user can notice and fix. More on that  
     
    347347There are two functions for reusing the attributes instead of creating new ones. 
    348348 
    349 .. function:: Orange.data.variable.make(name, type, ordered_values, unordered_values[, create_new_on]) 
    350  
    351     Find and return an existing variable or create a new one if none existing 
     349.. function:: Orange.data.variable.make(name, type, ordered_values, unordered_values[, createNewOn]) 
     350 
     351    Find and return an existing variable or create a new one if none of the existing 
    352352    variables matches the given name, type and values. 
    353353     
    354     The optional `create_new_on` specifies the status at which a new variable is 
     354    The optional `create_on_new` specifies the status at which a new variable is 
    355355    created. The status must be at most ``Incompatible`` since incompatible (or 
    356356    non-existing) variables cannot be reused. If it is set lower, for instance  
    357357    to ``MissingValues``, a new variable is created even if there exists 
    358     a variable which only misses same values. If set to ``OK``, the function 
     358    a variable which is only missing the same values. If set to ``OK``, the function 
    359359    always creates a new variable. 
    360360     
    361361    The function returns a tuple containing a variable descriptor and the 
    362     status of the best matching variable. So, if ``create_new_on`` is set to 
     362    status of the best matching variable. So, if ``create_on_new`` is set to 
    363363    ``MissingValues``, and there exists a variable whose status is, say, 
    364364    ``UnrecognizedValues``, a variable would be created, while the second  
     
    368368    indicator whether the returned variable is reused or not. This can be, 
    369369    however, read from the status code: if it is smaller than the specified 
    370     ``create_new_on``, the variable is reused, otherwise we got a new descriptor. 
     370    ``create_new_on``, the variable is reused, otherwise a new descriptor has been constructed. 
    371371 
    372372    The exception to the rule is when ``create_new_on`` is OK. In this case, the  
     
    380380    :param unordered_values: a list of values, for which the order does not 
    381381        matter 
    382     :param create_new_on: gives condition for constructing a new variable instead 
     382    :param create_new_on: gives the condition for constructing a new variable instead 
    383383        of using the new one 
    384384     
    385385    :return_type: a tuple (:class:`Orange.data.variable.Variable`, int) 
    386386     
    387 .. function:: Orange.data.variable.retrieve(name, type, ordered_values, onordered_values[, create_new_on]) 
     387.. function:: Orange.data.variable.retrieve(name, type, ordered_values, onordered_values[, createNewOn]) 
    388388 
    389389    Find and return an existing variable, or :obj:`None` if no match is found. 
     
    395395    :param unordered_values: a list of values, for which the order does not 
    396396        matter 
    397     :param create_new_on: gives condition for constructing a new variable instead 
     397    :param create_new_on: gives the condition for constructing a new variable instead 
    398398        of using the new one 
    399399 
     
    405405executed only once (in a Python session) and in this order. 
    406406 
    407 :func:`Orange.data.variable.make` can be used for construction of new variables. :: 
     407:func:`Orange.data.variable.make` can be used for the construction of new variables. :: 
    408408     
    409409    >>> v1, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a", "b"]) 
     
    411411    4 <a, b> 
    412412 
    413 No surprises here: new variable is created and the status is ``NotFound``. :: 
     413No surprises here: a new variable is created and the status is ``NotFound``. :: 
    414414 
    415415    >>> v2, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a"], ["c"]) 
     
    419419The status is 1 (``MissingValues``), yet the variable is reused (``v2 is v1``). 
    420420``v1`` gets a new value, ``"c"``, which was given as an unordered value. It does 
    421 not matter that the new variable does not need value ``b``. :: 
     421not matter that the new variable does not need the value ``b``. :: 
    422422 
    423423    >>> v3, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a", "b", "c", "d"]) 
     
    425425    1 True <a, b, c, d> 
    426426 
    427 This is similar as before, except that the new value, ``d`` is not among the 
     427This is like before, except that the new value, ``d`` is not among the 
    428428ordered values. :: 
    429429 
     
    440440    0 True <a, b, c, d> <a, b, c, d> 
    441441 
    442 The new variable has values ``c`` and ``a``, but does not 
    443 mind about the order, so the existing attribute is ``OK``. :: 
     442The new variable has values ``c`` and ``a``, but the order is not important,  
     443so the existing attribute is ``OK``. :: 
    444444 
    445445    >>> v6, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, None, ["e"])
     
    447447    2 True <a, b, c, d, e> <a, b, c, d, e> 
    448448 
    449 The new variable has different values than the existing (status is 2, 
    450 ``NoRecognizedValues``), but the existing is reused nevertheless. Note that we 
     449The new variable has different values than the existing variable (status is 2, 
     450``NoRecognizedValues``), but the existing one is nonetheless reused. Note that we 
    451451gave ``e`` in the list of unordered values. If it was among the ordered, the 
    452452reuse would fail. :: 
     
    468468Finally, this is a perfect match, but any reuse is prohibited, so a new  
    469469variable is created. 
    470  
    471  
    472470 
    473471""" 
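
The compatibility rule quoted under ``MakeStatus.Incompatible`` in the docstring can be checked in plain Python, without Orange. The following is a minimal sketch: the function name ``values_compatible`` is hypothetical and not part of the Orange API; only the slicing expression is taken from the documentation text, and the three cases are the ones the docstring itself discusses.

```python
def values_compatible(existing_values, ordered_values):
    """Return True if the two value lists agree on their common prefix.

    This mirrors the rule stated in the docstring:
    existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]
    (hypothetical helper, not an Orange function).
    """
    return (existing_values[:len(ordered_values)]
            == ordered_values[:len(existing_values)])


existing = ["a", "b"]

# Same values in a different order: the existing descriptor cannot be reused.
print(values_compatible(existing, ["b", "a"]))       # False

# The requested list extends the existing one: new values can be appended.
print(values_compatible(existing, ["a", "b", "c"]))  # True

# The requested list is a prefix: the extra existing value does not matter.
print(values_compatible(existing, ["a"]))            # True
```

Slicing both lists to the other's length compares only the shared prefix, which is why both a longer and a shorter request pass as long as the order of the common values agrees.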