Timestamp:
02/07/12 16:12:17 (2 years ago)
Author:
markotoplak
Branch:
default
rebase_source:
1d1ce52bf1c40adcceacdbc987601870f76893c1
Message:

data.variable -> feature.

File:
1 edited

  • docs/reference/rst/Orange.feature.descriptor.rst

    r9897 r9927  
    3131    .. attribute:: var_type 
    3232 
    33         Variable type; it can be :obj:`~Orange.data.Type.Discrete`, 
    34         :obj:`~Orange.data.Type.Continuous`, 
    35         :obj:`~Orange.data.Type.String` or :obj:`~Orange.data.Type.Other`. 
     33        Variable type; it can be :obj:`~Orange.feature.Type.Discrete`, 
     34        :obj:`~Orange.feature.Type.Continuous`, 
     35        :obj:`~Orange.feature.Type.String` or :obj:`~Orange.feature.Type.Other`. 
    3636 
    3737    .. attribute:: get_value_from 
     
    243243-------------------- 
    244244 
    245 There are situations when variable descriptors need to be reused. Typically, the 
    246 user loads some training examples, trains a classifier, and then loads a separate 
    247 test set. For the classifier to recognize the variables in the second data set, 
    248 the descriptors, not just the names, need to be the same. 
    249  
    250 When constructing new descriptors for data read from a file or during unpickling, 
    251 Orange checks whether an appropriate descriptor (with the same name and, in case 
    252 of discrete variables, also values) already exists and reuses it. When new 
    253 descriptors are constructed by explicitly calling the above constructors, this 
    254 always creates new descriptors and thus new variables, although a variable with 
    255 the same name may already exist. 
    256  
    257 The search for an existing variable is based on four attributes: the variable's name, 
    258 type, ordered values, and unordered values. As for the latter two, the values can 
    259 be explicitly ordered by the user, e.g. in the second line of the tab-delimited 
    260 file. For instance, sizes can be ordered as small, medium, or big. 
    261  
    262 The search for existing variables can end with one of the following statuses. 
    263  
    264 .. data:: MakeStatus.NotFound (4) 
     245There are situations when variable descriptors need to be 
     246reused. Typically, the user loads some training examples, trains a 
     247classifier, and then loads a separate test set. For the classifier to 
     248recognize the variables in the second data set, the descriptors, not 
     249just the names, need to be the same. 
     250 
     251When constructing new descriptors for data read from a file or during 
     252unpickling, Orange checks whether an appropriate descriptor (with the same 
     253name and, in case of discrete variables, also values) already exists and 
     254reuses it. When new descriptors are constructed by explicitly calling 
     255the above constructors, this always creates new descriptors and thus 
     256new variables, although a variable with the same name may already exist. 
     257 
     258The search for an existing variable is based on four attributes: the 
     259variable's name, type, ordered values, and unordered values. As for the 
     260latter two, the values can be explicitly ordered by the user, e.g. in 
     261the second line of the tab-delimited file. For instance, sizes can be 
     262ordered as small, medium, or big. 
     263 
     264The search for existing variables can end with one of the following 
     265statuses. 
     266 
     267.. data:: Descriptor.MakeStatus.NotFound (4) 
    265268 
    266269    The variable with that name and type does not exist. 
    267270 
    268 .. data:: MakeStatus.Incompatible (3) 
     271.. data:: Descriptor.MakeStatus.Incompatible (3) 
    269272 
    270273    There are variables with matching name and type, but their 
     
    276279    does not matter. The formal rule is thus that the values are compatible iff ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``. 
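
The formal rule above can be written as a small standalone predicate. This is only a sketch in plain Python, not Orange's own code; the function name ``values_compatible`` is hypothetical, and treating a missing list of ordered values as empty is an assumption.

```python
def values_compatible(existing_values, ordered_values):
    """Hypothetical helper: True if the prescribed order does not conflict
    with the values of an existing variable."""
    existing = list(existing_values)
    # Assumption: no prescribed ordered values behaves like an empty list.
    ordered = list(ordered_values or [])
    # Only the overlapping prefix of the two lists is compared.
    return existing[:len(ordered)] == ordered[:len(existing)]
```

With existing values ``["a", "b", "c", "d"]``, prescribing ``["b"]`` conflicts on the first position, whereas prescribing ``["a"]`` or ``["a", "b", "c", "d", "e"]`` does not.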
    277280 
    278 .. data:: MakeStatus.NoRecognizedValues (2) 
     281.. data:: Descriptor.MakeStatus.NoRecognizedValues (2) 
    279282 
    280283    There is a matching variable, yet it has none of the values that the new 
     
    288291    some from the old. 
    289292 
    290 .. data:: MakeStatus.MissingValues (1) 
     293.. data:: Descriptor.MakeStatus.MissingValues (1) 
    291294 
    292295    There is a matching variable with some of the values that the new one 
     
    295298    be values which occur in one set but not in the other. 
    296299 
    297 .. data:: MakeStatus.OK (0) 
     300.. data:: Descriptor.MakeStatus.OK (0) 
    298301 
    299302    There is a perfect match which contains all the prescribed values in the 
     
    301304 
    302305Continuous variables can obviously have only two statuses, 
    303 :obj:`~MakeStatus.NotFound` or :obj:`~MakeStatus.OK`. 
     306:obj:`~Descriptor.MakeStatus.NotFound` or :obj:`~Descriptor.MakeStatus.OK`. 
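
For discrete variables, the status descriptions above and the worked examples further down this page suggest roughly the following classification. This is a sketch under stated assumptions, not the library's implementation: ``match_status`` and the constant names are hypothetical, and the handling of partially recognized values is inferred from the examples rather than from the source code.

```python
# Status codes as listed on this page (a lower code means a better match).
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, INCOMPATIBLE, NOT_FOUND = range(5)

def match_status(existing_values, ordered_values=None, unordered_values=None):
    """Hypothetical classifier for one candidate variable.

    existing_values is None when no variable with the name and type exists.
    """
    if existing_values is None:
        return NOT_FOUND
    existing = list(existing_values)
    ordered = list(ordered_values or [])
    requested = ordered + list(unordered_values or [])
    # The prefix rule for ordered values decides compatibility.
    if existing[:len(ordered)] != ordered[:len(existing)]:
        return INCOMPATIBLE
    recognized = [v for v in requested if v in existing]
    if len(recognized) == len(requested):
        return OK  # every requested value is already present
    if not recognized:
        return NO_RECOGNIZED_VALUES  # none of the requested values is present
    return MISSING_VALUES  # some, but not all, requested values are present
```

Feeding it the value lists from the examples on this page gives the statuses printed there, for instance ``MissingValues`` for existing ``["a", "b"]`` with ordered ``["a"]`` and unordered ``["c"]``.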
    304307 
    305308When loading the data using :obj:`Orange.data.Table`, Orange takes the safest 
    306309approach and, by default, reuses everything that is compatible up to 
    307 and including :obj:`~MakeStatus.NoRecognizedValues`. Unintended reuse would be obvious from the 
     310and including :obj:`~Descriptor.MakeStatus.NoRecognizedValues`. Unintended reuse would be obvious from the 
    308311variable having too many values, which the user can notice and fix. More on that 
    309312in the page on :doc:`Orange.data.formats`. 
     
    311314There are two functions for reusing the variables instead of creating new ones. 
    312315 
    313 .. function:: make(name, type, ordered_values, unordered_values[, create_new_on]) 
     316.. function:: Descriptor.make(name, type, ordered_values, unordered_values[, create_new_on]) 
    314317 
    315318    Find and return an existing variable or create a new one if none of the existing 
     
    317320 
    318321    The optional `create_new_on` specifies the status at which a new variable is 
    319     created. The status must be at most :obj:`~MakeStatus.Incompatible` since incompatible (or 
     322    created. The status must be at most :obj:`~Descriptor.MakeStatus.Incompatible` since incompatible (or 
    320323    non-existing) variables cannot be reused. If it is set lower, for instance 
    321     to :obj:`~MakeStatus.MissingValues`, a new variable is created even if there exists 
    322     a variable which is only missing the same values. If set to :obj:`~MakeStatus.OK`, the function 
     324    to :obj:`~Descriptor.MakeStatus.MissingValues`, a new variable is created even if there exists 
     325    a variable which is only missing the same values. If set to :obj:`~Descriptor.MakeStatus.OK`, the function 
    323326    always creates a new variable. 
    324327 
    325328    The function returns a tuple containing a variable descriptor and the 
    326329    status of the best matching variable. So, if ``create_new_on`` is set to 
    327     :obj:`~MakeStatus.MissingValues`, and there exists a variable whose status is, say, 
    328     :obj:`~MakeStatus.NoRecognizedValues`, a variable would be created, while the second 
    329     element of the tuple would contain :obj:`~MakeStatus.NoRecognizedValues`. If, on the other 
     330    :obj:`~Descriptor.MakeStatus.MissingValues`, and there exists a variable whose status is, say, 
     331    :obj:`~Descriptor.MakeStatus.NoRecognizedValues`, a variable would be created, while the second 
     332    element of the tuple would contain :obj:`~Descriptor.MakeStatus.NoRecognizedValues`. If, on the other 
    330333    hand, there exists a variable which is perfectly OK, its descriptor is 
    331     returned and the returned status is :obj:`~MakeStatus.OK`. The function returns no 
     334    returned and the returned status is :obj:`~Descriptor.MakeStatus.OK`. The function returns no 
    332335    indicator whether the returned variable is reused or not. This can be, 
    333336    however, read from the status code: if it is smaller than the specified 
     
    336339    The exception to the rule is when ``create_new_on`` is OK. In this case, the 
    337340    function does not search through the existing variables and cannot know the 
    338     status, so the returned status in this case is always :obj:`~MakeStatus.OK`. 
     341    status, so the returned status in this case is always :obj:`~Descriptor.MakeStatus.OK`. 
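
The relationship between the returned status and ``create_new_on`` can be summarized in a few lines of plain Python. This is a minimal sketch, not part of Orange: ``is_reused`` is a hypothetical helper, and the default of ``Incompatible`` is an assumption inferred from the examples on this page, where a ``NoRecognizedValues`` match is reused when no threshold is given.

```python
# Status codes as listed on this page (a lower code means a better match).
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, INCOMPATIBLE, NOT_FOUND = range(5)

def is_reused(best_status, create_new_on=INCOMPATIBLE):
    """Hypothetical helper: True if the best matching variable is reused."""
    if create_new_on > INCOMPATIBLE:
        # Incompatible or non-existing variables can never be reused.
        raise ValueError("create_new_on must be at most Incompatible")
    # A variable is reused only if its status is strictly better (smaller)
    # than the threshold; with create_new_on == OK nothing qualifies.
    return best_status < create_new_on
```

For example, ``is_reused(NO_RECOGNIZED_VALUES, MISSING_VALUES)`` is false, matching the case described above where a new variable is created and ``NoRecognizedValues`` is still reported.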
    339342 
    340343    :param name: Descriptor name 
     
    349352    :return_type: a tuple (:class:`~Descriptor`, int) 
    350353 
    351 .. function:: retrieve(name, type, ordered_values, unordered_values[, create_new_on])
     354.. function:: Descriptor.retrieve(name, type, ordered_values, unordered_values[, create_new_on])
    352355 
    353356    Find and return an existing variable, or :obj:`None` if no match is found. 
     
    369372:func:`make` can be used for the construction of new variables. :: 
    370373 
    371     >>> v1, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a", "b"]) 
     374    >>> v1, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["a", "b"]) 
    372375    >>> print s, v1.values 
    373376    NotFound <a, b> 
    374377 
    375 A new variable was created and the status is :obj:`~Orange.data.variable 
    376 .MakeStatus.NotFound`. :: 
    377  
    378     >>> v2, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a"], ["c"]) 
     378A new variable was created and the status is :obj:`~Descriptor.MakeStatus.NotFound`. :: 
     379 
     380    >>> v2, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["a"], ["c"]) 
    379381    >>> print s, v2 is v1, v1.values 
    380382    MissingValues True <a, b, c> 
    381383 
    382 The status is :obj:`~MakeStatus.MissingValues`, 
     384The status is :obj:`~Descriptor.MakeStatus.MissingValues`, 
    383385yet the variable is reused (``v2 is v1``). ``v1`` gets a new value, 
    384386``"c"``, which was given as an unordered value. It does 
    385387not matter that the new variable does not need the value ``b``. :: 
    386388 
    387     >>> v3, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a", "b", "c", "d"]) 
     389    >>> v3, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["a", "b", "c", "d"]) 
    388390    >>> print s, v3 is v1, v1.values 
    389391    MissingValues True <a, b, c, d> 
     
    392394ordered values. :: 
    393395 
    394     >>> v4, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["b"]) 
     396    >>> v4, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, ["b"]) 
    395397    >>> print s, v4 is v1, v1.values, v4.values 
    396398    Incompatible False <a, b, c, d> <b>
     
    398400The new variable needs to have ``b`` as the first value, so it is incompatible 
    399401with the existing variables. The status is 
    400 :obj:`~MakeStatus.Incompatible` and 
     402:obj:`~Descriptor.MakeStatus.Incompatible` and 
    401403a new variable is created; the two variables are not equal and have 
    402404different lists of values. :: 
    403405 
    404     >>> v5, s = Orange.feature.make("a", Orange.data.Type.Discrete, None, ["c", "a"]) 
     406    >>> v5, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, ["c", "a"]) 
    405407    >>> print s, v5 is v1, v1.values, v5.values 
    406408    OK True <a, b, c, d> <a, b, c, d> 
    407409 
    408410The new variable has values ``c`` and ``a``, but the order is not important, 
    409 so the existing attribute is :obj:`~MakeStatus.OK`. :: 
    410  
    411     >>> v6, s = Orange.feature.make("a", Orange.data.Type.Discrete, None, ["e"])
     411so the existing attribute is :obj:`~Descriptor.MakeStatus.OK`. :: 
     412 
     413    >>> v6, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, ["e"])
    412414    >>> print s, v6 is v1, v1.values, v6.values 
    413415    NoRecognizedValues True <a, b, c, d, e> <a, b, c, d, e> 
    414416 
    415417The new variable has different values than the existing variable (status 
    416 is :obj:`~MakeStatus.NoRecognizedValues`), 
     418is :obj:`~Descriptor.MakeStatus.NoRecognizedValues`), 
    417419but the existing one is nonetheless reused. Note that we 
    418420gave ``e`` in the list of unordered values. If it was among the ordered, the 
    419421reuse would fail. :: 
    420422 
    421     >>> v7, s = Orange.feature.make("a", Orange.data.Type.Discrete, None, 
     423    >>> v7, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, None, 
    422424            ["f"], Orange.feature.MakeStatus.NoRecognizedValues)
    423425    >>> print s, v7 is v1, v1.values, v7.values 
     
    428430the same as before:: 
    429431 
    430     >>> v8, s = Orange.feature.make("a", Orange.data.Type.Discrete, 
     432    >>> v8, s = Orange.feature.Descriptor.make("a", Orange.feature.Type.Discrete, 
    431433            ["a", "b", "c", "d", "e"], None, Orange.feature.MakeStatus.OK) 
    432434    >>> print s, v8 is v1, v1.values, v8.values 