Changeset 10246:11b418321f79 in orange


Timestamp:
02/15/12 16:46:35
Author:
janezd <janez.demsar@…>
Branch:
default
Children:
10255:5f19d86c3d2b, 10264:273260f0e2c5
Message:

Unified argument names in 2.5 and 3.0; numerous other changes in documentation

Files:
18 edited

  • Orange/classification/knn.py

    r9994 r10246  
    1 """ 
    2 .. index: k-nearest neighbors (kNN) 
    3 .. index: 
    4    single: classification; k-nearest neighbors (kNN) 
    5     
    6 ***************************** 
    7 k-nearest neighbors (``knn``) 
    8 ***************************** 
    9  
    10 The `nearest neighbors 
    11 algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ is one of the most basic, 
    12 `lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms. 
    13 The learner only needs to store the instances of training data, while the classifier 
    14 does all the work by searching this list for the instances most similar to 
    15 the data instance being classified: 
    16  
    17 .. literalinclude:: code/knnExample0.py 
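The included example file is not part of this changeset; a minimal usage
sketch under the documented API, assuming the iris dataset bundled with
Orange, might look like this::

    import Orange

    data = Orange.data.Table("iris")
    knn = Orange.classification.knn.kNNLearner(k=10)
    classifier = knn(data)        # the learner returns a kNNClassifier
    print classifier(data[0])     # predict the class of the first instance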
    18  
    19 .. class:: kNNLearner(k, distance_constructor, weight_id) 
    20  
    21     Lazy classifier that stores instances from the training set. Constructor 
    22     parameters set the corresponding attributes. 
    23  
    24     .. attribute:: k 
    25  
    26         number of nearest neighbors used in classification. If set to 0 
    27         (default), the square root of the number of instances is used. 
    28  
    29     .. attribute:: distance_constructor 
    30  
    31         component that constructs the object for measuring distances between 
    32         instances. Defaults to :class:`~Orange.distance.Euclidean`. 
    33  
    34     .. attribute:: weight_id 
    35      
    36         id of meta attribute with instance weights 
    37  
    38     .. attribute:: rank_weight 
    39  
    40         Enables weighting by ranks (default: :obj:`true`) 
    41  
    42     .. method:: __call__(instances) 
    43  
    44         Return a learned :class:`~kNNClassifier`. Learning consists of 
    45         constructing a distance measure and passing it to the classifier 
    46         along with :obj:`instances` and all attributes. 
    47  
    48         :param instances: training instances 
    49         :type instances: :class:`~Orange.data.Table` 
    50  
    51  
    52 .. class:: kNNClassifier(domain, weight_id, k, find_nearest, rank_weight, n_examples) 
    53  
    54     .. method:: __call__(instance, return_type) 
    55  
    56         :param instance: given instance to be classified 
    57         :type instance: Orange.data.Instance 
    58          
    59         :param return_type: return value and probabilities, only value or only 
    60                             probabilities 
    61         :type return_type: :obj:`~Orange.classification.Classifier.GetBoth`, 
    62                            :obj:`~Orange.classification.Classifier.GetValue`, 
    63                            :obj:`~Orange.classification.Classifier.GetProbabilities` 
    64          
    65         :rtype: :class:`~Orange.data.Value`, 
    66               :class:`~Orange.statistics.distribution.Distribution` or a 
    67               tuple with both 
    68          
    69     .. method:: find_nearest(instance) 
    70      
    71     A component which finds the nearest neighbors of a given instance. 
    72          
    73     :param instance: given instance 
    74     :type instance: :class:`~Orange.data.Instance` 
    75          
    76     :rtype: list of :class:`Orange.data.Instance` 
    77      
    78      
    79     .. attribute:: k 
    80      
    81         Number of neighbors. If set to 0 (which is also the default value),  
    82         the square root of the number of examples is used. 
    83      
    84     .. attribute:: rank_weight 
    85      
    86         Enables weighting by rank (default: :obj:`true`). 
    87      
    88     .. attribute:: weight_id 
    89      
    90         ID of meta attribute with weights of examples 
    91      
    92     .. attribute:: n_examples 
    93      
    94         The number of learning instances. It is used to compute the number of  
    95         neighbors if the value of :attr:`kNNClassifier.k` is zero. 
    96  
    97 When called to classify an instance, the classifier first calls  
    98 :meth:`kNNClassifier.find_nearest`  
    99 to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The 
    100 component :meth:`kNNClassifier.find_nearest` has  
    101 a stored table of instances (those that have been passed to the learner)  
    102 together with their weights. If instances are weighted (non-zero  
    103 :obj:`weight_id`), weights are considered when counting the neighbors. 
    104  
    105 If :meth:`kNNClassifier.find_nearest` returns only one neighbor  
    106 (this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the 
    107 neighbor's class. 
    108  
    109 Otherwise, the retrieved neighbors vote about the class prediction 
    110 (or probability of classes). Votes are weighted twice. First, if 
    111 instances are weighted, their weights are respected. Second, nearer 
    112 neighbors have a greater impact on the prediction; the weight of an 
    113 instance is computed as exp(-t\ :sup:`2`/s\ :sup:`2`), where the meaning of t depends 
    114 on the setting of :obj:`rank_weight`. 
    115  
    116 * if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the 
    117   instance being classified 
    118 * if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t` 
    119   is the position of the neighbor on the list (a rank) 
    120  
    121  
    122 In both cases, :obj:`s` is chosen so that the impact of the farthest instance 
    123 is 0.001. 
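To make the weighting concrete, here is a small sketch (not part of the
changeset) that computes such weights from a list of distances; the value
of s follows directly from requiring exp(-t_max^2/s^2) = 0.001::

    import math

    def neighbor_weights(distances):
        # choose s so that the farthest neighbor gets weight 0.001
        t_max = max(distances)
        s = t_max / math.sqrt(math.log(1000))
        return [math.exp(-(t / s) ** 2) for t in distances]

    print neighbor_weights([0.5, 1.0, 2.0])   # the farthest weight is 0.001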
    124  
    125 Weighting gives the classifier a certain insensitivity to the number of 
    126 neighbors used, making it possible to use large :obj:`k`'s. 
    127  
    128 The classifier can treat continuous and discrete features, and can even 
    129 distinguish between ordinal and nominal features. See information on 
    130 distance measuring for details. 
    131  
    132 Examples 
    133 -------- 
    134  
    135 The learner will be tested on an 'iris' data set. The data will be split  
    136 into training (80%) and testing (20%) instances. We will use the former  
    137 for "training" the classifier and test it on five testing instances  
    138 randomly selected from the latter (part of :download:`knnlearner.py <code/knnlearner.py>`): 
    139  
    140 .. literalinclude:: code/knnExample1.py 
    141  
    142 The output of this code is::  
    143      
    144     Iris-setosa Iris-setosa 
    145     Iris-versicolor Iris-versicolor 
    146     Iris-versicolor Iris-versicolor 
    147     Iris-setosa Iris-setosa 
    148     Iris-setosa Iris-setosa 
    149  
    150 The secret to kNN's success is that the instances in the iris data set appear in 
    151 three well separated clusters. The classifier's accuracy will remain 
    152 excellent even with a very large or very small number of neighbors. 
    153  
    154 As many experiments have shown, the choice of distance measure has 
    155 neither a great nor a predictable effect on the performance of kNN 
    156 classifiers, so there is not much point in changing the default. If you 
    157 decide to do so, distance_constructor must be set to an instance 
    158 of one of the classes for measuring distance. This can be seen in the 
    159 following part of :download:`knnlearner.py <code/knnlearner.py>`: 
    160  
    161 .. literalinclude:: code/knnExample2.py 
    162  
    163 The output of this code is:: 
    164  
    165     Iris-virginica Iris-versicolor 
    166     Iris-setosa Iris-setosa 
    167     Iris-versicolor Iris-versicolor 
    168     Iris-setosa Iris-setosa 
    169     Iris-setosa Iris-setosa 
    170  
    171 This time one of the five instances is misclassified, but the result is still good. 
    172  
    173 .. index: fnn 
    174  
    175  
    176 Finding nearest neighbors 
    177 ------------------------- 
    178  
    179 Orange provides classes for finding the nearest neighbors of a given 
    180 reference instance. While smarter classes may be added in the future, there 
    181 are currently only two kinds: abstract classes that define the general 
    182 behavior of neighbor-searching classes, and classes that implement brute-force search. 
    183  
    184 As is the norm in Orange, they come in pairs: a class that does the work 
    185 (:class:`FindNearest`) and a class that constructs it, where "learning" means 
    186 getting the instances and arranging them in a data structure that allows 
    187 for searching (:class:`FindNearestConstructor`). 
    188  
    189 .. class:: FindNearest 
    190  
    191     A class for a brute-force search for nearest neighbors. It stores a table 
    192     of instances (its own copy, not merely an Orange.data.Table 
    193     with references to another Orange.data.Table). When asked for neighbors, 
    194     it measures distances to all instances, stores them in a heap and returns 
    195     the first k as an Orange.data.Table with references to the instances 
    196     stored in FindNearest's field instances. 
    197      
    198     .. attribute:: distance 
    199      
    200         a component that measures the distance between examples 
    201      
    202     .. attribute:: examples 
    203      
    204         a stored list of instances 
    205      
    206     .. attribute:: weight_ID 
    207      
    208         ID of meta attribute with weight 
    209      
    210     .. method:: __call__(instance, n) 
    211      
    212     :param instance: given instance 
    213     :type instance: Orange.data.Instance 
    214      
    215     :param n: number of neighbors 
    216     :type n: int 
    217      
    218     :rtype: list of :obj:`Orange.data.Instance` 
    219      
    220 .. class:: FindNearestConstructor() 
    221  
    222      
    223     A class that constructs FindNearest. It calls the inherited 
    224     distance_constructor, which constructs a distance measure. 
    225     The distance measure, along with the instances, weight_ID and 
    226     distance_ID, is then passed to the newly constructed instance 
    227     of FindNearest_BruteForce. 
    228  
    229     If multiple instances at the same distance compete for the last 
    230     places, the tie is resolved by randomly picking the appropriate number of 
    231     instances. A local random generator is constructed and seeded with a 
    232     constant computed from the reference instance. The effect of this is that 
    233     the same random neighbors will be chosen for the instance each time 
    234     FindNearest_BruteForce 
    235     is called. 
    236      
    237     .. attribute:: distance_constructor 
    238      
    239         A component of class ExamplesDistanceConstructor that "learns" to 
    240         measure distances between instances. Learning can mean, for instance, 
    241         storing the ranges of continuous features or the number of values of 
    242         a discrete feature (see the page about measuring distances for more 
    243         information). The result of learning is an instance of  
    244         ExamplesDistance that should be used for measuring distances 
    245         between instances. 
    246      
    247     .. attribute:: include_same 
    248      
    249         Tells whether or not to include the examples that are the same as the reference; 
    250         the default is true. 
    251      
    252     .. method:: __call__(table, weight_ID, distance_ID) 
    253      
    254         Constructs an instance of FindNearest that returns neighbors of 
    255         a given instance, obeying weight_ID when counting them (also, some  
    256         measures of distance might consider weights as well) and storing the  
    257         distances in a meta attribute with ID distance_ID. 
    258      
    259         :param table: table of instances 
    260         :type table: Orange.data.Table 
    261          
    262         :param weight_ID: id of meta attribute with weights of instances 
    263         :type weight_ID: int 
    264          
    265         :param distance_ID: id of meta attribute that will save distances 
    266         :type distance_ID: int 
    267          
    268         :rtype: :class:`FindNearest` 
    269  
    270 Examples 
    271 -------- 
    272  
    273 The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`) 
    274 shows how to find the five nearest neighbors of the first instance 
    275 in the lenses dataset. 
    276  
    277 .. literalinclude:: code/knnInstanceDistance.py 
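Since the script itself is not shown in this diff, here is a hedged sketch of
what it might look like, with attribute names assumed from the documentation
above::

    import Orange

    data = Orange.data.Table("lenses")

    nnc = Orange.classification.knn.FindNearestConstructor()
    nnc.distance_constructor = Orange.distance.Euclidean()

    distance_id = Orange.feature.Descriptor.new_meta_id()
    nn = nnc(data, 0, distance_id)    # weight_ID=0: unweighted instances

    print "Reference instance:", data[0]
    for neighbor in nn(data[0], 5):   # the five nearest neighbors
        print neighbor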
    278  
    279 """ 
    280  
    2811from Orange.core import \ 
    2822            kNNLearner, \ 
  • Orange/classification/logreg.py

    r9959 r10246  
    5757    classifier is returned instead of the learner. 
    5858 
    59     :param instances: data table with either discrete or continuous features 
    60     :type instances: Orange.data.Table 
     59    :param data: data table with either discrete or continuous features 
     60    :type data: Orange.data.Table 
    6161    :param weight_id: the ID of the weight meta attribute 
    6262    :type weight_id: int 
     
    8080 
    8181    @deprecated_keywords({"weightID": "weight_id"}) 
    82     def __new__(cls, instances=None, weight_id=0, **argkw): 
     82    def __new__(cls, data=None, weight_id=0, **argkw): 
    8383        self = Orange.classification.Learner.__new__(cls, **argkw) 
    84         if instances: 
     84        if data: 
    8585            self.__init__(**argkw) 
    86             return self.__call__(instances, weight_id) 
     86            return self.__call__(data, weight_id) 
    8787        else: 
    8888            return self 
     
    9494        self.fitter = None 
    9595 
    96     @deprecated_keywords({"examples": "instances"}) 
    97     def __call__(self, instances, weight=0): 
     96    @deprecated_keywords({"examples": "data"}) 
     97    def __call__(self, data, weight=0): 
    9898        """Learn from the given table of data instances. 
    9999 
    100         :param instances: Data instances to learn from. 
    101         :type instances: :class:`~Orange.data.Table` 
     100        :param data: Data instances to learn from. 
     101        :type data: :class:`~Orange.data.Table` 
    102102        :param weight: Id of meta attribute with weights of instances 
    103103        :type weight: int 
     
    106106        imputer = getattr(self, "imputer", None) or None 
    107107        if getattr(self, "remove_missing", 0): 
    108             instances = Orange.core.Preprocessor_dropMissing(instances) 
     108            data = Orange.core.Preprocessor_dropMissing(data) 
    109109##        if hasDiscreteValues(examples.domain): 
    110110##            examples = createNoDiscTable(examples) 
    111         if not len(instances): 
     111        if not len(data): 
    112112            return None 
    113113        if getattr(self, "stepwise_lr", 0): 
     
    115115            delete_crit = getattr(self, "delete_crit", 0.3) 
    116116            num_features = getattr(self, "num_features", -1) 
    117             attributes = StepWiseFSS(instances, add_crit= add_crit, 
     117            attributes = StepWiseFSS(data, add_crit= add_crit, 
    118118                delete_crit=delete_crit, imputer = imputer, num_features= num_features) 
    119119            tmp_domain = Orange.data.Domain(attributes, 
    120                 instances.domain.class_var) 
    121             tmp_domain.addmetas(instances.domain.getmetas()) 
    122             instances = instances.select(tmp_domain) 
     120                data.domain.class_var) 
     121            tmp_domain.addmetas(data.domain.getmetas()) 
     122            data = data.select(tmp_domain) 
    123123        learner = Orange.core.LogRegLearner() # Yes, it has to be from core. 
    124124        learner.imputer_constructor = imputer 
    125125        if imputer: 
    126             instances = self.imputer(instances)(instances) 
    127         instances = Orange.core.Preprocessor_dropMissing(instances) 
     126            data = self.imputer(data)(data) 
     127        data = Orange.core.Preprocessor_dropMissing(data) 
    128128        if self.fitter: 
    129129            learner.fitter = self.fitter 
    130130        if self.remove_singular: 
    131             lr = learner.fit_model(instances, weight) 
     131            lr = learner.fit_model(data, weight) 
    132132        else: 
    133             lr = learner(instances, weight) 
     133            lr = learner(data, weight) 
    134134        while isinstance(lr, Orange.feature.Descriptor): 
    135135            if isinstance(lr.getValueFrom, Orange.core.ClassifierFromVar) and isinstance(lr.getValueFrom.transformer, Orange.core.Discrete2Continuous): 
    136136                lr = lr.getValueFrom.variable 
    137             attributes = instances.domain.features[:] 
     137            attributes = data.domain.features[:] 
    138138            if lr in attributes: 
    139139                attributes.remove(lr) 
     
    141141                attributes.remove(lr.getValueFrom.variable) 
    142142            new_domain = Orange.data.Domain(attributes,  
    143                 instances.domain.class_var) 
    144             new_domain.addmetas(instances.domain.getmetas()) 
    145             instances = instances.select(new_domain) 
    146             lr = learner.fit_model(instances, weight) 
     143                data.domain.class_var) 
     144            new_domain.addmetas(data.domain.getmetas()) 
     145            data = data.select(new_domain) 
     146            lr = learner.fit_model(data, weight) 
    147147        return lr 
    148148 
     
    157157 
    158158class UnivariateLogRegLearner(Orange.classification.Learner): 
    159     def __new__(cls, instances=None, **argkw): 
     159    def __new__(cls, data=None, **argkw): 
    160160        self = Orange.classification.Learner.__new__(cls, **argkw) 
    161         if instances: 
     161        if data: 
    162162            self.__init__(**argkw) 
    163             return self.__call__(instances) 
     163            return self.__call__(data) 
    164164        else: 
    165165            return self 
     
    168168        self.__dict__.update(kwds) 
    169169 
    170     @deprecated_keywords({"examples": "instances"}) 
    171     def __call__(self, instances): 
    172         instances = createFullNoDiscTable(instances) 
     170    @deprecated_keywords({"examples": "data"}) 
     171    def __call__(self, data): 
     172        data = createFullNoDiscTable(data) 
    173173        classifiers = map(lambda x: LogRegLearner(Orange.core.Preprocessor_dropMissing( 
    174             instances.select(Orange.data.Domain(x,  
    175             instances.domain.class_var)))), instances.domain.features) 
     174            data.select(Orange.data.Domain(x,  
     175                data.domain.class_var)))), data.domain.features) 
    176176        maj_classifier = LogRegLearner(Orange.core.Preprocessor_dropMissing 
    177             (instances.select(Orange.data.Domain(instances.domain.class_var)))) 
     177            (data.select(Orange.data.Domain(data.domain.class_var)))) 
    178178        beta = [maj_classifier.beta[0]] + [x.beta[1] for x in classifiers] 
    179179        beta_se = [maj_classifier.beta_se[0]] + [x.beta_se[1] for x in classifiers] 
    180180        P = [maj_classifier.P[0]] + [x.P[1] for x in classifiers] 
    181181        wald_Z = [maj_classifier.wald_Z[0]] + [x.wald_Z[1] for x in classifiers] 
    182         domain = instances.domain 
     182        domain = data.domain 
    183183 
    184184        return Univariate_LogRegClassifier(beta = beta, beta_se = beta_se, P = P, wald_Z = wald_Z, domain = domain) 
     
    195195 
    196196class LogRegLearnerGetPriors(object): 
    197     def __new__(cls, instances=None, weight_id=0, **argkw): 
     197    def __new__(cls, data=None, weight_id=0, **argkw): 
    198198        self = object.__new__(cls) 
    199         if instances: 
     199        if data: 
    200200            self.__init__(**argkw) 
    201             return self.__call__(instances, weight_id) 
     201            return self.__call__(data, weight_id) 
    202202        else: 
    203203            return self 
     
    208208        self.remove_singular = remove_singular 
    209209 
    210     @deprecated_keywords({"examples": "instances"}) 
    211     def __call__(self, instances, weight=0): 
     210    @deprecated_keywords({"examples": "data"}) 
     211    def __call__(self, data, weight=0): 
    212212        # the next function changes the data set to one extended with unknown values 
    213213        def createLogRegExampleTable(data, weight_id): 
     
    249249            remove_singular = self.remove_singular) 
    250250        # get Original Model 
    251         orig_model = learner(instances,weight) 
     251        orig_model = learner(data, weight) 
    252252        if orig_model.fit_status: 
    253253            print "Warning: model did not converge" 
     
    256256        if weight == 0: 
    257257            weight = Orange.feature.Descriptor.new_meta_id() 
    258             instances.addMetaAttribute(weight, 1.0) 
    259         extended_set_of_examples = createLogRegExampleTable(instances, weight) 
     258            data.addMetaAttribute(weight, 1.0) 
     259        extended_set_of_examples = createLogRegExampleTable(data, weight) 
    260260        extended_models = [learner(extended_examples, weight) \ 
    261261                           for extended_examples in extended_set_of_examples] 
     
    285285         
    286286        # compare it to bayes prior 
    287         bayes = Orange.classification.bayes.NaiveLearner(instances) 
     287        bayes = Orange.classification.bayes.NaiveLearner(data) 
    288288        bayes_prior = math.log(bayes.distribution[1]/bayes.distribution[0]) 
    289289 
     
    327327        self.remove_singular = remove_singular 
    328328 
    329     @deprecated_keywords({"examples": "instances"}) 
    330     def __call__(self, instances, weight=0): 
     329    @deprecated_keywords({"examples": "data"}) 
     330    def __call__(self, data, weight=0): 
    331331        # the next function changes the data set to one extended with unknown values 
    332332        def createLogRegExampleTable(data, weightID): 
     
    374374        learner = LogRegLearner(imputer = Orange.feature.imputation.ImputerConstructor_average(), removeSingular = self.remove_singular) 
    375375        # get Original Model 
    376         orig_model = learner(instances,weight) 
     376        orig_model = learner(data,weight) 
    377377 
    378378        # get extended Model (you should not change data) 
    379379        if weight == 0: 
    380380            weight = Orange.feature.Descriptor.new_meta_id() 
    381             instances.addMetaAttribute(weight, 1.0) 
    382         extended_examples = createLogRegExampleTable(instances, weight) 
     381            data.addMetaAttribute(weight, 1.0) 
     382        extended_examples = createLogRegExampleTable(data, weight) 
    383383        extended_model = learner(extended_examples, weight) 
    384384 
     
    403403         
    404404        # compare it to bayes prior 
    405         bayes = Orange.classification.bayes.NaiveLearner(instances) 
     405        bayes = Orange.classification.bayes.NaiveLearner(data) 
    406406        bayes_prior = math.log(bayes.distribution[1]/bayes.distribution[0]) 
    407407 
     
    670670#  Feature subset selection for logistic regression 
    671671 
    672 @deprecated_keywords({"examples": "instances"}) 
    673 def get_likelihood(fitter, instances): 
    674     res = fitter(instances) 
     672@deprecated_keywords({"examples": "data"}) 
     673def get_likelihood(fitter, data): 
     674    res = fitter(data) 
    675675    if res[0] in [fitter.OK]: #, fitter.Infinity, fitter.Divergence]: 
    676676       status, beta, beta_se, likelihood = res 
    677677       if sum([abs(b) for b in beta])<sum([abs(b) for b in beta_se]): 
    678            return -100*len(instances) 
     678           return -100*len(data) 
    679679       return likelihood 
    680680    else: 
    681        return -100*len(instances) 
     681       return -100*len(data) 
    682682         
    683683 
     
    732732  """ 
    733733 
    734   def __new__(cls, instances=None, **argkw): 
     734  def __new__(cls, data=None, **argkw): 
    735735      self = Orange.classification.Learner.__new__(cls, **argkw) 
    736       if instances: 
     736      if data: 
    737737          self.__init__(**argkw) 
    738           return self.__call__(instances) 
     738          return self.__call__(data) 
    739739      else: 
    740740          return self 
     
    873873 
    874874class StepWiseFSSFilter(object): 
    875     def __new__(cls, instances=None, **argkw): 
     875    def __new__(cls, data=None, **argkw): 
    876876        self = object.__new__(cls) 
    877         if instances: 
     877        if data: 
    878878            self.__init__(**argkw) 
    879             return self.__call__(instances) 
     879            return self.__call__(data) 
    880880        else: 
    881881            return self 
     
    888888        self.num_features = num_features 
    889889 
    890     @deprecated_keywords({"examples": "instances"}) 
    891     def __call__(self, instances): 
    892         attr = StepWiseFSS(instances, add_crit=self.add_crit, 
     890    @deprecated_keywords({"examples": "data"}) 
     891    def __call__(self, data): 
     892        attr = StepWiseFSS(data, add_crit=self.add_crit, 
    893893            delete_crit= self.delete_crit, num_features= self.num_features) 
    894         return instances.select(Orange.data.Domain(attr, instances.domain.class_var)) 
     894        return data.select(Orange.data.Domain(attr, data.domain.class_var)) 
    895895 
    896896StepWiseFSSFilter = deprecated_members({"addCrit": "add_crit", 
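The ``deprecated_keywords`` decorator that appears throughout this hunk is
what keeps the old keyword arguments (``examples``, ``weightID``) working
after the rename. A minimal sketch of such a decorator; the actual Orange
implementation may differ::

    import functools
    import warnings

    def deprecated_keywords(mapping):
        # map old keyword-argument names to their new names
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for old, new in mapping.items():
                    if old in kwargs:
                        warnings.warn("'%s' is deprecated; use '%s'" % (old, new),
                                      DeprecationWarning)
                        kwargs[new] = kwargs.pop(old)
                return func(*args, **kwargs)
            return wrapper
        return decorator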
  • Orange/misc/__init__.py

    r10199 r10246  
    1414.. class:: CostMatrix 
    1515 
    16     .. attribute:: classVar  
    17          
    18         The (class) attribute to which the matrix applies. This can also be None. 
     16    .. attribute:: class_var  
     17         
     18        The (class) attribute to which the matrix applies. This can 
     19        also be None. 
    1920         
    2021    .. attribute:: dimension (read only) 
     
    2425    .. method:: CostMatrix(dimension[, default cost]) 
    2526     
    26         Constructs a matrix of the given size and initializes it with the default 
    27         cost (1, if not given). All elements of the matrix are assigned the given 
    28         cost, except for the diagonal that have the default cost of 0. 
    29         (Diagonal elements represent correct classifications and these usually 
    30         have no price; you can, however, change this.) 
     27        Constructs a matrix of the given size and initializes it with 
     28        the default cost (1, if not given). All elements of the matrix 
     29        are assigned the given cost, except for the diagonal, which has 
     30        the default cost of 0. (Diagonal elements represent correct 
     31        classifications and these usually have no price; you can, 
     32        however, change this.) 
    3133         
    3234        .. literalinclude:: code/CostMatrix.py 
     
    6365            :lines: 5-7 
    6466             
    65     .. method:: setcost(predicted value, correct value, cost) 
    66      
    67         Set the misclassification cost. The matrix above could be constructed by first 
    68         initializing it with 2s and then changing the prices for virginica's into 1s. 
     67    .. method:: setcost(predicted, correct, cost) 
     68     
     69        Set the misclassification cost. The matrix above could be 
     70        constructed by first initializing it with 2s and then changing 
     71        the prices for virginica's into 1s. 
    6972         
    7073        .. literalinclude:: code/CostMatrix.py 
    7174            :lines: 15-17 
    7275             
    73     .. method:: getcost(predicted value, correct value) 
    74      
    75         Returns the cost of prediction. Values must be integer indices; if classVar is 
    76         set, you can also use symbolic values (strings). Note that there's no way to 
    77         change the size of the matrix. Size is set at construction and does not change. 
    78         For the final example, we shall compute the profits of knowing attribute values 
    79         in the dataset lenses with the same cost-matrix as printed above. 
     76    .. method:: getcost(predicted, correct) 
     77     
     78        Returns the cost of prediction. Values must be integer 
     79        indices; if class_var is set, you can also use symbolic values 
     80        (strings). Note that there's no way to change the size of the 
     81        matrix. Size is set at construction and does not change.  For 
     82        the final example, we shall compute the profits of knowing 
     83        attribute values in the dataset lenses with the same 
     84        cost-matrix as printed above. 
    8085         
    8186        .. literalinclude:: code/CostMatrix.py 
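A hedged sketch of the renamed methods in use; whether ``class_var`` is
assignable after construction is assumed from the attribute description
above::

    import Orange

    data = Orange.data.Table("iris")
    cm = Orange.misc.CostMatrix(3, 2)       # 3x3 matrix of 2s, diagonal 0
    cm.class_var = data.domain.class_var    # enables symbolic indexing
    cm.setcost("Iris-virginica", "Iris-versicolor", 1)
    print cm.getcost("Iris-virginica", "Iris-versicolor")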
     
    145150         ( 4.000,  8.000, 12.000, 16.000)) 
    146151     
    147     .. method:: __init__(dim[, default_value]) 
     152    .. method:: __init__(dim[, value]) 
    148153 
    149154        Construct a symmetric matrix of the given dimension. 
     
    152157        :type dim: int 
    153158 
    154         :param default_value: default value (0 by default) 
    155         :type default_value: double 
    156          
    157          
    158     .. method:: __init__(instances) 
    159  
    160         Construct a new symmetric matrix containing the given data instances.  
     159        :param value: default value (0 by default) 
     160        :type value: double 
     161         
     162         
     163    .. method:: __init__(data) 
     164 
     165        Construct a new symmetric matrix containing the given data.  
    161166        These can be given as a Python list containing lists or tuples. 
    162  
    163         :param instances: data instances 
    164         :type instances: list of lists 
    165167         
    166168        The following example fills a matrix created above with 
     
    232234         
    233235         
    234 ------------------- 
     236 
    235237Indexing 
    236 ------------------- 
     238.......... 
    237239 
    238240For symmetric matrices the order of indices is not important:  
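The example itself is elided from this diff; a minimal sketch of the
symmetry, assuming the ``SymMatrix`` constructor documented above::

    import Orange

    m = Orange.misc.SymMatrix(4)
    m[1, 2] = 3.0
    print m[2, 1]    # prints 3.0: [i, j] and [j, i] address the same element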
     
    301303 
    302304 
    303 .. class:: Random(initseed) 
     305.. class:: Random(seed) 
    304306 
    305307    :param seed: Seed used for initializing the random generator. 
     
    312314        :type n: int 
    313315 
    314     .. method:: reset([initseed]) 
     316    .. method:: reset([seed]) 
    315317 
    316318        Reinitialize the random generator with `seed`. If `seed` 
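A short usage sketch of the renamed constructor (assumed behavior: calling
the generator with ``n`` returns a random integer from 0 to n-1, per the
``n`` parameter documented above)::

    import Orange

    rg = Orange.misc.Random(42)   # seed the generator
    print rg(10)                  # a random integer between 0 and 9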
  • Orange/statistics/contingency.py

    r9927 r10246  
    1 """ 
    2 .. index:: Contingency table 
    3  
    4 ================= 
    5 Contingency table 
    6 ================= 
    7  
    8 A contingency table contains conditional distributions. Unless explicitly 
    9 normalized, it contains absolute frequencies, that is, the number of 
    10 instances with a particular combination of the two variables' values. If it is 
    11 normalized by dividing each cell by the row sum, it represents conditional 
    12 probabilities of the column variable (here denoted as ``innerVariable``) 
    13 conditioned by the row variable (``outerVariable``). 
    14  
    15 Contingency tables are usually constructed for discrete variables. Tables 
    16 for continuous variables have certain limitations described in a :ref:`separate 
    17 section <contcont>`. 
    18  
    19 The example below loads the monks-1 data set and prints out the conditional 
    20 class distribution given the value of `e`. 
    21  
    22 .. literalinclude:: code/statistics-contingency.py 
    23     :lines: 1-7 
    24  
    25 This code prints out:: 
    26  
    27     1 <0.000, 108.000> 
    28     2 <72.000, 36.000> 
    29     3 <72.000, 36.000> 
    30     4 <72.000, 36.000>  
    31  
    32 Contingencies behave like lists of distributions (in this case, class 
    33 distributions) indexed by values (of `e`, in this 
    34 example). Distributions are, in turn, indexed by values (class values 
    35 here). The variable `e` from the above example is called the outer 
    36 variable, and the class is the inner. This can also be reversed. It is 
    37 also possible to use features for both the outer and the inner variable, so 
    38 the table shows distributions of one variable's values given the 
    39 value of another.  There is a corresponding hierarchy of classes: 
    40 :obj:`Table` is a base class for :obj:`VarVar` (both 
    41 variables are attributes) and :obj:`Class` (one variable is 
    42 the class).  The latter is the base class for 
    43 :obj:`VarClass` and :obj:`ClassVar`. 
    44  
    45 The most commonly used of the above classes is :obj:`VarClass` which 
    46 can compute and store conditional probabilities of classes given the feature value. 
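A quick sketch of this most common use, on the monks-1 data from the example
above (attribute `e` as the outer variable, class as the inner)::

    import Orange

    data = Orange.data.Table("monks-1")
    cont = Orange.statistics.contingency.VarClass(data.domain["e"], data)
    for val, dist in cont.items():
        print val, dist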
    47  
    48 Contingency tables 
    49 ================== 
    50  
    51 .. class:: Table 
    52  
    53     Provides a base class for storing and manipulating contingency 
    54     tables. Although it is not abstract, it is seldom used directly but rather 
    55     through more convenient derived classes described below. 
    56  
    57     .. attribute:: outerVariable 
    58  
    59        Outer variable (:class:`Orange.feature.Descriptor`) whose values are 
    60        used as the first, outer index. 
    61  
    62     .. attribute:: innerVariable 
    63  
    64        Inner variable (:class:`Orange.feature.Descriptor`), whose values are 
    65        used as the second, inner index. 
    66   
    67     .. attribute:: outerDistribution 
    68  
    69         The marginal distribution (:class:`Distribution`) of the outer variable. 
    70  
    71     .. attribute:: innerDistribution 
    72  
    73         The marginal distribution (:class:`Distribution`) of the inner variable. 
    74          
    75     .. attribute:: innerDistributionUnknown 
    76  
    77         The distribution (:class:`distribution.Distribution`) of the inner variable for 
    78         instances for which the outer variable was undefined. This is the 
    79         difference between the ``innerDistribution`` and the (unconditional) 
    80         distribution of the inner variable. 
    81        
    82     .. attribute:: varType 
    83  
    84         The type of the outer variable (:obj:`Orange.feature.Type`, usually 
    85         :obj:`Orange.feature.Discrete` or 
    86         :obj:`Orange.feature.Continuous`); equals 
    87         ``outerVariable.varType`` and ``outerDistribution.varType``. 
    88  
    89     .. method:: __init__(outer_variable, inner_variable) 
    90       
    91         Construct an instance of contingency table for the given pair of 
    92         variables. 
    93       
    94         :param outer_variable: Descriptor of the outer variable 
    95         :type outer_variable: Orange.feature.Descriptor 
    96         :param inner_variable: Descriptor of the inner variable 
    97         :type inner_variable: Orange.feature.Descriptor 
    98          
    99     .. method:: add(outer_value, inner_value[, weight=1]) 
    100      
    101         Add an element to the contingency table by adding ``weight`` to the 
    102         corresponding cell. 
    103  
    104         :param outer_value: The value for the outer variable 
    105         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    106         :param inner_value: The value for the inner variable 
    107         :type inner_value: int, float, string or :obj:`Orange.data.Value` 
    108         :param weight: Instance weight 
    109         :type weight: float 
    110  
    111     .. method:: normalize() 
    112  
    113         Normalize all distributions (rows) in the table to sum to ``1``:: 
    114          
    115             >>> cont.normalize() 
    116             >>> for val, dist in cont.items(): 
    117                    print val, dist 
    118  
    119         Output: :: 
    120  
    121             1 <0.000, 1.000> 
    122             2 <0.667, 0.333> 
    123             3 <0.667, 0.333> 
    124             4 <0.667, 0.333> 
    125  
    126         .. note:: 
    127         
    128             This method does not change the ``innerDistribution`` or 
    129             ``outerDistribution``. 
    130          
    131     With respect to indexing, a contingency table is a cross between a dictionary 
    132     and a list. It supports standard dictionary methods ``keys``, ``values`` and 
    133     ``items``. :: 
    134  
    135         >>> print cont.keys() 
    136         ['1', '2', '3', '4'] 
    137         >>> print cont.values() 
    138         [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>] 
    139         >>> print cont.items() 
    140         [('1', <0.000, 108.000>), ('2', <72.000, 36.000>), 
    141         ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]  
    142  
    143     Although the keys returned by the above functions are strings, the contingency can 
    144     be indexed by anything that can be converted into values of the outer 
    145     variable: strings, numbers or instances of ``Orange.data.Value``. :: 
    146  
    147         >>> print cont[0] 
    148         <0.000, 108.000> 
    149         >>> print cont["1"] 
    150         <0.000, 108.000> 
    151         >>> print cont[orange.Value(data.domain["e"], "1")]  
    152  
    153     The length of the table equals the number of values of the outer 
    154     variable. However, iterating through the contingency 
    155     does not return keys, as with dictionaries, but distributions. :: 
    156  
    157         >>> for i in cont: 
    158             ... print i 
    159         <0.000, 108.000> 
    160         <72.000, 36.000> 
    161         <72.000, 36.000> 
    162         <72.000, 36.000> 
    164  
    165  
    166 .. class:: Class 
    167  
    168     An abstract base class for contingency tables that contain the class, 
    169     either as the inner or the outer variable. 
    170  
    171     .. attribute:: classVar (read only) 
    172      
    173         The class attribute descriptor; always equal to either 
    174         :obj:`Table.innerVariable` or :obj:`Table.outerVariable`. 
    175  
    176     .. attribute:: variable 
    177      
    178         Variable; always equal to either ``innerVariable`` or ``outerVariable``. 
    179  
    180     .. method:: add_var_class(variable_value, class_value[, weight=1]) 
    181  
    182         Add an element to contingency by increasing the corresponding count. The 
    183         difference between this and :obj:`Table.add` is that the variable 
    184         value is always the first argument and class value the second, 
    185         regardless of which one is inner and which one is outer. 
    186  
    187         :param variable_value: Variable value 
    188         :type variable_value: int, float, string or :obj:`Orange.data.Value` 
    189         :param class_value: Class value 
    190         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    191         :param weight: Instance weight 
    192         :type weight: float 
    193  
    194  
    195 .. class:: VarClass 
    196  
    197     A class derived from :obj:`Class` in which the variable is 
    198     used as :obj:`Table.outerVariable` and class as the 
    199     :obj:`Table.innerVariable`. This form is a form suitable for 
    200     computation of conditional class probabilities given the variable value. 
    201      
    202     Calling :obj:`VarClass.add_var_class(v, c)` is equivalent to 
    203     :obj:`Table.add(v, c)`. Similar as :obj:`Table`, 
    204     :obj:`VarClass` can compute contingency from instances. 
    205  
    206     .. method:: __init__(feature, class_variable) 
    207  
    208         Construct an instance of :obj:`VarClass` for the given pair of 
    209         variables. Inherited from :obj:`Table`. 
    210  
    211         :param feature: Outer variable 
    212         :type feature: Orange.feature.Descriptor 
    213         :param class_variable: Class variable; used as ``innerVariable`` 
    214         :type class_variable: Orange.feature.Descriptor 
    215          
    216     .. method:: __init__(feature, data[, weightId]) 
    217  
    218         Compute the contingency table from data. 
    219  
    220         :param feature: Outer variable 
    221         :type feature: Orange.feature.Descriptor 
    222         :param data: A set of instances 
    223         :type data: Orange.data.Table 
    224         :param weightId: meta attribute with weights of instances 
    225         :type weightId: int 
    226  
    227     .. method:: p_class(value) 
    228  
    229         Return the probability distribution of classes given the value of the 
    230         variable. 
    231  
    232         :param value: The value of the variable 
    233         :type value: int, float, string or :obj:`Orange.data.Value` 
    234         :rtype: Orange.statistics.distribution.Distribution 
    235  
    236  
    237     .. method:: p_class(value, class_value) 
    238  
    239         Returns the conditional probability of the class_value given the 
    240         feature value, p(class_value|value) (note the order of arguments!) 
    241          
    242         :param value: The value of the variable 
    243         :type value: int, float, string or :obj:`Orange.data.Value` 
    244         :param class_value: The class value 
    245         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    246         :rtype: float 
    247  
    248     .. literalinclude:: code/statistics-contingency3.py 
    249         :lines: 1-23 
    250  
    251     The inner and the outer variable and their relations to the class are 
    252     as follows:: 
    253  
    254         Inner variable:  y 
    255         Outer variable:  e 
    256      
    257         Class variable:  y 
    258         Feature:         e 
    259  
    260     Distributions are normalized, and probabilities are elements from the 
    261     normalized distributions. Knowing that the target concept is 
    262     y := (e=1) or (a=b), distributions are as expected: when e equals 1, class 1 
    263     has a 100% probability, while for the rest, probability is one third, which 
    264 agrees with the probability that two independent three-valued features 
    265     have the same value. :: 
    266  
    267         Distributions: 
    268           p(.|1) = <0.000, 1.000> 
    269           p(.|2) = <0.662, 0.338> 
    270           p(.|3) = <0.659, 0.341> 
    271           p(.|4) = <0.669, 0.331> 
    272      
    273         Probabilities of class '1' 
    274           p(1|1) = 1.000 
    275           p(1|2) = 0.338 
    276           p(1|3) = 0.341 
    277           p(1|4) = 0.331 
    278      
    279         Distributions from a matrix computed manually: 
    280           p(.|1) = <0.000, 1.000> 
    281           p(.|2) = <0.662, 0.338> 
    282           p(.|3) = <0.659, 0.341> 
    283           p(.|4) = <0.669, 0.331> 
    284  
    285  
    286 .. class:: ClassVar 
    287  
    288     :obj:`ClassVar` is similar to :obj:`VarClass` except 
    289     that the class is outside and the variable is inside. This form of 
    290     contingency table is suitable for computing conditional probabilities of 
    291     variable given the class. All methods get the two arguments in the same 
    292     order as :obj:`VarClass`. 
    293  
    294     .. method:: __init__(feature, class_variable) 
    295  
    296         Construct an instance of :obj:`ClassVar` for the given pair of 
    297         variables. Inherited from :obj:`Table`, except for the reversed 
    298         order of arguments. 
    299  
    300         :param feature: Outer variable 
    301         :type feature: Orange.feature.Descriptor 
    302         :param class_variable: Class variable 
    303         :type class_variable: Orange.feature.Descriptor 
    304          
    305     .. method:: __init__(feature, data[, weightId]) 
    306  
    307         Compute contingency table from the data. 
    308  
    309         :param feature: Descriptor of the outer variable 
    310         :type feature: Orange.feature.Descriptor 
    311         :param data: A set of instances 
    312         :type data: Orange.data.Table 
    313         :param weightId: meta attribute with weights of instances 
    314         :type weightId: int 
    315  
    316     .. method:: p_attr(class_value) 
    317  
    318         Return the probability distribution of variable given the class. 
    319  
    320         :param class_value: The value of the class 
    321         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    322         :rtype: Orange.statistics.distribution.Distribution 
    323  
    324     .. method:: p_attr(value, class_value) 
    325  
    326         Returns the conditional probability of the value given the 
    327         class, p(value|class_value). 
    328  
    329         :param value: Value of the variable 
    330         :type value: int, float, string or :obj:`Orange.data.Value` 
    331         :param class_value: Class value 
    332         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    333         :rtype: float 
    334  
    335     .. literalinclude:: code/statistics-contingency4.py 
    336         :lines: 1-27 
    337  
    338     The roles of the feature and the class are reversed compared to 
    339     :obj:`VarClass`:: 
    340      
    341         Inner variable:  e 
    342         Outer variable:  y 
    343      
    344         Class variable:  y 
    345         Feature:         e 
    346      
    347     Distributions given the class can be printed out by calling :meth:`p_attr`. 
    348      
    349     .. literalinclude:: code/statistics-contingency4.py 
    350         :lines: 30-31 
    351      
    352     will print:: 
    353         p(.|0) = <0.000, 0.333, 0.333, 0.333> 
    354         p(.|1) = <0.500, 0.167, 0.167, 0.167> 
    355      
    356     If the class value is '0', the attribute `e` cannot be `1` (the first 
    357     value), while distribution across other values is uniform.  If the class 
    358     value is `1`, `e` is `1` for exactly half of instances, and distribution of 
    359     other values is again uniform. 
    360  
    361 .. class:: VarVar 
    362  
    363     Contingency table in which none of the variables is the class.  The class 
    364     is derived from :obj:`Table`, and adds an additional constructor and 
    365     method for getting conditional probabilities. 
    366  
    367     .. method:: VarVar(outer_variable, inner_variable) 
    368  
    369         Inherited from :obj:`Table`. 
    370  
    371     .. method:: __init__(outer_variable, inner_variable, data[, weightId]) 
    372  
    373         Compute the contingency from the given instances. 
    374  
    375         :param outer_variable: Outer variable 
    376         :type outer_variable: Orange.feature.Descriptor 
    377         :param inner_variable: Inner variable 
    378         :type inner_variable: Orange.feature.Descriptor 
    379         :param data: A set of instances 
    380         :type data: Orange.data.Table 
    381         :param weightId: meta attribute with weights of instances 
    382         :type weightId: int 
    383  
    384     .. method:: p_attr(outer_value) 
    385  
    386         Return the probability distribution of the inner variable given the 
    387         outer variable value. 
    388  
    389         :param outer_value: The value of the outer variable 
    390         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    391         :rtype: Orange.statistics.distribution.Distribution 
    392   
    393     .. method:: p_attr(outer_value, inner_value) 
    394  
    395         Return the conditional probability of the inner_value 
    396         given the outer_value. 
    397  
    398         :param outer_value: The value of the outer variable 
    399         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    400         :param inner_value: The value of the inner variable 
    401         :type inner_value: int, float, string or :obj:`Orange.data.Value` 
    402         :rtype: float 
    403  
    404     The following example investigates which material is used for 
    405     bridges of different lengths. 
    406      
    407     .. literalinclude:: code/statistics-contingency5.py 
    408         :lines: 1-17 
    409  
    410     Short bridges are mostly wooden or iron, while the longer ones (and most of 
    411     the middle-sized ones) are made of steel:: 
    412      
    413         SHORT: 
    414            WOOD (56%) 
    415            IRON (44%) 
    416      
    417         MEDIUM: 
    418            WOOD (9%) 
    419            IRON (11%) 
    420            STEEL (79%) 
    421      
    422         LONG: 
    423            STEEL (100%) 
    424      
    425     Like all other contingency tables, this one can also be computed "manually". 
    426      
    427     .. literalinclude:: code/statistics-contingency5.py 
    428         :lines: 18- 
    429  
    430  
    431 Contingencies for entire domain 
    432 =============================== 
    433  
    434 A list of contingency tables, either :obj:`VarClass` or 
    435 :obj:`ClassVar`. 
    436  
    437 .. class:: Domain 
    438  
    439     .. method:: __init__(data[, weightId=0, classOuter=0|1]) 
    440  
    441         Compute a list of contingency tables. 
    442  
    443         :param data: A set of instances 
    444         :type data: Orange.data.Table 
    445         :param weightId: meta attribute with weights of instances 
    446         :type weightId: int 
    447         :param classOuter: `True`, if class is the outer variable 
    448         :type classOuter: bool 
    449  
    450         .. note:: 
    451          
    452             ``classOuter`` cannot be given as a positional argument, 
    453             but needs to be passed by keyword. 
    454  
    455     .. attribute:: classIsOuter (read only) 
    456  
    457         Tells whether the class is the outer or the inner variable. 
    458  
    459     .. attribute:: classes 
    460  
    461         Contains the distribution of class values on the entire dataset. 
    462  
    463     .. method:: normalize() 
    464  
    465         Call normalize for all contingencies. 
    466  
    467     The following script prints the contingency tables for features 
    468     "a", "b" and "e" for the dataset Monk 1. 
    469          
    470     .. literalinclude:: code/statistics-contingency8.py 
    471         :lines: 9 
    472  
    473     Contingency tables of type :obj:`VarClass` give 
    474     the conditional distributions of classes, given the value of the variable. 
    475      
    476     .. literalinclude:: code/statistics-contingency8.py 
    477         :lines: 12-  
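For illustration, a hedged sketch of constructing such a list of tables
directly, with the constructor signature as documented above::

    import Orange

    data = Orange.data.Table("monks-1")
    domain_cont = Orange.statistics.contingency.Domain(data)
    for cont in domain_cont:
        print cont.outerVariable.name   # one contingency table per feature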
    478  
    479 .. _contcont: 
    480  
    481 Contingency tables for continuous variables 
    482 =========================================== 
    483  
    484 If the outer variable is continuous, the index must be one of the 
    485 values that do exist in the contingency table; other values raise an 
    486 exception: 
    487  
    488 .. literalinclude:: code/statistics-contingency6.py 
    489     :lines: 1-4,17- 
    490  
    491 Since even rounding can be a problem, the only safe way to get the key 
    492 is to take it from the contingency's ``keys``. 
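For instance (a sketch, not part of the changeset)::

    import Orange

    data = Orange.data.Table("iris")
    cont = Orange.statistics.contingency.VarClass(data.domain["sepal length"], data)
    key = cont.keys()[0]       # take an existing key instead of guessing a float
    print key, cont[key]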
    493  
    494 Contingency tables with discrete outer variable and continuous inner variables 
    495 are more useful, since the methods :obj:`VarClass.p_class` 
    496 and :obj:`ClassVar.p_attr` use the primitive density estimation 
    497 provided by :obj:`Orange.statistics.distribution.Distribution`. 
    498  
    499 For example, :obj:`ClassVar` on the iris dataset can return the 
    500 probability of the sepal length 5.5 for different classes: 
    501  
    502 .. literalinclude:: code/statistics-contingency7.py 
    503  
    504 The script outputs:: 
    505  
    506     Estimated frequencies for e=5.5 
    507       f(5.5|Iris-setosa) = 2.000 
    508       f(5.5|Iris-versicolor) = 5.000 
    509       f(5.5|Iris-virginica) = 1.000 
    510  
    511 """ 
    512  
    5131from Orange.core import Contingency as Table 
    5142from Orange.core import ContingencyAttrAttr as VarVar 
  • docs/reference/rst/Orange.classification.knn.rst

    r9372 r10246  
     1.. py:currentmodule:: Orange.classification.knn 
     2 
     3.. index: k-nearest neighbors (kNN) 
     4.. index: 
     5   single: classification; k-nearest neighbors (kNN) 
     6    
     7***************************** 
     8k-nearest neighbors (``knn``) 
     9***************************** 
     10 
     11The `nearest neighbors 
     12algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ is one of the most basic, 
     13`lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms. 
     14The learner only needs to store the instances of training data, while the classifier 
     15does all the work by searching this list for the instances most similar to 
     16the data instance being classified: 
     17 
     18.. literalinclude:: code/knnExample0.py 
     19 
     20.. class:: kNNLearner(k, distance_constructor, weight_id) 
     21 
     22    Lazy classifier that stores instances from the training set. Constructor 
     23    parameters set the corresponding attributes. 
     24 
     25    .. attribute:: k 
     26 
     27        number of nearest neighbors used in classification. If set to 0 
     28        (default), the square root of the number of instances is used. 
     29 
     30    .. attribute:: distance_constructor 
     31 
     32        component that constructs the object for measuring distances between 
     33        instances. Defaults to :class:`~Orange.distance.Euclidean`. 
     34 
     35    .. attribute:: weight_id 
     36     
     37        id of meta attribute with instance weights 
     38 
     39    .. attribute:: rank_weight 
     40 
     41        Enables weighting by ranks (default: :obj:`true`) 
     42 
     43    .. method:: __call__(data) 
     44 
     45        Return a learned :class:`~kNNClassifier`. Learning consists of 
     46        constructing a distance measure and passing it to the classifier 
     47        along with :obj:`data` and all attributes. 
     48 
     49        :param data: training data 
     50        :type data: :class:`~Orange.data.Table` 
     51 
     52 
     53.. class:: kNNClassifier(domain, weight_id, k, find_nearest, rank_weight, n_examples) 
     54 
     55    .. method:: __call__(instance, return_type) 
     56 
     57        :param instance: given instance to be classified 
     58        :type instance: Orange.data.Instance 
     59         
     60        :param return_type: return value and probabilities, only value or only 
     61                            probabilities 
     62        :type return_type: :obj:`~Orange.classification.Classifier.GetBoth`, 
     63                           :obj:`~Orange.classification.Classifier.GetValue`, 
     64                           :obj:`~Orange.classification.Classifier.GetProbabilities` 
     65         
     66        :rtype: :class:`~Orange.data.Value`, 
     67              :class:`~Orange.statistics.distribution.Distribution` or a 
     68              tuple with both 
     69         
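        For example, both the prediction and the class distribution can be 
        obtained like this (a sketch; ``classifier`` stands for a trained 
        :obj:`kNNClassifier` and ``instance`` for a data instance):: 

            value, probs = classifier(instance, 
                Orange.classification.Classifier.GetBoth) 
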
     70    .. method:: find_nearest(instance) 
     71     
     72        A component that finds the nearest neighbors of a given instance. 
     73         
     74        :param instance: given instance 
     75        :type instance: :class:`~Orange.data.Instance` 
     76         
     77        :rtype: :class:`Orange.data.Table` 
     78     
     79     
     80    .. attribute:: k 
     81     
     82        Number of neighbors. If set to 0 (which is also the default value),  
     83        the square root of the number of examples is used. 
     84     
     85    .. attribute:: rank_weight 
     86     
     87        Enables weighting by rank (default: :obj:`true`). 
     88     
     89    .. attribute:: weight_id 
     90     
     91        ID of meta attribute with weights of examples 
     92     
     93    .. attribute:: n_examples 
     94     
     95        The number of learning instances. It is used to compute the number of  
     96        neighbors if the value of :attr:`kNNClassifier.k` is zero. 
     97 
     98When called to classify an instance, the classifier first calls  
     99:meth:`kNNClassifier.find_nearest`  
     100to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The 
     101component :meth:`kNNClassifier.find_nearest` has  
     102a stored table of instances (those that have been passed to the learner)  
     103together with their weights. If instances are weighted (non-zero  
     104:obj:`weight_id`), weights are considered when counting the neighbors. 
     105 
     106If :meth:`kNNClassifier.find_nearest` returns only one neighbor  
     107(this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the 
     108neighbor's class. 
     109 
     110Otherwise, the retrieved neighbors vote about the class prediction 
     111(or probabilities of classes). Votes are weighted twice: first, if 
     112instances are weighted, their weights are respected; second, nearer 
     113neighbors have a greater impact on the prediction. The weight of an instance 
     114is computed as exp(-t\ :sup:`2`/s\ :sup:`2`), where the meaning of t depends 
     115on the setting of :obj:`rank_weight`. 
     116 
     117* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the 
     118  instance being classified 
     119* if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t` 
     120  is the position of the neighbor on the list (a rank) 
     121 
     122 
     123In both cases, :obj:`s` is chosen so that the impact of the farthest instance 
     124is 0.001. 
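
The following sketch shows the weighting in plain Python (an illustration 
only, not Orange's actual implementation; ``neighbor_weights`` is our own 
helper, and ``ts`` holds the distances or ranks of the retrieved neighbors):: 

    import math 

    def neighbor_weights(ts): 
        # choose s so that the farthest neighbor (largest t, assumed > 0) 
        # gets weight exp(-t_max^2/s^2) = 0.001 
        s2 = max(ts) ** 2 / math.log(1000.0) 
        return [math.exp(-t ** 2 / s2) for t in ts] 

    # with rank_weight, ts are the ranks 1..k; otherwise, the distances 
    print neighbor_weights([1, 2, 3, 4, 5]) 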
     125 
     126Weighting gives the classifier a certain insensitivity to the number of 
     127neighbors used, making it possible to use large :obj:`k`'s. 
     128 
     129The classifier can handle continuous and discrete features, and can even 
     130distinguish between ordinal and nominal features. See the documentation on 
     131distance measuring for details. 
     132 
     133Examples 
     134-------- 
     135 
     136The learner will be tested on the 'iris' data set. The data will be split 
     137into training (80%) and testing (20%) instances. We will use the former 
     138for training the classifier and test it on five instances randomly 
     139selected from the latter (part of :download:`knnlearner.py <code/knnlearner.py>`): 
     140 
     141.. literalinclude:: code/knnExample1.py 
     142 
     143The output of this code is::  
     144     
     145    Iris-setosa Iris-setosa 
     146    Iris-versicolor Iris-versicolor 
     147    Iris-versicolor Iris-versicolor 
     148    Iris-setosa Iris-setosa 
     149    Iris-setosa Iris-setosa 
     150 
     151The secret to kNN's success is that the instances in the iris data set appear in 
     152three well separated clusters. The classifier's accuracy will remain 
     153excellent even with a very large or very small number of neighbors. 
     154 
     155As many experiments have shown, the choice of distance measure has neither 
     156a great nor a predictable effect on the performance of kNN classifiers, 
     157so there is not much point in changing the default. If you 
     158decide to do so anyway, set ``distance_constructor`` to an instance 
     159of one of the classes for distance measuring, as in the following 
     160part of :download:`knnlearner.py <code/knnlearner.py>`: 
     161 
     162.. literalinclude:: code/knnExample2.py 
     163 
     164The output of this code is:: 
     165 
     166    Iris-virginica Iris-versicolor 
     167    Iris-setosa Iris-setosa 
     168    Iris-versicolor Iris-versicolor 
     169    Iris-setosa Iris-setosa 
     170    Iris-setosa Iris-setosa 
     171 
     172This time the result is no longer perfect: one of the five instances is misclassified. 
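
In outline, such a configuration might look as follows (a hedged sketch, not 
the verbatim content of knnExample2.py; ``k=10`` and the Hamming distance are 
illustrative choices):: 

    import Orange 

    data = Orange.data.Table("iris") 
    knn = Orange.classification.knn.kNNLearner(k=10) 
    knn.distance_constructor = Orange.distance.Hamming() 
    classifier = knn(data) 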
     173 
     174.. index:: fnn 
     175 
     176 
     177Finding nearest neighbors 
     178------------------------- 
     179 
     180Orange provides classes for finding the nearest neighbors of a given 
     181reference instance. While smarter classes may be added in the future, there 
     182are currently only two: abstract classes that define the general behavior of 
     183neighbor-searching classes, and classes that implement brute-force search. 
     184 
     185As is the norm in Orange, the classes come in pairs: one that does the work 
     186(:class:`FindNearest`) and one that constructs it 
     187(:class:`FindNearestConstructor`); "learning" here means getting the 
     188instances and arranging them in a data structure that allows for searching. 
     189 
     190.. class:: FindNearest 
     191 
     192    A class for brute-force search for nearest neighbors. It stores its own 
     193    copy of a table of instances (not merely an Orange.data.Table with 
     194    references to another Orange.data.Table). When asked for neighbors, 
     195    it measures the distances to all instances, stores them in a heap and 
     196    returns the first k as an Orange.data.Table with references to the 
     197    instances stored in FindNearest's field ``examples``. 
     198     
     199    .. attribute:: distance 
     200     
     201        a component that measures the distance between examples 
     202     
     203    .. attribute:: examples 
     204     
     205        a stored list of instances 
     206     
     207    .. attribute:: weight_ID 
     208     
     209        ID of meta attribute with weight 
     210     
     211    .. method:: __call__(instance, k) 
     212     
     213        Return a data table with the ``k`` nearest neighbors of ``instance``. 
     214 
     215        :param instance: given instance 
     216        :type instance: :obj:`Orange.data.Instance` 
     217 
     218        :param k: number of neighbors 
     219        :type k: int 
     220 
     221        :rtype: :obj:`Orange.data.Table` 
     222     
     223.. class:: FindNearestConstructor() 
     224 
     225     
     226    A class that constructs FindNearest. It calls the inherited 
     227    distance_constructor, which constructs a distance measure. 
     228    The distance measure, along with the instances, weight_ID and 
     229    distance_ID, is then passed to the just-constructed instance 
     230    of FindNearest_BruteForce. 
     231 
     232    If several instances at the same distance compete for the last 
     233    places, the tie is resolved by randomly picking the appropriate number of 
     234    instances. A local random generator is constructed and seeded with a 
     235    constant computed from the reference instance. The effect is that 
     236    the same random neighbors are chosen for the instance each time 
     237    FindNearest_BruteForce 
     238    is called. 
     239     
     240    .. attribute:: distance_constructor 
     241     
     242        A component of class ExamplesDistanceConstructor that "learns" to 
     243        measure distances between instances. Learning can mean, for instance, 
     244        storing the ranges of continuous features or the number of values of 
     245        a discrete feature (see the page about measuring distances for more 
     246        information). The result of learning is an instance of  
     247        ExamplesDistance that should be used for measuring distances 
     248        between instances. 
     249     
     250    .. attribute:: include_same 
     251     
     252        Tells whether or not to include the examples that are the same as the reference; 
     253        the default is true. 
     254     
     255    .. method:: __call__(table, weight_ID, distance_ID) 
     256     
     257        Constructs an instance of FindNearest that would return neighbors of 
     258        a given instance, obeying weight_ID when counting them (also, some  
     259        measures of distance might consider weights as well) and storing the  
     260        distances in a meta attribute with ID distance_ID. 
     261     
     262        :param table: table of instances 
     263        :type table: Orange.data.Table 
     264         
     265        :param weight_ID: id of meta attribute with weights of instances 
     266        :type weight_ID: int 
     267         
     268        :param distance_ID: id of meta attribute that will save distances 
     269        :type distance_ID: int 
     270         
     271        :rtype: :class:`FindNearest` 
     272 
     273Examples 
     274-------- 
     275 
     276The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`) 
     277shows how to find the five nearest neighbors of the first instance 
     278in the lenses dataset. 
     279 
     280.. literalinclude:: code/knnInstanceDistance.py 
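
In outline, such a script might look as follows (a hedged sketch rather than 
the verbatim content of knnInstanceDistance.py; it assumes 
``Orange.feature.Descriptor.new_meta_id`` for allocating the meta id):: 

    import Orange 

    data = Orange.data.Table("lenses") 

    nnc = Orange.classification.knn.FindNearestConstructor() 
    nnc.distance_constructor = Orange.distance.Euclidean() 

    dist_id = Orange.feature.Descriptor.new_meta_id() 
    nn = nnc(data, 0, dist_id)    # weight_ID=0: instances are unweighted 

    for neighbor in nn(data[0], 5): 
        print neighbor, neighbor[dist_id] 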
     281 
     282 
    1283.. automodule:: Orange.classification.knn 
  • docs/reference/rst/Orange.classification.logreg.rst

    r9818 r10246  
    103103 
    104104 
    105     .. method:: __call__(examples, weight_id) 
     105    .. method:: __call__(data, weight_id) 
    106106 
    107107        Performs the fitting. There can be two different cases: either 
  • docs/reference/rst/Orange.classification.rst

    r10227 r10246  
    55################################### 
    66 
    7 All Orange prediction models for classification consist of two parts, 
    8 a learner and a classifier. A learner is constructed with all parameters that 
    9 will be used for learning. When learner is called with a data table, 
    10 a model is fitted to the data and returned in the form of a 
    11 Classifier, which is then used for predicting the dependent variable(s) of 
    12 new instances. 
     7Induction of models in Orange is implemented through a two-class schema: 
     8"learners" are classes that induce models, and "classifiers" represent 
     9trained models. The learner holds the parameters that 
     10are used for fitting the model. When a learner is called with a data table, 
     11it fits a model and returns it as an instance of a classifier. Classifiers can subsequently be used to predict dependent values for new data instances. 
    1312 
    1413.. literalinclude:: code/bayes-run.py 
     
    1918 
    2019.. toctree:: 
    21    :maxdepth: 2 
     20   :maxdepth: 1 
    2221 
    2322   Orange.classification.bayes 
     
    3130   Orange.classification.classfromvar 
    3231    
     32Base classes 
     33------------ 
     34 
     35All learners and classifiers, including regressors, are derived from the following two classes. 
     36 
     37.. class:: Learner() 
     38 
     39    Base class for all Orange learners. 
     40 
     41    .. method:: __call__(data) 
     42 
     43        Fit a model and return it as an instance of :class:`Classifier`. 
     44 
     45        This method is abstract and needs to be implemented on each learner. 
     46 
     47.. class:: Classifier() 
     48 
     49    Base class for all Orange classifiers. 
     50 
     51    .. method:: __call__(instance, return_type=GetValue) 
     52 
     53        Classify a new instance using this model. The result depends on 
     54        the second argument, which must be one of the following: 
     55 
     56        :obj:`Orange.classification.Classifier.GetValue` 
     57 
     58            Return the value of the target class when performing prediction. 
     59 
     60        :obj:`Orange.classification.Classifier.GetProbabilities` 
     61 
     62            Return the probability of each target class when performing prediction. 
     63 
     64        :obj:`Orange.classification.Classifier.GetBoth` 
     65 
     66            Return a tuple of the target class value and probabilities for each class. 
     67 
     68        This method is abstract and needs to be implemented on each 
     69        classifier. 
     70 
     71        :param instance: data instance to be classified. 
     72        :type instance: :class:`~Orange.data.Instance` 
     73 
     74        :param return_type: what needs to be predicted 
     75        :type return_type: :obj:`GetBoth`, 
     76                           :obj:`GetValue`, 
     77                           :obj:`GetProbabilities` 
     78 
     79        :rtype: :class:`~Orange.data.Value`, 
     80              :class:`~Orange.statistics.distribution.Distribution` or a 
     81              tuple with both 
     82 
     83 
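For illustration, a minimal pair derived from these classes might look as 
follows (a sketch only; ``ModeLearner`` and ``ModeClassifier`` are 
hypothetical names, not part of Orange):: 

    import Orange 

    class ModeClassifier(Orange.classification.Classifier): 
        def __init__(self, value): 
            self.value = value 

        def __call__(self, instance, 
                     return_type=Orange.classification.Classifier.GetValue): 
            # a complete classifier would also honor GetProbabilities 
            # and GetBoth 
            return self.value 

    class ModeLearner(Orange.classification.Learner): 
        def __call__(self, data): 
            # "fit" by storing the most frequent class of the training data 
            dist = Orange.statistics.distribution.Distribution( 
                data.domain.class_var, data) 
            return ModeClassifier(dist.modus()) 

    data = Orange.data.Table("lenses") 
    classifier = ModeLearner()(data) 
    print classifier(data[0]) 
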
    3384Constant Classifier 
    3485------------------- 
    3586 
    36 The classification module also contains a classifier that always predicts 
    37 constant values regardless of given data instances. It is usually not used 
     87The classification module also contains a classifier that always predicts a 
     88constant value regardless of given data instances. It is usually not used 
    3889directly, but through other learners and methods, such as 
    3990:obj:`~Orange.classification.majority.MajorityLearner`. 
     
    74125        :type distribution: :obj:`Orange.statistics.distribution.Distribution` 
    75126        
    76     .. method:: __call__(instances, return_type) 
     127    .. method:: __call__(data, return_type) 
    77128         
    78129        ConstantClassifier always returns the same prediction 
     
    82133 
    83134 
    84 Writing custom Classifiers 
    85 -------------------------- 
    86  
    87 When developing new prediction models, one should extend :obj:`Learner` and 
    88 :obj:`Classifier`\. Code that infers the model from the data should be placed 
    89 in learner's :obj:`~Learner.__call__` method. This method should 
    90 return a :obj:`Classifier`. Classifiers' :obj:`~Classifier.__call__` method 
    91 should  return the prediction; :class:`~Orange.data.Value`, 
    92 :class:`~Orange.statistics.distribution.Distribution` or a tuple with both 
    93 based on the value of the parameter :obj:`return_type`. 
    94  
    95 .. class:: Learner() 
    96  
    97     Base class for all orange learners. 
    98  
    99     .. method:: __call__(instances) 
    100  
    101         Fit a model and return it as an instance of :class:`Classifier`. 
    102  
    103         This method is abstract and needs to be implemented on each learner. 
    104  
    105 .. class:: Classifier() 
    106  
    107     Base class for all orange classifiers. 
    108  
    109     .. attribute:: GetValue 
    110  
    111         Return value of the target class when performing prediction. 
    112  
    113     .. attribute:: GetProbabilities 
    114  
    115         Return probability of each target class when performing prediction. 
    116  
    117     .. attribute:: GetBoth 
    118  
    119         Return a tuple of target class value and probabilities for each class. 
    120  
    121  
    122     .. method:: __call__(instance, return_type) 
    123  
    124         Classify a new instance using this model. 
    125  
    126         This method is abstract and needs to be implemented on each classifier. 
    127  
    128         :param instance: data instance to be classified. 
    129         :type instance: :class:`~Orange.data.Instance` 
    130  
    131         :param return_type: what needs to be predicted 
    132         :type return_type: :obj:`GetBoth`, 
    133                            :obj:`GetValue`, 
    134                            :obj:`GetProbabilities` 
    135  
    136         :rtype: :class:`~Orange.data.Value`, 
    137               :class:`~Orange.statistics.distribution.Distribution` or a 
    138               tuple with both 
  • docs/reference/rst/Orange.data.domain.rst

    r10086 r10246  
    266266         :type class_vars: list 
    267267 
    268      .. method:: __init__(features, class_variable[, class_vars=]) 
     268     .. method:: __init__(features, class_var[, class_vars=]) 
    269269 
    270270         Construct a domain with the given list of features and the 
  • docs/reference/rst/Orange.data.filter.rst

    r10165 r10246  
    4343        return either ``True`` or ``False``. 
    4444 
    45     .. method:: __call__(table) 
     45    .. method:: __call__(data) 
    4646 
    4747        Return a new data table containing the instances that match 
  • docs/reference/rst/Orange.data.instance.rst

    r9958 r10246  
    302302        :type key_type: `type`` 
    303303 
    304     .. method:: has_meta(meta_attr) 
     304    .. method:: has_meta(attr) 
    305305 
    306306        Return ``True`` if the data instance has the specified meta 
    307307        attribute. 
    308308 
    309         :param meta_attr: meta attribute 
    310         :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
    311  
    312     .. method:: remove_meta(meta_attr) 
     309        :param attr: meta attribute 
     310        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
     311 
     312    .. method:: remove_meta(attr) 
    313313 
    314314        Remove the specified meta attribute. 
    315315 
    316         :param meta_attr: meta attribute 
    317         :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
    318  
    319     .. method:: get_weight(meta_attr) 
     316        :param attr: meta attribute 
     317        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
     318 
     319    .. method:: get_weight(attr) 
    320320 
    321321        Return the value of the specified meta attribute. The 
    322322        attribute's value must be continuous and is returned as ``float``. 
    323323 
    324         :param meta_attr: meta attribute 
    325         :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
    326  
    327     .. method:: set_weight(meta_attr, weight=1) 
     324        :param attr: meta attribute 
     325        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
     326 
     327    .. method:: set_weight(attr, weight=1) 
    328328 
    329329        Set the value of the specified meta attribute to ``weight``. 
    330330 
    331         :param meta_attr: meta attribute 
    332         :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
     331        :param attr: meta attribute 
     332        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor` 
    333333        :param weight: weight of instance 
    334334        :type weight: ``float`` 
  • docs/reference/rst/Orange.data.sample.rst

    r10073 r10246  
    8383    .. method:: __call__(data) 
    8484 
    85         Return a list of indices. The argument can be either the 
    86         desired length of the list or a set of instances, given as 
    87         :obj:`Orange.data.Table` or as plain Python list. In the 
    88         former case, sampling cannot be stratified. 
     85        Return a list of indices for the given data table. If data has 
     86        a discrete class, sampling can be stratified. 
     87 
     88    .. method:: __call__(n) 
     89 
     90        Return a list of ``n`` indices. Sampling cannot be stratified. 
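
    For example, using the derived :obj:`SubsetIndices2` (a hedged sketch; 
    ``p0`` is the fraction of instances put into the first set):: 

        >>> import Orange 
        >>> data = Orange.data.Table("lenses") 
        >>> si2 = Orange.data.sample.SubsetIndices2(p0=0.25) 
        >>> indices = si2(data)   # stratified if the class is discrete 
        >>> indices = si2(10)     # a plain list of ten 0/1 indices 
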
    8991 
    9092.. class:: SubsetIndices2 
  • docs/reference/rst/Orange.data.table.rst

    r10069 r10246  
    222222        same for all matching instances from both tables. 
    223223 
    224     .. method:: append(inst) 
     224    .. method:: append(instance) 
    225225 
    226226        Append the given instance to the end of the table. 
    227227 
    228         :param inst: instance to be appended 
    229         :type inst: :obj:`Orange.data.Instance` or a list 
     228        :param instance: instance to be appended 
     229        :type instance: :obj:`Orange.data.Instance` or a list 
    230230 
    231231        .. literalinclude:: code/datatable1.py 
     
    240240 
    241241 
    242     .. method:: select(filter[, idx, negate=False]) 
     242    .. method:: select(folds[, select, negate=False]) 
    243243 
    244244        Return a subset of instances as a new :obj:`Table`. The first 
     
    248248        list. 
    249249 
    250         If the second argument is given, it must be an integer; 
    251         select will then return the data instances for which the 
    252         corresponding `filter`'s elements match `idx`. 
    253  
    254         The third argument, `negate`, can only be given as a 
    255         keyword. Its effect is to negate the selection. 
     250        If the second argument is given, it must be an integer; method 
     251        ``select`` will then return the data instances for which the 
     252        corresponding elements of ``folds`` match the value of the 
     253        argument ``select``. 
     254 
     255        The third argument, ``negate``, inverts the selection. It can 
     256        only be given as a keyword. 
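
        For example (a hedged sketch; ``data`` is any loaded data table):: 

            >>> cv = Orange.data.sample.SubsetIndicesCV(folds=4) 
            >>> folds = cv(data) 
            >>> fold0 = data.select(folds, 0) 
            >>> rest = data.select(folds, 0, negate=True) 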
    256257 
    257258        Note: This method should be used when the selected data 
    258         instances are going to be modified. In all other cases, 
    259         method :obj:`select_ref` is preferred. 
    260  
    261         :param filt: filter list 
    262         :type filt: list of integers 
    263         :param idx: selects which instances to pick 
    264         :type idx: int 
    265         :param negate: negates the selection 
     259        instances are going to be modified later on. In all other 
     260        cases, method :obj:`select_ref` is preferred. 
     261 
     262        :param folds: list of fold indices corresponding to data instances 
     263        :type folds: list 
     264        :param select: select which instances to pick 
     265        :type select: int 
     266        :param negate: inverts the selection 
    266267        :type negate: bool 
    267268        :rtype: :obj:`Orange.data.Table` 
     
    307308            [9.000000] 
    308309 
    309     .. method:: select_ref(filt[, idx, negate=False]) 
     310    .. method:: select_ref(folds[, select, negate=False]) 
    310311 
    311312        Same as :obj:`select`, except that the resulting table 
     
    316317        since it consumes less memory. 
    317318 
    318         :param filt: filter list 
    319         :type filt: list of integers 
    320         :param idx: selects which instances to pick 
    321         :type idx: int 
    322         :param negate: negates the selection 
     319        :param folds: list of fold indices corresponding to data instances 
     320        :type folds: list 
     321        :param select: select which instances to pick 
     322        :type select: int 
     323        :param negate: inverts the selection 
    323324        :type negate: bool 
    324325        :rtype: :obj:`Orange.data.Table` 
    325  
    326     .. method:: select_list(filt[, idx, negate=False]) 
    327  
    328         Same as :obj:`select`, except that it returns a Python list 
    329         with data instances. 
    330  
    331         :param filt: filter list 
    332         :type filt: list of integers 
    333         :param idx: selects which instances to pick 
    334         :type idx: int 
    335         :param negate: negates the selection 
    336         :type negate: bool 
    337         :rtype: list 
    338326 
    339327    .. method:: get_items(indices) 
     
    412400            Same as the above two, except that they return a table 
    413401            with references to instances instead of their copies. 
    414  
    415     .. method:: filter_list(conditions), filter_list(filter) 
    416  
    417             As above, except that it returns a pure Python list with 
    418             data instances. 
    419402 
    420403    .. method:: filter_bool(conditions), filter_bool(filter) 
     
    432415            :rtype: :obj:`Orange.data.Table` 
    433416 
    434     .. method:: translate(features[, keep_metas]) 
     417    .. method:: translate(variables[, keep_metas]) 
    435418 
    436419            Similar to above, except that the domain is given by a 
     
    439422            original domain. 
    440423 
    441             :param features: features for the new data 
    442             :type domain: list 
     424            :param variables: variables for the new data 
     425            :type variables: list 
    443426            :rtype: :obj:`Orange.data.Table` 
    444427 
     
    575558            :rtype: None 
    576559 
    577     .. method:: sort([features]) 
    578  
    579             Sort the data by attribute values. The argument gives the 
    580             features ordered by importance. If omitted, the order from 
    581             the domain is used. Note that the values of discrete 
     560    .. method:: sort([variables]) 
     561 
     562            Sort the data table. The argument lists the 
     563            variables ordered by importance. If omitted, the order from 
     564            the domain is used. Values of discrete 
    582565            features are not ordered alphabetically but according to 
    583566            the :obj:`Orange.feature.Discrete.values`. 
     
    592575            Randomly shuffle the data instances. 
    593576 
    594     .. method:: add_meta_attribute(id[, value=1]) 
     577    .. method:: add_meta_attribute(attr[, value=1]) 
    595578 
    596579            Add a meta value to all data instances. The first argument 
     
    598581            of a meta attribute registered in the domain. 
    599582 
    600     .. method:: remove_meta_attribute(id) 
     583    .. method:: remove_meta_attribute(attr) 
    601584 
    602585            Remove a meta attribute from all data instances. 
  • docs/reference/rst/Orange.distance.rst

    r9821 r10246  
    3838.. class:: DistanceConstructor 
    3939 
    40     .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
     40    .. method:: __call__([data, weightID][, distributions][, basic_stat]) 
    4141 
    4242        Constructs a :obj:`Distance`. Not all arguments are required. 
    43         Most measures can be constructed from basic_var_stat; if it is 
     43        Most measures can be constructed from basic_stat; if it is 
    4444        not given, instances or distributions can be used. 
    4545 
  • docs/reference/rst/Orange.feature.descriptor.rst

    r10169 r10246  
    8686           :rtype: :class:`Orange.data.Value` 
    8787 
    88     .. method:: compute_value(inst) 
     88    .. method:: compute_value(instance) 
    8989 
    9090           Compute the value of the variable given the instance by 
     
    132132            there is no base value. 
    133133 
     134    .. method:: __init__(name) 
     135 
     136        Construct a descriptor for a variable with the given name. 
     137 
    134138    .. method:: add_value(s) 
    135139 
     
    187191 
    188192        The range used for :obj:`randomvalue`. 
     193 
     194    .. method:: __init__(name) 
     195 
     196        Construct a descriptor for a variable with the given name. 
     197 
    189198 
    190199String variables 
     
    213222    string, enclose the string in double quotes; these are removed when the 
    214223    string is loaded. 
     224 
     225    .. method:: __init__(name) 
     226 
     227        Construct a descriptor for a variable with the given name. 
    215228 
    216229Python objects as variables 
  • docs/reference/rst/Orange.feature.discretization.rst

    r10137 r10246  
    7575.. class:: Discretization 
    7676 
    77     .. method:: __call__(feature, data[, weightID]) 
    78  
    79         Given a continuous ``feature``, ``data`` and, optionally id of 
    80         attribute with example weight, this function returns a discretized 
    81         feature. Argument ``feature`` can be a descriptor, index or 
    82         name of the attribute. 
     77    .. method:: __call__(variable, data[, weightID]) 
     78 
     79        Given a continuous ``variable``, ``data`` and, optionally, the id 
     80        of an attribute with example weights, this function returns a 
     81        discretized feature. Argument ``variable`` can be a 
     82        :obj:`~Orange.feature.Descriptor`, index or name of the 
     83        variable within ``data.domain``. 
    8384 
    8485 
     
    230231    attribute from an existing one. 
    231232 
    232     .. method:: construct_variable(feature) 
    233  
    234         Constructs a descriptor for a new feature. The new feature's name is equal to ``feature.name`` 
    235         prefixed by "D\_". Its symbolic values are discretizer specific. 
     233    .. method:: construct_variable(variable) 
     234 
     235        Constructs a descriptor for a new variable. The new variable's 
     236        name is equal to ``variable.name`` prefixed by "D\_". Its 
     237        symbolic values are specific to the discretizer. 
    236238 
    237239.. class:: IntervalDiscretizer 
  • docs/reference/rst/Orange.statistics.contingency.rst

    r9372 r10246  
     1.. py:currentmodule::Orange.statistics.contingency 
     2 
     3.. index:: Contingency table 
     4 
     5================= 
     6Contingency table 
     7================= 
     8 
     9Contingency tables contain conditional distributions. Unless explicitly 
     10normalized, they contain absolute frequencies, that is, the number of 
     11instances with a particular combination of two variables' values. If they are 
     12normalized by dividing each cell by the row sum, they represent conditional 
     13probabilities of the column variable (here denoted as ``innerVariable``) 
     14conditioned by the row variable (``outerVariable``). 
     15 
     16Contingency tables are usually constructed for discrete variables. Tables 
     17for continuous variables have certain limitations described in a :ref:`separate 
     18section <contcont>`. 
     19 
     20The example below loads the monks-1 data set and prints out the conditional 
     21class distribution given the value of `e`. 
     22 
     23.. literalinclude:: code/statistics-contingency.py 
     24    :lines: 1-7 
     25 
     26This code prints out:: 
     27 
     28    1 <0.000, 108.000> 
     29    2 <72.000, 36.000> 
     30    3 <72.000, 36.000> 
     31    4 <72.000, 36.000>  
     32 
     33Contingencies behave like lists of distributions (in this case, class 
     34distributions) indexed by values (of `e`, in this 
     35example). Distributions are, in turn, indexed by values (class values, 
     36here). The variable `e` from the above example is called the outer 
     37variable, and the class is the inner. This can also be reversed. It is 
     38also possible to use features for both the outer and the inner variable, so 
     39the table shows distributions of one variable's values given the 
     40value of another.  There is a corresponding hierarchy of classes: 
     41:obj:`Table` is a base class for :obj:`VarVar` (both 
     42variables are attributes) and :obj:`Class` (one variable is 
     43the class).  The latter is the base class for 
     44:obj:`VarClass` and :obj:`ClassVar`. 
     45 
     46The most commonly used of the above classes is :obj:`VarClass` which 
     47can compute and store conditional probabilities of classes given the feature value. 
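
A short sketch of typical use (hedged; the monks-1 data and the feature `e` 
are the same as in the example above):: 

    import Orange 

    data = Orange.data.Table("monks-1") 
    cont = Orange.statistics.contingency.VarClass(data.domain["e"], data) 
    print cont.p_class("1")         # class distribution given e=1 
    print cont.p_class("1", "1")    # p(y=1 | e=1) 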
     48 
     49Contingency tables 
     50================== 
     51 
     52.. class:: Table 
     53 
     54    Provides a base class for storing and manipulating contingency 
     55    tables. Although it is not abstract, it is seldom used directly but rather 
     56    through more convenient derived classes described below. 
     57 
     58    .. attribute:: outerVariable 
     59 
     60       Outer variable (:class:`Orange.feature.Descriptor`) whose values are 
     61       used as the first, outer index. 
     62 
     63    .. attribute:: innerVariable 
     64 
     65       Inner variable (:class:`Orange.feature.Descriptor`), whose values are 
     66       used as the second, inner index. 
     67  
     68    .. attribute:: outerDistribution 
     69 
     70        The marginal distribution (:class:`Distribution`) of the outer variable. 
     71 
     72    .. attribute:: innerDistribution 
     73 
     74        The marginal distribution (:class:`Distribution`) of the inner variable. 
     75         
     76    .. attribute:: innerDistributionUnknown 
     77 
     78        The distribution (:class:`distribution.Distribution`) of the inner variable for 
     79        instances for which the outer variable was undefined. This is the 
     80        difference between the ``innerDistribution`` and the (unconditional) 
     81        distribution of the inner variable. 
     82       
     83    .. attribute:: varType 
     84 
     85        The type of the outer variable (:obj:`Orange.feature.Type`, usually 
     86        :obj:`Orange.feature.Discrete` or 
     87        :obj:`Orange.feature.Continuous`); equals 
     88        ``outerVariable.varType`` and ``outerDistribution.varType``. 
     89 
     90    .. method:: __init__(outer_variable, inner_variable) 
     91      
     92        Construct an instance of contingency table for the given pair of 
     93        variables. 
     94      
     95        :param outer_variable: Descriptor of the outer variable 
     96        :type outer_variable: Orange.feature.Descriptor 
     97        :param inner_variable: Descriptor of the inner variable 
     98        :type inner_variable: Orange.feature.Descriptor 
     99         
     100    .. method:: add(outer_value, inner_value[, weight=1]) 
     101     
     102        Add an element to the contingency table by adding ``weight`` to the 
     103        corresponding cell. 
     104 
     105        :param outer_value: The value for the outer variable 
     106        :type outer_value: int, float, string or :obj:`Orange.data.Value` 
     107        :param inner_value: The value for the inner variable 
     108        :type inner_value: int, float, string or :obj:`Orange.data.Value` 
     109        :param weight: Instance weight 
     110        :type weight: float 
     111 
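    For illustration, a table can also be filled manually (a sketch; ``data`` 
    is a loaded data table and ``e`` one of its discrete features):: 

        cont = Orange.statistics.contingency.Table(data.domain["e"], 
                                                   data.domain.class_var) 
        for inst in data: 
            cont.add(inst["e"], inst.get_class()) 
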
     112    .. method:: normalize() 
     113 
     114        Normalize all distributions (rows) in the table to sum to ``1``:: 
     115         
     116            >>> cont.normalize() 
     117            >>> for val, dist in cont.items(): 
     118                   print val, dist 
     119 
     120        Output: :: 
     121 
     122            1 <0.000, 1.000> 
     123            2 <0.667, 0.333> 
     124            3 <0.667, 0.333> 
     125            4 <0.667, 0.333> 
     126 
     127        .. note:: 
     128        
     129            This method does not change the ``innerDistribution`` or 
     130            ``outerDistribution``. 
     131         
     132    With respect to indexing, a contingency table is a cross between a 
     133    dictionary and a list. It supports the standard dictionary methods 
     134    ``keys``, ``values`` and ``items``. :: 
     135 
     136        >>> print cont.keys() 
     137        ['1', '2', '3', '4'] 
     138        >>> print cont.values() 
     139        [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>] 
     140        >>> print cont.items() 
     141        [('1', <0.000, 108.000>), ('2', <72.000, 36.000>), 
     142        ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]  
     143 
     144    Although the keys returned by the above functions are strings, a contingency 
     145    can be indexed by anything that can be converted into a value of the outer 
     146    variable: strings, numbers or instances of ``Orange.data.Value``. :: 
     147 
     148        >>> print cont[0] 
     149        <0.000, 108.000> 
     150        >>> print cont["1"] 
     151        <0.000, 108.000> 
     152        >>> print cont[orange.Value(data.domain["e"], "1")]  
     153 
     154    The length of the table equals the number of values of the outer 
     155    variable. However, iterating through a contingency 
     156    does not return keys, as with dictionaries, but distributions. :: 
     157 
     158        >>> for i in cont: 
     159        ...     print i 
     160        <0.000, 108.000> 
     161        <72.000, 36.000> 
     162        <72.000, 36.000> 
     163        <72.000, 36.000> 
     165 
     166 
     167.. class:: Class 
     168 
     169    An abstract base class for contingency tables that contain the class, 
     170    either as the inner or the outer variable. 
     171 
     172    .. attribute:: classVar (read only) 
     173     
     174        The class attribute descriptor; always equal to either 
     175        :obj:`Table.innerVariable` or :obj:`Table.outerVariable`. 
     176 
     177    .. attribute:: variable 
     178     
     179        The variable; always equal to either ``innerVariable`` or ``outerVariable``. 
     180 
     181    .. method:: add_var_class(variable_value, class_value[, weight=1]) 
     182 
     183        Add an element to the contingency by increasing the corresponding count. The 
     184        difference between this and :obj:`Table.add` is that the variable 
     185        value is always the first argument and class value the second, 
     186        regardless of which one is inner and which one is outer. 
     187 
     188        :param variable_value: Variable value 
     189        :type variable_value: int, float, string or :obj:`Orange.data.Value` 
     190        :param class_value: Class value 
     191        :type class_value: int, float, string or :obj:`Orange.data.Value` 
     192        :param weight: Instance weight 
     193        :type weight: float 
     194 
     195 
     196.. class:: VarClass 
     197 
     198    A class derived from :obj:`Class` in which the variable is 
     199    used as :obj:`Table.outerVariable` and class as the 
     200    :obj:`Table.innerVariable`. This form is suitable for 
     201    computing conditional class probabilities given the variable value. 
     202     
     203    Calling :obj:`VarClass.add_var_class(v, c)` is equivalent to 
     204    :obj:`Table.add(v, c)`. Like :obj:`Table`, 
     205    :obj:`VarClass` can compute contingency from instances. 
     206 
     207    .. method:: __init__(feature, class_variable) 
     208 
     209        Construct an instance of :obj:`VarClass` for the given pair of 
     210        variables. Inherited from :obj:`Table`. 
     211 
     212        :param feature: Outer variable 
     213        :type feature: Orange.feature.Descriptor 
     214        :param class_variable: Class variable; used as ``innerVariable`` 
     215        :type class_variable: Orange.feature.Descriptor 
     216         
     217    .. method:: __init__(feature, data[, weightId]) 
     218 
     219        Compute the contingency table from data. 
     220 
     221        :param feature: Outer variable 
     222        :type feature: Orange.feature.Descriptor 
     223        :param data: A set of instances 
     224        :type data: Orange.data.Table 
     225        :param weightId: meta attribute with weights of instances 
     226        :type weightId: int 
     227 
     228    .. method:: p_class(value) 
     229 
     230        Return the probability distribution of classes given the value of the 
     231        variable. 
     232 
     233        :param value: The value of the variable 
     234        :type value: int, float, string or :obj:`Orange.data.Value` 
     235        :rtype: Orange.statistics.distribution.Distribution 
     236 
     237 
     238    .. method:: p_class(value, class_value) 
     239 
     240        Returns the conditional probability of the class_value given the 
     241        feature value, p(class_value|value) (note the order of arguments!) 
     242         
     243        :param value: The value of the variable 
     244        :type value: int, float, string or :obj:`Orange.data.Value` 
     245        :param class_value: The class value 
     246        :type class_value: int, float, string or :obj:`Orange.data.Value` 
     247        :rtype: float 
     248 
     249    .. literalinclude:: code/statistics-contingency3.py 
     250        :lines: 1-23 
     251 
     252    The inner and the outer variable and their relations to the class are 
     253    as follows:: 
     254 
     255        Inner variable:  y 
     256        Outer variable:  e 
     257     
     258        Class variable:  y 
     259        Feature:         e 
     260 
     261    Distributions are normalized, and probabilities are elements from the 
     262    normalized distributions. Knowing that the target concept is 
     263    y := (e=1) or (a=b), distributions are as expected: when e equals 1, class 1 
     264    has a 100% probability, while for the rest, probability is one third, which 
     265    agrees with the probability that two three-valued independent features 
     266    have the same value. :: 
     267 
     268        Distributions: 
     269          p(.|1) = <0.000, 1.000> 
     270          p(.|2) = <0.662, 0.338> 
     271          p(.|3) = <0.659, 0.341> 
     272          p(.|4) = <0.669, 0.331> 
     273     
     274        Probabilities of class '1' 
     275          p(1|1) = 1.000 
     276          p(1|2) = 0.338 
     277          p(1|3) = 0.341 
     278          p(1|4) = 0.331 
     279     
     280        Distributions from a matrix computed manually: 
     281          p(.|1) = <0.000, 1.000> 
     282          p(.|2) = <0.662, 0.338> 
     283          p(.|3) = <0.659, 0.341> 
     284          p(.|4) = <0.669, 0.331> 
     285 
     286 
     287.. class:: ClassVar 
     288 
     289    :obj:`ClassVar` is similar to :obj:`VarClass` except 
     290    that the class is outside and the variable is inside. This form of 
     291    contingency table is suitable for computing conditional probabilities of 
     292    variable given the class. All methods get the two arguments in the same 
     293    order as :obj:`VarClass`. 
     294 
     295    .. method:: __init__(feature, class_variable) 
     296 
     297        Construct an instance of :obj:`ClassVar` for the given pair of 
     298        variables. Inherited from :obj:`Table`, except for the reversed 
     299        order of arguments. 
     300 
     301        :param feature: Outer variable 
     302        :type feature: Orange.feature.Descriptor 
     303        :param class_variable: Class variable 
     304        :type class_variable: Orange.feature.Descriptor 
     305         
     306    .. method:: __init__(feature, data[, weightId]) 
     307 
     308        Compute contingency table from the data. 
     309 
     310        :param feature: Descriptor of the outer variable 
     311        :type feature: Orange.feature.Descriptor 
     312        :param data: A set of instances 
     313        :type data: Orange.data.Table 
     314        :param weightId: meta attribute with weights of instances 
     315        :type weightId: int 
     316 
     317    .. method:: p_attr(class_value) 
     318 
     319        Return the probability distribution of the variable given the class. 
     320 
     321        :param class_value: The class value 
     322        :type class_value: int, float, string or :obj:`Orange.data.Value` 
     323        :rtype: Orange.statistics.distribution.Distribution 
     324 
     325    .. method:: p_attr(value, class_value) 
     326 
     327        Returns the conditional probability of the value given the 
     328        class, p(value|class_value). 
     329 
     330        :param value: Value of the variable 
     331        :type value: int, float, string or :obj:`Orange.data.Value` 
     332        :param class_value: Class value 
     333        :type class_value: int, float, string or :obj:`Orange.data.Value` 
     334        :rtype: float 
     335 
     336    .. literalinclude:: code/statistics-contingency4.py 
     337        :lines: 1-27 
     338 
     339    The roles of the feature and the class are reversed compared to 
     340    :obj:`VarClass`:: 
     341     
     342        Inner variable:  e 
     343        Outer variable:  y 
     344     
     345        Class variable:  y 
     346        Feature:         e 
     347     
     348    Distributions given the class can be printed out by calling :meth:`p_attr`. 
     349     
     350    .. literalinclude:: code/statistics-contingency4.py 
     351        :lines: 30-31 
     352     
     353    will print:: 
 
     354        p(.|0) = <0.000, 0.333, 0.333, 0.333> 
     355        p(.|1) = <0.500, 0.167, 0.167, 0.167> 
     356     
     357    If the class value is '0', the attribute `e` cannot be `1` (the first 
     358    value), while distribution across other values is uniform.  If the class 
     359    value is `1`, `e` is `1` for exactly half of instances, and distribution of 
     360    other values is again uniform. 
     361 
     362.. class:: VarVar 
     363 
     364    Contingency table in which none of the variables is the class.  The class 
     365    is derived from :obj:`Table`, and adds an additional constructor and 
     366    method for getting conditional probabilities. 
     367 
     368    .. method:: __init__(outer_variable, inner_variable) 
     369 
     370        Inherited from :obj:`Table`. 
     371 
     372    .. method:: __init__(outer_variable, inner_variable, data[, weightId]) 
     373 
     374        Compute the contingency from the given instances. 
     375 
     376        :param outer_variable: Outer variable 
     377        :type outer_variable: Orange.feature.Descriptor 
     378        :param inner_variable: Inner variable 
     379        :type inner_variable: Orange.feature.Descriptor 
     380        :param data: A set of instances 
     381        :type data: Orange.data.Table 
     382        :param weightId: meta attribute with weights of instances 
     383        :type weightId: int 
     384 
     385    .. method:: p_attr(outer_value) 
     386 
     387        Return the probability distribution of the inner variable given the 
     388        outer variable value. 
     389 
     390        :param outer_value: The value of the outer variable 
     391        :type outer_value: int, float, string or :obj:`Orange.data.Value` 
     392        :rtype: Orange.statistics.distribution.Distribution 
     393  
     394    .. method:: p_attr(outer_value, inner_value) 
     395 
     396        Return the conditional probability of the inner_value 
     397        given the outer_value. 
     398 
     399        :param outer_value: The value of the outer variable 
     400        :type outer_value: int, float, string or :obj:`Orange.data.Value` 
     401        :param inner_value: The value of the inner variable 
     402        :type inner_value: int, float, string or :obj:`Orange.data.Value` 
     403        :rtype: float 
     404 
     405    The following example investigates which material is used for 
     406    bridges of different lengths. 
     407     
     408    .. literalinclude:: code/statistics-contingency5.py 
     409        :lines: 1-17 
     410 
     411    Short bridges are mostly wooden or iron, while the longer ones (and most 
     412    of the middle-sized ones) are made of steel:: 
     413     
     414        SHORT: 
     415           WOOD (56%) 
     416           IRON (44%) 
     417     
     418        MEDIUM: 
     419           WOOD (9%) 
     420           IRON (11%) 
     421           STEEL (79%) 
     422     
     423        LONG: 
     424           STEEL (100%) 
     425     
     426    Like all other contingency tables, this one can also be computed manually. 
     427     
     428    .. literalinclude:: code/statistics-contingency5.py 
     429        :lines: 18- 
     430 
     431 
     432Contingencies for entire domain 
     433=============================== 
     434 
     435A list of contingency tables, either :obj:`VarClass` or 
     436:obj:`ClassVar`. 
     437 
     438.. class:: Domain 
     439 
     440    .. method:: __init__(data[, weight_id=0, class_is_outer=False]) 
     441 
     442        Compute a list of contingency tables. 
     443 
     444        :param data: A set of instances 
     445        :type data: Orange.data.Table 
     446        :param weight_id: meta attribute with weights of instances 
     447        :type weight_id: int 
     448        :param class_is_outer: ``True`` if the class is the outer variable 
     449        :type class_is_outer: bool 
     450 
     451        .. note:: 
     452         
     453            ``class_is_outer`` needs to be given as keyword argument. 
     454 
     455    .. attribute:: class_is_outer (read only) 
     456 
     457        Tells whether the class is the outer or the inner variable. 
     458 
     459    .. attribute:: classes 
     460 
     461        Contains the distribution of class values on the entire dataset. 
     462 
     463    .. method:: normalize() 
     464 
     465        Call normalize for all contingencies. 
     466 
     467    The following script prints the contingency tables for features 
     468    "a", "b" and "e" for the dataset Monk 1. 
     469         
     470    .. literalinclude:: code/statistics-contingency8.py 
     471        :lines: 9 
     472 
     473    Contingency tables of type :obj:`VarClass` give 
     474    the conditional distributions of classes, given the value of the variable. 
     475     
     476    .. literalinclude:: code/statistics-contingency8.py 
     477        :lines: 12-  
     478 
     479.. _contcont: 
     480 
     481Contingency tables for continuous variables 
     482=========================================== 
     483 
     484If the outer variable is continuous, the index must be one of the 
     485values that do exist in the contingency table; other values raise an 
     486exception: 
     487 
     488.. literalinclude:: code/statistics-contingency6.py 
     489    :lines: 1-4,17- 
     490 
     491Since even rounding can be a problem, the only safe way to get the key 
     492is to take it from the contingencies' ``keys``. 
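
For instance (a sketch; ``cont`` is a contingency whose outer variable is 
continuous):: 

    key = cont.keys()[0] 
    print cont[key] 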
     493 
     494Contingency tables with discrete outer variable and continuous inner variables 
     495are more useful, since methods :obj:`VarClass.p_class` 
     496and :obj:`ClassVar.p_attr` use the primitive density estimation 
     497provided by :obj:`Orange.statistics.distribution.Distribution`. 
     498 
     499For example, :obj:`ClassVar` on the iris dataset can return the 
     500probability of the sepal length 5.5 for different classes: 
     501 
     502.. literalinclude:: code/statistics-contingency7.py 
     503 
     504The script outputs:: 
     505 
     506    Estimated frequencies for e=5.5 
     507      f(5.5|Iris-setosa) = 2.000 
     508      f(5.5|Iris-versicolor) = 5.000 
     509      f(5.5|Iris-virginica) = 1.000 
     510 
     511""" 
     512 
     513 
    1514.. automodule:: Orange.statistics.contingency 
  • docs/reference/rst/Orange.statistics.estimate.rst

    r10102 r10246  
    174174    Constructor of an unconditional probability estimator. 
    175175 
    176     .. method:: __call__([distribution[, apriori]], [instances[, weight_id]]) 
     176    .. method:: __call__([distribution[, prior]], [instances[, weight_id]]) 
    177177 
    178178        :param distribution: input distribution. 
    179179        :type distribution: :class:`~Orange.statistics.distribution.Distribution` 
    180180 
    181         :param apriori: prior distribution. 
     181        :param prior: prior distribution. 
    182182        :type prior: :class:`~Orange.statistics.distribution.Distribution` 
    183183 
     
    224224    Constructor of a conditional probability estimator. 
    225225 
    226     .. method:: __call__([table[, apriori]], [instances[, weight_id]]) 
     226    .. method:: __call__([table[, prior]], [instances[, weight_id]]) 
    227227 
    228228        :param table: input contingency table. 
    229229        :type table: :class:`Orange.statistics.contingency.Table` 
    230230 
    231         :param apriori: prior distribution. 
     231        :param prior: prior distribution. 
    232232        :type prior: :class:`~Orange.statistics.distribution.Distribution` 
    233233 
     
    342342    constructor. 
    343343 
    344     .. method:: __call__([table[, apriori]], [instances[, weight_id]], estimator) 
     344    .. method:: __call__([table[, prior]], [instances[, weight_id]], estimator) 
    345345 
    346346        :param table: input contingency table. 
    347347        :type table: :class:`Orange.statistics.contingency.Table` 
    348348 
    349         :param apriori: prior distribution. 
     349        :param prior: prior distribution. 
    350350        :type prior: :class:`~Orange.statistics.distribution.Distribution` 
    351351 
  • docs/reference/rst/index.rst

    r9917 r10246  
    44 
    55.. toctree:: 
    6    :maxdepth: 3 
     6   :maxdepth: 2 
    77 
    88   Orange.data 
     
    1010   Orange.feature 
    1111 
    12    Orange.associate 
    13  
    1412   Orange.classification 
    1513 
    16    Orange.clustering 
    17  
    18    Orange.distance 
    19  
     14   Orange.regression 
     15    
     16   Orange.statistics 
     17    
    2018   Orange.ensemble 
    2119 
     
    2523    
    2624   Orange.multitarget 
     25 
     26   Orange.associate 
     27 
     28   Orange.clustering 
     29 
     30   Orange.distance 
    2731 
    2832   Orange.network 
     
    3438   Orange.projection 
    3539 
    36    Orange.regression 
    37     
    38    Orange.statistics 
    39     
    4040   Orange.misc 
    4141 