Files:
5 added
2 deleted
49 edited

  • Orange/classification/logreg.py

    r9671 r9818  
    1 """ 
    2 .. index: logistic regression 
    3 .. index: 
    4    single: classification; logistic regression 
    5  
    6 ******************************** 
    7 Logistic regression (``logreg``) 
    8 ******************************** 
    9  
    10 Implements `logistic regression 
    11 <http://en.wikipedia.org/wiki/Logistic_regression>`_ with an extension for 
    12 proper treatment of discrete features.  The algorithm can handle various 
    13 anomalies in features, such as constant variables and singularities, that 
    14 could make fitting of logistic regression almost impossible. Stepwise 
    15 logistic regression, which iteratively selects the most informative 
    16 features, is also supported. 
    17  
    18 Logistic regression is a popular classification method that comes 
    19 from statistics. The model is described by a linear combination of 
    20 features, weighted by the coefficients, 
    21  
    22 .. math:: 
    23      
    24     F = \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + \\cdots + \\beta_k X_k 
    25  
    26 and the probability (p) of a class value is computed as: 
    27  
    28 .. math:: 
    29  
    30     p = \\frac{\\exp(F)}{1 + \\exp(F)} 
    31  
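For instance, given fitted coefficients, the class probability can be computed
directly (a minimal sketch; ``beta`` and the feature vector ``x`` are
placeholders, not part of this module)::

    import math

    def logistic_prob(beta, x):
        # F = beta_0 + beta_1*x_1 + ... + beta_k*x_k
        F = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        return math.exp(F) / (1 + math.exp(F))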
    32  
    33 .. class :: LogRegClassifier 
    34  
    35     :obj:`LogRegClassifier` stores estimated values of regression 
    36     coefficients and their significances, and uses them to predict 
    37     classes and class probabilities using the equations described above. 
    38  
    39     .. attribute :: beta 
    40  
    41         Estimated regression coefficients. 
    42  
    43     .. attribute :: beta_se 
    44  
    45         Estimated standard errors for regression coefficients. 
    46  
    47     .. attribute :: wald_Z 
    48  
    49         Wald Z statistics for beta coefficients. Wald Z is computed 
    50         as beta/beta_se. 
    51  
    52     .. attribute :: P 
    53  
    54         List of P-values for the beta coefficients, that is, the 
    55         significance of each coefficient's difference from 0.0. The 
    56         P-values are computed from the squared Wald Z statistics, which 
    57         follow a chi-square distribution. 
    58  
    59     .. attribute :: likelihood 
    60  
    61         The probability of the sample (i.e. the learning examples) observed on 
    62         the basis of the derived model, as a function of the regression 
    63         parameters. 
    64  
    65     .. attribute :: fitStatus 
    66  
    67         Tells how the model fitting ended: either regularly 
    68         (:obj:`LogRegFitter.OK`), or it was interrupted because a beta 
    69         coefficient escaped towards infinity (:obj:`LogRegFitter.Infinity`) 
    70         or because the values did not converge (:obj:`LogRegFitter.Divergence`). 
    71         The value hints at the classifier's reliability; the classifier 
    72         itself is usable in any of these cases. 
    73  
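    A sketch of how these statistics relate (assuming SciPy is available;
    ``lr`` stands for a fitted :obj:`LogRegClassifier`)::

        from scipy.stats import chi2

        wald_z = [b / se for b, se in zip(lr.beta, lr.beta_se)]
        # the squared Wald Z statistic is compared against a chi-square
        # distribution (one degree of freedom for a single coefficient)
        p_values = [chi2.sf(z * z, 1) for z in wald_z]
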
    74 .. autoclass:: LogRegLearner 
    75  
    76 .. class:: LogRegFitter 
    77  
    78     :obj:`LogRegFitter` is the abstract base class for logistic fitters. It 
    79     defines the form of the call operator and the constants denoting the 
    80     fitting's (un)success: 
    81  
    82     .. attribute:: OK 
    83  
    84         The fitter converged to the optimal fit. 
    85  
    86     .. attribute:: Infinity 
    87  
    88         The fitter failed because one or more beta coefficients escaped towards infinity. 
    89  
    90     .. attribute:: Divergence 
    91  
    92         Beta coefficients failed to converge, but none of them escaped towards infinity. 
    93  
    94     .. attribute:: Constant 
    95  
    96         There is a constant attribute that causes the matrix to be singular. 
    97  
    98     .. attribute:: Singularity 
    99  
    100         The matrix is singular. 
    101  
    102  
    103     .. method:: __call__(examples, weightID) 
    104  
    105         Performs the fitting. There can be two different cases: either 
    106         the fitting succeeded in finding a set of beta coefficients 
    107         (although possibly with difficulties) or the fitting failed 
    108         altogether. The two cases return different results. 
    109  
    110         `(status, beta, beta_se, likelihood)` 
    111             The fitter managed to fit the model. The first element of 
    112             the tuple, status, reports any problems that occurred; it can 
    113             be either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In 
    114             the latter two cases, the returned values may still be useful 
    115             for making predictions, but it is recommended that you inspect 
    116             the coefficients and their errors before deciding whether to 
    117             use the model. 
    118  
    119         `(status, attribute)` 
    120             The fitter failed and the returned attribute is responsible 
    121             for it. The type of failure is reported in status, which 
    122             can be either :obj:`Constant` or :obj:`Singularity`. 
    123  
    124         The proper way of calling the fitter is to expect and handle all 
    125         the situations described. For instance, if fitter is an instance 
    126         of a fitter class and examples contains a set of suitable examples, 
    127         a script should look like this:: 
    128  
    129             res = fitter(examples) 
    130             if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]: 
    131                status, beta, beta_se, likelihood = res 
    132                < proceed by doing something with what you got > 
    133             else: 
    134                status, attr = res 
    135                < remove the attribute or complain to the user or ... > 
    136  
    137  
    138 .. class :: LogRegFitter_Cholesky 
    139  
    140     :obj:`LogRegFitter_Cholesky` is the sole fitter available at the 
    141     moment. It is a C++ translation of `Alan Miller's logistic regression 
    142     code <http://users.bigpond.net.au/amiller/>`_. It uses the Newton-Raphson 
    143     algorithm, fitting the model by iteratively solving reweighted least 
    144     squares problems on the learning examples. 
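
To use a specific fitter, set the learner's ``fitter`` attribute before
learning (a sketch; the data loading mirrors the examples below)::

    import Orange
    learner = Orange.classification.logreg.LogRegLearner()
    learner.fitter = Orange.classification.logreg.LogRegFitter_Cholesky()
    classifier = learner(Orange.data.Table("titanic"))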
    145  
    146  
    147 .. autoclass:: StepWiseFSS 
    148 .. autofunction:: dump 
    149  
    150  
    151  
    152 Examples 
    153 -------- 
    154  
    155 The first example shows a very simple induction of a logistic regression 
    156 classifier (:download:`logreg-run.py <code/logreg-run.py>`, uses :download:`titanic.tab <code/titanic.tab>`). 
    157  
    158 .. literalinclude:: code/logreg-run.py 
    159  
    160 Result:: 
    161  
    162     Classification accuracy: 0.778282598819 
    163  
    164     class attribute = survived 
    165     class values = <no, yes> 
    166  
    167         Attribute       beta  st. error     wald Z          P OR=exp(beta) 
    168  
    169         Intercept      -1.23       0.08     -15.15      -0.00 
    170      status=first       0.86       0.16       5.39       0.00       2.36 
    171     status=second      -0.16       0.18      -0.91       0.36       0.85 
    172      status=third      -0.92       0.15      -6.12       0.00       0.40 
    173         age=child       1.06       0.25       4.30       0.00       2.89 
    174        sex=female       2.42       0.14      17.04       0.00      11.25 
    175  
    176 The next example shows how to handle singularities in data sets 
    177 (:download:`logreg-singularities.py <code/logreg-singularities.py>`, uses :download:`adult_sample.tab <code/adult_sample.tab>`). 
    178  
    179 .. literalinclude:: code/logreg-singularities.py 
    180  
    181 The first few lines of the output of this script are:: 
    182  
    183     <=50K <=50K 
    184     <=50K <=50K 
    185     <=50K <=50K 
    186     >50K >50K 
    187     <=50K >50K 
    188  
    189     class attribute = y 
    190     class values = <>50K, <=50K> 
    191  
    192                                Attribute       beta  st. error     wald Z          P OR=exp(beta) 
    193  
    194                                Intercept       6.62      -0.00       -inf       0.00 
    195                                      age      -0.04       0.00       -inf       0.00       0.96 
    196                                   fnlwgt      -0.00       0.00       -inf       0.00       1.00 
    197                            education-num      -0.28       0.00       -inf       0.00       0.76 
    198                  marital-status=Divorced       4.29       0.00        inf       0.00      72.62 
    199             marital-status=Never-married       3.79       0.00        inf       0.00      44.45 
    200                 marital-status=Separated       3.46       0.00        inf       0.00      31.95 
    201                   marital-status=Widowed       3.85       0.00        inf       0.00      46.96 
    202     marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63 
    203         marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19 
    204                  occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72 
    205  
    206 If :obj:`removeSingular` is set to 0, inducing a logistic regression 
    207 classifier raises an error:: 
    208  
    209     Traceback (most recent call last): 
    210       File "logreg-singularities.py", line 4, in <module> 
    211         lr = classification.logreg.LogRegLearner(table, removeSingular=0) 
    212       File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner 
    213         return lr(examples, weightID) 
    214       File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__ 
    215         lr = learner(examples, weight) 
    216     orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
    217  
    218 We can see that the attribute workclass is causing a singularity. 
    219  
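Setting :obj:`removeSingular` to 1 instead makes the learner remove such
features automatically (a sketch mirroring the call from the traceback above)::

    lr = classification.logreg.LogRegLearner(table, removeSingular=1)
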
    220 The example below shows how stepwise logistic regression can improve 
    221 classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`, uses :download:`ionosphere.tab <code/ionosphere.tab>`): 
    222  
    223 .. literalinclude:: code/logreg-stepwise.py 
    224  
    225 The output of this script is:: 
    226  
    227     Learner      CA 
    228     logistic     0.841 
    229     filtered     0.846 
    230  
    231     Number of times attributes were used in cross-validation: 
    232      1 x a21 
    233     10 x a22 
    234      8 x a23 
    235      7 x a24 
    236      1 x a25 
    237     10 x a26 
    238     10 x a27 
    239      3 x a28 
    240      7 x a29 
    241      9 x a31 
    242      2 x a16 
    243      7 x a12 
    244      1 x a32 
    245      8 x a15 
    246     10 x a14 
    247      4 x a17 
    248      7 x a30 
    249     10 x a11 
    250      1 x a10 
    251      1 x a13 
    252     10 x a34 
    253      2 x a19 
    254      1 x a18 
    255     10 x a3 
    256     10 x a5 
    257      4 x a4 
    258      4 x a7 
    259      8 x a6 
    260     10 x a9 
    261     10 x a8 
    262  
    263 """ 
    264  
    265 from Orange.core import LogRegLearner, LogRegClassifier, LogRegFitter, LogRegFitter_Cholesky 
    266  
    2671import Orange 
    268 import math, os 
    269 import warnings 
    270 from numpy import * 
    271 from numpy.linalg import * 
    272  
    273  
    274 ########################################################################## 
    275 ## Print out methods 
     2from Orange.misc import deprecated_keywords, deprecated_members 
     3import math 
     4from numpy import dot, array, identity, reshape, diagonal, \ 
     5    transpose, concatenate, sqrt, sign 
     6from numpy.linalg import inv 
     7from Orange.core import LogRegClassifier, LogRegFitter, LogRegFitter_Cholesky 
    2768 
    2779def dump(classifier): 
    278     """ Formatted string of all major features in logistic 
    279     regression classifier.  
    280  
    281     :param classifier: logistic regression classifier 
     10    """ Return a formatted string of all major features in logistic regression 
     11    classifier. 
     12 
     13    :param classifier: logistic regression classifier. 
    28214    """ 
    28315 
    28416    # print out class values 
    28517    out = [''] 
    286     out.append("class attribute = " + classifier.domain.classVar.name) 
    287     out.append("class values = " + str(classifier.domain.classVar.values)) 
     18    out.append("class attribute = " + classifier.domain.class_var.name) 
     19    out.append("class values = " + str(classifier.domain.class_var.values)) 
    28820    out.append('') 
    28921     
    29022    # get the longest attribute name 
    29123    longest=0 
    292     for at in classifier.continuizedDomain.attributes: 
     24    for at in classifier.continuized_domain.features: 
    29325        if len(at.name)>longest: 
    294             longest=len(at.name); 
     26            longest=len(at.name) 
    29527 
    29628    # print out the head 
     
    30133    out.append(formatstr % ("Intercept", classifier.beta[0], classifier.beta_se[0], classifier.wald_Z[0], classifier.P[0])) 
    30234    formatstr = "%"+str(longest)+"s %10.2f %10.2f %10.2f %10.2f %10.2f"     
    303     for i in range(len(classifier.continuizedDomain.attributes)): 
    304         out.append(formatstr % (classifier.continuizedDomain.attributes[i].name, classifier.beta[i+1], classifier.beta_se[i+1], classifier.wald_Z[i+1], abs(classifier.P[i+1]), math.exp(classifier.beta[i+1]))) 
     35    for i in range(len(classifier.continuized_domain.features)): 
     36        out.append(formatstr % (classifier.continuized_domain.features[i].name, classifier.beta[i+1], classifier.beta_se[i+1], classifier.wald_Z[i+1], abs(classifier.P[i+1]), math.exp(classifier.beta[i+1]))) 
    30537 
    30638    return '\n'.join(out) 
     
    30840 
    30941def has_discrete_values(domain): 
    310     for at in domain.attributes: 
    311         if at.varType == Orange.core.VarTypes.Discrete: 
    312             return 1 
    313     return 0 
     42    """ 
     43    Return True if the given domain contains any discrete features, else False. 
     44 
     45    :param domain: domain. 
     46    :type domain: :class:`Orange.data.Domain` 
     47    """ 
     48    return any(at.var_type == Orange.data.Type.Discrete 
     49               for at in domain.features) 
     50 
    31451 
    31552class LogRegLearner(Orange.classification.Learner): 
    31653    """ Logistic regression learner. 
    31754 
    318     Implements logistic regression. If data instances are provided to 
     55    If data instances are provided to 
    31956    the constructor, the learning algorithm is called and the resulting 
    32057    classifier is returned instead of the learner. 
    32158 
    322     :param table: data table with either discrete or continuous features 
    323     :type table: Orange.data.Table 
    324     :param weightID: the ID of the weight meta attribute 
    325     :type weightID: int 
    326     :param removeSingular: set to 1 if you want automatic removal of disturbing features, such as constants and singularities 
    327     :type removeSingular: bool 
    328     :param fitter: the fitting algorithm (by default the Newton-Raphson fitting algorithm is used) 
    329     :param stepwiseLR: set to 1 if you wish to use stepwise logistic regression 
    330     :type stepwiseLR: bool 
    331     :param addCrit: parameter for stepwise feature selection 
    332     :type addCrit: float 
    333     :param deleteCrit: parameter for stepwise feature selection 
    334     :type deleteCrit: float 
    335     :param numFeatures: parameter for stepwise feature selection 
    336     :type numFeatures: int 
     59    :param instances: data table with either discrete or continuous features 
     60    :type instances: Orange.data.Table 
     61    :param weight_id: the ID of the weight meta attribute 
     62    :type weight_id: int 
     63    :param remove_singular: set to 1 if you want automatic removal of 
     64        disturbing features, such as constants and singularities 
     65    :type remove_singular: bool 
     66    :param fitter: the fitting algorithm (by default the Newton-Raphson 
     67        fitting algorithm is used) 
     68    :param stepwise_lr: set to 1 if you wish to use stepwise logistic 
     69        regression 
     70    :type stepwise_lr: bool 
     71    :param add_crit: parameter for stepwise feature selection 
     72    :type add_crit: float 
     73    :param delete_crit: parameter for stepwise feature selection 
     74    :type delete_crit: float 
     75    :param num_features: parameter for stepwise feature selection 
     76    :type num_features: int 
    33777    :rtype: :obj:`LogRegLearner` or :obj:`LogRegClassifier` 
    33878 
    33979    """ 
    340     def __new__(cls, instances=None, weightID=0, **argkw): 
     80 
     81    @deprecated_keywords({"weightID": "weight_id"}) 
     82    def __new__(cls, instances=None, weight_id=0, **argkw): 
    34183        self = Orange.classification.Learner.__new__(cls, **argkw) 
    34284        if instances: 
    34385            self.__init__(**argkw) 
    344             return self.__call__(instances, weightID) 
     86            return self.__call__(instances, weight_id) 
    34587        else: 
    34688            return self 
    34789 
    348     def __init__(self, removeSingular=0, fitter = None, **kwds): 
     90    @deprecated_keywords({"removeSingular": "remove_singular"}) 
     91    def __init__(self, remove_singular=0, fitter = None, **kwds): 
    34992        self.__dict__.update(kwds) 
    350         self.removeSingular = removeSingular 
     93        self.remove_singular = remove_singular 
    35294        self.fitter = fitter 
    35295 
    353     def __call__(self, examples, weight=0): 
     96    @deprecated_keywords({"examples": "instances"}) 
     97    def __call__(self, instances, weight=0): 
     98        """Learn from the given table of data instances. 
     99 
     100        :param instances: Data instances to learn from. 
     101        :type instances: :class:`~Orange.data.Table` 
     102        :param weight: ID of the meta attribute holding instance weights 
     103        :type weight: int 
     104        :rtype: :class:`~Orange.classification.logreg.LogRegClassifier` 
     105        """ 
    354106        imputer = getattr(self, "imputer", None) or None 
    355         if getattr(self, "removeMissing", 0): 
    356             examples = Orange.core.Preprocessor_dropMissing(examples) 
     107        if getattr(self, "remove_missing", 0): 
     108            instances = Orange.core.Preprocessor_dropMissing(instances) 
    357109##        if hasDiscreteValues(examples.domain): 
    358110##            examples = createNoDiscTable(examples) 
    359         if not len(examples): 
     111        if not len(instances): 
    360112            return None 
    361         if getattr(self, "stepwiseLR", 0): 
    362             addCrit = getattr(self, "addCrit", 0.2) 
    363             removeCrit = getattr(self, "removeCrit", 0.3) 
    364             numFeatures = getattr(self, "numFeatures", -1) 
    365             attributes = StepWiseFSS(examples, addCrit = addCrit, deleteCrit = removeCrit, imputer = imputer, numFeatures = numFeatures) 
    366             tmpDomain = Orange.core.Domain(attributes, examples.domain.classVar) 
    367             tmpDomain.addmetas(examples.domain.getmetas()) 
    368             examples = examples.select(tmpDomain) 
    369         learner = Orange.core.LogRegLearner() 
    370         learner.imputerConstructor = imputer 
     113        if getattr(self, "stepwise_lr", 0): 
     114            add_crit = getattr(self, "add_crit", 0.2) 
     115            delete_crit = getattr(self, "delete_crit", 0.3) 
     116            num_features = getattr(self, "num_features", -1) 
     117            attributes = StepWiseFSS(instances, add_crit= add_crit, 
     118                delete_crit=delete_crit, imputer = imputer, num_features= num_features) 
     119            tmp_domain = Orange.data.Domain(attributes, 
     120                instances.domain.class_var) 
     121            tmp_domain.addmetas(instances.domain.getmetas()) 
     122            instances = instances.select(tmp_domain) 
     123        learner = Orange.core.LogRegLearner() # Yes, it has to be from core. 
     124        learner.imputer_constructor = imputer 
    371125        if imputer: 
    372             examples = self.imputer(examples)(examples) 
    373         examples = Orange.core.Preprocessor_dropMissing(examples) 
     126            instances = self.imputer(instances)(instances) 
     127        instances = Orange.core.Preprocessor_dropMissing(instances) 
    374128        if self.fitter: 
    375129            learner.fitter = self.fitter 
    376         if self.removeSingular: 
    377             lr = learner.fitModel(examples, weight) 
     130        if self.remove_singular: 
     131            lr = learner.fit_model(instances, weight) 
    378132        else: 
    379             lr = learner(examples, weight) 
    380         while isinstance(lr, Orange.core.Variable): 
     133            lr = learner(instances, weight) 
     134        while isinstance(lr, Orange.data.variable.Variable): 
    381135            if isinstance(lr.getValueFrom, Orange.core.ClassifierFromVar) and isinstance(lr.getValueFrom.transformer, Orange.core.Discrete2Continuous): 
    382136                lr = lr.getValueFrom.variable 
    383             attributes = examples.domain.attributes[:] 
     137            attributes = instances.domain.features[:] 
    384138            if lr in attributes: 
    385139                attributes.remove(lr) 
    386140            else: 
    387141                attributes.remove(lr.getValueFrom.variable) 
    388             newDomain = Orange.core.Domain(attributes, examples.domain.classVar) 
    389             newDomain.addmetas(examples.domain.getmetas()) 
    390             examples = examples.select(newDomain) 
    391             lr = learner.fitModel(examples, weight) 
     142            new_domain = Orange.data.Domain(attributes,  
     143                instances.domain.class_var) 
     144            new_domain.addmetas(instances.domain.getmetas()) 
     145            instances = instances.select(new_domain) 
     146            lr = learner.fit_model(instances, weight) 
    392147        return lr 
    393148 
    394  
     149LogRegLearner = deprecated_members({"removeSingular": "remove_singular", 
     150                                    "weightID": "weight_id", 
     151                                    "stepwiseLR": "stepwise_lr", 
     152                                    "addCrit": "add_crit", 
     153                                    "deleteCrit": "delete_crit", 
     154                                    "numFeatures": "num_features", 
     155                                    "removeMissing": "remove_missing" 
     156                                    })(LogRegLearner) 
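# Note (assumed behaviour of the deprecation helpers from Orange.misc): calls
# using the old camelCase names, e.g. LogRegLearner(data, removeSingular=1),
# are remapped to the new snake_case names, typically with a warning.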
    395157 
    396158class UnivariateLogRegLearner(Orange.classification.Learner): 
     
    406168        self.__dict__.update(kwds) 
    407169 
    408     def __call__(self, examples): 
    409         examples = createFullNoDiscTable(examples) 
    410         classifiers = map(lambda x: LogRegLearner(Orange.core.Preprocessor_dropMissing(examples.select(Orange.core.Domain(x, examples.domain.classVar)))), examples.domain.attributes) 
    411         maj_classifier = LogRegLearner(Orange.core.Preprocessor_dropMissing(examples.select(Orange.core.Domain(examples.domain.classVar)))) 
     170    @deprecated_keywords({"examples": "instances"}) 
     171    def __call__(self, instances): 
     172        instances = createFullNoDiscTable(instances) 
     173        classifiers = map(lambda x: LogRegLearner(Orange.core.Preprocessor_dropMissing( 
     174            instances.select(Orange.data.Domain(x,  
     175            instances.domain.class_var)))), instances.domain.features) 
     176        maj_classifier = LogRegLearner(Orange.core.Preprocessor_dropMissing 
     177            (instances.select(Orange.data.Domain(instances.domain.class_var)))) 
    412178        beta = [maj_classifier.beta[0]] + [x.beta[1] for x in classifiers] 
    413179        beta_se = [maj_classifier.beta_se[0]] + [x.beta_se[1] for x in classifiers] 
    414180        P = [maj_classifier.P[0]] + [x.P[1] for x in classifiers] 
    415181        wald_Z = [maj_classifier.wald_Z[0]] + [x.wald_Z[1] for x in classifiers] 
    416         domain = examples.domain 
     182        domain = instances.domain 
    417183 
    418184        return Univariate_LogRegClassifier(beta = beta, beta_se = beta_se, P = P, wald_Z = wald_Z, domain = domain) 
    419185 
    420 class UnivariateLogRegClassifier(Orange.core.Classifier): 
     186class UnivariateLogRegClassifier(Orange.classification.Classifier): 
    421187    def __init__(self, **kwds): 
    422188        self.__dict__.update(kwds) 
    423189 
    424     def __call__(self, example, resultType = Orange.core.GetValue): 
     190    def __call__(self, instance, resultType = Orange.classification.Classifier.GetValue): 
    425191        # classification is not implemented yet. For now its only use is to provide regression coefficients and their statistics 
    426192        pass 
     
    436202            return self 
    437203 
    438     def __init__(self, removeSingular=0, **kwds): 
     204    @deprecated_keywords({"removeSingular": "remove_singular"}) 
     205    def __init__(self, remove_singular=0, **kwds): 
    439206        self.__dict__.update(kwds) 
    440         self.removeSingular = removeSingular 
    441     def __call__(self, examples, weight=0): 
     207        self.remove_singular = remove_singular 
     208 
     209    @deprecated_keywords({"examples": "instances"}) 
     210    def __call__(self, instances, weight=0): 
    442211        # the following function extends the data set with unknown values 
    443         def createLogRegExampleTable(data, weightID): 
    444             setsOfData = [] 
    445             for at in data.domain.attributes: 
    446                 # for each attribute, create a new example table newData 
    447                 # add a new attribute (a continuous variable) to dataOrig, dataFinal and newData 
    448                 if at.varType == Orange.core.VarTypes.Continuous: 
    449                     atDisc = Orange.core.FloatVariable(at.name + "Disc") 
    450                     newDomain = Orange.core.Domain(data.domain.attributes+[atDisc,data.domain.classVar]) 
    451                     newDomain.addmetas(data.domain.getmetas()) 
    452                     newData = Orange.core.ExampleTable(newDomain,data) 
    453                     altData = Orange.core.ExampleTable(newDomain,data) 
    454                     for i,d in enumerate(newData): 
    455                         d[atDisc] = 0 
    456                         d[weightID] = 1*data[i][weightID] 
    457                     for i,d in enumerate(altData): 
    458                         d[atDisc] = 1 
     212        def createLogRegExampleTable(data, weight_id): 
     213            sets_of_data = [] 
     214            for at in data.domain.features: 
     215                # for each attribute, create a new example table new_data 
     216                # add a new attribute (a continuous variable) to dataOrig, dataFinal and new_data 
     217                if at.var_type == Orange.data.Type.Continuous: 
     218                    at_disc = Orange.data.variable.Continuous(at.name+ "Disc") 
     219                    new_domain = Orange.data.Domain(data.domain.features+[at_disc,data.domain.class_var]) 
     220                    new_domain.addmetas(data.domain.getmetas()) 
     221                    new_data = Orange.data.Table(new_domain,data) 
     222                    alt_data = Orange.data.Table(new_domain,data) 
     223                    for i,d in enumerate(new_data): 
     224                        d[at_disc] = 0 
     225                        d[weight_id] = 1*data[i][weight_id] 
     226                    for i,d in enumerate(alt_data): 
     227                        d[at_disc] = 1 
    459228                        d[at] = 0 
    460                         d[weightID] = 0.000001*data[i][weightID] 
    461                 elif at.varType == Orange.core.VarTypes.Discrete: 
    462                 # add one more value to attribute "at" in dataOrig, dataFinal and newData; its value is simply the attribute name + "X" 
    463                     atNew = Orange.core.EnumVariable(at.name, values = at.values + [at.name+"X"]) 
    464                     newDomain = Orange.core.Domain(filter(lambda x: x!=at, data.domain.attributes)+[atNew,data.domain.classVar]) 
    465                     newDomain.addmetas(data.domain.getmetas()) 
    466                     newData = Orange.core.ExampleTable(newDomain,data) 
    467                     altData = Orange.core.ExampleTable(newDomain,data) 
    468                     for i,d in enumerate(newData): 
    469                         d[atNew] = data[i][at] 
    470                         d[weightID] = 1*data[i][weightID] 
    471                     for i,d in enumerate(altData): 
    472                         d[atNew] = at.name+"X" 
    473                         d[weightID] = 0.000001*data[i][weightID] 
    474                 newData.extend(altData) 
    475                 setsOfData.append(newData) 
    476             return setsOfData 
     229                        d[weight_id] = 0.000001*data[i][weight_id] 
     230                elif at.var_type == Orange.data.Type.Discrete: 
     231                # add one more value to attribute "at" in dataOrig, dataFinal and new_data; its value is simply the attribute name + "X" 
     232                    at_new = Orange.data.variable.Discrete(at.name, values = at.values + [at.name+"X"]) 
     233                    new_domain = Orange.data.Domain(filter(lambda x: x!=at, data.domain.features)+[at_new,data.domain.class_var]) 
     234                    new_domain.addmetas(data.domain.getmetas()) 
     235                    new_data = Orange.data.Table(new_domain,data) 
     236                    alt_data = Orange.data.Table(new_domain,data) 
     237                    for i,d in enumerate(new_data): 
     238                        d[at_new] = data[i][at] 
     239                        d[weight_id] = 1*data[i][weight_id] 
     240                    for i,d in enumerate(alt_data): 
     241                        d[at_new] = at.name+"X" 
     242                        d[weight_id] = 0.000001*data[i][weight_id] 
     243                new_data.extend(alt_data) 
     244                sets_of_data.append(new_data) 
     245            return sets_of_data 
    477246                   
    478         learner = LogRegLearner(imputer = Orange.core.ImputerConstructor_average(), removeSingular = self.removeSingular) 
     247        learner = LogRegLearner(imputer=Orange.feature.imputation.ImputerConstructor_average(), 
     248            remove_singular = self.remove_singular) 
    479249        # get Original Model 
    480         orig_model = learner(examples,weight) 
     250        orig_model = learner(instances,weight) 
    481251        if orig_model.fit_status: 
    482252            print "Warning: model did not converge" 
     
    485255        if weight == 0: 
    486256            weight = Orange.data.new_meta_id() 
    487             examples.addMetaAttribute(weight, 1.0) 
    488         extended_set_of_examples = createLogRegExampleTable(examples, weight) 
     257            instances.addMetaAttribute(weight, 1.0) 
     258        extended_set_of_examples = createLogRegExampleTable(instances, weight) 
    489259        extended_models = [learner(extended_examples, weight) \ 
    490260                           for extended_examples in extended_set_of_examples] 
     
    494264##        print orig_model.domain 
    495265##        print orig_model.beta 
    496 ##        print orig_model.beta[orig_model.continuizedDomain.attributes[-1]] 
     266##        print orig_model.beta[orig_model.continuized_domain.features[-1]] 
    497267##        for i,m in enumerate(extended_models): 
    498 ##            print examples.domain.attributes[i] 
     268##            print examples.domain.features[i] 
    499269##            printOUT(m) 
    500270             
     
    505275        betas_ap = [] 
    506276        for m in extended_models: 
    507             beta_add = m.beta[m.continuizedDomain.attributes[-1]] 
     277            beta_add = m.beta[m.continuized_domain.features[-1]] 
    508278            betas_ap.append(beta_add) 
    509279            beta = beta + beta_add 
     
    514284         
    515285        # compare it to bayes prior 
    516         bayes = Orange.core.BayesLearner(examples) 
     286        bayes = Orange.classification.bayes.NaiveLearner(instances) 
    517287        bayes_prior = math.log(bayes.distribution[1]/bayes.distribution[0]) 
    518288 
     
    521291##        print "lr", orig_model.beta[0] 
    522292##        print "lr2", logistic_prior 
    523 ##        print "dist", Orange.core.Distribution(examples.domain.classVar,examples) 
     293##        print "dist", Orange.statistics.distribution.Distribution(examples.domain.class_var,examples) 
    524294##        print "prej", betas_ap 
    525295 
     
    544314        # return the original model and the corresponding a priori zeros 
    545315        return (orig_model, betas_ap) 
    546         #return (bayes_prior,orig_model.beta[examples.domain.classVar],logistic_prior) 
     316        #return (bayes_prior,orig_model.beta[examples.domain.class_var],logistic_prior) 
     317 
     318LogRegLearnerGetPriors = deprecated_members({"removeSingular": 
     319                                                 "remove_singular"} 
     320)(LogRegLearnerGetPriors) 
    547321 
    548322class LogRegLearnerGetPriorsOneTable: 
    549     def __init__(self, removeSingular=0, **kwds): 
     323    @deprecated_keywords({"removeSingular": "remove_singular"}) 
     324    def __init__(self, remove_singular=0, **kwds): 
    550325        self.__dict__.update(kwds) 
    551         self.removeSingular = removeSingular 
    552     def __call__(self, examples, weight=0): 
     326        self.remove_singular = remove_singular 
     327 
     328    @deprecated_keywords({"examples": "instances"}) 
     329    def __call__(self, instances, weight=0): 
    553330        # the following function extends the data set with unknown values 
    554331        def createLogRegExampleTable(data, weightID): 
    555             finalData = Orange.core.ExampleTable(data) 
    556             origData = Orange.core.ExampleTable(data) 
    557             for at in data.domain.attributes: 
     332            finalData = Orange.data.Table(data) 
     333            orig_data = Orange.data.Table(data) 
     334            for at in data.domain.features: 
    558335                # for each attribute, create a new example table newData 
    559336                # add a new attribute (a continuous variable) to dataOrig, dataFinal and newData 
    560                 if at.varType == Orange.core.VarTypes.Continuous: 
    561                     atDisc = Orange.core.FloatVariable(at.name + "Disc") 
    562                     newDomain = Orange.core.Domain(origData.domain.attributes+[atDisc,data.domain.classVar]) 
     337                if at.var_type == Orange.data.Type.Continuous: 
     338                    atDisc = Orange.data.variable.Continuous(at.name + "Disc") 
     339                    newDomain = Orange.data.Domain(orig_data.domain.features+[atDisc,data.domain.class_var]) 
    563340                    newDomain.addmetas(newData.domain.getmetas()) 
    564                     finalData = Orange.core.ExampleTable(newDomain,finalData) 
    565                     newData = Orange.core.ExampleTable(newDomain,origData) 
    566                     origData = Orange.core.ExampleTable(newDomain,origData) 
    567                     for d in origData: 
     341                    finalData = Orange.data.Table(newDomain,finalData) 
     342                    newData = Orange.data.Table(newDomain,orig_data) 
     343                    orig_data = Orange.data.Table(newDomain,orig_data) 
     344                    for d in orig_data: 
    568345                        d[atDisc] = 0 
    569346                    for d in finalData: 
     
    574351                        d[weightID] = 100*data[i][weightID] 
    575352                         
    576                 elif at.varType == Orange.core.VarTypes.Discrete: 
     353                elif at.var_type == Orange.data.Type.Discrete: 
    577354                # add one more value to attribute "at" in dataOrig, dataFinal and newData; its value is simply the attribute name + "X" 
    578                     atNew = Orange.core.EnumVariable(at.name, values = at.values + [at.name+"X"]) 
    579                     newDomain = Orange.core.Domain(filter(lambda x: x!=at, origData.domain.attributes)+[atNew,origData.domain.classVar]) 
    580                     newDomain.addmetas(origData.domain.getmetas()) 
    581                     temp_finalData = Orange.core.ExampleTable(finalData) 
    582                     finalData = Orange.core.ExampleTable(newDomain,finalData) 
    583                     newData = Orange.core.ExampleTable(newDomain,origData) 
    584                     temp_origData = Orange.core.ExampleTable(origData) 
    585                     origData = Orange.core.ExampleTable(newDomain,origData) 
    586                     for i,d in enumerate(origData): 
    587                         d[atNew] = temp_origData[i][at] 
     355                    at_new = Orange.data.variable.Discrete(at.name, values = at.values + [at.name+"X"]) 
     356                    newDomain = Orange.data.Domain(filter(lambda x: x!=at, orig_data.domain.features)+[at_new,orig_data.domain.class_var]) 
     357                    newDomain.addmetas(orig_data.domain.getmetas()) 
     358                    temp_finalData = Orange.data.Table(finalData) 
     359                    finalData = Orange.data.Table(newDomain,finalData) 
     360                    newData = Orange.data.Table(newDomain,orig_data) 
     361                    temp_origData = Orange.data.Table(orig_data) 
     362                    orig_data = Orange.data.Table(newDomain,orig_data) 
     363                    for i,d in enumerate(orig_data): 
     364                        d[at_new] = temp_origData[i][at] 
    588365                    for i,d in enumerate(finalData): 
    589                         d[atNew] = temp_finalData[i][at]                         
     366                        d[at_new] = temp_finalData[i][at] 
    590367                    for i,d in enumerate(newData): 
    591                         d[atNew] = at.name+"X" 
     368                        d[at_new] = at.name+"X" 
    592369                        d[weightID] = 10*data[i][weightID] 
    593370                finalData.extend(newData) 
    594371            return finalData 
    595372                   
    596         learner = LogRegLearner(imputer = Orange.core.ImputerConstructor_average(), removeSingular = self.removeSingular) 
     373        learner = LogRegLearner(imputer = Orange.feature.imputation.ImputerConstructor_average(), remove_singular = self.remove_singular) 
    597374        # get Original Model 
    598         orig_model = learner(examples,weight) 
     375        orig_model = learner(instances,weight) 
    599376 
    600377        # get extended Model (you should not change data) 
    601378        if weight == 0: 
    602379            weight = Orange.data.new_meta_id() 
    603             examples.addMetaAttribute(weight, 1.0) 
    604         extended_examples = createLogRegExampleTable(examples, weight) 
     380            instances.addMetaAttribute(weight, 1.0) 
     381        extended_examples = createLogRegExampleTable(instances, weight) 
    605382        extended_model = learner(extended_examples, weight) 
    606383 
     
    616393        betas_ap = [] 
    617394        for m in extended_models: 
    618             beta_add = m.beta[m.continuizedDomain.attributes[-1]] 
     395            beta_add = m.beta[m.continuized_domain.features[-1]] 
    619396            betas_ap.append(beta_add) 
    620397            beta = beta + beta_add 
     
    625402         
    626403        # compare it to bayes prior 
    627         bayes = Orange.core.BayesLearner(examples) 
     404        bayes = Orange.classification.bayes.NaiveLearner(instances) 
    628405        bayes_prior = math.log(bayes.distribution[1]/bayes.distribution[0]) 
    629406 
     
    632409        #print "lr", orig_model.beta[0] 
    633410        #print "lr2", logistic_prior 
    634         #print "dist", Orange.core.Distribution(examples.domain.classVar,examples) 
     411        #print "dist", Orange.statistics.distribution.Distribution(examples.domain.class_var,examples) 
    635412        k = (bayes_prior-orig_model.beta[0])/(logistic_prior-orig_model.beta[0]) 
    636413        #print "prej", betas_ap 
     
    640417        # return the original model and the corresponding a priori zeros 
    641418        return (orig_model, betas_ap) 
    642         #return (bayes_prior,orig_model.beta[data.domain.classVar],logistic_prior) 
     419        #return (bayes_prior,orig_model.beta[data.domain.class_var],logistic_prior) 
     420 
     421LogRegLearnerGetPriorsOneTable = deprecated_members({"removeSingular": 
     422                                                         "remove_singular"} 
     423)(LogRegLearnerGetPriorsOneTable) 
    643424 
    644425 
     
    655436    for i,x_i in enumerate(x): 
    656437        p_i = pr(x_i,betas)  # a fresh name; assigning to "pr" would shadow the pr() function 
    657         llh += y[i]*log(max(pr,1e-6)) + (1-y[i])*log(max(1-pr,1e-6)) 
      438        llh += y[i]*math.log(max(p_i,1e-6)) + (1-y[i])*math.log(max(1-p_i,1e-6)) 
    658439    return llh 
    659440 
    660441 
    661442def diag(vector): 
    662     mat = identity(len(vector), Float) 
     443    mat = identity(len(vector)) 
    663444    for i,v in enumerate(vector): 
    664445        mat[i][i] = v 
    665446    return mat 
    666447     
    667 class SimpleFitter(Orange.core.LogRegFitter): 
     448class SimpleFitter(LogRegFitter): 
    668449    def __init__(self, penalty=0, se_penalty = False): 
    669450        self.penalty = penalty 
    670451        self.se_penalty = se_penalty 
     452 
    671453    def __call__(self, data, weight=0): 
    672454        ml = data.native(0) 
    673         for i in range(len(data.domain.attributes)): 
    674           a = data.domain.attributes[i] 
    675           if a.varType == Orange.core.VarTypes.Discrete: 
     455        for i in range(len(data.domain.features)): 
     456          a = data.domain.features[i] 
     457          if a.var_type == Orange.data.Type.Discrete: 
    676458            for m in ml: 
    677459              m[i] = a.values.index(m[i]) 
    678460        for m in ml: 
    679           m[-1] = data.domain.classVar.values.index(m[-1]) 
     461          m[-1] = data.domain.class_var.values.index(m[-1]) 
    680462        Xtmp = array(ml) 
    681463        y = Xtmp[:,-1]   # true probabilities (1's or 0's) 
     
    683465        X=concatenate((one, Xtmp[:,:-1]),1)  # intercept first, then data 
    684466 
    685         betas = array([0.0] * (len(data.domain.attributes)+1)) 
    686         oldBetas = array([1.0] * (len(data.domain.attributes)+1)) 
     467        betas = array([0.0] * (len(data.domain.features)+1)) 
     468        oldBetas = array([1.0] * (len(data.domain.features)+1)) 
    687469        N = len(data) 
    688470 
    689         pen_matrix = array([self.penalty] * (len(data.domain.attributes)+1)) 
     471        pen_matrix = array([self.penalty] * (len(data.domain.features)+1)) 
    690472        if self.se_penalty: 
    691473            p = array([pr(X[i], betas) for i in range(len(data))]) 
    692             W = identity(len(data), Float) 
     474            W = identity(len(data)) 
    693475            pp = p * (1.0-p) 
    694476            for i in range(N): 
    695477                W[i,i] = pp[i] 
    696             se = sqrt(diagonal(inverse(matrixmultiply(transpose(X), matrixmultiply(W, X))))) 
     478            se = sqrt(diagonal(inv(dot(transpose(X), dot(W, X))))) 
    697479            for i,p in enumerate(pen_matrix): 
    698480                pen_matrix[i] *= se[i] 
     
    706488            p = array([pr(X[i], betas) for i in range(len(data))]) 
    707489 
    708             W = identity(len(data), Float) 
     490            W = identity(len(data)) 
    709491            pp = p * (1.0-p) 
    710492            for i in range(N): 
    711493                W[i,i] = pp[i] 
    712494 
    713             WI = inverse(W) 
    714             z = matrixmultiply(X, betas) + matrixmultiply(WI, y - p) 
    715  
    716             tmpA = inverse(matrixmultiply(transpose(X), matrixmultiply(W, X))+diag(pen_matrix)) 
    717             tmpB = matrixmultiply(transpose(X), y-p) 
    718             betas = oldBetas + matrixmultiply(tmpA,tmpB) 
    719 #            betaTemp = matrixmultiply(matrixmultiply(matrixmultiply(matrixmultiply(tmpA,transpose(X)),W),X),oldBetas) 
     495            WI = inv(W) 
     496            z = dot(X, betas) + dot(WI, y - p) 
     497 
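            # Newton-Raphson / IRLS update implemented below: with W = diag(p*(1-p)),
            # beta_new = beta_old + (X'WX + diag(pen_matrix))^-1 * X'(y - p)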
     498            tmpA = inv(dot(transpose(X), dot(W, X))+diag(pen_matrix)) 
     499            tmpB = dot(transpose(X), y-p) 
     500            betas = oldBetas + dot(tmpA,tmpB) 
     501#            betaTemp = dot(dot(dot(dot(tmpA,transpose(X)),W),X),oldBetas) 
    720502#            print betaTemp 
    721 #            tmpB = matrixmultiply(transpose(X), matrixmultiply(W, z)) 
    722 #            betas = matrixmultiply(tmpA, tmpB) 
     503#            tmpB = dot(transpose(X), dot(W, z)) 
     504#            betas = dot(tmpA, tmpB) 
    723505            likelihood_new = lh(X,y,betas)-self.penalty*sum([b*b for b in betas]) 
    724506            print likelihood_new 
     
    726508             
    727509             
    728 ##        XX = sqrt(diagonal(inverse(matrixmultiply(transpose(X),X)))) 
     510##        XX = sqrt(diagonal(inv(dot(transpose(X),X)))) 
    729511##        yhat = array([pr(X[i], betas) for i in range(len(data))]) 
    730 ##        ss = sum((y - yhat) ** 2) / (N - len(data.domain.attributes) - 1) 
     512##        ss = sum((y - yhat) ** 2) / (N - len(data.domain.features) - 1) 
    731513##        sigma = math.sqrt(ss) 
    732514        p = array([pr(X[i], betas) for i in range(len(data))]) 
    733         W = identity(len(data), Float) 
     515        W = identity(len(data)) 
    734516        pp = p * (1.0-p) 
    735517        for i in range(N): 
    736518            W[i,i] = pp[i] 
    737         diXWX = sqrt(diagonal(inverse(matrixmultiply(transpose(X), matrixmultiply(W, X))))) 
    738         xTemp = matrixmultiply(matrixmultiply(inverse(matrixmultiply(transpose(X), matrixmultiply(W, X))),transpose(X)),y) 
     519        diXWX = sqrt(diagonal(inv(dot(transpose(X), dot(W, X))))) 
     520        xTemp = dot(dot(inv(dot(transpose(X), dot(W, X))),transpose(X)),y) 
    739521        beta = [] 
    740522        beta_se = [] 
     
    752534    return math.exp(bx)/(1+math.exp(bx)) 
    753535 
    754 class BayesianFitter(Orange.core.LogRegFitter): 
     536class BayesianFitter(LogRegFitter): 
    755537    def __init__(self, penalty=0, anch_examples=[], tau = 0): 
    756538        self.penalty = penalty 
     
    763545        # convert data to numeric 
    764546        ml = data.native(0) 
    765         for i,a in enumerate(data.domain.attributes): 
    766           if a.varType == Orange.core.VarTypes.Discrete: 
     547        for i,a in enumerate(data.domain.features): 
     548          if a.var_type == Orange.data.Type.Discrete: 
    767549            for m in ml: 
    768550              m[i] = a.values.index(m[i]) 
    769551        for m in ml: 
    770           m[-1] = data.domain.classVar.values.index(m[-1]) 
     552          m[-1] = data.domain.class_var.values.index(m[-1]) 
    771553        Xtmp = array(ml) 
    772554        y = Xtmp[:,-1]   # true probabilities (1's or 0's) 
     
    778560        (X,y)=self.create_array_data(data) 
    779561 
    780         exTable = Orange.core.ExampleTable(data.domain) 
     562        exTable = Orange.data.Table(data.domain) 
    781563        for id,ex in self.anch_examples: 
    782             exTable.extend(Orange.core.ExampleTable(ex,data.domain)) 
     564            exTable.extend(Orange.data.Table(ex,data.domain)) 
    783565        (X_anch,y_anch)=self.create_array_data(exTable) 
    784566 
    785         betas = array([0.0] * (len(data.domain.attributes)+1)) 
     567        betas = array([0.0] * (len(data.domain.features)+1)) 
    786568 
    787569        likelihood,betas = self.estimate_beta(X,y,betas,[0]*(len(betas)),X_anch,y_anch) 
    788570 
    789571        # get attribute groups atGroup = [(startIndex, number of values), ...] 
    790         ats = data.domain.attributes 
     572        ats = data.domain.features 
    791573        atVec=reduce(lambda x,y: x+[(y,not y==x[-1][0])], [a.getValueFrom and a.getValueFrom.whichVar or a for a in ats],[(ats[0].getValueFrom and ats[0].getValueFrom.whichVar or ats[0],0)])[1:] 
    792574        atGroup=[[0,0]] 
     
    808590            print "betas", betas[0], betas_temp[0] 
    809591            sumB += betas[0]-betas_temp[0] 
    810         apriori = Orange.core.Distribution(data.domain.classVar, data) 
     592        apriori = Orange.statistics.distribution.Distribution(data.domain.class_var, data) 
    811593        aprioriProb = apriori[0]/apriori.abs 
    812594         
     
    839621            for j in range(len(betas)): 
    840622                if const_betas[j]: continue 
    841                 dl = matrixmultiply(X[:,j],transpose(y-p)) 
     623                dl = dot(X[:,j], transpose(y-p)) 
    842624                for xi,x in enumerate(X_anch): 
    843625                    dl += self.penalty*x[j]*(y_anch[xi] - pr_bx(r_anch[xi]*self.penalty)) 
    844626 
    845                 ddl = matrixmultiply(X_sq[:,j],transpose(p*(1-p))) 
     627                ddl = dot(X_sq[:,j], transpose(p*(1-p))) 
    846628                for xi,x in enumerate(X_anch): 
    847629                    ddl += self.penalty*x[j]*pr_bx(r[xi]*self.penalty)*(1-pr_bx(r[xi]*self.penalty)) 
     
    887669#  Feature subset selection for logistic regression 
    888670 
    889 def get_likelihood(fitter, examples): 
    890     res = fitter(examples) 
     671@deprecated_keywords({"examples": "instances"}) 
     672def get_likelihood(fitter, instances): 
     673    res = fitter(instances) 
    891674    if res[0] in [fitter.OK]: #, fitter.Infinity, fitter.Divergence]: 
    892675       status, beta, beta_se, likelihood = res 
    893676       if sum([abs(b) for b in beta])<sum([abs(b) for b in beta_se]): 
    894            return -100*len(examples) 
     677           return -100*len(instances) 
    895678       return likelihood 
    896679    else: 
    897        return -100*len(examples) 
     680       return -100*len(instances) 
    898681         
    899682 
    900683 
    901684class StepWiseFSS(Orange.classification.Learner): 
    902   """Implementation of algorithm described in [Hosmer and Lemeshow, Applied Logistic Regression, 2000]. 
     685  """ 
     686  Algorithm described in Hosmer and Lemeshow, 
     687  Applied Logistic Regression, 2000. 
    903688 
    904689  Perform stepwise logistic regression and return a list of the 
     
    907692  chosen feature is tested for a significant contribution to the overall 
    908693  model. If the worst among all tested features has higher significance 
    909   than is specified in :obj:`deleteCrit`, the feature is removed from 
     694  than is specified in :obj:`delete_crit`, the feature is removed from 
    910695  the model. The second step is forward selection, which is similar to 
    911696  backward elimination. It loops through all the features that are not 
    912697  in the model and tests whether they contribute to the common model 
    913   with significance lower that :obj:`addCrit`. The algorithm stops when 
     698  with significance lower than :obj:`add_crit`. The algorithm stops when 
    914699  no feature in the model is to be removed and no feature not in the 
    915   model is to be added. By setting :obj:`numFeatures` larger than -1, 
     700  model is to be added. By setting :obj:`num_features` larger than -1, 
    916701  the algorithm will stop its execution when the number of features in model 
    917702  exceeds that number. 
     
    923708  If :obj:`table` is specified, stepwise logistic regression implemented 
    924709  in :obj:`StepWiseFSS` is performed and a list of chosen features 
    925   is returned. If :obj:`table` is not specified an instance of 
    926   :obj:`StepWiseFSS` with all parameters set is returned. 
    927  
    928   :param table: data set 
     710  is returned. If :obj:`table` is not specified, an instance of 
     711  :obj:`StepWiseFSS` with all parameters set is returned and can be called 
     712  with data later. 
     713 
     714  :param table: data set. 
    929715  :type table: Orange.data.Table 
    930716 
    931   :param addCrit: "Alpha" level to judge if variable has enough importance to be added in the new set. (e.g. if addCrit is 0.2, then features is added if its P is lower than 0.2) 
    932   :type addCrit: float 
    933  
    934   :param deleteCrit: Similar to addCrit, just that it is used at backward elimination. It should be higher than addCrit! 
    935   :type deleteCrit: float 
    936  
    937   :param numFeatures: maximum number of selected features, use -1 for infinity. 
    938   :type numFeatures: int 
     717  :param add_crit: significance ("alpha") level used to judge whether a 
     718       variable is important enough to be added to the model (e.g. if 
     719       add_crit is 0.2, a feature is added if its P is lower than 0.2). 
     720  :type add_crit: float 
     721 
     722  :param delete_crit: similar to add_crit, except that it is used in 
     723      backward elimination. It should be higher than add_crit! 
     724  :type delete_crit: float 
     725 
     726  :param num_features: maximum number of selected features, 
     727      use -1 for infinity. 
     728  :type num_features: int 
    939729  :rtype: :obj:`StepWiseFSS` or list of features 
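
  A minimal usage sketch (the data set and thresholds are illustrative)::

      import Orange
      data = Orange.data.Table("ionosphere")
      chosen = StepWiseFSS(data, add_crit=0.05, delete_crit=0.2)
      print [a.name for a in chosen]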
    940730 
     
    949739          return self 
    950740 
    951  
    952   def __init__(self, addCrit=0.2, deleteCrit=0.3, numFeatures = -1, **kwds): 
     741  @deprecated_keywords({"addCrit": "add_crit", "deleteCrit": "delete_crit", 
     742                        "numFeatures": "num_features"}) 
     743  def __init__(self, add_crit=0.2, delete_crit=0.3, num_features = -1, **kwds): 
    953744    self.__dict__.update(kwds) 
    954     self.addCrit = addCrit 
    955     self.deleteCrit = deleteCrit 
    956     self.numFeatures = numFeatures 
     745    self.add_crit = add_crit 
     746    self.delete_crit = delete_crit 
     747    self.num_features = num_features 
     748 
    957749  def __call__(self, examples): 
    958750    if getattr(self, "imputer", 0): 
     
    960752    if getattr(self, "removeMissing", 0): 
    961753        examples = Orange.core.Preprocessor_dropMissing(examples) 
    962     continuizer = Orange.core.DomainContinuizer(zeroBased=1,continuousTreatment=Orange.core.DomainContinuizer.Leave, 
    963                                            multinomialTreatment = Orange.core.DomainContinuizer.FrequentIsBase, 
    964                                            classTreatment = Orange.core.DomainContinuizer.Ignore) 
     754    continuizer = Orange.preprocess.DomainContinuizer(zeroBased=1, 
     755        continuousTreatment=Orange.preprocess.DomainContinuizer.Leave, 
     756        multinomialTreatment=Orange.preprocess.DomainContinuizer.FrequentIsBase, 
     757        classTreatment=Orange.preprocess.DomainContinuizer.Ignore) 
    965758    attr = [] 
    966     remain_attr = examples.domain.attributes[:] 
     759    remain_attr = examples.domain.features[:] 
    967760 
    968761    # get LL for Majority Learner  
    969     tempDomain = Orange.core.Domain(attr,examples.domain.classVar) 
     762    tempDomain = Orange.data.Domain(attr,examples.domain.class_var) 
    970763    #tempData  = Orange.core.Preprocessor_dropMissing(examples.select(tempDomain)) 
    971764    tempData  = Orange.core.Preprocessor_dropMissing(examples.select(tempDomain)) 
    972765 
    973     ll_Old = get_likelihood(Orange.core.LogRegFitter_Cholesky(), tempData) 
     766    ll_Old = get_likelihood(LogRegFitter_Cholesky(), tempData) 
    974767    ll_Best = -1000000 
    975768    length_Old = float(len(tempData)) 
     
    989782 
    990783                tempAttr = filter(lambda x: x!=at, attr) 
    991                 tempDomain = Orange.core.Domain(tempAttr,examples.domain.classVar) 
     784                tempDomain = Orange.data.Domain(tempAttr,examples.domain.class_var) 
    992785                tempDomain.addmetas(examples.domain.getmetas()) 
    993786                # domain, calculate P for LL improvement. 
     
    995788                tempData = Orange.core.Preprocessor_dropMissing(examples.select(tempDomain)) 
    996789 
    997                 ll_Delete = get_likelihood(Orange.core.LogRegFitter_Cholesky(), tempData) 
     790                ll_Delete = get_likelihood(LogRegFitter_Cholesky(), tempData) 
    998791                length_Delete = float(len(tempData)) 
    999792                length_Avg = (length_Delete + length_Old)/2.0 
     
    1001794                G=-2*length_Avg*(ll_Delete/length_Delete-ll_Old/length_Old) 
    1002795 
    1003                 # set new worst attribute                 
     796                # set new worst attribute 
    1004797                if G<minG: 
    1005798                    worstAt = at 
     
    1008801                    length_Best = length_Delete 
    1009802            # deletion of attribute 
    1010              
    1011             if worstAt.varType==Orange.core.VarTypes.Continuous: 
     803 
     804            if worstAt.var_type==Orange.data.Type.Continuous: 
    1012805                P = lchisqprob(minG, 1) 
    1013806            else: 
    1014807                P = lchisqprob(minG, len(worstAt.values)-1) 
    1015             if P>=self.deleteCrit: 
     808            if P>=self.delete_crit: 
    1016809                attr.remove(worstAt) 
    1017810                remain_attr.append(worstAt) 
     
    1024817            nodeletion = 1 
    1025818            # END OF DELETION PART 
    1026              
     819 
    1027820        # if enough attributes have been chosen, stop the procedure 
    1028         if self.numFeatures>-1 and len(attr)>=self.numFeatures: 
     821        if self.num_features>-1 and len(attr)>=self.num_features: 
    1029822            remain_attr=[] 
    1030           
     823 
    1031824        # for each attribute in the remaining 
    1032825        maxG=-1 
     
    1036829        for at in remain_attr: 
    1037830            tempAttr = attr + [at] 
    1038             tempDomain = Orange.core.Domain(tempAttr,examples.domain.classVar) 
     831            tempDomain = Orange.data.Domain(tempAttr,examples.domain.class_var) 
    1039832            tempDomain.addmetas(examples.domain.getmetas()) 
    1040833            # domain, calculate P for LL improvement. 
    1041834            tempDomain  = continuizer(Orange.core.Preprocessor_dropMissing(examples.select(tempDomain))) 
    1042835            tempData = Orange.core.Preprocessor_dropMissing(examples.select(tempDomain)) 
    1043             ll_New = get_likelihood(Orange.core.LogRegFitter_Cholesky(), tempData) 
     836            ll_New = get_likelihood(LogRegFitter_Cholesky(), tempData) 
    1044837 
    1045838            length_New = float(len(tempData)) # get number of examples in tempData to normalize likelihood 
     
    1056849            stop = 1 
    1057850            continue 
    1058          
    1059         if bestAt.varType==Orange.core.VarTypes.Continuous: 
     851 
     852        if bestAt.var_type==Orange.data.Type.Continuous: 
    1060853            P = lchisqprob(maxG, 1) 
    1061854        else: 
    1062855            P = lchisqprob(maxG, len(bestAt.values)-1) 
    1063856        # Add attribute with smallest P to attributes(attr) 
    1064         if P<=self.addCrit: 
     857        if P<=self.add_crit: 
    1065858            attr.append(bestAt) 
    1066859            remain_attr.remove(bestAt) 
     
    1068861            length_Old = length_Best 
    1069862 
    1070         if (P>self.addCrit and nodeletion) or (bestAt == worstAt): 
     863        if (P>self.add_crit and nodeletion) or (bestAt == worstAt): 
    1071864            stop = 1 
    1072865 
    1073866    return attr 
     867 
     868StepWiseFSS = deprecated_members({"addCrit": "add_crit", 
     869                                   "deleteCrit": "delete_crit", 
     870                                   "numFeatures": "num_features"})(StepWiseFSS) 
    1074871 
    1075872 
     
    1082879        else: 
    1083880            return self 
    1084      
    1085     def __init__(self, addCrit=0.2, deleteCrit=0.3, numFeatures = -1): 
    1086         self.addCrit = addCrit 
    1087         self.deleteCrit = deleteCrit 
    1088         self.numFeatures = numFeatures 
    1089  
    1090     def __call__(self, examples): 
    1091         attr = StepWiseFSS(examples, addCrit=self.addCrit, deleteCrit = self.deleteCrit, numFeatures = self.numFeatures) 
    1092         return examples.select(Orange.core.Domain(attr, examples.domain.classVar)) 
    1093                  
     881 
     882    @deprecated_keywords({"addCrit": "add_crit", "deleteCrit": "delete_crit", 
     883                          "numFeatures": "num_features"}) 
     884    def __init__(self, add_crit=0.2, delete_crit=0.3, num_features=-1): 
     885        self.add_crit = add_crit 
     886        self.delete_crit = delete_crit 
     887        self.num_features = num_features 
     888 
     889    @deprecated_keywords({"examples": "instances"}) 
     890    def __call__(self, instances): 
     891        attr = StepWiseFSS(instances, add_crit=self.add_crit, 
     892            delete_crit=self.delete_crit, num_features=self.num_features) 
     893        return instances.select(Orange.data.Domain(attr, instances.domain.class_var)) 
     894 
     895StepWiseFSSFilter = deprecated_members({"addCrit": "add_crit", 
     896                                        "deleteCrit": "delete_crit", 
     897                                        "numFeatures": "num_features"})\ 
     898    (StepWiseFSSFilter) 
     899 
    1094900 
    1095901#################################### 
  • Orange/data/io.py

    r9671 r9799  
    101101MakeStatus = Variable.MakeStatus 
    102102 
    103 def loadARFF(filename, create_on_new = MakeStatus.Incompatible, **kwargs): 
     103def loadARFF(filename, create_on_new=MakeStatus.Incompatible, **kwargs): 
    104104    """Return :class:`Orange.data.Table` containing data from file in Weka ARFF format 
    105105       if there exists no .xml file with the same name. If it does, a multi-label 
     
    109109        filename = filename[:-5] 
    110110    if os.path.exists(filename + ".xml") and os.path.exists(filename + ".arff"): 
    111         xml_name = filename + ".xml"  
    112         arff_name = filename + ".arff"  
    113         return Orange.multilabel.mulan.trans_mulan_data(xml_name,arff_name,create_on_new) 
     111        xml_name = filename + ".xml" 
     112        arff_name = filename + ".arff" 
     113        return Orange.multilabel.mulan.trans_mulan_data(xml_name, arff_name, create_on_new) 
    114114    else: 
    115115        return loadARFF_Weka(filename, create_on_new) 
    116          
    117 def loadARFF_Weka(filename, create_on_new = Orange.data.variable.Variable.MakeStatus.Incompatible, **kwargs): 
     116 
     117def loadARFF_Weka(filename, create_on_new=Orange.data.variable.Variable.MakeStatus.Incompatible, **kwargs): 
    118118    """Return :class:`Orange.data.Table` containing data from file in Weka ARFF format""" 
    119119    if not os.path.exists(filename) and os.path.exists(filename + ".arff"): 
    120         filename = filename + ".arff"  
    121     f = open(filename,'r') 
    122      
     120        filename = filename + ".arff" 
     121    f = open(filename, 'r') 
     122 
    123123    attributes = [] 
    124124    attributeLoadStatus = [] 
    125      
     125 
    126126    name = '' 
    127127    state = 0 # header 
     
    129129    for l in f.readlines(): 
    130130        l = l.rstrip("\n") # strip \n 
    131         l = l.replace('\t',' ') # get rid of tabs 
     131        l = l.replace('\t', ' ') # get rid of tabs 
    132132        x = l.split('%')[0] # strip comments 
    133133        if len(x.strip()) == 0: 
    134134            continue 
    135135        if state == 0 and x[0] != '@': 
    136             print "ARFF import ignoring:",x 
     136            print "ARFF import ignoring:", x 
    137137        if state == 1: 
    138138            if x[0] == '{':#sparse data format, begin with '{', ends with '}' 
    139                 r = [None]*len(attributes) 
     139                r = [None] * len(attributes) 
    140140                dd = x[1:-1] 
    141141                dd = dd.split(',') 
     
    152152                    y = xs.strip(" ") 
    153153                    if len(y) > 0: 
    154                         if y[0]=="'" or y[0]=='"': 
     154                        if y[0] == "'" or y[0] == '"': 
    155155                            r.append(xs.strip("'\"")) 
    156156                        else: 
     
    177177                    while y[idx][-1] != "'": 
    178178                        idx += 1 
    179                         atn += ' '+y[idx] 
     179                        atn += ' ' + y[idx] 
    180180                    atn = atn.strip("' ") 
    181181                else: 
     
    188188                    for y in w[0].split(','): 
    189189                        sy = y.strip(" '\"") 
    190                         if len(sy)>0: 
     190                        if len(sy) > 0: 
    191191                            vals.append(sy) 
    192192                    a, s = Variable.make(atn, Orange.data.Type.Discrete, vals, [], create_on_new) 
     
    194194                    # real... 
    195195                    a, s = Variable.make(atn, Orange.data.Type.Continuous, [], [], create_on_new) 
    196                      
     196 
    197197                attributes.append(a) 
    198198                attributeLoadStatus.append(s) 
     
    201201    lex = [] 
    202202    for dd in data: 
    203         e = Orange.data.Instance(d,dd) 
     203        e = Orange.data.Instance(d, dd) 
    204204        lex.append(e) 
    205     t = Orange.data.Table(d,lex) 
     205    t = Orange.data.Table(d, lex) 
    206206    t.name = name 
    207      
    208     if hasattr(t, "attribute_load_status"): 
    209         t.attribute_load_status = attributeLoadStatus 
     207 
     208    #if hasattr(t, "attribute_load_status"): 
     209    t.setattr("attribute_load_status", attributeLoadStatus) 
    210210    return t 
    211211loadARFF = Orange.misc.deprecated_keywords( 
     
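
A short usage sketch for the ARFF reader (``iris`` is an assumed file name; the ".arff" suffix may be omitted)::

    import Orange.data.io as io

    # a matching .xml file next to the .arff would trigger the
    # multi-label Mulan reader instead of the plain Weka one
    data = io.loadARFF("iris")
    print data.name, len(data)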
    220220        filename = filename[:-5] 
    221221    #print filename 
    222     f = open(filename+'.arff','w') 
    223     f.write('@relation %s\n'%t.domain.classVar.name) 
     222    f = open(filename + '.arff', 'w') 
     223    f.write('@relation %s\n' % t.domain.classVar.name) 
    224224    # attributes 
    225225    ats = [i for i in t.domain.attributes] 
     
    240240        iname = str(i.name) 
    241241        if iname.find(" ") != -1: 
    242             iname = "'%s'"%iname 
    243         if real==1: 
    244             f.write('@attribute %s real\n'%iname) 
     242            iname = "'%s'" % iname 
     243        if real == 1: 
     244            f.write('@attribute %s real\n' % iname) 
    245245        else: 
    246             f.write('@attribute %s { '%iname) 
     246            f.write('@attribute %s { ' % iname) 
    247247            x = [] 
    248248            for j in i.values: 
    249249                s = str(j) 
    250250                if s.find(" ") == -1: 
    251                     x.append("%s"%s) 
     251                    x.append("%s" % s) 
    252252                else: 
    253                     x.append("'%s'"%s) 
     253                    x.append("'%s'" % s) 
    254254            for j in x[:-1]: 
    255                 f.write('%s,'%j) 
    256             f.write('%s }\n'%x[-1]) 
     255                f.write('%s,' % j) 
     256            f.write('%s }\n' % x[-1]) 
    257257 
    258258    # examples 
     
    263263            s = str(j[i]) 
    264264            if s.find(" ") == -1: 
    265                 x.append("%s"%s) 
     265                x.append("%s" % s) 
    266266            else: 
    267                 x.append("'%s'"%s) 
     267                x.append("'%s'" % s) 
    268268        for i in x[:-1]: 
    269             f.write('%s,'%i) 
    270         f.write('%s\n'%x[-1]) 
    271  
    272 def loadMULAN(filename, create_on_new = Orange.data.variable.Variable.MakeStatus.Incompatible, **kwargs): 
     269            f.write('%s,' % i) 
     270        f.write('%s\n' % x[-1]) 
     271 
     272def loadMULAN(filename, create_on_new=Orange.data.variable.Variable.MakeStatus.Incompatible, **kwargs): 
    273273    """Return :class:`Orange.data.Table` containing data from file in Mulan ARFF and XML format""" 
    274274    if filename[-4:] == ".xml": 
    275275        filename = filename[:-4] 
    276276    if os.path.exists(filename + ".xml") and os.path.exists(filename + ".arff"): 
    277         xml_name = filename + ".xml"  
    278         arff_name = filename + ".arff"  
    279         return Orange.multilabel.mulan.trans_mulan_data(xml_name,arff_name) 
     277        xml_name = filename + ".xml" 
     278        arff_name = filename + ".arff" 
     279        return Orange.multilabel.mulan.trans_mulan_data(xml_name, arff_name) 
    280280    else: 
    281281        return None 
     
    309309            else: 
    310310                real = 0 
    311         if real==1: 
     311        if real == 1: 
    312312            f.write('%s: continuous.\n' % i.name) 
    313313        else: 
     
    321321    # examples 
    322322    f.close() 
    323      
     323 
    324324    f = open('%s.data' % filename_prefix, 'w') 
    325325    for j in t: 
     
    331331        f.write('%s\n' % x[-1]) 
    332332 
    333 def toR(filename,t): 
     333def toR(filename, t): 
    334334    """Save :class:`Orange.data.Table` to file in R format""" 
    335335    if str.upper(filename[-2:]) == ".R": 
    336336        filename = filename[:-2] 
    337     f = open(filename+'.R','w') 
     337    f = open(filename + '.R', 'w') 
    338338 
    339339    atyp = [] 
     
    352352    for i in xrange(len(labels)): 
    353353        if atyp[i] == 2: # continuous 
    354             f.write('"%s" = c('%(labels[i])) 
     354            f.write('"%s" = c(' % (labels[i])) 
    355355            for j in xrange(len(t)): 
    356356                if t[j][i].isSpecial(): 
     
    358358                else: 
    359359                    f.write(str(t[j][i])) 
    360                 if (j == len(t)-1): 
     360                if (j == len(t) - 1): 
    361361                    f.write(')') 
    362362                else: 
     
    364364        elif atyp[i] == 1: # discrete 
    365365            if aord[i]: # ordered 
    366                 f.write('"%s" = ordered('%labels[i]) 
     366                f.write('"%s" = ordered(' % labels[i]) 
    367367            else: 
    368                 f.write('"%s" = factor('%labels[i]) 
     368                f.write('"%s" = factor(' % labels[i]) 
    369369            f.write('levels=c(') 
    370370            for j in xrange(len(as0[i].values)): 
    371                 f.write('"x%s"'%(as0[i].values[j])) 
    372                 if j == len(as0[i].values)-1: 
     371                f.write('"x%s"' % (as0[i].values[j])) 
     372                if j == len(as0[i].values) - 1: 
    373373                    f.write('),c(') 
    374374                else: 
     
    378378                    f.write('NA') 
    379379                else: 
    380                     f.write('"x%s"'%str(t[j][i])) 
    381                 if (j == len(t)-1): 
     380                    f.write('"x%s"' % str(t[j][i])) 
     381                if (j == len(t) - 1): 
    382382                    f.write('))') 
    383383                else: 
     
    385385        else: 
    386386            raise ValueError("Unknown attribute type.") 
    387         if (i < len(labels)-1): 
     387        if (i < len(labels) - 1): 
    388388            f.write(',\n') 
    389389    f.write(')\n') 
    390      
     390 
    391391def toLibSVM(filename, example): 
    392392    """Save :class:`Orange.data.Table` to file in LibSVM format""" 
    393393    import Orange.classification.svm 
    394394    Orange.classification.svm.tableToSVMFormat(example, open(filename, "wb")) 
    395      
     395 
    396396def loadLibSVM(filename, create_on_new=MakeStatus.Incompatible, **kwargs): 
    397397    """Return :class:`Orange.data.Table` containing data from file in LibSVM format""" 
     
    401401        attributeLoadStatus[attr] = s 
    402402        return attr 
    403      
     403 
    404404    def make_disc(name, unordered): 
    405405        attr, s = Orange.data.variable.make(name, Orange.data.Type.Discrete, [], unordered, create_on_new) 
    406406        attributeLoadStatus[attr] = s 
    407407        return attr 
    408      
     408 
    409409    data = [line.split() for line in open(filename, "rb").read().splitlines() if line.strip()] 
    410410    vars = type("attr", (dict,), {"__missing__": lambda self, key: self.setdefault(key, make_float(key))})() 
     
    447447                res.append(str[start:index]) 
    448448                start = find_start = index + 1 
    449                  
     449 
    450450        elif index == -1: 
    451451            res.append(str[start:]) 
    452452    return res 
    453      
     453 
    454454def is_standard_var_def(cell): 
    455455    """Is the cell a standard variable definition (empty, cont, disc, string) 
     
    460460    except ValueError, ex: 
    461461        return False 
    462      
     462 
    463463def is_var_types_row(row): 
    464464    """ Is the row a variable type definition row (as in the orange .tab file) 
    465465    """ 
    466466    return all(map(is_standard_var_def, row)) 
    467          
     467 
    468468def var_type(cell): 
    469469    """ Return variable type from a variable type definition in cell.  
     
    488488    """ 
    489489    return map(var_type, row) 
    490      
     490 
    491491def is_var_attributes_row(row): 
    492492    """ Is the row an attribute definition row (i.e. the third row in the 
     
    533533    else: 
    534534        raise ValueError("Unknown attribute label definition") 
    535      
     535 
    536536def var_attributes(row): 
    537537    """ Return variable specifiers and label definitions for row 
    538538    """ 
    539539    return map(var_attribute, row) 
    540      
    541          
     540 
     541 
    542542class _var_placeholder(object): 
    543543    """ A place holder for an arbitrary variable while its values are still unknown. 
     
    546546        self.name = name 
    547547        self.values = set(values) 
    548          
     548 
    549549class _disc_placeholder(_var_placeholder): 
    550550    """ A place holder for discrete variables while their values are not yet known. 
     
    560560    except ValueError: 
    561561        return False 
    562      
     562 
    563563def is_variable_cont(values, n=None, cutoff=0.5): 
    564564    """ Is variable with ``values`` in column (``n`` rows) a continuous variable.  
     
    568568        n = len(values) or 1 
    569569    return (float(cont) / n) > cutoff 
    570      
    571      
     570 
     571 
    572572def is_variable_discrete(values, n=None, cutoff=0.3): 
    573573    """ Is variable with ``values`` in column (``n`` rows) a discrete variable.  
     
    591591    file.seek(0) # Rewind 
    592592    reader = csv.reader(file, dialect=dialect) 
    593      
     593 
    594594    header = types = var_attrs = None 
    595      
     595 
    596596#    if not has_header: 
    597597#        raise ValueError("No header in the data file.") 
    598      
     598 
    599599    header = reader.next() 
    600      
     600 
    601601    if header: 
    602602        # Try to get variable definitions 
     
    604604        if is_var_types_row(types_row): 
    605605            types = var_types(types_row) 
    606      
     606 
    607607    if types: 
    608608        # Try to get the variable attributes 
     
    611611        if is_var_attributes_row(labels_row): 
    612612            var_attrs = var_attributes(labels_row) 
    613              
     613 
    614614    # If definitions not present fill with blanks 
    615615    if not types: 
     
    617617    if not var_attrs: 
    618618        var_attrs = [None] * len(header) 
    619      
     619 
    620620    # start from the beginning 
    621621    file.seek(0) 
     
    624624        if any(defined): # skip definition rows if present in the file 
    625625            reader.next() 
    626      
     626 
    627627    variables = [] 
    628628    undefined_vars = [] 
     
    646646            variables.append(_var_placeholder(name)) 
    647647            undefined_vars.append((i, variables[-1])) 
    648              
     648 
    649649    data = [] 
    650650    for row in reader: 
     
    652652        for ind, var_def in undefined_vars: 
    653653            var_def.values.add(row[ind]) 
    654      
     654 
    655655    for ind, var_def in undefined_vars: 
    656656        values = var_def.values - set(["?", ""]) # TODO: Other unknown strings? 
    657         values = sorted(values)   
     657        values = sorted(values) 
    658658        if isinstance(var_def, _disc_placeholder): 
    659659            variables[ind] = variable.make(var_def.name, Orange.data.Type.Discrete, [], values, create_new_on) 
     
    667667            else: 
    668668                raise ValueError("Strange column in the data") 
    669      
     669 
    670670    vars = [] 
    671671    vars_load_status = [] 
     
    676676        vars.append(var) 
    677677        vars_load_status.append(status) 
    678          
     678 
    679679    attributes = [] 
    680680    class_var = [] 
     
    705705            attribute_load_status.append(status) 
    706706            attribute_indices.append(i) 
    707              
     707 
    708708    if len(class_var) > 1: 
    709709        raise ValueError("Multiple class variables defined") 
    710      
     710 
    711711    class_var = class_var[0] if class_var else None 
    712      
     712 
    713713    attribute_load_status += class_var_load_status 
    714714    variable_indices = attribute_indices + class_indices 
     
    716716    domain.add_metas(metas) 
    717717    normal = [[row[i] for i in variable_indices] for row in data] 
    718     meta_part = [[row[i] for i,_ in meta_indices] for row in data] 
     718    meta_part = [[row[i] for i, _ in meta_indices] for row in data] 
    719719    table = Orange.data.Table(domain, normal) 
    720720    for ex, m_part in zip(table, meta_part): 
    721721        for (column, var), val in zip(meta_indices, m_part): 
    722722            ex[var] = var(val) 
    723              
     723 
    724724    table.metaAttributeLoadStatus = meta_attribute_load_status 
    725725    table.attributeLoadStatus = attribute_load_status 
    726      
     726 
    727727    return table 
    728728 
     
    733733        pass 
    734734    return file 
    735          
     735 
    736736def save_csv(file, table, orange_specific=True, **kwargs): 
    737737    import csv 
     
    745745    names = [v.name for v in all_vars] 
    746746    writer.writerow(names) 
    747      
     747 
    748748    if orange_specific: 
    749749        type_cells = [] 
     
    760760                raise TypeError("Unknown variable type") 
    761761        writer.writerow(type_cells) 
    762          
     762 
    763763        var_attr_cells = [] 
    764764        for spec, var in [("", v) for v in attrs] + \ 
    765                          ([("class", class_var)] if class_var else []) +\ 
     765                         ([("class", class_var)] if class_var else []) + \ 
    766766                         [("m", v) for v in metas]: 
    767              
     767 
    768768            labels = ["{0}={1}".format(*t) for t in var.attributes.items()] # TODO escape spaces 
    769769            var_attr_cells.append(" ".join([spec] if spec else [] + labels)) 
    770              
     770 
    771771        writer.writerow(var_attr_cells) 
    772          
     772 
    773773    for instance in table: 
    774774        instance = list(instance) + [instance[m] for m in metas] 
    775775        writer.writerow(instance) 
    776      
    777          
     776 
     777 
    778778register_file_type("R", None, toR, ".R") 
    779779register_file_type("Weka", loadARFF, toARFF, ".arff") 
     
    825825    """ Return a list of persistent registered (prefix, path) pairs 
    826826    """ 
    827      
     827 
    828828    global_settings_dir = Orange.misc.environ.install_dir 
    829829    user_settings_dir = Orange.misc.environ.orange_settings_dir 
     
    846846    if isinstance(path, list): 
    847847        path = os.path.pathsep.join(path) 
    848          
     848 
    849849    user_settings_dir = Orange.misc.environ.orange_settings_dir 
    850850    if not os.path.exists(user_settings_dir): 
     
    853853        except OSError: 
    854854            pass 
    855      
     855 
    856856    filename = os.path.join(user_settings_dir, "orange-search-paths.cfg") 
    857857    parser = SafeConfigParser() 
    858858    parser.read([filename]) 
    859      
     859 
    860860    if not parser.has_section("paths"): 
    861861        parser.add_section("paths") 
    862          
     862 
    863863    if path is not None: 
    864864        parser.set("paths", prefix, path) 
     
    867867        parser.remove_option("paths", prefix) 
    868868    parser.write(open(filename, "wb")) 
    869      
     869 
    870870def search_paths(prefix=None): 
    871871    """ Return a list of the registered (prefix, path) pairs. 
     
    880880    else: 
    881881        return paths 
    882      
     882 
    883883def set_search_path(prefix, path, persistent=False): 
    884884    """ Associate a search path with a prefix. 
     
    896896    """ 
    897897    global _session_paths 
    898      
     898 
    899899    if isinstance(path, list): 
    900900        path = os.path.pathsep.join(path) 
    901          
     901 
    902902    if persistent: 
    903903        save_persistent_search_path(prefix, path) 
     
    906906    else: 
    907907        _session_paths.append((prefix, path)) 
    908          
     908 
    909909 
    910910def expand_filename(prefixed_name): 
     
    926926    else: 
    927927        raise ValueError("Unknown prefix %r." % prefix) 
    928      
     928 
    929929def find_file(prefixed_name): 
    930930    """ Find the prefixed filename and return its full path. 
     
    932932    if not os.path.exists(prefixed_name): 
    933933        if ":" not in prefixed_name: 
    934             raise ValueError("Not a prefixed name.")  
    935         prefix, filename = prefixed_name.split(":", 1)  
     934            raise ValueError("Not a prefixed name.") 
     935        prefix, filename = prefixed_name.split(":", 1) 
    936936        paths = search_paths(prefix) 
    937937        if paths: 
     
    945945    else: 
    946946        return prefixed_name 
    947      
     947 
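
A sketch of how the prefixed-name mechanism is meant to be used (the prefix and directory below are illustrative)::

    import Orange.data.io as io

    # associate a session-only search path with the "mydata" prefix
    io.set_search_path("mydata", "/home/user/datasets")

    # "prefix:name" is resolved against the registered path
    print io.expand_filename("mydata:iris.tab")
    print io.find_file("mydata:iris.tab")  # also checks that the file exists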
  • Orange/distance/__init__.py

    r9759 r9805  
    2727 
    2828class PearsonR(DistanceConstructor): 
    29     """Constructs an instance of :obj:`PearsonRDistance`. Not all the data needs to be given.""" 
    3029     
    3130    def __new__(cls, data=None, **argkw): 
     
    4645    `Pearson correlation coefficient 
    4746    <http://en.wikipedia.org/wiki/Pearson_product-moment\ 
    48     _correlation_coefficient>`_ 
     47    _correlation_coefficient>`_. 
    4948    """ 
    5049 
     
    5857         
    5958        Returns Pearson's dissimilarity between e1 and e2, 
    60         i.e. (1-r)/2 where r is Sprearman's rank coefficient. 
     59        i.e. (1-r)/2 where r is Pearson's correlation coefficient. 
    6160        """ 
    6261        X1 = [] 
     
    7473 
    7574class SpearmanR(DistanceConstructor): 
    76     """Constructs an instance of SpearmanR. Not all the data needs to be given.""" 
    7775     
    7876    def __new__(cls, data=None, **argkw): 
     
    9391    """`Spearman's rank correlation coefficient 
    9492    <http://en.wikipedia.org/wiki/Spearman%27s_rank_\ 
    95     correlation_coefficient>`_""" 
     93    correlation_coefficient>`_.""" 
    9694 
    9795    def __init__(self, **argkw): 
     
    119117 
    120118class Mahalanobis(DistanceConstructor): 
    121     """ Construct instance of Mahalanobis. """ 
    122119     
    123120    def __new__(cls, data=None, **argkw): 
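
For reference, these constructor classes are used in two steps: construct them (optionally passing data) to obtain a distance instance, then call that instance on pairs of data instances. A minimal sketch::

    import Orange

    iris = Orange.data.Table("iris")
    measure = Orange.distance.PearsonR(iris)  # yields a PearsonRDistance
    print measure(iris[0], iris[1])           # (1-r)/2, a value in [0, 1]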
  • Orange/doc/reference/undefineds.tab

    r9671 r9785  
    11a   b   c 
    22d   d   d 
    3 -dc X -dc UNK   -dc UNAVAILABLE  
     3 
    440   0   0 
    551   1   1 
  • Orange/evaluation/reliability.py

    r9725 r9816  
    763763    :obj:`Orange.classification.Classifier.GetBoth` is passed) contain an 
    764764    additional attribute :obj:`reliability_estimate`, which is an instance of 
    765      :class:`~Orange.evaluation.reliability.Estimate`. 
     765    :class:`~Orange.evaluation.reliability.Estimate`. 
    766766 
    767767    """ 
  • Orange/feature/discretization.py

    r9671 r9813  
    1 """ 
    2 ################################### 
    3 Discretization (``discretization``) 
    4 ################################### 
    5  
    6 .. index:: discretization 
    7  
    8 .. index::  
    9    single: feature; discretization 
    10  
    11  
    12 Example-based automatic discretization is in essence similar to learning: 
    13 given a set of examples, a discretization method proposes a list of suitable 
    14 intervals to cut the attribute's values into. For this reason, Orange 
    15 structures for discretization resemble its structures for learning. Objects 
    16 derived from ``orange.Discretization`` play the role of a "learner" that, 
    17 upon observing the examples, constructs an ``orange.Discretizer`` whose role 
    18 is to convert continuous values into discrete ones according to the rule 
    19 found by ``Discretization``. 
    20  
    21 Orange supports several methods of discretization; here's a 
    22 list of methods with the corresponding classes. 
    23  
    24 * Equi-distant discretization (:class:`EquiDistDiscretization`,  
    25   :class:`EquiDistDiscretizer`). The range of the attribute's values is split 
    26   into a prescribed number of equal-sized intervals. 
    27 * Quantile-based discretization (:class:`EquiNDiscretization`, 
    28   :class:`IntervalDiscretizer`). The range is split into intervals 
    29   containing an equal number of examples. 
    30 * Entropy-based discretization (:class:`EntropyDiscretization`, 
    31   :class:`IntervalDiscretizer`). Developed by Fayyad and Irani, 
    32   this method balances between entropy in intervals and MDL of discretization. 
    33 * Bi-modal discretization (:class:`BiModalDiscretization`, 
    34   :class:`BiModalDiscretizer`/:class:`IntervalDiscretizer`). 
    35   Two cut-off points are set to optimize the difference between the distribution in 
    36   the middle interval and the distributions outside it. 
    37 * Fixed discretization (:class:`IntervalDiscretizer`). Discretization with  
    38   user-prescribed cut-off points. 
    39  
    40 Classes derived from :class:`Discretization` define a 
    41 single method: the call operator. The object can also be called through 
    42 its constructor. 
    43  
    44 .. class:: Discretization 
    45  
    46     .. method:: __call__(attribute, examples[, weightID]) 
    47  
    48         Given a continuous ``attribute``, ``examples`` and, optionally, the id of 
    49         an attribute with example weights, this function returns a discretized 
    50         attribute. Argument ``attribute`` can be a descriptor, index or 
    51         name of the attribute. 
    52  
    53 Here's an example. Part of :download:`discretization.py <code/discretization.py>`: 
    54  
    55 .. literalinclude:: code/discretization.py 
    56     :lines: 7-15 
    57  
    58 The discretized attribute ``sep_w`` is constructed with a call to 
    59 :class:`EntropyDiscretization` (instead of constructing it and calling 
    60 it afterwards, we passed the arguments for calling to the constructor, as 
    61 is often allowed in Orange). We then constructed a new  
    62 :class:`Orange.data.Table` with attributes "sepal width" (the original  
    63 continuous attribute), ``sep_w`` and the class attribute. Script output is:: 
    64  
    65     Entropy discretization, first 10 examples 
    66     [3.5, '>3.30', 'Iris-setosa'] 
    67     [3.0, '(2.90, 3.30]', 'Iris-setosa'] 
    68     [3.2, '(2.90, 3.30]', 'Iris-setosa'] 
    69     [3.1, '(2.90, 3.30]', 'Iris-setosa'] 
    70     [3.6, '>3.30', 'Iris-setosa'] 
    71     [3.9, '>3.30', 'Iris-setosa'] 
    72     [3.4, '>3.30', 'Iris-setosa'] 
    73     [3.4, '>3.30', 'Iris-setosa'] 
    74     [2.9, '<=2.90', 'Iris-setosa'] 
    75     [3.1, '(2.90, 3.30]', 'Iris-setosa'] 
    76  
    77 :class:`EntropyDiscretization` named the new attribute's values by the 
    78 interval range (it also named the attribute as "D_sepal width"). The new 
    79 attribute's values get computed automatically when they are needed. 
    80  
    81 As those that have read about :class:`Orange.data.variable.Variable` know, 
    82 the answer to 
    83 "How does this work?" is hidden in the field 
    84 :obj:`~Orange.data.variable.Variable.get_value_from`. 
    85 This little dialog reveals the secret. 
    86  
    87 :: 
    88  
    89     >>> sep_w 
    90     EnumVariable 'D_sepal width' 
    91     >>> sep_w.get_value_from 
    92     <ClassifierFromVar instance at 0x01BA7DC0> 
    93     >>> sep_w.get_value_from.whichVar 
    94     FloatVariable 'sepal width' 
    95     >>> sep_w.get_value_from.transformer 
    96     <IntervalDiscretizer instance at 0x01BA2100> 
    97     >>> sep_w.get_value_from.transformer.points 
    98     <2.90000009537, 3.29999995232> 
    99  
    100 So, the ``select`` statement in the above example converted all examples 
    101 from ``data`` to the new domain. Since the new domain includes the attribute 
    102 ``sep_w`` that is not present in the original, ``sep_w``'s values are 
    103 computed on the fly. For each example in ``data``, ``sep_w.get_value_from``  
    104 is called to compute ``sep_w``'s value (if you ever need to call 
    105 ``get_value_from``, you shouldn't call ``get_value_from`` directly but call 
    106 ``compute_value`` instead). ``sep_w.get_value_from`` looks for the value of 
    107 "sepal width" in the original example. The original, continuous sepal width 
    108 is passed to the ``transformer`` that determines the interval by its field 
    109 ``points``. The transformer returns the discrete value, which is in turn 
    110 returned by ``get_value_from`` and stored in the new example. 
    111  
    112 You don't need to understand this mechanism exactly. It's important to know 
    113 that there are two classes of objects for discretization. Those derived from 
    114 :obj:`Discretizer` (such as :obj:`IntervalDiscretizer` that we've seen above) 
    115 are used as transformers that translate continuous value into discrete. 
    116 Discretization algorithms are derived from :obj:`Discretization`. Their  
    117 job is to construct a :obj:`Discretizer` and return a new variable 
    118 with the discretizer stored in ``get_value_from.transformer``. 
    119  
    120 Discretizers 
    121 ============ 
    122  
    123 Different discretizers support different methods for conversion of 
    124 continuous values into discrete. The most general is  
    125 :class:`IntervalDiscretizer` that is also used by most discretization 
    126 methods. Two other discretizers, :class:`EquiDistDiscretizer` and  
    127 :class:`ThresholdDiscretizer`, could easily be replaced by 
    128 :class:`IntervalDiscretizer` but are used for speed and simplicity. 
    129 The fourth discretizer, :class:`BiModalDiscretizer`, is specialized 
    130 for discretizations induced by :class:`BiModalDiscretization`. 
    131  
    132 .. class:: Discretizer 
    133  
    134     All discretizers support a handy method for construction of a new 
    135     attribute from an existing one. 
    136  
    137     .. method:: construct_variable(attribute) 
    138  
    139         Constructs a new attribute descriptor; the new attribute is the discretized 
    140         ``attribute``. The new attribute's name equals ``attribute.name`` 
    141         prefixed by "D\_", and its symbolic values are discretizer specific. 
    142         The above example shows what comes out of :class:`IntervalDiscretizer`. 
    143         Discretization algorithms actually first construct a discretizer and 
    144         then call its :class:`construct_variable` to construct an attribute 
    145         descriptor. 
    146  
    147 .. class:: IntervalDiscretizer 
    148  
    149     The most common discretizer.  
    150  
    151     .. attribute:: points 
    152  
    153         Cut-off points. All values below or equal to the first point belong 
    154         to the first interval, those between the first and the second 
    155         (including those equal to the second) go to the second interval and 
    156         so forth to the last interval which covers all values greater than 
    157         the last element in ``points``. The number of intervals is thus  
    158         ``len(points)+1``. 
    159  
    160 Let us manually construct an interval discretizer with cut-off points at 3.0 
    161 and 5.0. We shall use the discretizer to construct a discretized sepal length  
    162 (part of :download:`discretization.py <code/discretization.py>`): 
    163  
    164 .. literalinclude:: code/discretization.py 
    165     :lines: 22-26 
    166  
    167 That's all. The first five examples of ``data2`` are now 
    168  
    169 :: 
    170  
    171     [5.1, '>5.00', 'Iris-setosa'] 
    172     [4.9, '(3.00, 5.00]', 'Iris-setosa'] 
    173     [4.7, '(3.00, 5.00]', 'Iris-setosa'] 
    174     [4.6, '(3.00, 5.00]', 'Iris-setosa'] 
    175     [5.0, '(3.00, 5.00]', 'Iris-setosa'] 
    176  
    177 Can you use the same discretizer for more than one attribute? Yes, as long 
    178 as they have the same cut-off points, of course. Simply call ``construct_variable`` for each 
    179 continuous attribute (part of :download:`discretization.py <code/discretization.py>`): 
    180  
    181 .. literalinclude:: code/discretization.py 
    182     :lines: 30-34 
    183  
    184 Each attribute now has its own (FIXME) ClassifierFromVar in its  
    185 ``get_value_from``, but all use the same :class:`IntervalDiscretizer`,  
    186 ``idisc``. Changing an element of its ``points`` affects all attributes. 
    187  
    188 Do not change the length of :obj:`~IntervalDiscretizer.points` if the 
    189 discretizer is used by any attribute. The length of 
    190 :obj:`~IntervalDiscretizer.points` should always match the number of values 
    191 of the attribute, which is determined by the length of the attribute's field 
    192 ``values``. Therefore, if ``attr`` is a discretized 
    193 attribute, then ``len(attr.values)`` must equal 
    194 ``len(attr.get_value_from.transformer.points)+1``. It always 
    195 does, unless you deliberately change it. If the sizes don't match, 
    196 Orange will probably crash, and it will be entirely your fault. 
    197  
    198  
    199  
    200 .. class:: EquiDistDiscretizer 
    201  
    202     More rigid than :obj:`IntervalDiscretizer`:  
    203     it uses intervals of fixed width. 
    204  
    205     .. attribute:: first_cut 
    206          
    207         The first cut-off point. 
    208      
    209     .. attribute:: step 
    210  
    211         Width of intervals. 
    212  
    213     .. attribute:: number_of_intervals 
    214          
    215         Number of intervals. 
    216  
    217     .. attribute:: points (read-only) 
    218          
    219         The cut-off points; this is not a real attribute although it behaves 
    220         as one. Reading it constructs a list of cut-off points and returns it, 
    221         but changing the list doesn't affect the discretizer - it's a separate 
    222         list. This attribute is here only to give the 
    223         :obj:`EquiDistDiscretizer` the same interface as that of  
    224         :obj:`IntervalDiscretizer`. 
    225  
    226 All values below :obj:`~EquiDistDiscretizer.first_cut` belong to the first 
    227 interval (including possible values smaller than ``firstVal``). Otherwise, 
    228 value ``val``'s interval is ``floor((val-firstVal)/step)``. If this turns 
    229 out to be greater than or equal to :obj:`~EquiDistDiscretizer.number_of_intervals`, 
    230 it is decreased to ``number_of_intervals-1``. 
    231  
    232 This discretizer is returned by :class:`EquiDistDiscretization`; you can 
    233 see an example in the corresponding section. You can also construct it  
    234 manually and call its ``construct_variable``, just as shown for the 
    235 :obj:`IntervalDiscretizer`. 
    236  
    237  
    238 .. class:: ThresholdDiscretizer 
    239  
    240     Threshold discretizer converts continuous values into binary by comparing 
    241     them with a threshold. This discretizer is actually not used by any 
    242     discretization method, but you can use it for manual discretization. 
    243     Orange needs this discretizer for binarization of continuous attributes 
    244     in decision trees. 
    245  
    246     .. attribute:: threshold 
    247  
    248         Threshold; values below or equal to the threshold belong to the first 
    249         interval and those that are greater go to the second. 
    250  
    251 .. class:: BiModalDiscretizer 
    252  
    253     This discretizer is the first discretizer that couldn't be replaced by 
    254     :class:`IntervalDiscretizer`. It has two cut-off points and values are 
    255     discretized according to whether they belong to the middle region 
    256     (which includes the lower but not the upper boundary) or not. The 
    257     discretizer is returned by :class:`BiModalDiscretization` if its 
    258     field :obj:`~BiModalDiscretization.split_in_two` is true (the default). 
    259  
    260     .. attribute:: low 
    261          
    262         Lower boundary of the interval (included in the interval). 
    263  
    264     .. attribute:: high 
    265  
    266         Upper boundary of the interval (not included in the interval). 
    267  
    268  
    269 Discretization Algorithms 
    270 ========================= 
    271  
    272 .. class:: EquiDistDiscretization  
    273  
    274     Discretizes the attribute by cutting it into the prescribed number 
    275     of intervals of equal width. The examples are needed to determine the  
    276     span of attribute values. The interval between the smallest and the 
    277     largest is then cut into equal parts. 
    278  
    279     .. attribute:: number_of_intervals 
    280  
    281         Number of intervals into which the attribute is to be discretized.  
    282         Default value is 4. 
    283  
    284 For an example, we shall discretize all attributes of Iris dataset into 6 
    285 intervals. We shall construct an :class:`Orange.data.Table` with discretized 
    286 attributes and print description of the attributes (part 
    287 of :download:`discretization.py <code/discretization.py>`): 
    288  
    289 .. literalinclude:: code/discretization.py 
    290     :lines: 38-43 
    291  
    292 The script's output is 
    293  
    294 :: 
    295  
    296     D_sepal length: <<4.90, [4.90, 5.50), [5.50, 6.10), [6.10, 6.70), [6.70, 7.30), >7.30> 
    297     D_sepal width: <<2.40, [2.40, 2.80), [2.80, 3.20), [3.20, 3.60), [3.60, 4.00), >4.00> 
    298     D_petal length: <<1.98, [1.98, 2.96), [2.96, 3.94), [3.94, 4.92), [4.92, 5.90), >5.90> 
    299     D_petal width: <<0.50, [0.50, 0.90), [0.90, 1.30), [1.30, 1.70), [1.70, 2.10), >2.10> 
    300  
    301 Are there more decent ways for a script to find the interval boundaries than 
    302 by parsing the symbolic values? Sure, they are hidden in the discretizer, 
    303 which is, as usual, stored in ``attr.get_value_from.transformer``. 
    304  
    305 Compare the following with the values above. 
    306  
    307 :: 
    308  
    309     >>> for attr in newattrs: 
    310     ...    print "%s: first interval at %5.3f, step %5.3f" % \ 
    311     ...    (attr.name, attr.get_value_from.transformer.first_cut, \ 
    312     ...    attr.get_value_from.transformer.step) 
    313     D_sepal length: first interval at 4.900, step 0.600 
    314     D_sepal width: first interval at 2.400, step 0.400 
    315     D_petal length: first interval at 1.980, step 0.980 
    316     D_petal width: first interval at 0.500, step 0.400 
    317  
    318 Like all discretizers, :class:`EquiDistDiscretizer` also has the method 
    319 ``construct_variable`` (part of :download:`discretization.py <code/discretization.py>`): 
    320  
    321 .. literalinclude:: code/discretization.py 
    322     :lines: 69-73 
    323  
    324  
    325 .. class:: EquiNDiscretization 
    326  
    327     Discretization with Intervals Containing (Approximately) Equal Number 
    328     of Examples. 
    329  
    330     Discretizes the attribute by cutting it into the prescribed number of 
    331     intervals so that each of them contains an equal number of examples. The 
    332     examples are obviously needed for this discretization, too. 
    333  
    334     .. attribute:: number_of_intervals 
    335  
    336         Number of intervals into which the attribute is to be discretized. 
    337         Default value is 4. 
    338  
    339 The use of this discretization is the same as the use of  
    340 :class:`EquiDistDiscretization`. The resulting discretizer is  
    341 :class:`IntervalDiscretizer`, hence it has ``points`` instead of ``first_cut``/ 
    342 ``step``/``number_of_intervals``. 
    343  
    344 .. class:: EntropyDiscretization 
    345  
    346     Entropy-based Discretization (Fayyad-Irani). 
    347  
    348     Fayyad-Irani's discretization method works without a predefined number of 
    349     intervals. Instead, it recursively splits intervals at the cut-off point 
    350     that minimizes the entropy, until the entropy decrease is smaller than the 
    351     increase of MDL induced by the new point. 
    352  
    353     An interesting thing about this discretization technique is that an 
    354     attribute can be discretized into a single interval, if no suitable 
    355     cut-off points are found. If this is the case, the attribute is rendered 
    356     useless and can be removed. This discretization can therefore also serve 
    357     for feature subset selection. 
    358  
    359     .. attribute:: force_attribute 
    360  
    361         Forces the algorithm to induce at least one cut-off point, even when 
    362         its information gain is lower than MDL (default: false). 
    363  
    364 Part of :download:`discretization.py <code/discretization.py>`: 
    365  
    366 .. literalinclude:: code/discretization.py 
    367     :lines: 77-80 
    368  
    369 The output shows that all attributes are discretized into three intervals:: 
    370  
    371     sepal length: <5.5, 6.09999990463> 
    372     sepal width: <2.90000009537, 3.29999995232> 
    373     petal length: <1.89999997616, 4.69999980927> 
    374     petal width: <0.600000023842, 1.0000004768> 
    375  
    376 .. class:: BiModalDiscretization 
    377  
    378     Bi-Modal Discretization 
    379  
    380     Sets two cut-off points so that the class distribution of examples in 
    381     between is as different from the overall distribution as possible. The 
    382     difference is measured by chi-square statistics. All possible cut-off 
    383     points are tried, thus the discretization runs in O(n^2). 
    384  
    385     This discretization method is especially suitable for the attributes in 
    386     which the middle region corresponds to normal and the outer regions to 
    387     abnormal values of the attribute. Depending on the nature of the 
    388     attribute, we can treat the lower and higher values separately, thus 
    389     discretizing the attribute into three intervals, or together, in a 
    390     binary attribute whose values correspond to normal and abnormal. 
    391  
    392     .. attribute:: split_in_two 
    393          
    394         Decides whether the resulting attribute should have three or two values. 
    395         If true (default), we have three intervals and the discretizer is 
    396         of type :class:`BiModalDiscretizer`. If false, the result is the 
    397         ordinary :class:`IntervalDiscretizer`. 
    398  
    399 The Iris dataset has a three-valued class attribute; the classes are setosa, virginica 
    400 and versicolor. As the picture below shows, sepal lengths of versicolors are 
    401 between lengths of setosas and virginicas (the picture itself is drawn using 
    402 LOESS probability estimation). 
    403  
    404 .. image:: files/bayes-iris.gif 
    405  
    406 If we merge classes setosa and virginica into one, we can observe whether 
    407 the bi-modal discretization would correctly recognize the interval in 
    408 which versicolors dominate. 
    409  
    410 .. literalinclude:: code/discretization.py 
    411     :lines: 84-87 
    412  
    413 In this script, we have constructed a new class attribute which tells whether 
    414 an iris is versicolor or not. We have specified how this attribute's value is 
    415 computed from the original class value with a simple lambda function. 
    416 Finally, we have constructed a new domain and converted the examples. 
    417 Now for discretization. 
    418  
    419 .. literalinclude:: code/discretization.py 
    420     :lines: 97-100 
    421  
    422 The script prints out the middle intervals:: 
    423  
    424     sepal length: (5.400, 6.200] 
    425     sepal width: (2.000, 2.900] 
    426     petal length: (1.900, 4.700] 
    427     petal width: (0.600, 1.600] 
    428  
    429 Judging by the graph, the cut-off points for "sepal length" make sense. 
    430  
    431 Additional functions 
    432 ==================== 
    433  
    434 This section describes some functions and classes that can be used for 
    435 categorization of continuous features. Besides several general classes that 
    436 can help in this task, we also provide a function that may help in 
    437 entropy-based discretization (Fayyad & Irani), and a wrapper around classes for 
    438 categorization that can be used for learning. 
    439  
    440 .. automethod:: Orange.feature.discretization.entropyDiscretization_wrapper 
    441  
    442 .. autoclass:: Orange.feature.discretization.EntropyDiscretization_wrapper 
    443  
    444 .. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class 
    445  
    446 .. rubric:: Example 
    447  
    448 FIXME. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in Orange 
    449 for Beginners tutorial shows the use of DiscretizedLearner. Other 
    450 discretization classes from core Orange are listed in chapter on 
    451 `categorization <../ofb/o_categorization.htm>`_ of the same tutorial. 
    452  
    453 ========== 
    454 References 
    455 ========== 
    456  
    457 * UM Fayyad and KB Irani. Multi-interval discretization of continuous valued 
    458   attributes for classification learning. In Proceedings of the 13th 
    459   International Joint Conference on Artificial Intelligence, pages 
    460   1022--1029, Chambery, France, 1993. 
    461  
    462 """ 
    463  
     1import Orange 
    4642import Orange.core as orange 
    4653 
     
    4675    Discrete2Continuous, \ 
    4686    Discretizer, \ 
    469         BiModalDiscretizer, \ 
    470         EquiDistDiscretizer, \ 
    471         IntervalDiscretizer, \ 
    472         ThresholdDiscretizer, \ 
    473         EntropyDiscretization, \ 
    474         EquiDistDiscretization, \ 
    475         EquiNDiscretization, \ 
    476         BiModalDiscretization, \ 
    477         Discretization 
     7    BiModalDiscretizer, \ 
     8    EquiDistDiscretizer as EqualWidthDiscretizer, \ 
     9    IntervalDiscretizer, \ 
     10    ThresholdDiscretizer,\ 
     11    EntropyDiscretization as Entropy, \ 
     12    EquiDistDiscretization as EqualWidth, \ 
     13    EquiNDiscretization as EqualFreq, \ 
     14    BiModalDiscretization as BiModal, \ 
     15    Discretization, \ 
     16    Preprocessor_discretize 
    47817 
    479 ###### 
    480 # from orngDics.py 
    481 def entropyDiscretization_wrapper(table): 
    482     """Take the classified table set (table) and categorize all continuous 
    483     features using the entropy based discretization 
    484     :obj:`EntropyDiscretization`. 
     18 
     19 
     20def entropyDiscretization_wrapper(data): 
     21    """Discretize all continuous features in a class-labeled data set with the entropy-based discretization 
     22    :obj:`Entropy`. 
    48523     
    486     :param table: data to discretize. 
    487     :type table: Orange.data.Table 
     24    :param data: data to discretize. 
     25    :type data: Orange.data.Table 
    48826    :rtype: :obj:`Orange.data.Table` includes all categorical and discretized\ 
    48927    continuous features from the original data table. 
     
    49533    """ 
    49634    orange.setrandseed(0) 
    497     tablen=orange.Preprocessor_discretize(table, method=EntropyDiscretization()) 
     35    data_new = orange.Preprocessor_discretize(data, method=Entropy()) 
    49836     
    499     attrlist=[] 
    500     nrem=0 
    501     for i in tablen.domain.attributes: 
     37    attrlist = [] 
     38    nrem = 0 
     39    for i in data_new.domain.attributes: 
    50240        if (len(i.values)>1): 
    50341            attrlist.append(i) 
     
    50543            nrem=nrem+1 
    50644    attrlist.append(data_new.domain.classVar) 
    507     return tablen.select(attrlist) 
     45    return data_new.select(attrlist) 
    50846 
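
For illustration, the wrapper might be used like this (``iris`` stands in for any class-labeled table)::

    import Orange
    from Orange.feature.discretization import entropyDiscretization_wrapper

    iris = Orange.data.Table("iris")
    iris_disc = entropyDiscretization_wrapper(iris)
    # features that were discretized into a single interval are dropped
    print [a.name for a in iris_disc.domain.attributes]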
    50947 
     
    565103 
    566104    """ 
    567     def __init__(self, baseLearner, discretizer=EntropyDiscretization(), **kwds): 
     105    def __init__(self, baseLearner, discretizer=Entropy(), **kwds): 
    568106        self.baseLearner = baseLearner 
    569107        if hasattr(baseLearner, "name"): 
     
    591129  def __call__(self, example, resultType = orange.GetValue): 
    592130    return self.classifier(example, resultType) 
     131 
     132class DiscretizeTable(object): 
     133    """Discretizes all continuous features of the data table. 
     134 
     135    :param data: data to discretize. 
     136    :type data: :class:`Orange.data.Table` 
     137 
     138    :param features: data features to discretize. None (default) to discretize all features. 
     139    :type features: list of :class:`Orange.data.variable.Variable` 
     140 
     141    :param method: feature discretization method. 
     142    :type method: :class:`Discretization` 
     143    """ 
     144    def __new__(cls, data=None, features=None, discretize_class=False, method=EqualFreq(n_intervals=3)): 
     145        if data is None: 
 146            self = object.__new__(cls) 
     147            return self 
     148        else: 
     149            self = cls(features=features, discretize_class=discretize_class, method=method) 
     150            return self(data) 
     151 
     152    def __init__(self, features=None, discretize_class=False, method=EqualFreq(n_intervals=3)): 
     153        self.features = features 
     154        self.discretize_class = discretize_class 
     155        self.method = method 
     156 
     157    def __call__(self, data): 
     158        pp = Preprocessor_discretize(attributes=self.features, discretizeClass=self.discretize_class) 
     159        pp.method = self.method 
     160        return pp(data) 
     161 
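
A short usage sketch of :obj:`DiscretizeTable` (module path and data set name
are illustrative)::

    import Orange
    from Orange.feature.discretization import DiscretizeTable, EqualFreq

    data = Orange.data.Table("iris")
    # called with data, the constructor directly returns the discretized table
    disc_data = DiscretizeTable(data, method=EqualFreq(n_intervals=3))
    # or construct the discretizer first and apply it later
    discretizer = DiscretizeTable(method=EqualFreq(n_intervals=5))
    disc_data2 = discretizer(data)
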
  • Orange/feature/imputation.py

    r9671 r9806  
    1 """ 
    2 ########################### 
    3 Imputation (``imputation``) 
    4 ########################### 
    5  
    6 .. index:: imputation 
    7  
    8 .. index::  
    9    single: feature; value imputation 
    10  
    11  
12 Imputation is a procedure that replaces missing feature values with  
13 appropriate substitutes. It is needed by methods (learning  
14 algorithms and others) that cannot handle unknown values, for  
15 instance logistic regression. 
16  
17 Missing values sometimes have a special meaning, so they need to be replaced 
18 by a designated value. Sometimes we know what to replace the missing value 
19 with; for instance, in a medical problem, some laboratory tests might not be 
20 performed when it is known what their results would be. In that case, we  
21 impute a certain fixed value instead of the missing one. In the most complex 
22 case, we assign values computed from some model; we can, for instance, impute 
23 the average or majority value, or even a value predicted from the values of 
24 other, known features, using a classifier. 
    25  
    26 In a learning/classification process, imputation is needed on two occasions. 
    27 Before learning, the imputer needs to process the training examples. 
    28 Afterwards, the imputer is called for each example to be classified. 
    29  
30 In general, the imputer itself needs to be trained. This is, of course, not 
31 needed when the imputer imputes a fixed value. However, when it imputes the 
32 average or majority value, it needs to compute the statistics on the training 
33 examples, and use them afterwards for imputation of training and testing 
34 examples. 
    35  
    36 While reading this document, bear in mind that imputation is a part of the 
    37 learning process. If we fit the imputation model, for instance, by learning 
    38 how to predict the feature's value from other features, or even if we  
    39 simply compute the average or the minimal value for the feature and use it 
    40 in imputation, this should only be done on learning data. If cross validation 
    41 is used for sampling, imputation should be done on training folds only. Orange 
    42 provides simple means for doing that. 
    43  
    44 This page will first explain how to construct various imputers. Then follow 
    45 the examples for `proper use of imputers <#using-imputers>`_. Finally, quite 
    46 often you will want to use imputation with special requests, such as certain 
    47 features' missing values getting replaced by constants and other by values 
    48 computed using models induced from specified other features. For instance, 
    49 in one of the studies we worked on, the patient's pulse rate needed to be 
    50 estimated using regression trees that included the scope of the patient's 
    51 injuries, sex and age, some attributes' values were replaced by the most 
    52 pessimistic ones and others were computed with regression trees based on 
    53 values of all features. If you are using learners that need the imputer as a 
    54 component, you will need to `write your own imputer constructor  
    55 <#write-your-own-imputer-constructor>`_. This is trivial and is explained at 
    56 the end of this page. 
    57  
    58 Wrapper for learning algorithms 
    59 =============================== 
    60  
61 This wrapper can be used with learning algorithms that cannot handle missing 
62 values: it will impute the missing values using the imputer, call the  
63 learning algorithm and, if the imputation is also needed by the classifier, 
64 wrap the resulting classifier into another wrapper that will impute the 
65 missing values in examples to be classified. 
    66  
67 Even so, the module is somewhat redundant, as all learners that cannot handle  
68 missing values should, in principle, provide a slot for an imputer 
69 constructor. For instance, :obj:`Orange.classification.logreg.LogRegLearner` 
70 has an attribute :obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`, 
71 and even if you don't set it, it will do some imputation by default. 
    72  
    73 .. class:: ImputeLearner 
    74  
75     Wraps a learner and performs data imputation before learning. 
    76  
    77     Most of Orange's learning algorithms do not use imputers because they can 
    78     appropriately handle the missing values. Bayesian classifier, for instance, 
    79     simply skips the corresponding attributes in the formula, while 
    80     classification/regression trees have components for handling the missing 
    81     values in various ways. 
    82  
83     If for any reason you want these algorithms to run on imputed data, 
84     you can use this wrapper. The class description is a matter of a separate 
85     page, but we shall show its code here as another demonstration of how to 
86     use the imputers - logistic regression is implemented essentially in the 
87     same way as the classes below. 
    88  
89     This is basically a learner, so the constructor will return either an 
90     instance of :obj:`ImputeLearner` or, if called with examples, an instance 
91     of some classifier. There are a few attributes that need to be set, though. 
    92  
    93     .. attribute:: base_learner  
    94      
    95     A wrapped learner. 
    96  
    97     .. attribute:: imputer_constructor 
    98      
    99     An instance of a class derived from :obj:`ImputerConstructor` (or a class 
    100     with the same call operator). 
    101  
    102     .. attribute:: dont_impute_classifier 
    103  
    104     If given and set (this attribute is optional), the classifier will not be 
    105     wrapped into an imputer. Do this if the classifier doesn't mind if the 
    106     examples it is given have missing values. 
    107  
    108     The learner is best illustrated by its code - here's its complete 
    109     :obj:`__call__` method:: 
    110  
    111         def __call__(self, data, weight=0): 
    112             trained_imputer = self.imputer_constructor(data, weight) 
    113             imputed_data = trained_imputer(data, weight) 
    114             base_classifier = self.base_learner(imputed_data, weight) 
    115             if self.dont_impute_classifier: 
    116                 return base_classifier 
    117             else: 
    118                 return ImputeClassifier(base_classifier, trained_imputer) 
    119  
    120     So "learning" goes like this. :obj:`ImputeLearner` will first construct 
    121     the imputer (that is, call :obj:`self.imputer_constructor` to get a (trained) 
    122     imputer. Than it will use the imputer to impute the data, and call the 
    123     given :obj:`baseLearner` to construct a classifier. For instance, 
    124     :obj:`baseLearner` could be a learner for logistic regression and the 
    125     result would be a logistic regression model. If the classifier can handle 
    126     unknown values (that is, if :obj:`dont_impute_classifier`, we return it as  
    127     it is, otherwise we wrap it into :obj:`ImputeClassifier`, which is given 
    128     the base classifier and the imputer which it can use to impute the missing 
    129     values in (testing) examples. 
    130  
    131 .. class:: ImputeClassifier 
    132  
    133     Objects of this class are returned by :obj:`ImputeLearner` when given data. 
    134  
    135     .. attribute:: baseClassifier 
    136      
    137     A wrapped classifier. 
    138  
    139     .. attribute:: imputer 
    140      
    141     An imputer for imputation of unknown values. 
    142  
    143     .. method:: __call__  
    144      
    145     This class is even more trivial than the learner. Its constructor accepts  
    146     two arguments, the classifier and the imputer, which are stored into the 
    147     corresponding attributes. The call operator which does the classification 
    148     then looks like this:: 
    149  
    150         def __call__(self, ex, what=orange.GetValue): 
    151             return self.base_classifier(self.imputer(ex), what) 
    152  
153     It imputes the missing values by calling the :obj:`imputer` and passes 
154     the imputed example to the base classifier. 
    155  
156 .. note::  
157    In this setup the imputer is trained on the training data - even if you do 
158    cross validation, the imputer will be trained on the right data. In the 
159    classification phase we again use the imputer which was trained on the 
160    training data only. 
    161  
    162 .. rubric:: Code of ImputeLearner and ImputeClassifier  
    163  
164 :obj:`Orange.feature.imputation.ImputeLearner` puts the keyword arguments into 
165 the instance's dictionary. You are expected to call it like 
166 :obj:`ImputeLearner(base_learner=<someLearner>, 
167 imputer_constructor=<someImputerConstructor>)`. When the learner is called 
168 with examples, it trains the imputer, imputes the data, induces a 
169 :obj:`base_classifier` by the :obj:`base_learner` and constructs 
170 :obj:`ImputeClassifier` that stores the :obj:`base_classifier` and the 
171 :obj:`imputer`. For classification, the missing values are imputed and the classifier's prediction is returned. 
    172  
173 Note that this code is slightly simplified: the omitted details handle 
174 non-essential technical issues that are unrelated to imputation:: 
    175  
    176     class ImputeLearner(orange.Learner): 
    177         def __new__(cls, examples = None, weightID = 0, **keyw): 
    178             self = orange.Learner.__new__(cls, **keyw) 
    179             self.__dict__.update(keyw) 
    180             if examples: 
    181                 return self.__call__(examples, weightID) 
    182             else: 
    183                 return self 
    184      
    185         def __call__(self, data, weight=0): 
    186             trained_imputer = self.imputer_constructor(data, weight) 
    187             imputed_data = trained_imputer(data, weight) 
    188             base_classifier = self.base_learner(imputed_data, weight) 
    189             return ImputeClassifier(base_classifier, trained_imputer) 
    190      
    191     class ImputeClassifier(orange.Classifier): 
    192         def __init__(self, base_classifier, imputer): 
    193             self.base_classifier = base_classifier 
    194             self.imputer = imputer 
    195      
    196         def __call__(self, ex, what=orange.GetValue): 
    197             return self.base_classifier(self.imputer(ex), what) 
    198  
    199 .. rubric:: Example 
    200  
201 Although most of Orange's learning algorithms will take care of imputation 
202 internally if needed, it can sometimes happen that an expert will be able to 
203 tell you exactly what to put in the data instead of the missing values. In 
204 this example we suppose that we want to impute the minimal value of each 
205 feature. We will try to determine whether the naive Bayesian classifier with 
206 its implicit internal imputation works better than one that uses imputation 
207 by minimal values. 
    208  
    209 :download:`imputation-minimal-imputer.py <code/imputation-minimal-imputer.py>` (uses :download:`voting.tab <code/voting.tab>`): 
    210  
    211 .. literalinclude:: code/imputation-minimal-imputer.py 
    212     :lines: 7- 
    213      
214 Should output this:: 
    215  
    216     Without imputation: 0.903 
    217     With imputation: 0.899 
    218  
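The referenced script is roughly along these lines - a sketch, not the
verbatim file, assuming the usual evaluation helpers
:obj:`Orange.evaluation.testing.cross_validation` and
:obj:`Orange.evaluation.scoring.CA`::

    import Orange

    data = Orange.data.Table("voting")
    ba = Orange.classification.bayes.NaiveLearner()
    imba = Orange.feature.imputation.ImputeLearner(
        base_learner=ba,
        imputer_constructor=Orange.feature.imputation.ImputerConstructor_minimal)
    res = Orange.evaluation.testing.cross_validation([ba, imba], data)
    cas = Orange.evaluation.scoring.CA(res)
    print "Without imputation: %5.3f" % cas[0]
    print "With imputation: %5.3f" % cas[1]
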
219 .. note::  
220    Note that we constructed just one instance of 
221    :obj:`Orange.classification.bayes.NaiveLearner`, but this same instance is 
222    used twice in each fold. Once it is given the examples as they are and 
223    returns an instance of :obj:`Orange.classification.bayes.NaiveClassifier`. 
224    The second time it is called by :obj:`imba`, and the 
225    :obj:`Orange.classification.bayes.NaiveClassifier` it returns is wrapped 
226    into :obj:`Orange.feature.imputation.ImputeClassifier`. We thus have only 
227    one learner, but it produces two different classifiers in each round of 
228    testing. 
    229  
    230 Abstract imputers 
    231 ================= 
    232  
233 As is common in Orange, imputation is done by a pair of classes: one that does 
234 the work and another that constructs it. :obj:`ImputerConstructor` is the 
235 abstract root of a hierarchy of classes that take the training data (with an  
236 optional id for weights) and construct an instance of a class derived from 
237 :obj:`Imputer`. An :obj:`Imputer` can be called with an 
238 :obj:`Orange.data.Instance` and it will return a new example with the missing 
239 values imputed (it leaves the original example intact). If the imputer is 
240 called with an :obj:`Orange.data.Table`, it will return a new example table 
241 with imputed examples. 
    242  
    243 .. class:: ImputerConstructor 
    244  
    245     .. attribute:: imputeClass 
    246      
247     Tells whether to impute the class value (default) or not. 
    248  
    249 Simple imputation 
    250 ================= 
    251  
    252 The simplest imputers always impute the same value for a particular attribute, 
    253 disregarding the values of other attributes. They all use the same imputer 
    254 class, :obj:`Imputer_defaults`. 
    255      
    256 .. class:: Imputer_defaults 
    257  
    258     .. attribute::  defaults 
    259      
260     An example with the default values to be imputed instead of the missing  
261     ones. Examples to be imputed must be from the same domain as :obj:`defaults`. 
    262  
    263     Instances of this class can be constructed by  
    264     :obj:`Orange.feature.imputation.ImputerConstructor_minimal`,  
    265     :obj:`Orange.feature.imputation.ImputerConstructor_maximal`, 
    266     :obj:`Orange.feature.imputation.ImputerConstructor_average`.  
    267  
268     For continuous features, they will impute the smallest, largest or the 
269     average value encountered in the training examples. For discrete features, 
270     they will impute the lowest value (the one with index 0, e.g. 
271     attr.values[0]), the highest (attr.values[-1]), or the most common value 
272     encountered in the data. The first two imputers are mostly useful when the 
273     discrete values are ordered according to their impact on the class (for 
274     instance, possible values for symptoms of some disease can be ordered 
275     according to their seriousness). The minimal and maximal imputers then 
276     represent optimistic and pessimistic imputations. 
    277  
278     The following code will load the bridges data, and first impute the 
279     values in a single example and then in the whole table. 
    280  
    281 :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    282  
    283 .. literalinclude:: code/imputation-complex.py 
    284     :lines: 9-23 
    285  
286 This example shows what the imputer does, not how it should be used. Do not 
287 impute all the data and then use it for cross-validation. As warned at the 
288 top of this page, see the instructions for the actual `use of 
289 imputers <#using-imputers>`_. 
    290  
291 .. note:: The :obj:`ImputerConstructor` classes have a dual-purpose 
292   constructor: if you give the constructor the data, it will return an 
293   :obj:`Imputer` - the above call is equivalent to calling 
294   :obj:`Orange.feature.imputation.ImputerConstructor_minimal()(data)`. 
    295  
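For instance, a minimal sketch with the bridges data (the example index is
arbitrary)::

    import Orange

    data = Orange.data.Table("bridges")
    imputer = Orange.feature.imputation.ImputerConstructor_minimal(data)
    # a new, imputed copy is returned; the original example is left intact
    impdata = imputer(data[19])
    print data[19]
    print impdata
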
    296 You can also construct the :obj:`Orange.feature.imputation.Imputer_defaults` 
    297 yourself and specify your own defaults. Or leave some values unspecified, in 
    298 which case the imputer won't impute them, as in the following example. Here, 
    299 the only attribute whose values will get imputed is "LENGTH"; the imputed value 
    300 will be 1234. 
    301  
    302 .. literalinclude:: code/imputation-complex.py 
    303     :lines: 56-69 
    304  
305 :obj:`Orange.feature.imputation.Imputer_defaults`'s constructor will accept an 
306 argument of type :obj:`Orange.data.Domain` (in which case it will construct an 
307 empty instance for :obj:`defaults`) or an example. Be careful with the latter: 
308 :obj:`Orange.feature.imputation.Imputer_defaults` will store a reference to the 
309 instance, not a copy. You can make a copy yourself to avoid problems: 
310 instead of `Imputer_defaults(data[0])` you may want to write 
311 `Imputer_defaults(Orange.data.Instance(data[0]))`. 
    312  
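A sketch of the same idea in isolation, with only "LENGTH" given a default::

    import Orange

    data = Orange.data.Table("bridges")
    imputer = Orange.feature.imputation.Imputer_defaults(data.domain)
    # values left unknown in `defaults` are not imputed
    imputer.defaults["LENGTH"] = 1234
    impdata = imputer(data[0])
    print impdata["LENGTH"]
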
    313 Random imputation 
    314 ================= 
    315  
    316 .. class:: Imputer_Random 
    317  
    318     Imputes random values. The corresponding constructor is 
    319     :obj:`ImputerConstructor_Random`. 
    320  
    321     .. attribute:: impute_class 
    322      
    323     Tells whether to impute the class values or not. Defaults to True. 
    324  
    325     .. attribute:: deterministic 
    326  
327     If true (default is False), the random generator is initialized for 
328     each example using the example's hash value as a seed. This results in 
329     the same examples always being imputed the same values. 
    330      
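A minimal sketch, assuming the constructor follows the usual dual-purpose
pattern described above::

    import Orange

    data = Orange.data.Table("bridges")
    imputer = Orange.feature.imputation.ImputerConstructor_Random(data)
    imputer.deterministic = True  # same example always gets the same values
    impdata = imputer(data)
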
    331 Model-based imputation 
    332 ====================== 
    333  
    334 .. class:: ImputerConstructor_model 
    335  
336     Model-based imputers learn to predict the attribute's value from the 
337     values of other attributes. :obj:`ImputerConstructor_model` is given a 
338     learning algorithm (two, actually - one for discrete and one for 
339     continuous attributes) and constructs a classifier for each attribute. 
340     The constructed imputer :obj:`Imputer_model` stores a list of classifiers 
341     which are used when needed. 
    342  
    343     .. attribute:: learner_discrete, learner_continuous 
    344      
    345     Learner for discrete and for continuous attributes. If any of them is 
    346     missing, the attributes of the corresponding type won't get imputed. 
    347  
    348     .. attribute:: use_class 
    349      
    350     Tells whether the imputer is allowed to use the class value. As this is 
    351     most often undesired, this option is by default set to False. It can 
    352     however be useful for a more complex design in which we would use one 
    353     imputer for learning examples (this one would use the class value) and 
    354     another for testing examples (which would not use the class value as this 
    355     is unavailable at that moment). 
    356  
    357 .. class:: Imputer_model 
    358  
359     .. attribute:: models 
360  
361     A list of classifiers, each corresponding to one attribute of the examples 
362     whose values are to be imputed. The :obj:`classVar` of each model should 
363     equal the corresponding attribute of the examples. If any classifier is 
364     missing (that is, the corresponding element of the list is :obj:`None`), 
365     the corresponding attribute's values will not be imputed. 
    366  
    367 .. rubric:: Examples 
    368  
369 The following imputer predicts the missing attribute values using 
370 classification and regression trees with a minimum of 20 examples in a leaf.  
    371 Part of :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    372  
    373 .. literalinclude:: code/imputation-complex.py 
    374     :lines: 74-76 
    375  
    376 We could even use the same learner for discrete and continuous attributes, 
    377 as :class:`Orange.classification.tree.TreeLearner` checks the class type 
    378 and constructs regression or classification trees accordingly. The  
    379 common parameters, such as the minimal number of 
    380 examples in leaves, are used in both cases. 
    381  
    382 You can also use different learning algorithms for discrete and 
    383 continuous attributes. Probably a common setup will be to use 
    384 :class:`Orange.classification.bayes.BayesLearner` for discrete and  
    385 :class:`Orange.regression.mean.MeanLearner` (which 
    386 just remembers the average) for continuous attributes. Part of  
    387 :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    388  
    389 .. literalinclude:: code/imputation-complex.py 
    390     :lines: 91-94 
    391  
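The setup described above might look like this (a sketch of the referenced
snippet, not the verbatim file)::

    import Orange

    data = Orange.data.Table("bridges")
    imputer_constructor = Orange.feature.imputation.ImputerConstructor_model()
    imputer_constructor.learner_discrete = Orange.classification.bayes.NaiveLearner()
    imputer_constructor.learner_continuous = Orange.regression.mean.MeanLearner()
    imputer = imputer_constructor(data)
    impdata = imputer(data)
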
    392 You can also construct an :class:`Imputer_model` yourself. You will do  
    393 this if different attributes need different treatment. Brace for an  
    394 example that will be a bit more complex. First we shall construct an  
    395 :class:`Imputer_model` and initialize an empty list of models.  
    396 The following code snippets are from 
    397 :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    398  
    399 .. literalinclude:: code/imputation-complex.py 
    400     :lines: 108-109 
    401  
402 Attributes "LANES" and "T-OR-D" will always be imputed the values 2 and 
403 "THROUGH". Since "LANES" is continuous, it suffices to construct a 
404 :obj:`DefaultClassifier` with the default value 2.0 (don't forget the 
405 decimal part, or else Orange will think you are talking about an index of a 
406 discrete value - how could it tell?). For the discrete attribute "T-OR-D", 
407 we could construct a :class:`Orange.classification.ConstantClassifier` and 
408 give the index of the value "THROUGH" as an argument. But we shall do it 
409 more elegantly, by constructing a :class:`Orange.data.Value`. Both 
410 classifiers will be stored at the appropriate places in :obj:`imputer.models`. 
    411  
    412 .. literalinclude:: code/imputation-complex.py 
    413     :lines: 110-112 
    414  
    415  
    416 "LENGTH" will be computed with a regression tree induced from "MATERIAL",  
    417 "SPAN" and "ERECTED" (together with "LENGTH" as the class attribute, of 
    418 course). Note that we initialized the domain by simply giving a list with 
    419 the names of the attributes, with the domain as an additional argument 
    420 in which Orange will look for the named attributes. 
    421  
    422 .. literalinclude:: code/imputation-complex.py 
    423     :lines: 114-119 
    424  
    425 We printed the tree just to see what it looks like. 
    426  
    427 :: 
    428  
    429     <XMP class=code>SPAN=SHORT: 1158 
    430     SPAN=LONG: 1907 
    431     SPAN=MEDIUM 
    432     |    ERECTED<1908.500: 1325 
    433     |    ERECTED>=1908.500: 1528 
    434     </XMP> 
    435  
436 Small and nice. Now for "SPAN". Wooden bridges and walkways are short, 
437 while the others are mostly medium. This could be done with 
438 :class:`Orange.classifier.ClassifierByLookupTable` - it would be faster 
439 than what we plan here; see the corresponding documentation on the lookup 
440 classifier. Here we are going to do it with a Python function. 
    441  
    442 .. literalinclude:: code/imputation-complex.py 
    443     :lines: 121-128 
    444  
445 :obj:`compute_span` could also be written as a class, if you prefer. 
446 It is important that it behaves like a classifier, that is, gets an example 
447 and returns a value. The second argument tells, as usual, what the caller 
448 expects the classifier to return - a value, a distribution or both. Since 
449 the caller, :obj:`Imputer_model`, always wants values, we shall ignore the 
450 argument (at the risk of having problems in the future, when imputers might 
451 handle distributions as well). 
    452  
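A sketch of such a function (the attribute and value names follow the bridges
domain; treat the exact rule as an assumption, not the verbatim snippet)::

    def compute_span(example, return_what=orange.GetValue):
        # wooden bridges and walkways are short; the rest are mostly medium
        if example["TYPE"] == "WOOD" or example["PURPOSE"] == "WALK":
            return Orange.data.Value(data.domain["SPAN"], "SHORT")
        else:
            return Orange.data.Value(data.domain["SPAN"], "MEDIUM")

    imputer.models[data.domain.index("SPAN")] = compute_span
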
    453 Missing values as special values 
    454 ================================ 
    455  
456 Missing values sometimes have a special meaning. The fact that something was 
457 not measured can sometimes tell a lot. Be cautious, however, when using such 
458 values in decision models; if the decision not to measure something (for 
459 instance, performing a laboratory test on a patient) is based on the expert's 
460 knowledge of the class value, such unknown values clearly should not be used  
461 in models. 
    462  
    463 .. class:: ImputerConstructor_asValue 
    464  
    465     Constructs a new domain in which each 
    466     discrete attribute is replaced with a new attribute that has one value more: 
    467     "NA". The new attribute will compute its values on the fly from the old one, 
    468     copying the normal values and replacing the unknowns with "NA". 
    469  
    470     For continuous attributes, it will 
    471     construct a two-valued discrete attribute with values "def" and "undef", 
    472     telling whether the continuous attribute was defined or not. The attribute's 
    473     name will equal the original's with "_def" appended. The original continuous 
    474     attribute will remain in the domain and its unknowns will be replaced by 
    475     averages. 
    476  
    477     :class:`ImputerConstructor_asValue` has no specific attributes. 
    478  
479     It constructs :class:`Imputer_asValue` (I bet you wouldn't have 
480     guessed). The imputer converts the example into the new domain, which 
481     imputes the values for discrete attributes. If continuous attributes are 
482     present, it will also replace their unknown values by the averages. 
    483  
    484 .. class:: Imputer_asValue 
    485  
    486     .. attribute:: domain 
    487  
    488         The domain with the new attributes constructed by  
    489         :class:`ImputerConstructor_asValue`. 
    490  
    491     .. attribute:: defaults 
    492  
    493         Default values for continuous attributes. Present only if there are any. 
    494  
    495 The following code shows what this imputer actually does to the domain. 
    496 Part of :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    497  
    498 .. literalinclude:: code/imputation-complex.py 
    499     :lines: 137-151 
    500  
    501 The script's output looks like this:: 
    502  
    503     [RIVER, ERECTED, PURPOSE, LENGTH, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE] 
    504  
    505     [RIVER, ERECTED_def, ERECTED, PURPOSE, LENGTH_def, LENGTH, LANES_def, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE] 
    506  
    507     RIVER: M -> M 
    508     ERECTED: 1874 -> 1874 (def) 
    509     PURPOSE: RR -> RR 
    510     LENGTH: ? -> 1567 (undef) 
    511     LANES: 2 -> 2 (def) 
    512     CLEAR-G: ? -> NA 
    513     T-OR-D: THROUGH -> THROUGH 
    514     MATERIAL: IRON -> IRON 
    515     SPAN: ? -> NA 
    516     REL-L: ? -> NA 
    517     TYPE: SIMPLE-T -> SIMPLE-T 
    518  
519 Seemingly, the two examples have the same attributes (with 
520 :samp:`imputed` having a few additional ones). If you check this with 
521 :samp:`original.domain[0] == imputed.domain[0]`, you will see that it is 
522 False. The attributes only have the same names, 
523 but they are different attributes. (If you have read this far - and this is 
524 already a bit advanced - you know that Orange does not really care about 
525 attribute names.) 
    526  
527 Therefore, if we wrote :samp:`imputed[i]` the program would fail 
528 since the domain of :samp:`imputed` does not contain the attribute 
529 :samp:`i`. But it does contain an attribute with the same name (which 
530 usually even has the same value). We therefore use :samp:`i.name` to index 
531 the attributes of :samp:`imputed`. (Using names for indexing is not fast, 
532 though; if you do it a lot, compute the integer index with 
533 :samp:`imputed.domain.index(i.name)`.) 
    534  
    535 For continuous attributes, there is an additional attribute with "_def" 
    536 appended; we get it by :samp:`i.name+"_def"`. 
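
In code, the loop that printed the comparison above looks something like this
(a sketch, assuming the classic :obj:`orange.VarTypes` constants)::

    for i in original.domain.attributes:
        s = "%s: %s -> %s" % (i.name, original[i], imputed[i.name])
        if i.varType == orange.VarTypes.Continuous:
            s += " (%s)" % imputed[i.name + "_def"]
        print s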
    537  
538 The first continuous attribute, "ERECTED", is defined. Its value remains 1874 
539 and the additional attribute "ERECTED_def" has the value "def". Not so for 
540 "LENGTH". Its undefined value is replaced by the average (1567) and the new 
541 attribute has the value "undef". The undefined discrete attribute "CLEAR-G" 
542 (like all other undefined discrete attributes) is assigned the value "NA". 
    543  
    544 Using imputers 
    545 ============== 
    546  
547 To properly use the imputation classes in the learning process, they must 
548 be trained on training examples only. Imputing the missing values and then 
549 using the data set in cross-validation will give overly optimistic results. 
    550  
    551 Learners with imputer as a component 
    552 ------------------------------------ 
    553  
    554 Orange learners that cannot handle missing values will generally provide a slot 
    555 for the imputer component. An example of such a class is 
    556 :obj:`Orange.classification.logreg.LogRegLearner` with an attribute called 
    557 :obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`. To it you 
    558 can assign an imputer constructor - one of the above constructors or a specific 
    559 constructor you wrote yourself. When given learning examples, 
    560 :obj:`Orange.classification.logreg.LogRegLearner` will pass them to 
    561 :obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor` to get an 
    562 imputer (again some of the above or a specific imputer you programmed). It will 
    563 immediately use the imputer to impute the missing values in the learning data 
    564 set, so it can be used by the actual learning algorithm. Besides, when the 
    565 classifier :obj:`Orange.classification.logreg.LogRegClassifier` is constructed, 
    566 the imputer will be stored in its attribute 
    567 :obj:`Orange.classification.logreg.LogRegClassifier.imputer`. At 
    568 classification, the imputer will be used for imputation of missing values in 
    569 (testing) examples. 
    570  
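In code, this might look as follows (a sketch; the attribute names follow the
documentation above)::

    import Orange

    data = Orange.data.Table("voting")
    lr = Orange.classification.logreg.LogRegLearner()
    lr.imputerConstructor = Orange.feature.imputation.ImputerConstructor_average()
    classifier = lr(data)
    # the trained imputer is stored on the classifier for use at prediction time
    print classifier.imputer
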
    571 Although details may vary from algorithm to algorithm, this is how the 
    572 imputation is generally used in Orange's learners. Also, if you write your own 
    573 learners, it is recommended that you use imputation according to the described 
    574 procedure. 
    575  
    576 Write your own imputer 
    577 ====================== 
    578  
579 Imputation classes provide the Python-callback functionality (not all Orange 
580 classes do so; refer to the documentation on `subtyping the Orange classes  
581 in Python <callbacks.htm>`_ for a list). If you want to write your own 
582 imputation constructor or imputer, you simply need to program a Python 
583 function that behaves like the built-in Orange classes. (For an imputer it 
584 is even less: you only need to write a function that gets an example as an 
585 argument; imputation for example tables will then use that function.) 
    586  
    587 You will most often write the imputation constructor when you have a special 
    588 imputation procedure or separate procedures for various attributes, as we've  
    589 demonstrated in the description of 
    590 :obj:`Orange.feature.imputation.ImputerConstructor_model`. You basically only  
591 need to pack everything we have written there into an imputer constructor 
592 that will accept a data set and the id of the weight meta-attribute (ignore 
593 it if you will, but you must accept two arguments), and return the imputer 
594 (probably an :obj:`Orange.feature.imputation.Imputer_model`). The benefit of 
595 implementing an imputer constructor as opposed to what we did above is that 
596 you can use such a constructor as a component for Orange learners (like 
597 logistic regression) or for wrappers from module orngImpute, and in that way 
598 properly use it in classifier testing procedures. 
    599  
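A sketch of such a constructor - a plain function that accepts the data and a
weight id and returns an :obj:`Imputer_model`; the choice of "LENGTH" and the
fixed value are illustrative::

    def my_imputer_constructor(data, weight_id=0):
        # must accept data and a weight id, even if the weight is ignored
        imputer = Orange.feature.imputation.Imputer_model()
        # one slot per variable in the domain; None means "do not impute"
        imputer.models = [None] * len(data.domain)
        length = data.domain["LENGTH"]
        imputer.models[data.domain.index(length)] = \
            Orange.classification.ConstantClassifier(
                Orange.data.Value(length, 1234.0))
        return imputer
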
    600 """ 
    601  
    6021import Orange.core as orange 
    6032from orange import ImputerConstructor_minimal  
  • Orange/feature/scoring.py

    r9671 r9813  
    206206    Assesses features' ability to distinguish between very similar 
    207207    instances from different classes. This scoring method was first 
    208     developed by Kira and Rendell and then improved by Kononenko. The 
 208    developed by Kira and Rendell and then improved by Kononenko. The 
    209209    class :obj:`Relief` works on discrete and continuous classes and 
    210210    thus implements ReliefF and RReliefF. 
  • Orange/fixes/fix_orange_imports.py

    r9671 r9818  
    4242           "orngSOM": "Orange.projection.som", 
    4343           "orngBayes":"Orange.classification.bayes", 
     44           "orngLR":"Orange.classification.logreg", 
    4445           "orngNetwork":"Orange.network", 
    4546           "orngMisc":"Orange.misc", 
  • Orange/orng/orngCA.py

    r9671 r9817  
     1# This has to be seriously outdated, as it uses matrixmultiply, which is not 
     2# present in numpy since, like, 2006.     Matija Polajnar, 2012 a.d. 
     3 
    14""" 
    25Correspondence analysis is a descriptive/exploratory technique designed to analyze simple two-way and  
  • Orange/testing/regression/results_modules/outlier1.py.txt

    r9689 r9778  
    1 [1.9537338562966178, 1.4916207549367024, 2.3567027252768531, 1.3547757144824362, 1.3165608011293919, 0.77090435634545718, 2.5037293510604748, 1.5073157621637685, 3.1360623720829142, 0.69078079138014914, 2.5285764279332521, 0.7433781485122628, 1.0879648893391614, 1.5809701088397101, 0.95551967016414863, 0.64109928938427407, 1.3610443564385841, 0.59842805343843031, 0.5764828338874588, 0.65896660247970529, -0.11532119366264716, 0.5946079506443136, 0.2164048799939679, -0.13532808531244495, -0.77356783494291881, -0.75705635746691069, -0.86104391212529852, -1.2168053958148779, -0.8496508795832981, -0.50123387464222346, -0.77341815879931297, 0.16767396953609892, -0.35117712691223024, 0.48721479093189457, -1.7288308112675763, -0.41177669611696449, -0.78877602511487799, -1.3872717005506068, 2.106596327546038, -1.2871466419288893, -0.41878699142408998, -1.2405469480761979, -0.24337773166773838, -0.95375835103295281, 0.48201088266530023, -0.56333257217473698, -0.7443069891450993, -1.764166660649225, -1.7065434213509496, -0.50504384697095039, -1.3054185775296043, -0.80681324891277462, -0.63828120116943221, -1.1627861693976371, -0.45401443972098127, -0.61371961477796033, -0.4977304132654839, -0.028586481599992053, -0.30729060363501759, -0.88295861769597794, -1.4449272394895061, -0.096688263582083503, -0.59221286077400181, 0.60840114873648288, -0.65173597952659179, -0.29992196992479447, -1.2488097930212723, 1.199403408838386, -1.1695890709679937, 0.43695418246540901, -1.1042173101212782, -0.66509777841096018, -0.51396431962727951, -0.51984330801587042, 0.0043002748331546275, -0.55181190984179918, 1.3441195820509282, -0.71294892448637903, -0.018440170740114344, 0.36568005901789147, -0.5072588384978175, -1.1650625423904195, -0.38855801862147704, 0.30554976386664046, -0.69074060168648355, 1.5280992689860888, -0.10141847608741561, 0.47051948353017276, -0.53870459912488544, 0.97851359900596802, -0.40786746694014064, 0.48008263656307049, 0.2240552203634309, -0.95959950865225618, -0.83515808262514213, 0.47533200198642123, -0.57881353707992, 1.0640441766510356, -0.58475334818831626, -0.36677168526134823, -0.11674188951656596, 0.58513238481347862, 0.50799047774082506, -0.41123866164618883, 0.78235502874783658, 0.036505126098621624, 0.57776931524450592, 0.17483092285652521] 
     1[1.9537338562966178, 1.4916207549367024, 2.356702725276853, 1.3547757144824362, 1.3165608011293919, 0.7709043563454572, 2.503729351060475, 1.5073157621637685, 3.136062372082914, 0.6907807913801491, 2.528576427933252, 0.7433781485122628, 1.0879648893391614, 1.5809701088397101, 0.9555196701641486, 0.6410992893842741, 1.3610443564385841, 0.5984280534384303, 0.5764828338874588, 0.6589666024797053, -0.11532119366264716, 0.5946079506443136, 0.2164048799939679, -0.13532808531244495, -0.7735678349429188, -0.7570563574669107, -0.8610439121252985, -1.216805395814878, -0.8496508795832981, -0.5012338746422235, -0.773418158799313, 0.16767396953609892, -0.35117712691223024, 0.48721479093189457, -1.7288308112675763, -0.4117766961169645, -0.788776025114878, -1.3872717005506068, 2.106596327546038, -1.2871466419288893, -0.41878699142409, -1.240546948076198, -0.24337773166773838, -0.9537583510329528, 0.48201088266530023, -0.563332572174737, -0.7443069891450993, -1.764166660649225, -1.7065434213509496, -0.5050438469709504, -1.3054185775296043, -0.8068132489127746, -0.6382812011694322, -1.1627861693976371, -0.45401443972098127, -0.6137196147779603, -0.4977304132654839, -0.028586481599992053, -0.3072906036350176, -0.8829586176959779, -1.444927239489506, -0.0966882635820835, -0.5922128607740018, 0.6084011487364829, -0.6517359795265918, -0.2999219699247945, -1.2488097930212723, 1.199403408838386, -1.1695890709679937, 0.436954182465409, -1.1042173101212782, -0.6650977784109602, -0.5139643196272795, -0.5198433080158704, 0.0043002748331546275, -0.5518119098417992, 1.3441195820509282, -0.712948924486379, -0.018440170740114344, 0.36568005901789147, -0.5072588384978175, -1.1650625423904195, -0.38855801862147704, 0.30554976386664046, -0.6907406016864835, 1.5280992689860888, -0.10141847608741561, 0.47051948353017276, -0.5387045991248854, 0.978513599005968, -0.40786746694014064, 0.4800826365630705, 0.2240552203634309, -0.9595995086522562, -0.8351580826251421, 0.4753320019864212, -0.57881353707992, 1.0640441766510356, -0.5847533481883163, -0.36677168526134823, -0.11674188951656596, 0.5851323848134786, 0.5079904777408251, -0.41123866164618883, 0.7823550287478366, 0.036505126098621624, 0.5777693152445059, 0.1748309228565252] 
  • Orange/testing/regression/results_modules/som1.py.txt

    r9689 r9779  
    11node: 0 0 
    2     [5.2, 2.7, 3.9, 1.4, 'Iris-versicolor'] 
     2    [5.1, 3.5, 1.4, 0.3, 'Iris-setosa'] 
     3    [5.0, 3.5, 1.3, 0.3, 'Iris-setosa'] 
    34node: 0 1 
     5    [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'] 
    46node: 0 2 
    5     [5.7, 3.8, 1.7, 0.3, 'Iris-setosa'] 
     7    [5.1, 3.7, 1.5, 0.4, 'Iris-setosa'] 
    68node: 0 3 
     9    [5.1, 3.8, 1.5, 0.3, 'Iris-setosa'] 
     10    [5.1, 3.8, 1.6, 0.2, 'Iris-setosa'] 
    711node: 0 4 
    8     [5.7, 4.4, 1.5, 0.4, 'Iris-setosa'] 
    912node: 0 5 
    10     [5.8, 4.0, 1.2, 0.2, 'Iris-setosa'] 
     13    [5.2, 4.1, 1.5, 0.1, 'Iris-setosa'] 
    1114node: 0 6 
    1215    [5.4, 3.9, 1.3, 0.4, 'Iris-setosa'] 
    1316node: 0 7 
    1417node: 0 8 
     18    [5.4, 3.7, 1.5, 0.2, 'Iris-setosa'] 
    1519    [5.5, 3.5, 1.3, 0.2, 'Iris-setosa'] 
     20    [5.3, 3.7, 1.5, 0.2, 'Iris-setosa'] 
    1621node: 0 9 
    1722node: 0 10 
     23    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'] 
     24    [5.2, 3.5, 1.5, 0.2, 'Iris-setosa'] 
    1825    [5.2, 3.4, 1.4, 0.2, 'Iris-setosa'] 
    1926node: 0 11 
    20     [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'] 
    21     [5.1, 3.5, 1.4, 0.3, 'Iris-setosa'] 
    2227node: 0 12 
    23     [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'] 
    24     [5.0, 3.5, 1.3, 0.3, 'Iris-setosa'] 
     28    [5.1, 3.4, 1.5, 0.2, 'Iris-setosa'] 
    2529node: 0 13 
     30    [5.0, 3.4, 1.5, 0.2, 'Iris-setosa'] 
    2631node: 0 14 
     32    [5.0, 3.3, 1.4, 0.2, 'Iris-setosa'] 
     33node: 0 15 
     34    [5.0, 3.2, 1.2, 0.2, 'Iris-setosa'] 
     35node: 0 16 
     36    [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'] 
     37node: 0 17 
     38    [4.4, 3.2, 1.3, 0.2, 'Iris-setosa'] 
     39    [4.6, 3.2, 1.4, 0.2, 'Iris-setosa'] 
     40node: 0 18 
     41    [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'] 
     42node: 0 19 
    2743    [4.6, 3.6, 1.0, 0.2, 'Iris-setosa'] 
    28 node: 0 15 
    29 node: 0 16 
    30     [4.3, 3.0, 1.1, 0.1, 'Iris-setosa'] 
    31     [4.4, 3.2, 1.3, 0.2, 'Iris-setosa'] 
    32 node: 0 17 
    33     [4.4, 3.0, 1.3, 0.2, 'Iris-setosa'] 
    34 node: 0 18 
    35     [4.4, 2.9, 1.4, 0.2, 'Iris-setosa'] 
    36 node: 0 19 
    37     [4.5, 2.3, 1.3, 0.3, 'Iris-setosa'] 
    3844node: 1 0 
    39     [5.5, 2.3, 4.0, 1.3, 'Iris-versicolor'] 
    40     [5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'] 
     45    [5.0, 3.4, 1.6, 0.4, 'Iris-setosa'] 
    4146node: 1 1 
     47    [5.1, 3.3, 1.7, 0.5, 'Iris-setosa'] 
     48    [5.0, 3.5, 1.6, 0.6, 'Iris-setosa'] 
    4249node: 1 2 
    43     [5.5, 4.2, 1.4, 0.2, 'Iris-setosa'] 
    4450node: 1 3 
    4551node: 1 4 
    46     [5.2, 4.1, 1.5, 0.1, 'Iris-setosa'] 
     52    [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'] 
     53    [5.1, 3.8, 1.9, 0.4, 'Iris-setosa'] 
    4754node: 1 5 
     55    [5.5, 4.2, 1.4, 0.2, 'Iris-setosa'] 
    4856node: 1 6 
    49     [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'] 
     57    [5.8, 4.0, 1.2, 0.2, 'Iris-setosa'] 
     58    [5.7, 4.4, 1.5, 0.4, 'Iris-setosa'] 
    5059node: 1 7 
    51     [5.4, 3.7, 1.5, 0.2, 'Iris-setosa'] 
    5260node: 1 8 
    53     [5.3, 3.7, 1.5, 0.2, 'Iris-setosa'] 
     61    [5.7, 3.8, 1.7, 0.3, 'Iris-setosa'] 
    5462node: 1 9 
    55     [5.2, 3.5, 1.5, 0.2, 'Iris-setosa'] 
    5663node: 1 10 
    57     [5.1, 3.4, 1.5, 0.2, 'Iris-setosa'] 
     64    [5.4, 3.4, 1.7, 0.2, 'Iris-setosa'] 
     65    [5.4, 3.4, 1.5, 0.4, 'Iris-setosa'] 
    5866node: 1 11 
    59     [5.0, 3.4, 1.5, 0.2, 'Iris-setosa'] 
    6067node: 1 12 
    61     [5.0, 3.3, 1.4, 0.2, 'Iris-setosa'] 
     68    [4.8, 3.4, 1.6, 0.2, 'Iris-setosa'] 
     69    [4.8, 3.4, 1.9, 0.2, 'Iris-setosa'] 
    6270node: 1 13 
    63     [5.0, 3.2, 1.2, 0.2, 'Iris-setosa'] 
     71    [4.7, 3.2, 1.6, 0.2, 'Iris-setosa'] 
    6472node: 1 14 
     73    [4.8, 3.1, 1.6, 0.2, 'Iris-setosa'] 
    6574node: 1 15 
    66     [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'] 
    67 node: 1 16 
    68     [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'] 
    69     [4.6, 3.2, 1.4, 0.2, 'Iris-setosa'] 
    70 node: 1 17 
    71     [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'] 
    72 node: 1 18 
    73 node: 1 19 
    74     [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'] 
    75     [4.8, 3.0, 1.4, 0.1, 'Iris-setosa'] 
    76     [4.8, 3.0, 1.4, 0.3, 'Iris-setosa'] 
    77 node: 2 0 
    78     [4.9, 2.5, 4.5, 1.7, 'Iris-virginica'] 
    79 node: 2 1 
    80 node: 2 2 
    81     [5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'] 
    82     [5.6, 2.7, 4.2, 1.3, 'Iris-versicolor'] 
    83 node: 2 3 
    84 node: 2 4 
    85     [5.1, 3.8, 1.9, 0.4, 'Iris-setosa'] 
    86 node: 2 5 
    87 node: 2 6 
    88     [5.1, 3.8, 1.5, 0.3, 'Iris-setosa'] 
    89     [5.1, 3.7, 1.5, 0.4, 'Iris-setosa'] 
    90     [5.1, 3.8, 1.6, 0.2, 'Iris-setosa'] 
    91 node: 2 7 
    92 node: 2 8 
    93     [5.4, 3.4, 1.7, 0.2, 'Iris-setosa'] 
    94 node: 2 9 
    95     [5.4, 3.4, 1.5, 0.4, 'Iris-setosa'] 
    96 node: 2 10 
    97 node: 2 11 
    98 node: 2 12 
    99     [5.1, 3.3, 1.7, 0.5, 'Iris-setosa'] 
    100 node: 2 13 
    101     [5.0, 3.5, 1.6, 0.6, 'Iris-setosa'] 
    102 node: 2 14 
    103     [5.0, 3.4, 1.6, 0.4, 'Iris-setosa'] 
    104 node: 2 15 
    105     [4.8, 3.4, 1.6, 0.2, 'Iris-setosa'] 
    106 node: 2 16 
    107     [4.7, 3.2, 1.6, 0.2, 'Iris-setosa'] 
    108 node: 2 17 
    109     [4.8, 3.1, 1.6, 0.2, 'Iris-setosa'] 
    110 node: 2 18 
    111 node: 2 19 
    11275    [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'] 
    11376    [5.0, 3.0, 1.6, 0.2, 'Iris-setosa'] 
    11477    [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'] 
    11578    [4.9, 3.1, 1.5, 0.1, 'Iris-setosa'] 
     79node: 1 16 
     80    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'] 
     81    [4.8, 3.0, 1.4, 0.1, 'Iris-setosa'] 
     82node: 1 17 
     83    [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'] 
     84    [4.8, 3.0, 1.4, 0.3, 'Iris-setosa'] 
     85node: 1 18 
     86    [4.4, 2.9, 1.4, 0.2, 'Iris-setosa'] 
     87node: 1 19 
     88    [4.3, 3.0, 1.1, 0.1, 'Iris-setosa'] 
     89    [4.4, 3.0, 1.3, 0.2, 'Iris-setosa'] 
     90node: 2 0 
     91    [5.7, 2.6, 3.5, 1.0, 'Iris-versicolor'] 
     92node: 2 1 
     93node: 2 2 
     94    [5.5, 2.4, 3.7, 1.0, 'Iris-versicolor'] 
     95node: 2 3 
     96    [5.5, 2.4, 3.8, 1.1, 'Iris-versicolor'] 
     97node: 2 4 
     98    [5.6, 2.5, 3.9, 1.1, 'Iris-versicolor'] 
     99node: 2 5 
     100node: 2 6 
     101    [5.1, 2.5, 3.0, 1.1, 'Iris-versicolor'] 
     102node: 2 7 
     103node: 2 8 
     104    [4.9, 2.4, 3.3, 1.0, 'Iris-versicolor'] 
     105    [5.0, 2.3, 3.3, 1.0, 'Iris-versicolor'] 
     106node: 2 9 
     107node: 2 10 
     108    [5.0, 2.0, 3.5, 1.0, 'Iris-versicolor'] 
     109node: 2 11 
     110node: 2 12 
     111    [4.5, 2.3, 1.3, 0.3, 'Iris-setosa'] 
     112node: 2 13 
     113node: 2 14 
     114node: 2 15 
     115node: 2 16 
     116node: 2 17 
     117node: 2 18 
     118node: 2 19 
    116119node: 3 0 
    117     [5.4, 3.0, 4.5, 1.5, 'Iris-versicolor'] 
     120    [5.6, 2.9, 3.6, 1.3, 'Iris-versicolor'] 
    118121node: 3 1 
    119     [5.6, 3.0, 4.5, 1.5, 'Iris-versicolor'] 
    120122node: 3 2 
    121     [5.7, 2.8, 4.5, 1.3, 'Iris-versicolor'] 
     123    [5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'] 
     124    [5.7, 3.0, 4.2, 1.2, 'Iris-versicolor'] 
    122125node: 3 3 
     126    [5.7, 2.9, 4.2, 1.3, 'Iris-versicolor'] 
     127    [5.7, 2.8, 4.1, 1.3, 'Iris-versicolor'] 
    123128node: 3 4 
    124     [5.7, 2.8, 4.1, 1.3, 'Iris-versicolor'] 
    125129node: 3 5 
    126     [5.7, 2.9, 4.2, 1.3, 'Iris-versicolor'] 
     130    [5.9, 3.0, 4.2, 1.5, 'Iris-versicolor'] 
    127131node: 3 6 
    128     [5.7, 3.0, 4.2, 1.2, 'Iris-versicolor'] 
     132    [6.1, 2.8, 4.0, 1.3, 'Iris-versicolor'] 
    129133node: 3 7 
     134    [5.8, 2.7, 3.9, 1.2, 'Iris-versicolor'] 
    130135node: 3 8 
    131     [5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'] 
     136    [5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'] 
     137    [5.8, 2.6, 4.0, 1.2, 'Iris-versicolor'] 
    132138node: 3 9 
    133139node: 3 10 
    134     [4.8, 3.4, 1.9, 0.2, 'Iris-setosa'] 
    135140node: 3 11 
     141    [5.5, 2.3, 4.0, 1.3, 'Iris-versicolor'] 
     142    [5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'] 
    136143node: 3 12 
    137     [7.1, 3.0, 5.9, 2.1, 'Iris-virginica'] 
     144    [5.2, 2.7, 3.9, 1.4, 'Iris-versicolor'] 
    138145node: 3 13 
    139146node: 3 14 
    140     [7.6, 3.0, 6.6, 2.1, 'Iris-virginica'] 
    141147node: 3 15 
     148    [6.0, 2.2, 4.0, 1.0, 'Iris-versicolor'] 
    142149node: 3 16 
    143     [7.7, 2.8, 6.7, 2.0, 'Iris-virginica'] 
     150    [6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'] 
     151    [6.3, 2.3, 4.4, 1.3, 'Iris-versicolor'] 
    144152node: 3 17 
    145153node: 3 18 
    146154node: 3 19 
    147155    [7.7, 2.6, 6.9, 2.3, 'Iris-virginica'] 
     156    [7.7, 2.8, 6.7, 2.0, 'Iris-virginica'] 
    148157node: 4 0 
    149     [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'] 
    150     [5.7, 2.5, 5.0, 2.0, 'Iris-virginica'] 
    151     [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'] 
     158    [5.4, 3.0, 4.5, 1.5, 'Iris-versicolor'] 
    152159node: 4 1 
     160    [5.6, 3.0, 4.5, 1.5, 'Iris-versicolor'] 
    153161node: 4 2 
    154     [5.6, 2.8, 4.9, 2.0, 'Iris-virginica'] 
     162    [5.7, 2.8, 4.5, 1.3, 'Iris-versicolor'] 
    155163node: 4 3 
    156164node: 4 4 
    157     [5.9, 3.0, 4.2, 1.5, 'Iris-versicolor'] 
     165    [6.0, 2.9, 4.5, 1.5, 'Iris-versicolor'] 
    158166node: 4 5 
     167    [6.1, 3.0, 4.6, 1.4, 'Iris-versicolor'] 
    159168node: 4 6 
    160     [6.0, 2.9, 4.5, 1.5, 'Iris-versicolor'] 
     169    [6.1, 2.9, 4.7, 1.4, 'Iris-versicolor'] 
    161170node: 4 7 
     171    [6.1, 2.8, 4.7, 1.2, 'Iris-versicolor'] 
    162172node: 4 8 
    163     [6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'] 
    164     [6.3, 2.3, 4.4, 1.3, 'Iris-versicolor'] 
     173    [5.6, 2.7, 4.2, 1.3, 'Iris-versicolor'] 
    165174node: 4 9 
    166175node: 4 10 
    167     [7.4, 2.8, 6.1, 1.9, 'Iris-virginica'] 
    168176node: 4 11 
     177    [5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'] 
    169178node: 4 12 
    170     [7.7, 3.0, 6.1, 2.3, 'Iris-virginica'] 
     179    [6.0, 2.2, 5.0, 1.5, 'Iris-virginica'] 
    171180node: 4 13 
    172181node: 4 14 
    173182node: 4 15 
     183    [6.3, 2.5, 4.9, 1.5, 'Iris-versicolor'] 
    174184node: 4 16 
    175185node: 4 17 
     186    [6.7, 2.5, 5.8, 1.8, 'Iris-virginica'] 
    176187node: 4 18 
    177188node: 4 19 
    178189node: 5 0 
    179     [5.8, 2.8, 5.1, 2.4, 'Iris-virginica'] 
     190    [4.9, 2.5, 4.5, 1.7, 'Iris-virginica'] 
    180191node: 5 1 
    181192node: 5 2 
     
    189200    [6.1, 3.0, 4.9, 1.8, 'Iris-virginica'] 
    190201node: 5 7 
     202    [6.2, 2.8, 4.8, 1.8, 'Iris-virginica'] 
    191203node: 5 8 
    192     [6.2, 2.8, 4.8, 1.8, 'Iris-virginica'] 
     204    [6.3, 2.7, 4.9, 1.8, 'Iris-virginica'] 
    193205node: 5 9 
    194     [6.3, 2.7, 4.9, 1.8, 'Iris-virginica'] 
     206    [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'] 
    195207node: 5 10 
    196     [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'] 
    197208node: 5 11 
     209    [6.0, 2.7, 5.1, 1.6, 'Iris-versicolor'] 
    198210node: 5 12 
    199     [6.4, 2.7, 5.3, 1.9, 'Iris-virginica'] 
    200211node: 5 13 
     212    [6.1, 2.6, 5.6, 1.4, 'Iris-virginica'] 
    201213node: 5 14 
    202     [6.7, 2.5, 5.8, 1.8, 'Iris-virginica'] 
    203214node: 5 15 
     215    [7.4, 2.8, 6.1, 1.9, 'Iris-virginica'] 
    204216node: 5 16 
    205     [7.2, 3.0, 5.8, 1.6, 'Iris-virginica'] 
    206217node: 5 17 
    207     [7.2, 3.2, 6.0, 1.8, 'Iris-virginica'] 
     218    [7.3, 2.9, 6.3, 1.8, 'Iris-virginica'] 
    208219node: 5 18 
    209220node: 5 19 
    210     [7.3, 2.9, 6.3, 1.8, 'Iris-virginica'] 
     221    [7.6, 3.0, 6.6, 2.1, 'Iris-virginica'] 
    211222node: 6 0 
    212     [6.4, 2.8, 5.6, 2.1, 'Iris-virginica'] 
    213     [6.4, 2.8, 5.6, 2.2, 'Iris-virginica'] 
     223    [5.6, 2.8, 4.9, 2.0, 'Iris-virginica'] 
    214224node: 6 1 
    215225node: 6 2 
    216     [6.5, 3.2, 5.1, 2.0, 'Iris-virginica'] 
     226    [5.7, 2.5, 5.0, 2.0, 'Iris-virginica'] 
    217227node: 6 3 
    218228node: 6 4 
    219     [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'] 
     229    [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'] 
     230    [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'] 
    220231node: 6 5 
    221232node: 6 6 
    222     [6.0, 2.2, 5.0, 1.5, 'Iris-virginica'] 
     233    [6.4, 2.7, 5.3, 1.9, 'Iris-virginica'] 
    223234node: 6 7 
    224     [6.3, 2.5, 4.9, 1.5, 'Iris-versicolor'] 
    225235node: 6 8 
     236    [6.3, 2.8, 5.1, 1.5, 'Iris-virginica'] 
    226237node: 6 9 
    227     [6.8, 2.8, 4.8, 1.4, 'Iris-versicolor'] 
    228238node: 6 10 
     239    [7.2, 3.0, 5.8, 1.6, 'Iris-virginica'] 
    229240node: 6 11 
    230     [6.7, 3.0, 5.0, 1.7, 'Iris-versicolor'] 
    231241node: 6 12 
     242    [7.2, 3.2, 6.0, 1.8, 'Iris-virginica'] 
    232243node: 6 13 
    233     [6.7, 3.1, 4.7, 1.5, 'Iris-versicolor'] 
    234244node: 6 14 
     245    [7.1, 3.0, 5.9, 2.1, 'Iris-virginica'] 
    235246node: 6 15 
    236     [6.9, 3.1, 4.9, 1.5, 'Iris-versicolor'] 
    237247node: 6 16 
     248    [7.7, 3.0, 6.1, 2.3, 'Iris-virginica'] 
    238249node: 6 17 
    239     [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'] 
    240250node: 6 18 
    241251node: 6 19 
    242252node: 7 0 
    243     [6.5, 3.0, 5.8, 2.2, 'Iris-virginica'] 
     253    [5.8, 2.8, 5.1, 2.4, 'Iris-virginica'] 
    244254node: 7 1 
    245255node: 7 2 
     
    249259node: 7 4 
    250260    [6.9, 3.1, 5.1, 2.3, 'Iris-virginica'] 
    251     [6.7, 3.0, 5.2, 2.3, 'Iris-virginica'] 
    252261node: 7 5 
    253262node: 7 6 
     263    [6.7, 3.0, 5.2, 2.3, 'Iris-virginica'] 
     264node: 7 7 
     265node: 7 8 
     266    [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'] 
     267node: 7 9 
     268node: 7 10 
     269    [6.5, 3.2, 5.1, 2.0, 'Iris-virginica'] 
     270node: 7 11 
     271node: 7 12 
     272    [6.7, 3.0, 5.0, 1.7, 'Iris-versicolor'] 
     273node: 7 13 
     274node: 7 14 
     275    [6.8, 2.8, 4.8, 1.4, 'Iris-versicolor'] 
     276node: 7 15 
     277node: 7 16 
     278    [7.2, 3.6, 6.1, 2.5, 'Iris-virginica'] 
     279node: 7 17 
     280node: 7 18 
     281    [7.9, 3.8, 6.4, 2.0, 'Iris-virginica'] 
     282node: 7 19 
     283    [7.7, 3.8, 6.7, 2.2, 'Iris-virginica'] 
     284node: 8 0 
     285    [6.3, 2.9, 5.6, 1.8, 'Iris-virginica'] 
     286node: 8 1 
     287    [6.4, 2.8, 5.6, 2.1, 'Iris-virginica'] 
     288node: 8 2 
     289    [6.4, 2.8, 5.6, 2.2, 'Iris-virginica'] 
     290node: 8 3 
     291    [6.5, 3.0, 5.8, 2.2, 'Iris-virginica'] 
     292node: 8 4 
     293node: 8 5 
     294    [6.9, 3.2, 5.7, 2.3, 'Iris-virginica'] 
     295node: 8 6 
     296node: 8 7 
     297    [6.7, 3.1, 5.6, 2.4, 'Iris-virginica'] 
     298    [6.7, 3.3, 5.7, 2.5, 'Iris-virginica'] 
     299node: 8 8 
     300node: 8 9 
     301    [6.4, 3.2, 5.3, 2.3, 'Iris-virginica'] 
     302node: 8 10 
     303node: 8 11 
     304    [6.7, 3.1, 4.7, 1.5, 'Iris-versicolor'] 
     305node: 8 12 
     306node: 8 13 
     307    [6.9, 3.1, 4.9, 1.5, 'Iris-versicolor'] 
     308node: 8 14 
     309node: 8 15 
     310    [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'] 
     311node: 8 16 
     312node: 8 17 
     313    [6.6, 2.9, 4.6, 1.3, 'Iris-versicolor'] 
     314node: 8 18 
     315node: 8 19 
     316    [6.5, 2.8, 4.6, 1.5, 'Iris-versicolor'] 
     317node: 9 0 
    254318    [6.5, 3.0, 5.5, 1.8, 'Iris-virginica'] 
    255319    [6.4, 3.1, 5.5, 1.8, 'Iris-virginica'] 
    256 node: 7 7 
    257 node: 7 8 
    258     [6.3, 2.9, 5.6, 1.8, 'Iris-virginica'] 
    259 node: 7 9 
    260 node: 7 10 
    261     [6.1, 2.6, 5.6, 1.4, 'Iris-virginica'] 
    262 node: 7 11 
    263 node: 7 12 
    264     [6.0, 2.7, 5.1, 1.6, 'Iris-versicolor'] 
    265 node: 7 13 
    266     [6.3, 2.8, 5.1, 1.5, 'Iris-virginica'] 
    267 node: 7 14 
    268 node: 7 15 
    269     [6.5, 2.8, 4.6, 1.5, 'Iris-versicolor'] 
    270 node: 7 16 
    271     [6.6, 2.9, 4.6, 1.3, 'Iris-versicolor'] 
    272 node: 7 17 
    273     [6.7, 3.1, 4.4, 1.4, 'Iris-versicolor'] 
    274     [6.6, 3.0, 4.4, 1.4, 'Iris-versicolor'] 
    275 node: 7 18 
    276 node: 7 19 
    277     [6.4, 2.9, 4.3, 1.3, 'Iris-versicolor'] 
    278     [6.2, 2.9, 4.3, 1.3, 'Iris-versicolor'] 
    279 node: 8 0 
    280     [6.7, 3.3, 5.7, 2.1, 'Iris-virginica'] 
    281 node: 8 1 
    282 node: 8 2 
    283     [6.9, 3.2, 5.7, 2.3, 'Iris-virginica'] 
    284     [6.8, 3.2, 5.9, 2.3, 'Iris-virginica'] 
    285 node: 8 3 
    286 node: 8 4 
    287     [6.7, 3.1, 5.6, 2.4, 'Iris-virginica'] 
    288     [6.7, 3.3, 5.7, 2.5, 'Iris-virginica'] 
    289 node: 8 5 
    290 node: 8 6 
    291     [6.3, 3.4, 5.6, 2.4, 'Iris-virginica'] 
    292 node: 8 7 
    293     [6.2, 3.4, 5.4, 2.3, 'Iris-virginica'] 
    294 node: 8 8 
    295     [6.4, 3.2, 5.3, 2.3, 'Iris-virginica'] 
    296 node: 8 9 
    297 node: 8 10 
    298     [6.1, 2.9, 4.7, 1.4, 'Iris-versicolor'] 
    299     [6.1, 2.8, 4.7, 1.2, 'Iris-versicolor'] 
    300     [6.1, 3.0, 4.6, 1.4, 'Iris-versicolor'] 
    301 node: 8 11 
    302 node: 8 12 
    303     [5.7, 2.6, 3.5, 1.0, 'Iris-versicolor'] 
    304 node: 8 13 
    305 node: 8 14 
    306     [5.8, 2.7, 3.9, 1.2, 'Iris-versicolor'] 
    307 node: 8 15 
    308     [6.1, 2.8, 4.0, 1.3, 'Iris-versicolor'] 
    309 node: 8 16 
    310 node: 8 17 
    311     [6.0, 2.2, 4.0, 1.0, 'Iris-versicolor'] 
    312 node: 8 18 
    313 node: 8 19 
    314     [5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'] 
    315     [5.8, 2.6, 4.0, 1.2, 'Iris-versicolor'] 
    316 node: 9 0 
    317     [7.7, 3.8, 6.7, 2.2, 'Iris-virginica'] 
    318     [7.9, 3.8, 6.4, 2.0, 'Iris-virginica'] 
    319320node: 9 1 
    320321node: 9 2 
    321     [7.2, 3.6, 6.1, 2.5, 'Iris-virginica'] 
     322    [6.7, 3.3, 5.7, 2.1, 'Iris-virginica'] 
    322323node: 9 3 
    323324node: 9 4 
    324     [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'] 
     325    [6.8, 3.2, 5.9, 2.3, 'Iris-virginica'] 
    325326node: 9 5 
    326327node: 9 6 
     328    [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'] 
     329node: 9 7 
     330node: 9 8 
     331    [6.3, 3.4, 5.6, 2.4, 'Iris-virginica'] 
     332node: 9 9 
     333    [6.2, 3.4, 5.4, 2.3, 'Iris-virginica'] 
     334node: 9 10 
     335node: 9 11 
     336node: 9 12 
     337    [6.0, 3.4, 4.5, 1.6, 'Iris-versicolor'] 
     338node: 9 13 
     339    [6.3, 3.3, 4.7, 1.6, 'Iris-versicolor'] 
     340node: 9 14 
    327341    [6.4, 3.2, 4.5, 1.5, 'Iris-versicolor'] 
    328     [6.3, 3.3, 4.7, 1.6, 'Iris-versicolor'] 
    329 node: 9 7 
    330     [6.0, 3.4, 4.5, 1.6, 'Iris-versicolor'] 
    331 node: 9 8 
    332 node: 9 9 
    333 node: 9 10 
    334     [5.6, 2.9, 3.6, 1.3, 'Iris-versicolor'] 
    335 node: 9 11 
    336     [5.1, 2.5, 3.0, 1.1, 'Iris-versicolor'] 
    337 node: 9 12 
    338 node: 9 13 
    339     [4.9, 2.4, 3.3, 1.0, 'Iris-versicolor'] 
    340 node: 9 14 
    341     [5.0, 2.3, 3.3, 1.0, 'Iris-versicolor'] 
    342342node: 9 15 
    343     [5.0, 2.0, 3.5, 1.0, 'Iris-versicolor'] 
    344343node: 9 16 
     344    [6.2, 2.9, 4.3, 1.3, 'Iris-versicolor'] 
    345345node: 9 17 
    346     [5.5, 2.4, 3.7, 1.0, 'Iris-versicolor'] 
     346    [6.4, 2.9, 4.3, 1.3, 'Iris-versicolor'] 
    347347node: 9 18 
    348     [5.5, 2.4, 3.8, 1.1, 'Iris-versicolor'] 
     348    [6.6, 3.0, 4.4, 1.4, 'Iris-versicolor'] 
    349349node: 9 19 
    350     [5.6, 2.5, 3.9, 1.1, 'Iris-versicolor'] 
     350    [6.7, 3.1, 4.4, 1.4, 'Iris-versicolor'] 
  • Orange/testing/regression/results_modules/statExamples.py.txt

    r9767 r9790  
    11 
    22method  CA  AP  Brier   IS 
    3 bayes   0.903   0.902   0.176    0.758 
    4 tree    0.825   0.824   0.326    0.599 
     3bayes   0.903   0.902   0.175    0.759 
     4tree    0.846   0.845   0.286    0.641 
    55majrty  0.614   0.526   0.474   -0.000 
    66 
    77method  CA  AP  Brier   IS 
    8 bayes   0.903+-0.008    0.902+-0.008    0.176+-0.016     0.758+-0.017 
    9 tree    0.825+-0.016    0.824+-0.016    0.326+-0.033     0.599+-0.034 
     8bayes   0.903+-0.019    0.902+-0.019    0.175+-0.036     0.759+-0.039 
     9tree    0.846+-0.016    0.845+-0.015    0.286+-0.030     0.641+-0.032 
    1010majrty  0.614+-0.003    0.526+-0.001    0.474+-0.001    -0.000+-0.000 
    1111 
     
    1414 
    1515Confusion matrix for naive Bayes: 
    16 TP: 240, FP: 18, FN: 27.0, TN: 150 
     16TP: 239, FP: 18, FN: 28.0, TN: 150 
    1717 
    1818Confusion matrix for naive Bayes for 'van': 
    19 TP: 192, FP: 151, FN: 7.0, TN: 496 
     19TP: 189, FP: 241, FN: 10.0, TN: 406 
    2020 
    2121Confusion matrix for naive Bayes for 'opel': 
    22 TP: 79, FP: 75, FN: 133.0, TN: 559 
     22TP: 86, FP: 112, FN: 126.0, TN: 522 
    2323 
    2424    bus van saab    opel 
    25 bus 156 19  17  26 
    26 van 4   192 2   1 
    27 saab    8   68  93  48 
    28 opel    8   64  61  79 
     25bus 56  95  21  46 
     26van 6   189 4   0 
     27saab    3   75  73  66 
     28opel    4   71  51  86 
    2929 
    3030Sensitivity and specificity for 'voting' 
    3131method  sens    spec 
    3232bayes   0.891   0.923 
    33 tree    0.801   0.863 
     33tree    0.816   0.893 
    3434majrty  1.000   0.000 
    3535 
    3636Sensitivity and specificity for 'vehicle=van' 
    3737method  sens    spec 
    38 bayes   0.965   0.767 
    39 tree    0.834   0.966 
     38bayes   0.950   0.628 
     39tree    0.809   0.966 
    4040majrty  0.000   1.000 
    4141 
    4242AUC (voting) 
    4343     bayes: 0.974 
    44       tree: 0.926 
     44      tree: 0.930 
    4545    majrty: 0.500 
    4646 
    4747AUC for vehicle using weighted single-out method 
    4848bayes   tree    majority 
    49 0.840   0.816   0.500 
     490.783   0.800   0.500 
    5050 
    5151AUC for vehicle, using different methods 
    5252                            bayes   tree    majority 
    53        by pairs, weighted:  0.861   0.883   0.500 
    54                  by pairs:  0.863   0.884   0.500 
    55     one vs. all, weighted:  0.840   0.816   0.500 
    56               one vs. all:  0.840   0.816   0.500 
     53       by pairs, weighted:  0.789   0.870   0.500 
     54                 by pairs:  0.791   0.871   0.500 
     55    one vs. all, weighted:  0.783   0.800   0.500 
     56              one vs. all:  0.783   0.800   0.500 
    5757 
    5858AUC for detecting class 'van' in 'vehicle' 
    59 0.923   0.900   0.500 
     590.858   0.888   0.500 
    6060 
    6161AUCs for detecting various classes in 'vehicle' 
    62 bus (218.000) vs others:    0.952   0.936   0.500 
    63 van (199.000) vs others:    0.923   0.900   0.500 
    64 saab (217.000) vs others:   0.737   0.707   0.500 
    65 opel (212.000) vs others:   0.749   0.718   0.500 
     62bus (218.000) vs others:    0.894   0.932   0.500 
     63van (199.000) vs others:    0.858   0.888   0.500 
     64saab (217.000) vs others:   0.699   0.687   0.500 
     65opel (212.000) vs others:   0.682   0.694   0.500 
    6666 
    6767    bus van saab 
    68 van 0.987 
    69 saab    0.927   0.860 
    70 opel    0.921   0.894   0.587 
     68van 0.933 
     69saab    0.820   0.828 
     70opel    0.822   0.825   0.519 
    7171 
    7272AUCs for detecting various pairs of classes in 'vehicle' 
    73 van vs bus:     0.987   0.976   0.500 
    74 saab vs bus:    0.927   0.936   0.500 
    75 saab vs van:    0.860   0.906   0.500 
    76 opel vs bus:    0.921   0.951   0.500 
    77 opel vs van:    0.894   0.915   0.500 
    78 opel vs saab:   0.587   0.622   0.500 
     73van vs bus:     0.933   0.978   0.500 
     74saab vs bus:    0.820   0.938   0.500 
     75saab vs van:    0.828   0.879   0.500 
     76opel vs bus:    0.822   0.932   0.500 
     77opel vs van:    0.825   0.903   0.500 
     78opel vs saab:   0.519   0.599   0.500 
    7979 
    8080AUC and SE for voting 
    81 bayes: 0.982+-0.008 
    82 tree: 0.888+-0.025 
     81bayes: 0.968+-0.015 
     82tree: 0.924+-0.022 
    8383majrty: 0.500+-0.045 
    8484 
    85 Difference between naive Bayes and tree: 0.065+-0.066 
     85Difference between naive Bayes and tree: 0.014+-0.062 
    8686 
    8787ROC (first 20 points) for bayes on 'voting' 
    88881.000   1.000 
    89890.970   1.000 
    90 0.940   1.000 
    91900.910   1.000 
    92 0.896   1.000 
    93910.881   1.000 
    94 0.836   1.000 
    95920.821   1.000 
    96930.806   1.000 
     940.791   1.000 
    97950.761   1.000 
    98960.746   1.000 
     
    101990.687   1.000 
    1021000.672   1.000 
    103 0.627   1.000 
    104 0.612   1.000 
    105 0.597   1.000 
    106 0.582   1.000 
    107 0.567   1.000 
     1010.672   0.991 
     1020.657   0.991 
     1030.642   0.991 
     1040.552   0.991 
     1050.537   0.991 
     1060.522   0.991 
     1070.507   0.991 
  • Orange/testing/regression/results_ofb/accuracy3.py.txt

    r9689 r9786  
    11Classification accuracies: 
    22bayes 0.93119266055 
    3 tree 0.876146788991 
     3tree 0.871559633028 
  • Orange/testing/regression/results_ofb/accuracy4.py.txt

    r9689 r9786  
    1 1: [0.90839694656488545, 0.9007633587786259] 
    2 2: [0.90839694656488545, 0.9007633587786259] 
    3 3: [0.90839694656488545, 0.9007633587786259] 
    4 4: [0.90839694656488545, 0.9007633587786259] 
    5 5: [0.90839694656488545, 0.9007633587786259] 
    6 6: [0.90839694656488545, 0.9007633587786259] 
    7 7: [0.90839694656488545, 0.9007633587786259] 
    8 8: [0.90839694656488545, 0.9007633587786259] 
    9 9: [0.90839694656488545, 0.9007633587786259] 
    10 10: [0.90839694656488545, 0.9007633587786259] 
     11: [0.9083969465648855, 0.9007633587786259] 
     22: [0.9083969465648855, 0.9007633587786259] 
     33: [0.9083969465648855, 0.9007633587786259] 
     44: [0.9083969465648855, 0.9007633587786259] 
     55: [0.9083969465648855, 0.9007633587786259] 
     66: [0.9083969465648855, 0.9007633587786259] 
     77: [0.9083969465648855, 0.9007633587786259] 
     88: [0.9083969465648855, 0.9007633587786259] 
     99: [0.9083969465648855, 0.9007633587786259] 
     1010: [0.9083969465648855, 0.9007633587786259] 
    1111Classification accuracies: 
    1212bayes 0.908396946565 
  • Orange/testing/regression/results_ofb/accuracy5.py.txt

    r9689 r9786  
    1 1: [0.88636363636363635, 0.93181818181818177] 
    2 2: [0.88636363636363635, 0.93181818181818177] 
    3 3: [0.88636363636363635, 0.93181818181818177] 
    4 4: [0.93181818181818177, 1.0] 
    5 5: [0.95454545454545459, 1.0] 
    6 6: [0.88372093023255816, 0.97674418604651159] 
    7 7: [0.93023255813953487, 0.95348837209302328] 
    8 8: [0.88372093023255816, 0.90697674418604646] 
    9 9: [0.88372093023255816, 1.0] 
    10 10: [0.90697674418604646, 0.90697674418604646] 
     11: [0.8863636363636364, 0.9318181818181818] 
     22: [0.8863636363636364, 0.9318181818181818] 
     33: [0.8863636363636364, 0.9318181818181818] 
     44: [0.9318181818181818, 1.0] 
     55: [0.9545454545454546, 1.0] 
     66: [0.8837209302325582, 0.9767441860465116] 
     77: [0.9302325581395349, 0.9534883720930233] 
     88: [0.8837209302325582, 0.9069767441860465] 
     99: [0.8837209302325582, 1.0] 
     1010: [0.9069767441860465, 0.9069767441860465] 
    1111Classification accuracies: 
    1212bayes 0.903382663848 
  • Orange/testing/regression/results_ofb/assoc2.py.txt

    r9689 r9786  
    115 most confident rules: 
    22conf    supp    lift    rule 
    3 1.000   0.585   1.015   drive-wheels=fwd -> engine-location=front 
    4 1.000   0.478   1.015   fuel-type=gas num-of-doors=four -> engine-location=front 
     31.000   0.478   1.015   fuel-type=gas aspiration=std drive-wheels=fwd -> engine-location=front 
     41.000   0.429   1.015   fuel-type=gas aspiration=std num-of-doors=four -> engine-location=front 
     51.000   0.507   1.015   aspiration=std drive-wheels=fwd -> engine-location=front 
    561.000   0.556   1.015   num-of-doors=four -> engine-location=front 
    6 1.000   0.541   1.015   fuel-type=gas drive-wheels=fwd -> engine-location=front 
    771.000   0.449   1.015   aspiration=std num-of-doors=four -> engine-location=front 
    88 
  • Orange/testing/regression/results_ofb/bagging_test.py.linux2.txt

    r9689 r9795  
    1 tree: 0.777 
    2 bagged classifier: 0.794 
     1tree: 0.795 
     2bagged classifier: 0.802 
  • Orange/testing/regression/results_ofb/domain13.py.txt

    r9689 r9786  
    1 original: 0.940, new: 0.960 
     1original: 0.947, new: 0.960 
  • Orange/testing/regression/results_ofb/ensemble3.py.txt

    r9689 r9796  
    11        Learner   CA     Brier Score 
    22        default:  0.473  0.501 
    3     k-NN (k=11):  0.881  0.233 
    4     bagged k-NN:  0.825  0.252 
    5    boosted k-NN:  0.853  0.238 
     3    k-NN (k=11):  0.861  0.231 
     4    bagged k-NN:  0.854  0.249 
     5   boosted k-NN:  0.843  0.261 
  • Orange/testing/regression/results_ofb/handful.py.txt

    r9689 r9786  
    66(democrat  )   0.386         0.995         0.011         0.048         
    77(democrat  )   0.386         0.002         0.015         0.000         
    8 (democrat  )   0.386         0.043         0.015         0.018         
    9 (democrat  )   0.386         0.228         0.015         0.192         
    10 (democrat  )   0.386         1.000         0.973         0.665         
     8(democrat  )   0.386         0.043         0.015         0.015         
     9(democrat  )   0.386         0.228         0.015         0.191         
     10(democrat  )   0.386         1.000         0.973         0.776         
    1111(republican)   0.386         1.000         0.973         0.861         
    1212(republican)   0.386         1.000         0.973         1.000         
  • Orange/testing/regression/results_ofb/regression3.py.txt

    r9689 r9786  
    11Learner        MSE 
    22default         84.777 
    3 regression tree 40.096 
     3regression tree 39.705 
    44k-NN (k=5)      17.532 
  • Orange/testing/regression/results_ofb/regression4.py.txt

    r9689 r9786  
    22maj      84.777  9.207  6.659  1.004  1.002  1.002 -0.004  
    33lr       23.729  4.871  3.413  0.281  0.530  0.513  0.719  
    4 rt       40.096  6.332  4.569  0.475  0.689  0.687  0.525  
     4rt       39.705  6.301  4.549  0.470  0.686  0.684  0.530  
    55knn      17.244  4.153  2.670  0.204  0.452  0.402  0.796  
  • Orange/testing/regression/results_orange25/linear-example.py.txt

    r9689 r9798  
    1 30.0 
    2 25.0 
    3 30.6 
    4 28.6 
    5 27.9 
     1Actual: 24.00, predicted: 30.00  
     2Actual: 21.60, predicted: 25.03  
     3Actual: 34.70, predicted: 30.57  
     4Actual: 33.40, predicted: 28.61  
     5Actual: 36.20, predicted: 27.94  
    66  Variable  Coeff Est  Std Error    t-value          p 
    77 Intercept     36.459      5.103      7.144      0.000   *** 
     
    2020     LSTAT     -0.525      0.051    -10.347      0.000   *** 
    2121Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1 
     22  Variable  Coeff Est  Std Error    t-value          p 
     23 Intercept     36.341      5.067      7.171      0.000   *** 
     24     LSTAT     -0.523      0.047    -11.019      0.000   *** 
     25        RM      3.802      0.406      9.356      0.000   *** 
     26   PTRATIO     -0.947      0.129     -7.334      0.000   *** 
     27       DIS     -1.493      0.186     -8.037      0.000   *** 
     28       NOX    -17.376      3.535     -4.915      0.000   *** 
     29      CHAS      2.719      0.854      3.183      0.002    ** 
     30         B      0.009      0.003      3.475      0.001   *** 
     31        ZN      0.046      0.014      3.390      0.001   *** 
     32      CRIM     -0.108      0.033     -3.307      0.001    ** 
     33       RAD      0.300      0.063      4.726      0.000   *** 
     34       TAX     -0.012      0.003     -3.493      0.001   *** 
     35Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1 
  • Orange/testing/regression/results_orange25/simple_tree_random_forest.py.txt

    r9689 r9802  
    11Learner  CA     Brier  AUC 
    2 for_gain 0.933  0.080  0.994 
     2for_gain 0.947  0.078  0.995 
    33for_simp 0.933  0.079  0.995 
    44 
    55Runtimes: 
    6 for_gain 0.0959219932556 
    7 for_simp 0.0247950553894 
     6for_gain 0.0934960842133 
     7for_simp 0.0251078605652 
  • Orange/testing/regression/results_orange25/statExamplesRegression.py.txt

    r9689 r9803  
    11Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2       
    22maj       84.585   9.197   6.653   1.002   1.001   1.001  -0.002  
    3 rt        21.685   4.657   3.024   0.257   0.507   0.455   0.743  
     3rt        40.015   6.326   4.592   0.474   0.688   0.691   0.526  
    44knn       21.248   4.610   2.870   0.252   0.502   0.432   0.748  
    55lr        24.092   4.908   3.425   0.285   0.534   0.515   0.715  
  • Orange/testing/regression/results_orange25/svm-linear-weights.py.txt

    r9689 r9804  
    1 defaultdict(<type 'float'>, {FloatVariable 'Elu 360': 0.16577258493775038, FloatVariable 'alpha 28': 0.04645238275160125, FloatVariable 'Elu 390': 0.1820761083768519, FloatVariable 'alpha 35': 0.21379911334384863, FloatVariable 'heat 80': 0.38909905042009185, FloatVariable 'cdc15 10': 0.11428450056762224, FloatVariable 'alpha 42': 0.13641650791763626, FloatVariable 'cdc15 30': 0.18270335600911874, FloatVariable 'alpha 49': 0.16085715739160683, FloatVariable 'Elu 150': 0.5965599387054419, FloatVariable 'alpha 7': 0.06557659077448137, FloatVariable 'alpha 63': 0.18433878873311124, FloatVariable 'cdc15 70': 0.13876265882583333, FloatVariable 'Elu 120': 0.5939764872379445, FloatVariable 'alpha 70': 0.26459268873021585, FloatVariable 'Elu 90': 0.3834942300173535, FloatVariable 'cdc15 90': 0.32729758144823035, FloatVariable 'heat 0': 0.19091815990337407, FloatVariable 'alpha 77': 0.20088381949723239, FloatVariable 'cdc15 110': 0.564756618474929, FloatVariable 'heat 160': 0.3192185000510415, FloatVariable 'alpha 56': 0.3367416107117914, FloatVariable 'cdc15 130': 0.3658301477295572, FloatVariable 'alpha 21': 0.030539018557891578, FloatVariable 'cdc15 150': 0.693249161777514, FloatVariable 'dtt 120': 0.55305024494988, FloatVariable 'cdc15 170': 0.44789694844623534, FloatVariable 'alpha 119': 0.1699365258012951, FloatVariable 'cold 0': 0.27980454530046744, FloatVariable 'cdc15 190': 0.16956982427965123, FloatVariable 'alpha 91': 0.1905674816738207, FloatVariable 'heat 10': 1.000320207469925, FloatVariable 'cdc15 210': 0.15183429673463036, FloatVariable 'cold 40': 0.3092287272528724, FloatVariable 'cdc15 230': 0.5474715182870784, FloatVariable 'alpha 105': 0.14088060621674625, FloatVariable 'dtt 15': 0.49451797411099035, FloatVariable 'cdc15 250': 0.3573777070361102, FloatVariable 'diau a': 0.14935761521655416, FloatVariable 'cold 20': 0.4097605043285248, FloatVariable 'cdc15 270': 0.21951931922594184, FloatVariable 'diau b': 0.23473821977067888, FloatVariable 'cdc15 290': 0.24965577080121784, FloatVariable 'diau c': 0.13741762346585432, FloatVariable 'Elu 0': 0.8466167037587657, FloatVariable 'Elu 60': 0.36698713537474476, FloatVariable 'Elu 30': 0.4786304184632838, FloatVariable 'alpha 84': 0.1308234316119738, FloatVariable 'spo 2': 0.7078062232168753, FloatVariable 'heat 20': 0.9867456006798212, FloatVariable 'diau e': 0.864767371223623, FloatVariable 'spo 5': 1.0683498605674933, FloatVariable 'dtt 30': 0.5838556086306895, FloatVariable 'diau f': 1.4452997087693935, FloatVariable 'spo 0': 0.13024372486415015, FloatVariable 'spo 7': 0.8081568079276176, FloatVariable 'diau g': 2.248793727194904, FloatVariable 'spo 9': 0.2498282401589412, FloatVariable 'alpha 98': 0.21754881357923167, FloatVariable 'alpha 0': 0.19198054903386352, FloatVariable 'spo 11': 0.20615575833508282, FloatVariable 'diau d': 0.44932601596409105, FloatVariable 'Elu 180': 0.42425268429856355, FloatVariable 'alpha 14': 0.18310901108005986, FloatVariable 'spo5 2': 0.40417556232809326, FloatVariable 'Elu 210': 0.12396520046361383, FloatVariable 'heat 40': 0.4580618143377812, FloatVariable 'spo5 7': 0.26780459416067937, FloatVariable 'alpha 112': 0.19329749923741518, FloatVariable 'Elu 240': 0.2093390824178926, FloatVariable 'spo5 11': 1.200079459496442, FloatVariable 'Elu 270': 0.33471969574325466, FloatVariable 'dtt 60': 0.5951914850021424, FloatVariable 'spo- early': 1.9466556509082333, FloatVariable 'Elu 300': 0.15913983311663107, FloatVariable 'cold 160': 0.6947037871090163, FloatVariable 'spo- mid': 3.2086605964825132, 
FloatVariable 'Elu 330': 0.11474308886955724, FloatVariable 'cdc15 50': 0.24968263583989325}) 
     1defaultdict(<type 'float'>, {FloatVariable 'Elu 30': 0.4786304184632838, FloatVariable 'spo 0': 0.13024372486415015, FloatVariable 'Elu 60': 0.36698713537474476, FloatVariable 'spo 2': 0.7078062232168753, FloatVariable 'Elu 90': 0.3834942300173535, FloatVariable 'spo 5': 1.0683498605674933, FloatVariable 'alpha 7': 0.06557659077448137, FloatVariable 'Elu 120': 0.5939764872379445, FloatVariable 'spo 7': 0.8081568079276176, FloatVariable 'diau d': 0.44932601596409105, FloatVariable 'Elu 150': 0.5965599387054419, FloatVariable 'alpha 119': 0.1699365258012951, FloatVariable 'spo 9': 0.2498282401589412, FloatVariable 'Elu 180': 0.42425268429856355, FloatVariable 'spo 11': 0.20615575833508282, FloatVariable 'alpha 70': 0.26459268873021585, FloatVariable 'Elu 210': 0.12396520046361383, FloatVariable 'spo5 2': 0.40417556232809326, FloatVariable 'Elu 240': 0.2093390824178926, FloatVariable 'spo5 7': 0.26780459416067937, FloatVariable 'Elu 270': 0.33471969574325466, FloatVariable 'alpha 84': 0.1308234316119738, FloatVariable 'spo5 11': 1.200079459496442, FloatVariable 'diau e': 0.864767371223623, FloatVariable 'Elu 300': 0.15913983311663107, FloatVariable 'spo- early': 1.9466556509082333, FloatVariable 'Elu 330': 0.11474308886955724, FloatVariable 'alpha 42': 0.13641650791763626, FloatVariable 'spo- mid': 3.2086605964825132, FloatVariable 'Elu 360': 0.16577258493775038, FloatVariable 'alpha 14': 0.18310901108005986, FloatVariable 'Elu 390': 0.1820761083768519, FloatVariable 'alpha 21': 0.030539018557891578, FloatVariable 'cdc15 10': 0.11428450056762224, FloatVariable 'alpha 28': 0.04645238275160125, FloatVariable 'alpha 91': 0.1905674816738207, FloatVariable 'cdc15 30': 0.18270335600911874, FloatVariable 'alpha 35': 0.21379911334384863, FloatVariable 'heat 0': 0.19091815990337407, FloatVariable 'cdc15 50': 0.24968263583989325, FloatVariable 'cdc15 170': 0.44789694844623534, FloatVariable 'heat 80': 0.38909905042009185, FloatVariable 'diau f': 1.4452997087693935, FloatVariable 'cdc15 70': 0.13876265882583333, FloatVariable 'alpha 49': 0.16085715739160683, FloatVariable 'cdc15 90': 0.32729758144823035, FloatVariable 'alpha 56': 0.3367416107117914, FloatVariable 'cold 0': 0.27980454530046744, FloatVariable 'cdc15 110': 0.564756618474929, FloatVariable 'Elu 0': 0.8466167037587657, FloatVariable 'alpha 63': 0.18433878873311124, FloatVariable 'cdc15 130': 0.3658301477295572, FloatVariable 'dtt 60': 0.5951914850021424, FloatVariable 'alpha 105': 0.14088060621674625, FloatVariable 'cdc15 150': 0.693249161777514, FloatVariable 'dtt 120': 0.55305024494988, FloatVariable 'alpha 112': 0.19329749923741518, FloatVariable 'diau g': 2.248793727194904, FloatVariable 'heat 10': 1.000320207469925, FloatVariable 'cdc15 190': 0.16956982427965123, FloatVariable 'heat 160': 0.3192185000510415, FloatVariable 'dtt 15': 0.49451797411099035, FloatVariable 'cold 20': 0.4097605043285248, FloatVariable 'alpha 0': 0.19198054903386352, FloatVariable 'cdc15 210': 0.15183429673463036, FloatVariable 'cold 40': 0.3092287272528724, FloatVariable 'alpha 98': 0.21754881357923167, FloatVariable 'cdc15 230': 0.5474715182870784, FloatVariable 'cold 160': 0.6947037871090163, FloatVariable 'heat 40': 0.4580618143377812, FloatVariable 'cdc15 250': 0.3573777070361102, FloatVariable 'dtt 30': 0.5838556086306895, FloatVariable 'diau a': 0.14935761521655416, FloatVariable 'alpha 77': 0.20088381949723239, FloatVariable 'cdc15 270': 0.21951931922594184, FloatVariable 'diau b': 0.23473821977067888, FloatVariable 'heat 20': 0.9867456006798212, 
FloatVariable 'cdc15 290': 0.24965577080121784, FloatVariable 'diau c': 0.13741762346585432}) 
  • Orange/testing/regression/results_reference/MeasureAttribute1a.py.txt

    r9689 r9780  
    1 0.793992996216 
     10.794493019581 
    221.100: 0.005 
    331.200: 0.015 
     
    22224.500: 0.238 
    23234.600: 0.235 
    24 4.700: 0.232 
     244.700: 0.233 
    25254.800: 0.206 
    26 4.900: 0.166 
     264.900: 0.167 
    27275.000: 0.153 
    28285.100: 0.107 
  • Orange/testing/regression/results_reference/MeasureAttribute3.py.txt

    r9689 r9781  
    88 
    99Relief 
    10                 - no unknowns:         0.1609         0.0337         0.1219         0.0266 
    11               - with unknowns:         0.0878         0.0989         0.3216         0.0830 
     10                - no unknowns:         0.6167         0.2816         0.1425         0.0532 
     11              - with unknowns:         0.5056         0.3881         0.1849         0.0857 
    1212 
  • Orange/testing/regression/results_reference/contingency6.py.txt

    r9689 r9782  
    664.69999980927 <2.000, 0.000, 0.000> 
    77 
    8 Contingency keys:  [4.3000001907348633, 4.4000000953674316, 4.5] 
     8Contingency keys:  [4.300000190734863, 4.400000095367432, 4.5] 
    99Contingency values:  [<1.000, 0.000, 0.000>, <3.000, 0.000, 0.000>, <1.000, 0.000, 0.000>] 
    10 Contingency items:  [(4.3000001907348633, <1.000, 0.000, 0.000>), (4.4000000953674316, <3.000, 0.000, 0.000>), (4.5, <1.000, 0.000, 0.000>)] 
     10Contingency items:  [(4.300000190734863, <1.000, 0.000, 0.000>), (4.400000095367432, <3.000, 0.000, 0.000>), (4.5, <1.000, 0.000, 0.000>)] 
    1111 
    1212Error:  invalid index (%5.3f) 
  • Orange/testing/regression/results_reference/distributions.py.txt

    r9689 r9783  
    2222 
    2323Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private 
    24 Private Self-emp-not-inc Self-emp-not-inc Private Private Self-emp-not-inc Self-emp-not-inc Private Self-emp-not-inc Private Private Self-emp-not-inc Self-emp-not-inc Private Private Self-emp-not-inc Self-emp-not-inc Private Private Self-emp-not-inc 
     24Private Self-emp-not-inc Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private Private 
    2525Private:  685.0 
    2626Private:  685.0 
  • Orange/testing/regression/results_reference/imputation.py.txt

    r9689 r9793  
    9797['M', 1874, 'RR', ?, 2, '?', 'THROUGH', 'IRON', '?', '?', 'SIMPLE-T'] 
    9898Imputed: 
    99 ['M', 1874, 'RR', 1000, 2, 'N', 'THROUGH', 'IRON', 'MEDIUM', 'S', 'SIMPLE-T'] 
     99['M', 1874, 'RR', 1257, 2, 'N', 'THROUGH', 'IRON', 'MEDIUM', 'S', 'SIMPLE-T'] 
    100100 
    101101['M', 1876, 'HIGHWAY', 1245, ?, '?', 'THROUGH', 'STEEL', 'LONG', 'F', 'SUSPEN'] 
     
    106106 
    107107['O', 1878, 'RR', ?, 2, 'G', '?', 'STEEL', '?', '?', 'SIMPLE-T'] 
    108 ['O', 1878, 'RR', 804, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'S', 'SIMPLE-T'] 
     108['O', 1878, 'RR', 1257, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'S', 'SIMPLE-T'] 
    109109 
    110110['M', 1882, 'RR', ?, 2, 'G', '?', 'STEEL', '?', '?', 'SIMPLE-T'] 
    111 ['M', 1882, 'RR', 1000, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'F', 'SIMPLE-T'] 
     111['M', 1882, 'RR', 1257, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'F', 'SIMPLE-T'] 
    112112 
    113113['A', 1883, 'RR', ?, 2, 'G', 'THROUGH', 'STEEL', '?', 'F', 'SIMPLE-T'] 
    114 ['A', 1883, 'RR', 1000, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'F', 'SIMPLE-T'] 
     114['A', 1883, 'RR', 1257, 2, 'G', 'THROUGH', 'STEEL', 'MEDIUM', 'F', 'SIMPLE-T'] 
    115115 
    116116 
     
    140140*** CUSTOM IMPUTATION BY MODELS *** 
    141141 
    142 SPAN=SHORT: 1158 
    143 SPAN=LONG: 1907 
    144 SPAN=MEDIUM 
    145 |    ERECTED<1911.500: 1325 
    146 |    ERECTED>=1911.500: 1528 
     142ERECTED<=1894.500: 1257 
     143ERECTED>1894.500 
     144|    SPAN=SHORT<null node>: <null node> 
     145|    SPAN=MEDIUM: 1571 
     146|    SPAN=LONG: 1829 
    147147 
    148148['M', 1876, 'HIGHWAY', 1245, ?, '?', 'THROUGH', 'STEEL', 'LONG', 'F', 'SUSPEN'] 
  • Orange/testing/regression/results_reference/undefineds.py.txt

    r9689 r9792  
    77['?', '?', '?'] 
    88['~', '~', '~'] 
    9 ['?', 'X', 'X'] 
    10 ['?', 'UNK', 'UNK'] 
    11 ['UNAVAILABLE', '?', 'UNAVAILABLE'] 
     9['X', 'X', 'X'] 
     10['UNK', 'UNK', 'UNK'] 
     11['UNAVAILABLE', 'UNAVAILABLE', 'UNAVAILABLE'] 
    1212Default saving 
    1313 
    1414a   b   c 
    15 0 1 UNAVAILABLE 0 1 X UNK   0 1 X UNK UNAVAILABLE 
     150 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X 
    1616         
    17170   0   0 
     
    2323?   ?   ? 
    2424~   ~   ~ 
    25 ?   X   X 
    26 ?   UNK UNK 
    27 UNAVAILABLE ?   UNAVAILABLE 
     25X   X   X 
     26UNK UNK UNK 
     27UNAVAILABLE UNAVAILABLE UNAVAILABLE 
    2828 
    2929Saving with all undefined as NA 
    3030 
    3131a   b   c 
    32 0 1 UNAVAILABLE 0 1 X UNK   0 1 X UNK UNAVAILABLE 
     320 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X 
    3333         
    34340   0   0 
     
    4040?   ?   ? 
    4141~   ~   ~ 
    42 ?   X   X 
    43 ?   UNK UNK 
    44 UNAVAILABLE ?   UNAVAILABLE 
     42X   X   X 
     43UNK UNK UNK 
     44UNAVAILABLE UNAVAILABLE UNAVAILABLE 
    4545 
    4646Saving with all undefined as NA 
    4747 
    4848a   b   c 
    49 0 1 UNAVAILABLE 0 1 X UNK   0 1 X UNK UNAVAILABLE 
     490 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X   0 1 UNAVAILABLE UNK X 
    5050         
    51510   0   0 
     
    5757?   ?   ? 
    5858~   ~   ~ 
    59 ?   X   X 
    60 ?   UNK UNK 
    61 UNAVAILABLE ?   UNAVAILABLE 
     59X   X   X 
     60UNK UNK UNK 
     61UNAVAILABLE UNAVAILABLE UNAVAILABLE 
    6262 
  • docs/reference/rst/Orange.classification.logreg.rst

    r9372 r9818  
    11.. automodule:: Orange.classification.logreg 
     2 
     3.. index: logistic regression 
     4.. index: 
     5   single: classification; logistic regression 
     6 
     7******************************** 
     8Logistic regression (``logreg``) 
     9******************************** 
     10 
     11`Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_ 
     12is a statistical classification method that fits data to a logistic 
     13function. Orange's implementation of the algorithm 
     14can handle various anomalies in features, such as constant variables and 
     15singularities, that could make direct fitting of logistic regression almost 
     16impossible. Stepwise logistic regression, which iteratively selects the most 
     17informative features, is also supported. 
     18 
     19.. autoclass:: LogRegLearner 
     20   :members: 
     21 
     22.. class :: LogRegClassifier 
     23 
     24    A logistic regression classification model. Stores estimated values of 
     25    regression coefficients and their significances, and uses them to predict 
     26    classes and class probabilities. 
     27 
     28    .. attribute :: beta 
     29 
     30        Estimated regression coefficients. 
     31 
     32    .. attribute :: beta_se 
     33 
     34        Estimated standard errors for regression coefficients. 
     35 
     36    .. attribute :: wald_Z 
     37 
     38        Wald Z statistics for beta coefficients. Wald Z is computed 
     39        as beta/beta_se. 
     40 
     41    .. attribute :: P 
     42 
     43        List of P-values for the beta coefficients, that is, the probability 
     44        that the coefficients differ from 0.0. The probability is 
     45        computed from the squared Wald Z statistic, which follows a 
     46        chi-square distribution. 
     47 
     48    .. attribute :: likelihood 
     49 
     50        The probability of the sample (i.e., the learning examples) observed on 
     51        the basis of the derived model, as a function of the regression 
     52        parameters. 
     53 
     54    .. attribute :: fit_status 
     55 
     56        Tells how the model fitting ended: either regularly 
     57        (:obj:`LogRegFitter.OK`), or it was interrupted because one of the beta 
     58        coefficients escaped towards infinity (:obj:`LogRegFitter.Infinity`) 
     59        or because the values did not converge (:obj:`LogRegFitter.Divergence`). 
     60        The value indicates the classifier's reliability; the classifier 
     61        itself is usable in either case. 
     62 
     63    .. method:: __call__(instance, result_type) 
     64 
     65        Classify a new instance. 
     66 
     67        :param instance: instance to be classified. 
     68        :type instance: :class:`~Orange.data.Instance` 
     69        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or 
     70              :class:`~Orange.classification.Classifier.GetProbabilities` or 
     71              :class:`~Orange.classification.Classifier.GetBoth` 
     72 
     73        :rtype: :class:`~Orange.data.Value`, 
     74              :class:`~Orange.statistics.distribution.Distribution` or a 
     75              tuple with both 
     76 
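    As an editorial aside (not part of this changeset), the relation between 
    ``beta``, ``beta_se``, ``wald_Z`` and ``P`` documented above can be 
    reproduced in a few lines of plain Python; the sketch assumes ``beta`` and 
    ``beta_se`` are plain lists of floats:: 
 
        import math 
 
        def wald_statistics(beta, beta_se): 
            # Wald Z is beta / beta_se, as documented for the wald_Z attribute 
            wald_z = [b / se for b, se in zip(beta, beta_se)] 
            # The squared Wald Z statistic follows a chi-square distribution 
            # with one degree of freedom; its survival function at z**2 
            # equals erfc(|z| / sqrt(2)), which gives the P-value 
            p = [math.erfc(abs(z) / math.sqrt(2)) for z in wald_z] 
            return wald_z, p 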
     77 
     78.. class:: LogRegFitter 
     79 
     80    :obj:`LogRegFitter` is the abstract base class for logistic fitters. It 
     81    defines the form of the call operator and the constants denoting its 
     82    (un)success: 
     83 
     84    .. attribute:: OK 
     85 
     86        The fitter succeeded in converging to the optimal fit. 
     87 
     88    .. attribute:: Infinity 
     89 
     90        The fitter failed because one or more beta coefficients escaped towards infinity. 
     91 
     92    .. attribute:: Divergence 
     93 
     94        The beta coefficients failed to converge, but none of them escaped towards infinity. 
     95 
     96    .. attribute:: Constant 
     97 
     98        There is a constant attribute that causes the matrix to be singular. 
     99 
     100    .. attribute:: Singularity 
     101 
     102        The matrix is singular. 
     103 
     104 
     105    .. method:: __call__(examples, weight_id) 
     106 
     107        Performs the fitting. There are two possible outcomes: either 
     108        the fitting succeeded in finding a set of beta coefficients (although 
     109        possibly with difficulties), or the fitting failed altogether. The 
     110        two cases return different results. 
     111 
     112        `(status, beta, beta_se, likelihood)` 
     113            The fitter managed to fit the model. The first element of 
     114            the tuple, status, tells about any problems that occurred; it can 
     115            be either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In 
     116            the latter two cases, the returned values may still be useful for 
     117            making predictions, but it is recommended to inspect the 
     118            coefficients and their errors before deciding whether to use 
     119            the model. 
     120 
     121        `(status, attribute)` 
     122            The fitter failed and the returned attribute is responsible 
     123            for it. The type of failure is reported in status, which 
     124            can be either :obj:`Constant` or :obj:`Singularity`. 
     125 
     126        The proper way of calling the fitter is to expect and handle all 
     127        the situations described. For instance, if `fitter` is an instance 
     128        of some fitter and `examples` contains a set of suitable examples, 
     129        a script should look like this:: 
     130 
     131            res = fitter(examples) 
     132            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]: 
     133               status, beta, beta_se, likelihood = res 
     134               print beta  # proceed by doing something with what you got 
     135            else: 
     136               status, attr = res 
     137               print "failure in", attr  # remove the attribute or warn the user 
     138 
     139 
     140.. class :: LogRegFitter_Cholesky 
     141 
     142    The sole fitter available at the 
     143    moment. It is a C++ translation of `Alan Miller's logistic regression 
     144    code <http://users.bigpond.net.au/amiller/>`_. It uses the Newton-Raphson 
     145    algorithm to iteratively minimize the least squares error computed from 
     146    the learning examples. 
     147 
     148 
     149.. autoclass:: StepWiseFSS 
     150   :members: 
     151   :show-inheritance: 
     152 
     153.. autofunction:: dump 
     154 
     155 
     156 
     157Examples 
     158-------- 
     159 
     160The first example shows a very simple induction of a logistic regression 
     161classifier (:download:`logreg-run.py <code/logreg-run.py>`). 
     162 
     163.. literalinclude:: code/logreg-run.py 
     164 
     165Result:: 
     166 
     167    Classification accuracy: 0.778282598819 
     168 
     169    class attribute = survived 
     170    class values = <no, yes> 
     171 
     172        Attribute       beta  st. error     wald Z          P OR=exp(beta) 
     173 
     174        Intercept      -1.23       0.08     -15.15      -0.00 
     175     status=first       0.86       0.16       5.39       0.00       2.36 
     176    status=second      -0.16       0.18      -0.91       0.36       0.85 
     177     status=third      -0.92       0.15      -6.12       0.00       0.40 
     178        age=child       1.06       0.25       4.30       0.00       2.89 
     179       sex=female       2.42       0.14      17.04       0.00      11.25 
     180 
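    The referenced script is not included in this diff; a minimal 
    reconstruction consistent with the output above might look as follows 
    (hypothetical, editorial illustration only; it assumes the Titanic data 
    set suggested by the attribute names):: 
 
        import Orange 
 
        data = Orange.data.Table("titanic") 
        lr = Orange.classification.logreg.LogRegLearner(data) 
 
        # accuracy on the training data 
        correct = sum(1 for ins in data if lr(ins) == ins.get_class()) 
        print "Classification accuracy:", float(correct) / len(data) 
 
        # print the coefficient table shown above 
        Orange.classification.logreg.dump(lr) 
 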
     181The next example shows how to handle singularities in data sets 
     182(:download:`logreg-singularities.py <code/logreg-singularities.py>`). 
     183 
     184.. literalinclude:: code/logreg-singularities.py 
     185 
     186The first few lines of the output of this script are:: 
     187 
     188    <=50K <=50K 
     189    <=50K <=50K 
     190    <=50K <=50K 
     191    >50K >50K 
     192    <=50K >50K 
     193 
     194    class attribute = y 
     195    class values = <>50K, <=50K> 
     196 
     197                               Attribute       beta  st. error     wald Z          P OR=exp(beta) 
     198 
     199                               Intercept       6.62      -0.00       -inf       0.00 
     200                                     age      -0.04       0.00       -inf       0.00       0.96 
     201                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00 
     202                           education-num      -0.28       0.00       -inf       0.00       0.76 
     203                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62 
     204            marital-status=Never-married       3.79       0.00        inf       0.00      44.45 
     205                marital-status=Separated       3.46       0.00        inf       0.00      31.95 
     206                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96 
     207    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63 
     208        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19 
     209                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72 
     210 
     211If :obj:`remove_singular` is set to 0, inducing a logistic regression 
     212classifier raises an error:: 
     213 
     214    Traceback (most recent call last): 
     215      File "logreg-singularities.py", line 4, in <module> 
     216        lr = classification.logreg.LogRegLearner(table, removeSingular=0) 
     217      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner 
     218        return lr(examples, weightID) 
     219      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__ 
     220        lr = learner(examples, weight) 
     221    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
     222 
     223We can see that the attribute workclass is causing a singularity. 
     224 
     225The example below shows how the use of stepwise logistic regression can 
     226improve classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`): 
     227 
     228.. literalinclude:: code/logreg-stepwise.py 
     229 
     230The output of this script is:: 
     231 
     232    Learner      CA 
     233    logistic     0.841 
     234    filtered     0.846 
     235 
     236    Number of times attributes were used in cross-validation: 
     237     1 x a21 
     238    10 x a22 
     239     8 x a23 
     240     7 x a24 
     241     1 x a25 
     242    10 x a26 
     243    10 x a27 
     244     3 x a28 
     245     7 x a29 
     246     9 x a31 
     247     2 x a16 
     248     7 x a12 
     249     1 x a32 
     250     8 x a15 
     251    10 x a14 
     252     4 x a17 
     253     7 x a30 
     254    10 x a11 
     255     1 x a10 
     256     1 x a13 
     257    10 x a34 
     258     2 x a19 
     259     1 x a18 
     260    10 x a3 
     261    10 x a5 
     262     4 x a4 
     263     4 x a7 
     264     8 x a6 
     265    10 x a9 
     266    10 x a8 
  • docs/reference/rst/Orange.data.instance.rst

    r9525 r9788  
    55============================= 
    66 
    7 Class `Orange.data.Instance` holds data instances. Each data instance 
     7Class `Orange.data.Instance` holds a data instance. Each data instance 
    88corresponds to a domain, which defines its length, data types and 
    99values for symbolic indices. 
     
    1313-------- 
    1414 
    15 The data instance is described by a list of features, defined by the 
    16 domain descriptor. Instances support indexing with either integer 
    17 indices, strings or variable descriptors. 
     15The data instance is described by a list of features defined by the 
     16domain descriptor (:obj:`Orange.data.domain`). Instances support indexing 
     17with either integer indices, strings or variable descriptors. 
    1818 
    1919Since "age" is the first attribute in the dataset lenses, the 
    20 below statements are equivalent.:: 
    21  
     20below statements are equivalent:: 
     21 
     22    >>> data = Orange.data.Table("lenses") 
    2223    >>> age = data.domain["age"] 
    2324    >>> example = data[0] 
     
    2930    young 
    3031 
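    A sketch of the equivalence (editorial illustration, not part of the 
    changeset; the elided lines of the example presumably contain similar 
    statements):: 
 
        >>> print example[0]        # integer index 
        young 
        >>> print example["age"]    # feature name 
        young 
        >>> print example[age]      # variable descriptor 
        young 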
    31 Negative indices do not work as usual in Python, since they return 
     32Negative indices do not work as usual in Python, since they refer to 
    3233the values of meta attributes. 
    3334 
    34 The last element of data instance is the class label, if it 
    35 exists. It should be accessed using :obj:`get_class` and 
    36 :obj:`set_class`. 
    37  
    38 Data instances can be traversed using a for loop. 
    39  
    40 The list has a fixed length, determined by the domain to which the 
    41 instance corresponds. 
     35The last element of data instance is the class label, 
     36if the domain has a class. It should be accessed using 
     37:obj:`~Orange.data.Instance.get_class()` and 
     38:obj:`~Orange.data.Instance.set_class()`. 
     39 
     40The list has a fixed length that equals the number of variables. 
    4241 
    4342--------------- 
     
    4544--------------- 
    4645 
    47 Meta attributes provide a way to attach additional information to 
    48 examples. These attributes are treated specially, for instance, they 
    49 are not used for learning, but can carry additional information, such 
    50 as, for example, a name of a patient or the number of times the 
    51 instance was missclassified during some test procedure. The most 
    52 common additional information is the instance's weight. 
    53  
    54 For contrast from ordinary features, instances from the same domain do 
    55 not need to have the same meta attributes. Meta attributes are hence 
    56 not addressed by positions, but by their id's, which are represented 
    57 by negative indices. Id's are generated by function 
    58 :obj:`Orange.data.variable.new_meta_id()`. Id's can be reused for 
    59 multiple domains. 
    60  
    61 If ordinary features resemble lists, meta attributes can be seen as a 
    62 dictionaries. 
     46Meta attributes provide a way to attach additional information to data 
     47instances, such as an id of a patient or the number of times 
     48the instance was misclassified during some test procedure. The most 
     49common additional information is the instance's weight. These attributes 
     50do not appear in induced models. 
     51 
     52Instances from the same domain do not need to have the same meta 
     53attributes. Meta attributes are hence not addressed by positions, 
     54but by their id's, which are represented by negative indices. Id's are 
     55generated by function :obj:`Orange.data.variable.new_meta_id()`. Id's can 
     56be reused for multiple domains. 
    6357 
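A minimal sketch of the id mechanism (editorial illustration; it assumes the 
function named above and a loaded data table):: 
 
    import Orange 
 
    data = Orange.data.Table("lenses") 
    id = Orange.data.variable.new_meta_id()   # a negative integer 
    data[0][id] = 0.84                        # set a meta value by its id 
    print data[0][id] 
 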
    6458Domain descriptor can, but doesn't need to know about 
     
    6963for the domain, attribute or its name can also be used for 
    7064indexing. When registering meta attributes with domains, it is 
    71 recommended to used the same id for the same attribute in all domains. 
     65recommended to use the same id for the same attribute in all domains. 
    7266 
    7367Meta values can also be loaded from files in tab-delimited format. 
     
    7569Meta attributes are often used as weights. Many procedures, such as 
    7670learning algorithms, accept the id of the meta attribute defining the 
    77 weights of instances as an additional argument besides the data. 
     71weights of instances as an additional argument. 
    7872 
    7973The following example adds a meta attribute with a random value to 
    80 each data instance 
     74each data instance. 
    8175 
    8276.. literalinclude:: code/instance-metavar.py 
    8377    :lines: 1- 
    8478 
    85 The code prints out something like:: 
     79The code prints out:: 
    8680 
    8781    ['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84} 
    8882 
    89 Data instance now consists of two parts, ordinary features that 
     83(except for a different random value). The data instance now consists of 
     84two parts: ordinary features that 
    9085resemble a list, since they are addressed by positions (e.g. the first 
    9186value is "young"), and meta values that are more like dictionaries, 
    92 where the id (-2) is a key and 0.34 is a value (of type 
     87where the id (-2) is a key and 0.84 is a value (of type 
    9388:obj:`Orange.data.Value`). 
    9489 
     
    10095Many other functions accept weights in similar fashion. 
    10196 
    102 Code:: 
     97Code :: 
    10398 
    10499    print orange.getClassDistribution(data) 
    105100    print orange.getClassDistribution(data, id) 
    106101 
    107 prints out:: 
     102prints out :: 
    108103 
    109104    <15.000, 5.000, 4.000> 
    110105    <9.691, 3.232, 1.969> 
    111106 
    112 Registering the meta attribute changes how the data instance is 
    113 printed out and how it can be accessed:: 
     107where the first line is the actual distribution and the second a 
     108distribution with random weights assigned to the instances. 
     109 
     110Registering the meta attribute using :obj:`Orange.data.Domain.add_meta` 
     111changes how the data instance is printed out and how it can be 
     112accessed:: 
    114113 
    115114    w = orange.FloatVariable("w") 
     
    152151 
    153152        Construct a data instance with the given domain and initialize 
    154         the values. Values should be given as a list containing 
     153        the values. Values are given as a list of 
    155154        objects that can be converted into values of corresponding 
    156         variables; generally, they can be given as strings and 
    157         integer indices (for discrete varaibles) or numbers (for 
    158         continuous variables), and also as instances of 
     155        variables: strings and integer indices (for discrete variables), 
     156        strings or numbers (for continuous variables), or instances of 
    159157        :obj:`Orange.data.Value`. 
    160158 
     
    181179        Construct a new data instance as a shallow copy of the 
    182180        original. If a domain descriptor is given, the instance is 
    183         converted; conversion can add or remove variables, including 
    184         transformations, like discretization ets. 
     181        converted to another domain. 
    185182 
    186183        :param domain: domain descriptor 
     
    197194    .. method:: __init__(domain, instances) 
    198195 
    199         Construct a new data instance for the given domain, where 
    200         attribute values are taken from the provided instances, using 
    201         both their ordinary features and meta attributes, which are 
    202         registered with their corresponding domains. Meta attributes 
    203         which appear in the provided instances and do not appear in 
    204         the domain of the new instance, are copied as well. 
     196        Construct a new data instance for the given domain, where the 
     197        feature values are found in the provided instances using 
     198        both their ordinary features and meta attributes that are 
     199        registered with their corresponding domains. The new instance 
     200        also includes the meta attributes that appear in the provided 
     201        instances and whose values are not used for the instance's 
     202        features. 
    205203 
    206204        :param domain: domain descriptor 
     
    237235    .. method:: native([level]) 
    238236 
    239         Converts the instance into an ordinary Python list. If the 
    240         optional argument is 1 (default), the result is a list of 
    241         objects of :obj:`orange.Data.value`. If it is 0, it contains 
    242         pure Pyhon objects, that is, strings for discrete variables 
     237        Convert the instance into an ordinary Python list. If the 
     238        optional argument `level` is 1 (default), the result is a list of 
     239        instances of :obj:`Orange.data.Value`. If it is 0, it contains 
     240        pure Python objects, that is, strings for discrete variables 
    243241        and numbers for continuous ones. 
    244242 
    245     .. method:: compatible(other, ignore_class=0) 
    246  
    247         Return :obj:`True` if the two instances are compatible, that 
     243    .. method:: compatible(other, ignore_class=False) 
     244 
     245        Return ``True`` if the two instances are compatible, that 
    248246        is, equal in all features which are not missing in one of 
    249247        them. The optional second argument can be used to omit the 
     
    276274 
    277275        Return a dictionary containing meta values of the data 
    278         instance. The key type can be :obj:`int` (default), :obj:`str` 
    279         or :obj:`Orange.data.variable.Variable` and determines whether 
    280         the dictionary keys will be meta ids, variables names or 
     276        instance. The argument ``key_type`` can be ``int`` (default), 
     277        ``str`` or :obj:`Orange.data.variable.Variable` and 
     278        determines whether 
     279        the dictionary keys are meta ids, variable names or 
    281280        variable descriptors. In the latter two cases, only registered 
    282281        attributes are returned. :: 
     
    289288            print example.getmetas(orange.Variable) 
    290289 
    291         :param key_type: the key type; either :obj:`int`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
    292         :type key_type: :obj:`type` 
     290        :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.data.variable.Variable` 
     291        :type key_type: ``type`` 
    293292 
    294293    .. method:: get_metas(optional, [key_type]) 
    295294 
    296         Similar to above, but return a dictionary containing meta 
    297         values of the data instance which are or which are not 
    298         optional. 
     295        Similar to above, but return a dictionary that contains 
     296        only non-optional attributes (if ``optional`` is 0) or 
     297        only optional attributes. 
    299298 
    300299        :param optional: tells whether to return optional or non-optional attributes 
    301         :type optional: :obj:`bool` 
    302         :param key_type: the key type; either :obj:`int`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
    303         :type key_type: :obj:`type` 
     300        :type optional: ``bool`` 
     301        :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.data.variable.Variable` 
     302        :type key_type: ``type`` 
    304303 
    305304    .. method:: has_meta(meta_attr) 
    306305 
    307         Return :obj:`True` if the data instance has the specified meta 
    308         attribute, specified by id, string or descriptor. 
     306        Return ``True`` if the data instance has the specified meta 
     307        attribute. 
    309308 
    310309        :param meta_attr: meta attribute 
    311         :type meta_attr: :obj:`id`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
     310        :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.data.variable.Variable` 
    312311 
    313312    .. method:: remove_meta(meta_attr) 
    314313 
    315         Remove meta attribute. 
     314        Remove the specified meta attribute. 
    316315 
    317316        :param meta_attr: meta attribute 
    318         :type meta_attr: :obj:`id`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
     317        :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.data.variable.Variable` 
    319318 
    320319    .. method:: get_weight(meta_attr) 
    321320 
    322         Return the value of the specified meta attribute. The value 
    323         must be continuous; it is returned as a :obj:`float`. 
     321        Return the value of the specified meta attribute. The 
     322        attribute's value must be continuous and is returned as ``float``. 
    324323 
    325324        :param meta_attr: meta attribute 
    326         :type meta_attr: :obj:`id`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
     325        :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.data.variable.Variable` 
    327326 
    328327    .. method:: set_weight(meta_attr, weight=1) 
    329328 
    330         Set the value of the specified meta attribute to `weight`. The value 
    331         must be continuous; it is returned as a :obj:`float`. 
     329        Set the value of the specified meta attribute to ``weight``. 
    332330 
    333331        :param meta_attr: meta attribute 
    334         :type meta_attr: :obj:`id`, :obj:`str` or :obj:`Orange.data.variable.Variable` 
    335         :param weight: weight of the instance 
    336         :type weight: :obj:`float` 
     332        :type meta_attr: :obj:`id`, ``str`` or :obj:`~Orange.data.variable.Variable` 
     333        :param weight: weight of the instance 
     334        :type weight: ``float`` 
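 
A usage sketch for the weight accessors (editorial illustration; it reuses 
the names appearing earlier in this document):: 
 
    import orange, Orange 
 
    data = Orange.data.Table("lenses") 
    w = orange.FloatVariable("w") 
    id = Orange.data.variable.new_meta_id() 
    data.domain.add_meta(id, w)               # register the meta attribute 
    data[0].set_weight(w, 0.5)                # set by descriptor 
    print data[0].get_weight(w)               # prints 0.5 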
  • docs/reference/rst/Orange.data.table.rst

    r9726 r9789  
    2424:obj:`Table` supports most list-like operations: getting, setting, 
    2525removing data instances, as well as methods :obj:`append` and 
    26 :obj:`extend`. The limitation is that table contain instances of 
    27 :obj:`Orange.data.Instance`. When setting items, the item must be 
     26:obj:`extend`. When setting items, the item must be 
    2827either an instance of the correct type or a Python list of 
    2928appropriate length and content to be converted into a data instance of 
    30 the corresponding domain. 
    31  
    32 When retrieving data instances, what we get are references and not 
    33 copies. Changing the retrieved instance changes the data in the table, 
    34 too. 
    35  
    36 Slicing returns ordinary Python lists containing the data instance, 
    37 not a new Table. 
    38  
    39 As usual in Python, the data table is considered False, when empty. 
     29the corresponding domain. Retrieving data instances returns references 
     30and not copies: changing the retrieved instance changes the data in the 
     31table. Slicing returns ordinary Python lists containing references to 
     32data instances, not a new :obj:`Orange.data.Table`. 
     33 
     34According to a Python convention, the data table is considered ``False`` 
     35when empty. 
    4036 
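A sketch of the reference semantics described above (editorial illustration, 
not part of the changeset):: 
 
    import Orange 
 
    data = Orange.data.Table("lenses") 
    instance = data[0]             # a reference, not a copy 
    instance.set_class("none")     # modifies the instance inside the table 
    subset = data[:3]              # a plain Python list of references 
    print type(subset)             # <type 'list'> 
    print bool(Orange.data.Table(data.domain))  # an empty table is False 
 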
    4137.. class:: Table 
     
    4844    .. attribute:: owns_instances 
    4945 
    50         True, if the table contains the data instances, False if it 
    51         contains just references to instances owned by another table. 
     46        ``True`` if the table contains the data instances and ``False`` if 
     47        it contains references to instances owned by another table. 
    5248 
    5349    .. attribute:: owner 
    5450 
    55         If the table does not own the data instances, this attribute 
    56         gives the actual owner. 
      51        The actual owner of the data when ``owns_instances`` is ``False``. 
    5752 
    5853    .. attribute:: version 
    5954 
    60         An integer that is increased whenever the table is 
    61         changed. This is not foolproof, since the object cannot 
    62         detect when individual instances are changed. It will, however, 
    63         catch any additions and removals from the table. 
     55        An integer that is increased when instances are added or 
     56        removed from the table. It does not detect changes of the data. 
    6457 
    6558    .. attribute:: random_generator 
     
    6760       Random generator that is used by method 
    6861       :obj:`random_instance`. If the method is called and 
    69        random_generator is None, a new generator is constructed with 
    70        random seed 0, and stored here for subsequent use. 
     62       ``random_generator`` is ``None``, a new generator is constructed 
     63       with random seed 0 and stored here for future use. 
    7164 
    7265    .. attribute:: attribute_load_status 
     
    7467       If the table was loaded from a file, this list of flags tells 
    7568       whether the feature descriptors were reused and how they 
    76        matched. See :ref:`descriptor reuse <variable_descriptor_reuse>` for details. 
     69       matched. See :ref:`descriptor reuse <variable_descriptor_reuse>` 
     70       for details. 
    7771 
    7872    .. attribute:: meta_attribute_load_status 
    7973 
    80        Same as above, except that this is a dictionary for meta 
    81        attributes, with keys corresponding to their ids. 
     74       A dictionary holding this same information for meta 
     75       attributes, with keys corresponding to their ids and values to 
     76       load statuses. 
    8277 
    8378    .. method:: __init__(filename[, create_new_on]) 
     
    9085        specified in the environment variable `ORANGE_DATA_PATH`. 
    9186 
    92         The optional flag `create_new_on` decides when variable 
     87        The optional flag ``create_new_on`` decides when variable 
    9388        descriptors are reused. See :ref:`descriptor reuse 
    9489        <variable_descriptor_reuse>` for more details. 
  • docs/reference/rst/Orange.distance.rst

    r9819 r9821  
    55########################################## 
    66 
     7The following example demonstrates how to compute distances between two instances: 
     8 
     9.. literalinclude:: code/distance-simple.py 
     10    :lines: 1-7 
     11 
     12A matrix with all pairwise distances can be computed with :obj:`distance_matrix`: 
     13 
     14.. literalinclude:: code/distance-simple.py 
     15    :lines: 9-11 
     16 
     17Unknown values are treated correctly only by Euclidean and Relief 
     18distance.  For other measures, a distance between unknown and known or 
     19between two unknown values is always 0.5. 
     20 
     21=================== 
     22Computing distances  
     23=================== 
     24 
    725Distance measures typically have to be adjusted to the data. For instance, 
    826when the data set contains continuous features, the distances between 
     
    1028similar impacts, e.g. by dividing the distance by the range. 
    1129 
    12 Distance measures thus appear in pairs - a class that measures 
    13 the distance (:obj:`Distance`) and a class that constructs it based on the 
    14 data (:obj:`DistanceConstructor`). 
      30Distance measures thus appear in pairs, as the example below shows: 
    1531 
    16 Since most measures work on normalized distances between corresponding 
    17 features, an abstract class `DistanceNormalized` takes care of 
    18 normalizing. 
    19  
    20 Unknown values are treated correctly only by Euclidean and Relief 
    21 distance.  For other measures, a distance between unknown and known or 
    22 between two unknown values is always 0.5. 
    23  
    24 .. autofunction:: distance_matrix 
    25  
    26 .. class:: Distance 
    27  
    28     .. method:: __call__(instance1, instance2) 
    29  
    30         Return a distance between the given instances (as a floating point number). 
      32- a class that constructs the distance measure based on the 
      33  data (a subclass of :obj:`DistanceConstructor`, for 
      34  example :obj:`Euclidean`), and returns it as 
      35- a class that measures the distance between two instances 
      36  (a subclass of :obj:`Distance`, for example :obj:`EuclideanDistance`). 
    3137 
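
For example, the Euclidean pair might be used as follows (a minimal
sketch)::

    import Orange

    data = Orange.data.Table("iris")
    constructor = Orange.distance.Euclidean()  # a DistanceConstructor
    distance = constructor(data)               # returns an EuclideanDistance
    print distance(data[0], data[1])           # distance between two instances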
    3238.. class:: DistanceConstructor 
     
    3844        not given, instances or distributions can be used. 
    3945 
    40 .. class:: DistanceNormalized 
     46.. class:: Distance 
    4147 
    42     An abstract class that provides normalization. 
     48    .. method:: __call__(instance1, instance2) 
    4349 
    44     .. attribute:: normalizers 
     50        Return a distance between the given instances (as a floating point number). 
    4551 
    46         A precomputed list of normalizing factors for feature values. They are: 
     52Pairwise distances 
     53================== 
    4754 
    48         - 1/(max_value-min_value) for continuous and 1/number_of_values 
    49           for ordinal features. 
    50           If either feature is unknown, the distance is 0.5. Such factors 
    51           are used to multiply differences in feature's values. 
    52         - ``-1`` for nominal features; the distance 
    53           between two values is 0 if they are same (or at least one is 
    54           unknown) and 1 if they are different. 
    55         - ``0`` for ignored features. 
     55.. autofunction:: distance_matrix 
    5656 
    57     .. attribute:: bases, averages, variances 
     57========= 
     58Measures 
     59========= 
    5860 
    59         The minimal values, averages and variances 
    60         (continuous features only). 
    61  
    62     .. attribute:: domain_version 
    63  
    64         The domain version changes each time a domain description is 
    65         changed (i.e. features are added or removed). 
    66  
    67     .. method:: feature_distances(instance1, instance2) 
    68  
    69         Return a list of floats representing normalized distances between 
    70         pairs of feature values of the two instances. 
      61Distance measures are defined with two classes: a subclass of :obj:`DistanceConstructor` 
     62and a subclass of :obj:`Distance`. 
    7163 
    7264.. class:: Hamming 
     
    8173    The maximal distance 
    8274    between two feature values. If dist is the result of 
    83     ~:obj:`DistanceNormalized.feature_distances`, 
     75    :obj:`~DistanceNormalized.feature_distances`, 
    8476    then :class:`Maximal` returns ``max(dist)``. 
    8577 
     
    8981    The sum of absolute values 
    9082    of distances between pairs of features, e.g. ``sum(abs(x) for x in dist)`` 
    91     where dist is the result of ~:obj:`DistanceNormalized.feature_distances`. 
     83    where dist is the result of :obj:`~DistanceNormalized.feature_distances`. 
    9284 
    9385.. class:: Euclidean 
     
    9688    The square root of sum of squared per-feature distances, 
    9789    i.e. ``sqrt(sum(x*x for x in dist))``, where dist is the result of 
    98     ~:obj:`DistanceNormalized.feature_distances`. 
     90    :obj:`~DistanceNormalized.feature_distances`. 
    9991 
    10092    .. method:: distributions 
     
    137129    This class is derived directly from :obj:`Distance`. 
    138130 
    139  
    140131.. autoclass:: PearsonR 
    141132    :members: 
     
    150141    :members: 
    151142 
     143.. autoclass:: Mahalanobis 
     144    :members: 
    152145 
     146.. autoclass:: MahalanobisDistance 
     147    :members: 
     148 
     149========= 
     150Utilities 
     151========= 
     152 
     153.. class:: DistanceNormalized 
     154 
     155    An abstract class that provides normalization. 
     156 
     157    .. attribute:: normalizers 
     158 
     159        A precomputed list of normalizing factors for feature values. They are: 
     160 
     161        - 1/(max_value-min_value) for continuous and 1/number_of_values 
     162          for ordinal features. 
     163          If either feature is unknown, the distance is 0.5. Such factors 
     164          are used to multiply differences in feature's values. 
     165        - ``-1`` for nominal features; the distance 
      166          between two values is 0 if they are the same (or at least one is 
     167          unknown) and 1 if they are different. 
     168        - ``0`` for ignored features. 
     169 
     170    .. attribute:: bases, averages, variances 
     171 
     172        The minimal values, averages and variances 
     173        (continuous features only). 
     174 
     175    .. attribute:: domain_version 
     176 
     177        The domain version changes each time a domain description is 
     178        changed (i.e. features are added or removed). 
     179 
     180    .. method:: feature_distances(instance1, instance2) 
     181 
     182        Return a list of floats representing normalized distances between 
     183        pairs of feature values of the two instances. 
     184 
     185 
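For instance (a sketch; :obj:`Euclidean` constructs a normalized measure, so
the object it returns supports this method)::

    import Orange

    data = Orange.data.Table("iris")
    measure = Orange.distance.Euclidean(data)          # an EuclideanDistance
    print measure.feature_distances(data[0], data[1])  # per-feature normalized distances
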
  • docs/reference/rst/Orange.feature.discretization.rst

    r9372 r9812  
    1 .. automodule:: Orange.feature.discretization 
     1.. py:currentmodule:: Orange.feature.discretization 
     2 
     3################################### 
     4Discretization (``discretization``) 
     5################################### 
     6 
     7.. index:: discretization 
     8 
     9.. index:: 
     10   single: feature; discretization 
     11 
      12Continuous features can be discretized either one feature at a time or, as demonstrated in the following script, 
      13by using a single discretization method on the entire set of data features: 
     14 
     15.. literalinclude:: code/discretization-table.py 
     16 
      17Discretization introduces new categorical features and computes their values in accordance with the 
      18selected (or default) discretization method:: 
     19 
     20    Original data set: 
     21    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'] 
     22    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'] 
     23    [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'] 
     24 
     25    Discretized data set: 
     26    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa'] 
     27    ['<=5.45', '(2.85, 3.15]', '<=2.45', '<=0.80', 'Iris-setosa'] 
     28    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa'] 
     29 
     30The following discretization methods are supported: 
     31 
      32* equal width discretization, where the domain of a continuous feature is split into 
      33  intervals of equal width (:class:`EqualWidth`), 
      34* equal frequency discretization, where each interval contains an equal number of data instances (:class:`EqualFreq`), 
      35* entropy-based discretization, as originally proposed by [FayyadIrani1993]_, which infers the intervals that minimize 
      36  the within-interval entropy of class distributions (:class:`Entropy`), 
      37* bi-modal discretization, which uses three intervals and optimizes the difference between the class distribution in 
      38  the middle interval and the distribution outside it (:class:`BiModal`), 
     39* fixed, with the user-defined cut-off points. 
     40 
     41The above script used the default discretization method (equal frequency with three intervals). This can be changed 
     42as demonstrated below: 
     43 
     44.. literalinclude:: code/discretization-table-method.py 
     45    :lines: 3-5 
     46 
      47With the exception of fixed discretization, discretization approaches infer the cut-off points from the 
      48training data and construct a discretizer that converts continuous values of the feature into categorical 
      49values according to the inferred rule. In this respect discretization behaves similarly to 
      50:class:`Orange.classification.Learner`, as sketched below. 
     51 
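The analogy can be sketched as follows: calling a discretization method with
a feature and data returns a discretized variable whose ``get_value_from``
converts values of further instances, much like a learner returns a
classifier (a sketch, following the examples later in this section)::

    import Orange

    data = Orange.data.Table("iris")
    # "learning": infer the cut-offs from the training data
    disc = Orange.feature.discretization.Entropy()
    sep_w = disc("sepal width", data)
    # "classification": apply the inferred rule to an instance
    print sep_w.get_value_from(data[0])
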
     52Utility functions 
     53================= 
     54 
      55This section describes functions and classes for 
      56categorization of continuous features: several general classes that 
      57help in this task, a function for 
      58entropy-based discretization (Fayyad & Irani), and a wrapper around 
      59categorization classes that can be used for learning. 
     60 
     61.. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class 
     62 
     63.. autoclass:: DiscretizeTable 
     64 
     65.. rubric:: Example 
     66 
     67FIXME. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in Orange 
     68for Beginners tutorial shows the use of DiscretizedLearner. Other 
     69discretization classes from core Orange are listed in chapter on 
     70`categorization <../ofb/o_categorization.htm>`_ of the same tutorial. 
     71 
     72Discretization Algorithms 
     73========================= 
     74 
      75All discretization classes are derived from :class:`Discretization`. 
     76 
     77.. class:: Discretization 
     78 
     79    .. method:: __call__(feature, data[, weightID]) 
     80 
      81        Given a continuous ``feature``, ``data`` and, optionally, the id of 
      82        a meta attribute with example weights, return a discretized 
      83        feature. The argument ``feature`` can be a descriptor, an index or 
      84        the name of the attribute. 
     85 
     86 
     87.. class:: EqualWidth 
     88 
      89    Discretizes the feature by splitting its domain into a fixed number 
      90    of equal-width intervals. The span of the original domain is computed 
      91    from the training data and is defined by the smallest and the 
      92    largest feature value. 
     93 
     94    .. attribute:: n 
     95 
     96        Number of discretization intervals (default: 4). 
     97 
      98The following example discretizes the features of the Iris dataset using six 
      99intervals. The script constructs an :class:`Orange.data.Table` with discretized 
      100features and outputs their descriptions: 
     101 
     102.. literalinclude:: code/discretization.py 
     103    :lines: 38-43 
     104 
     105The output of this script is:: 
     106 
     107    D_sepal length: <<4.90, [4.90, 5.50), [5.50, 6.10), [6.10, 6.70), [6.70, 7.30), >7.30> 
     108    D_sepal width: <<2.40, [2.40, 2.80), [2.80, 3.20), [3.20, 3.60), [3.60, 4.00), >4.00> 
     109    D_petal length: <<1.98, [1.98, 2.96), [2.96, 3.94), [3.94, 4.92), [4.92, 5.90), >5.90> 
     110    D_petal width: <<0.50, [0.50, 0.90), [0.90, 1.30), [1.30, 1.70), [1.70, 2.10), >2.10> 
     111 
     112The cut-off values are hidden in the discretizer and stored in ``attr.get_value_from.transformer``:: 
     113 
     114    >>> for attr in newattrs: 
     115    ...    print "%s: first interval at %5.3f, step %5.3f" % \ 
     116    ...    (attr.name, attr.get_value_from.transformer.first_cut, \ 
     117    ...    attr.get_value_from.transformer.step) 
     118    D_sepal length: first interval at 4.900, step 0.600 
     119    D_sepal width: first interval at 2.400, step 0.400 
     120    D_petal length: first interval at 1.980, step 0.980 
     121    D_petal width: first interval at 0.500, step 0.400 
     122 
     123All discretizers have the method 
     124``construct_variable``: 
     125 
     126.. literalinclude:: code/discretization.py 
     127    :lines: 69-73 
     128 
     129 
     130.. class:: EqualFreq 
     131 
      132    Infers the cut-off points so that the discretization intervals contain 
      133    approximately equal numbers of training data instances. 
     134 
     135    .. attribute:: n 
     136 
     137        Number of discretization intervals (default: 4). 
     138 
      139The resulting discretizer is of class :class:`IntervalDiscretizer`. Its ``transformer`` includes ``points`` 
      140that store the inferred cut-offs, which can be inspected as shown below. 
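
For example (a sketch following the same access pattern as the examples in
this section)::

    import Orange

    data = Orange.data.Table("iris")
    disc = Orange.feature.discretization.EqualFreq(n=3)
    d_len = disc("sepal length", data)
    print d_len.get_value_from.transformer.points  # the inferred cut-offs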
     141 
     142.. class:: Entropy 
     143 
      144    Entropy-based discretization as originally proposed by [FayyadIrani1993]_. The approach infers the most 
      145    appropriate number of intervals by recursively splitting the domain of the continuous feature so as to minimize the 
      146    class entropy of training examples. The splitting is repeated until the entropy decrease is smaller than the 
      147    increase of the minimal description length (MDL) induced by the new cut-off point. 
     148 
      149    Entropy-based discretization can reduce a continuous feature into 
      150    a single interval if no suitable cut-off points are found. In this case the new feature is constant and can be 
      151    removed. This discretization can 
      152    therefore also serve to identify non-informative features and can thus be used for feature subset selection. 
     153 
     154    .. attribute:: force_attribute 
     155 
     156        Forces the algorithm to induce at least one cut-off point, even when 
     157        its information gain is lower than MDL (default: ``False``). 
     158 
     159Part of :download:`discretization.py <code/discretization.py>`: 
     160 
     161.. literalinclude:: code/discretization.py 
     162    :lines: 77-80 
     163 
      164The output shows that all attributes are discretized into three intervals:: 
     165 
     166    sepal length: <5.5, 6.09999990463> 
     167    sepal width: <2.90000009537, 3.29999995232> 
     168    petal length: <1.89999997616, 4.69999980927> 
     169    petal width: <0.600000023842, 1.0000004768> 
     170 
     171.. class:: BiModal 
     172 
      173    Infers two cut-off points that maximize the difference between the class distribution of data instances in the 
      174    middle interval and the distribution in the other two intervals. The 
      175    difference is scored by the chi-square statistic. All possible pairs of cut-off 
      176    points are examined, so the discretization runs in O(n^2). This discretization method is especially suitable 
      177    for attributes in 
      178    which the middle region corresponds to normal and the outer regions to 
      179    abnormal values of the feature. 
     180 
     181    .. attribute:: split_in_two 
     182 
      183        Decides whether the resulting attribute should have three or two values. 
      184        If ``True`` (default), the feature is discretized into three intervals and the 
      185        discretizer is of type :class:`BiModalDiscretizer`. If ``False``, the result is 
      186        the ordinary :class:`IntervalDiscretizer`. 
     187 
      188The Iris dataset has a three-valued class attribute. The figure below, drawn using LOESS probability estimation, shows that 
      189the sepal lengths of versicolors lie between the lengths of setosas and virginicas. 
     190 
     191.. image:: files/bayes-iris.gif 
     192 
      193If we merge the classes setosa and virginica, we can observe whether 
      194bi-modal discretization correctly recognizes the interval in 
      195which versicolors dominate. The following script performs the merging and constructs a new data set with a class 
      196that reports whether an iris is versicolor or not. 
     197 
     198.. literalinclude:: code/discretization.py 
     199    :lines: 84-87 
     200 
     201The following script implements the discretization: 
     202 
     203.. literalinclude:: code/discretization.py 
     204    :lines: 97-100 
     205 
     206The middle intervals are printed:: 
     207 
     208    sepal length: (5.400, 6.200] 
     209    sepal width: (2.000, 2.900] 
     210    petal length: (1.900, 4.700] 
     211    petal width: (0.600, 1.600] 
     212 
     213Judging by the graph, the cut-off points inferred by discretization for "sepal length" make sense. 
     214 
     215Discretizers 
     216============ 
     217 
      218Discretizers construct a categorical feature from a continuous feature according to the method they implement and 
      219its parameters. The most general is 
      220:class:`IntervalDiscretizer`, which is also used by most discretization 
      221methods. Two other discretizers, :class:`EquiDistDiscretizer` and 
      222:class:`ThresholdDiscretizer`, could easily be replaced by 
      223:class:`IntervalDiscretizer` but are used for speed and simplicity. 
      224The fourth discretizer, :class:`BiModalDiscretizer`, is specialized 
      225for discretizations induced by :class:`BiModalDiscretization`. 
     226 
     227.. class:: Discretizer 
     228 
     229    A superclass implementing the construction of a new 
     230    attribute from an existing one. 
     231 
     232    .. method:: construct_variable(feature) 
     233 
     234        Constructs a descriptor for a new feature. The new feature's name is equal to ``feature.name`` 
     235        prefixed by "D\_". Its symbolic values are discretizer specific. 
     236 
     237.. class:: IntervalDiscretizer 
     238 
     239    Discretizer defined with a set of cut-off points. 
     240 
     241    .. attribute:: points 
     242 
     243        The cut-off points; feature values below or equal to the first point will be mapped to the first interval, 
     244        those between the first and the second point 
     245        (including those equal to the second) are mapped to the second interval and 
     246        so forth to the last interval which covers all values greater than 
     247        the last value in ``points``. The number of intervals is thus 
     248        ``len(points)+1``. 
     249 
      250The following script is an example of the manual construction of a discretizer with cut-off points 
      251at 3.0 and 5.0: 
     252 
     253.. literalinclude:: code/discretization.py 
     254    :lines: 22-26 
     255 
      256The first five data instances of ``data2`` are:: 
     257 
     258    [5.1, '>5.00', 'Iris-setosa'] 
     259    [4.9, '(3.00, 5.00]', 'Iris-setosa'] 
     260    [4.7, '(3.00, 5.00]', 'Iris-setosa'] 
     261    [4.6, '(3.00, 5.00]', 'Iris-setosa'] 
     262    [5.0, '(3.00, 5.00]', 'Iris-setosa'] 
     263 
      264The same discretizer can be used on several features by calling the function ``construct_var``: 
     265 
     266.. literalinclude:: code/discretization.py 
     267    :lines: 30-34 
     268 
      269Each feature has its own instance of :class:`ClassifierFromVar` stored in 
      270``get_value_from``, but all use the same :class:`IntervalDiscretizer`, 
      271``idisc``. Changing any element of its ``points`` affects all attributes. 
     272 
     273.. note:: 
     274 
     275    The length of :obj:`~IntervalDiscretizer.points` should not be changed if the 
     276    discretizer is used by any attribute. The length of 
     277    :obj:`~IntervalDiscretizer.points` should always match the number of values 
     278    of the feature, which is determined by the length of the attribute's field 
      279    ``values``. If ``attr`` is a discretized attribute, then ``len(attr.values)`` must equal 
     280    ``len(attr.get_value_from.transformer.points)+1``. 
     281 
     282 
     283.. class:: EqualWidthDiscretizer 
     284 
      285    Discretizes to intervals of fixed width. All values lower than :obj:`~EquiDistDiscretizer.first_cut` are mapped to the first 
      286    interval; otherwise, the interval of a value ``val`` is determined by ``floor((val-first_cut)/step)``. Values beyond the last 
      287    cut-off are mapped to the last interval. A pure-Python sketch of this rule follows the class description. 
     288 
     289 
     290    .. attribute:: first_cut 
     291 
     292        The first cut-off point. 
     293 
     294    .. attribute:: step 
     295 
     296        Width of the intervals. 
     297 
     298    .. attribute:: n 
     299 
     300        Number of the intervals. 
     301 
     302    .. attribute:: points (read-only) 
     303 
      304        The cut-off points; this is not a real attribute, although it behaves 
      305        as one. Reading it constructs a list of cut-off points and returns it, 
      306        but changing the list doesn't affect the discretizer. It is only present to give 
      307        :obj:`EquiDistDiscretizer` the same interface as 
      308        :obj:`IntervalDiscretizer`. 
     309 
     310 
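To make the mapping concrete, here is a pure-Python sketch of the rule (not
the Orange implementation; it reads the formula together with the interval
boundaries printed earlier, so values below ``first_cut`` fall into the
first interval and overflows are clipped to the last)::

    import math

    def interval_index(val, first_cut=4.9, step=0.6, n=6):
        # values below first_cut go to the first interval
        if val < first_cut:
            return 0
        # floor((val - first_cut) / step) selects among the remaining
        # intervals; overflows are clipped to the last interval
        return min(int(math.floor((val - first_cut) / step)) + 1, n - 1)

    print interval_index(4.5)   # 0, i.e. '<4.90'
    print interval_index(5.0)   # 1, i.e. '[4.90, 5.50)'
    print interval_index(9.9)   # 5, i.e. '>7.30'
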
     311.. class:: ThresholdDiscretizer 
     312 
      313    A threshold discretizer converts continuous values into binary ones by comparing 
      314    them to a fixed threshold. Orange uses this discretizer for 
      315    binarization of continuous attributes in decision trees. A construction sketch follows the class description. 
     316 
     317    .. attribute:: threshold 
     318 
     319        The value threshold; values below or equal to the threshold belong to the first 
     320        interval and those that are greater go to the second. 
     321 
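A sketch of manual construction, mirroring the :class:`IntervalDiscretizer`
example above; the printed value names are illustrative::

    import Orange

    data = Orange.data.Table("iris")
    tdisc = Orange.feature.discretization.ThresholdDiscretizer(threshold=3.0)
    sep_w2 = tdisc.construct_variable(data.domain["sepal width"])
    data2 = data.select([data.domain["sepal width"], sep_w2, data.domain.class_var])
    print data2[0]  # e.g. [3.5, '>3.00', 'Iris-setosa'] (illustrative)
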
     322 
     323.. class:: BiModalDiscretizer 
     324 
      325    A bimodal discretizer has two cut-off points; values are 
      326    discretized according to whether or not they belong to the region between these points, 
      327    which includes the lower but not the upper boundary. The 
      328    discretizer is returned by :class:`BiModalDiscretization` if its 
      329    field :obj:`~BiModalDiscretization.split_in_two` is true (the default); a sketch of the mapping follows the class description. 
     330 
     331    .. attribute:: low 
     332 
     333        Lower boundary of the interval (included in the interval). 
     334 
     335    .. attribute:: high 
     336 
     337        Upper boundary of the interval (not included in the interval). 
     338 
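The mapping can be sketched in pure Python (illustrative, not the Orange
implementation)::

    def bimodal_value(val, low, high):
        # the middle region includes the lower but not the upper boundary
        return "middle" if low <= val < high else "outer"

    print bimodal_value(3.0, low=2.9, high=3.3)  # 'middle'
    print bimodal_value(3.3, low=2.9, high=3.3)  # 'outer' (upper bound excluded)
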
     339 
     340Implementational details 
     341======================== 
     342 
      343Consider the following example (part of :download:`discretization.py <code/discretization.py>`): 
     344 
     345.. literalinclude:: code/discretization.py 
     346    :lines: 7-15 
     347 
      348The discretized attribute ``sep_w`` is constructed with a call to 
      349:class:`Entropy`; instead of constructing it and calling 
      350it afterwards, we passed the calling arguments to the constructor. We then constructed a new 
      351:class:`Orange.data.Table` with the attributes "sepal width" (the original 
      352continuous attribute), ``sep_w`` and the class attribute:: 
     353 
     354    Entropy discretization, first 5 data instances 
     355    [3.5, '>3.30', 'Iris-setosa'] 
     356    [3.0, '(2.90, 3.30]', 'Iris-setosa'] 
     357    [3.2, '(2.90, 3.30]', 'Iris-setosa'] 
     358    [3.1, '(2.90, 3.30]', 'Iris-setosa'] 
     359    [3.6, '>3.30', 'Iris-setosa'] 
     360 
      361The name of the new categorical variable is derived from the name of the original continuous variable by adding the prefix 
      362"D_". The values of the new attribute are computed automatically when needed, using the transformation function 
      363:obj:`~Orange.data.variable.Variable.get_value_from` (see :class:`Orange.data.variable.Variable`), which encodes the 
      364discretization:: 
     365 
     366    >>> sep_w 
     367    EnumVariable 'D_sepal width' 
     368    >>> sep_w.get_value_from 
     369    <ClassifierFromVar instance at 0x01BA7DC0> 
     370    >>> sep_w.get_value_from.whichVar 
     371    FloatVariable 'sepal width' 
     372    >>> sep_w.get_value_from.transformer 
     373    <IntervalDiscretizer instance at 0x01BA2100> 
     374    >>> sep_w.get_value_from.transformer.points 
     375    <2.90000009537, 3.29999995232> 
     376 
      377The ``select`` statement in the discretization script converts all data instances 
      378from ``data`` to the new domain. This domain includes a new feature 
      379``sep_w`` whose values are computed on the fly by calling ``sep_w.get_value_from`` for each data instance. 
      380The original, continuous sepal width 
      381is passed to the ``transformer``, which determines the interval from its field 
      382``points``. The transformer returns the discrete value, which is in turn returned 
      383by ``get_value_from`` and stored in the new example. 
     384 
     385References 
     386========== 
     387 
     388.. [FayyadIrani1993] UM Fayyad and KB Irani. Multi-interval discretization of continuous valued 
     389  attributes for classification learning. In Proc. 13th International Joint Conference on Artificial Intelligence, pages 
     390  1022--1029, Chambery, France, 1993.