Index: Orange/evaluation/scoring.py
===================================================================
--- Orange/evaluation/scoring.py (revision 9725)
+++ Orange/evaluation/scoring.py (revision 9892)
@@ -1,452 +1,2 @@
"""
############################
Method scoring (``scoring``)
############################

.. index:: scoring

This module contains various measures of quality for classification and
regression. Most functions require an argument named :obj:`res`, an instance of
:class:`Orange.evaluation.testing.ExperimentResults` as computed by
functions from :mod:`Orange.evaluation.testing` and which contains
predictions obtained through cross-validation,
leave-one-out, testing on training data or on a separate test set.

==============
Classification
==============

To prepare some data for examples on this page, we shall load the voting data
set (the problem of predicting a congressman's party (republican, democrat)
based on a selection of votes) and evaluate the naive Bayesian learner,
classification trees and the majority classifier using cross-validation.
For examples requiring a multi-valued class problem, we shall do the same
with the vehicle data set (telling whether a vehicle described by the features
extracted from a picture is a van, a bus, or an Opel or Saab car).

A basic cross-validation example is shown in the following part of
:download:`statExamples.py ` (uses :download:`voting.tab ` and :download:`vehicle.tab `):

.. literalinclude:: code/statExample0.py

If instances are weighted, weights are taken into account. This can be
disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
disabling weights is to clear the
:class:`Orange.evaluation.testing.ExperimentResults`' :obj:`weights` flag.

General Measures of Quality
===========================

.. autofunction:: CA

.. autofunction:: AP

.. autofunction:: Brier_score

.. autofunction:: IS

So, let's compute all of this in part of
:download:`statExamples.py ` (uses :download:`voting.tab ` and :download:`vehicle.tab `) and print it out:

.. literalinclude:: code/statExample1.py
 :lines: 1-3

The output should look like this::

 method CA AP Brier IS
 bayes 0.903 0.902 0.175 0.759
 tree 0.846 0.845 0.286 0.641
 majrty 0.614 0.526 0.474 0.000

Script :download:`statExamples.py ` contains another example that also prints out
the standard errors.
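To make the definitions above concrete, here is a minimal, library-independent sketch (not part of Orange; the helper names and toy data are invented for illustration) of how classification accuracy and the Brier score can be computed from actual classes and predicted probability distributions:

```python
def classification_accuracy(actual, predicted):
    # fraction of examples whose predicted class matches the actual class
    return sum(a == p for a, p in zip(actual, predicted)) / float(len(actual))

def brier_score(actual, probabilities):
    # average over examples of sum_x (t(x) - p(x))^2, where t(x) is 1
    # for the correct class and 0 otherwise
    total = 0.0
    for cls, probs in zip(actual, probabilities):
        total += sum((1.0 - p if x == cls else p) ** 2
                     for x, p in enumerate(probs))
    return total / len(actual)

actual = [0, 1, 1, 0]
predicted = [0, 1, 0, 0]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
acc = classification_accuracy(actual, predicted)  # 0.75
bs = brier_score(actual, probs)                   # 0.25
```

Orange's functions additionally handle instance weights and per-fold averaging, which this sketch omits.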

Confusion Matrix
================

.. autofunction:: confusion_matrices

 **A positive-negative confusion matrix** is computed (a) if the class is
 binary unless the :obj:`classIndex` argument is -2, or (b) if the class is
 multi-valued and :obj:`classIndex` is non-negative. Argument
 :obj:`classIndex` then tells which class is positive. In case (a),
 :obj:`classIndex` may be omitted; the first class
 is then negative and the second is positive, unless the :obj:`baseClass`
 attribute in the object with results has a non-negative value. In that case,
 :obj:`baseClass` is an index of the target class. The :obj:`baseClass`
 attribute of the results object should be set manually. The result of the
 function is a list of instances of class :class:`ConfusionMatrix`,
 containing the (weighted) number of true positives (TP), false
 negatives (FN), false positives (FP) and true negatives (TN).

 We can also add the keyword argument :obj:`cutoff`
 (e.g. confusion_matrices(results, cutoff=0.3)); if we do, :obj:`confusion_matrices`
 will disregard the classifiers' class predictions and observe the predicted
 probabilities, considering the prediction "positive" if the predicted
 probability of the positive class is higher than the :obj:`cutoff`.

 The example (part of :download:`statExamples.py `) below shows how setting the
 cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices
 for the naive Bayesian classifier::

 cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
 print "Confusion matrix for naive Bayes:"
 print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

 cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
 print "Confusion matrix for naive Bayes:"
 print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

 The output::

 Confusion matrix for naive Bayes:
 TP: 238, FP: 13, FN: 29.0, TN: 155
 Confusion matrix for naive Bayes:
 TP: 239, FP: 18, FN: 28.0, TN: 150

 shows that the number of true positives increases (and hence the number of
 false negatives decreases) by only a single instance, while five instances
 that were originally true negatives become false positives due to the
 lower threshold.
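 The effect of the cutoff can be sketched without Orange. Assuming we have the actual classes and the predicted probabilities of the positive class (the toy data below is invented for illustration):

```python
def confusion_counts(actual, pos_probs, cutoff=0.5):
    # count TP, FP, FN, TN, calling an example "positive" when the
    # predicted probability of the positive class exceeds the cutoff
    tp = fp = fn = tn = 0
    for is_pos, p in zip(actual, pos_probs):
        pred_pos = p > cutoff
        if is_pos and pred_pos:
            tp += 1
        elif is_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn

actual = [True, True, False, False, True]
pos_probs = [0.9, 0.4, 0.3, 0.1, 0.6]
# lowering the cutoff turns borderline predictions positive
default = confusion_counts(actual, pos_probs)              # (2, 0, 1, 2)
lowered = confusion_counts(actual, pos_probs, cutoff=0.2)  # (3, 1, 0, 1)
```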

 To observe how good the classifiers are at detecting vans in the vehicle
 data set, we would compute the matrix like this::

 cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
vehicle.domain.classVar.values.index("van"))

 and get the results like these::

 TP: 189, FP: 241, FN: 10.0, TN: 406

 while the same for class "opel" would give::

 TP: 86, FP: 112, FN: 126.0, TN: 522

 The main difference is that there are only a few false negatives for the
 van, meaning that the classifier seldom misses it (if it says it's not a
 van, it's almost certainly not a van). Not so for the Opel car, where the
 classifier missed 126 of them and correctly detected only 86.

 **A general confusion matrix** is computed (a) in the case of a binary class,
 when :obj:`classIndex` is set to -2, or (b) when we have a multi-valued class
 and the caller doesn't specify the :obj:`classIndex` of the positive class.
 When called in this manner, the function cannot use the argument
 :obj:`cutoff`.

 The function then returns a three-dimensional matrix, where the element
 A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
 gives the number of instances belonging to 'actual_class' for which the
 'learner' predicted 'predictedClass'. We shall compute and print out
 the matrix for naive Bayesian classifier.

 Here we see another example from :download:`statExamples.py `::

 cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
 classes = vehicle.domain.classVar.values
 print "\t"+"\t".join(classes)
 for className, classConfusions in zip(classes, cm):
 print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))

 So, here's what this nice piece of code gives::

 bus van saab opel
 bus 56 95 21 46
 van 6 189 4 0
 saab 3 75 73 66
 opel 4 71 51 86

 Vans are clearly simple: 189 vans were classified as vans (we know this
 already, we've printed it out above), and the 10 misclassified pictures
 were classified as buses (6) and Saab cars (4). In all other classes,
 there were more instances misclassified as vans than correctly classified
 instances. The classifier is obviously quite biased towards vans.

 .. method:: sens(confm)
 .. method:: spec(confm)
 .. method:: PPV(confm)
 .. method:: NPV(confm)
 .. method:: precision(confm)
 .. method:: recall(confm)
 .. method:: F2(confm)
 .. method:: Falpha(confm, alpha=2.0)
 .. method:: MCC(conf)

 With the confusion matrix defined in terms of positive and negative
 classes, you can also compute the
 `sensitivity `_
 [TP/(TP+FN)], `specificity \
`_
 [TN/(TN+FP)], `positive predictive value \
`_
 [TP/(TP+FP)] and `negative predictive value \
`_ [TN/(TN+FN)].
 In information retrieval, positive predictive value is called precision
 (the ratio of the number of relevant records retrieved to the total number
 of irrelevant and relevant records retrieved), and sensitivity is called
 `recall `_
 (the ratio of the number of relevant records retrieved to the total number
 of relevant records in the database). The
 `harmonic mean `_ of precision
 and recall is called an
 `F-measure `_, which, depending
 on the weighting of precision versus recall, is implemented
 as F1 [2*precision*recall/(precision+recall)] or, for the general case,
 F-alpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
 The `Matthews correlation coefficient \
`_
 is in essence a correlation coefficient between
 the observed and predicted binary classifications; it returns a value
 between -1 and +1. A coefficient of +1 represents a perfect prediction,
 0 an average random prediction and -1 an inverse prediction.

 If the argument :obj:`confm` is a single confusion matrix, a single
 result (a number) is returned. If confm is a list of confusion matrices,
 a list of scores is returned, one for each confusion matrix.

 Note that weights are taken into account when computing the matrix, so
 these functions don't check the 'weighted' keyword argument.
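 The formulas above can be sketched directly from the four counts of a positive-negative confusion matrix (an illustration only, not Orange's implementation; the counts reused below are those printed earlier for naive Bayes):

```python
import math

def sens(tp, fp, fn, tn):  # sensitivity / recall: TP/(TP+FN)
    return tp / float(tp + fn)

def spec(tp, fp, fn, tn):  # specificity: TN/(TN+FP)
    return tn / float(tn + fp)

def ppv(tp, fp, fn, tn):   # positive predictive value / precision: TP/(TP+FP)
    return tp / float(tp + fp)

def f_alpha(tp, fp, fn, tn, alpha=1.0):
    # F-alpha as defined in the text; alpha=1 gives the usual F1
    p, r = ppv(tp, fp, fn, tn), sens(tp, fp, fn, tn)
    return (1 + alpha) * p * r / (alpha * p + r)

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient, in [-1, +1]
    d = math.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / d

tp, fp, fn, tn = 238, 13, 29, 155
sensitivity = sens(tp, fp, fn, tn)   # about 0.891
specificity = spec(tp, fp, fn, tn)   # about 0.923
```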

 Let us print out sensitivities and specificities of our classifiers in
 part of :download:`statExamples.py `::

 cm = Orange.evaluation.scoring.confusion_matrices(res)
 print
 print "method\tsens\tspec"
 for l in range(len(learners)):
 print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))

ROC Analysis
============

`Receiver Operating Characteristic \
`_
(ROC) analysis was initially developed for
binary-like problems, and there is no consensus on how to apply it to
multi-class problems, nor do we know for sure how to do ROC analysis after
cross-validation and similar multiple-sampling techniques. If you are
interested in the area under the curve, the function AUC will deal with those
problems as specifically described below.
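For the two-class case, AUC equals the probability that a randomly chosen positive instance receives a higher predicted probability than a randomly chosen negative one. A minimal sketch of that concordance interpretation (illustrative only, counting ties as one half; this is not how Orange computes it internally):

```python
def auc_binary(actual, pos_probs):
    # concordance index: P(score of a positive > score of a negative),
    # with ties counted as 1/2
    pos = [p for a, p in zip(actual, pos_probs) if a]
    neg = [p for a, p in zip(actual, pos_probs) if not a]
    total = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                total += 1.0
            elif pp == pn:
                total += 0.5
    return total / (len(pos) * len(neg))

actual = [True, True, False, False]
auc = auc_binary(actual, [0.9, 0.6, 0.6, 0.2])  # 0.875
```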

.. autofunction:: AUC

 .. attribute:: AUC.ByWeightedPairs (or 0)

 Computes AUC for each pair of classes (ignoring instances of all other
 classes) and averages the results, weighting them by the number of
 pairs of instances from these two classes (e.g. by the product of
 probabilities of the two classes). AUC computed in this way still
 behaves as a concordance index, i.e., it gives the probability that two
 randomly chosen instances from different classes will be correctly
 recognized (this is of course true only if the classifier knows
 from which two classes the instances came).

 .. attribute:: AUC.ByPairs (or 1)

 Similar to the above, except that the average over class pairs is not
 weighted. This AUC is, like the binary one, independent of class
 distributions, but it is no longer related to the concordance index.

 .. attribute:: AUC.WeightedOneAgainstAll (or 2)

 For each class, it computes the AUC for this class against all others (that
 is, treating the other classes as one class). The AUCs are then averaged,
 weighted by the class probabilities. This is related to a concordance
 index in which we test the classifier's (average) capability of
 distinguishing instances of a specified class from those of other classes.
 Unlike the binary AUC, this measure is not independent of class
 distributions.

 .. attribute:: AUC.OneAgainstAll (or 3)

 As above, except that the average is not weighted.
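 The difference between the weighted and unweighted pairwise averaging can be sketched as follows (illustrative only; the per-pair AUCs and class counts here are invented, and in Orange they would be computed from the results object):

```python
def average_pairwise_auc(pair_aucs, class_counts, weighted=True):
    # pair_aucs maps a pair of class indices to the AUC computed on the
    # instances of those two classes only; the weight of a pair is the
    # product of the instance counts of the two classes
    total_w = total = 0.0
    for (c1, c2), auc in pair_aucs.items():
        w = class_counts[c1] * class_counts[c2] if weighted else 1.0
        total += w * auc
        total_w += w
    return total / total_w

pair_aucs = {(0, 1): 0.9, (0, 2): 0.7, (1, 2): 0.8}
counts = [10, 30, 60]
weighted_avg = average_pairwise_auc(pair_aucs, counts)        # AUC.ByWeightedPairs
unweighted_avg = average_pairwise_auc(pair_aucs, counts,
                                      weighted=False)         # AUC.ByPairs, 0.8
```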

 In the case of multiple folds (for instance if the data comes from
 cross-validation), the computation goes like this. When computing the partial
 AUCs for individual pairs of classes or singled-out classes, the AUC is
 computed for each fold separately and then averaged (ignoring the number
 of instances in each fold; it's just a simple average). However, if a
 certain fold doesn't contain any instances of a certain class (from the
 pair), the partial AUC is computed treating the results as if they came
 from a single fold. This is not really correct, since the class
 probabilities from different folds are not necessarily comparable;
 yet, since this will most often occur in leave-one-out experiments,
 comparability shouldn't be a problem.

 Computing and printing out the AUCs looks just like printing out
 classification accuracies (except that we call AUC instead of
 CA, of course)::

 AUCs = Orange.evaluation.scoring.AUC(res)
 for l in range(len(learners)):
 print "%10s: %5.3f" % (learners[l].name, AUCs[l])

 For vehicle, you can run exactly the same code; it will compute AUCs
 for all pairs of classes and return the average weighted by the
 probabilities of pairs. Or, you can specify the averaging method yourself::

 AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)

 The following snippet tries out all four. (We don't claim that this is
 how the function needs to be used; it's better to stay with the default.)::

 methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
 print " " *25 + " \tbayes\ttree\tmajority"
 for i in range(4):
 AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
 print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))

 As you can see from the output::

 bayes tree majority
 by pairs, weighted: 0.789 0.871 0.500
 by pairs: 0.791 0.872 0.500
 one vs. all, weighted: 0.783 0.800 0.500
 one vs. all: 0.783 0.800 0.500

.. autofunction:: AUC_single

.. autofunction:: AUC_pair

.. autofunction:: AUC_matrix

The remaining functions, which plot the curves and statistically compare
them, require that the results come from a test with a single iteration,
and they always compare one chosen class against all others. If you have
cross-validation results, you can either use split_by_iterations to split the
results by folds, call the function for each fold separately and then sum
the results up however you see fit, or you can set the ExperimentResults'
attribute number_of_iterations to 1 to cheat the function, at your own
responsibility for the statistical correctness. Regarding multi-class
problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
attribute's baseValue at the time when the results were computed. If baseValue
was not given at that time, 1 (that is, the second class) is used as the default.

We shall use the following code to prepare suitable experimental results::

 ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
 train = voting.selectref(ri2, 0)
 test = voting.selectref(ri2, 1)
 res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)


.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two
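McNemar's test compares two classifiers on the same test instances, using only the examples on which they disagree. A sketch of the chi-square statistic with the usual continuity correction (an illustration of the idea, not Orange's implementation):

```python
def mcnemar_statistic(n01, n10):
    # n01: instances misclassified by the first classifier only,
    # n10: instances misclassified by the second classifier only;
    # the statistic is compared against chi-square with 1 degree of freedom
    if n01 + n10 == 0:
        return 0.0
    return (abs(n01 - n10) - 1.0) ** 2 / (n01 + n10)

stat = mcnemar_statistic(15, 5)  # 4.05, significant at alpha = 0.05
```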

==========
Regression
==========

General Measure of Quality
==========================

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py `) uses most of the above measures to
score several regression methods.

.. literalinclude:: code/statExamplesRegression.py

The code above produces the following output::

 Learner MSE RMSE MAE RSE RRSE RAE R2
 maj 84.585 9.197 6.653 1.002 1.001 1.001 0.002
 rt 40.015 6.326 4.592 0.474 0.688 0.691 0.526
 knn 21.248 4.610 2.870 0.252 0.502 0.432 0.748
 lr 24.092 4.908 3.425 0.285 0.534 0.515 0.715
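The relations among these measures follow directly from their definitions: the relative measures (RSE, RRSE, RAE) compare the model's errors against those of predicting the mean of the observed values, and R2 is one minus RSE. A library-independent sketch (`obs` and `pred` are invented toy data):

```python
import math

def regression_scores(obs, pred):
    n = len(obs)
    mean = sum(obs) / float(n)
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    # baseline: always predicting the mean of the observed values
    var = sum((o - mean) ** 2 for o in obs) / n
    dev = sum(abs(o - mean) for o in obs) / n
    rse = mse / var
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae,
            "RSE": rse, "RRSE": math.sqrt(rse), "RAE": mae / dev,
            "R2": 1.0 - rse}

scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
```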

==================
Plotting Functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py `) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman
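Critical differences of the kind plotted by graph_ranks are commonly computed with the Nemenyi test; a sketch under that assumption (the q value is an approximate tabulated constant, not computed by this code):

```python
import math

def critical_difference(num_methods, num_datasets, q_alpha):
    # Nemenyi critical difference: two methods whose average ranks differ
    # by more than this are considered significantly different
    k, n = num_methods, num_datasets
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# q_alpha for comparing 4 methods at alpha = 0.05 (tabulated, approximate)
cd = critical_difference(4, 14, q_alpha=2.569)
```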

=================
Utility Functions
=================

.. autofunction:: split_by_iterations

======================================
Scoring for multi-label classification
======================================

Multi-label classification requires different metrics than those used in traditional single-label
classification. This module presents the various metrics that have been proposed in the literature.
Let :math:`D` be a multi-label evaluation data set, consisting of :math:`D` multi-label examples
:math:`(x_i,Y_i)`, :math:`i=1..D`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier
and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

So, let's compute all this and print it out (part of
:download:`mlcevaluate.py `, uses
:download:`emotions.tab `):

.. literalinclude:: code/mlcevaluate.py
 :lines: 1-15

The output should look like this::

 loss= [0.9375]
 accuracy= [0.875]
 precision= [1.0]
 recall= [0.875]
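The set-based definitions behind these metrics can be sketched directly with Python sets (illustrative only; the toy labelings below are invented, and edge cases such as empty predictions are not handled):

```python
def mlc_scores(actual, predicted, all_labels):
    # actual, predicted: lists of label sets Y_i and Z_i; all_labels: L
    n, nl = len(actual), float(len(all_labels))
    # Hamming loss: fraction of label slots on which Y_i and Z_i differ
    hamming = sum(len(y ^ z) for y, z in zip(actual, predicted)) / (n * nl)
    # accuracy: Jaccard overlap |Y & Z| / |Y | Z|, averaged over examples
    accuracy = sum(len(y & z) / float(len(y | z))
                   for y, z in zip(actual, predicted)) / n
    precision = sum(len(y & z) / float(len(z))
                    for y, z in zip(actual, predicted)) / n
    recall = sum(len(y & z) / float(len(y))
                 for y, z in zip(actual, predicted)) / n
    return hamming, accuracy, precision, recall

labels = {"a", "b", "c", "d"}
actual = [{"a", "b"}, {"c"}]
predicted = [{"a"}, {"c", "d"}]
ham, acc, prec, rec = mlc_scores(actual, predicted, labels)
```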

References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-71.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', paper
presented at the Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-68.

"""

import operator, math
from operator import add
@@ 455,5 +5,5 @@
import Orange
from Orange import statc

+from Orange.misc import deprecated_keywords
#### Private stuff
@@ 533,14 +83,18 @@
def statistics_by_folds(stats, foldN, reportSE, iterationIsOuter):
+@deprecated_keywords({
+ "foldN": "fold_n",
+ "reportSE": "report_se",
+ "iterationIsOuter": "iteration_is_outer"})
+def statistics_by_folds(stats, fold_n, report_se, iteration_is_outer):
# remove empty folds, turn the matrix so that learner is outer
 if iterationIsOuter:
+ if iteration_is_outer:
if not stats:
raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
number_of_learners = len(stats[0])
 stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN))
+ stats = filter(lambda (x, fN): fN>0.0, zip(stats,fold_n))
stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)]
else:
 stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats]
+ stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, fold_n))] for lrnD in stats]
if not stats:
@@ 549,5 +103,5 @@
raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
 if reportSE:
+ if report_se:
return [(statc.mean(x), statc.sterr(x)) for x in stats]
else:
@@ 751,5 +305,6 @@
# Scores for evaluation of classifiers
def CA(res, reportSE = False, **argkw):
+@deprecated_keywords({"reportSE": "report_se"})
+def CA(res, report_se = False, **argkw):
""" Computes classification accuracy, i.e. percentage of matches between
predicted and actual class. The function returns a list of classification
@@ 793,5 +348,5 @@
ca = [x/totweight for x in CAs]
 if reportSE:
+ if report_se:
 return [(x, x*(1-x)/math.sqrt(totweight)) for x in ca]
else:
@@ 813,5 +368,5 @@
foldN[tex.iteration_number] += tex.weight
 return statistics_by_folds(CAsByFold, foldN, reportSE, False)
+ return statistics_by_folds(CAsByFold, foldN, report_se, False)
@@ 820,6 +375,6 @@
return CA(res, True, **argkw)

def AP(res, reportSE = False, **argkw):
+@deprecated_keywords({"reportSE": "report_se"})
+def AP(res, report_se = False, **argkw):
""" Computes the average probability assigned to the correct class. """
if res.number_of_iterations == 1:
@@ 848,8 +403,9 @@
foldN[tex.iteration_number] += tex.weight
 return statistics_by_folds(APsByFold, foldN, reportSE, True)


def Brier_score(res, reportSE = False, **argkw):
+ return statistics_by_folds(APsByFold, foldN, report_se, True)
+
+
+@deprecated_keywords({"reportSE": "report_se"})
+def Brier_score(res, report_se = False, **argkw):
""" Computes the Brier's score, defined as the average (over test examples)
of sumx(t(x)p(x))2, where x is a class, t(x) is 1 for the correct class
@@ 881,5 +437,5 @@
totweight = gettotweight(res)
check_non_zero(totweight)
 if reportSE:
+ if report_se:
return [(max(x/totweight+1.0, 0), 0) for x in MSEs] ## change this, not zero!!!
else:
@@ 900,6 +456,6 @@
foldN[tex.iteration_number] += tex.weight
 stats = statistics_by_folds(BSs, foldN, reportSE, True)
 if reportSE:
+ stats = statistics_by_folds(BSs, foldN, report_se, True)
+ if report_se:
return [(x+1.0, y) for x, y in stats]
else:
@@ 915,6 +471,8 @@
else:
 return -(-log2(1-P)+log2(1-Pc))

def IS(res, apriori=None, reportSE = False, **argkw):
+
+
+@deprecated_keywords({"reportSE": "report_se"})
+def IS(res, apriori=None, report_se = False, **argkw):
""" Computes the information score as defined by
`Kononenko and Bratko (1991) \
@@ 941,5 +499,5 @@
ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
totweight = gettotweight(res)
 if reportSE:
+ if report_se:
return [(IS/totweight,0) for IS in ISs]
else:
@@ 964,5 +522,5 @@
foldN[tex.iteration_number] += tex.weight
 return statistics_by_folds(ISs, foldN, reportSE, False)
+ return statistics_by_folds(ISs, foldN, report_se, False)
@@ 1026,5 +584,6 @@
def confusion_matrices(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def confusion_matrices(res, class_index=1, **argkw):
""" This function can compute two different forms of confusion matrix:
one in which a certain class is marked as positive and the other(s)
@@ 1035,7 +594,7 @@
tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)]
 if classIndex<0:
+ if class_index<0:
numberOfClasses = len(res.class_values)
 if classIndex < -1 or numberOfClasses > 2:
+ if class_index < -1 or numberOfClasses > 2:
cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)]
if argkw.get("unweighted", 0) or not res.weights:
@@ 1056,7 +615,7 @@
elif res.baseClass>=0:
 classIndex = res.baseClass
 else:
 classIndex = 1
+ class_index = res.baseClass
+ else:
+ class_index = 1
cutoff = argkw.get("cutoff")
@@ 1064,23 +623,23 @@
if argkw.get("unweighted", 0) or not res.weights:
for lr in res.results:
 isPositive=(lr.actual_class==classIndex)
+ isPositive=(lr.actual_class==class_index)
for i in range(res.number_of_learners):
 tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive)
+ tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive)
else:
for lr in res.results:
 isPositive=(lr.actual_class==classIndex)
+ isPositive=(lr.actual_class==class_index)
for i in range(res.number_of_learners):
 tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight)
+ tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive, lr.weight)
else:
if argkw.get("unweighted", 0) or not res.weights:
for lr in res.results:
 isPositive=(lr.actual_class==classIndex)
+ isPositive=(lr.actual_class==class_index)
for i in range(res.number_of_learners):
 tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive)
+ tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive)
else:
for lr in res.results:
 isPositive=(lr.actual_class==classIndex)
+ isPositive=(lr.actual_class==class_index)
for i in range(res.number_of_learners):
 tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight)
+ tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive, lr.weight)
return tfpns
@@ 1090,13 +649,14 @@
def confusion_chi_square(confusionMatrix):
 dim = len(confusionMatrix)
 rowPriors = [sum(r) for r in confusionMatrix]
 colPriors = [sum([r[i] for r in confusionMatrix]) for i in range(dim)]
+@deprecated_keywords({"confusionMatrix": "confusion_matrix"})
+def confusion_chi_square(confusion_matrix):
+ dim = len(confusion_matrix)
+ rowPriors = [sum(r) for r in confusion_matrix]
+ colPriors = [sum([r[i] for r in confusion_matrix]) for i in range(dim)]
total = sum(rowPriors)
rowPriors = [r/total for r in rowPriors]
colPriors = [r/total for r in colPriors]
ss = 0
 for ri, row in enumerate(confusionMatrix):
+ for ri, row in enumerate(confusion_matrix):
for ci, o in enumerate(row):
e = total * rowPriors[ri] * colPriors[ci]
@@ 1229,5 +789,7 @@
return r
def scotts_pi(confm, bIsListOfMatrices=True):
+
+@deprecated_keywords({"bIsListOfMatrices": "b_is_list_of_matrices"})
+def scotts_pi(confm, b_is_list_of_matrices=True):
"""Compute Scott's Pi for measuring interrater agreement for nominal data
@@ 1240,5 +802,5 @@
Orange.evaluation.scoring.compute_confusion_matrices and set the
classIndex parameter to 2.
 @param bIsListOfMatrices: specifies whether confm is list of matrices.
+ @param b_is_list_of_matrices: specifies whether confm is list of matrices.
This function needs to operate on nonbinary
confusion matrices, which are represented by python
@@ 1247,7 +809,7 @@
"""
 if bIsListOfMatrices:
+ if b_is_list_of_matrices:
try:
 return [scotts_pi(cm, bIsListOfMatrices=False) for cm in confm]
+ return [scotts_pi(cm, b_is_list_of_matrices=False) for cm in confm]
except TypeError:
# Nevermind the parameter, maybe this is a "conventional" binary
@@ 1276,5 +838,6 @@
return ret
def AUCWilcoxon(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def AUCWilcoxon(res, class_index=1, **argkw):
""" Computes the area under ROC (AUC) and its standard error using
Wilcoxon's approach proposed by Hanley and McNeal (1982). If
@@ 1285,5 +848,5 @@
import corn
useweights = res.weights and not argkw.get("unweighted", 0)
 problists, tots = corn.computeROCCumulative(res, classIndex, useweights)
+ problists, tots = corn.computeROCCumulative(res, class_index, useweights)
results=[]
@@ 1313,17 +876,20 @@
AROC = AUCWilcoxon # for backward compatibility, AROC is obsolote
def compare_2_AUCs(res, lrn1, lrn2, classIndex=1, **argkw):
+
+@deprecated_keywords({"classIndex": "class_index"})
+def compare_2_AUCs(res, lrn1, lrn2, class_index=1, **argkw):
import corn
 return corn.compare2ROCs(res, lrn1, lrn2, classIndex, res.weights and not argkw.get("unweighted"))
+ return corn.compare2ROCs(res, lrn1, lrn2, class_index, res.weights and not argkw.get("unweighted"))
compare_2_AROCs = compare_2_AUCs # for backward compatibility, compare_2_AROCs is obsolote

def compute_ROC(res, classIndex=1):
+
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_ROC(res, class_index=1):
""" Computes a ROC curve as a list of (x, y) tuples, where x is
1specificity and y is sensitivity.
"""
import corn
 problists, tots = corn.computeROCCumulative(res, classIndex)
+ problists, tots = corn.computeROCCumulative(res, class_index)
results = []
@@ 1357,5 +923,7 @@
return (P1y  P2y) / (P1x  P2x)
def ROC_add_point(P, R, keepConcavities=1):
+
+@deprecated_keywords({"keepConcavities": "keep_concavities"})
+def ROC_add_point(P, R, keep_concavities=1):
if keepConcavities:
R.append(P)
@@ 1374,7 +942,10 @@
return R
def TC_compute_ROC(res, classIndex=1, keepConcavities=1):
+
+@deprecated_keywords({"classIndex": "class_index",
+ "keepConcavities": "keep_concavities"})
+def TC_compute_ROC(res, class_index=1, keep_concavities=1):
import corn
 problists, tots = corn.computeROCCumulative(res, classIndex)
+ problists, tots = corn.computeROCCumulative(res, class_index)
results = []
@@ 1399,5 +970,5 @@
else:
fpr = 0.0
 curve = ROC_add_point((fpr, tpr, fPrev), curve, keepConcavities)
+ curve = ROC_add_point((fpr, tpr, fPrev), curve, keep_concavities)
fPrev = f
thisPos, thisNeg = prob[1][1], prob[1][0]
@@ 1412,5 +983,5 @@
else:
fpr = 0.0
 curve = ROC_add_point((fpr, tpr, f), curve, keepConcavities) ## ugly
+ curve = ROC_add_point((fpr, tpr, f), curve, keep_concavities) ## ugly
results.append(curve)
@@ 1472,5 +1043,6 @@
## for each (sub)set of input ROC curves
## returns the average ROC curve and an array of (vertical) standard deviations
def TC_vertical_average_ROC(ROCcurves, samples = 10):
+@deprecated_keywords({"ROCcurves": "roc_curves"})
+def TC_vertical_average_ROC(roc_curves, samples = 10):
def INTERPOLATE((P1x, P1y, P1fscore), (P2x, P2y, P2fscore), X):
if (P1x == P2x) or ((X > P1x) and (X > P2x)) or ((X < P1x) and (X < P2x)):
@@ 1501,5 +1073,5 @@
average = []
stdev = []
 for ROCS in ROCcurves:
+ for ROCS in roc_curves:
npts = []
for c in ROCS:
@@ 1531,5 +1103,6 @@
## for each (sub)set of input ROC curves
## returns the average ROC curve, an array of vertical standard deviations and an array of horizontal standard deviations
def TC_threshold_average_ROC(ROCcurves, samples = 10):
+@deprecated_keywords({"ROCcurves": "roc_curves"})
+def TC_threshold_average_ROC(roc_curves, samples = 10):
def POINT_AT_THRESH(ROC, npts, thresh):
i = 0
@@ 1545,5 +1118,5 @@
stdevV = []
stdevH = []
 for ROCS in ROCcurves:
+ for ROCS in roc_curves:
npts = []
for c in ROCS:
@@ 1596,5 +1169,6 @@
##  yesClassRugPoints is an array of (x, 1) points
##  noClassRugPoints is an array of (x, 0) points
def compute_calibration_curve(res, classIndex=1):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_calibration_curve(res, class_index=1):
import corn
## merge multiple iterations into one
@@ 1603,5 +1177,5 @@
mres.results.append( te )
 problists, tots = corn.computeROCCumulative(mres, classIndex)
+ problists, tots = corn.computeROCCumulative(mres, class_index)
results = []
@@ 1658,5 +1232,6 @@
## returns an array of curve elements, where:
##  curve is an array of points ((TP+FP)/(P + N), TP/P, (th, FP/N)) on the Lift Curve
def compute_lift_curve(res, classIndex=1):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_lift_curve(res, class_index=1):
import corn
## merge multiple iterations into one
@@ 1665,5 +1240,5 @@
mres.results.append( te )
 problists, tots = corn.computeROCCumulative(mres, classIndex)
+ problists, tots = corn.computeROCCumulative(mres, class_index)
results = []
@@ 1693,12 +1268,13 @@
def compute_CDT(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_CDT(res, class_index=1, **argkw):
"""Obsolete, don't use"""
import corn
 if classIndex<0:
+ if class_index<0:
if res.baseClass>=0:
 classIndex = res.baseClass
 else:
 classIndex = 1
+ class_index = res.baseClass
+ else:
+ class_index = 1
useweights = res.weights and not argkw.get("unweighted", 0)
@@ 1709,5 +1285,5 @@
iterationExperiments = split_by_iterations(res)
for exp in iterationExperiments:
 expCDTs = corn.computeCDT(exp, classIndex, useweights)
+ expCDTs = corn.computeCDT(exp, class_index, useweights)
for i in range(len(CDTs)):
CDTs[i].C += expCDTs[i].C
@@ 1716,9 +1292,9 @@
for i in range(res.number_of_learners):
if is_CDT_empty(CDTs[0]):
 return corn.computeCDT(res, classIndex, useweights)
+ return corn.computeCDT(res, class_index, useweights)
return CDTs
else:
 return corn.computeCDT(res, classIndex, useweights)
+ return corn.computeCDT(res, class_index, useweights)
## THIS FUNCTION IS OBSOLETE AND ITS AVERAGING OVER FOLDS IS QUESTIONABLE
@@ 1764,11 +1340,13 @@
# are divided by 'divideByIfIte'. Additional flag is returned which is True in
# the former case, or False in the latter.
def AUC_x(cdtComputer, ite, all_ite, divideByIfIte, computerArgs):
 cdts = cdtComputer(*(ite, ) + computerArgs)
+@deprecated_keywords({"divideByIfIte": "divide_by_if_ite",
+ "computerArgs": "computer_args"})
+def AUC_x(cdtComputer, ite, all_ite, divide_by_if_ite, computer_args):
+ cdts = cdtComputer(*(ite, ) + computer_args)
if not is_CDT_empty(cdts[0]):
 return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divideByIfIte for cdt in cdts], True
+ return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divide_by_if_ite for cdt in cdts], True
if all_ite:
 cdts = cdtComputer(*(all_ite, ) + computerArgs)
+ cdts = cdtComputer(*(all_ite, ) + computer_args)
if not is_CDT_empty(cdts[0]):
return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T) for cdt in cdts], False
@@ 1778,14 +1356,21 @@
# computes AUC between classes i and j as if there we no other classes
def AUC_ij(ite, classIndex1, classIndex2, useWeights = True, all_ite = None, divideByIfIte = 1.0):
+@deprecated_keywords({"classIndex1": "class_index1",
+ "classIndex2": "class_index2",
+ "useWeights": "use_weights",
+ "divideByIfIte": "divide_by_if_ite"})
+def AUC_ij(ite, class_index1, class_index2, use_weights = True, all_ite = None, divide_by_if_ite = 1.0):
import corn
 return AUC_x(corn.computeCDTPair, ite, all_ite, divideByIfIte, (classIndex1, classIndex2, useWeights))
+ return AUC_x(corn.computeCDTPair, ite, all_ite, divide_by_if_ite, (class_index1, class_index2, use_weights))
# computes AUC between class i and the other classes (treating them as the same class)
-def AUC_i(ite, classIndex, useWeights = True, all_ite = None, divideByIfIte = 1.0):
+@deprecated_keywords({"classIndex": "class_index",
+ "useWeights": "use_weights",
+ "divideByIfIte": "divide_by_if_ite"})
+def AUC_i(ite, class_index, use_weights = True, all_ite = None, divide_by_if_ite = 1.0):
import corn
- return AUC_x(corn.computeCDT, ite, all_ite, divideByIfIte, (classIndex, useWeights))
-
+ return AUC_x(corn.computeCDT, ite, all_ite, divide_by_if_ite, (class_index, use_weights))
+
# computes the average AUC over folds using a "AUCcomputer" (AUC_i or AUC_ij)
@@ -1793,8 +1378,11 @@
# fold the computer has to resort to computing over all folds or even this failed;
# in these cases the result is returned immediately
-def AUC_iterations(AUCcomputer, iterations, computerArgs):
+
+@deprecated_keywords({"AUCcomputer": "auc_computer",
+ "computerArgs": "computer_args"})
+def AUC_iterations(auc_computer, iterations, computer_args):
subsum_aucs = [0.] * iterations[0].number_of_learners
for ite in iterations:
- aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs)
+ aucs, foldsUsed = auc_computer(*(ite, ) + computer_args)
if not aucs:
return None
@@ -1806,12 +1394,14 @@
# AUC for binary classification problems
-def AUC_binary(res, useWeights = True):
+@deprecated_keywords({"useWeights": "use_weights"})
+def AUC_binary(res, use_weights = True):
if res.number_of_iterations > 1:
- return AUC_iterations(AUC_i, split_by_iterations(res), (1, useWeights, res, res.number_of_iterations))
- else:
- return AUC_i(res, 1, useWeights)[0]
+ return AUC_iterations(AUC_i, split_by_iterations(res), (1, use_weights, res, res.number_of_iterations))
+ else:
+ return AUC_i(res, 1, use_weights)[0]
# AUC for multiclass problems
-def AUC_multi(res, useWeights = True, method = 0):
+@deprecated_keywords({"useWeights": "use_weights"})
+def AUC_multi(res, use_weights = True, method = 0):
numberOfClasses = len(res.class_values)
@@ -1833,5 +1423,5 @@
for classIndex1 in range(numberOfClasses):
for classIndex2 in range(classIndex1):
- subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
+ subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, use_weights, all_ite, res.number_of_iterations))
if subsum_aucs:
if method == 0:
@@ -1844,5 +1434,5 @@
else:
for classIndex in range(numberOfClasses):
- subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.number_of_iterations))
+ subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, use_weights, all_ite, res.number_of_iterations))
if subsum_aucs:
if method == 0:
@@ -1866,5 +1456,6 @@
# Computes AUC, possibly for multiple classes (the averaging method can be specified)
# Results over folds are averages; if some folds have examples from one class only, the folds are merged
-def AUC(res, method = AUC.ByWeightedPairs, useWeights = True):
+@deprecated_keywords({"useWeights": "use_weights"})
+def AUC(res, method = AUC.ByWeightedPairs, use_weights = True):
""" Returns the area under ROC curve (AUC) given a set of experimental
results. For multivalued class problems, it will compute some sort of
@@ -1874,7 +1465,7 @@
raise ValueError("Cannot compute AUC on a singleclass problem")
elif len(res.class_values) == 2:
- return AUC_binary(res, useWeights)
- else:
- return AUC_multi(res, useWeights, method)
+ return AUC_binary(res, use_weights)
+ else:
+ return AUC_multi(res, use_weights, method)
AUC.ByWeightedPairs = 0
@@ -1886,5 +1477,7 @@
# Computes AUC; in multivalued class problem, AUC is computed as one against all
# Results over folds are averages; if some folds have examples from one class only, the folds are merged
-def AUC_single(res, classIndex = 1, useWeights = True):
+@deprecated_keywords({"classIndex": "class_index",
+ "useWeights": "use_weights"})
+def AUC_single(res, class_index = 1, use_weights = True):
""" Computes AUC where the class given classIndex is singled out, and
all other classes are treated as a single class. To find how good our
@@ -1895,29 +1488,33 @@
classIndex = vehicle.domain.classVar.values.index("van"))
"""
- if classIndex<0:
+ if class_index<0:
if res.baseClass>=0:
- classIndex = res.baseClass
- else:
- classIndex = 1
+ class_index = res.baseClass
+ else:
+ class_index = 1
if res.number_of_iterations > 1:
- return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.number_of_iterations))
- else:
- return AUC_i( res, classIndex, useWeights)[0]
+ return AUC_iterations(AUC_i, split_by_iterations(res), (class_index, use_weights, res, res.number_of_iterations))
+ else:
+ return AUC_i( res, class_index, use_weights)[0]
# Computes AUC for a pair of classes (as if there were no other classes)
# Results over folds are averages; if some folds have examples from one class only, the folds are merged
-def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
+@deprecated_keywords({"classIndex1": "class_index1",
+ "classIndex2": "class_index2",
+ "useWeights": "use_weights"})
+def AUC_pair(res, class_index1, class_index2, use_weights = True):
""" Computes AUC between a pair of instances, ignoring instances from all
other classes.
"""
if res.number_of_iterations > 1:
- return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.number_of_iterations))
- else:
- return AUC_ij(res, classIndex1, classIndex2, useWeights)
+ return AUC_iterations(AUC_ij, split_by_iterations(res), (class_index1, class_index2, use_weights, res, res.number_of_iterations))
+ else:
+ return AUC_ij(res, class_index1, class_index2, use_weights)
# AUC for multiclass problems
-def AUC_matrix(res, useWeights = True):
+@deprecated_keywords({"useWeights": "use_weights"})
+def AUC_matrix(res, use_weights = True):
""" Computes a (lower diagonal) matrix with AUCs for all pairs of classes.
If there are empty classes, the corresponding elements in the matrix
@@ -1944,5 +1541,5 @@
for classIndex1 in range(numberOfClasses):
for classIndex2 in range(classIndex1):
- pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
+ pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, use_weights, all_ite, res.number_of_iterations))
if pair_aucs:
for lrn in range(number_of_learners):
@@ -2080,11 +1677,16 @@
-def plot_learning_curve_learners(file, allResults, proportions, learners, noConfidence=0):
- plot_learning_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], noConfidence)
-
-def plot_learning_curve(file, allResults, proportions, legend, noConfidence=0):
+@deprecated_keywords({"allResults": "all_results",
+ "noConfidence": "no_confidence"})
+def plot_learning_curve_learners(file, all_results, proportions, learners, no_confidence=0):
+ plot_learning_curve(file, all_results, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], no_confidence)
+
+
+@deprecated_keywords({"allResults": "all_results",
+ "noConfidence": "no_confidence"})
+def plot_learning_curve(file, all_results, proportions, legend, no_confidence=0):
import types
fopened=0
- if (type(file)==types.StringType):
+ if type(file)==types.StringType:
file=open(file, "wt")
fopened=1
@@ -2093,17 +1695,17 @@
file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[1]))
file.write("set multiplot\n\n")
- CAs = [CA_dev(x) for x in allResults]
+ CAs = [CA_dev(x) for x in all_results]
file.write("plot \\\n")
 for i in range(len(legend)-1):
- if not noConfidence:
+ if not no_confidence:
file.write("'' title '' with yerrorbars pointtype %i,\\\n" % (i+1))
file.write("'' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
- if not noConfidence:
+ if not no_confidence:
file.write("'' title '' with yerrorbars pointtype %i,\\\n" % (len(legend)))
 file.write("'' title '%s' with linespoints pointtype %i\n" % (legend[-1], len(legend)))
for i in range(len(legend)):
- if not noConfidence:
+ if not no_confidence:
for p in range(len(proportions)):
file.write("%f\t%f\t%f\n" % (proportions[p], CAs[p][i][0], 1.96*CAs[p][i][1]))
@@ -2162,9 +1764,11 @@

-def plot_McNemar_curve_learners(file, allResults, proportions, learners, reference=-1):
- plot_McNemar_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference)
-
-def plot_McNemar_curve(file, allResults, proportions, legend, reference=-1):
+@deprecated_keywords({"allResults": "all_results"})
+def plot_McNemar_curve_learners(file, all_results, proportions, learners, reference=-1):
+ plot_McNemar_curve(file, all_results, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference)
+
+
+@deprecated_keywords({"allResults": "all_results"})
+def plot_McNemar_curve(file, all_results, proportions, legend, reference=-1):
if reference<0:
 reference=len(legend)-1
@@ -2188,5 +1792,5 @@
for i in tmap:
for p in range(len(proportions)):
 file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(allResults[p], i, reference)))
+ file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(all_results[p], i, reference)))
file.write("e\n\n")
@@ -2197,8 +1801,11 @@
default_line_types=("\\setsolid", "\\setdashpattern <4pt, 2pt>", "\\setdashpattern <8pt, 2pt>", "\\setdashes", "\\setdots")
-def learning_curve_learners_to_PiCTeX(file, allResults, proportions, **options):
- return apply(learning_curve_to_PiCTeX, (file, allResults, proportions), options)
-
-def learning_curve_to_PiCTeX(file, allResults, proportions, **options):
+@deprecated_keywords({"allResults": "all_results"})
+def learning_curve_learners_to_PiCTeX(file, all_results, proportions, **options):
+ return apply(learning_curve_to_PiCTeX, (file, all_results, proportions), options)
+
+
+@deprecated_keywords({"allResults": "all_results"})
+def learning_curve_to_PiCTeX(file, all_results, proportions, **options):
import types
fopened=0
@@ -2207,6 +1814,6 @@
fopened=1
- nexamples=len(allResults[0].results)
- CAs = [CA_dev(x) for x in allResults]
+ nexamples=len(all_results[0].results)
+ CAs = [CA_dev(x) for x in all_results]
graphsize=float(options.get("graphsize", 10.0)) #cm
Index: Orange/feature/__init__.py
===================================================================
--- Orange/feature/__init__.py (revision 9671)
+++ Orange/feature/__init__.py (revision 9895)
@@ 10,3 +10,19 @@
import imputation
+from Orange.core import Variable as Descriptor
+from Orange.core import EnumVariable as Discrete
+from Orange.core import FloatVariable as Continuous
+from Orange.core import PythonVariable as Python
+from Orange.core import StringVariable as String
+
+from Orange.core import VarList as Descriptors
+
+from Orange.core import newmetaid as new_meta_id
+
+from Orange.core import Variable as V
+make = V.make
+retrieve = V.get_existing
+MakeStatus = V.MakeStatus
+del V
+
__docformat__ = 'restructuredtext'
Index: docs/reference/rst/Orange.data.rst
===================================================================
--- docs/reference/rst/Orange.data.rst (revision 9900)
+++ docs/reference/rst/Orange.data.rst (revision 9901)
@@ -5,5 +5,4 @@
.. toctree::
- Orange.data.variable
Orange.data.domain
Orange.data.value
Index: docs/reference/rst/Orange.data.variable.rst
===================================================================
--- docs/reference/rst/Orange.data.variable.rst (revision 9848)
+++ (revision )
@@ -1,484 +1,0 @@
.. automodule:: Orange.data.variable

========================
Variables (``variable``)
========================

Data instances in Orange can contain several types of variables:
:ref:`discrete <discrete>`, :ref:`continuous <continuous>`,
:ref:`strings <String>`, and :ref:`Python <Python>` and types derived from it.
The latter represent arbitrary Python objects.
The names, types, values (where applicable), functions for computing the
variable value from values of other variables, and other properties of the
variables are stored in descriptor classes derived from
:obj:`Orange.data.variable.Variable`.

Orange considers two variables (e.g. in two different data tables) the
same if they have the same descriptor. It is allowed, but not
recommended, to have different variables with the same name.

Variable descriptors
--------------------

Variable descriptors can be constructed either by calling the
corresponding constructors or by a factory function :func:`Orange.data
.variable.make`, which either retrieves an existing descriptor or
constructs a new one.

.. class:: Variable

 An abstract base class for variable descriptors.

 .. attribute:: name

 The name of the variable.

 .. attribute:: var_type

 Variable type; it can be :obj:`~Orange.data.Type.Discrete`,
 :obj:`~Orange.data.Type.Continuous`,
 :obj:`~Orange.data.Type.String` or :obj:`~Orange.data.Type.Other`.

 .. attribute:: get_value_from

 A function (an instance of :obj:`~Orange.classification.Classifier`)
 that computes a value of the variable from values of one or more
 other variables. This is used, for instance, in discretization,
 which computes the value of a discretized variable from the
 original continuous variable.

 .. attribute:: ordered

 A flag telling whether the values of a discrete variable are ordered. At
 the moment, no built-in method treats ordinal variables differently than
 nominal ones.

 .. attribute:: random_generator

 A local random number generator used by method
 :obj:`~Variable.randomvalue()`.

 .. attribute:: default_meta_id

 A proposed (but not guaranteed) meta id to be used for that variable.
 For instance, when a tab-delimited file contains meta attributes and
 the existing variables are reused, they will have this id
 (instead of a new one assigned by :obj:`Orange.data.new_meta_id()`).

 .. attribute:: attributes

 A dictionary which allows the user to store additional information
 about the variable. All values should be strings. See the section
 about :ref:`storing additional information `.

 .. method:: __call__(obj)

 Convert a string, number, or other suitable object into a variable
 value.

 :param obj: An object to be converted into a variable value
 :type obj: any suitable
 :rtype: :class:`Orange.data.Value`

 .. method:: randomvalue()

 Return a random value for the variable.

 :rtype: :class:`Orange.data.Value`

 .. method:: compute_value(inst)

 Compute the value of the variable given the instance by calling
 :obj:`~Variable.get_value_from` through a mechanism that
 prevents infinite recursive calls.

 :rtype: :class:`Orange.data.Value`

.. _discrete:
.. class:: Discrete

 Bases: :class:`Variable`

 Descriptor for discrete variables.

 .. attribute:: values

 A list with symbolic names for variables' values. Values are stored as
 indices referring to this list and modifying it instantly
 changes the (symbolic) names of values as they are printed out or
 referred to by user.

 .. note::

 The size of the list is also used to indicate the number of
 possible values for this variable. Changing the size, especially
 shrinking the list, can crash Python. Also, do not add values
 to the list by calling its append or extend method:
 use :obj:`add_value` method instead.

 It is also assumed that this attribute is always defined (but can
 be empty), so never set it to ``None``.

 .. attribute:: base_value

 Stores the base value for the variable as an index in `values`.
 This can be, for instance, a "normal" value, such as "no
 complications" as opposed to abnormal "low blood pressure". The
 base value is used by certain statistics, continuization and,
 potentially, learning algorithms. The default is -1, which means that
 there is no base value.

 .. method:: add_value(s)

 Add a value with symbolic name ``s`` to values. Always call
 this function instead of appending to ``values``.

.. _continuous:
.. class:: Continuous

 Bases: :class:`Variable`

 Descriptor for continuous variables.

 .. attribute:: number_of_decimals

 The number of decimals used when the value is printed out, converted to
 a string or saved to a file.

 .. attribute:: scientific_format

 If ``True``, the value is printed in scientific format whenever it
 would have more than 5 digits. In this case, :obj:`number_of_decimals` is
 ignored.

 .. attribute:: adjust_decimals

 Tells Orange to monitor the number of decimals when the value is
 converted from a string (when the values are read from a file or
 converted by, e.g. ``inst[0]="3.14"``):

 * 0: the number of decimals is not adjusted automatically;
 * 1: the number of decimals is (and has already been) adjusted;
 * 2: automatic adjustment is enabled, but no values have been
 converted yet.

 By default, adjustment of the number of decimals goes as follows:

 * If the variable was constructed when data was read from a file,
 it will be printed with the same number of decimals as the
 largest number of decimals encountered in the file. If
 scientific notation occurs in the file,
 :obj:`scientific_format` will be set to ``True`` and scientific
 format will be used for values too large or too small.

 * If the variable is created in a script, it will have,
 by default, three decimal places. This can be changed either by
 setting the value from a string (e.g. ``inst[0]="3.14"``,
 but not ``inst[0]=3.14``) or by manually setting the
 :obj:`number_of_decimals`.

 .. attribute:: start_value, end_value, step_value

 The range used for :obj:`randomvalue`.
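The default adjustment described above boils down to remembering the largest number of decimals seen among string inputs. A standalone plain-Python sketch of that bookkeeping (``DecimalsTracker`` and ``count_decimals`` are made-up illustrations, not part of Orange):

```python
def count_decimals(s):
    """Number of digits after the decimal point in a string value."""
    return len(s.split(".", 1)[1]) if "." in s else 0

class DecimalsTracker:
    """Mimics the documented behaviour: three decimals by default,
    replaced by the largest count seen once string values arrive."""
    def __init__(self):
        self.number_of_decimals = 3  # scripting default, per the docs
        self.adjust_decimals = 2     # 2: adjustment enabled, nothing converted yet

    def feed(self, s):
        d = count_decimals(s)
        if self.adjust_decimals == 2 or d > self.number_of_decimals:
            self.number_of_decimals = d
        self.adjust_decimals = 1     # 1: has been adjusted

t = DecimalsTracker()
for v in ["3.1", "2.71", "5"]:
    t.feed(v)
print(t.number_of_decimals)  # 2
```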

.. _String:
.. class:: String

 Bases: :class:`Variable`

 Descriptor for variables that contain strings. No method can use them for
 learning; some will raise errors or warnings, and others will
 silently ignore them. They can, however, be used as meta attributes; if
 instances in a dataset have unique IDs, the most efficient way to store them
 is to read them as meta attributes. In general, never use discrete
 attributes with many (say, more than 50) values. Such attributes are
 probably not of any use for learning and should be stored as string
 attributes.

 When converting strings into values and back, empty strings are treated
 differently than usual. For other types, an empty string denotes
 undefined values, while :obj:`String` will take empty strings
 as empty strings, except when loading from or saving to a file.
 Empty strings in files are interpreted as undefined; to specify an empty
 string, enclose the string in double quotes; these are removed when the
 string is loaded.
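The quoting rule for files can be sketched as a tiny parser (``parse_string_field`` is a hypothetical helper for illustration, not Orange's actual reader):

```python
UNDEFINED = None  # stand-in for Orange's undefined value

def parse_string_field(field):
    """An empty field in a file means 'undefined'; a double-quoted field
    has its quotes removed, so '""' yields a genuinely empty string."""
    if field == "":
        return UNDEFINED
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        return field[1:-1]
    return field

print(parse_string_field("") is UNDEFINED)  # True: undefined value
print(parse_string_field('""') == "")       # True: empty string
print(parse_string_field("abc"))            # abc
```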

.. _Python:
.. class:: Python

 Bases: :class:`Variable`

 Base class for descriptors defined in Python. It is fully functional
 and can be used as a descriptor for attributes that contain arbitrary Python
 values. Since this is an advanced topic, PythonVariables are described on a
 separate page. !!TODO!!


.. _attributes:

Storing additional attributes
-----------------------------

All variables have a field :obj:`~Variable.attributes`, a dictionary
that can store additional string data.

.. literalinclude:: code/attributes.py

These attributes can only be saved to a .tab file. They are listed in the
third line in ``name=value`` format, after other attribute specifications
(such as "meta" or "class"), and are separated by spaces.

.. _variable_descriptor_reuse:

Reuse of descriptors
--------------------

There are situations when variable descriptors need to be reused. Typically, the
user loads some training examples, trains a classifier, and then loads a separate
test set. For the classifier to recognize the variables in the second data set,
the descriptors, not just the names, need to be the same.

When constructing new descriptors for data read from a file or during unpickling,
Orange checks whether an appropriate descriptor (with the same name and, in case
of discrete variables, also values) already exists and reuses it. When new
descriptors are constructed by explicitly calling the above constructors, this
always creates new descriptors and thus new variables, although a variable with
the same name may already exist.

The search for an existing variable is based on four attributes: the variable's name,
type, ordered values, and unordered values. As for the latter two, the values can
be explicitly ordered by the user, e.g. in the second line of the tab-delimited
file. For instance, sizes can be ordered as small, medium, or big.

The search for existing variables can end with one of the following statuses.

.. data:: Orange.data.variable.MakeStatus.NotFound (4)

 The variable with that name and type does not exist.

.. data:: Orange.data.variable.MakeStatus.Incompatible (3)

 There are variables with matching name and type, but their
 values are incompatible with the prescribed ordered values. For example,
 if the existing variable already has values ["a", "b"] and the new one
 wants ["b", "a"], the old variable cannot be reused. The existing list can,
 however be appended with the new values, so searching for ["a", "b", "c"] would
 succeed. Likewise a search for ["a"] would be successful, since the extra existing value
 does not matter. The formal rule is thus that the values are compatible iff ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``.
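The formal rule can be restated in a few lines of plain Python (a standalone sketch of the documented predicate, not Orange's implementation):

```python
def values_compatible(existing_values, ordered_values):
    """The documented rule: each list, truncated to the other's length,
    must equal the other's prefix, i.e. one list is a prefix of the other."""
    n = min(len(existing_values), len(ordered_values))
    return existing_values[:n] == ordered_values[:n]

print(values_compatible(["a", "b"], ["a", "b", "c"]))  # True: new values can be appended
print(values_compatible(["a", "b"], ["a"]))            # True: extra existing value does not matter
print(values_compatible(["a", "b"], ["b", "a"]))       # False: the order clashes
```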

.. data:: Orange.data.variable.MakeStatus.NoRecognizedValues (2)

 There is a matching variable, yet it has none of the values that the new
 variable will have (this is obviously possible only if the new variable has
 no prescribed ordered values). For instance, we search for a variable
 "sex" with values "male" and "female", while there is a variable of the same
 name with values "M" and "F" (or, well, "no" and "yes" :). Reuse of this
 variable is possible, though this should probably be a new variable since it
 obviously comes from a different data set. If we do decide to reuse the variable, the
 old variable will get some unneeded new values and the new one will inherit
 some from the old.

.. data:: Orange.data.variable.MakeStatus.MissingValues (1)

 There is a matching variable with some of the values that the new one
 requires, but some values are missing. This situation is neither uncommon
 nor suspicious: in case of separate training and testing data sets there may
 be values which occur in one set but not in the other.

.. data:: Orange.data.variable.MakeStatus.OK (0)

 There is a perfect match which contains all the prescribed values in the
 correct order. The existing variable may have some extra values, though.

Continuous variables can obviously have only two statuses,
:obj:`~Orange.data.variable.MakeStatus.NotFound` or :obj:`~Orange.data.variable.MakeStatus.OK`.

When loading the data using :obj:`Orange.data.Table`, Orange takes the safest
approach and, by default, reuses everything that is compatible up to
and including :obj:`~Orange.data.variable.MakeStatus.NoRecognizedValues`. Unintended reuse would be obvious from the
variable having too many values, which the user can notice and fix. More on that
in the page on :doc:`Orange.data.formats`.

There are two functions for reusing the variables instead of creating new ones.

.. function:: Orange.data.variable.make(name, type, ordered_values, unordered_values[, create_new_on])

 Find and return an existing variable or create a new one if none of the existing
 variables matches the given name, type and values.

 The optional `create_new_on` specifies the status at which a new variable is
 created. The status must be at most :obj:`~Orange.data.variable.MakeStatus.Incompatible` since incompatible (or
 nonexisting) variables cannot be reused. If it is set lower, for instance
 to :obj:`~Orange.data.variable.MakeStatus.MissingValues`, a new variable is created even if there exists
 a variable which is only missing the same values. If set to :obj:`~Orange.data.variable.MakeStatus.OK`, the function
 always creates a new variable.

 The function returns a tuple containing a variable descriptor and the
 status of the best matching variable. So, if ``create_new_on`` is set to
 :obj:`~Orange.data.variable.MakeStatus.MissingValues`, and there exists a variable whose status is, say,
 :obj:`~Orange.data.variable.MakeStatus.NoRecognizedValues`, a variable would be created, while the second
 element of the tuple would contain :obj:`~Orange.data.variable.MakeStatus.NoRecognizedValues`. If, on the other
 hand, there exists a variable which is perfectly OK, its descriptor is
 returned and the returned status is :obj:`~Orange.data.variable.MakeStatus.OK`. The function returns no
 indicator whether the returned variable is reused or not. This can be,
 however, read from the status code: if it is smaller than the specified
 ``create_new_on``, the variable is reused, otherwise a new descriptor has been constructed.

 The exception to the rule is when ``create_new_on`` is OK. In this case, the
 function does not search through the existing variables and cannot know the
 status, so the returned status in this case is always :obj:`~Orange.data.variable.MakeStatus.OK`.

 :param name: Variable name
 :param type: Variable type
 :type type: Orange.data.variable.Type
 :param ordered_values: a list of ordered values
 :param unordered_values: a list of values, for which the order does not
 matter
 :param create_new_on: gives the condition for constructing a new variable
 instead of reusing an existing one

 :return_type: a tuple (:class:`~Orange.data.variable.Variable`, int)

.. function:: Orange.data.variable.retrieve(name, type, ordered_values, unordered_values[, create_new_on])

 Find and return an existing variable, or :obj:`None` if no match is found.

 :param name: variable name.
 :param type: variable type.
 :type type: Orange.data.variable.Type
 :param ordered_values: a list of ordered values
 :param unordered_values: a list of values, for which the order does not
 matter
 :param create_new_on: gives the condition for constructing a new variable
 instead of reusing an existing one

 :return_type: :class:`~Orange.data.variable.Variable`

The following examples give the shown results if
executed only once (in a Python session) and in this order.

:func:`Orange.data.variable.make` can be used for the construction of new variables. ::

 >>> v1, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a", "b"])
 >>> print s, v1.values
 NotFound <a, b>

A new variable was created and the status is
:obj:`~Orange.data.variable.MakeStatus.NotFound`. ::

 >>> v2, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a"], ["c"])
 >>> print s, v2 is v1, v1.values
 MissingValues True <a, b, c>

The status is :obj:`~Orange.data.variable.MakeStatus.MissingValues`,
yet the variable is reused (``v2 is v1``). ``v1`` gets a new value,
``"c"``, which was given as an unordered value. It does
not matter that the new variable does not need the value ``b``. ::

 >>> v3, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["a", "b", "c", "d"])
 >>> print s, v3 is v1, v1.values
 MissingValues True <a, b, c, d>

This is like before, except that the new value, ``d``, is not among the
ordered values. ::

 >>> v4, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, ["b"])
 >>> print s, v4 is v1, v1.values, v4.values
 Incompatible, False, <a, b, c, d>, <b>

The new variable needs to have ``b`` as the first value, so it is incompatible
with the existing variables. The status is
:obj:`~Orange.data.variable.MakeStatus.Incompatible` and
a new variable is created; the two variables are not equal and have
different lists of values. ::

 >>> v5, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, None, ["c", "a"])
 >>> print s, v5 is v1, v1.values, v5.values
 OK True <a, b, c, d> <a, b, c, d>

The new variable has values ``c`` and ``a``, but the order is not important,
so the existing attribute is :obj:`~Orange.data.variable.MakeStatus.OK`. ::

 >>> v6, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, None, ["e"])
 >>> print s, v6 is v1, v1.values, v6.values
 NoRecognizedValues True <a, b, c, d, e> <a, b, c, d, e>

The new variable has different values than the existing variable (status
is :obj:`~Orange.data.variable.MakeStatus.NoRecognizedValues`),
but the existing one is nonetheless reused. Note that we
gave ``e`` in the list of unordered values. If it was among the ordered, the
reuse would fail. ::

 >>> v7, s = Orange.data.variable.make("a", Orange.data.Type.Discrete, None,
 ["f"], Orange.data.variable.MakeStatus.NoRecognizedValues)))
 >>> print s, v7 is v1, v1.values, v7.values
 Incompatible False

This is the same as before, except that we prohibited reuse when there are no
recognized values. Hence a new variable is created, though the returned status is
the same as before::

 >>> v8, s = Orange.data.variable.make("a", Orange.data.Type.Discrete,
 ["a", "b", "c", "d", "e"], None, Orange.data.variable.MakeStatus.OK)
 >>> print s, v8 is v1, v1.values, v8.values
 OK False <a, b, c, d, e> <a, b, c, d, e>

Finally, this is a perfect match, but any reuse is prohibited, so a new
variable is created.
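The reuse decision running through all of the examples above condenses into a single comparison of status codes (a plain-Python sketch of the documented rule; the constant names are spelled out here for readability):

```python
# MakeStatus codes, as documented above (OK=0 ... NotFound=4).
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, INCOMPATIBLE, NOT_FOUND = range(5)

def is_reused(status, create_new_on=INCOMPATIBLE):
    """make() reuses the best match iff its status is strictly below the
    create_new_on threshold; otherwise a new descriptor is constructed."""
    return status < create_new_on

print(is_reused(NO_RECOGNIZED_VALUES))                        # True: reused at the default threshold
print(is_reused(NO_RECOGNIZED_VALUES, NO_RECOGNIZED_VALUES))  # False: new variable (the v7 example)
print(is_reused(OK, OK))                                      # False: reuse prohibited (the v8 example)
```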



Variables computed from other variables
---------------------------------------

Values of variables are often computed from other variables, such as in
discretization. The mechanism described below usually functions behind the scenes,
so understanding it is required only for implementing specific transformations.

Monk 1 is a well-known dataset with target concept ``y := a==b or e==1``.
It can help the learning algorithm if the four-valued attribute ``e`` is
replaced with a binary attribute having values `"1"` and `"not 1"`. The
new variable will be computed from the old one on the fly.

.. literalinclude:: code/variableget_value_from.py
 :lines: 7-17

The new variable is named ``e2``; we define it with a descriptor of type
:obj:`Discrete`, with appropriate name and values ``"not 1"`` and ``1`` (we
chose this order so that the ``not 1``'s index is ``0``, which can be, if
needed, interpreted as ``False``). Finally, we tell e2 to use
``checkE`` to compute its value when needed, by assigning ``checkE`` to
``e2.get_value_from``.

``checkE`` is a function that is passed an instance and another argument we
do not care about here. If the instance's ``e`` equals ``1``, the function
returns value ``1``, otherwise it returns ``not 1``. Both are returned as
values, not plain strings.
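In plain Python, stripped of Orange's :class:`~Orange.data.Value` machinery, the logic of ``checkE`` looks like this (a sketch: the dictionary stands in for a data instance, plain strings stand in for values, and the second argument merely mirrors the callback signature):

```python
def check_e(instance, return_what=None):
    """Binarize the four-valued attribute e: "1" stays "1",
    every other value maps to "not 1"."""
    return "1" if instance["e"] == "1" else "not 1"

print(check_e({"e": "1"}))  # 1
print(check_e({"e": "3"}))  # not 1
```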

In most circumstances the value of ``e2`` can be computed on the fly: we can
pretend that the variable exists in the data, although it does not (but
can be computed from it). For instance, we can compute the information gain of
variable ``e2`` or its distribution without actually constructing data containing
the new variable.

.. literalinclude:: code/variableget_value_from.py
 :lines: 19-22

There are methods which cannot compute values on the fly because it would be
too complex or time consuming. In such cases, the data need to be converted
to a new :obj:`Orange.data.Table`::

 new_domain = Orange.data.Domain([data.domain["a"], data.domain["b"], e2, data.domain.class_var])
 new_data = Orange.data.Table(new_domain, data)

Automatic computation is useful when the data is split into training and
testing examples. Training instances can be modified by adding, removing
and transforming variables (in a typical setup, continuous variables
are discretized prior to learning, therefore the original variables are
replaced by new ones). Test instances, on the other hand, are left as they
are. When they are classified, the classifier automatically converts the
testing instances into the new domain, which includes recomputation of
transformed variables.

.. literalinclude:: code/variableget_value_from.py
 :lines: 24
Index: docs/reference/rst/Orange.evaluation.scoring.rst
===================================================================
--- docs/reference/rst/Orange.evaluation.scoring.rst (revision 9372)
+++ docs/reference/rst/Orange.evaluation.scoring.rst (revision 9892)
@@ -1,1 +1,448 @@
.. automodule:: Orange.evaluation.scoring
+
+############################
+Method scoring (``scoring``)
+############################
+
+.. index: scoring
+
+This module contains various measures of quality for classification and
+regression. Most functions require an argument named :obj:`res`, an instance of
+:class:`Orange.evaluation.testing.ExperimentResults` as computed by
+functions from :mod:`Orange.evaluation.testing` and which contains
+predictions obtained through cross-validation,
+leave-one-out, testing on training data or test set instances.
+
+==============
+Classification
+==============
+
+To prepare some data for examples on this page, we shall load the voting data
+set (problem of predicting the congressman's party (republican, democrat)
+based on a selection of votes) and evaluate naive Bayesian learner,
+classification trees and majority classifier using crossvalidation.
+For examples requiring a multivalued class problem, we shall do the same
+with the vehicle data set (telling whether a vehicle described by the features
+extracted from a picture is a van, bus, or Opel or Saab car).
+
+A basic cross-validation example is shown in the following part of
+(:download:`statExamples.py `, uses :download:`voting.tab ` and :download:`vehicle.tab `):
+
+.. literalinclude:: code/statExample0.py
+
+If instances are weighted, weights are taken into account. This can be
+disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
+disabling weights is to clear the
+:class:`Orange.evaluation.testing.ExperimentResults`' flag weights.
+
+General Measures of Quality
+===========================
+
+.. autofunction:: CA
+
+.. autofunction:: AP
+
+.. autofunction:: Brier_score
+
+.. autofunction:: IS
+
+So, let's compute all this in part of
+(:download:`statExamples.py `, uses :download:`voting.tab ` and :download:`vehicle.tab `) and print it out:
+
+.. literalinclude:: code/statExample1.py
+ :lines: 1-3
+
+The output should look like this::
+
+ method CA AP Brier IS
+ bayes 0.903 0.902 0.175 0.759
+ tree 0.846 0.845 0.286 0.641
+ majorty 0.614 0.526 0.474 0.000
+
+Script :download:`statExamples.py ` contains another example that also prints out
+the standard errors.
+
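As a plain-Python illustration of what CA and the Brier score measure, here is a minimal sketch on made-up predictions (this is not Orange's implementation; the helper names and data are invented for the example):

```python
# Hypothetical mini-example: computing CA and the Brier score by hand.
# "predictions" pairs each true class index with a predicted
# probability distribution over the classes.
predictions = [
    (0, [0.9, 0.1]),
    (1, [0.4, 0.6]),
    (1, [0.7, 0.3]),  # misclassified: highest probability goes to class 0
]

def classification_accuracy(preds):
    """Fraction of instances whose most probable class is the true one."""
    correct = sum(1 for true, probs in preds
                  if max(range(len(probs)), key=probs.__getitem__) == true)
    return correct / len(preds)

def brier_score(preds):
    """Average squared distance between the predicted distribution and
    the ideal one (probability 1 for the true class, 0 elsewhere)."""
    total = 0.0
    for true, probs in preds:
        total += sum((p - (1.0 if i == true else 0.0)) ** 2
                     for i, p in enumerate(probs))
    return total / len(preds)

print(classification_accuracy(predictions))
print(brier_score(predictions))  # -> 0.44
```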
+Confusion Matrix
+================
+
+.. autofunction:: confusion_matrices
+
+ **A positive-negative confusion matrix** is computed (a) if the class is
+ binary, unless the :obj:`classIndex` argument is -2, (b) if the class is
+ multivalued and the :obj:`classIndex` is non-negative. Argument
+ :obj:`classIndex` then tells which class is positive. In case (a),
+ :obj:`classIndex` may be omitted; the first class
+ is then negative and the second is positive, unless the :obj:`baseClass`
+ attribute in the object with results has a non-negative value. In that case,
+ :obj:`baseClass` is an index of the target class. :obj:`baseClass`
+ attribute of results object should be set manually. The result of a
+ function is a list of instances of class :class:`ConfusionMatrix`,
+ containing the (weighted) number of true positives (TP), false
+ negatives (FN), false positives (FP) and true negatives (TN).
+
+ We can also add the keyword argument :obj:`cutoff`
+ (e.g. ``confusion_matrices(results, cutoff=0.3)``); if we do, :obj:`confusion_matrices`
+ will disregard the classifiers' class predictions and observe the predicted
+ probabilities, and consider the prediction "positive" if the predicted
+ probability of the positive class is higher than the :obj:`cutoff`.
+
+ The example (part of :download:`statExamples.py `) below shows how setting the
+ cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices
+ for the naive Bayesian classifier::
+
+ cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
+ print "Confusion matrix for naive Bayes:"
+ print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
+
+ cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
+ print "Confusion matrix for naive Bayes:"
+ print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
+
+ The output::
+
+ Confusion matrix for naive Bayes:
+ TP: 238, FP: 13, FN: 29.0, TN: 155
+ Confusion matrix for naive Bayes:
+ TP: 239, FP: 18, FN: 28.0, TN: 150
+
+ shows that the number of true positives increases (and hence the number of
+ false negatives decreases) by only a single instance, while five instances
+ that were originally true negatives become false positives due to the
+ lower threshold.
+
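The effect of the cutoff can be reproduced with a few lines of plain Python (a sketch with invented data, not Orange's code): each case pairs the actual class with the predicted probability of the positive class, and the cutoff decides which predictions count as positive.

```python
# Hypothetical sketch: how a cutoff turns predicted probabilities into
# a positive/negative confusion matrix. "cases" pairs the actual class
# (1 = positive) with the predicted probability of the positive class.
cases = [(1, 0.9), (1, 0.4), (0, 0.35), (0, 0.1)]

def confusion_counts(cases, cutoff=0.5):
    """Return (TP, FP, FN, TN) for the given probability cutoff."""
    tp = fp = fn = tn = 0
    for actual, p_pos in cases:
        predicted_positive = p_pos > cutoff
        if actual == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, fn, tn

print(confusion_counts(cases, cutoff=0.5))  # -> (1, 0, 1, 2)
print(confusion_counts(cases, cutoff=0.3))  # -> (2, 1, 0, 1)
```

Lowering the cutoff turns some false negatives into true positives, but also some true negatives into false positives, just as in the voting example above.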
+ To observe how good the classifiers are at detecting vans in the vehicle
+ data set, we would compute the matrix like this::
+
+ cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
+vehicle.domain.classVar.values.index("van"))
+
+ and get the results like these::
+
+ TP: 189, FP: 241, FN: 10.0, TN: 406
+
+ while the same for class "opel" would give::
+
+ TP: 86, FP: 112, FN: 126.0, TN: 522
+
+ The main difference is that there are only a few false negatives for the
+ van, meaning that the classifier seldom misses it (if it says it's not a
+ van, it's almost certainly not a van). Not so for the Opel car, where the
+ classifier missed 126 of them and correctly detected only 86.
+
+ **A general confusion matrix** is computed (a) in case of a binary class,
+ when :obj:`classIndex` is set to -2, (b) when we have a multivalued class and
+ the caller doesn't specify the :obj:`classIndex` of the positive class.
+ When called in this manner, the function cannot use the argument
+ :obj:`cutoff`.
+
+ The function then returns a three-dimensional matrix, where the element
+ A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
+ gives the number of instances belonging to 'actual_class' for which the
+ 'learner' predicted 'predictedClass'. We shall compute and print out
+ the matrix for naive Bayesian classifier.
+
+ Here we see another example from :download:`statExamples.py `::
+
+ cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
+ classes = vehicle.domain.classVar.values
+ print "\t"+"\t".join(classes)
+ for className, classConfusions in zip(classes, cm):
+ print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))
+
+ So, here's what this nice piece of code gives::
+
+ bus van saab opel
+ bus 56 95 21 46
+ van 6 189 4 0
+ saab 3 75 73 66
+ opel 4 71 51 86
+
+ Vans are clearly simple: 189 vans were classified as vans (we know this
+ already, we've printed it out above), and the 10 misclassified pictures
+ were classified as buses (6) and Saab cars (4). In all other classes,
+ there were more instances misclassified as vans than correctly classified
+ instances. The classifier is obviously quite biased towards vans.
+
+ .. method:: sens(confm)
+ .. method:: spec(confm)
+ .. method:: PPV(confm)
+ .. method:: NPV(confm)
+ .. method:: precision(confm)
+ .. method:: recall(confm)
+ .. method:: F2(confm)
+ .. method:: Falpha(confm, alpha=2.0)
+ .. method:: MCC(conf)
+
+ With the confusion matrix defined in terms of positive and negative
+ classes, you can also compute the
+ `sensitivity `_ [TP/(TP+FN)],
+ `specificity `_ [TN/(TN+FP)],
+ `positive predictive value `_ [TP/(TP+FP)] and
+ `negative predictive value `_ [TN/(TN+FN)].
+ In information retrieval, positive predictive value is called precision
+ (the ratio of the number of relevant records retrieved to the total number
+ of irrelevant and relevant records retrieved), and sensitivity is called
+ `recall `_
+ (the ratio of the number of relevant records retrieved to the total number
+ of relevant records in the database). The
+ `harmonic mean `_ of precision
+ and recall is called the
+ `F-measure `_, which, depending
+ on the weight assigned to precision relative to recall, is computed
+ as F1 [2*precision*recall/(precision+recall)] or, in the general case, as
+ Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
+ The `Matthews correlation coefficient `_
+ is in essence a correlation coefficient between
+ the observed and predicted binary classifications; it returns a value
+ between -1 and +1. A coefficient of +1 represents a perfect prediction,
+ 0 an average random prediction and -1 an inverse prediction.
+
+ If the argument :obj:`confm` is a single confusion matrix, a single
+ result (a number) is returned. If :obj:`confm` is a list of confusion matrices,
+ a list of scores is returned, one for each confusion matrix.
+
+ Note that weights are taken into account when computing the matrix, so
+ these functions don't check the 'weighted' keyword argument.
+
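The formulas above are easy to apply by hand. The following sketch (plain Python, not the module's functions) evaluates them on the naive Bayes confusion matrix printed earlier (TP=238, FP=13, FN=29, TN=155):

```python
import math

# The scores defined above, applied to the naive Bayes confusion
# matrix from the voting example: TP=238, FP=13, FN=29, TN=155.
TP, FP, FN, TN = 238, 13, 29, 155

sens = TP / (TP + FN)            # sensitivity / recall
spec = TN / (TN + FP)            # specificity
ppv = TP / (TP + FP)             # positive predictive value / precision
npv = TN / (TN + FN)             # negative predictive value
f1 = 2 * ppv * sens / (ppv + sens)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print("sens=%.3f spec=%.3f PPV=%.3f NPV=%.3f F1=%.3f MCC=%.3f"
      % (sens, spec, ppv, npv, f1, mcc))
```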
+ Let us print out sensitivities and specificities of our classifiers in
+ part of :download:`statExamples.py `::
+
+ cm = Orange.evaluation.scoring.confusion_matrices(res)
+ print
+ print "method\tsens\tspec"
+ for l in range(len(learners)):
+ print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))
+
+ROC Analysis
+============
+
+`Receiver Operating Characteristic \
+`_
+(ROC) analysis was initially developed for
+binary class problems, and there is no consensus on how to apply it to
+multiclass problems, nor do we know for sure how to do ROC analysis after
+cross-validation and similar multiple sampling techniques. If you are
+interested in the area under the curve, function AUC will deal with those
+problems as specifically described below.
+
+.. autofunction:: AUC
+
+ .. attribute:: AUC.ByWeightedPairs (or 0)
+
+ Computes AUC for each pair of classes (ignoring instances of all other
+ classes) and averages the results, weighting them by the number of
+ pairs of instances from these two classes (e.g. by the product of
+ probabilities of the two classes). AUC computed in this way still
+ behaves as a concordance index, i.e., it gives the probability that two
+ randomly chosen instances from different classes will be correctly
+ recognized (this is of course true only if the classifier knows
+ from which two classes the instances came).
+
+ .. attribute:: AUC.ByPairs (or 1)
+
+ Similar to the above, except that the average over class pairs is not
+ weighted. This AUC is, like the binary AUC, independent of class
+ distributions, but it is no longer related to the concordance index.
+
+ .. attribute:: AUC.WeightedOneAgainstAll (or 2)
+
+ For each class, it computes AUC for this class against all others (that
+ is, treating other classes as one class). The AUCs are then averaged by
+ the class probabilities. This is related to concordance index in which
+ we test the classifier's (average) capability for distinguishing the
+ instances from a specified class from those that come from other classes.
+ Unlike the binary AUC, the measure is not independent of class
+ distributions.
+
+ .. attribute:: AUC.OneAgainstAll (or 3)
+
+ As above, except that the average is not weighted.
+
+ In case of multiple folds (for instance if the data comes from cross
+ validation), the computation goes like this. When computing the partial
+ AUCs for individual pairs of classes or singledout classes, AUC is
+ computed for each fold separately and then averaged (ignoring the number
+ of instances in each fold, it's just a simple average). However, if a
+ certain fold doesn't contain any instances of a certain class (from the
+ pair), the partial AUC is computed treating the results as if they came
+ from a single fold. This is not entirely correct since the class
+ probabilities from different folds are not necessarily comparable;
+ yet, as this will most often occur in leave-one-out experiments,
+ comparability shouldn't be a problem.
+
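The pairwise averaging schemes can be sketched in a few lines of plain Python. This is only an illustration of the averaging logic on invented three-class scores, not Orange's implementation (which also handles folds and weights); the helper names and data are hypothetical.

```python
from itertools import combinations

def binary_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative,
    ties counting one half - the concordance-index view of AUC."""
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in scores_pos for sn in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def auc_by_pairs(instances, weighted=True):
    """Average binary AUC over all unordered class pairs.

    ``instances`` is a list of (true_class, scores_per_class); for each
    pair (c1, c2) only instances of those classes are kept and the
    predicted score for c1 is used to rank them."""
    classes = sorted({c for c, _ in instances})
    total, weight_sum = 0.0, 0.0
    for c1, c2 in combinations(classes, 2):
        pos = [s[c1] for c, s in instances if c == c1]
        neg = [s[c1] for c, s in instances if c == c2]
        w = len(pos) * len(neg) if weighted else 1.0  # pair count or 1
        total += w * binary_auc(pos, neg)
        weight_sum += w
    return total / weight_sum

# Toy three-class scores (hypothetical, not from the vehicle data):
toy = [(0, [0.7, 0.2, 0.1]), (0, [0.5, 0.3, 0.2]),
       (1, [0.3, 0.6, 0.1]), (1, [0.6, 0.3, 0.1]),
       (2, [0.2, 0.3, 0.5])]
print("%.4f" % auc_by_pairs(toy, weighted=True))   # prints 0.8125
print("%.4f" % auc_by_pairs(toy, weighted=False))  # prints 0.8333
```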
+ Computing and printing out the AUCs looks just like printing out
+ classification accuracies (except that we call AUC instead of
+ CA, of course)::
+
+ AUCs = Orange.evaluation.scoring.AUC(res)
+ for l in range(len(learners)):
+ print "%10s: %5.3f" % (learners[l].name, AUCs[l])
+
+ For vehicle, you can run exactly the same code; it will compute AUCs
+ for all pairs of classes and return the average weighted by probabilities
+ of pairs. Or, you can specify the averaging method yourself, like this::
+
+ AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
+
+ The following snippet tries out all four. (We don't claim that this is
+ how the function needs to be used; it's better to stay with the default.)::
+
+ methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
+ print " " *25 + " \tbayes\ttree\tmajority"
+ for i in range(4):
+ AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
+ print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
+
+ As you can see from the output::
+
+ bayes tree majority
+ by pairs, weighted: 0.789 0.871 0.500
+ by pairs: 0.791 0.872 0.500
+ one vs. all, weighted: 0.783 0.800 0.500
+ one vs. all: 0.783 0.800 0.500
+
+.. autofunction:: AUC_single
+
+.. autofunction:: AUC_pair
+
+.. autofunction:: AUC_matrix
+
+The remaining functions, which plot the curves and statistically compare
+them, require that the results come from a test with a single iteration,
+and they always compare one chosen class against all others. If you have
+cross validation results, you can either use split_by_iterations to split the
+results by folds, call the function for each fold separately and then sum
+the results up however you see fit, or you can set the ExperimentResults'
+attribute number_of_iterations to 1, to cheat the function, at your own
+responsibility for the statistical correctness. Regarding multiclass
+problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
+attribute's baseValue at the time when results were computed. If baseValue
+was not given at that time, 1 (that is, the second class) is used as the default.
+
+We shall use the following code to prepare suitable experimental results::
+
+ ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
+ train = voting.selectref(ri2, 0)
+ test = voting.selectref(ri2, 1)
+ res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
+
+
+.. autofunction:: AUCWilcoxon
+
+.. autofunction:: compute_ROC
+
+Comparison of Algorithms
+========================
+
+.. autofunction:: McNemar
+
+.. autofunction:: McNemar_of_two
+
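The McNemar test compares two classifiers on the same test instances, looking only at the cases where exactly one of them is right. A hedged sketch of the chi-square statistic with continuity correction (the helper and data here are invented for illustration; see the functions above for the module's own implementation):

```python
# McNemar statistic for two classifiers tested on the same instances.
def mcnemar_chi2(correct_a, correct_b):
    """Chi-square statistic with continuity correction. correct_a and
    correct_b are parallel lists of booleans: was the prediction right?"""
    # n01: A wrong, B right; n10: A right, B wrong.
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if not a and b)
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    if n01 + n10 == 0:
        return 0.0
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

# Classifier A is right on 10 instances where B is wrong, never vice versa.
a = [True] * 80 + [False] * 20
b = [True] * 70 + [False] * 30
print(mcnemar_chi2(a, b))  # -> 8.1
```

A statistic above roughly 3.84 (the 0.05 critical value of chi-square with one degree of freedom) suggests the two classifiers genuinely differ.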
+==========
+Regression
+==========
+
+General Measure of Quality
+==========================
+
+Several alternative measures, as given below, can be used to evaluate
+the success of numeric prediction:
+
+.. image:: files/statRegression.png
+
+.. autofunction:: MSE
+
+.. autofunction:: RMSE
+
+.. autofunction:: MAE
+
+.. autofunction:: RSE
+
+.. autofunction:: RRSE
+
+.. autofunction:: RAE
+
+.. autofunction:: R2
+
+The following code (:download:`statExamples.py `) uses most of the above measures to
+score several regression methods.
+
+.. literalinclude:: code/statExamplesRegression.py
+
+The code above produces the following output::
+
+ Learner MSE RMSE MAE RSE RRSE RAE R2
+ maj 84.585 9.197 6.653 1.002 1.001 1.001 0.002
+ rt 40.015 6.326 4.592 0.474 0.688 0.691 0.526
+ knn 21.248 4.610 2.870 0.252 0.502 0.432 0.748
+ lr 24.092 4.908 3.425 0.285 0.534 0.515 0.715
+
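All of these scores follow directly from the formulas pictured above. A plain-Python sketch on toy actual/predicted values (not the data set used in the example, and not the module's code):

```python
import math

# The regression scores above, computed by hand on toy data.
actual = [3.0, 5.0, 2.0, 8.0]
predicted = [2.5, 5.0, 3.0, 7.0]

n = len(actual)
mean = sum(actual) / n
sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
sq_dev = sum((a - mean) ** 2 for a in actual)   # variation around the mean
abs_dev = sum(abs(a - mean) for a in actual)

mse = sq_err / n                 # mean squared error
rmse = math.sqrt(mse)            # root mean squared error
mae = abs_err / n                # mean absolute error
rse = sq_err / sq_dev            # relative squared error
rrse = math.sqrt(rse)            # root relative squared error
rae = abs_err / abs_dev          # relative absolute error
r2 = 1 - rse                     # coefficient of determination

print("MSE=%.3f RMSE=%.3f MAE=%.3f RSE=%.3f RRSE=%.3f RAE=%.3f R2=%.3f"
      % (mse, rmse, mae, rse, rrse, rae, r2))
```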
+==================
+Plotting functions
+==================
+
+.. autofunction:: graph_ranks
+
+The following script (:download:`statExamplesGraphRanks.py `) shows how to plot a graph:
+
+.. literalinclude:: code/statExamplesGraphRanks.py
+
+The code produces the following graph:
+
+.. image:: files/statExamplesGraphRanks1.png
+
+.. autofunction:: compute_CD
+
+.. autofunction:: compute_friedman
+
+=================
+Utility Functions
+=================
+
+.. autofunction:: split_by_iterations
+
+=====================================
+Scoring for multilabel classification
+=====================================
+
+Multilabel classification requires different metrics than those used in
+traditional single-label classification. This module presents the various
+metrics that have been proposed in the literature. Let :math:`D` be a
+multilabel evaluation data set, consisting of :math:`|D|` multilabel examples
+:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \subseteq L`. Let :math:`H` be a
+multilabel classifier and :math:`Z_i=H(x_i)` be the set of labels predicted by
+:math:`H` for example :math:`x_i`.
+
+.. autofunction:: mlc_hamming_loss
+.. autofunction:: mlc_accuracy
+.. autofunction:: mlc_precision
+.. autofunction:: mlc_recall
+
+So, let's compute all this and print it out (part of
+:download:`mlcevaluate.py `, uses
+:download:`emotions.tab `):
+
+.. literalinclude:: code/mlcevaluate.py
+ :lines: 1-15
+
+The output should look like this::
+
+ loss= [0.9375]
+ accuracy= [0.875]
+ precision= [1.0]
+ recall= [0.875]
+
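Using Python sets for the true label sets :math:`Y_i` and predicted label sets :math:`Z_i`, the four scores reduce to set operations. A hedged sketch on toy data (the label names and values are invented; this is not the module's implementation):

```python
# Multilabel scores via set operations: Y holds the true label sets,
# Z the predicted ones, L the whole label space.
L = {"happy", "sad", "calm", "angry"}
Y = [{"happy", "calm"}, {"sad"}, {"angry", "sad"}]          # true
Z = [{"happy"}, {"sad", "calm"}, {"angry", "sad"}]          # predicted

n = len(Y)
# Hamming loss: fraction of label slots (instance x label) that disagree,
# i.e. the symmetric difference of the two sets.
hamming = sum(len(y ^ z) for y, z in zip(Y, Z)) / (n * len(L))
# Accuracy: Jaccard overlap of true and predicted labels, averaged.
accuracy = sum(len(y & z) / len(y | z) for y, z in zip(Y, Z)) / n
# Precision: how much of what was predicted is correct, averaged.
precision = sum(len(y & z) / len(z) for y, z in zip(Y, Z)) / n
# Recall: how much of the truth was predicted, averaged.
recall = sum(len(y & z) / len(y) for y, z in zip(Y, Z)) / n

print("hamming=%.3f accuracy=%.3f precision=%.3f recall=%.3f"
      % (hamming, accuracy, precision, recall))
```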
+References
+==========
+
+Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
+Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.
+
+Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification',
+Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
+(PAKDD 2004).
+
+Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
+Machine Learning, vol. 39, no. 2/3, pp. 135-168.
Index: docs/reference/rst/Orange.feature.descriptor.rst
===================================================================
--- docs/reference/rst/Orange.feature.descriptor.rst (revision 9897)
+++ docs/reference/rst/Orange.feature.descriptor.rst (revision 9897)
@@ -0,0 +1,493 @@
+.. py:currentmodule:: Orange.feature
+
+===========================
+Descriptor (``Descriptor``)
+===========================
+
+Data instances in Orange can contain several types of variables:
+:ref:`discrete `, :ref:`continuous `,
+:ref:`strings `, and :ref:`Python ` and types derived from it.
+The latter represent arbitrary Python objects.
+The names, types, values (where applicable), functions for computing the
+variable value from values of other variables, and other properties of the
+variables are stored in descriptor classes derived from :obj:`Descriptor`.
+
+Orange considers two variables (e.g. in two different data tables) the
+same if they have the same descriptor. It is allowed, but not
+recommended, to have different descriptors with the same name.
+
+Descriptors can be constructed either by calling the corresponding
+constructors or by a factory function :func:`make`, which either retrieves
+an existing descriptor or constructs a new one.
+
+.. class:: Descriptor
+
+ An abstract base class for variable descriptors.
+
+ .. attribute:: name
+
+ The name of the variable.
+
+ .. attribute:: var_type
+
+ Variable type; it can be :obj:`~Orange.data.Type.Discrete`,
+ :obj:`~Orange.data.Type.Continuous`,
+ :obj:`~Orange.data.Type.String` or :obj:`~Orange.data.Type.Other`.
+
+ .. attribute:: get_value_from
+
+ A function (an instance of :obj:`~Orange.classification.Classifier`)
+ that computes a value of the variable from values of one or more
+ other variables. This is used, for instance, in discretization,
+ which computes the value of a discretized variable from the
+ original continuous variable.
+
+ .. attribute:: ordered
+
+ A flag telling whether the values of a discrete variable are ordered. At
+ the moment, no builtin method treats ordinal variables differently than
+ nominal ones.
+
+ .. attribute:: random_generator
+
+ A local random number generator used by method
+ :obj:`~Descriptor.randomvalue()`.
+
+ .. attribute:: default_meta_id
+
+ A proposed (but not guaranteed) meta id to be used for that variable.
+ For instance, when a tab-delimited file contains meta attributes and
+ the existing variables are reused, they will have this id
+ (instead of a new one assigned by :obj:`Orange.data.new_meta_id()`).
+
+ .. attribute:: attributes
+
+ A dictionary which allows the user to store additional information
+ about the variable. All values should be strings. See the section
+ about :ref:`storing additional information `.
+
+ .. method:: __call__(obj)
+
+ Convert a string, number, or other suitable object into a variable
+ value.
+
+ :param obj: An object to be converted into a variable value
+ :type obj: any suitable
+ :rtype: :class:`Orange.data.Value`
+
+ .. method:: randomvalue()
+
+ Return a random value for the variable.
+
+ :rtype: :class:`Orange.data.Value`
+
+ .. method:: compute_value(inst)
+
+ Compute the value of the variable given the instance by calling
+ :obj:`~Descriptor.get_value_from` through a mechanism that
+ prevents infinite recursive calls.
+
+ :rtype: :class:`Orange.data.Value`
+
+
+``Discrete``
+------------
+
+.. _discrete:
+.. class:: Discrete
+
+ Bases: :class:`Descriptor`
+
+ Descriptor for discrete variables.
+
+ .. attribute:: values
+
+ A list with symbolic names for variables' values. Values are stored as
+ indices referring to this list and modifying it instantly
+ changes the (symbolic) names of values as they are printed out or
+ referred to by the user.
+
+ .. note::
+
+ The size of the list is also used to indicate the number of
+ possible values for this variable. Changing the size, especially
+ shrinking the list, can crash Python. Also, do not add values
+ to the list by calling its append or extend method:
+ use :obj:`add_value` method instead.
+
+ It is also assumed that this attribute is always defined (but can
+ be empty), so never set it to ``None``.
+
+ .. attribute:: base_value
+
+ Stores the base value for the variable as an index in `values`.
+ This can be, for instance, a "normal" value, such as "no
+ complications" as opposed to abnormal "low blood pressure". The
+ base value is used by certain statistics, continuization and,
+ potentially, learning algorithms. The default is -1, which means
+ that there is no base value.
+
+ .. method:: add_value(s)
+
+ Add a value with symbolic name ``s`` to values. Always call
+ this function instead of appending to ``values``.
+
+``Continuous``
+--------------
+
+.. _continuous:
+.. class:: Continuous
+
+ Bases: :class:`Descriptor`
+
+ Descriptor for continuous variables.
+
+ .. attribute:: number_of_decimals
+
+ The number of decimals used when the value is printed out, converted to
+ a string or saved to a file.
+
+ .. attribute:: scientific_format
+
+ If ``True``, the value is printed in scientific format whenever it
+ would have more than 5 digits. In this case, :obj:`number_of_decimals` is
+ ignored.
+
+ .. attribute:: adjust_decimals
+
+ Tells Orange to monitor the number of decimals when the value is
+ converted from a string (when the values are read from a file or
+ converted by, e.g. ``inst[0]="3.14"``):
+
+ * 0: the number of decimals is not adjusted automatically;
+ * 1: the number of decimals is (and has already been) adjusted;
+ * 2: automatic adjustment is enabled, but no values have been
+ converted yet.
+
+ By default, adjustment of the number of decimals goes as follows:
+
+ * If the variable was constructed when data was read from a file,
+ it will be printed with the same number of decimals as the
+ largest number of decimals encountered in the file. If
+ scientific notation occurs in the file,
+ :obj:`scientific_format` will be set to ``True`` and scientific
+ format will be used for values too large or too small.
+
+ * If the variable is created in a script, it will have,
+ by default, three decimal places. This can be changed either by
+ setting the value from a string (e.g. ``inst[0]="3.14"``,
+ but not ``inst[0]=3.14``) or by manually setting the
+ :obj:`number_of_decimals`.
+
+ .. attribute:: start_value, end_value, step_value
+
+ The range used for :obj:`randomvalue`.
+
+``String``
+----------
+
+.. _String:
+
+.. class:: String
+
+ Bases: :class:`Descriptor`
+
+ Descriptor for variables that contain strings. No method can use them for
+ learning; some will raise errors or warnings, and others will
+ silently ignore them. They can, however, be used as meta attributes; if
+ instances in a dataset have unique IDs, the most efficient way to store them
+ is to read them as meta attributes. In general, never use discrete
+ attributes with many (say, more than 50) values. Such attributes are
+ probably not of any use for learning and should be stored as string
+ attributes.
+
+ When converting strings into values and back, empty strings are treated
+ differently than usual. For other types, an empty string denotes
+ undefined values, while :obj:`String` will take empty strings
+ as empty strings, except when loading from or saving to a file.
+ Empty strings in files are interpreted as undefined; to specify an empty
+ string, enclose the string in double quotes; these are removed when the
+ string is loaded.
+
+``Python``
+----------
+
+.. _Python:
+.. class:: Python
+
+ Bases: :class:`Descriptor`
+
+ Base class for descriptors defined in Python. It is fully functional
+ and can be used as a descriptor for attributes that contain arbitrary Python
+ values. Since this is an advanced topic, PythonVariables are described on a
+ separate page. !!TODO!!
+
+
+.. _attributes:
+
+Storing additional attributes
+-----------------------------
+
+All variables have a field :obj:`~Descriptor.attributes`, a dictionary
+that can store additional string data.
+
+.. literalinclude:: code/attributes.py
+
+These attributes can only be saved to a .tab file. They are listed in the
+third line in name=value format, after other attribute specifications
+(such as "meta" or "class"), and are separated by spaces.
+
+.. _variable_descriptor_reuse:
+
+Reuse of descriptors
+--------------------
+
+There are situations when variable descriptors need to be reused. Typically, the
+user loads some training examples, trains a classifier, and then loads a separate
+test set. For the classifier to recognize the variables in the second data set,
+the descriptors, not just the names, need to be the same.
+
+When constructing new descriptors for data read from a file or during unpickling,
+Orange checks whether an appropriate descriptor (with the same name and, in case
+of discrete variables, also values) already exists and reuses it. When new
+descriptors are constructed by explicitly calling the above constructors, this
+always creates new descriptors and thus new variables, although a variable with
+the same name may already exist.
+
+The search for an existing variable is based on four attributes: the variable's name,
+type, ordered values, and unordered values. As for the latter two, the values can
+be explicitly ordered by the user, e.g. in the second line of the tabdelimited
+file. For instance, sizes can be ordered as small, medium, or big.
+
+The search for existing variables can end with one of the following statuses.
+
+.. data:: MakeStatus.NotFound (4)
+
+ The variable with that name and type does not exist.
+
+.. data:: MakeStatus.Incompatible (3)
+
+ There are variables with matching name and type, but their
+ values are incompatible with the prescribed ordered values. For example,
+ if the existing variable already has values ["a", "b"] and the new one
+ wants ["b", "a"], the old variable cannot be reused. The existing list can,
+ however, be appended with the new values, so searching for ["a", "b", "c"] would
+ succeed. Likewise, a search for ["a"] would be successful, since the extra existing
+ value does not matter. The formal rule is thus that the values are compatible iff
+ ``existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]``.
+
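The compatibility rule quoted above amounts to: one value list must be a prefix of the other. It can be checked with a few lines of plain Python (a sketch; the function name is hypothetical, not Orange's API):

```python
# The formal compatibility rule: the shorter list of values must be a
# prefix of the longer one.
def values_compatible(existing_values, ordered_values):
    n_e, n_o = len(existing_values), len(ordered_values)
    return existing_values[:n_o] == ordered_values[:n_e]

print(values_compatible(["a", "b"], ["b", "a"]))       # -> False
print(values_compatible(["a", "b"], ["a", "b", "c"]))  # -> True
print(values_compatible(["a", "b"], ["a"]))            # -> True
```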
+.. data:: MakeStatus.NoRecognizedValues (2)
+
+ There is a matching variable, yet it has none of the values that the new
+ variable will have (this is obviously possible only if the new variable has
+ no prescribed ordered values). For instance, we search for a variable
+ "sex" with values "male" and "female", while there is a variable of the same
+ name with values "M" and "F" (or, well, "no" and "yes" :). Reuse of this
+ variable is possible, though this should probably be a new variable since it
+ obviously comes from a different data set. If we do decide to reuse the variable, the
+ old variable will get some unneeded new values and the new one will inherit
+ some from the old.
+
+.. data:: MakeStatus.MissingValues (1)
+
+ There is a matching variable with some of the values that the new one
+ requires, but some values are missing. This situation is neither uncommon
+ nor suspicious: in case of separate training and testing data sets there may
+ be values which occur in one set but not in the other.
+
+.. data:: MakeStatus.OK (0)
+
+ There is a perfect match which contains all the prescribed values in the
+ correct order. The existing variable may have some extra values, though.
+
+Continuous variables can obviously have only two statuses,
+:obj:`~MakeStatus.NotFound` or :obj:`~MakeStatus.OK`.
+
+When loading the data using :obj:`Orange.data.Table`, Orange takes the safest
+approach and, by default, reuses everything that is compatible up to
+and including :obj:`~MakeStatus.NoRecognizedValues`. Unintended reuse would be obvious from the
+variable having too many values, which the user can notice and fix. More on that
+in the page on :doc:`Orange.data.formats`.
+
+There are two functions for reusing the variables instead of creating new ones.
+
+.. function:: make(name, type, ordered_values, unordered_values[, create_new_on])
+
+ Find and return an existing variable or create a new one if none of the existing
+ variables matches the given name, type and values.
+
+ The optional `create_new_on` specifies the status at which a new variable is
+ created. The status must be at most :obj:`~MakeStatus.Incompatible` since incompatible (or
+ nonexisting) variables cannot be reused. If it is set lower, for instance
+ to :obj:`~MakeStatus.MissingValues`, a new variable is created even if there exists
+ a variable which is only missing the same values. If set to :obj:`~MakeStatus.OK`, the function
+ always creates a new variable.
+
+ The function returns a tuple containing a variable descriptor and the
+ status of the best matching variable. So, if ``create_new_on`` is set to
+ :obj:`~MakeStatus.MissingValues`, and there exists a variable whose status is, say,
+ :obj:`~MakeStatus.NoRecognizedValues`, a variable would be created, while the second
+ element of the tuple would contain :obj:`~MakeStatus.NoRecognizedValues`. If, on the other
+ hand, there exists a variable which is perfectly OK, its descriptor is
+ returned and the returned status is :obj:`~MakeStatus.OK`. The function returns no
+ indicator whether the returned variable is reused or not. This can be,
+ however, read from the status code: if it is smaller than the specified
+ ``create_new_on``, the variable is reused, otherwise a new descriptor has been constructed.
+
+ The exception to the rule is when ``create_new_on`` is OK. In this case, the
+ function does not search through the existing variables and cannot know the
+ status, so the returned status in this case is always :obj:`~MakeStatus.OK`.
+
+ :param name: Descriptor name
+ :param type: Descriptor type
+ :type type: Type
+ :param ordered_values: a list of ordered values
+ :param unordered_values: a list of values, for which the order does not
+ matter
+ :param create_new_on: gives the condition for constructing a new variable
+ instead of reusing an existing one
+
+ :return_type: a tuple (:class:`~Descriptor`, int)
+
+.. function:: retrieve(name, type, ordered_values, unordered_values[, create_new_on])
+
+ Find and return an existing variable, or :obj:`None` if no match is found.
+
+ :param name: variable name.
+ :param type: variable type.
+ :type type: Type
+ :param ordered_values: a list of ordered values
+ :param unordered_values: a list of values, for which the order does not
+ matter
+ :param create_new_on: gives the condition for constructing a new variable
+ instead of reusing an existing one
+
+ :return_type: :class:`~Descriptor`
+
+The following examples give the shown results if
+executed only once (in a Python session) and in this order.
+
+:func:`make` can be used for the construction of new variables. ::
+
+ >>> v1, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a", "b"])
+ >>> print s, v1.values
+ NotFound <a, b>
+
+A new variable was created and the status is :obj:`~MakeStatus.NotFound`. ::
+
+ >>> v2, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a"], ["c"])
+ >>> print s, v2 is v1, v1.values
+ MissingValues True <a, b, c>
+
+The status is :obj:`~MakeStatus.MissingValues`,
+yet the variable is reused (``v2 is v1``). ``v1`` gets a new value,
+``"c"``, which was given as an unordered value. It does
+not matter that the new variable does not need the value ``b``. ::
+
+ >>> v3, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["a", "b", "c", "d"])
+ >>> print s, v3 is v1, v1.values
+ MissingValues True <a, b, c, d>
+
+This is like before, except that the new value, ``d``, is not among the
+ordered values. ::
+
+ >>> v4, s = Orange.feature.make("a", Orange.data.Type.Discrete, ["b"])
+ >>> print s, v4 is v1, v1.values, v4.values
+ Incompatible False <a, b, c, d> <b>
+
+The new variable needs to have ``b`` as the first value, so it is incompatible
+with the existing variables. The status is
+:obj:`~MakeStatus.Incompatible` and
+a new variable is created; the two variables are not equal and have
+different lists of values. ::
+
+ >>> v5, s = Orange.feature.make("a", Orange.data.Type.Discrete, None, ["c", "a"])
+ >>> print s, v5 is v1, v1.values, v5.values
+ OK True <a, b, c, d> <a, b, c, d>
+
+The new variable has values ``c`` and ``a``, but the order is not important,
+so the existing variable matches and the status is :obj:`~MakeStatus.OK`. ::
+
+ >>> v6, s = Orange.feature.make("a", Orange.data.Type.Discrete, None, ["e"])
+ >>> print s, v6 is v1, v1.values, v6.values
+ NoRecognizedValues True <a, b, c, d, e> <a, b, c, d, e>
+
+The new variable has different values than the existing variable (status
+is :obj:`~MakeStatus.NoRecognizedValues`),
+but the existing one is nonetheless reused. Note that we
+gave ``e`` in the list of unordered values; had it been among the ordered
+values, the reuse would have failed. ::
+
+ >>> v7, s = Orange.feature.make("a", Orange.data.Type.Discrete, None,
+ ["f"], Orange.feature.MakeStatus.NoRecognizedValues)
+ >>> print s, v7 is v1, v1.values, v7.values
+ Incompatible False <a, b, c, d, e> <f>
+
+This is the same as before, except that we prohibited reuse when there are no
+recognized values. Hence a new variable is created, though the returned status is
+the same as before::
+
+ >>> v8, s = Orange.feature.make("a", Orange.data.Type.Discrete,
+ ["a", "b", "c", "d", "e"], None, Orange.feature.MakeStatus.OK)
+ >>> print s, v8 is v1, v1.values, v8.values
+ OK False <a, b, c, d, e> <a, b, c, d, e>
+
+Finally, this is a perfect match, but any reuse is prohibited, so a new
+variable is created.
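The reuse rules exercised above can be condensed into a small plain-Python emulation. This is a hypothetical sketch of the semantics only, not Orange's implementation: the status constants, the module-level registry and this simplified ``make`` (which omits the ``create_new_on`` argument) are all invented for illustration.

```python
# Hypothetical emulation of the reuse semantics described above.
# Status names mirror MakeStatus; _registry stands in for Orange's
# global list of existing variables.
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, INCOMPATIBLE, NOT_FOUND = range(5)

_registry = {}  # name -> list of value names (shared, mutable)

def make(name, ordered_values=None, unordered_values=None):
    """Return (values, status); reuse the registered variable when possible."""
    ordered = list(ordered_values or [])
    unordered = list(unordered_values or [])
    existing = _registry.get(name)
    if existing is None:
        _registry[name] = ordered + [v for v in unordered if v not in ordered]
        return _registry[name], NOT_FOUND
    # ordered values must form a common prefix with the existing values
    n = min(len(ordered), len(existing))
    if ordered[:n] != existing[:n]:
        return ordered + [v for v in unordered if v not in ordered], INCOMPATIBLE
    given = ordered + [v for v in unordered if v not in ordered]
    recognized = [v for v in given if v in existing]
    new = [v for v in given if v not in existing]
    existing.extend(new)  # reuse: unknown values are appended to the variable
    if new and not recognized:
        return existing, NO_RECOGNIZED_VALUES
    return existing, (MISSING_VALUES if new else OK)
```

Running the same sequence of calls as in the doctests above reproduces the same statuses and the same growth of the shared value list.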
+
+
+
+Variables computed from other variables
+=======================================
+
+Values of variables are often computed from other variables, such as in
+discretization. The mechanism described below usually functions behind the scenes,
+so understanding it is required only for implementing specific transformations.
+
+Monk 1 is a well-known dataset with target concept ``y := a==b or e==1``.
+It can help the learning algorithm if the four-valued attribute ``e`` is
+replaced with a binary attribute having values ``"1"`` and ``"not 1"``. The
+new variable will be computed from the old one on the fly.
+
+.. literalinclude:: code/variableget_value_from.py
+ :lines: 7-17
+
+The new variable is named ``e2``; we define it with a descriptor of type
+:obj:`Discrete`, with the appropriate name and the values ``"not 1"`` and
+``"1"`` (we chose this order so that ``not 1`` has index ``0``, which can,
+if needed, be interpreted as ``False``). Finally, we tell ``e2`` to use
+``checkE`` to compute its value when needed, by assigning ``checkE`` to
+``e2.get_value_from``.
+
+``checkE`` is a function that is passed an instance and another argument we
+do not care about here. If the instance's ``e`` equals ``1``, the function
+returns the value ``"1"``; otherwise it returns ``"not 1"``. Both are returned
+as values, not plain strings.
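The role of ``checkE`` can be sketched in plain Python (a hypothetical stand-in: a dictionary plays the data instance and plain strings play the returned values, whereas the Orange version works with instances and ``Value`` objects):

```python
def check_e(instance, return_what=None):
    # Map the four-valued attribute e to the binary "1" / "not 1".
    # The second argument mirrors checkE's two-argument signature and
    # is ignored here.
    return "1" if instance["e"] == "1" else "not 1"

# a mock Monk 1 instance; target concept is y := a==b or e==1
instance = {"a": "1", "b": "2", "e": "3"}
print(check_e(instance))  # not 1
```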
+
+In most circumstances the value of ``e2`` can be computed on the fly: we can
+pretend that the variable exists in the data, although it does not (but
+can be computed from it). For instance, we can compute the information gain of
+variable ``e2`` or its distribution without actually constructing data containing
+the new variable.
+
+.. literalinclude:: code/variableget_value_from.py
+ :lines: 19-22
+
+There are methods which cannot compute values on the fly because it would be
+too complex or time consuming. In such cases, the data need to be converted
+to a new :obj:`Orange.data.Table`::
+
+ new_domain = Orange.data.Domain([data.domain["a"], data.domain["b"], e2, data.domain.class_var])
+ new_data = Orange.data.Table(new_domain, data)
+
+Automatic computation is useful when the data is split into training and
+testing examples. Training instances can be modified by adding, removing
+and transforming variables (in a typical setup, continuous variables
+are discretized prior to learning, therefore the original variables are
+replaced by new ones). Test instances, on the other hand, are left as they
+are. When they are classified, the classifier automatically converts the
+testing instances into the new domain, which includes recomputation of
+transformed variables.
+
+.. literalinclude:: code/variableget_value_from.py
+ :lines: 24
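The train/test conversion described above can also be sketched in plain Python. This is a hypothetical illustration only: a "domain" is modeled as a list of ``(name, compute)`` pairs, where ``compute`` plays the role of ``get_value_from`` and ``None`` means the raw value is copied.

```python
def convert(instance, domain):
    # Map a raw instance into a domain, recomputing derived variables
    # from the original attributes (the analogue of get_value_from).
    return {name: (compute(instance) if compute else instance[name])
            for name, compute in domain}

# training domain: keep a and b, derive the binary e2 from e
domain = [
    ("a", None),
    ("b", None),
    ("e2", lambda inst: "1" if inst["e"] == "1" else "not 1"),
]

# a test instance is left in its original form until classification,
# when it is converted into the classifier's domain
test_instance = {"a": "2", "b": "2", "e": "1"}
print(convert(test_instance, domain))  # {'a': '2', 'b': '2', 'e2': '1'}
```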
Index: docs/reference/rst/Orange.feature.rst
===================================================================
--- docs/reference/rst/Orange.feature.rst (revision 9372)
+++ docs/reference/rst/Orange.feature.rst (revision 9896)
@@ -8,4 +8,5 @@
    :maxdepth: 2
+   Orange.feature.descriptor
    Orange.feature.scoring
    Orange.feature.selection
Index: docs/reference/rst/code/majorityclassification.py
===================================================================
--- docs/reference/rst/code/majorityclassification.py (revision 9823)
+++ docs/reference/rst/code/majorityclassification.py (revision 9894)
@@ -15,5 +15,5 @@
 res = Orange.evaluation.testing.cross_validation(learners, monks)
-CAs = Orange.evaluation.scoring.CA(res, reportSE=True)
+CAs = Orange.evaluation.scoring.CA(res, report_se=True)
 print "Tree: %5.3f+%5.3f" % CAs[0]
Index: docs/reference/rst/code/testingtest.py
===================================================================
--- docs/reference/rst/code/testingtest.py (revision 9823)
+++ docs/reference/rst/code/testingtest.py (revision 9894)
@@ -12,5 +12,5 @@
 def printResults(res):
-    CAs = Orange.evaluation.scoring.CA(res, reportSE=1)
+    CAs = Orange.evaluation.scoring.CA(res, report_se=1)
     for name, ca in zip(res.classifierNames, CAs):
         print "%s: %5.3f+%5.3f" % (name, ca[0], 1.96 * ca[1]),
Index: docs/reference/rst/code/variableget_value_from.py
===================================================================
--- docs/reference/rst/code/variableget_value_from.py (revision 9823)
+++ docs/reference/rst/code/variableget_value_from.py (revision 9897)
@@ -2,6 +2,6 @@
 # Category: core
 # Uses: monks-1
-# Referenced: Orange.data.variable
-# Classes: Orange.data.variable.Discrete
+# Referenced: Orange.feature
+# Classes: Orange.feature.Discrete
 import Orange
@@ -14,8 +14,8 @@
 monks = Orange.data.Table("monks-1")
-e2 = Orange.data.variable.Discrete("e2", values=["not 1", "1"])
+e2 = Orange.feature.Discrete("e2", values=["not 1", "1"])
 e2.get_value_from = checkE
-print Orange.core.MeasureAttribute_info(e2, monks)
+print Orange.feature.scoring.InfoGain(e2, monks)
 dist = Orange.core.Distribution(e2, monks)
Index: docs/reference/rst/index.rst
===================================================================
--- docs/reference/rst/index.rst (revision 9729)
+++ docs/reference/rst/index.rst (revision 9897)
@@ -7,4 +7,6 @@
    Orange.data
+
+   Orange.feature
    Orange.associate
@@ -19,6 +21,4 @@
    Orange.evaluation
-
-   Orange.feature
    Orange.multilabel
Index: setup.py
===================================================================
--- setup.py (revision 9879)
+++ setup.py (revision 9893)
@@ -391,5 +391,5 @@
         install.run(self)
-        # Create a .pth file wiht a path inside the Orange/orng directory
+        # Create a .pth file with a path inside the Orange/orng directory
         # so the old modules are importable
         self.path_file, self.extra_dirs = ("orange-orng-modules", "Orange/orng")