source: orange/docs/widgets/rst/evaluate/testlearners.rst @ 11359:8d54e79aa135

Revision 11359:8d54e79aa135, 5.4 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago

Cleanup of 'Widget catalog' documentation.

Fixed rst text formating, replaced dead hardcoded reference links (now using
:ref:), etc.

.. _Test Learners:

Test Learners
=============

.. image:: ../icons/TestLearners.png

Tests learning algorithms on data.

Signals
-------

Inputs:
   - Data (ExampleTable)
      Data for training and, unless a separate test data set is used, testing
   - Separate Test Data (ExampleTable)
      Separate data for testing
   - Learner (orange.Learner)
      One or more learning algorithms

Outputs:
   - Evaluation results (orngTest.ExperimentResults)
      Results of testing the algorithms


Description
-----------

The widget tests learning algorithms on data. Different sampling schemes are
available, including using separate test data. The widget does two things.
First, it shows a table with different performance measures of the classifiers,
such as classification accuracy and area under ROC. Second, it outputs a signal
with data which can be used by other widgets for analyzing the performance of
classifiers, such as :ref:`ROC Analysis` or :ref:`Confusion Matrix`.

The signal Learner has the unusual property that it can be connected to
more than one widget; each connected widget provides a learner, and all
learners are tested with the same procedure. If the evaluation results are fed
into further widgets, such as the one for ROC analysis, the learning
algorithms are analyzed together.

.. image:: images/TestLearners.png

The widget supports various sampling methods. :obj:`Cross-validation` splits
the data into the given number of folds (usually 5 or 10). The algorithm is
tested by holding out the examples from one fold at a time; the model is
induced from the other folds and the examples from the held-out fold are
classified. :obj:`Leave-one-out` is similar, but it holds out one example
at a time, inducing the model from all others and then classifying the
held-out example. This method is very stable and reliable, but also very slow.
:obj:`Random sampling` randomly splits the data into a training and a
testing set in the given proportion (e.g. 70:30); the whole procedure is
repeated the specified number of times. :obj:`Test on train data` uses the
whole data set for training and then for testing. This method practically
always gives overly optimistic results.

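The sampling schemes described above can be sketched in plain Python. This is
only an illustration of how the index splits are formed, not the widget's
implementation: the function names are hypothetical, no Orange API is used,
and the folds are not stratified, which the widget's own sampling may well do
differently.

```python
import random


def cross_validation_folds(n_examples, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    for fold in range(n_folds):
        # Every n_folds-th shuffled index belongs to the held-out fold.
        test = indices[fold::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def leave_one_out(n_examples):
    """Leave-one-out is cross-validation with one example per fold."""
    return cross_validation_folds(n_examples, n_examples)


def random_sampling(n_examples, train_fraction, repeats, seed=0):
    """Yield `repeats` independent random train/test splits."""
    rng = random.Random(seed)
    n_train = int(round(n_examples * train_fraction))
    for _ in range(repeats):
        indices = list(range(n_examples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]
```

In cross-validation and leave-one-out, each example is tested exactly once;
in random sampling, an example may land in the test set several times or not
at all.
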
The above methods use the data from signal Data only. To test on another data
set (for instance from another file or some data selected in another widget),
we put it on the input signal Separate Test Data and select
:obj:`Test on test data`.

Any changes in the above settings are applied immediately if
:obj:`Applied on any change` is checked. Otherwise, the user has to press
:obj:`Apply` for the changes to take effect.

The widget can compute a number of performance statistics.

   - :obj:`Classification accuracy` is the proportion of correctly classified
     examples
   - :obj:`Sensitivity` (also called true positive rate (TPR), hit rate and
     recall) is the proportion of detected positive examples among all positive
     examples, e.g. the proportion of sick people correctly diagnosed as sick
   - :obj:`Specificity` is the proportion of detected negative examples among
     all negative examples, e.g. the proportion of healthy people correctly
     recognized as healthy
   - :obj:`Area under ROC` is the area under the receiver operating
     characteristic (ROC) curve
   - :obj:`Information score` is the average amount of information per
     classified instance, as defined by Kononenko and Bratko
   - :obj:`F-measure` is the harmonic mean of precision and recall
     (see below), 2*precision*recall/(precision+recall)
   - :obj:`Precision` is the proportion of truly positive examples among all
     examples classified as positive, e.g. the proportion of sick people among
     all diagnosed as sick, or the proportion of relevant documents among all
     retrieved documents
   - :obj:`Recall` is the same measure as sensitivity, except that the latter
     term is more common in medicine and recall comes from text mining, where
     it means the proportion of relevant documents which are retrieved
   - :obj:`Brier score` measures the accuracy of probability assessments, that
     is, the average deviation between the predicted probabilities of
     events and the actual events.


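As a worked illustration of the count-based measures in the list above, the
following sketch computes them from lists of actual and predicted classes.
It is a minimal example and not the code the widget runs; the function name
``binary_scores`` and the dictionary keys are hypothetical.

```python
def binary_scores(actual, predicted, target):
    """Compute count-based measures with `target` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == target:
            if a == target:
                tp += 1
            else:
                fp += 1
        else:
            if a == target:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(correct, len(actual)),
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


actual = ["sick", "sick", "sick",
          "healthy", "healthy", "healthy", "healthy", "healthy"]
predicted = ["sick", "sick", "healthy",
             "sick", "healthy", "healthy", "healthy", "healthy"]
scores = binary_scores(actual, predicted, target="sick")
# Two of three sick people are detected (sensitivity 2/3), four of five
# healthy people are recognized (specificity 0.8), accuracy is 6/8 = 0.75.
```
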
More comprehensive descriptions of measures can be found at
`http://en.wikipedia.org/wiki/Receiver_operating_characteristic
<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
(from classification accuracy to area under ROC),
`http://www.springerlink.com/content/j21p620rw33xw773/
<http://www.springerlink.com/content/j21p620rw33xw773/>`_ (information score),
`http://en.wikipedia.org/wiki/F-measure#Performance_measures
<http://en.wikipedia.org/wiki/F-measure#Performance_measures>`_
(from F-measure to recall) and
`http://en.wikipedia.org/wiki/Brier_score
<http://en.wikipedia.org/wiki/Brier_score>`_ (Brier score).

Most measures require a target class, e.g. having the disease or being relevant.
The target class can be selected at the bottom of the widget.

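The probability-based measures also depend on the target class. The sketch
below assumes that each classifier reports a probability for the target
class; the function names are hypothetical. The area under ROC is computed
as the fraction of (positive, negative) pairs in which the positive example
receives the higher probability, and the Brier score is shown in its
two-class form for the target class. The widget may use a different
formulation, for instance one that sums over all classes.

```python
def area_under_roc(actual, target_probabilities, target):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [p for a, p in zip(actual, target_probabilities) if a == target]
    negatives = [p for a, p in zip(actual, target_probabilities) if a != target]
    if not positives or not negatives:
        return float("nan")
    wins = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def brier_score(actual, target_probabilities, target):
    """Mean squared difference between predicted probability and outcome."""
    errors = [(p - (1.0 if a == target else 0.0)) ** 2
              for a, p in zip(actual, target_probabilities)]
    return sum(errors) / len(errors)


actual = ["sick", "sick", "healthy", "healthy"]
p_sick = [0.9, 0.4, 0.6, 0.1]  # predicted probability of "sick"
auc = area_under_roc(actual, p_sick, target="sick")   # 3 of 4 pairs ordered
brier = brier_score(actual, p_sick, target="sick")    # (0.01+0.36+0.36+0.01)/4
```
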
Example
-------

In a typical use of the widget, we give it a data set and a few learning
algorithms, and we observe their performance in the table inside the Test
Learners widget and in the ROC and Lift Curve widgets attached to Test
Learners. The data is often preprocessed before testing; in this case we
discretized it and did some manual feature selection. Note that this is done
outside the cross-validation loop, so the testing results may be overly
optimistic.

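To avoid this optimism, preprocessing can be fitted on the training part of
each fold only. The following sketch shows the idea in plain Python; it does
not use the Orange API, and ``fit_mean_threshold`` and ``majority_per_value``
are hypothetical stand-ins for a discretizer and a learner.

```python
from collections import Counter, defaultdict


def evaluate_with_inner_preprocessing(examples, n_folds, fit_preprocessor, learner):
    """Cross-validated accuracy with the preprocessor fitted inside each fold."""
    correct = 0
    for fold in range(n_folds):
        test = examples[fold::n_folds]
        train = [ex for i, ex in enumerate(examples) if i % n_folds != fold]
        # The preprocessor sees the training examples only.
        transform = fit_preprocessor(train)
        classifier = learner([(transform(x), y) for x, y in train])
        correct += sum(classifier(transform(x)) == y for x, y in test)
    return correct / len(examples)


def fit_mean_threshold(train):
    """Hypothetical discretizer: split a numeric feature at the training mean."""
    mean = sum(x for x, _ in train) / len(train)
    return lambda x: x > mean


def majority_per_value(train):
    """Hypothetical learner: predict the majority class for each feature value."""
    counts = defaultdict(Counter)
    for x, y in train:
        counts[x][y] += 1
    overall = Counter(y for _, y in train).most_common(1)[0][0]

    def classify(x):
        return counts[x].most_common(1)[0][0] if x in counts else overall

    return classify


examples = [(1.0, "a"), (2.0, "a"), (3.0, "a"),
            (7.0, "b"), (8.0, "b"), (9.0, "b")]
accuracy = evaluate_with_inner_preprocessing(
    examples, 3, fit_mean_threshold, majority_per_value)
```
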
.. image:: images/TestLearners-Schema.png

Another example of using this widget is given in the documentation for
widget :ref:`Confusion Matrix`.