<html>
<head>
<title>Test Learners</title>
<link rel="stylesheet" href="../../../style.css" type="text/css" media="screen">
<link rel="stylesheet" href="style-print.css" type="text/css" media="print">
</head>

<body>

<h1>Test Learners</h1>

<img class="screenshot" src="../icons/TestLearners.png">
<p>Tests learning algorithms on data.</p>

<h2>Channels</h2>

<h3>Inputs</h3>

<dl class="attributes">
<dt>Data (ExampleTable)</dt>
<dd>Data for training and, unless a separate test data set is used, testing</dd>

<dt>Separate Test Data (ExampleTable)</dt>
<dd>Separate data for testing</dd>

<dt>Learner (orange.Learner)</dt>
<dd>One or more learning algorithms</dd>
</dl>

<h3>Outputs</h3>

<dl class="attributes">
<dt>Evaluation results (orngTest.ExperimentResults)</dt>
<dd>Results of testing the algorithms</dd>
</dl>

<h2>Description</h2>

<p>The widget tests learning algorithms on data. Different sampling schemes are available, including the use of a separate test data set. The widget does two things. First, it shows a table with various performance measures of the classifiers, such as classification accuracy and area under ROC. Second, it outputs a signal with data that can be used by other widgets for analyzing the performance of classifiers, such as <a href="ROCAnalysis.htm">ROC Analysis</a> or <a href="ConfusionMatrix.htm">Confusion Matrix</a>.</p>

<p>The signal Learner has the uncommon property that it can be connected to more than one widget; this provides multiple learners to be tested with the same procedure. If the results of evaluation are fed into further widgets, such as the one for ROC analysis, the learning algorithms are analyzed together.</p>

<img class="screenshot" src="TestLearners.png"/>

<p>The widget supports various sampling methods. <span class="option">Cross-validation</span> splits the data into the given number of folds (usually 5 or 10). The algorithm is tested by holding out the examples from one fold at a time; the model is induced from the other folds and the examples from the held-out fold are classified. <span class="option">Leave-one-out</span> is similar, but it holds out one example at a time, inducing the model from all others and then classifying the held-out example. This method is obviously very stable and reliable ... and very slow. <span class="option">Random sampling</span> randomly splits the data into training and testing sets in the given proportion (e.g. 70:30); the whole procedure is repeated the specified number of times. <span class="option">Test on train data</span> uses the whole data set for training and then for testing. This method practically always gives overly optimistic results.</p>
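
<p>The cross-validation scheme above can be sketched in plain Python. This is a minimal illustration of the sampling logic only, not the widget's actual implementation; the <code>fit</code>/<code>predict</code> learner interface is a hypothetical stand-in for an Orange learner.</p>

<pre>
import random

def cross_validation_indices(n_examples, folds=10, seed=42):
    """Randomly assign each example to one of `folds` folds,
    keeping the fold sizes as equal as possible."""
    indices = [i % folds for i in range(n_examples)]
    random.Random(seed).shuffle(indices)
    return indices

def cross_validate(fit, predict, X, y, folds=10):
    """Hold out one fold at a time: induce the model on the remaining
    folds, classify the held-out examples, and return the overall
    classification accuracy."""
    fold_of = cross_validation_indices(len(X), folds)
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(X)) if fold_of[i] != fold]
        test = [i for i in range(len(X)) if fold_of[i] == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
</pre>

<p>Leave-one-out is the special case <code>folds = len(X)</code>, which is why it is so slow: one model is induced per example.</p>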

<p>The above methods use the data from the signal Data only. To test on another data set (for instance, one from another file, or data selected in another widget), we feed it into the input signal Separate Test Data and select <span class="option">Test on test data</span>.</p>

<p>Any change of the above settings is applied immediately if <span class="option">Applied on any change</span> is checked. If not, the user has to press <span class="option">Apply</span> for the changes to take effect.</p>

<p>The widget can compute a number of performance statistics.</p>
<ul>
<li><span class="option">Classification accuracy</span> is the proportion of correctly classified examples</li>
<li><span class="option">Sensitivity</span> (also called true positive rate (TPR), hit rate and recall) is the proportion of detected positive examples among all positive examples, e.g. the proportion of sick people correctly diagnosed as sick</li>
<li><span class="option">Specificity</span> is the proportion of detected negative examples among all negative examples, e.g. the proportion of healthy people correctly recognized as healthy</li>
<li><span class="option">Area under ROC</span> is the area under the receiver-operating curve</li>
<li><span class="option">Information score</span> is the average amount of information per classified instance, as defined by Kononenko and Bratko</li>
<li><span class="option">F-measure</span> is a weighted harmonic mean of precision and recall (see below), 2*precision*recall/(precision+recall)</li>
<li><span class="option">Precision</span> is the proportion of true positive examples among all examples classified as positive, e.g. the proportion of sick people among all those diagnosed as sick, or the proportion of relevant documents among all retrieved documents</li>
<li><span class="option">Recall</span> is the same measure as sensitivity, except that the latter term is more common in medicine, while recall comes from text mining, where it means the proportion of relevant documents that are retrieved</li>
<li><span class="option">Brier score</span> measures the accuracy of probability assessments: the average deviation between the predicted probabilities of events and the actual outcomes</li>
</ul>
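
<p>The count-based measures above follow directly from a binary confusion matrix, with the chosen target class taken as "positive". The sketch below is illustrative only (the widget computes these internally), and uses one common binary form of the Brier score.</p>

<pre>
def classification_scores(tp, fp, fn, tn):
    """Performance measures from binary confusion counts:
    tp/fp = examples (in)correctly classified as the target class,
    fn/tn = examples (in)correctly classified as non-target."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)      # = recall, true positive rate
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"CA": accuracy, "Sens": sensitivity, "Spec": specificity,
            "Prec": precision, "F": f_measure}

def brier_score(predicted_probabilities, actual_outcomes):
    """Average squared deviation between the predicted probability of
    the target class and the actual outcome (1 if it occurred, else 0)."""
    return sum((p - o) ** 2
               for p, o in zip(predicted_probabilities, actual_outcomes)
               ) / len(actual_outcomes)
</pre>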

<p>More comprehensive descriptions of the measures can be found at <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic">http://en.wikipedia.org/wiki/Receiver_operating_characteristic</a> (from classification accuracy to area under ROC),
<a href="http://www.springerlink.com/content/j21p620rw33xw773/">http://www.springerlink.com/content/j21p620rw33xw773/</a> (information score), <a href="http://en.wikipedia.org/wiki/F-measure#Performance_measures">http://en.wikipedia.org/wiki/F-measure#Performance_measures</a>
(from F-measure to recall) and <a href="http://en.wikipedia.org/wiki/Brier_score">http://en.wikipedia.org/wiki/Brier_score</a> (Brier score).</p>

<p>Most measures require a target class, e.g. having the disease or being relevant. The target class can be selected at the bottom of the widget.</p>

<h2>Example</h2>

<p>In a typical use of the widget, we give it a data set and a few learning algorithms, and we observe their performance in the table inside the Test Learners widget and in the ROC and Lift Curve widgets attached to it. The data is often preprocessed before testing; in this case we discretized it and did some manual feature selection. Note that this is done outside the cross-validation loop, so the testing results may be overly optimistic.</p>

<img class="schema" src="TestLearners-Schema.png"/>

<p>Another example of using this widget is given in the documentation for the widget <a href="ConfusionMatrix.htm">Confusion Matrix</a>.</p>

</body>
</html>