Files: 2 deleted, 61 edited

  • Orange/OrangeWidgets/Data/OWPythonScript.py

    r11330 → r11364

    @@ -10,4 +10,5 @@
     import code
     import keyword
    +import itertools

     from PyQt4.QtGui import (
    @@ -243,4 +244,57 @@
             pass

    +    def _moveCursorToInputLine(self):
    +        """
    +        Move the cursor to the input line if not already there. If the cursor
    +        is already in the input line (at a position greater than or equal to
    +        `newPromptPos`) it is left unchanged; otherwise it is moved to the
    +        end.
    +
    +        """
    +        cursor = self.textCursor()
    +        pos = cursor.position()
    +        if pos < self.newPromptPos:
    +            cursor.movePosition(QTextCursor.End)
    +            self.setTextCursor(cursor)
    +
    +    def pasteCode(self, source):
    +        """
    +        Paste source code into the console.
    +        """
    +        self._moveCursorToInputLine()
    +
    +        for line in interleave(source.splitlines(), itertools.repeat("\n")):
    +            if line != "\n":
    +                self.insertPlainText(line)
    +            else:
    +                self.write("\n")
    +                self.loop.next()
    +
    +    def insertFromMimeData(self, source):
    +        """
    +        Reimplemented from QPlainTextEdit.insertFromMimeData.
    +        """
    +        if source.hasText():
    +            self.pasteCode(unicode(source.text()))
    +            return
    +
    +
    +def interleave(seq1, seq2):
    +    """
    +    Interleave elements of `seq2` between consecutive elements of `seq1`.
    +
    +        >>> list(interleave([1, 3, 5], [2, 4]))
    +        [1, 2, 3, 4, 5]
    +
    +    """
    +    iterator1, iterator2 = iter(seq1), iter(seq2)
    +    leading = next(iterator1)
    +    for element in iterator1:
    +        yield leading
    +        yield next(iterator2)
    +        leading = element
    +
    +    yield leading
    +

     class Script(object):
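
    To see how the new pasteCode path consumes pasted text, here is a small
    standalone sketch (not part of the changeset) that reuses the interleave
    helper added above: each source line is inserted as plain text, and every
    interleaved "\n" marks the point where the console would write a newline
    and advance its interpreter loop. The fake_console_feed function is
    hypothetical, for illustration only.

        import itertools

        def interleave(seq1, seq2):
            """Interleave elements of `seq2` between consecutive elements of `seq1`.

            >>> list(interleave([1, 3, 5], [2, 4]))
            [1, 2, 3, 4, 5]
            """
            iterator1, iterator2 = iter(seq1), iter(seq2)
            leading = next(iterator1)
            for element in iterator1:
                yield leading
                yield next(iterator2)
                leading = element
            yield leading

        def fake_console_feed(source):
            """Hypothetical stand-in for pasteCode: report what the console would do."""
            for line in interleave(source.splitlines(), itertools.repeat("\n")):
                if line != "\n":
                    print("insert text: %r" % line)
                else:
                    # In the widget this is self.write("\n") followed by
                    # advancing the interpreter loop (self.loop.next()).
                    print("newline -> run the line just entered")

        fake_console_feed("x = 1\nprint(x)\ny = x + 1")
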
  • Orange/OrangeWidgets/Unsupervised/OWSOM.py

    r11319 → r11363

    @@ -144,5 +144,5 @@
                 self.send("SOM", None)
                 self.send("Learner", None)
    -            self.send("odebook vectors", None)
    +            self.send("Codebook vectors", None)

         def ApplySettings(self):
  • docs/tutorial/rst/classification.rst

    r11084 → r11361

    @@ -140,5 +140,5 @@
     .. literalinclude:: code/classification-models.py

    -   The logistic regression part of the output is:
    +The logistic regression part of the output is::

        class attribute = survived
  • docs/widgets/rst/associate/associationrules.rst

    r11050 r11359  
    2525----------- 
    2626 
    27 This widget runs several algorithms for induction of association rules: the original Agrawal's algorithm for sparse data sets, a modified version by Zupan and Demsar which is more suitable for the usual machine learning data, and, finally, an algorithm which induces classification rules where the right-hand side of the rule is always the class attribute. Don't confuse the latter algorithm with rule induction like `Association Rules Tree Viewer <AssociationRulesTreeViewer.htm>`_. 
     27This widget runs several algorithms for induction of association rules: 
     28the original Agrawal's algorithm for sparse data sets, a modified version 
     29by Zupan and Demsar which is more suitable for the usual machine learning 
     30data, and, finally, an algorithm which induces classification rules where 
     31the right-hand side of the rule is always the class attribute. 
    2832 
    2933.. image:: images/AssociationRules.png 
    3034   :alt: Association Rules Widget 
    3135 
    32 The first check box, :obj:`Induce classification rules` allows you to specify the algorithm to use. If checked, the original Agrawal's algorithm is used, which is designed for (very) sparse data sets. If clear, it will use an algorithm which works better on the usual machine learning data where each example is described by a (smaller) list of attributes and there are not many missing values. 
     36The first check box, :obj:`Induce classification rules` allows you to specify 
     37the algorithm to use. If checked, the original Agrawal's algorithm is 
     38used, which is designed for (very) sparse data sets. If clear, it will 
     39use an algorithm which works better on the usual machine learning data where 
     40each example is described by a (smaller) list of attributes and there are 
     41not many missing values. 
    3342 
    34 Next, you can decide whether to :obj:`Induce classification rules` or ordinary association rules. The former always have the class attribute (and nothing else) on the right-hand side. You can combine this with any of the above two algorithms. 
     43Next, you can decide whether to :obj:`Induce classification rules` or 
     44ordinary association rules. The former always have the class attribute 
     45(and nothing else) on the right-hand side. You can combine this with any 
     46of the above two algorithms. 
    3547 
    36 As for pruning, you can specify the :obj:`Minimal support` and :obj:`Minimal confidence`, where support is percentage of the entire data set covered by the (entire) rule and the confidence is the proportion of the number of examples which fit the right side among those that fit the left side. The running time depends primarily on the support. 
     48As for pruning, you can specify the :obj:`Minimal support` and 
      49:obj:`Minimal confidence`, where support is the percentage of the entire 
     50data set covered by the (entire) rule and the confidence is the proportion 
     51of the number of examples which fit the right side among those that fit the 
     52left side. The running time depends primarily on the support. 
    3753 
    38 If support is set too low, the algorithm may find too many rules and eventually run out of memory. For this reason the number of rules is by default limited to 10000. You can increase the limit at a risk of running out of memory. 
     54If support is set too low, the algorithm may find too many rules and 
     55eventually run out of memory. For this reason the number of rules is 
      56by default limited to 10000. You can increase the limit at the risk of 
     57running out of memory. 
    3958 
    40 :obj:`Build rules` runs the algorithm and outputs the induced rules. You need to push this button after changing any settings. 
     59:obj:`Build rules` runs the algorithm and outputs the induced rules. 
     60You need to push this button after changing any settings. 
    4161 
    4262 
     
    4464-------- 
    4565 
    46 This widget is typically used with the `Association Rules Tree Viewer <AssociationRulesTreeViewer.htm>`_ and/or `Association Rules Tree Viewer <AssociationRulesTreeViewer.htm>`_. One possible schema is shown below. 
     66This widget is typically used with the :ref:`Association Rules Filter` 
     67and/or :ref:`Association Rules Explorer`. One possible schema is shown below. 
    4768 
    4869.. image:: images/AssociationRules-Schema.png 
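
    The support and confidence thresholds described above reduce to simple
    ratios over rule coverage counts (B, L and N as defined in the related
    widget pages below). A minimal sketch, with made-up counts purely for
    illustration:

        def rule_measures(n_both, n_left, n_total):
            """Support and confidence as described above (B, L and N in the text)."""
            support = float(n_both) / n_total    # support = B / N
            confidence = float(n_both) / n_left  # confidence = B / L
            return support, confidence

        # Hypothetical counts: 1000 examples, 400 match the left-hand side,
        # 300 match both sides.
        print(rule_measures(n_both=300, n_left=400, n_total=1000))
        # -> (0.3, 0.75)
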
  • docs/widgets/rst/associate/associationrulesexplorer.rst

    r11050 r11359  
    2525This widget is a tree-like association rules viewer. 
    2626 
    27 The widget can be used to find all the rules that include a particular condition on the left-hand side. The below snapshot is made on the Titanic data set, using the filter as shown on the page on the `Association Rules Filter <AssociationRulesViewer.htm>`_. 
     27The widget can be used to find all the rules that include a particular 
     28condition on the left-hand side. The below snapshot is made on the Titanic 
     29data set, using the filter as shown on the page on the 
     30:ref:`Association Rules Filter`. 
    2831 
    2932.. image:: images/AssociationRulesTreeViewer-closed.png 
    3033   :alt: Association Rules Tree Viewer Widget 
    3134 
    32 Say that we are interested in rules regarding the survival of adults. By opening the branch "age=adult", we discover four rules that contain this condition as the sole condition on the left-hand side, while the right-hand sides contain different combinations of sex, status and survival. 
     35Say that we are interested in rules regarding the survival of adults. By 
     36opening the branch "age=adult", we discover four rules that contain this 
     37condition as the sole condition on the left-hand side, while the right-hand 
     38sides contain different combinations of sex, status and survival. 
    3339 
    3440.. image:: images/AssociationRulesTreeViewer-semi-open.png 
    3541   :alt: Association Rules Tree Viewer Widget 
    3642 
    37 Besides that, there are rules with (at least) two conditions on the left-hand side, "age=adult" and "sex=male"; to explore these rules, we would need to open the corresponding branch. 
     43Besides that, there are rules with (at least) two conditions on the 
     44left-hand side, "age=adult" and "sex=male"; to explore these rules, 
     45we would need to open the corresponding branch. 
    3846 
    39 Each leaf of the tree corresponds to one particular ordering of the left-hand side conditions in a particular rule. Turned around, this means that each rule appears in many places in the tree. As the completely open tree below shows, the rule :code:`age=adult & sex=male -> status=crew` appears in two places (the seventh and the eleventh row). 
     47Each leaf of the tree corresponds to one particular ordering of the 
     48left-hand side conditions in a particular rule. Turned around, this 
     49means that each rule appears in many places in the tree. As the completely 
     50open tree below shows, the rule :code:`age=adult & sex=male -> status=crew` 
     51appears in two places (the seventh and the eleventh row). 
    4052 
    41 On the left-hand side of the widget, we can choose the measures we want to observe. Let L, R and B be the number of examples that fit the left, the right and both sides of the rule, respectively, and N the total number of examples in the data set. The measures are then defined as follows 
     53On the left-hand side of the widget, we can choose the measures we want 
     54to observe. Let L, R and B be the number of examples that fit the left, 
     55the right and both sides of the rule, respectively, and N the total number 
     56of examples in the data set. The measures are then defined as follows 
    4257 
    4358   - confidence = B / L 
     
    5065 
    5166 
    52 :obj:`Tree depth` sets the depth to which the tree is expanded. If, it is set to, for instance, three, then the leaves corresponding to rules with five conditions will, besides the right-hand side, also contain the two conditions which are not shown in the branches. 
      67:obj:`Tree depth` sets the depth to which the tree is expanded. If it is set 
     68to, for instance, three, then the leaves corresponding to rules with five 
     69conditions will, besides the right-hand side, also contain the two conditions 
     70which are not shown in the branches. 
    5371 
    54 With :obj:`Display whole rules` we can decide whether we want the entire rule (including the information that is already contained in the branches, that is, in the path from the root to the rule) reprinted again in the leaf. 
     72With :obj:`Display whole rules` we can decide whether we want the entire rule 
     73(including the information that is already contained in the branches, that 
     74is, in the path from the root to the rule) reprinted again in the leaf. 
    5575 
    5676Examples 
    5777-------- 
    5878 
    59 This widget is typically used with the `Association Rules <AssociationRules.htm>`_ and possibly `Association Rules Filter <AssociationRulesViewer.htm>`_. A typical schema is shown below. 
     79This widget is typically used with the :ref:`Association Rules` and possibly 
     80:ref:`Association Rules Filter`. A typical schema is shown below. 
    6081 
    6182.. image:: images/AssociationRules-Schema.png 
  • docs/widgets/rst/associate/associationrulesfilter.rst

    r11050 r11359  
    66.. image:: ../icons/AssociationRulesFilter.png 
    77 
    8 A widget for printing out the rules and for their manual exploration and selection. 
     8A widget for printing out the rules and for their manual exploration and 
     9selection. 
    910 
    1011Signals 
     
    3031   - listing the induced rules and the corresponding measures of quality 
    3132 
    32 The columns of the grid on the left hand side of the widget represent different supports and the rows represents confidences. The scale is given above the grid: for the case in this snapshot, support goes from 28% to 75% and the confidence ranges from 29% to 100%. Each cell is colored according to the number of rules with that particular support and confidence - the darker the cell, the more rules are inside it. 
     33The columns of the grid on the left hand side of the widget represent 
      34different supports and the rows represent confidences. The scale is given 
     35above the grid: for the case in this snapshot, support goes from 28% to 75% 
     36and the confidence ranges from 29% to 100%. Each cell is colored according 
     37to the number of rules with that particular support and confidence - the 
     38darker the cell, the more rules are inside it. 
    3339 
    34 You can select a part of the grid. In the snapshot we selected the rules with supports between 36% and 66% and confidences from 33% to 82%. 
     40You can select a part of the grid. In the snapshot we selected the rules 
     41with supports between 36% and 66% and confidences from 33% to 82%. 
    3542 
    36 When the widget receives certain data, it shows the area containing all the rules. You can :obj:`Zoom` in to enlarge the selected part, push :obj:`Show All` to see the region with the rules again, or :obj:`No Zoom` to see the region with support and confidence from 0 to 1. :obj:`Unselect` removes the selection. 
     43When the widget receives certain data, it shows the area containing all 
     44the rules. You can :obj:`Zoom` in to enlarge the selected part, push 
     45:obj:`Show All` to see the region with the rules again, or :obj:`No Zoom` 
     46to see the region with support and confidence from 0 to 1. :obj:`Unselect` 
     47removes the selection. 
    3748 
    3849If nothing is selected, the widget outputs all rules. 
    3950 
    40 On the right hand side of the widget there is a list of the selected rules. The checkboxes above can be used to select the measures of rule quality that we are interested in. Let L, R and B be the number of examples that fit the left, the right and both sides of the rule, respectively, and N the total number of examples in the data set. The measures are then defined as follows 
     51On the right hand side of the widget there is a list of the selected rules. 
     52The checkboxes above can be used to select the measures of rule quality that 
     53we are interested in. Let L, R and B be the number of examples that fit the 
     54left, the right and both sides of the rule, respectively, and N the total 
     55number of examples in the data set. The measures are then defined as follows 
    4156 
    4257   - support = B / N 
     
    4964 
    5065 
    51 With the buttons below you can :obj:`Save Rules` into a tab-delimited file or :obj:`Send Rules` to the widget connected to the output. The latter is only enabled if :obj:`Send rules automatically` is unchecked. When it is checked, the rules are put on the widget's output automatically at every selection change. 
     66With the buttons below you can :obj:`Save Rules` into a tab-delimited file 
     67or :obj:`Send Rules` to the widget connected to the output. The latter is 
     68only enabled if :obj:`Send rules automatically` is unchecked. When it is 
     69checked, the rules are put on the widget's output automatically at every 
     70selection change. 
    5271 
    5372Examples 
    5473-------- 
    5574 
    56 This widget is used with the `Association Rules <AssociationRules.htm>`_ and maybe with a tree-like `Association Rules Tree Viewer <AssociationRulesTreeViewer.htm>`_. The typical schema is shown below. 
     75This widget is used with the :ref:`Association Rules` and maybe with a 
      76tree-like :ref:`Association Rules Explorer`. The typical schema is shown below. 
    5777 
    5878.. image:: images/AssociationRules-Schema.png 
  • docs/widgets/rst/classify/c45.rst

    r11050 r11359  
    3333 
    3434 
    35 :code:`Classifier`, :code:`C45 Tree` and :code:`Classification Tree` are available only if examples are present on the input. Which of the latter two output signals is active is determined by setting :obj:`Convert to orange tree structure` (see the description below. 
     35:code:`Classifier`, :code:`C45 Tree` and :code:`Classification Tree` are 
     36available only if examples are present on the input. Which of the latter two 
     37output signals is active is determined by setting 
     38:obj:`Convert to orange tree structure` (see the description below). 
    3639 
    3740Description 
    3841----------- 
    3942 
    40 This widget provides a graphical interface to the well-known Quinlan's C4.5 algorithm for construction of classification tree. Orange uses the original Quinlan's code which must be, due to copyright issues, built and linked in separately. 
     43This widget provides a graphical interface to the well-known Quinlan's C4.5 
      44algorithm for the construction of classification trees. Orange uses the original 
     45Quinlan's code which must be, due to copyright issues, built and linked in 
     46separately. 
    4147 
    42 Orange also implements its own classification tree induction algorithm which is comparable to Quinlan's, though the results may differ due to technical details. It is accessible in widget :code:`Classification Tree`. 
     48Orange also implements its own classification tree induction algorithm which 
     49is comparable to Quinlan's, though the results may differ due to technical 
     50details. It is accessible in widget :ref:`Classification Tree`. 
    4351 
    44 As all widgets for classification, C4.5 widget provides learner and classifier on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, namely :code:`Test Learners`. Classifier is a classification tree build from the training examples on the input. If examples are not given, the widget outputs no classifier. 
     52As all widgets for classification, C4.5 widget provides learner and classifier 
     53on the output. Learner is a learning algorithm with settings as specified by 
     54the user. It can be fed into widgets for testing learners, namely 
      55:ref:`Test Learners`. Classifier is a classification tree built from the 
     56training examples on the input. If examples are not given, the widget outputs 
     57no classifier. 
    4558 
    4659.. image:: images/C4.5.png 
    4760   :alt: C4.5 Widget 
    4861 
    49 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "C4.5". 
     62Learner can be given a name under which it will appear in, say, 
     63:ref:`Test Learners`. The default name is "C4.5". 
    5064 
    51 The next block of options deals with splitting. C4.5 uses gain ratio by default; to override this, check :obj:`Use information gain instead of ratio`, which is equivalent to C4.5's command line option :code:`-g`. If you enable :obj:`subsetting` (equivalent to :code:`-s`), C4.5 will merge values of multivalued discrete attributes instead of creating one branch for each node. :obj:`Probabilistic threshold for continuous attributes` (:code:`-p`) makes C4.5 compute the lower and upper boundaries for values of continuous attributes for which the number of misclassified examples would be within one standard deviation from the base error. 
     65The next block of options deals with splitting. C4.5 uses gain ratio by 
     66default; to override this, check :obj:`Use information gain instead of ratio`, 
     67which is equivalent to C4.5's command line option :code:`-g`. If you enable 
     68:obj:`subsetting` (equivalent to :code:`-s`), C4.5 will merge values of 
     69multivalued discrete attributes instead of creating one branch for each node. 
     70:obj:`Probabilistic threshold for continuous attributes` (:code:`-p`) makes 
     71C4.5 compute the lower and upper boundaries for values of continuous attributes 
     72for which the number of misclassified examples would be within one standard 
     73deviation from the base error. 
    5274 
    53 As for pruning, you can set the :obj:`Minimal number of examples in the leaves` (Quinlan's default is 2, but you may want to disable this for noiseless data), and the :obj:`Post prunning with confidence level`; the default confidence is 25. 
     75As for pruning, you can set the :obj:`Minimal number of examples in the leaves` 
     76(Quinlan's default is 2, but you may want to disable this for noiseless data), 
     77and the :obj:`Post prunning with confidence level`; the default confidence is 
     7825. 
    5479 
    55 Trees can be constructed iteratively, with ever larger number of examples. If enable, you can set the :obj:`Number of trials`, the :obj:`initial windows size` and :obj:`window increment`. 
      80Trees can be constructed iteratively, with an ever larger number of examples. If 
      81enabled, you can set the :obj:`Number of trials`, the 
     82:obj:`initial windows size` and :obj:`window increment`. 
    5683 
    57 The resulting classifier can be left in the original Quinlan's structure, as returned by his underlying code, or :obj:`converted to orange the structure` that is used by Orange's tree induction algorithm. This setting decides which of the two signals that output the tree - :code:`C45 Classifier` or :code:`Tree Classifier` will be active. As Orange's structure is more general and can easily accommodate all the data that C4.5 tree needs for classification, we believe that the converted tree behave exactly the same as the original tree, so the results should not depend on this setting. You should therefore leave it enabled since only the converted trees can be shown in the tree displaying widgets. 
     84The resulting classifier can be left in the original Quinlan's structure, as 
      85returned by his underlying code, or :obj:`converted to orange tree structure` 
     86that is used by Orange's tree induction algorithm. This setting decides which 
     87of the two signals that output the tree - :code:`C45 Classifier` or 
     88:code:`Tree Classifier` will be active. As Orange's structure is more general 
     89and can easily accommodate all the data that C4.5 tree needs for 
      90classification, we believe that the converted tree behaves exactly the same as 
     91the original tree, so the results should not depend on this setting. You should 
     92therefore leave it enabled since only the converted trees can be shown in the 
     93tree displaying widgets. 
    5894 
    59 When you change one or more settings, you need to push :obj:`Apply`; this will put the new learner on the output and, if the training examples are given, construct a new classifier and output it as well. 
     95When you change one or more settings, you need to push :obj:`Apply`; this will 
     96put the new learner on the output and, if the training examples are given, 
     97construct a new classifier and output it as well. 
    6098 
    6199 
     
    63101-------- 
    64102 
    65 There are two typical uses of this widget. First, you may want to induce the tree and see what it looks like, like in the schema on the right. 
     103There are two typical uses of this widget. First, you may want to induce the 
     104tree and see what it looks like, like in the schema on the right. 
    66105 
    67106.. image:: images/C4.5-SchemaClassifier2.png 
    68107   :alt: C4.5 - Schema with a Classifier 
    69108 
    70 The second schema shows how to compare the results of C4.5 learner with another classifier, naive Bayesian Learner. 
     109The second schema shows how to compare the results of C4.5 learner with another 
     110classifier, naive Bayesian Learner. 
    71111 
    72112.. image:: images/C4.5-SchemaLearner.png 
  • docs/widgets/rst/classify/classificationtree.rst

    r11050 r11359  
    2121 
    2222   - Learner 
    23       The classification tree learning algorithm with settings as specified in the dialog. 
     23      The classification tree learning algorithm with settings as specified in 
     24      the dialog. 
    2425 
    2526   - Classification Tree 
     
    2728 
    2829 
    29 Signal :code:`Classification Tree` sends data only if the learning data (signal :code:`Classified Examples` is present. 
     30Signal :code:`Classification Tree` sends data only if the learning data 
      31(signal :code:`Classified Examples`) is present. 
    3032 
    3133Description 
    3234----------- 
    3335 
    34 This widget provides a graphical interface to the classification tree learning algorithm. 
     36This widget provides a graphical interface to the classification tree learning 
     37algorithm. 
    3538 
    36 As all widgets for classification, this widget provides a learner and classifier on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, for instance :code:`Test Learners`. Classifier is a Classification Tree Classifier (a subtype of a general classifier), built from the training examples on the input. If examples are not given, there is no classifier on the output. 
     39As all widgets for classification, this widget provides a learner and 
     40classifier on the output. Learner is a learning algorithm with settings 
     41as specified by the user. It can be fed into widgets for testing learners, 
     42for instance :ref:`Test Learners`. Classifier is a Classification Tree 
     43Classifier (a subtype of a general classifier), built from the training 
     44examples on the input. If examples are not given, there is no classifier on 
     45the output. 
    3746 
    3847.. image:: images/ClassificationTree.png 
    3948   :alt: Classification Tree Widget 
    4049 
    41 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "Classification Tree". 
     50Learner can be given a name under which it will appear in, say, 
     51:ref:`Test Learners`. The default name is "Classification Tree". 
    4252 
    43 The first block of options deals with the :obj:`Attribute selection criterion`, where you can choose between the information gain, gain ratio, gini index and ReliefF. For the latter, it is possible to :obj:`Limit the number of reference examples` (more examples give more accuracy and less speed) and the :obj:`Number of neighbours` considered in the estimation. 
     53The first block of options deals with the :obj:`Attribute selection criterion`, 
     54where you can choose between the information gain, gain ratio, gini index and 
     55ReliefF. For the latter, it is possible to :obj:`Limit the number of reference 
     56examples` (more examples give more accuracy and less speed) and the 
     57:obj:`Number of neighbours` considered in the estimation. 
    4458 
    45 If :code:`Binarization` is checked, the values of multivalued attributes are split into two groups (based on the statistics in the particular node) to yield a binary tree. Binarization gets rid of the usual measures' bias towards attributes with more values and is generally recommended. 
     59If :code:`Binarization` is checked, the values of multivalued attributes 
     60are split into two groups (based on the statistics in the particular node) 
     61to yield a binary tree. Binarization gets rid of the usual measures' 
     62bias towards attributes with more values and is generally recommended. 
    4663 
    47 Pruning during induction can be based on the :obj:`Minimal number of instance in leaves`; if checked, the algorithm will never construct a split which would put less than the specified number of training examples into any of the branches. You can also forbid the algorithm to split the nodes with less than the given number of instances (:obj:`Stop splitting nodes with less instances than`)or the nodes with a large enough majority class (:obj:`Stop splitting nodes with a majority class of (%)`. 
     64Pruning during induction can be based on the :obj:`Minimal number of 
     65instance in leaves`; if checked, the algorithm will never construct a split 
     66which would put less than the specified number of training examples into any 
     67of the branches. You can also forbid the algorithm to split the nodes with 
     68less than the given number of instances (:obj:`Stop splitting nodes with 
      69less instances than`) or the nodes with a large enough majority class 
      70(:obj:`Stop splitting nodes with a majority class of (%)`). 
    4871 
    49 During induction, the algorithm can produce a tree in which entire subtrees predict the same class, but with different probabilities. This can increase probability based measures of classifier quality, like the Brier score or AUC, but the trees tend to be much larger and more difficult to grasp. To avoid it, tell it to :obj:`Recursively merge the leaves with same majority class`. The widget also supports :obj:`pruning with m-estimate`. 
     72During induction, the algorithm can produce a tree in which entire subtrees 
     73predict the same class, but with different probabilities. This can increase 
     74probability based measures of classifier quality, like the Brier score 
     75or AUC, but the trees tend to be much larger and more difficult to grasp. 
     76To avoid it, tell it to :obj:`Recursively merge the leaves with same 
     77majority class`. The widget also supports :obj:`pruning with m-estimate`. 
    5078 
    51 After changing one or more settings, you need to push :obj:`Apply`, which will put the new learner on the output and, if the training examples are given, construct a new classifier and output it as well. 
     79After changing one or more settings, you need to push :obj:`Apply`, which 
     80will put the new learner on the output and, if the training examples are 
     81given, construct a new classifier and output it as well. 
    5282 
    53 The tree can deal with missing data. Orange's tree learner actually supports quite a few methods for that, but when used from canvas, it effectively splits the example into multiple examples with different weights. If you had data with 25% males and 75% females, then when the gender is unknown, the examples splits into two, a male and a female with weights .25 and .75, respectively. This goes for both learning and classification. 
     83The tree can deal with missing data. Orange's tree learner actually 
     84supports quite a few methods for that, but when used from canvas, 
     85it effectively splits the example into multiple examples with different 
     86weights. If you had data with 25% males and 75% females, then when the 
      87gender is unknown, the example splits into two, a male and a female 
     88with weights .25 and .75, respectively. This goes for both learning 
     89and classification. 
    5490 
    5591Examples 
    5692-------- 
    5793 
    58 There are two typical uses of this widget. First, you may want to induce the model and check what it looks like. You do it with the schema below; to learn more about it, see the documentation on `Classification Tree Graph <ClassificationTreeGraph.htm>`_. 
     94There are two typical uses of this widget. First, you may want to induce 
     95the model and check what it looks like. You do it with the schema below; 
     96to learn more about it, see the documentation on :ref:`Classification Tree 
      97Graph`. 
    5998 
    6099.. image:: images/ClassificationTreeGraph-SimpleSchema-S.gif 
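
    The missing-value handling described above (splitting an example into
    weighted copies, one per possible value) can be illustrated with a short
    standalone sketch. The 25%/75% gender distribution comes from the paragraph
    above; the function name and data structures are made up for illustration
    and are not the widget's implementation.

        def split_by_distribution(example, attribute, value_distribution, weight=1.0):
            """Replace a missing value with weighted copies of the example.

            value_distribution maps each possible value to its relative frequency
            in the training data (e.g. 25% male / 75% female).
            """
            copies = []
            for value, proportion in value_distribution.items():
                completed = dict(example)
                completed[attribute] = value
                copies.append((completed, weight * proportion))
            return copies

        # An example with an unknown gender becomes two weighted examples.
        example = {"age": "adult", "gender": None}
        for copy, w in split_by_distribution(example, "gender",
                                             {"male": 0.25, "female": 0.75}):
            print("%r weight=%.2f" % (copy, w))
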
  • docs/widgets/rst/classify/classificationtreegraph.rst

    r11050 r11359  
    1818Outputs: 
    1919   - Examples (ExampleTable) 
    20    - Attribute-valued data set associated with a classification tree node selected by the user. 
     20      Attribute-valued data set associated with a classification tree node 
     21      selected by the user. 
    2122 
    2223 
     
    3637 
    3738General Tab 
     39----------- 
    3840 
    3941Several general parameters that affect the drawing size of the 
     
    4547 
    4648Tree Tab 
    47  
     49-------- 
    4850 
    4951.. image:: images/ClassificationTreeGraph-TreeTab.png 
     
    6466the nodes with respect to the instances in their parent node. 
    6567 
    66  
    67  
    6868Node Tab 
     69-------- 
    6970 
    7071.. image:: images/ClassificationTreeGraph-NodeTab-S.png 
     
    8485 
    8586   - may be uniform (:obj:`Node Color` set to :obj:`Default`), 
    86    - correspond to number of instances in the node with respect to the number of instances in the root node (:obj:`Instances in node`), 
    87    - may report on the probability of the majority class (:obj:`Majority class probability`) where one would expect that the color intensity would be higher towards the leaves of the node, 
    88    - may report on probability of the target class (:obj:`Target class probability`), with more intense colors marking the nodes where instances of target class are more frequent, and 
     87   - correspond to number of instances in the node with respect to the number 
     88     of instances in the root node (:obj:`Instances in node`), 
     89   - may report on the probability of the majority class 
     90     (:obj:`Majority class probability`) where one would expect that the color 
     91     intensity would be higher towards the leaves of the node, 
     92   - may report on probability of the target class (:obj:`Target class 
     93     probability`), with more intense colors marking the nodes where instances 
     94     of target class are more frequent, and 
    8995   - may report on the distribution of instances with target class, 
    90  
    91 where the intensity of node color corresponds to proportion of the 
    92 target class instances in the node with respect to the target class 
    93 instances in the root node (:obj:`Target class distribution`). 
     96     where the intensity of node color corresponds to proportion of the 
     97     target class instances in the node with respect to the target class 
     98     instances in the root node (:obj:`Target class distribution`). 
    9499 
    95100 
    96101Navigation 
     102---------- 
    97103 
    98104:obj:`Find Root` aligns the position of the window such that the 
  • docs/widgets/rst/classify/classificationtreeviewer.rst

    r11050 r11359  
    2424 
    2525 
    26 Signal :code:`Classified Examples` sends data only if some tree node is selected and contains some examples. 
     26Signal :code:`Classified Examples` sends data only if some tree node is 
     27selected and contains some examples. 
    2728 
    2829Description 
    2930----------- 
    3031 
    31 The widget shows the tree as a hierarchy in a textual form. Although less fancy than the graphical viewer, much more information fits in a Tree Viewer's window than in the graphical viewer's one. 
     32The widget shows the tree as a hierarchy in a textual form. Although less 
     33fancy than the graphical viewer, much more information fits in a Tree Viewer's 
     34window than in the graphical viewer's one. 
    3235 
    3336.. image:: images/ClassificationTreeViewer.png 
    3437   :alt: Classification Tree Viewer widget 
    3538 
    36 The widget's options allow choosing the columns to be displayed, setting the general depth of the display (individual nodes can, of course, be opened and closed manually). 
     39The widget's options allow choosing the columns to be displayed, setting the 
     40general depth of the display (individual nodes can, of course, be opened and 
     41closed manually). 
    3742 
    38 On the right hand side there is the tree, where it is possible to select a single node. The rule describing the particular node is shown at the bottom, and the training examples belonging to the node are put on the output. 
     43On the right hand side there is the tree, where it is possible to select a 
     44single node. The rule describing the particular node is shown at the bottom, 
     45and the training examples belonging to the node are put on the output. 
    3946 
    4047 
     
    4249-------- 
    4350 
    44 The obvious use for the widget browsing the induced tree. Orange Canvas, however, offers something more exciting: the widget can be used for exploring the examples belonging to a certain node through using any other widget. If you are a beginner: this widget's output behaves exactly like the output of the file widget, with the exception that instead of reading the examples from a file, it gets them from the tree. 
      51The obvious use for the widget is browsing the induced tree. Orange Canvas, 
     52however, offers something more exciting: the widget can be used for exploring 
     53the examples belonging to a certain node through using any other widget. 
     54If you are a beginner: this widget's output behaves exactly like the output 
     55of the file widget, with the exception that instead of reading the examples 
     56from a file, it gets them from the tree. 
    4557 
    4658.. image:: images/ClassificationTreeViewer-Schema.png 
    4759   :alt: A schema with Classification Tree Viewer 
    4860 
    49 In the above example we constructed a tree from the Wisconsin breast cancer data. We explore the tree nodes by feeding the examples in the Data Table (boring!), by checking the attribute value distributions for each node and seeing where the examples lie in the Scatter plot. To make the latter more informative, the Scatter plot shows the entire data set (examples from the File Widget), but marking - by plotting solid symbols instead of hollow ones - the examples belonging to the particular tree node, as shown in the snapshot below. 
     61In the above example we constructed a tree from the Wisconsin breast cancer 
     62data. We explore the tree nodes by feeding the examples in the Data Table, 
     63by checking the attribute value distributions for each node and 
     64seeing where the examples lie in the Scatter plot. To make the latter more 
     65informative, the Scatter plot shows the entire data set (examples from the 
     66File Widget), but marking - by plotting solid symbols instead of hollow ones 
     67- the examples belonging to the particular tree node, as shown in the 
     68snapshot below. 
    5069 
    5170.. image:: images/ClassificationTreeViewer-Example-S.png 
  • docs/widgets/rst/classify/cn2.rst

    r11050 r11359  
    3030 
    3131 
    32 Use this widget to learn a set of if-then rules from data. The algorithm is based on CN2 algorithm, however the variety of options in widget allows user to implement different kinds of cover-and-remove rule learning algorithms. 
     32Use this widget to learn a set of if-then rules from data. The algorithm 
      33is based on the CN2 algorithm; however, the variety of options in the widget 
      34allows the user to implement different kinds of cover-and-remove rule learning 
     35algorithms. 
    3336 
    3437.. image:: images/CN2.png 
    3538   :alt: CN2 Widget 
    3639 
    37 In the first box user can select between three evaluation functions. The first, :obj:`Laplace`, was originally used in CN2 algorithm. The second function is :obj:`m-estimate` of probability (used in later versions of CN2) and the last is :obj:`WRACC` (weighted relative accuracy), used in CN2-SD algorithm. 
     40In the first box user can select between three evaluation functions. The 
     41first, :obj:`Laplace`, was originally used in CN2 algorithm. The second 
     42function is :obj:`m-estimate` of probability (used in later versions of 
     43CN2) and the last is :obj:`WRACC` (weighted relative accuracy), used 
     44in CN2-SD algorithm. 
    3845 
    39 In the second box the user can define pre-prunning of rules. The first parameter, :obj:`Alpha (vs. default rule)`, is a parameter of LRS (likelihood ratio statistics). Alpha determines required significance of a rule when compared to the default rule. The second parameter, :obj:`Stopping Alpha (vs. parent rule)`, is also the parameter of LRS, only that in this case the rule is compared to its parent rule: it verifies whether the last specialization of the rule is significant enough. The third parameter, :obj:`Minimum coverage` specifies the minimal number of examples that each induced rule must cover. The last parameter, :obj:`Maximal rule length` limits the length of induced rules. 
      46In the second box the user can define pre-pruning of rules. The first 
     47parameter, :obj:`Alpha (vs. default rule)`, is a parameter of LRS 
     48(likelihood ratio statistics). Alpha determines required significance of 
     49a rule when compared to the default rule. The second parameter, 
     50:obj:`Stopping Alpha (vs. parent rule)`, is also the parameter of LRS, 
     51only that in this case the rule is compared to its parent rule: it verifies 
     52whether the last specialization of the rule is significant enough. 
     53The third parameter, :obj:`Minimum coverage` specifies the minimal number 
     54of examples that each induced rule must cover. The last parameter, 
     55:obj:`Maximal rule length` limits the length of induced rules. 
    4056 
    41  :obj:`Beam width` is the number of best rules that are, in each step, further specialized. Other rules are discarded. 
      57:obj:`Beam width` is the number of best rules that are, in each step, 
      58further specialized. Other rules are discarded. 
    4259 
    43 Covering and removing examples can be done in two different ways. :obj:`Exclusive covering`, as in the original CN2, removes all covered examples and continues learning on remaining examples. Alternative type of covering is :obj:`weighted covering`, which only decreases weight of covered examples instead of removing them. The parameter of weighted covering is the multiplier; the weights of all covered examples are multiplied by this number. 
     60Covering and removing examples can be done in two different ways. 
     61:obj:`Exclusive covering`, as in the original CN2, removes all covered 
      62examples and continues learning on the remaining examples. An alternative type of 
      63covering is :obj:`weighted covering`, which only decreases the weight of covered 
     64examples instead of removing them. The parameter of weighted covering is 
     65the multiplier; the weights of all covered examples are multiplied by this 
     66number. 
    4467 
    45 Any changes of arguments must be confirmed by pushing :obj:`Apply` before they are propagated through the schema. 
     68Any changes of arguments must be confirmed by pushing :obj:`Apply` before 
     69they are propagated through the schema. 
    4670 
    4771 
     
    5074-------- 
    5175 
    52 The figure shows a simple use of the widget. Rules are learned with CN2 widget and the classifier is sent to the Rules Viewer. 
     76The figure shows a simple use of the widget. Rules are learned with 
     77CN2 widget and the classifier is sent to the :ref:`CN2 Rules Viewer`. 
    5378 
    5479.. image:: images/CN2-Interaction-S.png 
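
    The difference between exclusive and weighted covering described above can
    be sketched in a few lines of plain Python. This is illustrative logic only,
    not the widget's implementation; rule_covers is a hypothetical predicate and
    the data is made up.

        def exclusive_covering(examples, rule, rule_covers):
            """Remove all examples covered by the rule (original CN2 behaviour)."""
            return [(ex, w) for ex, w in examples if not rule_covers(rule, ex)]

        def weighted_covering(examples, rule, rule_covers, multiplier=0.5):
            """Keep covered examples but multiply their weights (CN2-SD style)."""
            return [(ex, w * multiplier if rule_covers(rule, ex) else w)
                    for ex, w in examples]

        # Tiny illustration: the "rule" covers even numbers.
        examples = [(x, 1.0) for x in range(6)]
        covers = lambda rule, ex: ex % 2 == 0
        print(exclusive_covering(examples, "even", covers))  # odd examples remain
        print(weighted_covering(examples, "even", covers))   # even examples down-weighted
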
  • docs/widgets/rst/classify/cn2rulesviewer.rst

    r11050 r11359  
    2020 
    2121   - Examples (ExampleTable) 
    22       Attribute-valued data set associated with a rule (or rules) selected by the user. 
     22      Attribute-valued data set associated with a rule (or rules) selected 
     23      by the user. 
    2324 
    2425 
     
    2829 
    2930 
    30 This widget visualizes rules learned by rule learning widgets (e.g. CN2). The viewer can, along pre-conditions and prediction of rule, show several rule properties like quality, number of covered learning examples, length, and the class distribution among the covered examples. These criteria can also be used for sorting the list of rules. 
     31This widget visualizes rules learned by rule learning widgets (e.g. CN2). 
      32The viewer can, along with the pre-conditions and prediction of a rule, show several 
     33rule properties like quality, number of covered learning examples, length, 
     34and the class distribution among the covered examples. These criteria can 
     35also be used for sorting the list of rules. 
    3136 
    3237 
    33 The widget also allows selecting one or more rules, in which case it outputs the examples covered by this (these) rules. The signal is sent immediately if :obj:`Commit on change` is checked; otherwise the user needs to push :obj:`Commit`. If :obj:`Selected attributes only` is checked, the output examples are described only by the attributes appearing in the selected rules. 
     38The widget also allows selecting one or more rules, in which case it outputs 
     39the examples covered by this (these) rules. The signal is sent immediately if 
     40:obj:`Commit on change` is checked; otherwise the user needs to push 
     41:obj:`Commit`. If :obj:`Selected attributes only` is checked, the output 
     42examples are described only by the attributes appearing in the selected rules. 
    3443 
    35 In the snapshot below, the widget outputs the five examples covered by the two selected rules, and although the example table originally contains four attributes (petal and sepal width and length - this is the Iris data set), the example on the output will only be described by the petal width and length as these are the only two attributes appearing in the selected rules and :obj:`Selected attributes only` is checked. 
     44In the snapshot below, the widget outputs the five examples covered by the 
     45two selected rules, and although the example table originally contains four 
     46attributes (petal and sepal width and length - this is the Iris data set), 
      47the examples on the output will only be described by the petal width and length 
     48as these are the only two attributes appearing in the selected rules and 
     49:obj:`Selected attributes only` is checked. 
    3650 
    3751 
     
    4357-------- 
    4458 
    45 For an example of widget's use in canvas see `documentation about CN2 widget <cn2.htm>`_. 
     59For an example of widget's use in canvas see 
     60:ref:`documentation about CN2 widget <CN2 Rules>`. 
  • docs/widgets/rst/classify/interactivetreebuilder.rst

    r11050 r11359  
    3030 
    3131   - Tree Learner (orange.Learner) 
    32       A learner which always returns the same tree - the one constructed in the widget 
     32      A learner which always returns the same tree - the one constructed in 
     33      the widget 
    3334 
    3435 
    35 Signal :code:`Examples` sends data only if some tree node is selected and contains some examples. 
     36Signal :code:`Examples` sends data only if some tree node is selected and 
     37contains some examples. 
    3638 
    3739Description 
    3840----------- 
    3941 
    40 This is a very exciting widget which is useful for teaching induction of classification trees and also in practice, where a data miner and an area expert can use it to manually construct a classification tree helped by the entire Orange's widgetry. 
     42This is a very exciting widget which is useful for teaching induction of 
     43classification trees and also in practice, where a data miner and an area 
      44expert can use it to manually construct a classification tree helped by 
      45Orange's entire widgetry. 
    4146 
    42 The widget is based on `Classification Tree Viewer <ClassificationTreeViewer.htm>`_. It is mostly the same (so you are encouraged to read the related documentation), except for the different input/output signals and the addition of a few buttons. 
     47The widget is based on :ref:`Classification Tree Viewer`. It is mostly the 
     48same (so you are encouraged to read the related documentation), except for 
     49the different input/output signals and the addition of a few buttons. 
    4350 
    4451.. image:: images/InteractiveTreeBuilder.png 
    4552   :alt: Interactive Tree Builder widget 
    4653 
    47 Button :obj:`Split` splits the selected tree node according to the criterion above the button. For instance, if we pressed Split in the above widget, the animals that don't give milk and have no feathers (the pictures shows a tree for the zoo data set) would be split according to whether they are :code:`aquatic` or not. In case of continuous attributes, a cut off point needs to be specified as well. 
     54Button :obj:`Split` splits the selected tree node according to the criterion 
     55above the button. For instance, if we pressed Split in the above widget, 
     56the animals that don't give milk and have no feathers (the pictures shows 
     57a tree for the zoo data set) would be split according to whether they are 
     58:code:`aquatic` or not. In case of continuous attributes, a cut off point 
     59needs to be specified as well. 
    4860 
    49 If Split is used on a node which is not a leaf, the criterion at that node is replaced. If we, for instance, selected the &lt;root&gt; node and pushed Split, the criterion :code:`milk` would be replaced with :code:`aquatic` and the nodes below (:code:`feathers`) are removed. 
     61If Split is used on a node which is not a leaf, the criterion at that node 
      62is replaced. If we, for instance, selected the <root> node and pushed 
     63Split, the criterion :code:`milk` would be replaced with :code:`aquatic` 
     64and the nodes below (:code:`feathers`) are removed. 
    5065 
    51 Button :obj:`Cut` cuts the tree at the selected node. If we pushed Cut in the situation in the picture, nothing would happen since the selected node (:code:`feathers=0`) is already a leaf. If we selected :code:`&lt;root&gt;` and pushed Cut, the entire tree would be cut off. 
     66Button :obj:`Cut` cuts the tree at the selected node. If we pushed Cut 
     67in the situation in the picture, nothing would happen since the selected 
     68node (:code:`feathers=0`) is already a leaf. If we selected :code:`<root>` 
     69and pushed Cut, the entire tree would be cut off. 
    5270 
    53 Cut is especially useful in combination with :code:`Build` which builds a subtree at the current node. So, if we push Build in the situation depicted above, a subtree would be built for the milkless featherless animals, leaving the rest of the tree (that is, the existing two nodes) intact. If Build is pressed at a node which is not leaf, the entire subtree at that node is replaced with an automatically induced tree. 
     71Cut is especially useful in combination with :code:`Build` which builds 
     72a subtree at the current node. So, if we push Build in the situation 
     73depicted above, a subtree would be built for the milkless featherless 
     74animals, leaving the rest of the tree (that is, the existing two nodes) 
      75intact. If Build is pressed at a node which is not a leaf, the entire subtree 
     76at that node is replaced with an automatically induced tree. 
    5477 
    55 Build uses some reasonable default parameters for tree learning (information gain ratio is used for attribute selection with a minimum of 2 examples per leaf, which gives an algorithm equivalent to Quinlan's C4.5). To gain more control on the tree construction arguments, use a `Classification Tree widget <ClassificationTree.htm>`_ or `C4.5 <C4.5.htm>`_ widget, set its parameters and connect it to the input of Interactive Tree Builder. The set parameters will the be used for the tree induction. (If you use C4.5, the original Quinlan's algorithm, don't forget to check :obj:`Convert to orange tree structure`.) 
     78Build uses some reasonable default parameters for tree learning (information 
     79gain ratio is used for attribute selection with a minimum of 2 examples per 
     80leaf, which gives an algorithm equivalent to Quinlan's C4.5). To gain more 
     81control on the tree construction arguments, use a :ref:`Classification Tree` 
     82widget or :ref:`C4.5` widget, set its parameters and connect it to the 
      83input of Interactive Tree Builder. The set parameters will then be used for 
     84the tree induction. (If you use C4.5, the original Quinlan's algorithm, 
     85don't forget to check :obj:`Convert to orange tree structure`.) 
    5686 
    57 The widget has several outputs. :obj:`Examples` gives, as in `Classification Tree Viewer <ClassificationTreeViewer.htm>`_ the list of examples from the selected node. This output can be used to observe the statistical properties or visualizations of various attributes for a specific node, based on which we should decide whether we should split the examples and how. 
     87The widget has several outputs. :obj:`Examples` gives, as in 
      88:ref:`Classification Tree Viewer`, the list of examples from the selected node. 
     89This output can be used to observe the statistical properties or 
     90visualizations of various attributes for a specific node, based on which 
     91we should decide whether we should split the examples and how. 
    5892 
    59 Signal :obj:`Classification Tree` can be attached to another tree viewer. Using a Classification Tree Viewer is not really useful as it will show the same picture as Interactive Tree Builder. We can however connect the more colorful `Classification Tree Graph <ClassificationTreeGraph.htm>`_. 
     93Signal :obj:`Classification Tree` can be attached to another tree viewer. 
     94Using a :ref:`Classification Tree Viewer` is not really useful as it will 
     95show the same picture as Interactive Tree Builder. We can however connect 
     96the more colorful :ref:`Classification Tree Graph`. 
    6097 
    61 The last output is :obj:`Tree Learner`. This is a tree learner which always gives the same tree - the one we constructed in this widget. This can be used to assess the tree's quality with the `Test Learners <../Evaluate/TestLearners.htm>`_ widget. This requires some caution, though: you should not test the tree on the same data you used to induce it. See the Examples section below for the correct procedure. 
     98The last output is :obj:`Tree Learner`. This is a tree learner which always 
     99gives the same tree - the one we constructed in this widget. This can be used 
     100to assess the tree's quality with the :ref:`Test Learners` widget. This 
     101requires some caution, though: you should not test the tree on the same 
     102data you used to induce it. See the Examples section below for the correct 
     103procedure. 
    62104 
    63105Examples 
    64106-------- 
    65107 
    66 The first snapshot shows the typical "environment" of the Interactive Tree Builder. 
     108The first snapshot shows the typical "environment" of the Interactive 
     109Tree Builder. 
    67110 
    68111.. image:: images/InteractiveTreeBuilder-SchemaInduction.png 
    69112   :alt: A schema with Interactive Tree Builder 
    70113 
    71 The learning examples may come from a file. We also use a `Classification Tree <ClassificationTree.htm>`_ widget to able to set the tree induction parameters for the parts of the tree we want to induce automatically. 
     114The learning examples may come from a file. We also use a 
      115:ref:`Classification Tree` widget to be able to set the tree induction parameters 
     116for the parts of the tree we want to induce automatically. 
    72117 
    73 On the right hand side, we have the `Rank <../Data/Rank.htm>`_ widget which assesses the quality of attributes through measures like information gain, gini index and others. Emulating the induction algorithm by selecting the attributes having the highest value for one of these measures should give the same results as using Classification Tree widget instead of the Interactive Builder. However, in manual construction we can (and should) also rely on the visualization widgets. One-dimensional visualizations like `Distributions <../Visualize/Distributions.htm>`_ give us an impression about the properties of a single attribute, while two- and more dimensional visualizations like `Scatterplot <../Visualize/Scatterplot.htm>`_ and `Linear Projection <../Visualize/LinearProjection.htm>`_ will give us a kind of lookahead by telling us about the useful combinations of attributes. We have also deployed the `Data Table <../Data/DataTable.htm>`_ widget since seeing particular examples in a tree node may also sometimes help the expert. 
     118On the right hand side, we have the :ref:`Rank` widget which assesses the 
     119quality of attributes through measures like information gain, gini index 
     120and others. Emulating the induction algorithm by selecting the attributes 
      121having the highest value for one of these measures should give the same
      122results as using the Classification Tree widget instead of the Interactive
      123Tree Builder. However, in manual construction we can (and should) also rely on
     124the visualization widgets. One-dimensional visualizations like 
     125:ref:`Distributions` give us an impression about the properties of a single 
     126attribute, while two- and more dimensional visualizations like 
     127:ref:`Scatter Plot` and :ref:`Linear Projection` will give us a kind of 
     128lookahead by telling us about the useful combinations of attributes. We 
      129have also deployed the :ref:`Data Table` widget since seeing particular
      130examples in a tree node may sometimes help the expert.
    74131 
    75 Finally, we use the `Classification Tree Graph <ClassificationTreeGraph.htm>`_ to present the resulting tree in a fancy looking picture. 
     132Finally, we use the :ref:`Classification Tree Graph` to present the resulting 
      133tree in a fancy-looking picture.
    76134 
    77 As the widget name suggests, the tree construction should be interactive, making the best use of the available Orange's visualization techniques and help of the area expert. At the beginning the widget presents a tree containing only the root. One way to proceed is to immediately click Build and then study the resulting tree. Data examples for various nodes can be presented and visualized to decide which parts of the tree make sense, which don't and should better be reconstructed manually, and which subtrees should be cut off. The other way is to start constructing the tree manually, adding the nodes according to the expert's knowledge and occasionally use Build button to let Orange make a suggestion. 
      135As the widget name suggests, the tree construction should be interactive,
      136making the best use of Orange's visualization techniques and the help
      137of the domain expert. At the beginning, the widget presents a tree
      138containing only the root. One way to proceed is to immediately click
      139Build and then study the resulting tree. Data examples for various nodes
      140can be presented and visualized to decide which parts of the tree make sense,
      141which don't and should rather be reconstructed manually, and which subtrees
      142should be cut off. The other way is to start constructing the tree
      143manually, adding the nodes according to the expert's knowledge and
      144occasionally using the Build button to let Orange make a suggestion.
    78145 
    79146 
    80 Although expert's help will usually prevent overfitting the data, special care still needs to be taken when we are interested in knowing the performance of the induced tree. Since the widely used cross-validation is for obvious reasons inapplicable when the model is constructed manually, we should split the data into training and testing set prior to building the tree. 
      147Although the expert's help will usually prevent overfitting the data,
      148special care still needs to be taken when we are interested in knowing
      149the performance of the induced tree. Since the widely used cross-validation
      150is for obvious reasons inapplicable when the model is constructed
      151manually, we should split the data into training and testing sets prior
      152to building the tree.
    81153 
    82154.. image:: images/InteractiveTreeBuilder-SchemaSampling.png 
    83155   :alt: A schema with Interactive Tree Builder 
    84156 
    85 We have used the `Data Sampler <../Data/DataSampler>`_ widget for splitting the data; in most cases we recommend using stratified random sampling with a sample size of 70% for training. These examples (denoted as "Examples" in the snapshot) are fed to the Interactive Tree Builder where we employ the Orange's armory to construct the tree as described above. 
      157We have used the :ref:`Data Sampler` widget for splitting the data; in most
      158cases we recommend using stratified random sampling with a sample size
      159of 70% for training. These examples (denoted as "Examples" in the snapshot)
      160are fed to the Interactive Tree Builder, where we employ Orange's armory
      161to construct the tree as described above.
    86162 
    87 The tricky part is connecting the :code:`Test Learners`: Data Sampler's Examples should be used as Test Learners' Data, and Data Sampler's Remaining Examples are the Test Learners' Separate Test Data. 
     163The tricky part is connecting the :ref:`Test Learners`: Data Sampler's 
     164Examples should be used as Test Learners' Data, and Data Sampler's 
     165Remaining Examples are the Test Learners' Separate Test Data. 
    88166 
    89167.. image:: images/InteractiveTreeBuilder-SchemaSampling-Wiring.png 
    90    :alt: Connecting Data Sampler to Test Learners when using Interactive Tree Builder 
     168   :alt: Connecting Data Sampler to Test Learners when using Interactive 
     169         Tree Builder 
    91170 
    92 In Test Learners, don't forget to set the Sampling type to :obj:`Test on test data`. Interactive Tree Builder should then give its Tree Learner to Test Learners. To compare the manually constructed tree with, say, an automatically constructed one and with a Naive Bayesian classifier, we can include these two in the schema. 
     171In Test Learners, don't forget to set the Sampling type to 
     172:obj:`Test on test data`. Interactive Tree Builder should then give its 
     173Tree Learner to Test Learners. To compare the manually constructed tree 
     174with, say, an automatically constructed one and with a Naive Bayesian 
     175classifier, we can include these two in the schema. 
    93176 
    94 Test Learners will now feed the training data (70% sample it gets from Data Sampler) to all three learning algorithms. While Naive Bayes and Classification Tree will actually learn, Interactive Tree Builder will ignore the training examples and return the manually built tree. All three models will then be tested on the remaining 30% examples. 
      177Test Learners will now feed the training data (the 70% sample it gets from
      178Data Sampler) to all three learning algorithms. While Naive Bayes and
      179Classification Tree will actually learn, Interactive Tree Builder will
      180ignore the training examples and return the manually built tree.
      181All three models will then be tested on the remaining 30% of the examples.
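For readers who prefer scripting, the same split-and-test procedure can be
sketched in a few lines of Orange 2.x code. This is a minimal sketch under the
assumption of a 70:30 split; the helper names (``SubsetIndices2``,
``learn_and_test_on_test_data``, ``CA``) and the use of standard learners in
place of the widget's manually built tree are assumptions to verify against
your installed version::

    import Orange

    # Load data and make a random 70:30 split (the schema above does this
    # with the Data Sampler widget instead).
    data = Orange.data.Table("heart_disease")
    indices = Orange.data.sample.SubsetIndices2(p0=0.7)(data)
    train = data.select(indices, 0)
    test = data.select(indices, 1)

    # Any learners can be compared this way; the widget's Tree Learner output
    # would simply be another entry in this list.
    learners = [Orange.classification.bayes.NaiveLearner(name="bayes"),
                Orange.classification.tree.TreeLearner(name="tree")]
    results = Orange.evaluation.testing.learn_and_test_on_test_data(
        learners, train, test)
    print Orange.evaluation.scoring.CA(results)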
  • docs/widgets/rst/classify/knearestneighbours.rst

    r11050 r11359  
    2727 
    2828 
    29 Signal :code:`KNN Classifier` sends data only if the learning data (signal :code:`Examples` is present. 
     29Signal :code:`KNN Classifier` sends data only if the learning data (signal 
      30:code:`Examples`) is present.
    3031 
    3132Description 
    3233----------- 
    3334 
    34 This widget provides a graphical interface to the k-Nearest Neighbours classifier. 
     35This widget provides a graphical interface to the k-Nearest Neighbours 
     36classifier. 
    3537 
    36 As all widgets for classification, it provides a learner and classifier on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, for instance :code:`Test Learners`. Classifier is a kNN Classifier (a subtype of a general classifier), built from the training examples on the input. If examples are not given, there is no classifier on the output. 
     38As all widgets for classification, it provides a learner and classifier 
     39on the output. Learner is a learning algorithm with settings as specified 
     40by the user. It can be fed into widgets for testing learners, for instance 
     41:ref:`Test Learners`. Classifier is a kNN Classifier (a subtype of a general 
     42classifier), built from the training examples on the input. If examples are 
     43not given, there is no classifier on the output. 
    3744 
    3845.. image:: images/k-NearestNeighbours.png 
    3946   :alt: k-Nearest Neighbours Widget 
    4047 
    41 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "kNN". 
     48Learner can be given a name under which it will appear in, say, 
     49:ref:`Test Learners`. The default name is "kNN". 
    4250 
    43 Then, you can set the :obj:`Number of neighbours`. Neighbours are weighted by their proximity to the example being classified, so there's no harm in using ten or twenty examples as neighbours. Weights use a Gaussian kernel, so that the last neighbour has a weight of 0.001. If you check :obj:`Weighting by ranks, not distances`, the weighting formula will use the rank of the neighbour instead of its distance to the reference example. 
     51Then, you can set the :obj:`Number of neighbours`. Neighbours are weighted 
     52by their proximity to the example being classified, so there's no harm in 
     53using ten or twenty examples as neighbours. Weights use a Gaussian kernel, 
     54so that the last neighbour has a weight of 0.001. If you check 
     55:obj:`Weighting by ranks, not distances`, the weighting formula will 
     56use the rank of the neighbour instead of its distance to the reference 
     57example. 
    4458 
    45 The :obj:`Metrics` you can use are Euclidean, Hamming (the number of attributes in which the two examples differ - not suitable for continuous attributes), Manhattan (the sum of absolute differences for all attributes) and Maximal (the maximal difference between attributes). 
     59The :obj:`Metrics` you can use are Euclidean, Hamming (the number of 
     60attributes in which the two examples differ - not suitable for continuous 
     61attributes), Manhattan (the sum of absolute differences for all attributes) 
     62and Maximal (the maximal difference between attributes). 
    4663 
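For reference, writing x and y for two examples with attribute values x_i and
y_i, the metrics listed above can be summarised as follows (a sketch that
ignores the special handling of discrete values and unknowns):

.. math::

    d_{Euclidean}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}, \qquad
    d_{Manhattan}(x, y) = \sum_i |x_i - y_i|,

    d_{Hamming}(x, y) = \sum_i [x_i \neq y_i], \qquad
    d_{Maximal}(x, y) = \max_i |x_i - y_i|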
    47 If you check :obj:`Normalize continuous attributes`, their values will be divided by their span (on the training data). This ensures that all continuous attributes have equal impact, independent of their original scale. 
     64If you check :obj:`Normalize continuous attributes`, their values will be 
     65divided by their span (on the training data). This ensures that all 
     66continuous attributes have equal impact, independent of their original scale. 
    4867 
    49 If you use Euclidean distance leave :obj:`Ignore unknown values` unchecked. The corresponding class for measuring distances will compute the distributions of attribute values and return statistically valid distance estimations. 
      68If you use Euclidean distance, leave :obj:`Ignore unknown values`
     69unchecked. The corresponding class for measuring distances will compute 
     70the distributions of attribute values and return statistically valid distance 
     71estimations. 
    5072 
    51 If you use other metrics and have missing values in the data, imputation may be the optimal way to go, since other measures don't have any such treatment of unknowns. If you don't impute, you can either :obj:`Ignore unknown values`, which treats all missing values as wildcards (so they are equivalent to any other attribute value). If you leave it unchecked, "don't cares" are wildcards, and "don't knows" as different from all values. 
      73If you use other metrics and have missing values in the data, imputation
      74may be the optimal way to go, since these measures have no special
      75treatment of unknowns. If you don't impute, you can check
      76:obj:`Ignore unknown values`, which treats all missing values as wildcards
      77(so they are equivalent to any other attribute value). If you leave it
      78unchecked, "don't cares" are treated as wildcards, while "don't knows" are
      79treated as different from all values.
    5280 
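The same settings are available from the scripting interface. A minimal
sketch, assuming the Orange 2.x ``kNNLearner`` keywords ``k``, ``rank_weight``
and ``distance_constructor``; treat the parameter names as assumptions to
check against your version::

    import Orange

    data = Orange.data.Table("iris")

    knn = Orange.classification.knn.kNNLearner(
        k=10,                # Number of neighbours
        rank_weight=True,    # Weighting by ranks, not distances
        distance_constructor=Orange.distance.Euclidean())
    classifier = knn(data)
    print classifier(data[0])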
    53 When you change one or more settings, you need to push :obj:`Apply`, which will put the new learner on the output and, if the training examples are given, construct a new classifier and output it as well. 
     81When you change one or more settings, you need to push :obj:`Apply`, which 
     82will put the new learner on the output and, if the training examples are 
     83given, construct a new classifier and output it as well. 
    5484 
    5585 
     
    5787-------- 
    5888 
      89This schema compares the results of k-Nearest Neighbours with the default
      90classifier, which always predicts the majority class.
     90classifier which always predicts the majority class 
    6091 
    6192.. image:: images/Majority-Knn-SchemaLearner.png 
  • docs/widgets/rst/classify/logisticregression.rst

    r11050 r11359  
    2121 
    2222   - Learner 
    23       The logistic regression learning algorithm with settings as specified in the dialog. 
     23      The logistic regression learning algorithm with settings as specified 
     24      in the dialog. 
    2425 
    2526   - Logistic Regression Classifier 
     
    2728 
    2829 
    29 Signal :code:`Logistic Regression Classifier` sends data only if the learning data (signal :code:`Examples` is present. 
     30Signal :code:`Logistic Regression Classifier` sends data only if the learning 
      31data (signal :code:`Examples`) is present.
    3032 
    3133Description 
    3234----------- 
    3335 
    34 This widget provides a graphical interface to the logistic regression classifier. 
     36This widget provides a graphical interface to the logistic regression 
     37classifier. 
    3538 
    36 As all widgets for classification, this widget provides a learner and classifier on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, for instance :code:`Test Learners`. Classifier is a logistic regression classifier (a subtype of a general classifier), built from the training examples on the input. If examples are not given, there is no classifier on the output. 
     39As all widgets for classification, this widget provides a learner and 
     40classifier on the output. Learner is a learning algorithm with settings 
     41as specified by the user. It can be fed into widgets for testing learners, 
     42for instance :ref:`Test Learners`. Classifier is a logistic regression 
     43classifier (a subtype of a general classifier), built from the training 
     44examples on the input. If examples are not given, there is no classifier 
     45on the output. 
    3746 
    38 The widget requires - due to limitations of the learning algorithm - data with binary class. 
     47The widget requires - due to limitations of the learning algorithm - data with 
      48a binary class.
    3949 
    4050.. image:: images/LogisticRegression.png 
    4151   :alt: Logistic Regression Widget 
    4252 
    43 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "Logistic Regression". 
     53Learner can be given a name under which it will appear in, say, 
     54:ref:`Test Learners`. The default name is "Logistic Regression". 
    4455 
    45 If :obj:`Stepwise attribute selection` is checked, the learner will iteratively add and remove the attributes, one at a time, based on their significance. The thresholds for addition and removal of the attribute are set in :obj:`Add threshold` and :obj:`Remove threshold`. It is also possible to limit the total number of attributes in the model. 
     56If :obj:`Stepwise attribute selection` is checked, the learner will 
     57iteratively add and remove the attributes, one at a time, based on their 
     58significance. The thresholds for addition and removal of the attribute are 
     59set in :obj:`Add threshold` and :obj:`Remove threshold`. It is also possible 
     60to limit the total number of attributes in the model. 
    4661 
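A rough scripting counterpart of the stepwise options, as a sketch only; the
keyword names (``stepwise_lr``, ``add_crit``, ``delete_crit``,
``remove_singular``) are assumptions based on the Orange 2.x logreg module and
should be checked against your version::

    import Orange

    data = Orange.data.Table("heart_disease")

    lr = Orange.classification.logreg.LogRegLearner(
        stepwise_lr=True,      # Stepwise attribute selection
        add_crit=0.2,          # Add threshold
        delete_crit=0.3,       # Remove threshold
        remove_singular=True)  # Drop constant / linearly dependent attributes
    classifier = lr(data)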
    47 Independent of these settings, the learner will always remove singular attributes, for instance the constant attributes or those which can be expressed as a linear combination of other attributes. 
     62Independent of these settings, the learner will always remove singular 
     63attributes, for instance the constant attributes or those which can be 
     64expressed as a linear combination of other attributes. 
    4865 
    49 Logistic regression has no internal mechanism for dealing with missing values. These thus need to be imputed. The widget offers a number of options: it can impute the average value of the attribute, its minimum and maximum or train a model to predict the attribute's values based on values of other attributes. It can also remove the examples with missing values. 
     66Logistic regression has no internal mechanism for dealing with missing 
     67values. These thus need to be imputed. The widget offers a number of options: 
      68it can impute the average value of the attribute, its minimum or maximum, or
     69train a model to predict the attribute's values based on values of other 
     70attributes. It can also remove the examples with missing values. 
    5071 
    51 Note that there also exist a separate widget for missing data imputation, `Impute <../Data/Impute.htm>`_. 
      72Note that there also exists a separate widget for missing data imputation,
     73:ref:`Impute`. 
    5274 
    5375 
     
    5577-------- 
    5678 
    57 The widget is used just as any other widget for inducing classifier. See, for instance, the example for the `Naive Bayesian Classifier <NaiveBayes.htm>`_. 
      79The widget is used just like any other widget for inducing a classifier. See,
      80for instance, the example for the :ref:`Naive Bayes` widget.
  • docs/widgets/rst/classify/majority.rst

    r11050 r11359  
    66.. image:: ../icons/majority.png 
    77 
    8 A Learner that returns the majority class, disregarding the example's attributes 
     8A Learner that returns the majority class, disregarding the example's 
     9attributes. 
    910 
    1011Signals 
     
    2728 
    2829 
    29 Signal :code:`Classifier` sends data only if the learning data (signal :code:`Examples`) is present. 
     30Signal :code:`Classifier` sends data only if the learning data (signal 
     31:code:`Examples`) is present. 
    3032 
    3133Description 
    3234----------- 
    3335 
    34 This widget provides a graphical interface to a learner which produces a classifier that always predicts the majority class. When asked for probabilities, it will return the relative frequencies of the classes in the training set. When there are two or more majority classes, the classifier chooses the predicted class randomly, but always returns the same class for a particular example. 
     36This widget provides a graphical interface to a learner which produces a 
     37classifier that always predicts the majority class. When asked for 
     38probabilities, it will return the relative frequencies of the classes 
     39in the training set. When there are two or more majority classes, the 
     40classifier chooses the predicted class randomly, but always returns the 
     41same class for a particular example. 
    3542 
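For completeness, the scripting counterpart of this widget is a short sketch;
only the learner's name is configurable in the GUI, and the call with
``GetProbabilities`` is an assumption about the Orange 2.x classifier API::

    import Orange

    data = Orange.data.Table("iris")

    majority = Orange.classification.majority.MajorityLearner(name="Majority")
    classifier = majority(data)
    # Probabilities equal the relative class frequencies in the training data.
    print classifier(data[0], Orange.classification.Classifier.GetProbabilities)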
    36 The widget is typically used to compare other learning algorithms with the default classification accuracy. 
     43The widget is typically used to compare other learning algorithms with 
     44the default classification accuracy. 
    3745 
    38 As all other widgets for classification, this one provides a learner and classifier, the former can be fed into widgets for testing learners, while the classifier itself is, well, not very useful. 
     46As all other widgets for classification, this one provides a learner and 
     47classifier, the former can be fed into widgets for testing learners, while 
     48the classifier itself is, well, not very useful. 
    3949 
    4050 
     
    4252   :alt: Majority 
    4353 
    44 The only option is the name under which it will appear in, say, :code:`Test Learners`. The default name is "Majority". When you change it, you need to click :obj:`Apply`. 
     54The only option is the name under which it will appear in, say, 
     55:ref:`Test Learners`. The default name is "Majority". When you change it, 
     56you need to click :obj:`Apply`. 
    4557 
    4658Examples 
    4759-------- 
    4860 
    49 In a typical use of this widget, it would be connected to `Test Learners <../Evaluate/TestLearners.htm>`_ to compare the scores of other learning algorithms (such as kNN, in this schema) with the default scores. 
     61In a typical use of this widget, it would be connected to 
     62:ref:`Test Learners` to compare the scores of other learning algorithms 
     63(such as kNN, in this schema) with the default scores. 
    5064 
    5165.. image:: images/Majority-Knn-SchemaLearner.png 
  • docs/widgets/rst/classify/naivebayes.rst

    r11050 r11359  
    2121 
    2222   - Learner 
    23       The naive Bayesian learning algorithm with settings as specified in the dialog. 
     23      The naive Bayesian learning algorithm with settings as specified in 
     24      the dialog. 
    2425 
    2526   - Naive Bayesian Classifier 
     
    2728 
    2829 
    29 Signal :code:`Naive Bayesian Classifier` sends data only if the learning data (signal :code:`Examples` is present. 
     30Signal :code:`Naive Bayesian Classifier` sends data only if the learning 
      31data (signal :code:`Examples`) is present.
    3032 
    3133Description 
     
    3436This widget provides a graphical interface to the Naive Bayesian classifier. 
    3537 
    36 As all widgets for classification, this widget provides a learner and classifier on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, for instance :code:`Test Learners`. Classifier is a Naive Bayesian Classifier (a subtype of a general classifier), built from the training examples on the input. If examples are not given, there is no classifier on the output. 
     38As all widgets for classification, this widget provides a learner and 
     39classifier on the output. Learner is a learning algorithm with settings 
     40as specified by the user. It can be fed into widgets for testing learners, 
     41for instance :ref:`Test Learners`. Classifier is a Naive Bayesian Classifier 
     42(a subtype of a general classifier), built from the training examples on the 
     43input. If examples are not given, there is no classifier on the output. 
    3744 
    3845.. image:: images/NaiveBayes.png 
    3946   :alt: NaiveBayes Widget 
    4047 
    41 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "Naive Bayes". 
     48Learner can be given a name under which it will appear in, say, 
     49:ref:`Test Learners`. The default name is "Naive Bayes". 
    4250 
    43 Next come the probability estimators. :obj:`Prior` sets the method used for estimating prior class probabilities from the data. You can use either :obj:`Relative frequency` or the :obj:`Laplace estimate`. :obj:`Conditional (for discrete)` sets the method for estimating conditional probabilities, besides the above two, conditional probabilities can be estimated using the :obj:`m-estimate`; in this case the value of m should be given as the :obj:`Parameter for m-estimate`. By setting it to :obj:`&lt;same as above&gt;` the classifier will use the same method as for estimating prior probabilities. 
     51Next come the probability estimators. :obj:`Prior` sets the method used for 
     52estimating prior class probabilities from the data. You can use either 
     53:obj:`Relative frequency` or the :obj:`Laplace estimate`. 
     54:obj:`Conditional (for discrete)` sets the method for estimating conditional 
      55probabilities; besides the above two, conditional probabilities can also be
     56estimated using the :obj:`m-estimate`; in this case the value of m should be 
     57given as the :obj:`Parameter for m-estimate`. By setting it to 
     58:obj:`<same as above>` the classifier will use the same method as for 
     59estimating prior probabilities. 
    4460 
    45 Conditional probabilities for continuous attributes are estimated using LOESS. :obj:`Size of LOESS window` sets the proportion of points in the window; higher numbers mean more smoothing. :obj:`LOESS sample points` sets the number of points in which the function is sampled. 
     61Conditional probabilities for continuous attributes are estimated using 
     62LOESS. :obj:`Size of LOESS window` sets the proportion of points in the 
     63window; higher numbers mean more smoothing. 
     64:obj:`LOESS sample points` sets the number of points in which the function 
     65is sampled. 
    4666 
    47 If the class is binary, the classification accuracy may be increased considerably by letting the learner find the optimal classification threshold (option :obj:`Adjust threshold`). The threshold is computed from the training data. If left unchecked, the usual threshold of 0.5 is used. 
     67If the class is binary, the classification accuracy may be increased 
     68considerably by letting the learner find the optimal classification 
     69threshold (option :obj:`Adjust threshold`). The threshold is computed from 
     70the training data. If left unchecked, the usual threshold of 0.5 is used. 
    4871 
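The corresponding scripting sketch, assuming the Orange 2.x ``NaiveLearner``
keywords ``m`` and ``adjust_threshold`` (verify against your version)::

    import Orange

    data = Orange.data.Table("titanic")

    nb = Orange.classification.bayes.NaiveLearner(
        m=2,                    # m-estimate for conditional probabilities
        adjust_threshold=True)  # Adjust threshold (binary class only)
    classifier = nb(data)
    print classifier(data[0], Orange.classification.Classifier.GetProbabilities)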
    49 When you change one or more settings, you need to push :obj:`Apply`; this will put the new learner on the output and, if the training examples are given, construct a new classifier and output it as well. 
     72When you change one or more settings, you need to push :obj:`Apply`; 
     73this will put the new learner on the output and, if the training examples 
     74are given, construct a new classifier and output it as well. 
    5075 
    5176 
     
    5378-------- 
    5479 
    55 There are two typical uses of this widget. First, you may want to induce the model and check what it looks like in a `Nomogram <Nomogram.htm>`_. 
     80There are two typical uses of this widget. First, you may want to induce 
     81the model and check what it looks like in a :ref:`Nomogram`. 
    5682 
    5783.. image:: images/NaiveBayes-SchemaClassifier.png 
    5884   :alt: Naive Bayesian Classifier - Schema with a Classifier 
    5985 
    60 The second schema compares the results of Naive Bayesian learner with another learner, a C4.5 tree. 
     86The second schema compares the results of Naive Bayesian learner with 
     87another learner, a C4.5 tree. 
    6188 
    6289.. image:: images/C4.5-SchemaLearner.png 
  • docs/widgets/rst/classify/nomogram.rst

    r11050 r11359  
    1313Inputs: 
    1414   - Classifier (orange.Classifier) 
    15       A classifier (either naive Bayesian classifier, logistic regression or linear SVM) 
     15      A classifier (either naive Bayesian classifier or logistic regression) 
    1616 
    1717 
     
    2323----------- 
    2424 
    25 Nomogram is a simple and intuitive, yet useful and powerful representation of linear models, such as logistic regression, naive Bayesian classifier and linear SVM. In statistical terms, the nomogram plots log odds ratios for each value of each attribute. We shall describe its basic properties here, though we recommend reading the paper in which we introduced the nomograms for naive Bayesian classifier, `Nomograms for Visualization of Naive Bayesian Classifier <http://www.ailab.si/blaz/papers/2004-PKDD.pdf>`_. This description will show the nomogram for a naive Bayesian classifier; nomograms for other types of classifiers are similar, though they lack some functionality due to inherent limitations of these models. 
     25Nomogram is a simple and intuitive, yet useful and powerful representation of 
     26linear models, such as logistic regression and naive Bayesian classifier. In 
     27statistical terms, the nomogram plots log odds ratios for each value of each 
     28attribute. We shall describe its basic properties here, though we recommend 
     29reading the paper in which we introduced the nomograms for naive Bayesian 
     30classifier, `Nomograms for Visualization of Naive Bayesian Classifier`_. This 
     31description will show the nomogram for a naive Bayesian classifier; nomograms 
     32for other types of classifiers are similar, though they lack some functionality 
     33due to inherent limitations of these models. 
    2634 
    27 The snapshot below shows a naive Bayesian nomogram for the heart disease data. The first attribute, gender, has two values (this should come as no surprise for readers over 18), where log odds ratio for females is -1 (as read from the axis on the top) and for males it is around 0.4. For the next attribute, the type of chest pain, the asymptotic pain votes for the target class (having narrowed vessels), and the other three have negative odds of different magnitudes. Note that these are odds for naive Bayesian classifier, where, unlike in logistic regression, there is no "base value" which would have a odds ratio of zero. 
     35.. _Nomograms for Visualization of Naive Bayesian Classifier: http://www.ailab.si/blaz/papers/2004-PKDD.pdf 
     36 
     37The snapshot below shows a naive Bayesian nomogram for the heart disease data. 
     38The first attribute, gender, has two values, where log odds ratio for 
     39females is -1 (as read from the axis on the top) and for males it is around 
     400.4. For the next attribute, the type of chest pain, the asymptotic pain 
     41votes for the target class (having narrowed vessels), and the other three 
     42have negative odds of different magnitudes. Note that these are odds for 
     43naive Bayesian classifier, where, unlike in logistic regression, there is 
     44no "base value" which would have a odds ratio of zero. 
    2845 
    2946.. image:: images/Nomogram.png 
    3047 
    31 The third attribute, SBP at rest, is continuous. To get log odds ratios for a particular value of the attribute, find the value (say 175) of the vertical axis to the left of the curve corresponding to the attribute. Then imagine a line to the left, at the point where it hits the curve, turn upwards and read the number on the top scale. The SBP of 175 has log odds ration of approximately 1 (0.93, to be precise). The curve thus shows a mapping from attribute values on the left to log odds at the top. 
     48The third attribute, SBP at rest, is continuous. To get log odds ratios 
     49for a particular value of the attribute, find the value (say 175) of the 
     50vertical axis to the left of the curve corresponding to the attribute. Then 
     51imagine a line to the left, at the point where it hits the curve, turn 
     52upwards and read the number on the top scale. The SBP of 175 has log odds 
      53ratio of approximately 1 (0.93, to be precise). The curve thus shows a
     54mapping from attribute values on the left to log odds at the top. 
    3255 
    33 Nomogram is a great data exploration tool. Lengths of the lines correspond to spans of odds ratios, suggesting importance of attributes. It also shows impacts of individual values; being female is good and being male is bad (w.r.t. this disease, at least); besides, being female is much more beneficial than being male is harmful. Gender is, however, a much less important attribute than the maximal heart rate (HR) with log odds from -3.5 to +2.2. SBP's from 125 to 140 are equivalent, that is, have the same odds ratios... 
     56Nomogram is a great data exploration tool. Lengths of the lines correspond 
     57to spans of odds ratios, suggesting importance of attributes. It also shows 
     58impacts of individual values; being female is good and being male is bad 
     59(w.r.t. this disease, at least); besides, being female is much more 
     60beneficial than being male is harmful. Gender is, however, a much less 
     61important attribute than the maximal heart rate (HR) with log odds from 
     62-3.5 to +2.2. SBP's from 125 to 140 are equivalent, that is, have the 
     63same odds ratios... 
    3464 
    3565.. image:: images/Nomogram-predictions.png 
    3666 
    37 Nomograms can also be used for making probabilistic prediction. A sum of log odds ratios for a male with asymptomatic chest pain, a rest SBP of 100, cholesterol 200 and maximal heart rate 175 is 0.38 + 1.16 + -0.51 + -0.4 = -0.58, which corresponds to a probability 32 % for having the disease. To use the widget for classification, check :obj:`Show predictions`. The widget then shows a blue dots on attribute axes, which can be dragged around - or left at the zero-line if the corresponding value is unknown. The axes at the bottom then show a mapping from the sum of log odds to probabilities. 
     67Nomograms can also be used for making probabilistic prediction. A sum 
     68of log odds ratios for a male with asymptomatic chest pain, a rest 
     69SBP of 100, cholesterol 200 and maximal heart rate 175 is 
      70`0.38 + 1.16 + -0.51 + -0.4 = -0.58`, which corresponds to a probability
      71of 32 % of having the disease. To use the widget for classification,
      72check :obj:`Show predictions`. The widget then shows blue dots on the
      73attribute axes, which can be dragged around - or left at the zero-line
     74if the corresponding value is unknown. The axes at the bottom then show 
     75a mapping from the sum of log odds to probabilities. 
    3876 
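The mapping from the summed log odds to the probability on the bottom axis is
the standard logistic transformation; for the naive Bayesian nomogram the
prior (class) log odds are added to the sum as well, so the displayed
probability need not equal the logistic function of the attribute
contributions alone. As a sketch:

.. math::

    P(\text{target class}) = \frac{1}{1 + e^{-(b_0 + \sum_i \mathrm{logOR}_i)}}

where b_0 denotes the prior log odds (or the intercept, in the case of
logistic regression).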
    39 Now for the settings. Option :obj:`Target Class` defines the target class. Attribute values to the right of the zero line represent arguments for that class and values to the left are arguments against it. 
      77Now for the settings. Option :obj:`Target Class` defines the target class.
     78Attribute values to the right of the zero line represent arguments for 
     79that class and values to the left are arguments against it. 
    4080 
    4181 
    42 Log odds for naive Bayesian classifier are computed so that all values can have non-zero log odds. The nomogram is drawn as shown above, if alignment is set to :obj:`Align by zero influence`. If set to :obj:`Align left`, all attribute axes are left-aligned. Logistic regression compares the base value with other attribute values, so the base value always has log odds ratio of 0, and the attribute axes are always aligned to the left. 
     82Log odds for naive Bayesian classifier are computed so that all values 
     83can have non-zero log odds. The nomogram is drawn as shown above, if 
     84alignment is set to :obj:`Align by zero influence`. If set to 
     85:obj:`Align left`, all attribute axes are left-aligned. Logistic regression 
     86compares the base value with other attribute values, so the base value 
      87always has a log odds ratio of 0, and the attribute axes are always aligned
     88to the left. 
    4389 
    44 The influence of continuous attribute can be shown as two dimensional curves (:obj:`2D curve`) or with the values projected onto a single line (:obj:`1D projection`). The latter make the nomogram smaller, but can be unreadable if the log odds are not monotonous. In our sample, the nomogram would look OK for the heart rate and SBP, but not for cholesterol. 
      90The influence of a continuous attribute can be shown as a two-dimensional
      91curve (:obj:`2D curve`) or with the values projected onto a single line
      92(:obj:`1D projection`). The latter makes the nomogram smaller, but can be
      93unreadable if the log odds are not monotonic. In our sample, the
      94nomogram would look OK for the heart rate and SBP, but not for cholesterol.
    4595 
    46 The widget can show either log odds ratios (:obj:`Log odds ratios`), as above, or "points" (:obj:`Point scale`). In the latter case, log OR are simply scaled to the interval -100 to 100 for easier (manual) calculation, for instance, if one wishes to print out the nomogram and use it on the paper. 
     96The widget can show either log odds ratios (:obj:`Log odds ratios`), 
     97as above, or "points" (:obj:`Point scale`). In the latter case, log OR 
     98are simply scaled to the interval -100 to 100 for easier (manual) 
     99calculation, for instance, if one wishes to print out the nomogram 
     100and use it on the paper. 
    47101 
    48 :obj:`Show prediction` puts a blue dot at each attribute which we can drag to the corresponding value. The widget sums the log odds ratios and shows the probability of the target class on the bottom axes. :obj:`Confidence intervals` adds confidence intervals for the individual log ratios and for probability prediction. :obj:`Show histogram` adds a bar whose height represents the relative number of examples for each value of discrete attribute, while for continuous attributes the curve is thickened where the number of examples is higher. 
     102:obj:`Show prediction` puts a blue dot at each attribute which we 
     103can drag to the corresponding value. The widget sums the log odds 
     104ratios and shows the probability of the target class on the bottom 
     105axes. :obj:`Confidence intervals` adds confidence intervals for the 
     106individual log ratios and for probability prediction. :obj:`Show histogram` 
     107adds a bar whose height represents the relative number of examples for 
      108each value of a discrete attribute, while for continuous attributes the
     109curve is thickened where the number of examples is higher. 
    49110 
    50111.. image:: images/Nomogram-histograms.png 
    51112 
    52 For instance, for gender the number of males is about twice as big than the number of females, and the confidence interval for the log OR is correspondingly smaller. The histograms and confidence intervals also explain the strange finding that extreme cholesterol level (600) is healthy, healthier than 200, while really low cholesterol (50) is almost as bad as levels around 300. The big majority of patients have cholesterol between 200 and 300; what happens outside this interval may be a random effect, which is also suggested by the very wide confidence intervals. 
      113For instance, for gender the number of males is about twice as large as
     114the number of females, and the confidence interval for the log OR is 
     115correspondingly smaller. The histograms and confidence intervals also 
     116explain the strange finding that extreme cholesterol level (600) is healthy, 
     117healthier than 200, while really low cholesterol (50) is almost as bad as 
     118levels around 300. The big majority of patients have cholesterol between 
     119200 and 300; what happens outside this interval may be a random effect, 
     120which is also suggested by the very wide confidence intervals. 
    53121 
    54122 
     
    56124-------- 
    57125 
    58 To draw a nomogram, we need to get some data (e.g. from the `File widget <../Data/File.htm>`_, induce a classifier and give it to the nomogram. 
     126To draw a nomogram, we need to get some data (e.g. from the 
     127:ref:`File` widget, induce a classifier and give it to the nomogram. 
    59128 
    60129.. image:: images/NaiveBayes-SchemaClassifier.png 
  • docs/widgets/rst/classify/randomforest.rst

    r11050 r11359  
    1818Outputs: 
    1919   - Learner 
    20       The random forest learning algorithm with settings as specified in the dialog 
     20      The random forest learning algorithm with settings as specified in the 
     21      dialog 
    2122   - Random Forest Classifier 
    2223      Trained random forest 
     
    2829----------- 
    2930 
    30 Random forest is a classification technique that proposed by `Leo Brieman (2001) <#Breiman2001>`_, given the set of class-labeled data, builds a set of classification trees. Each tree is developed from a bootstrap sample from the training data. When developing individual trees, an arbitrary subset of attributes is drawn (hence the term "random") from which the best attribute for the split is selected. The classification is based on the majority vote from individually developed tree classifiers in the forest. 
      31Random forest is a classification technique, proposed by
      32[Breiman2001]_, that, given a set of class-labeled data, builds a set of
      33classification trees. Each tree is developed from a bootstrap sample
      34of the training data. When developing individual trees, an arbitrary
      35subset of attributes is drawn (hence the term "random") from which the best
      36attribute for the split is selected. The classification is based on the
      37majority vote of the individually developed tree classifiers in the forest.
    3138 
    32 Random forest widget provides for a GUI to Orange's own implementation of random forest (`orngEnsemble </doc/modules/orngEnsemble.htm>`_ module). The widget output the learner, and, given the training data on its input, the random forest. Additional output channel is provided for a selected classification tree (from the forest) for the purpose of visualization or further analysis. 
      39The Random Forest widget provides a GUI to Orange's own implementation of
      40random forest (:class:`~Orange.ensemble.forest.RandomForestLearner`). The
      41widget outputs the learner, and, given the training data on its input, the
      42random forest. An additional output channel is provided for a selected
     43classification tree (from the forest) for the purpose of visualization 
     44or further analysis. 
    3345 
    3446.. image:: images/RandomForest.png 
    3547 
    36 In the widget, the first field is used to specify the name of the learner or classifier. Next block of parameters tells the algorithm how many classification trees will be included in the forest (:obj:`Number of trees in forest`), and how many attributes will be arbitrarily drawn for consideration at each node. If the later is not specified (option :obj:`Consider exactly ...` left unchecked), this number is equal to square root of number of attributes in the data set. Original Brieman's proposal is to grow the trees without any pre-prunning, but since this later often works quite well the user can set the depth to which the trees will be grown (:obj:`Maximal depth of individual trees`). As another pre-pruning option, the stopping condition in terms of minimal number of instances in the node before splitting can be set. Finally, if the training data is given to the widget, the :obj:`Index of the tree on the output` can be specified, instructing the widget to send the requested classifier. 
      48In the widget, the first field is used to specify the name of the learner
      49or classifier. The next block of parameters tells the algorithm how many
      50classification trees will be included in the forest
      51(:obj:`Number of trees in forest`), and how many attributes will be
      52arbitrarily drawn for consideration at each node. If the latter is not
      53specified (option :obj:`Consider exactly ...` left unchecked), this number
      54is equal to the square root of the number of attributes in the data set.
      55Breiman's original proposal is to grow the trees without any pre-pruning,
      56but since pre-pruning often works quite well, the user can set the depth
      57to which the trees will be grown (:obj:`Maximal depth of individual trees`).
      58As another pre-pruning option, the minimal number of instances in a node
      59required before splitting can be set as a stopping condition. Finally, if the
      60training data is given to the widget, the :obj:`Index of the tree on the output`
      61can be specified, instructing the widget to send the requested classifier.
    3762 
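The class referenced above can also be used directly from a script. A minimal
sketch; the keyword names ``trees`` and ``attributes`` follow the Orange 2.x
documentation and should be double-checked against your version::

    import Orange

    data = Orange.data.Table("bupa")

    forest = Orange.ensemble.forest.RandomForestLearner(
        trees=50,         # Number of trees in forest
        attributes=None,  # None: use sqrt(number of attributes) at each node
        name="Random Forest")
    classifier = forest(data)
    print classifier(data[0])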
    3863Examples 
    3964-------- 
    4065 
      66The snapshot below shows a standard comparison schema of a random forest and
     66Snapshot below shows a standard comparison schema of a random forest and 
     67a tree learner (in this case, C4.5) on a specific data set. 
    4268 
    4369.. image:: images/RandomForest-Test.png 
    4470   :alt: Random forest evaluation 
    4571 
      72A simple use of this widget, where we wanted to explore what the actual
      73trees in the forest look like, is presented in the following snapshot. In
      74our case, the fifth tree from the forest was rendered in the Classification
     74our case, the 5-th tree from the forest was rendered in the Classification 
     75Tree Graph widget. 
    4776 
    4877.. image:: images/RandomForest-TreeGraph.png 
     
    5281---------- 
    5382 
    54 Breiman L (2001) Random Forests. Machine Learning 45 (1), 5-32. [`PDF <http://www.springerlink.com/content/u0p06167n6173512/fulltext.pdf>`_] 
     83.. [Breiman2001] Breiman L (2001) Random Forests. Machine Learning 45 (1), 5-32. 
     84   (`PDF <http://www.springerlink.com/content/u0p06167n6173512/fulltext.pdf>`_) 
  • docs/widgets/rst/classify/svm.rst

    r11050 r11359  
    2222 
    2323   - Learner 
    24       The support vector machine learning algorithm with settings as specified in the dialog. 
     24      The support vector machine learning algorithm with settings as specified 
     25      in the dialog. 
    2526 
    2627   - Classifier 
     
    2829 
    2930   - Support Vectors 
    30       A subset of data instances from the training set that were used as support vectors in the trained classifier 
     31      A subset of data instances from the training set that were used as 
     32      support vectors in the trained classifier 
    3133 
    3234 
     
    3537----------- 
    3638 
    37 Support vector machines (SVM) is a popular classification technique that will construct a separating hyperplane in the attribute space which maximizes the margin between the instances of different classes. The technique often yields supreme predictive performance results. Orange embeds a popular implementation of SVM in `LIBSVM <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>`_ package, and this widget provides for a graphical user interface to its functionality. It also behaves like a typical Orange learner widget: on its output, it presents an object that can learn and is initialized with the setting specified in the widget, or, given the input data set, also a classifier that can be used to predict classes and class probabilities given a set of new examples. 
      39The support vector machine (SVM) is a popular classification technique that
      40constructs a separating hyperplane in the attribute space which maximizes the
      41margin between the instances of different classes. The technique often yields
      42superior predictive performance. Orange embeds a popular implementation
      43of SVM from the `LIBSVM`_ package, and this widget provides a graphical user
      44interface to its functionality. It also behaves like a typical Orange
      45learner widget: on its output, it presents an object that can learn and is
      46initialized with the settings specified in the widget, or, given the input
      47data set, also a classifier that can be used to predict classes and class
      48probabilities for a set of new examples.
    3849 
    3950.. image:: images/SVM.png 
    4051   :alt: Support vector machines widget 
    4152 
    42 Learner can be given a name under which it will appear in other widgets that use its output, say, in `Test Learners <../Evaluate/TestLearners.htm>`_. The default name is simply "SVM". 
     53Learner can be given a name under which it will appear in other widgets that 
     54use its output, say, in :ref:`Test Learners`. The default name is simply "SVM". 
    4355 
    44 The next block of options deals with kernel, that is, a function that transforms attribute space to a new feature space to fit the maximum-margin hyperplane, thus allowing the algorithm to create non-linear classifiers. The first kernel in the list, however, is a :obj:`Linear` kernel that does not require this trick, but all the others (:obj:`Polynomial`, :obj:`RBF` and :obj:`Sigmoid`) do. Specific functions that specify the kernel are presented besides their names, and the constants involved: 
      56The next block of options deals with the kernel, that is, a function that
      57transforms the attribute space to a new feature space to fit the maximum-margin
      58hyperplane, thus allowing the algorithm to create non-linear classifiers.
      59The first kernel in the list, however, is a :obj:`Linear` kernel that does
      60not require this trick, but all the others (:obj:`Polynomial`, :obj:`RBF`
      61and :obj:`Sigmoid`) do. Specific functions that specify the kernel are
      62presented beside their names, together with the constants involved:
    4563 
      64:obj:`g` for the gamma constant in the kernel function (the recommended value
      65is 1/k, where k is the number of attributes, but since there may be no
      66training set given to the widget, the default is 0 and the user has to set
      67this option manually), :obj:`d` for the degree of the kernel (default 3),
      68and :obj:`c` for the constant c0 in the kernel function (default 0).
     68and :obj:`c` for the constant c0 in the kernel function (default 0). 
    4769 
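For reference, the kernels as defined in LIBSVM, with u and v denoting two
examples and g, c and d the constants described above, are:

.. math::

    K_{linear}(u, v) = u^T v, \qquad
    K_{polynomial}(u, v) = (g\, u^T v + c)^d,

    K_{RBF}(u, v) = \exp(-g\, \|u - v\|^2), \qquad
    K_{sigmoid}(u, v) = \tanh(g\, u^T v + c)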
    48 :obj:`Options` control other aspects of the SVM learner. :obj:`Model complexity (C)` (penalty parameter), :obj:`Tolerance (p)` and :obj:`Numeric precision (eps)` are options that define the optimization function; see `LIBSVM <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>`_ for further details. The other three options are used to instruct the learner to prepare the classifier such that it would estimate the class probability values (:obj:`Estimate class probabilities`), constrain the number of the support vectors which define the maximum-margin hyperplane (:obj:`Limit the number of support vectors`) and normalize the training and later the test data (:obj:`Normalize data`). The later somehow slows down the learner, but may be essential in achieving better classification performance. 
     70:obj:`Options` control other aspects of the SVM learner. 
     71:obj:`Model complexity (C)` (penalty parameter), :obj:`Tolerance (p)` and 
     72:obj:`Numeric precision (eps)` are options that define the optimization 
     73function; see `LIBSVM`_ for further details. The other three options are used 
     74to instruct the learner to prepare the classifier such that it would estimate 
     75the class probability values (:obj:`Estimate class probabilities`), constrain 
     76the number of the support vectors which define the maximum-margin hyperplane 
     77(:obj:`Limit the number of support vectors`) and normalize the training and 
      78later the test data (:obj:`Normalize data`). The latter somewhat slows down the
     79learner, but may be essential in achieving better classification performance. 
    4980 
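A minimal scripting sketch of these options, assuming the Orange 2.x
``SVMLearner`` wrapper around LIBSVM; the keyword names (``C``, ``gamma``,
``probability``, ``normalization``) are assumptions to verify, and the kernel
is left at its default (RBF)::

    import Orange

    data = Orange.data.Table("iris")

    svm = Orange.classification.svm.SVMLearner(
        C=1.0,               # Model complexity (C)
        gamma=0.25,          # g constant of the kernel
        probability=True,    # Estimate class probabilities
        normalization=True)  # Normalize data
    classifier = svm(data)
    print classifier(data[0])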
    50 The last button in the SVM dialog is :obj:`Automatic parameter search`. This is enabled when the widget is given a data set, and uses `LIBSVM <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>`_'s procedures to search for the optimal value of learning parameters. Upon completion, the values of the parameters in the SVM dialog box are set to the parameters found by the procedure. 
     81The last button in the SVM dialog is :obj:`Automatic parameter search`. This 
     82is enabled when the widget is given a data set, and uses `LIBSVM`_'s procedures 
     83to search for the optimal value of learning parameters. Upon completion, the 
     84values of the parameters in the SVM dialog box are set to the parameters found 
     85by the procedure. 
    5186 
    5287Examples 
    5388-------- 
    5489 
    55 There are two typical uses of this widget, one that uses it as a classifier and the other one that uses it to construct an object for learning. For the first one, we have split the data set to two data sets (:obj:`Sample` and :obj:`Remaining Examples`). The sample was sent to :obj:`SVM` which produced a :obj:`Classifier`, that was then used in :obj:`Predictions` widget to classify the data in :obj:`Remaning Examples`. A similar schema can be used if the data would be already separated in two different files; in this case, two :obj:`File` widgets would be used instead of the :obj:`File`-:obj:`Data Sampler` combination. 
      90There are two typical uses of this widget, one that uses it as a classifier
      91and the other one that uses it to construct an object for learning. For the
      92first one, we have split the data set into two data sets (:obj:`Sample` and
      93:obj:`Remaining Examples`). The sample was sent to :obj:`SVM`, which produced
      94a :obj:`Classifier` that was then used in the :ref:`Predictions` widget to
      95classify the data in :obj:`Remaining Examples`. A similar schema can be
      96used if the data were already separated into two different files; in
      97this case, two :ref:`File` widgets would be used instead of the
      98:ref:`File` - :ref:`Data Sampler` combination.
    5699 
    57100.. image:: images/SVM-Predictions.png 
    58101   :alt: SVM - a schema with a classifier 
    59102 
    60 The second schema shows how to use the :obj:`SVM` widget to construct the learner and compare it in cross-validation with :obj:`Majority` and :obj:`k Nearest Neighbors` learners. 
     103The second schema shows how to use the :obj:`SVM` widget to construct the 
     104learner and compare it in cross-validation with :ref:`Majority` and 
     105:ref:`k-Nearest Neighbours` learners. 
    61106 
    62107.. image:: images/SVM-Evaluation.png 
    63108   :alt: SVM and other learners compared by cross-validation 
    64109 
    65 The following schema observes a set of support vectors in a :obj:`Scatterplot` visualization. 
     110The following schema observes a set of support vectors in a :ref:`Scatter Plot` 
     111visualization. 
    66112 
    67113.. image:: images/SVM-SupportVectors.png 
    68114   :alt: Visualization of support vectors 
    69115 
    70 For the above schema to work correctly, the channel between :obj:`SVM` and :obj:`Scatterplot` widget has to be set appropriately. Set the channel between these two widgets by double-clinking on the green edge between the widgets, and use the settings as displayed in the dialog below. 
      116For the above schema to work correctly, the channel between :ref:`SVM`
      117and the :ref:`Scatter Plot` widget has to be set appropriately. Set the channel
      118between these two widgets by double-clicking on the green edge between the
     119widgets, and use the settings as displayed in the dialog below. 
    71120 
    72121.. image:: images/SVM-SupportVectorsOutput.png 
    73122   :alt: Channel setting for communication of support vectors 
     123 
     124 
     125.. _LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
  • docs/widgets/rst/data/concatenate.rst

    r11050 r11359  
    3030 
    3131In case one of the tables is connected to the widget as the primary table, the 
    32 resulting table will contain these same attributes. If there is no primary table, 
    33 the attributes can be either a union of all attributes that appear in the tables 
    34 specified as "Additional Tables", or their intersection, that is, a list of attributes 
    35 which appear in all the connected tables. 
     32resulting table will contain these same attributes. If there is no primary 
     33table, the attributes can be either a union of all attributes that appear in 
     34the tables specified as "Additional Tables", or their intersection, that is, a 
     35list of attributes which appear in all the connected tables. 
    3636 
    3737 
  • docs/widgets/rst/data/continuize.rst

    r11050 r11359  
    2323----------- 
    2424 
    25 Continuize widget receives a data set on the input and outputs the same data in which the discrete attributes (including binary attributes) are replaced with continuous using the methods specified by the user. 
      25Continuize widget receives a data set on the input and outputs the same data in 
      26which the discrete attributes (including binary attributes) are replaced with 
      27continuous attributes using the methods specified by the user. 
    2628 
    2729.. image:: images/Continuize.png 
    2830 
    29 The first box, :obj:`Multinominal attributes`, defines the treatment of multivalued discrete attributes. Say that we have a discrete attribute status with values low, middle and high, listed in that order. Options for its transformation are 
    3031 
    31    - :obj:`Target or First value as base`: the attribute will be transformed into two continuous attributes, status=middle with values 0 or 1 signifying whether the original attribute had value middle on a particular example, and similarly, status=high. Hence, a three-valued attribute is transformed into two continuous attributes, corresponding to all except the first value of the attribute. 
     32The first box, :obj:`Multinominal attributes`, defines the treatment of 
     33multivalued discrete attributes. Say that we have a discrete attribute status 
     34with values low, middle and high, listed in that order. Options for its 
      35transformation are listed below; a short code sketch follows the list. 
    3236 
    33    - :obj:`Most frequent value as base`: similar to the above, except that the data is analyzed and the most frequent value is used as a base. So, if most examples have the value middle, the two newly constructed continuous attributes will be status=low and status=high. 
     37   - :obj:`Target or First value as base`: the attribute will be transformed 
     38     into two continuous attributes, status=middle with values 0 or 1 
     39     signifying whether the original attribute had value middle on a 
     40     particular example, and similarly, status=high. Hence, a three-valued 
     41     attribute is transformed into two continuous attributes, corresponding to 
     42     all except the first value of the attribute. 
    3443 
    35    - :obj:`One attribute per value`: this would construct three continuous attributes out of a three-valued discrete one. 
     44   - :obj:`Most frequent value as base`: similar to the above, except that the 
     45     data is analyzed and the most frequent value is used as a base. So, if 
     46     most examples have the value middle, the two newly constructed continuous 
     47     attributes will be status=low and status=high. 
    3648 
    37    - :obj:`Ignore multinominal attributes`: removes the multinominal attributes from the data. 
     49   - :obj:`One attribute per value`: this would construct three continuous 
     50     attributes out of a three-valued discrete one. 
    3851 
    39    - :obj:`Treat as ordinal:` converts the attribute into a continuous attribute with values 0, 1, and 2. 
     52   - :obj:`Ignore multinominal attributes`: removes the multinominal attributes 
     53     from the data. 
    4054 
    41    - :obj:`Divide by number of values:` same as above, except that the values are normalized into range 0-1. So, our case would give values 0, 0.5 and 1. 
     55   - :obj:`Treat as ordinal:` converts the attribute into a continuous 
     56     attribute with values 0, 1, and 2. 
     57 
     58   - :obj:`Divide by number of values:` same as above, except that the values 
     59     are normalized into range 0-1. So, our case would give values 0, 0.5 and 
     60     1. 
    4261 
    4362 
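
A minimal pure-Python sketch of the codings listed above, for the three-valued
status attribute (illustrative only; this is not the widget's implementation)::

    VALUES = ["low", "middle", "high"]          # declared value order

    def target_or_first_value_as_base(value):
        # one indicator column per value except the base ("low")
        return dict(("status=%s" % v, int(value == v)) for v in VALUES[1:])

    def one_attribute_per_value(value):
        return dict(("status=%s" % v, int(value == v)) for v in VALUES)

    def treat_as_ordinal(value):
        return {"status": VALUES.index(value)}

    def divide_by_number_of_values(value):
        # gives 0, 0.5 and 1 for the three values, as in the text
        return {"status": VALUES.index(value) / float(len(VALUES) - 1)}

    print target_or_first_value_as_base("middle")   # indicators for 'middle' and 'high'
    print divide_by_number_of_values("middle")      # {'status': 0.5}
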
    44 Next box defines the treatment of continuous attributes. You will usually prefer :obj:`Leave as is` option. The alternative is :obj:`Normalize by span` which will subtract the lowest value found in the data and divide by the span, so all values will fit into [0, 1]. Finally, :obj:`Normalize by variance` subtracts the average and divides by the variance. 
      63The next box defines the treatment of continuous attributes. You will usually 
      64prefer the :obj:`Leave as is` option. The alternative is :obj:`Normalize by span`, 
      65which will subtract the lowest value found in the data and divide by the span, 
      66so all values will fit into [0, 1]. Finally, :obj:`Normalize by variance` 
      67subtracts the average and divides by the variance. 
    4568 
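
The two normalizations amount to the following simple transformations (a sketch
of the idea; details such as constant attributes are ignored)::

    def normalize_by_span(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / float(hi - lo) for v in values]

    def normalize_by_variance(values):
        mean = sum(values) / float(len(values))
        var = sum((v - mean) ** 2 for v in values) / float(len(values))
        return [(v - mean) / var for v in values]   # divided by the variance, as in the text

    print normalize_by_span([10, 15, 20])   # [0.0, 0.5, 1.0]
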
    46 Finally, you can decide what happens with the class if it is discrete. Besides leaving it as it is, there are also the options which are available for multinominal attributes, except for those options which split the attribute into more than one attribute - this obviously cannot be supported since you cannot have more than one class attribute. Additionally, you can :obj:`specify a target value`; this will transform the class into a continuous attribute with value 1 if the value of the original attribute equals the target and 0 otherwise. 
     69Finally, you can decide what happens with the class if it is discrete. Besides 
     70leaving it as it is, there are also the options which are available for 
     71multinominal attributes, except for those options which split the attribute 
     72into more than one attribute - this obviously cannot be supported since you 
     73cannot have more than one class attribute. Additionally, you can 
     74:obj:`specify a target value`; this will transform the class into a continuous 
     75attribute with value 1 if the value of the original attribute equals the target 
     76and 0 otherwise. 
    4777 
    48 With :obj:`value range`, you can define the values of the new attributes. In the above text we supposed the range :obj:`from 0 to 1`. You can change it to :obj:`from -1 to 1`. 
     78With :obj:`value range`, you can define the values of the new attributes. In 
     79the above text we supposed the range :obj:`from 0 to 1`. You can change it to 
     80:obj:`from -1 to 1`. 
    4981 
    50 If :obj:`Send automatically` is set, the data set is committed on any change. Otherwise, you have to press :obj:`Send data` after each change. 
     82If :obj:`Send automatically` is set, the data set is committed on any change. 
     83Otherwise, you have to press :obj:`Send data` after each change. 
    5184 
    5285Examples 
    5386-------- 
    5487 
    55 The schema below shows a typical use of this widget: in order to properly plot linear projection of the data, discrete attributes need to be converted to continuous, therefore we put the data through Continuize widget before drawing it. Attribute "chest pain" originally had four values and was transformed into three continuous attributes; similar happened to gender, which was transformed into a single attribute gender=female. 
      88The schema below shows a typical use of this widget: in order to properly plot 
      89a linear projection of the data, discrete attributes need to be converted to 
      90continuous ones, so we put the data through the Continuize widget before drawing 
      91it. Attribute "chest pain" originally had four values and was transformed into 
      92three continuous attributes; something similar happened to gender, which was 
      93transformed into a single attribute gender=female. 
    5694 
    5795.. image:: images/Continuize-Schema.png 
  • docs/widgets/rst/data/datatable.rst

    r11050 r11359  
    66.. image:: ../../../../Orange/OrangeWidgets/Data/icons/DataTable_48.png 
    77   :alt: Data Table icon 
    8    
     8 
    99Signals 
    1010------- 
     
    1313    - Examples (ExampleTable) 
    1414        Attribute-valued data set. 
    15        
     15 
    1616Outputs: 
    1717    - Selected Examples (Example Table) 
     1818        Selected data instances 
    19          
     19 
    2020Description 
    2121----------- 
    22      
     22 
    2323Data Table widget takes one or more data sets on its input, and presents 
    24 them in a spreadsheet format. Widget supports sorting by attribute  
    25 values (click on the attribute name in the header row).  
      24them in a spreadsheet format. The widget supports sorting by attribute 
     25values (click on the attribute name in the header row). 
    2626 
    2727Examples 
    2828-------- 
    2929 
    30 We used two :ref:`File` widgets, read the iris and glass data set (provided in Orange distribution), and send them to the Data Table widget. 
      30We used two :ref:`File` widgets to read the iris and glass data sets (provided 
      31in the Orange distribution) and sent them to the Data Table widget. 
    3132 
    3233.. image:: images/DataTable_schema.png 
    3334   :alt: Example data table schema 
    34     
     35 
    3536A snapshot of the widget under these settings is shown below. 
    3637 
  • docs/widgets/rst/data/impute.rst

    r11050 r11359  
    1313Inputs: 
    1414 
    15  
    1615   - Examples (ExampleTable) 
    1716      Data set. 
    1817 
    1918   - Learner for Imputation 
    20       A learning algorithm to be used when values are imputed using a predictive model. This algorithm, if given, substitutes the default (1-NNLearner). 
     19      A learning algorithm to be used when values are imputed using a 
     20      predictive model. This algorithm, if given, substitutes the default 
     21      (1-NNLearner). 
    2122 
    2223 
    2324Outputs: 
    24  
    2525 
    2626   - Examples (ExampleTable) 
     
    3131----------- 
    3232 
    33 Some Orange's algorithms and visualization cannot handle unknown values in the data. This widget does what statistician call imputation: it substitutes them by values computed from the data or set by the user. 
      33Some of Orange's algorithms and visualizations cannot handle unknown values in 
      34the data. This widget does what statisticians call imputation: it substitutes 
      35them with values computed from the data or set by the user. 
    3436 
    3537.. image:: images/Impute.png 
    3638   :alt: Impute widget 
    3739 
    38 In the top-most box, :obj:`Default imputation method`, the user can specify a general imputation technique for all attributes. 
     40In the top-most box, :obj:`Default imputation method`, the user can specify a 
      41general imputation technique for all attributes; the :obj:`Average/Most-frequent` method is sketched in code after the list. 
    3942 
    4043   - :obj:`Don't Impute` does nothing with the missing values. 
    4144 
    42    - :obj:`Average/Most-frequent` uses the average value (for continuous attributes) or the most common value (for discrete attributes). 
     45   - :obj:`Average/Most-frequent` uses the average value (for continuous 
     46     attributes) or the most common value (for discrete attributes). 
    4347 
    44    - :obj:`Model-based imputer` constructs a model for predicting the missing value based on values of other attributes; a separate model is constructed for each attribute. The default model is 1-NN learner, which takes the value from the most similar example (this is sometimes referred to as hot deck imputation). This algorithm can be substituted by one that the user connects to the input signal :obj:`Learner for Imputation`. Note, however, that if there are discrete and continuous attributes in the data, the algorithm needs to be capable of handling them both; at the moment only kNN learner can do that. (In the future, when Orange has more regressors, Impute widget may have separate input signals for discrete and continuous models.) 
     48   - :obj:`Model-based imputer` constructs a model for predicting the missing 
     49     value based on values of other attributes; a separate model is constructed 
     50     for each attribute. The default model is 1-NN learner, which takes the 
     51     value from the most similar example (this is sometimes referred to as hot 
     52     deck imputation). This algorithm can be substituted by one that the user 
     53     connects to the input signal :obj:`Learner for Imputation`. Note, however, 
     54     that if there are discrete and continuous attributes in the data, the 
     55     algorithm needs to be capable of handling them both; at the moment only 
     56     kNN learner can do that. (In the future, when Orange has more regressors, 
     57     Impute widget may have separate input signals for discrete and continuous 
     58     models.) 
    4559 
    46    - :obj:`Random values` computes the distributions of values for each attribute and then imputes by picking random values from them. 
     60   - :obj:`Random values` computes the distributions of values for each 
     61     attribute and then imputes by picking random values from them. 
    4762 
    48    - :obj:`Remove examples with missing values` removes the example containing missing values, except for the attributes for which specific actions are defined as described below. This check also applies to the class attribute if :obj:`Impute class values` is checked. 
      63   - :obj:`Remove examples with missing values` removes the examples containing 
     64     missing values, except for the attributes for which specific actions are 
     65     defined as described below. This check also applies to the class attribute 
     66     if :obj:`Impute class values` is checked. 
    4967 
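
A rough pure-Python illustration of :obj:`Average/Most-frequent` imputation on a
single column, with :code:`None` standing in for a missing value (a sketch of
the idea, not the widget's code)::

    from collections import Counter

    def impute_average(column):
        known = [v for v in column if v is not None]
        mean = sum(known) / float(len(known))
        return [mean if v is None else v for v in column]

    def impute_most_frequent(column):
        known = [v for v in column if v is not None]
        mode = Counter(known).most_common(1)[0][0]
        return [mode if v is None else v for v in column]

    print impute_average([1.0, None, 3.0])              # [1.0, 2.0, 3.0]
    print impute_most_frequent(["a", "b", None, "b"])   # ['a', 'b', 'b', 'b']
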
    5068 
    5169 
    52 It is also possible to specify individual treatment for each attribute which override the default treatment set above. One can also specify a manually defined value used for imputation. In the snapshot on the left, we decided not to impute the values of "normalized-losses" and "make", the missing values of "aspiration" will be replaced by random values, while the missing values of "body-style" and "drive-wheels" are replaced by "hatchback" and "fwd", respectively. If the values of "length", "width" or "height" is missing, the example is discarded. Values of all other attributes use the default method set above (model-based imputer, in our case). 
      70It is also possible to specify an individual treatment for each attribute, which 
      71overrides the default treatment set above. One can also specify a manually 
      72defined value used for imputation. In the snapshot on the left, we decided not 
      73to impute the values of "normalized-losses" and "make", the missing values of 
      74"aspiration" will be replaced by random values, while the missing values of 
      75"body-style" and "drive-wheels" are replaced by "hatchback" and "fwd", 
      76respectively. If the value of "length", "width" or "height" is missing, 
      77the example is discarded. Values of all other attributes use the default 
      78method set above (model-based imputer, in our case). 
    5379 
    54 Button :obj:`Set All to Default` resets the individual attribute treatments to the default. 
     80Button :obj:`Set All to Default` resets the individual attribute treatments 
     81to the default. 
    5582 
    56 Imputing class values is typically not a good practice, so it is off by default. It can be enabled by checking :obj:`Impute class values`. If checked and the default method is to remove the examples with missing values, then also examples with unknown classes are removed; otherwise they are not. 
     83Imputing class values is typically not a good practice, so it is off by 
     84default. It can be enabled by checking :obj:`Impute class values`. If checked 
     85and the default method is to remove the examples with missing values, then 
     86also examples with unknown classes are removed; otherwise they are not. 
    5787 
    58 All changes are committed immediately is :obj:`Send automatically` is checked. Otherwise, :obj:`Apply` needs to be pushed to apply any new settings. 
      88All changes are committed immediately if :obj:`Send automatically` is checked. 
     89Otherwise, :obj:`Apply` needs to be pushed to apply any new settings. 
  • docs/widgets/rst/data/mergedata.rst

    r11050 r11359  
    2424 
    2525   - Merged Examples A+B (ExampleTable) 
    26       Attribute-valued data set composed from instances from input data A which are appended attributes from input data B and their values determined by matching the values of the selected attributes. 
      26      Attribute-valued data set composed of instances from input data A, 
      27      to which attributes from input data B are appended; the appended values 
      28      are determined by matching the values of the selected attributes. 
    2729   - Merged Examples B+A (ExampleTable) 
    28       Attribute-valued data set composed from instances from input data B which are appended attributes from input data A and their values determined by matching the values of the selected attributes. 
      30      Attribute-valued data set composed of instances from input data B, 
      31      to which attributes from input data A are appended; the appended values 
      32      are determined by matching the values of the selected attributes. 
    2933 
    3034 
     
    3236----------- 
    3337 
    34 Merge Data widget is used to horizontally merge two data sets based on the values of selected attributes. On input, two data sets are required, A and B. The widget allows for selection of an attribute from each domain which will be used to perform the merging. When selected, the widget produces two outputs, A+B and B+A. The first output (A+B) corresponds to instances from input data A which are appended attributes from B, and the second output (B+A) to instances from B which are appended attributes from A. 
     38Merge Data widget is used to horizontally merge two data sets based on the 
     39values of selected attributes. On input, two data sets are required, A and B. 
     40The widget allows for selection of an attribute from each domain which will be 
     41used to perform the merging. When selected, the widget produces two outputs, 
      42A+B and B+A. The first output (A+B) corresponds to instances from input 
      43data A to which attributes from B are appended, and the second output (B+A) 
      44to instances from B to which attributes from A are appended. 
    3545 
    36 The merging is done by the values of the selected (merging) attributes. For example, instances from from A+B are constructed in the following way. First, the value of the merging attribute from A is taken and instances from B are searched with matching values of the merging attributes. If more than a single instance from B is found, the first one is taken and horizontally merged with the instance from A. If no instance from B match the criterium, the unknown values are assigned to the appended attributes. Similarly, B+A is constructed. 
      46The merging is done by the values of the selected (merging) attributes. For 
      47example, instances in A+B are constructed in the following way. 
      48First, the value of the merging attribute from A is taken and instances 
      49from B with matching values of the merging attribute are searched for. If 
      50more than a single instance from B is found, the first one is taken and 
      51horizontally merged with the instance from A. If no instance from B matches 
      52the criterion, unknown values are assigned to the appended attributes. 
      53Similarly, B+A is constructed. 
    3754 
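
The matching described above can be sketched in a few lines of plain Python,
with dictionaries standing in for example tables and :code:`None` playing the
role of an unknown value (the spot IDs and the annotation are made up for the
illustration)::

    def merge(a_rows, b_rows, key, b_attrs):
        merged = []
        for a in a_rows:
            # first instance of B whose merging attribute matches, if any
            match = next((b for b in b_rows if b[key] == a[key]), None)
            row = dict(a)
            for attr in b_attrs:
                row[attr] = match[attr] if match is not None else None
            merged.append(row)
        return merged

    a_rows = [{"spot": "ST_Hs_001", "intensity": 1.2},
              {"spot": "ST_Cr_048", "intensity": 0.7}]
    b_rows = [{"spot": "ST_Hs_001", "gene": "BRCA1"}]
    print merge(a_rows, b_rows, "spot", ["gene"])
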
    3855.. image:: images/MergeData1.png 
     
    4259-------- 
    4360 
    44 Below is an example that loads spot intensity data from microarray measurements and spot annotation data. While microarray data consists of measurements of several spots representing equal DNA material (denoted by equal :obj:`Spot ID's`), the annotation data consists of a single line (instance) for each spot. 
     61Below is an example that loads spot intensity data from microarray 
     62measurements and spot annotation data. While microarray data consists of 
     63measurements of several spots representing equal DNA material (denoted by 
     64equal :obj:`Spot ID's`), the annotation data consists of a single line 
     65(instance) for each spot. 
    4566 
    46 Merging the two data sets results in annotations appended to each spot intensity datum. The :obj:`Spot intensities` data is connected to :obj:`Examples A` input of the :obj:`Merge Data` widget, and the :obj:`Spot annotations` data to the :obj:`Examples B` input. Both outputs of the :obj:`Merge Data` widget are then connected to the :obj:`Data Table` widget. In the latter, the :obj:`Merged Examples A+B` are shown. The attributes between :obj:`Spot ID` and :obj:`BG {Ref}`, including these two, are from the :obj:`Spot intensities` data set (:obj:`Examples A`), while the last three are from the :obj:`Spot annotations` data set (:obj:`Examples B`). Only instances representing non-control DNA (these with :obj:`Spot ID` equal to :obj:`ST_Hs_???`) received annotations, while for the others (:obj:`Spot ID = ST_Cr_048`), no annotation data exists in the :obj:`Spot annotations` data and unknown values were assigned to the appended attributes. 
     67Merging the two data sets results in annotations appended to each spot 
     68intensity datum. The :obj:`Spot intensities` data is connected to 
     69:obj:`Examples A` input of the :ref:`Merge Data` widget, and the 
     70:obj:`Spot annotations` data to the :obj:`Examples B` input. Both outputs 
     71of the :ref:`Merge Data` widget are then connected to the :ref:`Data Table` 
     72widget. In the latter, the :obj:`Merged Examples A+B` are shown. 
     73The attributes between :obj:`Spot ID` and :obj:`BG {Ref}`, including these 
     74two, are from the :obj:`Spot intensities` data set (:obj:`Examples A`), 
     75while the last three are from the :obj:`Spot annotations` data set 
      76(:obj:`Examples B`). Only instances representing non-control DNA (those 
      77with :obj:`Spot ID` equal to :obj:`ST_Hs_???`) received annotations, while 
     78for the others (:obj:`Spot ID = ST_Cr_048`), no annotation data exists in 
     79the :obj:`Spot annotations` data and unknown values were assigned to the 
     80appended attributes. 
    4781 
    4882.. image:: images/MergeData2s.png 
     
    5286---- 
    5387 
    54 If the two data sets consists of equally-named attributes (others than the ones used to perform the merging), Orange will by default check for consistency of the values of these attributes and report an error in case of non-matching values. In order to avoid the consistency checking, make sure that new attributes are created for each data set: you may use "... Always create a new attribute" option in the `File <File.htm>`_ widget for loading the data. 
      88If the two data sets consist of equally-named attributes (other than the 
      89ones used to perform the merging), Orange will by default check for 
     90consistency of the values of these attributes and report an error in 
     91case of non-matching values. In order to avoid the consistency checking, 
     92make sure that new attributes are created for each data set: you may use 
     93"... Always create a new attribute" option in the :ref:`File` widget for 
     94loading the data. 
  • docs/widgets/rst/data/outliers.rst

    r11050 r11359  
    1919Outputs: 
    2020   - Outliers (ExampleTable) 
    21       Attribute-valued data set containing only examples that are outliers. Meta attribute Z-score is added. 
     21      Attribute-valued data set containing only examples that are outliers. 
     22      Meta attribute Z-score is added. 
    2223   - Inliers (ExampleTable) 
    23       Attribute-valued data set containing only examples that are not outliers. Meta attribute Z-score is added. 
     24      Attribute-valued data set containing only examples that are not 
     25      outliers. Meta attribute Z-score is added. 
    2426   - Examples with Z-scores (ExampleTable) 
    25       Attribute-valued data set containing examples from input data with corresponding Z-scores as meta attribute. 
     27      Attribute-valued data set containing examples from input data with 
     28      corresponding Z-scores as meta attribute. 
    2629 
    2730 
     
    3235 
    3336Outliers widget first computes distances between each pair of examples in input 
    34 Examples. Average distance between example to its nearest examples is valued by a 
    35 Z-score. Z-scores higher than zero denote an example that is more distant to other examples  
    36 than average. Input can also be a distance matrix: in this case precalculated distances are used. 
      37Examples. The average distance from an example to its nearest examples is 
      38expressed as a Z-score. Z-scores higher than zero denote an example that is 
      39more distant from other examples than the average. The input can also be a 
      40distance matrix; in this case precalculated distances are used. 
    3741 
    38 Two parameters for Z-score calculation can be choosen: distance metrics and number of nearest examples to which 
    39 example's average distance is computed. Also, minimum Z-score to consider an example as outlier 
    40 can be set. Note, that higher the example's Z-score, more distant is the example from other examples. 
      42Two parameters for the Z-score calculation can be chosen: the distance metric and 
      43the number of nearest examples over which an example's average distance is computed. 
      44The minimum Z-score at which an example is considered an outlier can also be set. 
      45Note that the higher an example's Z-score, the more distant it is from other examples. 
    4146 
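
The described computation can be sketched in plain Python: average each
example's distance to its k nearest neighbours and standardize those averages
(an illustration of the idea only)::

    import math

    def z_scores(points, k=3):
        def dist(p, q):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        # average distance from each point to its k nearest other points
        averages = []
        for i, p in enumerate(points):
            nearest = sorted(dist(p, q) for j, q in enumerate(points) if j != i)[:k]
            averages.append(sum(nearest) / float(k))
        mean = sum(averages) / len(averages)
        std = math.sqrt(sum((a - mean) ** 2 for a in averages) / len(averages))
        return [(a - mean) / std for a in averages]

    points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
    print z_scores(points, k=2)   # the isolated last point gets a clearly positive Z-score
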
    4247Changes are applied automatically. 
     
    4954 
     5055Below is a simple example of how to use this widget. The input is fed 
    51 directly from the `File <File.htm>`_ widget, and the output Examples with Z-score 
    52 to the `Data Table <DataTable.htm>`_ widget. 
     56directly from the :ref:`File` widget, and the output Examples with Z-score 
     57to the :ref:`Data Table` widget. 
    5358 
    5459.. image:: images/Outliers-Example1.gif 
  • docs/widgets/rst/data/purgedomain.rst

    r11050 r11359  
    66.. image:: ../icons/PurgeDomain.png 
    77 
    8 Removes the unused attribute values and useless attributes, sorts values of the remaining. 
      8Removes unused attribute values and useless attributes, and sorts the values 
      9of the remaining ones. 
    910 
    1011Signals 
     
    2829----------- 
    2930 
    30 Definitions of nominal attributes sometimes contain values which don't appear in the data. Even if this does not happen in the original data, filtering the data, selecting examples subsets and similar can remove all examples for which the attribute has some particular value. Such values clutter data presentation, especially various visualizations, and should be removed. 
     31Definitions of nominal attributes sometimes contain values which don't appear 
     32in the data. Even if this does not happen in the original data, filtering the 
      33data, selecting example subsets and the like can remove all examples for which 
     34the attribute has some particular value. Such values clutter data presentation, 
     35especially various visualizations, and should be removed. 
    3136 
    32 After purging an attribute, it may become single-valued or, in extreme case, have no values at all (if the value of this attribute was undefined for all examples). In such cases, the attribute can be removed. 
      37After purging an attribute, it may become single-valued or, in the extreme case, 
     38have no values at all (if the value of this attribute was undefined for all 
     39examples). In such cases, the attribute can be removed. 
    3340 
    34 A different issue is the order of attribute values: if the data is read from a file in a format where the values are not declared in advance, they are sorted "in order of appearance". Sometimes we would prefer to have them sorted alphabetically. 
     41A different issue is the order of attribute values: if the data is read from a 
     42file in a format where the values are not declared in advance, they are sorted 
     43"in order of appearance". Sometimes we would prefer to have them sorted 
     44alphabetically. 
    3545 
    3646.. image:: images/PurgeDomain.png 
    3747 
    38 Such purification is done by widget Purge Domain. Ordinary attributes and class attributes are treated separately. For each, we can decide if we want the values sorted or not. Next, we may allow the widget to remove attributes with less than two values, or remove the class attribute if there are less than two classes. Finally, we can instruct the widget to check which values of attributes actually appear in the data and remove the unused values. The widget cannot remove values if it is not allowed to remove the attributes; since (potentially) having attributes without values makes no sense. 
      48Such purification is done by the Purge Domain widget. Ordinary attributes and 
      49class attributes are treated separately. For each, we can decide if we want the 
      50values sorted or not. Next, we may allow the widget to remove attributes with 
      51fewer than two values, or remove the class attribute if there are fewer than 
      52two classes. Finally, we can instruct the widget to check which values of 
      53attributes actually appear in the data and remove the unused values. The widget 
      54cannot remove values if it is not allowed to remove the attributes, since 
      55(potentially) having attributes without values makes no sense. 
    3956 
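
The purging itself is conceptually simple; a pure-Python sketch of the idea
(not the widget's code) might look like this::

    def purge(rows, attributes):
        # values that actually occur in the data, per attribute
        used = dict((a, sorted(set(r[a] for r in rows if r[a] is not None)))
                    for a in attributes)
        # keep only attributes that still have at least two values
        return dict((a, used[a]) for a in attributes if len(used[a]) >= 2)

    rows = [{"size": "small", "colour": "red"},
            {"size": "large", "colour": "red"}]
    print purge(rows, ["size", "colour"])   # {'size': ['large', 'small']}
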
    40 If :obj:`Send automatically` is checked, the widget will send data at each change of widget settings. Otherwise, sending the data needs to be explicitly initiated by clicking the :obj:`Send data` button. 
     57If :obj:`Send automatically` is checked, the widget will send data at each 
     58change of widget settings. Otherwise, sending the data needs to be explicitly 
     59initiated by clicking the :obj:`Send data` button. 
    4160 
    42 The new, reduced attributes get a prefix "R", which distinguishes them from the original ones. The values of new attributes can be computed from the old ones, but not the opposite. This means that if you construct a classifier from the new attributes, you can use it to classify the examples described by the original attributes. But not the opposite: constructing the classifier from old attributes and using it on examples described by the reduced ones won't work. Fortunately, the latter is seldom the case. In a typical setup, one would explore the data, visualize it, filter it, purify it... and then test the final model on the original data. 
     61The new, reduced attributes get a prefix "R", which distinguishes them from 
     62the original ones. The values of new attributes can be computed from the old 
     63ones, but not the opposite. This means that if you construct a classifier from 
     64the new attributes, you can use it to classify the examples described by the 
     65original attributes. But not the opposite: constructing the classifier from 
     66old attributes and using it on examples described by the reduced ones won't 
     67work. Fortunately, the latter is seldom the case. In a typical setup, one would 
     68explore the data, visualize it, filter it, purify it... and then test the final 
     69model on the original data. 
    4370 
    4471Examples 
    4572-------- 
    4673 
    47 Purge Domain would typically appear after data filtering, for instance when selecting a subset of visualized examples. 
     74Purge Domain would typically appear after data filtering, for instance when 
     75selecting a subset of visualized examples. 
    4876 
    4977.. image:: images/PurgeDomain-Schema.png 
    5078   :alt: Schema with Purge Domain 
    5179 
    52 In the above schema we play with the Zoo data set: we visualize it and select a portion of the data which contains only four out of the seven original classes. To get rid of the empty classes, we put the data through Purge Domain before going on in, say, Attribute Statistics widget. The latter shows only the four classes which actually appear. To see the effect of data purification, uncheck :obj:`Remove unused class values` and observe the effect this has on Attribute Statistics. 
     80In the above schema we play with the Zoo data set: we visualize it and select 
     81a portion of the data which contains only four out of the seven original 
      82classes. To get rid of the empty classes, we put the data through Purge Domain 
      83before passing it on to, say, the Attribute Statistics widget. The latter shows only 
     84the four classes which actually appear. To see the effect of data 
     85purification, uncheck :obj:`Remove unused class values` and observe the effect 
     86this has on Attribute Statistics. 
    5387 
    5488.. image:: images/PurgeDomain-Widgets.png 
  • docs/widgets/rst/data/rank.rst

    r11050 r11359  
    2525 
    2626   - ExampleTable Attributes (ExampleTable) 
    27       Data set in where each example corresponds to an attribute from the original set, and the attributes correspond one of the selected attribute evaluation measures. 
      27      Data set in which each example corresponds to an attribute from the 
      28      original set, and the attributes correspond to the selected 
      29      attribute evaluation measures. 
    2830 
    2931 
     
    3133----------- 
    3234 
    33 This widget computes a set of measures for evaluating the quality/usefulness of attributes: ReliefF, information gain, gain ratio and gini index. Besides providing this information, it also allows user to select a subset of attributes or it can automatically select the specified number of best-ranked attributes. 
     35This widget computes a set of measures for evaluating the quality/usefulness 
     36of attributes: ReliefF, information gain, gain ratio and gini index. 
      37Besides providing this information, it also allows the user to select a subset 
     38of attributes or it can automatically select the specified number of 
     39best-ranked attributes. 
    3440 
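
One of the listed measures, information gain, is easy to compute directly; the
following is a sketch of the measure itself, not of the widget::

    from collections import Counter
    from math import log

    def entropy(labels):
        total = float(len(labels))
        return -sum(n / total * log(n / total, 2)
                    for n in Counter(labels).values())

    def information_gain(values, labels):
        gain, total = entropy(labels), float(len(labels))
        for value in set(values):
            subset = [l for v, l in zip(values, labels) if v == value]
            gain -= len(subset) / total * entropy(subset)
        return gain

    values = ["overcast", "sunny", "sunny", "overcast"]
    labels = ["yes", "no", "no", "yes"]
    print information_gain(values, labels)   # 1.0 bit: the attribute fully determines the class
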
    3541.. image:: images/Rank.png 
    3642 
    37 The right-hand side of the widget presents the computed quality of the attributes. The first line shows the attribute name and the second the number of its values (or a "C", if the attribute is continuous. Remaining columns show different measures of quality. 
      43The right-hand side of the widget presents the computed quality of the 
      44attributes. The first column shows the attribute name and the second the 
      45number of its values (or a "C" if the attribute is continuous). The remaining 
      46columns show different measures of quality. 
    3847 
    39 The user is able to select the measures (s)he wants computed and presented. :obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours` taken into account and the number of randomly chosen reference :obj:`Examples`. The former should be higher if there is a lot of noise; the latter generally makes the computation less reliable if set too low, while higher values make it slow. 
     48The user is able to select the measures (s)he wants computed and presented. 
     49:obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours` 
     50taken into account and the number of randomly chosen reference :obj:`Examples`. 
     51The former should be higher if there is a lot of noise; the latter generally 
     52makes the computation less reliable if set too low, while higher values 
     53make it slow. 
    4054 
    41 The order in which the attributes are presented can be set either in the list below the measures or by clicking the table's column headers. Attributes can also be sorted by a measure not printed in the table. 
     55The order in which the attributes are presented can be set either in the 
     56list below the measures or by clicking the table's column headers. Attributes 
     57can also be sorted by a measure not printed in the table. 
    4258 
    43 Measures that cannot handle continuous attributes (impurity measures - information gain, gain ratio and gini index) are run on discretized attributes. For sake of simplicity we always split the continuous attributes in intervals with (approximately) equal number of examples, but the user can set the number of :obj:`Intervals`. 
     59Measures that cannot handle continuous attributes (impurity 
     60measures - information gain, gain ratio and gini index) are run on 
      61discretized attributes. For the sake of simplicity we always split the 
      62continuous attributes into intervals with (approximately) equal numbers of 
     63examples, but the user can set the number of :obj:`Intervals`. 
    4464 
    45 It is also possible to set the number of decimals (:obj:`No. of decimals`) in the print out. Using a number to high may exaggerate the accuracy of the computation; many decimals may only be useful when the computed numbers are really small. 
      65It is also possible to set the number of decimals 
      66(:obj:`No. of decimals`) in the printout. Using too high a number may 
      67exaggerate the accuracy of the computation; many decimals may only be 
      68useful when the computed numbers are really small. 
    4669 
    47 The widget outputs two example tables. The one, whose corresponding signal is named :code:`ExampleTable Attributes` looks pretty much like the one shown in the Rank widget, except that the second column is split into two columns, one giving the attribute type (D for discrete and C for continuous), and the other giving the number of distinct values if the attribute is discrete and undefined if it's continuous. 
      70The widget outputs two example tables. The one whose corresponding signal 
      71is named :code:`ExampleTable Attributes` looks pretty much like the one 
     72shown in the Rank widget, except that the second column is split into two 
     73columns, one giving the attribute type (D for discrete and C for continuous), 
     74and the other giving the number of distinct values if the attribute is 
     75discrete and undefined if it's continuous. 
    4876 
    49 The second, more interesting table has the same examples as the original, but with a subset of the attributes. To select/unselect attributes, click the corresponding rows in the table. This way, the widget can be used for manual selection of attributes. Something similar can also be done with a `Select Attributes <SelectAttributes.htm>`_ widget, except that the Rank widget can be used for selecting the attributes according to their quality, while Select Attributes offers more in terms of changing the order of attributes, picking another class attribute and similar. 
     77The second, more interesting table has the same examples as the original, 
     78but with a subset of the attributes. To select/unselect attributes, click 
     79the corresponding rows in the table. This way, the widget can be used for 
     80manual selection of attributes. Something similar can also be done with 
     81a :ref:`Select Attributes` widget, except that the Rank widget can be used 
     82for selecting the attributes according to their quality, while Select 
     83Attributes offers more in terms of changing the order of attributes, 
     84picking another class attribute and similar. 
    5085 
    51 The widget can also be used to automatically select a feature subset. If :obj:`Best ranked` is selected in box :obj:`Select attributes`, the widget will output a data set where examples are described by the specified number of best ranked attributes. The data set is changed whenever the order of attributes is changed for any reason (different measure is selected for sorting, ReliefF or discretization settings are changed...) 
     86The widget can also be used to automatically select a feature subset. 
     87If :obj:`Best ranked` is selected in box :obj:`Select Attributes`, the 
     88widget will output a data set where examples are described by the 
     89specified number of best ranked attributes. The data set is changed 
     90whenever the order of attributes is changed for any reason (different 
     91measure is selected for sorting, ReliefF or discretization settings are 
     92changed...) 
    5293 
    53 The first two options in :obj:`Select Attributes` box can be used to clear the selection (:obj:`None`) or to select all attributes (:obj:`All`). 
     94The first two options in :obj:`Select Attributes` box can be used to 
     95clear the selection (:obj:`None`) or to select all attributes (:obj:`All`). 
    5496 
    55 Button :obj:`Commit` sends the data set with the selected attributes. If :obj:`Send automatically` is set, the data set is committed on any change. 
     97Button :obj:`Commit` sends the data set with the selected attributes. 
     98If :obj:`Commit automatically` is set, the data set is committed on any change. 
    5699 
    57100 
     
    59102-------- 
    60103 
    61 On typical use of the widget is to put it immediately after the `File widget <File.htm>`_ to reduce the attribute set. The snapshot below shows this as a part of a bit more complicated schema. 
      104One typical use of the widget is to put it immediately after the :ref:`File` 
      105widget to reduce the attribute set. The snapshot below shows this as part of 
      106a somewhat more complicated schema. 
    62107 
    63108.. image:: images/Rank-after-file-Schema.png 
    64109 
    65 The examples in the file are put through `Data Sampler <DataSampler.htm>`_ which split the data set into two subsets: one, containing 70% of examples (signal :code:`Classified Examples`) will be used for training a `naive Bayesian classifier <../Classify/NaiveBayes.htm>`_, and the other 30% (signal :code:`Remaining Classified Examples`) for testing. Attribute subset selection based on information gain was performed on the training set only, and five most informative attributes were selected for learning. A data set with all other attributes removed (signal :code:`Reduced Example Table`) is fed into :code:`Test Learners`. Test Learners widgets also gets the :code:`Remaining Classified Examples` to use them as test examples (don't forget to set :code:`Test on Test Data` in that widget!). 
      110The examples in the file are put through :ref:`Data Sampler`, which splits the 
      111data set into two subsets: one, containing 70% of the examples (signal 
      112:code:`Classified Examples`), will be used for training a 
      113:ref:`Naive Bayes <Naive Bayes>` classifier, and the other 30% (signal 
      114:code:`Remaining Classified Examples`) for testing. Attribute subset selection 
      115based on information gain was performed on the training set only, and the five 
      116most informative attributes were selected for learning. A data set with all other 
      117attributes removed (signal :code:`Reduced Example Table`) is fed into 
      118:ref:`Test Learners`. The Test Learners widget also gets the 
      119:code:`Remaining Classified Examples` to use them as test examples (don't 
      120forget to set :code:`Test on Test Data` in that widget!). 
    66121 
    67 To verify how the subset selection affects the classifier's performance, we added another :code:`Test Learners`, but connected it to the :code:`Data Sampler` so that the two subsets emitted by the latter are used for training and testing without any feature subset selection. 
     122To verify how the subset selection affects the classifier's performance, we 
     123added another :ref:`Test Learners`, but connected it to the 
     124:code:`Data Sampler` so that the two subsets emitted by the latter are used 
     125for training and testing without any feature subset selection. 
    68126 
    69 Running this schema on the heart disease data set shows quite a considerable improvements in all respects on the reduced attribute subset. 
      127Running this schema on the heart disease data set shows quite a considerable 
      128improvement in all respects on the reduced attribute subset. 
    70129 
    71 In another, way simpler example, we connected a `Tree Viewer <../Classify/ClassificationTreeGraph.htm>`_ to the Rank widget to observe different attribute quality measures at different nodes. This can give us some picture about how important is the selection of measure in tree construction: the more the measures agree about attribute ranking, the less crucial is the measure selection. 
      130In another, much simpler example, we connected a 
      131:ref:`Classification Tree Viewer` to the Rank widget to observe different 
      132attribute quality measures at different nodes. This can give us some idea of 
      133how important the selection of the measure is in tree construction: the more 
      134the measures agree about attribute ranking, the less crucial the choice of 
      135measure is. 
    72136 
    73137.. image:: images/Rank-Tree.png 
    74138 
    75 A variation of the above is using the Rank widget after the `Interactive tree builder <../Classify/InteractiveTreeBuilder.htm>`_: the sorted attributes may help us in deciding the attribute to use at a certain node. 
     139A variation of the above is using the Rank widget after the 
     140:ref:`Interactive Tree Builder`: the sorted attributes may help us in deciding 
     141the attribute to use at a certain node. 
    76142 
    77143.. image:: images/Rank-ITree.png 
  • docs/widgets/rst/data/selectattributes.rst

    r11050 r11359  
    44================= 
    55 
    6 .. image:: ../../../../Orange/OrangeWidgets/Data/icons/SelectAttributes_48.png 
     6.. image:: images/SelectAttributes_icon.png 
    77   :alt: Select Attributes icon 
    8     
     8 
    99Signals 
    1010------- 
     
    1616Outputs: 
    1717  - Examples (ExampleTable) 
    18       Attribute-valued data set composed using the domain  
     18      Attribute-valued data set composed using the domain 
    1919      specification constructed using the widget. 
    20        
     20 
    2121Description 
    2222----------- 
    2323 
    24 Select Attributes widget is used to manually compose your data  
    25 domain, that is, to decide which attributes will be used and how.  
    26 Orange distinguishes between ordinary attributes, an (optional) class attributes  
    27 and meta attributes. For instance, for building a classification model, the  
    28 domain would be composed of a set of attributes and a discrete class attribute.  
    29 Meta attributes are not used in modelling, but several widgets can use them  
    30 for providing optional labels to instances. 
     24Select Attributes widget is used to manually compose your data 
     25domain, that is, to decide which attributes will be used and how. 
      26Orange distinguishes between ordinary attributes, an (optional) class 
      27attribute and meta attributes. For instance, for building a classification 
      28model, the domain would be composed of a set of attributes and a discrete class 
      29attribute. Meta attributes are not used in modelling, but several widgets can 
      30use them for providing optional labels to instances. 
    3131 
    32 Orange attributes are typed and are either discrete, continuous or  
    33 a character string. The attribute type is marked with a symbol appearing  
     32Orange attributes are typed and are either discrete, continuous or 
     33a character string. The attribute type is marked with a symbol appearing 
    3434before the name of the attribute (D, C, S, respectively). 
    3535 
    36 Changes made to the domain are propagated to the output by pressing an  
    37 Apply button. Reset will present the attributes as defined in original  
      36Changes made to the domain are propagated to the output by pressing the 
      37Apply button. Reset will present the attributes as defined in the original 
     3838domain of the data set from the input signal. 
    3939 
    4040.. image:: images/SelectAttributes.png 
    4141   :alt: Select Attributes widget example 
    42     
     42 
    4343Examples 
    4444-------- 
    45 Below is a simple example how to use this widget. The input is fed directly from  
    46 the :ref:`File` widget, and the output to the :ref:`Data Table` widget. We have also linked  
    47 the former to the File widget so that one can inspect the difference in the  
    48 domain composition. 
    4945 
    50 .. image:: images/SelectAttributes_schema.* 
      46Below is a simple example of how to use this widget. The input is fed directly 
      47from the :ref:`File` widget, and the output to the :ref:`Data Table` widget. 
     48We have also linked the former to the File widget so that one can inspect the 
     49difference in the domain composition. 
     50 
     51.. image:: images/SelectAttributes_schema.png 
    5152   :alt: Select Attributes schema 
  • docs/widgets/rst/data/selectdata.rst

    r11050 r11359  
    1818Outputs: 
    1919   - Matching Examples (ExampleTable) 
    20       Attribute-valued data set composed from instances from input data set that match user-defined condition. 
     20      Attribute-valued data set composed from instances from input data set 
     21      that match user-defined condition. 
    2122   - Non-Matching Examples (ExampleTable) 
    22       Data instances from input data set that do not match the user-defined condition. 
     23      Data instances from input data set that do not match the user-defined 
     24      condition. 
    2325 
    2426 
  • docs/widgets/rst/evaluate/calibrationplot.rst

    r11050 r11359  
    66.. image:: ../icons/CalibrationPlot.png 
    77 
    8 Shows the match between the classifiers' probability predictions and actual class probabilities. 
     8Shows the match between the classifiers' probability predictions and actual 
     9class probabilities. 
    910 
    1011Signals 
     
    2223----------- 
    2324 
    24 Calibration plot plots the class probabilities against those predicted by the classifier(s). 
      25The Calibration Plot widget plots the actual class probabilities against those 
      26predicted by the classifier(s). 
    2527 
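
The computation behind such a curve can be sketched as follows: group examples
by the predicted probability and, within each group, measure how often the
target class actually occurs (a sketch, not the widget's code)::

    def calibration_curve(predicted, actual, bins=10):
        """predicted: probabilities of the target class; actual: 1/0 labels."""
        points = []
        for b in range(bins):
            lo, hi = b / float(bins), (b + 1) / float(bins)
            group = [a for p, a in zip(predicted, actual)
                     if lo <= p < hi or (b == bins - 1 and p == 1.0)]
            if group:
                # (bin midpoint of predicted probability, observed class probability)
                points.append(((lo + hi) / 2, sum(group) / float(len(group))))
        return points

    print calibration_curve([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], bins=2)
    # [(0.25, 0.0), (0.75, 1.0)]
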
    2628.. image:: images/CalibrationPlot.png 
    2729 
    28 Option :obj:`Target class` chooses the positive class. In case there are more than two classes, the widget considers all other classes as a single, negative class. If the test results contain more than one classifier, the user can choose which curves she or he wants to see plotted. 
     30Option :obj:`Target class` chooses the positive class. In case there are more 
     31than two classes, the widget considers all other classes as a single, negative 
     32class. If the test results contain more than one classifier, the user can 
     33choose which curves she or he wants to see plotted. 
    2934 
    30 The diagonal represents the optimal behaviour; the close the classifier gets, the more accurate its predictions. 
      35The diagonal represents the optimal behaviour; the closer the classifier gets 
      36to it, the more accurate its predictions. 
    3137 
    32 If :obj:`Show rug` is enable, ticks at the bottom and the top of the graph represents negative and positive examples (respectively). Their position corresponds to classifier's probability prediction and the color shows the classifier. On the bottom of the graph, the points to the left are those which are (correctly) assigned a low probability of the target class, and those to the right are incorrectly assigned high probabilities. On the top of the graph, the instances to the right are correctly assigned hight probabilities and vice versa. 
      38If :obj:`Show rug` is enabled, ticks at the bottom and the top of the graph 
      39represent negative and positive examples (respectively). Their position 
      40corresponds to the classifier's probability prediction and the color shows the 
      41classifier. On the bottom of the graph, the points to the left are those 
      42which are (correctly) assigned a low probability of the target class, and 
      43those to the right are incorrectly assigned high probabilities. On the top 
      44of the graph, the instances to the right are correctly assigned high 
      45probabilities and vice versa. 
    3346 
    3447Example 
    3548------- 
    3649 
    37 At the moment, the only widget which give the right type of the signal needed by the Calibration Plot is `Test Learners <TestLearners.htm>`_. The Lift Curve will hence always follow Test Learners and, since it has no outputs, no other widgets follow it. Here is a typical example. 
      50At the moment, the only widget which gives the right type of signal 
      51needed by the Calibration Plot is :ref:`Test Learners`. The Calibration Plot 
     52will hence always follow Test Learners and, since it has no outputs, no other 
     53widgets follow it. Here is a typical example. 
    3854 
    3955.. image:: images/ROCLiftCalibration-Schema.png 
  • docs/widgets/rst/evaluate/confusionmatrix.rst

    r11050 r11359  
    1313Inputs: 
    1414   - Evaluation results (orngTest.ExperimentResults) 
    15       Results of testing the algorithms; typically from `Test Learners <TestLearners.htm>`_ 
     15      Results of testing the algorithms; typically from :ref:`Test Learners` 
    1616 
    1717 
     
    2424----------- 
    2525 
    26 Confusion Matrix gives the number/proportion of examples from one class classified in to another (or same) class. Besides that, selecting elements of the matrix feeds the corresponding examples onto the output signal. This way, one can observe which specific examples were misclassified in a certain way. 
      26Confusion Matrix gives the number/proportion of examples from one class 
      27classified into another (or the same) class. Besides that, selecting elements 
      28of the matrix feeds the corresponding examples onto the output signal. This 
      29way, one can observe which specific examples were misclassified in a certain 
      30way. 
    2731 
    28 The widget usually gets the evaluation results from `Test Learners <TestLearners.htm>`_; an example of the schema is shown below. 
     32The widget usually gets the evaluation results from :ref:`Test Learners`; 
     33an example of the schema is shown below. 
    2934 
    3035.. image:: images/ConfusionMatrix.png 
    3136 
    32 The widget on the snapshot shows the confusion matrix for classification tree and naive Bayesian classifier trained and tested on the Iris data. The righthand side of the widget contains the matrix for naive Bayesian classifier (since this classifier is selected on the left). Each row corresponds to a correct class, and columns represent the predicted classes. For instance, seven examples of Iris-versicolor were misclassified as Iris-virginica. The rightmost column gives the number of examples from each class (there are 50 irises of each of the three classes) and the bottom row gives the number of examples classified into each class (e.g., 52 instances were classified into virginica). 
     37The widget on the snapshot shows the confusion matrix for classification 
     38tree and naive Bayesian classifier trained and tested on the Iris data. 
      39The right-hand side of the widget contains the matrix for the naive Bayesian 
     40classifier (since this classifier is selected on the left). Each row 
     41corresponds to a correct class, and columns represent the predicted classes. 
     42For instance, seven examples of Iris-versicolor were misclassified as 
     43Iris-virginica. The rightmost column gives the number of examples from 
     44each class (there are 50 irises of each of the three classes) and the bottom 
     45row gives the number of examples classified into each class (e.g., 52 
     46instances were classified into virginica). 
    3347 
    34 When the evaluation results contain data on multiple learning algorithms, we have to choose one in in box :obj:`Learners`. 
     48When the evaluation results contain data on multiple learning algorithms, 
      49we have to choose one in the :obj:`Learners` box. 
    3550 
    3651.. image:: images/ConfusionMatrix-Schema.png 
    3752 
    38 In :obj:`Show` we select what data we would like to see in the matrix. In the above example, we are observing the :obj:`Number of examples`. The alternatives are :obj:`Proportions of predicted` and :obj:`Proportions of true` classes. In the iris example, "proportions of predicted" shows how many of examples classified as, say, Iris-versicolor are in which true class; in the table we can read the 0% of them are actually setosae, 89.6% of those classified as versicolor are versicolors, and 10.4% are virginicae. 
     53In :obj:`Show` we select what data we would like to see in the matrix. 
     54In the above example, we are observing the :obj:`Number of examples`. 
     55The alternatives are :obj:`Proportions of predicted` and 
     56:obj:`Proportions of true` classes. In the iris example, "proportions of 
      57predicted" shows how many of the examples classified as, say, Iris-versicolor 
      58belong to each true class; in the table we can read that 0% of them are 
     59actually setosae, 89.6% of those classified as versicolor are versicolors, 
     60and 10.4% are virginicae. 
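
How the counts and the two kinds of proportions relate can be sketched in a
few lines of Python. The (true, predicted) pairs below are made up for the
illustration; this is not the widget's implementation::

    from collections import Counter

    classes = ["setosa", "versicolor", "virginica"]
    # hypothetical (true, predicted) pairs; in the widget they come from Test Learners
    pairs = ([("setosa", "setosa")] * 50 + [("versicolor", "versicolor")] * 43 +
             [("versicolor", "virginica")] * 7 + [("virginica", "virginica")] * 45 +
             [("virginica", "versicolor")] * 5)

    counts = Counter(pairs)
    matrix = [[counts[(t, p)] for p in classes] for t in classes]  # rows: true, columns: predicted
    row_sums = [sum(row) for row in matrix]                        # rightmost column of the widget
    col_sums = [sum(col) for col in zip(*matrix)]                  # bottom row of the widget

    # "Proportions of true": each row divided by its sum
    prop_true = [[value / float(row_sums[i]) for value in row]
                 for i, row in enumerate(matrix)]
    # "Proportions of predicted": each column divided by its sum
    prop_predicted = [[matrix[i][j] / float(col_sums[j]) for j in range(len(classes))]
                      for i in range(len(classes))]
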
    3961 
    4062.. image:: images/ConfusionMatrix-propTrue.png 
    4163 
    42 Proportions of predicted shows the opposite relation: of all true versicolors, 86% were classified as versicolors and 14% as virginicae. 
      64Proportions of true shows the opposite relation: of all true versicolors, 
     6586% were classified as versicolors and 14% as virginicae. 
    4366 
    44 Button :obj:`Correct` sends all correctly classified examples to the output by selecting the diagonal of the matrix. :obj:`Misclassified` selects the misclassified examples. :obj:`None` annulates the selection. As mentioned before, one can also select individual cells of the table, to select specific kinds of misclassified examples, e.g. the versicolors classified as virginicae. 
     67Button :obj:`Correct` sends all correctly classified examples to the output 
     68by selecting the diagonal of the matrix. :obj:`Misclassified` selects the 
      69misclassified examples. :obj:`None` annuls the selection. As mentioned 
     70before, one can also select individual cells of the table, to select specific 
     71kinds of misclassified examples, e.g. the versicolors classified as virginicae. 
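
The three buttons amount to simple filters over the (true, predicted) pairs;
a tiny sketch with hypothetical labels, not the widget's code::

    pairs = [("versicolor", "virginica"), ("setosa", "setosa"),
             ("virginica", "virginica"), ("versicolor", "versicolor"),
             ("virginica", "versicolor")]

    correct = [pair for pair in pairs if pair[0] == pair[1]]        # the diagonal ("Correct")
    misclassified = [pair for pair in pairs if pair[0] != pair[1]]  # off-diagonal ("Misclassified")
    # a single cell, e.g. the versicolors classified as virginicae
    cell = [pair for pair in pairs if pair == ("versicolor", "virginica")]
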
    4572 
    46 When sending the selecting examples the widget can add new attributes telling the predicted classes or their probabilities, if the corresponding options :obj:`Append class prediction` and/or :obj:`Append predicted class probabilities` are checked. 
      73When sending the selected examples, the widget can add new attributes with 
     74the predicted classes or their probabilities, if the corresponding options 
     75:obj:`Append class prediction` and/or 
     76:obj:`Append predicted class probabilities` are checked. 
    4777 
    48 The widget updates the output at every change if :obj:`Commit automatically` is checked. If not, the user will need to press :obj:`Commit` to commit the changes. 
     78The widget updates the output at every change if :obj:`Commit automatically` 
     79is checked. If not, the user will need to press :obj:`Commit` to commit the 
     80changes. 
    4981 
    5082Example 
     
    5587.. image:: images/ConfusionMatrix-Schema.png 
    5688 
    57 `Test Learners <TestLearners.htm>`_ gets data from `File <../Data/File.htm>`_ and two learning algorithms from `Naive Bayes <../Classify/NaiveBayes.htm>`_ and `Classification Tree <../Classify/ClassificationTree.htm>`_. It performs cross-validation or some other train-and-test procedures to get class predictions by both algorithms for all (or some, depending on the procedure) examples from the data. The test results are fed into the confusion matrix, where we can observe how many examples were misclassified in which way. 
     89:ref:`Test Learners` gets data from :ref:`File` and two learning algorithms 
     90from :ref:`Naive Bayes` and :ref:`Classification Tree`. It performs 
     91cross-validation or some other train-and-test procedures to get 
     92class predictions by both algorithms for all (or some, depending on the 
     93procedure) examples from the data. The test results are fed into the confusion 
     94matrix, where we can observe how many examples were misclassified in which way. 
    5895 
    59 On the output we connected two other widgets. `Data Table <../Data/DataTable.htm>`_ will show the examples we select in the Confusion matrix. If we, for instance, click :obj:`Misclassified` the table will contain all examples which were misclassified by the selected method. 
     96On the output we connected two other widgets. :ref:`Data Table` will show 
     97the examples we select in the Confusion matrix. If we, for instance, 
     98click :obj:`Misclassified` the table will contain all examples which were 
     99misclassified by the selected method. 
    60100 
    61 `Scatter Plot <../Visualize/ScatterPlot.htm>`_ gets two set of examples. From the file widget, it gets the complete data and the confusion matrix will send only the selected data, for instance the misclassified examples. The scatter plot will show all the data, with the symbols representing the selected data filled and the other symbols hollow. 
      101:ref:`Scatter Plot` gets two sets of examples. From the file widget, it gets 
     102the complete data and the confusion matrix will send only the selected data, 
     103for instance the misclassified examples. The scatter plot will show all the 
     104data, with the symbols representing the selected data filled and the other 
     105symbols hollow. 
    62106 
    63 For a nice example, we can load the iris data set and observe the position of misclassified examples in the scatter plot with attributes petal length and petal width used for x and y axes. As expected, the misclassified examples lie on the boundary between the two classes. 
     107For a nice example, we can load the iris data set and observe the position 
     108of misclassified examples in the scatter plot with attributes petal 
     109length and petal width used for x and y axes. As expected, the misclassified 
     110examples lie on the boundary between the two classes. 
    64111 
    65112.. image:: images/ConfusionMatrix-Example.png 
  • docs/widgets/rst/evaluate/liftcurve.rst

    r11050 r11359  
    2222----------- 
    2323 
    24 Lift curves show the relation between the number of instances which were predicted positive and those of them that are indeed positive. This type of curve is often used in segmenting the population, e.g., plotting the number of responding customers against the number of all customers contacted. Given the costs of false positives and false negatives, it can also determine the optimal classifier and threshold. 
     24Lift curves show the relation between the number of instances which were 
     25predicted positive and those of them that are indeed positive. This type of 
     26curve is often used in segmenting the population, e.g., plotting the number 
     27of responding customers against the number of all customers contacted. Given 
     28the costs of false positives and false negatives, it can also determine the 
     29optimal classifier and threshold. 
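
The curve itself is simple to compute once every instance has a predicted
probability of the target class; a rough sketch with a made-up helper
function, not the widget's code::

    def lift_points(probabilities, is_positive):
        """Number of true positives among the top-k instances, for every k,
        after ranking the instances by predicted probability."""
        ranked = sorted(zip(probabilities, is_positive), reverse=True)
        points, positives = [], 0
        for contacted, (_, positive) in enumerate(ranked, 1):
            positives += positive
            points.append((contacted, positives))
        return points

    # lift_points([0.9, 0.8, 0.3, 0.7], [1, 0, 1, 1]) -> [(1, 1), (2, 1), (3, 2), (4, 3)]
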
    2530 
    2631.. image:: images/LiftCurve.png 
    2732 
    28 Option :obj:`Target class` chooses the positive class. In case there are more than two classes, the widget considers all other classes as a single, negative class. 
     33Option :obj:`Target class` chooses the positive class. In case there are 
     34more than two classes, the widget considers all other classes as a single, 
     35negative class. 
    2936 
    30 If the test results contain more than one classifier, the user can choose which curves she or he wants to see plotted. :obj:`Show convex lift hull` plots a convex hull over lift curves for all classifiers. The curve thus shows the optimal classifier (or combination thereof) for each desired TP/P rate. The diagonal line represents the behaviour of a random classifier. 
     37If the test results contain more than one classifier, the user can choose 
     38which curves she or he wants to see plotted. :obj:`Show convex lift hull` 
     39plots a convex hull over lift curves for all classifiers. The curve thus 
     40shows the optimal classifier (or combination thereof) for each desired TP/P 
     41rate. The diagonal line represents the behaviour of a random classifier. 
    3142 
    32 The user can specify the cost of false positives and false negatives, and the prior target class probability. :obj:`Compute from Data` sets it to the proportion of examples of this class in the data. The black line in the graph, which corresponds to the right-hand axis, gives the total cost for each P ration for the optimal classifier among those selected in the list box on the left. The minimum is labelled by the optimal classifier at that point and the related cost. 
     43The user can specify the cost of false positives and false negatives, and 
     44the prior target class probability. :obj:`Compute from Data` sets it to the 
     45proportion of examples of this class in the data. The black line in the 
     46graph, which corresponds to the right-hand axis, gives the total cost for 
      47each P rate for the optimal classifier among those selected in the list 
     48box on the left. The minimum is labelled by the optimal classifier at that 
     49point and the related cost. 
    3350 
    34 The widget allows setting costs from 1 to 1000. The units are not important, as are not the magnitudes. What matters is the relation between the two costs, so setting them to 100 and 200 will give the same result as 400 and 800. 
      51The widget allows setting costs from 1 to 1000. Neither the units nor the 
      52magnitudes matter; what matters is the ratio between the two 
     53costs, so setting them to 100 and 200 will give the same result as 400 and 800. 
    3554 
    3655Example 
    3756------- 
    3857 
    39 At the moment, the only widget which give the right type of the signal needed by the Lift Curve is `Test Learners <TestLearners.htm>`_. The Lift Curve will hence always follow Test Learners and, since it has no outputs, no other widgets follow it. Here is a typical example. 
      58At the moment, the only widget which gives the right type of signal 
     59needed by the Lift Curve is :ref:`Test Learners`. The Lift Curve will hence 
     60always follow Test Learners and, since it has no outputs, no other widgets 
     61follow it. Here is a typical example. 
    4062 
    4163.. image:: images/ROCLiftCalibration-Schema.png 
  • docs/widgets/rst/evaluate/predictions.rst

    r11050 r11359  
    1515      Data to be classified 
    1616   - Predictors (orange.Classifier) 
    17       Predictors to be used on the data; multiple widget can connect to this slot 
      17      Predictors to be used on the data; multiple widgets can connect to this 
     18      slot 
    1819 
    1920Outputs: 
     
    2526----------- 
    2627 
    27 The widget gets a data set and one or more predictors (classifiers, not learning algorithms - see the example below). It shows a table with the data and the predictions made. 
     28The widget gets a data set and one or more predictors (classifiers, not 
     29learning algorithms - see the example below). It shows a table with the 
     30data and the predictions made. 
    2831 
    29 Despite its simplicity, the widget allows for quite interesting analysis of decisions of predictive models; there is a simple demonstration at the end of the page. Note also, however, the related widget `Confusion Matrix <ConfusionMatrix.htm>`_. Although many things can be done with any of them, there are tasks for which one of them might be much more convenient than the other. 
     32Despite its simplicity, the widget allows for quite interesting analysis of 
     33decisions of predictive models; there is a simple demonstration at the end of 
     34the page. Note also, however, the related widget :ref:`Confusion Matrix`. 
     35Although many things can be done with any of them, there are tasks for which 
     36one of them might be much more convenient than the other. 
    3037 
    3138.. image:: images/Predictions.png 
    3239 
    33 The widget can show class predictions (:obj:`Show predicted class`) and predicted probabilities for the selected classes (:obj:`Show predicted probabilities`, the classes are selected below). 
     40The widget can show class predictions (:obj:`Show predicted class`) and 
     41predicted probabilities for the selected classes 
     42(:obj:`Show predicted probabilities`, the classes are selected below). 
    3443 
    35 By default, the widget also shows the attributes. This can be disabled by checking :obj:`Hide all` under :obj:`Data Attributes`. 
     44By default, the widget also shows the attributes. This can be disabled by 
     45checking :obj:`Hide all` under :obj:`Data Attributes`. 
    3646 
    37 The output of the widget is another data table, where predictions are appended as new meta attributes. The table is output either automatically (:obj:`Send automatically`) or upon clicking :obj:`Send Predictions`. 
     47The output of the widget is another data table, where predictions are 
     48appended as new meta attributes. The table is output either automatically 
     49(:obj:`Send automatically`) or upon clicking :obj:`Send Predictions`. 
    3850 
    3951 
     
    4557.. image:: images/Predictions-Schema.png 
    4658 
    47 First, compare the schema with the one for `Test Learners <TestLearners.htm>`_. Widgets representing learning algorithms, like `Naive Bayesian classifier <../Classify/NaiveBayes.htm>`_ or `Classification tree <ClassificationTree.htm>`_ provide two kinds of signals, one with a learning algorithm and one with a classifier, that is, a result of the learning algorithm when it is given some data. The learner is available always, while for outputting a classifier, the widget representing a learning algorithm needs to be given some data. 
     59First, compare the schema with the one for :ref:`Test Learners`. Widgets 
     60representing learning algorithms, like :ref:`Naive Bayes` or 
      61:ref:`Classification Tree` provide two kinds of signals, one with a learning 
     62algorithm and one with a classifier, that is, a result of the learning 
      63algorithm when it is given some data. The learner is always available, while 
     64for outputting a classifier, the widget representing a learning algorithm needs 
     65to be given some data. 
    4866 
    49 Test Learners tests learning algorithms, hence it expects learning algorithms on the input. In the corresponding schema, we gave the Test Learners some data from the File widget and a few "learner widgets". Widget Predictions shows predictions of a classifier, hence it needs a classifier and data. To get a classifier from the learner widget, the widget needs to be given data. 
     67Test Learners tests learning algorithms, hence it expects learning algorithms 
     68on the input. In the corresponding schema, we gave the Test Learners some data 
     69from the File widget and a few "learner widgets". Widget Predictions shows 
     70predictions of a classifier, hence it needs a classifier and data. To get 
     71a classifier from the learner widget, the widget needs to be given data. 
    5072 
    51 This is hence what we do: we randomly split the data into two subsets. The larger, containing 70 % of data instances, is given to Naive Bayes and Classification tree, so they can produce the corresponding classifiers. The classifiers go into Predictions, among with the remaining 30 % of the data. Predictions shows how these examples are classified. 
     73This is hence what we do: we randomly split the data into two subsets. The 
     74larger, containing 70 % of data instances, is given to Naive Bayes and 
     75Classification tree, so they can produce the corresponding classifiers. 
      76The classifiers go into Predictions, along with the remaining 30 % of the 
     77data. Predictions shows how these examples are classified. 
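
The learner/classifier distinction and the 70:30 split can be mimicked in a
few lines of plain Python. The "learner" below is a toy majority predictor and
the data is made up, so this only illustrates the flow, not the actual widgets::

    import random

    def majority_learner(examples):
        """A toy 'learner': given training examples it returns a 'classifier',
        here simply a function that always predicts the most common class."""
        classes = [cls for _, cls in examples]
        majority = max(set(classes), key=classes.count)
        return lambda attributes: majority          # the trained "classifier"

    examples = [((5.1, 3.5), "no"), ((6.2, 2.9), "yes"),
                ((5.9, 3.0), "yes"), ((4.7, 3.2), "no")]
    random.shuffle(examples)
    cut = int(0.7 * len(examples))
    train, test = examples[:cut], examples[cut:]    # 70 % for learning, 30 % for predicting

    classifier = majority_learner(train)            # only now does a classifier exist
    predictions = [classifier(attrs) for attrs, _ in test]
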
    5278 
    53 The results of this procedure on the heart disease data are shown in the snapshot at beginning of the page. The last three columns are the actual class, and the predictions by the classification tree and naive Bayesian classifier. For the latter two we see the probability of class "1" (since this is the class chosen on the left-hand side of the widget) and the predicted class. 
     79The results of this procedure on the heart disease data are shown in the 
      80snapshot at the beginning of the page. The last three columns are the actual 
     81class, and the predictions by the classification tree and naive Bayesian 
     82classifier. For the latter two we see the probability of class "1" (since 
     83this is the class chosen on the left-hand side of the widget) and the predicted 
     84class. 
    5485 
    55 The schema also shows a few things we can do with the data from the widget. First, we can observe it in a `Data Table <../Data/DataTable.htm>`_. It shows a similar view to the one in Predictions, except that the probabilities are shown as separate attributes, so we can sort the examples by them and so on. 
     86The schema also shows a few things we can do with the data from the widget. 
     87First, we can observe it in a :ref:`Data Table`. It shows a similar view to 
     88the one in Predictions, except that the probabilities are shown as separate 
     89attributes, so we can sort the examples by them and so on. 
    5690 
    57 To save the predictions, we simply attach the `Save <../Data/Save.htm>`_ widget to Predictions. 
     91To save the predictions, we simply attach the :ref:`Save` widget to Predictions. 
    5892 
    59 Finally, we can analyze the classifier's predictions. For instance, we want to observe the relations between probabilities predicted by the two classifiers with respect to the class. For that, we first take `Select Attributes <../Data/SelectAttributes.htm>`_ with which we move the meta attributes with probability predictions to ordinary attributes. The transformed data is then given to the `Scatter plot <../Visualize/ScatterPlot.htm>`_, which we set to use the attributes with probabilities for the x and y axes, and the class is (as already by default) used to color the data points. 
     93Finally, we can analyze the classifier's predictions. For instance, we want 
     94to observe the relations between probabilities predicted by the two classifiers 
     95with respect to the class. For that, we first take :ref:`Select Attributes` 
     96with which we move the meta attributes with probability predictions to ordinary 
      97attributes. The transformed data is then given to the :ref:`Scatter Plot`, 
     98which we set to use the attributes with probabilities for the x and y axes, and 
     99the class is (as already by default) used to color the data points. 
    60100 
    61101.. image:: images/Predictions-ExampleScatterplot.png 
    62102 
    63 To get the above plot, we added 5% jittering to continuous attributes, since the classification tree gives just a few distinct probabilities, hence without jittering there would be too much overlap between the points. 
     103To get the above plot, we added 5% jittering to continuous attributes, since 
      104the classification tree gives just a few distinct probabilities; without 
     105jittering there would be too much overlap between the points. 
    64106 
    65 The blue points at the bottom left represent the people with no diameter narrowing, which were correctly classified by both classifiers. The upper left red points represent the patients with narrowed vessels, which were correctly classified by both. At the top left there are a few blue points: these are those without narrowed vessels to whom the tree gave a high probability of having the disease, while Bayesian classifier was right by predicting a low probability of the disease. In the opposite corner, we can spot red points, that is, the sick, to which the tree gave a low probability, while the naive Bayesian classifier was (again) right by assigning a high probability of having the disease. 
     107The blue points at the bottom left represent the people with no diameter 
     108narrowing, which were correctly classified by both classifiers. The upper left 
     109red points represent the patients with narrowed vessels, which were correctly 
     110classified by both. At the top left there are a few blue points: these are 
     111those without narrowed vessels to whom the tree gave a high probability of 
      112having the disease, while the Bayesian classifier was right by predicting a low 
     113probability of the disease. In the opposite corner, we can spot red points, 
     114that is, the sick, to which the tree gave a low probability, while the naive 
     115Bayesian classifier was (again) right by assigning a high probability of having 
     116the disease. 
    66117 
    67 Note that this analysis is done on a rather small sample, so these conclusions may be ungrounded. 
     118Note that this analysis is done on a rather small sample, so these 
     119conclusions may be ungrounded. 
    68120 
    69 Another example of using this widget is given in the documentation for widget `Confusion Matrix <ConfusionMatrix.htm>`_. 
     121Another example of using this widget is given in the documentation for 
     122widget :ref:`Confusion Matrix`. 
  • docs/widgets/rst/evaluate/rocanalysis.rst

    r11050 r11359  
    2525----------- 
    2626 
    27 The widget show ROC curves for the tested models and the corresponding convex hull. Given the costs of false positives and false negatives, it can also determine the optimal classifier and threshold. 
      27The widget shows ROC curves for the tested models and the corresponding convex 
     28hull. Given the costs of false positives and false negatives, it can also 
     29determine the optimal classifier and threshold. 
    2830 
    2931.. image:: images/ROCAnalysis.png 
    3032 
    31 Option :obj:`Target class` chooses the positive class. In case there are more than two classes, the widget considers all other classes as a single, negative class. 
     33Option :obj:`Target class` chooses the positive class. In case there are 
     34more than two classes, the widget considers all other classes as a single, 
     35negative class. 
    3236 
    33 If the test results contain more than one classifier, the user can choose which curves she or he wants to see plotted. 
     37If the test results contain more than one classifier, the user can choose 
     38which curves she or he wants to see plotted. 
    3439 
    3540.. image:: images/ROCAnalysis-Convex.png 
    3641 
    37 Option :obj:`Show convex curves` refers to convex curves over each individual classifier (the thin lines on the cutout on the left). :obj:`Show convex hull` plots a convex hull over ROC curves for all classifiers (the thick yellow line). Plotting both types of convex curves them makes sense since selecting a threshold in a concave part of the curve cannot yield optimal results, disregarding the cost matrix. Besides, it is possible to reach any point on the convex curve by combining the classifiers represented by the points at the border of the concave region. 
     42Option :obj:`Show convex curves` refers to convex curves over each individual 
     43classifier (the thin lines on the cutout on the left). :obj:`Show convex hull` 
     44plots a convex hull over ROC curves for all classifiers (the thick yellow 
      45line). Plotting both types of convex curves makes sense since selecting a 
     46threshold in a concave part of the curve cannot yield optimal results, 
      47regardless of the cost matrix. Besides, it is possible to reach any point 
     48on the convex curve by combining the classifiers represented by the points 
     49at the border of the concave region. 
    3850 
    3951The diagonal line represents the behaviour of a random classifier. 
    4052 
    41 When the data comes from multiple iterations of training and testing, such as k-fold cross validation, the results can be (and usually are) averaged. The averaging options are: 
     53When the data comes from multiple iterations of training and testing, such 
     54as k-fold cross validation, the results can be (and usually are) averaged. 
     55The averaging options are: 
    4256 
    43    - :obj:`Merge (expected ROC perf.)` treats all the test data as if it came from a single iteration 
    44    - :obj:`Vertical` averages the curves vertically, showing the corresponding confidence intervals 
    45    - :obj:`Threshold` traverses over threshold, averages the curves positions at them and shows horizontal and vertical confidence intervals 
     57   - :obj:`Merge (expected ROC perf.)` treats all the test data as if it 
     58     came from a single iteration 
     59   - :obj:`Vertical` averages the curves vertically, showing the corresponding 
     60     confidence intervals 
      61   - :obj:`Threshold` traverses over thresholds, averages the curve positions 
     62     at them and shows horizontal and vertical confidence intervals 
    4663   - :obj:`None` does not average but prints all the curves instead 
    4764 
     
    5673.. image:: images/ROCAnalysis-Analysis.png 
    5774 
    58 The second sheet of settings is dedicated to analysis of the curve. The user can specify the cost of false positives and false negatives, and the prior target class probability. :obj:`Compute from Data` sets it to the proportion of examples of this class in the data. 
     75The second sheet of settings is dedicated to analysis of the curve. The user 
     76can specify the cost of false positives and false negatives, and the prior 
     77target class probability. :obj:`Compute from Data` sets it to the proportion 
     78of examples of this class in the data. 
    5979 
    60 Iso-performance line is a line in the ROC space such that all points on the line give the same profit/loss. The line to the upper left are better those down and right. The direction of the line depends upon the above costs and probabilities. Put together, this gives a recipe for depicting the optimal threshold for the given costs: it is the point where the tangent with the given inclination touches the curve. If we go higher or more to the left, the points on the isoperformance line cannot be reached by the learner. Going down or to the right, decreases the performance. 
      80An iso-performance line is a line in the ROC space such that all points on the 
      81line give the same profit/loss. Lines towards the upper left are better than those 
      82towards the bottom right. The direction of the line depends upon the above costs and 
     83probabilities. Put together, this gives a recipe for depicting the optimal 
     84threshold for the given costs: it is the point where the tangent with the 
     85given inclination touches the curve. If we go higher or more to the left, 
      86the points on the iso-performance line cannot be reached by the learner. 
      87Going down or to the right decreases the performance. 
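
In the usual formulation the slope of an iso-performance line is
(FP cost * P(negative class)) / (FN cost * P(target class)), and the touched
point maximizes TPR - slope * FPR. A sketch of that calculation with made-up
names, not the widget's code::

    def optimal_roc_point(roc_points, cost_fp, cost_fn, p_target):
        """Among (FPR, TPR) points, pick the one first touched by an
        iso-performance line; all points on such a line share the same
        expected cost."""
        slope = cost_fp * (1.0 - p_target) / (cost_fn * p_target)
        return max(roc_points, key=lambda point: point[1] - slope * point[0])

    # e.g. optimal_roc_point([(0.0, 0.0), (0.1, 0.7), (0.4, 0.9), (1.0, 1.0)], 500, 500, 0.44)
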
    6188 
    62 The widget can show the performance line, which changes as the user changes the parameters. The points where the line touches any of the curves - that is, the optimal point for any of the given classifiers - is also marked and the corresponding threshold (the needed probability of the target class for the example to be classified into that class) is shown besides. 
     89The widget can show the performance line, which changes as the user 
     90changes the parameters. The points where the line touches any of the 
      91curves - that is, the optimal points for the given classifiers - 
      92are also marked and the corresponding thresholds (the needed probability 
      93of the target class for an example to be classified into that class) are 
      94shown beside them. 
    6395 
    64 The widget allows setting costs from 1 to 1000. The units are not important, as are not the magnitudes. What matters is the relation between the two costs, so setting them to 100 and 200 will give the same result as 400 and 800. 
      96The widget allows setting costs from 1 to 1000. Neither the units nor the 
      97magnitudes matter; what matters is the ratio between the two costs, 
     98so setting them to 100 and 200 will give the same result as 400 and 800. 
    6599 
    66100.. image:: images/ROCAnalysis-Performance2.png 
    67101 
    68 Defaults: both costs equal (500), Prior target class probability 44% (from the data) 
     102Defaults: both costs equal (500), Prior target class probability 44% 
     103(from the data) 
    69104 
    70105.. image:: images/ROCAnalysis-Performance1.png 
    71106 
    72 False positive cost: 838, False negative cost 650, Prior target class probability 73% 
     107False positive cost: 838, False negative cost 650, Prior target class 
     108probability 73% 
    73109 
    74 :obj:`Default threshold (0.5) point` shows the point on the ROC curve achieved by the classifier if it predicts the target class if its probability equals or exceeds 0.5. 
     110:obj:`Default threshold (0.5) point` shows the point on the ROC curve 
      111achieved by the classifier if it predicts the target class whenever its probability 
     112equals or exceeds 0.5. 
    75113 
    76114Example 
    77115------- 
    78116 
    79 At the moment, the only widget which give the right type of the signal needed by ROC Analysis is `Test Learners <TestLearners.htm>`_. The ROC Analysis will hence always follow Test Learners and, since it has no outputs, no other widgets follow it. Here is a typical example. 
      117At the moment, the only widget which gives the right type of signal 
     118needed by ROC Analysis is :ref:`Test Learners`. The ROC Analysis will hence 
     119always follow Test Learners and, since it has no outputs, no other widgets 
     120follow it. Here is a typical example. 
    80121 
    81122.. image:: images/ROCLiftCalibration-Schema.png 
  • docs/widgets/rst/evaluate/testlearners.rst

    r11050 r11359  
    2727----------- 
    2828 
    29 The widget tests learning algorithms on data. Different sampling schemes are available, including using a separate test data. The widget does two things. First, it shows a table with different performance measures of the classifiers, such as classification accuracy and area under ROC. Second, it outputs a signal with data which can be used by other widgets for analyzing the performance of classifiers, such as `ROC Analysis <ROCAnalysis.htm>`_ or `Confusion Matrix <ConfusionMatrix.htm>`_. 
     29The widget tests learning algorithms on data. Different sampling schemes are 
      30available, including using separate test data. The widget does two things. 
     31First, it shows a table with different performance measures of the classifiers, 
     32such as classification accuracy and area under ROC. Second, it outputs a signal 
     33with data which can be used by other widgets for analyzing the performance of 
     34classifiers, such as :ref:`ROC Analysis` or :ref:`Confusion Matrix`. 
    3035 
    31 The signal Learner has a not very common property that it can be connected to more than one widget, which provide multiple learners to be tested with the same procedures. If the results of evaluation or fed into further widgets, such as the one for ROC analysis, the learning algorithms are analyzed together. 
      36The signal Learner has the rather uncommon property that it can be connected to 
      37more than one widget; these widgets provide multiple learners to be tested with the 
      38same procedures. If the results of the evaluation are fed into further widgets, 
     39such as the one for ROC analysis, the learning algorithms are analyzed together. 
    3240 
    3341.. image:: images/TestLearners.png 
    3442 
    35 The widget supports various sampling methods. :obj:`Cross-validation` splits the data into the given number of folds (usually 5 or 10). The algorithm is tested by holding out the examples from one fold at a time; the model is induced from the other folds and the examples from the held out fold are classified. :obj:`Leave-one-out` is similar, but it holds out one example at a time, inducing the model from all others and then classifying the held out. This method is obviously very stable and reliable ... and very slow. :obj:`Random sampling` randomly splits the data onto the training and testing set in the given proportion (e.g. 70:30); the whole procedure is repeated for the specified number of times. :obj:`Test on train data` uses the whole data set for training and then for testing. This method practically always gives overly optimistic results. 
     43The widget supports various sampling methods. :obj:`Cross-validation` splits 
     44the data into the given number of folds (usually 5 or 10). The algorithm is 
     45tested by holding out the examples from one fold at a time; the model is 
     46induced from the other folds and the examples from the held out fold are 
     47classified. :obj:`Leave-one-out` is similar, but it holds out one example 
     48at a time, inducing the model from all others and then classifying the held 
     49out. This method is obviously very stable and reliable ... and very slow. 
      50:obj:`Random sampling` randomly splits the data into the training and 
      51testing set in the given proportion (e.g. 70:30); the whole procedure is 
      52repeated the specified number of times. :obj:`Test on train data` uses the 
     53whole data set for training and then for testing. This method practically 
     54always gives overly optimistic results. 
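
The sampling schemes boil down to different ways of splitting example indices
into training and testing parts; a minimal sketch of cross-validation and
random sampling, not the widget's code::

    import random

    def cross_validation(indices, k=10):
        """Each of the k folds is held out once for testing while the
        remaining folds are used for training."""
        indices = list(indices)
        random.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for fold in folds:
            held_out = set(fold)
            yield [i for i in indices if i not in held_out], fold

    def random_sampling(indices, proportion=0.7, repetitions=10):
        """Repeatedly split the indices into e.g. 70 % training and 30 % testing."""
        indices = list(indices)
        for _ in range(repetitions):
            shuffled = random.sample(indices, len(indices))
            cut = int(proportion * len(indices))
            yield shuffled[:cut], shuffled[cut:]
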
    3655 
    37 The above methods use the data from signal Data only. To give another data set with testing examples (for instance from another file or some data selected in another widget), we put it on the input signal Separate Test Data and select :obj:`Test on test data`. 
     56The above methods use the data from signal Data only. To give another data 
     57set with testing examples (for instance from another file or some data selected 
     58in another widget), we put it on the input signal Separate Test Data and select 
     59:obj:`Test on test data`. 
    3860 
    39 Any changes in the above settings are applied immediately if :obj:`Applied on any change` is checked. If not, the user will have to press :obj:`Apply` to apply any changes. 
     61Any changes in the above settings are applied immediately if 
     62:obj:`Applied on any change` is checked. If not, the user will have to press 
     63:obj:`Apply` to apply any changes. 
    4064 
    4165The widget can compute a number of performance statistics. 
    4266 
    43    - :obj:`Classification accuracy` is the proportion of correctly classified examples 
    44    - :obj:`Sensitivity` (also called true positive rate (TPR), hit rate and recall) is the number of detected positive examples among all positive examples, e.g. the proportion of sick people correctly diagnosed as sick 
    45    - :obj:`Specificity` is the proportion of detected negative examples among all negative examples, e.g. the proportion of healthy correctly recognized as healthy 
     67   - :obj:`Classification accuracy` is the proportion of correctly classified 
     68     examples 
     69   - :obj:`Sensitivity` (also called true positive rate (TPR), hit rate and 
      70     recall) is the proportion of detected positive examples among all positive 
     71     examples, e.g. the proportion of sick people correctly diagnosed as sick 
     72   - :obj:`Specificity` is the proportion of detected negative examples among 
     73     all negative examples, e.g. the proportion of healthy correctly recognized 
     74     as healthy 
    4675   - :obj:`Area under ROC` is the area under receiver-operating curve 
    47    - :obj:`Information score` is the average amount of information per classified instance, as defined by Kononenko and Bratko 
    48    - :obj:`F-measure` is a weighted harmonic mean of precision and recall (see below), 2*precision*recall/(precision+recall) 
    49    - :obj:`Precision` is the number of positive examples among all examples classified as positive, e.g. the number of sick among all diagnosed as sick, or a number of relevant documents among all retrieved documents 
    50    - :obj:`Recall` is the same measure as sensitivity, except that the latter term is more common in medicine and recall comes from text mining, where it means the proportion of relevant documents which are retrieved 
    51    - :obj:`Brier score` measure the accuracy of probability assessments, which measures the average deviation between the predicted probabilities of events and the actual events. 
     76   - :obj:`Information score` is the average amount of information per 
     77     classified instance, as defined by Kononenko and Bratko 
     78   - :obj:`F-measure` is a weighted harmonic mean of precision and recall 
     79     (see below), 2*precision*recall/(precision+recall) 
     80   - :obj:`Precision` is the number of positive examples among all examples 
     81     classified as positive, e.g. the number of sick among all diagnosed as 
     82     sick, or a number of relevant documents among all retrieved documents 
     83   - :obj:`Recall` is the same measure as sensitivity, except that the latter 
     84     term is more common in medicine and recall comes from text mining, where 
     85     it means the proportion of relevant documents which are retrieved 
      86   - :obj:`Brier score` measures the accuracy of probability assessments, that 
      87     is, the average deviation between the predicted probabilities of 
      88     events and the actual outcomes. 
    5289 
    5390 
    54 More comprehensive descriptions of measures can be found at `http://en.wikipedia.org/wiki/Receiver_operating_characteristic <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ (from classification accuracy to area under ROC), 
    55 `http://www.springerlink.com/content/j21p620rw33xw773/ <http://www.springerlink.com/content/j21p620rw33xw773/>`_ (information score), `http://en.wikipedia.org/wiki/F-measure#Performance_measures <http://en.wikipedia.org/wiki/F-measure#Performance_measures>`_ 
    56 (from F-measure to recall) and `http://en.wikipedia.org/wiki/Brier_score <http://en.wikipedia.org/wiki/Brier_score>`_ (Brier score). 
     91More comprehensive descriptions of measures can be found at 
     92`http://en.wikipedia.org/wiki/Receiver_operating_characteristic 
     93<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ 
     94(from classification accuracy to area under ROC), 
     95`http://www.springerlink.com/content/j21p620rw33xw773/ 
     96<http://www.springerlink.com/content/j21p620rw33xw773/>`_ (information score), 
     97`http://en.wikipedia.org/wiki/F-measure#Performance_measures 
     98<http://en.wikipedia.org/wiki/F-measure#Performance_measures>`_ 
     99(from F-measure to recall) and 
     100`http://en.wikipedia.org/wiki/Brier_score 
     101<http://en.wikipedia.org/wiki/Brier_score>`_ (Brier score). 
    57102 
      103Most measures require a target class, e.g. having the disease or being relevant. 
     103Most measure require a target class, e.g. having the disease or being relevant. 
     104The target class can be selected at the bottom of the widget. 
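
For the chosen target class most of the listed measures follow directly from
the 2x2 confusion matrix, and the Brier score from the predicted
probabilities; a minimal sketch in its common two-class form, not the widget's
implementation::

    def scores(tp, fp, fn, tn):
        accuracy = (tp + tn) / float(tp + fp + fn + tn)
        sensitivity = tp / float(tp + fn)            # = recall
        specificity = tn / float(tn + fp)
        precision = tp / float(tp + fp)
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
        return accuracy, sensitivity, specificity, precision, f_measure

    def brier_score(target_probabilities, target_indicators):
        """Mean squared difference between the predicted probability of the
        target class and the 0/1 indicator of the actual class."""
        deviations = [(p - o) ** 2 for p, o in
                      zip(target_probabilities, target_indicators)]
        return sum(deviations) / float(len(deviations))
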
    59105 
    60106Example 
    61107------- 
    62108 
    63 In a typical use of the widget, we give it a data set and a few learning algorithms, and we observe their performance in the table inside the Test Learners widgets and in the ROC and Lift Curve widgets attached to the Test Learners. The data is often preprocessed before testing; in this case we discretized it and did some manual feature selection; not that this is done outside the cross-validation loop, so the testing results may be overly optimistic. 
     109In a typical use of the widget, we give it a data set and a few learning 
     110algorithms, and we observe their performance in the table inside the Test 
      111Learners widget and in the ROC and Lift Curve widgets attached to the Test 
     112Learners. The data is often preprocessed before testing; in this case we 
      113discretized it and did some manual feature selection; note that this is done 
     114outside the cross-validation loop, so the testing results may be overly 
     115optimistic. 
    64116 
    65117.. image:: images/TestLearners-Schema.png 
    66118 
    67 Another example of using this widget is given in the documentation for widget `Confusion Matrix <ConfusionMatrix.htm>`_. 
     119Another example of using this widget is given in the documentation for 
     120widget :ref:`Confusion Matrix`. 
  • docs/widgets/rst/regression/earth.rst

    r11050 r11359  
    1515Outputs: 
    1616   - Learner 
    17         The Earth learning algorithm with parameters as specified in the dialog. 
    18          
    19    - Predictor  
     17        The Earth learning algorithm with parameters as specified in the 
     18         dialog. 
     19 
     20   - Predictor 
    2021        Trained regressor 
    21          
    22 Signal ``Predictor`` sends the regressor only if signal ``Data`` is present.   
     22 
     23Signal ``Predictor`` sends the regressor only if signal ``Data`` is present. 
    2324 
    2425Description 
     
    2627 
    2728This widget constructs a Earth learning algorithm (an implementation of 
    28 the `MARS - Multivariate Adaptive Regression Splines`_). As all widgets  
    29 for classification and regression, this widget provides a learner and  
     29the `MARS - Multivariate Adaptive Regression Splines`_). As all widgets 
     30for classification and regression, this widget provides a learner and 
    3031classifier/regressor on the output. Learner is a learning algorithm with 
    31 settings as specified by the user. It can be fed into widgets for testing  
     32settings as specified by the user. It can be fed into widgets for testing 
    3233learners, for instance Test Learners. 
    3334 
     
    3940.. rst-class:: stamp-list 
    4041 
    41     1. Learner/Predictor can be given a name under which it will appear  
     42    1. Learner/Predictor can be given a name under which it will appear 
    4243       in other widgets (say ``Test Learners`` or ``Predictions``). 
    43      
    44     2. The ``Max. term degree`` parameter specifies the degree of the  
    45        terms induced in the forward pass. For instance, if set to ``1``  
     44 
     45    2. The ``Max. term degree`` parameter specifies the degree of the 
     46       terms induced in the forward pass. For instance, if set to ``1`` 
    4647       the resulting model will contain only linear terms. 
    47         
    48     3. The ``Max. terms`` specifies how many terms can be induces in the  
    49        forward pass. A special value ``Automatic`` instructs the learner  
     48 
      49    3. The ``Max. terms`` specifies how many terms can be induced in the 
     50       forward pass. A special value ``Automatic`` instructs the learner 
    5051       to set the limit automatically based on the dimensionality of the 
     5152       data (``min(200, max(20, 2 * NumberOfAttributes)) + 1``; see the short sketch after this list) 
    52         
    53     4. The ``Knot penalty`` is used in the pruning pass (hinge function   
     53 
     54    4. The ``Knot penalty`` is used in the pruning pass (hinge function 
    5455       penalty for the GCV calculation) 
    5556 
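
The ``Automatic`` limit quoted in point 3 is just that expression evaluated;
a two-line sketch::

    def automatic_max_terms(number_of_attributes):
        return min(200, max(20, 2 * number_of_attributes)) + 1

    # automatic_max_terms(5) == 21, automatic_max_terms(150) == 201
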
  • docs/widgets/rst/regression/linear.rst

    r11050 r11359  
    1515    - Data (Table) 
    1616        Input data table 
    17          
     17 
    1818Output 
    1919    - Learner 
    2020        The learning algorithm with the supplied parameters 
    21          
     21 
    2222    - Predictor 
    2323        Trained regressor 
    24          
     24 
    2525    - Model  Statisics 
    2626        A data table containing trained model statistics 
    27          
    28          
    29 Signal ``Predictor`` and ``Model Statistics`` send the output  
     27 
     28 
     29Signal ``Predictor`` and ``Model Statistics`` send the output 
    3030signal only if input signal ``Data`` is present. 
    3131 
     
    3939.. image:: images/LinearRegression.png 
    4040    :alt: Linear Regression interface 
    41      
     41 
    4242.. rst-class:: stamp-list 
    4343 
    4444    1. The learner/predictor name 
    4545    2. Train an ordinary least squares or ridge regression model 
    46     3. If ``Ridge lambda`` is checked the learner will build a ridge regression model 
    47        with 4 as the ``lambda`` parameter. 
     46    3. If ``Ridge lambda`` is checked the learner will build a ridge regression 
     47       model with 4 as the ``lambda`` parameter. 
    4848    4. Ridge lambda parameter. 
    4949    5. Use `Lasso`_ regularization. 
    5050    6. The Lasso bound (bound on the beta vector L1 norm) 
    51     7. Tolerance (any beta value lower then this will be forced to 0)  
    52      
      51    7. Tolerance (any beta value lower than this will be forced to 0; see the sketch below) 
     52 
    5353.. _`Lasso`: http://en.wikipedia.org/wiki/Least_squares#LASSO_method 
    5454 
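
As a rough sketch of what options 2, 3, 4 and 7 describe (ordinary least
squares versus ridge regression, plus the tolerance cut-off), assuming only
``numpy`` and making no claim about the widget's internals; the Lasso bound
(options 5 and 6) has no closed form and is left out::

    import numpy as np

    def regression_betas(X, y, ridge_lambda=0.0, tolerance=1e-6):
        """Ordinary least squares when ridge_lambda == 0, ridge regression
        otherwise; betas with absolute value below the tolerance are forced to 0."""
        X = np.column_stack([np.ones(len(X)), X])       # intercept column
        penalty = ridge_lambda * np.eye(X.shape[1])
        penalty[0, 0] = 0.0                             # do not penalize the intercept
        betas = np.linalg.solve(X.T.dot(X) + penalty, X.T.dot(y))
        betas[np.abs(betas) < tolerance] = 0.0
        return betas
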
  • docs/widgets/rst/regression/mean.rst

    r10403 r11359  
    66.. image:: ../../../../Orange/OrangeWidgets/icons/Unknown.png 
    77   :alt: Mean Learner 
    8     
     8 
    99Channels 
    1010-------- 
     
    1616   - Learner 
    1717        The Mean learning algorithm. 
    18          
    19    - Predictor  
     18 
     19   - Predictor 
    2020        Trained regressor 
    21          
     21 
    2222Signal ``Predictor`` sends the regressor only if signal ``Data`` is present. 
    2323 
     
    3232.. image:: images/Mean.png 
    3333    :alt: Mean widget interface 
    34   
     34 
    3535 
    3636.. rst-class:: stamp-list 
    3737 
    3838    1. Learner/predictor name. 
    39      
    40     2. ``Apply`` button sends the learner (and predictor if input  
     39 
     40    2. ``Apply`` button sends the learner (and predictor if input 
    4141       signal ``Data`` is present). 
    42            
  • docs/widgets/rst/regression/pade.rst

    r11050 r11359  
    66.. image:: ../icons/Pade.png 
    77 
    8 Replaces a continuous class with a derivative or a MQC by one or more continuous attributes. 
     8Replaces a continuous class with a derivative or a MQC by one or more 
     9continuous attributes. 
    910 
    1011Signals 
     
    2425----------- 
    2526 
    26 This widget implements several techniques for assessing partial derivatives of the class variable for the given set of examples. The derivative is appended to the example table as a new class attribute. The widget can compute either quantitative derivative by a chosen continuous attribute or a qualitative derivative by one or more attributes. 
     27This widget implements several techniques for assessing partial derivatives 
     28of the class variable for the given set of examples. The derivative is appended 
     29to the example table as a new class attribute. The widget can compute either 
      30a quantitative derivative by a chosen continuous attribute or a qualitative 
     31derivative by one or more attributes. 
    2732 
    28 The widget is implemented to cache some data. After, for instance, computing the derivatives by :code:`x`and :code:`y` separately, the widget has already stored all the data to produce the derivatives by both in a moment. 
     33The widget is implemented to cache some data. After, for instance, computing 
     34the derivatives by :code:`x` and :code:`y` separately, the widget has already 
     35stored all the data to produce the derivatives by both in a moment. 
    2936 
    3037.. image:: images/Pade.png 
    3138 
    32 The :obj:`Attributes` box lists all continuous attributes and lets the user select the attribute by which she wants to compute the qualitative derivative. The selection is important only when the widget actually outputs a qualitative derivative (this depends on other settings, described below). Buttons :obj:`All` and :obj:`None` select the entire list and nothing. 
     39The :obj:`Attributes` box lists all continuous attributes and lets the user 
     40select the attribute by which she wants to compute the qualitative derivative. 
     41The selection is important only when the widget actually outputs a qualitative 
     42derivative (this depends on other settings, described below). Buttons 
     43:obj:`All` and :obj:`None` select the entire list and nothing. 
    3344 
    34 Derivatives by more than one attribute are mathematically questionable, and computing by many attributes can be slow and messy. Methods that are based on triangulation will include all attributes in the triangulation, regardless of the selection, but then compute only the selected derivatives. 
     45Derivatives by more than one attribute are mathematically questionable, and 
     46computing by many attributes can be slow and messy. Methods that are based on 
     47triangulation will include all attributes in the triangulation, regardless of 
     48the selection, but then compute only the selected derivatives. 
    3549 
    36 Box :obj:`Method` determines the used method and its settings. Available methods are :obj:`First triangle`, :obj:`Star Regression`, :obj:`Univariate Star Regression` and :obj:`Tube Regression`. First triangle is unsuitable for data with non-negligible noise. Star regression seems to perform rather poor; the quantitative derivatives it computes are even theoretically wrong. Univariate Star Regression will handle noise well, but also work well for very complex functions (like sin(x)sin(y) across several periods). Tube regression is very noise resistant, which can lead it to oversimplify the model, yet it is the only method that does not use the triangulation and is thus capable of handling discrete attributes, unknown values and large number of dimensions. It may be slow when the number of examples is very large. Detailed description of these methods can be found in Zabkar and Demsar's papers. 
     50Box :obj:`Method` determines the used method and its settings. Available 
     51methods are :obj:`First triangle`, :obj:`Star Regression`, 
     52:obj:`Univariate Star Regression` and :obj:`Tube Regression`. First triangle is 
     53unsuitable for data with non-negligible noise. Star regression seems to perform 
      54rather poorly; the quantitative derivatives it computes are even theoretically 
      55wrong. Univariate Star Regression will handle noise well and also work well 
     56for very complex functions (like sin(x)sin(y) across several periods). Tube 
     57regression is very noise resistant, which can lead it to oversimplify the 
     58model, yet it is the only method that does not use the triangulation and is 
      59thus capable of handling discrete attributes, unknown values and a large number 
     60of dimensions. It may be slow when the number of examples is very large. 
      61Detailed descriptions of these methods can be found in Zabkar and Demsar's 
     62papers. 
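
None of those methods is reproduced here, but the notion of a qualitative
derivative by a single attribute can be illustrated with a crude local fit:
take an example's nearest neighbours along the chosen attribute, fit a line
and keep only the sign of the slope. This assumes ``numpy`` and is an
illustration only, not one of the methods the widget implements::

    import numpy as np

    def qualitative_derivative(x, y, k=10):
        """Sign of the local slope of y with respect to x around each example."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        signs = []
        for value in x:
            nearest = np.argsort(np.abs(x - value))[:k]
            slope = np.polyfit(x[nearest], y[nearest], 1)[0]
            signs.append("+" if slope > 0 else "-" if slope < 0 else "0")
        return signs
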
    3763 
    38 :obj:`Ignore differences below` lets the user set a threshold for qualitative derivatives. 
     64:obj:`Ignore differences below` lets the user set a threshold for qualitative 
     65derivatives. 
    3966 
    40 The widget can also put some data in meta attributes: the :obj:`Qualitative constraint`, as described above, :obj:`Derivatives of selected attributes` and the :obj:`Original class attribute`. 
     67The widget can also put some data in meta attributes: the 
     68:obj:`Qualitative constraint`, as described above, 
     69:obj:`Derivatives of selected attributes` and the 
     70:obj:`Original class attribute`. 
    4171 
    42 The changes take effect and the widget start processing when :obj:`Apply` is hit. 
      72The changes take effect and the widget starts processing when :obj:`Apply` 
     73is hit. 
  • docs/widgets/rst/regression/regressiontree.rst

    r11050 r11359  
    1818Outputs: 
    1919   - Learner 
    20       The classification tree learning algorithm with settings as specified in the dialog. 
      20      The regression tree learning algorithm with settings as specified in 
     21      the dialog. 
    2122   - Regression Tree 
    2223      Trained classifier (a subtype of Classifier) 
    2324 
    2425 
    25 Signal :code:`Regression Tree` sends data only if the learning data (signal :code:`Examples` is present. 
     26Signal :code:`Regression Tree` sends data only if the learning data (signal 
     27:code:`Examples`) is present. 
    2628 
    2729Description 
    2830----------- 
    2931 
    30 This widget constructs a regression tree learning algorithm. As all widgets for classification and regression, this widget provides a learner and classifier/regressor on the output. Learner is a learning algorithm with settings as specified by the user. It can be fed into widgets for testing learners, for instance :code:`Test Learners`. 
     32This widget constructs a regression tree learning algorithm. As all widgets 
     33for classification and regression, this widget provides a learner and 
     34classifier/regressor on the output. Learner is a learning algorithm with 
     35settings as specified by the user. It can be fed into widgets for testing 
     36learners, for instance :ref:`Test Learners`. 
    3137 
    3238.. image:: images/RegressionTree.png 
    3339   :alt: Regression Tree Widget 
    3440 
    35 Learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "Regression Tree". 
     41Learner can be given a name under which it will appear in, say, 
     42:ref:`Test Learners`. The default name is "Regression Tree". 
    3643 
    37 If :code:`Binarization` is checked, the values of multivalued attributes are split into two groups (based on the statistics in the particular node) to yield a binary tree. Binarization gets rid of the usual measures' bias towards attributes with more values and is generally recommended. 
     44If :code:`Binarization` is checked, the values of multivalued attributes 
     45are split into two groups (based on the statistics in the particular node) 
     46to yield a binary tree. Binarization gets rid of the usual measures' bias 
     47towards attributes with more values and is generally recommended. 
    3848 
    39 The widget can be instructed to prune the tree during induction by setting :obj:`Do not split nodes with less instances than`. For pruning after induction, there is pruning with m-estimate of error. 
     49The widget can be instructed to prune the tree during induction by setting 
     50:obj:`Do not split nodes with less instances than`. For pruning after 
     51induction, there is pruning with m-estimate of error. 
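
A toy recursive splitter shows where the "do not split nodes with less
instances than" rule enters. This is only a sketch of the stopping criterion
(it splits a single attribute at its median for brevity), not Orange's tree
induction::

    def build_tree(examples, min_instances=5):
        """examples: list of (attribute_value, class_value) pairs; leaves
        predict the mean class value of the examples that reach them."""
        values = [y for _, y in examples]
        prediction = sum(values) / float(len(values))
        if len(examples) < min_instances:               # the pruning-during-induction rule
            return ("leaf", prediction)
        threshold = sorted(x for x, _ in examples)[len(examples) // 2]
        left = [(x, y) for x, y in examples if x <= threshold]
        right = [(x, y) for x, y in examples if x > threshold]
        if not left or not right:                       # no useful split exists
            return ("leaf", prediction)
        return ("split", threshold,
                build_tree(left, min_instances), build_tree(right, min_instances))
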
    4052 
    41 After changing one or more settings, you need to push :obj:`Apply`, which will put the new learner on the output and, if the training examples are given, construct a new classifier and output it as well. 
     53After changing one or more settings, you need to push :obj:`Apply`, which will 
     54put the new learner on the output and, if the training examples are given, 
     55construct a new classifier and output it as well. 
    4256 
    4357Examples 
    4458-------- 
    4559 
    46 There are two typical uses of this widget. First, you may want to induce the model and check what it looks like with the schema below. 
     60There are two typical uses of this widget. First, you may want to induce 
     61the model and check what it looks like with the schema below. 
    4762 
    4863.. image:: images/RegressionTree-Schema.png 
  • docs/widgets/rst/regression/regressiontreegraph.rst

    r11050 r11359  
    2424----------- 
    2525 
    26 This widget shows a regression tree. It is quite similar to the one for classification tree, so please see the documentation for `Classification Tree Graph <../Classify/ClassificationTreeGraph.htm>`_. 
     26This widget shows a regression tree. It is quite similar to the one for 
     27classification tree, so please see the documentation for 
     28:ref:`Classification Tree Graph`. 
  • docs/widgets/rst/unsupervized/attributedistance.rst

    r11050 r11359  
    2424----------- 
    2525 
    26 Widget Attribute Distances computes the distances between the attributes in the data sets. Don't confuse it with a similar widget for computing the distances between examples. 
     26Widget Attribute Distances computes the distances between the attributes in 
     27the data sets. Don't confuse it with a similar widget for computing the 
     28distances between examples. 
    2729 
    2830.. image:: images/AttributeDistance.png 
    2931   :alt: Association Rules Widget 
    3032 
    31 Since the widget cannot compute distances between discrete and continuous attributes, all attributes are first either discretized, by splitting the attribute into four quartiles, or "continuized" by treating any discrete attributes as ordinal with values equivalent to 0, 1, 2, 3... For other, possibly better methods of discretization/continuization, see widgets `Discretize <../Data/Discretize.htm>`_ and `Continuize <../Data/Continuize.htm>`_. 
     33Since the widget cannot compute distances between discrete and continuous 
     34attributes, all attributes are first either discretized, by splitting the 
     35attribute into four quartiles, or "continuized" by treating any discrete 
     36attributes as ordinal with values equivalent to 0, 1, 2, 3... For other, 
     37possibly better methods of discretization/continuization, see widgets 
     38:ref:`Discretize` and :ref:`Continuize`. 
    3239 
    3340The two kinds of attributes then have different measures of distance. 
    3441 
    35 For discrete attributes, the distance can be computed as :obj:`Pearson's chi-square`, where the more the two attributes are dependent, the closer they are. The measure actually returns the p-value of the common chi-square test of independence. The other two measures are as defined by Aleks Jakulin in his work on `attribute interactions <http://stat.columbia.edu/~jakulin/Int/>`_: :obj:`2-way interaction` is defined as I(A;B)/H(A,B) and :obj:`3-way interaction` is I(A;B;C), respectively. 
      42For discrete attributes, the distance can be computed as 
      43:obj:`Pearson's chi-square`, where the more dependent the two attributes are, 
      44the closer they are. The measure actually returns the p-value of the common 
      45chi-square test of independence. The other two measures are as defined by 
      46Aleks Jakulin in his work on `attribute interactions 
      47<http://stat.columbia.edu/~jakulin/Int/>`_: :obj:`2-way interaction` is 
      48defined as I(A;B)/H(A,B), and :obj:`3-way interaction` is defined as 
      49I(A;B;C). 
    3650 
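For illustration, the 2-way interaction I(A;B)/H(A,B) can be computed from
plain value counts roughly as follows (a minimal sketch in pure Python, not
the widget's actual implementation; pairs is assumed to be a list of
(a, b) value pairs taken from two discrete attributes)::

   import math
   from collections import Counter

   def entropy(counts):
       # Shannon entropy (in bits) of a distribution given as value counts
       total = float(sum(counts))
       return -sum(c / total * math.log(c / total, 2) for c in counts if c)

   def two_way_interaction(pairs):
       # I(A;B) / H(A,B), with I(A;B) = H(A) + H(B) - H(A,B)
       h_a = entropy(Counter(a for a, b in pairs).values())
       h_b = entropy(Counter(b for a, b in pairs).values())
       h_ab = entropy(Counter(pairs).values())
       return (h_a + h_b - h_ab) / h_ab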
    3751 
     
    3953-------- 
    4054 
    41 This widget is an intermediate widget: it shows no user readable results and its output needs to be fed to a widget that can do something useful with the computed distances, for instance the `Distance Map <DistanceMap.htm>`_, `Hierarchical Clustering <HierarchicalClustering.htm>`_ to cluster the attributes, or `MDS <MDS.htm>`_ to visualize the distances between them. 
      55This widget is an intermediate widget: it shows no user-readable results and 
     56its output needs to be fed to a widget that can do something useful with the 
     57computed distances, for instance the :ref:`Distance Map`, 
     58:ref:`Hierarchical Clustering` to cluster the attributes, or :ref:`MDS` to 
     59visualize the distances between them. 
    4260 
    4361.. image:: images/AttributeDistance-Schema.png 
  • docs/widgets/rst/unsupervized/distancefile.rst

    r11050 r11359  
    2626Loads a distance matrix from a text file. 
    2727 
    28 The first line in the file has to start with the number of items - the dimension of the matrix. It can be followed by the word labeled or labelled if the file also contains the labels for the items. The rest of the file is the matrix, where the elements are separated by tabulators. If the items are labelled, the label has to be put in front of each line. The matrix can be given with the lower or the upper part, or both. Here are two examples. 
      28The first line of the file has to start with the number of items - the 
      29dimension of the matrix. It can be followed by the word labeled or labelled 
      30if the file also contains labels for the items. The rest of the file is 
      31the matrix, with the elements separated by tabulators. If the items are 
      32labelled, the label has to be put at the beginning of each line. The matrix 
      33can be given as the lower or the upper triangle, or both. Here are two examples. 
    2934 
    30 0.1 
    31 0.5    0.3 
    32 0.7    0.9    0.2 
    33 0.2    0.8    0.6    0.5 
     35:: 
    3436 
    35 john   0.1 
    36 joe    0.5    0.3 
    37 jack   0.7    0.9    0.2 
    38 jane   0.2    0.8    0.6    0.5 
     37   0.1 
     38   0.5    0.3 
     39   0.7    0.9    0.2 
     40   0.2    0.8    0.6    0.5 
     41 
     42:: 
     43 
     44   john   0.1 
     45   joe    0.5    0.3 
     46   jack   0.7    0.9    0.2 
     47   jane   0.2    0.8    0.6    0.5 
     48 
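As a rough illustration (not the widget's own code), a labelled file of this
kind could be read into a list of labels and a full symmetric matrix with a
hypothetical helper along these lines::

   def parse_distance_file(path):
       # Sketch only: read the dimension (and optional "labelled" flag) from
       # the first line, then fill a symmetric matrix from the triangle given.
       with open(path) as f:
           header = f.readline().split()
           n = int(header[0])
           labelled = any(w.lower() in ("labeled", "labelled")
                          for w in header[1:])
           rows = [line.rstrip("\n") for line in f if line.strip()]
       labels = []
       matrix = [[0.0] * n for _ in range(n)]
       for i, row in enumerate(rows[:n]):
           parts = row.split("\t")
           if labelled:
               labels.append(parts[0])
               parts = parts[1:]
           for j, value in enumerate(parts[:n]):
               if value.strip():
                   matrix[i][j] = matrix[j][i] = float(value)
       return labels, matrix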
    3949 
    4050.. image:: images/DistanceFile.png 
    4151   :alt: Distance File Widget 
    4252 
    43 The file is selected using button :obj:`...`, which opens a file browser. File extension is arbitrary. 
     53The file is selected using button :obj:`...`, which opens a file browser. 
     54File extension is arbitrary. 
    4455 
    45 Sometimes we get the labels in a separate example table. Say that we have a set of 15 examples in a data file, and the distance matrix represents distances between these examples. In this case, we would connect the widget providing these 15 examples (say a `File widget <../Data/File.htm>`_) to the input of Distance File and select :obj:`Use examples as items` in Items from input data. Distance File would attach the examples to the distance matrix, so further widget can, for instance, use a user-selected attribute to label the items. 
     56Sometimes we get the labels in a separate example table. Say that we have a 
     57set of 15 examples in a data file, and the distance matrix represents distances 
     58between these examples. In this case, we would connect the widget providing 
     59these 15 examples (say a :ref:`File`) to the input of Distance File and 
     60select :obj:`Use examples as items` in Items from input data. 
     61Distance File would attach the examples to the distance matrix, so further 
      62widgets can, for instance, use a user-selected attribute to label the items. 
    4663 
    47 In another scenario, the distance matrix would represent distances between attributes of some data table. We similarly connect a data providing widget to Distance File, but select :obj:`Use attribute names`. Distance File then labels the items with the names of the attributes. 
     64In another scenario, the distance matrix would represent distances between 
     65attributes of some data table. We similarly connect a data providing widget 
     66to Distance File, but select :obj:`Use attribute names`. Distance File then 
     67labels the items with the names of the attributes. 
    4868 
    4969Examples 
    5070-------- 
    5171 
    52 The first schema loads the labelled distance file above and shows it with the `Distance Map widget <DistanceMap.htm>`_. 
     72The first schema loads the labelled distance file above and shows it with 
     73the :ref:`Distance Map`. 
    5374 
    5475.. image:: images/DistanceFile1-Schema.png 
     
    5677.. image:: images/DistanceFile-DistanceMap1.png 
    5778 
    58 In the second schema the labels come from a file. The labels given in the file loaded by Distance File are replaced by the attribute names (since we checked :obj:`Use attribute names`. 
     79In the second schema the labels come from a file. The labels given in the file 
     80loaded by Distance File are replaced by the attribute names (since we checked 
      81:obj:`Use attribute names`). 
    5982 
    6083.. image:: images/DistanceFile2-Schema.png 
    6184 
    62 In the file widget we loaded the Iris data set, and the resulting distance map looks like this. 
     85In the file widget we loaded the Iris data set, and the resulting distance map 
     86looks like this. 
    6387 
    6488.. image:: images/DistanceFile-DistanceMap2.png 
    6589 
    66 The widget can of course be connected to any other widget that can do something useful with the distances, such as `MDS <MDS.htm>`_ or `Hierarchical Clustering <HiearchicalClustering.htm>`_. 
     90The widget can of course be connected to any other widget that can do something 
     91useful with the distances, such as :ref:`MDS` or 
     92:ref:`Hierarchical Clustering`. 
  • docs/widgets/rst/unsupervized/distancemap.rst

    r11050 r11359  
    2626----------- 
    2727 
    28 Distance Map is a visualization of distances between objects. The visualization is rather simple: it is the same as is we printed out a table of numbers, except that the numbers are replaced by spots colored by colors from the specified palette. 
      28Distance Map is a visualization of distances between objects. The visualization 
      29is rather simple: it is the same as if we printed out a table of numbers, 
      30except that the numbers are replaced by spots colored with colors from the 
      31specified palette. 
    2932 
    30 The distances are most often distances between examples (for instance from `Example Distance <ExampleDistance.htm>`_) or attributes (for instance from `Attribute Distance <AttributeDistance.htm>`_). The widget does not require that (another option can be loading the distances from a file using `Distance File <DistanceFile.htm>`_), although when one of these is the case, the user can select a region of the map and the widget will output the corresponding examples or attributes through the appropriate signal. 
      33The distances are most often distances between examples (for instance from 
      34:ref:`Example Distance`) or attributes (for instance from 
      35:ref:`Attribute Distance`). The widget does not require this (another option 
      36is loading the distances from a file using :ref:`Distance File`), although 
      37when one of these is the case, the user can select a region of the map and the 
      38widget will output the corresponding examples or attributes through the 
      39appropriate signal. 
    3140 
    3241.. image:: images/DistanceMap.png 
    3342 
    34 The snapshot shows distances between attributes in the heart disease data, using the preset Black - Red palette, where smaller numbers are represented with black and larger with red. The matrix is symmetric and the diagonal is black - no attribute is different from itself. The former (symmetricity) is always assumed, while the diagonal may also be non-zero. 
     43The snapshot shows distances between attributes in the heart disease data, 
     44using the preset Black - Red palette, where smaller numbers are represented 
     45with black and larger with red. The matrix is symmetric and the diagonal is 
      46black - no attribute is different from itself. The former (symmetry) is 
     47always assumed, while the diagonal may also be non-zero. 
    3548 
    3649.. image:: images/DistanceMap-Settings.png 
    3750 
    38 The widget's settings are divided into three tabs. The first one defines the size and order of cells. :obj:`Width` and :obj:`Height` in the :obj:`Cell Size` box set the size of the cells. The cells can be restricted to squares (:obj:`Cells as squares`) and drawn with or without gridlines in between (:obj:`Show grid`). When cells are too small (8 pixels or less), the grid disappears in any case. 
     51The widget's settings are divided into three tabs. The first one defines the 
     52size and order of cells. :obj:`Width` and :obj:`Height` in the :obj:`Cell Size` 
     53box set the size of the cells. The cells can be restricted to squares 
     54(:obj:`Cells as squares`) and drawn with or without gridlines in between 
     55(:obj:`Show grid`). When cells are too small (8 pixels or less), the grid 
     56disappears in any case. 
    3957 
    40 :obj:`Merge` merges multiple cells into a single cell, which can be useful when the matrix is too large. For this option to yield meaningful results, items need to be sorted so similar items are merged. The widget has three options; it can leave the items as they are (:obj:`No sorting`) it can put similar items together (:obj:`Adjacent distance`) or randomly shuffle the items (:obj:`Random order`). Of these, adjacent distance is unfortunately not implemented yet. 
    41  
      58:obj:`Merge` merges multiple cells into a single cell, which can be useful 
      59when the matrix is too large. For this option to yield meaningful results, 
      60items need to be sorted so that similar items are merged. The widget has three 
      61options: it can leave the items as they are (:obj:`No sorting`), put 
      62similar items together (:obj:`Adjacent distance`), or randomly shuffle the 
      63items (:obj:`Random order`). Of these, adjacent distance is unfortunately not 
      64implemented yet. 
    4265 
    4366 
    4467.. image:: images/DistanceMap-Colors.png 
    4568 
    46 The second tab defines the colors that represent the numeric values. :obj:`Gamma` defines how numbers are mapped onto the palette colors. When set to 1 (default), the mapping is linear. When it decreases, the numeric values at the lower end get similar colors, the curve get steeper in the middle, and higher values are again represented with colors which are more similar than if gamma was higher. The graph below shows the mapping function at gamma=0.25. 
      69The second tab defines the colors that represent the numeric values. 
      70:obj:`Gamma` defines how numbers are mapped onto the palette colors. 
      71When set to 1 (default), the mapping is linear. When it decreases, the numeric 
      72values at the lower end get similar colors, the curve gets steeper in the 
      73middle, and higher values are again represented with colors which are more 
      74similar than they would be at a higher gamma. The graph below shows the mapping 
      75function at gamma=0.25. 
    4776 
    4877.. image:: images/DistanceMap-gamma.png 
    4978 
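The exact palette code is internal to the widget, but a curve with this shape
(linear at gamma=1, flat at both ends and steep in the middle for gamma<1) can
be sketched as follows, assuming distances are already normalized to [0, 1];
this is an illustration only, not necessarily the widget's exact formula::

   def gamma_map(x, gamma=1.0):
       # Map a normalized distance x in [0, 1] to a palette position in [0, 1].
       t = 2.0 * x - 1.0                    # recentre around the midpoint
       sign = 1.0 if t >= 0 else -1.0
       return 0.5 + 0.5 * sign * abs(t) ** gamma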
    50 Setting gamma is useful when the distribution of distances has long tails which are not very interested. The widget also offers controls for cutting of the outliers. Normally, the color palette is used to visualize the entire range of distances appearing in the matrix. This can be changed be checking :obj:`Enable thresholds` and setting the low and high threshold. Distances outside this interval are then shown using special colors, so the color spectrum can be used for visualizing the interesting part of the distribution. 
      79Setting gamma is useful when the distribution of distances has long tails which 
      80are not very interesting. The widget also offers controls for cutting off the 
      81outliers. Normally, the color palette is used to visualize the entire range 
      82of distances appearing in the matrix. This can be changed by checking 
      83:obj:`Enable thresholds` and setting the low and high threshold. Distances 
      84outside this interval are then shown using special colors, so the color 
      85spectrum can be used for visualizing the interesting part of the distribution. 
    5186 
    52 The widget supports different color schemes. The built-in schemes are named :obj:`Blue - Yellow`, :obj:`Black - Red` and :obj:`Green - Black - Red`. The schema is defined, first, by the two colors representing the lowest and highest distances. The two colors are set by clicking the rectangles to the left and right of the color strip below the schema name. The transition can go either from one color to another (in the RGB space) or :obj:`Pass through black`, that is, from one color to black and then to another. Colors can also be set for undefined values (:obj:`N/A`), values below and above the low and high thresholds (:obj:`Underflow` and :obj:`Overflow`), the background (:obj:`Background`), the outline of the cell under the mouse cursor (:obj:`Cell outline`) and the marker around the selected region (:obj:`Selected cells`). 
     87The widget supports different color schemes. The built-in schemes are named 
      88:obj:`Blue - Yellow`, :obj:`Black - Red` and :obj:`Green - Black - Red`. The 
      89scheme is defined, first, by the two colors representing the lowest and 
      90highest distances. The two colors are set by clicking the rectangles to the 
      91left and right of the color strip below the scheme name. The transition can go 
     92either from one color to another (in the RGB space) or 
     93:obj:`Pass through black`, that is, from one color to black and then to 
     94another. Colors can also be set for undefined values (:obj:`N/A`), values 
     95below and above the low and high thresholds (:obj:`Underflow` and 
     96:obj:`Overflow`), the background (:obj:`Background`), the outline of the cell 
     97under the mouse cursor (:obj:`Cell outline`) and the marker around the selected 
     98region (:obj:`Selected cells`). 
    5399 
    54 User can modify the existing schemata and also create new, customized schemata (:obj:`New`). The built-in schemata are shown below. 
      100The user can modify the existing schemes and also create new, customized ones 
      101(:obj:`New`). The built-in schemes are shown below. 
    55102 
    56103.. image:: images/DistanceMap-Green-Black-Red.png 
     
    58105.. image:: images/DistanceMap-Info.png 
    59106 
    60 The last tab defines the shown information and controls selection of cells. :obj:`Show legend` determines whether the widget shows the colored strip at the top which shows the mapping of numbers into colors. :obj:`Show labels` shows and hides the item names (e.g. age, gender etc) besides the map. Labels can only be shown it they exist; they do when the data represents distances between attributes or when the data is loaded from a labeled distance file. 
      107The last tab defines the shown information and controls selection of cells. 
      108:obj:`Show legend` determines whether the widget shows the colored strip at 
      109the top which shows the mapping of numbers into colors. :obj:`Show labels` 
      110shows and hides the item names (e.g. age, gender, etc.) beside the map. Labels 
      111can only be shown if they exist; they do when the data represents distances 
      112between attributes or when the data is loaded from a labeled distance file. 
    61113 
    62 If :obj:`Show balloon` is checked, a ballon appears when the mouse is hovering over a cell, which shows the numerical distances and, if :obj:`Display item names` is checked, also the names of the corresponding items. 
      114If :obj:`Show balloon` is checked, a balloon appears when the mouse is hovering 
     115over a cell, which shows the numerical distances and, if 
     116:obj:`Display item names` is checked, also the names of the corresponding 
     117items. 
    63118 
    64 The user can select a region in the map by the usual click-and-drag with the mouse. When a part of the map is selected, the widget output all items corresponding to the selected cells. The three buttons in the :obj:`Select` can undo the last selection, remove all selections and send the selected data. If :obj:`Send after mouse release` is checked, the data is set automatically, without needing to press the button above. 
      119The user can select a region in the map by the usual click-and-drag with the 
      120mouse. When a part of the map is selected, the widget outputs all items 
      121corresponding to the selected cells. The three buttons in the :obj:`Select` 
      122box undo the last selection, remove all selections and send the selected 
      123data. If :obj:`Send after mouse release` is checked, the data is sent 
      124automatically, without needing to press the button above. 
    65125 
    66126Examples 
     
    71131.. image:: images/DistanceMap-Schema.png 
    72132 
    73 The file widget loads the iris data set; we then compute the attribute distances and visualize them. 
     133The file widget loads the iris data set; we then compute the attribute 
     134distances and visualize them. 
  • docs/widgets/rst/unsupervized/exampledistance.rst

    r11050 r11359  
    2424----------- 
    2525 
    26 Widget Example Distances computes the distances between the examples in the data sets. Don't confuse it with a similar widget for computing the distances between attributes. 
     26Widget Example Distances computes the distances between the examples in the 
     27data sets. Don't confuse it with a similar widget for computing the distances 
     28between attributes. 
    2729 
    2830.. image:: images/ExampleDistance.png 
    2931   :alt: Example Distance Widget 
    3032 
    31 The available :obj:`Distance Metrics` definitions are :obj:`Euclidean`, :obj:`Manhattan`, :obj:`Hammming` and :obj:`Relief`. Besides, of course, different formal definitions, the measures also differ in how correctly they treat unknown values. Manhattan and Hamming distance do not excel in this respect: when computing by-attribute distances, if any of the two values are missing, the corresponding distance is set to 0.5 (on a normalized scale where the largest difference in attribute values is 1.0). Relief distance is similar to Manhattan, but with a more correct treatment for discrete attributes: it computes the expected distances by the probability distributions computed from the data (see any Kononenko's papers on ReliefF for the definition). 
      33The available :obj:`Distance Metrics` definitions are :obj:`Euclidean`, 
      34:obj:`Manhattan`, :obj:`Hamming` and :obj:`Relief`. Besides their different 
      35formal definitions, the measures also differ in how correctly 
      36they treat unknown values. Manhattan and Hamming distance do not excel in 
      37this respect: when computing by-attribute distances, if either of the two values 
      38is missing, the corresponding distance is set to 0.5 (on a normalized scale 
      39where the largest difference in attribute values is 1.0). Relief distance is 
      40similar to Manhattan, but with a more correct treatment for discrete 
      41attributes: it computes the expected distances from the probability distributions 
      42estimated from the data (see any of Kononenko's papers on ReliefF for the 
      43definition). 
    3244 
    33 The most correct treatment of unknown values is done by the Euclidean metrics which computes and uses the probability distributions of discrete attributes, while for continuous distributions it computes the expected distance assuming the Gaussian distribution of attribute values, where the distribution's parameters are again assessed from the data. 
      45The most correct treatment of unknown values is done by the Euclidean metric, 
      46which computes and uses the probability distributions of discrete attributes, 
      47while for continuous attributes it computes the expected distance assuming 
      48a Gaussian distribution of attribute values, where the distribution's 
      49parameters are again estimated from the data. 
    3450 
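The 0.5 convention for missing values can be illustrated with a short sketch
(generic Python, not the widget's implementation); x and y are rows of
attribute values with None standing for an unknown value, and spans holds
each attribute's largest observed difference, used for normalization::

   def manhattan(x, y, spans):
       # By-attribute distances are normalized to [0, 1]; if either value
       # is missing, the by-attribute distance is taken to be 0.5.
       total = 0.0
       for a, b, span in zip(x, y, spans):
           if a is None or b is None:
               total += 0.5
           else:
               total += abs(a - b) / float(span)
       return total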
    35 The rows/columns of the resulting distance matrix can be labeled by the values of a certain attribute which can be chosen in the bottom box, :obj:`Example label`. 
     51The rows/columns of the resulting distance matrix can be labeled by the 
     52values of a certain attribute which can be chosen in the bottom box, 
     53:obj:`Example label`. 
    3654 
    3755 
     
    3957-------- 
    4058 
    41 This widget is a typical intermediate widget: it gives shows no user readable results and its output needs to be fed to a widget that can do something useful with the computed distances, for instance the `Distance Map <DistanceMap.htm>`_, `Hierarchical Clustering <HierarchicalClustering.htm>`_ or `MDS <MDS.htm>`_. 
      59This widget is a typical intermediate widget: it shows no user-readable 
      60results and its output needs to be fed to a widget that can do something 
     61useful with the computed distances, for instance the :ref:`Distance Map`, 
     62:ref:`Hierarchical Clustering` or :ref:`MDS`. 
    4263 
    4364.. image:: images/ExampleDistance-Schema.png 
  • docs/widgets/rst/unsupervized/hierarchicalclustering.rst

    r11050 r11359  
    1818Outputs: 
    1919   - Selected Examples 
    20       A list of selected examples; applicable only when the input matrix refers to distances between examples 
    21    - Structured Data Files 
    22       ??? 
    23  
     20      A list of selected examples; applicable only when the input matrix 
     21      refers to distances between examples 
     22   - Remaining Examples 
     23      A list of unselected examples 
     24   - Centroids 
     25      A list of cluster centroids 
    2426 
    2527Description 
    2628----------- 
    2729 
    28 The widget computes hierarchical clustering of arbitrary types of objects from the matrix of distances between them and shows the corresponding dendrogram. If the distances apply to examples, the widget offers some special functionality (adding cluster indices, outputting examples...). 
     30The widget computes hierarchical clustering of arbitrary types of objects from 
     31the matrix of distances between them and shows the corresponding dendrogram. If 
     32the distances apply to examples, the widget offers some special functionality 
     33(adding cluster indices, outputting examples...). 
    2934 
    3035.. image:: images/HierarchicalClustering.png 
    3136 
    32 The widget supports three kinds of linkages. In :obj:`Single linkage` clustering, the distance between two clusters is defined as the distance between the closest elements of the two clusters. :obj:`Average linkage` clustering computes the average distance between elements of the two clusters, and :obj:`Complete linkage` defines the distance between two clusters as the distance between their most distant elements. 
     37The widget supports three kinds of linkages. In :obj:`Single linkage` 
     38clustering, the distance between two clusters is defined as the distance 
     39between the closest elements of the two clusters. :obj:`Average linkage` 
     40clustering computes the average distance between elements of the two clusters, 
     41and :obj:`Complete linkage` defines the distance between two clusters as the 
     42distance between their most distant elements. 
    3343 
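The three linkage criteria can be written down in a few lines (a generic
sketch, not the widget's code); cluster1 and cluster2 are lists of items and
dist is any pairwise distance function::

   def linkage_distance(cluster1, cluster2, dist, linkage="average"):
       # All pairwise distances between the two clusters
       pairwise = [dist(a, b) for a in cluster1 for b in cluster2]
       if linkage == "single":      # closest elements of the two clusters
           return min(pairwise)
       if linkage == "complete":    # most distant elements
           return max(pairwise)
       return sum(pairwise) / float(len(pairwise))   # average linkage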
    34 Nodes of the dendrogram can be labeled. What the labels are depends upon the items being clustered. For instance, when clustering attributes, the labels are obviously the attribute names. When clustering examples, we can use the values of one of the attributes, typically one that give the name or id of an instance, as labels. The label can be chosen in the box :obj:`Annotate`, which also allows setting the font size and line spacing. 
     44Nodes of the dendrogram can be labeled. What the labels are depends upon the 
     45items being clustered. For instance, when clustering attributes, the labels 
     46are obviously the attribute names. When clustering examples, we can use the 
      47values of one of the attributes, typically one that gives the name or id of an 
     48instance, as labels. The label can be chosen in the box :obj:`Annotate`, which 
     49also allows setting the font size and line spacing. 
    3550 
    36 Huge dendrograms can be pruned by checking :obj:`Limit pring depth` and selecting the appropriate depth. This only affects the displayed dendrogram and not the actual clustering. 
      51Huge dendrograms can be pruned by checking :obj:`Limit print depth` and 
     52selecting the appropriate depth. This only affects the displayed dendrogram 
     53and not the actual clustering. 
    3754 
    38 Clicking inside the dendrogram can have two effects. If the cut off line is not shown (:obj:`Show cutoff line` is unchecked), clicking inside the dendrogram will select a cluster. Multiple clusters can be selected by holding Ctrl. Each selected cluster is shown in different color and is treated as a separate cluster on the output. 
     55Clicking inside the dendrogram can have two effects. If the cut off line is 
     56not shown (:obj:`Show cutoff line` is unchecked), clicking inside the 
     57dendrogram will select a cluster. Multiple clusters can be selected by holding 
      58Ctrl. Each selected cluster is shown in a different color and is treated as a 
     59separate cluster on the output. 
    3960 
    40 If :obj:`Show cutoff line` is checked, clicking in the dendrogram places a cutoff line. All items in the clustering are selected and the are divided into groups according to the position of the line. 
     61If :obj:`Show cutoff line` is checked, clicking in the dendrogram places a 
      62cutoff line. All items in the clustering are selected and they are divided 
     63into groups according to the position of the line. 
    4164 
    42 If the items being clustered are examples, they can be added a cluster index (:obj:`Append cluster indices`). The index can appear as a :obj:`Class attribute`, ordinary :obj:`Attribute` or a :obj:`Meta attribute`. In the former case, if the data already has a class attribute, the original class is placed among meta attributes. 
      65If the items being clustered are examples, a cluster index can be appended to 
      66them (:obj:`Append cluster indices`). The index can appear as a 
      67:obj:`Class attribute`, an ordinary :obj:`Attribute` or a :obj:`Meta attribute`. 
      68In the first case, if the data already has a class attribute, the original 
      69class is placed among meta attributes. 
    4370 
    44 The data can be output on any change (:obj:`Commit on change`) or, if this is disabled, by pushing :obj:`Commit`. 
     71The data can be output on any change (:obj:`Commit on change`) or, if this 
     72is disabled, by pushing :obj:`Commit`. 
    4573 
    4674 
    47 Clustering has two parameters that can be set by the user, the number of clusters and the type of distance metrics, :obj:`Euclidean distance` or :obj:`Manhattan`. Any changes must be confirmed by pushing :obj:`Apply`. 
      75.. This is from Alex Jakulin's old widget documentation. Left in case BIC is 
      76   reimplemented. 
    4877 
    49 The table on the right hand side shows the results of clustering. For each cluster it gives the number of examples, its fitness and BIC. 
     78   Clustering has two parameters that can be set by the user, the number of 
     79   clusters and the type of distance metrics, :obj:`Euclidean distance` or 
     80   :obj:`Manhattan`. Any changes must be confirmed by pushing :obj:`Apply`. 
    5081 
    51 Fitness measures how well the cluster is defined. Let d<sub>i,C</sub> be the average distance between point i and the points in cluster C. Now, let a<sub>i</sub> equal d<sub>i,C'</sub>, where C' is the cluster i belongs to, and let b<sub>i</sub>=min d<sub>i,C</sub> over all other clusters C. Fitness is then defined as the average silhouette of the cluster C, that is avg( (b<sub>i</sub>-a<sub>i</sub>)/max(b<sub>i</sub>, a<sub>i</sub>) ). 
     82   The table on the right hand side shows the results of clustering. For each 
     83   cluster it gives the number of examples, its fitness and BIC. 
    5284 
    53 To make it simple, fitness close to 1 signifies a well-defined cluster. 
      85   Fitness measures how well the cluster is defined. Let d(i, C) be the 
      86   average distance between point i and the points in cluster C. Now, let 
      87   a(i) equal d(i, C'), where C' is the cluster i belongs to, 
      88   and let b(i) = min d(i, C) over all other clusters C. Fitness 
      89   is then defined as the average silhouette of the cluster C, that is 
      90   avg((b(i) - a(i)) / max(b(i), a(i))). 
    5491 
    55 BIC is short for Bayesian Information Criteria and is computed as ln L-k(d+1)/2 ln n, where k is the number of clusters, d is dimension of data (the number of attributes) and n is the number of examples (data instances). L is the likelihood of the model, assuming the spherical Gaussian distributions around the centroid(s) of the cluster(s). 
     92   To make it simple, fitness close to 1 signifies a well-defined cluster. 
     93 
     94   BIC is short for Bayesian Information Criteria and is computed as 
     95   ln L-k(d+1)/2 ln n, where k is the number of clusters, d is dimension of 
     96   data (the number of attributes) and n is the number of examples 
     97   (data instances). L is the likelihood of the model, assuming the 
     98   spherical Gaussian distributions around the centroid(s) of the cluster(s). 
    5699 
    57100 
     
    63106.. image:: images/HierarchicalClustering-Schema.png 
    64107 
    65 We loaded the Zoo data set. The clustering of attributes is already shown above. Below is the clustering of examples, that is, of animals, and the nodes are annotated by the animals' names. We connected the `Linear projection widget <../Visualize/LinearProjection.htm>`_ showing the freeviz-optimized projection of the data so that it shows all examples read from the file, while the signal from Hierarchical clustering is used as a subset. Linear projection thus marks the examples selected in Hierarchical clustering. This way, we can observe the position of the selected cluster(s) in the projection. 
      108We loaded the Zoo data set. The clustering of attributes is already shown 
      109above. Below is the clustering of examples, that is, of animals, and the nodes 
      110are annotated by the animals' names. We connected the :ref:`Linear projection` 
      111widget, which shows a FreeViz-optimized projection of the data, so that it shows 
      112all examples read from the file, while the signal from Hierarchical clustering 
      113is used as a subset. Linear projection thus marks the examples selected in 
      114Hierarchical clustering. This way, we can observe the position of the selected 
      115cluster(s) in the projection. 
    66116 
    67117.. image:: images/HierarchicalClustering-Example.png 
    68118 
    69 To (visually) test how well the clustering corresponds to the actual classes in the data, we can tell the widget to show the class ("type") of the animal instead of its name (:obj:`Annotate`). Correspondence looks good. 
     119To (visually) test how well the clustering corresponds to the actual classes 
     120in the data, we can tell the widget to show the class ("type") of the animal 
     121instead of its name (:obj:`Annotate`). Correspondence looks good. 
    70122 
    71123.. image:: images/HierarchicalClustering-Example2.png 
    72124 
    73 A fancy way to verify the correspondence between the clustering and the actual classes would be to compute the chi-square test between them. As Orange does not have a dedicated widget for that, we can compute the chi-square in `Attribute Distance <AttributeDistance.htm>`_ and observe it in `Distance Map <DistanceMap.htm>`_. The only caveat is that Attribute Distance computes distances between attributes and not the class and the attribute, so we have to use `Select attributes <../Data/SelectAttributes.htm>`_ to put the class among the ordinary attributes and replace it with another attribute, say "tail" (this is needed since Attribute Distance requires data with a class attribute, for technical reasons; the class attribute itself does not affect the computed chi-square). 
     125A fancy way to verify the correspondence between the clustering and the actual 
     126classes would be to compute the chi-square test between them. As Orange does 
     127not have a dedicated widget for that, we can compute the chi-square in 
     128:ref:`Attribute Distance` and observe it in :ref:`Distance Map`. The only 
     129caveat is that Attribute Distance computes distances between attributes and 
     130not the class and the attribute, so we have to use :ref:`Select attributes` to 
     131put the class among the ordinary attributes and replace it with another 
     132attribute, say "tail" (this is needed since Attribute Distance requires data 
     133with a class attribute, for technical reasons; the class attribute itself does 
     134not affect the computed chi-square). 
    74135 
    75 A more direct approach is to leave the class attribute (the animal type) as it is, simply add the cluster index and observe its information gain in the `Rank widget <../Data/Rank.htm>`_. 
     136A more direct approach is to leave the class attribute (the animal type) as it 
     137is, simply add the cluster index and observe its information gain in the 
      138:ref:`Rank` widget. 
    76139 
    77 More tricks with a similar purpose are described in the documentation for `K-Means Clustering <K-MeansClustering.htm>`_. 
     140More tricks with a similar purpose are described in the documentation for 
     141:ref:`K-Means Clustering`. 
    78142 
    79 The schema that does both and the corresponding settings of the hiearchical clustering widget are shown below. 
      143The schema that does both and the corresponding settings of the hierarchical 
     144clustering widget are shown below. 
    80145 
    81146.. image:: images/HierarchicalClustering-Schema2.png 
  • docs/widgets/rst/unsupervized/interactiongraph.rst

    r11050 r11359  
    2424 
    2525 
    26 The widget computes interactions between attributes as defined by Aleks Jakulin in his work on `attribute interactions <http://stat.columbia.edu/~jakulin/Int/>`_. The interaction is defined as the difference between the sum of individual attribute information gains and the information gain of their cartesian product. The interaction can be negative (e.g. when the attributes are correlated), or positive (e.g. when the class is related to the xor of the two attributes). 
     26The widget computes interactions between attributes as defined by Aleks Jakulin 
     27in his work on `attribute interactions <http://stat.columbia.edu/~jakulin/Int/>`_. 
      28The interaction is defined as the difference between the information gain of 
      29the attributes' Cartesian product and the sum of their individual information 
      30gains. The interaction can be negative (e.g. when the attributes are 
      31correlated) or positive (e.g. when the class is related to the XOR of the 
      32two attributes). 
    2733 
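For illustration, this interaction can be computed from entropies roughly as
follows (a minimal sketch in plain Python, not the widget's implementation;
a, b and cls are lists of attribute and class values)::

   import math
   from collections import Counter

   def entropy(items):
       counts = Counter(items).values()
       total = float(sum(counts))
       return -sum(c / total * math.log(c / total, 2) for c in counts if c)

   def info_gain(attr, cls):
       # Gain(A) = H(C) - H(C|A) = H(C) + H(A) - H(A, C)
       return entropy(cls) + entropy(attr) - entropy(list(zip(attr, cls)))

   def interaction_gain(a, b, cls):
       # Gain of the Cartesian product A x B minus the individual gains
       ab = list(zip(a, b))
       return info_gain(ab, cls) - info_gain(a, cls) - info_gain(b, cls)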
    28 The widget uses an external application for drawing graphs, `GraphViz <http://www.graphviz.org/>`_. It does not come with Orange, so you will need to install separately for the entire widget to work. 
     34The widget uses an external application for drawing graphs, 
     35`GraphViz <http://www.graphviz.org/>`_. It does not come with Orange, so you 
      36will need to install it separately for the widget to work. 
    2937 
    30 The widget will be completely redesigned in the nearest future, so we here only give its most basic description. 
      38The widget will be completely redesigned in the near future, so we give only 
      39its most basic description here. 
    3140 
    3241 
    3342.. image:: images/InteractionGraph-Small.png 
    3443 
    35 The widget is comprised of three parts. In the leftmost the user can select the attributes among which she or he wants to compute the interactions. The middle part contains all pairs of attributes - or all interesting pairs, if :obj:`Show only important interactions` is checked. For attributes which are in positive interaction, the blue parts at the left and the right end of the bar represent the individual information gains and the green part in the middle represents the interaction. For those in negative interaction, the red part is the interaction, which can be interpreted as the amount of information conveyed by both attributes, while the blue parts to the left and right are each attribute's individual contribution. 
      44The widget consists of three parts. In the leftmost part, the user can select 
     45the attributes among which she or he wants to compute the interactions. The 
     46middle part contains all pairs of attributes - or all interesting pairs, 
     47if :obj:`Show only important interactions` is checked. For attributes which 
     48are in positive interaction, the blue parts at the left and the right end of 
     49the bar represent the individual information gains and the green part in the 
     50middle represents the interaction. For those in negative interaction, the red 
     51part is the interaction, which can be interpreted as the amount of information 
     52conveyed by both attributes, while the blue parts to the left and right are 
     53each attribute's individual contribution. 
    3654 
    3755The right part of the widget shows a graph of interactions. 
  • docs/widgets/rst/unsupervized/kmeansclustering.rst

    r11050 r11359  
    2323----------- 
    2424 
    25 The widget applies the K-means clustering algorithm to the data from the input and outputs a new data set in which the cluster index is used for the class attribute. The original class attribute, if it existed, is moved to meta attributes. The basic information on the clustering results is also shown in the widget. 
     25The widget applies the K-means clustering algorithm to the data from the input 
     26and outputs a new data set in which the cluster index is used for the class 
     27attribute. The original class attribute, if it existed, is moved to meta 
     28attributes. The basic information on the clustering results is also shown in 
     29the widget. 
    2630 
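As a reminder of what the algorithm itself does, here is a minimal generic
k-means sketch (plain Python, not the widget's internals); points is a list
of equal-length numeric tuples and the function returns a cluster index for
each point::

   import random

   def kmeans(points, k, iterations=100):
       def sq_dist(p, q):
           return sum((a - b) ** 2 for a, b in zip(p, q))

       centroids = random.sample(points, k)
       labels = [0] * len(points)
       for _ in range(iterations):
           # assignment step: each point goes to its nearest centroid
           labels = [min(range(k), key=lambda j: sq_dist(pt, centroids[j]))
                     for pt in points]
           # update step: move each centroid to the mean of its members
           for j in range(k):
               members = [pt for pt, lab in zip(points, labels) if lab == j]
               if members:
                   centroids[j] = tuple(sum(v) / float(len(members))
                                        for v in zip(*members))
       return labels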
    27 <img class="leftscreenshot" src="K-MeansClustering.png" border=0> 
    2831 
    29 Clustering has two parameters that can be set by the user, the number of clusters and the type of distance metrics, :obj:`Euclidean distance` or :obj:`Manhattan`. Any changes must be confirmed by pushing :obj:`Apply`. 
     32.. Clustering has two parameters that can be set by the user, the number of 
     33   clusters and the type of distance metrics, :obj:`Euclidean distance` or 
     34   :obj:`Manhattan`. Any changes must be confirmed by pushing :obj:`Apply`. 
    3035 
    31 The table on the right hand side shows the results of clustering. For each cluster it gives the number of examples, its fitness and BIC. 
     36   The table on the right hand side shows the results of clustering. For each 
     37   cluster it gives the number of examples, its fitness and BIC. 
    3238 
    33 Fitness measures how well the cluster is defined. Let d<sub>i,C</sub> be the average distance between point i and the points in cluster C. Now, let a<sub>i</sub> equal d<sub>i,C'</sub>, where C' is the cluster i belongs to, and let b<sub>i</sub>=min d<sub>i,C</sub> over all other clusters C. Fitness is then defined as the average silhouette of the cluster C, that is avg( (b<sub>i</sub>-a<sub>i</sub>)/max(b<sub>i</sub>, a<sub>i</sub>) ). 
      39   Fitness measures how well the cluster is defined. Let d(i, C) be the 
      40   average distance between point i and the points in cluster C. Now, let 
      41   a(i) equal d(i, C'), where C' is the cluster i belongs to, 
      42   and let b(i) = min d(i, C) over all other clusters C. Fitness 
      43   is then defined as the average silhouette of the cluster C, that is 
      44   avg((b(i) - a(i)) / max(b(i), a(i))). 
    3445 
    35 To make it simple, fitness close to 1 signifies a well-defined cluster. 
     46   To make it simple, fitness close to 1 signifies a well-defined cluster. 
    3647 
    37 BIC is short for Bayesian Information Criteria and is computed as ln L-k(d+1)/2 ln n, where k is the number of clusters, d is dimension of data (the number of attributes) and n is the number of examples (data instances). L is the likelihood of the model, assuming the spherical Gaussian distributions around the centroid(s) of the cluster(s). 
     48   BIC is short for Bayesian Information Criteria and is computed as 
     49   ln L-k(d+1)/2 ln n, where k is the number of clusters, d is dimension of 
     50   data (the number of attributes) and n is the number of examples (data 
     51   instances). L is the likelihood of the model, assuming the spherical 
     52   Gaussian distributions around the centroid(s) of the cluster(s). 
    3853 
    3954 
     
    4560.. image:: images/K-MeansClustering-Schema.png 
    4661 
    47 The beginning is nothing special: we load the iris data, divide it into three clusters, show it in a table, where we can observe which example went into which cluster. The interesting part are the Scatter plot and Select data. 
     62The beginning is nothing special: we load the iris data, divide it into 
     63three clusters, show it in a table, where we can observe which example went 
      64into which cluster. The interesting parts are the Scatter plot and Select data widgets. 
    4865 
    49 Since K-means added the cluster index as the class attribute, the scatter plot will color the points according to the clusters they are in. Indeed, what we get looks like this. 
     66Since K-means added the cluster index as the class attribute, the scatter 
     67plot will color the points according to the clusters they are in. Indeed, what 
     68we get looks like this. 
     69 
    5070.. image:: images/K-MeansClustering-Scatterplot.png 
    5171 
    52 The thing we might be really interested in is how well the clusters induced by the (unsupervised) clustering algorithm match the actual classes appearing in the data. We thus take the Select data widget in which we can select individual classes and get the corresponding points in the scatter plot marked. The match is perfect setosa, and pretty good for the other two classes. 
     72The thing we might be really interested in is how well the clusters induced by 
     73the (unsupervised) clustering algorithm match the actual classes appearing in 
     74the data. We thus take the Select data widget in which we can select individual 
     75classes and get the corresponding points in the scatter plot marked. The match 
      76is perfect for setosa, and pretty good for the other two classes. 
    5377 
    5478.. image:: images/K-MeansClustering-Example.png 
    5579 
    56 You may have noticed that we left the :obj:`Remove unused values/attributes` and :obj:`Remove unused classes` in Select Data unchecked. This is important: if the widget modifies the attributes, it outputs a list of modified examples and the scatter plot cannot compare them to the original examples. 
     80You may have noticed that we left the :obj:`Remove unused values/attributes` 
     81and :obj:`Remove unused classes` in Select Data unchecked. This is important: 
     82if the widget modifies the attributes, it outputs a list of modified examples 
     83and the scatter plot cannot compare them to the original examples. 
    5784 
    58 Another, perhaps simpler way to test the match between clusters and the original classes is to use the widget `Distributions <../Visualize/Distributions.htm>`_. The only (minor) problem here is that this widget only visualizes the normal attributes and not the meta attributes. We solve this by using `Select Attributes <../Data/SelectAttributes.htm>`_ with which we move the original class to normal attributes. 
     85Another, perhaps simpler way to test the match between clusters and the 
     86original classes is to use the widget :ref:`Distributions`. The only (minor) 
     87problem here is that this widget only visualizes the normal attributes and not 
     88the meta attributes. We solve this by using :ref:`Select Attributes` with which 
     89we move the original class to normal attributes. 
    5990 
    6091.. image:: images/K-MeansClustering-Schema.png 
    6192 
    62 The match is perfect for setosa: all instances of setosa are in the first cluster (blue). 47 versicolors are in the third cluster (green), while three ended up in the second. For virginicae, 49 are in the second cluster and one in the third. 
     93The match is perfect for setosa: all instances of setosa are in the first 
     94cluster (blue). 47 versicolors are in the third cluster (green), while three 
     95ended up in the second. For virginicae, 49 are in the second cluster and one 
     96in the third. 
    6397 
    64 To observe the possibly more interesting reverse relation, we need to rearrange the attributes in the Select Attributes: we reinstate the original class Iris as the class and put the cluster index among the attributes. 
     98To observe the possibly more interesting reverse relation, we need to 
     99rearrange the attributes in the Select Attributes: we reinstate the original 
     100class Iris as the class and put the cluster index among the attributes. 
    65101 
    66102.. image:: images/K-MeansClustering-Example2a.png 
    67103 
    68 The first cluster is exclusively setosae, the second has mostly virginicae and the third has mostly versicolors. 
     104The first cluster is exclusively setosae, the second has mostly virginicae 
     105and the third has mostly versicolors. 
  • docs/widgets/rst/unsupervized/mds.rst

    r11050 r11359  
    66.. image:: ../icons/MDS.png 
    77 
    8 Multidimensional scaling (MDS) - a projection into a plane fitted to the given distances between the points 
     8Multidimensional scaling (MDS) - a projection into a plane fitted to the given 
     9distances between the points 
    910 
    1011Signals 
     
    2122   - Selected Examples 
    2223      A table of selected examples 
    23    - Structured Data Files 
    24       ??? 
    2524 
    2625 
    27 Signals Example Subset and Selected Examples are only applicable if Distance Matrix describes distances between examples, for instance if the matrix comes from `Example Distance <ExampleDistance.htm>`_. 
     26Signals Example Subset and Selected Examples are only applicable if Distance 
     27Matrix describes distances between examples, for instance if the matrix comes 
     28from :ref:`Example Distance`. 
    2829 
    2930Description 
    3031----------- 
    3132 
    32 Multidimensional scaling is a technique which finds a low-dimensional (in our case a two-dimensional) projection of points, where it tries to fit the given distances between points as well is possible. The perfect fit is typically impossible to obtain since the data is higher dimensional or the distances are not Euclidean. 
     33Multidimensional scaling is a technique which finds a low-dimensional (in our 
     34case a two-dimensional) projection of points, where it tries to fit the given 
      35distances between points as well as possible. The perfect fit is typically 
     36impossible to obtain since the data is higher dimensional or the distances are 
     37not Euclidean. 
    3338 
    34 To do its work, the widget needs a matrix of distances. The distances can correspond to any kinds of object. However, the widget has some functionality dedicated to distances between examples, such as coloring the points and changing their shapes, marking them, and outputting them upon selection. 
     39To do its work, the widget needs a matrix of distances. The distances can 
      40correspond to objects of any kind. However, the widget has some functionality 
     41dedicated to distances between examples, such as coloring the points and 
     42changing their shapes, marking them, and outputting them upon selection. 
    3543 
    36 The algorithm iteratively moves the points around in a kind of simulation of a physical model: if two points are too close to each other (or too far away), there is a force pushing them apart (together). The change of the point's position at each time interval corresponds to the sum of forces acting on it. 
     44The algorithm iteratively moves the points around in a kind of simulation of a 
     45physical model: if two points are too close to each other (or too far away), 
     46there is a force pushing them apart (together). The change of the point's 
     47position at each time interval corresponds to the sum of forces acting on it. 
    3748 
    3849.. image:: images/MDS.png 
    3950 
    40 The first group of buttons set the position of points. :obj:`Randomize` sets the to a random position; the initial positions are also random. :obj:`Jitter` randomly moves the points for a short distance; this may be useful if the optimization is stuck in a (seemingly) local minimum. :obj:`Torgerson` positions the points using Torgerson's method. 
      51The first group of buttons sets the positions of the points. :obj:`Randomize` sets 
      52them to a random position; the initial positions are also random. 
     53:obj:`Jitter` randomly moves the points for a short distance; this may be 
     54useful if the optimization is stuck in a (seemingly) local minimum. 
     55:obj:`Torgerson` positions the points using Torgerson's method. 
    4156 
    42 Optimization is run by pushing :obj:`Optimize`. :obj:`Single Step` makes a single step of optimization; this is primarily useful for educative purposes. 
     57Optimization is run by pushing :obj:`Optimize`. :obj:`Single Step` makes a 
     58single step of optimization; this is primarily useful for educative purposes. 
    4359 
    44 Stress function defines how the difference between the desired and the actual distance between points translates into the forces acting on them. Several are available. Let current and desired be the distance in the current projection and the desired distances, and diff=current-desired. Then the stress functions are defined as follows: 
      60The stress function defines how the difference between the desired and the 
      61actual distance between points translates into the forces acting on them. 
      62Several are available. Let current and desired be the distance in the current 
      63projection and the desired distance, and let diff = current - desired. Then 
      64the stress functions are defined as follows: 
    4565 
    46    - :obj:`Kruskal stress`: diff<sup>2</sup> 
    47    - :obj:`Sammon stress`: diff<sup>2</sup>/current 
     66   - :obj:`Kruskal stress`: diff\ :sup:`2` 
     67   - :obj:`Sammon stress`: diff\ :sup:`2`\ /current 
    4868   - :obj:`Signed Sammon stress`: diff/current 
    4969   - :obj:`Signed relative stress`: diff/desired 
    5070 
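A per-pair stress term for the measures listed above can be sketched as
follows (illustrative only, not the widget's code)::

   def stress(current, desired, kind="Kruskal"):
       diff = current - desired
       if kind == "Kruskal":
           return diff ** 2
       if kind == "Sammon":
           return diff ** 2 / current
       if kind == "Signed Sammon":
           return diff / current
       return diff / desired        # signed relative stress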
    5171 
     72The widget redraws the projection during optimization. It can do so at 
     73:obj:`Every step`, :obj:`Every 10 steps` or :obj:`Every 100 steps`. Setting a 
     74lower refresh interval makes the animation more visually appealing, but can be 
     75slow if the number of points is high. 
    5276 
    53 The widget redraws the projection during optimization. It can do so at :obj:`Every step`, :obj:`Every 10 steps` or :obj:`Every 100 steps`. Setting a lower refresh interval makes the animation more visually appealing, but can be slow if the number of points is high. 
     77The optimization stops either when the projection changes only minimally at 
     78the last iteration or when a specified number of steps have been made. The two 
     79conditions are given with options :obj:`Minimal average stress change` and 
     80:obj:`Maximal number of steps`. 
    5481 
    55 The optimization stops either when the projection changes only minimally at the last iteration or when a specified number of steps have been made. The two conditions are given with options :obj:`Minimal average stress change` and :obj:`Maximal number of steps`. 
     82The bottom of the settings pane shows the average stress (the lower the better) 
     83and the number of steps made in the last optimization. 
    5684 
    57 The bottom of the settings pane shows the average stress (the lower the better) and the number of steps made in the last optimization. 
     85.. image:: images/MDS-Graph.png 
     86   :alt: MDS 'Graph' tab 
     87   :align: left 
    5888 
    59 <img class="leftscreenshot" src="MDS-Graph.png" border=0 /> 
     89The second tab with settings defines how the points are visualized and the 
     90settings related to outputting the data. The user can set the size of points 
     91(:obj:`Point Size`) or let the size depend on the value of some continuous 
     92attribute (:obj:`Size`) of the example the point represents. The color and 
     93shape of the point (:obj:`Color`, :obj:`Shape`) can depend upon values of 
     94discrete attributes. Any attribute can serve as a label. 
    6095 
    61 The second tab with settings defines how the points are visualized and the settings related to outputting the data. The user can set the size of points (:obj:`Point Size`) or let the size depend on the value of some continuous attribute (:obj:`Size`) of the example the point represents. The color and shape of the point (:obj:`Color`, :obj:`Shape`) can depend upon values of discrete attributes. Any attribute can serve as a label. 
      96These options are only active if the points represent examples (that is, if 
      97there is a table of examples attached to the distance matrix on the widget's 
      98input). If the points represent attributes (e.g. the distance matrix comes 
      99from :ref:`Attribute Distance`), the points can be labeled by attribute names. 
      100If the points come from a labeled distance file (see :ref:`Distance File`), the 
      101labels can be used for annotating the points. 
    62102 
    63 These options are only active if the points represents examples (that is, if there is a table of examples attached to the distance matrix on the widget's input). If the points represent attributes (e.g. the distance matrix comes from `Attribute Distance <AttributeDistance.htm>`_), the points can be labeled by attribute names. If the points come from a labeled distance file (see `Distance File <DistanceFile.htm>`_), the labels can be used for annotating the points. 
    64  
    65 The widget can superimpose a graph onto the projection, where the specified proportion of the most similar pairs is connected, with the width of connection showing the similarity. This is enabled by checking :obj:`Show similar pairs` and setting the proportion of connected pairs below. Enabling this option during the optimization can illustrate how the algorithm works, though drawing too many connections at each refresh can make the optimization very slow. The picture below shows a rendering of the zoo data set with this option enable. 
     103The widget can superimpose a graph onto the projection, where the specified 
     104proportion of the most similar pairs is connected, with the width of connection 
     105showing the similarity. This is enabled by checking :obj:`Show similar pairs` 
     106and setting the proportion of connected pairs below. Enabling this option 
     107during the optimization can illustrate how the algorithm works, though drawing 
     108too many connections at each refresh can make the optimization very slow. The 
     109picture below shows a rendering of the zoo data set with this option enabled. 
    66110 
    67111.. image:: images/MSD-Connected.png 
     112   :alt: MDS Similar Pairs 
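Selecting the most similar pairs amounts to thresholding the sorted pairwise distances. A rough sketch with NumPy (illustrative only; not the widget's code)::

    import numpy as np

    def most_similar_pairs(dist, proportion=0.05):
        # indices of the upper triangle, i.e. each pair once
        i, j = np.triu_indices(dist.shape[0], k=1)
        order = np.argsort(dist[i, j])                 # closest pairs first
        keep = order[:int(len(order) * proportion)]
        return list(zip(i[keep], j[keep]))

The returned pairs would be drawn as edges, with the line width decreasing as the distance grows.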
    68113 
    69 The remaining options deal with zooming selecting the points and sending them on. The magnifying glass enables zooming, and the other two icons enable selection of examples with rectangular or arbitrary selection areas. The buttons in the left group undo the last action, remove all selection and send the selected examples. Sending the examples can be automatic if :obj:`Auto send selected` is checked. 
      114The remaining options deal with zooming, selecting the points and sending them
     115on. The magnifying glass enables zooming, and the other two icons enable 
     116selection of examples with rectangular or arbitrary selection areas. The 
     117buttons in the left group undo the last action, remove all selection and send 
     118the selected examples. Sending the examples can be automatic if 
     119:obj:`Auto send selected` is checked. 
    70120 
    71 The output data can have the coordinates of each point appended, either as normal attributes (:obj:`Append coordinates`) or as meta attributes (:obj:`Append coordinates as meta`). 
     121The output data can have the coordinates of each point appended, either as 
     122normal attributes (:obj:`Append coordinates`) or as meta attributes 
     123(:obj:`Append coordinates as meta`). 
    72124 
    73 The MDS graph performs many of the functions of the visualizations widget. It is in many respects similar to the `Scatter Plot <../Visuzalize/ScatterPlot.htm>`_, so we recommend reading its description as well. 
      125The MDS graph performs many of the functions of the visualization widgets. It
     126is in many respects similar to the :ref:`Scatter Plot`, so we recommend 
     127reading its description as well. 
    74128 
    75129Examples 
     
    79133 
    80134.. image:: images/MDS-Schema.png 
    81  
    82 Interactive functions of the MDS widget - marking subsets of examples, selecting examples, etc. - are similar to those of the `Scatter Plot <../Visuzalize/ScatterPlot.htm>`_ widget, so see its documentation for more examples. 
     135   :alt: MDS Scheme 
  • docs/widgets/rst/visualize/attributestatistics.rst

    r11050 r11359  
    2323----------- 
    2424 
    25 Attribute Statistics shows distributions of attribute values. It is a good practice to check any new data with this widget, to quickly discover any anomalies, such as duplicated values (e.g. gray and grey), outliers, and similar. 
     25Attribute Statistics shows distributions of attribute values. It is a good 
     26practice to check any new data with this widget, to quickly discover any 
     27anomalies, such as duplicated values (e.g. gray and grey), outliers, and 
     28similar. 
    2629 
    2730.. image:: images/AttributeStatistics-Cont.png 
     31   :alt: Attribute Statistics for continuous features 
    2832 
    29 For continuous attributes, the widget shows the minimal and maximal value. In case of Iris' attribute "petal length" (figure on the left), these are 1.00 and 6.90. In between are the 25'th percentile, the median and the 75%, which are 1.60, 4.35 and 5.10, respectively. The mean and standard deviation are printed in red (3.76 and 1.76) and also represented with the vertical line. At the bottom left corner there is also information on the sample size (there are 150 examples in the Iris data set, without any missing values) and the number of distinct values that this attribute takes. 
     33For continuous attributes, the widget shows the minimal and maximal value. 
     34In case of Iris' attribute "petal length" (figure on the left), these are 
      351.00 and 6.90. In between are the 25th percentile, the median and the 75th
      36percentile, which are 1.60, 4.35 and 5.10, respectively. The mean and standard
      37deviation are printed in red (3.76 and 1.76) and also shown with a vertical line.
     38At the bottom left corner there is also information on the sample size (there 
     39are 150 examples in the Iris data set, without any missing values) and the 
     40number of distinct values that this attribute takes. 
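These summaries are the usual descriptive statistics; the numbers quoted above can be reproduced along these lines (a NumPy sketch with made-up values standing in for the attribute column)::

    import numpy as np

    # made-up values standing in for a continuous attribute column
    values = np.array([1.4, 1.3, 4.7, 4.5, 5.1, 5.9, 1.6, 4.4, 5.6, 1.5])

    summary = {
        "min": values.min(),
        "25th percentile": np.percentile(values, 25),
        "median": np.median(values),
        "75th percentile": np.percentile(values, 75),
        "max": values.max(),
        "mean": values.mean(),
        "standard deviation": values.std(),
        "sample size": values.size,
        "distinct values": np.unique(values).size,
    }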
    3041 
    3142.. image:: images/AttributeStatistics-Disc.png 
    3243 
    33 For discrete attributes, the bars represent the number of examples with each particular attribute value. The picture shows the number of different animal types in the Zoo data set: there are 41 mammals, 13 fish and so forth. 
     44For discrete attributes, the bars represent the number of examples with each 
     45particular attribute value. The picture shows the number of different animal 
     46types in the Zoo data set: there are 41 mammals, 13 fish and so forth. 
    3447 
    3548 
    36 For both kinds of attributes, the graph can be saved by clicking the Save Graph button. 
     49For both kinds of attributes, the graph can be saved by clicking the 
     50:obj:`Save Graph` button. 
    3751 
    3852Examples 
    3953-------- 
    4054 
    41 Attribute Statistics is most commonly used immediately after the `File <../Data/File.htm>`_ widget to observe statistical properties of the data set. It is also useful for finding the properties of a specific data set, for instance a group of examples manually defined in another widget, such as scatter plot or examples belonging to some cluster or a classification tree node, as shown in the schema below. 
     55Attribute Statistics is most commonly used immediately after the :ref:`File` 
     56widget to observe statistical properties of the data set. It is also useful for 
     57finding the properties of a specific data set, for instance a group of 
     58examples manually defined in another widget, such as scatter plot or examples 
     59belonging to some cluster or a classification tree node, as shown in the 
     60schema below. 
    4261 
    4362.. image:: images/AttributeStatistics-Schema.png 
     63   :alt: Attribute Statistics Example schema 
  • docs/widgets/rst/visualize/distributions.rst

    r11050 r11359  
    2323----------- 
    2424 
    25 Distributions displays the value distribution of either discrete or continuous attributes. If the data contains class, the distributions are conditioned on the class. 
     25Distributions displays the value distribution of either discrete or continuous 
      26attributes. If the data contains a class, the distributions are conditioned on
     27the class. 
    2628 
    2729.. image:: images/Distributions-Disc.png 
     30   :alt: Distribution for a discrete feature 
    2831 
    29 For discrete attributes, the graph displayed by the widget shows how many times (e.g., in how many data instances) each of the attribute values appear in the data. If the data contains a class variable, class distributions for each of the attribute values will be displayed as well (like in the snapshot above). The widget may be requested to display only value distributions for instances of certain class (:obj:`Outcomes`). For class-valued data sets, the class probability conditioned on a specific value of the attribute (:obj:`Target value`) may be displayed as well. 
     32For discrete attributes, the graph displayed by the widget shows how many times 
      33(e.g., in how many data instances) each of the attribute values appears in the
     34data. If the data contains a class variable, class distributions for each of 
     35the attribute values will be displayed as well (like in the snapshot above). 
     36The widget may be requested to display only value distributions for instances 
      37of a certain class (:obj:`Outcomes`). For class-valued data sets, the class
     38probability conditioned on a specific value of the attribute 
     39(:obj:`Target value`) may be displayed as well. 
    3040 
    3141.. image:: images/Distributions-Cont.png 
     42   :alt: Distribution for a continuous feature 
    3243 
    33 For continuous attributes, the attribute values are discretized and value distribution is displayed as a histogram. Notice that the :obj:`Number of bars` can be used to alter the discretization used. Class probabilities for the continuous attributes are obtained through loess smoothing, the appearance of the curve and inclusion of the confidence intervals are set in :obj:`Probability plot` settings. 
     44For continuous attributes, the attribute values are discretized and value 
     45distribution is displayed as a histogram. Notice that the :obj:`Number of bars` 
     46can be used to alter the discretization used. Class probabilities for the 
     47continuous attributes are obtained through loess smoothing, the appearance of 
     48the curve and inclusion of the confidence intervals are set in 
     49:obj:`Probability plot` settings. 
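A rough sketch of what such a display computes for a continuous attribute, using plain binning and per-bin class frequencies (the widget's actual curve uses loess smoothing, which is not reproduced here; the names are illustrative)::

    import numpy as np

    def binned_class_probabilities(values, classes, n_bins=10):
        # values: 1D NumPy array of a continuous attribute
        # classes: sequence of class labels, one per instance
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
        bin_idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
        labels = sorted(set(classes))
        probs = np.zeros((n_bins, len(labels)))
        for b in range(n_bins):
            in_bin = [c for c, k in zip(classes, bin_idx) if k == b]
            for j, lab in enumerate(labels):
                probs[b, j] = in_bin.count(lab) / float(len(in_bin)) if in_bin else 0.0
        return edges, probs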
    3450 
    35 Notice that in class-less domains, the bars are displayed in gray and no additional information (e.g. conditional probabilities) are available. 
     51Notice that in class-less domains, the bars are displayed in gray and no 
      52additional information (e.g. conditional probabilities) is available.
    3653 
    3754.. image:: images/Distributions-NoClass.png 
     55   :alt: Distribution with no class variable 
  • docs/widgets/rst/visualize/linearprojection.rst

    r11050 r11359  
    66.. image:: ../icons/LinearProjection.png 
    77 
    8 Various linear projection methods with explorative data analysis and intelligent data visualization enhancements. 
     8Various linear projection methods with explorative data analysis and 
     9intelligent data visualization enhancements. 
    910 
    1011Signals 
     
    2930 
    3031 
    31 Warning: this widget combines a number of visualization methods that are currently in research. Eventually, it will break down to a set of simpler widgets, each implementing its own method. 
     32Warning: this widget combines a number of visualization methods that are 
     33currently in research. Eventually, it will break down to a set of simpler 
     34widgets, each implementing its own method. 
    3235 
    3336Description 
    3437----------- 
    3538 
    36 This widget provides an interface to a number of linear projection methods that all deal with class-labeled data and aim at finding the two-dimensional projection where instances of different classes are best separated. Consider, for a start, a projection of a <a href="">zoo.tab</a> data set (animal species and their features) shown below. Notice that it is breast-feeding (milk) and hair that nicely characterizes mamals from the other organisms, and that laying eggs is something that birds do. This specific visualization was obtained using FreeViz (`Demsar et al., 2007 <#Demsar2007>`_), while the widget also implements an interface to supervised principal component analysis (`Koren and Carmel, 2003 <#Koren2003>`_), partial least squares (for a nice introduction, see `Boulesteix and Strimmer, 2006 <Boulesteix2007>`_), and RadViz visualization and associated intelligent data visualization technique called VizRank (<a href=""></a>). 
     39This widget provides an interface to a number of linear projection methods that 
     40all deal with class-labeled data and aim at finding the two-dimensional 
     41projection where instances of different classes are best separated. Consider, 
     42for a start, a projection of a **zoo.tab** data set (animal species and their 
     43features) shown below. Notice that it is breast-feeding (milk) and hair that 
     44nicely characterizes mamals from the other organisms, and that laying eggs is 
     45something that birds do. This specific visualization was obtained using FreeViz 
     46([Demsar2007]_), while the widget also implements an interface to supervised 
     47principal component analysis ([Koren2003]_), partial least squares (for a nice 
     48introduction, see [Boulesteix2007]_), and RadViz visualization and 
     49associated intelligent data visualization technique called VizRank  
      50([Leban2006]_).
    3751 
    3852.. image:: images/LinearProjection-Zoo.png 
      53   :alt: Linear Projection on the zoo data set
    3954 
    40 Projection search methods are invoked from :obj:`Optimization Dialogs` in the :obj:`Main` tab. Other controls in this tab and controls in the :obj:`Settings` tab are just like those with other visualization widgets; please refer to a documentation of `Scatterplot <Scatterplot.html>`_ widget for further information. 
     55Projection search methods are invoked from :obj:`Optimization Dialogs` in the 
     56:obj:`Main` tab. Other controls in this tab and controls in the :obj:`Settings` 
      57tab are just like those with other visualization widgets; please refer to the
      58documentation of the :ref:`Scatter Plot` widget for further information.
    4159 
    4260.. image:: images/LinearProjection-FreeViz.png 
    4361   :alt: FreeViz screen shot 
    4462 
    45 :obj:`FreeViz` button in :obj:`Main` tab opens a dialog from which four different methods are accessed. The first one is FreeViz, which uses a paradigm borrowed from particle physics: points in the same class attract each other, those from different class repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on unit vectors of each of the dimensional axis. The points cannot move (are projected in the projection space), but the attribute anchors can, so the optimization process is a hill-climbing optimization where at the end the anchors are placed such that forces are in equilibrium. The FreeViz optimization dialog is used to invoke the optimization process (:obj:`Optimize Separation`) or execute a single step of optimization (:obj:`Single Step`). The result of the optimization may depend on the initial placement of the anchors, which can be set in a circle, arbitrary or even manually (:obj:`Set anchor positions`). The later also works at any stage of optimization, and we recommend to play with this option in order to understand how a change of one anchor affects the positions of the data points. Controls in :obj:`Forces` box are used to set the parameters that define the type of the forces between the data points (see `Demsar et al., 2007 <#Demsar2007>`_). In any linear projection, projections of unit vector that are very short compared to the others indicate that their associated attribute is not very informative for particular classification task. Those vectors, that is, their corresponding anchors, may be hidden from the visualization using controls in :obj:`Show anchors` box. 
     63:obj:`FreeViz` button in :obj:`Main` tab opens a dialog from which four 
     64different methods are accessed. The first one is FreeViz, which uses a paradigm 
     65borrowed from particle physics: points in the same class attract each other, 
     66those from different class repel each other, and the resulting forces are 
     67exerted on the anchors of the attributes, that is, on unit vectors of each of 
     68the dimensional axis. The points cannot move (are projected in the projection 
     69space), but the attribute anchors can, so the optimization process is a 
     70hill-climbing optimization where at the end the anchors are placed such that 
     71forces are in equilibrium. The FreeViz optimization dialog is used to invoke 
     72the optimization process (:obj:`Optimize Separation`) or execute a single step 
     73of optimization (:obj:`Single Step`). The result of the optimization may depend 
     74on the initial placement of the anchors, which can be set in a circle, 
      75arbitrarily or even manually (:obj:`Set anchor positions`). The latter also works
      76at any stage of optimization, and we recommend playing with this option in
      77order to understand how a change of one anchor affects the positions of the
      78data points. Controls in the :obj:`Forces` box are used to set the parameters that
      79define the type of the forces between the data points (see [Demsar2007]_).
      80In any linear projection, projections of unit vectors that are very short
      81compared to the others indicate that their associated attribute is not very
      82informative for the particular classification task. Those vectors, that is, their
     83corresponding anchors, may be hidden from the visualization using controls in 
     84:obj:`Show anchors` box. 
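Once the anchor positions are fixed, the projection itself is just a linear combination of the anchors; a minimal sketch of the forward mapping only, not of FreeViz's force-based optimization (the names are ours)::

    import numpy as np

    def linear_projection(data, anchors):
        # data: (n_instances, n_attributes), values preferably scaled to [0, 1]
        # anchors: (n_attributes, 2) positions of the attribute anchors
        return np.asarray(data, dtype=float).dot(anchors)

    # A FreeViz-like optimizer then iteratively moves the rows of `anchors` so
    # that same-class points end up close together and different classes apart.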
    4685 
    47 The other two, quite prominent visualization methods, are accessible through FreeViz's :obj:`Dimensionality Reduction` tab (not shown here). These includes supervised principal component analysis and partial least squares method. The general objection of these two approaches is the same as for FreeViz (find a projection that separates data instances of different class), but the results - because of different optimization methods and differences in their bias - may be quite different. 
     86The other two, quite prominent visualization methods, are accessible through 
      87FreeViz's :obj:`Dimensionality Reduction` tab (not shown here). These include
      88supervised principal component analysis and the partial least squares method.
      89The general objective of these two approaches is the same as for FreeViz
      90(find a projection that separates data instances of different classes), but the
     91results - because of different optimization methods and differences in their 
     92bias - may be quite different. 
    4893 
    49 The fourth projection search technique that can be accessed from this widget is VizRank search algorithm with RadViz visualization (Leban et al. (2006)). This is essentially the same visualization and projection search method as implemented in `Radviz <Radviz>`_. 
     94The fourth projection search technique that can be accessed from this widget 
     95is VizRank search algorithm with RadViz visualization ([Leban2006]_). This is 
     96essentially the same visualization and projection search method as implemented 
     97in :ref:`Radviz`. 
    5098 
    51 Like other point-based visualization widget, Linear Projection also includes explorative analysis functions (selection of data instances and zooming). See documentation for :doc:`Scatterplot <scatterplot>` widget for documentation of these as implemented in :obj:`Zoom / Select` toolbox in the :obj:`Main` tab of the widget. 
      99Like other point-based visualization widgets, Linear Projection also includes
      100explorative analysis functions (selection of data instances and zooming).
      101See the documentation of the :ref:`Scatter Plot` widget for a description of these
      102as implemented in the :obj:`Zoom / Select` toolbox in the :obj:`Main` tab of the
      103widget.
    52104 
    53105 
     
    55107---------- 
    56108 
    57   - Demsar J, Leban G, Zupan B. FreeViz-An intelligent multivariate visualization approach to explorative analysis of biomedical data. J Biomed Inform 40(6):661-71, 2007. 
    58   - Koren Y, Carmel L. Visualization of labeled data using linear transformations, in: Proceedings of IEEE Information Visualization 2003 (InfoVis'03), 2003. `PDF <http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=3DDF0DB68D8AB9949820A19B0344C1F3?doi=10.1.1.13.8657&rep=rep1&type=pdf>`_ 
    59   - Boulesteix A-L, Strimmer K (2006) Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Briefings in Bioinformatics 8(1): 32-44. `Abstract <http://bib.oxfordjournals.org/cgi/content/abstract/8/1/32>`_ 
    60   - Leban, G., B. Zupan, et al. (2006). "VizRank: Data Visualization Guided by Machine Learning." Data Mining and Knowledge Discovery 13(2): 119-136. 
     109.. [Demsar2007] Demsar J, Leban G, Zupan B. FreeViz-An intelligent multivariate 
     110   visualization approach to explorative analysis of biomedical data. J Biomed 
     111   Inform 40(6):661-71, 2007. 
     112 
     113.. [Koren2003] Koren Y, Carmel L. Visualization of labeled data using linear 
     114   transformations, in: Proceedings of IEEE Information Visualization 2003 
     115   (InfoVis'03), 2003. `PDF <http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=3DDF0DB68D8AB9949820A19B0344C1F3?doi=10.1.1.13.8657&rep=rep1&type=pdf>`_ 
     116 
     117.. [Boulesteix2007] Boulesteix A-L, Strimmer K (2006) Partial least squares: 
     118   a versatile tool for the analysis of high-dimensional genomic data, 
     119   Briefings in Bioinformatics 8(1): 32-44.  
     120   `Abstract <http://bib.oxfordjournals.org/cgi/content/abstract/8/1/32>`_ 
     121 
     122.. [Leban2006] Leban, G., B. Zupan, et al. (2006). "VizRank: Data Visualization 
     123   Guided by Machine Learning." Data Mining and Knowledge Discovery 13(2): 
     124   119-136. 
  • docs/widgets/rst/visualize/mosaicdisplay.rst

    r11050 r11359  
    1717      A subset of data instances from Examples. 
    1818   - Selected Examples (ExampleTable) 
    19       A subset of examples belonging to manually selected cells in mosaic display. 
     19      A subset of examples belonging to manually selected cells in mosaic 
     20      display. 
    2021 
    2122Outputs: 
     
    2627----------- 
    2728 
    28 The mosaic display is a graphical method to visualize the counts in n-way `contingency tables <http://en.wikipedia.org/wiki/Contingency_table>`_, that is, tables where each cell corresponds to a distinct value-combination of n attributes. The method was proposed by <a href="#HartiganKleiner81" title="Hartigan &amp; Kleiner (1981) Mosaics for contingency tables">Hartigan & Kleiner (1981)</a> and extended in <a href="#Friendly94" title="Friendly (1994) Mosaic displays for multi-way contingency tables">Friendly (1994)</a>. Each cell in mosaic display corresponds to a single cell in contingency table. If the data contains a class attribute, the mosaic display will show the class distribution. 
     29The mosaic display is a graphical method to visualize the counts in n-way 
     30`contingency tables <http://en.wikipedia.org/wiki/Contingency_table>`_, that 
     31is, tables where each cell corresponds to a distinct value-combination of n 
     32attributes. The method was proposed by Hartigan & Kleiner 
     33([HartiganKleiner81]_) and extended by Friendly ([Friendly94]_). Each cell in 
      34a mosaic display corresponds to a single cell of the contingency table. If the data
     35contains a class attribute, the mosaic display will show the class 
     36distribution. 
    2937 
    30 Orange's implementation of mosaic display allows to observe the interactions of up to four variables in a single visualization. The snapshot below shows a mosaic display for the Titanic data set, observing three variables (sex, status, and age) and their association with a class (survived). The diagram shows that the survival (red color) was highest for women traveling in the first class, and lowest for men traveling in the second and third class. 
      38Orange's implementation of the mosaic display allows observing the interactions
     39of up to four variables in a single visualization. The snapshot below shows a 
     40mosaic display for the Titanic data set, observing three variables (sex, 
     41status, and age) and their association with a class (survived). The diagram 
     42shows that the survival (red color) was highest for women traveling in the 
     43first class, and lowest for men traveling in the second and third class. 
    3144 
    3245.. image:: images/MosaicDisplay-Titanic.png 
      46   :alt: Mosaic Display on the Titanic data set
    3347 
    34 This visualization gets slightly more complex - but once getting used to, more informative - if the expected class distribution is shown on the same visualization. For this purpose, a sub-box (:obj:`Use sub-boxes on the left to show...` and below it choose :obj:`Apriori class distribution`. This would plot a bar on the top of every cell displayed, being able to observe the difference between the actual and expected distribution for each cell. Change :obj:`Apriori class distribution` to :obj:`Expected class distribution` to compare the actual distributions to those computed by assuming the independence of attributes. 
      48This visualization gets slightly more complex - but, once you get used to it,
      49more informative - if the expected class distribution is shown on the same
      50visualization. For this purpose, check
      51:obj:`Use sub-boxes on the left to show...` and below it choose
      52:obj:`Apriori class distribution`. This plots a bar on top of every
      53cell displayed, making it possible to observe the difference between the actual and
      54expected distribution for each cell. Change :obj:`Apriori class distribution`
      55to :obj:`Expected class distribution` to compare the actual distributions to
      56those computed by assuming the independence of attributes.
    3557 
    3658.. image:: images/MosaicDisplay-Titanic-Apriori.png 
    3759 
    38 The degree of deviation from aprori class distribution for each cell can be directly visualized using :obj:`Standard Pearson residuals` option (from :obj:`Colors in cells represent ...` box, see snapshot below). On Titanic data set, this visualization clearly shows for which combinations of attributes the changes of survival were highest or lowest. 
      60The degree of deviation from the apriori class distribution for each cell can be
      61directly visualized using the :obj:`Standard Pearson residuals` option (from the
      62:obj:`Colors in cells represent ...` box, see the snapshot below). On the Titanic
      63data set, this visualization clearly shows for which combinations of attributes
      64the chances of survival were highest or lowest.
    3965 
    4066.. image:: images/MosaicDisplay-Titanic-Residuals.png 
      67   :alt: Mosaic Display - Pearson residuals
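Standardized Pearson residuals compare the observed cell counts with those expected under independence of the margins; a minimal two-way sketch (NumPy; the counts below are made up purely for illustration, and the extension to n-way tables is analogous)::

    import numpy as np

    def pearson_residuals(observed):
        # (observed - expected) / sqrt(expected), with expected counts
        # computed from the row and column totals
        observed = np.asarray(observed, dtype=float)
        expected = np.outer(observed.sum(axis=1),
                            observed.sum(axis=0)) / observed.sum()
        return (observed - expected) / np.sqrt(expected)

    print(pearson_residuals([[20, 80], [60, 40]]))   # made-up 2x2 counts

Large positive residuals mark cells with more instances than independence would predict, large negative ones the opposite; these are the values mapped to the cell colors.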
    4168 
    42 If there are many attributes, finding subsets which would yield interesting mosaic displays is at least cumbersome. Orange implementation includes :obj:`VizRank` (:obj:`Main` tab), which can list provide a list of most interesting subsets with chosen cardinality. Various measures of interestingness are implemented, but in principle all favor displays where at least some cells would exhibit high deviation from the apriori class distributions. 
     69If there are many attributes, finding subsets which would yield interesting 
      70mosaic displays is at least cumbersome. The Orange implementation includes
      71:obj:`VizRank` (:obj:`Main` tab), which can provide a list of the most
      72interesting subsets of a chosen cardinality. Various measures of
     73interestingness are implemented, but in principle all favor displays where at 
     74least some cells would exhibit high deviation from the apriori class 
     75distributions. 
    4376 
    4477.. image:: images/MosaicDisplay-Titanic-VizRank.png 
      78   :alt: Mosaic Display with VizRank
    4579 
    46 Instead of comparing cell's class distribution to apriori ones, these can be compared to distribution from a subset of instances from the same data domain. The widget uses a separate input channel for this purpose. Notice also that individual cells can be selected/de-selected (clicking with left or right mouse button on the cell), sending out the instances from the selected cells using the :obj:`Selected Examples` channel. 
      80Instead of comparing a cell's class distribution to the apriori one, these can be
     81compared to distribution from a subset of instances from the same data domain. 
     82The widget uses a separate input channel for this purpose. Notice also that 
     83individual cells can be selected/de-selected (clicking with left or right mouse 
     84button on the cell), sending out the instances from the selected cells using 
     85the :obj:`Selected Examples` channel. 
    4786 
    4887References 
    4988---------- 
    5089 
    51    - Hartigan, J. A., and Kleiner, B. (1981).  Mosaics for contingency tables. In W. F. Eddy (Ed.),  Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. New York: Springer-Verlag. 
    52    - Friendly, M. (1994). Mosaic displays for multi-way contingency tables.  Journal of the American Statistical Association,  89, 190-200. 
     90.. [HartiganKleiner81] Hartigan, J. A., and Kleiner, B. (1981).  Mosaics for 
     91   contingency tables. In W. F. Eddy (Ed.),  Computer Science and Statistics: 
     92   Proceedings of the 13th Symposium on the Interface. New York: 
     93   Springer-Verlag. 
     94 
     95.. [Friendly94] Friendly, M. (1994). Mosaic displays for multi-way contingency 
     96   tables.  Journal of the American Statistical Association,  89, 190-200. 
  • docs/widgets/rst/visualize/parallelcoordinates.rst

    r11050 r11359  
    66.. image:: ../icons/ParallelCoordinates.png 
    77 
    8 Parallel Coordinates visualization with some explorative data analysis and intelligent data visualization enhancements. 
     8Parallel Coordinates visualization with some explorative data analysis and 
     9intelligent data visualization enhancements. 
    910 
    1011Signals 
     
    2223Outputs: 
    2324   - Selected Examples (ExampleTable) 
    24       A subset of examples that user has manually selected from the scatterplot. 
      25      A subset of examples that the user has manually selected from the
     26      scatterplot. 
    2527   - Unselected Examples (ExampleTable) 
    2628      All other examples (examples not included in the user's selection). 
     
    3234----------- 
    3335 
    34 Parallel Coordinates is a multidimensional data visualization technique. Each attribute is represented in a vertical line, where the maximum and minimum values of that dimension are scaled to the upper and lower points on these vertical lines. For N visualized attributes, N-1 lines connected to each vertical line at the appropriate dimensional value represent an N-dimensional point. The snapshot shown below displays data from the Iris data set, with the data instance closest to the cursor being highlighted. In Iris data set, the instances are labeled with one of the three distinct classes, depicted with colored lines in the visualization (red, green, blue). 
     36Parallel Coordinates is a multidimensional data visualization technique. Each 
      37attribute is represented by a vertical line, where the maximum and minimum
     38values of that dimension are scaled to the upper and lower points on these 
     39vertical lines. For N visualized attributes, N-1 lines connected to each 
     40vertical line at the appropriate dimensional value represent an N-dimensional 
     41point. The snapshot shown below displays data from the Iris data set, with 
      42the data instance closest to the cursor being highlighted. In the Iris data set,
     43the instances are labeled with one of the three distinct classes, depicted with 
     44colored lines in the visualization (red, green, blue). 
    3545 
    3646.. image:: images/ParallelCoordinates-Iris.png 
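Behind the display, each data instance becomes a polyline whose vertices are the per-attribute values rescaled to a common vertical range; a minimal sketch of those coordinates (illustrative names, not the widget's code)::

    import numpy as np

    def polyline_coordinates(data):
        # data: (n_instances, n_attributes); returns one list of (x, y)
        # vertices per instance, with y scaled to [0, 1] per attribute
        data = np.asarray(data, dtype=float)
        lo, hi = data.min(axis=0), data.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)        # guard constant columns
        scaled = (data - lo) / span
        xs = np.arange(data.shape[1])
        return [list(zip(xs, row)) for row in scaled]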
    3747 
    38 The :obj:`Main` tab allows the user to choose the subset of attributes to be displayed in the visualization. In case of a class-labeled data set, only one class (vs. the others) may be exposed by selecting it from :obj:`Target class`. Especially with data sets that include many attributes, :obj:`Optimization Dialog` may help to find interesting projections. Currently, this is decided based on a correlation between neighboring attributes in the visualization, where the target s to find visualizations with the highest sum of the absolute value of correlations between neighboring attributes. Snapshot below shows such a visualization which uses five attributes and plots the data set from functional genomics (`brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_). 
     48The :obj:`Main` tab allows the user to choose the subset of attributes to be 
     49displayed in the visualization. In case of a class-labeled data set, only one 
     50class (vs. the others) may be exposed by selecting it from :obj:`Target class`. 
     51Especially with data sets that include many attributes, 
     52:obj:`Optimization Dialog` may help to find interesting projections. Currently, 
     53this is decided based on a correlation between neighboring attributes in the 
      54visualization, where the target is to find visualizations with the highest sum
     55of the absolute value of correlations between neighboring attributes. Snapshot 
     56below shows such a visualization which uses five attributes and plots the data 
     57set from functional genomics (**brown-selected.tab**). 
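The criterion just described can be written as a score over an attribute ordering; the brute-force search below is only to make the idea concrete (for many attributes a heuristic would replace the exhaustive enumeration)::

    import numpy as np
    from itertools import permutations

    def neighbour_correlation_score(data, order):
        # sum of |correlation| between columns that end up next to each other
        corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
        return sum(abs(corr[a, b]) for a, b in zip(order, order[1:]))

    def best_ordering(data):
        n = np.asarray(data).shape[1]
        return max(permutations(range(n)),
                   key=lambda order: neighbour_correlation_score(data, order))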
    3958 
    4059.. image:: images/ParallelCoordinates-Optimization.png 
    4160 
    42 The :obj:`Settings` tab is used to control several aspects on visualization. :obj:`Jittering` may be useful when the data includes instances which share many of the attribute values. The graph can be annotated by displaying minimal (bottom) and maximal (top) values of the attributes (:obj:`Show attribute values`). For polygon of each of the data instances can be converted to a spline (:obj:`Show splines`). :obj:`Global value scaling` would make the scale for each of the attributes equal by finding the extreme values across the attributes in the display. That could be useful in a number of applications, such as, for instance, those from functional genomics (the snapshot shown below). :obj:`Line tracking` can highlight the polygon of an instance closest to the mouse pointer (see the topmost snapshot on this page). The option :obj:`Hide pure examples` would draw the data instance polygons from left to right, stopping at the attribute where at a distinct point all the instances would belong to a single class. The setting in :obj:`Statistics` box toggles the drawing of an average or median trajectory. Information on correlation between the neighboring attributes may also be displayed (:obj:`Between-axis labels`). 
      61The :obj:`Settings` tab is used to control several aspects of the visualization.
     62:obj:`Jittering` may be useful when the data includes instances which share 
     63many of the attribute values. The graph can be annotated by displaying minimal 
     64(bottom) and maximal (top) values of the attributes 
      65(:obj:`Show attribute values`). The polygon of each of the data instances can
     66be converted to a spline (:obj:`Show splines`). :obj:`Global value scaling` 
     67would make the scale for each of the attributes equal by finding the extreme 
     68values across the attributes in the display. That could be useful in a number 
     69of applications, such as, for instance, those from functional genomics (the 
     70snapshot shown below). :obj:`Line tracking` can highlight the polygon of an 
     71instance closest to the mouse pointer (see the topmost snapshot on this page). 
     72The option :obj:`Hide pure examples` would draw the data instance polygons from 
     73left to right, stopping at the attribute where at a distinct point all the 
     74instances would belong to a single class. The setting in :obj:`Statistics` box 
     75toggles the drawing of an average or median trajectory. Information on 
     76correlation between the neighboring attributes may also be displayed 
     77(:obj:`Between-axis labels`). 
    4378 
    4479.. image:: images/ParallelCoordinates-Settings.png 
  • docs/widgets/rst/visualize/polyviz.rst

    r11050 r11359  
    66.. image:: ../icons/Polyviz.png 
    77 
    8 Polyviz visualization with explorative data analysis and intelligent data visualization enhancements. 
     8Polyviz visualization with explorative data analysis and intelligent data 
     9visualization enhancements. 
    910 
    1011Signals 
     
    3233----------- 
    3334 
    34 Polyviz is a visualization technique similar to `Radviz <Radviz.htm>`_, but with a twist: instead of a single fixed attribute anchors, data points are now attracted to anchors with value-dependent positions. Consider the snapshot below, which shows a visualization of Iris data set using three of its attributes. The widget can show anchor lines when the pointer is over one of the data instances; the one shown in the snapshot has high value of petal width, and close to average value for the other two attributes. Notice that anchors (lines stemming from the data point) start in the points that are on attribute lines according to the value of that attribute. Since in this particular visualization we are showing three different attributes, the data instances are placed within the triangle. 
     35Polyviz is a visualization technique similar to :ref:`Radviz`, but with a 
      36twist: instead of single fixed attribute anchors, data points are now
      37attracted to anchors with value-dependent positions. Consider the snapshot
      38below, which shows a visualization of the Iris data set using three of its
      39attributes. The widget can show anchor lines when the pointer is over one
      40of the data instances; the one shown in the snapshot has a high value of petal
     41width, and close to average value for the other two attributes. Notice that 
     42anchors (lines stemming from the data point) start in the points that are on 
     43attribute lines according to the value of that attribute. Since in this 
     44particular visualization we are showing three different attributes, the data 
     45instances are placed within the triangle. 
    3546 
    3647.. image:: images/Polyviz-Iris.png 
    3748 
    38 Just like other point-based visualizations, Polyviz provides support for explorative data analysis and search for interesting visualizations. For further details on both, see the documentation on   `Scatterplot <Scatterplot.htm>`_ widget. See the documentation on `Radviz <Radviz.htm>`_ for details on various aspects controlled by the :obj:`Settings` tab. The utility of VizRank, an intelligent visualization technique, using `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ data set is illustrated with a snapshot below. 
     49Just like other point-based visualizations, Polyviz provides support for 
     50explorative data analysis and search for interesting visualizations. For 
      51further details on both, see the documentation of the :ref:`Scatter Plot` widget.
     52See the documentation on :ref:`Radviz` for details on various aspects 
     53controlled by the :obj:`Settings` tab. The utility of VizRank, an intelligent 
     54visualization technique, using `brown-selected.tab 
     55<http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ data set is 
     56illustrated with a snapshot below. 
    3957 
    4058.. image:: images/Polyviz-VizRank.png 
  • docs/widgets/rst/visualize/radviz.rst

    r11050 r11359  
    66.. image:: ../icons/Radviz.png 
    77 
    8 Radviz vizualization with explorative data analysis and intelligent data visualization enhancements. 
      8Radviz visualization with explorative data analysis and intelligent data
     9visualization enhancements. 
    910 
    1011Signals 
     
    3233----------- 
    3334 
    34 Radviz (Hoffman et al., 1997) is a neat non-linear multi-dimensional visualization technique that can display data on three or more attributes in a 2-dimensional projection. 
    35 The visualized attributes are presented as anchor points equally spaced around the perimeter of a unit circle. Data instances are shown as points inside the circle, with their positions determined by a 
    36 metaphor from physics: each point is held in place with springs that are attached at the other end to the attribute anchors. The stiffness of each spring is proportional to the value of the corresponding attribute and the point ends up at the position where the spring forces are in equilibrium. Prior to visualization, attribute values are scaled to lie between 0 and 1. Data instances that are close to a set of feature anchors have higher values for these features than for the others. 
     35Radviz ([Hoffman1997]_) is a neat non-linear multi-dimensional visualization 
     36technique that can display data on three or more attributes in a 2-dimensional 
     37projection. The visualized attributes are presented as anchor points equally 
     38spaced around the perimeter of a unit circle. Data instances are shown as 
     39points inside the circle, with their positions determined by a metaphor from 
     40physics: each point is held in place with springs that are attached at the 
     41other end to the attribute anchors. The stiffness of each spring is 
     42proportional to the value of the corresponding attribute and the point ends up 
     43at the position where the spring forces are in equilibrium. Prior to 
     44visualization, attribute values are scaled to lie between 0 and 1. Data 
     45instances that are close to a set of feature anchors have higher values for 
     46these features than for the others. 
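The spring metaphor reduces to a simple closed form: after scaling each attribute to [0, 1], a point is placed at the average of the anchor directions weighted by its attribute values. A minimal sketch (the names are ours, not the widget's code)::

    import numpy as np

    def radviz_positions(data):
        data = np.asarray(data, dtype=float)
        lo, hi = data.min(axis=0), data.max(axis=0)
        scaled = (data - lo) / np.where(hi > lo, hi - lo, 1.0)   # to [0, 1]
        # one anchor per attribute, equally spaced on the unit circle
        angles = 2 * np.pi * np.arange(data.shape[1]) / data.shape[1]
        anchors = np.column_stack([np.cos(angles), np.sin(angles)])
        weights = scaled.sum(axis=1, keepdims=True)
        weights[weights == 0] = 1.0                   # avoid division by zero
        return scaled.dot(anchors) / weights          # weighted anchor average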
    3747 
    38 The snapshot shown below shows a Radviz widget with a visualization of the data set from functional genomics (Brown et al.). In this particular visualization the data instances are colored according to the corresponding class, and the visualization space is colored according to the computed class probability. Notice that the particular visualization very nicely separates the data instances of the different class, making the   visualization interesting and potentially informative. 
     48The snapshot shown below shows a Radviz widget with a visualization of the 
     49data set from functional genomics ([Brown2000]_). In this particular 
     50visualization the data instances are colored according to the corresponding 
     51class, and the visualization space is colored according to the computed class 
     52probability. Notice that the particular visualization very nicely separates 
      53the data instances of different classes, making the visualization interesting
     54and potentially informative. 
    3955 
    4056.. image:: images/Radviz-Brown.png 
    4157 
    42 To gain further understanding about the placement of the data points in two-dimensional space, it helps to set on the :obj:`Show value lines` and use :obj:`Tooltips show spring values`. We also switched-off the :obj:`Show probabilities` to see the markings associated with data points better. The resulting display is shown below. From it, it should be clear that high values of "spo5 11" attribute (and for some data instances high values of "spo mid") is quite characteristic for instance of class Ribo, which at the same time have comparable lower value of other attributes. High values of heat 20 and diau f are characteristic fir Resp class. See Leban et al. (2006) and Mramor et al. (2007) for further illustrations of utility of Radviz in analysis of this and similar data set from functional genomics. Other options in the :obj:`Settings` tab are quite standard. The :obj:`Point size` controls the size of the points that mark the data instnace. :obj:`Jittering Options` are especially interesting when displaying data with discrete attributes, where many of the data instances would overlap. Same could happen also with continuous attributes if many data instances use the same value of the attributes. :obj:`Scaling Options` can shrink or blow-up the visualization from its central point. From :obj:`General Graph Settings`, which mainly includes standard point-visualization options, let us bring to your attention :obj:`Show value lines` which we used in the visualization below and which tells the widget to annotate each data point with a set of lines, each corresponding with each of the attributes displayed. The length of these lines are proportional to the attribute values (no line if the value is minimal). A slider accompanying this option sets the scale in which the lines are drawn. :obj:`Tooltip Settings` determine which information is being displayed when the pointer gets over the data instance. 
     58To gain further understanding about the placement of the data points in 
      59two-dimensional space, it helps to turn on :obj:`Show value lines` and
      60use :obj:`Tooltips show spring values`. We also switched off
      61:obj:`Show probabilities` to see the markings associated with data points
      62better. The resulting display is shown below. From it, it should be clear that
      63high values of the "spo5 11" attribute (and for some data instances high values
      64of "spo mid") are quite characteristic of instances of class Ribo, which at the
      65same time have comparably lower values of the other attributes. High values of
      66heat 20 and diau f are characteristic for the Resp class. See [Leban2006]_ and
      67[Mramor2007]_ for further illustrations of the utility of Radviz in the analysis
      68of this and similar data sets from functional genomics. Other options in the
      69:obj:`Settings` tab are quite standard. The :obj:`Point size` controls the size
      70of the points that mark the data instances. :obj:`Jittering Options` are
      71especially interesting when displaying data with discrete attributes, where
      72many of the data instances would overlap. The same could also happen with
      73continuous attributes if many data instances share the same value of the
      74attributes. :obj:`Scaling Options` can shrink or blow up the visualization from
      75its central point. From :obj:`General Graph Settings`, which mainly includes
      76standard point-visualization options, let us bring to your attention
      77:obj:`Show value lines`, which we used in the visualization below and which
      78tells the widget to annotate each data point with a set of lines, one for
      79each of the attributes displayed. The length of these lines
      80is proportional to the attribute values (no line if the value is minimal).
     81A slider accompanying this option sets the scale in which the lines are drawn. 
     82:obj:`Tooltip Settings` determine which information is being displayed when the 
     83pointer gets over the data instance. 
    4384 
    4485.. image:: images/Radviz-Brown-Springs.png 
    4586 
    46 Just like all point-based visualizations, this widget includes tools for intelligent data visualization (VizRank and FreeViz, see Leban et al. (2006) and <a href="">Demsar et al. (2007)</a>) and interface for explorative data analysis - selection of data points in visualization. Just like in `Scatterplot widget <Scatterplot.htm>`_, intelligent visualization can be used to find a set of attributes that would result in an interesting visualization. For now, this works only with class-labeled data set, where interesting visualizations are those that well separate data instances of different class. Radviz graph above is according to this definition an example of a very good visualization, while the one below - where we show an VizRank's interface (:obj:`VizRank` button in :obj:`Optimization dialogs`) with a list of 5-attribute visualizations and their scores - is not. See documentation of `Scatterplot widget <Scatterplot.htm>`_ for further details on VizRank, and for description of explorative analysis functions (selection of data instances and zooming). 
     87Just like all point-based visualizations, this widget includes tools for 
      88intelligent data visualization (VizRank and FreeViz, see [Leban2006]_ and
      89[Demsar2007]_) and an interface for explorative data analysis - selection of data
      90points in the visualization. Just like in the :ref:`Scatter Plot` widget, intelligent
      91visualization can be used to find a set of attributes that would result in an
      92interesting visualization. For now, this works only with class-labeled data
      93sets, where interesting visualizations are those that well separate data
      94instances of different classes. The Radviz graph above is according to this
      95definition an example of a very good visualization, while the one below - where
      96we show VizRank's interface (:obj:`VizRank` button in
     97:obj:`Optimization dialogs`) with a list of 5-attribute visualizations and 
     98their scores - is not. See documentation of :ref:`Scatter Plot` widget for 
     99further details on VizRank, and for description of explorative analysis 
     100functions (selection of data instances and zooming). 
    47101 
    48102References 
    49103---------- 
    50104 
    51    - Hoffman,P.E. et al. (1997) DNA visual and analytic data mining. In the Proceedings of the IEEE Visualization. Phoenix, AZ, pp. 437-441. 
    52    - Brown, M. P., W. N. Grundy, et al. (2000). "Knowledge-based analysis of microarray gene expression data by using support vector machines." Proc Natl Acad Sci U S A 97(1): 262-7. 
    53    - Leban, G., B. Zupan, et al. (2006). "VizRank: Data Visualization Guided by Machine Learning." Data Mining and Knowledge Discovery 13(2): 119-136. 
    54    - Demsar J, Leban G, Zupan B. FreeViz-An intelligent multivariate visualization approach to explorative analysis of biomedical data. J Biomed Inform 40(6):661-71, 2007. 
    55    - Mramor M, Leban G, Demsar J, Zupan B. Visualization-based cancer microarray data classification analysis. Bioinformatics 23(16): 2147-2154, 2007. 
     105.. [Hoffman1997] Hoffman,P.E. et al. (1997) DNA visual and analytic data mining. 
     106   In the Proceedings of the IEEE Visualization. Phoenix, AZ, pp. 437-441. 
     107 
     108.. [Brown2000] Brown, M. P., W. N. Grundy, et al. (2000). 
     109   "Knowledge-based analysis of microarray gene expression data by using 
     110   support vector machines." Proc Natl Acad Sci U S A 97(1): 262-7. 
     111 
     112.. [Leban2006] Leban, G., B. Zupan, et al. (2006). "VizRank: Data Visualization 
     113   Guided by Machine Learning." Data Mining and Knowledge Discovery 13(2): 
     114   119-136. 
     115 
     116.. [Demsar2007] Demsar J, Leban G, Zupan B. FreeViz-An intelligent multivariate 
     117   visualization approach to explorative analysis of biomedical data. J Biomed 
     118   Inform 40(6):661-71, 2007. 
     119 
     120.. [Mramor2007] Mramor M, Leban G, Demsar J, Zupan B. Visualization-based 
     121   cancer microarray data classification analysis. Bioinformatics 23(16): 
     122   2147-2154, 2007. 
  • docs/widgets/rst/visualize/scatterplot.rst

    r11050 r11359  
    66.. image:: ../icons/Distributions.png 
    77 
    8 A standard scatterplot visualization with explorative analysis and  intelligent data visualization enhancements. 
      8A standard scatterplot visualization with explorative analysis and intelligent
     9data visualization enhancements. 
    910 
    1011Signals 
     
    2021Outputs: 
    2122   - Selected Examples (ExampleTable) 
    22       A subset of examples that user has manually selected from the scatterplot. 
      23      A subset of examples that the user has manually selected from the
     24      scatterplot. 
    2325   - Unselected Examples (ExampleTable) 
    2426      All other examples (examples not included in the user's selection). 
     
    2830----------- 
    2931 
    30 Scatterplot widget provides a standard 2-dimensional scatterplot visualization for both continuous and discrete-valued attributes. The data is displayed as a collection of points, each having the value of :obj:`X-axis attribute` determining the position on the horizontal axis and the value of :obj:`Y-axis attribute` determining the position on the vertical axis. Various properties of the graph, like color, size and shape of the  points are controlled through the appropriate setting in the :obj:`Main` pane of the widget, while other (like legends and axis titles, maximum point size and jittering) are set in the :obj:`Settings` pane. A snapshot below shows a scatterplot of an Iris data set, with the size of the points proportional to the value of sepal width attribute, and coloring matching that of the class attribute. 
     32Scatterplot widget provides a standard 2-dimensional scatterplot visualization 
     33for both continuous and discrete-valued attributes. The data is displayed as a 
     34collection of points, each having the value of :obj:`X-axis attribute` 
     35determining the position on the horizontal axis and the value of 
     36:obj:`Y-axis attribute` determining the position on the vertical axis. 
      37Various properties of the graph, like color, size and shape of the points, are
      38controlled through the appropriate settings in the :obj:`Main` pane of the
      39widget, while others (like legends and axis titles, maximum point size and
     40jittering) are set in the :obj:`Settings` pane. A snapshot below shows a 
     41scatterplot of an Iris data set, with the size of the points proportional to 
     42the value of sepal width attribute, and coloring matching that of the class 
     43attribute. 
    3144 
    3245.. image:: images/Scatterplot-Iris.png 
    3346 
    34 In the case of discrete attributes, jittering (:obj:`Jittering options` ) should be used to circumvent the overlap of the points with the same value for both axis, and to obtain a plot where density of the points in particular region corresponds better to the density of the data with that particular combination of values. As an example of such a plot, the scatterplot for the Titanic data reporting on the gender of the passenger and the traveling class is shown below; withouth jittering, scatterplot would display only eight distinct points. 
      47In the case of discrete attributes, jittering (:obj:`Jittering options`)
      48should be used to circumvent the overlap of the points with the same value for
      49both axes, and to obtain a plot where the density of the points in a particular
      50region corresponds better to the density of the data with that particular
      51combination of values. As an example of such a plot, the scatterplot for the
      52Titanic data reporting on the gender of the passenger and the traveling class
      53is shown below; without jittering, the scatterplot would display only eight
     54distinct points. 
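Jittering simply adds a small random offset to the plotted coordinates so that identical value combinations do not collapse into a single point; a sketch (the amount is arbitrary here and would in practice follow the jittering setting)::

    import numpy as np

    def jitter(coords, amount=0.1, seed=None):
        # add uniform noise in [-amount, amount] to every plotted coordinate
        rng = np.random.RandomState(seed)
        coords = np.asarray(coords, dtype=float)
        return coords + rng.uniform(-amount, amount, size=coords.shape)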
    3555 
    3656.. image:: images/Scatterplot-Titanic.png 
    3757 
    38 Most of the scatterplot options are quite standard, like those for selecting attributes for point colors, labels, shape and size (:obj:`Main` pane), or those that control the display of various elements in the graph like axis title, grid lines, etc. (:obj:`Settings` pane). Beyond these, the Orange's scatterplot also implements an intelligent visualization technique called VizRank that is invoked through :obj:`VizRank` button in :obj:`Main` tab. 
     58Most of the scatterplot options are quite standard, like those for selecting 
     59attributes for point colors, labels, shape and size (:obj:`Main` pane), or 
     60those that control the display of various elements in the graph like axis 
      61title, grid lines, etc. (:obj:`Settings` pane). Beyond these, Orange's
      62scatterplot also implements an intelligent visualization technique called
      63VizRank, which is invoked through the :obj:`VizRank` button in the :obj:`Main` tab.
    3964 
    4065Intelligent Data Visualization 
    4166 
    42 If a data set has many (many!) attributes, it is impossible to manually scan through all the pairs of attributes to find interesting scatterplots. Intelligent data visualizations techniques are about finding such visualizations automatically. Orange's Scatterplot includes one such tool called VizRank <a href="#Leban2006" title="">(Leban et al., 2006)</a>, that can be in current implementation used only with classification data sets, that is, data sets where instances are labeled with a discrete class. The task of optimization is to find those scatterplot projections, where instances with different class labels are well separated. For example, for a data set `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ (comes with Orange installation) the two attributes that best separate instances of different class are displayed in the snapshot below, where we have also switched on the :obj:`Show Probabilities` option from Scatterplot's :obj:`Settings` pane. Notice that this projection appears at the top of :obj:`Projection list, most interesting first`, followed by a list of other potentially interesting projections. Selecting each of these would change the projection displayed in the scatterplot, so the list and associated projections can be inspected in this way. 
      67If a data set has many attributes, it is impossible to manually scan through
      68all the pairs of attributes to find interesting scatterplots. Intelligent data
      69visualization techniques are about finding such visualizations automatically.
      70Orange's Scatterplot includes one such tool, called VizRank ([Leban2006]_),
      71which in the current implementation can be used only with classification data
      72sets, that is, data sets where instances are labeled with a discrete class.
      73The task of optimization is to find those scatterplot projections where
      74instances with different class labels are well separated. For example, for the
      75data set `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_
      76(which comes with the Orange installation) the two attributes that best
      77separate instances of different classes are displayed in the snapshot below,
      78where we have also switched on the :obj:`Show Probabilities` option from
      79Scatterplot's :obj:`Settings` pane. Notice that this projection appears at the
      80top of :obj:`Projection list, most interesting first`, followed by a list of
      81other potentially interesting projections. Selecting any of these changes the
      82projection displayed in the scatterplot, so the list and the associated
      83projections can be inspected in this way.
    4384 
    4485.. image:: images/Scatterplot-VizRank-Brown.png 
    4586 
    46 The number of different projections that can be considered by VizRank may be quite high. VizRank searches the space of possible projections heuristically. The search is invoked by pressing :obj:`Start Evaluating Projections`, which may be stopped anytime. Search through modification of top-rated projections (replacing one of the two attributes with another one) is invoked by pressing a :obj:`Locally Optimize Best Projections` button. 
      87The number of different projections that can be considered by VizRank may be
      88quite high. VizRank searches the space of possible projections heuristically.
      89The search is invoked by pressing :obj:`Start Evaluating Projections` and may
      90be stopped at any time. A search through modifications of the top-rated
      91projections (replacing one of the two attributes with another) is invoked by
      92pressing the :obj:`Locally Optimize Best Projections` button.
    4793 
    4894.. image:: images/Scatterplot-VizRank-Settings.png 
    49  
    50 <td valign="top"> 
    51 VizRank's options are quite elaborate, and if you are not the expert in machine learning it would be best to leave them at their defaults. The options are grouped according to the different aspects of the methods as described in <a href="#Leban2006" title="">(Leban et al., 2006)</a>. The projections are evaluated through testing a selected classifier (:obj:`Projection evaluation method` default is k-nearest neighbor classification) using some standard evaluation technique (:obj:`Testing method`). For very large data set use sampling to speed-up the evaluation (:obj:`Percent of data used`). Visualizations will then be ranked according to the prediction accuracy (:obj:`Measure of classification success`), in our own tests :obj:`Average Probability Assigned to the Correct Class` worked somehow better than more standard measures like :obj:`Classification Accuracy` or :obj:`Brier Score`. To avoid exhaustive search for data sets with many attributes, these are ranked by heuristics (:obj:`Measure for attribute ranking`), testing most likely projection candidates first. Number of items in the list of projections is controlled in :obj:`Maximum length of projection list`. 
    52 </tr></table> 
    53  
    54 A set of tools that deals with management and post-analysis of list of projections is available under :obj:`Manage &amp; Save` tab. Here you may decide which classes the visualizations should separate (default set to separation of all the classes). Projection list can saved (:obj:`Save` in :obj:`Manage projections` group), loaded (:obj:`Load`), a set of best visualizations may be saved (:obj:`Saved Best Graphs`). :obj:`Reevalutate Projections` is used when you have loaded the list of best projections from file, but the actual data has changed since the last evaluation. For evaluating the current projection without engaging the projection search there is an :obj:`Evaluate Projection` button. Projections are evaluated based on performance of k-nearest neighbor classifiers, and the results of these evaluations in terms of which data instances were correctly or incorrectly classified is available through the two :obj:`Show k-NN` buttons. 
     95   :align: left 
     96 
      97VizRank's options are quite elaborate, and if you are not an expert in machine
      98learning it is best to leave them at their defaults. The options are grouped
      99according to the different aspects of the method, as described in
      100[Leban2006]_. The projections are evaluated by testing a selected classifier
      101(:obj:`Projection evaluation method`; the default is k-nearest neighbor
      102classification) with a standard evaluation technique (:obj:`Testing method`).
      103For very large data sets, use sampling to speed up the evaluation
      104(:obj:`Percent of data used`). Visualizations are then ranked according to the
      105prediction accuracy (:obj:`Measure of classification success`); in our own
      106tests, :obj:`Average Probability Assigned to the Correct Class` worked somewhat
      107better than more standard measures like :obj:`Classification Accuracy` or
      108:obj:`Brier Score`. To avoid exhaustive search for data sets with many
      109attributes, the attributes are ranked by a heuristic
      110(:obj:`Measure for attribute ranking`), and the most likely projection
      111candidates are tested first. The number of items in the list of projections is
      112controlled by :obj:`Maximum length of projection list`.
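
The core of the evaluation can be illustrated with a short, hedged sketch:
score every pair of attributes by the cross-validated accuracy of a k-NN
classifier that sees only those two attributes, and list the best pairs first.
This is an approximation of the idea only; it uses scikit-learn and the Iris
data instead of Orange's own VizRank code, and plain classification accuracy
instead of the average-probability measure::

    # Sketch of VizRank-style projection scoring: rank attribute pairs by
    # cross-validated k-NN accuracy on the corresponding 2-D projection.
    from itertools import combinations

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    data = load_iris()          # stand-in data set; brown-selected would be analogous
    X, y = data.data, data.target

    scores = []
    for i, j in combinations(range(X.shape[1]), 2):
        knn = KNeighborsClassifier(n_neighbors=10)
        acc = cross_val_score(knn, X[:, [i, j]], y, cv=10).mean()
        scores.append((acc, data.feature_names[i], data.feature_names[j]))

    # Most interesting (best separating) projections first.
    for acc, a, b in sorted(scores, reverse=True):
        print("%.3f  %s / %s" % (acc, a, b))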
     113 
    55114 
    56115.. image:: images/Scatterplot-VizRank-ManageSave.png 
    57  
    58 Based on a set of interesting projections found by VizRank, a number of post-analysis tools is available. :obj:`Attribute Ranking` displays a graph which show how many times the attributes appear in the top-rated projections. Bars can be colored according to the class with maximal average value of the attribute. :obj:`Attribute Interactions` displays a heat map showing how many times the two attributes appeared in the top-rated projections. :obj:`Graph Projection Scores` displays the distribution of projection scores. 
     116   :align: left 
     117 
      118A set of tools for management and post-analysis of the list of projections is
      119available under the :obj:`Manage & Save` tab. Here you may decide which classes
      120the visualizations should separate (by default, all the classes are separated).
      121The projection list can be saved (:obj:`Save` in the :obj:`Manage projections`
      122group) and loaded (:obj:`Load`), and a set of the best visualizations may be
      123saved (:obj:`Saved Best Graphs`). :obj:`Reevaluate Projections` is used when
      124you have loaded the list of best projections from a file, but the actual data
      125has changed since the last evaluation. For evaluating the current projection
      126without engaging the projection search there is an :obj:`Evaluate Projection`
      127button. Projections are evaluated based on the performance of k-nearest
      128neighbor classifiers, and the results of these evaluations, in terms of which
      129data instances were correctly or incorrectly classified, are available through
      130the two :obj:`Show k-NN` buttons.
     131 
     132 
      133Based on the set of interesting projections found by VizRank, a number of
      134post-analysis tools are available. :obj:`Attribute Ranking` displays a graph
      135which shows how many times each attribute appears in the top-rated projections.
      136Bars can be colored according to the class with the maximal average value of
      137the attribute. :obj:`Attribute Interactions` displays a heat map showing how
      138many times each pair of attributes appeared in the top-rated projections.
      139:obj:`Graph Projection Scores` displays the distribution of projection scores.
    59140 
    60141.. image:: images/Scatterplot-VizRank-AttributeHistogram.png 
     
    64145.. image:: images/Scatterplot-VizRank-Scores.png 
    65146 
    66 List of best-rated projections may also be used for the search and analysis of outliers. The idea is that the outliers are those data instances, which are incorrectly classified in many of the top visualizations. For example, the class of the 33-rd instance in `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ should be Resp, but this instance is quite often misclassified as Ribo. The snapshot below shows one particular visualization displaying why such misclassification occurs. Perhaps the most important part of the :obj:`Outlier Identification` window is a list in the lower left (:obj:`Show predictions for all examples`) with a list of candidates for outliers sorted by the probabilities of classification to the right class. In our case, the most likely outlier is the instance 171, followed by an instance 33, both with probabilities of classification to the right class below 0.5. 
      147The list of best-rated projections may also be used to search for and analyze
      148outliers. The idea is that outliers are those data instances which are
      149incorrectly classified in many of the top visualizations. For example, the
      150class of the 33rd instance in `brown-selected.tab
      151<http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ should be Resp,
      152but this instance is quite often misclassified as Ribo. The snapshot below
      153shows one particular visualization displaying why such misclassification
      154occurs. Perhaps the most important part of the :obj:`Outlier Identification`
      155window is the list in the lower left (:obj:`Show predictions for all examples`)
      156of candidate outliers, sorted by the probability of classification to the
      157correct class. In our case, the most likely outlier is instance 171, followed
      158by instance 33, both with probabilities of classification to the correct class
      159below 0.5.
    67160 
    68161.. image:: images/Scatterplot-VizRank-Outliers.png 
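
How such a ranking of outlier candidates comes about can be illustrated with
another hedged sketch (an approximation of the idea, not the widget's
implementation): for each of the top projections, obtain cross-validated k-NN
class probabilities on the two projected attributes, keep the probability of
the correct class for every instance, and sort the instances by the average of
these probabilities::

    # Sketch: rank instances as outlier candidates by their average probability
    # of being assigned the correct class across several top-rated projections.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_predict
    from sklearn.neighbors import KNeighborsClassifier

    data = load_iris()
    X, y = data.data, data.target

    # Hypothetical list of top projections (pairs of attribute indices),
    # e.g. as produced by a VizRank-like search.
    top_projections = [(2, 3), (0, 2), (1, 3)]

    probs = []
    for i, j in top_projections:
        knn = KNeighborsClassifier(n_neighbors=10)
        p = cross_val_predict(knn, X[:, [i, j]], y, cv=10, method="predict_proba")
        probs.append(p[np.arange(len(y)), y])    # probability of the correct class

    avg_correct = np.mean(probs, axis=0)
    candidates = np.argsort(avg_correct)         # lowest average probability first
    for idx in candidates[:5]:
        print("instance %d: average P(correct) = %.2f" % (idx, avg_correct[idx]))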
     
    72165.. image:: images/Scatterplot-ZoomSelect.png 
    73166 
    74 Scatterplot, together with the rest of the Orange's widget, provides for a explorative data analysis environment by supporting zooming-in and out of the part of the plot and selection of data instances. These functions are enabled through :obj:`Zoom/Select` toolbox. The default tool is zoom: left-click and drag on the plot area defines the rectangular are to zoom-in. Right click to zoom out. Next two buttons in this tool bar are rectangular and polygon selection. Selections are stacked and can be removed in order from the last one defined, or all at once (back-arrow and cross button from the tool bar). The last button in the tool bar is used to resend the data from this widget. Since this is done automatically after every change of the selection, this last function is not particularly useful. An example of a simple schema where we selected data instances from two polygon regions and send them to the Data Table widget is shown below. Notice that by counting the dots from the scatterplot there should be 12 data instances selected, whereas the data table shows 17. This is because some data instances overlap (have the same value of the two attributes used) - we could use Jittering to expose them. 
      167Scatterplot, together with the rest of Orange's widgets, provides an
      168explorative data analysis environment by supporting zooming in and out of
      169parts of the plot and selection of data instances. These functions are enabled
      170through the :obj:`Zoom/Select` toolbox. The default tool is zoom: left-click
      171and drag on the plot area to define the rectangular area to zoom into;
      172right-click to zoom out. The next two buttons in this toolbar are rectangular
      173and polygon selection. Selections are stacked and can be removed in order from
      174the last one defined, or all at once (the back-arrow and cross buttons in the
      175toolbar). The last button in the toolbar is used to resend the data from this
      176widget; since this is done automatically after every change of the selection,
      177this last function is not particularly useful. An example of a simple schema
      178where we selected data instances from two polygon regions and sent them to the
      179:ref:`Data Table` widget is shown below. Notice that by counting the dots in
      180the scatterplot there should be 12 data instances selected, whereas the data
      181table shows 17. This is because some data instances overlap (have the same
      182values of the two attributes used); we could use jittering to expose them.
    75183 
    76184.. image:: images/Scatterplot-Iris-Selection.png 
     
    80188-------- 
    81189 
    82 Scatterplot can be nicely combined with other widgets that output a list of selected data instances. For example, a combination of classification tree and scatterplot, as shown below, makes for a nice exploratory tool displaying data instances pertinent to a chosen classification tree node (clicking on any node of classification tree would send a set of selected data instances to scatterplot, updating the visualization and marking selected instances with filled symbols). 
      190Scatterplot can be nicely combined with other widgets that output a list of
      191selected data instances. For example, the combination of a classification tree
      192and a scatterplot, as shown below, makes for a nice exploratory tool that
      193displays data instances pertinent to a chosen classification tree node
      194(clicking on any node of the classification tree sends the set of selected
      195data instances to the scatterplot, updating the visualization and marking the
      196selected instances with filled symbols).
    83197 
    84198.. image:: images/Scatterplot-ClassificationTree.png 
     
    88202---------- 
    89203 
    90    - Leban G, Zupan B, Vidmar G, Bratko I. VizRank: Data Visualization Guided by Machine Learning. Data Mining and Knowledge Discovery 13(2): 119-136, 2006. 
    91    - Mramor M, Leban G, Demsar J, Zupan B. Visualization-based cancer microarray data classification analysis. Bioinformatics 23(16): 2147-2154, 2007. 
     204.. [Leban2006] Leban G, Zupan B, Vidmar G, Bratko I. VizRank: Data 
     205   Visualization Guided by Machine Learning. Data Mining and Knowledge 
     206   Discovery 13(2): 119-136, 2006. 
  • docs/widgets/rst/visualize/sievediagram.rst

    r11050 r11359  
    2323----------- 
    2424 
    25 A sieve diagram is a graphical method for 
    26 visualizing the frequencies in a two-way contingency table 
    27 and comparing them  
    28 to the expected frequencies under assumtion of independence. The sieve diagram was proposed by Riedwyl and Schüpbach in a technical report in 1983 and later called a parquet diagram <a href="#Riedwyl and Schüpbach" title="">(Riedwyl and Schüpbach, 1994)</a>. In this display the area of each rectangle is proportional to expected frequency and observed frequency is shown by the number of squares in each rectangle. The difference between observed and expected frequency (proportional to standard Pearson residual) appears as the density of shading, using color to indicate whether the deviation from independence is positive (blue) or negative (red). 
      25A sieve diagram is a graphical method for visualizing the frequencies in a
      26two-way contingency table and comparing them to the expected frequencies under
      27the assumption of independence. The sieve diagram was proposed by Riedwyl and
      28Schüpbach in a technical report in 1983 and later called a parquet diagram
      29([Riedwy1994]_). In this display the area of each rectangle is proportional to
      30the expected frequency, and the observed frequency is shown by the number of
      31squares in each rectangle. The difference between the observed and expected
      32frequency (proportional to the standardized Pearson residual) appears as the
      33density of shading, using color to indicate whether the deviation from
      34independence is positive (blue) or negative (red).
    2935 
    30 The snapshot below shows a sieve diagram for Titanic data set and attributes sex and survived (the later is actually a class attribute in this data set). The plot shows that the two variables are highly associated, as there are substantial differences between observed and expected frequencies in all of the four quadrants. For example and as highlighted in a balloon, the chance for not surviving the accident was for female passengers much lower than expected (0.05 vs. 0.14). 
      36The snapshot below shows a sieve diagram for the Titanic data set and the
      37attributes sex and survived (the latter is actually the class attribute in this
      38data set). The plot shows that the two variables are highly associated, as
      39there are substantial differences between the observed and expected frequencies
      40in all four quadrants. For example, as highlighted in the balloon, the chance
      41of not surviving the accident was much lower than expected for female
      42passengers (0.05 vs. 0.14).
    3143 
    3244.. image:: images/SieveDiagram-Titanic.png 
    3345 
    34 Orange can help to identify pairs of attributes with interesting associations. Such attribute pairs are upon request (:obj:`Calculate Chi Squares`) listed in :obj:`Interesting attribute pair`. As it turns out, the most interesting attribute pair in Titanic data set is indeed the one we show in the above snapshot. For a contrast, the sieve diagram of the least interesting pair (age vs. survival) is shown below. 
      46Orange can help to identify pairs of attributes with interesting associations.
      47Such attribute pairs are, upon request (:obj:`Calculate Chi Squares`), listed
      48in :obj:`Interesting attribute pair`. As it turns out, the most interesting
      49attribute pair in the Titanic data set is indeed the one shown in the above
      50snapshot. For contrast, the sieve diagram of the least interesting pair
      51(age vs. survival) is shown below.
    3552 
    3653.. image:: images/SieveDiagram-Titanic-age-survived.png 
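
The ranking of attribute pairs can be approximated outside the widget with a
chi-square test on every pairwise contingency table. The sketch below is a
hedged illustration only: it uses pandas and scipy rather than the widget's own
code, and a tiny made-up table in place of the Titanic data::

    # Sketch: rank pairs of discrete attributes by chi-square association strength.
    from itertools import combinations

    import pandas as pd
    from scipy.stats import chi2_contingency

    data = pd.DataFrame({                 # toy stand-in for a discrete data set
        "sex":      ["male", "male", "female", "female", "male", "female"],
        "status":   ["crew", "third", "first", "second", "crew", "first"],
        "survived": ["no", "no", "yes", "yes", "no", "yes"],
    })

    results = []
    for a, b in combinations(data.columns, 2):
        table = pd.crosstab(data[a], data[b])
        chi2, p, dof, expected = chi2_contingency(table)
        results.append((p, a, b, chi2))

    for p, a, b, chi2 in sorted(results):  # smallest p-value = strongest association
        print("%s / %s: chi2=%.2f, p=%.3g" % (a, b, chi2, p))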
     
    3956---------- 
    4057 
    41   - Riedwyl, H., and Schüpbach, M. (1994). Parquet diagram to plot contingency tables. In  Softstat '93: Advances in Statistical Software, F. Faulbaum (Ed.). New York: Gustav Fischer, 293-299. 
     58.. [Riedwy1994] Riedwyl, H., and Schüpbach, M. (1994). Parquet diagram to plot 
      59   contingency tables. In Softstat '93: Advances in Statistical Software,
     60   F. Faulbaum (Ed.). New York: Gustav Fischer, 293-299. 
  • docs/widgets/rst/visualize/surveyplot.rst

    r11050 r11359  
    2626----------- 
    2727 
    28 A survey plot is a simple multi-attribute visualization technique that can help to spot correlations between any two variables especially when the data is sorted according to a particular dimension. Each horizontal splice in a plot corresponds to a particular data instance. The data on a specific attribute is shown in a single column, where the length of the line corresponds to the dimensional value. When data includes a discrete or continuous class, the slices (data instances) are colored correspondingly. 
      28A survey plot is a simple multi-attribute visualization technique that can help
      29to spot correlations between any two variables, especially when the data is
      30sorted according to a particular dimension. Each horizontal slice in the plot
      31corresponds to a particular data instance. The data on a specific attribute is
      32shown in a single column, where the length of the line corresponds to the
      33value of the attribute. When the data includes a discrete or continuous class,
      34the slices (data instances) are colored correspondingly.
    2935 
    30 Implementation in Orange supports sorting by two selected attributes (:obj:`Sorting`). The attributes shown in the plot are listed in :obj:`Shown attributes` box, all other appear in the list of :obj:`Hidden attributes`. 
      36The implementation in Orange supports sorting by two selected attributes
      37(:obj:`Sorting`). The attributes shown in the plot are listed in the
      38:obj:`Shown attributes` box; all others appear in the list of
      39:obj:`Hidden attributes`.
    3140 
    32 Below is a snapshot of survey plot widget for an `Iris data set </doc/datasets/iris.tab>`_. Plot nicely shows that petal width and length and sepal length are correlated. It is also very clear that Iris-setosa can be classified based on petal length or width alone, while for the Iris versicolor and virginica there is some ambiguity with some potential outliers, one of which is highlighted in the snapshot. 
      41Below is a snapshot of the Survey Plot widget for the Iris data set. The plot
      42nicely shows that petal width, petal length and sepal length are correlated. It
      43is also very clear that Iris-setosa can be classified based on petal length or
      44width alone, while for Iris versicolor and virginica there is some ambiguity,
      45with some potential outliers, one of which is highlighted in the snapshot.
    3346 
    3447.. image:: images/SurveyPlot-Iris.png 
    3548 
    36 Values of the attributes may be scaled independently, for each attribute at the time, or globally, using the information from all of the attributes This option is controlled through :obj:`Global values scaling` check box. Switching it on results in a plot shown below; notice that the leafs have smalled widths than lengths, and the ratio is bigger with petal leafs. With :obj:`Example tracking` on, mousing over the plot would bring a box around the row representing a single data instance, which would with :obj:`Show legend` display the information about the values of particular instance (just like in the snapshot aboce). The attributes appearing in the attribute list may be ordered according to selected criteria. :obj:`Tooltips settings` controls the amount of information displayed in the data instance balloon (essentially, one can change between including the information of only visualized attributes, or all the attributes in the data set). 
      49Values of the attributes may be scaled independently, one attribute at a time,
      50or globally, using the information from all of the attributes. This option is
      51controlled through the :obj:`Global values scaling` check box. Switching it on
      52results in the plot shown below; notice that the leaves have smaller widths
      53than lengths, and that the ratio is larger for the petals. With
      54:obj:`Example tracking` on, mousing over the plot brings up a box around the
      55row representing a single data instance, which, with :obj:`Show legend` on,
      56displays the information about the values of that particular instance (just
      57like in the snapshot above). The attributes appearing in the attribute list may
      58be ordered according to selected criteria. :obj:`Tooltips settings` controls
      59the amount of information displayed in the data instance balloon (essentially,
      60one can choose between showing only the visualized attributes or all the
      61attributes in the data set).
    3762 
    3863.. image:: images/SurveyPlot-Settings.png 
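
The difference between the two scaling modes amounts to the choice of
normalization constants. A minimal numpy sketch of the idea is given below; it
is an illustration only, not the widget's code, and uses a few hypothetical
rows of Iris-like measurements::

    # Sketch: independent (per-attribute) vs. global scaling of a numeric matrix.
    import numpy as np

    X = np.array([[5.1, 3.5, 1.4, 0.2],
                  [7.0, 3.2, 4.7, 1.4],
                  [6.3, 3.3, 6.0, 2.5]])

    # Independent scaling: each column is stretched to [0, 1] on its own.
    independent = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    # Global scaling: one common minimum and maximum for all attributes, so the
    # drawn line lengths stay comparable across columns (petal widths stay short).
    global_scaled = (X - X.min()) / (X.max() - X.min())

    print(independent)
    print(global_scaled)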
  • setup.py

    r11306 r11362  
    8080 
    8181if sys.platform == 'darwin': 
    82     extra_compile_args = '-fPIC -fpermissive -fno-common -w -DDARWIN'.split() 
     82    extra_compile_args = '-fPIC -fno-common -w -DDARWIN'.split() 
    8383    extra_link_args = '-headerpad_max_install_names -undefined dynamic_lookup'.split()