Timestamp:
02/27/13 15:02:50 (14 months ago)
Author:
Ales Erjavec <ales.erjavec@…>
Branch:
default
Message:

Cleanup of 'Widget catalog' documentation.

Fixed rst text formatting, replaced dead hardcoded reference links (now using
:ref:), etc.

File:
1 edited

  • docs/widgets/rst/visualize/scatterplot.rst

    r11050 r11359  
    66.. image:: ../icons/Distributions.png 
    77 
    8 A standard scatterplot visualization with explorative analysis and  intelligent data visualization enhancements. 
     8A standard scatterplot visualization with explorative analysis and  intelligent 
     9data visualization enhancements. 
    910 
    1011Signals 
     
    2021Outputs: 
    2122   - Selected Examples (ExampleTable) 
    22       A subset of examples that user has manually selected from the scatterplot. 
     23      A subset of examples that user has manually selected from the 
     24      scatterplot. 
    2325   - Unselected Examples (ExampleTable) 
    2426      All other examples (examples not included in the user's selection). 
     
    2830----------- 
    2931 
    30 Scatterplot widget provides a standard 2-dimensional scatterplot visualization for both continuous and discrete-valued attributes. The data is displayed as a collection of points, each having the value of :obj:`X-axis attribute` determining the position on the horizontal axis and the value of :obj:`Y-axis attribute` determining the position on the vertical axis. Various properties of the graph, like color, size and shape of the  points are controlled through the appropriate setting in the :obj:`Main` pane of the widget, while other (like legends and axis titles, maximum point size and jittering) are set in the :obj:`Settings` pane. A snapshot below shows a scatterplot of an Iris data set, with the size of the points proportional to the value of sepal width attribute, and coloring matching that of the class attribute. 
     32Scatterplot widget provides a standard 2-dimensional scatterplot visualization 
     33for both continuous and discrete-valued attributes. The data is displayed as a 
     34collection of points, each having the value of :obj:`X-axis attribute` 
     35determining the position on the horizontal axis and the value of 
     36:obj:`Y-axis attribute` determining the position on the vertical axis. 
     37Various properties of the graph, like color, size and shape of the  points are 
     38controlled through the appropriate setting in the :obj:`Main` pane of the 
     39widget, while other (like legends and axis titles, maximum point size and 
     40jittering) are set in the :obj:`Settings` pane. A snapshot below shows a 
     41scatterplot of an Iris data set, with the size of the points proportional to 
     42the value of sepal width attribute, and coloring matching that of the class 
     43attribute. 
    3144 
    3245.. image:: images/Scatterplot-Iris.png 
    3346 
    34 In the case of discrete attributes, jittering (:obj:`Jittering options` ) should be used to circumvent the overlap of the points with the same value for both axis, and to obtain a plot where density of the points in particular region corresponds better to the density of the data with that particular combination of values. As an example of such a plot, the scatterplot for the Titanic data reporting on the gender of the passenger and the traveling class is shown below; withouth jittering, scatterplot would display only eight distinct points. 
     47In the case of discrete attributes, jittering (:obj:`Jittering options`) 
     48should be used to circumvent the overlap of the points with the same value on 
     49both axes, and to obtain a plot where the density of the points in a particular 
     50region corresponds better to the density of the data with that particular 
     51combination of values. As an example of such a plot, the scatterplot for the 
     52Titanic data reporting on the gender of the passenger and the traveling class 
     53is shown below; without jittering, the scatterplot would display only eight 
     54distinct points. 
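
A minimal sketch of the jittering idea, assuming uniform noise and a hypothetical numeric encoding of the two discrete attributes (the function name ``jitter`` and its parameters are illustrative, not part of the widget):

```python
import random


def jitter(points, amount=0.3, seed=0):
    """Return a copy of `points` with uniform noise in [-amount, amount]
    added independently to each coordinate."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount)) for x, y in points]


# Assumed encoding: gender in {0, 1}, traveling class in {0, 1, 2, 3},
# with 25 made-up passengers for every combination.
points = [(g, c) for g in (0, 1) for c in (0, 1, 2, 3) for _ in range(25)]
jittered = jitter(points)

print(len(set(points)))    # distinct plot positions before jittering
print(len(set(jittered)))  # distinct plot positions after jittering
```

The noise amount is kept below half the distance between neighbouring values, so each jittered point stays visually attached to its original combination of values.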
    3555 
    3656.. image:: images/Scatterplot-Titanic.png 
    3757 
    38 Most of the scatterplot options are quite standard, like those for selecting attributes for point colors, labels, shape and size (:obj:`Main` pane), or those that control the display of various elements in the graph like axis title, grid lines, etc. (:obj:`Settings` pane). Beyond these, the Orange's scatterplot also implements an intelligent visualization technique called VizRank that is invoked through :obj:`VizRank` button in :obj:`Main` tab. 
     58Most of the scatterplot options are quite standard, like those for selecting 
     59attributes for point colors, labels, shape and size (:obj:`Main` pane), or 
     60those that control the display of various elements in the graph like axis 
     61title, grid lines, etc. (:obj:`Settings` pane). Beyond these, the Orange's 
     62scatterplot also implements an intelligent visualization technique called 
     63VizRank that is invoked through :obj:`VizRank` button in :obj:`Main` tab. 
    3964 
    4065Intelligent Data Visualization 
    4166 
    42 If a data set has many (many!) attributes, it is impossible to manually scan through all the pairs of attributes to find interesting scatterplots. Intelligent data visualizations techniques are about finding such visualizations automatically. Orange's Scatterplot includes one such tool called VizRank <a href="#Leban2006" title="">(Leban et al., 2006)</a>, that can be in current implementation used only with classification data sets, that is, data sets where instances are labeled with a discrete class. The task of optimization is to find those scatterplot projections, where instances with different class labels are well separated. For example, for a data set `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ (comes with Orange installation) the two attributes that best separate instances of different class are displayed in the snapshot below, where we have also switched on the :obj:`Show Probabilities` option from Scatterplot's :obj:`Settings` pane. Notice that this projection appears at the top of :obj:`Projection list, most interesting first`, followed by a list of other potentially interesting projections. Selecting each of these would change the projection displayed in the scatterplot, so the list and associated projections can be inspected in this way. 
     67If a data set has many attributes, it is impossible to manually scan through 
     68all the pairs of attributes to find interesting scatterplots. Intelligent data 
     69visualization techniques are about finding such visualizations automatically. 
     70Orange's Scatterplot includes one such tool called VizRank ([Leban2006]_), which 
     71in the current implementation can be used only with classification data sets, that 
     72is, data sets where instances are labeled with a discrete class. The task of 
     73optimization is to find those scatterplot projections, where instances with 
     74different class labels are well separated. For example, for a data set  
     75`brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ 
     76(comes with Orange installation) the two attributes that best separate 
     77instances of different class are displayed in the snapshot below, where we have 
     78also switched on the :obj:`Show Probabilities` option from Scatterplot's 
     79:obj:`Settings` pane. Notice that this projection appears at the top of 
     80:obj:`Projection list, most interesting first`, followed by a list of 
     81other potentially interesting projections. Selecting each of these would 
     82change the projection displayed in the scatterplot, so the list and associated 
     83projections can be inspected in this way. 
    4384 
    4485.. image:: images/Scatterplot-VizRank-Brown.png 
    4586 
    46 The number of different projections that can be considered by VizRank may be quite high. VizRank searches the space of possible projections heuristically. The search is invoked by pressing :obj:`Start Evaluating Projections`, which may be stopped anytime. Search through modification of top-rated projections (replacing one of the two attributes with another one) is invoked by pressing a :obj:`Locally Optimize Best Projections` button. 
     87The number of different projections that can be considered by VizRank may be 
     88quite high. VizRank searches the space of possible projections heuristically. 
     89The search is invoked by pressing :obj:`Start Evaluating Projections`, which 
     90may be stopped anytime. Search through modification of top-rated projections 
     91(replacing one of the two attributes with another one) is invoked by pressing a 
     92:obj:`Locally Optimize Best Projections` button. 
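
The ranking idea can be sketched as follows. This is not Orange's implementation: it scores every attribute pair exhaustively instead of heuristically, and the function names, the toy data and the choice of ``k`` are assumptions made for illustration. Each projection is scored by leave-one-out k-nearest neighbors as the average share of an instance's neighbors that carry its own class.

```python
from itertools import combinations
import math


def avg_prob_correct(rows, labels, i, j, k=3):
    """Leave-one-out k-NN on attributes (i, j): mean fraction of the k
    nearest neighbours that share the instance's own class label."""
    total = 0.0
    for a, (row, label) in enumerate(zip(rows, labels)):
        dists = sorted(
            (math.hypot(row[i] - other[i], row[j] - other[j]), labels[b])
            for b, other in enumerate(rows) if b != a
        )
        nearest = [lab for _, lab in dists[:k]]
        total += nearest.count(label) / k
    return total / len(rows)


def rank_projections(rows, labels, k=3):
    """Score all attribute pairs and return them best first."""
    n_attrs = len(rows[0])
    scored = [(avg_prob_correct(rows, labels, i, j, k), (i, j))
              for i, j in combinations(range(n_attrs), 2)]
    return sorted(scored, reverse=True)


# Toy data: attributes 0 and 1 separate the classes, 2 and 3 do not.
rows = [
    (0.0, 0.1, 5, 1), (0.1, 0.0, 1, 4), (0.2, 0.2, 4, 2), (0.1, 0.2, 2, 5),
    (1.0, 1.1, 5, 2), (1.1, 1.0, 1, 5), (1.2, 1.2, 4, 1), (1.1, 1.2, 2, 4),
]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

for score, pair in rank_projections(rows, labels):
    print(pair, round(score, 3))
```

On this toy data the pair of attributes 0 and 1 comes out on top, since every instance there is surrounded only by neighbors of its own class.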
    4793 
    4894.. image:: images/Scatterplot-VizRank-Settings.png 
    49  
    50 <td valign="top"> 
    51 VizRank's options are quite elaborate, and if you are not the expert in machine learning it would be best to leave them at their defaults. The options are grouped according to the different aspects of the methods as described in <a href="#Leban2006" title="">(Leban et al., 2006)</a>. The projections are evaluated through testing a selected classifier (:obj:`Projection evaluation method` default is k-nearest neighbor classification) using some standard evaluation technique (:obj:`Testing method`). For very large data set use sampling to speed-up the evaluation (:obj:`Percent of data used`). Visualizations will then be ranked according to the prediction accuracy (:obj:`Measure of classification success`), in our own tests :obj:`Average Probability Assigned to the Correct Class` worked somehow better than more standard measures like :obj:`Classification Accuracy` or :obj:`Brier Score`. To avoid exhaustive search for data sets with many attributes, these are ranked by heuristics (:obj:`Measure for attribute ranking`), testing most likely projection candidates first. Number of items in the list of projections is controlled in :obj:`Maximum length of projection list`. 
    52 </tr></table> 
    53  
    54 A set of tools that deals with management and post-analysis of list of projections is available under :obj:`Manage &amp; Save` tab. Here you may decide which classes the visualizations should separate (default set to separation of all the classes). Projection list can saved (:obj:`Save` in :obj:`Manage projections` group), loaded (:obj:`Load`), a set of best visualizations may be saved (:obj:`Saved Best Graphs`). :obj:`Reevalutate Projections` is used when you have loaded the list of best projections from file, but the actual data has changed since the last evaluation. For evaluating the current projection without engaging the projection search there is an :obj:`Evaluate Projection` button. Projections are evaluated based on performance of k-nearest neighbor classifiers, and the results of these evaluations in terms of which data instances were correctly or incorrectly classified is available through the two :obj:`Show k-NN` buttons. 
     95   :align: left 
     96 
     97VizRank's options are quite elaborate, and if you are not an expert in machine 
     98learning it would be best to leave them at their defaults. The options are 
     99grouped according to the different aspects of the methods as described in 
     100[Leban2006]_. The projections are evaluated through testing a selected 
     101classifier (:obj:`Projection evaluation method` default is k-nearest neighbor 
     102classification) using some standard evaluation technique 
     103(:obj:`Testing method`). For very large data sets, use sampling to speed up the 
     104evaluation (:obj:`Percent of data used`). Visualizations will then be ranked 
     105according to the prediction accuracy (:obj:`Measure of classification success` 
     106); in our own tests :obj:`Average Probability Assigned to the Correct Class` 
     107worked somewhat better than more standard measures like 
     108:obj:`Classification Accuracy` or :obj:`Brier Score`. To avoid exhaustive 
     109search for data sets with many attributes, these are ranked by heuristics 
     110(:obj:`Measure for attribute ranking`), testing most likely projection 
     111candidates first. Number of items in the list of projections is controlled in 
     112:obj:`Maximum length of projection list`. 
     113 
    55114 
    56115.. image:: images/Scatterplot-VizRank-ManageSave.png 
    57  
    58 Based on a set of interesting projections found by VizRank, a number of post-analysis tools is available. :obj:`Attribute Ranking` displays a graph which show how many times the attributes appear in the top-rated projections. Bars can be colored according to the class with maximal average value of the attribute. :obj:`Attribute Interactions` displays a heat map showing how many times the two attributes appeared in the top-rated projections. :obj:`Graph Projection Scores` displays the distribution of projection scores. 
     116   :align: left 
     117 
     118A set of tools that deals with management and post-analysis of list of 
     119projections is available under :obj:`Manage & Save` tab. Here you may decide 
     120which classes the visualizations should separate (default set to separation of 
     121all the classes). The projection list can be saved (:obj:`Save` in 
     122:obj:`Manage projections` group), loaded (:obj:`Load`), a set of best 
     123visualizations may be saved (:obj:`Saved Best Graphs`). 
     124:obj:`Reevalutate Projections` is used when you have loaded the list of best 
     125projections from file, but the actual data has changed since the last 
     126evaluation. For evaluating the current projection without engaging the 
     127projection search there is an :obj:`Evaluate Projection` button. Projections 
     128are evaluated based on performance of k-nearest neighbor classifiers, and the 
     129results of these evaluations in terms of which data instances were correctly or 
     130incorrectly classified is available through the two :obj:`Show k-NN` buttons. 
     131 
     132 
     133Based on a set of interesting projections found by VizRank, a number of 
     134post-analysis tools is available. :obj:`Attribute Ranking` displays a graph 
     135which shows how many times the attributes appear in the top-rated projections. 
     136Bars can be colored according to the class with maximal average value of the 
     137attribute. :obj:`Attribute Interactions` displays a heat map showing how many 
     138times the two attributes appeared in the top-rated projections. 
     139:obj:`Graph Projection Scores` displays the distribution of projection scores. 
    59140 
    60141.. image:: images/Scatterplot-VizRank-AttributeHistogram.png 
     
    64145.. image:: images/Scatterplot-VizRank-Scores.png 
    65146 
    66 List of best-rated projections may also be used for the search and analysis of outliers. The idea is that the outliers are those data instances, which are incorrectly classified in many of the top visualizations. For example, the class of the 33-rd instance in `brown-selected.tab <http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ should be Resp, but this instance is quite often misclassified as Ribo. The snapshot below shows one particular visualization displaying why such misclassification occurs. Perhaps the most important part of the :obj:`Outlier Identification` window is a list in the lower left (:obj:`Show predictions for all examples`) with a list of candidates for outliers sorted by the probabilities of classification to the right class. In our case, the most likely outlier is the instance 171, followed by an instance 33, both with probabilities of classification to the right class below 0.5. 
     147List of best-rated projections may also be used for the search and analysis of 
     148outliers. The idea is that the outliers are those data instances, which are 
     149incorrectly classified in many of the top visualizations. For example, the 
     150class of the 33rd instance in `brown-selected.tab 
     151<http://orange.biolab.si/doc/datasets/brown-selected.tab>`_ should be Resp, 
     152but this instance is quite often misclassified as Ribo. The snapshot below 
     153shows one particular visualization displaying why such misclassification 
     154occurs. Perhaps the most important part of the :obj:`Outlier Identification` 
     155window is a list in the lower left (:obj:`Show predictions for all examples`) 
     156with a list of candidates for outliers sorted by the probabilities of 
     157classification to the right class. In our case, the most likely outlier is the 
     158instance 171, followed by an instance 33, both with probabilities of 
     159classification to the right class below 0.5. 
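
The outlier ranking described above can be sketched like this, assuming the per-instance probabilities of the correct class have already been computed for each top-rated projection. The function name, the input layout and the numbers are made up for illustration and are not taken from the widget or from the brown-selected data.

```python
def outlier_candidates(prob_correct, threshold=0.5):
    """prob_correct holds one list per projection, each with one probability
    of the correct class per instance. Returns (instance index, mean
    probability) pairs below `threshold`, most suspicious first."""
    n_instances = len(prob_correct[0])
    means = [sum(p[i] for p in prob_correct) / len(prob_correct)
             for i in range(n_instances)]
    ranked = sorted(range(n_instances), key=lambda i: means[i])
    return [(i, means[i]) for i in ranked if means[i] < threshold]


# Made-up probabilities for 4 instances in 3 projections.
probs = [
    [0.9, 0.2, 1.0, 0.4],
    [0.8, 0.3, 1.0, 0.5],
    [1.0, 0.1, 0.9, 0.3],
]

for index, mean in outlier_candidates(probs):
    print(index, round(mean, 2))
```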
    67160 
    68161.. image:: images/Scatterplot-VizRank-Outliers.png 
     
    72165.. image:: images/Scatterplot-ZoomSelect.png 
    73166 
    74 Scatterplot, together with the rest of the Orange's widget, provides for a explorative data analysis environment by supporting zooming-in and out of the part of the plot and selection of data instances. These functions are enabled through :obj:`Zoom/Select` toolbox. The default tool is zoom: left-click and drag on the plot area defines the rectangular are to zoom-in. Right click to zoom out. Next two buttons in this tool bar are rectangular and polygon selection. Selections are stacked and can be removed in order from the last one defined, or all at once (back-arrow and cross button from the tool bar). The last button in the tool bar is used to resend the data from this widget. Since this is done automatically after every change of the selection, this last function is not particularly useful. An example of a simple schema where we selected data instances from two polygon regions and send them to the Data Table widget is shown below. Notice that by counting the dots from the scatterplot there should be 12 data instances selected, whereas the data table shows 17. This is because some data instances overlap (have the same value of the two attributes used) - we could use Jittering to expose them. 
     167Scatterplot, together with the rest of Orange's widgets, provides an 
     168explorative data analysis environment by supporting zooming in and out of a 
     169part of the plot and selection of data instances. These functions are enabled 
     170through the :obj:`Zoom/Select` toolbox. The default tool is zoom: left-click and 
     171drag on the plot area to define the rectangular area to zoom in on. Right-click to 
     172zoom out. Next two buttons in this tool bar are rectangular and polygon 
     173selection. Selections are stacked and can be removed in order from the last 
     174one defined, or all at once (back-arrow and cross button from the tool bar). 
     175The last button in the tool bar is used to resend the data from this widget. 
     176Since this is done automatically after every change of the selection, this 
     177last function is not particularly useful. An example of a simple schema where 
     178we selected data instances from two polygon regions and sent them to the 
     179:ref:`Data Table` widget is shown below. Notice that by counting the dots from 
     180the scatterplot there should be 12 data instances selected, whereas the data 
     181table shows 17. This is because some data instances overlap (have the same 
     182value of the two attributes used) - we could use Jittering to expose them. 
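
The mismatch between visible dots and selected instances can be checked with a few lines of code. This is a sketch on made-up rows, with a hypothetical helper name; it only counts how many instances share the same pair of plotted values.

```python
from collections import Counter


def overlapping_positions(rows, x, y):
    """Return {(x value, y value): count} for plot positions that are
    shared by more than one instance."""
    counts = Counter((row[x], row[y]) for row in rows)
    return {pos: n for pos, n in counts.items() if n > 1}


# Made-up selection: five instances, two attributes plotted.
selected = [(1.4, 0.2), (1.4, 0.2), (1.5, 0.2), (1.3, 0.2), (1.5, 0.2)]

print(len(selected))                          # instances in the selection
print(len({(r[0], r[1]) for r in selected}))  # distinct dots on the plot
print(overlapping_positions(selected, 0, 1))  # positions hiding duplicates
```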
    75183 
    76184.. image:: images/Scatterplot-Iris-Selection.png 
     
    80188-------- 
    81189 
    82 Scatterplot can be nicely combined with other widgets that output a list of selected data instances. For example, a combination of classification tree and scatterplot, as shown below, makes for a nice exploratory tool displaying data instances pertinent to a chosen classification tree node (clicking on any node of classification tree would send a set of selected data instances to scatterplot, updating the visualization and marking selected instances with filled symbols). 
     190Scatterplot can be nicely combined with other widgets that output a list of 
     191selected data instances. For example, a combination of classification tree and 
     192scatterplot, as shown below, makes for a nice exploratory tool displaying data 
     193instances pertinent to a chosen classification tree node (clicking on any node 
     194of classification tree would send a set of selected data instances to 
     195scatterplot, updating the visualization and marking selected instances with 
     196filled symbols). 
    83197 
    84198.. image:: images/Scatterplot-ClassificationTree.png 
     
    88202---------- 
    89203 
    90    - Leban G, Zupan B, Vidmar G, Bratko I. VizRank: Data Visualization Guided by Machine Learning. Data Mining and Knowledge Discovery 13(2): 119-136, 2006. 
    91    - Mramor M, Leban G, Demsar J, Zupan B. Visualization-based cancer microarray data classification analysis. Bioinformatics 23(16): 2147-2154, 2007. 
     204.. [Leban2006] Leban G, Zupan B, Vidmar G, Bratko I. VizRank: Data 
     205   Visualization Guided by Machine Learning. Data Mining and Knowledge 
     206   Discovery 13(2): 119-136, 2006. 