<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>
<h1>orngVizRank: Orange VizRank module</h1>
<index name="modules+visualization ranking">

<p>Module orngVizRank implements the VizRank algorithm (Leban et al., 2004; Leban et al., 2005), which ranks possible data projections generated by two different visualization methods - scatterplot and radviz. For a given class-labeled data set, VizRank creates different possible data projections and assigns each of them a score of interestingness. VizRank scores a projection based on how well the different classes are separated in it: if the classes are well separated, the projection gets a high score, otherwise the score is correspondingly lower. After evaluation it is sensible to focus on the top-ranked projections, since they provide the greatest insight into how the different classes can be separated.</P>

<p>In the rest of this document we will talk about two visualization methods - scatterplot and radviz. While scatterplot is a well-known method, radviz is less widely known; readers interested in it can refer to (Hoffman, 1997).</P>

<hr>

<H2>VizRank in Orange</H2>
<index name="classes/VizRank (in orngVizRank)">

<P>The easiest way to use VizRank in Orange is through Orange widgets. Widgets like Scatterplot, Radviz and Polyviz (found in the Visualize tab in Orange Canvas) contain a "VizRank" button which opens VizRank's dialog, where you can change all the settings and search for interesting data projections.</P>

<P>A more advanced user, however, will perhaps also want to use VizRank in scripts. Such users will use the orngVizRank module.</P>

<P>The rest of this document covers only the use of VizRank in scripts. For those of you who use VizRank in Orange widgets, we have provided extensive tooltips that should clarify the meaning of the different settings.</P>

<P><b>Creating a VizRank instance</b></P>

<p>First, let us show a very simple example of how VizRank can be used in a script:</p>

<xmp class="code">>>> import orange
>>> data = orange.ExampleTable("wine.tab")
>>> from orngVizRank import *
>>> vizrank = VizRank(SCATTERPLOT)     # options are: SCATTERPLOT, RADVIZ or LINEAR_PROJECTION
>>> vizrank.setData(data)              # set the data set
>>> vizrank.evaluateProjections()      # evaluate possible projections
>>> print vizrank.results[0]
(86.88861657813024, (86.88861657813024, [87.603105074268271, 82.08174408531525, 93.120556697249413],
    [59, 71, 48]), 178, ['A7', 'A10'], 5, {})
</xmp>

<p>In this example we created a VizRank instance, evaluated scatterplot projections of the UCI wine data set and printed information about the best-ranked projection. The best projection scored 86.88 (on a scale from 0 to 100) and shows attributes 'A7' and 'A10'. The result list contains plenty of other information about each projection, but it is not relevant for a casual user.</p>

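<p>If you need just the score and the attributes of a projection in a script, you can pick them out of the result tuple directly. The positions used below (score at index 0, attribute list at index 3) are inferred from the printed output above, so treat this as a sketch rather than a documented interface:</p>

<xmp class="code"># print the score and the attribute list of the five best projections
# (tuple positions inferred from the printed result above)
for proj in vizrank.results[:5]:
    score, attrs = proj[0], proj[3]
    print "%.2f" % score, attrs
</xmp>
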
<p>Below is a list of settings that can be used to modify VizRank's behaviour; a short configuration sketch follows the list.</p>

<DL>
<DT><B>kValue</b>
<DD class=ddfun>the number of examples used in predicting the class value. By default it is set to <i>N/c</i>, where <i>N</i> is the number of examples in the data set and <i>c</i> is the number of class values</DD>

<DT><B>percentDataUsed</b>
<DD class=ddfun>when handling large data sets, the kNN method might take a lot of time to evaluate each projection. We can still get a good estimate of a projection's interestingness if we consider only a subset of the examples. You can specify a value between 0 and 100. Default: 100</DD>

<DT><B>qualityMeasure</b>
<DD class=ddfun>there are different measures of prediction success that one can use to evaluate a classifier. You can use classification accuracy (<CODE>CLASS_ACCURACY</CODE>), average probability of correct classification (<CODE>AVERAGE_CORRECT</CODE>) or Brier score (<CODE>BRIER_SCORE</CODE>). Default: <CODE>AVERAGE_CORRECT</CODE></DD>

<DT><B>testingMethod</b>
<DD class=ddfun>the way the accuracy of the classifier is computed. You can use leave-one-out (<CODE>LEAVE_ONE_OUT</CODE>), 10-fold cross validation (<CODE>TEN_FOLD_CROSS_VALIDATION</CODE>) or testing on the learning set (<CODE>TEST_ON_LEARNING_SET</CODE>). Default: <CODE>TEN_FOLD_CROSS_VALIDATION</CODE></DD>

<DT><B>attrCont</b>
<DD class=ddfun>the method used for evaluating continuous attributes. Attributes are ranked, and projections with top-ranked attributes are evaluated first. Possible options are ReliefF (<CODE>CONT_MEAS_RELIEFF</CODE>), Signal to Noise (<CODE>CONT_MEAS_S2N</CODE>), a modification of the Signal to Noise measure (<CODE>CONT_MEAS_S2NMIX</CODE>) or no measure (<CODE>CONT_MEAS_NONE</CODE>). Default: <CODE>CONT_MEAS_RELIEFF</CODE></DD>

<DT><B>attrDisc</b>
<DD class=ddfun>the method used for evaluating discrete attributes. Attributes are ranked, and projections with top-ranked attributes are evaluated first. Possible options are ReliefF (<CODE>DISC_MEAS_RELIEFF</CODE>), Gain ratio (<CODE>DISC_MEAS_GAIN</CODE>), Gini index (<CODE>DISC_MEAS_GINI</CODE>) or no measure (<CODE>DISC_MEAS_NONE</CODE>). Default: <CODE>DISC_MEAS_RELIEFF</CODE></DD>

<DT><B>useGammaDistribution</b>
<DD class=ddfun>this parameter determines the order in which the heuristic selects the attributes that are then evaluated by VizRank. If the value is set to 0, the heuristic starts by selecting the top-ranked attributes (as ranked by the measures specified in <CODE>attrCont</CODE> and <CODE>attrDisc</CODE>) and, when all possible combinations have been tested, progresses to lower-ranked attributes. If the value is set to 1, the heuristic also ranks the attributes first, but then selects them randomly according to a gamma distribution - better-ranked attributes are still selected more often, but are sometimes tested in combination with attributes that are poorly ranked yet can in the end produce a high-ranked projection. In domains with a larger set of attributes (&gt;20) it is advisable to use the gamma distribution, since otherwise we never get to evaluate projections with poorly ranked attributes. Default: 0</DD>

<DT><B>useExampleWeighting</b>
<DD class=ddfun>if the class distribution is very uneven, example weighting can be used. Default: 0</DD>

<DT><B>evaluationTime</b>
<DD class=ddfun>the time in minutes that we want to spend evaluating projections. Since there might be a huge number of possible projections, this lets us stop the evaluation before every projection has been evaluated. Because of the search heuristic (see <CODE>attrCont</CODE> and <CODE>attrDisc</CODE>), we will most likely find the projections with the highest scores at the beginning of the evaluation. Default: 2</DD>

</DL>
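
<p>As an illustration, here is how a few of these settings might be combined in a script. The values below are arbitrary examples, not recommendations:</p>

<xmp class="code">from orngVizRank import *

vizrank = VizRank(SCATTERPLOT)
vizrank.kValue = 10                        # override the default N/c neighbours
vizrank.percentDataUsed = 50               # estimate scores on half of the examples
vizrank.qualityMeasure = CLASS_ACCURACY    # instead of the default AVERAGE_CORRECT
vizrank.testingMethod = LEAVE_ONE_OUT      # instead of 10-fold cross validation
vizrank.useGammaDistribution = 1           # advisable with more than 20 attributes
vizrank.evaluationTime = 5                 # spend at most 5 minutes evaluating
</xmp>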

<p><b>Radviz-specific settings</b> (a short usage sketch follows the list):<br>
<index name="RadViz">

<DL>
<DT><B>optimizationType</b>
<DD class=ddfun> for a description see <CODE>attributeCount</CODE> below. Possible values are <CODE>EXACT_NUMBER_OF_ATTRS</CODE> and <CODE>MAXIMUM_NUMBER_OF_ATTRS</CODE>. Default: <CODE>MAXIMUM_NUMBER_OF_ATTRS</CODE></DD>

<DT><B>attributeCount</b>
<DD class=ddfun> the maximum number of attributes in a projection that we will consider. If <CODE>optimizationType == MAXIMUM_NUMBER_OF_ATTRS</CODE>, we consider projections that have between 3 and <CODE>attributeCount</CODE> attributes. If <CODE>optimizationType == EXACT_NUMBER_OF_ATTRS</CODE>, we consider only projections that have exactly <CODE>attributeCount</CODE> attributes. Default: 4</DD>
</DL>
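
<p>For example, to evaluate radviz projections with exactly three attributes, one might write (again only a sketch using the settings described above):</p>

<xmp class="code">vizrank = VizRank(RADVIZ)
vizrank.setData(data)
vizrank.optimizationType = EXACT_NUMBER_OF_ATTRS
vizrank.attributeCount = 3        # consider only three-attribute projections
vizrank.evaluateProjections()
</xmp>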

<p><b>Methods</b> (a save/load sketch follows the list):<br>
<DL>
<DT><B>setData</b>(data)
<DD class=ddfun> set the example table to evaluate</DD>

<DT><B>evaluateProjections()</b>
<DD class=ddfun> start the projection evaluation. If not all projections have been evaluated by then, it stops automatically after <CODE>evaluationTime</CODE> minutes.</DD>

<DT><B>save</b>(filename)
<DD class=ddfun> save the list of evaluated projections</DD>

<DT><B>load</b>(filename)
<DD class=ddfun> load a file with evaluated projections</DD>
</DL>
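
<p>A typical pattern is to evaluate projections once, save them, and reload them later instead of repeating the evaluation. The following sketch assumes the same data set is set in both sessions; the file name is arbitrary:</p>

<xmp class="code">vizrank.evaluateProjections()
vizrank.save("wine_scatterplot.proj")      # store the evaluated projections

# later, in another script or session
vizrank2 = VizRank(SCATTERPLOT)
vizrank2.setData(data)
vizrank2.load("wine_scatterplot.proj")
print vizrank2.results[0]
</xmp>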

<hr>
<h3>VizRank as a learner</h3>

<p>VizRank can also be used as a learning method. You can construct a learner by creating an instance of the VizRankLearner class:</p>

<xmp class = "code">learner = VizRankLearner(SCATTERPLOT)</xmp>

<p>VizRankLearner accepts up to three parameters. The first is the type of visualization method to use (<CODE>SCATTERPLOT</CODE> or <CODE>RADVIZ</CODE>). The second parameter is an instance of the VizRank class; if it is not given, a new instance is created. The third parameter is a graph instance - an <CODE>orngScaleScatterPlotData</CODE> or <CODE>orngScaleRadvizData</CODE> instance; if it is not specified, a new instance is created.</p>

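<p>For instance, to reuse an already configured VizRank instance in a learner, it can be passed as the second parameter. This is a sketch based on the parameter order described above:</p>

<xmp class="code">vizrank = VizRank(RADVIZ)
vizrank.kValue = 10        # configure before wrapping in a learner

# visualization method first, then the VizRank instance;
# the graph instance is omitted, so a new one is created
learner = VizRankLearner(RADVIZ, vizrank)
</xmp>
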
<p>To change VizRank's settings, we simply access them through the learner's VizRank instance (e.g. <CODE>learner.VizRank.kValue = 10</CODE>).</p>

<p>The learner instance can be used like any other learner. If you provide it with examples, it returns a classifier of type <CODE>VizRankClassifier</CODE>, which can be used like any other classifier:</p>

<xmp class = "code">classifier = learner(data)</xmp>

<p>When classifying, the VizRank classifier uses the evaluated projections to predict the class of the new example. The evaluated projections serve as arguments for each class value; arguments have different values (weights), and the example is classified to the class with the highest sum of argument values.</p>

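<p>Conceptually, the prediction step works roughly like the sketch below. This is not VizRank's actual code; <CODE>argumentsFor</CODE> is a hypothetical mapping from class values to lists of argument weights:</p>

<xmp class="code"># conceptual sketch of argument-based classification (not the real implementation);
# argumentsFor is a hypothetical dict: class value -> list of argument weights
def predictClass(argumentsFor):
    bestClass, bestSum = None, None
    for classValue, weights in argumentsFor.items():
        total = sum(weights)        # sum the argument values for this class
        if bestSum is None or total > bestSum:
            bestClass, bestSum = classValue, total
    return bestClass                # class with the highest sum of arguments
</xmp>
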
<p>VizRank's settings that are relevant when using VizRank as a classifier:</p>

<DL>
<DT><B>argumentCount</b>
<DD class=ddfun> the number of arguments (projections) used when predicting the class value</DD>
</DL>

<p>A simple example:</p>

<xmp class="code">>>> import orange
>>> from orngVizRank import *
>>> data = orange.ExampleTable("iris.tab")
>>> learner = VizRankLearner(SCATTERPLOT)
>>> learner.VizRank.argumentCount = 3
>>> classifier = learner(data)
>>> for i in range(5):
...     print classifier(data[i]), data[i].getclass()
...
(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
</xmp>

<h2>References</h2>

<p>Leban, G., Bratko, I., Petrovic, U., Curk, T., Zupan, B.: VizRank: finding informative data projections in functional genomics by machine learning. <i>Bioinformatics</i> <b>21</b>, 413-414 (2005).</p>
<p>Leban, G., Mramor, M., Bratko, I., Zupan, B.: Simple and Effective Visual Models for Gene Expression Cancer Diagnostics. <i>KDD-2005</i>, 167-177 (Chicago, 2005).</p>
<p>Hoffman, P. E., Grinstein, G. G., Marx, K., Grosse, I. &amp; Stanley, E.: DNA Visual and Analytic Data Mining. <i>IEEE Visualization 1997</i> <b>1</b>, 437-441 (1997).</p>
</body> </html>