source: orange/orange/doc/reference/orngVizRank.htm @ 3084:103d50969d70

Revision 3084:103d50969d70, 10.0 KB checked in by Gregor <Gregor@…>, 7 years ago (diff)

* empty log message *

Line 
1<html><HEAD>
2<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css" MEDIA=screen>
3</HEAD>
4<body>
5<h1>orngVizRank: Orange VizRank module</h1>
6
7<p>Module orngVizRank implements VizRank (Leban et al, 2004; Leban et al, 2005) algorithm which is able to rank possible data projections generated using two different visualization methods - scatterplot and radviz method. For a given class labeled data set, VizRank creates different possible data projections and assigns a score of interestingness to each of the projections. VizRank scores the projections based on how well are different classes separated in the projection. If different classes are well separated the projection gets a high score, otherwise the score it is correspondingly lower. After evaluation it is sensible to focus on top-ranked projections that provide the greatest insight on how to separate between different classes.</P>
8
9<p>In the rest of this document we will talk about two different visualization methods - scatterplot and radviz. While scatterplot is a well known method, not many people know radviz. For those readers who are interested in this method, please see (Hoffman, 1997).</P>
10
11<hr>
12
13<H2>VizRank in Orange</H2>
14
15<P>The easiest way to use VizRank in Orange is through Orange widgets. Widgets like Scatterplot, Radviz and Polyviz (which can be found in Visualize tab in Orange Canvas) contain a button "VizRank" which opens VizRank's dialog where you can change all possible settings and find interesting data projections. </P>
16
17<P>A more advanced user, however, will perhaps also want to use VizRank in scripting. These users will use the orngVizRank module.</P>
18
19<P>In the rest of this document we will give information only about using VizRank in scripts. For those of you who will use VizRank in Orange widgets we provided extensive tooltips that should clarify the meaning of different settings.</P>
20
21<P><b>Creating a VizRank instance</b></P>
22
23<p>First lets show a very simple example of how we can use VizRank in scripts:</p>
24
25<xmp class="code">>>> import orange
26>>> data = orange.ExampleTable("wine.tab")
27>>> from orngVizRank import *
28>>> vizrank = VizRank(SCATTERPLOT)     # use SCATTERPLOT if you want to evaluate scatterplot projections or RADVIZ to evaluate radviz projections
29>>> vizrank.setData(data)     # set the data set
30>>> vizrank.evaluateProjections()      # evaluate possible projections
31>>> print vizrank.results[0]
32(86.88861657813024, (86.88861657813024, [87.603105074268271, 82.08174408531525, 93.120556697249413], [59, 71, 48]), 178, ['A7', 'A10'], 5, [])
33</xmp>
34
35
36<p>In this example we created a VizRank instance, evaluated scatterplot projections of the UCI wine data set and printed the information about the best ranked projection. The best projection scored a value of 86.88 (in a range between 0 and 100) and is showing attributes 'A7' and 'A10'.</p>
37
38<p>Below is a list of functions and settings, that can be used in order to modify VizRank's behaviour.</p>
39
40<p>kValue<br>
41&nbsp;&nbsp;&nbsp; the number of examples used in predicting the class value. By default it is set to <i>N/c</i>, where <i>N</i> is number of examples in the data set and <i>c</i> is the number of class values<br>
42
43percentDataUsed<br>
44&nbsp;&nbsp;&nbsp; when handling large data sets, the kNN method might take a lot of time to evaluate each projection. We can still get a good estimate of projection interestingness if we consider only a subset of examples. You can
45specify a value between 0 and 100. Default: 100<br>
46
47qualityMeasure<br>
48&nbsp;&nbsp;&nbsp; there are different measures of prediction success that one can use to evaluate a classifier. You can use classification accuracy (<CODE>CLASS_ACCURACY</CODE>), average probability of correct classification (<CODE>AVERAGE_CORRECT</CODE>) or Brier score (<CODE>BRIER_SCORE</CODE>). Default: <CODE>AVERAGE_CORRECT</CODE><br>
49
50testingMethod<br>
51&nbsp;&nbsp;&nbsp; the way how the accuracy of the classifier is computed. You can use leave one out (<CODE>LEAVE_ONE_OUT</CODE>), 10 fold cross validation (<CODE>TEN_FOLD_CROSS_VALIDATION</CODE>) or testing on the learning set (<CODE>TEST_ON_LEARNING_SET</CODE>). Default: <CODE>TEN_FOLD_CROSS_VALIDATION</CODE><br>
52
53attrCont<br>
54&nbsp;&nbsp;&nbsp; which method for evaluating continuous attributes do we want to use. Attributes are ranked and projections with top ranked attributes are evaluated first. Possible options are ReliefF (<CODE>CONT_MEAS_RELIEFF</CODE>), Signal to Noise (<CODE>CONT_MEAS_S2N</CODE>), a modification of Signal to Noise measure (<CODE>CONT_MEAS_S2NMIX</CODE>) or no measure (<CODE>CONT_MEAS_NONE</CODE>). Default: <CODE>CONT_MEAS_RELIEFF</CODE><br>
55
56attrDisc<br>
57&nbsp;&nbsp;&nbsp; which method for evaluating discrete attributes do we want to use. Attributes are ranked and projections with top ranked attributes are evaluated first. Possible options are ReliefF (<CODE>DISC_MEAS_RELIEFF</CODE>), Gain ratio(<CODE>DISC_MEAS_GAIN</CODE>), Gini index (<CODE>DISC_MEAS_GINI</CODE>) or no measure (<CODE>DISC_MEAS_NONE</CODE>). Default: <CODE>DISC_MEAS_RELIEFF</CODE><br>
58
59useExampleWeighting<br>
60&nbsp;&nbsp;&nbsp; if class distribution is very uneven example weighting can be used. Default: 0<br>
61
62evaluationTime<br>
63&nbsp;&nbsp;&nbsp; time in minutes that we want to spend in evaluating projections. Since there might be a large number of possible projections we can this way stop evaluation before it evaluates all projetions. Because of the seach heuristic (attrCont and attrDisc) we will most likely evaluate top ranked projections at the beginning of the evaluation. Default: 2</p>
64
65<p>Radviz specific settings:<br>
66
67optimizationType<br>
68&nbsp;&nbsp;&nbsp; for description see <CODE>attributeCount</CODE> below. Possible values are <CODE>EXACT_NUMBER_OF_ATTRS</CODE> and <CODE>MAXIMUM_NUMBER_OF_ATTRS</CODE>. Default: <CODE>MAXIMUM_NUMBER_OF_ATTRS</CODE><br>
69
70attributeCount<br>
71&nbsp;&nbsp;&nbsp; maximum number of attributes in a projection that we will consider. If <CODE>optimizationType == MAXIMUM_NUMBER_OF_ATTRS</CODE> then we will consider projections that have between 3 and <CODE>attributeCount</CODE> attributes. If <CODE>optimizationType == EXACT_NUMBER_OF_ATTRS</CODE> then we will consider only projections that have exactly CODE>attributeCount</CODE> attributes. Default: 4</p>
72
73<p>Methods:<br>
74
75setData(data)<br>
76&nbsp;&nbsp;&nbsp; set the example table to evaluate<br>
77
78evaluateProjections()<br>
79&nbsp;&nbsp;&nbsp; start projection evaluation. If not all projections are yet evaluated, it will automatically stop after <CODE>evaluationTime</CODE> minutes.<br>
80
81save(filename)<br>
82&nbsp;&nbsp;&nbsp; save the list of evaluated projections<br>
83
84load(filename)<br>
85&nbsp;&nbsp;&nbsp; load a file with evaluated projections</p>
86
87
88<h3>VizRank as a learner</h3>
89
90<p>VizRank can also be used as a learning method. You can construct a learner by creating an instance of the VizRankLearner class.</p>
91
92<xmp class = "code">learner = VizRankLearner(SCATTERPLOT)</xmp>
93
94<p>VizRankLearner can actually accept three parameters. First is the type of the visualization method to use (<CODE>SCATTERPLOT</CODE> or
95<CODE>RADVIZ</CODE>). The second parameter is an instance of VizRank class. If it is not given, a new instance is created. The third parameter is a graph instance - <CODE>orngScaleScatterPlotData</CODE> or <CODE>orngScaleRadvizData</CODE> instance. If it is not specified, a new instance is created.</p>
96
97<p>To change the VizRank's settings we simply access them through the learner.VizRank instance (e.g. <CODE>learner.VizRank.kValue = 10</CODE>).</p>
98
99<p>The learner instance can be used as any other learners. If you provide it the examples it returns a classifier of type <CODE>VizRankClassifier</CODE> which can be used as any other classifier:<br>
100
101<xmp class = "code">classifier = learner(data)</xmp>
102
103<p>When classifying VizRank classifier will use the evaluated projections to make class prediction for the new example. Evaluated projection will serve as arguments for each class value. Arguments have different values (weights) and the example is classified to the class which has the highest sum of argument values.</p>
104
105<p>VizRank's settings that are relevant when using VizRank as a classifier:</p>
106
107<p>argumentCount<br>
108&nbsp;&nbsp;&nbsp; number of arguments (projections) used when predicting the class value<br>
109
110argumentValueFormula<br>
111&nbsp;&nbsp;&nbsp; the way how argument values are computed. For argument values, two different things are important: the score of the projection and the class probability distribution computed using the kNN on the given projection. It makes sense that the argument is stronger if the projection has a high score or if probability for one class value is high. There are three ways how the argument values can be computed:<br>
112
113&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 - argument value is the same as projection score<br>
114&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 - argument value is 0.5* projection score + 0.5* predicted class probability<br>
115&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2 - argument value is the same as predicted class probability</p>
116
117<p>A simple example:</p>
118<p>
119
120<xmp class="code">
121>>> import orange
122>>> from orngVizRank import *
123>>> data = orange.ExampleTable(&quot;iris.tab&quot;)
124>>> learner = VizRankLearner(SCATTERPLOT)
125>>> learner.VizRank.argumentCount = 3
126>>> classifier = learner(data)
127>>> for i in range(5):
128        print classifier(data[i]), data[i].getclass()
129(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
130(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
131(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
132(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
133(<orange.Value 'iris'='Iris-setosa'>, <1.000, 0.000, 0.000>) Iris-setosa
134</xmp>
135
136<h2>References</h2>
137
138<p>Leban, G., Bratko, I., Petrovic, U., Curk, T., Zupan, B. VizRank: finding informative data projections in functional genomics
139by machine learning. <i>Bioinformatics</i> <b>21</b>, 413-414 (2005).</p>
140<p>Leban, G., Mramor, M., Bratko, I., Zupan, B.: Simple and Effective Visual Models for Gene Expression Cancer Diagnostics, <i>KDD-2005</i> 
141167--177 (Chicago, 2005).</p>
142
143</body> </html>
Note: See TracBrowser for help on using the repository browser.