Changeset 4625:f5d93c9a2950 in orange


Timestamp: 05/08/08 23:22:08
Author: janezd <janez.demsar@…>
Branch: default
Convert: d2be0360693876df7f682149f9e5f7da0973f063
Message:
File: 1 edited

  • orange/doc/reference/MeasureAttribute.htm

  r2827 r4625
    156   156  </xmp>
    157   157
-   158        <P>If we didn't construct the attribute in advance, we could write <code>orange.MeasureAttribute_relief().thresholdFunction("petal length", data)</code>. This is not recommendable for ReliefF, since it may be a lot slower.</P>
+         158  <P>If we hadn't constructed the attribute in advance, we could write <code>orange.MeasureAttribute_relief().thresholdFunction("petal length", data)</code>. This is not recommendable for ReliefF, since it may be a lot slower.</P>
    159   159
    160   160  <P>The script below finds and prints out the best threshold for binarization of an attribute, that is, the threshold with which the resulting binary attribute will have the optimal ReliefF (or any other measure).</P>
     
    238   238  <DT>m</DT>
    239   239  <DD>Number of reference examples. Default is 100. Set to -1 to take all the examples.</DD>
-   240        </DL>
-   241
-   242        <P>Computation of ReliefF is rather slow since it needs to find <CODE>k</CODE> nearest neighbours for each of <CODE>m</CODE> reference examples (or all examples, if <code>m</code> is set to -1). Since we normally compute ReliefF for all attributes in the dataset, <CODE>MeasureAttribute_relief</CODE> caches the results. When it is called to compute a quality of certain attribute, it computes qualities for all attributes in the dataset. When called again, it uses the stored results if the domain is still the same and the example table has not changed (this is done by checking the example tables <CODE>version</CODE> and is not foolproof; it won't detect if you change values of existing examples, but will notice adding and removing examples; see the page on <A href="ExampleTable.htm"><CODE>ExampleTable</CODE></A> for details).</P>
+         240
+         241  <DT>checkCachedData</DT>
+         242  <DD>A flag best left alone unless you know what you are doing.</DD>
+         243  </DL>
+         244
+         245  <P>Computation of ReliefF is rather slow since it needs to find <CODE>k</CODE> nearest neighbours for each of <CODE>m</CODE> reference examples (or all examples, if <code>m</code> is set to -1). Since we normally compute ReliefF for all attributes in the dataset, <CODE>MeasureAttribute_relief</CODE> caches the results. When it is called to compute the quality of a certain attribute, it computes qualities for all attributes in the dataset. When called again, it uses the stored results if the domain is still the same and the data has not changed. Checking is done by comparing the data table version (see <A href="ExampleTable.htm"><CODE>ExampleTable</CODE></A> for details) and then computing a checksum of the data and comparing it with the previous checksum. The latter can take some time on large tables, so you may want to disable it by setting <code>checkCachedData</code> to <code>False</code>. In most cases this will do no harm, except when the data is changed in a way that passes unnoticed by the version check (for instance, when values of existing examples are changed), in which case the computed ReliefF values can be wrong. Hence, disable it only if you know that the data does not change, or if you know which kinds of changes the version check detects.</P>
    243   246
    244   247  <P>Caching will only have an effect if you use the same instance for all attributes in the domain. So, don't do this:</P>
     
    261   264  <P>Note that ReliefF can also compute the threshold function, that is, the attribute quality at different thresholds for binarization.</P>
    262   265
+         266  <p>Finally, here is an example that shows what can happen if you disable the computation of checksums.</p>
+         267
+         268  <xmp>data = orange.ExampleTable("iris")
+         269  r1 = orange.MeasureAttribute_relief()
+         270  r2 = orange.MeasureAttribute_relief(checkCachedData = False)
+         271
+         272  print "%.3f\t%.3f" % (r1(0, data), r2(0, data))
+         273  for ex in data:
+         274      ex[0] = 0
+         275  print "%.3f\t%.3f" % (r1(0, data), r2(0, data))
+         276  </xmp>
+         277
+         278  <p>The first print statement prints the same number, 0.321, twice. Then we set the first attribute to 0 in all examples. <code>r1</code> notices the change and returns -1 as its ReliefF, while <code>r2</code> does not and returns the same number, 0.321, which is now wrong.</p>
+         279
    263   280
    264   281  <H2>Measure for Attributes for Regression Problems</H2>
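The cache-validation rule introduced in this changeset (reuse cached scores only if the domain and the table version match and, optionally, a checksum of the data matches) can be illustrated with a small, self-contained Python 3 sketch. `VersionedTable`, `CachedMeasure`, `check_cached_data` and the toy scoring function are hypothetical stand-ins, not Orange's API. The version counter here changes only when rows are added, and the checksum is a CRC32 over the row values; Orange's actual checksum may be computed differently.

```python
import zlib


class VersionedTable:
    """Hypothetical stand-in for an example table (not Orange's API).
    `version` changes when rows are added, but not when existing
    values are edited in place."""

    def __init__(self, domain, rows):
        self.domain = tuple(domain)
        self.rows = [list(r) for r in rows]
        self.version = 0

    def append(self, row):
        self.rows.append(list(row))
        self.version += 1

    def checksum(self):
        # Hash of all values: catches in-place edits that `version` misses,
        # but costs a full pass over the data on every call.
        return zlib.crc32(repr(self.rows).encode())


class CachedMeasure:
    """Hypothetical measure that scores all attributes at once and caches them."""

    def __init__(self, score_all, check_cached_data=True):
        self.score_all = score_all
        self.check_cached_data = check_cached_data
        self.recomputations = 0
        self._key = None
        self._scores = None

    def __call__(self, attr, table):
        key = (table.domain, table.version,
               table.checksum() if self.check_cached_data else None)
        if key != self._key:  # cache miss: recompute scores for every attribute
            self._scores = self.score_all(table)
            self._key = key
            self.recomputations += 1
        return self._scores[attr]


def distinct_counts(table):
    # Toy stand-in for ReliefF: one score per attribute (number of distinct values).
    return [len({row[a] for row in table.rows}) for a in range(len(table.domain))]


table = VersionedTable(["x", "c"], [[1, "a"], [2, "b"], [3, "a"]])
r1 = CachedMeasure(distinct_counts)
r2 = CachedMeasure(distinct_counts, check_cached_data=False)
print(r1(0, table), r2(0, table))  # 3 3
for row in table.rows:
    row[0] = 0                     # in-place edit: version stays 0
print(r1(0, table), r2(0, table))  # 1 3 (r2 returns a stale score)
```

As in the Orange script, turning the checksum off trades correctness under in-place edits for speed. Adding a row still changes `version`, so both instances would recompute in that case.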
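The cost noted above (finding `k` nearest neighbours for each of `m` reference examples) is easier to see in a bare-bones ReliefF. The following Python 3 sketch follows the textbook ReliefF update for numeric attributes: nearest hits lower a quality estimate, and nearest misses raise it, weighted by class priors. It is not Orange's `MeasureAttribute_relief` implementation. The function name `relieff`, its `seed` argument, the default `k=5` and the toy data are illustrative assumptions; only the meaning of `k`, `m` and `m=-1` is taken from the text.

```python
import random


def relieff(X, y, k=5, m=100, seed=0):
    """Hypothetical textbook ReliefF sketch for numeric attributes (not Orange's code).

    X: list of examples (lists of floats); y: class labels.
    k: neighbours per class; m: number of reference examples (-1 = all).
    Returns one quality estimate per attribute.
    """
    n, n_attr = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_attr)]
    hi = [max(row[a] for row in X) for a in range(n_attr)]

    def diff(a, r1, r2):
        # Normalised difference of attribute a between two examples, in [0, 1].
        span = hi[a] - lo[a]
        return abs(r1[a] - r2[a]) / span if span else 0.0

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(n_attr))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    rng = random.Random(seed)
    refs = list(range(n)) if m == -1 else [rng.randrange(n) for _ in range(m)]

    w = [0.0] * n_attr
    for i in refs:
        r = X[i]
        for c in classes:
            # k nearest neighbours of r within class c: a scan over all examples
            # for every reference example, which is why ReliefF is slow.
            candidates = [j for j in range(n) if j != i and y[j] == c]
            near = sorted(candidates, key=lambda j: dist(r, X[j]))[:k]
            if not near:
                continue
            if c == y[i]:
                factor = -1.0  # nearest hits: differences lower the quality
            else:
                factor = prior[c] / (1.0 - prior[y[i]])  # nearest misses raise it
            for j in near:
                for a in range(n_attr):
                    # Average over the references and the neighbours actually found.
                    w[a] += factor * diff(a, r, X[j]) / (len(refs) * len(near))
    return w


X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print([round(v, 3) for v in relieff(X, y, k=1, m=-1)])  # [0.8, -0.714]
```

With `m=-1` every example is a reference, so the work grows roughly with the square of the number of examples. One call produces estimates for all attributes at once, which is why caching the whole result pays off when attributes are scored one by one.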