# Changeset 8160:b22f1ee57610 in orange

Timestamp:
08/09/11 13:09:07
Branch:
default
Convert:
d7323a465a9964e3e8b1013af46752ba4c04da16
Message:

Updates to Orange.feature.scoring. References #882.

File:
1 edited

prediction of the dependent (class) variable. To compute the information gain of feature "tear_rate" in the Lenses data set (loaded into ``data``) use:

    >>> meas = Orange.feature.scoring.InfoGain()

Apart from information gain you could also use other scoring methods; see :ref:`classification` and :ref:`regression`. Various ways to call them are described in :ref:`callingscore`.

It is also possible to construct the object and use a contingency table directly. Not all classes accept all kinds of arguments. :obj:`Relief`, for instance, only supports the form with instances on the input.

The score can be constructed as follows::

    >>> meas = Orange.feature.scoring.Cost()

Knowing the value of feature 3 would decrease the classification cost by approximately 0.083 per instance.

.. comment:: opposite error - is this term correct? TODO

.. index::

Check whether the cached data has changed, using the data checksum. This is slow on large tables. Defaults to :obj:`True`; disable it if you know that the data will not change.

.. [Breiman1984] L. Breiman et al.: Classification and Regression Trees, Chapman and Hall, 1984.

.. [Kononenko1995] I. Kononenko: On biases in estimating multi-valued attributes, International Joint Conference on Artificial Intelligence, 1995.

.. _iris.tab: code/iris.tab

A scoring method derived from :obj:`~Orange.feature.scoring.Score`. If :obj:`None`, :obj:`Relief` with m=5 and k=10 will be used.
"""

class Distance(Score):
    """The :math:`1-D` distance described in [Kononenko2007]_ is defined as
    information gain divided by joint entropy :math:`H_{CA}` (:math:`C` is the
    class variable and :math:`A` the feature):

    .. math::
        1-D(C,A) = \\frac{\\mathrm{Gain}(A)}{H_{CA}}

    """

@Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"})

class MDL(Score):
    """Minimum description length principle [Kononenko1995]_. Let :math:`n`
    be the number of instances, :math:`n_0` the number of classes, and
    :math:`n_{cj}` the number of instances with feature value :math:`j` and
    class value :math:`c`. Then the MDL score for feature :math:`A` is

    .. math::
        \\mathrm{MDL}(A) = \\frac{1}{n} \\Bigg[
            \\log\\binom{n}{n_{1.},\\cdots,n_{n_0 .}}
            - \\sum_j \\log \\binom{n_{.j}}{n_{1j},\\cdots,n_{n_0 j}} \\\\
            + \\log \\binom{n+n_0-1}{n_0-1}
            - \\sum_j \\log \\binom{n_{.j}+n_0-1}{n_0-1} \\Bigg]

    """

@Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"})
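To make the calling conventions above concrete, here is a minimal sketch against the Orange 2.x API. It assumes the lenses.tab data set that ships with Orange and that each scorer, including the `Distance` and `MDL` classes touched by this changeset, accepts the (feature, data) calling form; treat it as an illustration, not as the module's reference usage.

```python
import Orange

data = Orange.data.Table("lenses")

# Simplest calling form: the feature (here by name) plus the instances.
gain = Orange.feature.scoring.InfoGain()
print(gain("tear_rate", data))

# Relief only supports the form with instances on the input;
# m=5 and k=10 are the defaults mentioned in the docstring above.
relief = Orange.feature.scoring.Relief(m=5, k=10)
print(relief("tear_rate", data))

# The same calling form should work for the scorers documented here.
for scorer in (Orange.feature.scoring.Distance(),
               Orange.feature.scoring.MDL()):
    print(scorer("tear_rate", data))
```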
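The 1-D distance formula can also be checked by hand from a contingency table, since Gain(A) is just the mutual information H(C) + H(A) - H(CA). The helper below is a standalone sketch (not part of Orange), using base-2 logarithms:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy in bits; ignore zero-probability cells.
    probs = probs[probs > 0]
    return -(probs * np.log2(probs)).sum()

def one_d_distance(counts):
    # counts[c][j]: instances with class value c and feature value j.
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()
    h_ca = entropy(joint.ravel())     # joint entropy H_CA
    h_c = entropy(joint.sum(axis=1))  # class entropy H_C
    h_a = entropy(joint.sum(axis=0))  # feature entropy H_A
    gain = h_c + h_a - h_ca           # Gain(A), i.e. mutual information
    return gain / h_ca                # 1-D(C,A) = Gain(A) / H_CA

print(one_d_distance([[10, 2, 1],
                      [1, 3, 8]]))
```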
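Similarly, the MDL equation can be evaluated numerically outside Orange. The sketch below is a hypothetical re-implementation of the docstring's formula, not Orange's own code: it computes the multinomial and binomial coefficients through log-gamma functions so they never overflow, and since the docstring does not fix the logarithm base, natural logarithms are assumed.

```python
import numpy as np
from scipy.special import gammaln

def log_multinomial(total, parts):
    # log( total! / (parts_1! * ... * parts_k!) )
    return gammaln(total + 1) - sum(gammaln(p + 1) for p in parts)

def log_binomial(n, k):
    # log( n! / (k! * (n - k)!) )
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def mdl_score(counts):
    """MDL(A) per the docstring; counts[c][j] = n_cj (class c, value j)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()                    # number of instances
    n0 = counts.shape[0]                # number of classes
    class_totals = counts.sum(axis=1)   # n_c. for each class
    value_totals = counts.sum(axis=0)   # n_.j for each feature value

    # Description length of the class labels without the feature ...
    prior = (log_multinomial(n, class_totals)
             + log_binomial(n + n0 - 1, n0 - 1))
    # ... minus the length once instances are split by feature value.
    posterior = sum(log_multinomial(nj, counts[:, j])
                    + log_binomial(nj + n0 - 1, n0 - 1)
                    for j, nj in enumerate(value_totals))
    return (prior - posterior) / n

# Toy contingency table: 2 classes (rows) x 3 feature values (columns).
print(mdl_score([[10, 2, 1],
                 [1, 3, 8]]))
```

A higher score means that knowing the feature value shortens the encoding of the class labels more, which is the sense in which MDL ranks features.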