Ticket #882 (closed task: fixed)

Opened 3 years ago

Last modified 3 years ago

documentation check - Orange.feature.scoring

Reported by: marko
Owned by: janez
Milestone: 2.5
Component: other
Severity: minor
Keywords:
Cc: blaz
Blocking:
Blocked By:

Description

Orange.feature.scoring seems fine to me. What do you think?

 http://orange.biolab.si/doc/orange25/Orange.feature.scoring.html

Change History

comment:1 Changed 3 years ago by janez

Not really...

The first sentence goes: "Features scoring is an assesment the relevance of features to the class variable; the higher a feature is scored, the better it should be for prediction." It should probably be "of the relevance". Besides, a feature is not "relevant to the class variable"; it is relevant for its prediction. It would be better to say "Feature scoring is assessment of the usefulness of the feature for prediction of the dependent (class) variable." (Not that I like it, but it's better.)

One thing that we specifically decided to avoid is the colloquial style, like in the second sentence: "You can score the feature “tear_rate” of the Lenses data set (loaded into data) with:". We should either use the passive or rephrase the sentence, like "The following example computes the information gain of feature 'tear rate' in the lenses data set." (The phrase "The following example" should not repeat before each code snippet, though.)

It continues in the same manner: "Apart from information gain you could also use other measures".

From now on, I prohibit the use of "you" in documentation. ;)

Similarly, this is not the best style: "So, don’t do this:". I'd say something like "Constructing new instances of ReliefF for each feature, like this:: ... ... runs much slower than reusing the same instance:: ... ... " There are also other places where the old Demsar's writing should be replaced by something more decent, e.g. "based on the “saving”" should be "based on the cost decrease". I am also not sure about "the opposite error" - it sounds like a direct translation from Slovenian.
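The ReliefF remark above can be illustrated with a small sketch. It assumes that the slowdown comes from per-data-set state that a scorer instance keeps between calls; that is an assumption, not a description of Orange's implementation. `CachingScorer` is a hypothetical stand-in class, and its "score" (the number of distinct values in a column) is only a placeholder for the expensive computation.

```python
class CachingScorer:
    """Hypothetical stand-in for a scorer that caches per-data-set results.

    This is NOT Orange's ReliefF; it only mimics the assumed caching behaviour.
    """

    def __init__(self):
        self.passes = 0      # how many expensive passes over the data were made
        self._data = None    # data set the cached scores belong to
        self._scores = None  # cached scores, one per feature

    def _score_all(self, data):
        # Placeholder for the expensive part: score every feature in one pass.
        self.passes += 1
        n_features = len(data[0])
        return [len({row[i] for row in data}) for i in range(n_features)]

    def __call__(self, feature_index, data):
        # Recompute only when called with a different data set object.
        if data is not self._data:
            self._scores = self._score_all(data)
            self._data = data
        return self._scores[feature_index]


data = [(0, 1, 2), (1, 1, 0), (0, 1, 1)]

# Reusing one instance: the expensive pass runs once for all features.
shared = CachingScorer()
reused = [shared(i, data) for i in range(3)]

# A new instance per feature: the expensive pass runs once per feature.
fresh = []
fresh_passes = 0
for i in range(3):
    scorer = CachingScorer()
    fresh.append(scorer(i, data))
    fresh_passes += scorer.passes
```

Both variants return the same scores; only the number of expensive passes differs (one for the shared instance, one per feature otherwise).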

"First developed by Kira and Rendell and then improved by Kononenko." is a fragment and should be changed to something like "The measure was first developed by ...".

As for the content, it would be nice to include the formulae for information gain, ratio, gini etc...
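As a reference point for the formulae mentioned above, here is a minimal sketch of the common textbook definitions of information gain, gain ratio and Gini decrease for a discrete feature. The function names are made up for this illustration, and Orange's own implementations may differ in details such as the treatment of unknown values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: H = -sum_c p(c) * log2 p(c)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum_c p(c)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by_value(values, labels):
    """Group the class labels by the value of a discrete feature."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def info_gain(values, labels):
    """Gain(A) = H(C) - sum_v p(v) * H(C | A = v)."""
    n = len(labels)
    groups = split_by_value(values, labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / H(A); defined as 0 for a constant feature."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_decrease(values, labels):
    """Gini(C) - sum_v p(v) * Gini(C | A = v)."""
    n = len(labels)
    groups = split_by_value(values, labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups.values())


# Toy data: the first feature determines the class, the second is unrelated.
labels = ["x", "x", "y", "y"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
```

On the toy data, the informative feature gets the maximal score under all three measures, while the uninformative one scores zero.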

The Gini index was actually introduced by Mr. Gini himself (whose other contributions to science include pro-fascist essays such as "The Scientific Basis of Fascism" in 1927 ;).

class Orange.feature.scoring.Distance and class Orange.feature.scoring.MDL should have at least some explanation. Why is there an empty section with methods?

I don't like the reference "As all measures are subclasses of Measure, see Measure.call for usage.". It might be better to put this in the beginning of Measures for classification and then refer to this place later on. I know that this does not represent the actual placement of the method, which is defined in the base class, but it is easier for the reader.

Why "Measures for Classification"? As I understand, we decided to use the term "Score" instead, so it would be better to say "Feature scoring in classification problems".

Would it make sense to rename Orange.feature.scoring.Measure to Orange.feature.scoring.Score?

I confess I have not really read the page, I just glanced over it, so I may have more comments in the next iteration. (Sorry, I'm working on Orange 3.0. Now that all the data is in a numpy-compatible format, the adult data set, which used to load in 1.5s on my machine and took 12372 KB, now loads in 1.2s and takes 4384 KB. I think I can halve that time (but won't go into that yet) and I could certainly halve the memory consumption by using floats instead of doubles, but even this is great. The problem is that I am basically reimplementing Orange's C++ core.)

comment:2 Changed 3 years ago by marko

In [11451]:

(The changeset message doesn't reference this ticket)

comment:3 Changed 3 years ago by marko

In [11462]:

(The changeset message doesn't reference this ticket)

comment:4 Changed 3 years ago by marko

I have corrected the documentation following Janez's suggestions and ask for a recheck.

Yes, the descriptions of the methods are still sparse, I agree, but they are at least as good as the ones in the current documentation.

I have chosen this module because it was particularly bad originally (just old documentation clumsily glued together; check the end-of-July version at  http://193.2.72.57:8199/Orange.feature.html#module-Orange.feature.scoring).

comment:5 Changed 3 years ago by janez

  • Status changed from new to closed
  • Resolution set to fixed

I made a few changes to the introduction, otherwise I quite like it.
