# Ticket #974 (new task)

## Faster threshold search

| Reported by: | janez | Owned by: | janez |
|---|---|---|---|
| Milestone: | 3.0 | Component: | other |
| Severity: | minor | Keywords: | |
| Cc: | | Blocking: | |
| Blocked By: | | | |

### Description

Threshold search could be sped up by implementing special cases of the search-related procedures for binary attributes, which would use a pair of doubles instead of TDistribution.

An alternative is to avoid TDistribution altogether and use a double * instead. TScoreFeature would have to implement a corresponding method.

A third alternative is to keep TDistribution, but only one (e.g. the left one), and give TScoreFeature an operator that accepts the left distribution and the class distribution. This would simplify the threshold search loop (no maintaining of the counts for the right side) and possibly allow a more efficient computation of scores as well.
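A minimal sketch of the left-distribution-only loop, written against plain `double` arrays rather than the real classes. The names `Example`, `entropy`, `infoGainFromLeft` and `bestThreshold` are hypothetical and are not part of TScoreFeature or TDistribution; information gain is used only as an example score, and the examples are assumed to be already sorted by attribute value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example record: attribute value and class index.
struct Example {
    double value;
    std::size_t cls;
};

// Entropy (in bits) of a raw count array; `total` is the sum of the counts.
double entropy(const double *counts, std::size_t nClasses, double total) {
    if (total <= 0.0) return 0.0;
    double h = 0.0;
    for (std::size_t c = 0; c < nClasses; ++c) {
        if (counts[c] > 0.0) {
            const double p = counts[c] / total;
            h -= p * std::log2(p);
        }
    }
    return h;
}

// Hypothetical scorer: information gain of a binary split, given only the
// left-side counts and the overall class counts. The right-side counts are
// derived on the fly as total - left.
double infoGainFromLeft(const double *left, const double *total,
                        std::size_t nClasses) {
    assert(nClasses > 0);
    double nLeft = 0.0, nAll = 0.0;
    std::vector<double> right(nClasses);
    for (std::size_t c = 0; c < nClasses; ++c) {
        nLeft += left[c];
        nAll += total[c];
        right[c] = total[c] - left[c];
    }
    if (nAll <= 0.0) return 0.0;
    const double nRight = nAll - nLeft;
    return entropy(total, nClasses, nAll)
         - (nLeft / nAll) * entropy(left, nClasses, nLeft)
         - (nRight / nAll) * entropy(right.data(), nClasses, nRight);
}

// Scans examples sorted by value and returns {threshold, score} of the best
// cut. Only the left counts are updated inside the loop.
std::pair<double, double> bestThreshold(const std::vector<Example> &sorted,
                                        std::size_t nClasses) {
    std::vector<double> total(nClasses, 0.0), left(nClasses, 0.0);
    for (const Example &e : sorted) total[e.cls] += 1.0;

    double bestScore = -1.0, bestCut = 0.0;
    for (std::size_t i = 0; i + 1 < sorted.size(); ++i) {
        left[sorted[i].cls] += 1.0;
        // A cut is only possible between two distinct values.
        if (sorted[i].value == sorted[i + 1].value) continue;
        const double score =
            infoGainFromLeft(left.data(), total.data(), nClasses);
        if (score > bestScore) {
            bestScore = score;
            bestCut = (sorted[i].value + sorted[i + 1].value) / 2.0;
        }
    }
    return {bestCut, bestScore};
}
```

The loop itself touches a single array per example; the subtraction `total - left` moves into the scorer, which is where a specialised implementation could try to save work.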


I tried to simplify the formulas for information gain and the Gini index for the binary case (where we know the left-side distribution and the total distribution), but nothing simpler comes out. Also, not keeping the right-hand side distribution does not help much, since we still need it to compute the score (e.g. the entropy).
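For reference, this is how information gain looks when written in terms of the left and total counts only, assuming the standard textbook definitions (the ticket does not spell out the exact formulas used). The symbols $l_c$ (left count of class $c$), $n_c$ (total count of class $c$), $n_L = \sum_c l_c$ and $n = \sum_c n_c$ are introduced here for illustration:

$$
\mathrm{Gain} = H(n_1, \dots, n_k) - \frac{n_L}{n}\, H(l_1, \dots, l_k) - \frac{n - n_L}{n}\, H(n_1 - l_1, \dots, n_k - l_k),
\qquad
H(x_1, \dots, x_k) = -\sum_{c=1}^{k} \frac{x_c}{\sum_j x_j} \log_2 \frac{x_c}{\sum_j x_j}
$$

The last term still needs every $n_c - l_c$ inside the logarithm, so the right-side counts are computed either way. The Gini index, with $G(x_1, \dots, x_k) = 1 - \sum_c \left(x_c / \sum_j x_j\right)^2$ in place of $H$, has the same shape.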

The code would still run faster because the data representation would be simpler, because there would be fewer arguments, and because, if we used double *, we would not need to update TDistribution::abs and cases, etc.