## Non-linear regression problem

2 posts • Page **1** of **1**

Hello.

We have a non-linear regression problem that we're considering using Orange for. The problem is unusual in some respects.

I've appended a description of the problem (especially 'what makes it unusual') written by someone who's much more familiar with the theory than I am.

I'm hoping it'll make sense to you and you'll be able to say "yes, you can do that in Orange by subclassing Foo and adding Bar", etc.

Let me know if things aren't clear enough and I'll expand from my more practical perspective.

Thanks.

Tim.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Our problem domain is defined by feature vectors of 5 to 20 nominal features. Our goal is to predict a continuous value for each feature vector presented to the system. Thus our problem is a non-linear regression problem.

The main difference from a standard setting is that the raw feature vectors are not labeled with a corresponding continuous value, but with a binary one. The continuous value is the rate (or average) value within a group of feature vectors. Thinking in terms of a decision tree, the predicted continuous value would be calculated from the feature vectors within a leaf node (or any other intermediate node).

We also wish to compute a secondary continuous value from the instance labels within each tree node.

Our splitting rule is not based on least-squared deviation (LSD), as the vectors themselves are not labeled with the continuous value; instead it uses an ad hoc rule that tries to maximize the variance of the computed continuous values across all the child nodes. It's possible LSD would work using the binary values of each feature vector.
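To make the splitting rule concrete, here is a minimal plain-Python sketch of such a criterion (the function names and the data layout are illustrative, not Orange's API): each candidate attribute partitions the instances into groups, each group's rate is the mean of its binary labels, and the score is the variance of those rates.

```python
from statistics import mean, pvariance
from collections import defaultdict

def split_score(examples, attr):
    """Score a candidate split on `attr` as the variance of the
    per-child rates (means of the binary labels).
    `examples` is a list of (feature_dict, binary_label) pairs."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[features[attr]].append(label)
    rates = [mean(labels) for labels in groups.values()]
    if len(rates) < 2:
        return 0.0          # a split with a single child carries no information
    return pvariance(rates)

def best_split(examples, attrs):
    """Pick the attribute whose split maximizes the rate variance."""
    return max(attrs, key=lambda a: split_score(examples, a))
```

An attribute that separates the ones from the zeros perfectly gets the maximal score, while an attribute whose children all have the same rate scores zero.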

We also have ad hoc stopping rules based on statistical measures. One rule determines the validity of each proposed split: if a split is not acceptable, the split is undone and the splitting process stopped on that branch. A second stopping rule applies to each node; nodes that do not meet this rule are deleted from the tree.

We introduced a special "other" feature value to handle cases where a test vector has a feature value that is not represented in the tree. Not being represented can happen because the value was not present in the training set (a common occurrence) or because the number of training instances was too low to include it in the tree. During training, the "other" feature value is added to all node splits after the optimal split is decided. All the data of the parent node is considered present in the "other" child node. In the testing phase, the "other" value is a match when no other value is.
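The training-time part of that scheme can be sketched in a few lines of plain Python (the function name and data layout are hypothetical, chosen only to illustrate the description above):

```python
def add_other_branch(parent_data, split_children, OTHER="other"):
    """Attach the special "other" branch after the optimal split is
    decided: the "other" child is considered to hold all of the parent
    node's data. `split_children` maps each feature value to the list
    of instances routed to that child."""
    children = dict(split_children)       # leave the original split intact
    children[OTHER] = list(parent_data)   # "other" inherits everything
    return children
```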

We're considering having a pruning stage. It will eliminate all "other" nodes that are leaves, and it will merge leaves whose corresponding continuous values are close enough.

When searching the tree (performance phase), the process starts at the root and proceeds as usual, with the caveat of the "other" value (as described). If at some point a match is not possible, the current node is the result of the search, and the output continuous value is computed from the instances within it.
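The search just described can be sketched as follows (a minimal illustration with a hypothetical node class, not Orange's tree structures): follow the matching branch, fall back to the "other" branch when no value matches, and stop at the current node when neither exists.

```python
from statistics import mean

class Node:
    """Minimal node for illustration: `attr` is the feature tested at
    this node, `children` maps feature values (possibly including the
    special "other") to child nodes, and `labels` holds the binary
    labels of the training instances that reached the node."""
    def __init__(self, labels, attr=None, children=None):
        self.labels, self.attr, self.children = labels, attr, children or {}

def predict(node, features, OTHER="other"):
    """Walk from the root; the prediction is the rate (mean of the
    binary labels) of the node where the search ends."""
    while node.children:
        child = node.children.get(features.get(node.attr),
                                  node.children.get(OTHER))
        if child is None:
            break          # no match possible: answer from the current node
        node = child
    return mean(node.labels)
```

For example, a vector with an unseen value follows the "other" branch if one exists, and otherwise the prediction is computed from the node where the search stopped.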

The number of training feature vectors is large, in the millions.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


> Our splitting rule is not based on the least-squared deviation (LSD) as the vectors themselves are not labeled with the continuous value, but instead on an ad hoc rule that tries to maximize the variance of the computed continuous values across all the child nodes. It's possible LSD would work using the binary values of each feature vector.

I do not understand what you mean by the variance of the computed continuous values across the child nodes. If the optimal split (your split criterion is still a value of a single attribute, right?) can be decided from the values of the learning examples that came to a particular node, you will only have to subclass the class MeasureAttribute. If, as I suspect from the above description, the selection of the attribute is global, then you will need to rewrite the skeleton (I'd say it's about 20-30 lines of Python - I'll probably have to do it today for another problem, so I can put the code in the forum), reuse most of the components, and modify others (and the skeleton) correspondingly.

> We also have ad hoc stopping rules based on statistical measures. One rule determines the validity of each proposed split. If a split is not acceptable the split is undone and the splitting process stopped on that branch. A second stopping rule applies to each node. Nodes that do not meet this rule are deleted from the tree.

Orange's tree learner uses two classes for that. One is the split constructor, which is given the examples at the node and returns the 'optimal' split. In your case, it should also check that the found split is acceptable; if it is not, it can return None. As for the second stopping rule, I understand that it is unrelated to the splits and depends only on the examples in the node? So you can apply it prior to splitting? If so, you need to use (or subclass) Orange's stopping criteria. If you can only know it after the tree is constructed, this can go into post-pruning. (I might have misunderstood this question, I fear.)
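That contract (propose a split, vet it, and return None to stop the branch) can be sketched in plain Python; the helper names here are hypothetical, not Orange's actual classes:

```python
def guarded_split(examples, attrs, find_split, is_valid):
    """Sketch of a split constructor with a validity check:
    `find_split` proposes the optimal split for the examples at the
    node, the ad hoc statistical test `is_valid` vets it, and
    returning None tells the tree builder to stop on this branch."""
    split = find_split(examples, attrs)
    if split is None or not is_valid(examples, split):
        return None            # split rejected: this branch stops here
    return split
```

In Orange the equivalent logic would live inside a custom split constructor; the point is only that "no acceptable split" and "stop growing here" are the same signal.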

> When training, the "other" feature value is added to all node splits after the optimal split is decided.

This is one of the possible treatments for unknown values, the corresponding class name is TreeExampleSplitter_UnknownsToBranch.

> We're considering having a pruning stage. It will eliminate all "other" nodes that are leaves. And it will merge leaves whose corresponding continuous values are close enough.

Post-pruning is a matter of a simple script in Python - just traverse the tree and do with it whatever you want. We haven't programmed anything that does something like this.
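Such a traversal might look like this (a sketch under assumptions: nodes are plain dicts with "labels" and "children", the tolerance is arbitrary, and it merges only when all of a node's leaf children are close; merging subsets would be a straightforward extension):

```python
from statistics import mean

def prune(node, tol=0.05, OTHER="other"):
    """Post-prune in place. `node` is a dict with "labels" (the binary
    labels of its training instances) and "children" (a mapping from
    branch values to child nodes of the same shape)."""
    for child in node["children"].values():
        prune(child, tol, OTHER)
    # 1. eliminate "other" nodes that are leaves
    other = node["children"].get(OTHER)
    if other is not None and not other["children"]:
        del node["children"][OTHER]
    # 2. merge leaves whose continuous values (rates) are close enough
    kids = list(node["children"].values())
    if kids and all(not c["children"] for c in kids):
        rates = [mean(c["labels"]) for c in kids]
        if max(rates) - min(rates) <= tol:
            node["children"] = {}   # the node itself becomes a leaf
    return node
```

Because the walk is bottom-up, removing an "other" leaf can expose a node whose remaining children merge on the way back up.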

As I said, I'll be working on some special trees today, so I'll hopefully be able to post some example scripts that will also be useful for you.
