Orange Forum • View topic - Discretization vs. Meta Attributes

Discretization vs. Meta Attributes

Postby dudester4 » Fri Feb 28, 2014 20:19

As a newbie to Orange, I have been trying to do some model development, using mostly discrete variables to predict a continuous outcome (the dependent variable). I have worked with other DM software.

From what I know of traditional statistics, I could analyze this with a t-test or ANOVA, but I am wondering whether discretizing the dependent variable (income) would let me use some of the wonderful graphics and analysis tools Orange provides. However, when I try to specify income as the continuous dependent (class) outcome, Orange keeps interpreting this variable as a meta attribute ONLY. I tried to discretize it using the Discretize and Select Attributes widgets, but can't seem to get this to work.

Do I need to discretize income in Excel beforehand? I have attempted to follow some of the models but can't seem to get this right. Any pointers?

Re: Discretization vs. Meta Attributes

Postby Ales » Mon Mar 03, 2014 14:14

dudester4 wrote: However, when I try to stipulate the income as dependent, continuous, (class) outcome, Orange keeps interpreting this variable as meta attribute ONLY
How do you define the variable in the tab file (can you provide a sample)? The definition should look like this:
Code:
... Dependent
... c
... class
... 1.0
... 1.1
... ...

For instance see the housing.tab dataset for an example of a dataset with a continuous dependent variable.
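As a rough illustration of the three-row header described above (names, then types, then flags), here is a minimal Python sketch that builds such a file using only the standard library. The column names and values are made up for illustration, and the layout assumes the tab format shown in this reply; it is not taken from the poster's data.

```python
import csv
import io

# Hypothetical column names and values, for illustration only.
names = ["attends_church", "volunteers", "income"]
types = ["d", "d", "c"]      # d = discrete, c = continuous
flags = ["", "", "class"]    # only the dependent column carries a flag
rows = [
    ["weekly", "yes", "42.5"],
    ["monthly", "no", "31.0"],
]


def build_tab(names, types, flags, rows):
    """Return the text of a tab-delimited file with a three-row header."""
    if not (len(names) == len(types) == len(flags)):
        raise ValueError("each header row needs one cell per column")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows([names, types, flags])  # the three header rows
    writer.writerows(rows)                   # the data rows
    return buf.getvalue()


tab_text = build_tab(names, types, flags, rows)
print(tab_text)
```

The point of the length check is that every header row needs one tab-separated cell per column, so the "class" flag lands under the income column and not under the first one.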
dudester4 wrote: Do I need to discretize income in Excel prior?
This should not be necessary.
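If binning income outside Orange were wanted anyway, it would not need Excel. The following is a minimal sketch of equal-frequency binning in plain Python; this is one common scheme and is not necessarily what the Discretize widget does. The function name and the income values are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 with roughly equal counts."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Map the rank of each value onto a bin index.
        labels[i] = rank * n_bins // len(values)
    return labels


incomes = [18.0, 25.5, 31.0, 42.5, 55.0, 120.0]  # made-up, in thousands
print(equal_frequency_bins(incomes, 3))
```

Note that this simple version can place tied values in different bins, which may or may not matter for a given dataset.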

Re: Discretization vs. Meta Attributes

Postby dudester4 » Thu Apr 03, 2014 8:18

Thank you for your response.

I did look at the housing.tab dataset, and indeed it does have a continuous dependent variable. But all the independent variables are also continuous; my dataset has mostly discrete independent variables. I am attempting to predict, or model, moral educational approaches and their impact on later income and academic achievement. So, the data definition in the tab file looks like this:

religiosity    attends church    thinks self religious    volunteers for charity    ......    income
d              d                 d                        d                                   c
                                                                                              class

Answers tend to be categorical: 1-2 times per week, 3-5 times per week, etc., OR very important, somewhat important, not important, etc. Income data is in thousands, annually.

I have tried to define income as integer data (rounded), but this seems to have no effect; Orange still treats this class attribute as a meta attribute. I am wondering if the data needs cleaning; there are a few "don't know" and missing responses in the income data.
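One way to test the data-cleaning idea would be to normalise every non-numeric income entry before loading the file. The sketch below is a minimal example, assuming that Orange's tab format reads "?" as an unknown value; the function name and the sample cells are hypothetical.

```python
import math


def clean_income(cell, missing="?"):
    """Return the cell as a numeric string, or the missing-value marker."""
    text = cell.strip()
    try:
        value = float(text)
    except ValueError:
        # Covers empty cells and free-text answers such as "don't know".
        return missing
    if not math.isfinite(value):
        # float() also accepts "nan" and "inf"; treat those as missing too.
        return missing
    return text


raw = ["42.5", "don't know", "", " 31 ", "n/a"]
print([clean_income(c) for c in raw])
```

If the column loads as continuous after this step, the stray text responses were a likely cause; if not, the header rows are the next thing to check.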

Thanks in advance for your help.

