Orange Forum • View topic - Naive Bayes bug?

Naive Bayes bug?

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

Post by vid » Wed Feb 22, 2006 18:39

Why does Orange crash (infinite loop) when a piece of code like the following is executed?

##
import orange
servo = orange.ExampleTable('servo')
bayes = orange.BayesLearner()

preprocessor = orange.Preprocessor_discretize()
preprocessor.method = orange.EquiNDiscretization(numberOfIntervals=5)
preprocessor.attributes = [servo.domain.classVar]

Dservo = preprocessor(servo)
BayesLearner = orange.BayesLearner()
classifier = BayesLearner(discretizedData)
##

Regards
Vid

Post by Janez » Wed Feb 22, 2006 20:33

What is 'discretizedData' in the last line? If I replace it with Dservo (which is probably what you meant), the script works.

But I'd still like to know what exactly is in discretizedData, so I can see what happens there.

Post by vid » Thu Feb 23, 2006 17:57

That "discretizedData" was a mistake; you were right to replace it with "Dservo".

The problem, however, remains, and it seems very strange. When a regression dataset is loaded from a shortened but otherwise perfectly normal .tab file, Orange crashes as soon as a learner is applied to the discretized ExampleTable.
The shortened file is tab-delimited and loads correctly, as can be seen from the shell report:


Portions Copyright 2003-2005 www.stani.be - see credits in manual for further copyright information.
Please donate if you find this program useful (see help menu). Double click to jump to error source code.
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import orange
>>> orange.version
'0.99b (23:46:27, Oct 7 2005)'
>>> os.chdir('E:/predmeti/tehnZnanja/NOVO/data/reg/bugtest')
>>> data = orange.ExampleTable('housing-5')
>>> print data.domain.attributes
<FloatVariable 'CRIM', FloatVariable 'ZN', FloatVariable 'INDUS', FloatVariable 'CHAS', FloatVariable 'NOX', FloatVariable 'RM', FloatVariable 'AGE', FloatVariable 'DIS', FloatVariable 'RAD', FloatVariable 'TAX', FloatVariable 'PTRATIO', FloatVariable 'B', FloatVariable 'LSTAT'>
>>> print data.domain.classVar
FloatVariable 'MEDV'
>>> for x in data:
...     print x
...
[0.00632, 18, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296, 15.3, 396.90, 4.98, 24.0]
[0.02731, 0, 7.07, 0, 0.469, 6.421, 78.9, 4.9671, 2, 242, 17.8, 396.90, 9.14, 21.6]
[0.02729, 0, 7.07, 0, 0.469, 7.185, 61.1, 4.9671, 2, 242, 17.8, 392.83, 4.03, 34.7]
[0.03237, 0, 2.18, 0, 0.458, 6.998, 45.8, 6.0622, 3, 222, 18.7, 394.63, 2.94, 33.4]
[0.06905, 0, 2.18, 0, 0.458, 7.147, 54.2, 6.0622, 3, 222, 18.7, 396.90, 5.33, 36.2]
>>> preprocessor = orange.Preprocessor_discretize()
>>> preprocessor.method = orange.EquiNDiscretization(numberOfIntervals=3)
>>> preprocessor.attributes = [data.domain.classVar]
>>> discretizedData = preprocessor(data)
>>> print discretizedData.domain.attributes
<FloatVariable 'CRIM', FloatVariable 'ZN', FloatVariable 'INDUS', FloatVariable 'CHAS', FloatVariable 'NOX', FloatVariable 'RM', FloatVariable 'AGE', FloatVariable 'DIS', FloatVariable 'RAD', FloatVariable 'TAX', FloatVariable 'PTRATIO', FloatVariable 'B', FloatVariable 'LSTAT'>
>>> print discretizedData.domain.classVar
EnumVariable 'D_MEDV'
>>> for x in discretizedData:
...     print x
...
[0.00632, 18, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296, 15.3, 396.90, 4.98, '(22.80, 34.05]']
[0.02731, 0, 7.07, 0, 0.469, 6.421, 78.9, 4.9671, 2, 242, 17.8, 396.90, 9.14, '<=22.80']
[0.02729, 0, 7.07, 0, 0.469, 7.185, 61.1, 4.9671, 2, 242, 17.8, 392.83, 4.03, '>34.05']
[0.03237, 0, 2.18, 0, 0.458, 6.998, 45.8, 6.0622, 3, 222, 18.7, 394.63, 2.94, '(22.80, 34.05]']
[0.06905, 0, 2.18, 0, 0.458, 7.147, 54.2, 6.0622, 3, 222, 18.7, 396.90, 5.33, '>34.05']
>>> learner = orange.BayesLearner()
>>> classifier = learner(discretizedData)
.....crash

When kNN is used instead of Bayes, it works. It also works with Bayes and the original housing.tab.
So what could be wrong with housing-5.tab, discretizedData, or Bayes?
I can send housing-5.tab if you want to test with exactly the same file.

Regards
Vid
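
For reference, the class labels printed in the session above ('<=22.80', '(22.80, 34.05]', '>34.05') are consistent with a plain equal-frequency split of the five MEDV values into three intervals. The following is only a minimal pure-Python sketch of that idea and does not use or reproduce Orange's own code: the function names `equal_frequency_cuts` and `discretize` are hypothetical, and placing each cut at the midpoint of the neighbouring sorted values at position floor(i * n / k) is an assumption, chosen because it happens to give the cut points shown in the session.

```python
from bisect import bisect_left


def equal_frequency_cuts(values, n_intervals):
    """Return cut points splitting the sorted values into n_intervals groups.

    Assumption (not taken from Orange's source): each cut is the midpoint
    between the two neighbouring sorted values at position
    floor(i * n / n_intervals), for i = 1 .. n_intervals - 1.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, n_intervals):
        pos = (i * n) // n_intervals
        if 0 < pos < n:
            cut = (ordered[pos - 1] + ordered[pos]) / 2.0
            # Skip duplicates so the cut points stay strictly increasing.
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Label a value with its right-closed interval, e.g. '(22.80, 34.05]'."""
    if not cuts:
        raise ValueError("at least one cut point is required")
    idx = bisect_left(cuts, value)  # values equal to a cut fall in the lower bin
    if idx == 0:
        return "<=%.2f" % cuts[0]
    if idx == len(cuts):
        return ">%.2f" % cuts[-1]
    return "(%.2f, %.2f]" % (cuts[idx - 1], cuts[idx])


# MEDV class values of the five rows in housing-5, in file order.
medv = [24.0, 21.6, 34.7, 33.4, 36.2]
cuts = equal_frequency_cuts(medv, 3)
labels = [discretize(v, cuts) for v in medv]
```

On these five rows the sketch puts one example in the first interval and two in each of the other two, matching the D_MEDV column in the session.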

