Orange Forum • View topic - [Solved]Classification with missing data

[Solved] Classification with missing data

Postby manisha » Mon Jun 17, 2013 17:09

Hello everyone,

I have been using Orange for classification for quite some time. Now I need to use a data file with some missing attribute values. The data is provided in .arff format, so I have to convert it to Orange data files. The missing values are represented by '?'.
I want to train models on this file, which contains some missing values, and also test those models on data files that have missing values too. I tried a small test with decision tree classifiers (TreeLearner and TreeClassifier), but it gave an error.

Code:
TypeError: test_on_data() got an unexpected keyword argument 'testResults'
Exception TypeError: "'NoneType' object is not callable" in  ignored


Are the existing Orange classifiers and learners able to deal with missing data? Or do we need to preprocess the data ourselves to replace the missing values with something meaningful?
Last edited by manisha on Fri Oct 25, 2013 14:33, edited 1 time in total.
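
One way to "treat the data ourselves", as asked above, is to fill each '?' with the most frequent known value of its column before training. The following is only a minimal sketch in plain Python, assuming the rows have already been parsed into lists of strings. The function name `impute_most_frequent` and the sample rows are hypothetical, and this is not a description of how Orange itself handles missing values.

```python
from collections import Counter

MISSING = "?"  # ARFF marker for a missing value


def impute_most_frequent(rows):
    """Return a copy of rows with each '?' replaced by the most common
    known value of its column (hypothetical helper, not part of Orange)."""
    fill = []
    for column in zip(*rows):
        known = [value for value in column if value != MISSING]
        # If a column has no known values at all, leave '?' in place.
        fill.append(Counter(known).most_common(1)[0][0] if known else MISSING)
    return [
        [fill[i] if value == MISSING else value for i, value in enumerate(row)]
        for row in rows
    ]


# Made-up sample rows, for illustration only.
rows = [
    ["?", "red", "1"],
    ["a", "red", "0"],
    ["a", "?", "1"],
    ["b", "blue", "1"],
]
filled = impute_most_frequent(rows)
```

The original `rows` list is left unchanged, so the imputed copy can be compared against the raw data.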

Re: Classification with missing data

Postby Ales » Tue Jun 18, 2013 9:54

manisha wrote: Are the existing orange classifiers and learners able to deal with missing data ?
Yes, most should deal with missing data.

manisha wrote:
Code:
TypeError: test_on_data() got an unexpected keyword argument 'testResults'
Exception TypeError: "'NoneType' object is not callable" in  ignored
Can you post the full traceback and sample code that triggers the exception? This seems to be a problem in Orange.evaluation.testing and not in the TreeLearner/Classifier.

Re: Classification with missing data

Postby manisha » Tue Jun 18, 2013 10:09

Hi Ales,

Thanks for the reply. Here is the whole traceback:

Code:
Traceback (most recent call last):
  File "/Users/manisha/Documents/workspace/LP_SRA/src/test.py", line 219, in <module>
    ml.evaluate([tree_classifier], new_tpath)
  File "/Users/manisha/Documents/workspace/LP_SRA/src/MachineLearning.py", line 184, in evaluate
    res = orngTest.testOnData(models, data_test, testResults=None, iterationNumber=10) 
TypeError: test_on_data() got an unexpected keyword argument 'testResults'
Exception TypeError: "'NoneType' object is not callable" in  ignored


Re: Classification with missing data

Postby manisha » Wed Jun 19, 2013 13:29

Hi Ales,

It's me again. Here is the piece of code I used for a test with missing data.
Code:
import orngIO, orngStat, orngTest, orngTree        #@UnresolvedImport
test_file = "test"
tree_learner = orngTree.TreeLearner(mForPrunning=2, name="tree_learner")
data = orngIO.loadARFF(test_file)
print "Possible Classes :", data.domain.classVar.values
attributes = [x.name for x in data.domain.attributes]             
print "# Attributes :",len(attributes)
tree_classifier = tree_learner(data)
res = orngTest.testOnData([tree_classifier], data, testResults=None, iterationNumber=10)
cm = orngStat.computeConfusionMatrices(res)
precision = orngStat.precision(cm[0])
recall = orngStat.recall(cm[0])
AUC = orngStat.AUC(res)
print "Precision : ", precision
print "Recall : ",recall
print "AUC : " ,AUC[0]


The data is stored in a test file in .arff format, so I used loadARFF to convert it to Orange data. I trained a decision tree classifier on it and tried to test it on the same data. Only the first line in the data file has some of its values missing.
Here is what I got after execution:

Code:
/Library/Python/2.6/site-packages/Orange-2.7-py2.6-macosx-10.6-universal.egg/Orange/__init__.py:111: UserWarning: Some features will be disabled due to failing modules
Importing 'classification.neural' failed: No module named scipy.sparse
  _import("classification.neural")
/Library/Python/2.6/site-packages/matplotlib-0.91.1-py2.6-macosx-10.6-universal.egg/matplotlib/__init__.py:62: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5, os, re, shutil, sys, warnings
/Library/Python/2.6/site-packages/matplotlib-0.91.1-py2.6-macosx-10.6-universal.egg/pytz/tzinfo.py:5: DeprecationWarning: the sets module is deprecated
  from sets import Set
/Library/Python/2.6/site-packages/Orange-2.7-py2.6-macosx-10.6-universal.egg/Orange/__init__.py:129: UserWarning: Importing 'regression.lasso' failed: No module named scipy.linalg
  _import("regression.lasso")
Possible Classes : <0, 1>
# Attributes : 14
Traceback (most recent call last):
  File "/Users/manisha/Documents/workspace/LP_SRA/src/test.py", line 230, in <module>
    res = orngTest.testOnData([tree_classifier], data, iterationNumber=10)
TypeError: test_on_data() got an unexpected keyword argument 'iterationNumber'
Exception TypeError: "'NoneType' object is not callable" in  ignored



I use Python and Orange in Eclipse with the PyDev plugin. I recently upgraded my Orange installation with easy_install on Mac OS.

Hoping to get some further help from you. Thanks.

Regards,
Manisha

Re: Classification with missing data

Postby Ales » Wed Jun 19, 2013 13:52

Just replace
Code:
res = orngTest.testOnData([tree_classifier], data, testResults=None, iterationNumber=10)
with
Code:
res = orngTest.testOnData([tree_classifier], data)

The testResults and iterationNumber parameters were removed in Orange v2.5 (they should never have been part of the public interface anyway).
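
The TypeError in the traceback is ordinary Python behaviour: a function that has no `**kwargs` catch-all rejects any keyword it does not declare. Here is a small sketch using a stand-in function. The name `evaluate_on_data` and its signature are hypothetical and do not reproduce Orange's actual `test_on_data`.

```python
def evaluate_on_data(classifiers, examples):
    # Hypothetical stand-in for an evaluation function, not Orange's code.
    return len(classifiers), len(examples)


try:
    # Passing a keyword the function does not declare raises TypeError.
    evaluate_on_data(["tree"], [1, 2, 3], iterationNumber=10)
except TypeError as err:
    message = str(err)

# Dropping the undeclared keyword makes the same call succeed.
result = evaluate_on_data(["tree"], [1, 2, 3])
```

This mirrors the fix above: once the removed keywords are dropped, the call goes through.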

Re: Classification with missing data

Postby manisha » Wed Jun 19, 2013 14:18

Thanks a lot, Ales. It works much better now.
It still reports an exception at the end, though. Can you please tell me why?

Code:
Possible Classes : <0, 1>
# Attributes : 14
Precision :  1.0
Recall :  0.352941176471
AUC :  0.746972318339
Exception TypeError: "'NoneType' object is not callable" in  ignored

Re: Classification with missing data

Postby Ales » Wed Jun 19, 2013 15:20

This seems to be related to this bug, although it was fixed before the 2.7 release.
Are you maybe using the shelve module in your own code?

Re: Classification with missing data

Postby Ales » Wed Jun 19, 2013 15:28

PS:

You can check where the error occurs by running Python with the "-v" flag:
Code:
python -v yourscript.py

Re: Classification with missing data

Postby manisha » Wed Jun 19, 2013 16:10

I have not used shelve in this code.
I used the -v option and it showed a lot of information about imports, installation, etc.
Among the output I found these lines:

Code:
# cleanup[2] Orange.utils.addons
Exception TypeError: "'NoneType' object is not callable" in  ignored


I couldn't understand all of it, though :(

Re: Classification with missing data

Postby manisha » Thu Jun 20, 2013 8:40

Hi Ales,

Can you please suggest what I should do to get rid of the error mentioned above?
I still get it on every run that uses Orange, and I am not using shelve in my program. This was not the case before I upgraded.

Thanks and Regards,

Manisha

Re: Classification with missing data

Postby Ales » Thu Jun 20, 2013 12:21

The problem seems to be this issue7835 (which version of Python are you using?).
A (closed) shelve instance was kept in the Orange.utils.addons module namespace. This might have triggered the error on shutdown.
I have fixed this. You can install an updated development version using:
Code:
easy_install https://bitbucket.org/biolab/orange/get/tip.tar.gz#egg=Orange-2.7.1.dev
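
The "Exception ... ignored" message discussed in this thread is what Python prints when an object's `__del__` method raises an exception: the error is reported but not propagated. The sketch below imitates that by setting a module-level helper to None before the object is destroyed, which is an assumption about what interpreter shutdown did to the global the destructor needed. The names `Holder` and `close_resource` are hypothetical. The code assumes CPython 3.8 or later for `sys.unraisablehook`, which is not the Python 2.6 setup used in the thread.

```python
import sys

captured = []  # messages for exceptions raised inside __del__


def record_unraisable(unraisable):
    # Called by the interpreter instead of printing "Exception ignored ...".
    captured.append(
        "%s: %s" % (unraisable.exc_type.__name__, unraisable.exc_value)
    )


sys.unraisablehook = record_unraisable  # available since Python 3.8


def close_resource():
    pass


class Holder(object):
    def __del__(self):
        # Looks up the global at call time; fails if it is None by then.
        close_resource()


holder = Holder()
close_resource = None  # imitate a global already cleared during shutdown
del holder             # CPython runs __del__ here; the TypeError is reported, not raised

sys.unraisablehook = sys.__unraisablehook__  # restore the default hook
```

The script keeps running after `del holder`, which matches what was seen in the thread: the results are printed correctly and the exception only shows up as a message at the end.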

Re: Classification with missing data

Postby manisha » Thu Jun 20, 2013 15:39

Thanks a lot Ales.
It works without any error now.
Just one last question: can you give me a link or document that describes how the different learners and classifiers actually treat missing data? I could not find sufficient information in the documentation.
Thanks again.

