Orange Forum • View topic - Similar to Orange in C

Postby Dieter H. » Fri Jan 13, 2006 12:53

Hi.

I have been gratefully working with Orange for over a year now and very much appreciate its versatility.

Since my new project involves an extremely large dataset that will need to be applied in a real-time environment, I need to find something similar to Orange's diversity with an API in Standard C.

Would there be any such packages that you would recommend?

Thank you,
Dieter

Postby Blaz » Sat Jan 14, 2006 19:59

Which algorithms should it support (otherwise, I guess it will be hard to find a comprehensive data mining suite in pure C)? For instance, C4.5 is written in pure C.

Postby Guest » Mon Jan 16, 2006 7:38

Blaz wrote: Which algorithms should it support (otherwise, I guess it will be hard to find a comprehensive data mining suite in pure C)? For instance, C4.5 is written in pure C.

Hi Blaz,

Well, I'm not sure yet what will be best. I'm working with an extremely large multivariate, nonstationary, continuous time series.

For pre-processing, I will normalize, rank, and (most probably) discretize in various ways.
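To make the three pre-processing steps concrete, here is a minimal pure-Python sketch. The function names and the choice of z-score normalization and equal-width bins are assumptions for illustration only; for a nonstationary series the statistics would probably have to be computed over a sliding window rather than over the whole series.

```python
def normalize(values):
    """Rescale to zero mean and unit standard deviation (z-scores)."""
    n = float(len(values))
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0            # fall back to 1.0 for a constant series
    return [(v - mean) / std for v in values]


def rank(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def discretize(values, bins):
    """Equal-width discretization into `bins` intervals, numbered from 0."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(bins) or 1.0
    # The maximum value would land in bin number `bins`, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


series = [2.0, 4.0, 4.0, 10.0]          # made-up sample window
print(normalize(series))
print(rank(series))
print(discretize(series, 4))
```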

The training will be supervised and done off-line, and the class comparison/prediction will be done in real time/on-line, as fast as possible (time and accuracy are critical).

What would you suggest for an algorithm to consider? Btree? NN? Bayes? SVM?... etc?

I did find one fairly comprehensive package called LNKnet from MIT. But I have not investigated it yet.

Here's the url if anyone's interested:
http://www.ll.mit.edu/IST/lnknet/index.html


Any suggestions would be greatly appreciated.

Thanks,
Dieter

Postby Blaz » Mon Jan 16, 2006 9:58

Since training time is not a problem (and will be done off-line), you can in fact use any large (non-C) data mining suite which, like Orange, allows you to investigate and save the resulting model in some format. All that you will have to do then is to code the classifier in C (which, at least for some classifiers like Naive Bayes, classification trees, and the like, should be quite trivial). For Orange it would also be quite easy to write a script that takes any such model and outputs the required code in C.
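To illustrate why the classification step is simple once the model is saved, here is a rough sketch of Naive Bayes prediction from stored probability tables. It is written in Python for brevity; the table layout, attribute names, and numbers are all made up for the example and are not Orange's model format. The same loop ports to C as a few array lookups and additions.

```python
import math

# Hypothetical model, as it might be exported after off-line training:
# class priors and per-attribute conditional probabilities P(value | class).
PRIORS = {"up": 0.6, "down": 0.4}
CONDITIONALS = {
    "trend":  {"up":   {"rising": 0.8, "falling": 0.2},
               "down": {"rising": 0.3, "falling": 0.7}},
    "volume": {"up":   {"high": 0.5, "low": 0.5},
               "down": {"high": 0.9, "low": 0.1}},
}


def classify(example):
    """Return the class with the highest posterior (up to a constant)."""
    best, best_score = None, None
    for cls in sorted(PRIORS):
        # Sum logs instead of multiplying probabilities to avoid underflow.
        score = math.log(PRIORS[cls])
        for attr, value in example.items():
            score += math.log(CONDITIONALS[attr][cls][value])
        if best_score is None or score > best_score:
            best, best_score = cls, score
    return best


print(classify({"trend": "rising", "volume": "low"}))
```

The example assumes discrete (already discretized) attributes and no zero probabilities in the tables; a real export would need smoothing or a guard for unseen values.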

Postby Dieter » Mon Jan 16, 2006 15:01

Ok, thank you very much.
Dieter.

Postby F. Capiez » Tue Jan 17, 2006 11:16

Blaz wrote: All that you will have to do then is to code the classifier in C (which, at least for some classifiers like Naive Bayes, classification trees, and the like, should be quite trivial). For Orange it would also be quite easy to write a script that takes any such model and outputs the required code in C.


I would be extremely interested in such a script if it were available. I have a little experience with SWIG (www.swig.org), so it would be relatively easy for me to create wrappers around the generated C code. After this the classifiers could be imported in Python like any other function.
I am far from fluent in C, so I would not know where to start with this C-code-generating script. If it is as trivial as mentioned above, and if someone gets down to writing such scripts, it would be very nice to make them available to the public.
As far as I am concerned, I would be happy even with tree classifiers only, as my main domain of investigation is currently classification forests.

Anyway, Orange is already great software. Congratulations to the developers, and keep up the good work.

Fabrice Capiez

PS: Thank you for the link to LNKnet. I will investigate it, as it also appears to be possible to generate C code for the classifiers with this library (first, I'll have to see whether I can compile their sources with MinGW).

Postby F. Capiez » Fri Feb 03, 2006 3:35

If you want something done, do it yourself, as they say.

As no one appears to be interested in the problem, I am willing to give it a try, with the warning that my chances of success are not very high given my level in C programming.

This is a question for Blaz: since you wrote that coding the classifier in C should be quite trivial and that writing a script to do this should be quite easy, could you at least give me a few pointers on where to start?

Thank you in advance,

F. Capiez

Postby Janez » Fri Feb 03, 2006 11:50


Postby Guest » Mon Feb 06, 2006 1:40

Thank you Janez.

I have read the file in question but, unless I misunderstood it, the explanations in it relate precisely to the part I intended to avoid by using SWIG. I am trying to keep my goal as simple as possible, which probably means generating a bare-bones, Orange-independent C function that would evaluate just like the original tree classifier (given a simple array of float values). I could always wrap this function in a Python class later to restore compatibility with Orange.

Blaz mentioned a (Python, I presume) script, so the way I understood it was something like the following:

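F=open("MyExportedTree.C","w")
# fixed C preamble (includes, function signature)
F.write(Invariant_Header_of_Classifier)
for node in tree :
    # attribute, threshold, etc. for this node
    parameters=Retrieve_Decisional_Values(node)
    F.write(C_code_template_for_a_node % tuple(parameters))
# fixed C epilogue (closing braces)
F.write(Invariant_End_of_Classifier)
F.close()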

I would then have to feed the C file to SWIG and could reimport the function in Python.
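A runnable version of that idea might look like the sketch below. It assumes the tree has already been extracted into a hypothetical nested-tuple form, `(feature_index, threshold, subtree_if_true, subtree_if_false)` with integer class indices as leaves. That layout, the `<=` test, and the names `emit_node` and `tree_to_c` are inventions for this example, not Orange's internal structure or anything taken from its sources.

```python
def emit_node(node, depth=1):
    """Return the C source lines for one subtree (recursive)."""
    pad = "    " * depth
    if not isinstance(node, tuple):            # leaf: an integer class index
        return ["%sreturn %d;" % (pad, node)]
    index, threshold, if_true, if_false = node
    lines = ["%sif (x[%d] <= %r) {" % (pad, index, threshold)]
    lines += emit_node(if_true, depth + 1)
    lines.append("%s} else {" % pad)
    lines += emit_node(if_false, depth + 1)
    lines.append("%s}" % pad)
    return lines


def tree_to_c(tree, name="classify"):
    """Return a complete C function that evaluates `tree` on a float array."""
    header = "int %s(const float *x)\n{" % name
    return "\n".join([header] + emit_node(tree) + ["}"]) + "\n"


# Hypothetical tree: test x[0] first, then x[1] on the right-hand branch.
TREE = (0, 2.5, 0, (1, 1.0, 1, 2))

code = tree_to_c(TREE)
print(code)
```

Generating nested if/else statements recursively avoids the need for a per-node template with explicit jumps, and the resulting function has no dependency on Python or Orange, so it could be handed to SWIG as it is.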

Did I misinterpret what Blaz stated?
When I asked for pointers on how to get started, I was not very clear. What I would like to know is:
- Which files or parts of files should I take the C code templates from (in the case of a tree classifier)?
- Which parameters do I need to extract from the Orange tree classifier object?


Thank you in advance.

F.Capiez

Postby Dieter » Tue Feb 07, 2006 17:58

fwiw,

The LNKnet software lets you use a GUI to generate classifiers as C source code (it outputs actual code), which might then be usable from Orange/Python as Blaz and Janez have indicated.

Dieter

Postby F. Capiez » Wed Feb 08, 2006 0:25

Indeed, that is what I read.
Unfortunately, LNKnet needs Cygwin to run on Windows (and my Cygwin install is all messed up).
But the main problem for me is that Orange enabled me to create a classifier based on random forests which would be difficult to reimplement in another language. That is why I am primarily interested in how to save computed trees as they are the building blocks of my classifier.

Thank you anyway.

F. Capiez

PS: I noticed after sending my previous post that I should have used the "code" tags, as the indentation of my pseudocode got lost in the formatting...

Postby Janez » Thu Feb 09, 2006 14:04

OK, now I got it, I guess.

You can take the template from mymodule.zip (the link I sent) - just take the classifier if learning is done outside Orange. You can do it from a Python script, if you want to. You can use SWIG if you really want to, but I'd still suggest avoiding it - just modify the appropriate parts of mymodule and run pyxtract to build the interface. This should be smoother.

A classifier cannot be a C function that is completely independent of Python: it must accept an argument of type orange.Example, so it must import quite a few Orange classes. But you can use the above template to convert the orange.Example into an ordinary Python list and then call your "independent" function. The alternative is to convert it to a list through Python (in this case your function would accept a PyObject * and call its method native(), which returns a list).

I don't understand exactly what you meant by "what to extract of the tree classifier".

Postby Guest » Tue Feb 14, 2006 0:55

After struggling for 2 days with the C++ code, I finally decided that it was too much for me and for my actual needs. So I spent the next hour writing the following (uncommented) script:

Code: Select all
import cPickle
import psyco
psyco.full()

class MyTreeClassifier:
    STRINGTYPE=type("String")
    TUPLETYPE=type(("Tuple",{}))
    def __init__(self,source=None):
        if source:
            if type(source)==self.STRINGTYPE:
                self.LoadFromPickle(source)
            elif type(source)==self.TUPLETYPE:
                self.tree=source
            else :
                self.ImportTreeFromOrange(source)
               

    def LoadFromPickle(self,path):
        F=open(path,"rb")
        self.tree=cPickle.load(F)
        F.close()
   
    def SaveToDisk(self,path) :
        F=open(path,"wb")
        cPickle.dump(self.tree,F,cPickle.HIGHEST_PROTOCOL)
        F.close()   
   
    def ImportTreeFromOrange(self,OrangeTree):
        self.tree=self.ImportNodeFromOrange(OrangeTree)
       
    def ImportNodeFromOrange(self,node):   
        if not node:
            print "Null Node"
            return None
   
        if node.branchSelector:
            VarName=node.branchSelector.classVar.name
            VarValue=node.branchDescriptions[0]
            TobeSaved = ("example['%s']%s"%(VarName,VarValue),{})
            TobeSaved[1][True]=self.ImportNodeFromOrange(node.branches[0])
            TobeSaved[1][False]=self.ImportNodeFromOrange(node.branches[1])
            return TobeSaved
           
        else:
            return node.nodeClassifier.defaultValue.value
       
    def Classify(self,example):
        def EvaluateNode(node,example):
            result=node[1][eval(node[0])]
            if type(result)==self.STRINGTYPE:
                return result
            elif result is not None :
                return EvaluateNode(result,example)
            else :
                return None
       
        return EvaluateNode(self.tree,example)
       
class MyForestClassifier:
    STRINGTYPE=type("String")
    def __init__(self,source=None) :
        if source:
            if type(source)==self.STRINGTYPE:
                self.LoadFromPickle(source)
            else :
                self.ImportForest(source)       
               
    def LoadFromPickle(self,path):
        F=open(path,"rb")
        self.forest=[MyTreeClassifier(i) for i in cPickle.load(F)]
        F.close()
   
    def SaveToDisk(self,path) :
        F=open(path,"wb")
        cPickle.dump([i.tree for i in self.forest],F,cPickle.HIGHEST_PROTOCOL)
        F.close()   
       
    def ImportForest(self, OrangeForest):
        self.forest=([MyTreeClassifier(i.tree) for i in OrangeForest.classifiers])
       
    def Classify(self,example):
        results={}
        for i in self.forest:
            vote=i.Classify(example)
            if vote in results : results[vote]+=1
            else : results[vote]=1
        total=len(self.forest)*1.0
        best=None
        maxVal=-1
        for key,val in results.items():
            if val>maxVal:
                best,maxVal=key,val
            results[key]=val/total
        return (best,results)


It is not very elegant, to put it mildly, but it does work with either Orange Examples or a dictionary. As far as I am concerned, it does the job of saving my decision forests to disc and loading them back. I lose the probabilities of the individual trees, but this is compensated by the use of a forest. I guess it would be possible to alter the class to make it completely compatible with orange.Classifier, but I did not have the time to think about it (having lost quite a lot of time with C++).

A big word of caution: this implementation was really meant as a quick and dirty fix for my problem. I post it here in case someone is interested in modifying it for their own needs. My data, for instance, uses only continuous values, so I haven't taken categorical values into account at all.

Now for the speed improvement:
With a moderate-sized forest, it took Orange 8 seconds to recalculate the forest at every start-up of the program, and it now takes 0.1 seconds to load it from disc. On the other hand, classifying a single example with Orange takes less than 1/100th of a second, whereas my classifier takes 0.4 seconds.
In practice, I usually classify only one example at a time, so I don't mind the small delay (still, it is less than a second), but I guess that someone who has to classify a whole set of examples will find this quite a nuisance.
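One plausible contributor to the classification delay is that the script calls eval on a string at every node of every tree. A possible variation, sketched below and not measured, is to parse each test once at import time into a (name, comparison function, threshold) triple. It assumes the branch descriptions are strings such as "<=5.000"; the names `parse_test` and `classify` are hypothetical and the node layout simply mirrors the tuples used in the script above.

```python
import operator

# Comparison symbols that a branch description is assumed to start with.
OPS = {"<=": operator.le, "<": operator.lt,
       ">=": operator.ge, ">": operator.gt}


def parse_test(var_name, description):
    """Turn e.g. ('x', '<=5.000') into ('x', operator.le, 5.0)."""
    for symbol in ("<=", ">=", "<", ">"):       # two-character symbols first
        if description.startswith(symbol):
            return (var_name, OPS[symbol], float(description[len(symbol):]))
    raise ValueError("unsupported branch description: %r" % description)


def classify(node, example):
    """Walk a tree of ((name, op, threshold), {True: ..., False: ...}) nodes."""
    while isinstance(node, tuple):
        (name, op, threshold), branches = node
        node = branches[op(example[name], threshold)]
    return node                                  # leaf: class label (or None)


# Made-up two-level tree for demonstration.
tree = (parse_test("x", "<=5.000"),
        {True: "a",
         False: (parse_test("y", ">1.5"), {True: "b", False: "c"})})

print(classify(tree, {"x": 7.0, "y": 2.0}))
```

One caveat: the operator functions in the tuples cannot be pickled with the old cPickle protocols as conveniently as plain strings, so it may be simpler to store the symbol and look it up in OPS when loading.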

F. Capiez

PS: Regarding null nodes, I haven't met any yet so, following a very bad programming practice, I just added a print statement and will look at the problem only if it arises.

