Ticket #632 (closed wish: fixed)

Opened 4 years ago

Last modified 4 years ago

Preprocess (widget)

Reported by: blaz Owned by: ales
Milestone: 2.6 Component: canvas
Severity: minor Keywords:
Cc: Blocking:
Blocked By:

Description

Write a widget for preprocessing. Its optional input is Examples; its outputs are Examples (optional) and Preprocessor. The Examples output is, of course, used only when examples are also given on the input.

Change all Learners so that they can accept the Preprocessor signal on their input.

Change Test Learners so that it can use the Preprocessor signal. Note: in this widget the preprocessor must be applied by wrapping the learners with it. (It would be wrong to use it to preprocess the input data, because that would apply it to both the training set and the test set.)
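
The difference between wrapping learners and preprocessing the input can be sketched in plain Python, without Orange. All names here (center, mean_learner, wrap_learner, cross_validate) are hypothetical stand-ins, not Orange or widget API. The sketch only illustrates that a wrapped learner runs the preprocessor on each training fold and never on that fold's test rows.

```python
# Minimal sketch, plain Python, no Orange. All names are illustrative only.

seen_by_preprocessor = []  # records every data set the preprocessor was run on


def center(rows):
    """Toy preprocessor: subtract the mean of the rows it is given."""
    seen_by_preprocessor.append(list(rows))
    mean = sum(rows) / float(len(rows))
    return [r - mean for r in rows]


def mean_learner(train_rows):
    """Toy learner: the model always predicts the mean of its training rows."""
    mean = sum(train_rows) / float(len(train_rows))
    return lambda row: mean


def wrap_learner(learner, preprocessor):
    """Return a learner that preprocesses its own training data first."""
    def wrapped(train_rows):
        return learner(preprocessor(train_rows))
    return wrapped


def cross_validate(learner, rows, folds):
    """Train on all folds but one; return (model, test_rows) pairs."""
    results = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        results.append((learner(train), test))
    return results


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
results = cross_validate(wrap_learner(mean_learner, center), data, folds=3)

# For each fold, check whether the preprocessor saw any of that fold's test rows.
leaks = [
    any(t in seen for t in test)
    for seen, (_, test) in zip(seen_by_preprocessor, results)
]
```

Had center been applied to data before calling cross_validate, it would have run once on all six rows, test rows included, which is exactly the misuse the note above warns about.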

Widget structure:

http://img.skitch.com/20100722-d1uh479g3qry6hm5qjtk9fung5.png

Some possible uses:

http://img.skitch.com/20100722-mc3fmcyrntawpukaahsgsgnun6.png

Types of preprocessing:

  • Discretize
  • Continuize
  • Impute
  • Feature selection
      o Scoring: pull-down menu (use methods from Rank)
      o Best X features
      o Best Y % features
  • Sample
      o X data instances
      o Y % data instances
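
The "X" and "Y %" options for feature selection and sampling both come down to deciding how many items to keep. A minimal sketch in plain Python follows. The helpers resolve_count, best_features and sample_instances are hypothetical, not Orange API, and the rounding rule for percentages (round down, keep at least one item) is an assumption that this ticket does not specify.

```python
import random


def resolve_count(total, count=None, percent=None):
    """Turn 'X items' or 'Y % of items' into a number of items to keep."""
    if (count is None) == (percent is None):
        raise ValueError("give exactly one of count or percent")
    if percent is not None:
        # Assumption: round down, but keep at least one item.
        count = max(1, int(total * percent / 100.0))
    # Never ask for more items than exist, or for a negative number.
    return max(0, min(count, total))


def best_features(scores, count=None, percent=None):
    """Keep the highest-scoring features; scores maps name -> score."""
    k = resolve_count(len(scores), count, percent)
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


def sample_instances(rows, count=None, percent=None, seed=0):
    """Draw a random sample of rows without replacement."""
    k = resolve_count(len(rows), count, percent)
    return random.Random(seed).sample(rows, k)


scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
top_two = best_features(scores, count=2)        # "Best X features"
top_half = best_features(scores, percent=50)    # "Best Y % features"
subset = sample_instances(list(range(10)), percent=30)  # "Y % data instances"
```

In the widget, the scores would come from whichever Rank scoring method is chosen in the pull-down menu; the dictionary above is made-up data for illustration.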

Implement the first three the same way as in the widgets that already provide these functions (and that also offer "manual" discretization and continuization):

http://img.skitch.com/20100722-pdns9qnyp763y8eah3syrx9y5f.png

Example of using preprocessors:

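# shows how to use a typical preprocessor
# (Preprocessor_randomFeatureSelection is the hand-made class from the last example in this description)

import orange, orngWrap, orngTest, orngStat

data = orange.ExampleTable("../../datasets/voting.tab")

# one-step use: passing the data returns the preprocessed table directly
newdata = Preprocessor_randomFeatureSelection(data, n=3)
print newdata[0]

# two-step use: construct the preprocessor first, apply it to the data later
pp = Preprocessor_randomFeatureSelection()
newnewdata = pp(data)
print newnewdata[0]

# shows how to use it in combination with preprocessed learner wrapper

nbc = orange.BayesLearner()

# preprocessors given as a list (here a list with a single preprocessor)
pl_one = orngWrap.PreprocessedLearner([Preprocessor_randomFeatureSelection()])
# a single preprocessor given directly, without a list
pl_many = orngWrap.PreprocessedLearner(orange.Preprocessor_addClassNoise(proportion=0.5))

# wrap the same base learner with each of the two preprocessing set-ups
nbc1 = pl_one(nbc)
nbc2 = pl_many(nbc)

# compare the plain and the wrapped learners by cross-validated AUC
res = orngTest.crossValidation([nbc, nbc1, nbc2], data)
print orngStat.AUC(res)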

(The preprocessors for PreprocessedLearner can be given as a list, or as a single preprocessor.)
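
One way to accept either form is to normalize the argument to a list before applying it. The sketch below is plain Python and makes no claim about how orngWrap.PreprocessedLearner does this internally. The helpers as_preprocessor_list and apply_all are hypothetical, and applying the preprocessors in list order is an assumption.

```python
def as_preprocessor_list(preprocessors):
    """Accept None, a single callable, or an iterable of callables."""
    if preprocessors is None:
        return []
    if callable(preprocessors):
        return [preprocessors]
    return list(preprocessors)


def apply_all(preprocessors, data):
    """Apply each preprocessor in turn (assumed order: as listed)."""
    for preprocess in as_preprocessor_list(preprocessors):
        data = preprocess(data)
    return data


# Toy preprocessors working on plain lists instead of example tables.
def drop_first(rows):
    return rows[1:]


def double(rows):
    return [2 * r for r in rows]


single = apply_all(double, [1, 2, 3])                 # one preprocessor
chained = apply_all([drop_first, double], [1, 2, 3])  # a list of them
```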

Example of a "hand-made" preprocessor:

import random
import orange

class Preprocessor_randomFeatureSelection(orange.Preprocessor):
    """Keeps n randomly chosen attributes (plus the class variable, if any)."""

    def __new__(cls, data=None, n=5, **kwds):
        ras = orange.Preprocessor.__new__(cls, **kwds)
        if data is not None:
            # the table returned below is not an instance of cls, so Python
            # will not call __init__ by itself
            ras.__init__(n=n) # force init
            return ras.__call__(data)
        else:
            return ras  # invokes the __init__

    def __init__(self, n=5):
        self.n = n

    def __call__(self, data, weightId=None):
        # pick at most self.n attributes at random
        attributes = list(data.domain.attributes)
        newdomain = random.sample(attributes, min(self.n, len(attributes)))
        if data.domain.classVar:
            return orange.ExampleTable(orange.Domain(newdomain + [data.domain.classVar], True), data)
        else:
            return orange.ExampleTable(orange.Domain(newdomain, False), data)
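
The __new__ trick used above can be tried out without Orange. The following is a minimal plain-Python sketch of the same pattern. RandomSubset is a hypothetical toy class, not part of Orange, and it works on ordinary lists rather than example tables.

```python
import random


class RandomSubset(object):
    """Toy stand-in for a preprocessor: keeps n randomly chosen items."""

    def __new__(cls, data=None, n=5):
        self = object.__new__(cls)
        if data is not None:
            # The value returned here is a list, not an instance of cls,
            # so Python will not call __init__ for us; run it by hand.
            self.__init__(n=n)
            return self(data)
        # Returning an instance of cls makes Python call __init__ next,
        # with the same arguments that were passed to the constructor.
        return self

    def __init__(self, data=None, n=5):
        self.n = n

    def __call__(self, data):
        return random.sample(list(data), min(self.n, len(data)))


items = ["a", "b", "c", "d"]
direct = RandomSubset(items, n=2)   # one-step use: returns the subset itself
pp = RandomSubset(n=3)              # two-step use: returns a reusable object
later = pp(items)
```

The design lets the same class serve both as a function (pass the data, get the result) and as a configurable object that can be handed to a learner wrapper and applied later.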

Change History

comment:1 Changed 4 years ago by ales

  • Status changed from new to closed
  • Resolution set to fixed