Ticket #430 (closed wish: fixed)

Opened 5 years ago

Last modified 5 years ago

Guessing attribute types from .txt files

Reported by: blaz Owned by: janez
Milestone: 2.6 Component: library
Severity: minor Keywords:
Cc: Blocking:
Blocked By:

Description

In the attached file, Orange (incorrectly) guesses that the first two attributes are discrete. Both should be string attributes and marked as meta. I suggest changing the algorithm for guessing attribute types so that an attribute is treated as discrete:

  • if its domain has fewer than, say, 20 distinct values
  • if more than half of its values occur in at least two examples

Otherwise the attribute is a string and is marked as meta (so that it does not interfere with clustering and similar methods).
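
A minimal sketch of the proposed rule, for illustration only. This is not Orange's actual loader code: the function name `guess_attribute_type` and its parameters are hypothetical, and the handling of numeric columns is left out. The proposal does not say whether the two conditions must both hold or whether either one is enough, so the sketch assumes both are required and exposes a `require_both` flag for the other reading.

```python
from collections import Counter


def guess_attribute_type(values, max_domain=20, repeat_fraction=0.5,
                         require_both=True):
    """Return "discrete" or "string" for one column of raw text values.

    Hypothetical helper illustrating the heuristic proposed in this ticket;
    the default thresholds (20 values, one half) are the ones suggested above.
    """
    # Count how often each non-empty value occurs in the column.
    counts = Counter(v for v in values if v != "")
    if not counts:
        return "string"

    # Condition 1: the domain has fewer than max_domain distinct values.
    small_domain = len(counts) < max_domain

    # Condition 2: more than repeat_fraction of the distinct values
    # occur in at least two examples.
    repeated = sum(1 for c in counts.values() if c >= 2)
    mostly_repeated = repeated > repeat_fraction * len(counts)

    # Assumption: both conditions are required unless require_both is False.
    if require_both:
        is_discrete = small_domain and mostly_repeated
    else:
        is_discrete = small_domain or mostly_repeated

    # Per the proposal, a "string" attribute would also be marked as meta.
    return "discrete" if is_discrete else "string"
```

With this reading, a column of unique identifiers (for example gene names, one per row) comes out as a string, while a column with a handful of repeated labels comes out as discrete.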

This may look like a side issue, but it is not: Gadi (i.e. a typical bio user) sent me an email with something like the attached data file, wondering why heatmap does not work on it. Bio users will typically export .txt files from Excel without any attribute type labels.

Attachments

guess_attribute_types.txt Download (13.9 KB) - added by blaz 5 years ago.
data file with five attributes

Change History

Changed 5 years ago by blaz

data file with five attributes

comment:1 Changed 5 years ago by janez

  • Status changed from new to closed
  • Resolution set to fixed