Postby raspi177 » Thu Nov 29, 2012 22:03

I've got a list like
I'd like to isolate the repeating items, e.g. b c b c, or whatever happens to repeat. I could test the list against regular expressions for patterns like xyxy or xyzxyz; xyxy would be something like (.)(.)(\1\2)+. But is there a nicer, fuzzy way with Orange that also catches bcbibc or maybe bcbcibc? I think I can use Attribute Distance again, but what should I compare against? bcbc, cdcd, anything can repeat.
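To make the regex idea concrete, here is a minimal sketch in plain Python (not Orange). It assumes every item is a single character, so the list can be joined into one string. find_repeats generalises (.)(.)(\1\2)+ to a repeating unit of any length, and repeated_ngrams is a cruder stand-in for the fuzzy case: it only counts which n-grams occur more than once, so an interruption such as the i in bcbibc does not hide the repeat. Both function names are made up for this illustration.

```python
import re
from collections import Counter

# Generalises (.)(.)(\1\2)+ : a unit of any length, repeated at least once more.
REPEAT = re.compile(r"(.+?)\1+")


def find_repeats(items):
    """Return (start, unit, count) for every exactly repeating run.

    Assumes each item is one character, so string offsets equal list indices.
    """
    text = "".join(items)
    runs = []
    for m in REPEAT.finditer(text):
        unit = m.group(1)
        runs.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return runs


def repeated_ngrams(items, n=2):
    """Return the n-grams that occur more than once, with their counts.

    This tolerates noise between repeats, but it is only a counting
    heuristic, not a real fuzzy matcher.
    """
    grams = Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


print(find_repeats(list("abcbcd")))     # [(1, 'bc', 2)]
print(find_repeats(list("bcbibc")))     # [] (the exact pattern misses it)
print(repeated_ngrams(list("bcbibc")))  # {('b', 'c'): 2}
```

The exact regex finds b c b c but returns nothing for bcbibc, which is the case where a tolerant approach is needed.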

By the way, I should get the a b c b c d list from my original a b c B C d list by running Attribute Distance on it, with the resulting matrix:
  a   b   c   B   C   d
a 1 ..
b 0.1 1   ...
c 0   0   1
B 0  0.9 0   1
C 0   0   0.9   1..

I join everything at 0.9 or above. Is there a safer way, i.e. a ready-made function?
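The joining step can be sketched as follows. This is plain Python, not an Orange function: it merges every pair whose similarity reaches the threshold using a small union-find, and keeps the label that comes first in the list as the group's representative. The similarity values are the ones visible in the matrix above. Treating every pair not listed as 0 is an assumption, and join_similar is a hypothetical name.

```python
def join_similar(labels, similarity, threshold=0.9):
    """Map each label to a group representative (single linkage).

    similarity: dict of (label, label) -> value; missing pairs count as 0.
    """
    parent = {label: label for label in labels}
    order = {label: i for i, label in enumerate(labels)}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), value in similarity.items():
        if value >= threshold:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                # Keep the label that appears first as the representative.
                if order[root_u] > order[root_v]:
                    root_u, root_v = root_v, root_u
                parent[root_v] = root_u
    return {label: find(label) for label in labels}


labels = ["a", "b", "c", "B", "C", "d"]
# Only the entries that are visible in the matrix; the rest are assumed 0.
similarity = {("a", "b"): 0.1, ("b", "B"): 0.9, ("c", "C"): 0.9}

mapping = join_similar(labels, similarity)
print([mapping[x] for x in labels])  # ['a', 'b', 'c', 'b', 'c', 'd']
```

Joining by threshold like this amounts to single-linkage clustering cut at a fixed level, so a hierarchical clustering routine (for instance the one in scipy.cluster.hierarchy) may be the ready-made equivalent. Whether it fits this data is untested here.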

And finally, in case you'd like to know or it's important: this is meant to be a scraper, written in Python, that identifies listed data on a public transport web page for XBMC on a Raspberry Pi.
Ahoi, raspi

Re: find repeating items

Postby raspi177 » Tue Dec 04, 2012 16:15

I'm going to use just XPath, no pattern detection. I was also thinking of another way: this website's public transport info, listing train departures and such, is the same all the time except for the train name and time fields, which are the ones I'm interested in. But that's still too complicated.
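For anyone reading along, the XPath route might look roughly like this. It is only a sketch: the markup, the class names and the departures function are invented for illustration, since the real page structure isn't shown here. It uses the limited XPath support in the standard library's xml.etree.ElementTree, which only works on well-formed markup. A real HTML page would probably need a forgiving parser such as lxml.html instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical departure board; the real page will look different.
SAMPLE = """
<table id="departures">
  <tr class="dep"><td class="time">16:20</td><td class="train">S1</td></tr>
  <tr class="dep"><td class="time">16:35</td><td class="train">RE 5</td></tr>
</table>
"""


def departures(markup):
    """Return (time, train) pairs from every row marked class="dep"."""
    root = ET.fromstring(markup.strip())
    rows = root.findall(".//tr[@class='dep']")
    return [
        (row.findtext("td[@class='time']"), row.findtext("td[@class='train']"))
        for row in rows
    ]


print(departures(SAMPLE))  # [('16:20', 'S1'), ('16:35', 'RE 5')]
```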

Ahoi and good luck

