Self-organizing maps (som)

A self-organizing map (SOM) is an unsupervised learning algorithm that infers a low-dimensional (typically two-dimensional), discretized representation of the input space, called a map. The map preserves the topological properties of the input space: cells that are close in the map contain data instances that are similar to each other.

Inference of Self-Organizing Maps

The main class for inference of self-organizing maps is SOMLearner. The class initializes the topology of the map and returns an inference object that, given the data, optimizes the map:

import Orange
som = Orange.projection.som.SOMLearner(map_shape=(8, 8), 
         initialize=Orange.projection.som.InitializeRandom)
data = Orange.data.Table("iris.tab")
map = som(data)
class Orange.projection.som.SOMLearner(map_shape=(5, 10), initialize=0, topology=0, neighbourhood=0, batch_train=True, learning_rate=0.05, radius_ini=3, radius_fin=1, epochs=1000, solver=<class 'Orange.projection.som.Solver'>, **kwargs)

Takes an input data set, projects the data instances onto a map, and returns the result as a classifier that holds the projection information together with an algorithm to project new data instances. It uses Map to represent the projection space and Solver for training, and returns a trained map (a SOMMap) with information on how the training data were projected.

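Parameters:
  • map_shape (tuple) – width and height of the map
  • initialize (InitializeLinear or InitializeRandom) – map initialization type id
  • topology (HexagonalTopology or RectangularTopology) – topology type id
  • neighbourhood (NeighbourhoodGaussian, NeighbourhoodBubble, or NeighbourhoodEpanechicov) – neighborhood function id
  • batch_train (bool) – if True, run the batch training algorithm (default); otherwise use the sequential one
  • learning_rate (float) – learning rate for the sequential training algorithm
  • radius_ini (int) – initial radius
  • radius_fin (int) – final radius
  • epochs (int) – number of training epochs
  • solver (Solver) – class implementing the map optimization algorithm
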
class Orange.projection.som.SOMMap(map=[], data=[])

Project the data onto the inferred self-organizing map.

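Parameters:
  • map (Map) – the trained self-organizing map
  • data (Orange.data.Table) – the data projected onto the map
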
__call__(instance, what=0)

Map instance onto the best matching node and predict its class using the majority/mean of the training data in that node.

__getitem__(val)

Return the node at position (x, y) given by val.

__iter__()

Iterate over all nodes in the map.

get_best_matching_node(instance)

Return the best matching node for a given data instance.
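The best matching node is the node whose prototype vector is closest to the instance. A minimal, library-independent sketch of this search follows. It assumes squared Euclidean distance and stores the map as a plain dict from (x, y) positions to prototype vectors; the function name best_matching_node and this layout are hypothetical and are not Orange's API.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_matching_node(nodes, instance):
    """Return the position of the node whose prototype is closest to instance.

    nodes: dict mapping (x, y) positions to prototype vectors (hypothetical layout).
    """
    return min(nodes, key=lambda pos: squared_distance(nodes[pos], instance))


# A toy 2 x 2 map with two-dimensional prototypes.
nodes = {
    (0, 0): [0.0, 0.0],
    (0, 1): [1.0, 0.0],
    (1, 0): [0.0, 1.0],
    (1, 1): [1.0, 1.0],
}
print(best_matching_node(nodes, [0.9, 0.2]))  # (0, 1)
```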

Topology

Orange.projection.som.HexagonalTopology

Hexagonal topology, cells are hexagon-shaped.

Orange.projection.som.RectangularTopology

Rectangular topology, cells are square-shaped.
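The two topologies differ in where cell centres sit in the plane, and therefore in which cells count as neighbours. The sketch below uses a common convention for hexagonal grids: odd rows are shifted by half a cell and rows are sqrt(3)/2 apart, so each cell is at distance 1 from its six neighbours. Whether Orange uses exactly these coordinates is an assumption, and unit_coords here is a hypothetical helper.

```python
import math


def unit_coords(width, height, hexagonal=True):
    """Return plane coordinates (x, y) of all cells, listed row by row."""
    coords = []
    for j in range(height):
        for i in range(width):
            if hexagonal:
                # Shift odd rows by half a cell and pack rows sqrt(3)/2 apart,
                # so each cell is at distance 1 from its six neighbours.
                coords.append((i + 0.5 * (j % 2), j * math.sqrt(3) / 2))
            else:
                # Square cells: interior cells have four neighbours at distance 1.
                coords.append((float(i), float(j)))
    return coords


hexagonal = unit_coords(3, 2)
rectangular = unit_coords(3, 2, hexagonal=False)
print(rectangular[4])  # (1.0, 1.0)
```

In the hexagonal grid the first cell of the second row touches both of the first two cells of the first row, which is what gives each interior cell six neighbours instead of four.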

Map initialization

Orange.projection.som.InitializeLinear

Data instances are initially assigned to cells according to their two-dimensional PCA projection.

Orange.projection.som.InitializeRandom

Data instances are initially randomly assigned to cells.
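One common way to obtain a random initial map is to draw every component of every node vector uniformly between the minimum and maximum of that feature in the data; the random node vectors then decide which cell each instance initially falls into. Whether InitializeRandom does exactly this is an assumption, initialize_random is a hypothetical helper, and the linear (PCA-based) scheme is not shown.

```python
import random


def initialize_random(data, n_nodes, rng=None):
    """Return n_nodes random prototype vectors inside the data's bounding box."""
    rng = rng or random.Random()
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # One vector per node, each component uniform in [min, max] of its feature.
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
            for _ in range(n_nodes)]


data = [[5.1, 3.5], [7.0, 3.2], [6.3, 2.5]]
prototypes = initialize_random(data, 4, random.Random(0))
for vector in prototypes:
    print(vector)
```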

Node neighbourhood

Orange.projection.som.NeighbourhoodGaussian

Gaussian (smoothed) neighborhood.

Orange.projection.som.NeighbourhoodBubble

Bubble (crisp) neighborhood.

Orange.projection.som.NeighbourhoodEpanechicov

Epanechnikov (cut and smoothed) neighborhood.
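A neighbourhood function weights how strongly a node at grid distance d from the best matching node is pulled toward a data instance, given the current radius r. The sketch uses the textbook forms: Gaussian exp(-d^2 / (2 r^2)), bubble 1 for d <= r and 0 otherwise, and Epanechnikov max(0, 1 - (d / r)^2). That Orange implements exactly these formulas is an assumption.

```python
import math


def gaussian(d, radius):
    """Smooth weight that decays with distance but never reaches zero."""
    return math.exp(-d ** 2 / (2 * radius ** 2))


def bubble(d, radius):
    """Crisp weight: full update inside the radius, none outside."""
    return 1.0 if d <= radius else 0.0


def epanechnikov(d, radius):
    """Smooth inside the radius and cut to zero outside it."""
    return max(0.0, 1.0 - (d / radius) ** 2)


for d in (0.0, 1.0, 2.0, 3.0):
    print("d=%.1f gaussian=%.3f bubble=%.1f epanechnikov=%.2f"
          % (d, gaussian(d, 2.0), bubble(d, 2.0), epanechnikov(d, 2.0)))
```

This makes the "smoothed", "crisp" and "cut and smoothed" labels concrete: only the Gaussian still moves nodes beyond the radius.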

Supervised Learning with Self-Organizing Maps

Supervised learning requires class-labeled data. For training, class information is first added to the data instances as regular features by extending the feature vectors accordingly. Next, the map is trained and the training data are projected to nodes. Each node is then labeled with the majority class of the instances projected to it. The dimensions corresponding to the class features are then removed from the prototype vector of each node in the map. For classification, a data instance is projected to the best matching cell, and the class associated with that cell is returned.
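The labeling and classification steps can be sketched without Orange. The helpers majority_per_node and classify are hypothetical, the prototypes are toy values whose class dimensions have already been removed, and distance is assumed to be squared Euclidean. As a design choice, classify only considers nodes that received training data, since an empty node has no majority class; how Orange treats empty nodes is not stated here.

```python
from collections import Counter, defaultdict


def majority_per_node(assignments):
    """Map each node to the most frequent class among instances projected to it.

    assignments: iterable of (node_position, class_label) pairs.
    """
    votes = defaultdict(Counter)
    for pos, label in assignments:
        votes[pos][label] += 1
    return {pos: counts.most_common(1)[0][0] for pos, counts in votes.items()}


def classify(prototypes, node_class, instance):
    """Return the class of the closest node that received training data."""
    def squared_distance(pos):
        return sum((a - b) ** 2 for a, b in zip(prototypes[pos], instance))
    return node_class[min(node_class, key=squared_distance)]


# Toy prototypes; node (1, 1) received no training instances.
prototypes = {(0, 0): [5.0, 3.4], (0, 1): [6.8, 3.0], (1, 1): [9.0, 9.0]}
assignments = [((0, 0), "setosa"), ((0, 0), "setosa"),
               ((0, 1), "virginica"), ((0, 1), "versicolor"),
               ((0, 1), "virginica")]
node_class = majority_per_node(assignments)
print(classify(prototypes, node_class, [6.5, 2.9]))  # virginica
```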

The following code trains a classifier and then uses it to classify a random sample of instances from the same data set:

import Orange
import random
learner = Orange.projection.som.SOMSupervisedLearner(map_shape=(4, 4))
data = Orange.data.Table("iris.tab")
classifier = learner(data)
random.seed(50)
for d in random.sample(data, 5):
    print "%-15s originally %-15s" % (classifier(d), d.getclass())
class Orange.projection.som.SOMSupervisedLearner(map_shape=(5, 10), initialize=0, topology=0, neighbourhood=0, batch_train=True, learning_rate=0.05, radius_ini=3, radius_fin=1, epochs=1000, solver=<class 'Orange.projection.som.Solver'>, **kwargs)

SOMSupervisedLearner learns a SOM from an orange.ExampleTable and uses the class information in the learning process. This is achieved by adding one value per class to each training instance, where 1.0 signals class membership and all other values are 0.0. After training, the added values are discarded from the node vectors.

Parameters:
  • data (Orange.data.Table) – class-labeled data set
  • progress_callback – a one-argument function to report on inference progress (in %)
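The class encoding described above amounts to appending a one-hot vector to each instance and later cutting the vectors back to their original length. The helpers augment and strip_class_values are hypothetical; in the learner the stripping is applied to the trained node vectors, and here it is applied to the augmented instances only to show the round trip.

```python
def augment(instances, labels, classes):
    """Append one value per class: 1.0 for the instance's class, 0.0 for the others."""
    return [list(x) + [1.0 if label == c else 0.0 for c in classes]
            for x, label in zip(instances, labels)]


def strip_class_values(vectors, n_features):
    """Discard the appended class values, keeping only the original features."""
    return [list(v[:n_features]) for v in vectors]


classes = ["setosa", "versicolor", "virginica"]
augmented = augment([[5.1, 3.5], [6.3, 2.9]], ["setosa", "virginica"], classes)
print(augmented)  # [[5.1, 3.5, 1.0, 0.0, 0.0], [6.3, 2.9, 0.0, 0.0, 1.0]]
print(strip_class_values(augmented, 2))  # [[5.1, 3.5], [6.3, 2.9]]
```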

Supporting Classes

The actual map optimization algorithm is implemented by the Solver class, which is used by both SOMLearner and SOMSupervisedLearner.

class Orange.projection.som.Solver(**kwargs)

Solver class used to train the map. Supports batch and sequential training. Based on ideas from the SOM Toolbox for Matlab.

Parameters:
  • neighbourhood (NeighbourhoodGaussian, NeighbourhoodBubble, or NeighbourhoodEpanechicov) – neighborhood function id
  • radius_ini (int) – initial radius
  • radius_fin (int) – final radius
  • epochs (int) – number of training epochs
  • batch_train (bool) – if True, run the batch training algorithm (default); otherwise use the sequential one
  • learning_rate (float) – learning rate for the sequential training algorithm

__call__(data, map, progress_callback=None)

Train the map from data. Pass a progress_callback function to report on progress.

alpha(iter)

Compute the learning rate for the given iteration, decreasing from learning_rate at the start of training to 0 at the end.

radius_seq(iter)

Compute the neighbourhood radius for the given iteration (counted in iterations, not epochs).
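Both schedules can be sketched as linear interpolation over the iterations. The linear shape is an assumption (the text only fixes the end points for the learning rate), the defaults are taken from the SOMLearner signature, and the helpers alpha and radius are hypothetical stand-ins for the Solver methods. In sequential training an iteration is presumably one instance update, so n_iterations would be the number of epochs times the number of instances.

```python
def alpha(iteration, n_iterations, learning_rate=0.05):
    """Learning rate decreasing linearly from learning_rate to 0 over training."""
    if n_iterations <= 1:
        return learning_rate
    return learning_rate * (1.0 - float(iteration) / (n_iterations - 1))


def radius(iteration, n_iterations, radius_ini=3.0, radius_fin=1.0):
    """Neighbourhood radius shrinking linearly from radius_ini to radius_fin."""
    if n_iterations <= 1:
        return radius_fin
    t = float(iteration) / (n_iterations - 1)
    return radius_ini + (radius_fin - radius_ini) * t


n = 101
for it in (0, 50, 100):
    print("iteration %d: alpha=%.4f radius=%.2f" % (it, alpha(it, n), radius(it, n)))
```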

train_batch(progress_callback=None)

Batch training algorithm.

train_sequential(progress_callback)

Sequential training algorithm.

train_step_batch(epoch)

A single step of batch training algorithm.
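A single batch step can be sketched with the standard batch-SOM rule: find the best matching node of every instance, then replace each node vector by the mean of all instances weighted by the neighbourhood function of the grid distance between that node and each instance's best matching node. The sketch assumes a Gaussian neighbourhood, squared Euclidean distance and plain lists; batch_step is a hypothetical helper, not the Solver method itself.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def batch_step(prototypes, coords, data, radius):
    """One batch-SOM step; returns new prototype vectors.

    prototypes: list of node vectors; coords: grid (x, y) position of each node;
    data: list of instance vectors; radius: current neighbourhood radius.
    """
    # 1. Best matching node of every instance, found in data space.
    bmus = [min(range(len(prototypes)),
                key=lambda j: squared_distance(prototypes[j], x))
            for x in data]
    new_prototypes = []
    for j, cj in enumerate(coords):
        numerator = [0.0] * len(data[0])
        denominator = 0.0
        for x, b in zip(data, bmus):
            # 2. Gaussian weight from the grid distance between node j and the BMU of x.
            h = math.exp(-squared_distance(cj, coords[b]) / (2 * radius ** 2))
            denominator += h
            numerator = [n + h * xi for n, xi in zip(numerator, x)]
        # 3. The node vector becomes the neighbourhood-weighted mean of the data.
        new_prototypes.append([n / denominator for n in numerator])
    return new_prototypes


coords = [(0, 0), (1, 0)]
data = [[0.0], [0.2], [0.8], [1.0]]
print(batch_step([[0.0], [1.0]], coords, data, radius=0.1))
```

With a tiny radius each node moves to the mean of its own instances; with a very large radius all nodes collapse toward the global mean, which is why the radius is shrunk during training.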

train_step_sequential(epoch, indices=None)

A single step of sequential training algorithm.
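In sequential (online) training the instances are presented one at a time and every node is moved toward the current instance by a fraction alpha * h of the difference, where h is the neighbourhood weight relative to the best matching node. The sketch assumes a Gaussian neighbourhood and squared Euclidean distance; sequential_step is hypothetical, and reading indices as the presentation order of the instances is an assumption.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sequential_step(prototypes, coords, data, alpha, radius, indices=None):
    """One pass of sequential SOM training; updates prototypes in place.

    indices: optional order in which instances are presented (assumed meaning);
    defaults to the data order.
    """
    order = range(len(data)) if indices is None else indices
    for i in order:
        x = data[i]
        # Best matching node for this instance, using the current prototypes.
        bmu = min(range(len(prototypes)),
                  key=lambda j: squared_distance(prototypes[j], x))
        for j, cj in enumerate(coords):
            # Gaussian neighbourhood weight from the grid distance to the BMU.
            h = math.exp(-squared_distance(cj, coords[bmu]) / (2 * radius ** 2))
            # Move node j toward x by a fraction alpha * h of the difference.
            prototypes[j] = [m + alpha * h * (xi - m)
                             for m, xi in zip(prototypes[j], x)]
    return prototypes


prototypes = sequential_step([[0.0], [1.0]], [(0, 0), (1, 0)], [[0.4]],
                             alpha=0.5, radius=0.1)
print(prototypes)
```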

The Map class stores the self-organizing map, which is composed of Node objects. The code below (som-node.py) shows how to access the information stored in a node of the map:

import Orange
som = Orange.projection.som.SOMLearner(map_shape=(5, 5))
map = som(Orange.data.Table("iris.tab"))
node = map[3, 3]

print "Node position: (%d, %d)" % node.pos
print "Data instances in the node:", len(node.instances)
class Orange.projection.som.Map(map_shape=(20, 40), topology=0)

Self-organizing map (the data structure). Includes methods for map initialization.

map_shape

A two element tuple containing the map width and height.

topology

Topology of the map (HexagonalTopology or RectangularTopology).

map

The self-organizing map, stored as a list of lists of Node objects.

__getitem__(pos)

Return the node at position x, y.

__iter__()

Iterate over all nodes in the map.

initialize_map_linear(data, map_shape=(10, 20))

Initialize the map node vectors linearly over the subspace of the two most significant eigenvectors.

initialize_map_random(data=None, dimension=5)

Initialize the map node vectors randomly, given either the training data or the dimension of the data.

unit_coords()

Return the unit coordinates of all nodes in the map as a numpy.array.

unit_distances()

Return an N x N numpy.array of internode distances (based on node positions in the map, not in the vector space), where N is the number of nodes.
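The internode distance matrix can be sketched from the unit coordinates alone, as Euclidean distances between node positions in the map. The sketch uses plain lists instead of numpy.array to stay self-contained; unit_distances here is a hypothetical stand-in for the Map method, and using Euclidean distance between unit coordinates is an assumption.

```python
import math


def unit_distances(coords):
    """Return an N x N list of grid distances between all pairs of nodes."""
    return [[math.hypot(xa - xb, ya - yb) for (xb, yb) in coords]
            for (xa, ya) in coords]


# A rectangular 2 x 2 map with nodes listed row by row.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
for row in unit_distances(coords):
    print(["%.3f" % d for d in row])
```

These map-space distances, not distances between node vectors, are what the neighbourhood functions are evaluated on during training.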

vectors()

Return all vectors of the map as rows of a numpy.array.

class Orange.projection.som.Node(pos, map=None, vector=None)

An object holding information about a node in the map.

pos

Node position.

reference_instance

Reference data instance (a prototype).

instances

Data set with training instances that were mapped to the node.

Examples

The following code (som-mapping.py) infers a self-organizing map from the Iris data set. The map is rather small and consists of only 9 cells. We optimize the network and then report how many data instances were mapped into each cell. The second part of the code reports on the data instances in cell (0, 1):

import Orange

import random
random.seed(0)

som = Orange.projection.som.SOMLearner(map_shape=(3, 3),
                initialize=Orange.projection.som.InitializeRandom)
map = som(Orange.data.Table("iris.tab"))

print "Node    Instances"
print "\n".join(["%s  %d" % (str(n.pos), len(n.instances)) for n in map])

i, j = 0, 1
print
print "Data instances in cell (%d, %d):" % (i, j)
for e in map[i, j].instances:
    print e

The output of this code is:

Node    Instances
(0, 0)  31
(0, 1)  7
(0, 2)  0
(1, 0)  24
(1, 1)  7
(1, 2)  50
(2, 0)  10
(2, 1)  21
(2, 2)  0

Data instances in cell (0, 1):
[6.9, 3.1, 4.9, 1.5, 'Iris-versicolor']
[6.7, 3.0, 5.0, 1.7, 'Iris-versicolor']
[6.3, 2.9, 5.6, 1.8, 'Iris-virginica']
[6.5, 3.2, 5.1, 2.0, 'Iris-virginica']
[6.4, 2.7, 5.3, 1.9, 'Iris-virginica']
[6.1, 2.6, 5.6, 1.4, 'Iris-virginica']
[6.5, 3.0, 5.2, 2.0, 'Iris-virginica']