Self-organizing maps (som)¶
A self-organizing map (SOM) is an unsupervised learning algorithm that infers a low-dimensional (typically two-dimensional), discretized representation of the input space, called a map. The map preserves topological properties of the input space, so that cells that are close to each other in the map contain data instances that are similar to each other.
Inference of Self-Organizing Maps¶
The main class for inference of self-organizing maps is SOMLearner. The class initializes the topology of the map and returns an inference object that, given the data, optimizes the map:
import Orange
som = Orange.projection.som.SOMLearner(map_shape=(8, 8),
    initialize=Orange.projection.som.InitializeRandom)
data = Orange.data.Table("iris.tab")
map = som(data)
- class Orange.projection.som.SOMLearner(map_shape=(5, 10), initialize=0, topology=0, neighbourhood=0, batch_train=True, learning_rate=0.05, radius_ini=3, radius_fin=1, epochs=1000, solver=<class 'Orange.projection.som.Solver'>, **kwargs)¶
Considers an input data set, projects the data instances onto a map, and returns a result in the form of a classifier holding projection information together with an algorithm to project new data instances. Uses Map for representation of projection space, Solver for training, and returns a trained map with information on projection of the training data as crafted by SOMMap.
Parameters: - map_shape (tuple) – dimension of the map
- initialize (InitializeRandom or InitializeLinear) – initialization type id; linear initialization assigns the data to the cells according to its position in two-dimensional principal component projection
- topology (HexagonalTopology or RectangularTopology) – topology type id
- neighbourhood (NeighbourhoodGaussian, NeighbourhoodBubble, or NeighbourhoodEpanechicov) – cell neighborhood type id
- batch_train (bool) – perform batch training?
- learning_rate (float) – learning rate
- radius_ini (int) – initial radius
- radius_fin (int) – final radius
- epochs (int) – number of epochs (iterations of training steps)
- solver – a class with the optimization algorithm
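The neighbourhood and radius parameters above can be illustrated with a small sketch in plain Python, independent of Orange. The three kernels below are common textbook forms of the Gaussian, bubble, and Epanechnikov neighbourhoods, and linear_radius is a hypothetical schedule that shrinks the radius from radius_ini to radius_fin. None of this is confirmed to match the library's implementation.

```python
import math


def gaussian(d, radius):
    """Smoothly decaying weight; positive at every distance."""
    return math.exp(-(d ** 2) / (2.0 * radius ** 2))


def bubble(d, radius):
    """Hard cutoff: full weight inside the radius, none outside."""
    return 1.0 if d <= radius else 0.0


def epanechnikov(d, radius):
    """Inverted parabola, clipped to zero outside the radius."""
    return max(0.0, 1.0 - (d / float(radius)) ** 2)


def linear_radius(epoch, epochs, radius_ini=3, radius_fin=1):
    """Assumed schedule: shrink linearly from radius_ini to radius_fin."""
    if epochs <= 1:
        return float(radius_fin)
    return radius_ini - (radius_ini - radius_fin) * (epoch / float(epochs - 1))
```

Here d is the distance between two cells measured on the map grid, not in the feature space. A larger radius early in training moves many cells together; a smaller one later fine-tunes individual cells.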
- class Orange.projection.som.SOMMap(map=[], data=[])¶
Project the data onto the inferred self-organizing map.
Parameters: - map (SOMMap) – a trained self-organizing map
- data (Orange.data.Table) – the data to be mapped on the map
- __call__(instance, what=0)¶
Map instance onto the best matching node and predict its class using the majority/mean of the training data in that node.
- __getitem__(val)¶
Return the node at position x, y.
- __iter__()¶
Iterate over all nodes in the map.
- get_best_matching_node(instance)¶
Return the best matching node for a given data instance.
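The idea behind a best matching node lookup can be sketched in plain Python as a nearest-prototype search. The function names, the dictionary layout, and the prototype values below are illustrative assumptions, not the library's data structures; squared Euclidean distance is assumed as the similarity measure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_position(prototypes, instance):
    """Return the grid position whose prototype vector is closest to instance.

    prototypes maps (x, y) positions to prototype vectors (hypothetical layout).
    Positions are sorted first so that ties are broken deterministically.
    """
    return min(sorted(prototypes),
               key=lambda pos: squared_distance(prototypes[pos], instance))


# Made-up prototype vectors for a three-cell map with four features.
prototypes = {
    (0, 0): [5.0, 3.4, 1.5, 0.2],
    (0, 1): [5.9, 2.8, 4.3, 1.3],
    (1, 0): [6.5, 3.0, 5.5, 2.0],
}
bmu = best_matching_position(prototypes, [6.4, 2.7, 5.3, 1.9])
# bmu == (1, 0)
```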
Topology¶
- Orange.projection.som.HexagonalTopology¶
Hexagonal topology; cells are hexagon-shaped.
- Orange.projection.som.RectangularTopology¶
Rectangular topology; cells are square-shaped.
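The practical difference between the two topologies is which cells count as neighbours. The sketch below assumes one common layout for hexagonal grids, in which every other row is shifted by half a cell and rows are spaced sqrt(3)/2 apart. Whether Orange places cells exactly this way is an assumption, and unit_coords here is a standalone function, not the Map method.

```python
import math


def unit_coords(map_shape, hexagonal=True):
    """Map each grid position (x, y) to planar coordinates (assumed layout)."""
    width, height = map_shape
    coords = {}
    for x in range(width):
        for y in range(height):
            if hexagonal:
                # Odd rows are shifted right by half a cell.
                coords[(x, y)] = (x + 0.5 * (y % 2), y * math.sqrt(3) / 2.0)
            else:
                coords[(x, y)] = (float(x), float(y))
    return coords


def grid_distance(coords, a, b):
    """Euclidean distance between two cells in map coordinates."""
    (ax, ay), (bx, by) = coords[a], coords[b]
    return math.hypot(ax - bx, ay - by)


hexa = unit_coords((3, 3), hexagonal=True)
rect = unit_coords((3, 3), hexagonal=False)
```

Under this layout, cell (0, 1) is at distance 1 from both (0, 0) and (1, 0) on the hexagonal grid, whereas on the rectangular grid (1, 0) is a diagonal neighbour of (0, 1) at distance sqrt(2).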
Supervised Learning with Self-Organizing Maps¶
Supervised learning requires class-labeled data. For training, class information is first added to the data instances as regular features by extending the feature vectors accordingly. Next, the map is trained and the training data is projected to the nodes. Each node is then assigned the majority class of the training instances projected to it. The dimensions corresponding to the class features are then removed from the prototype vector of each node in the map. For classification, a data instance is projected to its best matching cell, and the class associated with that cell is returned.
An example of the code that trains and then classifies on the same data set is:
import Orange
import random
learner = Orange.projection.som.SOMSupervisedLearner(map_shape=(4, 4))
data = Orange.data.Table("iris.tab")
classifier = learner(data)
random.seed(50)
for d in random.sample(data, 5):
    print "%-15s originally %-15s" % (classifier(d), d.getclass())
- class Orange.projection.som.SOMSupervisedLearner(map_shape=(5, 10), initialize=0, topology=0, neighbourhood=0, batch_train=True, learning_rate=0.05, radius_ini=3, radius_fin=1, epochs=1000, solver=<class 'Orange.projection.som.Solver'>, **kwargs)¶
SOMSupervisedLearner is a class used to learn a SOM from an Orange.data.Table by using the class information in the learning process. This is achieved by adding one value per class to each training instance, where 1.0 signals class membership and all other values are 0.0. After training, the added values are discarded from the node vectors.
Parameters: - data (Orange.data.Table) – class-labeled data set
- progress_callback – a one-argument function to report on inference progress (in %)
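The class-indicator extension described above can be sketched in plain Python. The helper names extend_with_class and strip_class are hypothetical and only illustrate the encoding; they are not part of Orange.

```python
def extend_with_class(features, label, class_values):
    """Append one indicator per class: 1.0 for the instance's class, else 0.0."""
    indicators = [1.0 if label == value else 0.0 for value in class_values]
    return list(features) + indicators


def strip_class(vector, n_class_values):
    """Drop the trailing class indicators from a (node) vector."""
    return vector[:len(vector) - n_class_values]


class_values = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
extended = extend_with_class([6.9, 3.1, 4.9, 1.5], "Iris-versicolor", class_values)
# extended == [6.9, 3.1, 4.9, 1.5, 0.0, 1.0, 0.0]
restored = strip_class(extended, len(class_values))
# restored == [6.9, 3.1, 4.9, 1.5]
```

Because the indicators take part in the distance computation during training, instances of the same class are pulled toward the same region of the map. Stripping them afterwards lets unlabeled instances be matched on the original features alone.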
Supporting Classes¶
The actual map optimization algorithm is implemented by the Solver class, which is used by both SOMLearner and SOMSupervisedLearner.
- class Orange.projection.som.Solver(**kwargs)¶
SOM Solver class used to train the map. Supports batch and sequential training. Based on ideas from the SOM Toolkit for Matlab.
Parameters: - neighbourhood (NeighbourhoodGaussian, NeighbourhoodBubble, or NeighbourhoodEpanechicov) – neighborhood function id
- radius_ini (int) – initial radius
- radius_fin (int) – final radius
- epochs (int) – number of training iterations
- batch_train (bool) – if True run the batch training algorithm (default), else use the sequential one
- learning_rate (float) – learning rate for the sequential training algorithm
- __call__(data, map, progress_callback=None)¶
Train the map from data. Pass progress_callback function to report on the progress.
- alpha(iter)¶
Compute the learning rate for the given iteration; the rate decays from learning_rate at the start to 0 at the end of training.
- radius_seq(iter)¶
Compute the neighbourhood radius as a function of iterations, not epochs.
- train_batch(progress_callback=None)¶
Batch training algorithm.
- train_sequential(progress_callback)¶
Sequential training algorithm.
- train_step_batch(epoch)¶
A single step of the batch training algorithm.
- train_step_sequential(epoch, indices=None)¶
A single step of the sequential training algorithm.
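To make the roles of alpha, radius_seq, and the sequential training step more concrete, here is a minimal sketch of a sequential SOM update in plain Python. It assumes a linear decay for both the learning rate and the radius, a Gaussian neighbourhood, and the standard update rule w += alpha * h * (x - w). The function bodies and signatures are illustrative assumptions, not the Solver implementation.

```python
import math
import random


def alpha(iteration, total, learning_rate=0.05):
    """Assumed linear decay from learning_rate to 0 at the end of training."""
    return learning_rate * (1.0 - iteration / float(total))


def radius_seq(iteration, total, radius_ini=3.0, radius_fin=1.0):
    """Assumed linear decay of the radius, indexed by iteration, not epoch."""
    return radius_ini - (radius_ini - radius_fin) * (iteration / float(total))


def train_sequential(data, vectors, positions, epochs=20, seed=0):
    """Present instances one at a time and pull nearby nodes toward each."""
    rng = random.Random(seed)
    total = epochs * len(data)
    iteration = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for index in order:
            x = data[index]
            # Best matching node: closest prototype in feature space.
            bmu = min(range(len(vectors)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(vectors[k], x)))
            rate = alpha(iteration, total)
            radius = radius_seq(iteration, total)
            for k, w in enumerate(vectors):
                # Distance is measured on the map grid, not in feature space.
                d = math.hypot(positions[k][0] - positions[bmu][0],
                               positions[k][1] - positions[bmu][1])
                h = math.exp(-d ** 2 / (2.0 * radius ** 2))
                vectors[k] = [wi + rate * h * (xi - wi)
                              for wi, xi in zip(w, x)]
            iteration += 1
    return vectors
```

Batch training differs in that all instances are assigned to their best matching nodes first and every node vector is then recomputed once per epoch as a neighbourhood-weighted average, so no learning rate is needed.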
The Map class stores the self-organizing map, which is composed of Node objects. The code below (som-node.py) shows how to access the information stored in a node of the map:
import Orange
som = Orange.projection.som.SOMLearner(map_shape=(5, 5))
map = som(Orange.data.Table("iris.tab"))
node = map[3, 3]
print "Node position: (%d, %d)" % node.pos
print "Data instances in the node:", len(node.instances)
- class Orange.projection.som.Map(map_shape=(20, 40), topology=0)¶
Self-organizing map (the structure). Includes methods for data initialization.
- map_shape¶
A two element tuple containing the map width and height.
- topology¶
Topology of the map (HexagonalTopology or RectangularTopology).
- __getitem__(pos)¶
Return the node at position x, y.
- __iter__()¶
Iterate over all nodes in the map.
- initialize_map_linear(data, map_shape=(10, 20))¶
Initialize the map node vectors linearly over the subspace of the two most significant eigenvectors.
- initialize_map_random(data=None, dimension=5)¶
Initialize the map node vectors randomly, by supplying either the training data or the dimension of the data.
- unit_coords()¶
Return the unit coordinates of all nodes in the map as a numpy.array.
- unit_distances()¶
Return an NxN numpy.array of internode distances (based on node position in the map, not vector space), where N is the number of nodes.
- vectors()¶
Return all vectors of the map as rows in a numpy.array.
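The accessors listed above can be pictured with a tiny stand-in class written in plain Python. TinyMap is hypothetical: it supports only a rectangular grid, stores nodes as (position, vector) pairs, and returns nested lists where the library returns numpy arrays.

```python
import math


class TinyMap(object):
    """Hypothetical stand-in for Map; rectangular grid only."""

    def __init__(self, map_shape, dimension):
        self.map_shape = map_shape
        width, height = map_shape
        # Nodes are ordered by x, then y; all vectors start at zero.
        self.nodes = [((x, y), [0.0] * dimension)
                      for x in range(width) for y in range(height)]

    def __getitem__(self, pos):
        """Return the (position, vector) pair at grid position x, y."""
        _, height = self.map_shape
        x, y = pos
        return self.nodes[x * height + y]

    def __iter__(self):
        return iter(self.nodes)

    def vectors(self):
        """All node vectors, one row per node."""
        return [vector for _, vector in self.nodes]

    def unit_distances(self):
        """N x N distances between node positions on the grid."""
        positions = [pos for pos, _ in self.nodes]
        return [[math.hypot(ax - bx, ay - by) for bx, by in positions]
                for ax, ay in positions]


tiny = TinyMap((3, 3), dimension=4)
distances = tiny.unit_distances()
```

For a 3 by 3 map, distances has 9 rows and 9 columns, is symmetric, and has zeros on the diagonal. These distances depend only on the grid layout, so they can be computed once before training.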
Examples¶
The following code (som-mapping.py) infers a self-organizing map from the Iris data set. The map is rather small and consists of only 9 cells. We optimize the network and then report how many data instances were mapped into each cell. The second part of the code reports on the data instances from one of the edge cells, (0, 1):
import Orange
import random
random.seed(0)
som = Orange.projection.som.SOMLearner(map_shape=(3, 3),
    initialize=Orange.projection.som.InitializeRandom)
map = som(Orange.data.Table("iris.tab"))
print "Node Instances"
print "\n".join(["%s %d" % (str(n.pos), len(n.instances)) for n in map])
i, j = 0, 1
print
print "Data instances in cell (%d, %d):" % (i, j)
for e in map[i, j].instances:
    print e
The output of this code is:
Node Instances
(0, 0) 31
(0, 1) 7
(0, 2) 0
(1, 0) 24
(1, 1) 7
(1, 2) 50
(2, 0) 10
(2, 1) 21
(2, 2) 0
Data instances in cell (0, 1):
[6.9, 3.1, 4.9, 1.5, 'Iris-versicolor']
[6.7, 3.0, 5.0, 1.7, 'Iris-versicolor']
[6.3, 2.9, 5.6, 1.8, 'Iris-virginica']
[6.5, 3.2, 5.1, 2.0, 'Iris-virginica']
[6.4, 2.7, 5.3, 1.9, 'Iris-virginica']
[6.1, 2.6, 5.6, 1.4, 'Iris-virginica']
[6.5, 3.0, 5.2, 2.0, 'Iris-virginica']
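The instances listed for cell (0, 1) also illustrate the majority rule that SOMMap.__call__ is described as using for prediction. The sketch below tallies the class labels from the output above in plain Python. How the library breaks ties is not stated in this documentation, so the tally is only an illustration.

```python
from collections import Counter

# Instances reported for cell (0, 1) in the output above.
cell_instances = [
    [6.9, 3.1, 4.9, 1.5, "Iris-versicolor"],
    [6.7, 3.0, 5.0, 1.7, "Iris-versicolor"],
    [6.3, 2.9, 5.6, 1.8, "Iris-virginica"],
    [6.5, 3.2, 5.1, 2.0, "Iris-virginica"],
    [6.4, 2.7, 5.3, 1.9, "Iris-virginica"],
    [6.1, 2.6, 5.6, 1.4, "Iris-virginica"],
    [6.5, 3.0, 5.2, 2.0, "Iris-virginica"],
]

# The class label is the last element of each instance.
votes = Counter(instance[-1] for instance in cell_instances)
majority_class, count = votes.most_common(1)[0]
# majority_class == "Iris-virginica", count == 5
```

A new instance whose best matching node is cell (0, 1) would therefore be predicted as Iris-virginica under a majority rule, even though two of the seven training instances in that cell are Iris-versicolor.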