Linear projection (linear)¶
Linear transformation of the data might provide a unique insight into the data through observation of the optimized projection or through visualization of the space with reduced dimensionality.
This module contains the FreeViz linear projection optimization algorithm, PCA and FDA, and utility classes for classification of instances based on kNN in the linearly transformed space.
Methods in this module use a given data set to optimize a linear projection of features into a new vector space. The transformation is returned as a Projector instance that, when invoked, projects any given data whose domain matches the domain that was used to optimize the projection.
- class Orange.projection.linear.Projector(**kwds)¶
Stores a linear projection of data and uses it to transform any given data with matching input domain.
Parameters: dataset (Orange.data.Table) – input data set Return type: Orange.data.Table
Array containing means of each variable in the data set that was used to construct the projection.
Domain of the data set that was used to construct principal component subspace.
Domain used in returned data sets. This domain has a continuous variable for each axis in the projected space, and no class variable(s).
Array containing projection (vectors that describe the transformation from input to output domain).
An array containing standard deviations of each variable in the data set that was used to construct the projection.
True if standardization was used when constructing the projection. If set, instances are standardized before being projected.
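Applying a stored projection, as described by the attributes above, amounts to centering, optional standardization, and a matrix product. A minimal numpy sketch of that step (the function and variable names here are hypothetical illustrations, not Orange's API):

```python
import numpy as np

def project(X, mean, stdev, projection, standardize=True):
    """Apply a stored linear projection to a data matrix X.

    Centers X with the stored means, optionally divides by the stored
    standard deviations, then multiplies by the projection vectors
    (one row per output dimension).
    """
    Xc = X - mean
    if standardize:
        Xc = Xc / stdev
    return Xc.dot(projection.T)

# Example: project 2-D points onto their first (standardized) coordinate.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean = X.mean(axis=0)
stdev = X.std(axis=0, ddof=1)
P = np.array([[1.0, 0.0]])  # a single projection vector
projected = project(X, mean, stdev, P)
```

Each row of the projection array describes one axis of the output domain, so the result has one column per retained component.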
Principal Component Analysis (pca)¶
PCA uses an orthogonal transformation to transform input features into a set of uncorrelated features called principal components. This transformation is defined in such a way that the first principal component has the highest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
Because PCA is sensitive to the relative scaling of the original variables, the default behaviour of the PCA class is to standardize the input data.
Optimizer and Projector¶
- class Orange.projection.linear.PCA(standardize=True, max_components=0, variance_covered=1, use_generalized_eigenvectors=0, ddof=1)¶
Orthogonal transformation of data into a set of uncorrelated variables called principal components. This transformation is defined in such a way that the first variable has the highest possible variance.
- standardize (boolean) – perform standardization of the data set.
- max_components (int) – maximum number of retained components.
- variance_covered (float) – percent of the variance to cover with components.
- use_generalized_eigenvectors (boolean) – use generalized eigenvectors (i.e. multiply the data matrix with the inverse of its covariance matrix).
Perform a PCA analysis on a data set and return a linear projector that maps data into principal component subspace.
Parameters: dataset (Orange.data.Table) – input data set. Return type: PcaProjector
Delta degrees of freedom used in numpy operations; a value of 1 means normalization by (N-1) in covariance and standard-deviation computations.
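As an illustration of how the parameters above interact (standardization, ddof, and variance_covered), here is a self-contained numpy sketch of the computation a PCA fit performs. It is a conceptual sketch of the algorithm, not Orange's implementation:

```python
import numpy as np

def pca_fit(X, standardize=True, variance_covered=1.0, ddof=1):
    """Return principal-component vectors and their variances."""
    mean = X.mean(axis=0)
    Xc = X - mean
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=ddof)
    # Covariance matrix, normalized by (N - ddof).
    cov = np.cov(Xc, rowvar=False, ddof=ddof)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by decreasing variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep just enough components to cover the requested variance.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, variance_covered)) + 1
    k = min(k, len(eigvals))
    return eigvecs[:, :k], eigvals[:k]

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
vectors, variances = pca_fit(X)
```

With standardization and matching ddof, each input variable contributes unit variance, so the component variances sum to the number of features.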
- class Orange.projection.linear.PcaProjector(**kwds)¶
- biplot(filename=None, components=(0, 1), title="Biplot")¶
Draw a biplot for PCA. The actual projection must be performed via pca(data) before biplot can be used.
- scree_plot(filename=None, title="Scree Plot")¶
Draw a scree plot of the principal components.
Sum of all variances in the data set that was used to construct the PCA space.
Array containing variances of principal components.
The following example demonstrates a straightforward invocation of PCA (pca-run.py):
    import Orange

    iris = Orange.data.Table("iris.tab")
    pca = Orange.projection.linear.Pca(iris)
    transformed_data = pca(iris)
    print pca
The call to the Pca constructor returns an instance of PcaProjector, which is later used to transform data into the PCA feature space. Printing the projector displays how much variance is covered by the first few components. The projector can also be used to access the transformation vectors (eigen_vectors) and the variances of the principal components (eigen_values). A scree plot can help when deciding how many components to keep (pca-scree.py):
    import Orange

    iris = Orange.data.Table("iris.tab")
    pca = Orange.projection.linear.Pca()(iris)
    pca.scree_plot("pca-scree.png")
Fisher discriminant analysis (fda)¶
As a variant of LDA (Linear Discriminant Analysis), FDA finds the linear combination of features that best separates two or more classes.
Optimizer and Projector¶
- class Orange.projection.linear.Fda¶
Construct a linear projection of data using FDA. When using this projection optimization method, data is always standardized prior to being projected.
If data instances are provided to the constructor, the optimization algorithm is called and the resulting projector (FdaProjector) is returned instead of the optimizer (instance of this class).
Return type: Fda or FdaProjector
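For two classes, Fisher's criterion has a well-known closed form: the discriminant direction is proportional to Sw^-1 (m1 - m0), where Sw is the within-class scatter matrix and m0, m1 are the class means. A minimal numpy sketch of that idea (a conceptual illustration, not Orange's Fda implementation):

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher discriminant: the direction maximizing
    between-class separation relative to within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix (sum of per-class scatter).
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(3, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
w = fisher_direction(X, y)
```

Projecting the data onto w separates the two class means as far as the within-class scatter allows.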
Freeviz (Demsar et al, 2005) is a method that finds a good two-dimensional linear projection of the given data, where quality is defined by the separation of instances from different classes and the proximity of instances from the same class. FreeViz would normally be used through a widget, since it is primarily a method for graphical exploration of the data. About the only case where one would like to use this module directly is to test the classification aspects of the method, that is, to verify the accuracy of the resulting kNN-like classifiers on a set of benchmark data sets.
A description of the method itself is beyond the scope of this page. See the paper below for the original version of the method; at the time of writing, the method has been substantially extended but not yet published, though the basic principles are the same.
 Janez Demsar, Gregor Leban, Blaz Zupan: FreeViz - An Intelligent Visualization Approach for Class-Labeled Multidimensional Data Sets, Proceedings of IDAMAP 2005, Edinburgh.
- class Orange.projection.linear.FreeViz(graph=None)¶
Provides an easy-to-use interface to the core of the method, which is written in C++. It differs from other linear projection optimizers in that it can itself store the data, making iterative optimization and visualization possible. It can, however, still be used like any other projection optimizer, by calling (__call__) it.
Perform FreeViz optimization on the dataset, if given, and return a resulting linear Projector. If no dataset is given, the projection currently stored within the FreeViz object is returned as a Projector.
Parameters: dataset (Orange.data.Table) – input data set. Return type: Projector
If set, the forces are balanced so that the total sum of the attractive forces equals the total sum of the repulsive forces before they are multiplied by the above factors. (In our experience this gives bad results, so you may want to leave it alone.)
The sigma to be used in LAW_GAUSSIAN and LAW_KNN.
Can be LAW_LINEAR, LAW_SQUARE, LAW_GAUSSIAN, LAW_KNN or LAW_LINEAR_PLUS. Default is LAW_LINEAR, which means that the attractive forces increase linearly with the distance and the repulsive forces are inversely proportional to the distance. LAW_SQUARE makes them rise or fall with the square of the distance, LAW_GAUSSIAN is based on a kind of log-likelihood estimation, LAW_KNN tries to directly optimize the classification accuracy of the kNN classifier in the projection space, and in LAW_LINEAR_PLUS both forces rise with the square of the distance, yielding a method somewhat similar to PCA. We found that the first law performs best, with the second not far behind.
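The force laws themselves are implemented in C++ and not exposed here; the following numpy sketch only illustrates the idea behind LAW_LINEAR (same-class pairs attract proportionally to their distance, different-class pairs repel inversely to it). It is an assumption-laden illustration, not FreeViz's actual computation:

```python
import numpy as np

def linear_law_forces(P, y):
    """Net force on each projected point under a linear-law scheme
    (illustrative): same-class pairs attract proportionally to their
    distance, different-class pairs repel inversely to their distance."""
    n = P.shape[0]
    F = np.zeros_like(P)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = P[j] - P[i]
            dist = np.linalg.norm(d) + 1e-12  # guard against coincident points
            if y[i] == y[j]:
                F[i] += d            # attraction grows linearly with distance
            else:
                F[i] -= d / dist**2  # repulsion falls off with distance
    return F

# Two same-class points pull toward each other; two different-class
# points push apart.
forces_attract = linear_law_forces(np.array([[0.0, 0.0], [2.0, 0.0]]),
                                   np.array([0, 0]))
forces_repel = linear_law_forces(np.array([[0.0, 0.0], [2.0, 0.0]]),
                                 np.array([0, 1]))
```

The anchors are then moved along the accumulated forces, which is what the optimization steps below iterate.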
If enabled, it keeps the projection of the second attribute on the upper side of the graph (the first is always on the right-hand x-axis). This is useful when comparing whether two projections are the same, but it has no effect on the projection’s clarity or its classification accuracy. There are some more, undocumented, attributes of a more internal nature.
- optimize_separation(steps=10, single_step=False, distances=None)¶
Optimize the class separation. If you did not change any of the settings that are not documented above, this calls a fast C++ routine that makes steps optimization steps at a time, after which the graph (if one is given) is updated. If single_step is True, it does this only once; otherwise it keeps calling the routine and compares the current positions of the anchors with those from 50 calls ago. If no anchor has moved by more than 1e-3, it stops. In Orange Canvas the optimization also stops if someone outside (namely, the stop button) sets the FreeViz flag attribute Orange.projection.linear.FreeViz.cancel_optimization.
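The stopping rule (compare anchor positions with those from 50 calls ago and stop once no anchor has moved by more than 1e-3) can be sketched generically; step_fn below is a stand-in for the C++ optimization routine:

```python
import numpy as np

def optimize_until_stable(anchors, step_fn, window=50, tol=1e-3,
                          max_calls=10000):
    """Repeatedly call step_fn(anchors) -> new anchors; stop when no
    anchor has moved more than tol since `window` calls ago."""
    history = [np.array(anchors, dtype=float)]
    for _ in range(max_calls):
        history.append(np.array(step_fn(history[-1])))
        if len(history) > window:
            moved = np.abs(history[-1] - history[-1 - window]).max()
            if moved < tol:
                break
    return history[-1]

# Toy step function: anchors decay toward the origin, so the movement
# over any 50-call window eventually drops below the tolerance.
result = optimize_until_stable(np.ones((4, 2)), lambda a: a * 0.9)
```

A real run would also check an external cancellation flag inside the loop, which is how the Canvas stop button interrupts the optimization.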
Reset the projection so that the anchors (projections of attributes) are placed evenly around the circle.
Set the projection to a random one.
FreeViz can be used in code to optimize a linear projection to two dimensions:
    import Orange

    zoo = Orange.data.Table('zoo')
    optimizer = Orange.projection.linear.FreeViz()
    projector = optimizer(zoo)
    for e, projected in zip(zoo, projector(zoo))[:10]:
        print e, projected
Learner and Classifier¶
- class Orange.projection.linear.FreeVizLearner(freeviz=None, **kwd)¶
If data instances are provided to the constructor, the learning algorithm is called and the resulting classifier is returned instead of the learner.
- class Orange.projection.linear.FreeVizClassifier(dataset, freeviz)¶
A kNN classifier on the 2D projection of the data, optimized by FreeViz.
Usually the learner (Orange.projection.linear.FreeVizLearner) is used to construct the classifier.
When constructing the classifier manually, the following parameters can be passed:
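The principle behind such a classifier (kNN applied in the optimized two-dimensional projection) can be sketched with numpy alone; this illustrates the idea, not the FreeVizClassifier implementation:

```python
import numpy as np

def knn_predict(P_train, y_train, P_query, k=5):
    """Majority vote among the k nearest training points in the
    (projected) 2-D space."""
    preds = []
    for q in P_query:
        dists = np.linalg.norm(P_train - q, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(int(np.argmax(np.bincount(nearest))))
    return np.array(preds)

# Two well-separated 2-D clusters stand in for a projected data set.
rng = np.random.RandomState(0)
P = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(4, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
pred = knn_predict(P, y, np.array([[0.0, 0.0], [4.0, 4.0]]))
```

A good FreeViz projection makes the classes form such compact, separated groups, which is exactly what makes the subsequent kNN step accurate.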