wiki:MatrixFactorization

Version 3 (modified by MarinkaZitnik, 3 years ago) (diff)

Matrix Factorization Techniques for Data Mining

Unified and efficient interface to matrix factorization algorithms and methods. A scripting library which includs a number of published factorization algorithms and initialization methods and facilitates the combination of these to produce new strategies.

Extensive documentation with working examples which demonstrate real applications, commonly used benchmark data and visualization methods are provided to help with the interpretation and comprehension of the results.

Overview

List of algorithms for matrix factorization.

nmf

Standard NMF based on Kullbach Leibler divergence, simple multiplicative updates, enhanced to avoid numerical overflow [2].
Reference: (Brunet, 2004).

s-nmf

Sparse NMF based on alternating non-negativity constrained least squares, solved by a fast non-negativity constrained least squares. Sparseness imposed on the left, right factor.
Reference: (Kim, 2007).

ns-nmf

Non-smooth NMF. Uses a modifed version of Lee and Seung's multiplicative updates for Kullbach-Leibler divergence. It is meant to give sparser results.
Reference: (Pascual-Montano, 2006).

l-nmf

Local Fisher NMF. Add the Fisher constraint to maximize the between-class scatter and minimize the within-class scatter.
Reference: (Wang, 2004).

ls-nmf

Alternating Least Square NMF. It is meant to be very fast compared to other approaches.
Reference: (Kim, 2007).

pmf

Probabilistic MF. Model which scales linearly with the number of observations and performs well on large, sparse, imbalanced datasets.
Reference: (Salakhutdinov, 2008).

nnma

Nonnegative matrix approximation. Method for dimensionality reduction with respect on the nonnegativity of input data. Multiplicative iterative scheme.
Reference: (Sra, 2006).

psmf

Probabilistic Sparse MF. PSMF allows for varying levels of sensor noise, uncertainty in the hidden prototypes used to explain the data and uncertainty as to the prototypes selected to explain each data vector.
Reference: (Dueck, 2005).

bd

Bayesian decomposition. A Bayesian treatment of NMF, based on a normal likelihood and exponential priors, Gibbs sampler to approximate the posterior density.
Reference: (Schmidt, 2003).

bfrm

Bayesian factor regression model. Markov chain Monte Carlo technique.
Reference: (Schmidt, 2003).

i-nmf

Interval-valued NMF.
Reference: (Shen, 2010).

i-pmf

Interval-valued PMF.
Reference: (Shen, 2010).

Milestones

April 20 – May 23 (Before the official coding time)

  • To do some self coding to improve my further understanding of techniques.
  • I will become absolutely clear about my future implementations, design and approaches I will follow.

May 23 – June 18 (Official coding period starts)

  • Interface to perform all algorithms and combine them with initialization methods and extensions.
  • Implementing family of NMF techniques: NMF, nsNMF, lsNMF.

June 18 – July 5

  • Implementing family of NMF techniques: sNMF, lNMF, NNMA.

July 5 – July 15

  • Extend NMF implementations with various extensions, additive and multiplicative update rules and initialization methods.
  • Provide factorization quality measures: measure based on cophenetic correlation coefficient, sparseness, dispersion, residuals.

July 15th mid-term evaluation deadline

July 15 – July 25

  • Implement Bayesian methods. Bayesian decomposition using Gibbs sampler, MCMC techniques such Bayesian factor regression modeling.
  • Handling PMF on large, sparse and unbalanced datasets (algorithm for probabilistic sparse matrix factorization (PSMF)).

July 25 – July 31

  • Adapt PMF model to the interval-valued matrices and implement Interval-valued PMF (I-PMF) and Interval-valued NMF (I-NMF).
  • Improve efficieny of the code, bug removal, exception handling, additional testing.

August 1 – August 15

  • For Documentation.
  • Devise working examples that demonstrate various types of applications.

August 15 – August 22

  • A buffer for unpredictable delay.

Redundant time

  • Extend Bayesian methods (variational BD, linearly constrained BD).

August 26th final evaluation deadline

References

  • Brunet, J. P., Tamayo, P., Golub, T. R., and Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA, 2004, 101(12), 4164–4169.
  • Kim, H., Park, H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics (Oxford, England), 2007, 23(12), 1495–502.
  • Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., and Pascual-Marqui, R. D. Nonsmooth nonnegative matrix factorization (nsnmf). IEEE transactions on pattern analysis and machine intelligence, 2006 28(3), 403–415.
  • Wang, Y., Turk, M. Fisher non-negative matrix factorization for learning local features, 2004.
  • Salakhutdinov, R., Mnih, A. Probabilistic Matrix Factorization Learning, ICML, 2008.
  • Sra, S.,Dhillon, I. S. Nonnegative Matrix Approximation : Algorithms and Applications. Sciences-New York, 2006, 1–36.
  • Zhiyong Shen, Liang Du, Xukun Shen, Yidong Shen, Interval-valued Matrix Factorization with Applications, Data Mining, IEEE International Conference on, pp. 1037–1042, 2010 IEEE International Conference on Data Mining, 2010.
  • Dueck, D., Morris, Q. D., Frey, B. J. Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics (Oxford, England), 21 Suppl 1, 2005, 144–51.
  • Schmidt, M. N., Winther, O., Kai Hansen, L. Bayesian non-negative matrix factorization. Bayesian Statistics 7 (Oxford), 2003.
  • Ochs, M. F.,Kossenkov A. V. NIH Public Access. Methods, Methods Enzymol., 2009, 59–77.