## Gretchen Schillemat and Dr. Jeffrey Humpherys, Mathematics

The purpose of this project was to study the nonnegative matrix factorization (NNMF) problem: given a nonnegative matrix V, find the “best” nonnegative matrices W and H such that V ≈ WH. Our main focus was the case in which the data are sparse.
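Written out, the problem is the constrained minimization below. Here V is m × n and the inner dimension r (the number of components) is a modeling choice. The report does not state the values of m, n, or r used, so they are left symbolic. The second relation reads the factorization column by column: each data column v_j is approximated by a nonnegative combination of the columns w_k of W, with weights taken from column h_j of H.

```latex
\min_{W \in \mathbb{R}_{\ge 0}^{m \times r},\; H \in \mathbb{R}_{\ge 0}^{r \times n}}
  \; \lVert V - WH \rVert,
\qquad
v_j \approx W h_j = \sum_{k=1}^{r} h_{kj}\, w_k \quad (j = 1, \dots, n).
```

Because every h_{kj} and every entry of w_k is nonnegative, components can only add to one another and never cancel. This is why NNMF tends to produce parts-based factors such as individual chemicals or notes.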

We began by exploring how best to approximate the decomposition of the data matrix V. Several norms can measure the approximation error ‖V − WH‖, including the ℓ1, ℓ2, and ℓ∞ norms. The ℓ2 norm is the most common choice in the literature, but because we focused on sparsity, we wanted to investigate whether the ℓ1 norm would give a better estimate. For our applications, sparsity meant that many of the entries in V were zero, and we expected the factors W and H to contain many zero entries as well. We therefore wanted to determine which norm best preserved sparsity when factoring a sparse data matrix V. In these initial explorations, we found that the ℓ1 norm was the most successful at decomposing sparse data into sparse factors.
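To make the norm comparison concrete, here is a minimal NumPy sketch. It is not the authors' code; the report does not say which algorithm they used.

It assumes the three norms are taken entrywise:
- ‖A‖₁ = Σ|aᵢⱼ|
- ‖A‖₂ = (Σ aᵢⱼ²)^½
- ‖A‖∞ = max |aᵢⱼ|

For the ℓ2 objective it uses the standard Lee-Seung multiplicative updates. For ℓ1 it swaps in an iteratively reweighted heuristic that weights each entry by 1/max(|V − WH|, δ), so that the weighted squared error approximates the ℓ1 error. The function names `nmf`, `residual_norms`, and `sparsity`, the zero threshold `tol`, and the floor `delta` are all hypothetical illustrative choices.

```python
import numpy as np

def residual_norms(V, W, H):
    """Entrywise l1, l2 (Frobenius) and l-infinity norms of the residual V - WH."""
    R = V - W @ H
    return {"l1": float(np.abs(R).sum()),
            "l2": float(np.sqrt((R ** 2).sum())),
            "linf": float(np.abs(R).max())}

def sparsity(A, tol=1e-3):
    """Fraction of entries that are numerically zero (|a| <= tol, tol is arbitrary)."""
    return float(np.mean(np.abs(A) <= tol))

def nmf(V, r, loss="l2", iters=500, delta=1e-3, eps=1e-9, seed=0):
    """Multiplicative-update NMF (illustrative sketch).

    loss="l2": Lee-Seung updates for ||V - WH||_F^2.
    loss="l1": heuristic reweighting; each pass weights entries by
               1 / max(|V - WH|, delta), so the weighted squared error
               approximates sum |V - WH| (delta smooths it near zero).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        WH = W @ H
        if loss == "l1":
            M = 1.0 / np.maximum(np.abs(V - WH), delta)  # reweighting for l1
        else:
            M = np.ones_like(V)                          # plain l2
        # Weighted multiplicative updates: ratios of nonnegative terms keep W, H >= 0.
        H *= (W.T @ (M * V)) / (W.T @ (M * WH) + eps)
        WH = W @ H
        W *= ((M * V) @ H.T) / ((M * WH) @ H.T + eps)
    return W, H

if __name__ == "__main__":
    # Hypothetical sparse test matrix: product of sparse nonnegative factors.
    rng = np.random.default_rng(1)
    W_true = rng.random((20, 3)) * (rng.random((20, 3)) < 0.3)
    H_true = rng.random((3, 15)) * (rng.random((3, 15)) < 0.3)
    V = W_true @ H_true
    for loss in ("l2", "l1"):
        W, H = nmf(V, 3, loss=loss)
        print(loss, residual_norms(V, W, H),
              "sparsity W:", sparsity(W), "sparsity H:", sparsity(H))
```

The only difference between the two objectives in this sketch is the weight matrix `M`. This makes it easy to see how the choice of norm changes which entries of the residual the updates try hardest to shrink.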

Once we had worked with the various norms, we began to explore different applications of NNMF. Our first application was mass spectrometry, in which a spectrogram V is factored into unique chemical elements and the concentrations of those elements. We had hoped to use real mass spectrometry data, but suitable data proved difficult to find. Instead, we created an “artificial” spectrogram consisting of three different elements and their corresponding concentrations, and then factored it using each of the norms discussed above. The ℓ1 norm preserved sparsity and accurately identified the three most prominent chemicals in the spectrogram. The other norms often identified the three prominent chemicals as well, but they also reported small concentrations of other chemicals that were not actually present.

Although we lacked real mass spectrometry data, music samples were easy to find, so we also explored music decomposition using NNMF. The general problem of music decomposition is: given a music sample, decompose it into the instruments and notes being played at each point in time. Here V is a time-based data matrix, W consists of instrument and note vectors, and H contains the weight of each instrument and note at each time. Music decomposition required extensive preliminary research, because we first had to understand the musical features commonly used in music identification. The most widely used set of such features is defined by MPEG-7, a standard from the Moving Picture Experts Group. MPEG-7 standardizes a set of quantitative measures of audio-visual features, called descriptors, together with the structure in which those descriptors are organized [1].
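An artificial spectrogram like the one described above can be built and factored as follows. This is a minimal sketch, not the authors' data or code.

It makes several hypothetical assumptions:
- Each of the three "chemicals" is a stick spectrum with three peaks at made-up bin positions.
- The concentrations are random.
- The factorization uses plain ℓ2 (Lee-Seung) updates.

Because W and H can trade a scale factor between them, the helper `component_shares` normalizes each recovered spectrum to unit mass before comparing how much concentration each component carries. Running with r = 5 deliberately over-estimates the number of chemicals. This shows how any extra components appear as small shares, analogous to the spurious small concentrations mentioned in the text.

```python
import numpy as np

def mu_nmf(V, r, iters=1000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def artificial_spectrogram(n_bins=60, n_samples=8, seed=1):
    """Hypothetical test data: three 'chemicals', each a sparse stick spectrum."""
    rng = np.random.default_rng(seed)
    peaks = [(5, 12, 30), (18, 40, 41), (25, 50, 55)]  # made-up peak bins
    W_true = np.zeros((n_bins, len(peaks)))
    for k, bins in enumerate(peaks):
        W_true[list(bins), k] = rng.uniform(0.5, 1.0, size=len(bins))
    H_true = rng.uniform(0.1, 1.0, size=(len(peaks), n_samples))  # concentrations
    return W_true @ H_true, W_true, H_true

def component_shares(W, H):
    """Share of total concentration carried by each recovered component.

    Each column of W is rescaled to sum to 1 and H absorbs the scale, so
    (W / scale) @ (scale * H) == W @ H and the shares are comparable.
    """
    scale = W.sum(axis=0) + 1e-12
    Hn = H * scale[:, None]
    return Hn.sum(axis=1) / Hn.sum()

if __name__ == "__main__":
    V, W_true, H_true = artificial_spectrogram()
    for r in (3, 5):  # r = 5 over-estimates the true number of chemicals
        W, H = mu_nmf(V, r)
        print(r, np.round(component_shares(W, H), 3))
```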

Once we were familiar with MPEG-7 and other musical features, we began applying NNMF to simple music samples, decomposing short samples into the instruments and specific notes being played at each moment in time. Each test signal was formed by adding together simple single-note samples. Figure 1 shows the decomposition of a sample formed by adding three notes together. Although the decomposition revealed the notes and instruments that were played, it also showed instruments that were not actually present in the sample. We continued to work on improving the technique but were unable to complete this portion of the project.
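A summed-note experiment of this kind can be sketched in NumPy. This is not the authors' pipeline; they worked with real single-note recordings and MPEG-7 features, which are not reproduced here.

The sketch makes several hypothetical assumptions:
- Pure sine "notes" at 220, 330, and 440 Hz with staggered, overlapping onsets.
- A sample rate of 8 kHz.
- A Hann-windowed magnitude spectrogram with 1024-sample frames and a 256-sample hop.
- Plain ℓ2 (Lee-Seung) NMF with r = 3.

V is frequency × time, each column of W is a spectral template, and each row of H is that template's activation over time. The printed peak frequencies are quantized to the STFT bin spacing of 8000/1024 = 7.8125 Hz.

```python
import numpy as np

SR = 8000     # assumed sample rate (Hz)
N_FFT = 1024  # assumed STFT frame length (samples)
HOP = 256     # assumed hop between frames (samples)

def tone(freq, start, stop, dur=2.0, sr=SR):
    """A pure sine 'note' that sounds only between start and stop (seconds)."""
    t = np.arange(int(dur * sr)) / sr
    return ((t >= start) & (t < stop)) * np.sin(2 * np.pi * freq * t)

def magnitude_spectrogram(x, n_fft=N_FFT, hop=HOP):
    """|STFT| with a Hann window; rows are frequency bins, columns are frames."""
    win = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mu_nmf(V, r, iters=1000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

if __name__ == "__main__":
    # Three hypothetical notes added together, entering at different times.
    x = tone(220, 0.0, 1.2) + tone(330, 0.4, 1.6) + tone(440, 0.8, 2.0)
    V = magnitude_spectrogram(x)   # frequency x time
    W, H = mu_nmf(V, 3)            # W: note templates, H: activations over time
    freqs = np.fft.rfftfreq(N_FFT, d=1 / SR)
    print("template peaks (Hz):", np.sort(freqs[W.argmax(axis=0)]))
```

Abrupt note onsets spread energy across many frequency bins, so even this idealized signal need not separate perfectly. Increasing r beyond the true number of notes is one simple way to probe how extra components pick up energy that belongs to no actual note.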

I presented some of the preliminary results at the Spring Research Conference for the College of Physical and Mathematical Sciences in March 2010. The presentation focused on the general nonnegative matrix factorization problem and on the investigation of the various norms used in the factorization. Some of the preliminary explorations with mass spectrometry were also presented. Future research will probably continue to focus on music decomposition and on accurately identifying the instruments and notes in a music sample.