Music Cognition, Perception, and Pattern Recognition

Efforts to have computers mimic human behavior can only succeed if the computer uses the same kinds of criteria as a human. We have explored whether a computer can recognize patterns in music in a way similar to a human being. The result of this work is a system for the machine recognition of musical patterns [1-7].

We frame the recognition of musical patterns as a classical pattern recognition problem. The main difference between our approach and traditional ones lies in the error criterion used to judge how well a scanned pattern matches a target musical pattern: it is derived from psychological studies of human music perception, notably the pioneering work of C. Krumhansl. By incorporating research in music perception and cognition, the music recognition system becomes a bit more like a human being, using what is known about how humans perceive, memorize, and reproduce musical patterns. When a human being attempts to reproduce a (possibly incorrectly) memorized piece of music, say by singing or humming it, they are likely to introduce errors. However, these errors are, more often than not, musically meaningful ones. It is precisely this type of knowledge that the music recognition system exploits.
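The matching step described above can be illustrated with a minimal sketch: a target pattern is slid across a longer note sequence, and an error function scores each alignment. Everything here is an assumption made for illustration. The names `scan_for_pattern` and `interval_error` are hypothetical, pitches are taken to be MIDI note numbers, and the placeholder error simply compares successive pitch intervals; it is not the perceptual criterion of [1-7].

```python
from typing import Callable, List, Sequence, Tuple

ErrorFn = Callable[[Sequence[int], Sequence[int]], float]


def interval_error(target: Sequence[int], candidate: Sequence[int]) -> float:
    """Placeholder (non-perceptual) error: sum of absolute differences
    between successive pitch intervals, so transposition costs nothing."""
    t_steps = [b - a for a, b in zip(target, target[1:])]
    c_steps = [b - a for a, b in zip(candidate, candidate[1:])]
    return float(sum(abs(x - y) for x, y in zip(t_steps, c_steps)))


def scan_for_pattern(
    target: Sequence[int],
    sequence: Sequence[int],
    error_fn: ErrorFn = interval_error,
) -> Tuple[int, float]:
    """Slide the target over the sequence; return (best offset, its error)."""
    n = len(target)
    if n == 0 or n > len(sequence):
        raise ValueError("target must be non-empty and no longer than sequence")
    errors: List[float] = [
        error_fn(target, sequence[i:i + n])
        for i in range(len(sequence) - n + 1)
    ]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# A three-note rising figure, found transposed at offset 2.
best_offset, best_error = scan_for_pattern([60, 62, 64], [55, 57, 67, 69, 71, 60])
```

Passing the error function as a parameter keeps the scanning loop independent of the criterion, so a perceptually motivated error could be substituted without changing the search.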

The perceptual error criterion we developed takes into account both rhythm and pitch information. The pitch error consists of an objective pitch error and a perceptual error. The latter incorporates algorithms for determining the localized tonal context in a piece of music [1,2,5,7]. The key-finding algorithm in [7] provides robust estimates of the tonal context during modulations and when there are deviations of very short duration from the prevailing local tonal context. The system also provides the means for determining the relative weight given to the objective and perceptual pitch errors. This is based on how ‘confident’ the key-finding algorithm is in its choice of local key. The weight given to the perceptual rhythm error is determined by a measure of the rhythm complexity [4,6].
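As a rough illustration of how such weighting might work, the sketch below blends the two pitch errors with a convex combination driven by key confidence, then adds a weighted rhythm error. The convex form, the additive total, the [0, 1] ranges, and the function names are all assumptions made for this example; the actual formulas are given in the cited papers, and the mapping from rhythm complexity to a weight is not specified here, so the weight is passed in directly.

```python
def combined_pitch_error(
    objective_error: float,
    perceptual_error: float,
    key_confidence: float,
) -> float:
    """Blend objective and perceptual pitch errors (assumed convex combination).

    key_confidence is assumed to lie in [0, 1]; higher confidence in the
    local key shifts weight toward the perceptual error.
    """
    if not 0.0 <= key_confidence <= 1.0:
        raise ValueError("key_confidence must be in [0, 1]")
    return (1.0 - key_confidence) * objective_error + key_confidence * perceptual_error


def total_error(pitch_error: float, rhythm_error: float, rhythm_weight: float) -> float:
    """Add a weighted rhythm error to the pitch error (assumed additive form).

    rhythm_weight stands in for a value derived from a rhythm complexity
    measure; how it is derived is left unspecified in this sketch.
    """
    if rhythm_weight < 0.0:
        raise ValueError("rhythm_weight must be non-negative")
    return pitch_error + rhythm_weight * rhythm_error


# With a fairly confident key estimate, the perceptual term dominates.
pitch = combined_pitch_error(objective_error=4.0, perceptual_error=2.0, key_confidence=0.75)
score = total_error(pitch, rhythm_error=1.0, rhythm_weight=0.5)
```

When the key estimate is unreliable (confidence near 0), this form falls back on the objective pitch error alone, which matches the intent described above.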


  1. I. Shmulevich, E. J. Coyle, “Establishing the Tonal Context for Musical Pattern Recognition,” Proceedings of the 1997 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 1997.
  2. I. Shmulevich, E. J. Coyle, “The Use of Recursive Median Filters for Establishing the Tonal Context in Music,” Proceedings of the 1997 IEEE Workshop on Nonlinear Signal and Image Processing, Mackinac Island, MI, 1997.
  3. E. J. Coyle, I. Shmulevich, “A System for Machine Recognition of Music Patterns,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, 1998.
  4. I. Shmulevich, D. Povel, “Rhythm Complexity Measures for Music Pattern Recognition,” Proceedings of IEEE Workshop on Multimedia Signal Processing, Redondo Beach, CA, December 7-9, 1998.
  5. O. Yli-Harja, I. Shmulevich, K. Lemström, “Graph-based Smoothing of Class Data with Applications in Musical Key Finding,” Proceedings of IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, Antalya, Turkey, June 20-23, 1999.
  6. I. Shmulevich, O. Yli-Harja, E. J. Coyle, D. Povel, K. Lemström, “Perceptual Issues in Music Pattern Recognition – Complexity of Rhythm and Key Finding,” Computers and the Humanities, Vol. 35, pp. 23–35, 2001.
  7. I. Shmulevich, O. Yli-Harja, “Localized Key-Finding: Algorithms and Applications,” Music Perception, Vol. 17, No. 4, pp. 531–544, 2000.
  8. I. Shmulevich, “A Note on the Pitch Contour Similarity Index,” Journal of New Music Research, Vol. 33, No. 1, pp. 7–18, 2004.