## The MusArt Music-Retrieval System## Appendix 1: The Hidden Markov ModelWilliam Birmingham, Bryon Pardo, Colin Meek, and Jonah Shifrin |

## Representation of a QueryAs shown in Figure 1, the user's query is transcribed by a
MIDI is to a digital audio recording of music as ASCII is to a bitmap image of a page of text. Note events in MIDI are specified by three integer values in the range 0 to 127. The first value describes the
The pitch tracker divides the input file into 10 millisecond frames and tracks pitch on a frame-by-frame basis. Contiguous regions of at least five frames (50 millisecond) whose pitch varies less than one musical half step are called notes. The pitch of each note is the average of the pitches of the frames within the note. The pitch tracker returns a sequence of notes, each of which is defined by Figure 1A shows a time-amplitude representation of a sung query, along with example pitch-tracker output (shown as piano roll) and a sequence of values derived from the MIDI representation (the We define the following. - A
*note transition*between note*n*and note*n*+1 is described by the duple <*deltaPitch*,*IOIratio*>.
*deltaPitch*is the difference in pitch between note_{n}*n*and note*n*+1.
- The
*inter onset interval*(*IOI*) is the difference between the onset of notes_{n}*n*and*n*+1.
*IOIratio*is_{n}*IOI*/_{n}*IOI*+1. For the final transition,_{n}*IOI*=_{n}*IOI*/_{n}*duration*._{n+1}
We represent a query as a sequence of In order to reduce the number of possible IOI ratios, we quantize them to a logarithmic scale within a fixed range. A logarithmic scale was selected because data from a pilot study indicated that sung ## Markov RepresentationA Markov model (MM) models a process that goes through a sequence of discrete states, such as notes in a melody. The model is a weighted automaton that consists of: - A set of states,
*S*={*s*,_{1}*s*,_{2}*s*,…,_{3}*s*}._{n}
- A set of transition probabilities between states,
*T*.
- A probability distribution, where is the probability the automaton will begin in state
*s*._{i}
*E*, a subset of*S*containing the legal ending states.
In this model, the probability of transitioning from a given state to another state is assumed to depend only on the current state. This is known as the Markov property. The directed graph in Figure 2A represents a Markov model of a scalar passage of music. States are note transitions. Nodes represent states. The numerical value below each state indicates the probability a traversal of the graph will begin in that state. As a default, we currently assume all states are legal ending states. Directed edges represent transitions. Numerical values by edges indicate transition probabilities. Only transitions with non-zero probabilities are shown. Here, we have implicitly assumed that whenever state
A model that explicitly maintains a probability distribution over the set of possible observations for each state is called a hidden Markov model (HMM). More formally, an HMM requires two things in addition to that required for a standard Markov model: - A set of possible observations,
*O={o*,_{1}*o*,_{2}*o*,…,_{3}*o*}._{n}
- A probability distribution over the set of observations for each state in
*S*.
In our approach, a query is a sequence of observations. Each observation is a note-transition duple, < ## Making Markov Models from MIDIOur system represents musical themes in a database as HMMs. The unique duples characterizing the note transitions found in the MIDI file form the states in the model. Figure 3A shows a passage with eight note transitions characterized by four unique duples. Each unique duple is represented as a state. Once the states are determined for the model, transition probabilities between states are computed by calculating what proportion of the time state
Markov models can be thought of as Note that there is not a one-to-one correspondence between model and observation sequence. A single model may create a variety of observation sequences, and an observation sequence may be generated by more than one model. Recall that our approach defines an observation as a duple, < ## Estimating Observation ProbabilitiesTo make use of the strengths of a hidden Markov model, it is important to model the probability of each observation O, given a hidden state, s.Observations consist of duples < We use a typical method to estimate probabilities [3]. A number of observation-hidden state pairs are collected into a training set and observed frequencies are used as an estimator for expected probabilities. Given a state, o is seen in state _{i}s, compared to the total number of times s is entered.Training our HMM is a difficult task, given the size of the state space. To make the task tractable, we assume that duration errors and pitch errors are conditionally independent. This greatly reduces the number of training cases we need; however, there will be observations that do not occur in the training data. Conditional independence allows us to induce values for the aforementioned 450,000-entry table from the values in two considerably smaller tables; for We cannot assume that a pairing unobserved in the training set will never be observed in an actual query. Thus, we impose a minimal probability, P is set to _{min}P. Probabilities are then normalized so that the sum of all observation probabilities, given a particular hidden state, is again equal to one._{min}## Notes and References[1] Musical Instrument Digital Interface (MIDI) is a standard representation for musical performances or scores. [2] Mazzoni, D. and R.B. Dannenberg. Melody Matching Directly from Audio. in ISMIR. 2001. [3] R. Durbin, S.E., A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids. 1998, Cambridge, UK: Cambridge University Press. ## Copyright© 2002 William Birmingham, Bryan Pardo, Colin Meek, and Jonah Shifrin. | |