Symbolic Screening of Duple vs. Triple Meter Feel in MIDI for RAS

I. Introduction

Rhythmic Auditory Stimulation (RAS) is a widely used motor rehabilitation technique within Neurologic Music Therapy (NMT), particularly for gait training in patients with stroke- or Parkinson's disease-related gait disorders. In clinical practice, the criteria for music selection in RAS are relatively intuitive: music with a regular, stable duple feel, whether in simple duple meters (e.g., 2/4, 4/4) or in certain compound meters (e.g., 6/8), is generally preferred for facilitating gait entrainment.

The effectiveness of RAS has been supported by numerous empirical studies over the past two decades. However, in practical therapy settings, there is often a noticeable shortage of clinically tailored musical material. As a result, selecting appropriate pieces from existing music collections becomes a more realistic solution for therapists. This situation raises the need for efficient methods to screen large music databases according to rhythm-related criteria relevant to RAS.

This essay focuses on a rapid screening approach for categorizing meter feel (specifically duple versus triple) based on symbolic MIDI data. The screening method discussed here constitutes an early filtering module within a larger system designed for RAS music selection. Rather than presenting the system as a whole, the primary aim of this study is to examine this screening layer in depth, with particular attention to its conceptual foundations and algorithmic design.

The system operates in the symbolic domain rather than on audio signals for several practical reasons. First, tempo adjustment is a fundamental requirement in gait training, and MIDI allows precise and lossless tempo modification without affecting sound quality. Second, given the need to process large volumes of music data, symbolic-domain analysis offers a computationally efficient solution. Although audio-based music information retrieval methods can provide higher perceptual fidelity in certain contexts, the purpose of this layer is limited to fast and coarse screening rather than detailed rhythmic interpretation.

In essence, the goal of the proposed screening method is not to perform full meter inference or time-signature recognition, but to identify music with a high likelihood of exhibiting a duple meter feel suitable for RAS. The following sections therefore examine the approach from conceptual assumptions to algorithmic implementation, providing a focused and critical study of this symbolic screening layer.

II. Conceptual Foundations

Before introducing the algorithmic details, it is necessary to clarify several core concepts related to musical time and rhythm perception that underlie the proposed screening approach.

In music perception, tactus, often referred to as the pulse, represents the fundamental temporal unit to which listeners naturally synchronize, such as tapping a foot or nodding the head. Tempo describes the rate at which this tactus is perceived, typically expressed as beats per minute (BPM). When successive tactus beats are organized into recurring patterns of strong and weak accents, a higher-level periodic structure emerges, commonly referred to as the metrical cycle or meter. In simplified terms, tempo specifies how fast the beat proceeds, whereas meter describes how beats are grouped over time.

The hierarchical temporal structure of music gives rise to the periodic nature of rhythm and enables listeners to infer meter. At the same time, this hierarchy, spanning from note-level onsets to beat- or cycle-level groupings, also makes automatic meter estimation a challenging task, as multiple periodicities may coexist and interact within a single piece of music.

In the context of RAS, meter is not treated as a purely music-theoretical construct but as a functional property related to motor entrainment. Although triple-meter music can be useful in specific scenarios (e.g., when walking patterns involve assistive devices like canes), duple meter-dominant music is generally preferred for standard gait training. Consequently, the primary objective of the screening layer is not to identify all possible meter types, but to perform a binary distinction between duple- and triple-dominant periodic grouping in music.

As the representational basis for this analysis, MIDI data offers several benefits. It encodes music as a sequence of discrete symbolic events with explicit on/off times, allowing for highly accurate temporal analysis. At the same time, MIDI lacks timbral, spectral, and fine-grained expressive information present in audio recordings. In clinical applications such as RAS, where precise temporal control and flexible tempo adjustment are prioritized, this trade-off is acceptable and even advantageous. For these reasons, MIDI provides an appropriate representation for the screening approach.

III. Periodicity Inference in the Screening Layer

This section presents the complete processing chain of the fast screening layer and the reasoning behind its algorithmic choices. An onset representation is first established from MIDI data. Two periodicity inference methods, based on tempograms and autocorrelation respectively, are then implemented in parallel. Finally, a fusion score is calculated from these two inference streams.

3.1 Onset Representation

MIDI’s discrete note events are transformed into a quasi‑continuous onset envelope on a uniform temporal grid, allowing periodicities to be analyzed as patterns of a single activation function rather than sparse onsets. To reduce registral dependence, activations are aggregated in the pitch‑class domain, treating octave‑related notes as equivalent.

We define a chroma‑flux style onset activation:
$o[k] = \sum_{i=0}^{11} \max\big(0,\, C_i[k] - C_i[k-1]\big)$,
where $C_i[k]$ denotes the energy of pitch class $i$ at frame $k$. This emphasizes new entries and accent increases and is robust to octave and instrumentation choices.

The representation intentionally discards pitch contour, harmonic function, and melodic direction, focusing purely on periodic grouping tendencies required for fast screening.
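
As a concrete illustration, the following Python sketch builds the onset envelope $o[k]$ from a list of symbolic note events. The note-tuple format, frame rate, and the use of velocity as the per-frame pitch-class energy $C_i[k]$ are illustrative assumptions, not fixed choices of the method.

```python
# Minimal sketch of the chroma-flux onset activation o[k].
# Assumption: notes arrive as (onset_sec, duration_sec, midi_pitch, velocity)
# tuples, extracted beforehand with any MIDI parser (e.g., pretty_midi).
import numpy as np

def chroma_flux(notes, frame_rate=100.0):
    """Return the onset envelope o[k] on a uniform temporal grid."""
    total_dur = max(onset + dur for onset, dur, _, _ in notes)
    n_frames = int(np.ceil(total_dur * frame_rate)) + 1
    chroma = np.zeros((12, n_frames))                  # C_i[k]: pitch-class energy
    for onset, dur, pitch, vel in notes:
        k0 = int(round(onset * frame_rate))
        k1 = int(round((onset + dur) * frame_rate))
        chroma[pitch % 12, k0:max(k1, k0 + 1)] += vel  # fold octaves to pitch class
    # Half-wave rectified frame-to-frame increase, summed over the 12 pitch classes
    diff = np.diff(chroma, axis=1, prepend=chroma[:, :1])
    return np.maximum(diff, 0.0).sum(axis=0)
```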

3.2 Periodicity Analysis

We derive duple and triple evidence from two views of the same onset envelope $o[k]$: a frequency‑domain tempogram and a time‑domain autocorrelation. The tempogram exposes harmonic relations across tempo; the autocorrelation measures cycle regularity directly in time. Using both reduces ambiguity and guards against single‑method failure.

3.2.1 Fourier Tempogram

We compute the Fourier tempogram $T(f,t)$ by applying an STFT to the onset envelope $o[k]$, with $f$ expressed in BPM and analysis parameters chosen to cover gait‑relevant tempos (≈40–240 BPM). Averaging across time yields a global tempogram $\bar{T}(f)$, from which we retain up to $K\leq 3$ dominant tempo candidates $\beta$.
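
The sketch below shows one way to realize such a global tempogram directly from $o[k]$: windowed segments of the envelope are projected onto complex exponentials defined on a BPM grid, and the magnitudes are averaged over time. The window length, hop size, 1-BPM resolution, and the choice of a Hann window are assumptions for illustration.

```python
# Minimal sketch of a global Fourier tempogram over a gait-relevant BPM grid.
import numpy as np

def global_fourier_tempogram(o, frame_rate=100.0, bpm_range=(40, 240),
                             win_sec=8.0, hop_sec=1.0, n_candidates=3):
    """Average |T(f, t)| over time; return (T_bar, bpm_grid, top tempo candidates)."""
    bpm_grid = np.arange(bpm_range[0], bpm_range[1] + 1)    # 1-BPM resolution
    win, hop = int(win_sec * frame_rate), int(hop_sec * frame_rate)
    t = np.arange(win) / frame_rate
    # One complex exponential per candidate tempo, tapered by a Hann window
    kernel = np.exp(-2j * np.pi * (bpm_grid / 60.0)[:, None] * t[None, :]) * np.hanning(win)
    mags = []
    for start in range(0, max(len(o) - win, 0) + 1, hop):
        seg = o[start:start + win]
        if len(seg) < win:                                   # zero-pad the last segment
            seg = np.pad(seg, (0, win - len(seg)))
        mags.append(np.abs(kernel @ seg))                    # |T(f, t)| at this time step
    T_bar = np.mean(mags, axis=0)                            # time-averaged tempogram
    top = bpm_grid[np.argsort(T_bar)[::-1][:n_candidates]]   # dominant tempo candidates
    return T_bar, bpm_grid, top
```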

Grouping tendencies are scored by integrating energy in narrow tolerance windows around harmonic families of each tempo candidate $\beta$: duple $\{0.5\beta,\, \beta,\, 2\beta,\, 4\beta\}$ and triple $\{1.5\beta,\, 3\beta,\, 6\beta\}$. Summing across family members produces $S_{\text{tempo}}^{(d)}$ and $S_{\text{tempo}}^{(t)}$; multiple candidates are combined by a maximum or a weighted sum. Tolerance widths accommodate moderate tempo drift, and optional normalization by total energy makes excerpts comparable.
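
A minimal sketch of this family scoring, assuming a fixed tolerance of a few BPM and simple normalization by total tempogram energy:

```python
# Minimal sketch of the duple/triple harmonic-family scores from the global tempogram.
import numpy as np

def tempogram_family_scores(T_bar, bpm_grid, beta, tol_bpm=3.0):
    """Integrate tempogram energy around duple/triple multiples of tempo beta (BPM)."""
    def band_energy(center_bpm):
        mask = np.abs(bpm_grid - center_bpm) <= tol_bpm      # narrow tolerance window
        return T_bar[mask].sum()
    s_duple = sum(band_energy(m * beta) for m in (0.5, 1.0, 2.0, 4.0))
    s_triple = sum(band_energy(m * beta) for m in (1.5, 3.0, 6.0))
    total = T_bar.sum() + 1e-9                               # optional energy normalization
    return s_duple / total, s_triple / total

# Combining several tempo candidates, e.g. by taking the per-family maximum:
# S_tempo_d = max(tempogram_family_scores(T_bar, bpm_grid, b)[0] for b in top)
```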

3.2.2 Autocorrelation

Autocorrelation provides a complementary time-domain check. We compute a normalized, lightly smoothed $R(\tau)$ (with optional mean removal to reduce slow trends). Given the dominant tempo candidate $\beta$, the tactus period is $T_0 = 60/\beta$. Duple cycles predict peaks near lags $2T_0$ and $4T_0$; triple cycles near $3T_0$ and $6T_0$. Integrating $R(\tau)$ within tolerance windows around these lags yields $S_{\text{acf}}^{(d)}$ and $S_{\text{acf}}^{(t)}$. Weights can favor deeper multiples (e.g., $4T_0$) when local accents are noisy. Tolerances reflect the envelope’s sampling rate and expected expressive timing. Fusion and decision thresholds are introduced in Section 3.3.
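
The following sketch outlines this time-domain check; the smoothing length, the tolerance window of roughly 80 ms, and the equal weighting of the lag multiples are illustrative assumptions.

```python
# Minimal sketch of the autocorrelation-based duple/triple scores.
import numpy as np

def acf_family_scores(o, frame_rate, beta, tol_sec=0.08):
    """Score duple vs. triple cycle evidence from the onset-envelope ACF."""
    x = o - o.mean()                                     # optional mean removal
    x = np.convolve(x, np.ones(5) / 5, mode="same")      # light smoothing
    r = np.correlate(x, x, mode="full")[len(x) - 1:]     # R(tau) for tau >= 0
    r = r / (r[0] + 1e-9)                                # normalize by zero-lag energy
    T0 = 60.0 / beta                                     # tactus period in seconds
    lags_sec = np.arange(len(r)) / frame_rate

    def window_sum(lag_center):
        mask = np.abs(lags_sec - lag_center) <= tol_sec  # tolerance for expressive timing
        return r[mask].sum()

    s_duple = window_sum(2 * T0) + window_sum(4 * T0)    # duple-cycle lags
    s_triple = window_sum(3 * T0) + window_sum(6 * T0)   # triple-cycle lags
    return s_duple, s_triple
```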

3.3 Decision Fusion Strategy

The screening layer combines the two evidence streams into a single decision by exploiting their complementarity and requiring agreement between them. Frequency-domain evidence (tempogram) and time-domain evidence (autocorrelation) fail in different ways, so demanding consistent evidence from both reduces bias and guards against spurious periodicities. A code sketch of the fused decision follows the numbered points below.

1) Multiplicative fusion — agreement as a gate:
The duple and triple evidences are combined by multiplication rather than addition. This approach ensures that a high final score emerges only when both domains agree, suppressing false positives caused by noisy or conflicting cues. In contrast, additive schemes allow one strong but noisy channel to dominate, which can lead to errors in polyrhythmic or sparsely accented passages.

2) Normalization and thresholding — operational reliability:
The combined scores are normalized into a duple likelihood, making the results invariant to excerpt energy and duration. A single decision threshold (e.g., 0.55) is applied to retain pieces whose duple evidence is consistent across both domains. This thresholding provides a tunable balance between precision and recall, ensuring that ambiguous or noisy segments are rejected without over-interpretation.

3) Goal-aligned asymmetry — favoring duple:
Given the application in RAS preselection, the classifier is intentionally asymmetric, focusing on identifying duple-dominant pieces. This asymmetry is reflected in the decision rule, which prioritizes maximizing duple detectability under expressive timing. Strong triple evidence simply routes items out of the duple pool, while optimization focuses on enhancing reliability for known duple exemplars.
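
A minimal sketch of the fused decision referenced above, assuming the four evidence scores from Sections 3.2.1 and 3.2.2 and the example threshold of 0.55:

```python
# Minimal sketch of multiplicative fusion, normalization, and thresholding.
def duple_screening_decision(s_tempo_d, s_tempo_t, s_acf_d, s_acf_t, threshold=0.55):
    """Return (duple_likelihood, passes_screen) from the four evidence scores."""
    fused_d = s_tempo_d * s_acf_d            # high only when both domains agree on duple
    fused_t = s_tempo_t * s_acf_t
    likelihood = fused_d / (fused_d + fused_t + 1e-9)    # normalized duple likelihood
    return likelihood, likelihood >= threshold
```

Strong triple evidence lowers the likelihood and routes the piece out of the duple pool, consistent with the asymmetric decision rule described in point 3.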

IV. Discussion and Future Directions

Strengths of the Approach

The proposed screening layer offers several notable strengths:

  1. Efficiency: By operating in the symbolic domain, the method achieves rapid processing speeds, making it suitable for large-scale music database screening. The use of MIDI data ensures computational efficiency without compromising the accuracy required for coarse filtering.

  2. Interpretability: The algorithm’s reliance on well-defined periodicity measures, such as tempograms and autocorrelation, provides clear and interpretable outputs. This transparency facilitates debugging, validation, and clinical adoption.

  3. Clinical Alignment: The design of the screening layer aligns closely with the practical needs of RAS. By prioritizing duple meter detection, the method directly supports the selection of music suitable for gait training, enhancing its clinical relevance.

Limitations and Edge Cases

Despite its strengths, the approach has certain limitations and edge cases:

  1. Swing and Shuffle Rhythms: The method may struggle with swing or shuffle rhythms, in which subdivisions are unevenly spaced (long-short) even though the underlying beat remains regular. Because a swing ratio close to 2:1 places onsets near a triplet grid, such passages can inflate triple evidence and lead to ambiguous or inconsistent results.

  2. Compound Duple Meters (6/8): Although 6/8 is duple at the beat level, each beat carries a triple subdivision, so the onset envelope exhibits strong periodicity at both duple and triple multiples of the tactus. Depending on which level dominates, this hybrid structure can result in misclassification or reduced confidence scores.

  3. Polymetric MIDI Files: Polymetric compositions, where multiple meters coexist, pose a significant challenge. The algorithm’s reliance on a single dominant periodicity may fail to capture the complexity of such pieces, leading to inaccurate classifications.

Future Directions

To further enhance the utility and robustness of this screening layer, future work should focus on internal parameter optimization. Refining the tolerance windows for harmonic families and adjusting the fusion weights could further improve the reliability of the system when encountering complex or ambiguous rhythmic structures. Additionally, tailoring the algorithm’s calibration to specific clinical requirements—such as optimizing for narrow tempo ranges preferred in specific gait disorders—would enhance its precision as a specialized pre-filtering module. Continual validation against expert-annotated symbolic datasets will remain essential for driving these incremental refinements.

V. Conclusion

This study presents a rapid screening layer for distinguishing duple and triple meter feels in symbolic MIDI data. By leveraging tempograms and autocorrelation, the method achieves efficient and interpretable periodicity analysis. The approach is tailored to the needs of RAS, providing a practical tool for clinicians to identify suitable music for gait training. By focusing on symbolic-domain efficiency and cross-domain agreement, this module serves as a robust early filter in automated music selection systems for neurologic rehabilitation.
