A note based query by humming system using convolutional neural network

Naziba Mostafa, Pascale Fung

Research output: Contribution to journal, Conference article published in journal, peer-review

Abstract

In this paper, we propose a note-based query by humming (QBH) system that combines a Hidden Markov Model (HMM) with a Convolutional Neural Network (CNN), since note-based systems are much more efficient than traditional frame-based systems. A note-based QBH system has two main components: humming transcription and candidate melody retrieval. For humming transcription, we are the first to use a hybrid HMM-CNN model. We use the CNN for its ability to learn features directly from raw audio data and to model the locality and variability often present in a note, and we use the HMM to handle variability across the time axis. For candidate melody retrieval, we use locality-sensitive hashing to narrow down the candidates, then dynamic time warping and earth mover's distance for the final ranking of the selected candidates. We show that our HMM-CNN humming transcription system outperforms other state-of-the-art humming transcription systems by roughly 2% under the transcription evaluation framework of Molina et al., and that our overall query by humming system achieves a Mean Reciprocal Rank of 0.92 on the standard MIREX dataset, which is higher than other state-of-the-art note-based query by humming systems.
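
As a rough illustration of the ranking and evaluation steps named in the abstract, the sketch below ranks candidate melodies by dynamic time warping and computes a Mean Reciprocal Rank. It is not the authors' implementation: the function names, the use of MIDI pitch numbers as note features, and the toy melodies are all assumptions made for this example, and the locality-sensitive hashing and earth mover's distance stages are omitted.

```python
from typing import Dict, List, Sequence


def dtw_distance(query: Sequence[float], candidate: Sequence[float]) -> float:
    """Textbook dynamic time warping cost between two pitch sequences.

    Assumption: each note is represented only by its MIDI pitch number.
    """
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of query[:i] against candidate[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - candidate[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # query note aligned to an earlier candidate note
                cost[i][j - 1],      # candidate note aligned to an earlier query note
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def rank_candidates(query: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate names ordered from best (lowest DTW cost) to worst."""
    return sorted(candidates, key=lambda name: dtw_distance(query, candidates[name]))


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the correct song."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical toy data: a hummed query and two candidate melodies.
query = [60, 62, 64, 62]
candidates = {
    "song_a": [60, 60, 62, 64, 62],  # same contour with one repeated note
    "song_b": [67, 65, 64, 62],      # different melody
}
ranking = rank_candidates(query, candidates)
```

Here `song_a` ranks first because DTW absorbs the repeated note at no cost. A practical system would also need to handle transposition, for example by comparing pitch intervals instead of absolute pitches.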

Original language: English
Pages (from-to): 3102-3106
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
Publication status: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017

Bibliographical note

Publisher Copyright:
Copyright © 2017 ISCA.

Keywords

  • CNN
  • Humming transcription
  • Query by humming
  • Raw audio
