Binaural sound localization model based on GASSOM and DNN

  • Shutao CHEN

Student thesis: Doctoral thesis

Abstract

Humans can localize sound sources with two ears, an ability known as binaural sound localization. Conventional methods for modeling binaural localization have focused on artificial spatial cues such as Interaural Time Difference (ITD) and Interaural Level Difference (ILD) to decode location information. In this work, we extracted spatial features with sparse coding algorithms and mapped these features to sound locations with a Deep Neural Network (DNN). Two sparse coding algorithms, GASSOM (Generative Adaptive Subspace Self-organizing Map) and Independent Component Analysis (ICA), were compared. Results indicate that GASSOM outperforms ICA. Map size and basis function length were shown to affect the performance of GASSOM, and the optimal selections of both parameters are reported in the thesis. To verify the ability of the GASSOM-DNN sound localization model to simulate human binaural localization performance, benchmark studies against previously reported empirical data were conducted. Factors investigated included the influence of bandwidth, center frequency, and duration of binaural cues, and the mismatch of non-individualized head-related transfer functions (HRTFs). The performance of the computational model was compared with previously reported human data, and similarity was achieved. The future potential of using GASSOM to model binaural sound localization is discussed.
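
For readers unfamiliar with the conventional cues named in the abstract, the following is a minimal sketch of how ITD and ILD might be estimated from a pair of ear signals. It is not taken from the thesis: the function name estimate_itd_ild, the cross-correlation peak as the ITD estimator, the RMS ratio as the ILD estimator, and the sign conventions are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_itd_ild(left, right, fs):
    """Return (itd_seconds, ild_db) for two equal-length ear signals.

    Hypothetical helper for illustration only; not the thesis's method.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Full cross-correlation; index (len(right) - 1) corresponds to zero lag.
    xcorr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(xcorr)) - (len(right) - 1)
    itd = lag / fs  # positive when the left signal lags the right one

    # ILD as the ratio of RMS levels in decibels (left relative to right).
    eps = 1e-12  # guards against log of zero for silent inputs
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    ild = 20.0 * np.log10((rms_left + eps) / (rms_right + eps))
    return itd, ild
```

Under these conventions, a source on the listener's right would typically give a positive ITD and a negative ILD, since the left-ear signal arrives later and at a lower level.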
Date of Award: 2022
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Richard Hau Yue SO (Supervisor)
