People often communicate messages through verbal and non-verbal expressions, including voice, words, facial expressions, and body language. Interpreting multimodal human behavior in communication has great value for many applications, such as business, healthcare, and education. For example, if students show signs of boredom or confusion in class, teachers can adjust their teaching methods to improve student engagement. With the rapid development of digital technology and social media, a huge amount of multimodal human communication data (e.g., opinion videos) is generated and collected. To facilitate the analysis of human communication data, researchers adopt computational approaches that quantify human behavior with multimodal features. However, it remains demanding and inefficient to manually extract insights (e.g., the social meanings of the features) from the large and complex feature space. Furthermore, it is challenging to utilize the knowledge distilled from these computational features to enhance human communication skills. Meanwhile, interactive visual analytics combines computational algorithms with human-centered visualization to effectively support information representation, knowledge discovery, and skill acquisition, and thus demonstrates great potential to address the challenges above.

In this thesis, we focus on visual analytics of multimodal human language for conveying messages, based on communication videos (e.g., public speaking and opinion videos). We design and build novel interactive visual analytics systems to 1) help users discover valuable patterns in speakers' multimodal communication behavior in videos and 2) provide end-users with visual feedback and guidance to improve their communication skills. In the first work, we present
DeHumor, a visual analytics system that visually decomposes humorous speeches into quantifiable multimodal features and enables humor researchers and communication coaches to systematically explore humorous verbal content and vocal delivery. In the second work, we further characterize and investigate the intra- and inter-modal interactions among the visual, acoustic, and language modalities, including dominance, complement, and conflict. Then, we develop
M2Lens, a visual analytics system that helps model developers and users conduct multi-level, multi-faceted exploration of the influences of individual modalities and their interplay on model predictions for multimodal sentiment analysis. Besides understanding multimodal human communication behavior, in the third work, we present
VoiceCoach, a visual analytics system that evaluates speakers' voice modulation skills in terms of volume, pitch, speed, and pause, and recommends good examples of voice modulation from TED Talks for them to follow. Moreover, during practice, the system provides immediate visual feedback to speakers for self-awareness and performance improvement.
| Date of Award | 2022 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Huamin QU (Supervisor) |