
Building Marine Foundation Models: Problem Formulation, Models and Applications

  • Ziqiang ZHENG

Student thesis: Doctoral thesis

Abstract

The marine ecosystem is the most productive of all ecosystems and holds immense ecological, social, and economic value. Performing marine studies scalably and automatically plays a significant role in protecting the marine ecosystem and advancing marine science. Marine research involves the study of marine biology, oceanography, and environmental science through the lens of field data, enabling scientists and researchers to observe, document, and analyze the vast and mysterious creatures and phenomena beneath the water’s surface. Existing marine studies depend heavily on describing and analyzing visual observations (e.g., images and videos) collected through in-situ marine/underwater surveying. These studies have two main limitations: 1) they cannot support very large-scale data collection, and data scarcity has become one of the important factors hindering the further development of marine analysis; 2) the subsequent data analysis still requires significant human labor and time, and is limited to specialist biology users. Recent foundation models have achieved great success, driven by large-scale training data and powerful networks. This foundation model recipe yields efficient and flexible models that support a wide spectrum of downstream visual analysis tasks. However, few attempts have been made in the marine field, and we aim to build effective and efficient marine foundation models. Furthermore, most existing marine visual analysis algorithms are data-driven and specially designed for particular tasks and pre-defined conditions. In this thesis, we formulate the basic tasks for marine visual understanding and explore solutions for large-scale, efficient, and repeated surveying, monitoring, and further analysis. We first review existing marine datasets and marine visual analysis algorithms. 
We identify the specific and universal challenges of underwater environments, including visibility degradation and color distortion. We propose underwater visual enhancement as an optional pre-processing step. We have built the first large-scale underwater video enhancement dataset and benchmark, incorporating the intrinsic properties of underwater images. The main focus of this thesis is to build efficient marine foundation models from three important aspects: problem formulation, model design, and potential applications. We pursue a comprehensive panoptic understanding of the marine world, formulating marine visual analysis based on the intrinsic properties of marine creatures. We split our marine research into two lines, things and stuff, and design foundation models for each. Things are instances with consistent structural or individual units (e.g., fish). Stuff (e.g., coral reefs) covers creatures without consistent structure, geometry, or minimum units. We propose corresponding marine foundation models for scalable and efficient marine visual understanding: CoralSCOP and CoralSRT for coral reef segmentation, and MarineInst for marine instance visual description. We extend our research from the image domain to the video domain, covering 3D scene reconstruction, understanding, and 4D animation. Detailed and hierarchical discussions of potential applications of the marine foundation models are also included. Finally, we discuss future directions for promoting marine visual analysis.

Date of Award: 2025
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Sai Kit YEUNG
