
WHU-STree: A multi-modal benchmark dataset for street tree inventory

  • Ruifei Ding
  • Zhe Chen
  • Wen Fan
  • Chen Long
  • Huijuan Xiao
  • Yelu Zeng
  • Zhen Dong*
  • Bisheng Yang

*Corresponding author for this work

Research output: Contribution to journal; Journal Article; peer-review

Abstract

Street trees are vital to urban livability, providing ecological and social benefits. Establishing a detailed, accurate, and dynamically updated street tree inventory has become essential for optimizing these multifunctional assets within space-constrained urban environments. Because traditional field surveys are time-consuming and labor-intensive, automated surveys using Mobile Mapping Systems (MMS) offer a more efficient solution. However, existing MMS-acquired tree datasets are limited by small-scale scenes, limited annotations, or a single modality, restricting their utility for comprehensive analysis. To address these limitations, we introduce WHU-STree, a cross-city, richly annotated, and multi-modal urban street tree dataset. Collected across two distinct cities, WHU-STree integrates synchronized point clouds and high-resolution images, encompassing 21,007 annotated tree instances across 50 species and 2 morphological parameters. Leveraging these unique characteristics, WHU-STree concurrently supports over 10 tasks related to street tree inventory. We benchmark representative baselines for two key tasks, tree species classification and individual tree segmentation, based on 18 major species and an “Others” category. Extensive experiments demonstrate that while multi-modal fusion yields improvements over uni-modal baselines, it currently shows performance gaps compared with strong 3D-only methods, indicating that effective fusion remains a challenging open problem requiring further research. We further identify key challenges and outline potential future work for fully exploiting WHU-STree, encompassing multi-modal fusion, multi-task collaboration, cross-domain generalization, spatial pattern learning, and Multi-modal Large Language Models for street tree asset management. The WHU-STree dataset is accessible at: https://github.com/WHU-USI3DV/WHU-STree.

Original language: English
Pages (from-to): 519-542
Number of pages: 24
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 233
Early online date: 6 Feb 2026
Publication status: Published - Mar 2026

Bibliographical note

Publisher Copyright:
© 2026

Keywords

  • Deep learning
  • Individual tree segmentation
  • Mobile mapping system
  • Multi-modal
  • Tree inventory
  • Tree species classification
