CMAB: A Multi-Attribute Building Dataset of China

  • School of Architecture, Tsinghua University, Beijing, 100084, China.

Summary

This summary is machine-generated.

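This study introduces CMAB, the first national-scale Multi-Attribute Building dataset of China, built by applying AI and machine learning to remote sensing and street view imagery. The dataset covers 3,667 cities and 31 million buildings, with validation accuracy generally above 80%, and is intended to support urban analysis and planning.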

Area Of Science

  • Geoinformatics
  • Urban Analytics
  • Artificial Intelligence

Background

  • Accurate 3D building data is crucial for urban analysis, simulations, and policy, but current datasets lack comprehensive multi-attribute coverage.
  • Existing building datasets often have incomplete geometric and indicative attributes, limiting their utility for detailed urban studies.

Purpose Of The Study

  • To present the first national-scale Multi-Attribute Building dataset (CMAB) with AI-driven, comprehensive building information.
  • To provide a resource for urban analysis, simulations, policy updates, and progress toward the global Sustainable Development Goals (SDGs).

Main Methods

  • Developed a national-scale dataset (CMAB) covering 3,667 cities and 31 million buildings using AI and machine learning.
  • Employed OCRNet for rooftop extraction (F1-score 89.93%) and bootstrap-aggregated XGBoost models that use morphology, location, and function features to predict building attributes.
  • Utilized multi-source data, including remote sensing and street view images (SVIs), to generate rooftop, height, structure, function, style, age, and quality attributes.
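
To make the idea of bootstrap aggregation concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: a one-feature threshold classifier stands in for XGBoost, and the feature values (footprint area in m², height in m), the labels, and all function names are hypothetical. Each model is trained on a resample of the data drawn with replacement, and predictions are combined by majority vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-feature threshold classifier (a simple stand-in for XGBoost)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            # Keep the first split with the fewest training errors.
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_bagged(rows, labels, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Combine the base learners by majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Hypothetical toy data: (footprint area in m^2, height in m) -> building function.
rows = [(80, 9), (95, 12), (110, 15), (120, 18),
        (900, 30), (1200, 45), (1500, 60), (2000, 80)]
labels = ["residential"] * 4 + ["commercial"] * 4

models = fit_bagged(rows, labels)
print(predict_bagged(models, (70, 8)))
print(predict_bagged(models, (2500, 90)))
```

Averaging over resampled models reduces the variance of any single learner, which is the usual motivation for bagging; a production setup would more likely use a gradient-boosting library for the base models.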

Main Results

  • Generated a comprehensive national building dataset (CMAB) with extensive geometric and indicative attributes.
  • Achieved an F1-score of 89.93% for OCRNet-based extraction, with validation accuracy for the generated attributes generally above 80%.
  • Quantified the national building stock at 363 billion m³.
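
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below computes it for a pair of binary masks; the tiny masks and the function names are hypothetical and serve only to illustrate the formula, not the evaluation protocol used for the dataset.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mask_f1(pred, truth):
    """F1-score of a predicted binary mask against a reference mask."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel correctly marked as building
            elif p and not t:
                fp += 1      # pixel wrongly marked as building
            elif t and not p:
                fn += 1      # building pixel that was missed
    return f1_score(tp, fp, fn)


# Hypothetical 2x3 masks: 1 = building pixel, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

score = mask_f1(pred, truth)  # tp=2, fp=1, fn=1
print(f"F1 = {score:.4f}")
```

Because F1 ignores true negatives, it is not inflated by large background areas, which makes it a common choice for segmentation tasks where the target class covers a small share of each image.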

Conclusions

  • The CMAB dataset is a significant advancement for urban planning and analysis, addressing limitations of previous datasets.
  • The AI-driven methodology demonstrates a scalable and accurate approach to generating rich building information.
  • The dataset can support work toward the global SDGs and inform urban development strategies.