CMAB: A Multi-Attribute Building Dataset of China
- Yecheng Zhang 1, Huimin Zhao 1, Ying Long 2,3
- 1School of Architecture, Tsinghua University, Beijing, 100084, China.
- 2School of Architecture, Tsinghua University, Beijing, 100084, China. ylong@tsinghua.edu.cn.
- 3Hang Lung Center for Real Estate, Key Laboratory of Ecological Planning & Green Building, Ministry of Education, Tsinghua University, Beijing, 100084, China. ylong@tsinghua.edu.cn.
Summary
This summary is machine-generated. This study introduces CMAB, the first national Multi-Attribute Building dataset of China, which uses AI to extract detailed geometric and indicative attributes for individual buildings. The dataset supports urban analysis and planning, with validation accuracy generally above 80%.
Area Of Science
- Geoinformatics
- Urban Analytics
- Artificial Intelligence
Background
- Accurate 3D building data is crucial for urban analysis, simulations, and policy, but current datasets lack comprehensive multi-attribute coverage.
- Existing building datasets often have incomplete geometric and indicative attributes, limiting their utility for detailed urban studies.
Purpose Of The Study
- To present the first national-scale Multi-Attribute Building dataset (CMAB) with AI-driven, comprehensive building information.
- To provide a valuable resource for accurate urban analysis, simulations, policy updates, and global Sustainable Development Goals (SDGs).
Main Methods
- Developed a national-scale dataset (CMAB) covering 3,667 cities and 31 million buildings using AI and machine learning.
- Employed OCRNet-based extraction (F1-Score 89.93%) together with bootstrap-aggregated XGBoost models that incorporate morphology, location, and function features.
- Utilized multi-source data, including remote sensing and street view images (SVIs), to generate rooftop, height, structure, function, style, age, and quality attributes.
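The bootstrap-aggregation idea named in the methods above can be illustrated with a minimal sketch. It is not the authors' pipeline: to stay dependency-free it swaps XGBoost for a one-split decision stump, and the features (footprint area, height), the toy values, the labels, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split threshold classifier (a stand-in for XGBoost)."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        # No valid split (e.g. all sampled rows identical): predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_bagged(rows, labels, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Aggregate the ensemble by majority vote."""
    votes = Counter(predict_stump(model, row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (footprint area in m^2, height in m) -> function label.
rows = [(120, 9), (150, 12), (200, 15), (180, 10),
        (900, 60), (1200, 80), (1000, 75), (1500, 90)]
labels = ["residential"] * 4 + ["commercial"] * 4

models = fit_bagged(rows, labels)
print(predict_bagged(models, (100, 8)))
print(predict_bagged(models, (2000, 100)))
```

Resampling with replacement gives each model a slightly different view of the training data, and voting across models reduces the variance of any single one. In practice the stump would be replaced by a gradient-boosted model such as XGBoost.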
Main Results
- Generated a comprehensive national building dataset (CMAB) with extensive geometric and indicative attributes.
- Achieved an F1-Score of 89.93% for OCRNet-based extraction, with validation accuracy for the attributes generally above 80%.
- Quantified building stock at 363 billion m³, providing unprecedented detail for urban research.
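For readers unfamiliar with the F1-Score reported above, the following sketch shows how it is conventionally computed from a predicted and a reference binary mask. The masks and function names are hypothetical and are not taken from the study's evaluation code.

```python
def confusion_counts(predicted, reference):
    """Count true positives, false positives and false negatives for 0/1 masks."""
    pairs = list(zip(predicted, reference))
    true_positive = sum(1 for p, r in pairs if p == 1 and r == 1)
    false_positive = sum(1 for p, r in pairs if p == 1 and r == 0)
    false_negative = sum(1 for p, r in pairs if p == 0 and r == 1)
    return true_positive, false_positive, false_negative


def f1_score(true_positive, false_positive, false_negative):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if true_positive == 0:
        return 0.0
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    return 2 * precision * recall / (precision + recall)


# Hypothetical flattened masks: 1 = rooftop pixel, 0 = background.
predicted = [1, 1, 1, 0, 0, 1, 0, 0]
reference = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, fn = confusion_counts(predicted, reference)
score = f1_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} F1={score:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that over-predicts or under-predicts the positive class, which makes it a common choice for segmentation tasks where background pixels dominate.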
Conclusions
- The CMAB dataset is a significant advancement for urban planning and analysis, addressing limitations of previous datasets.
- The AI-driven methodology demonstrates a scalable and accurate approach to generating rich building information.
- This dataset is vital for supporting global SDGs and informed urban development strategies.

