ArchCAD-400k: A Large-Scale CAD Drawing Dataset and New Baseline for Panoptic Symbol Spotting

¹Tongji University
²Arcplus Group East China Architectural Design & Research Institute Co., Ltd.
³Shanghai AI Laboratory
⁴Shanghai Qi Zhi Institute
⁵Shanghai Innovation Institute
⁶University of Science and Technology of China
⁷Shanghai Jiao Tong University
⁸Donghua University
(* denotes equal contribution, † denotes the corresponding author)
News
  • 2025.10.16: Source code of the DPSS framework is released!
  • 2025.10.16: The ArchCAD dataset is open for download!
  • 2025.10.9: Our project homepage has launched!

🌟 Contributions

  • 🚀 Data Engine: A structure-aware annotation engine that reduces labeling cost by over 50x using CAD layers and blocks.
  • 📦 Dataset: ArchCAD-400k, a large-scale dataset with 413K annotations across 27 categories, covering diverse building types.
  • 📈 Model: DPSS, a dual-pathway framework achieving state-of-the-art results with strong scalability.

🔍 Background

  • Problem: Panoptic symbol spotting aims to interpret CAD drawings by assigning semantic and instance labels to graphical primitives.
  • Value: Accurate primitive-level perception enables scalable BIM modeling and intelligent CAD understanding.
  • Challenges: Annotating CAD drawings is highly labor-intensive, limiting dataset size and model generalization.
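To make the problem statement concrete, here is a minimal sketch of one possible in-memory form of a panoptic prediction: every primitive carries a semantic label, and the primitives of a countable symbol additionally share an instance id. The names `Primitive` and `group_instances` are hypothetical. The split between countable symbols (such as doors) and uncountable regions (such as walls) is an assumption borrowed from common panoptic practice, not a description of the ArchCAD file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Primitive:
    """One graphical primitive (line, arc, polyline, ...) of a vector CAD drawing (hypothetical schema)."""
    kind: str                # primitive type, e.g. "line" or "arc"
    semantic: str            # per-primitive semantic (category) label
    instance: Optional[int]  # instance id for countable symbols; None for uncountable regions


def group_instances(prims: List[Primitive]) -> Dict[Tuple[str, int], List[int]]:
    """Collect the indices of primitives that form the same symbol instance."""
    groups: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for idx, p in enumerate(prims):
        if p.instance is not None:  # uncountable primitives carry no instance id
            groups[(p.semantic, p.instance)].append(idx)
    return dict(groups)


# Toy drawing: a door made of an arc and a line, one wall segment, and a window.
prims = [
    Primitive("arc", "door", 0),
    Primitive("line", "door", 0),
    Primitive("line", "wall", None),
    Primitive("line", "window", 1),
]
groups = group_instances(prims)
print(groups)  # {('door', 0): [0, 1], ('window', 1): [3]}
```

Grouping by the `(semantic, instance)` pair rather than by instance id alone keeps the sketch safe even if instance ids were only unique within a category.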

Illustration of panoptic symbol spotting in CAD drawings

🚀 Contribution 1 — Data Engine

  • Standardized Selection: 5,538 CAD drawings with consistent layer-block structures are selected from 11,917 industry-grade samples.
  • Structure-Aware Automated Labeling: Leverages the inherent layer and block hierarchy to generate semantic and instance-level annotations at scale.
  • Expert-Guided Refinement: Human experts refine annotations directly in the vector space, ensuring high accuracy while avoiding rasterization artifacts.
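The structure-aware labeling idea can be illustrated with a minimal sketch, assuming each entity exposes the name of its layer and, if it was exploded from a block insert, a handle identifying that insert. The layer-to-category table below uses hypothetical AIA-style layer names; real drawings would need their own mapping, and this is not the authors' actual engine. The semantic label is read from the layer and the instance id from the block reference. Entities on unmapped layers get `None`, which here stands in (as an assumption) for cases that would need expert refinement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL layer-name -> category table; real drawings need their own mapping.
LAYER_TO_CATEGORY: Dict[str, str] = {
    "A-WALL": "wall",
    "A-DOOR": "door",
    "A-GLAZ": "window",
    "S-COLS": "column",
}


@dataclass
class Entity:
    """A drawing entity reduced to the two structural cues used here (hypothetical schema)."""
    layer: str                # name of the CAD layer the entity lives on
    block_ref: Optional[str]  # handle of the block insert it came from, if any


def auto_label(entities: List[Entity]) -> List[Tuple[Optional[str], Optional[int]]]:
    """Return a (semantic, instance) pair per entity; (None, None) means 'left for review'."""
    instance_ids: Dict[str, int] = {}
    labels: List[Tuple[Optional[str], Optional[int]]] = []
    for e in entities:
        category = LAYER_TO_CATEGORY.get(e.layer.upper())
        instance = None
        if category is not None and e.block_ref is not None:
            # All primitives exploded from one block insert share one instance id.
            instance = instance_ids.setdefault(e.block_ref, len(instance_ids))
        labels.append((category, instance))
    return labels


ents = [
    Entity("A-DOOR", "B1"),  # two primitives of the same door block
    Entity("A-DOOR", "B1"),
    Entity("A-DOOR", "B2"),  # a second door
    Entity("A-WALL", None),  # wall geometry drawn directly on its layer
    Entity("misc", None),    # unmapped layer
]
labels = auto_label(ents)
print(labels)  # [('door', 0), ('door', 0), ('door', 1), ('wall', None), (None, None)]
```

Deriving instances from block inserts rather than from geometry means no clustering step is needed whenever a symbol was drafted as a block; that is the kind of structure the selection step above filters for.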

Overall pipeline of the annotation process

📦 Contribution 2 — ArchCAD Dataset

  • Broader Diversity: Covers a wide range of real-world building types beyond residential.
  • Larger Scale: Includes 5,500+ full drawings with an average area of 11,000 m².
  • Richer Semantics: Annotations across 27 categories, including structural elements, non-structural elements, and symbols.

Comparison of ArchCAD to existing CAD drawing datasets.


Distribution of ArchCAD, including (a) drawing area, (b) building types, and (c) symbol categories.


A glimpse of ArchCAD-400k samples

📈 Contribution 3 — DPSS framework

  • Novel Framework: Proposes DPSS, a panoptic symbol spotting framework tailored for CAD drawings.
  • State-of-the-Art Performance: Achieves new state-of-the-art results on FloorPlanCAD and ArchCAD-400k, surpassing the second-best method by 3% and 10%, respectively.
  • Robust and Scalable: Demonstrates high accuracy, robustness, and scalability across diverse CAD datasets.

Overview of DPSS framework


Quantitative & qualitative results

BibTeX

@article{luo2025archcad,
  title={ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting},
  author={Luo, Ruifeng and Liu, Zhengjie and Cheng, Tianxiao and Wang, Jie and Wang, Tongjie and Wei, Xingguang and Wang, Haomin and Li, YanPeng and Chai, Fu and Cheng, Fei and others},
  journal={arXiv preprint arXiv:2503.22346},
  year={2025}
}