Abstract

Objectives

This systematic review and meta-analysis aimed to investigate the accuracy and efficiency of artificial intelligence (AI)-driven automated landmark detection for cephalometric analysis on two-dimensional (2D) lateral cephalograms and three-dimensional (3D) cone-beam computed tomographic (CBCT) images.

Search methods

An electronic search was conducted in PubMed, Web of Science, Embase, and grey literature sources, covering records published up to January 2024.

Selection criteria

Studies that employed AI for 2D or 3D cephalometric landmark detection were included.

Data collection and analysis

The selection of studies, data extraction, and quality assessment of the included studies were performed independently by two reviewers. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. A meta-analysis was conducted to evaluate the accuracy of 2D landmark identification based on both the mean radial error and standard error.

Results

Following the removal of duplicates, title and abstract screening, and full-text reading, 34 publications were selected. Amongst these, 27 studies evaluated the accuracy of AI-driven automated landmarking on 2D lateral cephalograms, while 7 studies involved 3D-CBCT images. A meta-analysis based on the mean radial error of landmark placement on 2D images revealed that the error was below the clinically acceptable threshold of 2 mm (1.39 mm; 95% confidence interval: 0.85–1.92 mm). For 3D images, a meta-analysis could not be conducted due to significant heterogeneity amongst the study designs. However, qualitative synthesis indicated that the mean error of landmark detection on 3D images ranged from 1.0 to 5.8 mm. Both automated 2D and 3D landmarking proved to be time-efficient, taking less than 1 min. Most studies exhibited a high risk of bias in data selection (n = 27) and reference standard (n = 29).

Conclusion

The performance of AI-driven cephalometric landmark detection on both 2D cephalograms and 3D-CBCT images showed potential in terms of accuracy and time efficiency. However, the generalizability and robustness of these AI systems could benefit from further improvement.

Registration

PROSPERO: CRD42022328800.

Introduction

Cephalometric analysis provides important anatomical measurement data that are essential for orthodontic and craniomaxillofacial surgical workflows. It enables the morphometric quantification of craniofacial growth and the analysis of spatial relationships between hard and soft dentomaxillofacial structures for diagnostics, treatment planning, and outcome assessment [1, 2]. A standard cephalometric analysis is performed on two-dimensional (2D) lateral cephalograms or three-dimensional (3D) cone-beam computed tomography (CBCT) images [3]. Both 2D and 3D cephalometric analyses require manual localization of anatomical landmarks, a time-consuming task that can take approximately 15 min per case for an orthodontist [4]. Furthermore, the accuracy of landmark identification is subject to variability depending on the observer’s experience and image quality [5, 6].

Recently, solutions driven by artificial intelligence (AI), specifically machine learning (ML) and deep learning (DL), have been increasingly used to enhance the reliability, consistency, and accuracy of landmark placement for 2D and 3D cephalometric analyses [7, 8]. Machine learning, a subset of AI, creates algorithms that learn primarily from structured data, with decisions based on intrinsic statistical patterns. Conversely, DL is a subset of ML built on convolutional neural networks (CNNs), multilayer architectures that process data through stacked neural network layers and learn from data automatically, akin to the functioning of the human brain. In terms of performance, DL has demonstrated superiority over conventional ML algorithms for various medical image analysis tasks. This is attributed to its capability to handle the high-dimensional data of radiographic images with multiple predictor variables, and its ability to automatically and adaptively learn hierarchical features such as corners, shapes, and edges [9, 10].
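To make this distinction concrete, the minimal Python/PyTorch sketch below shows the heatmap-regression pattern that many DL landmark detectors build on: a stack of convolutional layers produces one response map per landmark, and the predicted position is the peak of each map. The architecture, layer widths, and number of landmarks are illustrative assumptions, not the design of any included study.

```python
# Minimal illustrative sketch (assumed architecture): a small fully convolutional
# network mapping a single-channel cephalogram to one heatmap per landmark.
import torch
import torch.nn as nn

class TinyLandmarkNet(nn.Module):
    def __init__(self, n_landmarks: int = 19):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, n_landmarks, kernel_size=1)  # one channel per landmark

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

model = TinyLandmarkNet()
heatmaps = model(torch.randn(1, 1, 256, 256))   # shape: (1, 19, 256, 256)
# The peak of each heatmap is taken as the predicted (row, column) position.
flat_idx = heatmaps.flatten(2).argmax(dim=2)    # shape: (1, 19)
rows, cols = flat_idx // 256, flat_idx % 256
```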

As the identification of landmarks is one of the primary sources of error in cephalometric analysis owing to observer variability [6, 11], it is important to consider whether AI-driven solutions could serve as an accurate and time-efficient alternative to their traditional manual counterparts [12]. Despite numerous studies on automated landmarking for both 2D and 3D cephalometric analyses, a gap exists in the literature regarding a comprehensive review of the accuracy of these AI-driven solutions, and the accumulation of such evidence could enhance our understanding of their accuracy. Existing systematic reviews on this topic have either restricted their investigation to deep learning alone [8, 13], or exclusively focused on 3D images [13].

In the field of orthodontics, 2D landmarking and cephalometric analysis are often favoured due to their capacity to yield substantial diagnostic information, which aids in devising effective treatment strategies for a large proportion of orthodontic patients. In these situations, 3D cephalometry derived from CBCT images is generally not advised, mainly because of the higher radiation exposure [14, 15]. On the other hand, 3D cephalometry has advantages in terms of precise anatomical recognition and intricate structural assessment. This is particularly useful when more comprehensive treatment planning is required, such as in the digital planning processes of orthognathic surgery and implantology, where traditional 2D landmarking may not provide adequate information [16]. Hence, both types of datasets are considered clinically significant, depending on the specific task [17]. Despite the significant differences in AI methodologies and algorithms applied for automated 2D and 3D landmarking, a comprehensive review encompassing both types of datasets can offer an integrated view of the discipline, highlighting progress in both dimensions and identifying areas that necessitate additional research and development.

Therefore, the aim of this systematic review and meta-analysis was to report the accuracy and efficiency of AI-driven automated landmark detection on 2D lateral cephalograms and 3D-CBCT images.

Materials and methods

Protocol and registration

The study protocol was registered under the number CRD42022328800 in the PROSPERO (Prospective Register of Systematic Reviews) database. The title and research question of the review were modified from their original version, as documented in PROSPERO (Supplementary File 1). However, the rest of the methodology remained unchanged. The systematic review and meta-analysis were conducted following the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines [18].

Review question

The review question was formatted according to the PICO (Population, Intervention, Comparison, and Outcome) framework, as follows:

Patients (P): 2D lateral cephalograms or 3D-CBCT images of human subjects.

Intervention (I): AI-based algorithms for automated cephalometric landmark identification.

Comparison (C): manual landmarking by experts (ground truth), where experts are experienced dentists, clinicians, or orthodontists with expertise in cephalometric landmarking.

Outcome (O): success detection rate (SDR), mean radial error (MRE), computational time.

Review question: Does AI-driven cephalometric analysis (I) on 2D cephalograms and 3D-CBCT images (P) offer improved accuracy and time-efficiency (O) compared to manual landmarking by an expert (C)?

Eligibility criteria

The review included all full-text diagnostic accuracy studies evaluating the performance of AI-driven algorithms for the automated detection of landmarks. Studies were selected based on the following inclusion criteria: (i) training and testing on 2D lateral cephalograms or 3D-CBCT images (reported in sufficient detail, e.g. dataset size, image modality, and AI algorithm) for the automated detection of landmarks commonly applied in cephalometric analysis, such as nasion, orbitale, menton, pogonion, and subnasale; (ii) reporting of results as success detection rate (SDR) or mean radial error (MRE) in millimetres (mm) to determine clinical applicability; and (iii) comparison of automated with manual landmarking as the clinical reference. No restrictions were applied regarding the year or language of publication.

Case reports, review papers, book chapters, letters, conference papers, and commentaries were excluded from the review. Additionally, studies that solely included landmarks that do not contribute to standard cephalometric analysis, such as craniometric points (asterion, pterion, opisthion, etc.), were not considered for this review.
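For reference, the two accuracy outcomes can be stated explicitly: MRE is the mean Euclidean (radial) distance between predicted and reference landmark positions, and SDR is the percentage of landmarks whose error falls within a given threshold (commonly 2 mm). The following minimal Python sketch, assuming landmark coordinates are given in pixels with a known mm-per-pixel resolution, illustrates both measures; the function names and example values are hypothetical.

```python
# Illustrative sketch of the review's outcome measures (assumed array layout:
# one (x, y) pixel coordinate per landmark; mm_per_pixel converts to millimetres).
import numpy as np

def radial_errors_mm(pred_px, truth_px, mm_per_pixel=0.1):
    """Euclidean (radial) distance per landmark, converted from pixels to mm."""
    diff = np.asarray(pred_px, float) - np.asarray(truth_px, float)
    return np.linalg.norm(diff, axis=1) * mm_per_pixel

def mre(pred_px, truth_px, mm_per_pixel=0.1):
    """Mean radial error (MRE) in mm; values closer to zero mean higher accuracy."""
    return radial_errors_mm(pred_px, truth_px, mm_per_pixel).mean()

def sdr(pred_px, truth_px, threshold_mm=2.0, mm_per_pixel=0.1):
    """Success detection rate (%): share of landmarks within the error threshold."""
    errors = radial_errors_mm(pred_px, truth_px, mm_per_pixel)
    return float((errors <= threshold_mm).mean() * 100)

pred = [[100, 200], [150, 250]]    # hypothetical predicted landmarks (pixels)
truth = [[102, 198], [170, 255]]   # hypothetical reference landmarks (pixels)
print(mre(pred, truth), sdr(pred, truth))   # approx. 1.17 mm MRE, 50.0% SDR
```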

Information sources and search

An electronic search was performed in PubMed, Web of Science, and Embase up to January 2024. A two-pronged search strategy was applied, combining the technique of interest (AI, ML, DL) with the diagnostic target (landmark detection for cephalometric analysis). Each concept consisted of MeSH terms and keywords. The full search strategy is presented in Table 1.

Table 1.

Search strategy on each database.

Database: search strategy

Concept #1: artificial intelligence
PubMed: ‘Artificial Intelligence’[Mesh] OR ‘Artificial Intelligence’[tiab] OR ‘machine learning’[tiab] OR ‘deep learning’[tiab] OR ‘neural network*’[tiab] OR ‘automated’[tiab] OR ‘automatic’[tiab]
Web of Science: TS = (‘Artificial Intelligence’ OR ‘machine learning’ OR ‘deep learning’ OR ‘neural network*’ OR ‘automated’ OR ‘automatic’)
Embase: ‘artificial intelligence’/exp OR ‘artificial intelligence’:ti,ab,kw OR ‘machine learning’:ti,ab,kw OR ‘deep learning’:ti,ab,kw OR ‘neural network*’:ti,ab,kw OR ‘automated’:ti,ab,kw OR ‘automatic’:ti,ab,kw
Grey literature: ‘Artificial Intelligence’ OR ‘machine learning’ OR ‘deep learning’ OR ‘neural network*’ OR ‘automated’ OR ‘automatic’

Concept #2: cephalometric analysis
PubMed: ‘Cephalometry’[Mesh] OR ‘Cephalometry’[tiab] OR ‘craniometry’[tiab] OR ‘cephalometric*’[tiab] OR ‘landmark detection’[tiab]
Web of Science: TS = (‘Cephalometry’ OR ‘craniometry’ OR ‘cephalometric*’ OR ‘landmark detection’)
Embase: ‘cephalometry’/exp OR ‘cephalometry’:ti,ab,kw OR ‘cephalometric*’:ti,ab,kw OR ‘craniometry’:ti,ab,kw OR ‘landmark detection’:ti,ab,kw
Grey literature: ‘Cephalometry’ OR ‘craniometry’ OR ‘cephalometric*’ OR ‘landmark detection’

A comprehensive grey literature search was executed using databases such as ProQuest, Google Scholar, OpenThesis, and OpenGrey to minimize the risk of selection bias. In addition, a thorough hand-search of references within original articles, reviews, and conference proceedings (collection of conference papers) was performed to identify any additional studies that were not retrieved from the chosen electronic databases. The articles identified were imported into Endnote X9 software (Thomson Reuters, Philadelphia, PA, USA) for the removal of duplicates and further selection.

Study selection and data extraction

Two reviewers (J.H., M.V.) independently screened the relevant articles based on their titles and abstracts, followed by full-text reading of the included studies against the eligibility criteria. Any disagreement was resolved through discussion. A third experienced reviewer (R.J.) was consulted if consensus could not be reached.

Data extracted from the selected articles included: title, author, year of publication, country of origin, aim of the study (algorithm’s computational improvement or clinical validation), image type (2D lateral cephalograms or 3D-CBCT images), dataset source, total sample size, subsets (training, validation, test), characteristics of applied AI-based algorithm, number of landmarks and reported outcomes. The corresponding authors of the included studies were contacted for the provision of any further information or missing data.

Risk of bias assessment

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to evaluate the risk of bias and applicability concerns. This tool was chosen for its comprehensive coverage of the aspects that need assessment in primary diagnostic accuracy studies and its customizability, which allows for a more focused approach tailored to the specific review. It served two purposes: first, to assess the impact of potential bias sources on test accuracy estimates, and second, to evaluate the influence of hypothesized sources of clinical heterogeneity on these estimates [19].

The tool consisted of a systematically developed checklist for determining the quality of diagnostic test accuracy (DTA) studies. The checklist was divided into four domains for evaluating the risk of bias: (i) data selection (consecutive or random inclusion, no case-control design, no inappropriate exclusions); (ii) index test, i.e. the test under evaluation (interpretation blinded to and independent of the reference standard); (iii) reference standard, i.e. how the ground truth was established (interpretation independent of and blinded to the index test, valid reference test); and (iv) flow and timing (sufficient time between index test and reference standard, whether all data received the same reference standard, and whether all data were included in the analysis). The first three domains were also evaluated in relation to concerns about applicability (whether each domain matches the research question) [19]. The applicability concerns help to determine whether a study’s findings can be applied to real-life clinical scenarios. If significant concerns arise in any of the domains, the overall applicability of the study’s results to a broader patient population or clinical setting could be affected [20].

Two reviewers (J.H., M.V.) independently assessed the risk of bias using the QUADAS-2 checklist. Discrepancies were resolved through discussion. If consensus could not be reached, a third experienced reviewer (R.J.) was consulted.

Data analysis and synthesis

A meta-analysis was conducted using RStudio (version 2023.12.1, Posit Software, Boston, MA, USA) to evaluate the accuracy of 2D landmark identification based on the MRE and standard error (SE), where an MRE value closer to zero corresponds to higher accuracy of automated landmark identification. When multiple test datasets were used within a study, they were assessed as separate groups to account for data variability. The summary measure was the pooled MRE of the test datasets with its 95% confidence interval (CI). Heterogeneity was examined using the Q-statistic and I². The choice of statistical model was determined by I²: if it was below 50%, indicating low heterogeneity, a fixed-effects model was employed; if it exceeded 50%, suggesting substantial heterogeneity, a random-effects model was utilized. The selected model was then used to generate the forest plot. The number of radiographs and cephalometric landmarks evaluated in each test dataset was considered when determining the weight of each study in the meta-analysis. A P-value of less than 0.05 was deemed statistically significant.
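The pooling logic described above can be sketched as follows (an illustrative Python outline under the stated assumptions, not the actual RStudio code used in this review): inverse-variance weights are computed from the reported SEs, heterogeneity is quantified with Cochran’s Q and I², and a DerSimonian-Laird random-effects adjustment is applied when I² exceeds 50%.

```python
# Illustrative outline of inverse-variance pooling of per-dataset MREs, with the
# model (fixed vs random effects) chosen by the I^2 heterogeneity statistic.
import numpy as np
from scipy import stats

def pool_mre(mre_values, se_values):
    mre_arr = np.asarray(mre_values, float)
    se = np.asarray(se_values, float)
    w = 1.0 / se**2                                   # fixed-effect weights
    fixed = np.sum(w * mre_arr) / np.sum(w)
    q = np.sum(w * (mre_arr - fixed) ** 2)            # Cochran's Q
    df = len(mre_arr) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 > 50:                                       # random effects (DerSimonian-Laird)
        c = np.sum(w) - np.sum(w**2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = 1.0 / (se**2 + tau2)
    pooled = np.sum(w * mre_arr) / np.sum(w)
    half_ci = stats.norm.ppf(0.975) * np.sqrt(1.0 / np.sum(w))
    return pooled, (pooled - half_ci, pooled + half_ci), i2

# Hypothetical per-dataset MREs (mm) and standard errors:
print(pool_mre([1.2, 1.5, 1.8], [0.10, 0.20, 0.15]))
```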

Results

Study selection

The electronic database search yielded 2082 articles. Of these, 1026 were duplicates and 971 did not meet the eligibility criteria based on their titles and abstracts. The full text of the remaining 76 articles was reviewed, resulting in further exclusion of 45 articles. Supplementary File 2 describes the reasons for exclusion. Ultimately, 34 studies were deemed eligible and included in the systematic review. The selection process is depicted in the PRISMA 2020 flow diagram (Fig. 1).

Figure 1. PRISMA 2020 flow diagram for systematic reviews.

Study characteristics

The included studies covered a period of seven years, from 2017 to 2023. The majority of the studies originated from South Korea (n = 15), followed by China (n = 7), Japan (n = 3), USA (n = 3), Germany (n = 2), and one each from France, Hong Kong, the Netherlands, and Turkey. Automated AI-based landmark identification was applied on 2D lateral cephalograms in 27 studies and on 3D-CBCT images in 7 studies. Most studies (n = 27) primarily investigated the computational improvement of algorithms for landmark detection, while seven focused on the clinical validation of established methods. The characteristics of these 2D and 3D studies are summarized in Tables 2 and 3, respectively.

Table 2.

Characteristics of included studies using 2D lateral cephalograms.

Columns: 1st author; year; country of origin; computational improvement; clinical validation; AI framework; landmarks (N); data source; total sample; train set; test set; train and test set separation; resolution (mm/pixel); SDR within 2 mm range or MRE in mm; SDR of gonion within 2 mm range or MRE in mm; computational time.
Arik et al. [27]2017USACustom CNN combined with a shape model for refinement19IEEE Grand Challenge 2015400150150 + 100Yes0.1SDR Test 1: 75.37%
SDR Test 2: 67.68%
Bulatova et al. [51]2021USACNN YOLOv3 by DDH Inc16AAOF Legacy Denver collection110n/an/aSDR: 75%
Chen et al. [3]2019Hong KongVGG-19, ResNet20, and Inception; custom attentive feature pyramid fusion and regression voting19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 86.67%
MRE: 1.17 ± 1.19 mm
Test 2: SDR: 75.05%
MRE: 1.48 ± 0.77 mm
Chen et al. [52]2023ChinaU-shaped CNN with Monte Carlo dropout19IEEE Grand Challenge 2015 + own dataset (108)508258150 + 100Yes0.1Test 1: SDR: 79.65%
MRE: 1.39 ± 1.06 mm
Test 2: SDR: 80.05%
MRE: 1.33 ± 0.93 mm
Test 1: SDR: 55.33%
MRE: 2.15 ± 1.50 mm
Test 2: SDR: 52.00%
MRE: 2.20 ± 1.36mm
0.75 s
Hong et al [53].2023KoreaDeep Q-network19,
36
IEEE Grand Challenge 2015 + Samsung Medical Center (SMC) dataset (500)900150 + 420150 + 100 + 80Yes0.1,
0.13
IEEE:
Test 1: SDR: 77.65%
MRE: 1.49 ± 1.42 mm
Test 2: SDR: 70.47%
MRE: 1.60 ± 1.3 mm
SMC dataset:
SDR:67.33%
MRE: 1.89 ± 1.5 mm
Huang et al. [54]2021GermanyLeNet-5 for ROI patches and ResNet50 for landmark location19CQ500 CTs (train) and IEEE Grand Challenge 2015 (test)741491150 + 100Yes0.5Test 1: SDR: 86.7%
Test 2: SDR: 73.7%
Hwang et al. [25]2021KoreaCustomized YOLO V319n/a21831983200Yesn/aSDR: 75.45%
MRE: 1.76 ± 2.16 mm
SDR: 38.0%
MRE: 2.75 ± 1.84 mm
Kim et al. [42]2020KoreaStacked hourglass-shaped DL model23Own dataset, IEEE Grand Challenge 20152475Training: 1675
Validation: 175
Fine-tuning: 300 (IDBI)
225 + 100 (ISBI)YesOwn dataset: 0.127-0.15
ISBI: 0.1
SDR: 84.53%
Group 1 (test set: 200): MRE: 1.37 ± 1.79 mm
0.4s
Kim et al. [55]2021KoreaMultistage CNN15Kyung Hee University Dental Hospital860690170Yes0.39SDR: 87.13%
MRE: 1.03 ± 1.288 mm
SDR: 62.64%
MRE: 2.04 ± 1.727 mm
Kim et al. [56]2021KoreaDL model with a 2-step structure: a ROI machine and a detection machine13Yonsei Dental Hospital950Training: 800
Validation: 100
50Yes0.12SDR: 64.3%
MRE: 1.84 mm
Kwon et al . [32]2021KoreaMultistage probabilistic approach based on DeepLabv319IEEE Grand Challenge 2015400Training:150
Validation: 150)
100Yes0.1Validation test: SDR: 86.91%
MRE: 1.12 mm
Test 2: SDR: 77.16%
MRE: 1.41 mm
Test 2: SDR: 85.00%
MRE: 1.20 mm
Le et al. [39]2022KoreaDeep Anatomical Context Feature Learning (DACFL)41Jeonbuk National University Dental Hospital12931193100Yesn/aSDR: 73.32%
MRE: 1.87 ± 2.04 mm
SDR: 51%
MRE: 2.70 ± 2.14 mm
Lee et al. [7]2020KoreaCustom CNN for ROI and custom Bayesian CNN for landmark detection19IEEE Grand Challenge 2015400150250Yes0.1SDR: 82.11%
MLE (landmark): 1.53 ± 1.74 mm
SDR: 63.33%
MLE: 2.39 ± 4.77 mm
512s
(with 1 GPU)
38s
(with 4 GPUs)
Lee et al [57]2022KoreaSingle-passing CNN for an accurate regression of the landmarks19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 86.42%
MRE: 1.19 ± 0.80 mm Test 2: SDR: 74.58%
Test 1: SDR: 60.67%
Test 2: SDR: 77.00%
Noothout et al. [58]2020NetherlandsCustom FCNNs based on ResNet3419IEEE Grand Challenge 2015400Training: 140
Validation: 10
150 + 100Yes0.1Test 1: SDR: 82%
Test 2: SDR: 72%
Test 1: MRE: 2.12 ± 1.83 mm
Test 2: MRE: 1.68 ± 1.61 mm
0.05 ± 0.009 s
Oh et al. [59]2021KoreaDACFL, custom FCN combined with a local feature perturbator with anatomical configuration loss19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 86.20%
Test 2: SDR: 75.89%
Test 1: SDR: 60.1%
Test 2: SDR: 83.0%
0.15s
Park et al. [60]2019KoreaYOLO V3 with modification and single shot multibox detector (SSD)80Seoul National University Dental Hospital13111028283Yes0.14YOLOv3: SDR: 80.4%YOLOv3: 0.05s
SSD: 2.89s
Qian et al. [28]2020ChinaCepha-NN, combining U-Net-shaped networks, attention mechanism, and region enhancing loss19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 87.61%
MRE: 1.15 mm
Test 2: SDR: 76.32%
MRE: 1.43 mm
Test 1: SDR: 67.33%
MRE: 1.5941 mm
Test 2: SDR: 81.00%
MRE: 1.3809 mm
Song et al. [29]2020JapanTwo-step approach: ROI extraction and ResNet5019IEEE Grand Challenge 2015, Shandong University for testing500150150 + 100 + 100 (own dataset)Yes0.1Test 1: SDR: 86.4%
MRE: 1.077 mm
Test 2: SDR: 74.0%
MRE: 1.542 mm
Own dataset: SDR: 62.0%
MRE: 2.1 mm
Test 1: SDR: 62.7%
MRE: 1.817 mm
Test 2: SDR: 75.0%
MRE: 1.431 mm
Own dataset: SDR: 51.0%
MRE: 2.4 mm
Song et al. [61]2021JapanU-Net based with encoders and decoders and a second fine detection step19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 85.2%
MRE: 1.194 mm
Test 2: SDR: 72.2%
MRE: 1.643 mm
Test 1: SDR: 60.0%
MRE: 1.966 mm
Test 2: SDR: 67.0%
MRE: 1.999 mm
4.0s
Ugurlu et al. [62]2022TurkeyCranioCatch: feature aggregation and refinement network (FARNet), CNN-based DL model21Department of Orthodontics, Faculty of Dentistry, Eskişehir Osmangazi University1620Training:
1300
Validation: 140
180Yesn/aSDR: 76.2%
MRE: 3.400 ± 1.57 mm
SDR: 48.3%
MRE: 8.304 ± 2.98 mm
Wang et al. [63]2021ChinaDCNN based on an iterative method19IEEE Grand Challenge 2015300150150Yes0.1SDR: 87.51%SDR: 74.7%20s
Yang et al [2].2023KoreaCephNet with FCN19Seoul National University1286Training: 704
Validation:
182
400Yes0.1SDR: 73.14%
MRE: 1.75 ± 1.67 mm
SDR: 74.00%
MRE: 2.13 ± 3.47 mm
Yao et al. [30]2022ChinaCNN with a global detection module and a locally modified module37Department of Oral and Maxillofacial Surgery, West China College of Stomatology, Sichuan University512Training: 312
Validation: 100
100Yes0.125Validation data: SDR: 97.30%
MRE: 1.127 ± 1.028 mm
Test: SDR: 97.30%
MRE: 1.038 ± 0.893 mm
Test set: SDR: 65%
MRE: 1.721 ± 1.325 mm
3s
Zeng et al. [64]2021ChinaCascaded three-stage CNN19IEEE Grand Challenge 2015 + own dataset (102)502150150 + 100 + 102Yes0.1Test 1: SDR: 81.37%
MRE: 1.34 ± 0.92 mm
Test 2: SDR: 70.58%
MRE: 1.64 ± 0.91 mm
Extra validation: SDR: 64.81%
MRE: 2.02 ± 1.89 mm
Test 1: SDR: 57.33%
MRE: 1.97 ± 1.10 mm
Test 2: SDR: 69.00%
MRE: 1.59 ± 1.02 mm

Zhao et al [65].2023ChinaMulti-scale YOLO V319IEEE Grand Challenge 2015400
Augmented to 2100
1950150Yes0.1SDR: 80.84%SDR: 55.33%
Zhong et al. [66]2019China2-stage (global and local) U-Net models19IEEE Grand Challenge 2015400150150 + 100Yes0.1Test 1: SDR: 86.91%
MRE: 1.12 ± 0.88 mm
Test 2: SDR: 76.00%
MRE: 1.42 ± 0.84 mm

Abbreviations: MRE, mean radial error; SDR, success detection rate.


Table 3.

Characteristics of included studies using 3D-CBCT images.

Columns: 1st author; year; country of origin; computational improvement; clinical validation; AI framework; landmarks (N); data source; total sample; train set; test set; train and test set separation; resolution (mm³/voxel); SDR within 2 mm range or MRE in mm; SDR of gonion within 2 mm range or MRE in mm; computational time.
Dot et al. [67]2022FranceDL method based on Spatial Configuration-Net (SCN)33Own dataset198Training: 128
Validation: 32
38Yes0.45SDR: 90.4%
MRE: 1.0 ± 1.3 mm
SDR: L: 70.3%
R: 48.7%
MRE L: 1.9 ± 1.7 mm
R: 2.1 ± 1.4 mm
60s
Lang et al. [23]2022USADL method extending Mask R-CNN105n/a50n/a0.4MSE (squared): 1.38 ± 0.95 mm
Lee et al. [22]2019KoreaVGG-19 (DL)7Own dataset27207Yesn/aAverage point-to-point error: 1.5 mm
Ma et al. [31]2020JapanPatch-based deep neural networks with a three-layer CNN13The University of Tokyo Hospital66588Yes0.35Average landmarking error: 5.785 ± 0.980 mmLandmarking error L: 5 mm
R: 4 mm
37.871 ± 3.766s
Yun et al. [26]2020KoreaCustom CNNs, combined skull normalization, with variational autoencoder (VAE) for coarse to fine detection tasks93Own datasets26 + 22922 + 2084 + 21Yesn/a3D point-to-point error: 3.63 mm
Yun et al. [24]2022KoreaSemi-supervised DL method90Yonsei University24159Yesn/aMDE (detection): 2.88 mm
Weingart et al. [68]2023GermanyDeep Neural Patchwork60University Hospital Freiburg301515Yesn/aSDR: 66.4%
Mean error: 1.94 ± 1.45 mm
2 min

Abbreviations: MRE, mean radial error; SDR, success detection rate.


Almost half of the 2D studies evaluated the accuracy of their AI algorithms using a public benchmark dataset from the IEEE International Symposium on Biomedical Imaging (ISBI) 2015 grand challenge [21]. This dataset consisted of 400 high-resolution lateral cephalograms (training set = 150, test set 1 = 150, test set 2 = 100) with 19 landmarks manually annotated by two experts (one junior and one senior orthodontic specialist) as the ground truth. These manually annotated landmarks serve as the reference against which an AI algorithm’s performance is measured.

The original dimensions of the images were 1935 × 2400 pixels, with a resolution of 0.1 mm per pixel in both the horizontal and vertical directions. The average intra-observer variability for these landmark points was 1.73 mm for the junior expert and 0.90 mm for the senior expert, while the inter-observer variability between the two experts was 1.38 mm, suggesting a reasonable accuracy target for automated landmark detection techniques. To compensate for inter-observer variability, the mean position of the two experts’ annotations was used as the ground truth [21]. Among the included studies, the total number of landmarks tested ranged from 7 [22] to 105 [23]. The number of training images ranged from 15 [24] to 1983 [25], while the test datasets ranged from 4 [26] to 400 images [2]. Figure 2 illustrates AI-driven automated landmark identification on a 2D cephalogram followed by manual correction by an expert, and manual identification on a 3D-CBCT image.
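As an illustration of how such a reference standard is typically constructed (the coordinates below are hypothetical, not taken from the ISBI dataset), the ground truth is the mean of the two experts’ annotations, and observer variability is expressed in millimetres using the reported 0.1 mm/pixel resolution.

```python
# Illustrative sketch: ground truth as the mean of two experts' annotations, and
# observer variability converted to mm at the reported 0.1 mm/pixel resolution.
import numpy as np

MM_PER_PIXEL = 0.1

def ground_truth(expert1_px, expert2_px):
    """Mean position of the two annotators, per landmark."""
    return (np.asarray(expert1_px, float) + np.asarray(expert2_px, float)) / 2.0

def inter_observer_mm(expert1_px, expert2_px):
    """Mean radial distance between the two annotators, in millimetres."""
    diff = np.asarray(expert1_px, float) - np.asarray(expert2_px, float)
    return float(np.linalg.norm(diff, axis=-1).mean() * MM_PER_PIXEL)

e1 = np.array([[512.0, 730.0], [900.0, 1210.0]])   # expert 1 (pixels), hypothetical
e2 = np.array([[518.0, 726.0], [894.0, 1216.0]])   # expert 2 (pixels), hypothetical
print(ground_truth(e1, e2), inter_observer_mm(e1, e2))   # variability approx. 0.78 mm
```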

Figure 2. Cephalometric landmark detection with the Romexis module, showing common landmarks for cephalometric analysis, such as nasion, orbitale, menton, pogonion, gnathion, point A, point B, anterior nasal spine, porion, sella, and gonion; (A) automated two-dimensional landmarking, (B) manual three-dimensional landmarking.

Qualitative synthesis

A qualitative synthesis of all reported data related to automated 2D and 3D landmark identification was conducted. Among the 2D studies that used only the IEEE dataset, accuracy on test set 1 ranged from 75.37% [27] to 87.61% [28], based on the SDR within the 2 mm error threshold. Conversely, the 18 studies that used their own datasets, either alone or in combination with the IEEE dataset, reported SDRs ranging from 62.0% [11, 29] to 97.30% [30] within the clinically acceptable 2 mm range.

Studies that applied automated landmarking on 3D-CBCT images reported their accuracy as either mean error (n = 7) or SDR (n = 2), with the highest observed error being 5.785 mm [31]. Of all the landmarks on 2D and 3D images, gonion was generally the most challenging to locate automatically, with SDRs within the 2 mm threshold ranging from 38.0% [25] to 85.00% [32]. The computational time for automated landmark detection was reported in 11 articles, all of which reported a timing of less than 1 min.

Table 4 presents a list of cephalometric analyses that could potentially be performed using the automated landmarking proposed in the included studies. In terms of clinical applicability, the AI algorithms for automated landmark identification used in most studies could facilitate at least the Steiner and Down analyses, owing to their ability to identify the following landmarks: sella, nasion, point A, point B, pogonion, gnathion, menton, gonion, porion, orbitale, upper incisor, and lower incisor [33, 34].
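As an example of this downstream use, the SNA, SNB, and ANB angles of the Steiner analysis can be computed directly from the detected sella, nasion, point A, and point B positions. The sketch below uses hypothetical 2D coordinates and is meant only to show how automated landmarks feed a standard analysis, not to reproduce any included study.

```python
# Illustrative sketch: Steiner SNA/SNB/ANB angles from detected 2D landmark
# positions (hypothetical coordinates in mm; units cancel in the angles).
import numpy as np

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex` between the rays towards p1 and p2."""
    v1 = np.asarray(p1, float) - np.asarray(vertex, float)
    v2 = np.asarray(p2, float) - np.asarray(vertex, float)
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

landmarks = {                     # hypothetical detected positions
    "sella":   (0.0, 0.0),
    "nasion":  (65.0, 10.0),
    "point_A": (70.0, -45.0),
    "point_B": (62.0, -80.0),
}
sna = angle_at(landmarks["nasion"], landmarks["sella"], landmarks["point_A"])
snb = angle_at(landmarks["nasion"], landmarks["sella"], landmarks["point_B"])
anb = sna - snb                   # sagittal relationship of maxilla to mandible
print(round(sna, 1), round(snb, 1), round(anb, 1))
```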

Table 4.

Potential cephalometric analysis using annotated landmarks.

Cephalometric analysis: References
Steiner: Chen et al. [3], Chen et al. [52], Hong et al. [53], Lee et al. [7], Lang et al. [23], Yun et al. [24], Hwang et al. [25], Yun et al. [26], Park et al. [60], Arik et al. [27], Qian et al. [28], Yao et al. [30], Kwon et al. [32], Lee et al. [57], Oh et al. [59], Huang et al. [54], Kim et al. [42], Kim et al. [56], Le et al. [39], Noothout et al. [58], Song et al. [29], Song et al. [61], Ugurlu et al. [62], Wang et al. [63], Yang et al. [2], Zeng et al. [64], Zhong et al. [66], Zhao et al. [65], Dot et al. [67], Weingart et al. [68]
Down: Chen et al. [3], Chen et al. [52], Hong et al. [53], Lee et al. [7], Lee et al. [22], Lang et al. [23], Yun et al. [24], Hwang et al. [25], Yun et al. [26], Park et al. [60], Arik et al. [27], Qian et al. [28], Yao et al. [30], Lee et al. [57], Oh et al. [59], Huang et al. [54], Kim et al. [42], Kim et al. [55], Kim et al. [56], Le et al. [39], Noothout et al. [58], Song et al. [29], Song et al. [61], Ugurlu et al. [62], Wang et al. [63], Yang et al. [2], Zeng et al. [64], Zhao et al. [65], Zhong et al. [66], Dot et al. [67], Weingart et al. [68]
Wits appraisal: Hong et al. [53], Lang et al. [23], Yun et al. [24], Yun et al. [26], Park et al. [60], Yao et al. [30], Le et al. [39], Dot et al. [67], Weingart et al. [68]
Tweed: Hong et al. [53], Lang et al. [23], Yun et al. [24], Yun et al. [26], Park et al. [60], Yao et al. [30], Kim et al. [55], Le et al. [39], Dot et al. [67], Weingart et al. [68]
Ballard: Hong et al. [53], Yun et al. [24], Yun et al. [26], Park et al. [60], Yao et al. [30], Bulatova et al. [51], Le et al. [39], Dot et al. [67], Weingart et al. [68]
Cephalometric AnalysisReferences
SteinerChen et al. [3], Chen et al. [52], Hong et al. [53], Lee et al. [7], Lang et al. [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Kwon et al [32], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhong et al [66], Zhao et al. [65], Dot et al [67], Weingart et al [68]
DownChen et al [3], Chen et al. [52], Hong et al. [53], Lee et al [7], Lee et al [22], Lang et al [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [55], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhao et al. [65], Zhong et al [66], Dot et al [67], Weingart et al [68]
Wits appraisalHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Le et al [39], Dot et al [67], Weingart et al [68]
TweedHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Kim et al [55], Le et al [39], Dot et al [67], Weingart et al [68]
BallardHong et al. [53], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Bulatova et al [51], Le et al [39], Dot et al [67], Weingart et al [68]
Table 4.

Potential cephalometric analysis using annotated landmarks.

Cephalometric AnalysisReferences
SteinerChen et al. [3], Chen et al. [52], Hong et al. [53], Lee et al. [7], Lang et al. [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Kwon et al [32], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhong et al [66], Zhao et al. [65], Dot et al [67], Weingart et al [68]
DownChen et al [3], Chen et al. [52], Hong et al. [53], Lee et al [7], Lee et al [22], Lang et al [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [55], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhao et al. [65], Zhong et al [66], Dot et al [67], Weingart et al [68]
Wits appraisalHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Le et al [39], Dot et al [67], Weingart et al [68]
TweedHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Kim et al [55], Le et al [39], Dot et al [67], Weingart et al [68]
BallardHong et al. [53], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Bulatova et al [51], Le et al [39], Dot et al [67], Weingart et al [68]
Cephalometric AnalysisReferences
SteinerChen et al. [3], Chen et al. [52], Hong et al. [53], Lee et al. [7], Lang et al. [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Kwon et al [32], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhong et al [66], Zhao et al. [65], Dot et al [67], Weingart et al [68]
DownChen et al [3], Chen et al. [52], Hong et al. [53], Lee et al [7], Lee et al [22], Lang et al [23], Yun et al [24], Hwang et al [25], Yun et al [26], Park et al [60], Arik et al [27], Qian et al [28], Yao et al [30], Lee et al [57], Oh et al [59], Huang et al [54], Kim et al [42], Kim et al [55], Kim et al [56], Le et al [39], Noothout et al [58], Song et al [29], Song et al [61], Ugurlu et al [62], Wang et al [63], Yang et al. [2], Zeng et al [64], Zhao et al. [65], Zhong et al [66], Dot et al [67], Weingart et al [68]
Wits appraisalHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Le et al [39], Dot et al [67], Weingart et al [68]
TweedHong et al. [53], Lang et al [23], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Kim et al [55], Le et al [39], Dot et al [67], Weingart et al [68]
BallardHong et al. [53], Yun et al [24], Yun et al [26], Park et al [60], Yao et al [30], Bulatova et al [51], Le et al [39], Dot et al [67], Weingart et al [68]

Quantitative synthesis

The meta-analysis was limited to the accuracy of 2D landmark identification because of the diverse range of study designs and reported outcomes used in 3D cephalometry. The accuracy of AI-based 2D landmark identification was evaluated in studies that reported the MRE and SE of their test datasets. A total of 14 studies providing 21 estimates were included, of which 3 studies tested their accuracy on 2 test sets and 2 studies on 3 sets. The statistical analysis indicated homogeneity among the included studies, with a Q-value of 2.53 (P > 0.05) and an I² of 0%, denoting no significant heterogeneity. A fixed-effects model was therefore employed, under the assumption that the included studies shared the same true effect size. The results indicated that the error of AI-based landmark placement generally fell below the 2 mm threshold (1.43 mm; 95% CI: 0.95–1.91 mm), with only two studies exceeding this threshold (Fig. 3).
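A typical inverse-variance fixed-effects calculation is sketched below in Python with placeholder values, purely to illustrate how a pooled estimate, its 95% CI, Cochran's Q, and I² are commonly obtained; the meta-analysis in this review may have used different software and weighting details.

    import numpy as np

    # Placeholder values, not the extracted study data: inverse-variance
    # fixed-effects pooling of per-estimate mean radial errors (MRE, mm).
    mre = np.array([1.2, 1.5, 1.8, 1.1])     # hypothetical MREs
    se = np.array([0.3, 0.4, 0.5, 0.2])      # hypothetical standard errors

    w = 1.0 / se**2                           # inverse-variance weights
    pooled = np.sum(w * mre) / np.sum(w)      # fixed-effects pooled MRE
    pooled_se = np.sqrt(1.0 / np.sum(w))
    ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

    q = np.sum(w * (mre - pooled) ** 2)       # Cochran's Q
    df = len(mre) - 1
    i2 = max(0.0, (q - df) / q) * 100         # I^2: variability beyond chance (%)

    print(f"Pooled MRE {pooled:.2f} mm (95% CI {ci_low:.2f}-{ci_high:.2f}); Q={q:.2f}, I2={i2:.0f}%")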

Figure 3. Forest plot of automated landmark identification on 2D cephalograms, reporting accuracy as mean radial error (MRE) and standard error (SE) (mm). Studies using multiple test datasets are indicated accordingly. The horizontal line indicates the 95% confidence interval (CI), the square indicates the SE, and the diamond indicates the pooled subtotal.

Risk of bias assessment

When applying the QUADAS-2 tool, 'AI-driven cephalometric landmark detection' served as the index test domain and 'manual landmark placement by experts' as the reference standard domain. Most studies carried a high risk of bias in data selection (93%), primarily because the authors did not employ a randomized selection process. A high risk of bias was also present with respect to the reference standard. Overall, applicability concerns for the included studies were high, with the exception of the index test domain. Fig. 4 provides a comprehensive overview of the risk of bias and applicability concerns.

Figure 4. Risk of bias and applicability concerns based on the Quality Assessment of Diagnostic Accuracy Studies-2 tool.

Discussion

In the digital age, with the rise of precision dentistry, workflows in dentomaxillofacial practice are increasingly streamlined through the incorporation of AI-based technologies. This systematic review and meta-analysis was conducted to evaluate the accuracy of AI-powered tools in automating 2D and 3D cephalometric landmarking. A significant portion of the included studies originated from East Asia (76.5%), with less representation from Europe and America. This trend can be attributed to various factors, such as the rapid advancement of technology and significant investment in AI research in East Asia, as emphasized in reports by the Organisation for Economic Co-operation and Development (OECD) and the Government AI Readiness Index. The position of East Asia as a leading global centre for AI innovation is evident from its extensive production of AI-related publications and its high ranking in the Government AI Readiness Index. These factors highlight East Asia's crucial role in propelling AI research and development [35, 36]. Nevertheless, it is crucial to ensure a broad spectrum of viewpoints and contributions in AI research, as this can result in more holistic and inclusive solutions. Therefore, international collaborative research is called for to ensure the universal applicability of AI technologies across varied patient demographics.

The findings of the review suggested variability in the accuracy of landmark detection amongst the different studies. This could be attributed to differences in the sample size of the training sets, where large heterogeneous samples with anatomical variability are expected to provide a more comprehensive learning process and thereby better accuracy [37]. Moreover, each study used a distinct dataset for testing, separate from the one used for training. This is standard practice in the evaluation of AI models: it ensures that the models are tested on data not encountered during training, thus minimizing the chance of overfitting and providing a solid assessment of their ability to generalize [38]. However, it is important to note that these studies did not provide detailed descriptions of the specific methods employed to select the subjects included in the test datasets.

Most 2D cephalometric studies used the publicly accessible IEEE dataset to train AI algorithms, with the aim of enhancing the accuracy and efficiency of automatic landmark identification through computational improvements. Although the IEEE dataset offers the advantage of standardized performance comparability, it also introduces the challenge of limited generalizability, making the clinical applicability of such AI tools questionable [39]. This issue was corroborated by the included studies that emphasised clinical validation: these studies used their own datasets and demonstrated lower accuracy compared with those that focused on computational enhancement using the IEEE dataset [29]. Therefore, future research should utilize multi-centre datasets with varying acquisition parameters for clinical validation. This approach could enhance the consistency and robustness of AI-driven solutions and address generalisability issues, which are crucial for clinical applicability [40].

In the field of AI, particularly in the context of medical imaging and analysis, the complexity of a dataset is also determined by several other factors, such as the size and shape of the anatomical structures, age, gender, type of malocclusion, ethnic background, and bone density [41, 42]. These characteristics introduce a wide range of variations that the AI system must be able to recognize and interpret correctly [43]. This requires sophisticated algorithms and a large amount of diverse training data. The more complex the dataset, the more challenging it can be for the AI to learn and make accurate predictions [37, 38, 44]. For instance, Tanikawa et al. [44] demonstrated that the performance of AI-driven automated landmarking was lower in patients with cleft palate compared with those without this condition. When training AI algorithms for cephalometric landmark detection, it is therefore crucial to understand that the robustness and accuracy of the algorithm depend on its adaptability to the variations encountered in clinical practice.

When comparing the included studies, a negative correlation was found between the reported accuracy and the size of the training data. For instance, Hwang et al. trained their AI algorithm with 1983 2D images, each containing 19 annotated landmarks, and observed an accuracy of 73.2% [25]. In contrast, Lee et al. used a training set of 150 images and achieved a higher accuracy of 86% [22]. This inconsistency could be associated with the variability of the training and testing sets: Lee et al. relied on the IEEE dataset, consisting of patients without craniofacial deformities and with similar radiological patterns in both the training and testing sets [43]. On the other hand, Hwang et al.'s testing datasets included patients with variable, heterogeneous radiological patterns, which the AI algorithm might not have accurately identified given the homogeneous dataset used for training.

Most 2D studies reported an accuracy of more than 80% within the 2 mm error threshold, while the mean error of 3D landmark detection ranged between approximately 1.0 mm and 5.8 mm. It is important to note that landmarking accuracy cannot be directly compared between these two types of datasets. In 2D imaging, landmarks are projected onto a single plane, simplifying the identification process and often leading to higher reported accuracy rates within the given error threshold. Conversely, 3D landmark detection involves identifying points within a volumetric space, which introduces additional complexity and challenges [24, 45]. Despite achieving high accuracy levels, the performance of AI has not yet reached the level of an expert, and further improvements are anticipated, especially in the realm of 3D landmark detection, an area in which only a limited number of cases were used for the training and testing of AI algorithms in the reviewed studies. Given the challenge of accurately detecting landmarks in three dimensions using small datasets while still maintaining high accuracy within the 2 mm error threshold, additional studies with larger sample sizes are advisable [31].

The findings of the included studies were compared against a threshold of 2 mm, which is generally accepted as clinically acceptable for most cephalometric measurements [6]. This tolerance for error is justified by the inherent limitations of 2D imaging, in which the majority of cephalometric points are projected onto a single plane in the presence of right-left asymmetry. Clinicians typically estimate a midpoint between the projections of paired cephalometric points to establish the references for the cephalometric analysis. Although 3D imaging avoids geometric distortion, precise segmentation from CBCT has yet to be standardised in semiautomatic workflows [46, 47]. A discrepancy of even 2 mm can have significant implications, especially when dealing with smaller patient sizes or specific landmarks, which is why it is crucial to strive for the highest accuracy possible in these situations. It is worth noting that, while a certain level of error might be deemed acceptable by clinical standards, the goal should always be to minimize it as much as possible to ensure the best patient outcomes. Moreover, clinicians are cognizant of potential errors in the placement of landmarks, which are typically taken into account subjectively during the clinical interpretation of the analysis and the patient's treatment planning [6].

The selection of the cephalometric landmarks included in our review was primarily based on their widespread use in orthodontics and their clinical relevance [32, 33]. Among the different annotated landmarks on both 2D and 3D images, gonion was generally one of the most difficult landmarks to localise automatically and had the lowest detection rate. Identifying gonion appears to pose a significant challenge not only for AI algorithms but also for human observers. This is primarily because this landmark is a constructed point on the 2D cephalogram, resulting from the imperfect overlay of the bilateral aspects of the mandible. Additionally, the 3D error may be a consequence of discrepancies in volumetric segmentation or of difficulties in determining the definitive vertical position of gonion along broadly curved structures, a problem also commonly encountered by human observers [48]. Hence, it is important to take these limitations into consideration when training an AI algorithm so as to improve its performance. It is worth noting that the accuracy of landmark identification is heavily dependent on the expertise and anatomical knowledge of the annotating experts [6]. Consequently, the experts responsible for creating training datasets should have substantial experience in this field. A low detection rate for the gonion point might diminish the overall measured performance of the AI tool; hence, it is essential to address this issue to ensure the tool's effectiveness.

Regarding clinical applicability, the AI algorithms discussed in most studies have demonstrated the ability to identify key landmarks commonly used in two of the most prevalent cephalometric analyses, namely the Steiner and Down analyses [33, 34]. This suggests that current AI tools could be considered clinically applicable for cases requiring orthodontic diagnosis and treatment planning. However, caution is advised, as their accuracy has not yet reached the level of an expert, which could lead to errors in cephalometric analysis for diagnostics, planning, or outcome evaluation. Furthermore, time consumption is another important parameter to be considered in clinical practice. While it takes an expert approximately 20 min to manually identify cephalometric landmarks [37], most AI-based algorithms can do so in less than a minute. Nonetheless, further research is needed to enhance the accuracy of AI, as time efficiency alone is not sufficient justification until accuracy comparable to that of an expert can be achieved. A beneficial addition to AI algorithms would be the ability to flag landmarks that increase time consumption or are incorrectly identified; this would allow manual intervention to correct these errors and retraining of the algorithm on the corrected data [39]. The incorporation of such human-AI collaboration for error correction should be considered in future studies.

This review encountered several limitations. First, the number of included studies was relatively limited, particularly those related to 3D landmarking. Second, owing to the variability in datasets, imaging parameters, and algorithms, the results of the quantitative synthesis should be interpreted with caution. Third, a significant risk of bias was observed in patient selection: while a few studies provided detailed information about patient selection, the majority relied on the IEEE dataset without explicitly outlining their sampling procedures [16, 38]. Finally, the manual identification of landmarks for the training set is subject to both inter- and intra-observer variability [49]. Hence, it is advisable to specify the training and calibration protocol for landmark identification when creating the ground truth for training. Future studies should also adhere to AI reporting standards such as CONSORT-AI and SPIRIT-AI [50].

Conclusions

AI-driven cephalometric landmark detection on 2D and 3D images exhibited high accuracy and time efficiency. Although the majority of 2D studies indicated superior automated landmark detection performance, the error rates reported by 3D studies were inconsistent, implying a need for further improvement. Moreover, clinicians are advised to remain vigilant owing to the risk of inaccurate landmark identification. To enhance the generalisability and clinical applicability of AI models, datasets should be broadened to include a more diverse range of data. The incorporation of AI-driven landmark identification in further studies could accelerate its refinement and overall development, thereby setting the stage for its potential to replace manual landmarking.

Conflict of interest

The authors declare no conflict of interest.

Funding

No funding was received.

Ethics approval

This study did not require ethics approval.

Data availability

The data supporting this article will be shared upon reasonable request to the corresponding author.

References

1. Lavergne J, Gasson N. Les classes II basales étudiées par les réseaux neuronaux. L'Orthodontie Française 2008;79:91–7. https://doi.org/10.1051/orthodfr:2008004
2. Yang S, Eun Sun S, Lee ES, et al. Ceph-Net: automatic detection of cephalometric landmarks on scanned lateral cephalograms from children and adolescents using an attention-based stacked regression network. BMC Oral Health 2023;23:1–17. https://doi.org/10.1186/s12903-023-03452-7
3. Chen R, Ma Y, Chen N, et al. Cephalometric landmark detection by attentive feature pyramid fusion and regression-voting. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Cham: Springer International Publishing, 2019;873–81. https://doi.org/10.1007/978-3-030-32248-9_97
4. Ye H, Cheng Z, Ungvijanpunya N, et al. Is automatic cephalometric software using artificial intelligence better than orthodontist experts in landmark identification? BMC Oral Health 2023;23:467. https://doi.org/10.1186/s12903-023-03188-4
5. Lagravere M, Low C, Flores-Mir C, et al. Intraexaminer and interexaminer reliabilities of landmark identification on digitized lateral cephalograms and formatted 3-dimensional cone-beam computerized tomography images. American Journal of Orthodontics and Dentofacial Orthopedics 2010;137:598–604. https://doi.org/10.1016/j.ajodo.2008.07.018
6. Durao AP, Morosolli A, Pittayapat P, et al. Cephalometric landmark variability among orthodontists and dentomaxillofacial radiologists: a comparative study. Imaging Science in Dentistry 2015;45:213–20. https://doi.org/10.5624/isd.2015.45.4.213
7. Lee JH, Yu HJ, Kim MJ, et al. Automated cephalometric landmark detection with confidence regions using Bayesian convolutional neural networks. BMC Oral Health 2020;20:270. https://doi.org/10.1186/s12903-020-01256-7
8. Schwendicke F, Chaurasia A, Arsiwala L, et al. Deep learning for cephalometric landmark detection: systematic review and meta-analysis. Clinical Oral Investigations 2021;25:4299–309. https://doi.org/10.1007/s00784-021-03990-w
9. Schwendicke F, Samek W, Krois J. Artificial intelligence in dentistry: chances and challenges. Journal of Dental Research 2020;99:769–74. https://doi.org/10.1177/0022034520915714
10. Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science 2021;2:420. https://doi.org/10.1007/s42979-021-00815-1
11. Chen YJ, Chen SK, Huang HW, et al. Reliability of landmark identification in cephalometric radiography acquired by a storage phosphor imaging system. DentoMaxilloFacial Radiology 2004;33:301–6. https://doi.org/10.1259/dmfr/85147715
12. Mahto RK, Kafle D, Giri A, et al. Evaluation of fully automated cephalometric measurements obtained from web-based artificial intelligence driven platform. BMC Oral Health 2022;22:132. https://doi.org/10.1186/s12903-022-02170-w
13. Serafin M, Baldini B, Cabitza F, et al. Accuracy of automated 3D cephalometric landmarks by deep learning algorithms: systematic review and meta-analysis. La Radiologia Medica 2023;128:544–55. https://doi.org/10.1007/s11547-023-01629-2
14. Pittayapat P, Bornstein MM, Imada TS, et al. Accuracy of linear measurements using three imaging modalities: two lateral cephalograms and one 3D model from CBCT data. European Journal of Orthodontics 2015;37:202–8. https://doi.org/10.1093/ejo/cju036
15. Baldini B, Cavagnetto D, Baselli G, et al. Cephalometric measurements performed on CBCT and reconstructed lateral cephalograms: a cross-sectional study providing a quantitative approach of differences and bias. BMC Oral Health 2022;22:98. https://doi.org/10.1186/s12903-022-02131-3
16. Wang RH, Ho CT, Lin HH, et al. Three-dimensional cephalometry for orthognathic planning: normative data and analyses. Journal of the Formosan Medical Association 2020;119:191–203. https://doi.org/10.1016/j.jfma.2019.04.001
17. Nalcaci R, Ozturk F, Sokucu O. A comparison of two-dimensional radiography and three-dimensional computed tomography in angular cephalometric measurements. Dento Maxillo Facial Radiology 2010;39:100–6. https://doi.org/10.1259/dmfr/82724776
18. Page MJ, Moher D, Bossuyt PM, et al. Explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 2021;372:n160. https://doi.org/10.1136/bmj.n160
19. Whiting PF, Rutjes AWS, Westwood ME, et al.; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine 2011;155:529–36. https://doi.org/10.7326/0003-4819-155-8-201110180-00009
20. Reitsma JB, Rutjes A, Whiting P, et al. Chapter 8: Assessing risk of bias and applicability. In: Deeks JJ, Bossuyt PM, Leeflang MM, Takwoingi Y (eds). Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. Version 2.0 (updated July 2023). Cochrane, 2023. https://training.cochrane.org/handbook-diagnostic-test-accuracy/current
21. Wang CW, Huang CT, Lee JH, et al. A benchmark for comparison of dental radiography analysis algorithms. Medical Image Analysis 2016;31:63–76. https://doi.org/10.1016/j.media.2016.02.004
22. Lee SM, Kim HP, Jeon K, et al. Automatic 3D cephalometric annotation system using shadowed 2D image-based machine learning. Physics in Medicine and Biology 2019;64:055002. https://doi.org/10.1088/1361-6560/ab00c9
23. Lang Y, Lian C, Xiao D, et al. Localization of craniomaxillofacial landmarks on CBCT images using 3D mask R-CNN and local dependency learning. IEEE Transactions on Medical Imaging 2022;41:2856–66. https://doi.org/10.1109/TMI.2022.3174513
24. Yun HS, Hyun CM, Baek SH, et al. A semi-supervised learning approach for automated 3D cephalometric landmark identification using computed tomography. PLoS One 2022;17:e0275114. https://doi.org/10.1371/journal.pone.0275114
25. Hwang HW, Moon JH, Kim MG, et al. Evaluation of automated cephalometric analysis based on the latest deep learning method. The Angle Orthodontist 2021;91:329–35. https://doi.org/10.2319/021220-100.1
26. Yun HS, Jang TJ, Lee SM, et al. Learning-based local-to-global landmark annotation for automatic 3D cephalometry. Physics in Medicine and Biology 2020;65:085018. https://doi.org/10.1088/1361-6560/ab7a71
27. Arik SO, Ibragimov B, Xing L. Fully automated quantitative cephalometry using convolutional neural networks. Journal of Medical Imaging (Bellingham) 2017;4:014501. https://doi.org/10.1117/1.JMI.4.1.014501
28. Qian JH, Luo WZ, Cheng M, et al. CephaNN: a multi-head attention network for cephalometric landmark detection. IEEE Access 2020;8:112633–641.
29. Song Y, Qiao X, Iwamoto Y, et al. Automatic cephalometric landmark detection on X-ray images using a deep-learning method. Applied Sciences-Basel 2020;10:2547. https://doi.org/10.3390/app10072547
30. Yao J, Zeng W, He T, et al. Automatic localization of cephalometric landmarks based on convolutional neural network. American Journal of Orthodontics and Dentofacial Orthopedics 2022;161:e250–9. https://doi.org/10.1016/j.ajodo.2021.09.012
31. Ma Q, Kobayashi E, Fan B, et al. Automatic 3D landmarking model using patch-based deep neural networks for CT image of oral and maxillofacial surgery. The International Journal of Medical Robotics 2020;16:2093. https://doi.org/10.1002/rcs.2093
32. Kwon H, Koo H, Park J, et al. Multistage probabilistic approach for the localization of cephalometric landmarks. IEEE Access 2021;9:21306–14.
33. Steiner CC. Cephalometrics for you and me. American Journal of Orthodontics 1953;39:729–55. https://doi.org/10.1016/0002-9416(53)90082-7
34. Downs W. The role of cephalometrics in orthodontic case analysis and diagnosis. American Journal of Orthodontics 1952;38:20.
35. Addo PM. Artificial intelligence, developing-country science and bilateral co-operation. In: OECD, Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research. Paris: OECD Publishing, 2023. https://doi.org/10.1787/4edb761e-en
36. Hankins E, Nettel PF, Martinescu L, et al. Government AI Readiness Index 2023. Oxford Insights, 2023. https://oxfordinsights.com/ai-readiness/ai-readiness-index/
37. Renard F, Guedria S, Palma N, et al. Variability and reproducibility in deep learning for medical image segmentation. Scientific Reports 2020;10:13724. https://doi.org/10.1038/s41598-020-69920-0
38. Eche T, Schwartz LH, Mokrane FZ, et al. Toward generalizability in the deployment of artificial intelligence in radiology: role of computation stress testing to overcome underspecification. Radiology Artificial Intelligence 2021;3:210097. https://doi.org/10.1148/ryai.2021210097
39. Le VNT, Kang J, Oh IS, et al. Effectiveness of human-artificial intelligence collaboration in cephalometric landmark detection. Journal of Personalized Medicine 2022;12:387. https://doi.org/10.3390/jpm12030387
40. Tanikawa C, Oka A, Lim J, et al. Clinical applicability of automated cephalometric landmark identification: Part II - Number of images needed to re-learn various quality of images. Orthodontics and Craniofacial Research 2021;24:53–8. https://doi.org/10.1111/ocr.12511
41. Darkwah WK, Kadri A, Adormaa BB, et al. Cephalometric study of the relationship between facial morphology and ethnicity: review article. Translational Research in Anatomy 2018;12:20–4. https://doi.org/10.1016/j.tria.2018.07.001
42. Kim H, Shim E, Park J, et al. Web-based fully automated cephalometric analysis by deep learning. Computer Methods and Programs in Biomedicine 2020;194:105513. https://doi.org/10.1016/j.cmpb.2020.105513
43. Wang CW, Huang CT, Hsieh MC, et al. Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images: a grand challenge. IEEE Transactions on Medical Imaging 2015;34:1890–900. https://doi.org/10.1109/TMI.2015.2412951
44. Tanikawa C, Lee C, Lim J, et al. Clinical applicability of automated cephalometric landmark identification: Part I - Patient-related identification errors. Orthodontics and Craniofacial Research 2021;24:43–52. https://doi.org/10.1111/ocr.12501
45. Schlicher W, Nielsen I, Huang JC, et al. Consistency and precision of landmark identification in three-dimensional cone beam computed tomography scans. European Journal of Orthodontics 2012;34:263–75. https://doi.org/10.1093/ejo/cjq144
46. Verhelst PJ, Smolders A, Beznik T, et al. Layered deep learning for automatic mandibular segmentation in cone-beam computed tomography. Journal of Dentistry 2021;114:103786. https://doi.org/10.1016/j.jdent.2021.103786
47. Preda F, Morgan N, Van Gerven A, et al. Deep convolutional neural network-based automated segmentation of the maxillofacial complex from cone-beam computed tomography: a validation study. Journal of Dentistry 2022;124:104238. https://doi.org/10.1016/j.jdent.2022.104238
48. Chien PC, Parks E, Eraso F, et al. Comparison of reliability in anatomical landmark identification using two-dimensional digital cephalometrics and three-dimensional cone beam computed tomography in vivo. Dento Maxillo Facial Radiology 2009;38:262–73. https://doi.org/10.1259/dmfr/81889955
49. De Queiroz Tavares Borges Mesquita G, Vieira WA, Vidigal MTC, et al. Artificial intelligence for detecting cephalometric landmarks: a systematic review and meta-analysis. Journal of Digital Imaging 2023;36:1158–79. https://doi.org/10.1007/s10278-022-00766-w
50. Schwendicke F, Krois J. Better reporting of studies on artificial intelligence: CONSORT-AI and beyond. Journal of Dental Research 2021;100:677–80. https://doi.org/10.1177/0022034521998337
51. Bulatova G, Kusnoto B, Grace V, et al. Assessment of automatic cephalometric landmark identification using artificial intelligence. Orthodontics and Craniofacial Research 2021;24:37–42. https://doi.org/10.1111/ocr.12542
52. Chen J, Che H, Sun J, et al. An automatic cephalometric landmark detection method based on heatmap regression and Monte Carlo dropout. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2023;2023:1–4. http://doi.org/10.2139/ssrn.4184448
53. Hong W, Kim SM, Choi J, et al. Automated cephalometric landmark detection using deep reinforcement learning. The Journal of Craniofacial Surgery 2023;34:2336–42. https://doi.org/10.1097/SCS.0000000000009685
54. Huang Y, Fan F, Syben C, et al. Cephalogram synthesis and landmark detection in dental cone-beam CT systems. Medical Image Analysis 2021;70:102028. https://doi.org/10.1016/j.media.2021.102028
55. Kim MJ, Liu Y, Oh SH, et al. Automatic cephalometric landmark identification system based on the multi-stage convolutional neural networks with CBCT combination images. Sensors (Basel) 2021;21:505. https://doi.org/10.3390/s21020505
56. Kim YH, Lee C, Ha EG, et al. A fully deep learning model for the automatic identification of cephalometric landmarks. Imaging Science in Dentistry 2021;51:299–306. https://doi.org/10.5624/isd.20210077
57. Lee M, Chung M, Shin YG. Cephalometric landmark detection via global and local encoders and patch-wise attentions. Neurocomputing 2022;470:182–9. https://doi.org/10.1016/j.neucom.2021.11.003
58. Noothout JMH, De Vos BD, Wolterink JM, et al. Deep learning-based regression and classification for automatic landmark localization in medical images. IEEE Transactions on Medical Imaging 2020;39:4011–22. https://doi.org/10.1109/TMI.2020.3009002
59. Oh K, Oh IS, Le VNT, et al. Deep anatomical context feature learning for cephalometric landmark detection. IEEE Journal of Biomedical and Health Informatics 2021;25:806–17. https://doi.org/10.1109/JBHI.2020.3002582
60. Park JH, Hwang HW, Moon JH, et al. Automated identification of cephalometric landmarks: Part 1 - Comparisons between the latest deep-learning methods YOLOV3 and SSD. The Angle Orthodontist 2019;89:903–9. https://doi.org/10.2319/022019-127.1
61. Song Y, Qiao X, Iwamoto Y, et al. An efficient deep learning based coarse-to-fine cephalometric landmark detection method. IEICE Transactions on Information and Systems 2021;E104.D:1359–66. https://doi.org/10.1587/transinf.2021edp7001
62. Uğurlu M. Performance of a convolutional neural network-based artificial intelligence algorithm for automatic cephalometric landmark detection. Turkish Journal of Orthodontics 2022;35:94–100. https://doi.org/10.5152/TurkJOrthod.2022.22026
63. Wang L, Ma L, Li Y, et al. A DCNN system based on an iterative method for automatic landmark detection in cephalometric X-ray images. Biomedical Signal Processing and Control 2021;68:102757. https://doi.org/10.1016/j.bspc.2021.102757
64. Zeng M, Yan Z, Liu S, et al. Cascaded convolutional networks for automatic cephalometric landmark detection. Medical Image Analysis 2021;68:101904. https://doi.org/10.1016/j.media.2020.101904
65. Zhao C, Yuan Z, Luo S, et al. Automatic recognition of cephalometric landmarks via multi-scale sampling strategy. Heliyon 2023;9:e17459. https://doi.org/10.1016/j.heliyon.2023.e17459
66. Zhong Z, Li J, Zhang Z, et al. An attention-guided deep regression model for landmark detection in cephalograms. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Cham: Springer International Publishing, 2019;540–8. https://doi.org/10.1007/978-3-030-32226-7_60
67. Dot G, Schouman T, Chang S, et al. Automatic 3-dimensional cephalometric landmarking via deep learning. Journal of Dental Research 2022;101:1380–7. https://doi.org/10.1177/00220345221112333
68. Weingart JV, Schlager S, Metzger MC, et al. Automated detection of cephalometric landmarks using deep neural patchworks. Dento Maxillo Facial Radiology 2023;52:20230059. https://doi.org/10.1259/dmfr.20230059
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.