An artificial intelligence model that was trained on pregnancy outcomes for embryo viability assessment is highly correlated with Gardner score

Presented at: ESHRE Virtual 37th Annual Meeting

Authors: S. Diakiw1, M. VerMilyea2,3, J.M.M. Hall1,4, K. Sorby5, T. Nguyen1, M.A. Dakka1, D. Perugini1, M. Perugini1. (1Presagen, Life Whisperer, Adelaide, Australia, 2Ovation Fertility, Laboratory, Austin, U.S.A., 3Texas Fertility Center, IVF Laboratory, Austin, U.S.A., 4Australia/Australian Research Council Centre of Excellence for Nanoscale BioPhotonics, The University of Adelaide, Adelaide, Australia, 5No.1 Fertility, Melbourne, Australia)

Study question: Do AI models used to assess embryo viability (based on pregnancy outcomes) also correlate with known embryo quality measures such as Gardner score?

Summary answer: An AI for embryo viability assessment also correlates with Gardner score, further substantiating the use of AI for assessment and selection of good quality embryos.

What is known already: The Gardner score consists of three separate components of embryo morphology that are graded individually, then combined to give a final score describing Day 5 embryo (blastocyst) quality. Evidence suggests the Gardner score has some correlation with clinical pregnancy. We hypothesised that an AI model trained to evaluate likelihood of clinical pregnancy based on fetal heartbeat (in clinical use globally) would also correlate with components of the Gardner score itself. We also compared the ability of the AI and Gardner score to predict pregnancy outcomes.

Study design, size, duration: This study involved analysis of a prospectively collected dataset of single static Day 5 embryo images with associated Gardner scores and AI viability scores. The dataset comprised time-lapse images of 1,485 embryos (EmbryoScope) from 638 consecutive patients treated at a single IVF clinic between November 2019 and December 2020. The AI was not trained on data from this clinic.

Participants/materials, setting, methods: Average patient age was 35.4 years, and average embryo cohort size was 2.3/patient. There were 77 (28.8%) successful pregnancies from 267 single embryo transfers. Embryologists manually graded each embryo using the Gardner method, then subsequently used the AI to obtain a score between 0 (predicted non-viable, unlikely to lead to a pregnancy) and 10 (predicted viable, likely to lead to a pregnancy). Correlation between the AI viability score and Gardner score was then assessed.
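The abstract does not name the correlation statistic used. As a minimal sketch, assuming a rank-based measure such as Spearman's rho is appropriate for an ordinal grade paired with a continuous AI score, the assessment could look like the following. The function names and the example values are hypothetical and are not study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustration only: expansion grades (3-6) and AI scores (0-10).
expansion_grades = [3, 3, 4, 4, 5, 5, 6, 6]
ai_scores = [2.1, 3.4, 4.0, 5.2, 5.9, 6.5, 7.8, 8.3]
rho = spearman(expansion_grades, ai_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is used here because Gardner grades are ordered categories rather than measurements on a linear scale.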

Main results and the role of chance: The average AI score was significantly correlated with the three components of the Gardner score: expansion grade, inner cell mass (ICM) grade, and trophectoderm (TE) grade. Average AI score generally increased with advancing blastocyst developmental stage.

Blastocysts with expansion grades of ≥3 are generally considered suitable for transfer. This study showed that embryos with expansion grade 3 had lower AI scores than those with grades 4-6, consistent with a reduced pregnancy rate. AI correlation with TE grade was more significant than with ICM grade, consistent with studies demonstrating that TE grade is more important than ICM in determining likelihood of clinical pregnancy.

The AI predicted Gardner scores of ≥2BB with an accuracy of 71.7% (sensitivity 75.1%, specificity 45.9%), and an AUC of 0.68. However, when used to predict pregnancy outcome, the AI performed 27.9% better than the Gardner score in relative terms (accuracies of 49.8% and 39.0%, respectively).
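For readers less familiar with these measures, the sketch below shows how accuracy, sensitivity and specificity follow from a 2x2 confusion matrix, and how a relative improvement between two accuracies is computed. The confusion-matrix counts are hypothetical (the abstract does not report them), and the function names are illustrative only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct calls / all calls
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def relative_improvement(new, baseline):
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# Hypothetical counts, NOT taken from the study.
metrics = classification_metrics(tp=60, fp=20, tn=15, fn=5)
print(metrics)

# Relative improvement using the rounded accuracies quoted in the abstract.
gain = relative_improvement(49.8, 39.0)
print(f"relative improvement = {gain:.1%}")
```

With the rounded accuracies quoted above, this calculation gives roughly 27.7%; the small difference from the reported 27.9% is presumably due to rounding of the published figures.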

Even though the AI is highly correlated with the Gardner score, the improved efficacy for predicting pregnancy suggests that a) the AI provides an advantage in standardization of scoring over the manual and subjective Gardner method, and b) the AI is likely identifying and evaluating morphological features of embryo quality that are not captured by the Gardner method.

Limitations, reasons for caution: The Gardner score is not a linear score, creating challenges with setting a suitable cut-off relating to the prediction of pregnancy. The 2BB cut-off was chosen based on the literature (Munne et al., 2019) and verification from experienced embryologists. This correlative study may also require confirmation in additional studies on independent datasets.
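To illustrate why a cut-off on a non-linear, three-part score needs an explicit rule, the sketch below parses a Gardner score into its components and applies one possible reading of "≥2BB". It assumes that the cut-off means an expansion grade of at least 2 with both ICM and TE grades of B or better, and that letter grades run from A (best) to C. The abstract does not spell out this rule, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class GardnerScore(NamedTuple):
    expansion: int  # blastocyst expansion grade, 1-6
    icm: str        # inner cell mass grade, 'A' (best) to 'C'
    te: str         # trophectoderm grade, 'A' (best) to 'C'


_PATTERN = re.compile(r"^([1-6])([ABC])([ABC])$")


def parse_gardner(score: str) -> GardnerScore:
    """Split a score such as '4AB' into expansion, ICM and TE grades."""
    match = _PATTERN.match(score.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised Gardner score: {score!r}")
    return GardnerScore(int(match.group(1)), match.group(2), match.group(3))


def meets_cutoff(score: str, cutoff: str = "2BB") -> bool:
    """ASSUMED rule: every component must be at least as good as the cut-off."""
    s, c = parse_gardner(score), parse_gardner(cutoff)
    # 'A' sorts before 'B' alphabetically, so a better letter grade is <=.
    return s.expansion >= c.expansion and s.icm <= c.icm and s.te <= c.te


for example in ["4AB", "3BC", "2BB", "1AA"]:
    print(example, meets_cutoff(example))
```

Under this assumed rule, a score such as 3BC fails the cut-off despite its higher expansion grade, because its TE grade is below B; other readings of the cut-off would classify such embryos differently.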

Wider implications of the findings: The correlation between AI and known features of embryo quality (Gardner score) substantiates the use of the AI for embryo assessment. The AI score provides further insight into components of the Gardner score, and may detect morphological features related to clinical pregnancy beyond those evaluated by the Gardner method.