Large-scale simulation of pregnancy rate improvements using an AI model for embryo ranking

Presented at: Milan, Italy

Authors: Justina Hyunjii Cho, Camelia Brumar, Paxton Maeder-York, Oleksii Barash, Jonas Malmsten, Nikica Zaninovic, Denny Sakkas, Kathleen Miller, Michael Levy, Matthew David VerMilyea, Kevin Loewke

Study question:

What is the expected improvement in pregnancy rates using an AI model for embryo ranking compared to manual grading systems?

Summary answer:

A large-scale retrospective bootstrapped analysis shows that use of an AI model for embryo ranking can improve pregnancy rates compared to manual grading.

What is known already:

Embryo evaluation is one of the most important steps of an in vitro fertilization (IVF) procedure. Recently, artificial intelligence (AI) models have been developed to automate embryo analysis and reduce the subjectivity of manual grading. While models are often evaluated in terms of classification accuracy or area under the curve (AUC), a more relevant metric is improvement in pregnancy rates. Here we evaluate a previously developed model using a large-scale bootstrapped analysis of virtual patient pregnancy rates and compare its performance to manual grading.

Study design, size, duration:

Historical, de-identified images of transferred blastocyst-stage embryos and manual morphology grades were collected from 11 IVF clinics in the United States for cycles started between 2015 and 2020. Images were captured on day 5, 6, or 7 using an inverted microscope prior to biopsy or freeze. A total of 1,776 test set images from 3-fold cross-validation were used for this analysis.

Participants/materials, setting, methods:

Embryos were matched by age, PGT status, and race to create 16 distinct categories. Virtual patient panels were created within each category using a random selection of 3-5 embryos. Embryos were re-used across different panels, but each individual panel was unique. Three different manual ranking systems were created incorporating the morphology grade and day of image capture. The AI and one randomly chosen manual ranking system independently selected a top embryo for each panel.
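The panel construction and selection procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Embryo` fields, the synthetic data generator, the numeric scores, and the choice to draw the manual ranking system at random once per panel are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Embryo:
    embryo_id: int
    category: int          # hypothetical index of the matched group (age, PGT status, race)
    ai_score: float        # hypothetical AI ranking score
    manual_scores: tuple   # hypothetical scores from three manual ranking systems
    pregnant: bool         # recorded outcome of the transfer


def make_synthetic_embryos(n, n_categories, seed):
    """Generate purely synthetic embryos for demonstration (not study data)."""
    rng = random.Random(seed)
    embryos = []
    for i in range(n):
        quality = rng.random()  # latent quality, an assumption of this sketch
        embryos.append(Embryo(
            embryo_id=i,
            category=i % n_categories,
            ai_score=quality + rng.gauss(0, 0.2),
            manual_scores=tuple(quality + rng.gauss(0, 0.4) for _ in range(3)),
            pregnant=rng.random() < quality,
        ))
    return embryos


def build_panels(embryos, n_panels, rng):
    """Draw unique panels of 3-5 embryos, each panel from a single category."""
    by_category = {}
    for e in embryos:
        by_category.setdefault(e.category, []).append(e)
    # Only categories with at least 3 embryos can supply a panel.
    pools = [pool for pool in by_category.values() if len(pool) >= 3]
    panels = {}
    attempts = 0
    while len(panels) < n_panels and attempts < 20 * n_panels:
        attempts += 1
        pool = rng.choice(pools)
        size = rng.randint(3, min(5, len(pool)))
        members = rng.sample(pool, size)
        # Embryos may be re-used across panels, but each panel must be unique.
        key = frozenset(e.embryo_id for e in members)
        panels.setdefault(key, members)
    return list(panels.values())


def simulate(embryos, n_panels, seed):
    """Run one repetition and return the pregnancy rate of each method's top pick."""
    rng = random.Random(seed)
    panels = build_panels(embryos, n_panels, rng)
    ai_hits = manual_hits = disagreements = 0
    for panel in panels:
        ai_top = max(panel, key=lambda e: e.ai_score)
        system = rng.randrange(3)  # assumption: manual system drawn per panel
        manual_top = max(panel, key=lambda e: e.manual_scores[system])
        ai_hits += ai_top.pregnant
        manual_hits += manual_top.pregnant
        disagreements += ai_top.embryo_id != manual_top.embryo_id
    n = len(panels)
    return {
        "n_panels": n,
        "disagreement": disagreements / n,
        "ai_rate": ai_hits / n,
        "manual_rate": manual_hits / n,
    }
```

Calling `simulate(make_synthetic_embryos(200, 4, seed=0), 500, seed=1)` returns the share of panels where the two methods disagree and the pregnancy rate of each method's top-ranked embryo. Repeating the call with different seeds gives the spread across repetitions.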

Main results and the role of chance:

On average, 105,263 unique virtual patient panels were constructed from the 1,776 embryos. Within these panels, the AI model and manual ranking system selected different top embryos in 27,860 cases, or 26% of the time. The average pregnancy rate of the top-ranked embryo using manual grading was 53.1%, and the average pregnancy rate of the top-ranked embryo using the AI model was 59.4%. The average absolute improvement in pregnancy rate from using the AI model was 6.3 percentage points, with a standard deviation of 0.2 percentage points measured across 10 repetitions of the simulation with different random seeds.
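The reported figures can be checked with a few lines of arithmetic. The sketch below also shows one way the per-repetition improvements could be summarized. The helper `summarize_repetitions` is hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_repetitions(ai_rates, manual_rates):
    """Mean and sample standard deviation of the per-repetition improvement."""
    improvements = [a - m for a, m in zip(ai_rates, manual_rates)]
    return mean(improvements), stdev(improvements)


# Averages reported in the abstract.
n_panels = 105_263
n_disagree = 27_860
ai_rate = 59.4       # percent
manual_rate = 53.1   # percent

disagreement_share = n_disagree / n_panels          # about 0.265
absolute_improvement = ai_rate - manual_rate        # 6.3 percentage points
relative_improvement = absolute_improvement / manual_rate  # about 0.119
```

The absolute difference is the 6.3 points reported above. The relative figure is derived here only to distinguish the two ways of expressing the improvement and is not a result stated in the abstract.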

Limitations, reasons for caution:

The primary limitation is the retrospective nature of this study. Also, this bootstrapped panel study relied on recorded manual morphology grades rather than on the actual selection of the top embryo in each panel by an embryologist.

Wider implications of the findings:

Our results demonstrate the potential of using an AI model for embryo ranking in terms of improved pregnancy rates. Results from this large-scale bootstrapped retrospective analysis will help inform the design of future clinical validation studies.