A Generalizable Model for Ranking Blastocyst Stage Embryos Using Deep Learning

Presented at: ASRM Scientific Congress, 2021

Authors: Kevin Loewke, Justina Hyunjii Cho, Paxton Maeder-York, Oleksii Barash, Marcos Meseguer, Nikica Zaninovic, Kathleen A. Miller, Denny Sakkas, Michael Levy, Matthew (Tex) VerMilyea

Materials and Methods:

Historical, de-identified images of blastocyst-stage embryos and associated patient metadata were collected from 11 IVF clinics in the United States for cycles started between 2015 and 2020. Each clinic captured a single image per embryo using its existing ICSI microscope, stereo zoom microscope, or time-lapse microscope. Images were captured on day 5, 6, or 7, prior to transfer, biopsy, or freezing. A total of 5,100 blastocysts from fresh transfers, frozen transfers, and frozen-euploid transfers were matched to clinical pregnancy outcomes (fetal heartbeat). An additional 2,900 blastocysts were matched to aneuploid (abnormal and complex abnormal) PGT results. Aneuploid embryos were added to the negative training group to reduce the selection bias that arises from training only on transferred embryos. Data were split into 70% for training and 30% for testing. We trained a deep convolutional neural network (CNN) to rank embryo images according to their likelihood of reaching clinical pregnancy. A shallow model architecture (ResNet-18) with dropout was used to minimize overfitting. Performance was optimized using data augmentation, a custom weighted sampling technique, and hyperparameter tuning. Scores were personalized to each patient by incorporating patient age and donor egg status. Manual morphology grades were mapped from an alphanumeric grade to a numeric score for comparison. The area under the receiver operating characteristic curve (AUC) was used to evaluate the ability of the models to rank embryos. A bootstrapped analysis was performed using random combinations of two to four euploid embryos to compare the pregnancy rate of the top-ranked embryo under CNN ranking with that under manual grading.
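
The bootstrapped comparison described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout (a dict holding one numeric score per ranking method plus a boolean `pregnant` outcome), the function name, the iteration count, and the uniform choice of group size are all assumptions made for illustration.

```python
import random


def bootstrap_top_ranked_rate(cohort, score_key, n_iter=10000, seed=0):
    """Estimate the pregnancy rate of the top-ranked embryo.

    cohort    : list of dicts, each with a numeric score stored under
                `score_key` and a boolean 'pregnant' outcome
                (hypothetical schema, not taken from the abstract).
    score_key : which ranking method to evaluate, e.g. a CNN score or a
                numeric mapping of the manual grade.
    """
    if len(cohort) < 4:
        raise ValueError("need at least four embryos to draw groups of 2-4")
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_iter):
        # Assumption: group size is drawn uniformly from {2, 3, 4}.
        k = rng.randint(2, 4)
        # sample() draws without replacement and returns the embryos in
        # random order, so ties in max() are broken at random.
        group = rng.sample(cohort, k)
        top = max(group, key=lambda embryo: embryo[score_key])
        successes += int(top["pregnant"])
    return successes / n_iter
```

Calling the function once per ranking method with the same seed draws identical groups for both, so the difference between the two returned rates reflects only the ranking method and approximates the gain from transferring the top-ranked embryo first.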


Results:

The CNN model AUC on the test set was 0.72 for all embryos (including transferred embryos and non-transferred aneuploids), 0.65 for fresh and frozen non-PGT transfers, and 0.62 for euploid transfers. For euploid-only transfers, the CNN model AUC exceeded that of manual grading overall (+7.0%), at every clinical site (ranging from +4.7% to +12.6% per site), and by day of image capture (+6.3% for day 5, +8.8% for days 6/7). Bootstrapped analysis of euploid embryos predicted an improvement in first-transfer pregnancy rates of +3% to +8% per site when using the CNN model compared to manual grading.
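
As a reading aid for the AUC values above: AUC equals the probability that a randomly chosen positive example (here, an embryo that led to clinical pregnancy) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name and inputs are illustrative assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores : model scores, higher meaning more likely to reach pregnancy
    labels : 1 for a positive outcome, 0 for a negative outcome
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

Under this reading, an AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is why AUC suits a task where only the relative order of a patient's embryos matters.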


Conclusions:

We developed a deep learning-based model for ranking embryos at the blastocyst stage. Previous studies using deep learning for embryo grading have been application-specific, using images from a single type of microscope (e.g., a time-lapse instrument), captured on a specific day (e.g., day 5), or drawn from a specific cycle type (e.g., non-PGT cycles). With access to a large and diverse dataset, we developed a generalizable model that is broadly applicable and outperforms manual grading both in aggregate and when analyzed by site and by day of capture. Future work will focus on further expanding our training dataset and performing clinical studies to validate performance.

Impact Statement:

We developed a deep learning-based embryo ranking model that is broadly applicable and may reduce time to pregnancy by optimizing the order of embryo transfer.