The Casper Assessment and Diversity in Admissions

Recently, a paper by Gustafson and colleagues published in the journal PLOS ONE examined group differences in performance on Casper compared with traditional interviews and the multiple mini-interview (MMI) at the University of North Carolina (UNC) School of Medicine. We believe it is important to address certain inaccuracies in the paper and to highlight key findings that contribute to the ongoing debate over the use of standardized assessments in higher education.

Kelly Dore discusses the UNC article

About the article

The study analyzed admissions performance data from 1,237 applicants interviewed at the UNC School of Medicine, including the following self-identified race/ethnicity groups:

  • 758 White,
  • 296 Asian,
  • 118 Black or African-American,
  • 87 Hispanic,
  • 20 Native American or Alaskan Native.

The study had two research questions:

  1. What are the differences in Casper percentile scores by gender and race/ethnicity?
  2. What is the association between Casper percentile scores and MMI and traditional interview scores?

The authors report the following results for each research question:

  • Q1 – “females scored higher for MMI (t = 3.77, p = .001, d = .21) and traditional interviews (t = 3.28, p = .001, d = .19). For CASPer percentile scores, UIM (t = -6.35, p = .001, d = .49) and Hispanic applicants (t = -3.28, p = .001, d = .38) scored lower.”
  • Q2 – “linear regression model explored an association with MMI score using CASPer percentile, gender, and race. All three predictors were significantly associated with MMI score (Constant: 3.68; Casper: β = .16, t = 5.55, p < .001; Gender: β = -.10, t = -3.50, p < .001; Race: β = .07, t = 2.311, p = .02). However, these variables identified a weak correlation (r = .20), explaining approximately 4 percent of the variance in MMI scores.”
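
The "approximately 4 percent" figure follows from squaring the reported correlation: the proportion of variance explained is r². A minimal Python check, using only the r = .20 value quoted above (the function name is illustrative):

```python
def variance_explained(r: float) -> float:
    """Return the share of outcome variance explained, given a correlation r (r squared)."""
    return r ** 2


# r = .20 is the value Gustafson et al. report for the MMI regression model.
share = variance_explained(0.20)
print(f"Explained: {share:.0%}, unexplained: {1 - share:.0%}")  # Explained: 4%, unexplained: 96%
```

In other words, roughly 96 percent of the variation in MMI scores is not accounted for by Casper percentile, gender, and race together.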

These group differences are consistent with the national-level data, aggregated across programs, that Acuity Insights reports publicly in the Casper Technical Manual. For the 2018-19 application cycle, the same cycle as the data reported by Gustafson and colleagues, the same effect size analysis was conducted on the entire U.S. Casper cohort.

As in Gustafson et al., the U.S. national data show that female applicants tend to perform better than male applicants, d = 0.13 [0.11, 0.15], p < 0.01. The difference in scores between White and African American applicants had an effect size of d = -0.62 [-0.66, -0.58], p < 0.01. Similarly, the difference in scores between White and Hispanic applicants had an effect size of d = -0.17 [-0.19, -0.14]. In both comparisons, the negative sign indicates that African American or Hispanic applicants scored lower than White applicants (Casper Technical Manual, Tables 12-14).
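
The bracketed ranges are confidence intervals around each d, and their width depends mostly on group size, which is why intervals from a national cohort are narrow. The Python sketch below uses a common large-sample approximation for the standard error of d; it is not necessarily the method used in the Technical Manual, and the group sizes are hypothetical because they are not given above.

```python
import math


def d_confidence_interval(d: float, n1: int, n2: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for Cohen's d (large-sample normal approximation).

    SE(d) = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))); z = 1.96 gives ~95% coverage.
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Hypothetical equal group sizes, chosen only to show how the interval narrows as n grows.
for n in (100, 1_000, 10_000):
    low, high = d_confidence_interval(-0.62, n, n)
    print(f"n = {n:>6} per group: d = -0.62 [{low:.2f}, {high:.2f}]")
```

The sign of d only records direction; in this approximation the interval is symmetric around d either way.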

With these findings in mind, we would like to provide context on the two main discussion points the authors of this study present:

1. They state their findings differ from previous studies

The authors state that their findings differ from previous work examining performance across groups in the US among medical school applicants (Juster et al., 2019).

“Our findings are different from the New York Medical College School of Medicine study [15], which may be a reflection of the differences in the candidate pools. Specifically, in their comparison of White and African-American applicants, the White applicants scored higher on CASPer, but not significantly.”

However, Juster et al. reported effect sizes similar to those of Gustafson and colleagues. As the table below shows, both studies report significant differences in Casper performance between White and African American applicants and between White and Hispanic applicants.

Casper score differences between groups – Effect size (d)

Comparison                  Gustafson et al.   Juster et al.
White – African American    0.49*              0.60
White – Hispanic            0.38               0.29

*Gustafson et al. (2023) reported the UIM comparison (d = 0.49); the UIM group includes 118 African American applicants and 94 other UIM applicants. Calculating the effect size from the reported means and standard deviations gives a similar result for the White – African American comparison alone (d = 0.47).
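
A recalculation like the one in the footnote uses the standard Cohen's d formula: the difference in group means divided by the pooled standard deviation. A minimal Python sketch, assuming the usual pooled-SD form; the means and standard deviations below are hypothetical placeholders, not the values reported in the paper, and only the group sizes come from the counts listed earlier in this article.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int, mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference: (group 1 mean - group 2 mean) / pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Hypothetical percentile means and SDs; n1 = 758 (White) and n2 = 118 (African American)
# are the interviewee counts listed earlier in this article.
d = cohens_d(mean1=50.0, sd1=25.0, n1=758, mean2=40.0, sd2=25.0, n2=118)
print(f"d = {d:.2f}")  # d = 0.40
```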

In addition to Juster et al., Woodson et al. (2022) and the Casper Technical Manual report demographic analyses with similar findings.

2. They claim a disadvantage to under-represented groups

The authors claim that UIM (underrepresented in medicine) students may be further disadvantaged if Casper were weighted in applicant screening.

“Our results suggest UIM students may be further disadvantaged if CASPer was weighted in applicant screening…little research has been published using the instrument in the United States. Our results should be an example to other schools to regularly analyze results of instruments being used to ensure they are not inadvertently disadvantaging a particular population.”

Casper was designed to be used “early in the admissions process so that it can be incorporated alongside measures of technical skills such as GPA and MCAT” (Casper Technical Manual). In contrast to this intended usage, Gustafson et al. (2023) evaluate demographic differences in performance across measures designed for different stages of the selection process (i.e., Casper compared with traditional interviews and the MMI).

Traditional interviews and MMIs are generally used in the later stages of selection, after the program has already invested considerable resources in prioritizing and reviewing applicants’ files. Casper is intended for initial screening and would normally be evaluated against other screening measures such as the MCAT or GPA. However, the paper presents no results on demographic differences in performance at the early stage of selection, the stage for which the test was designed.

In contrast, Juster et al. (2019) evaluate demographic differences in performance across measures according to the stage of selection. Their analyses compare the measures used in the screening stage to determine who is invited to interview (GPA, MCAT, Casper). This is the point in selection where standardized assessments can have the most impact, and the point for which Casper was designed.

Comparing measures within the screening stage, Juster et al. (2019) find that differences between UIM and non-UIM applicants are larger for both the MCAT (d = 1.43) and GPA (d = 0.98) than for Casper (d = 0.60). Casper tends to show demographic differences 25-50% smaller than those on more traditional metrics such as GPA or knowledge-based tests. Then, through a series of simulations, they demonstrate that weighting Casper more heavily, and the MCAT and GPA less heavily, early in the selection process has the potential to increase the diversity of applicants selected for interviews.
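
The simulations of Juster et al. are not reproduced here. As a rough illustration of why reweighting can change who is invited, the Python sketch below treats the screening score as a weighted sum of standardized measures and uses the UIM/non-UIM effect sizes quoted above. Everything else is a hypothetical assumption, not taken from either paper: normally distributed scores with equal within-group SDs, a common correlation of 0.3 between measures, a 20% UIM applicant pool, and a 25% interview-invitation rate. The function names are illustrative.

```python
import math
from typing import Sequence


def composite_d(weights: Sequence[float], ds: Sequence[float], rho: float = 0.0) -> float:
    """Standardized group difference on a weighted sum of standardized measures.

    Assumes (hypothetically) that each measure has SD 1 within groups and that every
    pair of measures shares the same within-group correlation rho.
    """
    mean_gap = sum(w * d for w, d in zip(weights, ds))
    variance = sum(w * w for w in weights)
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            variance += 2 * rho * weights[i] * weights[j]
    return mean_gap / math.sqrt(variance)


def _upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def uim_share_of_selected(d: float, uim_share: float, select_frac: float) -> float:
    """Share of UIM applicants among the top `select_frac` of the pooled applicant group.

    Non-UIM composite ~ N(0, 1), UIM composite ~ N(-d, 1); the cut-off is found by bisection.
    """
    lo, hi = -10.0, 10.0
    for _ in range(100):
        cut = (lo + hi) / 2
        selected = (1 - uim_share) * _upper_tail(cut) + uim_share * _upper_tail(cut + d)
        if selected > select_frac:
            lo = cut  # too many applicants above the cut-off: raise it
        else:
            hi = cut
    cut = (lo + hi) / 2
    return uim_share * _upper_tail(cut + d) / select_frac


# UIM/non-UIM effect sizes from Juster et al. (2019), in the order MCAT, GPA, Casper.
DS = (1.43, 0.98, 0.60)
# Hypothetical weightings, pool composition, and invitation rate.
for label, w in {"MCAT + GPA only": (1, 1, 0), "Casper weighted up": (0.5, 0.5, 1)}.items():
    d = composite_d(w, DS, rho=0.3)
    share = uim_share_of_selected(d, uim_share=0.2, select_frac=0.25)
    print(f"{label}: composite d = {d:.2f}, UIM share of invitees = {share:.1%}")
```

Under these assumptions, shifting weight toward the measure with the smaller gap lowers the composite difference (about 1.49 versus 1.20) and raises the UIM share of invitees; the exact figures depend heavily on the assumed weights and correlation.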

It is worth noting that Juster et al. also report demographic differences on their MMI, with findings similar to those of Gustafson and colleagues, yet they draw a different conclusion and identify the potential value of Casper in the early stages of selection.

MMI score differences between groups – Effect size (d)

Comparison                  Gustafson et al.   Juster et al.
Female – Male               0.21               0.31
White – African American    0.07*              0.02
White – Hispanic            0.19               0.32

*Gustafson et al. (2023) reported the UIM comparison (d = 0.07); the UIM group includes 118 African American applicants and 94 other UIM applicants.

Video response format: the missing piece

Another key piece of the puzzle is the inclusion of video responses in the current Casper test. The study by Gustafson and colleagues was conducted before Casper introduced video responses, a format adopted on the basis of research showing that video responses further reduce demographic differences. As a result, the study does not capture the positive impact of the new video format.


Gustafson et al. 2023 do mention that MCAT and GPA are biased against UIM applicants, and they acknowledge the impacts of structural racism on standardized test performance. The authors are correct on these points.

Casper was originally designed to assess social intelligence and professionalism, not to eliminate the demographic disparities that persist through education and selection. Over time, however, it has been shown to reduce demographic differences. It can supplement the information that standardized assessments provide about applicants, helping programs shortlist a more diverse group and mitigating some of the bias introduced by over-reliance on GPA and MCAT in the early stage of admissions.

Additional research and evidence

For research published using Casper in the United States, we encourage readers to review the evidence:

Read the Case Studies