A Guide to Situational Judgment Tests

What is a situational judgment test?

A situational judgment test (SJT) is a type of assessment that is used to understand and evaluate how an individual would react or behave when presented with hypothetical social or workplace dilemmas.

What do SJTs measure?

What an SJT measures depends on the scenarios themselves, as well as the format of the test. Generally, the test will focus on desired behaviors and attributes for certain professions, such as:

  • Communication 
  • Teamwork and collaboration
  • Leadership and influence
  • Compassion
  • Empathy
  • Conflict management
  • Problem solving
  • Negotiation
  • Resilience
  • Ethical responsibility
  • Self-awareness
  • Adaptability
  • Reliability

Many of these constructs and competencies are captured in frameworks from regulating bodies of certain professions. In medicine, the Association of American Medical Colleges (AAMC), the Accreditation Council for Graduate Medical Education (ACGME), and CanMEDS have all released frameworks identifying personal and professional characteristics that are considered crucial to a medical student’s future success in the field. Similar bodies exist for business, teaching, and engineering professionals, and for pretty much every other type of profession you can think of.

SJTs can also measure knowledge of expected behaviors or standards, especially when assessing an individual’s awareness of specific codes of conduct and standard practices, as well as their overall growth and development in these competencies.

What is a situational judgment test used for?

SJTs are mainly used for three reasons:

  1. Hiring new employees 
  2. Evaluating applicants to higher education programs, particularly those that are people-centric (e.g., medicine, nursing, teaching)
  3. Evaluating current employees or students to identify developmental needs or opportunities

Why is this type of test valuable?

Extensive research has demonstrated the importance of non-technical skills such as professionalism and social or emotional intelligence in service-based roles. In health care, compassion is key to correctly diagnosing patients, improving their outcomes, and ensuring patients adhere to treatment protocols. Teachers with high social and emotional intelligence have also been shown to manage classroom behavior better and to foster a culture of belonging among their students. Issues of professionalism are also difficult and costly to remediate.

4 types of SJTs

There are different types of SJTs based on their purpose and execution.

We’ve already listed the three main purposes for an SJT — hiring employees, selecting applicants for higher education programs, and assessing the professional development of current students or employees.

SJTs can also differ based on their execution and response format. Here are some of the different types:

  • Linear format – all test takers are presented with the same scenarios and questions in the same order, generally on a hand-written assessment.
  • Interactive format – generally administered on a computer, this format follows a branching process in which subsequent scenarios and questions are presented based on the test taker’s responses to previous scenarios and questions.
  • Open response – test takers construct their responses in an open-comment field. This format allows test takers to share more detailed information on the action they’d take and the rationale behind it, as well as their critical thinking and understanding of the situation at hand.
  • Fixed response – test takers are usually presented with a set of actions/responses from which to choose (i.e., multiple choice). However, fixed-response SJTs can also ask the test taker to:
    • Choose the most and least effective actions
    • Rate each action on a scale from least to most effective 
    • Rank the actions in order of effectiveness or appropriateness 

What’s in a situational judgment test?

Every situational judgment test has the following components:

  • Constructs or competencies – what the SJT aims to measure
  • Scenarios – the dilemmas that test takers are presented with, either in written or audio-visual format
  • Questions – these will accompany each scenario to probe how test takers would respond to a given scenario
  • Response options – test takers can record their responses by typing in an open comment field, recording their response using audio or video methods, or by selecting, rating, or ranking responses from a predefined list
  • Rating mechanism – refers to how the responses are rated, and by whom, to measure the identified constructs. Automated scoring is used for fixed responses (e.g., multiple choice), but open-response SJTs are generally rated by subject-matter experts or trained raters. Though methods may vary, these raters most commonly use a scoring key and a Likert scale to assess responses.

Why do we need SJTs for holistic admissions?

As mentioned earlier in the guide, we know that the way professionals conduct themselves in the workplace has an impact on the organizations they work for and the people they serve. We also know that relying on technical knowledge alone is detrimental to equity, diversity, and inclusion. 

  • In corporate environments, professionalism is tied to a company’s reputation and its bottom line. When companies make a concerted effort to recruit and promote a diverse and inclusive workforce, their annual stock return can more than double.
  • In health care, issues of professionalism account for the vast majority of disciplinary action by medical boards. Professionalism is also much more difficult and costly to remediate than technical knowledge gaps, meaning that it is less likely those health care providers will be able to improve their behavior. It’s also worth noting that patients tend to remember negative interactions with health care providers more than positive ones, which can make them less likely to seek out care later on when they really need it.
  • When health care providers have more empathy and compassion for their patients, they tend to be more meticulous about their patients’ care and less likely to make major medical errors. Possessing these qualities also allows them to better understand their patients and save time by eliminating unnecessary diagnostic testing and referrals. Patients, in turn, feel more confident in following the advice and treatment protocols given to them by their health care provider.
  • In schools, students tend to experience less anxiety inside and outside the classroom, engage more in their lessons, and perform better when their teachers have high social and emotional intelligence.

Some higher education institutions and workplaces try to assess social intelligence and professionalism using methods such as unstructured interviews, personal statements, and reference letters, but extensive research has shown that these methods can be prone to bias, unreliable, and weak at predicting future success and behavior. Situational judgment tests, on the other hand, have been shown to measure these competencies more reliably because of the standardized structure and psychometric methods used in their development. Additionally, SJTs designed to measure competencies such as social intelligence and professionalism have shown positive correlations with behaviors and competencies expected in service-based professions, examples of which you can find in the resources below. This is why more and more professions-based academic programs are adopting SJTs for use in admissions.

How do we know that SJTs work?

Researchers evaluate the usefulness of SJTs by looking at three critical factors:

Reliability – this refers to the reproducibility of assessment results. A reliable test would consistently measure a construct over time, across raters, and amongst its items. The specific facets of reliability include:

  • Test-retest reliability, which measures how consistent participants’ scores are on the same test at different time periods.
  • Interrater reliability, which measures the degree to which raters agree on the scores given to test takers by having multiple raters score the same responses from the same test.
  • Internal consistency, which measures how similar test items are in terms of what they’re trying to measure by examining whether those items produce the same or similar results.
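
Two of these facets can be illustrated with a short calculation. The sketch below is a minimal example, not the method used by any particular SJT: it assumes Cronbach’s alpha as the internal-consistency statistic and a Pearson correlation as the test-retest statistic, and all scores are made up for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Internal consistency (Cronbach's alpha).

    `scores` is a list of rows, one per test taker; each row holds
    that person's score on every item.
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest statistic."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 5 test takers x 4 items, each scored 1-5.
item_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(item_scores)

# Hypothetical total scores for the same people at two sittings.
first_sitting = [18, 9, 13, 19, 6]
second_sitting = [17, 10, 14, 19, 7]
retest_r = pearson_r(first_sitting, second_sitting)

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Test-retest r:    {retest_r:.2f}")
```

Values closer to 1 indicate more consistent measurement. Interrater reliability is usually reported with a different statistic, such as an intraclass correlation, which is not shown here.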

Validity – this refers to whether the assessment is measuring what it claims to measure by examining:

  • Construct validity, which is how well the assessment is measuring the specific construct(s)
  • Criterion validity, which is how well the test scores predict relevant outcomes that the test was designed to predict

Equity – this focuses on the fairness of the assessment, largely by evaluating subgroup differences in scores. For instance, one might examine how much scores vary by gender, age, disability status, or racial/ethnic identity.
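
One common way to express a subgroup difference is as a standardized mean difference (Cohen’s d). The sketch below is an illustration under that assumption; the guide does not say which statistic researchers use, and the group scores are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups (pooled SD)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical test scores for two subgroups of test takers.
group_a = [62, 70, 68, 75, 66, 71]
group_b = [60, 69, 65, 73, 64, 70]

d = cohens_d(group_a, group_b)
print(f"Standardized difference d = {d:.2f}")
```

A value near 0 means the two groups score about the same on average; larger absolute values mean a larger gap relative to the spread of scores.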

Decades of research have shown that SJTs, by design, are able to tap into various relevant constructs through the realistic scenarios presented and the accompanying questions. Research has also been able to demonstrate incremental validity, meaning that SJT scores help predict performance in relevant aspects of a profession or academic program beyond what other measures already explain. SJTs have also shown smaller subgroup differences than cognitive ability tests, with the smallest differences observed in the open (constructed) response format.

When to use an SJT

SJTs can be used at multiple points in time, but we generally see them used before hiring new employees or selecting applicants for admission into a higher education program, or during training or professional evaluation to guide an individual’s development. In higher education, an SJT can serve as a summative assessment, which evaluates individuals at a single point in time. It can also serve as a formative assessment, administered at various points to evaluate whether a current student has made progress in the areas the SJT measures.

The response format of an SJT matters when deciding whether to use the assessment for formative or summative purposes. Fixed-response SJTs have been shown to be most effective for formative assessment purposes because the pre-defined list of potential responses helps to measure the test taker’s knowledge of standards, norms, or expected behaviors. Performance on this type of assessment could help programs identify students who need more support and guidance. Open-response SJTs work well as summative assessments because they don’t assume a correct or incorrect response to a scenario. Instead, they give test takers more freedom to express their values, rationale, and problem-solving abilities in their responses, since all true dilemmas have a variety of solutions. Those personal values and abilities, rather than “knowledge,” are important in high-stakes situations such as admissions.

Casper: our SJT for admissions

What is the Casper test?

Casper is an online, open-response situational judgment test that evaluates applicants’ social intelligence and professionalism. It is used as a summative assessment for higher education admissions. Applicants encounter a series of video- and text-based scenarios and respond to a set of questions for each scenario within the allotted time, using typed or video-recorded responses. Their responses are then reviewed by Casper raters, a diverse group of people who undergo extensive training on Casper and what it measures, as well as implicit bias training. Each applicant is rated by multiple raters (with a different rater for each scenario) so that the final aggregate score received by the program reflects diverse perspectives on the applicant’s performance. Casper z-scores give programs a standardized measure that shows the full spectrum of applicants’ abilities in this difficult-to-measure area.
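
To make the idea of an aggregate z-score concrete, here is a minimal sketch. It is not Casper’s actual scoring procedure, which this guide does not describe: it simply assumes that each applicant’s scenario ratings are averaged and that the averages are then standardized against the cohort. The applicant names, the ratings, and the normal-curve percentile are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical ratings: one score per scenario, each from a different rater.
ratings = {
    "applicant_a": [6, 7, 5, 8],
    "applicant_b": [4, 5, 4, 3],
    "applicant_c": [7, 8, 9, 8],
    "applicant_d": [5, 5, 6, 4],
}

# Step 1 (assumed): aggregate by averaging each applicant's scenario ratings.
raw = {name: mean(scores) for name, scores in ratings.items()}

# Step 2: standardize against the cohort so scores share one scale.
cohort_mean = mean(list(raw.values()))
cohort_sd = pstdev(list(raw.values()))
z_scores = {name: (score - cohort_mean) / cohort_sd for name, score in raw.items()}

# Step 3 (assumed): convert z-scores to percentiles using a normal curve.
percentiles = {name: NormalDist().cdf(z) * 100 for name, z in z_scores.items()}

for name in ratings:
    print(f"{name}: z = {z_scores[name]:+.2f}, percentile = {percentiles[name]:.0f}")
```

A z-score of 0 marks the cohort average, and each unit above or below it is one standard deviation, which is what lets a program compare applicants on a single standardized scale.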

What makes Casper ideal for admissions?

  • High test reliability, demonstrated by internal consistency of ICC=0.86
  • Well-documented evidence of predictive validity
  • Smaller demographic differences compared to other admissions measures
  • High applicant acceptability (7.5/10)
  • Programs given a single z-score and percentile for each applicant to easily incorporate into their holistic review process
  • Proctoring and other security protocols in place to ensure test integrity
  • Robust rating quality assurance processes to ensure fairness
  • Can’t be ‘gamed’ like standard cognitive tests
  • Used by nearly 500 programs and taken by 96% of applicants to US medical schools 

Common misconceptions about the Casper test

Given that social intelligence and professionalism are more difficult to measure than technical knowledge, it is not surprising that there may be concerns and misconceptions about assessments like Casper that focus on personal and professional qualities. Some of the most common myths are that Casper is unable to predict future student performance, that Casper is nothing more than a typing test, and that there are no checks on the human raters. You can read our myth-buster article below for answers to the most commonly asked questions and rumors we’ve encountered.

What’s new in Casper?

Preliminary research showed that a video response format was associated with reduced demographic differences in SJT performance relative to a typed response format. Based on that research, Acuity Insights carried out an experiment with nearly 2,000 test takers in 2021 to see if the video response format could work for Casper. Our results showed that demographic differences in test performance were reduced when applicants provided a video response to the scenario rather than a typed response. As a result, Casper now asks test takers to respond to 14 scenarios, of which six require typed responses and eight require video-recorded responses.

What Casper users have to say

Our research team has done many studies with partners that have found meaningful, positive relationships between Casper and admissions, pre-clinical, and clinical professionalism indicators: MMI/interview performance, odds of acceptance, clerkship performance, and match rates. These are important, informative relationships of a magnitude similar to that seen between knowledge-based tests and academic performance. We have tested Casper across many types of higher education programs, with predictive validity studies either complete or underway with over 30 programs across undergraduate and graduate medicine, allied health, dentistry, physical therapy, occupational therapy, physician assistant, veterinary sciences, teacher education, and business education. You can read some of these studies below.

How applicants to your program can prepare for the Casper test

The intention and format of the Casper test mean that a person can’t study for it in the same way they would for a cognitive test, such as the MCAT. In fact, our own research has shown that SJTs like Casper aren’t susceptible to gaming or coaching. Though there are many third parties offering paid test prep courses for Casper, Acuity Insights does not endorse any of them, as we firmly believe that these courses will only further marginalize applicants from disadvantaged backgrounds, who are already disproportionately shut out of specific career paths.

All official test prep material we provide is free and accessible through our official AcuityInsights.app website. We also provide practice tests, which are available to all applicants in their account. Aside from being familiar with the online format and technical requirements of the test, the best way applicants can prepare for Casper is to meaningfully invest in their self-development. This means going through experiences that can build their empathy and resilience. The AcuityInsights.app website also provides resources on how to improve these important skills.
