AI Biases in Hiring

Case study

Introduction

When you apply for a job online, your resume usually doesn’t go straight to a human being. Instead, it passes through an automated system designed to filter, rank, and sort candidates before a recruiter ever sees a name. These systems are called Applicant Tracking Systems (ATS).

This case study examines how these AI-assisted systems may make biased decisions, including what inputs they receive and what might make them favor one applicant over another.

An ATS takes in a large number of resumes and determines whether each applicant is eligible for the job they applied to, often with the help of AI. In practice, however, the results are often biased.

One such example is Amazon’s AI hiring tool. Built in 2014, it aimed to mechanize the search for top talent by automatically reviewing job applicants’ resumes. By 2015, the company had realized that the tool was biased against women (Dastin, 2018).

Inputs

Many organizations and companies, such as Amazon, use an ATS. The primary system that Amazon uses for recruiting corporate employees is iCIMS, which analyzes a candidate's resume, application information, and existing employee records. The TARAI Index, which explains how the AI works, identifies HireVue as the primary company using an ATS to profile applicants and analyze background checks. HireVue collects data from an applicant's interview video, job scenario tests, and language skills tests. Its analysis of the applicant then rests on three assumptions.

First, the applicant's language and behavior in the interview video are analyzed to see how well the candidate reflects the organization's values. Second, the system uses historical data to set a benchmark. Third, the product's language tools can analyze a candidate's speech effectively, even with accents (Simpson et al., 2025). It is not specified what historical data the organization uses to set benchmarks for future candidates. Candidates are compared against the skills and performance of past and current employees, and the ATS finds matches based on the resume, employment history, and a series of online assessments. Top-performing employees can therefore be set as benchmarks, and the ATS will look for candidates with similar skills and backgrounds.
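The benchmark idea described above can be pictured with a small sketch. This is not how HireVue or iCIMS actually works, since neither vendor's method is public. It is a simplified illustration that assumes resumes are reduced to sets of keywords. The function names, the 50% threshold, and the sample data are all hypothetical.

```python
from collections import Counter

def build_benchmark(top_performer_resumes, min_share=0.5):
    """Collect the terms found in at least `min_share` of top performers' resumes.

    Hypothetical rule: real systems do not disclose how benchmarks are built.
    """
    counts = Counter()
    for terms in top_performer_resumes:
        counts.update(set(terms))
    cutoff = min_share * len(top_performer_resumes)
    return {term for term, n in counts.items() if n >= cutoff}

def match_score(candidate_terms, benchmark):
    """Fraction of benchmark terms that also appear on the candidate's resume."""
    if not benchmark:
        return 0.0
    return len(set(candidate_terms) & benchmark) / len(benchmark)

# Made-up keyword sets for three "top performing" employees.
top_performers = [
    {"python", "sql", "leadership"},
    {"python", "sql", "statistics"},
    {"python", "leadership", "java"},
]

benchmark = build_benchmark(top_performers)
candidate_score = match_score({"python", "sql", "excel"}, benchmark)
print(sorted(benchmark), round(candidate_score, 2))
```

In this sketch a candidate only scores well by resembling the people who were already hired, which is why the choice of historical data matters so much.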

In the article The Impossibility of Fairness: Different Value Systems Require Different Mechanisms For Fair Decision Making, Friedler et al. explain, “It is common in data science to directly use any observed data that is available; doing so without modification or further evaluation is an adoption of the WYSIWYG worldview and assumption” (Friedler et al., 2021). (WYSIWYG stands for “what you see is what you get.”) This speaks to the exact problem with the inputs fed into these resume-filtering systems: they use all the available data without considering the hidden biases that may exist within it. Without the necessary evaluation of that data, those hidden biases carry over into the decision-making process and lead to unfair results.

This means that every time an ATS evaluates someone's resume, it uses historical data as a benchmark without considering whether that data is free from bias. As a result, there is a high chance that the ATS will repeat past mistakes in future hiring decisions, especially against groups that have already been historically disadvantaged in hiring.
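To see how bias in historical data can carry over into new decisions, consider the sketch below. It assumes a very simple scoring rule in which each resume term is weighted by how often past applicants with that term were hired, relative to the overall hire rate. The rule, the term names, and the hiring history are all invented for illustration, and no real ATS is claimed to work this way.

```python
def learn_term_weights(history):
    """Weight each term by (hire rate of past resumes with the term) - (overall hire rate).

    `history` is a list of (terms, hired) pairs, where hired is 1 or 0.
    """
    overall = sum(hired for _, hired in history) / len(history)
    vocab = set().union(*(terms for terms, _ in history))
    weights = {}
    for term in vocab:
        outcomes = [hired for terms, hired in history if term in terms]
        weights[term] = sum(outcomes) / len(outcomes) - overall
    return weights

def score(terms, weights):
    """Sum the learned weights of the terms on a resume (unknown terms count as 0)."""
    return sum(weights.get(term, 0.0) for term in terms)

# Hypothetical past decisions. Gender is never an input, but one term acts as a proxy for it.
history = [
    ({"python", "chess_club"}, 1),
    ({"python", "chess_club"}, 1),
    ({"python", "chess_club"}, 1),
    ({"python", "womens_chess_club"}, 0),
    ({"python", "womens_chess_club"}, 0),
    ({"python", "womens_chess_club"}, 1),
]

weights = learn_term_weights(history)
score_a = score({"python", "chess_club"}, weights)
score_b = score({"python", "womens_chess_club"}, weights)
print(round(score_a, 2), round(score_b, 2))
```

The two resumes list the same skill, yet the second one scores lower because past decisions were skewed against resumes containing that term. The system never needs to be told an applicant's gender to reproduce the pattern.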

Diagram Demonstrating the ATS "Black-box"

Outputs

The TARAI Index Methodology paper points out that a major output of these systems is candidate ranking, produced from either resume screening or interviewing, to help the company decide whom to recruit. The AI can also sort and classify job candidates based on their application materials and rank them by how well they fit the position. In practice, these rankings can heavily influence hiring decisions, especially when there are thousands of applications and recruiters cannot possibly review them all.

Some systems, such as HireVue, may also automatically eliminate candidates deemed ineligible based on the basic information gathered about them, so your resume may never be seen by a human recruiter at all. The requirements may include years of experience and certifications, among other things. As a result, eligible candidates may end up excluded from the pool. Other systems, such as Amazon's, generate scores, such as star ratings or skill-match scores, which are then used to compare candidates or generate the aforementioned rankings (Dastin, 2018). However, there is little transparency into how these scores are generated. What we do know is that they are derived from the data of previous applicants and current employees.
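The two kinds of output described above, automatic elimination and score-based ranking, can be sketched together. The example assumes a hard cutoff on years of experience and required certifications, followed by a simple skill-match score. The field names, thresholds, and candidates are hypothetical and do not describe any vendor's actual rules.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    years_experience: float
    certifications: set = field(default_factory=set)
    skills: set = field(default_factory=set)

def passes_knockout(c, min_years, required_certs):
    """Hard filter: anyone failing it is never shown to a recruiter."""
    return c.years_experience >= min_years and required_certs <= c.certifications

def skill_match(c, wanted_skills):
    """Share of the wanted skills that appear on the candidate's resume."""
    return len(c.skills & wanted_skills) / len(wanted_skills)

def screen_and_rank(candidates, min_years, required_certs, wanted_skills):
    """Drop candidates who fail the filter, then rank the rest by skill match."""
    kept = [c for c in candidates if passes_knockout(c, min_years, required_certs)]
    return sorted(kept, key=lambda c: skill_match(c, wanted_skills), reverse=True)

# Made-up applicants for a role that asks for 5 years and one certification.
applicants = [
    Candidate("A", 6.0, {"PMP"}, {"sql"}),
    Candidate("B", 4.5, {"PMP"}, {"sql", "python"}),
    Candidate("C", 5.0, {"PMP"}, {"sql", "python"}),
]

ranking = screen_and_rank(applicants, 5, {"PMP"}, {"sql", "python"})
print([c.name for c in ranking])
```

Candidate B matches every wanted skill but falls half a year short of the cutoff, so B is removed before any ranking happens. This is the sense in which a rigid filter can exclude someone a human recruiter might have considered.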

The TARAI Index notes the “black-boxed nature of AI system outputs” (Simpson et al., 2025) and observes that the transparency problem is often addressed by building another AI model that explains only one specific model, which can be misleading. This black-boxed nature is where most of the controversy lies: without a way to understand how these scores and rankings are generated, there is no way to ensure that the process is free of hidden biases, especially when there is evidence to the contrary.


To learn about how the system might see a resume from a man and a woman, click on the links below!

Apply as a woman! Apply as a man!