Answering these questions is critical for ensuring that you get the right people every time, based on facts rather than chance. Assessments are only useful if they are reliable, valid, and consistent. To that end, industrial/organizational (I/O) psychologists have developed tools that help ensure your evaluations are consistent:
- Assessment centers – these place candidates in an environment that emulates the look and feel of a job, allowing participants to “test-drive” the job and practice new behaviors while multiple assessors evaluate them at the same time. Assessment centers can be used for selection and hiring or for development and coaching.
- Frame of reference tools – these allow for fairer assessments by providing benchmarks for what high and low performance looks like in a specific role.
- Interrater reliability – a measure that ensures assessors are rating a candidate’s behaviors consistently.
Why assessment centers?
Assessment centers allow candidates to “test-drive” jobs. Substantial research shows that evaluating behavior in contextually relevant situations, similar to what candidates would actually experience in the role, is the best predictor of on-the-job success. In other words, evaluating a sample of a candidate’s work is the best method for predicting how they will perform on the job.1
What does this look like in practice? Say a consultative salesperson is interviewing for an opening on your team. The interviewer asks, “How driven are you to keep prospecting after a rejection?” The candidate will likely say “Very driven!” and perhaps even give an example, but that answer may not reflect how they would actually perform in this situation. If instead the applicant is placed in a simulation that is contextually relevant to the role, you can observe how they actually perform after being rejected during a prospecting call.
Assessment centers allow for this contextually relevant evaluation, which can be very revealing for both candidates and employers. However, it is critical to be transparent about what is being observed and evaluated to ensure an assessment is successful.
Why is being transparent so important? Say you need to renew your driver’s license and are required to take a road test. You are a confident driver and expect to pass without any trouble. The test goes well: you drive as you normally would, follow all the rules of the road, and even avoid an accident when another car runs a red light. However, you fail. The feedback says you can’t steer well. You’re exasperated. Is this feedback legitimate? You expected the test to evaluate your understanding of the rules of the road, not your finesse as a driver. This misunderstanding causes you to question the entire evaluation. Did you deserve to fail? Would the road test have had the same result with a different evaluator? Did the car’s steering need repair? You are certain the feedback reflects the evaluator’s poor evaluation skills. But does it?
The point is that candidates must understand what is being evaluated for an assessment to be considered successful; otherwise, they may feel they are being judged unfairly. This raises another important question: how do you ensure consistent evaluations, so that different evaluators reach the same results?
Frame of reference tools
Frame of reference tools are one way to make evaluations more consistent. They provide behavioral benchmarks, points of reference for what high and low performance look like across exercises and competencies. These tools can improve assessors’ rating accuracy through two processes:2
- Helping assessors understand the behaviors that constitute specific levels of performance on distinct dimensions
- Establishing performance prototypes during training that allow raters to more easily categorize candidates against standardized benchmarks. This counteracts the information loss that assessors normally experience.3
Assessors trained to use frame of reference tools produce the most accurate ratings compared with raters who received rater error training alone, frame of reference training combined with rater error training, or no training.4
Without clear standards and training (specifically frame of reference training), assessors with different personalities may judge the same candidate very differently. Sympathetic assessors may inflate scores, while harsher assessors may give overly low ratings. Research shows that agreeable personality characteristics may also be related to lenient assessment center ratings.5 Because of their training and knowledge of scoring biases, psychologists consistently rank as better assessors.6 Their ratings are more valid and reliable than those of untrained individuals.7
But how exactly do you measure the efficacy of these tools?
Interrater reliability is one way to do this. Interrater reliability measures how consistently different assessors score the same observed behaviors, whether those behaviors come from a contextual assessment, an interview, or a similar exercise. One approach to evaluating interrater reliability is to have multiple assessors evaluate a candidate’s behavior simultaneously, which keeps the stimuli consistent across assessors. High interrater reliability demonstrates that different assessors are rating a candidate’s behavior consistently. So, how do you measure interrater reliability? You need to test your assessors.
What does this look like in practice? We wanted to make sure our assessors’ scores were consistent in a large-scale consultative sales assessment, so we measured their interrater reliability. First, we selected a candidate who had participated in the assessment, in which they acted as a member of a consultative sales team. This candidate’s performance served as the test case. Next, we provided twenty assessors with the data from the candidate’s experience in the same format they would normally receive. Then, the assessors scored this candidate as if they were scoring a ‘live’ candidate. Finally, we compiled these scores and evaluated them for interrater reliability.
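The article does not specify how its agreement figure was calculated. One common way to express agreement when many assessors score the same candidate is pairwise percent agreement: compare every pair of assessors on every rated dimension and report the share of comparisons that match. The sketch below assumes that approach. The function name `pairwise_percent_agreement`, the dict-of-lists input format, the `tolerance` parameter, and the example scores are all hypothetical illustrations, not the procedure or data from the study described here.

```python
from itertools import combinations


def pairwise_percent_agreement(ratings, tolerance=0):
    """Share of assessor-pair comparisons that agree, pooled across dimensions.

    Assumed (hypothetical) input format:
      ratings: dict mapping an assessor ID to a list of scores, one score per
               rated dimension, listed in the same order for every assessor.
      tolerance: largest score difference still counted as agreement
                 (0 = exact agreement, 1 = "within one point").
    """
    score_lists = list(ratings.values())
    if len(score_lists) < 2:
        raise ValueError("need at least two assessors")
    n_dims = len(score_lists[0])
    if n_dims == 0 or any(len(scores) != n_dims for scores in score_lists):
        raise ValueError("every assessor must rate the same, non-empty set of dimensions")

    agreements = 0
    comparisons = 0
    # Compare every pair of assessors on every dimension.
    for first, second in combinations(score_lists, 2):
        for a, b in zip(first, second):
            agreements += abs(a - b) <= tolerance  # True counts as 1
            comparisons += 1
    return agreements / comparisons


# Hypothetical example: three assessors rating one candidate on four
# competencies using a 1-5 scale (illustrative numbers only).
example_ratings = {
    "assessor_a": [4, 3, 5, 2],
    "assessor_b": [4, 3, 4, 2],
    "assessor_c": [4, 3, 5, 2],
}

print(f"Exact agreement: {pairwise_percent_agreement(example_ratings):.0%}")
print(f"Within one point: {pairwise_percent_agreement(example_ratings, tolerance=1):.0%}")
```

With these example scores, 10 of the 12 pairwise comparisons match exactly, so the script reports 83% exact agreement and 100% agreement within one point. Percent agreement is easy to explain, but it does not correct for agreement expected by chance; chance-corrected statistics such as Fleiss’ kappa or an intraclass correlation are common alternatives.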
In our test case, when assessors rated performance independently, we found 95% agreement among their ratings, exceeding published findings, which range from 66% to 84%,8 and from 62% to 91%.9 When we added our lead assessor strategy, agreement reached 100%. These results demonstrate that our assessors’ scores are consistent, which is a necessary foundation for an effective evaluation tool. Running tests on your assessment team ensures that you have consistent and reliable data about your candidates, allowing you to dependably select the right people for the job.
Assessment centers and the future
Reliable data from assessment centers is critical for hiring and promoting the best people, but the assessments of the future hold even more promise. Historically, assessment centers have been held in person. In recent years, however, they have moved toward virtual delivery, which is easier to manage and less costly.
Research shows that virtual assessment centers are more valid and precise than traditional on-site assessment centers. Candidates feel less stressed online and are better able to show their true selves, and virtual assessment centers are more objective and accurate than face-to-face interviews, which are plagued by biases such as “like me” bias and gender and cultural biases.10
Hiring the best people will always provide the best results. As we look to the future, advancements in digital technology will only make getting there easier, less costly, and more accurate.