Validity of Assessment Centre Ratings Questioned

Recent research from Birkbeck academics questions whether the flagship recruitment method deserves its prominence.

Predicting the success of prospective employees has always been a complicated alchemy, one that, for better or worse, often owes as much to hope as to conviction. Employers, in a bid to reduce this ambiguity, have increasingly turned to Assessment Centres to give them a firmer footing in making these decisions. Indeed, Assessment Centres appear to offer a comprehensive insight into employee competencies, seeing applicants undertake tasks relevant to the job they’re vying for and giving employers a collective ‘wash-up’ period in which to mull over an applicant’s traits.

However, a recent study from Birkbeck academics Dr Duncan Jackson, Dr George Michaelides and Dr Chris Dewberry, examining variance in assessment centre ratings, has turned this thinking upside down and looks set to subvert the prevailing orthodoxy among employers. Their research paper, ‘Everything That You Have Ever Been Told About Assessment Centre Ratings Is Confounded’, recently published in the Journal of Applied Psychology, questions the validity of a dimension-based approach to assessment centres, noting that “general performance explained at least 23 times and exercise effects explained at least 22 times more variance than dimension effects.”

Dimensions – How Reliable?

The paper takes as its foundation a sceptical approach to ‘dimensions’, the attributes or skill-sets that are, in theory, meant to remain consistent across different exercises. The idea is that dimensions translate into the practical effectiveness of an employee; a good communicator, for instance, would carry their capability in that dimension across exercises, communicating well in simulated board meetings and simulated report-writing alike, and translating these attributes into the real world of work. However, this isn’t always the reality and, as the study notes, past research into Assessment Centres has tended to confound both exercise- and dimension-related effects with many other sources of variation.

Essentially, the claimed validity of dimension-based assessment is not borne out by the variance in ratings observed in this and other recent research. By the time assessment scores are aggregated, they have already been subject to so much confounding that they no longer give employers a reliable and accurate signal of applicants’ competencies. The implication is that, fundamentally, such a ‘score’ does not signify what it purports to, placing the assessment method and the integrity of its interpretation at risk.

- Dr Chris Dewberry presents the study at an Organizational Psychology Summer Lecture.

Method

The sample used in the study involved five separate administrations of an operational assessment centre based in South East Asia. In total, 698 candidates participated along with 322 assessors, the assessments themselves comprising three exercises: an ‘in-basket’, a roleplay and a case study. From this information, 29 effects – or sources of variance – were identified and divided into categories of reliable, unreliable and reliability-unrelated, allowing the study to consider closely why scores vary across dimensions and exercises. Crucially, the study adopted Bayesian generalizability theory to manage the complexity of modelling 29 sources of variance and to disentangle the confounded data. As the paper notes, “Capitalizing on the advantages of Bayesian generalizability theory, ours is the first known study to decompose, and thus, unconfound, all of the 29 sources of variance that could potentially contribute to variance in AC ratings.”
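To make the logic of that decomposition concrete, the sketch below shows a toy Bayesian variance-components model in Python, built with numpy and PyMC. Everything in it is an assumption made for illustration: the sample sizes, effect magnitudes, priors and variable names are invented, and the model covers only three crossed effects rather than the 29 sources of variance the study actually decomposes.

```python
# Illustrative sketch of Bayesian variance decomposition in the spirit of
# generalizability theory. All sizes, effect magnitudes and priors here are
# invented assumptions, not the authors' actual specification.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n_candidates, n_exercises, n_dimensions = 100, 3, 5

# Simulate ratings as additive crossed effects, with general performance and
# exercise effects dominating and dimension effects near-negligible, to
# mirror the pattern the paper reports.
person_fx    = rng.normal(0, 1.0, n_candidates)   # general performance
exercise_fx  = rng.normal(0, 0.9, n_exercises)    # exercise effect
dimension_fx = rng.normal(0, 0.2, n_dimensions)   # dimension effect

p_idx, e_idx, d_idx = np.meshgrid(
    np.arange(n_candidates), np.arange(n_exercises), np.arange(n_dimensions),
    indexing="ij",
)
p_idx, e_idx, d_idx = p_idx.ravel(), e_idx.ravel(), d_idx.ravel()

y = (person_fx[p_idx] + exercise_fx[e_idx] + dimension_fx[d_idx]
     + rng.normal(0, 0.5, p_idx.size))            # residual noise

# Crossed random-effects model: each variance component gets its own
# standard-deviation parameter, estimated jointly from the ratings.
with pm.Model() as g_study:
    sd_person    = pm.HalfNormal("sd_person", 1.0)
    sd_exercise  = pm.HalfNormal("sd_exercise", 1.0)
    sd_dimension = pm.HalfNormal("sd_dimension", 1.0)
    sd_resid     = pm.HalfNormal("sd_resid", 1.0)

    person    = pm.Normal("person", 0.0, sd_person, shape=n_candidates)
    exercise  = pm.Normal("exercise", 0.0, sd_exercise, shape=n_exercises)
    dimension = pm.Normal("dimension", 0.0, sd_dimension, shape=n_dimensions)

    mu = person[p_idx] + exercise[e_idx] + dimension[d_idx]
    pm.Normal("rating", mu, sd_resid, observed=y)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)

# Posterior mean of each variance component: the dimension component should
# come out far smaller than the person or exercise components.
post = idata.posterior
for name in ("sd_person", "sd_exercise", "sd_dimension", "sd_resid"):
    print(name, float((post[name] ** 2).mean()))
```

On the simulated data, the variance attributed to dimensions comes out far smaller than that attributed to general performance or to exercises, the same qualitative pattern the paper reports at much greater scale and complexity.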

Holes in the Ship

Assessment Centres are widely regarded as the flagship method of recruitment. However, as noted in a presentation by Dr Chris Dewberry, held at Birkbeck’s Department of Organizational Psychology ‘Summer Seminar’ series in July, there appears to be little to commend this status. In reviewing the research on Assessment Centres, Dewberry draws three important conclusions.

The first of these is that the criterion-related validity of Assessment Centres is substantially lower than that of simpler, cheaper and less resource-intensive selection methods. Put simply, although Assessment Centres do predict future work performance to some extent, despite the additional time and resources they demand they do so considerably less well than alternatives such as structured interviews, cognitive ability testing, or a combination of the two.

The second conclusion relates to the ‘wash-up’ period, in which assessors discuss overall scores at the end of the assessment. This ‘integration’ stage, the point at which several different results are brought together into a coherent overview of a candidate’s strengths, is ordinarily conducted either as a ‘wash-up’ consensus discussion or as a statistical aggregation. Extensive research shows that consensus-building among assessors counter-intuitively reduces the extent to which Assessment Centre scores predict the future work performance of job candidates, and therefore undermines their effectiveness as a selection method. The reason again relates to unacknowledged variables, many of which are brought to the table with the opinions of assessors. Although employers for the most part tend to favour consensus integration, statistical or arithmetical integration was shown in the research to yield far more accurate results; the sketch below illustrates the distinction.
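To clarify what ‘statistical or arithmetical integration’ means in practice, here is a purely hypothetical Python sketch in which the consensus discussion is replaced by a fixed weighted average of exercise scores. The exercise names echo those used in the study; the scores and weights are invented for the example.

```python
# Hypothetical illustration of 'statistical' (mechanical) integration: the
# overall rating is a simple arithmetic aggregate of exercise scores, with
# no consensus discussion. Scores and weights are made up.
ratings = {"in_basket": 3.8, "roleplay": 4.2, "case_study": 3.5}
weights = {"in_basket": 0.4, "roleplay": 0.3, "case_study": 0.3}

overall = sum(ratings[ex] * weights[ex] for ex in ratings)
print(f"Mechanically integrated overall score: {overall:.2f}")  # 3.83
```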

The third observation relates to behavioural dimensions, often referred to as competencies: the study shows that Assessment Centres cannot reliably inform employers of an applicant’s competencies. The findings indicate that, in the best-case scenario, around 1% of the variation in the scores given to Assessment Centre candidates is associated with their performance on competencies, implying that dimensions, although the ultimate focus of many Assessment Centres, are in practical terms potentially irrelevant to the entire process.

What Next?

The implications for the field are serious. In using Assessment and Development Centres, employers are spending a great deal of time and money on an approach that, overall, is possibly less effective than they had anticipated, and less effective than cheaper and less resource-intensive alternatives. There are serious implications for applicants undergoing intensive assessment, too: how can recruitment and development processes in organizations remain equitable and fair if they are based on misleading assessments?

The study, conducted by Birkbeck academics Dr Chris Dewberry, Dr Duncan Jackson and Dr George Michaelides together with ASSESTA Ltd’s Young-Jae Kim, concedes that further research is needed. It has, however, signalled a change in the wind, and one with profound consequences for the de facto flagship selection method. Employers, many of whom are in the same boat in adopting Assessment Centres, should be looking harder at their assessment procedures.

For further information on Birkbeck’s Organizational Psychology Department, please visit their webpage, where you can view research, courses and staff profiles. This article is based on a study conducted by Dr Chris Dewberry, Dr Duncan Jackson, Dr George Michaelides and Young-Jae Kim.
