Artificial Intelligence Exceeds Humans in Epidemiological Job Coding

Hi, I'm Mathijs Langezaal and my research focuses on the use of artificial intelligence in epidemiological job coding. Feel free to ask me questions!


Work can expose us to health risks, such as asbestos and constant noise. To study these risks, job descriptions are collected in large occupational cohorts and classified by experts into standard occupational codes. This manual coding is time-consuming, expensive, and requires expert knowledge. For example, after three months of extensive coding and training, the coding efficiency of an expert coder can reach about 2,700 codes per month.

Given the large scale of many occupational epidemiological cohorts, multiple expert coders need to be deployed. Because different coders do not always assign the same code to the same job description, this introduces inconsistency: inter-coder reliability is 42–71%. Although several automatic coding tools have been developed to tackle this issue, the accurate assessment of health risks is still only feasible with human intervention.

Our research

We developed OPERAS, a customizable decision support system for epidemiological job coding. Using over 800,000 job descriptions manually coded by experts, we developed and tested classification models for four (inter)national classification systems. OPERAS provides a score showing its confidence in each classification, which helps identify cases where a human expert should double-check the work. This confidence score also allows job descriptions to be coded automatically above a customizable score threshold. For example, expert coders could choose to automatically code all job descriptions with a confidence score above 95%, enabling a large workload reduction. Lastly, to test the exposure assessment accuracy, we used four job-exposure matrices, which link standardized job codes to potentially hazardous exposures.
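The threshold-based routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the OPERAS implementation: the `Prediction` class, the function names, the example job descriptions, the codes, and the confidence values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    description: str   # free-text job description
    code: str          # predicted occupational code (placeholder values)
    confidence: float  # model confidence, between 0 and 1


def route_by_confidence(predictions, threshold=0.95):
    """Split predictions into auto-coded cases and cases for expert review."""
    auto, review = [], []
    for p in predictions:
        # Only predictions strictly above the threshold are coded automatically.
        (auto if p.confidence > threshold else review).append(p)
    return auto, review


def workload_reduction(auto, review):
    """Fraction of job descriptions that no longer need manual coding."""
    total = len(auto) + len(review)
    return len(auto) / total if total else 0.0


# Hypothetical example data, for illustration only.
predictions = [
    Prediction("primary school teacher", "2341", 0.99),
    Prediction("welder in shipyard", "7212", 0.97),
    Prediction("general helper", "9629", 0.41),
    Prediction("office work", "4110", 0.80),
]

auto, review = route_by_confidence(predictions, threshold=0.95)
print(f"auto-coded: {len(auto)}, for review: {len(review)}")
print(f"estimated workload reduction: {workload_reduction(auto, review):.0%}")
```

Raising the threshold sends more cases to the expert and lowers the workload reduction; lowering it does the opposite, at the cost of accepting less certain codes without review.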


During our evaluations, we found that OPERAS’ classification accuracy ranged between 58.31% and 78.94%, with an exposure assessment accuracy of 75.0–98.4%. This performance exceeds that of both expert coders and other currently available tools. Further, when the automatic coding function is used at a 95% confidence score threshold, OPERAS enables an estimated workload reduction of 19.7–55.7%. As such, OPERAS supports custom occupational coding, enabling large-scale occupational health research in an efficient, effective, accurate, and stable manner.
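One plausible reason exposure assessment accuracy can be higher than classification accuracy is that different job codes may map to the same exposure level in a job-exposure matrix, so a wrong code can still give the right exposure. The sketch below illustrates this under stated assumptions; it is not the evaluation procedure from the paper. The toy matrix `JEM`, the codes, the exposure labels, and both function names are hypothetical.

```python
# Hypothetical toy job-exposure matrix: job code -> exposure category.
JEM = {
    "7212": "high",
    "7213": "high",
    "2341": "none",
    "4110": "none",
}


def code_accuracy(expert_codes, predicted_codes):
    """Share of job descriptions where the predicted code equals the expert code."""
    matches = sum(e == p for e, p in zip(expert_codes, predicted_codes))
    return matches / len(expert_codes)


def exposure_agreement(expert_codes, predicted_codes, jem):
    """Share of job descriptions where both codes map to the same exposure."""
    matches = sum(jem[e] == jem[p] for e, p in zip(expert_codes, predicted_codes))
    return matches / len(expert_codes)


# Hypothetical example: two of the four predicted codes differ from the expert
# codes, but each wrong code falls in the same exposure category.
expert = ["7212", "2341", "4110", "7213"]
predicted = ["7213", "2341", "2341", "7213"]

print(f"code accuracy:      {code_accuracy(expert, predicted):.0%}")
print(f"exposure agreement: {exposure_agreement(expert, predicted, JEM):.0%}")
```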


Our research has been published in Communications Medicine (


Feel free to ask questions about the research via Slido.
