24th EANN 2023, 14 - 17 June 2023, León, Spain

Evaluating the Extraction of Toxicological Properties with Extractive Question Answering

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, Catarina Silva


  Preparing a toxicological analysis of a chemical substance is a time-consuming process that requires a safety advisor to search text documents from multiple sources for information on several properties and experiments. There has been growing interest in using Machine Learning (ML) approaches, specifically Natural Language Processing (NLP) techniques, to improve Human-Machine integration in processes across different areas. In this paper, we explore this integration in toxicological analysis. To minimise the effort of preparing toxicological analyses of chemical substances, we explore several available neural network models tuned for Extractive Question Answering (BERT, RoBERTa, BioBERT, ChemBERT) for retrieving toxicological properties from sections of the source documents. This formulation of Information Extraction as a targeted Question Answering task can be considered a more flexible and scalable alternative to manually creating a (limited) set of extraction patterns or even training a model for chemical relation extraction. The proposed approach was tested on a set of eight properties, each containing multiple fields, in a sample of 33 reports for which golden answers were provided by a security advisor. Compared to the golden answers, the best model tested achieved a BLEU score of 0.55. When responses from different models are combined, BLEU increases to 0.59. Our results suggest that, while this approach cannot yet be fully automated, it can be useful in supporting security advisors' decisions and reducing time and manual effort.
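
The evaluation idea described in the abstract (score each extracted answer against a golden answer with BLEU, and combine the responses of several models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state how responses were combined or which BLEU configuration was used, so the sketch picks the answer with the highest model confidence and uses a simplified, smoothed sentence-level BLEU. The question, the model names, the answers, and the scores are made-up values, not data from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing.

    Assumption: the paper's exact BLEU setup is not given, so this is
    only an illustrative variant, not the authors' metric.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty for candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


def combine_answers(outputs):
    """Return (model_name, answer) with the highest confidence score.

    Assumption: confidence-based selection is a hypothetical combination
    rule chosen for this sketch.
    """
    name, best = max(outputs.items(), key=lambda kv: kv[1]["score"])
    return name, best["answer"]


# Hypothetical question for one field of one toxicological property.
question = "What is the LD50 value?"
golden_answer = "2000 mg/kg bw"

# Hypothetical outputs of two extractive QA models for the same passage.
model_outputs = {
    "model_a": {"answer": "2000 mg/kg bw", "score": 0.62},
    "model_b": {"answer": "rats", "score": 0.31},
}

chosen_model, combined_answer = combine_answers(model_outputs)
combined_bleu = bleu(combined_answer, golden_answer)
print(chosen_model, combined_answer, round(combined_bleu, 2))
```

In this toy case the combined answer matches the golden answer exactly, so its score is higher than that of the weaker model's answer. On real reports the gain from combining models would depend on how well each model's confidence tracks answer quality.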
