Classification of biodegradable materials using QSAR modeling with uncertainty estimation
Werickson Fortunato de Carvalho Rocha, David Sheen
The ability to determine the biodegradability of chemicals without resorting to expensive tests is ecologically and economically desirable. Models based on quantitative structure-activity relations (QSAR) provide some promise in this direction. However, QSAR models in the literature rarely provide uncertainty estimates in more detail than aggregated statistics such as the sensitivity and specificity of the model's predictions. Almost never is there a means of assessing the uncertainty in an individual prediction. Without an uncertainty estimate, it is impossible to assess the trustworthiness of any particular prediction, which leaves the model with low utility for regulatory purposes. In the present work, a QSAR model with uncertainty estimates is used to predict biodegradability for a set of substances from a publicly available data set. Separation was performed using a partial least squares discriminant analysis model, and the uncertainty was estimated using bootstrapping. The uncertainty prediction allows for confidence intervals to be assigned to any of the model's predictions, allowing for a more complete assessment of the model than would be possible through a traditional statistical analysis. The results presented here are broadly applicable to other areas of modeling as well, because the calculation of the uncertainty will clearly demonstrate where additional tests are needed.
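The combination of partial least squares discriminant analysis and bootstrapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (`fit_pls1`, `bootstrap_scores`, `classify_with_interval`) are hypothetical, and the 0/1 class coding, the 0.5 decision threshold, the number of components, and the percentile interval are all assumptions chosen for illustration.

```python
import numpy as np

def fit_pls1(X, y, n_components=2):
    """Fit single-response PLS (NIPALS); return (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                         # latent scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        q_a = yr @ t / tt                  # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - q_a * t                  # deflate y
        W.append(w); P.append(p); q.append(q_a)
    if not W:                              # e.g. a resample with one class only
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def bootstrap_scores(X, y, x_new, n_boot=200, n_components=2, seed=0):
    """Refit the model on resampled training sets; collect scores for x_new."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        coef, intercept = fit_pls1(X[idx], y[idx], n_components)
        scores[b] = x_new @ coef + intercept
    return scores

def classify_with_interval(scores, threshold=0.5, level=0.95):
    """Percentile interval on the score; 'uncertain' if it spans the threshold."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(scores, [alpha, 1.0 - alpha])
    if lo > threshold:
        label = "biodegradable"
    elif hi < threshold:
        label = "not biodegradable"
    else:
        label = "uncertain"
    return label, lo, hi

# Synthetic stand-in for molecular descriptors (assumption, not the real data):
# class 1 = biodegradable, class 0 = not biodegradable.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 1.0, (40, 5)), rng.normal(-2.0, 1.0, (40, 5))])
y = np.concatenate([np.ones(40), np.zeros(40)])

x_new = np.full(5, 2.0)                    # a sample near the class-1 centre
scores = bootstrap_scores(X, y, x_new)
label, lo, hi = classify_with_interval(scores)
print(label, round(lo, 3), round(hi, 3))
```

In this sketch, a prediction whose bootstrap interval overlaps the decision threshold is reported as uncertain rather than forced into a class, which is one simple way to flag individual substances for which further testing would be most informative.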
Fortunato de Carvalho Rocha, W. and Sheen, D., Classification of biodegradable materials using QSAR modeling with uncertainty estimation, SAR and QSAR in Environmental Research, [online], https://doi.org/10.1080/1062936X.2016.1238010, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=921193 (Accessed March 4, 2024)