Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks
Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, often real-world problems related to health, some classification errors may be tolerated, whereas othe...
Main Authors: | Ramos-López, Darío; Maldonado, Ana D. |
---|---|
Format: | info:eu-repo/semantics/article |
Language: | English |
Published: | MDPI 2021 |
Subjects: | multi-class classification; imbalanced data; Bayesian networks; variable selection |
Online Access: | http://hdl.handle.net/10835/9318 |
_version_ | 1789406332322840576 |
---|---|
author | Ramos-López, Darío; Maldonado, Ana D. |
author_facet | Ramos-López, Darío; Maldonado, Ana D. |
author_sort | Ramos-López, Darío |
collection | DSpace |
description | Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, often real-world problems related to health, some classification errors may be tolerated, whereas others are to be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. In it, a flexible validation metric (cost/loss function) encoding the impact of the different classification errors is employed. Thus, the model is learned to optimize the a priori specified cost function. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results than other standard validation metrics in the less frequent class states. The possibility of fine-tuning the objective validation function can improve the prediction quality in imbalanced data or when asymmetric misclassification costs have to be considered. |
format | info:eu-repo/semantics/article |
id | oai:repositorio.ual.es:10835-9318 |
institution | Universidad de Cuenca |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | dspace |
spelling | oai:repositorio.ual.es:10835-9318; 2023-04-12T19:36:23Z; Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks; Ramos-López, Darío; Maldonado, Ana D.; multi-class classification; imbalanced data; Bayesian networks; variable selection; Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, often real-world problems related to health, some classification errors may be tolerated, whereas others are to be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. In it, a flexible validation metric (cost/loss function) encoding the impact of the different classification errors is employed. Thus, the model is learned to optimize the a priori specified cost function. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results than other standard validation metrics in the less frequent class states. The possibility of fine-tuning the objective validation function can improve the prediction quality in imbalanced data or when asymmetric misclassification costs have to be considered.; 2021-01-18T09:33:56Z; 2021-01-18T09:33:56Z; 2021-01-13; info:eu-repo/semantics/article; 2227-7390; http://hdl.handle.net/10835/9318; en; https://www.mdpi.com/2227-7390/9/2/156; Attribution-NonCommercial-NoDerivatives 4.0 Internacional; http://creativecommons.org/licenses/by-nc-nd/4.0/; info:eu-repo/semantics/openAccess; MDPI |
spellingShingle | multi-class classification; imbalanced data; Bayesian networks; variable selection; Ramos-López, Darío; Maldonado, Ana D.; Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks |
title | Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks |
topic | multi-class classification; imbalanced data; Bayesian networks; variable selection |
url | http://hdl.handle.net/10835/9318 |
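
The description in this record mentions a validation metric (cost/loss function) that encodes the impact of the different classification errors. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the three-class setup, and every value in the cost matrix are hypothetical and chosen only for illustration. It shows how two sets of predictions with the same accuracy can differ sharply once asymmetric misclassification costs are applied.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost: Sequence[Sequence[float]],
) -> float:
    """Mean misclassification cost per sample.

    cost[i][j] is the penalty for predicting class j when the true class is i.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical cost matrix for three classes (0 = frequent, 2 = rare).
# Missing the rare class (row 2) is penalized far more than other errors.
COST = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [10.0, 5.0, 0.0],
]

# Hypothetical imbalanced labels: class 0 dominates.
y_true = [0, 0, 0, 0, 1, 2]
pred_majority = [0, 0, 0, 0, 0, 0]  # always predicts the frequent class
pred_balanced = [0, 0, 1, 1, 1, 2]  # errs only on the frequent class

acc_majority = accuracy(y_true, pred_majority)
acc_balanced = accuracy(y_true, pred_balanced)
cost_majority = average_cost(y_true, pred_majority, COST)
cost_balanced = average_cost(y_true, pred_balanced, COST)

print(f"accuracy: {acc_majority:.3f} vs {acc_balanced:.3f}")
print(f"average cost: {cost_majority:.3f} vs {cost_balanced:.3f}")
```

Under these assumed costs, both predictors are correct on four of six samples, yet the one that never predicts the rare class incurs a much higher average cost. A variable selection procedure could use such a score in place of accuracy when comparing candidate feature subsets.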