Analysis and design of scalable pre-processing techniques of instances for imbalanced Big Data problems. Applications in humanitarian emergencies situations.

Authors

DOI:

https://doi.org/10.24215/16666038.22.e15

Keywords:

Big Data, Machine Learning, Data Preprocessing, Data Imbalance, Data reduction

Abstract

This thesis addresses the distributed and scalable pre-processing of Big Data sets in order to obtain good-quality data, known as Smart Data. In particular, it focuses on classification problems and on the following data characteristics: (a) imbalanced data; (b) redundancy; (c) high dimensionality; and (d) overlapping.

To this end, the following specific objectives are established:

  • To enable a state-of-the-art algorithm, widely used to treat class imbalance in traditional data scenarios (Small Data), to obtain adequate results on large datasets in a distributed manner and within reasonable execution times.
  • To design and implement a fast and scalable methodology for reducing both instances and attributes in Big Data sets with high redundancy and dimensionality, while maintaining the predictive capacity of the original dataset.
  • To design and implement a strategy for scalable data characterisation in the context of Big Data classification, focusing on the ambiguous areas of the problem.
  • To apply the knowledge acquired during the development phase to problems of interest related to humanitarian emergencies.
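
The first objective concerns an oversampling algorithm for class imbalance, and the reference list names SMOTE in this role. As an illustration only, the following is a minimal single-machine sketch of SMOTE-style interpolation in plain Python. It is not the distributed method developed in the thesis; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority instance and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force neighbour search over the minority class only.
        # This is the step that does not scale to Big Data as written.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric attributes (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority instances, it stays inside the region already covered by the minority class. The neighbour search is the part a distributed version would need to handle differently, since the nearest neighbours of an instance may sit in another data partition.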

References

Basgall, M. J., Naiouf, M., & Fernández, A. (2021). FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics, 10(15), 1757.

Basgall, M. J., Hasperué, W., Naiouf, M., Fernández, A., & Herrera, F. (2019). An Analysis of Local and Global Solutions to Address Big Data Imbalanced Classification: A Case Study with SMOTE Preprocessing. In Cloud Computing and Big Data (Vol. 1050, pp. 75–85). Springer International Publishing.

Basgall, M. J., Hasperué, W., Naiouf, M., Fernández, A., & Herrera, F. (2018). SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data. Journal of Computer Science and Technology, 18(03), e23.

Published

2022-10-17

How to Cite

Basgall, M. J. (2022). Analysis and design of scalable pre-processing techniques of instances for imbalanced Big Data problems. Applications in humanitarian emergencies situations. Journal of Computer Science and Technology, 22(2), e15. https://doi.org/10.24215/16666038.22.e15

Issue

Section

Thesis Overview