A Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) Model Approach Towards Improving HBV Prediction

DOI: https://doi.org/jobasr

Asiya Baba Abdullahi

Yakubu Musa

Babayemi W. A.

Jibril A.H.

Abstract
Timely classification of Hepatitis B Virus (HBV) infection stages remains a major challenge in clinical diagnostics, particularly when differentiating acute from chronic cases using complex serological profiles. This study proposes a CNN-LSTM model for predicting HBV infection stage, with the aim of improving performance metrics and generalizability. The dataset comprised 758 patient records from the Immunology Department of Usmanu Danfodiyo University Teaching Hospital, Sokoto, collected between February 14 and December 31, 2019. Pre-processing involved data imputation, categorical encoding, normalization, and expert rule-based labeling. The model architecture combines convolutional and recurrent layers to enhance feature extraction and sequence learning, thereby improving classification between acute and chronic infection states. It consists of approximately 8 trainable layers: an input layer with input shape (7, 1); a first Conv1D layer with 32 filters, kernel size 3, and ReLU activation; a MaxPooling1D layer with pool size 2; a second Conv1D layer with 64 filters, kernel size 3, and ReLU activation; a MaxPooling1D layer with pool size 2; a first LSTM layer with 50 units and dropout of 0.3; a second LSTM layer with 25 units and dropout of 0.3; a dense (fully connected) layer with 16 neurons and ReLU activation; and an output layer with 1 neuron and sigmoid activation. The proposed CNN-LSTM model was trained and evaluated using stratified 10-fold cross-validation, achieving mean accuracy, precision, recall, and F1-score of 99.50%, 99.80%, 99.69%, and 99.69%, respectively. Receiver Operating Characteristic (ROC) analysis yielded near-perfect Area Under the Curve (AUC) values across folds. Comparative evaluations against standalone CNN, LSTM, and Deep Neural Network (DNN) models demonstrated the superior predictive capability of the hybrid model, which also outperformed previous studies that achieved 66.30% accuracy.
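The layer sizes listed above can be checked with a little shape arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes stride-1 convolutions and non-overlapping pooling (pooling stride equal to pool size), and it treats the convolution padding mode as an open parameter because the abstract does not state it. The helper names `conv1d_len`, `pool1d_len`, and `trace_lengths` are hypothetical.

```python
def conv1d_len(n, kernel, padding):
    """Output length of a stride-1 1D convolution (assumed stride)."""
    if padding == "same":
        return n                      # zero-padding keeps the length unchanged
    return max(0, n - kernel + 1)     # "valid": no padding


def pool1d_len(n, pool):
    """Output length of non-overlapping max pooling (assumed stride == pool size)."""
    return n // pool


def trace_lengths(n, padding):
    """Sequence length after each Conv1D / MaxPooling1D layer in the abstract."""
    lengths = []
    for kind, size in [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2)]:
        n = conv1d_len(n, size, padding) if kind == "conv" else pool1d_len(n, size)
        lengths.append(n)
    return lengths


# Input shape (7, 1): 7 steps, 1 channel.
print(trace_lengths(7, "same"))   # [7, 3, 3, 1]
print(trace_lengths(7, "valid"))  # [5, 2, 0, 0]
```

Under these assumptions, unpadded ("valid") convolutions would leave no time steps for the LSTM layers, whereas "same" padding leaves a single step with 64 channels. This suggests, but does not confirm, that some form of padding was used.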
The CNN-LSTM model achieved 99.50% accuracy, 99.80% precision, 99.69% recall, and 99.69% F1-score, surpassing the existing models it was compared against. Together, the identification of key serological risk factors (HBeAg, HBeAb), the development of a high-performing CNN-LSTM classifier, and the model’s demonstrated superiority over existing approaches form a cohesive framework for HBV stage prediction.
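For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, and F1-score follow from a binary confusion matrix and how they can be averaged across cross-validation folds. It is illustrative only: the counts are hypothetical and are not taken from the study, the choice of positive class is an assumption, and the function names `binary_metrics` and `mean_over_folds` are invented for this example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_over_folds(fold_counts):
    """Average each metric over folds; fold_counts holds (tp, fp, fn, tn) tuples."""
    per_fold = [binary_metrics(*counts) for counts in fold_counts]
    return {name: sum(m[name] for m in per_fold) / len(per_fold)
            for name in per_fold[0]}


# Hypothetical counts for two folds (not data from the study).
folds = [(8, 2, 2, 8), (10, 0, 0, 10)]
print(mean_over_folds(folds))
```

Because each metric is averaged over folds separately, the mean F1-score need not equal the harmonic mean of the mean precision and mean recall.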