Concurrent Machine learning Assisted Raman Spectroscopy of Whole Blood and Saliva for Breast Cancer Diagnostics

  • John Irungu Githaiga Department of Physics, University of Nairobi
  • Hudson Kalambuka Angeyo Department of Physics, University of Nairobi
  • Kenneth Amiga Kaduki Department of Physics, University of Nairobi
  • Wallace Dimbuson Bulimo Department of Biochemistry, University of Nairobi
  • Daniel Kenyoru Ojuka Department of Surgery, University of Nairobi
Keywords: Raman spectroscopy, Breast cancer, Machine learning, Whole blood, Saliva


Highly sensitive and unique biomarkers are needed for early cancer detection. In particular, biomarkers in biofluids can help detect a tumor in the body at an early stage. The utility of biofluid markers for cancer detection can be enhanced when multiple biofluids are biochemically analyzed simultaneously to acquire complementary diagnostic information. This work aimed to identify universal biomarkers in human whole blood and saliva for breast cancer screening using machine learning-assisted Raman spectroscopy. Raman spectroscopy was performed in the 393–2063 cm⁻¹ region using 785 nm laser excitation. Machine learning-assisted Raman spectroscopy was implemented by performing principal component analysis, independent component analysis, and support vector machine modeling on the Raman spectra in order to extract the underlying multivariate relationships between the observed biochemical alterations. Ten spectral regions were identified: 612 ± 1.44 cm⁻¹, 785 cm⁻¹, 968 ± 2.02 cm⁻¹, 1000 ± 0.86 cm⁻¹, 1248 cm⁻¹, 1340 cm⁻¹, 1371 ± 0.57 cm⁻¹, 1448 ± 1.73 cm⁻¹, 1500 ± 2.88 cm⁻¹, and 1661 ± 1.44 cm⁻¹, which can be regarded as universal biomarkers of breast cancer in both whole blood and saliva samples. The diagnostic models based on principal component analysis followed by a support vector machine achieved a mean sensitivity of 95.83 ± 2.48%, specificity of 99.16 ± 0.65%, and accuracy of 98.50 ± 0.65% when differentiating healthy blood samples from diseased blood samples. Further, this model yielded a mean sensitivity of 73.0 ± 6.20%, specificity of 97.50 ± 0.67%, and accuracy of 93.66 ± 0.80% when differentiating healthy saliva samples from diseased saliva samples. The identified biomarkers could be used to establish a spectral system for detection of breast cancer.
Further work, including studies with larger sample sizes, is needed to determine how proteins and nucleic acids behave in their conformational states in human blood and saliva before the findings can be translated to clinical application.
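
For readers less familiar with the figures of merit quoted above, the following minimal Python sketch shows how sensitivity, specificity, and accuracy are conventionally computed from a binary classifier's predictions. The function name `diagnostic_metrics` and the label vectors are hypothetical and made up for illustration; they are not the study's data, and how the authors aggregated per-run values into the reported means is not specified here.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    `positive` marks the diseased class; every other label is treated as healthy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # diseased, missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # healthy, flagged

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan  # true negative rate
    accuracy = (tp + tn) / len(pairs) if pairs else nan
    return sensitivity, specificity, accuracy


# Made-up labels for illustration only (1 = diseased, 0 = healthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec, acc = diagnostic_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

A lower sensitivity alongside a high specificity, as reported for saliva, corresponds in this notation to relatively more missed diseased samples (`fn`) than falsely flagged healthy samples (`fp`).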