ABSTRACT
Fruit ripening is usually a natural process in which fruits undergo various chemical and physical changes before they become palatable. Recent advances in agricultural technology have led to new artificial ways of ripening fruits, mainly to meet market demands and to deal with the logistics of storage and transportation. However, this practice has become a concern because of the human health risks resulting from the uncontrolled use of ripening agents that contain toxic elements. For instance, industrial grade calcium carbide contains impurities of arsenic and phosphorus as well as heavy metals. High intake of these elements is known to cause neurological disorders such as cerebral edema and memory loss, as well as peptic ulcers and cancers of the colon and lungs. There is hardly any method capable of rapidly and non-invasively assessing artificial ripeners in fruits (ARF) with reliability and accuracy. The conventional wet chemical techniques, such as the various forms of chromatography, are time consuming, destructive and costly, and involve laborious sample preparation. This work aims at developing a rapid and non-invasive technique for assessing calcium carbide ripened bananas using machine-learning (ML) assisted laser Raman spectroscopy (LRS). In this study, Raman spectra were recorded from naturally and carbide ripened banana samples using a 785 nm laser for excitation. The bananas were ripened using calcium carbide (CaC2) at concentrations ranging from 0.240 g/L to 2.0 g/L. Exploratory analysis using principal component analysis (PCA) revealed that clustering of the carbide ripened samples was due to the presence of sulfur, acetylene, calcium hydroxide and phosphine impurities contained in CaC2. These molecules have Raman bands centered at 480 cm-1 (S-S bond stretching), 612 cm-1 (C-H asymmetric bending), 780 cm-1 (O-H bending) and 979 cm-1 (P-H stretching) respectively.
Classification of the samples and quantification of the CaC2 concentrations used in ripening were achieved using the following ML algorithms: support vector machine, artificial neural networks and random forest. The ML classification models achieved high correct classification accuracies (> 85 %). Furthermore, the regression models performed well when predicting test data sets, as indicated by high R2 values (> 0.95) and low RMSEP values (< 0.34 g/L). Using the optimized LRS conditions and ML models developed in this work, banana samples collected from local markets around Nairobi were found to have been ripened with CaC2 (up to 1.30 g/L). Therefore, ML-assisted LRS allows for rapid and direct assessment of artificial ripeners in fruits. The findings of this study will aid in the development of spectral libraries for use in food safety analysis procedures involving fruits.
TABLE OF CONTENTS
DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS AND ACRONYMS
CHAPTER 1: INTRODUCTION
1.1 Background to the Study
1.1.1 Use of Calcium Carbide as an Artificial Ripener and Associated Health Risks
1.1.2 Challenges Associated with Conventional Methods for Assessing Artificial Ripeners
1.2 Statement of the Problem
1.3 Research Objectives
1.3.1 Main Objective
1.3.2 Specific Objectives
1.4 Justification and Significance
CHAPTER 2: LITERATURE REVIEW
2.1 Chapter Overview
2.2 Common Artificial Fruit Ripeners
2.3 Methods for Assessing Artificial Ripeners in Fruits
2.3.1 Wet Chemistry-based Techniques
2.3.2 Nuclear Magnetic Resonance (NMR) Spectroscopy
2.3.3 Near Infrared Spectroscopy
2.3.4 Laser Raman Spectroscopy
2.4 Applications of Machine Learning in Analytical Spectroscopy
CHAPTER 3: THEORETICAL FRAMEWORK
3.1 Chapter Overview
3.2 Raman Scattering
3.2.1 Classical Theory of Raman Scattering
3.2.2 Quantum Theory of Raman Scattering
3.3 Intensity of Raman scattered light
3.4 Molecular vibrations and the Raman spectra
3.5 Spectral Data Preprocessing
3.6 Machine Learning Techniques in Raman Spectroscopy
3.6.1 Exploratory Analysis of Raman Spectra Using the Principal Component Analysis Technique
3.6.2 Modeling Approaches for Multivariate Data Sets Using Supervised Machine Learning Methods
3.6.3 The Artificial Neural Network Model
3.6.4 The Random Forest Model
3.6.5 The Support Vector Machine and Support Vector Regression Model
CHAPTER 4: MATERIALS AND METHODS
4.1 Chapter Overview
4.2 Instrumentation for the Laser Raman Spectroscopy Set-up
4.3 Preparation of Fruit Samples
4.4 Procedure for Laser Raman Spectra Acquisition
4.5 Elemental Analysis of Calcium Carbide
4.6 Preprocessing of laser Raman spectra
4.7 Software for Data Analysis
4.8 Utility of Machine Learning Techniques in the analysis of laser Raman spectra
4.8.1 Exploratory analysis of laser Raman spectra utilizing PCA
4.8.2 Multivariate Calibration of Laser Raman Spectra Utilizing ANN
4.8.3 Multivariate Calibration of Laser Raman Spectra Utilizing RF
4.8.4 Multivariate Calibration of Laser Raman Spectra Utilizing SVR
4.9 Evaluation of Limits of Detection (LOD) and Quantification (LOQ)
CHAPTER 5: RESULTS AND DISCUSSIONS
5.1 Chapter Overview
5.2 Optimized Conditions for Rapid Laser Raman Spectroscopic Measurements
5.3 Raman Spectra of Samples before and after Data Preprocessing
5.4 Exploratory Multivariate Analysis of Raman Spectra by PCA and Assignment of Peaks
5.4.1 PCA for Raman shift Region 450 cm-1 - 500 cm-1
5.4.2 PCA for Raman shift Region 600 cm-1 - 650 cm-1
5.4.3 PCA for Raman shift Region 750 cm-1 - 800 cm-1
5.4.4 PCA for Raman shift Region 950 cm-1 - 1000 cm-1
5.5 Qualitative Analysis (Classification) of Raman Spectra Using Machine Learning
5.5.1 Classification of Naturally and Carbide Ripened Samples Utilizing Support Vector Machine Classifier
5.5.2 Classification of Naturally and Carbide Ripened Samples Utilizing Random Forest Classifier
5.5.3 Classification of Naturally and Carbide Ripened Samples Utilizing Artificial Neural Network Classifier
5.6 Multivariate Calibration Using Machine Learning for Quantitative Analysis
5.6.1 Quantitative Analysis of Carbide in Samples Using Artificial Neural Network
5.6.2 Quantitative Analysis of Carbide in Samples Using Random Forest Regression
5.6.3 Quantitative Analysis of Carbide in Samples Using Support Vector Regression
5.7 Prediction of Calcium Carbide Concentration used in Ripening Market Samples
CHAPTER 6: CONCLUSIONS AND RECOMMENDATIONS
6.1 Conclusion
6.2 Recommendations and Future Prospects
REFERENCES
APPENDICES
APPENDIX 1: PCA code in R for Exploratory Analysis
APPENDIX 2: ANN, RF and SVM Codes in R for Classification (Qualitative Studies)
APPENDIX 3: ANN, RF and SVR Codes in R for Regression (Quantitative Studies)
APPENDIX 4: Model Predictions for Test Data Sets
APPENDIX 5: EDXRF Analysis of Calcium Carbide
LIST OF TABLES
Table 2.1: Common impurities found in CaC2 and their Raman peaks
Table 4.1: Concentration levels of CaC2 used in ripening samples
Table 5.1: Power of different lasers at different objectives
Table 5.2: Confusion matrix of test data set
Table 5.3: Confusion matrix of test data set
Table 5.4: Confusion matrix of test data set
Table 5.5: Model prediction ability with a variation of hidden layers
Table 5.6: Model prediction ability with a variation of the number of hidden neurons
Table 5.7: Model performance based on R2, RMSEP and multivariate LOD and LOQ
Table 5.8: Results turn-around time: from sample preparation to data analysis
Table 5.9: Market samples predictions
Table 5.10: ANOVA statistics showing comparison between different model performance
Table A4.1: Model predictions for test data sets
LIST OF FIGURES
Figure 3.1: Feynman diagram of Stokes scattering
Figure 3.2: Feynman diagram of anti-Stokes scattering
Figure 3.3: Diagram of scattering during illumination with monochromatic light
Figure 3.4: A schematic showing Rayleigh and Raman scattering
Figure 3.5: Biological and artificial neural design
Figure 3.6: A generalized schematic of the BP-ANN
Figure 3.7: A sample hyperplane in a two-dimensional space
Figure 4.1: Layout of confocal laser Raman spectrometer
Figure 4.2: Banana sample slices for mounting under the microscope
Figure 4.3: Conceptual framework for machine learning methodologies employed towards calibration and prediction of Raman spectral data
Figure 5.1: Images of banana samples as seen under the microscope (×50 lens)
Figure 5.2: Raw Raman spectra for naturally ripened banana samples
Figure 5.3: Raw Raman spectra of CaC2 ripened banana samples
Figure 5.4: Pre-processed Raman spectra of naturally and carbide ripened banana samples
Figure 5.5: PCA scores plots of naturally (NRF) and artificially (ARF) ripened samples for Raman shift region 200-1200 cm-1
Figure 5.6: PCA loadings plots of naturally and artificially ripened samples for Raman shift region 200-1200 cm-1
Figure 5.7: PCA scores plots for sulfur molecules ROI (450-500 cm-1)
Figure 5.8: PCA loadings plot for sulfur molecules ROI (450-500 cm-1)
Figure 5.9: PCA scores plots for acetylene molecules ROI (600-650 cm-1)
Figure 5.10: Loadings plot for acetylene molecules ROI (600-650 cm-1)
Figure 5.11: PCA scores plots for hydroxyl molecules ROI (750-800 cm-1)
Figure 5.12: Loadings plot for hydroxyl molecules ROI (750-800 cm-1)
Figure 5.13: PCA scores plots for phosphine molecules ROI (950-1000 cm-1)
Figure 5.14: Loadings plot for phosphine molecules ROI (950-1000 cm-1)
Figure 5.15: SVM classification plot utilizing PCs as inputs with RBK function
Figure 5.16: RF top 5 variable importance plot
Figure 5.17: PCA scores plot of naturally (NRF) and artificially ripened samples showing clusters based on concentration of CaC2
Figure 5.18: ANN regression plot for test data set
Figure 5.19: ANN calibration plot for calculating LOD and LOQ
Figure 5.20: RF variation of OOB error against number of trees in the forest
Figure 5.21: RF variation of OOB error against number of variables tried at each split
Figure 5.22: RF regression top 10 variable importance plot
Figure 5.23: RF regression plot for test data set
Figure 5.24: RF calibration plot for calculating LOD and LOQ
Figure 5.25: Variation of the SVR model error with the cost function
Figure 5.26: SVR plot for test data set
Figure 5.27: SVR calibration plot for calculating LOD and LOQ
Figure 5.28: A visual comparison of suspected artificially ripened banana and naturally ripened banana from local markets
LIST OF ABBREVIATIONS AND ACRONYMS
ANN Artificial neural networks
ANOVA Analysis of variance
ARF Artificial ripeners in fruits
BP-ANN Back-propagation artificial neural network
CaC2 Calcium carbide
CCD Charge-coupled device
EDXRF Energy dispersive X-ray fluorescence
FWHM Full width at half maximum
GC Gas chromatography
HPLC High performance liquid chromatography
ICP-AES Inductively coupled plasma – atomic emission spectrometry
IUPAC International Union of Pure and Applied Chemistry
LOD Limit of detection
LOQ Limit of quantification
LRS Laser Raman spectroscopy
MIR Mid-infrared
ML Machine-learning
MS Mass spectrometry
MSE Mean square error
mtry number of variables tried at each split
ND Neutral density
NIR Near-infrared
NRF Naturally ripened fruits
ntree number of trees in the forest
OOB error Out-of-bag error
PCA Principal component analysis
RBK Radial basis kernel
RF Random forest
RMSEC Root mean square error of calibration
RMSEP Root mean square error of prediction
SEM Standard error of mean
SNR Signal-to-noise ratio
SVM Support vector machine
SVR Support vector regression
CHAPTER 1
INTRODUCTION
1.1 Background to the Study
Fruit ripening is ordinarily a natural process involving a series of physiological changes in odour, colour and quality of the fruit (Adeniji et al., 2010). The time taken for fruits to ripen naturally varies across different fruits. Thus, farmers and retailers ripen fruits artificially using chemicals and other ripening agents, primarily to meet market demands and achieve uniform ripeness of their fruits. Artificial ripening is also done to deal with the logistics of transportation and distribution, as ripe fruits cannot be stored in transit for long; farmers harvest fruits while still unripe and ripen them later (Dhembare, 2013).
The use of artificial ripeners on fruits dates back to ancient China, where pears were ripened artificially by placing them in closed spaces and burning incense in the chamber (Mehnaz et al., 2013). In the 1920s, researchers observed that unsaturated hydrocarbon gases such as ethylene were responsible for ripening and that plants were able to produce ethylene by themselves (Kendrick, 2009). These observations led to the development of a variety of chemicals and methods for artificially ripening fruits. Notably, the traditional chemical-free fruit ripening techniques hardly posed any human health risks.
At present, commonly used artificial ripening agents include ethylene gas, calcium carbide, ethephon (2-chloroethylphosphonic acid), ethylene glycol (1,2-ethanediol), carbon monoxide, potassium sulphate and oxytocin (Singal et al., 2012). Whereas the natural ripening process is usually initiated by the production of ethylene within mature fruits, artificial ripening agents like ethephon, methanol, and ethylene glycol produce ethylene that accelerates the process in a manner similar to the ethylene produced naturally by fruits (Nagel, 1989). The practice is most prevalent during the post-harvest stages of the food chain, particularly during transportation and storage. Fruits that are more prone to artificial ripening include bananas, apples, mangoes, tomatoes and avocados (Dhembare, 2013). These fruits are targeted owing to their widespread demand.
The uncontrolled use of hazardous ripening agents, particularly in developing countries, poses a great concern to human health. Several studies have shown that these ripening agents are detrimental to human health, causing memory loss, cerebral edema, and colon and lung cancer, among others (Kesse et al., 2019; Lakade et al., 2018; Chandel et al., 2018; Kathirvelan et al., 2017). As these ripeners could pose direct and indirect health hazards, it is imperative to determine their elemental compositions and assess their safety levels within artificially ripened fruits.
1.1.1 Use of Calcium Carbide as an Artificial Ripener and Associated Health Risks
Among the many chemicals used to ripen fruits artificially, calcium carbide (CaC2) is the most preferred due to its fast action, ease of use and availability. Hydrolyzed CaC2 liberates acetylene, which functions as an ethylene analogue to influence the ripening of fruits (Bari et al., 2018). Equation (1.1) represents the chemical reaction that liberates acetylene and equation (1.2) shows how it accelerates the ripening process.
CaC2(s) + 2H2O(l) → Ca(OH)2(s) + C2H2(g) (1.1)
Unripe (green) banana + C2H2 → Ripe (yellow) banana (1.2)
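As a worked illustration of equation (1.1), the mass of acetylene released per unit mass of CaC2 can be estimated from the 1:1 molar ratio. The sketch below is written in Python purely for illustration and is not taken from this work; the function name acetylene_yield_g and the purity parameter are hypothetical, and the calculation assumes complete hydrolysis. Industrial grade CaC2 is impure, so the actual yield would be lower than the pure-compound figure.

```python
# Standard atomic masses in g/mol (rounded values).
ATOMIC_MASS = {"Ca": 40.078, "C": 12.011, "H": 1.008}

# Molar masses of the reactant and the gaseous product in equation (1.1).
M_CAC2 = ATOMIC_MASS["Ca"] + 2 * ATOMIC_MASS["C"]      # CaC2
M_C2H2 = 2 * ATOMIC_MASS["C"] + 2 * ATOMIC_MASS["H"]   # C2H2


def acetylene_yield_g(mass_cac2_g, purity=1.0):
    """Mass of C2H2 (g) from a given mass of CaC2 (g).

    Assumes complete hydrolysis and the 1:1 CaC2:C2H2 molar ratio of
    equation (1.1). `purity` is a hypothetical mass fraction of CaC2
    in the ripening agent (1.0 means pure CaC2).
    """
    if mass_cac2_g < 0 or not 0.0 <= purity <= 1.0:
        raise ValueError("mass must be >= 0 and purity must lie in [0, 1]")
    moles_cac2 = mass_cac2_g * purity / M_CAC2
    return moles_cac2 * M_C2H2  # one mole of C2H2 per mole of CaC2


# Upper bound on acetylene released per litre at the highest
# concentration used in this work (2.0 g/L), assuming pure CaC2.
yield_at_max = acetylene_yield_g(2.0)
```

Under these assumptions, roughly 0.41 g of acetylene is released per gram of pure CaC2, since the ratio of the two molar masses is about 26.04/64.10.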
Notably, the CaC2 that is readily available for artificial ripening of fruits is usually impure. Impurities of calcium phosphide and calcium arsenide have been found in industrial grade calcium carbide (Nowshad et al., 2018). Phosphine is liberated when calcium phosphide reacts with water, and arsine is liberated when calcium arsenide reacts with water. These hydrides are fat soluble and can dissolve through the wax surface of fruits and diffuse from the peel to the pulp of fruits exposed to them (Haturusihghe et al., 2004). These impurities are largely responsible for the adverse health effects of carbide ripened fruits on humans.
Workers who are directly involved in applying CaC2 to the fruits bear the highest risk burden of the negative health effects associated with it. They may suffer from conditions such as vomiting and diarrhea, fluid buildup in the lungs, peptic ulcers and colonic cancer caused by exposure to high levels of arsenic and phosphorus. Further, direct exposure to acetylene gas is known to affect the neurological system as it reduces the brain’s oxygen supply, causing dizziness, seizures, memory loss and cerebral edema (Fattah et al., 2010). Dhembare (2013) reports that the health risks resulting from consumption of CaC2 ripened fruits by pregnant women may even be passed down genetically, resulting in children born with abnormalities.
1.1.2 Challenges Associated with Conventional Methods for Assessing Artificial Ripeners
Various analytical methods, devices and procedures premised on chemical analysis are conventionally used for assessing artificial ripeners in fruits. These include HPLC-MS, GC and enzyme-linked immunosorbent assay (ELISA). These processes are time-consuming, inconvenient in terms of sample preparation and not environmentally friendly (Liu et al., 2011). The chromatographic methods have indeed been successful in such tests, but their destructive nature and analytical cost have hindered their widespread and regular use.
Presently, more emphasis and research are geared towards the development of rapid, non-destructive techniques. Consequently, vibrational spectroscopic techniques such as near-infrared (NIR) spectroscopy, mid-infrared (MIR) spectroscopy and LRS have shown great potential in the fruit industry to check for ARF (Kangas et al., 2007). Nonetheless, the applicability of these spectroscopic techniques in studies involving fruits faces challenges arising from the complex nature of fruits, such as sample inhomogeneity, which makes it difficult to resolve the spectral intensity profiles of fruit samples. It is for these reasons that spectroscopic techniques are coupled with ML techniques to aid in overcoming such challenges. LRS in particular facilitates quick analysis, as each cycle of measurement takes less than one minute. However, this advantage is eroded in practical applications by the low reliability of data processing and the resulting need to validate the same measurements by multiple techniques. Nevertheless, with its high degree of accuracy and reliability, ML-assisted LRS has the capacity to solve a wide range of complex problems such as the one in this research.
1.2 Statement of the Problem
The challenge with conventional techniques for assessment of artificial ripeners in fruits is that they are costly, time-consuming, require specialized sample preparation and often involve destruction of the test sample. The laser Raman spectroscopy (LRS) technique overcomes most of these challenges. However, analysis of trace analyte concentrations in complex matrices such as fruit samples can be challenging. This is because the traditional data analysis approach assigns known individual peaks to specific vibrational groups, yet the composition of the entire sample affects each individual peak through matrix effects, so this approach cannot adequately represent the resulting peak intensity shifts. In addition, the fluorescence effect tends to be more intense than the Raman effect. This implies that the laser Raman spectroscopy technique may not be sufficient on its own. Multivariate machine learning techniques offer alternatives to this traditional approach by using data from the entire spectral range collected to solve issues connected with univariate Raman data analysis. Therefore, the machine learning assisted laser Raman spectroscopy approach has potential for rapid, non-destructive and cost-effective assessment of artificial ripeners in fruits.
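To make the contrast between univariate peak reading and multivariate analysis concrete, the following is a minimal Python sketch of principal component analysis applied to whole spectra. It is an illustration under stated assumptions and not the R code listed in the appendices of this work: the spectra are synthetic, the band centres are borrowed from the abstract (480, 612, 780 and 979 cm-1), and the band widths, noise level, background shape and the helper names pca and band are all hypothetical.

```python
import numpy as np


def pca(spectra, n_components=2):
    """Mean-centre the spectra and compute principal components via SVD."""
    x = np.asarray(spectra, dtype=float)
    centred = x - x.mean(axis=0)                      # subtract the mean spectrum
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # coordinates of each sample
    loadings = vt[:n_components]                      # weight of each Raman shift
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


# Synthetic data only: none of these values are measurements from this study.
rng = np.random.default_rng(0)
shift = np.linspace(200.0, 1200.0, 501)               # Raman shift axis in cm-1


def band(centre, width=8.0):
    """Gaussian band of unit height centred at `centre` (cm-1)."""
    return np.exp(-0.5 * ((shift - centre) / width) ** 2)


background = 1.0 + 0.3 * band(700.0, width=300.0)     # smooth fruit-matrix signal
impurity = sum(band(c) for c in (480.0, 612.0, 780.0, 979.0))

# Ten "naturally ripened" and ten "carbide ripened" spectra with random noise.
natural = np.array([background + rng.normal(0.0, 0.02, shift.size)
                    for _ in range(10)])
treated = np.array([background + 0.5 * impurity + rng.normal(0.0, 0.02, shift.size)
                    for _ in range(10)])

scores, loadings, explained = pca(np.vstack([natural, treated]))
```

In this toy setting the first principal component uses all four bands at once, so the two groups separate along a single score axis, and the loadings indicate which Raman shifts drive the separation. Real fruit spectra also carry fluorescence and intensity variations that this sketch does not model.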
1.3 Research Objectives
1.3.1 Main Objective
The primary goal of this research was to develop a machine learning-assisted laser Raman spectroscopy technique for the direct and rapid assessment of calcium carbide ripened bananas.
1.3.2 Specific Objectives
i. To design and optimize a protocol for the assessment of calcium carbide ripened bananas for rapid laser Raman spectroscopic measurements.
ii. To pre-process the LRS measurements obtained from specific objective (i) for spectral noise reduction and perform exploratory analysis of the pre-processed data using PCA to assign molecular vibrations and for dimensionality reduction.
iii. To develop calibration models for quantitative analysis of the data obtained from specific objective (ii) above using selected machine learning techniques, namely ANN, SVM/R and RF.
iv. To test the applicability of machine learning-assisted laser Raman spectroscopy technique in assessing the presence of calcium carbide in market samples.
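The calibration models named in specific objective (iii) are assessed in the abstract by their R2 and RMSEP values on test data. A minimal Python sketch of the two metrics follows. It assumes the usual definitions (RMSEP as the root mean square of prediction errors on the test set, and R2 as the coefficient of determination), which may differ in detail from the computation used in this work; the concentration values are invented for illustration and are not results from this study.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) reference and predicted CaC2 concentrations in g/L.
known = [0.24, 0.50, 1.00, 1.50, 2.00]
predicted = [0.30, 0.45, 1.10, 1.40, 2.05]

error_g_per_l = rmsep(known, predicted)
fit_quality = r_squared(known, predicted)
```

RMSEP carries the units of the predicted quantity (g/L here), which is why it can be compared directly against the concentration range used for calibration, whereas R2 is dimensionless.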
1.4 Justification and Significance
Fruits are a popular food and a vital source of nutrients for the well-being of humans. However, continued consumption of artificially ripened fruits has raised growing concern due to the associated health risks. For instance, consumption of CaC2 ripened fruits has been reported to cause peptic ulcers and colon cancer. To address this health concern, a rapid method for assessing ARF is needed. There is hardly any rapid and non-invasive method for assessing ARF, as the standard wet laboratory techniques are destructive and time-consuming and therefore inappropriate for such practical applications.
Applied vibrational spectroscopy techniques such as LRS are non-invasive and rapid and have potential to be applied in the assessment of ARF. However, when LRS is applied in studies involving fruits, it suffers from a broad fluorescence baseline that obscures the requisite Raman signal. It also becomes difficult to assign Raman bands to the chemicals of interest when the spectra are recorded against a background of interfering molecules. Further, the inhomogeneity of fruit samples results in Raman spectra with variations in intensity, making quantification studies difficult. Therefore, the practical applicability of LRS in studies such as the current work increases significantly when it is combined with appropriate data pre-processing techniques as well as ML techniques such as ANN, RF and SVM. The findings of this study highlight optimal conditions for recording Raman spectra of fruits ripened artificially or naturally. In addition, the use of appropriate data mining techniques and optimally tuned ML parameters for the fast and reliable analysis of LRS data is outlined.