Open access peer-reviewed chapter

Wheat Crops Monitor: A More Reliable Rust Disease Detector for Wheat Farming

Written By

Mohammed El Idrissi, Redmond R. Shamshiri and Ibrahim A. Hameed

Submitted: 19 March 2024 Reviewed: 13 June 2024 Published: 04 September 2024

DOI: 10.5772/intechopen.115211

From the Edited Volume

Precision Agriculture - Emerging Technologies

Edited by Redmond R. Shamshiri, Sanaz Shafian and Ibrahim A. Hameed


Abstract

Crop health monitoring, as an intelligent farming activity, has become increasingly difficult and tedious for farmers due to the diversity of plant pathologies. Frequentist convolutional neural network (CNN) models are no longer sufficient to extract pertinent image features and detect diseases early and efficiently. In addition, uncertainty in convolutional network inference is a major concern that can make the predicted classes less exact. In this context, we propose an intelligent farming application that assists farmers in monitoring wheat crop health. Bayesian inference with the local reparameterization trick is used to improve the sampling process during the learning phase. The uncertainty in the model and in the output is modeled to indicate the room for improvement. The proposed Bayesian uncertainty-based monitor distinguishes between wheat crops with no disease and those infected with leaf and stem rust, based on super-resolution processing of leaf and stem images. The achieved accuracy is 96%, with strong resistance to overfitting and underfitting, and greater reliability is obtained through the tolerance-of-classification concept. The model is also optimized for real-time inference and adapted for resource-constrained devices.

Keywords

  • smart farming
  • rust disease detection
  • uncertainty
  • Bayesian inference
  • deep convolutional neural network

1. Introduction

Recently, bio-aggressors have taken diverse forms and caused significant damage to cereal production. For instance, 20% of the productivity of some sensitive wheat varieties is lost to oidium disease and 50% to septoria leaf blotch disease, in addition to pathogens that infect the leaves and stems of wheat plants with different kinds of rust diseases (crown, brown, black, and yellow). Moreover, the morphological identification of diseases is unreliable, and the costs of human monitoring and technologies are increasing. Hence, more affordable and sophisticated detection methods based on modern scientific models are needed to identify infections in the leaves and stems of wheat plants accurately and early. Computer vision techniques in general, and deep convolutional neural networks (DCNN) in particular, are attractive options that are gaining attention in image processing and are widely used in sophisticated applications such as face recognition [1], document analysis [2], and climate understanding [3]. A DCNN consists of interconnected layers of neurons whose connections carry weights that contribute to neuron activation, and it learns from a dataset composed of linear and non-linear data samples. This non-linearity is generally caused by noisy data and random fluctuations in the learned data points. Consequently, the DCNN is susceptible to overfitting or underfitting, which results in the model being significantly overconfident in its future decisions. Many techniques have been proposed to limit the effect of overfitting and underfitting and to improve generalization, such as resampling or k-fold cross-validation, the use of a validation sub-dataset, and regularization techniques such as weight decay, early stopping, and L1-L2 regularization. However, dropout remains a widely used and effective technique for regularizing models [4, 5].

1.1 Problem statement

Traditional neural networks perform well in most prediction and classification tasks, but they need strict conditions to do so, such as large amounts of data and regularization techniques. Even then, overconfidence in decisions persists. Estimating the uncertainty of deep CNN classification decisions with Bayesian inference is a promising way to gain confidence and trust in those decisions, since it quantifies the degree of uncertainty in the classification task and introduces more regularization into the model. Applied to smart farming, this feature will help farmers monitor their crops with a high level of precision. To address the problem of uncertainty in DCNNs and introduce more regularization into the network’s parameters, the Bayesian approach with an enhanced sampling process is used.

1.2 Contributions

This work builds on the results of [5, 6, 7] to make the following main contributions:

  • Introducing the Bayesian approach as a statistical and effective method of handling uncertainty.

  • Investigating how random and stochastic uncertainties can be estimated.

  • Applying these techniques in the smart farming field, specifically to monitor the health of wheat crops with high precision through the tolerance-of-classification concept. This monitoring application is called Wheat Crops Monitor (WCM).

The rest of this paper is organized as follows: the sources of uncertainty are discussed in Section 2. The uncertainty quantification in DCNN using the Bayesian framework is developed in Section 3. The WCM model is presented in Section 4. The experiments and obtained results are discussed in Section 5. The summary and conclusions are given in Section 6.


2. Background

2.1 Uncertainty and its sources

Understanding the statistical and non-statistical uncertainty in deep artificial neural networks (ANN) is essential for many real-world applications. In statistical uncertainty, the probability density follows a model that has already been proposed [8] or that can be approximated when it is intractable. In non-statistical uncertainty, by contrast, the behavior of the input data may be imprecise, which leads to an unknown probability distribution. In this case, fuzzy logic can be used to estimate the uncertainty in the prediction task [9].

The Bayesian framework offers a principled approach to uncertainty quantification, but its application in deep ANNs is challenging due to the large number of parameters and the volume of data. Neural networks have become very attractive in so-called high-risk areas such as medical image analysis and autonomous vehicles, yet their deployment in such missions is still limited. The limitations are primarily imposed by uncertainty already present in the data (data uncertainty) or by insufficient knowledge of the deep ANN model (model uncertainty). To overcome these limitations, it is essential to estimate the uncertainty, so that uncertain predictions can be ignored or passed on to human experts. Providing uncertainty estimates is not only important for reliable decision-making in high-risk areas but also crucial in areas where data sources are highly inconsistent and labeled data are scarce. Uncertainty estimates are also very important where uncertainty is a crucial factor in the learning technique, such as active learning or reinforcement learning. In practice, we consider the five factors listed in Table 1 to be the vital sources of uncertainty in ANN predictions [10].

| Type of uncertainty | Source of uncertainty | Description |
|---|---|---|
| Stochastic uncertainty¹ | Errors in the specifications of the ANN architecture | The structure of an ANN has a direct effect on its performance and also on the uncertainty of its prediction. For example, the number of parameters affects the memorizing ability, which can lead to overfitting or underfitting of training data. Regarding the uncertainty in ANNs, it is known that deeper networks tend to be overly confident in their Softmax output, which means that they predict too much probability on the class with the highest probability score. |
| Stochastic uncertainty¹ | Errors in the ANN learning process | The ANN learning process consists of several parameters that need to be defined (batch size, optimization function, learning rate, stopping criteria, regularization, etc.) as well as stochastic decisions in the learning process (data batch generation and weight initialization). All these decisions affect the local optimum and it is therefore very unlikely that two training processes provide the same model setting. Training data that suffer from an imbalance or low coverage of individual regions in the distribution of data also introduce uncertainties in the learned parameters. This could be mitigated by applying a data augmentation strategy to improve variety or by balancing the impact of classes or individual regions on the cost function. |
| Random uncertainty² | Errors caused by unknown data | Particularly in classification tasks, an ANN learned from samples from space E1 may also be able to process samples from a completely different space E2. This is the case when a network that has trained using images of bikes and cars receives a sample showing a tree. Here, the source of the uncertainty is not linked to the process of data acquisition, since it is assumed that space will contain only samples deemed to be true for a prediction task. Even if the practical result may be impacted by sensor noise or by the complete failure of these sensors, the inputs considered here represent a valid sample, but for a different task or domain. |
| Random uncertainty² | Variability in real-world applications | Most real-world environments are highly variable and almost constantly affected by changes. Changes in the environment can also affect the expression of objects (ex. plants after rain appear very different from plants after a drought). When the real-world conditions change regarding the training data, this is known as a distribution change. ANNs are sensitive to changes in distribution that can significantly impact their performance. |
| Random uncertainty² | Errors in the measurement system | The measurements themselves can be a source of uncertainty in the ANN prediction. This can be caused by insufficiency in measured information such as image resolution, or by having a false measure or insufficient information modalities. In addition, it can be caused by a noise like a sensor noise, movement, or mechanical stress, leading to inaccurate measurements. Moreover, false labeling is also a source of uncertainty that can be considered as an error and noise in the measurement system. It is referenced as a labeling noise and affects the model by reducing confidence in accurate class prediction during the training phase. |

Table 1.

A summary of vital uncertainty sources.

¹ Uncertainty in the intrinsic parameters of the model.

² Directly impacts the data collection process and introduces what is called error in observation. This type of error captures the so-called “residual or statistic” noise introduced by the physical sensors into the manipulated data, or motion noise in the case of vision systems. This type of uncertainty is divided into two families. Heteroscedastic random uncertainty (or non-equivalent variance) assumes that the distribution of noise is different for each observation (x, y) in the data samples. Homoscedastic random uncertainty (or equivalent variance), on the other hand, assumes that the noise is the same for each data point (x, y).


2.2 Comparison with existing works

Many recent contributions have addressed methods and techniques to control plant diseases. For instance, Barbedo [11] identified key challenges in plant disease identification using digital image processing and proposed solutions to address them. Ferentinos [12] used a CNN to develop a plant disease classifier and reported a training accuracy of 99.53%. Similarly, Lu et al. [13] proposed a CNN-based method for rice disease identification, reporting a training accuracy of 95.48%. Umamageswari et al. [14] developed a PNAS (Progressive Neural Architecture Search) model to predict plant leaf disease with a training accuracy of 97.43%. Xie et al. [15] proposed an enhanced CNN to detect grape leaf diseases, achieving a training accuracy of 81.1%. Zhang et al. [16] proposed the GPDCNN model (global pooling dilated CNN) to identify cucumber leaf diseases, reporting an average recognition rate of 90% compared with the other tested methods. Despite the high training accuracy reported by these works, none of them addressed the uncertainty in the decision-making process, explained how to deal with missing data, or described how to treat problems related to the structure of their models. This work aims to detect leaf and stem rust diseases in their early stages with high reliability, to make the model more robust against overfitting thanks to the regularization effect introduced on the model’s parameters, and to identify the room for improvement (in the data quality or the model structure).


3. Deep neural networks with a probability distribution over parameters

Statistically speaking, it is recommended to consider a probability distribution over the model’s weights rather than assigning them specific values. This gives rise to two approaches. In the classical method, each model parameter is assigned a fixed value, resulting in a direct prediction (Figure 1(a)). The Bayesian method (Figure 1(b)) computes the posterior probability distribution $q_\theta(w \mid D)$ over the parameters and updates it by backpropagation. This update process uses the Bayes-by-backprop framework [5, 6, 17], a variational inference method that allows the model to learn the posterior probability distribution over the weights from the dataset and to translate the estimated uncertainty into predictions. This method is more robust against overfitting, a common issue in classical neural network approaches, and significantly improves model regularization. It is also very useful for applications where there is not enough data for model training and validation, or where the available data are noisy and uncertain.

Figure 1.

Frequentist method (a) versus Bayesian approach (b) in parameter tuning (the numbers in (a) are indicative).

Various methods have been proposed to approximate the true posterior probability distribution in Bayesian learning. For instance, Monte-Carlo Dropout and hybrid Monte-Carlo Dropout have been utilized by Gal and Ghahramani [18]. Laplace’s method [19] approximates the posterior distribution with a Gaussian centered at the maximum a posteriori (MAP) estimate. However, variational inference remains the most attractive approach for approximating the posterior distribution. Some attempts in this context [20, 21] have used Dropout and Gaussian Dropout, respectively.

In probabilistic machine learning, Bayes’ rule is used to compute the posterior probability distribution of the model’s parameters θ given our dataset $D = (x, y)$: $p(\theta \mid x, y) = \frac{p(y \mid x, \theta)\, p(\theta)}{p(y \mid x)}$. The denominator is a normalizing constant that does not depend on θ and is often ignored, so the posterior is proportional to the likelihood times the prior: $p(\theta \mid x, y) \propto p(y \mid x, \theta)\, p(\theta)$. Once the posterior is computed, we are interested in inferring new outputs $y^*$ given new input samples $x^*$, using the complete Bayesian analysis described in Eq. (1) and known as the predictive distribution:

$$p(y^* \mid x^*, x, y) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid x, y)\, d\theta \tag{1}$$
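When parameter samples are available, the integral in the predictive distribution is commonly approximated by averaging the likelihood over those samples. The following is a minimal sketch of that idea, not the chapter's implementation: the one-weight logistic model, the Gaussian stand-in for the posterior, and the names `sigmoid`, `predictive_probability`, and `theta_samples` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as a toy binary likelihood."""
    return 1.0 / (1.0 + math.exp(-z))


def predictive_probability(x_new, theta_samples):
    """Monte Carlo estimate of p(y*=1 | x*, x, y).

    Averages the likelihood p(y*=1 | x*, theta) over parameter samples
    that are assumed to come from the posterior p(theta | x, y).
    """
    total = sum(sigmoid(theta * x_new) for theta in theta_samples)
    return total / len(theta_samples)


rng = random.Random(0)
# Hypothetical posterior over a single weight: Gaussian, mean 2.0, std 0.5.
theta_samples = [rng.gauss(2.0, 0.5) for _ in range(5000)]

p_bayes = predictive_probability(1.0, theta_samples)
p_point = sigmoid(2.0 * 1.0)  # prediction from the posterior mean alone
print(f"Monte Carlo predictive: {p_bayes:.3f}, point estimate: {p_point:.3f}")
```

In this toy setting the averaged prediction is slightly less extreme than the point-estimate prediction, which illustrates how marginalizing over parameters can temper overconfidence.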

This predictive distribution is generally considered intractable because of the integral and the large number of model parameters; thus, many approximation methods have been investigated, as mentioned earlier. Bayes-by-backprop is an effective way to approximate the true posterior probability distribution with a new distribution q that is made as close as possible to the true posterior. This approximation is found by minimizing the Kullback-Leibler divergence between the distributions q and p w.r.t θ (Eq. (2)):

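$$\mathrm{KL}\left[q_\theta(w \mid D) \,\|\, p(w)\right] = \int q_\theta(w \mid D) \log \frac{q_\theta(w \mid D)}{p(w)}\, dw \tag{2}$$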
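For intuition, when both distributions are univariate Gaussians the Kullback-Leibler integral has a well-known closed form, and it can also be estimated by sampling from q. The sketch below compares the two under assumed toy values; the function names and the numbers are hypothetical and are not taken from the chapter.

```python
import math
import random


def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for two univariate Gaussians."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def log_normal_pdf(w, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at w."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (w - mu) ** 2 / (2.0 * sigma ** 2))


def kl_monte_carlo(mu_q, sigma_q, mu_p, sigma_p, n_samples=20000, seed=0):
    """Estimate KL(q || p) as the average of log q(w) - log p(w), w ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w = rng.gauss(mu_q, sigma_q)
        total += log_normal_pdf(w, mu_q, sigma_q) - log_normal_pdf(w, mu_p, sigma_p)
    return total / n_samples


# Assumed toy values: q = N(0.5, 0.1^2), p = N(0, 0.2^2).
kl_exact = kl_gaussians(0.5, 0.1, 0.0, 0.2)
kl_sampled = kl_monte_carlo(0.5, 0.1, 0.0, 0.2)
print(f"closed form: {kl_exact:.4f}, sampled: {kl_sampled:.4f}")
```

The sampled estimate is the kind of quantity that a stochastic variational method works with when no closed form is available.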

Eq. (2) forms another optimization problem that must be minimized w.r.t θ, with the resulting cost function known as variational free energy [22] and given by Eq. (3):

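$$\begin{aligned} \theta^{opt} &= \arg\min_{\theta}\, \mathrm{KL}\left[q_\theta(w \mid D) \,\|\, p(w \mid D)\right] \\ &= \arg\min_{\theta}\, \mathrm{KL}\left[q_\theta(w \mid D) \,\|\, p(w)\right] - \mathbb{E}_{q(w \mid D)}\left[\log p(D \mid w)\right] + \log p(D) \end{aligned} \tag{3}$$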

The variational free energy consists of three terms: the first is called the complexity cost, the second is known as the likelihood cost, and the last is a constant that is ignored during the learning process [23]. This cost function is intractable and cannot be computed exactly. A stochastic variational method [17] is therefore adopted to arrive at the tractable form of the objective function given by Eq. (4) [5, 24]. This is the equation that is optimized during the Bayesian learning process.

$$F(D, \theta) \approx \sum_{i=1}^{n} \log q_\theta(\omega_i \mid D) - \log p(\omega_i) - \log p(D \mid \omega_i) \tag{4}$$

Here, $n$ represents the number of samples $\omega_i$ drawn from the approximate variational distribution. The first term of this equation is known as the variational posterior and is taken as the log of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, so $\log q_\theta(w \mid D) = \log \mathcal{N}(w \mid \mu, \sigma^2)$. The second term is the prior and is taken as the log of many elementary zero-mean Gaussians, so $\log p(w) = \log \mathcal{N}(w \mid 0, \sigma^2)$. The last term is a vector of likelihood probabilities that is computed using the Softmax function. The final statistical and stochastic uncertainties are expressed by Eq. (5) [25, 26]:

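$$U = \frac{1}{N}\sum_{n=1}^{N}\left[\mathrm{diag}(\hat{p}_n) - \hat{p}_n \hat{p}_n^{\top}\right] + \frac{1}{N}\sum_{n=1}^{N}\left(\hat{p}_n - \bar{p}\right)\left(\hat{p}_n - \bar{p}\right)^{\top} \tag{5}$$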

Here, $\hat{p}_n$ is the vector of probabilities computed using the Softmax function for the $n$-th sample, and $\bar{p}$ is the mean of the $\hat{p}_n$. The sampling process is improved using the local reparameterization trick (LRT) [5], a commonly used method for rewriting statistical problems [7]. Sampling the parameters directly consumes more time and resources. Instead, we sample from the approximate variational distribution, which saves processing time and resources.
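The two sums in Eq. (5) can be computed directly from the Softmax vectors of several stochastic forward passes. The sketch below is an illustration only: it keeps just the per-class diagonal of each matrix term (a simplification assumed here), the logits are made up, and `softmax` and `uncertainty_diagonals` are hypothetical helper names. In the related literature the first term is usually read as the data-related part and the second as the model-related part.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_diagonals(prob_samples):
    """Per-class diagonals of the two terms of Eq. (5).

    prob_samples: N softmax vectors, one per stochastic forward pass.
    Returns (p_bar, first_term, second_term) as lists of length K.
    """
    n = len(prob_samples)
    k = len(prob_samples[0])
    p_bar = [sum(p[c] for p in prob_samples) / n for c in range(k)]
    # Diagonal of mean(diag(p) - p p^T): p_c - p_c^2 for each class c.
    first = [sum(p[c] - p[c] ** 2 for p in prob_samples) / n for c in range(k)]
    # Diagonal of mean((p - p_bar)(p - p_bar)^T): per-class variance.
    second = [sum((p[c] - p_bar[c]) ** 2 for p in prob_samples) / n for c in range(k)]
    return p_bar, first, second


rng = random.Random(0)
# Made-up logits for (healthy, leaf rust, stem rust), perturbed to mimic
# N = 50 stochastic forward passes of a Bayesian network.
prob_samples = [
    softmax([2.0 + rng.gauss(0.0, 0.5),
             0.5 + rng.gauss(0.0, 0.5),
             -1.0 + rng.gauss(0.0, 0.5)])
    for _ in range(50)
]
p_bar, first_term, second_term = uncertainty_diagonals(prob_samples)
for c, name in enumerate(["healthy", "leaf rust", "stem rust"]):
    print(f"{name}: mean={p_bar[c]:.3f} "
          f"first={first_term[c]:.3f} second={second_term[c]:.3f}")
```

A useful sanity check is that, for each class, the two diagonal terms add up to $\bar{p}_c (1 - \bar{p}_c)$, the variance of a Bernoulli variable with the mean predicted probability.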


4. Application in smart farming through the wheat crops monitor (WCM)

Before describing the WCM, we recall an important point made by Jayesh Bapu Ahire [27]: the common architecture of a classical deep convolutional neural network stacks a series of Convolution-Activation (Conv-Act) layers followed by pooling layers. This series is repeated until the image is reduced to a minimal set of features, which are then sent to the fully connected (FC) layers; the last FC layer holds the predicted outputs. Well-known deep CNN models are based on the following form:

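$$\text{Input} \rightarrow \left[\left[\text{Conv} \rightarrow \text{Act}\right] \times M \rightarrow \text{Pooling}\right] \times N \rightarrow \left[\text{FC} \rightarrow \text{Act}\right] \times L \rightarrow \text{FC}$$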

Here, M, N, and L are the number of times each block of layers is repeated in the deep CNN model. The smart wheat crop health monitor also follows this architecture, with the particularity that weights, biases, and kernels are treated as random variables, and probability distributions over them are computed instead of single point estimates. Figure 2 shows the structure of the model:

Figure 2.

Typical DCNN structure showing the distribution (mean, variance) of the weights instead of single values.
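The layer pattern described above can be read as Conv-Act repeated M times and followed by pooling, that group repeated N times, then FC-Act repeated L times and a final FC layer. The sketch below simply expands that reading into a list of layer names. The helper `build_layer_sequence` is hypothetical, and the values M = 1, N = 4, L = 3 are an assumption chosen only so that the result has four convolutional and four fully connected layers.

```python
def build_layer_sequence(m, n, l):
    """Expand Input -> [[Conv -> Act] * m -> Pooling] * n -> [FC -> Act] * l -> FC."""
    layers = ["Input"]
    for _ in range(n):
        layers += ["Conv", "Act"] * m  # m Conv-Act pairs per block
        layers.append("Pooling")       # one pooling layer closes each block
    layers += ["FC", "Act"] * l        # l hidden fully connected layers
    layers.append("FC")                # final layer holding the predictions
    return layers


# Assumed repetition counts, for illustration only.
layout = build_layer_sequence(m=1, n=4, l=3)
print(" -> ".join(layout))
```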

In this schema, we work only with probability distributions. Instead of computing a single convolution between a small portion of the image and the kernel, we perform two convolution operations. The first convolution calculates the mean μ of the variational posterior probability distribution of each parameter, which amounts to computing the maximum a posteriori (MAP) estimate of the parameter [5]. The second convolution learns the variance $\sigma^2 = \alpha \mu^2$, which combines the pre-calculated mean with the variable α, itself learned by the convolution operation on the weights [28]. The Bayes-by-backprop algorithm is used to compute the posterior probability distributions. The local reparameterization trick relies on the fact that, for a fixed input and a Gaussian distribution over the weights of a fully connected layer, the resulting distribution of activations is also Gaussian, so sampling is performed directly over the activation distributions instead of individually for each weight [7]. For this structure to work, the weights in the fully connected layers must have probability distributions as well.
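A minimal sketch of the local reparameterization idea for one fully connected layer follows. It assumes independent Gaussian weights whose variance is α times the squared mean, computes the mean and variance of each activation, and draws one Gaussian sample per activation instead of one per weight. The function name `lrt_linear` and all numbers are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def lrt_linear(x, mu, alpha, rng):
    """Sample layer outputs by drawing each activation from its Gaussian.

    x: input vector of length d.
    mu, alpha: d x k nested lists; weight w_ij ~ N(mu_ij, alpha_ij * mu_ij^2).
    """
    d, k = len(mu), len(mu[0])
    outputs = []
    for j in range(k):
        # Mean and variance of activation j for independent Gaussian weights.
        act_mean = sum(x[i] * mu[i][j] for i in range(d))
        act_var = sum((x[i] ** 2) * alpha[i][j] * (mu[i][j] ** 2) for i in range(d))
        # One noise draw per activation, not one per weight.
        outputs.append(act_mean + math.sqrt(act_var) * rng.gauss(0.0, 1.0))
    return outputs


rng = random.Random(0)
x = [1.0, 2.0]
mu = [[0.5, -1.0], [0.25, 0.5]]
alpha = [[0.04, 0.04], [0.04, 0.04]]  # assumed values for illustration

samples = [lrt_linear(x, mu, alpha, rng) for _ in range(4000)]
mean_0 = sum(s[0] for s in samples) / len(samples)
var_0 = sum((s[0] - mean_0) ** 2 for s in samples) / len(samples)
print(f"activation 0: empirical mean {mean_0:.3f}, empirical variance {var_0:.4f}")
```

For this input, the first activation has an analytic mean of 1.0 and variance of 0.02, so the empirical values should land close to those numbers.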

To be more precise, we introduce a new concept called tolerance of classification, which expresses how confident the predicted labels are. The classification decision is expressed relative to a certain level of uncertainty, so the classification result always has this form:

$$\text{Class} = \text{predicted label} \pm \text{Uncertainty} \tag{6}$$

In this way, we get an idea of how far our classification is from the most precise decision. The closer the estimated uncertainty is to zero, the better the decision.
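As an illustration of this reporting format, the sketch below picks the most probable class and prints it together with an uncertainty value. The helper names are hypothetical, and pairing the predicted class with a per-class uncertainty is an assumption. The healthy-class values 95.74% and 19.35% are the ones reported for test T6 later in the chapter; the remaining numbers are invented placeholders.

```python
def tolerant_decision(class_probs, class_uncertainty, labels):
    """Return (label, probability, uncertainty) for the most probable class."""
    best = max(range(len(labels)), key=lambda c: class_probs[c])
    return labels[best], class_probs[best], class_uncertainty[best]


def format_decision(label, probability, uncertainty):
    """Render a decision in the 'predicted label ± uncertainty' form."""
    return f"{label}: {100 * probability:.2f}% ± {100 * uncertainty:.2f}%"


labels = ["healthy", "leaf rust", "stem rust"]
probs = [0.9574, 0.0300, 0.0126]        # only the first value comes from the text
uncertainty = [0.1935, 0.0200, 0.0100]  # only the first value comes from the text

decision = format_decision(*tolerant_decision(probs, uncertainty, labels))
print(decision)
```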


5. Experiment and discussion

5.1 Parameters initialization

As mentioned earlier, we need to approximate the true posterior probability distribution. To do so, we take a Gaussian distribution as the approximate variational probability distribution and also use it as the prior over the weights, in order to compute the mean and the variance of each parameter of the model. Gradient descent is then performed over the parameters θ, for each weight individually. Bayes-by-backprop is used to sample the weights and update the approximate variational posterior by backpropagation, and the LRT is applied to improve the sampling process. The Adam optimizer is used in all phases of the experiment, and the Softplus function is used as the activation function since it is a smooth approximation of the ReLU activation function. The rest of the initial Bayesian parameters are summarized in Table 2.

| Parameter | Value | Range of possible values |
|---|---|---|
| Number of epochs | n_epochs = 300 | n_epochs ∈ R |
| Activation function | Softplus | {Softplus, Relu} |
| Model layer type | Local Reparametrization Trick | {Bayes by Backprop (BBB), Local Reparametrization Trick (BBB_LRT)} |
| Initial configuration of the priors | Initial μ = 0.01 | μ should be small |
| | Initial σ = 0.2 | σ should be small |
| Initial configuration of the posterior | Initial μ (mean, standard deviation) = (0.01, 0.01) | The couple (mean, standard deviation) should have small values |
| | Initial ρ (mean, standard deviation) = (−3, 0.01) | The couple (mean, standard deviation) should have small values |
| Initial learning rate | 10E-3 | lr_initial should be small |
| Number of workers | 5 | num_workers ∈ R |
| Batch size | 32 | batch_size ∈ R |
| Train ensemble | 1 | train_ens ∈ R |
| Validation ensemble | 1 | valid_ens ∈ R |
| Beta type | 0.01 | Beta_type can be float or ∈ {Blundell, Soenderby, Standard} |
| Dataset | Size = 11,420, Train = 57%, Validation = 38%, Test = 5% | |

Table 2.

A summary of initial hyperparameters used in training and testing of the WCM model.
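To make the role of Softplus concrete, the sketch below compares it with ReLU at a few points. It also shows what the initial ρ = −3 from Table 2 would correspond to if the standard deviation is parameterized as σ = softplus(ρ), which is the usual Bayes-by-backprop convention but is an assumption here, since the chapter does not state it. The function names are illustrative.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x):
    """Rectified linear unit."""
    return max(x, 0.0)


points = (-5.0, -1.0, 0.0, 1.0, 5.0)
gaps = [softplus(v) - relu(v) for v in points]
for v, gap in zip(points, gaps):
    print(f"x={v:+.1f}  softplus={softplus(v):.4f}  relu={relu(v):.4f}  gap={gap:.4f}")

# Assumed parameterization sigma = softplus(rho), applied to rho = -3.
sigma_from_rho = softplus(-3.0)
print(f"softplus(-3) = {sigma_from_rho:.4f}")
```

The gap between the two functions is largest at zero, where it equals log 2, and shrinks quickly away from zero, which is why Softplus behaves like a smooth ReLU.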

5.2 Model compression

The approach used in this work amounts to estimating two parameters (μ, σ) per weight, which doubles the overall number of parameters in the model and makes it heavier to process. Model pruning was proposed to optimize a model by reducing its number of parameters, and thus the complexity and resources required for processing. Here, we used the L1-norm technique, which consists of two actions [5]:

  • First, we halved the number of filters used in the convolution layers, without impacting the model’s overall performance.

  • Second, we defined a threshold and kept only the weights above it; the weights under the threshold are eliminated. The choice of threshold and the halving of the number of filters still need to be justified empirically. However, experimentally, this approach prunes the model without impacting its overall performance.
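The two pruning actions above can be sketched as follows. This is one possible reading, not the chapter's code: ranking filters by their L1 norm to decide which half to keep, and comparing weight magnitudes against the threshold, are both assumptions, and the filter values and helper names are invented for illustration.

```python
def l1_norm(filt):
    """Sum of absolute weights of one (flattened) filter."""
    return sum(abs(w) for w in filt)


def keep_strongest_half(filters):
    """Keep the half of the filters with the largest L1 norm, in original order."""
    n_keep = max(1, len(filters) // 2)
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [filters[i] for i in kept_indices]


def threshold_weights(filt, threshold):
    """Zero out weights whose magnitude is not above the threshold."""
    return [w if abs(w) > threshold else 0.0 for w in filt]


# Four toy filters with two weights each (hypothetical values).
filters = [[0.1, -0.1], [1.0, -2.0], [0.0, 0.05], [0.5, 0.5]]
kept = keep_strongest_half(filters)
pruned = [threshold_weights(f, threshold=0.6) for f in kept]
print(kept)
print(pruned)
```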

5.3 The material used in this work

5.3.1 Hardware and software

The WCM algorithm is coded in the Python programming language using the PyTorch (torch) library. PyTorch is a high-level framework with a Python interface. It interoperates with neural network and machine learning libraries and provides two high-level components, tensor computation and deep neural network construction, and it is compatible with both CPU and GPU (Graphics Processing Unit) architectures. WCM was trained on a UCS (Unified Computing System) cluster. Each node of the cluster has an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and 256 gigabytes of RAM; with this configuration, each training epoch finishes in less than 5 minutes, compared with 15 minutes on ordinary computers.

5.3.2 Dataset

The model provided by [5] was tested on small and large datasets (MNIST, CIFAR10, and CIFAR100) using the AlexNet, LeNet, and 3Conv3FC models. Since we work with a non-MNIST dataset, many hyperparameters are reconfigured, notably the activation function type, the layer type, and the number of batches for which uncertainty needs to be estimated. The model is trained and tested using the CGIAR Computer Vision for Crop Disease [29] dataset. The images were collected from different sources, and a large part was gathered by CIMMYT and CIMMYT partners in Ethiopia and Tanzania. The remainder of the data is sourced from public images found on Google Images. The training, validation, and testing sub-datasets represent 57%, 38%, and 5% of the overall dataset, respectively.

5.4 Evaluation and validation of the model

As mentioned previously, the implementation provided by [5, 30] was tested on the MNIST and CIFAR datasets with the AlexNet, LeNet, and 3Conv3FC models. To build our model, we inherited from the ModuleWrapper module and used the BBBLinear and BBBConv2d layers throughout the BayesianCNN. The customized model is also deeper: it grows from three convolutional and three fully connected layers to four convolutional and four fully connected layers. The super-resolution technique [31] with the Bayesian approach [5] is used to recover high-resolution images from low-resolution ones so that more pertinent features can be extracted. The configuration of the initial Bayesian network hyperparameters is summarized in Table 2.

At the end of the training process, we obtain the accuracy and loss curves of the training and validation processes, shown in Figures 3 and 4. The accuracy curve (Figure 3) shows that the training accuracy starts at 77.5% and increases until the third epoch, after which it decreases. From that point on, it fluctuates between 85% and 88.75%. The maximal accuracy is recorded at the 170th epoch; beyond this point, accuracy begins to decrease, which is a sign of model convergence. Regarding the cost function (Figure 4), the training loss starts at 34,464,858 and decreases exponentially; the minimum value, 1,016,792, is observed at the 200th epoch. The curves flatten from the 170th epoch onward, which coincides with the convergence of the model.

Figure 3.

Training and validation accuracy functions.

Figure 4.

Training and validation loss functions.

5.4.1 Metrics for evaluation

Table 3 presents the metrics used to evaluate our model. The WCM detects leaf and stem rust diseases in wheat plants and makes detection more reliable by estimating the uncertainty. The precision, sensitivity, and F1-measure values obtained in the test phase are 96%, 92.35%, and 93.84%, respectively. Each metric is obtained by averaging over all classes of the confusion matrix shown in Figure 5. The obtained values are good and reflect the effect of uncertainty quantification on the classification task using a deep CNN. The sensitivity is high because the model is good at identifying the infected samples. Some difficulties can be encountered in differentiating between leaf rust and stem rust, but in general the disease detection performance is high. Figure 6 shows the receiver operating characteristic (ROC) of the proposed model. The global area under the ROC curve (AUC-ROC) over all classes is 96%, which indicates high separability. The individual AUC values of the healthy wheat, leaf rust, and stem rust classes are 98.77%, 93.58%, and 95.36%, respectively.

| Metric | Designation | Mathematical formula |
|---|---|---|
| Precision (P) | The ratio of infected wheat plants correctly detected among all wheat plants detected as infected. | $P = \frac{TI}{TI + FI}$ (8) |
| Sensitivity (S) | The ratio of infected wheat plants correctly detected among all true infected wheat plants. | $S = \frac{TI}{TI + FH}$ (9) |
| F1-Measure (F1) | The balanced mean between precision and sensitivity. | $F1 = \frac{2\, P \cdot S}{P + S}$ (10) |

Variables:

  • TI: represents the number of true infected wheat plants classified as infected.

  • FI: denotes the incorrectly predicted samples as infected.

  • FH: represents the incorrectly predicted samples as healthy.

Table 3.

Metrics used for model evaluation.
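The formulas in Table 3 can be applied class by class to a confusion matrix and then averaged, which matches the description of averaging over all classes. The sketch below is illustrative only: the 3 x 3 matrix is invented (it is not the matrix of Figure 5), the one-vs-rest mapping of TI, FI, and FH to each class is an assumption, and the helper names are hypothetical.

```python
def class_metrics(confusion, c):
    """Precision, sensitivity, and F1 for class c (rows: true, columns: predicted)."""
    ti = confusion[c][c]                         # correctly detected as class c
    fi = sum(row[c] for row in confusion) - ti   # wrongly predicted as class c
    fh = sum(confusion[c]) - ti                  # class c samples that were missed
    precision = ti / (ti + fi) if ti + fi else 0.0
    sensitivity = ti / (ti + fh) if ti + fh else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1


def macro_average(confusion):
    """Average each metric over all classes."""
    per_class = [class_metrics(confusion, c) for c in range(len(confusion))]
    n = len(per_class)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))


# Invented confusion matrix for (healthy, leaf rust, stem rust).
confusion = [
    [4, 1, 0],
    [0, 5, 0],
    [0, 0, 5],
]
precision, sensitivity, f1 = macro_average(confusion)
print(f"precision={precision:.4f} sensitivity={sensitivity:.4f} f1={f1:.4f}")
```

Note that averaging the per-class F1 values is generally not the same as plugging the averaged precision and sensitivity into Eq. (10).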

Figure 5.

Confusion Matrix of WCM with uncertainty quantification.

Figure 6.

Receiver operating characteristic of WCM with uncertainty estimates.

To test the uncertainty estimation of the proposed model, we prepared eight new samples that the model had never seen before (Table 4). These samples are drawn from the test repository; six are healthy wheat plants, and two are infected with leaf and stem rust (red rust) diseases. We first applied the required transforms to the new samples, including high-resolution recovery, and then fed them to the pre-trained model. Uncertainties and probabilities are recorded in a temporary text file and compared with the inference made by a classical frequentist model prepared for this purpose. The results in Table 4 show strong classification skill (the test accuracy in T6 reaches 95.74% with variational inference), qualified by the total estimated uncertainty (the statistical and stochastic uncertainties accumulated in the same test T6 are estimated at only 19%).

Table 4.

Test of the prediction and uncertainty estimation function compared to the classical frequentist inference.

5.5 Discussion

In our approach, we assumed that the weights of the DCNN follow a probability distribution instead of taking fixed values. Because it is very difficult to infer the posterior distribution over the weights directly within the Bayesian framework, it is often necessary to use the Bayes-by-backprop algorithm. Using this technique to extract the variational approximation of the true posterior distribution not only makes our model much more accurate in quantifying global uncertainty, but also regularizes the model without having to use dropout. Measuring the uncertainty caused by the model or by the data helps to improve the reliability of the obtained results, which is why the output of each classification decision is expressed using Eq. (6). Table 4 summarizes examples of tests, coded Ti, that we performed using images collected from the test field and others downloaded from the internet. To better explain the results, let us take test T6. The output, according to Eq. (6), is 95.74% ± 19.35% healthy, against 92.45% for the frequentist inference. This decision is composed of two terms: the accuracy of the classification and the uncertainty about this accuracy. Thus, the image is seen as healthy, with an accuracy of 95.74% and a doubt of 19.35%. The details of the estimated uncertainty are given by the statistical and stochastic values. Another significant example is test T5, which is a wrong classification decision. In this scenario, the model classified the image as infected with leaf rust with a probability of 37.27% and a doubt level of 20.14%, compared to 66.01% healthy for the frequentist inference. However, the image actually shows a healthy wheat plant that suffers from vegetation issues due to drought, which make the distribution of plants non-uniform and give them an infected appearance. This is one of the factors that can lead to false positives. In reality, these false positives highlight issues other than diseases, such as drought or nutrient deficiencies, which require the attention of the farmer or the field manager.
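To make the "prediction plus doubt" output more concrete, the following minimal sketch shows one common way to derive a predicted class and two uncertainty terms from repeated stochastic forward passes of a Bayesian network. It is an illustration under stated assumptions, not the chapter's implementation: Eq. (6) is not reproduced in this section, so the sketch uses the per-class variance decomposition proposed by Kwon et al., the function name `predict_with_uncertainty` is hypothetical, and the softmax samples are synthetic stand-ins for real network outputs. How the chapter combines the two terms into a single doubt value is not assumed here, so they are reported separately.

```python
import numpy as np

# Class order is an assumption made for this illustration.
CLASSES = ("healthy", "leaf rust", "stem rust")


def predict_with_uncertainty(probs):
    """Summarize T stochastic forward passes for one image.

    probs: array of shape (T, K) holding softmax outputs, one row per
    sampled set of weights. Returns the predicted class index, its mean
    probability, and the aleatoric and epistemic variance terms for
    that class (diagonal of the Kwon et al. decomposition).
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)  # predictive mean, shape (K,)

    # Aleatoric (data) term: average of p * (1 - p) over the samples.
    aleatoric = (probs * (1.0 - probs)).mean(axis=0)

    # Epistemic (model) term: spread of the samples around their mean.
    epistemic = ((probs - mean) ** 2).mean(axis=0)

    k = int(mean.argmax())
    return k, mean[k], aleatoric[k], epistemic[k]


# Synthetic stand-in for 50 forward passes with sampled weights.
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha=[20.0, 3.0, 2.0], size=50)

label, p, alea, epis = predict_with_uncertainty(samples)
print(f"{CLASSES[label]}: {p:.2%} "
      f"(aleatoric {alea:.4f}, epistemic {epis:.4f})")
```

In this decomposition the two terms for a class always sum to p̄(1 − p̄), where p̄ is the mean probability of that class, which gives a quick sanity check on any implementation. A large epistemic term suggests that more or more varied training data could help, whereas a large aleatoric term points to ambiguity in the image itself, as in the drought-stressed plant of test T5.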

6. Summary conclusions

In this study, we introduced a promising wheat crops monitoring system called WCM, which applies the Bayesian framework to a customized deep CNN. The technique used in this research, Bayes by backpropagation with a Gaussian distribution, gives the model its Bayesian characteristics. Initially, the samples are transformed to better suit the model's requirements by generating high-quality images from low-resolution ones. The resulting super-resolution images are then fed to the model for training and validation. The achieved validation accuracy is 88.75%, whereas the test accuracy reaches up to 95%, with a precisely estimated uncertainty margin. The reliability of the system is demonstrated by its ability to provide both the predicted class and the level of doubt in the classification task, and to point toward performance improvements through uncertainty estimation. In a real deployment, a drone could visually scan the field and send the captured images in real time to the WCM, which would draw a global health map of the wheat and send the results back to the farmer, who can then intervene early and apply pesticides at an early stage and in reasonable quantities to save the wheat crops.

References

  1. 1. Li HC, Deng ZY, Chiang HH. Lightweight and resource-constrained learning network for face recognition with performance optimization. Sensors. 2020;20:6114. DOI: 10.3390/s20216114
  2. 2. Borges Oliveira DA, Viana MP. Fast CNN-based document layout analysis. Proceedings IEEE International Conference on Computer Vision Workshops, Venice, Italy. 2017;1173-1180. DOI: 10.1109/ICCVW.2017.142
  3. 3. Chattopadhyay A, Hassanzadeh P, Pasha S. Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Scientific Reports. 2020;10(3):1317. DOI: 10.1038/s41598-020-57897-9
  4. 4. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580. 2012
  5. 5. Kumar S, Felix L, Marcus L. A Comprehensive Guide to Bayesian Convolutional Neural Network with Variational Inference. arXiv preprint. arXiv:1901.02731. 2019
  6. 6. Blundell C, Cornebise J, Koray Kavukcuoglu K, and Wierstra D. Weight Uncertainty in Neural Networks. arXiv preprint arXiv:1505.05424. 2015
  7. 7. Kingma DP, Salimans T, Welling T. Variational Dropout and the Local Reparameterization Trick. arXiv:1506.02557. 2015
  8. 8. Ronald HD. Measurement uncertainty models. ISA Transactions. 1997;36:29-35. DOI: 10.1016/S0019-0578(97)00004-9
  9. 9. Xia X, Wang Z, Yongsheng G. Estimation of non-statistical uncertainty using fuzzy-set theory. Measurement Science and Technology. 2000;11:430-435. DOI: 10.1088/0957-0233/11/4/314
  10. 10. Gawlikowski J, Tassi CRN, Ali M, Lee J, Humt M, Feng J, Kruspe A, Triebel R, Jung P, Roscher R, Shahzad M, Yang W, Bamler R, Zhu XX. A Survey of Uncertainty in Deep Neural Networks. arXiv:2107.03342. 2021
  11. 11. Barbedo JGA. A review on the main challenges in automatic plant disease identification based on visible range images. Biosystems Engineering. 2016;144:52-60. DOI: 10.1016/j.biosystemseng.2016.01.017
  12. 12. Ferentinos KP. Deep Learning Models for Plant Disease Detection and Diagnosis. Computers and Electronics in Agriculture. 2018;145:311-318. DOI: 10.1016/j.compag.2018.01.009
  13. 13. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y. Identification of rice diseases using deep convolutional neural networks. Neurocomputing. 2017;267:378-384. DOI: 10.1016/j.neucom.2017.06.023
  14. 14. Umamageswari A, Bharathiraja N, Shiny Irene D. A novel fuzzy C-means based chameleon swarm algorithm for segmentation and progressive neural architecture search for plant disease classification. ICT Express. 2021;9:160-167. DOI: 10.1016/j.icte.2021.08.019
  15. 15. Xiaoyue X, Yuan M, Bin L, Jinrong H, Shuqin L, Hongyan W. A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Frontiers in Plant Science. 2020;11:751. DOI: 10.3389/fpls.2020.00751
  16. 16. Shanwen Z, Subing Z, Chuanlei Z, Xianfeng W, Yun S. Cucumber leaf disease identification with global pooling dilated convolutional neural network. Computers and Electronics in Agriculture. 2019;162:422-430. DOI: 10.1016/j.compag.2019.03.012
  17. 17. Graves A. Practical variational inference for neural networks. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, editors. Curran Associates Inc: Advances in Neural Information Processing Systems; 2011. pp. 2348-2356
  18. 18. Gal Y, Ghahramani Z. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. arXiv preprint arXiv:1506.02158. 2015
  19. 19. Friston K, Ashburner J, Kiebel S, Nichols T, Penny W. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press; 2006
  20. 20. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research. 2014;15:1929-1958
  21. 21. Wang S, Manning C. Fast dropout training. In: Dasgupta S, McAllester D, editors. Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia: PMLR; 2013. pp. 118-126
  22. 22. Shockley JM, Dillon CR, Stombaugh T. A whole farm analysis of the influence of auto-steer navigation on net returns, risk and production practices. Journal of Agricultural and Applied Economics. 2011;43:57-75. DOI: 10.1017/S1074070800004053
  23. 23. Shridhar K, Laumann F, Liwicki, M. Uncertainty Estimations by Softplus Normalization in Bayesian Convolutional Neural Networks with Variational Inference. arXiv:1806.05978. 2018
  24. 24. Gluon MXnet. Chapter18 Variational-Methods-and-Uncertainty. 2022. Available from: https://gluon.mxnet.io/chapter18_variational-methods-and-uncertainty/bayes-by-backprop.html [Accessed: 26 January 2023]
  25. 25. Kendall A, Gal Y. What uncertainties do we need in bayesian deep learning for computer vision? Advances in Neural Information Processing Systems. 2017:5574-5584
  26. 26. Kwon Y, Won JH, Kim BJ, Paik MC. Uncertainty quantification using bayesian neural networks in classification: Application to ischemic stroke lesion segmentation. Medical Imaging with Deep Learning; 2018
  27. 27. Ahire JB. Artificial Neural Networks: The Brain Behind AI. Available from: Lulu.com; 2018
  28. 28. Molchanov D, Ashukha A, Vetrov D. Variational Dropout Sparsifies Deep Neural Networks. arXiv preprint arXiv:1701.05369. 2017
  29. 29. CGIAR Computer Vision for Crop Disease, 2020. Available from: https://www.kaggle.com/shadabhussain/cgiar-computer-vision-for-crop-disease/ [Accessed: 20 May 2021]
  30. 30. GitHub – kumar-shridhar/PyTorch-BayesianCNN, 2020. Bayesian Convolutional Neural Network with Variational Inference based on Bayes by Backprop in PyTorch. Available from: https://github.com/kumar-shridhar/PTorch-BayesianCNN/ [Accessed: 25 May 2021]
  31. 31. Shi W, Caballero J, Huszar F, Totz J, Aitken AP, Bishop R, Rueckert D, Wang Z. Real-Time Single Image and Video Super-Resolution using an Efficient Sub-Pixel Convolutional Neural Network. CoRR, abs/1609.05158. 2016
  32. 32. Garg S. Stem Rust – Alchetron, The Free Social Encyclopedia. 2021. Available from: https://alchetron.com/Stem-rust#stem-rust-d0c8800c-4178-4ebe-a94e-ed4e3a6e8a3-resize-750.jpeg [Accessed: 14 September 2021]
  33. 33. Yara UK. How to Maintain Wheat Health with Nutrition|Yara UK. 2021. Available from: https://www.yara.co.uk/crop-nutrition/wheat/maintaining-wheat-health/ [Accessed: 20 September 2022]
