Open access peer-reviewed chapter - ONLINE FIRST

Earth Observation Applications to Agricultural and Environmental Statistics

Written By

Luis Ambrosio, Luis Iglesias, Carmen Marin, Sophie Bontemps, Böris Nörgaard, Pierre Houdmont, Cosmin Cara, Cosmin Udroin, Laurentiu Nicola, Sadri Haouet and Zoltan Szantoi

Submitted: 12 December 2023 Reviewed: 09 January 2024 Published: 19 March 2024

DOI: 10.5772/intechopen.1004409

Revolutionizing Earth Observation IntechOpen
Revolutionizing Earth Observation New Technologies and Insights Edited by Rifaat M. Abdalla

From the Edited Volume

Revolutionizing Earth Observation - New Technologies and Insights [Working Title]

Rifaat M. Abdalla

Chapter metrics overview

35 Chapter Downloads

View Full Metrics

Abstract

Pixels classifiers do not allow assessment of uncertainty due to classification errors. This is a major limitation to the direct use of crop type maps (CTM) as agricultural or environmental statistics. Here, we include the results of our recent research to overcome this limitation, using statistical models that integrate ground and CTM data to evaluate any source of uncertainty, including classification errors. We use linear models for field data that can be treated as values of a continuous variable, such as crop acreage observed at parcel level, and multinomial logit models for field data that can be treated as values of a categorical variable, such as the type of crop observed at the pixel level. Among the tasks required to produce agricultural and environmental statistics, we focus on three for which CTM are especially useful: improving the effectiveness of currently used methods, increasing the granularity of statistics by producing small area estimations, and optimizing the sample design. A prototype for integrating ground and CTM data to achieve each of them is included, together with the results of its application to data provided by two National Statistical Offices.

Keywords

  • crop type maps and ground data integration
  • linear models
  • multinomial logit models
  • efficiency
  • spatial resolution

1. Introduction

“End hunger, achieve food security, and promote sustainable agriculture” is the second (the first is “end poverty”) Sustainable Goal in the United National 2030 Development Agenda. To implement policies and practices in pursuit of these goals, and to monitor and control such policies, detailed information about land uses, crops acreage, and crops yield is necessary.

International programs to improve agricultural and environmental statistics have been recently launched by the World Bank and the Food and Agriculture Organization (FAO) of the United Nations. Looking for synergies with these programs, and with the aim of facilitating the use of remote sensing (RS) information at the National Statistical Offices (NSOs) supporting agricultural and environmental statistics, the European Space Agency (ESA) launched the Sentinels for Agricultural Statistics (Sen4Stat) project.

In this chapter, along with well-established results in the RS literature (the statistical classifiers for crop type maps (CTM) generation), we include the main results of our recent research in the applications of CTM to agricultural and environmental statistics, carried out within the framework of Sen4Stat and partially published in the Statistical Journal of the IAOS [1]. To date, pixel classifiers do not allow the assessment of uncertainty due to classification errors. This is a major limitation to directly using CTM data as statistical estimates, and here, we focus on how to use statistical models to integrate CTM with ground data to produce efficient agricultural and environmental statistics, taking into account any source of uncertainty, including the classification errors.

In large areas, where the sample size for collecting ground data is large enough, we follow an approach based on designing a probabilistic scheme to select the sample and on statistical models to integrate the ground data with CTM data. If ground data can be treated as values of a continuous variable, such as crop acreage observed at the parcel level, we use linear models. If ground data can be treated as values of a categorical variable, such as the type of crop observed at the pixel level, we use multinomial logit models. This design-based approach is robust in the sense that the inference is based not on the model, but on the sampling design, and the estimates are design-consistent. In small areas, where the sample size is small, the design-based approach is not accurate enough for most applications, and the inference is based on the model.

We focus on three of the tasks necessary to produce agricultural and environmental statistics, for which CTM is especially useful: improving the efficiency of currently used methods, increasing the granularity of statistics, and optimizing sample design. A ground and CTM data integration prototype to conduct each of these tasks is included, along with examples of its application to data provided by National Statistical Offices.

Advertisement

2. Pixels classification error

Many RS applications involve the classification of the i=1,2,,N population pixels, using multispectral (L bands) reflectance data, xi=row1lLxli [2]. In agricultural and environmental statistics, the product of pixels’ classification most often used is a CTM, where a class is associated with each one of the type of crops or land uses, k=1,2,,K.

However, pixels classifiers are subject to error, and as a result, CTM are subject to uncertainty. In this section, we will show that pixel classifiers do not allow the assessment of this uncertainty. This is a major limitation to use directly CTM as agricultural or environmental statistics, because in addition to estimates of the study population characteristics, measures of the uncertainty of these estimates are required.

2.1 Classifiers

Let ik denote the event “pixel i belongs to class k.” Except for the few pixels included in the sample, there is uncertainty about the occurrence or not of this event, and to model this uncertainty, as well as the instrument measurement errors, we consider the joint probability function, fikxi, of the two events “pixel i belongs to class k and its spectral measure is xi.” This function is used to define the decision rule: “Classify i in the class k for which the probability fikxi is maximum: fikxi=maxk=1,2,,Kfikxi.” Three main approaches proposed in the literature to implement this rule are (i) maximum likelihood, (ii) maximum entropy, and (iii) multinomial regression.

The maximum likelihood (ML) approach [2] assumes that xi is distributed, within a given class k, according to a multivariate normal, xiikNLmkVk, of mean mk=Exiik, and covariance matrix Vk=Varxiik. Under this assumption, the decision rule fikxi=maxk=1,2,,Kfikxi is equivalent to classify i in the class k if gkxi=maxk=1,2,,Kgkxi, where gkxi=logfik12logdetVk12xiμkVk1xiμkTi=12Nk=12K (see Appendix 1). mk and Vk are unknown but can be estimated using a simple sample of nk pixels and the estimators m̂k=col1lLm̂lk=col1lL1nki=1nkxlki and V̂k=col1lLrow1lL1nk1i=1nkxlkim̂lkxlkim̂lk.

Two disadvantages of this approach are that normality is a very restrictive assumption for reflectance data and that it does not allow a direct computation of class probability, μik=Pikxi (given xi, μik is the probability that crop k covers pixel i).

The maximum entropy (ME) approach has the advantage that it does not require any assumptions about statistical distribution and directly provides estimates of the probability of each class. The entropy is an uncertainty measure: for the ith pixel, the quantity of uncertainty in μi=col1kKμik is Hμi=k=1Kμiklnμik. To estimate μ=col1iNcol1kKμik, we use crops data observed in a sample of n pixels, y=col1incol1kKyik where yik=1 if crop k covers pixel i and yik=0 otherwise, together with RS data X=col1inrow1lLxli. The maximum entropy principle is to choose the class probability values, μik, compatible with the observed data, y, that makes Hμ=μTlnμ=i=1nk=1Kμiklnμik maximum.

The condition of compatibility between μ and y is specified by the model y=μ+ε, where ε=col1incol1kKεik and εik is an unobserved random perturbation. The RS data are introduced by transforming this model into a set of consistency restrictions: IKXTy=IKXTμ+IKXTε, to which we impose the condition of uncorrelation between spectral measures and random perturbations, IKXTε=0, to reduce the consistency restrictions to IKXTy=IKXTμ.

The number nK of unknown class probabilities, μ=col1incol1kKμik, is higher than the number LK of consistency restrictions, and as a result, there is not a unique solution for μ. Among the countless solutions, we choose the one making Hμ maximum, subject to the consistency restrictions and the adding-up normalization conditions, diagn1KTμ=1n, where 1K is a K×1 vector of ones, and 1n is a n×1 vector of ones. This solution is found by solving the Lagrange function Ψ=μTlnμ+βTIKXTμIKXTy+λT1ndiagn1KTμ, where β=col1kKβk=col1kKcol1lLβkl and λ=col1inλi.

The multinomial (MNL) approach is similar to the maximum entropy. In fact, the entropy Hμ can be derived from the multinomial distribution (see Appendix 1). Both approaches directly provide coincident class probabilities estimates, but the former requires to specify a function relating these probabilities with the RS data, and we consider the logit function, μik=el=1Lβklxli1+k=1K1el=1Lβklxli, where the coefficients β=col1kKβk=col1kKcol1lLβkl coincide with the Lagrange parameters linked to the consistency restrictions in the ME approach. Consequently, the class probability estimates μ̂ik based on ME coincide with those based on MNL [3]: for any i=1,2,,N it is μ̂ik=el=1Lβ̂klxli1+k=1K1el=1Lβ̂klxli for k=1,2,,K1 and μ̂iK=11+k=1K1el=1Lβ̂klxli, where β̂=col1kKcol1lLβ̂kl is the β estimate.

Although MNL has the disadvantage over ME that it requires specifying a functional form to link μik and xi, it has the advantage over ML and ME that it allows to calculate the accuracy of the estimates by integrating ground and RS data (even in the case where the classifier used is ML or ME), taking into account the uncertainty due to classification errors.

2.2 Classification errors

The decision rule is subject to errors, and as a result, the classification is subject to uncertainty. Let ik denote the decision to “include pixel i in class k.” This decision is subject to two types of error: the type one, called false negative and denoted by ikik, consists of excluding from a given class k a pixel i that is actually of that class, and the error of type two, called false positive and denoted by ikik, consists of including a pixel i in a given class that is not of that class.

The computation of the classification error probabilities Pikik and Pikik is almost never possible, mainly due to the high dimension L of RS data [2]. In effect, Pikik=Pik1Pikik/1Pik and ikβ̂kβ̂k>0;kk so that Pik=Pβ̂kβ̂k>0;kk. In the MNL approach, the β̂kβ̂k distribution is asymptotically normal L-dimensional, and the computation of Pβ̂kβ̂k>0;kk is complex due to the high L values.

Instead of Pikik and Pikik, in practice, several ratios are calculated using the sample data (or, more often, by dividing the sample data into two groups, one for training and the other for testing the classifier), but this does not allow us to evaluate the uncertainty due to classification errors. For instance, if we denote by nik the number of sample pixels that actually are of class k, and by nikik those among nik that are correctly classify in k, then nikik/nik is an estimate of the so-called producer’s accuracy for class k. In the same way, if we denote by nik the number of sample pixels included in the class k, then nikik/nik is an estimate of the so-called user’s accuracy for class k. An estimate of the so-called overall accuracy is k=1Knikik/k=1Knik.

Accounting for the uncertainty due to classification errors is key, and that is why the data resulting from classification cannot be used directly as agricultural or environmental statistics. To take into account any uncertainties, including those due to classification errors, the CTM are integrated with the ground data, using statistical models in the way we show in the following sections.

Advertisement

3. Design-based approach: large area estimation

The approach to producing official statistics is based on a probabilistic scheme designed to select the sample from which to collect the required ground data. This scheme allows us to assign a probability of inclusion in the sample to each population unit. To estimate the characteristics of the study population, we use design-consistent estimators whose distribution is tightly concentrated around the true characteristics value, if the sample size is large enough. The uncertainty of these estimates is assessed using the statistical distribution defined by the inclusion probabilities.

The sample is selected using an area frame or a list frame [4, 5, 6]. An example of an area frame is that used by Spain: the national topographic map divided into sampling units, each of which is a square of 700 m on each side (49 hectares). The inclusion probabilities is equal to n/N, where n and N are the number of sampling units in the sample and in the population, respectively [6]. France and the United States also use area frames but using points as sampling units, rather than polygons. France to estimate crop acreage [7], and the United States to collect environmental data and develop a national inventory of natural resources [8, 9].

An example of list frame is that used by Senegal: it is based on the 2013 population census [10]. The sampling unit is a household with agricultural activities. The sampling scheme consists of two stages. In the first stage, a sample of Enumeration Areas (EA) is selected with replacement and probabilities proportional to the number of households in the EA. In the second stage, from each selected EA, a sample of households is selected without replacement and with equal probability among households with agricultural activities. To integrate the data collected on the household sample with the CTM data, a georeferenced pixel is selected in each parcel within the two-stage sample.

To assess uncertainties, and particularly those due to classification errors, we use statistical models. The choice of a model depends on ground data, and we consider linear models, for ground data collected at the parcel level, which can be treated as values of a continuous variable, and multinomial logit models for ground data collected at the pixel level, which can be treated as values of a categorical variable.

3.1 Ground data collected at the parcel level

Let the sampling unit, i, be a parcel or a cluster of parcels (say a polygon, as in Spain), and let i=1,2,,N be the set of sampling units of the population. We consider a linear model yi=xiβ+εi that relates ground data observed on the study crop in the ith sampling unit, yi, with the CTM data, xi, aggregated at the ith level. For a given crop, xi is a row vector with only two components; the first is the constant 1, x1i=1, and the second, x2i=xi, is the number of pixels classified in the CTM as belonging to the study crop in i: for convenience, we will denote xi in the general form xi=row1lLxli where L=2, x1i=1 and x2i=xi. The parameters vector β is unknown and must be estimated.

In sampling units not included in the sample, only xi is observed. Given xi, the expected value of the unknown yi is modeled by Eyi=xiβ, and the deviation of yi from xiβ is modeled by εi. This deviation is a source of uncertainty, mainly due to classification errors, and we will show how to assess it using the εi sampling variance.

The survey variable total in the population is yN=i=1Nyi. As a result of including the component x1i=1 in the model, yN can be written as yN=xNBN, where xN=i=1Nxi=NxN, and xN=i=1Nxi is the total number of pixels classified in the CTM as belonging to the study crop, and BN=XNTXN1XNTyN designates the vector of population regression coefficients BN=col1lLBl (with L=2), where XN=col1iNrow1lLxli (with L=2) and yN=col1iNyi [11]. Since xi is known for every population unit i=1,2,,N, then xN and XN are known. However, yN is unknown and so is BN.

To estimate BN, we use the data collected in the sample n, selected from the population with inclusion probabilities πii=12N. The design-based estimator B̂π=i=1nxiTxiπi1i=1nxiTyiπi is design-consistent for BN, and as a result, the synthetic or projective estimator ŷN=xNB̂π is design-consistent for yN [11].

The error of ŷN as an estimator of yN is ŷNyN=xNB̂πBN, and its sampling variance is VŷNyN=xNVB̂πBNxNT, where VB̂πBN=i=1NxiTxi1Vi=1nxiTεiNπii=1NxiTxi1 and εiN=yixiBN. The sampling variance is a measure of uncertainty that depends both on the deviation εiN between the true value of yi and its expected value based on xi and on the inclusion probabilities, πi. A design-consistent estimator of the sampling variance is V̂ŷNyN=Vi=1nε̂iNπi, where V. is the design-based variance and ε̂iN=yixiB̂π.

The asymptotic distribution of ŷN is normal, VŷNyN1/2ŷNyNN01, and can be used for constructing uncertainty measures in terms of probability for yN, such as confidence intervals.

To estimate the total of the survey variable, yNR=i=1NRyi, in large areas that are part of the population, such as a region or a province R with NR sampling units within a country, we use the sample of size n selected from the population with inclusion probabilities πii=12N and the estimator ŷNR=xNRB̂π+NRN̂Ri=1nRyixiB̂ππi, where nR is number of sampling units included in the sample n that fall in R, and N̂R=i=1nIiπi is an estimator of NR, where Ii=1 if i is in R and Ii=0 otherwise. A measure of the ŷNR uncertainty is its sampling variance VŷNRyNR=NR2V1N̂Ri=1nRεiNπi, and a design-based estimator of VŷNRyNR is V̂ŷNRyNR=NR2V1N̂Ri=1nRε̂iNπi.

3.1.1 Crop type map relative efficiency

The relative efficiency of CTM data is RECTM=VGVG+CTM1, where VG is the current sampling variance based on ground data alone, and VG+CTM is the sampling variance based on the integration of ground and CTM data. If RECTM>1, then the current sampling variance may be reduced by the amount VGVG+CTM=1RECTM1VG, without increasing the sample size. As a result, the ground sample size can be reduced to nG+CTM=RECTM1nG sampling units, without loss of accuracy. nG is the current sample size for ground data collection.

For πi=n/N, the sampling variance based on ground data alone is VG=Vi=1nyin/N=N21n/N1/nn1i=1nyiy¯2 and the sampling variance using both ground and CTM data is VG+CTM=Vi=1nyixiB̂πn/N=N21n/N1/nn2i=1nyixiB̂π2. As a result, RECTMi=1nyiy¯2i=1nyixiB̂π21 and nG+CTM=i=1nyixiB̂π2i=1nyiy¯21nG.

3.1.2 Example

Our data source is Sen4Stat. The goal of this project was focused on developing an open-source system for generating CTM from Sentinel images, using data provided by the NSOs (https://www.esa-sen4stat.org/). We use field data on crops collected at the parcel level by the NSO of Spain, to illustrate the integration of continuous ground data and CTM data using linear models.

3.1.2.1 Crop acreage estimates

We consider a large area of size 100 × 100 km in Castilla y León (Spain), with X coordinates (in meters): (300,000,400,000) and Y coordinates (4,600,000,4,700,000). The sampling unit is a square with a side of 700 m, and the sample size is 419 sampling units. The study crop is barley, and the ground data were observed at the parcel level. However, data were aggregated at the sampling unit level for calculations so that yi represents the ground data on the barley acreage in sampling unit i and in xi=1xi, xi is the number of pixels classified in CTM as belonging to barley in this same sampling unit i.

Model parameter estimates are B̂π=0.0460.093T, and barley acreage estimates based both on ground data alone and on the integration of ground and CTM data are found in Table 1. As a result of this integration, the accuracy of the estimates improves considerably: the sampling error (or coefficient of variation, which is the ratio between the square root of the sampling variance and the acreage estimate in %) and the wide of the confidence interval (within whose limits the true value of the barley acreage lies, with a high probability (95%) were reduced by half.

DataAcreage (hectare)UncertaintyRelative efficiency of CTM data
95% Confidence interval (hectare)Sampling error (CV%)
LimitsWide
Ground236165.4Lw:15951.740427.34.37
Up: 56379
Ground + CTM228550.1Lw:19699.817700.51.985.2
Up:37400.3

Table 1.

Barley acreage estimates in a 100 × 100 km area of Castilla y León, 2018.

The CTM relative efficiency, RECTM, was high: using CTM data, the current ground sample size could be reduced to less than one-fifth without loss of accuracy.

3.1.2.2 Crop yield estimates

Let yij be the yield of barley observed in parcel j of sampling unit i, and let xij=row1lLxijl be a 1×L row vector with L=2, xij1=1 in the first position, and a normalized difference vegetation index (NDVI) in the second position, . The estimates based both on ground data alone and on the integration of ground and NDVI data are found in Table 2.

DataYield (kg/hectare)UncertaintyRelative efficiency of RS data
95% Confidence interval (kg/hectare)Sampling error (CV%)
LimitsWide
Ground4213.617Lw: 4093.97239.2931.45
Up: 4333.263
Ground + RS4155.688Lw: 4033.231244.9151.500.95
Up: 4278.146

Table 2.

Barley yield estimates in a 100 × 100 km area of Castilla y León, 2018. NDVI.

Although the vegetation index contribution to explain crop yield is statistically significant, its contribution to improve yield estimator accuracy is low: the sampling error is of the same order of magnitude as if ground data alone were used. The relative efficiency is nearly 1, so the current sample size could barely be reduced without loss of accuracy. Additional research is required to make production estimates more reliable through improved yield estimation from RS data.

3.1.2.3 Crop production estimates

Table 3 shows the estimate production of barley, calculated as the product of the crop acreage estimate and yield estimate.

DataProduction (tons) (1000 kg)Uncertainty
95% Confidence interval (tons: 1000 kg)Sampling error (CV%)
LimitsWide
Ground987,841Lw: 903640168,4024.35
Up: 1072042
Ground + RS965,733Lw: 915194101,0782.67
Up: 1016272

Table 3.

Barley production estimates in a 100 × 100 km area of Castilla y León, 2018.

Using RS data, the production estimator accuracy increased considerably: the estimation error decreased by half, even if the RS data is not reliable for crop yield.

3.1.2.4 Crop acreage estimates at the provincial level

We estimate barley acreage in that part of the provinces within the 100 × 100 km area of Castilla y León (Spain). Results are in Table 4.

ProvinceGround data aloneGround & CTM dataRelative efficiency of CTM data
Acreage (has.)Sampling error (CV%)Acreage (has.)Sampling error (CV%)
León6853.224.156834.516.202.3
Palencia88602.07.3690535.33.334.7
Valladolid128209.55.57119707.42.665.1
Zamora12324.217.7110948.28.166.4
Total area235989.14.37228028.51.985.2

Table 4.

Barley acreage estimates at provincial level.

The estimate accuracy at provincial level is less than that at the national level, but the CTM relative efficiency is similar. Where the sampling variance using ground data alone is high (León), its integration with CTM data reduces the sampling error below the standard limits (20%) considered acceptable in official statistics.

3.2 Ground data collected at the pixel level

We use pixel i as sampling unit, in the same way that a point is used, that is, treating the ground data observed at the pixel level as values of a categorical variable. To each pixel i=1,2,,N of the study area, we associate a survey vector yi=col1kKyik, where yik=1 if crop k covers pixel i and yik=0 otherwise. K is the total number of crops so that the constraint k=1Kyik=1 holds.

The total of this survey vector in the population is yN=i=1Nyi=col1kKi=1Nyik=col1kKyNk, where yNk is the total number of pixels covered by crop k. It is assumed that a pixel is covered by a single crop (or that a class of mixed pixels is included) so that the area covered by k is Ak=ayNk, where a is the area of the terrain represented by a pixel.

To estimate yN, we follow a design-based approach using ground data collected in a sample of n pixels, selected from the population with inclusion probabilities πii=12N. To integrate these sample data with CTM data, we use a multinomial logit model [12, 13, 14]. We assume that the survey vector yi=col1kKyik follows a multinomial distribution MN1μi, where μi=col1kKμik and μik is the probability that crop k=1,2,,K covers pixel i (probability of yik=1 and yik=0;kk), with the constraint k=1Kμik=1.

To link μik with CTM data, we use a logit model μik=expxiβkk=1Kexpxiβk, where xi=row1lLxil is an indicator vector of the class where pixel i is located within the CTM (xil=1 if pixel i is in the class l and xil=0 otherwise), and βk=col1lLβkl is an unknown parameter vector. To estimate μik, it is sufficient to obtain estimates of βk for k=1,2,,K1 [12]. The probability of crop K is μK=1k=1K1μik and can be estimated using the estimates β̂πk for k=1,2,,K1.

The design-based parameter estimator B̂π=col1kK1B̂πk=col1kK1col1lLB̂πkl is design-consistent and can be found iteratively using B̂πm+1=B̂πm+i=1nHyiβπiB=B̂πm1i=1nbyiβπiB=B̂πm [11], where byiβ=col1kK1col1lLyikμikxil and Hyiβ=col1kK1row1kK1δkkμikμikμikxiTxi, with δkk=1 if k=k and δkk=0 otherwise.

A design-consistent estimator of yN=col1kKyNk is ŷN=i=1Nμ̂i+NN̂pi=1nyiμ̂iπi=col1kK1ŷNk, where μ̂i=col1kK1μ̂ik, μ̂ik=expxiB̂πk1+k=1K1expxiB̂πk for k=1,2,,K1, μ̂iK=1k=1K1μ̂ik=11+k=1K1expxiB̂πk, N̂p=i=1n1πi, and ŷNk=i=1Nμ̂ki+NN̂pi=1nykiμ̂kiπi is the estimator of the total number of pixels covered by crop k for k=1,2,,K1; for crop K, it is ŷNK=Nk=1K1ŷNk. The sampling covariance matrix of ŷN is VŷN=N2V1N̂pi=1nyiμ̂iπi=col1kK1row1kK1N2Covŷrkŷrk, where ŷrk=1N̂pi=1nykiμkiπi is a function of N̂pk=i=1nykiπi (because N̂p=k=1KN̂pk), the estimators of the total number of pixels covered by crop k=1,2,,K based on ground data alone, and of r̂kN=i=1nykiμ̂kiπi, the estimator of the total of the deviations between yki and μki in the population.

The sampling variance is an uncertainty measure that depends both on the deviation between the true, yi, value and its expected value estimate, μ̂i, and on the sample design, πi. We focus on the sampling variance VŷNk for k=1,2,,K1, which are the elements along the diagonal of VŷN. Let Ĝk=rowgj1jK+1=r̂kNN̂p1N̂p2N̂pK be a row vector whose K+1 components are the estimators on which ŷrk depends. The ŷNk sampling variance is VŷNk=row1jK+1ŷrkgjVĜkcol1jK+1ŷrkgj, where VĜk=col1jK+1row1jK+1Covgjgj is the design-based covariance matrix of Ĝk. The sampling covariance matrix of ŷN is estimated by replacing μi with μ̂i inVŷN.

3.2.1 Crop type map relative efficiency

The sampling variance of the yNk estimator based on ground data alone is VG=N2V1N̂pi=1nykiπi and that based on its integration with CTM data is VG+CTM=N2V1N̂pi=1nykiμkiπi. The relative efficiency of CTM data for estimating the total number of pixels covered by crop k is RECTMk=V1N̂pi=1nykiπi×V1N̂pi=1nykiμkiπi1.

3.2.2 Example: crop acreage estimates

We use field data collected at the pixel level by the NSO of Senegal, to illustrate the integration of categorical ground data and CTM data using multinomial logit models. The crop covering each pixel included in the sample selected in the Nioro Department was observed in 2021. In the CTM, pixels covering agricultural areas were classified into four crop types: (i) maize, (ii) millet, (iii) groundnut, and (iv) other crops, and the data are encode as xi=1000 for any pixel i of the class maize, xi=0100 for any pixel i of the class millet, xi=0010 for any pixel i of the class groundnut, and xi=0001 for any pixel i of the class other crops.

We focus on estimating the acreage of the two main crops: millet and groundnut. The remaining crops (mainly maize) are included in a third category called other, whose estimated probability is one minus the estimated probability of millet and groundnut. Model parameter estimates are in Table 5.

CropCrop type map
MaizeMilletGroundnutOther crops
Millet−0.0106551.479583340.889313468.76504093
Groundnut−0.659204−0.598717002.898739481.21898514

Table 5.

Model parameter estimates (B̂πmillet and B̂πgroundnut).

Crop acreage estimates of millet and groundnut based on ground and CTM data are in Table 6.

CropAcreage (Hectare)Uncertainty
Sampling error (CV%)Limits of 95% Confidence interval
LowerUpperWide
Millet89,2154.1181978.8896330.414351.52
Groundnut78,8153.7173089.1584550.9811461.82

Table 6.

Crop acreage estimates, Nioro (Senegal) 2021.

The CTM data efficiency estimate is in Table 7.

Crop typeStandard errors of proportion estimatorsRelative efficiency of CTM data
Ground data aloneGround and CTM data
Millet3.371.903.13
Groundnut3.341.524.80

Table 7.

Efficiency of CTM data for crop acreage estimation, Nioro (Senegal).

The integration of ground data observed at the pixel level and CTM data improves the accuracy of the estimators based on ground data alone: the standard error is cut by a half, without loss of design-based consistency. As a result, the size of the current sample used for collecting ground data could be reduced to less than a third, without loss of accuracy.

Table 8 shows crop acreage estimates in the districts Medina Sabakh, Paoskoto, and Wack Ngouna.

DistrictMilletGroundnut
Acreage (Has.)Sampling error (CV%)Acreage (has.)Sampling error (CV%)
Medina Sabakh20067.218.619765.367.3
Paoskoto38316.025.335018.234.0
Wack Ngouna30831.7711.924030.7110.7
Total Nioro89215.004.1178815.003.71

Table 8.

Crop acreage estimates at the district level, Nioro (Senegal).

The estimate accuracy at the district level is lower than at the department level. However, the sampling error is within the standard limit (CV < 20%) for labeling as official statistics.

Advertisement

4. Model-based approach: small area estimation

In small areas, such a municipality, the sample size is small, and as a result, the design-based approach is not accurate enough for most applications. In these areas, inference is based on the model, rather than the sampling design, and the estimates are not robust, but model-dependent.

4.1 Ground data collected at the parcel level

Let the sampling unit, i, be a parcel or a cluster of parcels (say a polygon, as in Spain), and let ydi be the value of the survey variable in a sampling unit i=1,2,,Nd of the small area d=1,2,,D. Let ydixdii=12nd be the ground data set, ydi, and the CTM data set, xdi=1xd2i, in the nd sampling units from the whole sample n that fall in d. Here, xd2i is the number of pixels classified in CTM as belonging to the study crop within the sampling unit i of the small area d.

To estimate yNd=i=1Ndydi, the total of the survey variable in d, we use CTM data and the linear mixed model ydi=xdiβ+ud+εdi, where udεdi are zero-mean independent random variables of variances σu2σe2.

The model-based estimator of yNd is ŷNd=Nd1γ̂dx¯Ndβ̂+γ̂dy¯nd+x¯Ndx¯ndβ̂, where β̂=XTV̂1X1XTV̂1y is the generalized least-square estimator of β, with y=col1dDcol1indydi, X=col1dDcol1indxdi, V̂1=diag1dDV̂d1 and V̂d1=1σ̂e2Indγ̂dndσ̂e21nd1ndT, where γ̂d=σ̂u2σ̂u2+σ̂e2/nd. Here, σ̂e2=eTenD1 and σ̂u2=uTun2σ̂e2n, with n=d=1Dnd1ndx¯dXTX1x¯dT, are unbiased estimators of the variance components, where eTe is the sum of squared residuals in the model fitted by ordinary least squares, taking as fixed the small-area effect ud.uTu is the sum of squared residuals in the model fitted by ordinary least squares, with ud=0. x¯Nd and x¯nd are the population and sample means of the vector xdi, respectively.

An unbiased estimator of the total mean-square error estimator MSEŷNd is [15]: M̂SEŷNd=Nd21ndNd2h1dσ̂v2σ̂e2+h2dσ̂v2σ̂e2+2h3dσ̂v2σ̂e2, where:

h1dσ̂v2σ̂e2=γ̂dσ̂e2nd+1ndNd2NdndNd2h2dσ̂v2σ̂e2=σ̂e2X¯Ndndγ̂dx¯dA1X¯Ndndγ̂dx¯dTh3dσ̂v2σ̂e2=1nd21σ̂v2+σ̂e2nd3σ̂e22Vσ̂v2+σ̂v22Vσ̂e22σ̂e2σ̂v2Covσ̂e2σ̂v2

(where)

A=d=1Di=1ndxdiTxdiγ̂dndx¯ndTx¯ndVσ̂v2=2n21nD1D1n2σ̂e22+2nσ̂e2σ̂v2+nσ̂v22Vσ̂e2=2σ̂e22nD1Covσ̂e2σ̂v2=1nD1Vσ̂e2

Here, n=d=1Dnd21ndx¯ndA11x¯ndT+trA11d=1Dnd2x¯ndTx¯nd2. Because A1=d=1Di=1ndxdiTxdi, n may be simplified to n=d=1Dnd21x¯ndXTX1x¯ndT=nn+d=1Dnd2.

4.1.1 Example

Table 9 shows barley acreage estimates at the municipality level in that part of Zamora province within the 100 × 100 km area of Castilla y León (Spain).

MunicipalitySample sizeEstimates
Acreage (Hectares)Sampling error (CV%)
Belver de los Montes1212.9629.1
Castroverde32914.228.0
Pinilla de Toro4963.3010.0
Quintanilla del Monte1466.6520.3
Toro1615.9114.0
Vezdemarbán31358.2212.6
Villalpando2560.0539.1
Villamayor de Campos11056.2311.1
Villanueva del Campo1784.0313.2
Villar de Fallaves1844.1611.0
Villardondiego1516.4011.5
Villavendimio1656.0710.4
Total Zamora2010948.28.16

Table 9.

Barley acreage estimates at municipality level.

Even in municipalities where the sample size is as small as a single sampling unit, CTM data contribute to reducing sampling error below the standard limit (20%) considered to label estimates as official statistics.

4.2 Ground data collected at the pixel level

We consider a population of i=1,2,,N pixels grouped in a set of d=1,2,,D small areas (municipalities) such that N=d=1DNd, where Nd is the number of pixels in the small area d. Let ydi=col1kKydik be the survey vector where ydik=1 when crop k covers pixel i and ydik=0 otherwise.

To estimate yNd=i=1Ndydi=col1kKyNdk, the total of the survey vector over a small area d, where yNdk=i=1Ndydik is the number of pixels of d covered by crop k, we use CTM data, and we assume that ydi follows a multinomial distribution MN1μdi, where μdi=col1kKμdik and μdik is the probability of ydik=1 and ydik=0;kk, with the constraint k=1Kμdik=1. To link μdik with CTM data, we use a logit mixed model μdik=expxdiβk+vdk1+k=1K1expxdiβk+vdk, where μdik is the probability that crop k covers i of d, xdi=row1lLxdil is an indicator vector whose components are xdil=1 if pixel i of d is classified in the crop-type class l=1,2,,L and xdil=0 otherwise, and xdiβk+vdk=ηdik is a linear mixed predictor consisting of a fixed xdiβk and a random component, vdk, associated with the small areas and specifics for each crop k.

Here, βk=col1lLβkl is a vector of parameters measuring the effects of the CTM predictors xdi on μdi, for example, for a pixel of the crop-type class l=1,2,,L we have μdik=expβkl+vdk1+k=1K1expβkl+vdk for crops k=1,2,,K1 and μdiK=11+k=1K1expβkl+vdk for crop K. The random effect, vdk, accounts for the heterogeneity of the survey vector ydi=col1kKydik among small areas, not explained by CTM data. It is assumed that vdk is a realization of a zero-mean random variable whose small-area variance is Vvdk=σk2 for crop k and whose cross-crop covariance is Covvdkvdk=σkk2 so that the vector of small-area random effects V=col1dDcol1kK1vdk has mean EV=0 and covariance matrix Σ=θID, where θ=row1kK1col1kK1σkk2 and σkk2=σk2 if k=k.

To estimate yNd, we observe ydi only in the nd pixels that fall within the small area d, that is, a small part of the full sample of size n=d=1Dnd; the values of ydi in the remaining Ndnd pixels are unknown, and we want to predict them using the model. We consider yNd decomposed into two ydi aggregates yNd=is=1ndydis+ir=1Ndndydir: one known, is=1ndydis, and the other, ir=1Ndndydir, unknown. We use ŷNd=is=1ndydis+ir=1Ndndμ̂dir as a predictor of yNd, where μ̂dir=col1kK1μ̂dirk with μ̂dirk=expxdirβ̂k+v̂dk1+k=1K1expxdirβ̂k+v̂dk for crop k=1,2,,K1, and μ̂dirK=11+k=1K1expxdirβ̂k+v̂dk for crop K. Here, β̂k is the βk estimate and v̂dk is the predictor of the small-area random effect vdk in v=col1dDcol1kK1vdk. To estimate both the fixed-effect β=col1kK1βk and random-effect θ=row1kK1col1kK1σkk2 model parameters, we use maximum-likelihood estimators and the big sample of size n=d=1Dnd. As a result, both predictors μ̂dir and ŷNd are asymptotically unbiased.

The mean squared error of the predictor ŷNd of yNd is MSEŷNd=EŷNdyNdŷNdyNdT=Eir=1Ndndμ̂dirydir2, where ir=1Ndndμ̂dirydir=ir=1Ndndμ̂dirμdirydirμdir=τ̂drτdrεdr. Here, τdr=ir=1NdndEydir=ir=1Ndndμ̂dir is the expected value of the survey vector in the Ndnd non-observed pixels, τ̂dr=ir=1Ndndμ̂dir is the predictor of τdr, and εdr=ir=1Ndndεdir with εdir=ydirμdir is the unobservable error vector.

Then, MSEŷNd=Eτ̂dtrτdtrεdtr2=MSEτ̂dr+EεdrεdrTEτ̂drτdrεdrTEεdrτ̂drτdrT. If we assumed that εdr and τ̂dr are independent, then the covariance term is null, Eεdrτ̂drτdrT=0, and the mean square error of ŷNd reduces to MSEŷNd=MSEτ̂dr+EεdrεdrT, where EεdrεdrTEεdrεdrTVd=ir=1NdndVydir, where Vydir is the ydir covariance matrix.

Appendix 2 for more details on parameter estimations, prediction of random effects, and mean squared error estimations.

4.2.1 Example

We use the field data described in Section 3.2.2. Here, we encode the CTM data as follow: xi=100 for a pixel i in the crop-type class maize and other crops, xi=010 for crop-type class millet, and xi=001 for crop-type class groundnut. We assume that random effects have the same variance σ2 and are independent among crops: Covvdkvdk=0 for kk and k=1,2,,K1. The model parameter estimates are in Table 10. The values of the z-Wald statistics, for βk parameters, and λ likelihood ratio, for the variance of the random effects σ2, are shown in parentheses.

Crop kβ̂k=col1lLβ̂kl for crop-type class lVariance of the small area effect σ̂2
Maize and otherMilletGroundnut
Millet0.109 (z = 0.240)1.429 (z = 6.236)0.559 (z = 1.608)0.028 (λ = 24.636)
Groundnut−0.662 (z = −1.153)−0.587 (z = −1.711)2.522 (z = 8.494)

Table 10.

Model parameter estimates.

According to the z-Wald statistics, the parameters associated with the crop-type classes millet and groundnut are significant for both millet and groundnut (at level < 0.1). The parameters associated with the crop-type class maize and other are not significant (at the 10% level) for both millet and groundnut. According to the likelihood ratio test, the random small-area effect is also significant (at level < 0.01).

Crop-acreage estimates for millet and groundnut in four communities in Senegal’s Nioro region are in Table 11.

MunicipalityPopulation (pixels)Sample (pixels)Crop acreage (hectares) (Error: RRMSE%)
MilletGroundnut
Paos Koto1,163,94864500 (26.4)1957 (56.5)
Taiba Niassene1,163,096144902 (34.3)2000 (74.5)
Keur Madongo302,08821281 (24.9)577 (45.1)
Wack Ngouna2,766,043129280 (25.0)4167 (47.1)

Table 11.

Crop acreage estimation at municipality level, Nioro (Senegal).

The integration of ground and CTM data makes possible to evaluate the estimates accuracy using the relative root mean squared error, which is the ratio between root mean squared error and crop acreage. It is remarkable that in small areas where the sampling fraction is almost null, this integration reduces the estimation error to levels close to the limits that allow the estimates to be labeled as official: for millet, the estimation error is close to 20%, and for groundnut, the root mean squared error is similar but the acreage is about half the acreage of millet so that, as a result, the relative root mean squared error is about twice than for millet.

Advertisement

5. Optimizing the sample design

The objective is to minimize the sampling variance, that is, maximizing the estimates accuracy, without increasing the cost. The sampling variance depends on the spatial correlation structure of the survey variable, yi. The correlation ρydistsisi between two observations yiyi at the points of coordinates si and si is a positive and decreasing function of the distance, d=distsisi: it decreases when the distance between these points increases. Details on how to use a land-use map to estimate the spatial correlation structure of crops data, using variogram and correlogram models [16], and on how to use this structure for sample design are in [17]. Here, we propose to use a CTM to estimate the parameters of the correlogram model. We use the estimated correlogram to evaluate the (anticipated) sampling variance [18] as a function of the design variables (sampling unit size and sample size) [19, 20].

We consider a simple case: to estimate the total of the survey variable in the population, yN=i=1Nyi, using a simple random sample (SRS) scheme (equal probability of inclusion, πi=n/N) and the estimator ŷNSRS=N/ni=1nyi. The sampling variance is VŷNSRS=N21n/NS2/n, where and y¯N=yN/N are the variance and the mean of the survey variable in the population, respectively.

Pixels are arranged in rows and a columns, and we consider the sampling unit as a square block of n0 pixels and identify each sampling unit i=1,2,,N by the row and column of the pixel in the lower left corner of the square block. The distance d=distsisi can be expressed as d=u2+v2, where u is the number of sampling units between si and si in the row direction, and v is the number of sampling units between si and si in the column direction. The two most often used models to assess the structure of ρyd are the exponential, ρuvaτ=1τed/a, and the spherical ρuvaτ=1τed/a132da+d32a3 if da and ρuvaτ=0 if d>a.

The model parameters are the range rate a and ratio τ=τ0/τ0+τd. Here, τ0 is the nugget effect, that is, the variation at or near the origin (independent of distance); τd is the partial sill (a function of distance d between sampling points); and τ0+τd is the sill, that is, the maximum variation far from the origin.

The anticipated variance is the expected value (based on the correlogram model) of the sampling variance AVŷNSRS=EVŷNSRS=N21n/NES2/n, where ES2=σ2ΨNn0aτ, σ2 is the variance of yi, and ΨNn0aτ=n0Nn01/N11ΦNn0aτNn0n01/N11Φn0aτ. Here, ΦNn0aτ is the average correlation between pairs of pixels over the CNn02 pairs in the population, and Φn0aτ is the average correlation between pairs of pixels over the Cn02 pairs in a sampling unit.

The sample design is optimized to find the sampling unit size, n0, and the sample size, n, that minimize the sampling variance, minnn0AVŷNSRS=minnn0N21n/Nσ2/nΨNn0aτ subject to the cost constraint, C0+Cwnn0+CkAnC. Here, C is the total budget, C0 is a fixed cost independent of the sample size, Cw is the interviewing cost, Ck is a cost per distance unit, and A is the surveyed area. The solution to this problem is the optimum sampling unit size n0 and optimum sample size n. In addition to the budget, C, this optimum solution is conditioned to the correlogram model parameters aτ.

5.1 Example

For illustrations, we use the case study in [20]. The data source in this case study was a land-use map, but we establish a parallelism between the map and the satellite image to show how the proposed approach should be applied when a CTM is used, instead of a land-use map. The map is digitized, but the minimum level of disaggregation is a square area of side 1 km so that the minimum sampling unit size in the CTM would be a square block of side 100 pixels of 10 m and the survey variable, yi, would be the number of pixels of the study crop type within sampling unit i.

The studied region was an area of 200 km by 200 km in Castilla y León, Spain, that is, 20,000 by 20,000 pixels of 10 m and N=40000 sampling units of 10,000 pixels. We computed the empirical variogram for nonirrigated herbaceous crops using the moment estimator [17], 2γ̂d=1NdNdyiyi2, where Nd=sisidistsisi=d. A spherical model was fitted to the empirical variogram and found as parameter estimates, τ̂0=728.96,τ̂d=539.60, and â=46.17. Using these parameter estimates, we compute the estimate ΦNâτ̂=0.032. The parameter σ2 is estimated by σ̂2=S21ΦNâτ̂, where S2=11Ni=1Nyiy¯2 and y¯=1Ni=1Nyi.

In addition to the sampling unit of minimum size n0=1, we considered a set of sampling unit sizes ranging from n0=2 to n0=10 times the sampling unit of minimum size n0=1. To estimate Φn0aτ, it is assumed that the sampling unit of size n0 is a square lattice of side n0 and that the distance between a pair of its components is d=2n0 so that Φn0âτ̂=1τ̂1322n0â+2n032â3.

The optimization problem is

minnn0N21nNσ̂2nn0Nn01N110.032Nn0n01N111τ̂1322n0â+2n032â3s.t.C0+Cwnn0+CkAnC.

To find the solution to this problem, we use the cost function coefficients in [21] (C=120000$, C0=20500$, Cw=50$, and Ck=1$/Km) and the nloptr package [22]. The solution is in Table 12.

VariableOptimal solution
Lower bound for the sampling unit size
1234510
Sample size: n1661891611465375192
Sampling unit size: n01234510
Minimum Anticipated Variance1.1 e+91.4 e+91.7e+92.0e+92.3e+93.8e+9

Table 12.

SRS: Optimum solution for several values of lower bound for the sampling unit size.

The optimum sampling unit size is the lower bound allowed. The minimum anticipated variance increases from 1.1 e+9 to 3.8e+9, when the sampling unit size increases from 1 to 10. In other words, using sampling units of size 10, the sampling variance is 1.86 times higher than using sampling units of size 1.

Advertisement

6. Concluding remarks

We have shown how to integrate CTM and ground data to produce agricultural and environmental statistics, along with measures of their accuracy that take into account any source of uncertainty, including classification errors. We have developed prototypes for this integration at the national, provincial, and municipal levels. For large areas (national and provincial level), these prototypes are based on the design of a probabilistic scheme to select the sample from which to collect the ground data, and we use a statistical model to integrate this sample with CTM data. The resulting statistics are design-consistent and robust; that is, they are based on the sampling design rather than on the model. The sample design is key, and we have shown how to optimize it using CTM. For small areas (municipal level), the prototypes are based solely on the model so that the resulting statistics are model-dependent.

To facilitate the transference of these prototypes, we have included examples of their application to data provided by two National Statistical Offices: Spain and Senegal. The former is an example of countries where ground data are collected at the parcel level, and we use linear models for their integration with CTM. The latter is an example of countries where ground data are observed at the pixel level, and we use multinomial logit models for their integration with CTM. We show how to estimate the model parameters and how to use CTM data to compute the required statistics, as well as their accuracy measures in these two countries, at both large and small area levels.

We have evaluated the efficiency of CTM data to improve the methods currently used to produce official agricultural statistics in these two countries, and we have found it to be high: using CTM data, the current sample size at the level of large areas could be reduced to less than one-fifth in the case of Spain and between one-third and one-fifth in the case of Senegal, without loss of neither robustness nor accuracy. At the small area level, where the sampling fraction is almost null, we have shown how to produce estimates accurate enough to be labeled as official statistics, that is, useful for most applications, without increasing costs.

Advertisement

Acknowledgments

The research results presented in this chapter were supported by the European Space Agency, in the Sen4Stat project framework [Contract No. 4000127181/19/I-NS]. The national statistical offices of Senegal and Spain provided the ground data used in the examples.

Advertisement

A. Appendix 1—pixels classification

We want to divide the i=1,2,,N pixels into a set of k=1,2,,K groups, each one corresponding to a discriminable class on the terrain identified by a crop type or land use. Let fikxi be the joint probability function of the events “pixel i is in the terrain class k ik, and the spectral measure in i is xi”. The decision rule is to “classify i in the class k if and only if fikxi=maxk=1,2,,Kfikxi.” Maximum likelihood, maximum entropy, and multinomial regression are three approaches to implement this rule.

A.1 Maximum likelihood

This approach looks for minimizing the expected loss due to classification errors. Let λikk be a loss function measuring the cost incurred when pixel i is classified in class kk but is actually of class k. The expected loss is Eλikk=k=1Kλikkfikxi, where fikxi=fikfxiik/fxi. Using this loss function, we consider the set of K discriminant functions, gkxi=k=1Kλikkfikfxiik/fxii=12Nk=12K and we decide to classify the pixel i in the class k if gkxi=maxk=1,2,,Kgkxi=mink=1,2,,Kgkxi. Note that this classifier minimizes the expected loss, as desired. A simple loss function is λikk=0ifk=k and λikk=1 otherwise. Using this loss function, the discriminant function is gkxi=kk=1Kfxiikfik/fxi, and since fxi is a constant for a given xi, an equivalent discriminant function is gkxi=gkxifxi=kk=1Kfxiikfik. Note that fxi=k=1Kfikfxiik=kk=1Kfxiikfik+fxiikfik, and as a result, gkxi=fxi+fxiikfik. Since fxi is a constant for a given xi, an equivalent discriminant function is gkxi=gkxi+fxi=fxiikfik.

Thus, we consider the set of K discriminant functions, gkxi=fxiikfiki=12Nk=12K and we decide to classify the pixel i in the class k if gkxi=maxk=1,2,,Kgkxi=maxk=1,2,,Kfxiikfik.

Note that maxk=1,2,,Kfxiikfik=maxk=1,2,,Klogfxiikfik, where logfxiikfik=logfik+logfxiik, where logfxiik=L2log2π12logdetVk12xiμkVk1xiμkT. Thus, an equivalent set of K discriminant functions is gkxi=logfik12logdetVk12xiμkVk1xiμkT;i=1,2,N,k=1,2,K and we decide to classify the pixel i in the class k if gkxi=maxk=1,2,,Kgkxi.

A.2 Maximum entropy

  1. Multinomial distribution. The entropy Hμ can be derived from the multinomial distribution. Assume that the survey vector, yi=col1kKyik, follows a multinomial distribution, MN1μi. Then, i=1nyi=col1kKyk=i=1nyik follows a multinomial distribution MNn=k=1Kykμ=col1kKμk, with k=1Kμk=1. The number of i=1nyi realizations favorable to the observed sample yn=col1kKyk is W=n!k=1Kyk!. Thus, lnW=lnn!k=1Klnyk! and, using the Stirling approximation, lnW=nlnnnk=1Kyklnyk+n=nlnnk=1Kyklnyk. For large samples, and according to the Bernoulli theorem, yknμk when n, and as a result, lnWnlnnk=1Knμklnnμk=nlnnnk=1Kμklnn+lnμk=nk=1Kμklnμk. Finally, n1lnWk=1Kμklnμk=Hμ. The entropy and multinomial logit approaches provide coincident class probability estimates because they are maximizing the same function.

  2. Information gain measures. In the simple case where the available information reduces to sample ground data, y=col1incol1kKyik, μ is considered compatible with the observed data yn=i=1nyi=col1kKyk=i=1nyik, when yn=μ+εn, where εn=col1kKi=1nεik and εik is a random noise. The set of μ=col1kKμk values compatible with yn is countless, and according to the maximum entropy principle, we choose as μ value the one making Hμ maximum: μ̂=maxμHμ=maxμk=1Kμklnμk=minμk=1Kμklnμk, subject to k=1Kμk=1. To find μ̂, we solve the Lagrange function Ψ=k=1Kμklnμk+λ1k=1Kμk. The Ψ derivative with respect to μk is ∂Ψμk=μkk=1Kμklnμk+λ1k=1Kμk=μk1μk+lnμkλ=lnμk1λ, for k=1,2,,K and with respect to λ is ∂Ψλ=1k=1Kμk. The solution of this system of K+1 equations is lnμ̂k1λ̂=0μ̂k=e1λ̂;k=1,2,,K, with k=1Kμ̂k=Ke1λ̂=1, so that lnKe1λ̂=ln1=0lnK1λ̂=0λ̂=lnK1. The entropy is Hμ̂=k=1Ke1λ̂1λ̂=Ke1λ̂1+λ̂=1+λ̂=lnK, and it is achieved when μ̂k=e1λ̂=elnK;k=1,2,,K, that is, when, lnμ̂k=lnK=ln1Kμ̂k=1K;k=1,2,,K so that the assigned probabilities μ̂=col1kKμ̂k correspond to a uniform distribution.

When additional information, X=col1inrow1lLxil, is introduced, then a reduction of uncertainty is achieved, from lnK to k=1Kμ̂klnμ̂k. This reduction, lnK+k=1Kμ̂klnμ̂k, is called “information gain,” and the ratio between this reduction and the maximum uncertainty, Iμ̂=lnK+k=1Kμ̂klnμ̂klnK=1Sμ̂, is called “information index”, where Sμ̂=k=1Kμ̂klnμ̂klnK. A value Sμ̂=0, that is, Iμ̂=1, corresponds to no uncertainty (μ̂k=1 for some k and μ̂k=0 for every kk), and a value Sμ̂=1, that is, Iμ̂=0, corresponds to perfect uncertainty (μ̂k=1K;k=1,2,,K). The measure Sμ̂ can be used to evaluate the effect of each piece of information. For instance, the effect of reducing the sample size in a unit can be measured by comparing Sμ̂n and Sμ̂n1: if Sμ̂n<Sμ̂n1, then the unit in question reduced the uncertainty. In the same way, the effect of excluding an auxiliary variable can be measured using Sμ̂L and Sμ̂L1: if Sμ̂L<Sμ̂L1, then the variable in question reduces the uncertainty.

Advertisement

B. Appendix 2 - Small area estimation based on multinomial logit mixed models

B.1 Parameter estimation

To estimate μdik, it is sufficient to acquire estimates of βk for k=1,2,,K1. The probability that crop K covers pixel i of d is μdiK=1k=1K1μdik and can be estimated using the estimates of βk for k=1,2,,K1. We specify the vector η=col1dDcol1iNdcol1kK1ηdik where ηdik=xdiβk+vdk as a linear mixed model η=+Zv, where X=col1dDcol1iNddiag1kK1row1lLxdil are known CTM values, β=col1kK1col1lLβkl is a vector of parameters, and Z=diag1dDcol1iNdrow1kK1zdik is an indicator matrix where zdik=1 if k covers i of d and zdik=0 otherwise. It is assumed that the vector of small-area random effect v=col1dDcol1kK1vdk has mean Ev=0 and covariance matrix Σ=θID, where θ=row1kK1col1kK1σkk2 and σkk2=σk2 if k=k.

The parameters in β and θ are often estimated following a penalized quasi-likelihood approach [22, 23, 24]. As discussed by McCulloch and Searle [13], these methods are not completely satisfactory in practice, and these authors recommended to linearize the nonlinear mixed model instead, thus obtaining a working linear mixed model fitted by maximum likelihood. Here, we follow this recommendation and consider the link function applied to the ground data gkydi=logydik1k=1K1ydik for k=1,2,,K1, and its Taylor expansion about Eydiv=μdi, gkydigkμdi+gkydiydi1ydi=μdiydi1μdi1++gkydiydiK1ydi=μdiydiK1μdiK1, to obtain the working linear model ξ=+Zv+e [24]. Here, ξ=col1dDcol1iNdξdi and ξdi=col1kK1ξdik is a working vector, e=col1dDcol1iNdedi with edi=Hdi1ydiμdi, and Hdi=hηdiηdi=col1kK1row1kK1δkkμdikμdikμdik with δkk=1 if k=k and δkk=0 otherwise, ydi=col1kK1ydik, and μdi=col1kK1μdik. Here, hηdi is the inverse of the link function gμdi=ηdi, that is, μdi=g1ηdi=hηdi.

The working vector covariance matrix is V==ZΣZT+Ve, where Ve=diag1dDdiag1iNdVedi and Vedi=Hdi1VydiμdiHdi1, where Vydiμdi is the covariance matrix ofydiMN1μdi.

We assume that the ξ asymptotic distribution is normal, ξANV, and we see ξ=+Zv+e as a linear mixed model. Vedi is complex because the derivative of the link function Hdi depends on V through μ, and μdi also depends on V. We consider a simplification of this problem; it consists of setting v=0 in μdi that reduces Hdi to Hdi=col1kK1row1kK1δkkμdikμdikμdik, where μdik=expxdiβk1+k=1K1expxdiβk no longer depends on the random effects v [13]. As a result, Vedi=Hdi1VydiμdiHdi1, where μdi does depend on v.

To determine the β and θ maximum likelihood estimates, we follow an iterative procedure:

  1. Let θ0 and β0 be starting values of θ and β, respectively

  2. Let v0=0 and compute ξ0=0+H10yμ0

  3. Compute 0=diag1dDVedi0, where Vedi0=Hdi10Vydiμdi0Hdi10 with μdi0=col1kK1μdik0 and μdik0=expxdiβk01+k=1K1expxdiβk0

  4. Update β0using β̂1=XT01X1XT01ξ0

  5. Update θ0 using θ̂1=θ02lβθθTβ=β̂11lβθβ=β̂1, where lβ is the profile log likelihood of the ξ distribution.

  6. Update V̂0 using v̂1=Σ̂1ZT11ξ0Xβ̂1 and compute ξ1=Xβ̂1+Zv̂1+Ĥ11yμ̂1

  7. Update β̂m using β̂m+1=XTm1X1XTm1ξm

  8. Update θ̂m using θ̂m+1=θ̂m2lβ∂θ∂θTβ=β̂m1lβ∂θβ=β̂m

    The gradient vector for θ=row1kK1col1kK1σkk2 is b=lβ∂θ=col1lLlθl, where θl=σk for k=k and θl=σkk2 for kk and lθl=12trV1Vl+12rTV1VlV1r, where rm+1=ξmXβm+1 and Vl=Vθl, which for θl=σk is equal to Vσk=Vσk=ZIDθσkZT with θσk=diag1kK12σk and for θl=σkk2 when kk it is Vσkk2=Vσkk2=ZIDθσkk2ZT with θσkk2=row1kK1col1kK1σkk2σkk2.

    The Hessian matrix for θ is 2lβθθT=row1lLcol1lL2lθlθl, where we use l instead of lβ. Here, 2lθlθl=12trV1VllV1VlV1Vl12eTVlle with Vll=2Vθlθl and Vll=2V1θlθl=V1VllVlV1VlVlV1VlV1.

    For K=3 and l,l=σ1,σ2,σ122, the Hessian matrix is 2lβθθT=2lσ1σ12lσ1σ22lσ1σ1222lσ2σ12lσ2σ22lσ2σ1222lσ122σ12lσ122σ22lσ122σ122, and Vσ1σ1=2Vσ1σ1=ZID2θσ1σ1ZT=ZID2000ZT, Vσ1σ2=2Vσ1σ2=2Vσ1σ2=ZID2θσ1σ2ZT=ZID0000ZT=0nK1×nK1, Vσ1σ122=2Vσ1σ122=ZID2θσ1σ122ZT=0nK1×nK1, Vσ2σ1=Vσ1σ2 Vσ2σ2=2Vσ2σ2=ZID2θσ2σ2ZT=ZID0002ZT, Vσ2σ122=2Vσ2σ122=ZID2θσ2σ122ZT=0nK1×nK1, Vσ122σ1=Vσ1σ122, Vσ122σ2=Vσ2σ122,Vσ122σ122=2Vσ122σ122=ZID2θσ122σ122ZT=0nK1×nK1

  9. Continue until convergence.

B.2 Prediction

We partition the linear predictor vector ηd=col1iNdcol1kK1ηdik into a part ηds=col1isndcol1kK1ηdisk containing the nd known sample values and another part ηdr=col1irNdndcol1kK1ηdirk containing Ndnd unknown values we want to predict. We partition the model ηd=Xdβ+ZdV in the same way: ηdsηdr=XdsXdrβ+ZdsZdrV=Xdsβ+ZdsVXdrβ+ZdrV. We use ŷNd=is=1ndydis+ir=1Ndndμ̂dir as a predictor of yNd, where μ̂dir=col1kK1μ̂dirk with μ̂dirk=expxdirβ̂k+v̂dk1+k=1K1expxdirβ̂k+v̂dk for crop k=1,2,,K1 and μ̂dirK=11+k=1K1expxdirβ̂k+v̂dk for crop K. Here, β̂k is the βk estimate at the convergence iteration β̂m=col1kK1β̂km and v̂dk is the predictor of the small-area random effect vdk in v=col1dDcol1kK1vdk defined by v̂=col1dDcol1kK1v̂dk=Σ̂ZTV̂ξ1ξXβ̂, where Σ̂=θ̂ID and θ̂ is the θ estimate at the convergence iteration θ̂m=row1kK1col1kK1σ̂kkm2.

B.3 Mean squared error

The mean squared error of predictor ŷNd is MSEŷNd=MSEτ̂dr+ir=1NdndVydir. The term MSEτ̂dr can be approximated by a Taylor series expansion of μ̂di=hη̂di around μdi=hηdi, where ηdi=Xdiβ+ZdiV.

As a result, we have μ̂diμdiHdiη̂diηdi. Let τdrT=ir=1NdndHdirηdir=ir=1NdndHdirXdirβ+HdirZdirV=Μdrβ+KdrV, where Μdr=ir=1NdndHdirXdir and Kdr=ir=1NdndHdirZdir. The approximation of MSEτ̂dr=MSEτ̂drT is MSEτ̂dr=G1dθ+G2dθ+G3dθ+G4dθ, where G1dθ=KdTKdT G2dθ=ΛddT, G3dθ=Γd∂θTVΓd∂θVθ1 and G4dθ=Vydr=ir=1NdndVydir, with T=ZTHZ+Σ11, H=diag1dDdiag1iNdHdi, P=Vβ̂=XTV1X1, V=ZΣZT+Ve, Λd=mdKdTZTVeX, and Γd∂θ=V1ZKdT. Here, Vθ denotes the Fisher information of the variance component parameters θ.

As estimator of the mean squared error of predictor ŷNd, we use the asymptotically unbiased estimator proposed in [23], MŜEŷNd=G1dθ̂+G2dθ̂+2G3dθ̂+G4dθ̂, where θ̂ is the θ estimate at the convergence iteration.

References

  1. 1. Ambrosio L, Iglesias L, Marin C, Deffense N. Integration of remote sensing data into national statistical office sampling designs for agriculture. Statistical Journal of the IAOS. 2023;39(2):473-489. DOI: 10.3233/SJI-220116
  2. 2. Sawin PH, Swain PH, Davis SM. editors. Remote Sensing: The Quantitaive Approach, pp136-187. New York: McGraw-Hill; 1978, 396 p
  3. 3. Golan A, Judge G, Miller D. Maximum entropy econometrics. New York: Wiley; 1996. 307 p
  4. 4. FAO. Handbook on master sampling frames for agricultural statistics. Frame development, sample design and estimation. In: Improving Agricultural and Rural Statistics. Global Strategy. Rome: FAO; 2015. 170 p
  5. 5. Cotter J, Nealon J. Area frame design for agricultural surveys. Washington, D.C.: Research and Applications Division. National Agricultural Statistics Service. United States Department of Agriculture; 1987
  6. 6. FAO. Multiple Frame Agricultural Surveys. Volume 1: Current Surveys Based on Area and List Sampling Methods. Volume 2: Agricultural Survey Programmes Based on Area Frame or Dual Frame (Area and List) Sample Designs, Statistical Development Series 7 and 10. Roma: FAO; 1998
  7. 7. Gay C, Porchier JC. Land cover and land use classification using TER-UTI. In: Holland TE and Van den Broecke MPR, Editors. Proceedings of Agricultural Statistics 2000, International Statistical Institute. Voorburg, The Netherlands. 1998. pp. 193-201. Available from: https://www.isi-web.org/isi.cbs.nl/iamamember/Books/agric2000/page-193.pdf
  8. 8. Nusser SM, Goebel JJ. The national resources inventory: A long-term multi-resource monitoring programme. Environmental and Ecological Statistics. 1997;4(3):181-204. DOI: 10.1023/A:1018574412308
  9. 9. Nusser SM, Goebel JJ, Fuller WA. Design and estimation for investigating the dynamics of natural resources. Ecological Applications. 1998;8(2):234-245. DOI: 10.1890/1051-0761(1998)008[0234:DAEFIT]2.0.CO;2
  10. 10. Directorate of Forecasting Analysis and Agricultural Statistics. Ministry of Agriculture and Rural Equipment. Senegal: Senegal-Annual Agricultural Survey. Methodology and sampling plan of the agricultural survey; 2017. Available from: http://anads.ansd.sn/index.php/catalog/218/download/1846
  11. 11. Fuller WA. Sampling Statistics. New York: Wiley; 2009. 472 p. DOI: 10.1002/9780470523551
  12. 12. Agresti A. Categorical data analysis. New York: Wiley; 2002. 710 p
  13. 13. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001, 325 p
  14. 14. Hogland J, Billor N, Anderson N. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing. European Journal of Remote Sensing. 2013;46(1):623-640. DOI: 10.5721/EuJRS20134637
  15. 15. Ambrosio L, Iglesias L. Land cover estimation in small areas using ground surveys and remote sensing. Remote Sensing of Environment. 2000;74:240-248. DOI: 10.1016/S0034-4257(00)00114-0
  16. 16. Cressie N. Statistics for Spatial Data. New York: John Wiley & Sons; 1991. 900 p
  17. 17. Ambrosio L, Iglesias L, Marin C. Systematic sample design for the estimation of spatial means. Environmetrics. 2003;14(1):45-61. DOI: 10.1002/env.564
  18. 18. Fuller WA, Isaki CT. Survey design under superpolutation models. In: Krewski D, Rao JNK, Platek R, editors. Current Topics in Survey Sampling. New York: Academic Press; 1981. pp. 199-226
  19. 19. Ambrosio L, Iglesias L. Identifying the most Appropriate Sampling Frame for Specific Landscape Types, Technical Report Series. GO-01-2014. FAO. Available from: https://www.fao.org/3/ca6436en/ca6436en.pdf
  20. 20. Ambrosio L, Iglesias L, Marin C. A model-assisted approach to identify a cost-efficient spatial sampling strategy. In: Paper Presented at: Eighth International Conference on Agricultural Statistics (ICAS-VIII). New Delhi (IN); 2019
  21. 21. Ypma J, Johnson SG, Stamm A. lnoptr R package [software]. Version 2.0.3.9100. Available from: https://astamm.github.io/nloptr/index.html
  22. 22. Saei A, Chambers R. Small Area Estimation under Linear and Generalized Linear Mixed Models with Time and Area Effects. Methodology Working Paper M03/15. Southampton (UK): Southampton Statistical Sciences Research Institute; 2003: 31 p. http://eprints.soton.ac.uk/id/eprint/8165
  23. 23. Molina I, Saei A, Lombardia MJ. Small area estimates of labour force participation under a multinomial logit mixed model. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2007;170(4):975-1000. DOI: 10.1111/j.1467-985X.2007.00493.x
  24. 24. Lopez-Vizcaino E, Lombardia MJ, Morales D. Small area estimation of labour force indicators under a multinomial model with correlated time and area effects. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2015;178(3):535-565. DOI: 10.1111/rssa.12085

Written By

Luis Ambrosio, Luis Iglesias, Carmen Marin, Sophie Bontemps, Böris Nörgaard, Pierre Houdmont, Cosmin Cara, Cosmin Udroin, Laurentiu Nicola, Sadri Haouet and Zoltan Szantoi

Submitted: 12 December 2023 Reviewed: 09 January 2024 Published: 19 March 2024