Earth Observation Applications to Agricultural and Environmental Statistics

Luis Ambrosio; Luis Iglesias; Carmen Marin; Sophie Bontemps; Böris Nörgaard; Pierre Houdmont; Cosmin Cara; Cosmin Udroin; Laurentiu Nicola; Sadri Haouet; Zoltan Szantoi

doi:10.5772/intechopen.1004409

Correction to: Earth Observation Applications to Agricultural and Environmental Statistics

Abstract

Pixel classifiers do not allow assessment of uncertainty due to classification errors. This is a major limitation to the direct use of crop type maps (CTM) as agricultural or environmental statistics. Here, we include the results of our recent research to overcome this limitation, using statistical models that integrate ground and CTM data to evaluate any source of uncertainty, including classification errors. We use linear models for field data that can be treated as values of a continuous variable, such as crop acreage observed at parcel level, and multinomial logit models for field data that can be treated as values of a categorical variable, such as the type of crop observed at the pixel level. Among the tasks required to produce agricultural and environmental statistics, we focus on three for which CTM are especially useful: improving the effectiveness of currently used methods, increasing the granularity of statistics by producing small area estimations, and optimizing the sample design. A prototype for integrating ground and CTM data to achieve each of them is included, together with the results of its application to data provided by two National Statistical Offices.

Keywords

crop type maps and ground data integration
linear models
multinomial logit models
efficiency
spatial resolution

Author Information

Luis Ambrosio*
- Technical University of Madrid, Madrid, Spain
Luis Iglesias
- Technical University of Madrid, Madrid, Spain
Carmen Marin
- Technical University of Madrid, Madrid, Spain
Sophie Bontemps
- Catholic University of Louvain, Louvain, Belgium
Böris Nörgaard
- Catholic University of Louvain, Louvain, Belgium
Pierre Houdmont
- Catholic University of Louvain, Louvain, Belgium
Cosmin Cara
- CS Group Romania, Craiova, Romania
Cosmin Udroin
- CS Group Romania, Craiova, Romania
Laurentiu Nicola
- CS Group Romania, Craiova, Romania
Sadri Haouet
- CLS, Lille, France
Zoltan Szantoi
- European Space Agency, Frascati, Italy
- Stellenbosch University, Stellenbosch, South Africa

*Address all correspondence to: luis.ambrosio@upm.es

1. Introduction

“End hunger, achieve food security, and promote sustainable agriculture” is the second Sustainable Development Goal (the first is “end poverty”) in the United Nations 2030 Agenda for Sustainable Development. To implement policies and practices in pursuit of these goals, and to monitor and control such policies, detailed information about land use, crop acreage, and crop yield is necessary.

International programs to improve agricultural and environmental statistics have been recently launched by the World Bank and the Food and Agriculture Organization (FAO) of the United Nations. Looking for synergies with these programs, and with the aim of facilitating the use of remote sensing (RS) information at the National Statistical Offices (NSOs) supporting agricultural and environmental statistics, the European Space Agency (ESA) launched the Sentinels for Agricultural Statistics (Sen4Stat) project.

In this chapter, along with well-established results in the RS literature (the statistical classifiers for crop type maps (CTM) generation), we include the main results of our recent research in the applications of CTM to agricultural and environmental statistics, carried out within the framework of Sen4Stat and partially published in the Statistical Journal of the IAOS [1]. To date, pixel classifiers do not allow the assessment of uncertainty due to classification errors. This is a major limitation to directly using CTM data as statistical estimates, and here, we focus on how to use statistical models to integrate CTM with ground data to produce efficient agricultural and environmental statistics, taking into account any source of uncertainty, including the classification errors.

In large areas, where the sample size for collecting ground data is large enough, we follow an approach based on designing a probabilistic scheme to select the sample and on statistical models to integrate the ground data with CTM data. If ground data can be treated as values of a continuous variable, such as crop acreage observed at the parcel level, we use linear models. If ground data can be treated as values of a categorical variable, such as the type of crop observed at the pixel level, we use multinomial logit models. This design-based approach is robust in the sense that the inference is based not on the model, but on the sampling design, and the estimates are design-consistent. In small areas, where the sample size is small, the design-based approach is not accurate enough for most applications, and the inference is based on the model.

We focus on three of the tasks necessary to produce agricultural and environmental statistics, for which CTM is especially useful: improving the efficiency of currently used methods, increasing the granularity of statistics, and optimizing sample design. A ground and CTM data integration prototype to conduct each of these tasks is included, along with examples of its application to data provided by National Statistical Offices.

2. Pixel classification error

Many RS applications involve the classification of the population pixels, $i=1,2,\dots,N$, using multispectral ($L$ bands) reflectance data, $x_i=\mathrm{row}_{1\le l\le L}(x_{li})$ [2]. In agricultural and environmental statistics, the most widely used product of pixel classification is a CTM, in which a class is associated with each type of crop or land use, $k=1,2,\dots,K$.

However, pixel classifiers are subject to error, and as a result, CTM are subject to uncertainty. In this section, we show that pixel classifiers do not allow this uncertainty to be assessed. This is a major limitation to the direct use of CTM as agricultural or environmental statistics because, in addition to estimates of the study population characteristics, measures of the uncertainty of these estimates are required.

2.1 Classifiers

Let $i\in k$ denote the event “pixel $i$ belongs to class $k$.” Except for the few pixels included in the sample, there is uncertainty about whether this event occurs. To model this uncertainty, as well as the instrument measurement errors, we consider the joint probability function, $f(i\in k, x_i)$, of the two events “pixel $i$ belongs to class $k$ and its spectral measure is $x_i$.” This function is used to define the decision rule: “Classify $i$ in the class $k$ for which the probability $f(i\in k, x_i)$ is maximum: $f(i\in k, x_i)=\max_{k'=1,2,\dots,K}f(i\in k', x_i)$.” Three main approaches proposed in the literature to implement this rule are (i) maximum likelihood, (ii) maximum entropy, and (iii) multinomial regression.

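The maximum likelihood (ML) approach [2] assumes that, within a given class $k$, $x_i$ follows a multivariate normal distribution, $x_i \mid i\in k \sim N_L(m_k, V_k)$, with mean $m_k=E(x_i \mid i\in k)$ and covariance matrix $V_k=\mathrm{Var}(x_i \mid i\in k)$. Under this assumption, the decision rule $f(i\in k, x_i)=\max_{k'=1,2,\dots,K}f(i\in k', x_i)$ is equivalent to classifying $i$ in the class $k$ if $g_k(x_i)=\max_{k'=1,2,\dots,K}g_{k'}(x_i)$, where $g_k(x_i)=\log f(i\in k)-\tfrac{1}{2}\log\det V_k-\tfrac{1}{2}(x_i-m_k)V_k^{-1}(x_i-m_k)^T$ for $i=1,2,\dots,N$ and $k=1,2,\dots,K$ (see Appendix 1). $m_k$ and $V_k$ are unknown but can be estimated using a simple sample of $n_k$ pixels and the estimators $\hat m_k=\mathrm{col}_{1\le l\le L}(\hat m_{lk})=\mathrm{col}_{1\le l\le L}\left(\frac{1}{n_k}\sum_{i=1}^{n_k}x_{lki}\right)$ and $\hat V_k=\mathrm{col}_{1\le l\le L}\,\mathrm{row}_{1\le l'\le L}\left(\frac{1}{n_k-1}\sum_{i=1}^{n_k}(x_{lki}-\hat m_{lk})(x_{l'ki}-\hat m_{l'k})\right)$.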
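As an illustration, the discriminant rule can be sketched in a few lines of Python with NumPy. This is a minimal sketch and not the Sen4Stat implementation: the function names, the synthetic two-band reflectances, and the use of training-sample proportions as the prior $f(i\in k)$ are assumptions made for the example.

```python
import numpy as np

def fit_class_stats(X, labels):
    """Per-class mean, covariance (divisor n_k - 1) and prior from training pixels."""
    stats = {}
    for k in np.unique(labels):
        Xk = X[labels == k]
        # Assumption: the prior f(i in k) is taken as the training-sample proportion.
        stats[int(k)] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False), len(Xk) / len(X))
    return stats

def discriminant(x, m, V, prior):
    """g_k(x) = log prior - 0.5 log det V - 0.5 (x - m) V^{-1} (x - m)^T."""
    d = x - m
    _, logdet = np.linalg.slogdet(V)
    return np.log(prior) - 0.5 * logdet - 0.5 * d @ np.linalg.solve(V, d)

def classify(x, stats):
    """Assign x to the class with the largest discriminant value."""
    return max(stats, key=lambda k: discriminant(x, *stats[k]))

# Synthetic training pixels for two classes and L = 2 bands (hypothetical values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.2, 0.6], 0.05, (50, 2)),
               rng.normal([0.5, 0.3], 0.05, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
stats = fit_class_stats(X, labels)
pred = classify(np.array([0.21, 0.58]), stats)
```

Solving the linear system with `np.linalg.solve` avoids forming $V_k^{-1}$ explicitly, which is usually more stable when bands are strongly correlated.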

Two disadvantages of this approach are that normality is a very restrictive assumption for reflectance data and that it does not allow a direct computation of the class probability, $\mu_{ik}=P(i\in k \mid x_i)$ (given $x_i$, $\mu_{ik}$ is the probability that crop $k$ covers pixel $i$).

The maximum entropy (ME) approach has the advantage that it does not require any assumptions about statistical distribution and directly provides estimates of the probability of each class. The entropy is an uncertainty measure: for the $i$th pixel, the quantity of uncertainty in $\mu_i=\mathrm{col}_{1\le k\le K}(\mu_{ik})$ is $H(\mu_i)=-\sum_{k=1}^{K}\mu_{ik}\ln\mu_{ik}$. To estimate $\mu=\mathrm{col}_{1\le i\le N}\,\mathrm{col}_{1\le k\le K}(\mu_{ik})$, we use crop data observed in a sample of $n$ pixels, $y=\mathrm{col}_{1\le i\le n}\,\mathrm{col}_{1\le k\le K}(y_{ik})$, where $y_{ik}=1$ if crop $k$ covers pixel $i$ and $y_{ik}=0$ otherwise, together with RS data $X=\mathrm{col}_{1\le i\le n}\,\mathrm{row}_{1\le l\le L}(x_{li})$. The maximum entropy principle is to choose the class probability values, $\mu_{ik}$, compatible with the observed data, $y$, that maximize $H(\mu)=-\mu^T\ln\mu=-\sum_{i=1}^{n}\sum_{k=1}^{K}\mu_{ik}\ln\mu_{ik}$.

The condition of compatibility between $\mu$ and $y$ is specified by the model $y=\mu+\varepsilon$, where $\varepsilon=\mathrm{col}_{1\le i\le n}\,\mathrm{col}_{1\le k\le K}(\varepsilon_{ik})$ and $\varepsilon_{ik}$ is an unobserved random perturbation. The RS data are introduced by transforming this model into a set of consistency restrictions, $(I_K\otimes X^T)y=(I_K\otimes X^T)\mu+(I_K\otimes X^T)\varepsilon$, on which we impose the condition that spectral measures and random perturbations are uncorrelated, $(I_K\otimes X^T)\varepsilon=0$, to reduce the consistency restrictions to $(I_K\otimes X^T)y=(I_K\otimes X^T)\mu$.

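The number $nK$ of unknown class probabilities, $\mu=\mathrm{col}_{1\le i\le n}\,\mathrm{col}_{1\le k\le K}(\mu_{ik})$, is higher than the number $LK$ of consistency restrictions, and as a result, there is no unique solution for $\mu$. Among the countless solutions, we choose the one that maximizes $H(\mu)$, subject to the consistency restrictions and the adding-up normalization conditions, $\mathrm{diag}_n(1_K^T)\mu=1_n$, where $1_K$ is a $K\times 1$ vector of ones, and $1_n$ is an $n\times 1$ vector of ones. This solution is found from the Lagrange function $\Psi=-\mu^T\ln\mu+\beta^T\left[(I_K\otimes X^T)\mu-(I_K\otimes X^T)y\right]+\lambda^T\left[1_n-\mathrm{diag}_n(1_K^T)\mu\right]$, where $\beta=\mathrm{col}_{1\le k\le K}(\beta_k)=\mathrm{col}_{1\le k\le K}\,\mathrm{col}_{1\le l\le L}(\beta_{kl})$ and $\lambda=\mathrm{col}_{1\le i\le n}(\lambda_i)$.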
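The step from this Lagrange function to a logit form for the class probabilities can be made explicit. The derivation below is a sketch under the notation above, differentiating $\Psi$ with respect to a single $\mu_{ik}$; fixing $\beta_K=0$ in the last line is the usual identification convention and is stated here as an assumption.

```latex
\begin{align*}
\frac{\partial \Psi}{\partial \mu_{ik}}
  &= -\ln\mu_{ik} - 1 + \sum_{l=1}^{L}\beta_{kl}x_{li} - \lambda_i = 0
  \;\Longrightarrow\;
  \mu_{ik} = e^{-1-\lambda_i}\, e^{\sum_{l}\beta_{kl}x_{li}}, \\
\sum_{k=1}^{K}\mu_{ik} = 1
  &\;\Longrightarrow\;
  e^{-1-\lambda_i} = \Big(\sum_{k'=1}^{K} e^{\sum_{l}\beta_{k'l}x_{li}}\Big)^{-1}, \\
\mu_{ik}
  &= \frac{e^{\sum_{l}\beta_{kl}x_{li}}}{\sum_{k'=1}^{K} e^{\sum_{l}\beta_{k'l}x_{li}}}
   = \frac{e^{\sum_{l}\beta_{kl}x_{li}}}{1+\sum_{k'=1}^{K-1} e^{\sum_{l}\beta_{k'l}x_{li}}}
  \quad (\beta_K = 0).
\end{align*}
```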

The multinomial (MNL) approach is similar to the maximum entropy approach. In fact, the entropy $H(\mu)$ can be derived from the multinomial distribution (see Appendix 1). Both approaches directly provide coincident class probability estimates, but MNL requires specifying a function relating these probabilities to the RS data. We consider the logit function, $\mu_{ik}=\dfrac{e^{\sum_{l=1}^{L}\beta_{kl}x_{li}}}{1+\sum_{k'=1}^{K-1}e^{\sum_{l=1}^{L}\beta_{k'l}x_{li}}}$, where the coefficients $\beta=\mathrm{col}_{1\le k\le K}(\beta_k)=\mathrm{col}_{1\le k\le K}\,\mathrm{col}_{1\le l\le L}(\beta_{kl})$ coincide with the Lagrange parameters linked to the consistency restrictions in the ME approach. Consequently, the class probability estimates $\hat\mu_{ik}$ based on ME coincide with those based on MNL [3]: for any $i=1,2,\dots,N$, $\hat\mu_{ik}=\dfrac{e^{\sum_{l=1}^{L}\hat\beta_{kl}x_{li}}}{1+\sum_{k'=1}^{K-1}e^{\sum_{l=1}^{L}\hat\beta_{k'l}x_{li}}}$ for $k=1,2,\dots,K-1$ and $\hat\mu_{iK}=\dfrac{1}{1+\sum_{k'=1}^{K-1}e^{\sum_{l=1}^{L}\hat\beta_{k'l}x_{li}}}$, where $\hat\beta=\mathrm{col}_{1\le k\le K}\,\mathrm{col}_{1\le l\le L}(\hat\beta_{kl})$ is the estimate of $\beta$.
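The logit formula translates directly into code. The sketch below assumes NumPy and uses made-up coefficients and reflectances; the function name `mnl_probabilities` and the array layout (one row of coefficients per non-reference class) are choices made for this illustration only.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Class probabilities under the logit model with class K as reference.

    X    : (n, L) array, one row of reflectances x_i per pixel.
    beta : (K-1, L) array, one row of coefficients per non-reference class.
    Returns an (n, K) array whose last column is the reference class K.
    """
    expo = np.exp(X @ beta.T)                          # (n, K-1)
    denom = 1.0 + expo.sum(axis=1, keepdims=True)      # 1 + sum over k' < K
    return np.hstack([expo / denom, 1.0 / denom])

# Hypothetical inputs: n = 2 pixels, L = 3 bands, K = 3 classes.
X = np.array([[0.1, 0.4, 0.3],
              [0.3, 0.2, 0.5]])
beta = np.array([[1.0, -2.0, 0.5],
                 [0.0, 1.5, -1.0]])
mu = mnl_probabilities(X, beta)
predicted_class = mu.argmax(axis=1)   # decision rule: most probable class
```

Each row of `mu` sums to one by construction, so the probability of the reference class never has to be estimated separately.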

Although MNL has the disadvantage relative to ME that it requires specifying a functional form to link $\mu_{ik}$ and $x_i$, it has the advantage over ML and ME that it makes it possible to calculate the accuracy of the estimates by integrating ground and RS data (even when the classifier used is ML or ME), taking into account the uncertainty due to classification errors.

2.2 Classification errors

The decision rule is subject to errors, and as a result, the classification is subject to uncertainty. Let $i\subset k$ denote the decision to “include pixel $i$ in class $k$.” This decision is subject to two types of error: type one, called false negative and denoted by $(i\not\subset k \mid i\in k)$, consists of excluding from a given class $k$ a pixel $i$ that actually belongs to that class; type two, called false positive and denoted by $(i\subset k \mid i\notin k)$, consists of including in a given class $k$ a pixel $i$ that does not belong to that class.

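The computation of the classification error probabilities $P(i\subset k \mid i\notin k)$ and $P(i\not\subset k \mid i\in k)$ is almost never possible, mainly due to the high dimension $L$ of RS data [2]. In effect, $P(i\subset k \mid i\notin k)=P(i\subset k)\,[1-P(i\in k \mid i\subset k)]/[1-P(i\in k)]$ and $i\subset k \Leftrightarrow \hat\beta_k-\hat\beta_{k'}>0;\ k\ne k'$, so that $P(i\subset k)=P(\hat\beta_k-\hat\beta_{k'}>0;\ k\ne k')$. In the MNL approach, the distribution of $\hat\beta_k-\hat\beta_{k'}$ is asymptotically $L$-dimensional normal, and the computation of $P(\hat\beta_k-\hat\beta_{k'}>0;\ k\ne k')$ is complex for high values of $L$.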

Instead of $P(i\subset k \mid i\notin k)$ and $P(i\not\subset k \mid i\in k)$, in practice, several ratios are calculated using the sample data (or, more often, by dividing the sample data into two groups, one for training and the other for testing the classifier), but this does not allow us to evaluate the uncertainty due to classification errors. For instance, if we denote by $n_{i\in k}$ the number of sample pixels that actually are of class $k$, and by $n_{i\in k\,\cap\, i\subset k}$ those among $n_{i\in k}$ that are correctly classified in $k$, then $n_{i\in k\,\cap\, i\subset k}/n_{i\in k}$ is an estimate of the so-called producer’s accuracy for class $k$. In the same way, if we denote by $n_{i\subset k}$ the number of sample pixels included in the class $k$, then $n_{i\in k\,\cap\, i\subset k}/n_{i\subset k}$ is an estimate of the so-called user’s accuracy for class $k$. An estimate of the so-called overall accuracy is $\sum_{k=1}^{K}n_{i\in k\,\cap\, i\subset k}/\sum_{k=1}^{K}n_{i\in k}$.
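These three ratios are usually read off a confusion matrix. The following sketch, which assumes NumPy and a small invented test sample of ten pixels, shows one way to compute them; the function name and the label encoding (integers 0 to K-1) are assumptions for the example.

```python
import numpy as np

def accuracy_ratios(truth, predicted, K):
    """Producer's, user's and overall accuracy from true and predicted labels."""
    cm = np.zeros((K, K), dtype=int)       # rows: true class, columns: assigned class
    for t, p in zip(truth, predicted):
        cm[t, p] += 1
    correct = np.diag(cm)                  # pixels of class k classified in k
    producers = correct / cm.sum(axis=1)   # divided by pixels truly in k
    users = correct / cm.sum(axis=0)       # divided by pixels classified in k
    overall = correct.sum() / cm.sum()
    return producers, users, overall

# Hypothetical test sample with K = 3 classes.
truth     = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0]
producers, users, overall = accuracy_ratios(truth, predicted, K=3)
```

These are descriptive ratios computed on the test sample; as the text notes, they do not by themselves give the sampling uncertainty of an acreage estimate.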

Accounting for the uncertainty due to classification errors is key, and that is why the data resulting from classification cannot be used directly as agricultural or environmental statistics. To take into account any uncertainties, including those due to classification errors, the CTM are integrated with the ground data, using statistical models in the way we show in the following sections.

3. Design-based approach: large area estimation

The approach to producing official statistics is based on a probabilistic scheme designed to select the sample from which to collect the required ground data. This scheme allows us to assign a probability of inclusion in the sample to each population unit. To estimate the characteristics of the study population, we use design-consistent estimators whose distribution is tightly concentrated around the true characteristics value, if the sample size is large enough. The uncertainty of these estimates is assessed using the statistical distribution defined by the inclusion probabilities.

The sample is selected using an area frame or a list frame [4, 5, 6]. An example of an area frame is that used by Spain: the national topographic map divided into sampling units, each of which is a square of 700 m on each side (49 hectares). The inclusion probability is equal to $n/N$, where $n$ and $N$ are the numbers of sampling units in the sample and in the population, respectively [6]. France and the United States also use area frames, but with points rather than polygons as sampling units: France to estimate crop acreage [7], and the United States to collect environmental data and develop a national inventory of natural resources [8, 9].

An example of a list frame is that used by Senegal, which is based on the 2013 population census [10]. The sampling unit is a household with agricultural activities. The sampling scheme consists of two stages. In the first stage, a sample of Enumeration Areas (EA) is selected with replacement and with probabilities proportional to the number of households in the EA. In the second stage, from each selected EA, a sample of households is selected without replacement and with equal probability among households with agricultural activities. To integrate the data collected on the household sample with the CTM data, a georeferenced pixel is selected in each parcel within the two-stage sample.

To assess uncertainties, and particularly those due to classification errors, we use statistical models. The choice of a model depends on ground data, and we consider linear models, for ground data collected at the parcel level, which can be treated as values of a continuous variable, and multinomial logit models for ground data collected at the pixel level, which can be treated as values of a categorical variable.

3.1 Ground data collected at the parcel level

Let the sampling unit, $i$, be a parcel or a cluster of parcels (say a polygon, as in Spain), and let $i=1,2,\dots,N$ be the set of sampling units of the population. We consider a linear model, $y_i=\mathbf{x}_i\beta+\varepsilon_i$, that relates the ground data observed on the study crop in the $i$th sampling unit, $y_i$, to the CTM data, $\mathbf{x}_i$, aggregated at the level of sampling unit $i$. For a given crop, $\mathbf{x}_i$ is a row vector with only two components: the first is the constant 1, $x_{1i}=1$, and the second, $x_{2i}=x_i$, is the number of pixels classified in the CTM as belonging to the study crop in $i$. For convenience, we denote $\mathbf{x}_i$ in the general form $\mathbf{x}_i=\mathrm{row}_{1\le l\le L}(x_{li})$, where $L=2$, $x_{1i}=1$, and $x_{2i}=x_i$. The parameter vector $\beta$ is unknown and must be estimated.

In sampling units not included in the sample, only $\mathbf{x}_i$ is observed. Given $\mathbf{x}_i$, the expected value of the unknown $y_i$ is modeled by $E(y_i)=\mathbf{x}_i\beta$, and the deviation of $y_i$ from $\mathbf{x}_i\beta$ is modeled by $\varepsilon_i$. This deviation is a source of uncertainty, mainly due to classification errors, and we will show how to assess it using the sampling variance of $\varepsilon_i$.

The survey variable total in the population is $y_N=\sum_{i=1}^{N}y_i$. As a result of including the component $x_{1i}=1$ in the model, $y_N$ can be written as $y_N=\mathbf{x}_NB_N$, where $\mathbf{x}_N=\sum_{i=1}^{N}\mathbf{x}_i=(N, x_N)$, $x_N=\sum_{i=1}^{N}x_i$ is the total number of pixels classified in the CTM as belonging to the study crop, and $B_N=(X_N^TX_N)^{-1}X_N^T\mathbf{y}_N$ designates the vector of population regression coefficients, $B_N=\mathrm{col}_{1\le l\le L}(B_l)$ (with $L=2$), where $X_N=\mathrm{col}_{1\le i\le N}\,\mathrm{row}_{1\le l\le L}(x_{li})$ and $\mathbf{y}_N=\mathrm{col}_{1\le i\le N}(y_i)$ [11]. Since $\mathbf{x}_i$ is known for every population unit $i=1,2,\dots,N$, $\mathbf{x}_N$ and $X_N$ are known. However, $\mathbf{y}_N$ is unknown and so is $B_N$.
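A short derivation shows why the intercept makes this identity hold. It is a sketch that uses only the definitions above, writing $\mathbf{x}_i=(1, x_i)$ for the row vector of unit $i$ and $\mathbf{y}_N$ for the column vector of the $y_i$.

```latex
\begin{align*}
X_N^{T}\left(\mathbf{y}_N - X_N B_N\right) &= 0
  && \text{(normal equations defining } B_N\text{)} \\
\sum_{i=1}^{N}\left(y_i - \mathbf{x}_i B_N\right) &= 0
  && \text{(first row, since } x_{1i}=1\text{)} \\
y_N = \sum_{i=1}^{N} y_i &= \Big(\sum_{i=1}^{N}\mathbf{x}_i\Big) B_N = \mathbf{x}_N B_N .
\end{align*}
```

In other words, the population residuals sum to zero, so projecting the known CTM totals through $B_N$ reproduces the unknown total exactly.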

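To estimate $B_N$, we use the data collected in the sample of size $n$, selected from the population with inclusion probabilities $\pi_i$ ($i=1,2,\dots,N$). The design-based estimator $\hat B_\pi=\left(\sum_{i=1}^{n}\frac{\mathbf{x}_i^T\mathbf{x}_i}{\pi_i}\right)^{-1}\sum_{i=1}^{n}\frac{\mathbf{x}_i^Ty_i}{\pi_i}$ is design-consistent for $B_N$, and as a result, the synthetic or projective estimator $\hat y_N=\mathbf{x}_N\hat B_\pi$ is design-consistent for $y_N$ [11].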

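The error of $\hat y_N$ as an estimator of $y_N$ is $\hat y_N-y_N=\mathbf{x}_N(\hat B_\pi-B_N)$, and its sampling variance is $V(\hat y_N-y_N)=\mathbf{x}_NV(\hat B_\pi-B_N)\mathbf{x}_N^T$, where $V(\hat B_\pi-B_N)=\left(\sum_{i=1}^{N}\mathbf{x}_i^T\mathbf{x}_i\right)^{-1}V\left(\sum_{i=1}^{n}\frac{\mathbf{x}_i^T\varepsilon_{iN}}{\pi_i}\right)\left(\sum_{i=1}^{N}\mathbf{x}_i^T\mathbf{x}_i\right)^{-1}$ and $\varepsilon_{iN}=y_i-\mathbf{x}_iB_N$. The sampling variance is a measure of uncertainty that depends both on the deviation $\varepsilon_{iN}$ between the true value of $y_i$ and its expected value based on $\mathbf{x}_i$ and on the inclusion probabilities, $\pi_i$. A design-consistent estimator of the sampling variance is $\hat V(\hat y_N-y_N)=V\left(\sum_{i=1}^{n}\frac{\hat\varepsilon_{iN}}{\pi_i}\right)$, where $V(\cdot)$ is the design-based variance and $\hat\varepsilon_{iN}=y_i-\mathbf{x}_i\hat B_\pi$.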

The asymptotic distribution of $\hat y_N$ is normal, $[V(\hat y_N-y_N)]^{-1/2}(\hat y_N-y_N)\to N(0,1)$, and can be used for constructing uncertainty measures in terms of probability for $y_N$, such as confidence intervals.

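To estimate the total of the survey variable, $y_{N_R}=\sum_{i=1}^{N_R}y_i$, in large areas that are part of the population, such as a region or a province $R$ with $N_R$ sampling units within a country, we use the sample of size $n$ selected from the population with inclusion probabilities $\pi_i$ ($i=1,2,\dots,N$) and the estimator $\hat y_{N_R}=\mathbf{x}_{N_R}\hat B_\pi+\frac{N_R}{\hat N_R}\sum_{i=1}^{n_R}\frac{y_i-\mathbf{x}_i\hat B_\pi}{\pi_i}$, where $n_R$ is the number of sampling units included in the sample that fall in $R$, and $\hat N_R=\sum_{i=1}^{n}\frac{I_i}{\pi_i}$ is an estimator of $N_R$, with $I_i=1$ if $i$ is in $R$ and $I_i=0$ otherwise. A measure of the uncertainty of $\hat y_{N_R}$ is its sampling variance, $V(\hat y_{N_R}-y_{N_R})=N_R^2\,V\left(\frac{1}{\hat N_R}\sum_{i=1}^{n_R}\frac{\varepsilon_{iN}}{\pi_i}\right)$, and a design-based estimator of $V(\hat y_{N_R}-y_{N_R})$ is $\hat V(\hat y_{N_R}-y_{N_R})=N_R^2\,V\left(\frac{1}{\hat N_R}\sum_{i=1}^{n_R}\frac{\hat\varepsilon_{iN}}{\pi_i}\right)$.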

3.1.1 Crop type map relative efficiency

The relative efficiency of CTM data is $RE_{CTM}=V_G\,(V_{G+CTM})^{-1}$, where $V_G$ is the current sampling variance based on ground data alone, and $V_{G+CTM}$ is the sampling variance based on the integration of ground and CTM data. If $RE_{CTM}>1$, then the current sampling variance may be reduced by the amount $V_G-V_{G+CTM}=(1-RE_{CTM}^{-1})V_G$, without increasing the sample size. Alternatively, the ground sample size can be reduced to $n_{G+CTM}=RE_{CTM}^{-1}\,n_G$ sampling units without loss of accuracy, where $n_G$ is the current sample size for ground data collection.

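For $\pi_i=n/N$, the sampling variance based on ground data alone is $V_G=V\left(\sum_{i=1}^{n}\frac{y_i}{n/N}\right)=N^2\left(1-\frac{n}{N}\right)\frac{1}{n(n-1)}\sum_{i=1}^{n}(y_i-\bar y)^2$, and the sampling variance using both ground and CTM data is $V_{G+CTM}=V\left(\sum_{i=1}^{n}\frac{y_i-\mathbf{x}_i\hat B_\pi}{n/N}\right)=N^2\left(1-\frac{n}{N}\right)\frac{1}{n(n-2)}\sum_{i=1}^{n}(y_i-\mathbf{x}_i\hat B_\pi)^2$. As a result, $RE_{CTM}\simeq\sum_{i=1}^{n}(y_i-\bar y)^2\left[\sum_{i=1}^{n}(y_i-\mathbf{x}_i\hat B_\pi)^2\right]^{-1}$ and $n_{G+CTM}=\sum_{i=1}^{n}(y_i-\mathbf{x}_i\hat B_\pi)^2\left[\sum_{i=1}^{n}(y_i-\bar y)^2\right]^{-1}n_G$.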
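The formulas for equal inclusion probabilities can be put together in a short Python sketch. It assumes NumPy and a synthetic population; the coefficients, noise level, population size, and function name are hypothetical and chosen only to make the example run, not taken from the Spanish data.

```python
import numpy as np

def integrate_ground_ctm(y, x, N, x_total):
    """Projective estimate of a total and CTM relative efficiency for pi_i = n/N.

    y: ground acreage per sampled unit; x: CTM pixel count per sampled unit;
    N: number of population units; x_total: CTM pixel count over the population.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])            # rows are (1, x_i)
    # With equal pi_i the weights cancel, so B_pi is the least-squares solution.
    B = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = np.array([N, x_total]) @ B              # x_N B_pi
    resid = y - X @ B
    scale = N**2 * (1 - n / N)
    v_ground = scale * np.sum((y - y.mean())**2) / (n * (n - 1))
    v_both = scale * np.sum(resid**2) / (n * (n - 2))
    return y_hat, v_ground, v_both, v_ground / v_both

# Synthetic population (hypothetical numbers, for illustration only).
rng = np.random.default_rng(1)
N = 5000
x_pop = rng.integers(0, 4901, N)                    # pixels of the crop per unit
y_pop = 0.5 + 0.009 * x_pop + rng.normal(0.0, 3.0, N)
sample = rng.choice(N, size=400, replace=False)     # simple random sample

y_hat, v_ground, v_both, re = integrate_ground_ctm(
    y_pop[sample], x_pop[sample], N, x_pop.sum())
n_reduced = int(np.ceil(400 / re))                  # n_{G+CTM} = n_G / RE
```

With a CTM that tracks the ground data closely, the residual sum of squares is much smaller than the total sum of squares, which is what drives `re` above 1 and `n_reduced` below the original sample size.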

3.1.2 Example

Our data source is Sen4Stat. The goal of this project was to develop an open-source system for generating CTM from Sentinel images, using data provided by the NSOs (https://www.esa-sen4stat.org/). We use field data on crops collected at the parcel level by the NSO of Spain to illustrate the integration of continuous ground data and CTM data using linear models.

3.1.2.1 Crop acreage estimates

We consider a large area of size 100 × 100 km in Castilla y León (Spain), with X coordinates (in meters) (300,000, 400,000) and Y coordinates (4,600,000, 4,700,000). The sampling unit is a square with a side of 700 m, and the sample size is 419 sampling units. The study crop is barley, and the ground data were observed at the parcel level. However, data were aggregated at the sampling unit level for calculations, so that $y_i$ represents the ground data on the barley acreage in sampling unit $i$ and, in $\mathbf{x}_i=(1, x_i)$, $x_i$ is the number of pixels classified in the CTM as belonging to barley in the same sampling unit $i$.

Model parameter estimates are $\hat B_\pi=(0.046, 0.093)^T$, and barley acreage estimates based on ground data alone and on the integration of ground and CTM data are found in Table 1. As a result of this integration, the accuracy of the estimates improves considerably: both the sampling error (the coefficient of variation, that is, the ratio between the square root of the sampling variance and the acreage estimate, in %) and the width of the 95% confidence interval (within whose limits the true value of the barley acreage lies with high probability) were reduced by more than half.

Data	Acreage (hectares)	95% CI lower limit (hectares)	95% CI upper limit (hectares)	95% CI width (hectares)	Sampling error (CV%)	Relative efficiency of CTM data
Ground	236165.4	215951.7	256379	40427.3	4.37	n/a
Ground + CTM	228550.1	219699.8	237400.3	17700.5	1.98	5.2

Table 1.

Barley acreage estimates in a 100 × 100 km area of Castilla y León, 2018.

The CTM relative efficiency, RECTM, was high: using CTM data, the current ground sample size could be reduced to less than one-fifth without loss of accuracy.
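The arithmetic behind this statement can be checked from the interval widths in Table 1 and the sample size of 419 sampling units given above. Because the width of a confidence interval is proportional to the square root of the sampling variance, the relative efficiency is approximately the squared ratio of the two widths.

```latex
\begin{align*}
RE_{CTM} &\approx \left(\frac{40427.3}{17700.5}\right)^{2} \approx 5.2, \\
n_{G+CTM} &= RE_{CTM}^{-1}\, n_G = \frac{419}{5.2} \approx 81 \text{ sampling units}
  \;<\; \frac{419}{5} \approx 84 .
\end{align*}
```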

3.1.2.2 Crop yield estimates

Let $y_{ij}$ be the yield of barley observed in parcel $j$ of sampling unit $i$, and let $x_{ij}=\mathrm{row}_{1\le l\le L}(x_{ijl})$ be a $1\times L$ row vector with $L=2$, $x_{ij1}=1$ in the first position, and a normalized difference vegetation index (NDVI) value, $x_{ij2}$, in the second position. The estimates based on ground data alone and on the integration of ground and NDVI data are found in Table 2.

Data	Yield (kg/hectare)	95% CI lower limit (kg/hectare)	95% CI upper limit (kg/hectare)	95% CI width (kg/hectare)	Sampling error (CV%)	Relative efficiency of RS data
Ground	4213.617	4093.97	4333.263	239.293	1.45	n/a
Ground + RS	4155.688	4033.231	4278.146	244.915	1.50	0.95

Table 2.

Barley yield estimates in a 100 × 100 km area of Castilla y León, 2018. RS data: NDVI.

Although the vegetation index contribution to explain crop yield is statistically significant, its contribution to improve yield estimator accuracy is low: the sampling error is of the same order of magnitude as if ground data alone were used. The relative efficiency is nearly 1, so the current sample size could barely be reduced without loss of accuracy. Additional research is required to make production estimates more reliable through improved yield estimation from RS data.

3.1.2.3 Crop production estimates

Table 3 shows the estimated production of barley, calculated as the product of the crop acreage estimate and the yield estimate.

Data	Production (tons; 1 ton = 1000 kg)	95% CI lower limit (tons)	95% CI upper limit (tons)	95% CI width (tons)	Sampling error (CV%)
Ground	987841	903640	1072042	168402	4.35
Ground + RS	965733	915194	1016272	101078	2.67

Table 3.

Barley production estimates in a 100 × 100 km area of Castilla y León, 2018.

Using RS data, the accuracy of the production estimator increased considerably: the sampling error fell from 4.35% to 2.67%, even though the RS data contribute little to the accuracy of the yield estimate.

3.1.2.4 Crop acreage estimates at the provincial level

We estimate barley acreage in that part of the provinces within the 100 × 100 km area of Castilla y León (Spain). Results are in Table 4.

Province	Ground data alone: acreage (hectares)	Ground data alone: sampling error (CV%)	Ground & CTM data: acreage (hectares)	Ground & CTM data: sampling error (CV%)	Relative efficiency of CTM data
León	6853.2	24.15	6834.5	16.20	2.3
Palencia	88602.0	7.36	90535.3	3.33	4.7
Valladolid	128209.5	5.57	119707.4	2.66	5.1
Zamora	12324.2	17.71	10948.2	8.16	6.4
Total area	235989.1	4.37	228028.5	1.98	5.2

Table 4.

Barley acreage estimates at provincial level.

The accuracy of the estimates at the provincial level is lower than for the total area, but the CTM relative efficiency is of the same order. Where the sampling error using ground data alone is high (León, 24.15%), integration with CTM data reduces it below the limit (20%) considered acceptable in official statistics.

3.2 Ground data collected at the pixel level

We use pixel $i$ as the sampling unit, in the same way that a point is used, that is, treating the ground data observed at the pixel level as values of a categorical variable. To each pixel $i=1,2,\dots,N$ of the study area, we associate a survey vector $y_i=\mathrm{col}_{1\le k\le K}(y_{ik})$, where $y_{ik}=1$ if crop $k$ covers pixel $i$ and $y_{ik}=0$ otherwise. $K$ is the total number of crops, so that the constraint $\sum_{k=1}^{K}y_{ik}=1$ holds.

The total of this survey vector in the population is $y_N=\sum_{i=1}^{N}y_i=\mathrm{col}_{1\le k\le K}\left(\sum_{i=1}^{N}y_{ik}\right)=\mathrm{col}_{1\le k\le K}(y_{Nk})$, where $y_{Nk}$ is the total number of pixels covered by crop $k$. It is assumed that a pixel is covered by a single crop (or that a class of mixed pixels is included), so that the area covered by $k$ is $A_k=a\,y_{Nk}$, where $a$ is the area of the terrain represented by a pixel.

To estimate $y_N$, we follow a design-based approach using ground data collected in a sample of $n$ pixels, selected from the population with inclusion probabilities $\pi_i$ ($i=1,2,\dots,N$). To integrate these sample data with CTM data, we use a multinomial logit model [12, 13, 14]. We assume that the survey vector $y_i=\mathrm{col}_{1\le k\le K}(y_{ik})$ follows a multinomial distribution $MN(1,\mu_i)$, where $\mu_i=\mathrm{col}_{1\le k\le K}(\mu_{ik})$ and $\mu_{ik}$ is the probability that crop $k=1,2,\dots,K$ covers pixel $i$ (the probability of $y_{ik}=1$ and $y_{ik'}=0$ for all $k'\ne k$), with the constraint $\sum_{k=1}^{K}\mu_{ik}=1$.

To link $\mu_{ik}$ with CTM data, we use a logit model, $\mu_{ik}=\dfrac{\exp(x_i\beta_k)}{\sum_{k'=1}^{K}\exp(x_i\beta_{k'})}$, where $x_i=\mathrm{row}_{1\le l\le L}(x_{il})$ is an indicator vector of the class where pixel $i$ is located within the CTM ($x_{il}=1$ if pixel $i$ is in the class $l$ and $x_{il}=0$ otherwise), and $\beta_k=\mathrm{col}_{1\le l\le L}(\beta_{kl})$ is an unknown parameter vector. To estimate $\mu_{ik}$, it is sufficient to obtain estimates of $\beta_k$ for $k=1,2,\dots,K-1$ [12]. The probability of crop $K$ is $\mu_{iK}=1-\sum_{k=1}^{K-1}\mu_{ik}$ and can be estimated using the estimates $\hat\beta_{\pi k}$ for $k=1,2,\dots,K-1$.

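The design-based parameter estimator $\hat B_\pi=\mathrm{col}_{1\le k\le K-1}(\hat B_{\pi k})=\mathrm{col}_{1\le k\le K-1}\,\mathrm{col}_{1\le l\le L}(\hat B_{\pi kl})$ is design-consistent and can be found iteratively using $\hat B_\pi^{(m+1)}=\hat B_\pi^{(m)}+\left[\sum_{i=1}^{n}\frac{H(y_i,\beta)}{\pi_i}\Big|_{\beta=\hat B_\pi^{(m)}}\right]^{-1}\sum_{i=1}^{n}\frac{b(y_i,\beta)}{\pi_i}\Big|_{\beta=\hat B_\pi^{(m)}}$ [11], where $b(y_i,\beta)=\mathrm{col}_{1\le k\le K-1}\,\mathrm{col}_{1\le l\le L}\left((y_{ik}-\mu_{ik})x_{il}\right)$ and $H(y_i,\beta)=\left[\mathrm{col}_{1\le k\le K-1}\,\mathrm{row}_{1\le k'\le K-1}\left(\delta_{kk'}\mu_{ik}-\mu_{ik}\mu_{ik'}\right)\right]\otimes x_i^Tx_i$, with $\delta_{kk'}=1$ if $k=k'$ and $\delta_{kk'}=0$ otherwise.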
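The iteration above is a weighted Newton-Raphson scheme, and a compact version can be sketched with NumPy. The function name, the fixed number of iterations, the zero starting value, and the synthetic sample (three CTM classes, three crops, equal inclusion probabilities) are assumptions made for this illustration; a production implementation would add a convergence test and guard against empty cells.

```python
import numpy as np

def fit_weighted_mnl(X, Y, pi, n_iter=25):
    """Design-weighted multinomial logit fit; returns B with shape (K-1, L).

    X : (n, L) CTM class indicators; Y : (n, K) one-hot ground crop;
    pi: (n,) inclusion probabilities. Crop K (last column of Y) is the reference.
    """
    n, L = X.shape
    K = Y.shape[1]
    w = 1.0 / pi
    B = np.zeros((K - 1, L))
    for _ in range(n_iter):
        expo = np.exp(X @ B.T)
        mu = expo / (1.0 + expo.sum(axis=1, keepdims=True))   # (n, K-1)
        # Weighted sum of b(y_i, beta): entry (k, l) is sum_i (y_ik - mu_ik) x_il / pi_i.
        score = ((Y[:, :K - 1] - mu) * w[:, None]).T @ X
        # Weighted sum of H(y_i, beta), built with a Kronecker product per pixel.
        H = np.zeros(((K - 1) * L, (K - 1) * L))
        for i in range(n):
            M = np.diag(mu[i]) - np.outer(mu[i], mu[i])
            H += w[i] * np.kron(M, np.outer(X[i], X[i]))
        B += np.linalg.solve(H, score.ravel()).reshape(K - 1, L)
    return B

# Synthetic sample (hypothetical): CTM class vs. crop observed on the ground.
rng = np.random.default_rng(2)
P = np.array([[0.70, 0.20, 0.10],      # P(crop | CTM class 0)
              [0.15, 0.70, 0.15],      # P(crop | CTM class 1)
              [0.20, 0.20, 0.60]])     # P(crop | CTM class 2)
n = 600
ctm = rng.integers(0, 3, n)
crop = np.array([rng.choice(3, p=P[c]) for c in ctm])
X, Y = np.eye(3)[ctm], np.eye(3)[crop]
pi = np.full(n, 0.01)

B = fit_weighted_mnl(X, Y, pi)
# Fitted probabilities of crops 1..K-1 for each CTM class (columns).
fitted = np.exp(B) / (1.0 + np.exp(B).sum(axis=0))
```

Because the covariates are class indicators, the model is saturated: with equal weights, the fitted probabilities in each CTM class should reproduce the observed crop proportions in that class, which gives a simple check on the iteration.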

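A design-consistent estimator of $y_N=\mathrm{col}_{1\le k\le K}(y_{Nk})$ is $\hat y_N=\sum_{i=1}^{N}\hat\mu_i+\frac{N}{\hat N_p}\sum_{i=1}^{n}\frac{y_i-\hat\mu_i}{\pi_i}=\mathrm{col}_{1\le k\le K-1}(\hat y_{Nk})$, where $\hat\mu_i=\mathrm{col}_{1\le k\le K-1}(\hat\mu_{ik})$, $\hat\mu_{ik}=\dfrac{\exp(x_i\hat B_{\pi k})}{1+\sum_{k'=1}^{K-1}\exp(x_i\hat B_{\pi k'})}$ for $k=1,2,\dots,K-1$, $\hat\mu_{iK}=1-\sum_{k=1}^{K-1}\hat\mu_{ik}=\dfrac{1}{1+\sum_{k'=1}^{K-1}\exp(x_i\hat B_{\pi k'})}$, $\hat N_p=\sum_{i=1}^{n}\frac{1}{\pi_i}$, and $\hat y_{Nk}=\sum_{i=1}^{N}\hat\mu_{ik}+\frac{N}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}-\hat\mu_{ik}}{\pi_i}$ is the estimator of the total number of pixels covered by crop $k$ for $k=1,2,\dots,K-1$; for crop $K$, it is $\hat y_{NK}=N-\sum_{k=1}^{K-1}\hat y_{Nk}$. The sampling covariance matrix of $\hat y_N$ is $V(\hat y_N)=N^2\,V\left(\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_i-\hat\mu_i}{\pi_i}\right)=\mathrm{col}_{1\le k\le K-1}\,\mathrm{row}_{1\le k'\le K-1}\left(N^2\,\mathrm{Cov}(\hat y_{rk},\hat y_{rk'})\right)$, where $\hat y_{rk}=\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}-\mu_{ik}}{\pi_i}$ is a function of $\hat N_{pk}=\sum_{i=1}^{n}\frac{y_{ik}}{\pi_i}$ (because $\hat N_p=\sum_{k=1}^{K}\hat N_{pk}$), the estimators of the total number of pixels covered by crop $k=1,2,\dots,K$ based on ground data alone, and of $\hat r_{kN}=\sum_{i=1}^{n}\frac{y_{ik}-\hat\mu_{ik}}{\pi_i}$, the estimator of the total of the deviations between $y_{ik}$ and $\mu_{ik}$ in the population.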
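Once the class probabilities have been fitted, the estimator of the totals is a synthetic sum over the CTM plus a weighted correction from the sample residuals. The sketch below assumes NumPy, invented fitted probabilities, and an invented sample of eight pixels; it also assumes a pixel area of 0.01 hectares (a 10 m pixel), which is not stated in the text. For convenience it computes all $K$ totals at once instead of obtaining crop $K$ by difference, which gives the same result because the probabilities and the indicators each sum to one.

```python
import numpy as np

def estimate_totals(mu_by_class, pop_class_counts, sample_class, Y, pi):
    """Estimated number of pixels per crop from fitted probabilities and a sample.

    mu_by_class      : (L, K) fitted crop probabilities for each CTM class.
    pop_class_counts : (L,) number of population pixels in each CTM class.
    sample_class     : (n,) CTM class of each sampled pixel.
    Y                : (n, K) one-hot ground observations; pi: (n,) inclusion probs.
    """
    N = pop_class_counts.sum()
    synthetic = pop_class_counts @ mu_by_class          # sum of mu_i over the population
    w = 1.0 / pi
    N_hat = w.sum()                                     # estimator of N from the sample
    resid = Y - mu_by_class[sample_class]               # y_i - mu_i for sampled pixels
    correction = (N / N_hat) * (w[:, None] * resid).sum(axis=0)
    return synthetic + correction

# Hypothetical inputs for L = 3 CTM classes and K = 3 crops.
mu_by_class = np.array([[0.70, 0.20, 0.10],
                        [0.15, 0.70, 0.15],
                        [0.20, 0.20, 0.60]])
pop_class_counts = np.array([5000, 3000, 2000])
sample_class = np.array([0, 0, 1, 1, 2, 2, 0, 1])
crops = np.array([0, 0, 1, 1, 2, 0, 1, 1])
Y = np.eye(3)[crops]
pi = np.full(8, 8 / 10000)

totals = estimate_totals(mu_by_class, pop_class_counts, sample_class, Y, pi)
area_ha = 0.01 * totals     # assumption: each pixel represents 0.01 hectares
```

The estimated totals add up to the population size $N$, which mirrors the constraint that every pixel is covered by exactly one crop.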

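The sampling variance is an uncertainty measure that depends both on the deviation between the true value, $y_i$, and its expected value estimate, $\hat\mu_i$, and on the sample design, $\pi_i$. We focus on the sampling variances $V(\hat y_{Nk})$ for $k=1,2,\dots,K-1$, which are the elements along the diagonal of $V(\hat y_N)$. Let $\hat G_k=\mathrm{row}_{1\le j\le K+1}(g_j)=(\hat r_{kN}, \hat N_{p1}, \hat N_{p2}, \dots, \hat N_{pK})$ be a row vector whose $K+1$ components are the estimators on which $\hat y_{rk}$ depends. The sampling variance of $\hat y_{Nk}$ is $V(\hat y_{Nk})=N^2\,\mathrm{row}_{1\le j\le K+1}\left(\frac{\partial\hat y_{rk}}{\partial g_j}\right)V(\hat G_k)\,\mathrm{col}_{1\le j\le K+1}\left(\frac{\partial\hat y_{rk}}{\partial g_j}\right)$, where $V(\hat G_k)=\mathrm{col}_{1\le j\le K+1}\,\mathrm{row}_{1\le j'\le K+1}\left(\mathrm{Cov}(g_j,g_{j'})\right)$ is the design-based covariance matrix of $\hat G_k$ and the factor $N^2$ carries over from the definition of $V(\hat y_N)$. The sampling covariance matrix of $\hat y_N$ is estimated by replacing $\mu_i$ with $\hat\mu_i$ in $V(\hat y_N)$.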

3.2.1 Crop type map relative efficiency

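The sampling variance of the estimator of $y_{Nk}$ based on ground data alone is $V_G=N^2\,V\left(\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}}{\pi_i}\right)$, and that based on its integration with CTM data is $V_{G+CTM}=N^2\,V\left(\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}-\mu_{ik}}{\pi_i}\right)$. The relative efficiency of CTM data for estimating the total number of pixels covered by crop $k$ is $RE_{CTM,k}=V\left(\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}}{\pi_i}\right)\times\left[V\left(\frac{1}{\hat N_p}\sum_{i=1}^{n}\frac{y_{ik}-\mu_{ik}}{\pi_i}\right)\right]^{-1}$.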

3.2.2 Example: crop acreage estimates

We use field data collected at the pixel level by the NSO of Senegal to illustrate the integration of categorical ground data and CTM data using multinomial logit models. The crop covering each pixel included in the sample selected in the Nioro Department was observed in 2021. In the CTM, pixels covering agricultural areas were classified into four crop types: (i) maize, (ii) millet, (iii) groundnut, and (iv) other crops. The data are encoded as $x_i=(1,0,0,0)$ for any pixel $i$ of the class maize, $x_i=(0,1,0,0)$ for any pixel $i$ of the class millet, $x_i=(0,0,1,0)$ for any pixel $i$ of the class groundnut, and $x_i=(0,0,0,1)$ for any pixel $i$ of the class other crops.

We focus on estimating the acreage of the two main crops: millet and groundnut. The remaining crops (mainly maize) are included in a third category called other, whose estimated probability is one minus the estimated probability of millet and groundnut. Model parameter estimates are in Table 5.

Crop	CTM class: Maize	CTM class: Millet	CTM class: Groundnut	CTM class: Other crops
Millet	−0.010655	1.47958334	0.88931346	8.76504093
Groundnut	−0.659204	−0.59871700	2.89873948	1.21898514

Table 5.

Model parameter estimates ($\hat B_{\pi,\mathrm{millet}}$ and $\hat B_{\pi,\mathrm{groundnut}}$).

Crop acreage estimates of millet and groundnut based on ground and CTM data are in Table 6.

Crop	Acreage (hectares)	Sampling error (CV%)	95% CI lower limit (hectares)	95% CI upper limit (hectares)	95% CI width (hectares)
Millet	89,215	4.11	81978.88	96330.4	14351.52
Groundnut	78,815	3.71	73089.15	84550.98	11461.82

Table 6.

Crop acreage estimates, Nioro (Senegal) 2021.

The CTM data efficiency estimate is in Table 7.

Crop type	Standard error of proportion estimator: ground data alone	Standard error of proportion estimator: ground and CTM data	Relative efficiency of CTM data
Millet	3.37	1.90	3.13
Groundnut	3.34	1.52	4.80

Table 7.

Efficiency of CTM data for crop acreage estimation, Nioro (Senegal).

The integration of ground data observed at the pixel level and CTM data improves the accuracy of the estimators based on ground data alone: the standard error is roughly halved, without loss of design-based consistency. As a result, the size of the current sample used for collecting ground data could be reduced to less than a third, without loss of accuracy.

Table 8 shows crop acreage estimates in the districts Medina Sabakh, Paoskoto, and Wack Ngouna.

District	Millet: acreage (hectares)	Millet: sampling error (CV%)	Groundnut: acreage (hectares)	Groundnut: sampling error (CV%)
Medina Sabakh	20067.21	8.6	19765.36	7.3
Paoskoto	38316.02	5.3	35018.23	4.0
Wack Ngouna	30831.77	11.9	24030.71	10.7
Total Nioro	89215.00	4.11	78815.00	3.71

Table 8.

Crop acreage estimates at the district level, Nioro (Senegal).

The estimate accuracy at the district level is lower than at the department level. However, the sampling error is within the standard limit (CV < 20%) for labeling as official statistics.

4. Model-based approach: small area estimation

In small areas, such as a municipality, the sample size is small, and as a result, the design-based approach is not accurate enough for most applications. In these areas, inference is based on the model, rather than the sampling design, and the estimates are not robust, but model-dependent.

4.1 Ground data collected at the parcel level

Let the sampling unit, $i$, be a parcel or a cluster of parcels (say a polygon, as in Spain), and let $y_{di}$ be the value of the survey variable in a sampling unit $i=1,2,\dots,N_d$ of the small area $d=1,2,\dots,D$. Let $\{(y_{di},\mathbf{x}_{di});\ i=1,2,\dots,n_d\}$ be the set of ground data, $y_{di}$, and CTM data, $\mathbf{x}_{di}=(1, x_{d2i})$, in the $n_d$ sampling units from the whole sample of size $n$ that fall in $d$. Here, $x_{d2i}$ is the number of pixels classified in the CTM as belonging to the study crop within the sampling unit $i$ of the small area $d$.

To estimate $y_{N_d}=\sum_{i=1}^{N_d}y_{di}$, the total of the survey variable in $d$, we use CTM data and the linear mixed model $y_{di}=x_{di}\beta+u_d+\varepsilon_{di}$, where $(u_d,\varepsilon_{di})$ are zero-mean independent random variables with variances $(\sigma_u^2,\sigma_e^2)$.

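The model-based estimator of $y_{N_d}$ is $\hat y_{N_d}=N_d\left[(1-\hat\gamma_d)\bar x_{N_d}\hat\beta+\hat\gamma_d\left(\bar y_{n_d}+(\bar x_{N_d}-\bar x_{n_d})\hat\beta\right)\right]$, where $\hat\beta=(X^T\hat V^{-1}X)^{-1}X^T\hat V^{-1}y$ is the generalized least-squares estimator of $\beta$, with $y=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le n_d}(y_{di})$, $X=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le n_d}(x_{di})$, $\hat V^{-1}=\mathrm{diag}_{1\le d\le D}(\hat V_d^{-1})$ and $\hat V_d^{-1}=\frac{1}{\hat\sigma_e^2}I_{n_d}-\frac{\hat\gamma_d}{n_d\hat\sigma_e^2}1_{n_d}1_{n_d}^T$, where $\hat\gamma_d=\hat\sigma_u^2/(\hat\sigma_u^2+\hat\sigma_e^2/n_d)$. Here, $\hat\sigma_e^2=e^Te/(n-D-1)$ and $\hat\sigma_u^2=\left[u^Tu-(n-2)\hat\sigma_e^2\right]/n_*$, with $n_*=\sum_{d=1}^Dn_d\left(1-n_d\bar x_d(X^TX)^{-1}\bar x_d^T\right)$, are unbiased estimators of the variance components, where $e^Te$ is the sum of squared residuals in the model fitted by ordinary least squares taking the small-area effect $u_d$ as fixed, and $u^Tu$ is the sum of squared residuals in the model fitted by ordinary least squares with $u_d=0$. $\bar x_{N_d}$ and $\bar x_{n_d}$ are the population and sample means of the vector $x_{di}$, respectively.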
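As an illustration of the estimator above, the following minimal sketch computes $\hat\beta$ and $\hat y_{N_d}$ for the two-column case $x_{di}=(1,x_{d2i})$. It takes the variance components as given (their estimation is not coded), uses only the Python standard library, and all function names and dictionary keys are hypothetical. The common factor $1/\hat\sigma_e^2$ in $\hat V_d^{-1}$ cancels in $\hat\beta$ and is omitted.

```python
def eblup_totals(areas, sigma_u2, sigma_e2):
    """Return ((beta0, beta1), totals) for y = beta0 + beta1 * x2 + u_d + e.

    `areas` is a list of dicts with hypothetical keys:
      'y'           sampled values of the survey variable,
      'x2'          sampled CTM pixel counts (same order as 'y'),
      'N'           number of sampling units in the area,
      'x2_pop_mean' population mean of the CTM pixel counts.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    summaries = []
    for area in areas:
        y, x2 = area["y"], area["x2"]
        n = len(y)
        gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n)
        y_bar, x_bar = sum(y) / n, sum(x2) / n
        # Accumulate X_d' V_d^{-1} X_d and X_d' V_d^{-1} y_d (times sigma_e2).
        a11 += n * (1.0 - gamma)
        a12 += n * x_bar * (1.0 - gamma)
        a22 += sum(v * v for v in x2) - gamma * n * x_bar ** 2
        b1 += n * y_bar * (1.0 - gamma)
        b2 += sum(u * v for u, v in zip(x2, y)) - gamma * n * x_bar * y_bar
        summaries.append((area["N"], area["x2_pop_mean"], gamma, y_bar, x_bar))

    # Solve the 2 x 2 generalized least-squares normal equations.
    det = a11 * a22 - a12 * a12
    beta0 = (a22 * b1 - a12 * b2) / det
    beta1 = (a11 * b2 - a12 * b1) / det

    totals = []
    for n_pop, x_pop_mean, gamma, y_bar, x_bar in summaries:
        synthetic = beta0 + beta1 * x_pop_mean
        survey_regression = y_bar + (x_pop_mean - x_bar) * beta1
        totals.append(n_pop * ((1.0 - gamma) * synthetic + gamma * survey_regression))
    return (beta0, beta1), totals


# Made-up data in which y = 2 + 3 * x2 holds exactly.
example_areas = [
    {"y": [5.0, 8.0, 11.0], "x2": [1.0, 2.0, 3.0], "N": 10, "x2_pop_mean": 2.5},
    {"y": [8.0, 17.0], "x2": [2.0, 5.0], "N": 4, "x2_pop_mean": 3.0},
]
print(eblup_totals(example_areas, sigma_u2=1.0, sigma_e2=2.0))
```

With noise-free data the estimator reproduces $N_d(\beta_0+\beta_1\bar x_{N_d})$ whatever the value of $\hat\gamma_d$, which is a convenient sanity check.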

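An unbiased estimator of the mean squared error of the total estimator, $\mathrm{MSE}(\hat y_{N_d})$, is [15]:

$$\widehat{\mathrm{MSE}}(\hat y_{N_d})=N_d^2\left(1-\frac{n_d}{N_d}\right)^2\left[h_{1d}(\hat\sigma_u^2,\hat\sigma_e^2)+h_{2d}(\hat\sigma_u^2,\hat\sigma_e^2)+2h_{3d}(\hat\sigma_u^2,\hat\sigma_e^2)\right],$$

where

$$h_{1d}(\hat\sigma_u^2,\hat\sigma_e^2)=\hat\gamma_d\frac{\hat\sigma_e^2}{n_d}+\left(1-\frac{n_d}{N_d}\right)^2\frac{N_d-n_d}{N_d^2},$$

$$h_{2d}(\hat\sigma_u^2,\hat\sigma_e^2)=\hat\sigma_e^2\left(\bar X_{N_d-n_d}-\hat\gamma_d\bar x_d\right)A^{-1}\left(\bar X_{N_d-n_d}-\hat\gamma_d\bar x_d\right)^T,$$

$$h_{3d}(\hat\sigma_u^2,\hat\sigma_e^2)=\frac{1}{n_d^2}\frac{1}{\left(\hat\sigma_u^2+\hat\sigma_e^2/n_d\right)^3}\left[(\hat\sigma_e^2)^2V(\hat\sigma_u^2)+(\hat\sigma_u^2)^2V(\hat\sigma_e^2)-2\hat\sigma_e^2\hat\sigma_u^2\,\mathrm{Cov}(\hat\sigma_e^2,\hat\sigma_u^2)\right],$$

with

$$A=\sum_{d=1}^D\left(\sum_{i=1}^{n_d}x_{di}^Tx_{di}-\hat\gamma_dn_d\bar x_{n_d}^T\bar x_{n_d}\right),$$

$$V(\hat\sigma_u^2)=\frac{2}{n_*^2}\left[\frac{(D-1)(n-2)}{n-D-1}(\hat\sigma_e^2)^2+2n_*\hat\sigma_e^2\hat\sigma_u^2+n_{**}(\hat\sigma_u^2)^2\right],\quad V(\hat\sigma_e^2)=\frac{2(\hat\sigma_e^2)^2}{n-D-1},\quad \mathrm{Cov}(\hat\sigma_e^2,\hat\sigma_u^2)=-\frac{D-1}{n_*}V(\hat\sigma_e^2).$$

Here, $n_{**}=\sum_{d=1}^Dn_d^2\left(1-n_d\bar x_{n_d}A_1^{-1}\bar x_{n_d}^T\right)+\mathrm{tr}\left[\left(A_1^{-1}\sum_{d=1}^Dn_d^2\bar x_{n_d}^T\bar x_{n_d}\right)^2\right]$. Because $A_1=\sum_{d=1}^D\sum_{i=1}^{n_d}x_{di}^Tx_{di}=X^TX$, $n_{**}$ may be simplified to $n_{**}=\sum_{d=1}^Dn_d^2\left(1-\bar x_{n_d}(X^TX)^{-1}\bar x_{n_d}^T\right)=n_*-n+\sum_{d=1}^Dn_d^2$.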

4.1.1 Example

Table 9 shows barley acreage estimates at the municipality level in that part of Zamora province within the 100 × 100 km area of Castilla y León (Spain).

Municipality	Sample size	Acreage (ha)	Sampling error (CV%)
Belver de los Montes	1	212.96	29.1
Castroverde	3	2914.22	8.0
Pinilla de Toro	4	963.30	10.0
Quintanilla del Monte	1	466.65	20.3
Toro	1	615.91	14.0
Vezdemarbán	3	1358.22	12.6
Villalpando	2	560.05	39.1
Villamayor de Campos	1	1056.23	11.1
Villanueva del Campo	1	784.03	13.2
Villar de Fallaves	1	844.16	11.0
Villardondiego	1	516.40	11.5
Villavendimio	1	656.07	10.4
Total Zamora	20	10948.2	8.16

Table 9.

Barley acreage estimates at municipality level.

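In most municipalities, including several where the sample is a single sampling unit, CTM data contribute to reducing the sampling error below the standard limit (20%) considered to label estimates as official statistics. The exceptions in Table 9 are Belver de los Montes (29.1%), Villalpando (39.1%), and Quintanilla del Monte (20.3%).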

4.2 Ground data collected at the pixel level

We consider a population of $i=1,2,\dots,N$ pixels grouped into $d=1,2,\dots,D$ small areas (municipalities) such that $N=\sum_{d=1}^DN_d$, where $N_d$ is the number of pixels in small area $d$. Let $y_{di}=\mathrm{col}_{1\le k\le K}(y_{dik})$ be the survey vector, where $y_{dik}=1$ when crop $k$ covers pixel $i$ and $y_{dik}=0$ otherwise.

To estimate $y_{N_d}=\sum_{i=1}^{N_d}y_{di}=\mathrm{col}_{1\le k\le K}(y_{N_dk})$, the total of the survey vector over small area $d$, where $y_{N_dk}=\sum_{i=1}^{N_d}y_{dik}$ is the number of pixels of $d$ covered by crop $k$, we use CTM data, and we assume that $y_{di}$ follows a multinomial distribution $MN(1,\mu_{di})$, where $\mu_{di}=\mathrm{col}_{1\le k\le K}(\mu_{dik})$ and $\mu_{dik}$ is the probability of $y_{dik}=1$ and $y_{dik'}=0$ for all $k'\ne k$, with the constraint $\sum_{k=1}^K\mu_{dik}=1$. To link $\mu_{dik}$ with CTM data, we use the logit mixed model $\mu_{dik}=\dfrac{\exp(x_{di}\beta_k+v_{dk})}{1+\sum_{k'=1}^{K-1}\exp(x_{di}\beta_{k'}+v_{dk'})}$, where $\mu_{dik}$ is the probability that crop $k$ covers pixel $i$ of $d$; $x_{di}=\mathrm{row}_{1\le l\le L}(x_{dil})$ is an indicator vector whose components are $x_{dil}=1$ if pixel $i$ of $d$ is classified in crop-type class $l=1,2,\dots,L$ and $x_{dil}=0$ otherwise; and $\eta_{dik}=x_{di}\beta_k+v_{dk}$ is a linear mixed predictor consisting of a fixed component, $x_{di}\beta_k$, and a random component, $v_{dk}$, associated with the small area and specific to each crop $k$.

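Here, $\beta_k=\mathrm{col}_{1\le l\le L}(\beta_{kl})$ is a vector of parameters measuring the effects of the CTM predictors $x_{di}$ on $\mu_{di}$. For example, for a pixel of crop-type class $l$, we have $\mu_{dik}=\dfrac{\exp(\beta_{kl}+v_{dk})}{1+\sum_{k'=1}^{K-1}\exp(\beta_{k'l}+v_{dk'})}$ for crops $k=1,2,\dots,K-1$ and $\mu_{diK}=\dfrac{1}{1+\sum_{k'=1}^{K-1}\exp(\beta_{k'l}+v_{dk'})}$ for crop $K$. The random effect, $v_{dk}$, accounts for the heterogeneity of the survey vector $y_{di}=\mathrm{col}_{1\le k\le K}(y_{dik})$ among small areas that is not explained by CTM data. It is assumed that $v_{dk}$ is a realization of a zero-mean random variable whose small-area variance is $V(v_{dk})=\sigma_k^2$ for crop $k$ and whose cross-crop covariance is $\mathrm{Cov}(v_{dk},v_{dk'})=\sigma_{kk'}^2$, so that the vector of small-area random effects $v=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le k\le K-1}(v_{dk})$ has mean $E(v)=0$ and covariance matrix $\Sigma=\theta\otimes I_D$, where $\theta=\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}(\sigma_{kk'}^2)$ and $\sigma_{kk'}^2=\sigma_k^2$ if $k=k'$.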
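The class probabilities defined by this logit model can be evaluated with a few lines of code. The sketch below is only illustrative: the function name is hypothetical, the numerical coefficients are the estimates reported in Table 10, the small-area effects are set to zero, and the third crop (taken here to be "maize and other") is assumed to be the baseline crop $K$.

```python
import math


def crop_probabilities(x, beta, v):
    """Return the K crop probabilities for one pixel.

    x    : indicator vector of length L (CTM class of the pixel),
    beta : list of K-1 coefficient vectors, each of length L,
    v    : list of K-1 small-area random effects.
    The last probability returned is that of the baseline crop K.
    """
    eta = [sum(x_l * b_l for x_l, b_l in zip(x, beta_k)) + v_k
           for beta_k, v_k in zip(beta, v)]
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta] + [1.0 / denom]


# Table 10 estimates; CTM classes ordered as (maize and other, millet, groundnut).
beta_hat = [
    [0.109, 1.429, 0.559],    # crop 1: millet
    [-0.662, -0.587, 2.522],  # crop 2: groundnut
]
no_area_effect = [0.0, 0.0]

pixel_classified_millet = [0, 1, 0]
print(crop_probabilities(pixel_classified_millet, beta_hat, no_area_effect))
```

For a pixel that the CTM classifies as millet, the largest of the three probabilities is that of millet, as expected from the sign and size of the estimates.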

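To estimate $y_{N_d}$, we observe $y_{di}$ only in the $n_d$ sample pixels that fall within small area $d$, a small part of the full sample of size $n=\sum_{d=1}^Dn_d$; the values of $y_{di}$ in the remaining $N_d-n_d$ pixels are unknown, and we want to predict them using the model. We decompose $y_{N_d}$ into two aggregates, $y_{N_d}=\sum_{i_s=1}^{n_d}y_{di_s}+\sum_{i_r=1}^{N_d-n_d}y_{di_r}$: one known, $\sum_{i_s=1}^{n_d}y_{di_s}$, and the other, $\sum_{i_r=1}^{N_d-n_d}y_{di_r}$, unknown. We use $\hat y_{N_d}=\sum_{i_s=1}^{n_d}y_{di_s}+\sum_{i_r=1}^{N_d-n_d}\hat\mu_{di_r}$ as a predictor of $y_{N_d}$, where $\hat\mu_{di_r}=\mathrm{col}_{1\le k\le K}(\hat\mu_{di_rk})$ with $\hat\mu_{di_rk}=\dfrac{\exp(x_{di_r}\hat\beta_k+\hat v_{dk})}{1+\sum_{k'=1}^{K-1}\exp(x_{di_r}\hat\beta_{k'}+\hat v_{dk'})}$ for crops $k=1,2,\dots,K-1$, and $\hat\mu_{di_rK}=\dfrac{1}{1+\sum_{k'=1}^{K-1}\exp(x_{di_r}\hat\beta_{k'}+\hat v_{dk'})}$ for crop $K$. Here, $\hat\beta_k$ is the estimate of $\beta_k$ and $\hat v_{dk}$ is the predictor of the small-area random effect $v_{dk}$ in $v=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le k\le K-1}(v_{dk})$. To estimate both the fixed-effect parameters, $\beta=\mathrm{col}_{1\le k\le K-1}(\beta_k)$, and the random-effect parameters, $\theta=\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}(\sigma_{kk'}^2)$, we use maximum-likelihood estimators and the full sample of size $n=\sum_{d=1}^Dn_d$. As a result, both predictors, $\hat\mu_{di_r}$ and $\hat y_{N_d}$, are asymptotically unbiased.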
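A minimal sketch of this predictor follows. Because $x_{di}$ is an indicator of the CTM class, the non-sampled pixels can be grouped by class, so only the number of non-sampled pixels per class is needed. The function names, the sample, and the pixel counts are made up for illustration; the coefficients are the Table 10 estimates, with the third crop assumed to be the baseline.

```python
import math


def class_probabilities(class_index, beta, v):
    """Crop probabilities for a pixel whose CTM class is `class_index`."""
    eta = [beta_k[class_index] + v_k for beta_k, v_k in zip(beta, v)]
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta] + [1.0 / denom]


def predict_area_totals(y_sample, unsampled_class_counts, beta, v):
    """Observed crop counts in the sample plus predicted counts elsewhere.

    y_sample               : list of one-hot vectors (length K) for sampled pixels,
    unsampled_class_counts : number of non-sampled pixels in each CTM class.
    """
    n_crops = len(beta) + 1
    totals = [0.0] * n_crops
    for y in y_sample:  # known part of the total
        for k in range(n_crops):
            totals[k] += y[k]
    for class_index, count in enumerate(unsampled_class_counts):  # predicted part
        mu = class_probabilities(class_index, beta, v)
        for k in range(n_crops):
            totals[k] += count * mu[k]
    return totals


beta_hat = [[0.109, 1.429, 0.559], [-0.662, -0.587, 2.522]]
v_hat = [0.0, 0.0]                                 # hypothetical area effects
sample = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]         # made-up ground observations
unsampled = [10, 20, 30]                           # made-up pixel counts per CTM class
print(predict_area_totals(sample, unsampled, beta_hat, v_hat))
```

The predicted totals add up to the number of pixels in the area, since the probabilities of each pixel sum to one. Multiplying the totals by the pixel area converts them to acreage.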

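The mean squared error of the predictor $\hat y_{N_d}$ of $y_{N_d}$ is $\mathrm{MSE}(\hat y_{N_d})=E\left[(\hat y_{N_d}-y_{N_d})(\hat y_{N_d}-y_{N_d})^T\right]=E\left[\left(\sum_{i_r=1}^{N_d-n_d}(\hat\mu_{di_r}-y_{di_r})\right)\left(\sum_{i_r=1}^{N_d-n_d}(\hat\mu_{di_r}-y_{di_r})\right)^T\right]$, where $\sum_{i_r=1}^{N_d-n_d}(\hat\mu_{di_r}-y_{di_r})=\sum_{i_r=1}^{N_d-n_d}\left[(\hat\mu_{di_r}-\mu_{di_r})-(y_{di_r}-\mu_{di_r})\right]=\hat\tau_{dr}-\tau_{dr}-\varepsilon_{dr}$. Here, $\tau_{dr}=\sum_{i_r=1}^{N_d-n_d}E(y_{di_r})=\sum_{i_r=1}^{N_d-n_d}\mu_{di_r}$ is the expected value of the survey vector in the $N_d-n_d$ non-observed pixels, $\hat\tau_{dr}=\sum_{i_r=1}^{N_d-n_d}\hat\mu_{di_r}$ is the predictor of $\tau_{dr}$, and $\varepsilon_{dr}=\sum_{i_r=1}^{N_d-n_d}\varepsilon_{di_r}$, with $\varepsilon_{di_r}=y_{di_r}-\mu_{di_r}$, is the unobservable error vector.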

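Then, $\mathrm{MSE}(\hat y_{N_d})=E\left[(\hat\tau_{dr}-\tau_{dr}-\varepsilon_{dr})(\hat\tau_{dr}-\tau_{dr}-\varepsilon_{dr})^T\right]=\mathrm{MSE}(\hat\tau_{dr})+E(\varepsilon_{dr}\varepsilon_{dr}^T)-E\left[(\hat\tau_{dr}-\tau_{dr})\varepsilon_{dr}^T\right]-E\left[\varepsilon_{dr}(\hat\tau_{dr}-\tau_{dr})^T\right]$. If we assume that $\varepsilon_{dr}$ and $\hat\tau_{dr}$ are independent, then the covariance terms are null, $E\left[\varepsilon_{dr}(\hat\tau_{dr}-\tau_{dr})^T\right]=0$, and the mean squared error of $\hat y_{N_d}$ reduces to $\mathrm{MSE}(\hat y_{N_d})=\mathrm{MSE}(\hat\tau_{dr})+E(\varepsilon_{dr}\varepsilon_{dr}^T)$, where $E(\varepsilon_{dr}\varepsilon_{dr}^T)\simeq E(\varepsilon_{dr}\varepsilon_{dr}^T\mid v_d)=\sum_{i_r=1}^{N_d-n_d}V(y_{di_r})$ and $V(y_{di_r})$ is the covariance matrix of $y_{di_r}$.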

See Appendix 2 for more details on parameter estimation, prediction of random effects, and mean squared error estimation.

4.2.1 Example

We use the field data described in Section 3.2.2. Here, we encode the CTM data as follows: $x_i=(1,0,0)$ for a pixel $i$ in the crop-type class maize and other crops, $x_i=(0,1,0)$ for the crop-type class millet, and $x_i=(0,0,1)$ for the crop-type class groundnut. We assume that the random effects have the same variance $\sigma^2$ and are independent among crops: $\mathrm{Cov}(v_{dk},v_{dk'})=0$ for $k\ne k'$ and $k=1,2,\dots,K-1$. The model parameter estimates are in Table 10. The values of the Wald $z$ statistics for the $\beta_k$ parameters and of the likelihood ratio $\lambda$ for the variance of the random effects, $\sigma^2$, are shown in parentheses.

Crop k	β̂kl, CTM class maize and other	β̂kl, CTM class millet	β̂kl, CTM class groundnut	Variance of the small-area effect σ̂2
Millet	0.109 (z = 0.240)	1.429 (z = 6.236)	0.559 (z = 1.608)	0.028 (λ = 24.636)
Groundnut	−0.662 (z = −1.153)	−0.587 (z = −1.711)	2.522 (z = 8.494)	0.028 (λ = 24.636)

Table 10.

Model parameter estimates.

According to the Wald z statistics, the parameters associated with the crop-type classes millet and groundnut are significant for both millet and groundnut (at level < 0.1). The parameters associated with the crop-type class maize and other are not significant (at the 10% level) for either millet or groundnut. According to the likelihood ratio test, the random small-area effect is also significant (at level < 0.01).

Crop-acreage estimates for millet and groundnut in four communities in Senegal’s Nioro region are in Table 11.

Municipality	Population (pixels)	Sample (pixels)	Millet acreage, ha (RRMSE%)	Groundnut acreage, ha (RRMSE%)
Paos Koto	1,163,948	6	4500 (26.4)	1957 (56.5)
Taiba Niassene	1,163,096	14	4902 (34.3)	2000 (74.5)
Keur Madongo	302,088	2	1281 (24.9)	577 (45.1)
Wack Ngouna	2,766,043	12	9280 (25.0)	4167 (47.1)

Table 11.

Crop acreage estimation at municipality level, Nioro (Senegal).

The integration of ground and CTM data makes it possible to evaluate the accuracy of the estimates using the relative root mean squared error (RRMSE), which is the ratio between the root mean squared error and the crop acreage. It is remarkable that, in small areas where the sampling fraction is almost null, this integration brings the estimation error for millet (RRMSE between about 25% and 34%) near the 20% limit that allows estimates to be labeled as official. For groundnut, the root mean squared error is similar, but the acreage is about half that of millet, so the relative root mean squared error is about twice that of millet.

5. Optimizing the sample design

The objective is to minimize the sampling variance, that is, to maximize the accuracy of the estimates, without increasing the cost. The sampling variance depends on the spatial correlation structure of the survey variable, $y_i$. The correlation $\rho_y(\mathrm{dist}(s_i,s_{i'}))$ between two observations $(y_i,y_{i'})$ at the points with coordinates $s_i$ and $s_{i'}$ is a positive and decreasing function of the distance, $d=\mathrm{dist}(s_i,s_{i'})$: it decreases as the distance between the points increases. Details on how to use a land-use map to estimate the spatial correlation structure of crop data, using variogram and correlogram models [16], and on how to use this structure for sample design are in [17]. Here, we propose to use a CTM to estimate the parameters of the correlogram model. We use the estimated correlogram to evaluate the (anticipated) sampling variance [18] as a function of the design variables (sampling unit size and sample size) [19, 20].

We consider a simple case: estimating the total of the survey variable in the population, $y_N=\sum_{i=1}^Ny_i$, using a simple random sampling (SRS) scheme (equal probability of inclusion, $\pi_i=n/N$) and the estimator $\hat y_N^{SRS}=(N/n)\sum_{i=1}^ny_i$. The sampling variance is $V(\hat y_N^{SRS})=N^2(1-n/N)S^2/n$, where $S^2=\sum_{i=1}^N(y_i-\bar y_N)^2/(N-1)$ and $\bar y_N=y_N/N$ are the variance and the mean of the survey variable in the population, respectively.

Pixels are arranged in rows and columns. We take the sampling unit to be a square block of $n_0$ pixels and identify each sampling unit $i=1,2,\dots,N$ by the row and column of the pixel in the lower left corner of the block. The distance $d=\mathrm{dist}(s_i,s_{i'})$ can be expressed as $d=\sqrt{u^2+v^2}$, where $u$ is the number of sampling units between $s_i$ and $s_{i'}$ in the row direction, and $v$ is the number of sampling units between $s_i$ and $s_{i'}$ in the column direction. The two models most often used to describe the structure of $\rho_y(d)$ are the exponential, $\rho(u,v;a,\tau)=(1-\tau)e^{-d/a}$, and the spherical, $\rho(u,v;a,\tau)=(1-\tau)\left(1-\frac{3}{2}\frac{d}{a}+\frac{d^3}{2a^3}\right)$ if $d\le a$ and $\rho(u,v;a,\tau)=0$ if $d>a$.
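The two correlogram models can be written as short functions. This is a sketch with hypothetical function names; the spherical model is coded in its standard form (without an exponential factor), and the correlation at distance zero is set to 1 by convention, so the nugget appears as a jump to $1-\tau$ just beyond the origin.

```python
import math


def lag_distance(u: int, v: int) -> float:
    """Distance between two sampling units separated by u rows and v columns."""
    return math.hypot(u, v)


def exponential_correlogram(d: float, a: float, tau: float) -> float:
    """Exponential model: (1 - tau) * exp(-d / a) for d > 0."""
    if d == 0:
        return 1.0
    return (1.0 - tau) * math.exp(-d / a)


def spherical_correlogram(d: float, a: float, tau: float) -> float:
    """Spherical model: (1 - tau) * (1 - 1.5 d/a + 0.5 (d/a)^3) up to the range a."""
    if d == 0:
        return 1.0
    if d > a:
        return 0.0
    ratio = d / a
    return (1.0 - tau) * (1.0 - 1.5 * ratio + 0.5 * ratio ** 3)


# Illustrative parameter values only.
print(spherical_correlogram(lag_distance(3, 4), a=46.17, tau=0.57))
```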

The model parameters are the range $a$ and the ratio $\tau=\tau_0/(\tau_0+\tau_d)$. Here, $\tau_0$ is the nugget effect, that is, the variation at or near the origin (independent of distance); $\tau_d$ is the partial sill (a function of the distance $d$ between sampling points); and $\tau_0+\tau_d$ is the sill, that is, the maximum variation far from the origin.

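The anticipated variance is the expected value (based on the correlogram model) of the sampling variance, $AV(\hat y_N^{SRS})=E\left[V(\hat y_N^{SRS})\right]=N^2(1-n/N)E(S^2)/n$, where $E(S^2)=\sigma^2\Psi_N(n_0,a,\tau)$, $\sigma^2$ is the variance of $y_i$, and $\Psi_N(n_0,a,\tau)=n_0\left[\frac{Nn_0-1}{N-1}\left(1-\Phi_N(n_0,a,\tau)\right)-\frac{N(n_0-1)}{N-1}\left(1-\Phi_{n_0}(a,\tau)\right)\right]$. Here, $\Phi_N(n_0,a,\tau)$ is the average correlation between pairs of pixels over the $\binom{Nn_0}{2}$ pairs in the population, and $\Phi_{n_0}(a,\tau)$ is the average correlation between pairs of pixels over the $\binom{n_0}{2}$ pairs in a sampling unit.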
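A minimal sketch of the anticipated variance follows, with the two average correlations supplied as inputs. The function and argument names are hypothetical, and $N$ is treated as the number of sampling units of $n_0$ pixels each, which is an assumption about the notation.

```python
def psi(n_units: int, n0: int, phi_pop: float, phi_unit: float) -> float:
    """Psi factor linking E(S^2) to the pixel-level variance sigma^2.

    phi_pop  : average correlation over all pairs of pixels in the population,
    phi_unit : average correlation over pairs of pixels within a sampling unit.
    """
    between = (n_units * n0 - 1) / (n_units - 1) * (1.0 - phi_pop)
    within = n_units * (n0 - 1) / (n_units - 1) * (1.0 - phi_unit)
    return n0 * (between - within)


def anticipated_variance(n_units, n, n0, sigma2, phi_pop, phi_unit):
    """N^2 (1 - n/N) E(S^2) / n with E(S^2) = sigma^2 * Psi."""
    expected_s2 = sigma2 * psi(n_units, n0, phi_pop, phi_unit)
    return n_units ** 2 * (1.0 - n / n_units) * expected_s2 / n


# With n0 = 1 the factor reduces to 1 - phi_pop.
print(psi(40000, 1, phi_pop=0.032, phi_unit=0.0))
```

Two limiting cases are easy to check by hand: with $n_0=1$ the factor is $1-\Phi_N$, and with no correlation at all it equals $n_0$, the variance of a sum of $n_0$ uncorrelated pixels in units of $\sigma^2$.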

The sample design is optimized by finding the sampling unit size, $n_0$, and the sample size, $n$, that minimize the anticipated variance, $\min_{n,n_0}AV(\hat y_N^{SRS})=\min_{n,n_0}N^2(1-n/N)(\sigma^2/n)\Psi_N(n_0,a,\tau)$, subject to the cost constraint $C_0+C_wnn_0+C_k\sqrt{An}\le C$. Here, $C$ is the total budget, $C_0$ is a fixed cost independent of the sample size, $C_w$ is the interviewing cost, $C_k$ is a cost per distance unit, and $A$ is the surveyed area. The solution to this problem is the optimum sampling unit size $n_0$ and the optimum sample size $n$. In addition to the budget, $C$, this optimum solution is conditional on the correlogram model parameters $(a,\tau)$.

5.1 Example

For illustration, we use the case study in [20]. The data source in this case study was a land-use map, but we draw a parallel between the map and the satellite image to show how the proposed approach should be applied when a CTM is used instead of a land-use map. The map is digitized, but the minimum level of disaggregation is a square area of side 1 km, so the minimum sampling unit size in the CTM would be a square block of side 100 pixels of 10 m, and the survey variable, $y_i$, would be the number of pixels of the study crop type within sampling unit $i$.

The studied region was an area of 200 km by 200 km in Castilla y León, Spain, that is, 20,000 by 20,000 pixels of 10 m and $N=40000$ sampling units of 10,000 pixels. We computed the empirical variogram for nonirrigated herbaceous crops using the moment estimator [17], $2\hat\gamma(d)=\frac{1}{|N(d)|}\sum_{N(d)}(y_i-y_{i'})^2$, where $N(d)=\{(s_i,s_{i'}):\mathrm{dist}(s_i,s_{i'})=d\}$. A spherical model was fitted to the empirical variogram, giving the parameter estimates $\hat\tau_0=728.96$, $\hat\tau_d=539.60$, and $\hat a=46.17$. Using these parameter estimates, we compute the estimate $\Phi_N(\hat a,\hat\tau)=0.032$. The parameter $\sigma^2$ is estimated by $\hat\sigma^2=S^2/\left(1-\Phi_N(\hat a,\hat\tau)\right)$, where $S^2=\frac{1}{N-1}\sum_{i=1}^N(y_i-\bar y)^2$ and $\bar y=\frac{1}{N}\sum_{i=1}^Ny_i$.
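The moment estimator of the variogram can be sketched for data on a regular grid as follows. This is a brute-force illustration with hypothetical names, not the procedure used in the case study: it pools all pairs of cells at the same Euclidean lag distance, counts each pair once, and returns the semivariance $\hat\gamma(d)$, that is, half of $2\hat\gamma(d)$.

```python
import math
from collections import defaultdict


def empirical_semivariogram(grid, max_lag):
    """Moment estimator of gamma(d) on a rectangular grid of sampling units.

    grid    : list of rows, each a list of survey values y_i,
    max_lag : largest row or column offset considered.
    Returns a dict mapping each lag distance d to gamma_hat(d).
    """
    rows, cols = len(grid), len(grid[0])
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u in range(0, max_lag + 1):
        for v in range(-max_lag, max_lag + 1):
            if u == 0 and v <= 0:
                continue  # skip the zero lag and mirrored duplicates
            key = u * u + v * v  # squared distance, exact integer key
            for r in range(rows - u):
                for c in range(max(0, -v), min(cols, cols - v)):
                    diff = grid[r][c] - grid[r + u][c + v]
                    sums[key] += diff * diff
                    counts[key] += 1
    return {math.sqrt(k): sums[k] / (2 * counts[k]) for k in sorted(sums)}


# Made-up grid with a west-to-east gradient: y equals the column index.
gradient = [[float(c) for c in range(5)] for _ in range(4)]
print(empirical_semivariogram(gradient, max_lag=1))
```

A variogram model (for example, the spherical one) would then be fitted to these empirical values to obtain the nugget, partial sill, and range.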

In addition to the sampling unit of minimum size, $n_0=1$, we considered a set of sampling unit sizes ranging from $n_0=2$ to $n_0=10$ times the minimum size. To estimate $\Phi_{n_0}(a,\tau)$, it is assumed that the sampling unit of size $n_0$ is a square lattice of side $n_0$ and that the distance between a pair of its components is $d=\sqrt{2}\,n_0$, so that $\Phi_{n_0}(\hat a,\hat\tau)=(1-\hat\tau)\left(1-\frac{3}{2}\frac{\sqrt{2}\,n_0}{\hat a}+\frac{(\sqrt{2}\,n_0)^3}{2\hat a^3}\right)$.

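The optimization problem is

$$\min_{n,n_0}\;N^2\left(1-\frac{n}{N}\right)\frac{\hat\sigma^2}{n}\,n_0\left[\frac{Nn_0-1}{N-1}(1-0.032)-\frac{N(n_0-1)}{N-1}\left(1-(1-\hat\tau)\left(1-\frac{3}{2}\frac{\sqrt{2}\,n_0}{\hat a}+\frac{(\sqrt{2}\,n_0)^3}{2\hat a^3}\right)\right)\right]\quad\text{s.t.}\quad C_0+C_wnn_0+C_k\sqrt{An}\le C.$$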

To find the solution to this problem, we use the cost function coefficients in [20] (C=120000$, C0=20500$, Cw=50$, and Ck=1$/km) and the nloptr package [21]. The solution is in Table 12.

Lower bound for the sampling unit size	1	2	3	4	5	10
Optimal sample size: n	1661	891	611	465	375	192
Optimal sampling unit size: n0	1	2	3	4	5	10
Minimum anticipated variance	1.1e+9	1.4e+9	1.7e+9	2.0e+9	2.3e+9	3.8e+9

Table 12.

SRS: Optimum solution for several values of lower bound for the sampling unit size.

The optimum sampling unit size is always the lower bound allowed. The minimum anticipated variance increases from 1.1e+9 to 3.8e+9 when the sampling unit size increases from 1 to 10. In other words, with sampling units of size 10, the sampling variance is about 3.5 times that obtained with sampling units of size 1, and the standard error is about 1.86 times larger.

6. Concluding remarks

We have shown how to integrate CTM and ground data to produce agricultural and environmental statistics, along with measures of their accuracy that take into account any source of uncertainty, including classification errors. We have developed prototypes for this integration at the national, provincial, and municipal levels. For large areas (national and provincial level), these prototypes are based on the design of a probabilistic scheme to select the sample from which to collect the ground data, and we use a statistical model to integrate this sample with CTM data. The resulting statistics are design-consistent and robust; that is, they are based on the sampling design rather than on the model. The sample design is key, and we have shown how to optimize it using CTM. For small areas (municipal level), the prototypes are based solely on the model so that the resulting statistics are model-dependent.

To facilitate the transference of these prototypes, we have included examples of their application to data provided by two National Statistical Offices: Spain and Senegal. The former is an example of countries where ground data are collected at the parcel level, and we use linear models for their integration with CTM. The latter is an example of countries where ground data are observed at the pixel level, and we use multinomial logit models for their integration with CTM. We show how to estimate the model parameters and how to use CTM data to compute the required statistics, as well as their accuracy measures in these two countries, at both large and small area levels.

We have evaluated the efficiency of CTM data for improving the methods currently used to produce official agricultural statistics in these two countries, and we have found it to be high: using CTM data, the current sample size at the level of large areas could be reduced to less than one-fifth in the case of Spain and to between one-third and one-fifth in the case of Senegal, without loss of either robustness or accuracy. At the small area level, where the sampling fraction is almost null, we have shown how to produce estimates that approach or meet the accuracy required to be labeled as official statistics, that is, useful for most applications, without increasing costs.

Acknowledgments

The research results presented in this chapter were supported by the European Space Agency, in the Sen4Stat project framework [Contract No. 4000127181/19/I-NS]. The national statistical offices of Senegal and Spain provided the ground data used in the examples.

A. Appendix 1—pixels classification

We want to divide the $i=1,2,\dots,N$ pixels into $k=1,2,\dots,K$ groups, each corresponding to a discriminable class on the terrain identified by a crop type or land use. Let $f(i\in k,x_i)$ be the joint probability function of the events "pixel $i$ is in terrain class $k$ ($i\in k$), and the spectral measure in $i$ is $x_i$". The decision rule is to "classify $i$ in class $k$ if and only if $f(i\in k,x_i)=\max_{k'=1,2,\dots,K}f(i\in k',x_i)$." Maximum likelihood, maximum entropy, and multinomial regression are three approaches to implementing this rule.

A.1 Maximum likelihood

This approach seeks to minimize the expected loss due to classification errors. Let $\lambda_i(k'\mid k)$ be a loss function measuring the cost incurred when pixel $i$ is classified in class $k'\ne k$ but is actually of class $k$. The expected loss is $E[\lambda_i(k'\mid k)]=\sum_{k'=1}^K\lambda_i(k'\mid k)f(i\in k'\mid x_i)$, where $f(i\in k'\mid x_i)=f(i\in k')f(x_i\mid i\in k')/f(x_i)$. Using this loss function, we consider the set of $K$ discriminant functions $g_k(x_i)=-\sum_{k'=1}^K\lambda_i(k'\mid k)f(i\in k')f(x_i\mid i\in k')/f(x_i)$, for $i=1,2,\dots,N$ and $k=1,2,\dots,K$, and we classify pixel $i$ in class $k$ if $g_k(x_i)=\max_{k'=1,2,\dots,K}g_{k'}(x_i)$, that is, if $-g_k(x_i)=\min_{k'=1,2,\dots,K}\left[-g_{k'}(x_i)\right]$. Note that this classifier minimizes the expected loss, as desired. A simple loss function is $\lambda_i(k'\mid k)=0$ if $k=k'$ and $\lambda_i(k'\mid k)=1$ otherwise. Using this loss function, the discriminant function is $g_k(x_i)=-\sum_{k'\ne k}f(x_i\mid i\in k')f(i\in k')/f(x_i)$, and since $f(x_i)$ is a constant for a given $x_i$, an equivalent discriminant function is $g_k'(x_i)=g_k(x_i)f(x_i)=-\sum_{k'\ne k}f(x_i\mid i\in k')f(i\in k')$. Note that $f(x_i)=\sum_{k'=1}^Kf(i\in k')f(x_i\mid i\in k')=\sum_{k'\ne k}f(x_i\mid i\in k')f(i\in k')+f(x_i\mid i\in k)f(i\in k)$, and as a result, $g_k'(x_i)=-f(x_i)+f(x_i\mid i\in k)f(i\in k)$. Since $f(x_i)$ is a constant for a given $x_i$, an equivalent discriminant function is $g_k''(x_i)=g_k'(x_i)+f(x_i)=f(x_i\mid i\in k)f(i\in k)$.

Thus, we consider the set of $K$ discriminant functions $g_k''(x_i)=f(x_i\mid i\in k)f(i\in k)$, for $i=1,2,\dots,N$ and $k=1,2,\dots,K$, and we classify pixel $i$ in class $k$ if $g_k''(x_i)=\max_{k'=1,2,\dots,K}g_{k'}''(x_i)=\max_{k'=1,2,\dots,K}f(x_i\mid i\in k')f(i\in k')$.

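Note that the class maximizing $f(x_i\mid i\in k)f(i\in k)$ also maximizes $\log\left[f(x_i\mid i\in k)f(i\in k)\right]=\log f(i\in k)+\log f(x_i\mid i\in k)$. When the spectral measure $x_i$, of dimension $L$, is normally distributed within class $k$ with mean $\mu_k$ and covariance matrix $V_k$, $\log f(x_i\mid i\in k)=-\frac{L}{2}\log(2\pi)-\frac{1}{2}\log\det V_k-\frac{1}{2}(x_i-\mu_k)V_k^{-1}(x_i-\mu_k)^T$. Thus, an equivalent set of $K$ discriminant functions is $g_k'''(x_i)=\log f(i\in k)-\frac{1}{2}\log\det V_k-\frac{1}{2}(x_i-\mu_k)V_k^{-1}(x_i-\mu_k)^T$, for $i=1,2,\dots,N$ and $k=1,2,\dots,K$, and we classify pixel $i$ in class $k$ if $g_k'''(x_i)=\max_{k'=1,2,\dots,K}g_{k'}'''(x_i)$.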

A.2 Maximum entropy

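Multinomial distribution. The entropy $H(\mu)$ can be derived from the multinomial distribution. Assume that the survey vector, $y_i=\mathrm{col}_{1\le k\le K}(y_{ik})$, follows a multinomial distribution, $MN(1,\mu_i)$. Then, $\sum_{i=1}^ny_i=\mathrm{col}_{1\le k\le K}\left(y_k=\sum_{i=1}^ny_{ik}\right)$ follows a multinomial distribution $MN\left(n=\sum_{k=1}^Ky_k,\ \mu=\mathrm{col}_{1\le k\le K}(\mu_k)\right)$, with $\sum_{k=1}^K\mu_k=1$. The number of realizations of $\sum_{i=1}^ny_i$ favorable to the observed sample $y_n=\mathrm{col}_{1\le k\le K}(y_k)$ is $W=n!/\prod_{k=1}^Ky_k!$. Thus, $\ln W=\ln n!-\sum_{k=1}^K\ln y_k!$ and, using the Stirling approximation, $\ln W=n\ln n-n-\sum_{k=1}^Ky_k\ln y_k+n=n\ln n-\sum_{k=1}^Ky_k\ln y_k$. For large samples, and according to the Bernoulli theorem, $y_k/n\to\mu_k$ when $n\to\infty$, and as a result, $\ln W\approx n\ln n-\sum_{k=1}^Kn\mu_k\ln(n\mu_k)=n\ln n-n\sum_{k=1}^K\mu_k(\ln n+\ln\mu_k)=-n\sum_{k=1}^K\mu_k\ln\mu_k$. Finally, $n^{-1}\ln W\approx-\sum_{k=1}^K\mu_k\ln\mu_k=H(\mu)$. The entropy and multinomial logit approaches provide coincident class probability estimates because they maximize the same function.
Information gain measures. In the simple case where the available information reduces to sample ground data, $y=\mathrm{col}_{1\le i\le n}\,\mathrm{col}_{1\le k\le K}(y_{ik})$, $\mu$ is considered compatible with the observed data $y_n=\sum_{i=1}^ny_i=\mathrm{col}_{1\le k\le K}\left(y_k=\sum_{i=1}^ny_{ik}\right)$ when $y_n=\mu+\varepsilon_n$, where $\varepsilon_n=\mathrm{col}_{1\le k\le K}\left(\sum_{i=1}^n\varepsilon_{ik}\right)$ and $\varepsilon_{ik}$ is a random noise. The set of values $\mu=\mathrm{col}_{1\le k\le K}(\mu_k)$ compatible with $y_n$ is infinite, and according to the maximum entropy principle, we choose the value of $\mu$ that makes $H(\mu)$ maximum: $\hat\mu=\arg\max_\mu H(\mu)=\arg\max_\mu\left(-\sum_{k=1}^K\mu_k\ln\mu_k\right)=\arg\min_\mu\sum_{k=1}^K\mu_k\ln\mu_k$, subject to $\sum_{k=1}^K\mu_k=1$. To find $\hat\mu$, we use the Lagrange function $\Psi=-\sum_{k=1}^K\mu_k\ln\mu_k+\lambda\left(1-\sum_{k=1}^K\mu_k\right)$. The derivative of $\Psi$ with respect to $\mu_k$ is $\frac{\partial\Psi}{\partial\mu_k}=-\left(\mu_k\frac{1}{\mu_k}+\ln\mu_k\right)-\lambda=-\ln\mu_k-1-\lambda$, for $k=1,2,\dots,K$, and with respect to $\lambda$ it is $\frac{\partial\Psi}{\partial\lambda}=1-\sum_{k=1}^K\mu_k$. The solution of this system of $K+1$ equations is $-\ln\hat\mu_k-1-\hat\lambda=0\Leftrightarrow\hat\mu_k=e^{-1-\hat\lambda}$, $k=1,2,\dots,K$, with $\sum_{k=1}^K\hat\mu_k=Ke^{-1-\hat\lambda}=1$, so that $\ln\left(Ke^{-1-\hat\lambda}\right)=\ln1=0\Leftrightarrow\ln K-1-\hat\lambda=0\Leftrightarrow\hat\lambda=\ln K-1$. The entropy is $H(\hat\mu)=-\sum_{k=1}^Ke^{-1-\hat\lambda}(-1-\hat\lambda)=Ke^{-1-\hat\lambda}(1+\hat\lambda)=1+\hat\lambda=\ln K$, and it is achieved when $\hat\mu_k=e^{-1-\hat\lambda}=e^{-\ln K}$, $k=1,2,\dots,K$, that is, when $\ln\hat\mu_k=-\ln K=\ln\frac{1}{K}\Leftrightarrow\hat\mu_k=\frac{1}{K}$, so that the assigned probabilities $\hat\mu=\mathrm{col}_{1\le k\le K}(\hat\mu_k)$ correspond to a uniform distribution.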

When additional information, $X=\mathrm{col}_{1\le i\le n}\,\mathrm{row}_{1\le l\le L}(x_{il})$, is introduced, uncertainty is reduced from $\ln K$ to $-\sum_{k=1}^K\hat\mu_k\ln\hat\mu_k$. This reduction, $\ln K+\sum_{k=1}^K\hat\mu_k\ln\hat\mu_k$, is called "information gain," and the ratio between this reduction and the maximum uncertainty, $I(\hat\mu)=\dfrac{\ln K+\sum_{k=1}^K\hat\mu_k\ln\hat\mu_k}{\ln K}=1-S(\hat\mu)$, is called "information index", where $S(\hat\mu)=-\dfrac{\sum_{k=1}^K\hat\mu_k\ln\hat\mu_k}{\ln K}$. A value $S(\hat\mu)=0$, that is, $I(\hat\mu)=1$, corresponds to no uncertainty ($\hat\mu_k=1$ for some $k$ and $\hat\mu_{k'}=0$ for every $k'\ne k$), and a value $S(\hat\mu)=1$, that is, $I(\hat\mu)=0$, corresponds to maximum uncertainty ($\hat\mu_k=\frac{1}{K}$, $k=1,2,\dots,K$). The measure $S(\hat\mu)$ can be used to evaluate the effect of each piece of information. For instance, the effect of removing one unit from the sample can be measured by comparing $S(\hat\mu_n)$ and $S(\hat\mu_{n-1})$: if $S(\hat\mu_n)<S(\hat\mu_{n-1})$, then the unit in question reduces the uncertainty. In the same way, the effect of excluding an auxiliary variable can be measured using $S(\hat\mu_L)$ and $S(\hat\mu_{L-1})$: if $S(\hat\mu_L)<S(\hat\mu_{L-1})$, then the variable in question reduces the uncertainty.
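The normalized entropy and the information index are straightforward to compute from a vector of class probabilities. The sketch below uses hypothetical function names and the usual convention $0\ln0=0$.

```python
import math


def normalized_entropy(mu):
    """S(mu) = -sum_k mu_k ln(mu_k) / ln(K), with 0 * ln(0) taken as 0."""
    entropy = -sum(p * math.log(p) for p in mu if p > 0)
    return entropy / math.log(len(mu))


def information_index(mu):
    """I(mu) = 1 - S(mu): 0 for a uniform distribution, 1 for a certain class."""
    return 1.0 - normalized_entropy(mu)


print(information_index([1 / 3, 1 / 3, 1 / 3]))  # maximum uncertainty
print(information_index([1.0, 0.0, 0.0]))        # no uncertainty
print(information_index([0.7, 0.2, 0.1]))        # intermediate case
```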

B. Appendix 2 - Small area estimation based on multinomial logit mixed models

B.1 Parameter estimation

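To estimate $\mu_{dik}$, it is sufficient to obtain estimates of $\beta_k$ for $k=1,2,\dots,K-1$. The probability that crop $K$ covers pixel $i$ of $d$ is $\mu_{diK}=1-\sum_{k=1}^{K-1}\mu_{dik}$ and can be estimated using the estimates of $\beta_k$ for $k=1,2,\dots,K-1$. We specify the vector $\eta=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le N_d}\,\mathrm{col}_{1\le k\le K-1}(\eta_{dik})$, where $\eta_{dik}=x_{di}\beta_k+v_{dk}$, as a linear mixed model $\eta=X\beta+Zv$, where $X=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le N_d}\,\mathrm{diag}_{1\le k\le K-1}\left(\mathrm{row}_{1\le l\le L}(x_{dil})\right)$ are known CTM values, $\beta=\mathrm{col}_{1\le k\le K-1}\,\mathrm{col}_{1\le l\le L}(\beta_{kl})$ is a vector of parameters, and $Z=\mathrm{diag}_{1\le d\le D}\,\mathrm{col}_{1\le i\le N_d}\,\mathrm{row}_{1\le k\le K-1}(z_{dik})$ is an indicator matrix where $z_{dik}=1$ if $k$ covers $i$ of $d$ and $z_{dik}=0$ otherwise. It is assumed that the vector of small-area random effects $v=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le k\le K-1}(v_{dk})$ has mean $E(v)=0$ and covariance matrix $\Sigma=\theta\otimes I_D$, where $\theta=\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}(\sigma_{kk'}^2)$ and $\sigma_{kk'}^2=\sigma_k^2$ if $k=k'$.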

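The parameters in $\beta$ and $\theta$ are often estimated following a penalized quasi-likelihood approach [22, 23, 24]. As discussed by McCulloch and Searle [13], these methods are not completely satisfactory in practice, and these authors recommended linearizing the nonlinear mixed model instead, thus obtaining a working linear mixed model fitted by maximum likelihood. Here, we follow this recommendation and consider the link function applied to the ground data, $g_k(y_{di})=\log\dfrac{y_{dik}}{1-\sum_{k'=1}^{K-1}y_{dik'}}$ for $k=1,2,\dots,K-1$, and its Taylor expansion about $E(y_{di}\mid v)=\mu_{di}$, $g_k(y_{di})\simeq g_k(\mu_{di})+\left.\frac{\partial g_k(y_{di})}{\partial y_{di1}}\right|_{y_{di}=\mu_{di}}(y_{di1}-\mu_{di1})+\dots+\left.\frac{\partial g_k(y_{di})}{\partial y_{di,K-1}}\right|_{y_{di}=\mu_{di}}(y_{di,K-1}-\mu_{di,K-1})$, to obtain the working linear model $\xi=X\beta+Zv+e$ [24]. Here, $\xi=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le N_d}(\xi_{di})$, where $\xi_{di}=\mathrm{col}_{1\le k\le K-1}(\xi_{dik})$ is a working vector; $e=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le i\le N_d}(e_{di})$ with $e_{di}=H_{di}^{-1}(y_{di}-\mu_{di})$; and $H_{di}=\frac{\partial h(\eta_{di})}{\partial\eta_{di}}=\mathrm{col}_{1\le k\le K-1}\,\mathrm{row}_{1\le k'\le K-1}\left(\delta_{kk'}\mu_{dik}-\mu_{dik}\mu_{dik'}\right)$ with $\delta_{kk'}=1$ if $k=k'$ and $\delta_{kk'}=0$ otherwise, $y_{di}=\mathrm{col}_{1\le k\le K-1}(y_{dik})$, and $\mu_{di}=\mathrm{col}_{1\le k\le K-1}(\mu_{dik})$. Here, $h(\eta_{di})$ is the inverse of the link function $g(\mu_{di})=\eta_{di}$, that is, $\mu_{di}=g^{-1}(\eta_{di})=h(\eta_{di})$.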
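The inverse link $h$ and its derivative $H_{di}$ are the two building blocks of the working model. The following minimal sketch (hypothetical function names, standard library only) codes both for a single pixel and compares the analytic matrix with a numerical derivative.

```python
import math


def inverse_link(eta):
    """h(eta): probabilities of crops 1..K-1 given the K-1 linear predictors."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]


def link_jacobian(mu):
    """H = dh/deta, with entries delta_kk' * mu_k - mu_k * mu_k'."""
    return [[(mu_k if k == j else 0.0) - mu_k * mu_j
             for j, mu_j in enumerate(mu)]
            for k, mu_k in enumerate(mu)]


def numerical_jacobian(eta, step=1e-6):
    """Central finite differences of h, used only as a check."""
    size = len(eta)
    jac = [[0.0] * size for _ in range(size)]
    for j in range(size):
        up = list(eta)
        down = list(eta)
        up[j] += step
        down[j] -= step
        h_up, h_down = inverse_link(up), inverse_link(down)
        for k in range(size):
            jac[k][j] = (h_up[k] - h_down[k]) / (2 * step)
    return jac


eta_example = [0.3, -0.7]  # illustrative linear predictors for K = 3 crops
print(link_jacobian(inverse_link(eta_example)))
print(numerical_jacobian(eta_example))
```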

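The covariance matrix of the working vector is $V=V(\xi)=Z\Sigma Z^T+V_e$, where $V_e=\mathrm{diag}_{1\le d\le D}\,\mathrm{diag}_{1\le i\le N_d}(V_{e_{di}})$ and $V_{e_{di}}=H_{di}^{-1}V(y_{di}-\mu_{di})H_{di}^{-1}$, where $V(y_{di}-\mu_{di})$ is the covariance matrix of $y_{di}\sim MN(1,\mu_{di})$.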

We assume that the asymptotic distribution of $\xi$ is normal, $\xi\to AN(X\beta,V)$, and we treat $\xi=X\beta+Zv+e$ as a linear mixed model. $V_{e_{di}}$ is complex because the derivative of the link function, $H_{di}$, depends on $v$ through $\mu$, and $\mu_{di}$ also depends on $v$. We consider a simplification of this problem that consists of setting $v=0$ in $\mu_{di}$, which reduces $H_{di}$ to $H_{di}^*=\mathrm{col}_{1\le k\le K-1}\,\mathrm{row}_{1\le k'\le K-1}\left(\delta_{kk'}\mu_{dik}^*-\mu_{dik}^*\mu_{dik'}^*\right)$, where $\mu_{dik}^*=\dfrac{\exp(x_{di}\beta_k)}{1+\sum_{k'=1}^{K-1}\exp(x_{di}\beta_{k'})}$ no longer depends on the random effects $v$ [13]. As a result, $V_{e_{di}}^*=H_{di}^{*-1}V(y_{di}-\mu_{di})H_{di}^{*-1}$, where $\mu_{di}$ does depend on $v$.

To determine the β and θ maximum likelihood estimates, we follow an iterative procedure:

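Let $\theta^{(0)}$ and $\beta^{(0)}$ be starting values of $\theta$ and $\beta$, respectively.
Let $v^{(0)}=0$ and compute $\xi^{(0)}=X\beta^{(0)}+\left(H^{*(0)}\right)^{-1}\left(y-\mu^{(0)}\right)$.
Compute $V(\xi^{(0)})=\mathrm{diag}_{1\le d\le D}\left(V_{e_{di}}^{*(0)}\right)$, where $V_{e_{di}}^{*(0)}=\left(H_{di}^{*(0)}\right)^{-1}V\left(y_{di}-\mu_{di}^{(0)}\right)\left(H_{di}^{*(0)}\right)^{-1}$ with $\mu_{di}^{(0)}=\mathrm{col}_{1\le k\le K-1}\left(\mu_{dik}^{(0)}\right)$ and $\mu_{dik}^{(0)}=\dfrac{\exp\left(x_{di}\beta_k^{(0)}\right)}{1+\sum_{k'=1}^{K-1}\exp\left(x_{di}\beta_{k'}^{(0)}\right)}$.
Update $\beta^{(0)}$ using $\hat\beta^{(1)}=\left(X^TV(\xi^{(0)})^{-1}X\right)^{-1}X^TV(\xi^{(0)})^{-1}\xi^{(0)}$.
Update $\theta^{(0)}$ using $\hat\theta^{(1)}=\theta^{(0)}-\left[\frac{\partial^2l(\beta)}{\partial\theta\,\partial\theta^T}\right]_{\beta=\hat\beta^{(1)}}^{-1}\left[\frac{\partial l(\beta)}{\partial\theta}\right]_{\beta=\hat\beta^{(1)}}$, where $l(\beta)$ is the profile log likelihood of the distribution of $\xi$.
Update $v^{(0)}$ using $\hat v^{(1)}=\hat\Sigma^{(1)}Z^TV(\xi^{(1)})^{-1}\left(\xi^{(0)}-X\hat\beta^{(1)}\right)$ and compute $\xi^{(1)}=X\hat\beta^{(1)}+Z\hat v^{(1)}+\left(\hat H^{*(1)}\right)^{-1}\left(y-\hat\mu^{(1)}\right)$.
Update $\hat\beta^{(m)}$ using $\hat\beta^{(m+1)}=\left(X^TV(\xi^{(m)})^{-1}X\right)^{-1}X^TV(\xi^{(m)})^{-1}\xi^{(m)}$.
Update $\hat\theta^{(m)}$ using $\hat\theta^{(m+1)}=\hat\theta^{(m)}-\left[\frac{\partial^2l(\beta)}{\partial\theta\,\partial\theta^T}\right]_{\beta=\hat\beta^{(m)}}^{-1}\left[\frac{\partial l(\beta)}{\partial\theta}\right]_{\beta=\hat\beta^{(m)}}$.
The gradient vector for $\theta=\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}(\sigma_{kk'}^2)$ is $b=\frac{\partial l(\beta)}{\partial\theta}=\mathrm{col}_{1\le l\le L}\left(\frac{\partial l}{\partial\theta_l}\right)$, where $\theta_l=\sigma_k$ for $k=k'$ and $\theta_l=\sigma_{kk'}^2$ for $k\ne k'$, and $\frac{\partial l}{\partial\theta_l}=-\frac{1}{2}\mathrm{tr}\left(V^{-1}V_l\right)+\frac{1}{2}r^TV^{-1}V_lV^{-1}r$, where $r^{(m+1)}=\xi^{(m)}-X\beta^{(m+1)}$ and $V_l=\frac{\partial V}{\partial\theta_l}$. For $\theta_l=\sigma_k$, this is $V_{\sigma_k}=\frac{\partial V}{\partial\sigma_k}=Z\left(I_D\otimes\frac{\partial\theta}{\partial\sigma_k}\right)Z^T$ with $\frac{\partial\theta}{\partial\sigma_k}=\mathrm{diag}_{1\le k\le K-1}(2\sigma_k)$, and for $\theta_l=\sigma_{kk'}^2$ with $k\ne k'$, it is $V_{\sigma_{kk'}^2}=\frac{\partial V}{\partial\sigma_{kk'}^2}=Z\left(I_D\otimes\frac{\partial\theta}{\partial\sigma_{kk'}^2}\right)Z^T$ with $\frac{\partial\theta}{\partial\sigma_{kk'}^2}=\frac{\partial\,\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}(\sigma_{kk'}^2)}{\partial\sigma_{kk'}^2}$.
The Hessian matrix for $\theta$ is $\frac{\partial^2l(\beta)}{\partial\theta\,\partial\theta^T}=\mathrm{row}_{1\le l\le L}\,\mathrm{col}_{1\le l'\le L}\left(\frac{\partial^2l}{\partial\theta_l\,\partial\theta_{l'}}\right)$, where we use $l$ instead of $l(\beta)$. Here, $\frac{\partial^2l}{\partial\theta_l\,\partial\theta_{l'}}=-\frac{1}{2}\mathrm{tr}\left(V^{-1}V_{ll'}-V^{-1}V_lV^{-1}V_{l'}\right)-\frac{1}{2}e^TV^{ll'}e$ with $V_{ll'}=\frac{\partial^2V}{\partial\theta_l\,\partial\theta_{l'}}$ and $V^{ll'}=\frac{\partial^2V^{-1}}{\partial\theta_l\,\partial\theta_{l'}}=V^{-1}\left(V_{ll'}-V_lV^{-1}V_{l'}-V_{l'}V^{-1}V_l\right)V^{-1}$.
For $K=3$ and $l,l'\in\{\sigma_1,\sigma_2,\sigma_{12}^2\}$, the Hessian matrix is $\frac{\partial^2l(\beta)}{\partial\theta\,\partial\theta^T}=\begin{pmatrix}\frac{\partial^2l}{\partial\sigma_1\partial\sigma_1}&\frac{\partial^2l}{\partial\sigma_1\partial\sigma_2}&\frac{\partial^2l}{\partial\sigma_1\partial\sigma_{12}^2}\\\frac{\partial^2l}{\partial\sigma_2\partial\sigma_1}&\frac{\partial^2l}{\partial\sigma_2\partial\sigma_2}&\frac{\partial^2l}{\partial\sigma_2\partial\sigma_{12}^2}\\\frac{\partial^2l}{\partial\sigma_{12}^2\partial\sigma_1}&\frac{\partial^2l}{\partial\sigma_{12}^2\partial\sigma_2}&\frac{\partial^2l}{\partial\sigma_{12}^2\partial\sigma_{12}^2}\end{pmatrix}$, and $V_{\sigma_1\sigma_1}=\frac{\partial^2V}{\partial\sigma_1\partial\sigma_1}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_1\partial\sigma_1}\right)Z^T=Z\left(I_D\otimes\begin{pmatrix}2&0\\0&0\end{pmatrix}\right)Z^T$, $V_{\sigma_1\sigma_2}=\frac{\partial^2V}{\partial\sigma_1\partial\sigma_2}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_1\partial\sigma_2}\right)Z^T=Z\left(I_D\otimes\begin{pmatrix}0&0\\0&0\end{pmatrix}\right)Z^T=0_{n(K-1)\times n(K-1)}$, $V_{\sigma_1\sigma_{12}^2}=\frac{\partial^2V}{\partial\sigma_1\partial\sigma_{12}^2}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_1\partial\sigma_{12}^2}\right)Z^T=0_{n(K-1)\times n(K-1)}$, $V_{\sigma_2\sigma_1}=V_{\sigma_1\sigma_2}$, $V_{\sigma_2\sigma_2}=\frac{\partial^2V}{\partial\sigma_2\partial\sigma_2}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_2\partial\sigma_2}\right)Z^T=Z\left(I_D\otimes\begin{pmatrix}0&0\\0&2\end{pmatrix}\right)Z^T$, $V_{\sigma_2\sigma_{12}^2}=\frac{\partial^2V}{\partial\sigma_2\partial\sigma_{12}^2}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_2\partial\sigma_{12}^2}\right)Z^T=0_{n(K-1)\times n(K-1)}$, $V_{\sigma_{12}^2\sigma_1}=V_{\sigma_1\sigma_{12}^2}$, $V_{\sigma_{12}^2\sigma_2}=V_{\sigma_2\sigma_{12}^2}$, and $V_{\sigma_{12}^2\sigma_{12}^2}=\frac{\partial^2V}{\partial\sigma_{12}^2\partial\sigma_{12}^2}=Z\left(I_D\otimes\frac{\partial^2\theta}{\partial\sigma_{12}^2\partial\sigma_{12}^2}\right)Z^T=0_{n(K-1)\times n(K-1)}$.
Continue until convergence.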

B.2 Prediction

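We partition the linear predictor vector $\eta_d=\mathrm{col}_{1\le i\le N_d}\,\mathrm{col}_{1\le k\le K-1}(\eta_{dik})$ into a part $\eta_{ds}=\mathrm{col}_{1\le i_s\le n_d}\,\mathrm{col}_{1\le k\le K-1}(\eta_{di_sk})$ containing the $n_d$ known sample values and another part $\eta_{dr}=\mathrm{col}_{1\le i_r\le N_d-n_d}\,\mathrm{col}_{1\le k\le K-1}(\eta_{di_rk})$ containing the $N_d-n_d$ unknown values we want to predict. We partition the model $\eta_d=X_d\beta+Z_dv$ in the same way: $\begin{pmatrix}\eta_{ds}\\\eta_{dr}\end{pmatrix}=\begin{pmatrix}X_{ds}\\X_{dr}\end{pmatrix}\beta+\begin{pmatrix}Z_{ds}\\Z_{dr}\end{pmatrix}v=\begin{pmatrix}X_{ds}\beta+Z_{ds}v\\X_{dr}\beta+Z_{dr}v\end{pmatrix}$. We use $\hat y_{N_d}=\sum_{i_s=1}^{n_d}y_{di_s}+\sum_{i_r=1}^{N_d-n_d}\hat\mu_{di_r}$ as a predictor of $y_{N_d}$, where $\hat\mu_{di_r}=\mathrm{col}_{1\le k\le K}(\hat\mu_{di_rk})$ with $\hat\mu_{di_rk}=\dfrac{\exp(x_{di_r}\hat\beta_k+\hat v_{dk})}{1+\sum_{k'=1}^{K-1}\exp(x_{di_r}\hat\beta_{k'}+\hat v_{dk'})}$ for crops $k=1,2,\dots,K-1$ and $\hat\mu_{di_rK}=\dfrac{1}{1+\sum_{k'=1}^{K-1}\exp(x_{di_r}\hat\beta_{k'}+\hat v_{dk'})}$ for crop $K$. Here, $\hat\beta_k$ is the estimate of $\beta_k$ at the convergence iteration, $\hat\beta^{(m)}=\mathrm{col}_{1\le k\le K-1}\left(\hat\beta_k^{(m)}\right)$, and $\hat v_{dk}$ is the predictor of the small-area random effect $v_{dk}$ in $v=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le k\le K-1}(v_{dk})$, defined by $\hat v=\mathrm{col}_{1\le d\le D}\,\mathrm{col}_{1\le k\le K-1}(\hat v_{dk})=\hat\Sigma Z^T\hat V_\xi^{-1}(\xi-X\hat\beta)$, where $\hat\Sigma=\hat\theta\otimes I_D$ and $\hat\theta$ is the estimate of $\theta$ at the convergence iteration, $\hat\theta^{(m)}=\mathrm{row}_{1\le k\le K-1}\,\mathrm{col}_{1\le k'\le K-1}\left(\hat\sigma_{kk'}^{2(m)}\right)$.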

B.3 Mean squared error

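The mean squared error of the predictor $\hat y_{N_d}$ is $\mathrm{MSE}(\hat y_{N_d})=\mathrm{MSE}(\hat\tau_{dr})+\sum_{i_r=1}^{N_d-n_d}V(y_{di_r})$. The term $\mathrm{MSE}(\hat\tau_{dr})$ can be approximated by a Taylor series expansion of $\hat\mu_{di}=h(\hat\eta_{di})$ around $\mu_{di}=h(\eta_{di})$, where $\eta_{di}=X_{di}\beta+Z_{di}v$.

As a result, we have $\hat\mu_{di}-\mu_{di}\simeq H_{di}(\hat\eta_{di}-\eta_{di})$. Let $\tau_{dr}^{T}=\sum_{i_r=1}^{N_d-n_d}H_{di_r}\eta_{di_r}=\sum_{i_r=1}^{N_d-n_d}\left(H_{di_r}X_{di_r}\beta+H_{di_r}Z_{di_r}v\right)=M_{dr}\beta+K_{dr}v$, where $M_{dr}=\sum_{i_r=1}^{N_d-n_d}H_{di_r}X_{di_r}$ and $K_{dr}=\sum_{i_r=1}^{N_d-n_d}H_{di_r}Z_{di_r}$. The approximation of $\mathrm{MSE}(\hat\tau_{dr})=\mathrm{MSE}(\hat\tau_{dr}^{T})$ is $\mathrm{MSE}(\hat\tau_{dr})=G_{1d}(\theta)+G_{2d}(\theta)+G_{3d}(\theta)+G_{4d}(\theta)$, where $G_{1d}(\theta)=K_dTK_d^T$, $G_{2d}(\theta)=\Lambda_dP\Lambda_d^T$, $G_{3d}(\theta)=\left(\frac{\partial\Gamma_d}{\partial\theta}\right)^TV\,\frac{\partial\Gamma_d}{\partial\theta}\,V(\theta)^{-1}$, and $G_{4d}(\theta)=V(y_{dr})=\sum_{i_r=1}^{N_d-n_d}V(y_{di_r})$, with $T=\left(Z^THZ+\Sigma^{-1}\right)^{-1}$, $H=\mathrm{diag}_{1\le d\le D}\,\mathrm{diag}_{1\le i\le N_d}(H_{di})$, $P=V(\hat\beta)=\left(X^TV^{-1}X\right)^{-1}$, $V=Z\Sigma Z^T+V_e$, $\Lambda_d=m_d-K_dTZ^TV_eX$, and $\frac{\partial\Gamma_d}{\partial\theta}=V^{-1}ZK_d^T$. Here, $V(\theta)$ denotes the Fisher information of the variance component parameters $\theta$.

As an estimator of the mean squared error of the predictor $\hat y_{N_d}$, we use the asymptotically unbiased estimator proposed in [23], $\widehat{\mathrm{MSE}}(\hat y_{N_d})=G_{1d}(\hat\theta)+G_{2d}(\hat\theta)+2G_{3d}(\hat\theta)+G_{4d}(\hat\theta)$, where $\hat\theta$ is the estimate of $\theta$ at the convergence iteration.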

References

1. Ambrosio L, Iglesias L, Marin C, Deffense N. Integration of remote sensing data into national statistical office sampling designs for agriculture. Statistical Journal of the IAOS. 2023;39(2):473-489. DOI: 10.3233/SJI-220116
2. Swain PH, Davis SM, editors. Remote Sensing: The Quantitative Approach, pp. 136-187. New York: McGraw-Hill; 1978. 396 p
3. Golan A, Judge G, Miller D. Maximum entropy econometrics. New York: Wiley; 1996. 307 p
4. FAO. Handbook on master sampling frames for agricultural statistics. Frame development, sample design and estimation. In: Improving Agricultural and Rural Statistics. Global Strategy. Rome: FAO; 2015. 170 p
5. Cotter J, Nealon J. Area frame design for agricultural surveys. Washington, D.C.: Research and Applications Division. National Agricultural Statistics Service. United States Department of Agriculture; 1987
6. FAO. Multiple Frame Agricultural Surveys. Volume 1: Current Surveys Based on Area and List Sampling Methods. Volume 2: Agricultural Survey Programmes Based on Area Frame or Dual Frame (Area and List) Sample Designs, Statistical Development Series 7 and 10. Rome: FAO; 1998
7. Gay C, Porchier JC. Land cover and land use classification using TER-UTI. In: Holland TE and Van den Broecke MPR, Editors. Proceedings of Agricultural Statistics 2000, International Statistical Institute. Voorburg, The Netherlands. 1998. pp. 193-201. Available from: https://www.isi-web.org/isi.cbs.nl/iamamember/Books/agric2000/page-193.pdf
8. Nusser SM, Goebel JJ. The national resources inventory: A long-term multi-resource monitoring programme. Environmental and Ecological Statistics. 1997;4(3):181-204. DOI: 10.1023/A:1018574412308
9. Nusser SM, Goebel JJ, Fuller WA. Design and estimation for investigating the dynamics of natural resources. Ecological Applications. 1998;8(2):234-245. DOI: 10.1890/1051-0761(1998)008[0234:DAEFIT]2.0.CO;2
10. Directorate of Forecasting Analysis and Agricultural Statistics. Ministry of Agriculture and Rural Equipment. Senegal: Senegal-Annual Agricultural Survey. Methodology and sampling plan of the agricultural survey; 2017. Available from: http://anads.ansd.sn/index.php/catalog/218/download/1846
11. Fuller WA. Sampling Statistics. New York: Wiley; 2009. 472 p. DOI: 10.1002/9780470523551
12. Agresti A. Categorical data analysis. New York: Wiley; 2002. 710 p
13. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001, 325 p
14. Hogland J, Billor N, Anderson N. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing. European Journal of Remote Sensing. 2013;46(1):623-640. DOI: 10.5721/EuJRS20134637
15. Ambrosio L, Iglesias L. Land cover estimation in small areas using ground surveys and remote sensing. Remote Sensing of Environment. 2000;74:240-248. DOI: 10.1016/S0034-4257(00)00114-0
16. Cressie N. Statistics for Spatial Data. New York: John Wiley & Sons; 1991. 900 p
17. Ambrosio L, Iglesias L, Marin C. Systematic sample design for the estimation of spatial means. Environmetrics. 2003;14(1):45-61. DOI: 10.1002/env.564
18. Fuller WA, Isaki CT. Survey design under superpopulation models. In: Krewski D, Rao JNK, Platek R, editors. Current Topics in Survey Sampling. New York: Academic Press; 1981. pp. 199-226
19. Ambrosio L, Iglesias L. Identifying the most Appropriate Sampling Frame for Specific Landscape Types, Technical Report Series. GO-01-2014. FAO. Available from: https://www.fao.org/3/ca6436en/ca6436en.pdf
20. Ambrosio L, Iglesias L, Marin C. A model-assisted approach to identify a cost-efficient spatial sampling strategy. In: Paper Presented at: Eighth International Conference on Agricultural Statistics (ICAS-VIII). New Delhi (IN); 2019
21. Ypma J, Johnson SG, Stamm A. nloptr R package [software]. Version 2.0.3.9100. Available from: https://astamm.github.io/nloptr/index.html
22. Saei A, Chambers R. Small Area Estimation under Linear and Generalized Linear Mixed Models with Time and Area Effects. Methodology Working Paper M03/15. Southampton (UK): Southampton Statistical Sciences Research Institute; 2003. 31 p. Available from: http://eprints.soton.ac.uk/id/eprint/8165
23. Molina I, Saei A, Lombardia MJ. Small area estimates of labour force participation under a multinomial logit mixed model. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2007;170(4):975-1000. DOI: 10.1111/j.1467-985X.2007.00493.x
24. Lopez-Vizcaino E, Lombardia MJ, Morales D. Small area estimation of labour force indicators under a multinomial model with correlated time and area effects. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2015;178(3):535-565. DOI: 10.1111/rssa.12085

[1] 1. Ambrosio L, Iglesias L, Marin C, Deffense N. Integration of remote sensing data into national statistical office sampling designs for agriculture. Statistical Journal of the IAOS. 2023;39(2):473-489. DOI: 10.3233/SJI-220116

[2] 2. Sawin PH, Swain PH, Davis SM. editors. Remote Sensing: The Quantitaive Approach, pp136-187. New York: McGraw-Hill; 1978, 396 p

[3] 3. Golan A, Judge G, Miller D. Maximum entropy econometrics. New York: Wiley; 1996. 307 p

[4] 4. FAO. Handbook on master sampling frames for agricultural statistics. Frame development, sample design and estimation. In: Improving Agricultural and Rural Statistics. Global Strategy. Rome: FAO; 2015. 170 p

[5] 5. Cotter J, Nealon J. Area frame design for agricultural surveys. Washington, D.C.: Research and Applications Division. National Agricultural Statistics Service. United States Department of Agriculture; 1987

[6] 6. FAO. Multiple Frame Agricultural Surveys. Volume 1: Current Surveys Based on Area and List Sampling Methods. Volume 2: Agricultural Survey Programmes Based on Area Frame or Dual Frame (Area and List) Sample Designs, Statistical Development Series 7 and 10. Roma: FAO; 1998

[7] 7. Gay C, Porchier JC. Land cover and land use classification using TER-UTI. In: Holland TE and Van den Broecke MPR, Editors. Proceedings of Agricultural Statistics 2000, International Statistical Institute. Voorburg, The Netherlands. 1998. pp. 193-201. Available from: https://www.isi-web.org/isi.cbs.nl/iamamember/Books/agric2000/page-193.pdf

[8] 8. Nusser SM, Goebel JJ. The national resources inventory: A long-term multi-resource monitoring programme. Environmental and Ecological Statistics. 1997;4(3):181-204. DOI: 10.1023/A:1018574412308

[9] 9. Nusser SM, Goebel JJ, Fuller WA. Design and estimation for investigating the dynamics of natural resources. Ecological Applications. 1998;8(2):234-245. DOI: 10.1890/1051-0761(1998)008[0234:DAEFIT]2.0.CO;2

[10] 10. Directorate of Forecasting Analysis and Agricultural Statistics. Ministry of Agriculture and Rural Equipment. Senegal: Senegal-Annual Agricultural Survey. Methodology and sampling plan of the agricultural survey; 2017. Available from: http://anads.ansd.sn/index.php/catalog/218/download/1846

[11] 11. Fuller WA. Sampling Statistics. New York: Wiley; 2009. 472 p. DOI: 10.1002/9780470523551

[12] 12. Agresti A. Categorical data analysis. New York: Wiley; 2002. 710 p

[13] 13. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001, 325 p

[14] 14. Hogland J, Billor N, Anderson N. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing. European Journal of Remote Sensing. 2013;46(1):623-640. DOI: 10.5721/EuJRS20134637

[15] 15. Ambrosio L, Iglesias L. Land cover estimation in small areas using ground surveys and remote sensing. Remote Sensing of Environment. 2000;74:240-248. DOI: 10.1016/S0034-4257(00)00114-0

[16] 16. Cressie N. Statistics for Spatial Data. New York: John Wiley & Sons; 1991. 900 p

[17] 17. Ambrosio L, Iglesias L, Marin C. Systematic sample design for the estimation of spatial means. Environmetrics. 2003;14(1):45-61. DOI: 10.1002/env.564

[18] 18. Fuller WA, Isaki CT. Survey design under superpolutation models. In: Krewski D, Rao JNK, Platek R, editors. Current Topics in Survey Sampling. New York: Academic Press; 1981. pp. 199-226

[19] 19. Ambrosio L, Iglesias L. Identifying the most Appropriate Sampling Frame for Specific Landscape Types, Technical Report Series. GO-01-2014. FAO. Available from: https://www.fao.org/3/ca6436en/ca6436en.pdf

[20] 20. Ambrosio L, Iglesias L, Marin C. A model-assisted approach to identify a cost-efficient spatial sampling strategy. In: Paper Presented at: Eighth International Conference on Agricultural Statistics (ICAS-VIII). New Delhi (IN); 2019

[21] 21. Ypma J, Johnson SG, Stamm A. lnoptr R package [software]. Version 2.0.3.9100. Available from: https://astamm.github.io/nloptr/index.html

[22] 22. Saei A, Chambers R. Small Area Estimation under Linear and Generalized Linear Mixed Models with Time and Area Effects. Methodology Working Paper M03/15. Southampton (UK): Southampton Statistical Sciences Research Institute; 2003: 31 p. http://eprints.soton.ac.uk/id/eprint/8165

[23] 23. Molina I, Saei A, Lombardia MJ. Small area estimates of labour force participation under a multinomial logit mixed model. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2007;170(4):975-1000. DOI: 10.1111/j.1467-985X.2007.00493.x

[24] 24. Lopez-Vizcaino E, Lombardia MJ, Morales D. Small area estimation of labour force indicators under a multinomial model with correlated time and area effects. Journal of the Royal Statistical Society. Series A, Statistics in Society. 2015;178(3):535-565. DOI: 10.1111/rssa.12085