Open access peer-reviewed chapter

Variability Analysis of Observational Time Series: An Overview of the Decomposition Methods for Non-Stationary and Noisy Signals

Written By

Olivier Delage, Hassan Bencherif, Thierry Portafaix, Alain Bourdier, René Tato Loua and Damaris Kirsch Pinheiro

Submitted: 13 July 2023 Reviewed: 02 August 2023 Published: 17 October 2023

DOI: 10.5772/intechopen.1002825

From the Edited Volume

Time Series Analysis - Recent Advances, New Perspectives and Applications

Jorge Rocha, Cláudia M. Viana and Sandra Oliveira


Abstract

The analysis of observational data sequences in geophysics consists of characterizing the underlying dynamics. An important preliminary step is to analyze the variability related to the observed dynamics. The specific objectives of this step are to remove noise, to determine the overall trend of the observational time series, and to identify the relevant components that contribute significantly to its variability, knowing that their number determines the dimensionality of the observed dynamics. Most observational time series are non-stationary and present fluctuations at all time scales. In this context, variability analysis consists in representing the time series in the time-frequency space and requires the development of specific numerical signal decomposition methods. The most commonly used techniques are adaptive and data-driven, and among the most cited in the literature are the empirical mode decomposition, the empirical wavelet transform, and singular spectrum analysis. In this work, we describe these techniques and evaluate their ability to remove noise, to identify the components corresponding to the physical processes involved in the evolution of the observed system, and to deduce the dimensionality of the associated dynamics. Results obtained with these methods on experimental total ozone column and rainfall time series are discussed and compared.

Keywords

  • time series analysis
  • non-stationary signals
  • complex systems
  • noise removal
  • empirical mode decomposition
  • wavelet decomposition
  • singular spectrum analysis
  • underlying dynamics

1. Introduction

The notion of complexity is a characteristic present in most systems of the real world. Intuitively, complexity constitutes a measure of the organization, the structure and the dynamics of a system, i.e. its behavior over time. In the physical sciences, a complex system can be defined as one composed of several entities interacting with each other on several scales of time and space. The complexity arises from the superposition of the individual behavior of each entity with their interactions, and makes the overall behavior of the system difficult to characterize and predict. Therefore, the characterization of the system dynamics is carried out from the global properties of the system, which cannot be reduced to the sum of the characteristics of the entities of which it is composed. Many studies that aim to gain a better understanding of complex system dynamics are based on the analysis of observational measurement data sequences. Most of the observational data sequences in geophysics derive from complex dynamics resulting from the superposition of the individual behaviors of the physical processes involved in the evolution of the overall system with the interactions between them. As a consequence, time series are non-stationary and have fluctuations at all time scales. The analysis methodology therefore consists of analyzing the variability of the time series. The method used is to decompose the observed time series into the sum of noise, physically meaningful components and a trend:

$$TS = \mathrm{noise} + \sum_i c_i + \mathrm{Trend}, \qquad TS = \{x_k\} = \{x(k\,\delta t)\},\ k = 1,\dots,N \tag{E1}$$

Where TS is the original time series composed of N measurement values xk of the signal x with a sampling time δt, and the ci are the components that contribute significantly to the variability of the original signal. Each ci can be identified with a physical process involved in the overall evolution of the system. To implement such a decomposition, two approaches have been considered. The first approach consists in obtaining a representation of TS in the time-frequency space, which amounts to classifying the fluctuations according to the temporal resolution at which they occur. Among the most commonly used techniques for obtaining such a representation, the empirical mode decomposition (EMD) is very popular. EMD is a relatively recent method proposed by Huang et al. in 1998, well suited to non-stationary signals, that decomposes a time series into a finite number of components called "Intrinsic Mode Functions" (IMFs) [1]. Each IMF captures the repeating behavior of the signal at a particular time scale. Unlike the Fourier or wavelet transforms, EMD decomposes a time signal into a set of basis signals derived from the data itself. The biggest advantage of the EMD method is that it is totally adaptive and data-driven, with no need for a-priori basis function selection. As EMD acts in the time domain, another advantage of this method is that it stays close to the observed dynamics. Moreover, EMD acts as a bank of bandpass filters [2] in the frequency domain; as a result, its main limiting factor is the frequency resolution, which, when it becomes too small, can induce the mode-mixing phenomenon where the spectral contents of some IMFs overlap each other [3]. Although several techniques exist to overcome this problem [4, 5, 6], Gilles proposed an alternative entitled empirical wavelet transform (EWT) [7], which operates in the frequency domain and consists of elaborating a segmentation of the original signal's Fourier spectrum and building an appropriate wavelet filter bank on the segmented spectrum. The EWT allows a better frequency resolution and thus overcomes the mode-mixing problem by partitioning the spectrum of the original signal into separate spectral bands. However, although the EWT technique detects the relevant frequencies involved in the original signal, it does not associate them with a specific mode of variability as EMD does. Because the variability modes provided by the EMD technique are closer to the observed dynamics than those of the EWT, O. Delage [8] proposed an optimization of the EMD technique entitled Empirical Adaptive Wavelet Decomposition (EAWD), which combines the advantages of both EMD and EWT. The heart of the EWT method lies in the segmentation of the original signal's Fourier spectrum. The main idea is to use the local maxima of the spectral supports of the IMFs returned by EMD to design the Fourier spectrum segmentation required to initiate the EWT method. The IMFs involved in the Fourier spectrum segmentation are those selected as relevant, i.e. those which contribute more than 1% of the variability of the original signal. The final step consists in applying the EWT method to the segmented Fourier spectrum.

Another technique, called Singular Spectrum Analysis (SSA), which uses a different approach, is also described in the scope of this chapter. Such a technique can decompose a non-stationary signal into a sum of independent components and is also able to reconstruct the underlying dynamics.

This document is structured around three sections: the first section is devoted to the description of all the signal decomposition techniques mentioned above. In the second section, three observational time series are analyzed using the EMD, EAWD and SSA techniques and the corresponding results are presented. In the last section, the results obtained with the EMD, EAWD and SSA techniques are compared and discussed.


2. Review of the signal decomposition techniques

The methodology of signal variability analysis is composed of three main steps: (1) express the original signal as a sum of a finite number of components, (2) identify the components dominated by noise, (3) select the relevant components that contribute significantly to the signal variability, with the objectives of determining the dimensionality of the underlying dynamics and of identifying the relevant components with the physical processes involved in it.

2.1 The empirical mode decomposition (EMD)

2.1.1 EMD basics

In 1998, Huang et al. proposed an original method called the Empirical Mode Decomposition (EMD), whose purpose is to adaptively decompose any signal into oscillatory contributions. The EMD technique can be summarized as an iterative method in which the signal is decomposed into a local average m, called the trend, and a strongly oscillating part, called the details. The trend is related to low frequencies while the details, characterized by strong fluctuations, are related to high frequencies. At each iteration, the high-frequency fluctuation part is separated from the low-frequency trend and reinjected as a new signal in the next iteration. During the iterations of the algorithm [1], the details related to the high frequencies are successively separated from the low-frequency part by using a procedure called the "sifting process". The main steps of the sifting process are: identify the local extrema of the signal; interpolate the local maxima and minima with cubic splines to form, respectively, the upper and lower envelopes of the signal; determine the mean envelope as the half sum of the upper and lower envelopes; subtract the mean envelope from the initial signal. The same procedure is reiterated on the remainder until the mean envelope is close to zero everywhere, and the resulting signal is designated as the first IMF. The criterion for stopping the sifting process is generally set as:

$$0.2 \le SD = \sum_{t=1}^{T} \frac{\left|h_{k-1}(t) - h_k(t)\right|^2}{h_{k-1}^2(t)} \le 0.3 \tag{E2}$$

Where hk is the result after k iterations of the sifting process. The higher-order IMFs are iteratively extracted by applying the above sifting procedure to the remainder until the remaining signal can no longer be assimilated to an IMF, knowing that a signal can be called an IMF if it satisfies the two following criteria: the numbers of local maxima and local minima must differ by at most one, and the half sum of its upper and lower envelopes must be locally close to zero. The original signal can finally be expressed as:

$$TS = \sum_{m=1}^{M} IMF_m + R \tag{E3}$$

Where R is the residual trend. EMD decomposes the data into M fundamental components with distinct time scales, the first component having the smallest time scale. The interesting fact about this algorithm is that it is adaptive and data-driven and is able to extract non-stationary parts of the original signal at different time scales, as multi-resolution analysis does. In this context, Flandrin et al. [9] described the EMD as behaving like a dyadic filter bank such as those involved in multi-resolution analysis. Consequently, the maximum number of IMFs that can be extracted from the original time series is log(N)/log(2), where N is the time series size.
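To make the sifting procedure concrete, the following minimal sketch (Python with numpy and scipy; the function names are ours and purely illustrative) implements the steps described above. Boundary effects and flat extrema, which established implementations such as PyEMD handle carefully, are ignored here.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(h):
    """One sifting iteration: subtract the mean of the upper and lower envelopes."""
    t = np.arange(len(h))
    imax = argrelextrema(h, np.greater)[0]
    imin = argrelextrema(h, np.less)[0]
    if len(imax) < 4 or len(imin) < 4:          # too few extrema: h is a residual
        return None
    upper = CubicSpline(imax, h[imax])(t)       # upper envelope (cubic spline)
    lower = CubicSpline(imin, h[imin])(t)       # lower envelope (cubic spline)
    return h - 0.5 * (upper + lower)            # remove the mean envelope

def extract_imf(x, sd_max=0.3, max_iter=50):
    """Iterate the sifting process until the SD criterion of Eq. (2) is met."""
    h = x.copy()
    for _ in range(max_iter):
        h_new = sift_once(h)
        if h_new is None:
            return None
        sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))   # stopping criterion, Eq. (2)
        h = h_new
        if sd < sd_max:
            break
    return h

def emd(x, max_imfs=None):
    """Decompose x into IMFs plus a residual trend, as in Eq. (3)."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    max_imfs = max_imfs or int(np.log(len(residual)) / np.log(2))
    while len(imfs) < max_imfs:
        imf = extract_imf(residual)
        if imf is None:
            break
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual
```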

2.1.2 Noise removing-Noisy components identification

Generally, the original signal is corrupted by noise. As a consequence, only a few IMFs may correspond to oscillations of the noise-free signal, the others corresponding to noise. To determine which IMFs are noise-related components, a robust threshold is required. In this chapter, the "detrended fluctuation analysis" (DFA) technique is used to obtain such a threshold. The basic principle of DFA [10, 11] is to compute how the fluctuations of the time series around its local trend vary as a function of the time scale. When applied to a time series TS(t), the first step of the DFA technique is to compute the time series TSI(t) composed of the cumulative sums of TS after removing its mean:

$$TSI(k) = \sum_{i=1}^{k}\left[x(i\,\delta t) - \overline{TS}\right], \quad 1 \le k \le N \tag{E4}$$

Where the overbar denotes the average of the TS time series over [1, N]. TSI is then divided into time windows of n samples each. For each time window, the estimated local trend TSIn(k) is determined by least-squares linear fitting. Finally, the average root mean square (RMS) of the fluctuations at a specific time scale n, F(n), may be written as:

$$F(n) = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left[TSI(k) - TSI_n(k)\right]^2} \tag{E5}$$

The method suggested to identify the noise-related IMFs is to use the slope α of the curve log[F(n)] versus log(n) as a threshold: IMFs with an α value less than a threshold θ are excluded. For DFA, an α value of 0.5 characterizes uncorrelated white noise. The threshold value commonly taken in the literature is 0.5 with a 0.2 confidence interval, that is θ = 0.7.
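A compact sketch of this DFA-based screening is given below (illustrative Python; the choice of scale range, the helper names and the reuse of the IMF list from the previous sketch are our assumptions).

```python
import numpy as np

def dfa_exponent(ts, scales=None):
    """Estimate the DFA scaling exponent alpha (slope of log F(n) vs. log n, Eqs. 4-5)."""
    ts = np.asarray(ts, dtype=float)
    N = len(ts)
    profile = np.cumsum(ts - ts.mean())                        # Eq. (4): cumulative sum
    if scales is None:
        scales = np.unique(np.logspace(np.log10(4), np.log10(N // 4), 20).astype(int))
    F = []
    for n in scales:
        n_seg = N // n
        sq = []
        for seg in range(n_seg):
            y = profile[seg * n:(seg + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, y, 1), t)         # local least-squares line
            sq.append(np.mean((y - trend) ** 2))
        F.append(np.sqrt(np.mean(sq)))                         # Eq. (5): RMS fluctuation
    alpha, _ = np.polyfit(np.log(scales), np.log(F), 1)        # slope of log F(n) vs log n
    return alpha

def drop_noisy_imfs(imfs, theta=0.7):
    """Keep only the IMFs whose DFA exponent exceeds the threshold theta = 0.7."""
    return [imf for imf in imfs if dfa_exponent(imf) > theta]
```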

2.1.3 Relevant components selection

Although the EMD is a powerful tool for analyzing complicated datasets, many irrelevant IMFs may appear in the decomposition. A relevant IMF may be defined as an IMF that retains most of the information content of the signal. As a consequence, relevant IMFs should have a good correlation with the original signal while irrelevant ones should have a poor correlation. To discriminate between relevant and irrelevant IMFs, we use the Pearson correlation coefficient between each IMF and the original time series, i.e.:

$$CORR(IMF_m, TS) = \frac{cov(IMF_m, TS)}{sd(IMF_m)\,sd(TS)}, \quad m = 1,\dots,M \tag{E6}$$

Where cov(IMFm, TS) is the covariance between the mth IMF and the original time series, sd(IMFm) is the standard deviation of the mth IMF and sd(TS) is the standard deviation of the original time series. A threshold s is required for selecting relevant IMFs: if CORR(IMFm, TS) > s, the mth IMF is kept; otherwise, it is eliminated and added to the residual. As the threshold is different for different signals, a suitable threshold must be selected according to the correlation between the IMFs and the original signal. In general, the threshold s can be taken as the mean of the maximum and minimum of CORR(IMFm, TS), m = 1,…,M, divided by a coefficient η, that is:

$$s = \frac{M + m}{2\eta} \tag{E7}$$

Where η is a coefficient greater than 1, M = maxm CORR(IMFm, TS) and m = minm CORR(IMFm, TS).

So that the correlation coefficients between each IMF and the original signal fall within the interval [0, 1], all the IMFs and the original signal are first normalized. In the literature, two different thresholds have been proposed, s1 = (M + m)/20 [12] and s2 = M/(10M − 3) [13]. The threshold proposed in this work depends on the relative position of M/m with respect to 10. If we suppose that 0.7 ≤ M ≤ 1 and M/m > 10, m could be less than 0.07, which characterizes a very weak correlation. In this case s has to be greater than m and, from (Eq. (7)), η > 5.5, which is verified when s = s1. If on the contrary M/m ≤ 10, m could be greater than 0.1 and s must be close to m, which from (Eq. (7)) can be written:

$$1-\varepsilon \le \frac{s}{m} = \frac{1 + M/m}{2}\cdot\frac{1}{\eta} \le 1+\varepsilon \tag{E8}$$

Given that 5 ≤ M/m ≤ 10, then 3 ≤ (1 + M/m)/2 ≤ 5.5 and (Eq. (8)) becomes

$$\frac{3}{1-\varepsilon} \le \eta \le \frac{5.5}{1+\varepsilon} \tag{E9}$$

If we set ε = 0.052, (Eq. (9)) becomes 3 ≤ η ≤ 5.2 which is verified when s = s2.

In summary, s is defined as follows:

$$s = \begin{cases} \dfrac{M+m}{20} & \text{if } M/m > 10 \\[2mm] \dfrac{M}{10M - 3} & \text{if } M/m \le 10 \end{cases} \tag{E10}$$
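A minimal sketch of this selection rule is given below (illustrative Python; the amplitude normalization and helper name are our choices, and the second threshold assumes M is large enough that 10M − 3 remains positive, as in the range 0.7 ≤ M ≤ 1 considered above).

```python
import numpy as np

def select_relevant_imfs(imfs, ts):
    """Keep the IMFs whose correlation with the normalized signal exceeds the threshold s."""
    ts = np.asarray(ts, dtype=float)
    ts_n = (ts - ts.mean()) / ts.std()
    corrs = np.array([abs(np.corrcoef((imf - imf.mean()) / imf.std(), ts_n)[0, 1])
                      for imf in imfs])
    M, m = corrs.max(), corrs.min()
    # Thresholds s1 and s2 as given in Eq. (10)
    s = (M + m) / 20.0 if M / m > 10 else M / (10.0 * M - 3.0)
    return [imf for imf, c in zip(imfs, corrs) if c > s]
```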

2.1.4 EMD limiting factors-IMF oscillation cycle

As the EMD technique acts as a bank of bandpass filters [2], each IMF is associated with a specific band and its oscillation cycle corresponds to the prevailing frequency in that band. One way to obtain the oscillation cycle of an IMF is to calculate its spectral density and identify the frequency at which the spectral density is maximum. The local maxima occurring in the IMF spectra characterize the frequencies involved in the corresponding EMD modes of variability. One of the major limiting factors of the EMD is the frequency resolution which, when it is not sufficient, can induce the mode-mixing phenomenon where the spectral contents of some IMFs overlap each other. The sources of mode-mixing can be categorized into three main groups: (1) presence of noise, (2) presence of intermittency, (3) presence of closely spaced spectral components. To overcome the inherent mode-mixing problem of the EMD, Wu and Huang [14] proposed the Ensemble Empirical Mode Decomposition (EEMD). In EEMD, white noise signals ni(t) are added to the original signal x(t); because the white noise spectrum is evenly distributed, the added noise populates all the reference scales. Moreover, because of its zero mean, the white noise cancels itself out after many rounds of averaging. The specific steps of the EEMD algorithm are as follows:

  • Initialize the number of ensemble members M.
  • Compute M realizations of white noise with different variances and add them to the original signal: xi(t) = ni(t) σi + x(t), i = 1,…,M, where ni is a white noise of variance σi.
  • Use the EMD algorithm to decompose each xi(t) into IMFs: xi(t) = Σj=1..J IMFi,j + Ri, i = 1,…,M.
  • Calculate the ensemble means of the decomposed IMFs: IMFj(t) = (1/M) Σi=1..M IMFi,j(t) (a minimal sketch of this averaging is given below).
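The sketch below illustrates the ensemble averaging, reusing the emd() function from the earlier sketch; the ensemble size and noise amplitude are illustrative choices, not values prescribed by the chapter.

```python
import numpy as np

def eemd(x, n_ensemble=100, noise_amp=0.2, n_imfs=None, seed=0):
    """Ensemble EMD: average the IMFs obtained from noise-perturbed copies of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n_imfs = n_imfs or int(np.log(len(x)) / np.log(2))
    stack = np.zeros((n_ensemble, n_imfs, len(x)))
    for i in range(n_ensemble):
        noisy = x + noise_amp * x.std() * rng.standard_normal(len(x))  # x_i = x + sigma_i n_i
        imfs, _ = emd(noisy, max_imfs=n_imfs)                          # EMD of each realization
        for j, imf in enumerate(imfs[:n_imfs]):
            stack[i, j] = imf
    return stack.mean(axis=0)   # ensemble-mean IMFs: the added white noise averages out
```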

2.2 Wavelet approaches

Wavelets are commonly used to analyze the variability of a signal. In the temporal domain, a wavelet basis is defined from a mother wavelet Ψ of zero mean, dilated by a parameter s > 0 and translated by u ∈ ℝ:

$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \tag{E11}$$

For the wavelet decomposition of a time series x(t), the most widely used case is the dyadic one, s = 2^j. The wavelet decomposition of x is then obtained by computing the inner product Wx(k, j) as:

$$W_x(k,j) = \left\langle x, \psi_{k,j}\right\rangle \quad \text{with} \quad \psi_{k,j}(t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - k\,2^j}{2^j}\right), \quad k \in \mathbb{Z} \tag{E12}$$

where j represents the resolution level. The decomposition is then similar to a multi-resolution analysis carrying out successive projections of x onto a sequence of nested subspaces Vj ⊂ L²(ℝ), j = 0,…,n, which leads to increasingly coarse approximations of x as j increases. At each resolution level j, the information about the signal x at resolution j−1 in the subspace Vj−1 is split into two parts, named approximation and details, corresponding respectively to the low frequencies and the high frequencies contained in the signal x. The approximation of x results from the orthogonal projection of x onto the subspace Vj, and the detail information results from the orthogonal projection of x onto the subspace Wj orthogonal to Vj, such that Vj−1 = Vj ⊕ Wj, where ⊕ denotes the direct sum of vector subspaces. The wavelets {ψk,j(t), k ∈ ℤ} form a basis of Wj. According to the definition of multi-resolution analysis, there exists a function φ(t), called the scaling function, such that {φ(t − k), k ∈ ℤ} forms a basis of V0 corresponding to the coarsest approximation of x. The reconstruction of x is obtained from:

$$x(t) = \left\langle x(t), \varphi(t)\right\rangle \varphi(t) + \sum_{j=1}^{n} \left\langle x(t), \psi_{k,j}(t)\right\rangle \psi_{k,j}(t) \tag{E13}$$

where ⟨ · , · ⟩ represents the inner product. The approximation coefficients corresponding to the coarsest resolution level are given by ⟨x, φ⟩, and the detail coefficients corresponding to the successively decreasing resolution levels sj = sj−1/2 are given by ⟨x, ψk,j⟩ as follows:

$$\left\langle x(t), \varphi(t)\right\rangle = \int x(\tau)\,\overline{\varphi(\tau - t)}\,d\tau, \qquad \left\langle x(t), \psi_{k,j}(t)\right\rangle = \int x(\tau)\,\overline{\psi_{k,j}(\tau - t)}\,d\tau \tag{E14}$$

2.2.1 Empirical wavelets-EWT

The essence of EMD is that the time-domain functions into which a signal is decomposed have the same length as the original signal, allowing time-varying frequencies to be preserved. EMD has been described [2, 9] as behaving like a dyadic filter bank such as those involved in multi-resolution analysis. In this context, the mode-mixing phenomenon specific to EMD can be interpreted as the presence of several filters with overlapping frequency content. As a result, the spectral contents of some IMFs restituted by the EMD overlap each other. To overcome this problem, Gilles [7] proposed an alternative named the EWT. This method acts in the spectral domain and starts from a segmentation of the original signal's Fourier spectrum. The Fourier support [0, π] is subdivided into N non-overlapping contiguous segments denoted Δn = [ωn-1, ωn]. An appropriate wavelet filter bank is then used to extract the spectrum relative to each Fourier segment. In the time domain, the components of the original signal's decomposition are obtained by performing an inverse Fourier transform on each Fourier spectrum segment.

The filter bank [15,16] is defined by the empirical scaling function and the empirical wavelets on each Δn through the following equations:

$$\hat{\varphi}_n(\omega) = \begin{cases} 1 & \text{if } |\omega| \le (1-\gamma)\,\omega_n \\[1mm] \cos\!\left[\dfrac{\pi}{2}\,\beta\!\left(\dfrac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\,\omega_n\right)\right)\right] & \text{if } (1-\gamma)\,\omega_n \le |\omega| \le (1+\gamma)\,\omega_n \\[1mm] 0 & \text{otherwise} \end{cases} \tag{E15}$$

and

$$\hat{\psi}_n(\omega) = \begin{cases} 1 & \text{if } (1+\gamma)\,\omega_n \le |\omega| \le (1-\gamma)\,\omega_{n+1} \\[1mm] \cos\!\left[\dfrac{\pi}{2}\,\beta\!\left(\dfrac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1-\gamma)\,\omega_{n+1}\right)\right)\right] & \text{if } (1-\gamma)\,\omega_{n+1} \le |\omega| \le (1+\gamma)\,\omega_{n+1} \\[1mm] \sin\!\left[\dfrac{\pi}{2}\,\beta\!\left(\dfrac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\,\omega_n\right)\right)\right] & \text{if } (1-\gamma)\,\omega_n \le |\omega| \le (1+\gamma)\,\omega_n \\[1mm] 0 & \text{otherwise} \end{cases} \tag{E16}$$

The function β(x) is an arbitrary C^k([0,1]) function defined as:

$$\beta(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \beta(x) + \beta(1-x) = 1 & \text{if } x \in [0,1] \\ 1 & \text{if } x \ge 1 \end{cases} \tag{E17}$$

Many functions satisfy this property and the one the most used in the literature [17] is:

$$\beta(x) = x^4\left(35 - 84x + 70x^2 - 20x^3\right) \tag{E18}$$

The parameter γ is chosen to satisfy the following criterion:

$$\gamma < \min_n\left(\frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}\right) \tag{E19}$$

The details and approximation coefficients are calculated by using (Eqs. (15) and (16)) and are respectively given by inner products with the empirical wavelets ψn and the scaling function φ1:

$$W_x(n, t) = \left\langle x(t), \psi_n \right\rangle = \mathrm{IFFT}\!\left[X(\omega)\,\overline{\hat{\psi}_n(\omega)}\right] \tag{E20}$$
$$W_x(1, t) = \left\langle x(t), \varphi_1 \right\rangle = \mathrm{IFFT}\!\left[X(\omega)\,\overline{\hat{\varphi}_1(\omega)}\right] \tag{E21}$$

where X is the Fourier transform of the original signal x, the overbar represents the complex conjugate, IFFT represents the inverse Fourier transform, and ψn and φ1 are the inverse Fourier transforms of the filters defined in (Eqs. (15) and (16)). The segmentation of the original signal's Fourier spectrum is obtained from the detection of local maxima. Each segment is centered around one or a group of successive local maxima. The boundary between two contiguous segments is set as the local minimum nearest to the midpoint between two adjacent local maxima groups. Many of the detected local maxima are irrelevant as their contributions to the original signal variability are negligible. Selecting the relevant local maxima requires determining a threshold, which is not always possible.
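The following sketch, built directly from Eqs. (15)-(18) and (20)-(21), shows how such an empirical wavelet filter bank can be constructed and applied once the spectrum has been segmented (illustrative Python; the boundary set `omega` and the value of γ are assumed inputs, not values from the chapter).

```python
import numpy as np

def beta(x):
    """Daubechies polynomial of Eq. (18), clipped to [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return x ** 4 * (35 - 84 * x + 70 * x ** 2 - 20 * x ** 3)

def ewt_components(x, omega, gamma=0.05):
    """Empirical wavelet components of x for given boundaries omega (radians, in (0, pi))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    w = np.abs(np.fft.fftfreq(N)) * 2 * np.pi          # |omega| on the discrete grid
    X = np.fft.fft(x)

    def transition(wn):
        # argument of beta in Eqs. (15)-(16): (|w| - (1-gamma)*wn) / (2*gamma*wn)
        return beta((w - (1 - gamma) * wn) / (2 * gamma * wn))

    filters = []
    # Scaling (low-pass) filter phi_1 of Eq. (15)
    phi = np.where(w <= (1 - gamma) * omega[0], 1.0,
                   np.where(w <= (1 + gamma) * omega[0],
                            np.cos(np.pi / 2 * transition(omega[0])), 0.0))
    filters.append(phi)
    # Band-pass empirical wavelets psi_n of Eq. (16)
    bounds = list(omega) + [np.pi]
    for wn, wn1 in zip(bounds[:-1], bounds[1:]):
        psi = np.zeros(N)
        psi[(w >= (1 + gamma) * wn) & (w <= (1 - gamma) * wn1)] = 1.0
        up = (w > (1 - gamma) * wn1) & (w < (1 + gamma) * wn1)
        psi[up] = np.cos(np.pi / 2 * transition(wn1))[up]
        lo = (w > (1 - gamma) * wn) & (w < (1 + gamma) * wn)
        psi[lo] = np.sin(np.pi / 2 * transition(wn))[lo]
        filters.append(psi)
    # Eqs. (20)-(21): filter in Fourier space (real, symmetric filters) and invert
    return [np.real(np.fft.ifft(X * f)) for f in filters]
```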

2.2.2 The empirical adaptive wavelet decomposition (EAWD)

The EMD technique enables an observational data sequence to be decomposed into multiple empirical modes of variability, each of them reflecting the observed dynamics at a specific timescale. As the spectral contents of the IMFs returned by the EMD are determined from the relative positions of the original signal's maxima, one of the major drawbacks of the EMD is the mode-mixing phenomenon, resulting in some IMFs having overlapping spectral supports. On the contrary, the EWT has a more solid theoretical context, using wavelets, and therefore provides a better frequency resolution. On the other hand, the EWT does not allow the detected frequencies to be associated with a specific mode of variability as the EMD technique does. The main idea of the proposed EAWD method is to combine the EMD and EWT techniques by setting non-overlapping groups of local maxima from the spectral contents of the IMFs returned by the EMD technique. Each group of IMF local maxima will be associated with a segment of the original signal's Fourier spectrum segmentation. The boundaries of each of these segments are set as the local minima closest to the midpoint between the local maximum groups of two consecutive IMFs. As the EMD method acts as a bank of dyadic band-pass filters, the result of each of these filters in the frequency domain is composed of a set of local maxima relative to a specific time scale in which the resolution is divided by 2 in comparison with the timescale immediately above it. Considering that a time scale is characterized by a set of values in the range [2^n, 2^(n+1)], to carry out a segmentation of the Fourier spectrum it is necessary to distribute the local maxima groups on a grid [2^i, 2^(i+1)], i ∈ [2, J] with J = int(log N/log 2) − 1, where N represents the size of the original time series and int(.) is the integer part of a number. The proposed Fourier spectrum segmentation algorithm has three main steps: (1) calculation of the spectral content of each IMF based on the spectral density, and selection of the significant local maxima whose energy contribution is >1%; (2) computation of the cutoff frequency between the spectral supports of two consecutive IMFs; (3) the EWT technique is run on the Fourier spectrum segmentation obtained in step 2. The segmentation algorithm is described in detail in Ref. [8].
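The sketch below is a simplified reading of this segmentation step (it is not the published algorithm of Ref. [8]); it keeps only the dominant peak of each retained IMF, places the cut-off at the spectrum local minimum nearest the midpoint between consecutive peaks, and returns boundaries directly usable by the filter-bank sketch of Section 2.2.1. The 1% energy criterion and helper names are illustrative.

```python
import numpy as np

def eawd_boundaries(imfs, ts, energy_min=0.01):
    """Boundaries (radians) between spectral supports of consecutive retained IMFs."""
    ts = np.asarray(ts, dtype=float)
    N = len(ts)
    freqs = np.fft.rfftfreq(N)
    spectrum = np.abs(np.fft.rfft(ts - ts.mean())) ** 2
    # 1. dominant frequency of each IMF contributing more than 1% of the signal variance
    peaks = sorted(freqs[np.argmax(np.abs(np.fft.rfft(imf)) ** 2)]
                   for imf in imfs if np.var(imf) / np.var(ts) >= energy_min)
    # 2. cut-off between two consecutive IMFs: local minimum of the signal spectrum
    #    closest to the midpoint of their dominant peaks
    boundaries = []
    for f1, f2 in zip(peaks[:-1], peaks[1:]):
        mid = 0.5 * (f1 + f2)
        idx = np.where((freqs > f1) & (freqs < f2))[0]
        if len(idx) == 0:
            continue
        minima = [i for i in idx[1:-1]
                  if spectrum[i] < spectrum[i - 1] and spectrum[i] < spectrum[i + 1]]
        cut = min(minima or list(idx), key=lambda i: abs(freqs[i] - mid))
        boundaries.append(freqs[cut])
    return np.array(boundaries) * 2 * np.pi   # ready to feed the EWT filter bank above
```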

2.3 Signal decomposition based on singular spectrum analysis

In time series analysis, singular spectrum analysis (SSA) combines elements of classical time series analysis, multivariate statistics and dynamical systems. SSA is used both to decompose a time series into components, each having a meaningful interpretation, and to reconstruct the underlying dynamics from a single time series based on the embedding theorem [18]. In this section, as the reconstruction of the underlying dynamics is beyond the scope of this work, we describe the SSA technique and its use in signal decomposition.

2.3.1 Singular spectrum decomposition

2.3.1.1 The basic singular spectrum analysis method

Like EMD and EWT, SSA is a data-adaptive and non-parametric method for time series decomposition which is suitable for non-stationary time series. The efficiency of such a technique has been recognized for its ability to provide meaningful results in a wide range of application fields, without making any assumption on the processed data. Generally, the components extracted by SSA can be identified as trends, periodic (possibly amplitude-modulated) components or noise components. Furthermore, since the components provided by SSA are independent, the method is particularly suitable for blind source separation. SSA consists of four stages: embedding, decomposition, grouping, and reconstruction [19, 20, 21].

Embedding: The starting point of SSA is to embed a time series TS = {xk, k = 1,…,N} of size N in a vector space of dimension L, where L is a strictly positive integer named the embedding dimension or window length, 1 < L < N. The embedding procedure forms K = N − L + 1 lagged vectors Xi = (xi,…, xi+L−1)^T with i = 1,…,N − L + 1. The trajectory matrix of the time series TS is then given by:

$$M_T = \left[X_1, \dots, X_K\right] \tag{E22}$$

MT is a Hankel matrix (constant cross-diagonals) of size (LxK). Hence, the embedding procedure builds a sequence of L-dimensional vectors from the original time series TS, by using a sliding window of size L.

Decomposition: The singular value decomposition (SVD) of the trajectory matrix MT is then computed, providing MT = U·D·V^T, with U (L × L) and V (K × K) orthogonal matrices containing respectively the left and right singular vectors. D (L × K) is a matrix containing the singular values σi on its main diagonal and zero elsewhere, where σi = √(K λi) with λi the eigenvalues of C, the covariance matrix of MT, which can be expressed as C = (1/K) MT·MT^T. The matrix MT is thus decomposed into a sum of rank-one matrices MTi such that:

$$M_T = \sum_{i=1}^{L} M_{T_i} = \sum_{i=1}^{L} \sigma_i\, u_i\, v_i^T \tag{E23}$$

where ui is the ith column of matrix U and vi is the ith column of matrix V. In other words, the SVD of the trajectory matrix provides an orthogonal matrix whose column vectors form an eigenbasis of the multidimensional space created by the embedding step. Relevant components of the signal can be obtained by projecting the data onto that eigenbasis and omitting noise-related components, and finally reconstructing an improved (denoised) version of the original time series.

Grouping step: This step consists of splitting the set of elementary matrices MTi into r disjoint groups and summing the matrices within each group. The result may be written as:

$$M_T = \sum_{k=1}^{r} M_{T_{I_k}} \quad \text{with} \quad M_{T_{I_k}} = \sum_{i \in I_k} M_{T_i} \tag{E24}$$

Thus MTIk is the resulting matrix of the group Ik, k = 1, …., r.

Diagonal averaging: The averaging along the cross-diagonals of the matrix MTIk aims at solving the problem of finding the time series TS^k whose trajectory matrix is the closest to MTIk in a least-squares sense. In other words, the diagonal averaging of MTIk = {xi,j, i = 1,…,L, j = 1,…,K} provides the elements of a time series {TS^k_n, n = 1,…,N} as:

$$TS^k_n = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{m=1}^{n} x_{m,\,n-m+1} & \text{for } 1 \le n < L \\[2mm] \dfrac{1}{L}\displaystyle\sum_{m=1}^{L} x_{m,\,n-m+1} & \text{for } L \le n \le K \\[2mm] \dfrac{1}{N-n+1}\displaystyle\sum_{m=n-K+1}^{L} x_{m,\,n-m+1} & \text{for } K+1 \le n \le N \end{cases} \tag{E25}$$

This cross-diagonal averaging process can also be applied to each MTi matrix, and the resulting time series can be assimilated to elementary components. This process finally provides an exact expansion of the original time series TS into r components satisfying TS_n = Σk=1..r TS^k_n.
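A minimal sketch of these basic SSA steps is given below (illustrative Python; the function name is ours, and grouping is left to the user, so the L elementary components are returned directly).

```python
import numpy as np

def ssa_components(ts, L):
    """Elementary SSA components: embedding (Eq. 22), SVD (Eq. 23), diagonal averaging (Eq. 25)."""
    ts = np.asarray(ts, dtype=float)
    N = len(ts)
    K = N - L + 1
    # Embedding: trajectory (Hankel) matrix of size L x K
    M = np.column_stack([ts[i:i + L] for i in range(K)])
    # Decomposition: SVD of the trajectory matrix
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    components = []
    for i in range(len(sigma)):
        Mi = sigma[i] * np.outer(U[:, i], Vt[i, :])          # rank-one matrix M_Ti
        # Diagonal averaging: average each anti-diagonal of Mi (Hankelization)
        comp = np.array([Mi[::-1, :].diagonal(n - (L - 1)).mean() for n in range(N)])
        components.append(comp)
    return components            # their sum reconstructs ts exactly
```

For instance, `ssa_components(ts, L=110)` would return the 110 elementary series whose grouped sums are analyzed in Section 3 (here `ts` is a placeholder for one of the series studied there).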

2.3.1.2 Separability and choice of parameters

A very important question is how to choose the parameters so as to build a proper decomposition of the observed time series. We first need to consider the concept of separability, which characterizes how well different components can be separated from each other. To do so, the so-called w-correlation matrix is computed. This is the matrix of weighted correlations between the reconstructed time series components, where the weight reflects the number of entries of each time series term in the trajectory matrix. Well-separated components have a weak correlation while badly separated components have a large correlation. Therefore, looking at the w-correlation matrix, one can find groups of correlated elementary reconstructed series and use this information for the subsequent grouping procedure. The w-correlation matrix MATWCORR is expressed as follows:

$$\mathrm{MATWCORR}(k_1, k_2) = \frac{\left\langle TS^{k_1}, TS^{k_2} \right\rangle_w}{\left\|TS^{k_1}\right\|_w \left\|TS^{k_2}\right\|_w} \tag{E26}$$

where

$$\left\|TS^{k_i}\right\|_w = \sqrt{\left\langle TS^{k_i}, TS^{k_i} \right\rangle_w}, \qquad \left\langle TS^{k_i}, TS^{k_j} \right\rangle_w = \sum_{l=1}^{N} w_l\, x_l^{k_i}\, x_l^{k_j} \tag{E27}$$

with wl = min(l, L, N − l), where L ≤ N/2.
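A direct transcription of this matrix is sketched below (illustrative Python; `components` would be the reconstructed series from the previous sketch, and the weights follow the definition given above).

```python
import numpy as np

def w_correlation(components, L):
    """Weighted correlation matrix between reconstructed components (Eqs. 26-27)."""
    comps = np.asarray(components, dtype=float)          # shape (r, N)
    r, N = comps.shape
    l = np.arange(1, N + 1)
    w = np.minimum(np.minimum(l, L), N - l)              # w_l = min(l, L, N-l) as above
    gram = (comps * w) @ comps.T                         # weighted inner products <TS^i, TS^j>_w
    norms = np.sqrt(np.diag(gram))
    return np.abs(gram / np.outer(norms, norms))
```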

An important issue is the selection of the window length L, which is the main parameter determining the rank of the trajectory matrix. The window length is in the range [2, N/2]. The construction of the trajectory matrix is similar to the phase space reconstruction of a nonlinear time series. The most common phase space reconstruction method is the method of delays (MOD). Many techniques have been suggested to estimate the parameters of the MOD [22, 23, 24], i.e. the time delay τ and the embedding dimension m. The embedding dimension refers to the m samples (separated by a fixed time delay τ) taken from a time window of length L = (m − 1)τ. The difference between the MOD and the trajectory matrix of SSA is that in the MOD the m coordinates are samples separated by a fixed τ which can be greater than 1, while in standard SSA all the available samples in the window of length L are used, i.e. separated by a constant τ = 1. For the selection of the window length L in SSA, some typical methods are available; e.g. Kugiumtzis [25] suggested taking the window length L proportional to the "mean orbital period" (MP), which can be determined from the spectral density of the original time series:

$$MP = \frac{\sum_{i=1}^{M} DS(f_i)}{\sum_{i=1}^{M} f_i\, DS(f_i)} + 1 \tag{E28}$$

where M is the number of frequencies f contained in the spectral support of the original time series and DS(f) is the spectral density at frequency f. Several approaches to determine an optimal window length L have been proposed in the literature [26, 27]. Unfortunately, these methods are all case-dependent and based on the assumption that the measured time series is stationary. So far, no universal method exists to find an optimal window length L for an arbitrary time series.
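As a hedged illustration, the snippet below computes the spectral-density-weighted mean period, which is one possible reading of the criterion in Eq. (28); the exact estimator used by the authors may differ slightly.

```python
import numpy as np

def mean_orbital_period(ts):
    """Spectral-density-weighted mean period of ts, in samples (one reading of Eq. 28)."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    freqs = np.fft.rfftfreq(len(ts))[1:]           # drop the zero frequency
    dens = np.abs(np.fft.rfft(ts))[1:] ** 2        # spectral density DS(f)
    return np.sum(dens) / np.sum(freqs * dens)     # mean period weighted by DS(f)
```

The window length L is then taken as a multiple of MP; in Section 3, for instance, MP = 11 leads to L = 110 for the Natal series.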


3. Time series analysis and results

The numerical techniques mentioned above (EMD, EAWD, SSA) have been applied to three monthly observational time series. The first time series is a 22-year record of total ozone columns (in Dobson units), starting in January 1998, recorded in Natal (Brazil). The second is a 42-year record of total ozone columns, starting in 1978, recorded in Argentina. The third is a 60-year rainfall time series recorded in Conakry (Guinea).

3.1 Natal monthly total columns ozone time series analysis

The original time series is displayed in Figure 1, its size is N = 264:

Figure 1.

Natal monthly total ozone columns.

The EEMD technique has been applied to the time series presented above. After selecting the relevant IMFs and adding the irrelevant ones to the residue, 5 IMFs and the trend have been determined. Results are shown in Figure 2.

Figure 2.

EEMD decomposition of the Natal total ozone columns time series.

The cycle of an IMF is determined by the dominant frequency of its spectrum. In Figure 3, the residual trend R returned by the EEMD is compared with the trend of the original signal, named Tmb, obtained from a moving window whose size is set to the maximum of the relevant IMF cycles, i.e. 135 months.

Figure 3.

Trend obtained by the EEMD technique (black curve). The trend of the original signal obtained from a moving window (blue curve) with a 135 months size.

The accuracy of the Trend returned by the EEMD is estimated by using the following expression:

$$AC_{EEMD} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(R_i - T_{mb,i}\right)^2}}{\overline{T_{mb}}} \tag{E29}$$

where N is the size of the original time series and the overbar represents the mean operator.
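Following the reconstruction of Eq. (29) above, this accuracy can be computed as in the short sketch below (illustrative Python; `trend` and `tmb` are placeholders for the decomposition trend R and the moving-window trend Tmb).

```python
import numpy as np

def trend_accuracy(trend, tmb):
    """Normalized RMS difference between a decomposition trend and the moving-window trend (%)."""
    trend, tmb = np.asarray(trend, dtype=float), np.asarray(tmb, dtype=float)
    return 100.0 * np.sqrt(np.mean((trend - tmb) ** 2)) / np.mean(tmb)
```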

No mode-mixing has been detected in the results returned by the EEMD. However, because the EEMD suffers from a lack of theory, the spectral supports of the IMFs are not totally disjoint. The EAWD method, while relying on the spectral content of the IMFs provided by the EEMD, restitutes components whose spectral contents are disjoint. In this context, the EAWD technique has been applied to the Natal time series and the results obtained are displayed in Figure 4.

Figure 4.

EAWD decomposition of the Natal total ozone columns time series.

The accuracy of the trend returned by the EAWD has been estimated by using (Eq. (29)). ACEAWD = 0.2%.

The trends restituted respectively by the EEMD and EAWD techniques have been superimposed on the trend obtained from the original signal by using a moving window of size 135 months, as shown in Figure 5.

Figure 5.

Superimposition of the EAWD trend (red curve), EEMD trend (black curve) and the trend of the original signal obtained by using a moving window (blue curve).

The SSA technique has been applied to the Natal total ozone columns. The mean orbital period MP has been estimated by using (Eq. (28)), MP = 11, and the embedding window size has been fixed to L = 110. After removing the noisy components and applying the basic SSA procedure, 37 principal components have been found, including the trend and the relevant periodic components. As the EEMD, EWT and EAWD act as banks of dyadic filters, to compare the results provided by SSA with those returned by the EEMD and EAWD techniques, the cycles of the 37 components extracted from the original signal by SSA are distributed along a grid [2^i, 2^(i+1)], i ∈ [2, int(log(Cymax)/log 2)], where Cymax is the maximum cycle. Thus, the SSA components whose cycles are in the interval [2^i, 2^(i+1)] are summed to form a component that can be compared with those returned by the EEMD and EAWD methods; an illustrative sketch of this grouping is given after Figure 6. The components thus obtained are displayed in Figure 6.

Figure 6.

SSA results on Natal ozone total columns time series.
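The grouping onto the dyadic grid can be sketched as follows (illustrative Python; `components` would be the output of the SSA sketch of Section 2.3, and defining the cycle by the dominant spectral peak is our assumption).

```python
import numpy as np

def group_on_dyadic_grid(components):
    """Sum SSA components whose dominant cycle falls in the same dyadic interval [2^i, 2^(i+1))."""
    groups = {}
    for comp in components:
        comp = np.asarray(comp, dtype=float)
        dens = np.abs(np.fft.rfft(comp - comp.mean())) ** 2
        freqs = np.fft.rfftfreq(len(comp))
        cycle = 1.0 / freqs[1:][np.argmax(dens[1:])]         # dominant period, in samples (months)
        i = int(np.floor(np.log2(cycle)))                    # dyadic scale index
        groups[i] = groups.get(i, 0.0) + comp
    return groups                                            # one summed component per scale
```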

The comparison between the EAWD and SSA results is displayed in Figure 7.

Figure 7.

Comparison between SSA results (black curve) and EAWD results (red curve).

Pictures (a), (b) and (d) represent the comparison between SSA and EAWD for the cycles of 135 months, 12 months and 6 months respectively. Picture (c) shows the superimposition of the summed EAWD components 2 and 3 (cycles of 33.75 months and 27 months) with the SSA component 4 (cycle of 27 months). Picture (c) is interesting from a climate forcing point of view. The SSA2 component can be identified with the ENSO climate forcing (45 months), and the graph of SSA2 (Figure 6) shows that its energy contribution is very weak. The SSA3 component can be identified with the QBO climate forcing (27 months). In Figure 7(c), the red curve is the sum of the EAWD components 2 and 3, with cycles of 33.75 months and 27 months respectively (Figure 4). The difference between the two curves in picture (c) is due to the presence of a very weak contribution of the ENSO climate forcing in the red curve (EAWD results).

3.2 Argentina monthly total columns ozone time series analysis

The original time series is displayed in Figure 8, its size is N = 507.

Figure 8.

Argentina monthly total ozone columns.

EEMD results are displayed in Figure 9.

Figure 9.

EEMD results of Argentina total ozone columns.

Mode-mixing is detected between IMFs 2 and 3 and between IMFs 5 and 6, as their dominant frequencies are the same. The accuracy of the trend returned by the EEMD is estimated to be 0.6%. The EAWD technique has then been applied to overcome the mode-mixing encountered with the EEMD. Results are displayed in Figure 10.

Figure 10.

EAWD results of Argentina total ozone columns.

The mode-mixing occurring in the EEMD results has been removed. In the EAWD results, three new components have appeared, with cycles of 28.5, 21.3 and 6 months respectively, while the IMF with a 2.4-month cycle returned by the EEMD has disappeared. The trends restituted respectively by the EEMD and EAWD techniques have been superimposed on the trend obtained from the original signal by using a moving window of size 170 months (Figure 11).

Figure 11.

Superimposition of the EAWD trend (red curve), EEMD trend (black curve) and the trend of the original signal obtained by using a moving window (blue curve).

The accuracy of the trend returned by the EAWD (red curve) has been estimated to be equal to 0.2%. The SSA technique has been applied to the Argentina total ozone columns. The mean orbital period MP has been estimated by using (Eq. (28)), MP = 10, and the embedding window size has been fixed to L = 110. After removing the noisy components and applying the basic SSA procedure, 28 principal components have been found, including the trend and the relevant periodic components. The SSA principal components have been distributed along the grid [2^i, 2^(i+1)], i ∈ [2, int(log(Cymax)/log 2)]. The components obtained are displayed in Figure 12.

Figure 12.

SSA results on Argentina ozone total columns time series.

Figure 13 displays the superimposition of the EAWD results (red curves) and SSA results (black curves). Pictures (a), (b), (d), (e) and (f) show the SSA results in black and the EAWD results in red for the cycles of 6, 12, 64 and 170 months and the trends respectively. Picture (c) shows the superimposition of the summed EAWD components 3 and 4 (cycles of 28 months and 21 months) with the SSA component 4 (cycle of 28 months). The curves in picture (c) can be identified with the QBO climate forcing, while the curves in pictures (d) and (e) can be identified with ENSO and the solar cycle respectively.

Figure 13.

Superimposition of SSA results (black curve) and EAWD results (red curve).

3.3 Conakry monthly rainfall time series analysis

The original time series is displayed in Figure 14, its size is N = 720:

Figure 14.

Conakry monthly rainfall in millimeters.

EEMD results are displayed in Figure 15.

Figure 15.

EEMD results of Conakry rainfall.

Mode-mixing is detected between IMFs 2 and 3 and between IMFs 5 and 6, as their cycles are the same (12 months). The EAWD technique has then been applied to overcome the mode-mixing encountered with the EEMD. Results are displayed in Figure 16.

Figure 16.

EAWD results of Conakry rainfall time series.

The mode-mixing occurring between IMFs 2 and 3 in the EEMD results has been removed. The trends restituted respectively by the EEMD and EAWD techniques have been superimposed on the trend obtained from the original signal by using a moving window of size 144 months (Figure 17).

Figure 17.

Superimposition of the EAWD trend (red curve), EEMD trend (black curve) and the trend of the original signal obtained by using a moving window (blue curve).

The accuracy of the trend returned by the EAWD (red curve) has been estimated to be equal to 0.97%. The SSA technique has been applied to the Conakry rainfall time series. The mean orbital period MP has been estimated by using (Eq. (28)), MP = 9, and the embedding window size has been fixed to L = 108. After removing the noisy components and applying the basic SSA procedure, 42 principal components have been found, including the trend and the relevant periodic components. The SSA principal components have been distributed along the grid [2^i, 2^(i+1)], i ∈ [2, int(log(Cymax)/log 2)]. The components obtained are displayed in Figure 18.

Figure 18.

SSA results on Conakry rainfall time series.

Figure 19 displays the superimposition of the EAWD results (red curves) and SSA results (black curves). Pictures (a), (b), (c), (d), (e) and (f) show the SSA results in black and the EAWD results in red for the cycles of 6, 12, 32.7, 65.5 and 144 months and the trend respectively. As the segmentation of the original signal's spectrum is not identical in EAWD and SSA, a 21-month cycle component appears in the EAWD decomposition and a 9-month cycle component appears in the SSA decomposition.

Figure 19.

Superimposition of SSA results (black curve) and EAWD results (red curve).


4. Conclusion

A comparative analysis of signal decomposition methods has been performed. The methods include the empirical mode decomposition (EMD), the ensemble empirical mode decomposition (EEMD), which can be seen as an optimization of EMD against noise, the empirical wavelet transform (EWT) and the empirical adaptive wavelet decomposition (EAWD), which are methods that allow the mode-mixing problem to be avoided, and finally the singular spectrum analysis (SSA), which acts as a bank of narrow-band filters and requires grouping techniques for its results to be compared with those obtained with EAWD. Assuming that the variability components returned by the EMD are close to the physics implicitly contained in the observational time series, the EAWD technique appears advantageous as it combines the rigor of wavelets while taking into account the spectral content of the components returned by the EMD. Thus the mode-mixing is avoided and, in addition, if the EEMD technique is used instead of EMD, the method becomes more robust against noise. Moreover, such a technique does not require an a priori specification of the number of decomposed components. The SSA technique, on the other hand, requires setting parameters such as the width of the embedding window L and selecting an appropriate grouping technique to obtain components similar to those returned by the EAWD. The SSA technique was found to be sensitive to parameter changes, and L determines how well the components are separated. Despite the fact that SSA requires an appropriate embedding window size L, SSA stays close to the underlying dynamics of the time series. Moreover, as the components extracted by SSA are independent, such a technique can be very useful for blind source separation purposes.


Acknowledgments

This study was led in the scope of the CNRS-LEFE-MANU SOLSTYCE project. Rainfall data were collected and qualified in the framework of the French South-African International Research Group IRP-ARSAIO (International Research Project - Atmospheric Research in Southern Africa and the Indian Ocean) supported by the NRF and CNRS, while the ozone time series were constructed within the French-Brazilian COFECUB-CAPES program.

References

  1. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. 1998;454(1971):903-995. DOI: 10.1098/rspa.1998.0193
  2. Flandrin P, Rilling G, Gonçalvés P. Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters. 2004;11(2):112-114. DOI: 10.1109/LSP.2003.821662
  3. Xu B, Sheng Y, Li P, Cheng Q, Wu J. Causes and classification of EMD mode-mixing. Vibroengineering Procedia. 2019;22:158-164. DOI: 10.21595/vp.2018.20250
  4. Gao Y, Ge G, Sheng Z, Sang E. Analysis and solution to the mode mixing phenomenon in EMD. In: 2008 Congress on Image and Signal Processing, Sanya, China. IEEE; 2008. pp. 223-227. DOI: 10.1109/CISP.2008.193
  5. Fosso OB, Molinas M. Method for mode mixing separation in empirical mode decomposition. September 2017. arXiv: 1709.05547v1 [stat.ME]
  6. Delage O, Portafaix T, Bencherif H, Guimbretière G, Loua RT. Multi-scale variability analysis of time series in geophysics by using the empirical mode decomposition. In: Proceedings SAGA. Durban, South Africa: HAL; October 2019. Available from: https://hal.archives-ouvertes.fr/hal-02363170
  7. Gilles J. Empirical wavelet transform. IEEE Transactions on Signal Processing. 2013;61:3999-4010. DOI: 10.1109/TSP.2013.2265222
  8. Delage O, Portafaix T, Bencherif H, Bourdier A, Lagracie E. Empirical adaptive wavelet decomposition (EAWD): An adaptive decomposition for the variability analysis of observation time series in atmospheric science. Nonlinear Processes in Geophysics. 2022;29(3):265-277. DOI: 10.5194/npg-29-265-2022
  9. Rehman N, Mandic DP. Filter bank property of multivariate empirical mode decomposition. IEEE Transactions on Signal Processing. 2011;55(5):2421-2426. DOI: 10.1109/TSP.2011.2106779
  10. Mert A, Akan A. Detrended fluctuation analysis for empirical mode decomposition based denoising. In: 2014 22nd European Signal Processing Conference (EUSIPCO). Lisbon, Portugal: IEEE; November 2014. ISBN: 978-0-9928-6661-9
  11. Gonzales JS, et al. Analyzing chaos systems and fine spectrum sensing using detrended fluctuation analysis algorithm. Mathematical Problems in Engineering. 2016;2016:1-18. DOI: 10.1155/2016/2865195
  12. Liu S, Ma R, Cong R, Wang H, Zhao H. A new approach for embedding dimension determination based on empirical mode decomposition. Kybernetes. 2012;41(9):1176-1184. DOI: 10.1108/03684921211275180
  13. Souza DB, Chanussot J, Favre AC. On selecting relevant intrinsic mode functions in empirical mode decomposition: An energy approach. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Florence, Italy: IEEE; 2014. DOI: 10.1109/ICASSP.2014.6853611
  14. Wu Z, Huang NE. Ensemble empirical mode decomposition: A noise assisted data analysis method. Advances in Adaptive Data Analysis. 2009;1:1-41. DOI: 10.1142/S1793536909000047
  15. Jaffard S, Meyer Y, Ryan RD. Wavelets: Tools for Science and Technology. SIAM; 2001. pp. 1-256. ISBN: 0898714486
  16. Meyer Y. Wavelets: Vibrations and Scaling. American Mathematical Society; 1997. ISBN: 9780821806852
  17. Daubechies I. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM; 1992. ISBN: 978-0-89871-274-2
  18. Takens F. Dynamical systems and turbulence. In: Rand D, Young LS, editors. Lecture Notes in Mathematics. Vol. 898. Berlin: Springer; 1981. pp. 366-381
  19. Golyandina N, Korobeynikov A. Basic singular spectrum analysis and forecasting with R. Computational Statistics and Data Analysis. 2014;71:934-954. DOI: 10.1016/j.csda.2013.04.009
  20. Bonizzi P, Karel JMH, Meste O, Peeters RLM. Singular spectrum decomposition: A new method for time series decomposition. Advances in Adaptive Data Analysis. 2014;6(4):1-34. DOI: 10.1142/S1793536914500113
  21. Harmouche J, Fourer D, Auger F, Borgnat P, Flandrin P. The sliding singular spectrum analysis: A data-driven non-stationary signal decomposition tool. IEEE Transactions on Signal Processing. 2017;66(1):251-263. DOI: 10.1109/TSP.2017.2752720
  22. Tan E, Algar S, Correa D, Small M, Stemler T, Walker D. Selecting embedding delays: An overview of embedding techniques and a new method using persistent homology. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2023;33:1-28. DOI: 10.1063/5.0137223
  23. Cao L. Practical method for determining the minimum embedding dimension of a scalar time series. Physica D: Nonlinear Phenomena. 1997;110:43-50. DOI: 10.1016/S0167-2789(97)00118-8
  24. Delage O, Bourdier A. Selection of optimal embedding parameters applied to short and noisy time series from Rössler system. Journal of Modern Physics. 2017;8:1607-1632. DOI: 10.4236/jmp.2017.89096
  25. Kugiumtzis D. State space reconstruction parameters in the analysis of chaotic time series - the role of the time window length. Physica D: Nonlinear Phenomena. 1996;95(1):13-28. DOI: 10.1016/0167-2789(96)00054-1
  26. Kim HS, Eykholt R, Salas JD. Nonlinear dynamics, delay times, and embedding windows. Physica D. 1999;127:48-60
  27. Ma HG, Han CZ. Selection of embedding dimension and delay time in phase space reconstruction. Frontiers of Electrical and Electronic Engineering in China. 2006;1:111-114
