Open access peer-reviewed chapter

Segmentation of High-Dimensional Matrix-Variate Time Series

Written By

Zhaoxing Gao

Submitted: 14 June 2023 Reviewed: 02 August 2023 Published: 10 October 2023

DOI: 10.5772/intechopen.1002891

From the Edited Volume

Time Series Analysis - Recent Advances, New Perspectives and Applications

Jorge Rocha, Cláudia M. Viana and Sandra Oliveira


Abstract

In this chapter, we introduce a new segmentation method for high-dimensional matrix-variate time series. Specifically, we seek linear transformations that segment the matrix into many small sub-matrices, each of which is uncorrelated with the others both contemporaneously and serially, so that they can be analyzed separately; this greatly reduces the number of parameters to be estimated in modeling. To overcome the identification issue, we propose a two-step, more structured procedure that segments the rows and columns separately. When the dimension is large relative to the sample size, we assume the transformation matrices are sparse and use threshold estimators for the (auto)covariance matrices. Unlike principal component analysis (PCA) for independent data, we cannot guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation, which may still be useful for forecasting. The proposed method is illustrated with simulated data examples.

Keywords

  • high dimension
  • matrix-variate time series
  • segmentation
  • dimension reduction
  • eigen-analysis

1. Introduction

Modern scientific studies often gather data under combinations of multiple factors. For example, neuroimaging experiments record brain activity at multiple spatial locations, at multiple time points, and under a variety of experimental stimuli. Studies of social networks record social links of a variety of types from multiple initiators of social activity to multiple receivers of the activity. Data such as these are naturally represented not as lists or tables of numbers, but as multi-indexed arrays, or tensors. As many types of such data are collected over time, it is natural to view them as tensor-valued time series; a matrix-valued time series is a sequence of second-order random tensors. For example, financial and economic studies often collect data from different countries on a number of economic indicators (e.g. GDP growth, unemployment rate) every quarter. Therefore, it is important and interesting to develop appropriate statistical methods to analyze such data. The most common approach to modeling such data is to stack the matrix into a large vector and then apply standard multivariate methods. However, this approach ignores the matrix structure of the data, which can lead to inefficient use of the data and important patterns being overlooked. For example, [1] pointed out that, after vectorizing the matrices, the resulting vectors have a covariance matrix with Kronecker product structure; ignoring this structure means that a much larger number of parameters must be estimated. Therefore, it is important to find an effective way to reduce the number of parameters, especially when the dimension is large.

When the data are independent and identically distributed (i.i.d.), [2] introduced dimension folding sufficient reduction for conditional mean functions, [3] proposed a dimension folding method for data with matrix-valued predictors, and [4, 5, 6] extended generalized linear models to matrix- and tensor-valued predictors for analyzing image data. Ding and Cook [7] studied matrix-variate regression with a matrix-variate response. An incomplete list of related publications also includes [8, 9, 10, 11, 12]. With temporal dependence, matrix-valued time series have not been well studied in the literature: [13] handled this kind of data in signal and image processing, and [14, 15] proposed two versions of factor models for matrix-valued time series, which maintain and utilize the matrix structure to achieve dimension reduction.

In this chapter, we extend the PCA approach of [16] to matrix-variate time series without stacking the matrix into a vector, so that the matrix structure is preserved. Our goal is as follows. Let $Y_t = (y_{ij,t})$ be a $p\times q$ matrix-variate time series; that is, there are $pq$ recorded values at each time point, for example, for $p$ individuals over $q$ indices or variables. We assume $Y_t$ can be represented as

$$Y_t = B\, W_t\, A^{T}, \tag{1}$$

where $B \in \mathbb{R}^{p\times p}$, $A \in \mathbb{R}^{q\times q}$, and $W_t$ is a latent $p\times q$ matrix in which the rows are divided into $p_1 \le p$ groups with no correlations across different groups at all time lags, and the columns are divided into $q_1 \le q$ groups with no correlations across different groups at all time lags either. With such a decomposition, we only need to model the small sub-matrices of $W_t$ separately, and we can achieve substantial dimension reduction. As $B$, $W_t$, and $A$ are all latent, identification is a major issue. For example, even when $W_t$ is observable, $(A, B)$ can be replaced by $(A/c,\ cB)$ for any nonzero constant $c$ without changing the relationship in (1).

Instead of estimating them simultaneously, we propose in this chapter a two-step, more structured approach. First, we seek a column transformation; that is, we linearly transform the columns of $Y_t$ into $q$ new variables, and ideally those $q$ new variables form $q_1$ uncorrelated groups with $q_1 \le q$. The second step applies the same segmentation method to the $p$ rows of the matrix obtained in the first step; the row transformation does not alter the uncorrelatedness of the column groups obtained in the first step. In the end, the new matrix can be divided into several smaller sub-matrices that are uncorrelated with each other both contemporaneously and serially. Our method is a building block for modeling tensor-valued time series, since any tensor-valued time series can be rearranged as a matrix time series by matricization; see [17].


2. Methodology

2.1 Setting and method

Let $Y_t = (y_{1t}, \ldots, y_{qt})$ be an observable $p\times q$ matrix-valued time series with $y_{it} \in \mathbb{R}^p$. We assume $Y_t$ admits a latent segmentation structure:

$$Y_t = X_t\, A^{T}, \tag{2}$$

where $X_t$ is an unobservable $p\times q$ matrix-valued time series in which the $q$ columns can be classified into $q_1$ ($>1$) groups such that any two groups are contemporaneously and serially uncorrelated, and $A \in \mathbb{R}^{q\times q}$ is an unknown constant matrix. Before we proceed further, we give the definitions of the row- and column-covariance matrices between two random matrices.

Definition 1. Let $U_t \in \mathbb{R}^{s_1\times r_1}$ and $V_t \in \mathbb{R}^{s_2\times r_2}$. If $r_1 = r_2 = r$, the covariance matrix over the columns between $U_t$ and $V_t$ is defined as

$$\mathrm{Cov}_c(U_t, V_t) \equiv \frac{1}{r}\, E\big[(U_t - E U_t)(V_t - E V_t)^{T}\big], \tag{3}$$

and if $s_1 = s_2 = s$, the covariance matrix over the rows is defined as

$$\mathrm{Cov}_r(U_t, V_t) \equiv \frac{1}{s}\, E\big[(U_t - E U_t)^{T}(V_t - E V_t)\big] = \mathrm{Cov}_c(U_t^{T}, V_t^{T}). \tag{4}$$

The variances $\mathrm{Var}_c(U_t)$ and $\mathrm{Var}_r(U_t)$ can be defined in a similar way. In particular, when $r = 1$ or $s = 1$, (3) or (4) reduces to the usual covariance matrix of two random vectors.
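To make Definition 1 concrete, here is a minimal numpy sketch of the sample versions of (3) and (4), assuming the matrix series is stored as an array with time along the first axis; the function names are ours and are not part of the chapter.

```python
import numpy as np

def cov_c(U, V):
    """Sample column-covariance (3): U has shape (n, s1, r), V has shape (n, s2, r).
    Returns the s1 x s2 matrix (1/r) * average over t of (U_t - Ubar)(V_t - Vbar)^T."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    return np.einsum('tik,tjk->ij', Uc, Vc) / (U.shape[2] * U.shape[0])

def cov_r(U, V):
    """Sample row-covariance (4): U has shape (n, s, r1), V has shape (n, s, r2).
    Returns the r1 x r2 matrix (1/s) * average over t of (U_t - Ubar)^T (V_t - Vbar)."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    return np.einsum('tki,tkj->ij', Uc, Vc) / (U.shape[1] * U.shape[0])
```

For instance, cov_r(Y, Y) returns the $q\times q$ sample version of $\mathrm{Var}_r(Y_t)$ appearing in (6) below.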

In model (2), we assume $Y_t$ and $X_t$ are both weakly stationary in the sense that the means and the autocovariances do not vary with time for any fixed $p$ and $q$. The stationarity of $Y_t$ is inherited from $X_t$ through (2), and a sufficient condition for this is that $\mathrm{vec}(X_t)$ and $\mathrm{vec}(Y_t)$ are stationary, where $\mathrm{vec}(\cdot)$ denotes the vectorization of a matrix. Denote the segmentation of $X_t$ by

$$X_t = (x_{1t}, \ldots, x_{qt}) = \big(X_t^{(1)}, \ldots, X_t^{(q_1)}\big) \tag{5}$$

with $\mathrm{Cov}_r\big(X_t^{(i)}, X_s^{(j)}\big) = 0$ for all $t, s$ and $i \ne j$. Therefore, all the row-autocovariance matrices of $X_t$ have the same block-diagonal structure with $q_1$ blocks, and $X_t^{(1)}, \ldots, X_t^{(q_1)}$ can be modeled or forecast separately as far as their linear dynamic structure is concerned.

Now, we spell out how to find the segmentation transformation under (2) and (5). Without loss of generality, we assume

$$\mathrm{Var}_r(Y_t) = I_q \quad \text{and} \quad \mathrm{Var}_r(X_t) = I_q. \tag{6}$$

The first equation in (6) can be achieved by replacing $Y_t$ with $Y_t\hat{S}_{y,0}^{-1/2}$, where $\hat{S}_{y,0}$ is a consistent estimator of $\mathrm{Var}_r(Y_t)$. The second equation amounts to conceptually replacing $X_t$ with $X_t\hat{S}_{x,0}^{-1/2}$, where $\hat{S}_{x,0}$ is a consistent estimator of $\mathrm{Var}_r(X_t)$; this does not alter the fact that there are no correlations across different groups. As both $A$ and $X_t$ are unobservable, (6) implies that we can view $\hat{S}_{y,0}^{-1/2} A\, \hat{S}_{x,0}^{1/2}$ as $A$. As a consequence of (6), the transformation matrix $A$ in (2) is orthogonal. Let $l_j$ be the number of columns of $X_t^{(j)}$ with $l_1 + \cdots + l_{q_1} = q$. Write $A = (A_1, \ldots, A_{q_1})$, where $A_j \in \mathbb{R}^{q\times l_j}$. It follows from (2) and (5) that

$$X_t^{(j)} = Y_t A_j, \qquad j = 1, \ldots, q_1. \tag{7}$$

However, similar to the situation in [16], $A$ and $X_t$ are not uniquely identified by (2), even with the additional assumption (6). For example, let $H_j$ be any $l_j\times l_j$ orthogonal matrix and $H = \mathrm{diag}(H_1, \ldots, H_{q_1})$. Then $(A, X_t)$ in (2) can be replaced by $(AH, X_t H)$ while (5) still holds. In fact, only $\mathcal{M}(A_1), \ldots, \mathcal{M}(A_{q_1})$ are uniquely defined by (2), where $\mathcal{M}(A_j)$ denotes the linear space spanned by the columns of $A_j$. As a result, $Y_t\Gamma_j$ can be taken as $X_t^{(j)}$ for any $q\times l_j$ matrix $\Gamma_j$ as long as $\Gamma_j^{T}\Gamma_j = I_{l_j}$ and $\mathcal{M}(\Gamma_j) = \mathcal{M}(A_j)$. Thus, to estimate $A = (A_1, \ldots, A_{q_1})$, it is sufficient to estimate the linear spaces $\mathcal{M}(A_1), \ldots, \mathcal{M}(A_{q_1})$.

To discover the latent segmentation, we first introduce some notation. Denote by $y_{it}$ and $x_{it}$ the $i$-th columns of $Y_t$ and $X_t$, respectively. For any integer $k$, let $\Sigma_y(k) = \mathrm{Cov}_r(Y_{t+k}, Y_t)$, $\Sigma_x(k) = \mathrm{Cov}_r(X_{t+k}, X_t)$, $\Sigma_{y,i,j}(k) = \mathrm{Cov}(y_{i,t+k}, y_{j,t})$, and $\Sigma_{x,i,j}(k) = \mathrm{Cov}(x_{i,t+k}, x_{j,t})$. By (6), we have $\Sigma_y(0) = \Sigma_x(0) = I_q$. For a pre-specified integer $k_0$, define

$$W_y = \sum_{k=0}^{k_0} \Sigma_y(k)\,\Sigma_y(k)^{T} = I_q + \sum_{k=1}^{k_0} \Sigma_y(k)\,\Sigma_y(k)^{T} \tag{8}$$

and

$$W_x = \sum_{k=0}^{k_0} \Sigma_x(k)\,\Sigma_x(k)^{T} = I_q + \sum_{k=1}^{k_0} \Sigma_x(k)\,\Sigma_x(k)^{T}. \tag{9}$$

It follows from (2) and (5) that both $\Sigma_x(k)$ and $W_x$ are block-diagonal, and

$$W_y = A\, W_x\, A^{T}. \tag{10}$$

Note that both Wy and Wx are positive definite matrices; therefore, we have the following decomposition

$$W_x \Gamma_x = \Gamma_x D, \tag{11}$$

where $\Gamma_x$ is a $q\times q$ orthogonal matrix whose columns are the orthonormal eigenvectors of $W_x$, and $D$ is a diagonal matrix with the corresponding eigenvalues on its main diagonal. By (10) and (11), $W_y (A\Gamma_x) = (A\Gamma_x) D$, and hence the columns of $\Gamma_y \equiv A\Gamma_x$ are the orthonormal eigenvectors of $W_y$. Consequently,

$$Y_t\Gamma_y = Y_t A\Gamma_x = X_t\Gamma_x, \tag{12}$$

where the last equality follows from (2). Let

$$W_x = \mathrm{diag}(W_{x,1}, \ldots, W_{x,q_1}), \tag{13}$$

where $W_{x,j}$ is an $l_j\times l_j$ positive definite matrix, and the eigenvalues of $W_{x,j}$ are also eigenvalues of $W_x$. Suppose that $W_{x,i}$ and $W_{x,j}$ do not share any eigenvalues for $i \ne j$. Then, if we line up the eigenvalues of $W_x$ (i.e. the eigenvalues of $W_{x,1}, \ldots, W_{x,q_1}$ combined) on the main diagonal of $D$ according to the order of the blocks in $W_x$, $\Gamma_x$ must be a block-diagonal orthogonal matrix of the same shape as $W_x$; see Proposition 1(i). However, the order of the eigenvalues is latent, and any $\Gamma_x$ defined by (11) is nevertheless a column permutation (i.e. a matrix consisting of the same column vectors but arranged in a different order) of such a block-diagonal orthogonal matrix; see Proposition 1(ii). By Proposition 1(i), writing $\Gamma_x = \mathrm{diag}(\Gamma_{x,1}, \ldots, \Gamma_{x,q_1})$, it follows from (5) and (12) that

$$Y_t\Gamma_y = X_t\Gamma_x = \big(X_t^{(1)}\Gamma_{x,1}, \ldots, X_t^{(q_1)}\Gamma_{x,q_1}\big). \tag{14}$$

Hence, the transformation $X_t \mapsto X_t\Gamma_x$ does not alter the fact that there are no correlations between different groups in $X_t$, and $\Gamma_y$ can be regarded as $A$ so long as the eigenvalues of $W_x$ are ordered appropriately. As these are all latent, $Y_t\Gamma_y$ can be taken as a column permutation of $X_t$, and $\Gamma_y$ can be viewed as a column permutation of $A$; see the discussion below (7). This leads to the following three-step estimation procedure for $A$ and $X_t$:

Step 1. Let $\hat{S}_{y,0}$ be a consistent estimator for $\mathrm{Var}_r(Y_t)$. Replace $Y_t$ by $Y_t\hat{S}_{y,0}^{-1/2}$.

Step 2. Let $\hat{S}$ be a consistent estimator for $W_y$. Calculate a $q\times q$ orthogonal matrix $\hat{\Gamma}_y$ whose columns are the orthonormal eigenvectors of $\hat{S}$.

Step 3. The columns of $\hat{A} = (\hat{A}_1, \ldots, \hat{A}_{q_1})$ are a permutation of the columns of $\hat{\Gamma}_y$ such that $\hat{X}_t = Y_t\hat{A}$ is segmented into $q_1$ uncorrelated sub-matrix series $\hat{X}_t^{(j)} = Y_t\hat{A}_j$, $j = 1, \ldots, q_1$.

In Steps 1 and 2, the estimators $\hat{S}_{y,0}$ and $\hat{S}$ should be consistent; they will be constructed under various scenarios in Section 3 below. The permutation in Step 3 can be carried out by grouping the columns of $\hat{Z}_t \equiv Y_t\hat{\Gamma}_y$.
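As an illustration of Steps 1 and 2, the following is a minimal numpy sketch, assuming the observed series is stored as an array of shape (n, p, q) and that plain sample moments are used for $\hat{S}_{y,0}$ and $\hat{S}$; the function names are ours, and the grouping of Step 3 is deferred to Section 2.2.

```python
import numpy as np

def inv_sqrt(S, eps=1e-10):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def sigma_r(Y, k):
    """Sample row-autocovariance Sigma_y(k) = Cov_r(Y_{t+k}, Y_t) for Y of shape (n, p, q)."""
    n, p, _ = Y.shape
    Yc = Y - Y.mean(axis=0)
    return np.einsum('tui,tuj->ij', Yc[k:], Yc[:n - k]) / ((n - k) * p)

def segment_columns(Y, k0=2):
    """Steps 1-2: normalize Y_t and eigendecompose the sample version of W_y in (8)."""
    q = Y.shape[2]
    Yn = Y @ inv_sqrt(sigma_r(Y, 0))      # Step 1: Y_t <- Y_t * S_{y,0}^{-1/2}
    W = np.eye(q)
    for k in range(1, k0 + 1):            # W_y = I_q + sum_k Sigma_y(k) Sigma_y(k)^T
        S = sigma_r(Yn, k)
        W += S @ S.T
    _, Gamma = np.linalg.eigh(W)          # Step 2: orthonormal eigenvectors of the estimate of W_y
    return Gamma, Yn @ Gamma              # Z_t = Y_t * Gamma, whose columns are grouped in Step 3
```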

We now state a proposition that demonstrates the assertion after (13); the proof is similar to Proposition 1 in [16], and we therefore omit it.

Proposition 1: (i) The orthogonal matrix $\Gamma_x$ in (11) can be taken as a block-diagonal orthogonal matrix with the same block structure as $W_x$. (ii) An orthogonal matrix $\Gamma_x$ satisfies (11) if and only if its columns are a permutation of the columns of a block-diagonal orthogonal matrix described in (i), provided that any two different blocks $W_{x,i}$ and $W_{x,j}$ do not share any eigenvalues.

From Proposition 1, we can see that the proposed method will not be able to separate $X_t^{(i)}$ and $X_t^{(j)}$ if $W_{x,i}$ and $W_{x,j}$ share one or more common eigenvalues. However, this does not rule out the possibility that an individual block $W_{x,j}$ has eigenvalues with multiplicity greater than one.

2.2 Permutation

2.2.1 Permutation rule

According to the discussion in Section 2.1, $\hat{A}$ is a permutation of the columns of $\hat{\Gamma}_y$, and the permutation can be carried out by grouping the columns of $\hat{Z}_t \equiv Y_t\hat{\Gamma}_y$ into $q_1$ groups, where $q_1$ and the numbers of columns $l_j$ ($1 \le j \le q_1$) are unknown. Let $\hat{Z}_t = (\hat{z}_{1t}, \ldots, \hat{z}_{qt})$, $Z_t = Y_t\Gamma_y = (z_{1t}, \ldots, z_{qt})$, and let $\Gamma_{i,j}(h)$ denote the cross-correlation matrix between the two series $\hat{z}_{it}$ and $\hat{z}_{jt}$ at lag $h$, that is, $\Gamma_{i,j}(h) = \mathrm{Corr}(\hat{z}_{i,t+h}, \hat{z}_{j,t})$. We say that $\hat{z}_{it}$ and $\hat{z}_{jt}$ are connected if the multiple null hypothesis

$$H_0: \Gamma_{i,j}(h) = 0 \quad \text{for any } h = 0, \pm 1, \pm 2, \ldots, \pm m \tag{15}$$

is rejected, where $m \ge 1$ is a prescribed integer. We should mention that the true $\Gamma_{i,j}(h)$ is not known, since $\hat{z}_{it}$ is itself an estimator of $z_{it}$, but it will be asymptotically equivalent to $\mathrm{Corr}(z_{i,t+h}, z_{j,t})$ so long as $\hat{\Gamma}_y$ is consistent for $\Gamma_y$. Given the structure of $W_x$, this consistency can be established under some regularity conditions, and therefore we also denote the true correlation by $\Gamma_{i,j}(h) = \mathrm{Corr}(z_{i,t+h}, z_{j,t})$ and its estimator by $\hat{\Gamma}_{i,j}(h) = \widehat{\mathrm{Corr}}(\hat{z}_{i,t+h}, \hat{z}_{j,t})$. The permutation in Step 3 of Section 2.1 can be performed as follows.

  1. Start with the $q$ groups, each containing exactly one column of $\hat{Z}_t$.

  2. Combine two groups into one if a connected pair is found across them.

  3. Repeat Step 2 above until all connected pairs are within one group; a minimal code sketch of this grouping is given below.
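A minimal sketch of this grouping rule, assuming the connected pairs have already been identified (for example, by the method of Section 2.2.2); the function name and the toy example are ours.

```python
def group_columns(q, connected_pairs):
    """Grouping rule above: start from q singleton groups and merge the two groups
    containing columns i and j whenever (i, j) is a connected pair."""
    groups = [{i} for i in range(q)]
    for i, j in connected_pairs:
        gi = next(g for g in groups if i in g)
        gj = next(g for g in groups if j in g)
        if gi is not gj:
            gi |= gj
            groups.remove(gj)
    return groups

# Hypothetical example with q = 6 (0-based column indices): the connected pairs
# [(0, 2), (2, 5), (1, 3)] yield the groups {0, 2, 5}, {1, 3}, and {4}.
print(group_columns(6, [(0, 2), (2, 5), (1, 3)]))
```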

We introduce below one way to identify the connected pairs of the transformed matrix Ẑt.

2.2.2 Maximum cross-correlation method

Similar to [16], one natural way to test the hypothesis $H_0$ in (15) is to use the maximum cross-correlation over all elements of $\Gamma_{i,j}(h)$ and over all lags between $-m$ and $m$:

$$\hat{L}_n(i,j) = \max_{|h| \le m}\ \max_{1 \le u, v \le p} \Big|\big(\hat{\Gamma}_{i,j}(h)\big)_{uv}\Big|, \tag{16}$$

where $\hat{\Gamma}_{i,j}(h)$ is the sample correlation matrix between $\hat{z}_{it}$ and $\hat{z}_{jt}$ at lag $h$ when the dimensions $p$ and $q$ are fixed, and is a block-wise thresholded sample correlation matrix, constructed in the same way as in [18], when $p$ and $q$ are moderately high. We reject $H_0$ for the pair $(\hat{z}_{it}, \hat{z}_{jt})$ if $\hat{L}_n(i,j)$ is greater than an appropriate threshold value.

For the $q_0 = q(q-1)/2$ pairs of columns of $\hat{Z}_t$, we propose a ratio-based method to identify those pairs for which $H_0$ will be rejected. We rearrange the $q_0$ obtained values $\hat{L}_n(i,j)$ in descending order: $\hat{L}_{(1)} \ge \cdots \ge \hat{L}_{(q_0)}$. Define

$$\hat{d} = \arg\max_{1 \le j < c_0 q_0} \hat{L}_{(j)}/\hat{L}_{(j+1)}, \tag{17}$$

where $c_0 \in (0, 1)$ is a prescribed constant. The pairs attaining the $\hat{d}$ largest values $\hat{L}_{(1)}, \ldots, \hat{L}_{(\hat{d})}$ are then treated as the connected pairs. Similar ideas can be found in [16, 19].
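The following numpy sketch puts (16) and (17) together: it computes lagged sample cross-correlation matrices between pairs of columns of the transformed series, ranks the pairs by $\hat{L}_n(i,j)$, and applies the ratio rule to decide how many of the top pairs to treat as connected. It assumes $\hat{Z}_t$ is stored as an array of shape (n, p, q) and uses plain (unthresholded) sample correlations; the function names are ours.

```python
import numpy as np
from itertools import combinations

def cross_corr(Z, i, j, h):
    """Sample lag-h cross-correlation matrix Corr(z_{i,t+h}, z_{j,t}) between columns i and j of Z (n, p, q)."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    cov = Zc[h:, :, i].T @ Zc[:n - h, :, j] / (n - h)
    si, sj = Zc[:, :, i].std(axis=0), Zc[:, :, j].std(axis=0)
    return cov / np.outer(si, sj)

def connected_pairs(Z, m=10, c0=0.75):
    """Rank all q(q-1)/2 column pairs by L_n(i,j) in (16) and keep the top d pairs chosen by (17)."""
    q = Z.shape[2]
    pairs = list(combinations(range(q), 2))
    L = np.array([max(max(np.abs(cross_corr(Z, i, j, h)).max(),
                          np.abs(cross_corr(Z, j, i, h)).max())  # lag -h enters via the swapped pair
                      for h in range(m + 1))
                  for i, j in pairs])
    order = np.argsort(L)[::-1]
    Ls = L[order]
    jmax = max(1, int(np.ceil(c0 * len(pairs))) - 1)             # search over 1 <= j < c0 * q0
    d = int(np.argmax(Ls[:jmax] / Ls[1:jmax + 1])) + 1
    return [pairs[k] for k in order[:d]]
```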


3. Numerical results

3.1 Simulation

In this section, we illustrate the finite-sample properties of the proposed methodology using simulated data. We only study the performance of the column transformation in (2), since the row transformation in the second stage is essentially the same. The estimated $\hat{A}$ is an orthogonal matrix for the normalized model in which $\mathrm{Var}_r(Y_t) = \mathrm{Var}_r(X_t) = I_q$, so we should use $\hat{S}_{y,0}^{-1/2} A\, \hat{S}_{x,0}^{1/2}$ instead of $A$ when computing the estimation error, defined as

$$D\big(\mathcal{M}(H_1), \mathcal{M}(H_2)\big) = \sqrt{1 - \frac{1}{\min(r_1, r_2)}\,\mathrm{tr}\big(P_1 P_2\big)}, \tag{18}$$

where $H_i$ is a $p\times r_i$ matrix with $\mathrm{rank}(H_i) = r_i$ and $P_i = H_i (H_i^{T} H_i)^{-1} H_i^{T}$ for $i = 1, 2$. Let $\tilde{A} = \Sigma_y(0)^{-1/2} A\, \Sigma_x(0)^{1/2} \equiv (\tilde{A}_1, \ldots, \tilde{A}_{q_1})$ and $\Sigma_y(0)^{-1/2} A = (H_1, \ldots, H_{q_1})$. Since $\Sigma_x(0)^{1/2}$ is a block-diagonal matrix, it holds that $\mathcal{M}(H_j) = \mathcal{M}(\tilde{A}_j)$ for $1 \le j \le q_1$. Therefore, in the comparisons we only need to replace $A$ by $\hat{S}_{y,0}^{-1/2} A$.
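A minimal numpy sketch of the distance (18), assuming the two input matrices have full column rank (the pseudo-inverse guards against numerical rank deficiency); the function name is ours.

```python
import numpy as np

def space_distance(H1, H2):
    """D(M(H1), M(H2)) in (18): sqrt(1 - tr(P1 P2) / min(r1, r2)),
    where P_i = H_i (H_i^T H_i)^{-1} H_i^T projects onto the column space of H_i."""
    P1 = H1 @ np.linalg.pinv(H1.T @ H1) @ H1.T
    P2 = H2 @ np.linalg.pinv(H2.T @ H2) @ H2.T
    r = min(H1.shape[1], H2.shape[1])
    return float(np.sqrt(max(0.0, 1.0 - np.trace(P1 @ P2) / r)))
```

The distance equals 0 when one of the two column spaces contains the other and 1 when they are orthogonal; the averaged error (20) below is obtained by applying this function to each matched pair and averaging.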

Since the goal is to specify (via estimation) the $q_1$ linear spaces $\mathcal{M}(A_j)$, $j = 1, \ldots, q_1$, simultaneously, we first introduce the concept of a 'correct' specification. We call $\hat{A} = (\hat{A}_1, \ldots, \hat{A}_{\hat{q}_1})$ a correct specification for $A$ if (i) $\hat{q}_1 = q_1$, and (ii) $\mathrm{rank}(\hat{A}_j) = \mathrm{rank}(A_j)$ for $j = 1, \ldots, q_1$, after rearranging the order of $\hat{A}_1, \ldots, \hat{A}_{\hat{q}_1}$ (we still denote the rearranged sub-matrices by $\hat{A}_1, \ldots, \hat{A}_{\hat{q}_1}$ for simplicity of notation). When more than one $A_j$ have the same rank, we pair each of those $A_j$ with the $\hat{A}_j$ for which

$$D\big(\mathcal{M}(A_j), \mathcal{M}(\hat{A}_j)\big) = \min_{\mathrm{rank}(\hat{A}_i) = \mathrm{rank}(A_j)} D\big(\mathcal{M}(A_j), \mathcal{M}(\hat{A}_i)\big). \tag{19}$$

Note that a correct specification for A implies a structurally correct segmentation for Xt, which will be abbreviated as ‘correct segmentation’ hereafter. For a correct segmentation, we report the estimation error defined as

$$\bar{D}(\hat{A}, A) = \frac{1}{q_1}\sum_{j=1}^{q_1} D\big(\mathcal{M}(A_j), \mathcal{M}(\hat{A}_j)\big). \tag{20}$$

In addition to the correct segmentations, we also report, in all the following examples, the proportions of the near-complete (NC) segmentations, that is, those with $\hat{q}_1 = q_1 - 1$.

Example 1. We consider the model (2) with p=3 and q=6. The columns of Xt are generated as follows:

$$x_{it} = \eta^{(1)}_{t+i-1} \ (i = 1, 2, 3), \quad x_{it} = \eta^{(2)}_{t+i-4} \ (i = 4, 5), \quad \text{and} \quad x_{6t} = \eta^{(3)}_{t}, \tag{21}$$

where

$$\eta^{(j)}_t = \Phi_j \eta^{(j)}_{t-1} + \varepsilon^{(j)}_t - \Theta_j \varepsilon^{(j)}_{t-1}, \qquad j = 1, 2, 3. \tag{22}$$

The elements of $\Phi_j$ are drawn independently from $U(-3, 3)$ and then normalized as $0.9 \times \Phi_j/\|\Phi_j\|_2$ so that $\eta^{(j)}$ is stationary, and the elements of $\Theta_j$ are drawn independently from $U(-1, 1)$. Meanwhile, the elements of the transformation matrix $A$ are also drawn independently from $U(-3, 3)$. Thus $X_t$ consists of three independent sub-matrices with 3, 2, and 1 columns, respectively. In the experiments, we choose $c_0 = 0.75$ in (17) and $k_0 = 2$ in (8); other choices of $k_0$ give similar results. The sample sizes $n$ are 100, 200, 300, 400, 500, 1000, and 1500, and the number of replications is 500 for each case. The proportions of the correct, incorrect, and near-complete (NC) segmentations are reported in Table 1, which shows that the proposed method improves as the sample size increases. With a dimension of $pq = 18$, the performance is reasonably good even for a small sample size: the combined proportion of complete and near-complete segmentations exceeds 90% already at $n = 100$, indicating that substantial dimension reduction is achieved. We next study the estimation errors (20); the box plots of the errors for the complete segmentations are shown in Figure 1, from which we can again see that the accuracy improves as the sample size increases.
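For reference, a possible numpy implementation of the data-generating process (21)-(22) is sketched below. The innovation distribution is not specified above, so standard Gaussian innovations and a burn-in period are our own assumptions, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def varma11(n, p, burn=100):
    """Simulate the VARMA(1,1) process (22): eta_t = Phi eta_{t-1} + eps_t - Theta eps_{t-1},
    with Phi rescaled as 0.9 * Phi / ||Phi||_2 and the entries of Theta drawn from U(-1, 1)."""
    Phi = rng.uniform(-3, 3, (p, p))
    Phi = 0.9 * Phi / np.linalg.norm(Phi, 2)
    Theta = rng.uniform(-1, 1, (p, p))
    eps = rng.standard_normal((n + burn + 1, p))    # Gaussian innovations: an assumption, not stated above
    eta = np.zeros((n + burn + 1, p))
    for t in range(1, n + burn + 1):
        eta[t] = Phi @ eta[t - 1] + eps[t] - Theta @ eps[t - 1]
    return eta[burn + 1:]

def example1(n, p=3, q=6):
    """Columns of X_t follow (21): columns 1-3 are eta^(1) at leads 0, 1, 2, columns 4-5 are
    eta^(2) at leads 0, 1, and column 6 is eta^(3); then Y_t = X_t A^T with A ~ U(-3, 3) elementwise."""
    eta1, eta2, eta3 = varma11(n + 2, p), varma11(n + 1, p), varma11(n, p)
    X = np.zeros((n, p, q))
    for t in range(n):
        X[t] = np.column_stack([eta1[t], eta1[t + 1], eta1[t + 2],
                                eta2[t], eta2[t + 1], eta3[t]])
    A = rng.uniform(-3, 3, (q, q))
    return X @ A.T, X, A
```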

n                              100     200     300     400     500     1000    1500
Correct segmentation           0.608   0.668   0.748   0.806   0.864   0.948   0.980
Incorrect segmentation         0.392   0.332   0.252   0.294   0.246   0.052   0.020
NC segmentation with q̂1 = 2    0.298   0.306   0.250   0.190   0.136   0.052   0.020

Table 1.

The proportions of correct, incorrect, and near-complete (NC) segmentations in Example 1 ($p = 3$, $q = 6$).

Figure 1.

The boxplots of the estimation errors $\bar{D}(\hat{A}, A)$ in Example 1.

As a concrete example, we report the cross correlogram of $Y_t$ for one replication. We choose $m = 10$, and for $0 \le k \le m$ in Figure 2, the correlations between $y_{it}$ and $y_{jt}$ are computed by

$$\widehat{\mathrm{Corr}}(y_{i,t+k}, y_{j,t}) = \mathrm{diag}\big(\hat{\Sigma}_{y,i,i}(0)\big)^{-1/2}\,\hat{\Sigma}_{y,i,j}(k)\,\mathrm{diag}\big(\hat{\Sigma}_{y,j,j}(0)\big)^{-1/2}. \tag{23}$$

Figure 2.

Cross correlogram of the $y_{it}$ in Example 1, $n = 1500$.

From Figure 2, we can see that each of the columns of $Y_t$ is highly correlated with the others. We then apply our method to $Y_t$; Figure 3 depicts the cross correlogram of the transformed matrix $Z_t = Y_t\hat{\Gamma}_y$ (note that the scales of the y-axes are not the same). We can see from Figure 3 that the columns of $Z_t$ can be divided into three groups: {1, 3, 6}, {2, 4}, and {5}.

Figure 3.

Cross correlogram of the transformed series $\hat{z}_{it} = Y_t\hat{v}_i$ in Example 1, $n = 1500$, where $\hat{v}_i$ is the $i$-th column of $\hat{\Gamma}_y$. The components of $\hat{Z}_t$ can be segmented into 3 groups: {1,3,6}, {2,4}, and {5}.

Example 2. In this example, we slightly increase the dimension to $p = q = 6$, and the data-generating processes are the same as those in Example 1. We observe that the performance of the proposed method is not as good as that in Example 1 when the dimension is higher, with $pq = 36$. Nevertheless, similar conclusions can be drawn from Table 2: the proportion of complete segmentations increases as the sample size becomes larger, and the combined proportion of complete and near-complete segmentations remains high even for small sample sizes. The box plots of the estimation errors are similar to those in Example 1 and hence are not reported here. From this example, we can see that when the dimension is higher, the proposed method without thresholding may not work well; we illustrate this point further in the next example.

n                              100     200     300     400     500     1000    1500
Correct segmentation           0.504   0.620   0.648   0.670   0.732   0.766   0.780
Incorrect segmentation         0.496   0.380   0.352   0.330   0.268   0.234   0.220
NC segmentation with q̂1 = 2    0.354   0.320   0.312   0.308   0.264   0.230   0.210

Table 2.

The proportions of correct, incorrect, and near-complete (NC) segmentations in Example 2 ($p = 6$, $q = 6$).

Example 3. We consider model (2) with $p = q = 10$, so the dimension is $pq = 100$. The columns of $X_t$ are generated as follows:

$$x_{it} = \eta^{(1)}_{t+i-1} \ (i = 1, 2, 3, 4), \quad x_{it} = \eta^{(2)}_{t+i-5} \ (i = 5, 6, 7), \quad x_{it} = \eta^{(3)}_{t+i-8} \ (i = 8, 9), \quad \text{and} \quad x_{10,t} = \eta^{(4)}_{t}, \tag{24}$$

where

$$\eta^{(j)}_t = \Phi_j \eta^{(j)}_{t-1} + \varepsilon^{(j)}_t - \Theta_j \varepsilon^{(j)}_{t-1}, \qquad j = 1, 2, 3, 4, \tag{25}$$

and Φj and Θj are generated in the same way as Example 1. To fulfill a sparsity assumption, let

$$B_i = \begin{pmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{pmatrix}, \qquad A = \mathrm{diag}(B_1, B_2, B_3, B_4, B_5), \tag{26}$$

where $\theta_1 = \pi/5$, $\theta_2 = \pi/6$, $\theta_3 = \pi/7$, $\theta_4 = \pi/8$, and $\theta_5 = \pi/9$. Thus, $A$ is a sparse orthogonal transformation matrix. Since the covariance matrices of $X_t$ and $Y_t$ are both block-diagonal by the data-generating process, $\hat{S}_{y,0}^{-1/2} A\, \hat{S}_{x,0}^{1/2}$ is also block-diagonal and hence satisfies the sparsity assumption. If we apply our methodology to this model without thresholding the covariance matrices, the results are as reported in Table 3: the proportions of complete segmentations are quite low for all sample sizes and do not necessarily improve as the sample size increases. We therefore adopt the thresholding technique of [18], with the threshold chosen by the cross-validation method therein. Table 4 presents the proportions of the correct, incorrect, and near-complete segmentations for model (2) with $pq = 100$ dimensions; the thresholding technique works reasonably well even for small sample sizes. Figure 4 reports the boxplots of the estimation errors, from which we can see that the estimation is very accurate, especially when the sample size is large. Figure 5 displays the cross correlogram of $Y_t$ for one instance of the 500 replications, with the correlations calculated in the same way as in Example 1 but using thresholded estimators for the (auto)covariance matrices, and Figure 6 gives the correlogram of the transformed data matrix $Z_t = Y_t\hat{\Gamma}_y$. We can see from Figure 5 that all the columns of $Y_t$ are highly correlated, while those of $Z_t$ can be separated into four groups: {1,3,4,8}, {2,5,7}, {6,10}, and {9}. The threshold for the correlations is 0.39 according to the rule in [18], and a pair is treated as uncorrelated if its maximal correlation is below this threshold. Together with Tables 3 and 4 and Figures 4-6, these results show that substantial dimension reduction can be achieved using the proposed method.
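Below is a brief sketch of the sparse rotation matrix in (26), together with a simple entrywise hard-threshold covariance estimator in the spirit of [18]. The simulations choose the threshold level by the cross-validation rule of [18], which is not reproduced here, so the level lam is simply passed in; the function names are ours.

```python
import numpy as np
from scipy.linalg import block_diag

def sparse_A():
    """The block-diagonal orthogonal matrix A in (26), built from 2x2 rotations with angles pi/5, ..., pi/9."""
    blocks = []
    for k in range(5, 10):
        t = np.pi / k
        blocks.append(np.array([[np.cos(t), np.sin(t)],
                                [-np.sin(t), np.cos(t)]]))
    return block_diag(*blocks)          # a sparse 10 x 10 orthogonal matrix

def hard_threshold(S, lam):
    """Entrywise hard thresholding of a sample (auto)covariance matrix: off-diagonal entries
    with absolute value below lam are set to zero, as in the threshold estimators of [18]."""
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```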

n                              100     200     300     400     500     1000    1500
Correct segmentation           0.148   0.184   0.186   0.170   0.216   0.174   0.164
Incorrect segmentation         0.852   0.816   0.814   0.830   0.784   0.826   0.836
NC segmentation with q̂1 = 3    0.216   0.310   0.388   0.470   0.462   0.674   0.744

Table 3.

The proportions of correct, incorrect, and near-complete (NC) segmentations in Example 3 without thresholding ($p = 10$, $q = 10$).

n                              100     200     300     400     500     1000    1500
Correct segmentation           0.440   0.572   0.654   0.666   0.840   0.910   0.936
Incorrect segmentation         0.560   0.428   0.346   0.334   0.260   0.080   0.064
NC segmentation with q̂1 = 3    0.490   0.294   0.282   0.286   0.084   0.062   0.042

Table 4.

The proportions of correct, incorrect, and near-complete (NC) segmentations in Example 3 with thresholding ($p = 10$, $q = 10$).

Figure 4.

The boxplots of the estimation errors $\bar{D}(\hat{A}, A)$ in Example 3.

Figure 5.

Cross correlogram of the yit in Example 3, n= 1500.

Figure 6.

Cross correlogram of the transformed series $\hat{z}_{it} = Y_t\hat{v}_i$ in Example 3, $n = 1500$, where $\hat{v}_i$ is the $i$-th column of $\hat{\Gamma}_y$. The components of $\hat{Z}_t$ can be segmented into 4 groups: {1,3,4,8}, {2,5,7}, {6,10}, and {9}.


4. Conclusions

In this chapter, we proposed a new procedure to reduce the dimension of matrix-variate time series by seeking linear transformations that segment the matrix into many small sub-matrices, each of which is uncorrelated with the others both contemporaneously and serially; the sub-matrices can then be analyzed separately, which greatly reduces the number of parameters to be estimated. We proposed a two-step, more structured procedure to segment the rows and columns separately. When the dimensions are large in relation to the sample size, we assume the transformation matrices are sparse and use threshold estimators for the (auto)covariance matrices. Simulation studies suggest that our procedure works well in segmenting matrix-variate series, and it provides another option for researchers and practitioners in the toolbox of matrix-variate time series modeling.


Acknowledgments

This research was supported in part by the Start-up Fund granted by the “100 Talents Program” of Zhejiang University and the National Natural Science Foundation of China (NSFC, 12201558).


Additional information

An earlier version of this chapter was previously published as a preprint: Gao Z. Segmenting High-dimensional Matrix-valued Time Series via Sequential Transformations [Internet]. arXiv.org. 2020 [cited 2023 Sep 4]. Available from: https://arxiv.org/abs/2002.03382.

References

  1. Werner K, Jansson M, Stoica P. On estimation of covariance matrices with Kronecker product structure. IEEE Transactions on Signal Processing. 2008;56(2):478-491. DOI: 10.1109/TSP.2007.907834
  2. Xue Y, Yin X. Sufficient dimension folding for regression mean function. Journal of Computational and Graphical Statistics. 2014;23(4):1028-1043. DOI: 10.1080/10618600.2013.859619
  3. Li B, Kim MK, Altman N. On dimension folding of matrix- or array-valued statistical objects. The Annals of Statistics. 2010;38(2):1094-1121. DOI: 10.1214/09-AOS737
  4. Hung H, Wang CC. Matrix variate logistic regression model with application to EEG data. Biostatistics. 2012;14(1):189-202. DOI: 10.1093/biostatistics/kxs023
  5. Zhou H, Li L, Zhu H. Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association. 2013;108(502):540-552. DOI: 10.1080/01621459.2013.776499
  6. Zhou H, Li L. Regularized matrix regression. Journal of the Royal Statistical Society, Series B. 2014;76(2):463-483. DOI: 10.1111/rssb.12031
  7. Ding S, Cook RD. Matrix variate regressions and envelope models. Journal of the Royal Statistical Society, Series B. 2018;80(2):387-408. DOI: 10.1111/rssb.12247
  8. Gupta AK, Nagar DK. Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC; 2000. DOI: 10.1201/9780203749289
  9. Leng C, Tang CY. Sparse matrix graphical models. Journal of the American Statistical Association. 2012;107(499):1187-1200. DOI: 10.1080/01621459.2012.706133
  10. Yin J, Li H. Model selection and estimation in the matrix normal graphical model. Journal of Multivariate Analysis. 2012;107:119-140. DOI: 10.1016/j.jmva.2012.01.005
  11. Zhao J, Leng C. Structured lasso for regression with matrix covariates. Statistica Sinica. 2014;24:799-814. DOI: 10.5705/ss.2012.033
  12. Zhou S. Gemini: Graph estimation with matrix variate normal instances. The Annals of Statistics. 2014;42(2):532-562. DOI: 10.1214/13-AOS1187
  13. Walden A, Serroukh A. Wavelet analysis of matrix-valued time series. Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences. 2002;458(2017):157-179. DOI: 10.1098/rspa.2001.0866
  14. Wang D, Liu X, Chen R. Factor models for matrix-valued high-dimensional time series. Journal of Econometrics. 2019;208(1):231-248. DOI: 10.1016/j.jeconom.2018.09.013
  15. Gao Z, Tsay RS. A two-way transformed factor model for matrix-variate time series. Econometrics and Statistics. 2023;27:83-101. DOI: 10.1016/j.ecosta.2021.08.008
  16. Chang J, Guo B, Yao Q. Principal component analysis for second-order stationary vector time series. The Annals of Statistics. 2018;46(5):2094-2124. DOI: 10.1214/17-AOS1613
  17. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Review. 2009;51(3):455-500. DOI: 10.1137/07070111X
  18. Bickel PJ, Levina E. Covariance regularization by thresholding. The Annals of Statistics. 2008;36(6):2577-2604. DOI: 10.1214/08-AOS600
  19. Lam C, Yao Q. Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics. 2012;40(2):694-726. DOI: 10.1214/12-AOS970
