Open access peer-reviewed chapter

Online Canonical Polyadic Decomposition: Application of Fluorescence Tensors with Nonnegative Orthogonality and Sparse Constraint

Written By

Isaac Wilfried Sanou, Xavier Luciani, Roland Redon and Stéphane Mounier

Reviewed: 23 January 2023 Published: 10 July 2024

DOI: 10.5772/intechopen.110123

From the Edited Volume

Optimization Algorithms - Classics and Recent Advances

Edited by Mykhaylo Andriychuk and Ali Sadollah


Abstract

The canonical polyadic decomposition (CPD) is now widely used in signal processing to decompose multi-way arrays. In many applications, it is important to add constraints in order to converge quickly to an optimal solution. In contrast to the classical setting, we focus here on online CPD, where the number of relevant factors is usually unknown and can vary with time. We propose two algorithms to compute the online CPD based on sparse dictionary learning. We also introduce an application example in environmental sciences and evaluate the performance of the proposed approaches on real data in this context.

Keywords

  • third order tensor decomposition
  • online tensor decomposition
  • PARAFAC
  • rank variation
  • fluorescence spectroscopy

1. Introduction

Canonical polyadic decomposition (CPD), also known as PARAllel FACtor analysis (PARAFAC), is used in many fields such as psychometrics [1], data mining [2], neuroscience [3], chemometrics [4], telecommunications [5], computer vision [6], and biomedical image processing [7].

The data from these fields can be arranged in a multidimensional data array (or tensor). The CPD then decomposes this array into factors (whose number, the rank of the decomposition, depends on the application) that can be interpreted by the user.

CPD algorithms can be grouped into three main classes of methods:

  • Direct methods. These methods are based on algebraic computation such as Direct TriLinear Decomposition (DTLD) [8], SEmi-algebraic framework for approximate CP decompositions via SImultaneous matrix diagonalization (SECSI) [9], and DIrect AlGorithm for canonical polyadic decomposition (DIAG) [10].

  • Alternating methods. These methods are based on least squares; we can cite alternating least squares (ALS) [1], hierarchical alternating least squares (HALS) [11, 12, 13], and the alternating direction method of multipliers (ADMM) [14].

  • Descent methods. These methods are based on traditional gradient descent. We can cite some algorithms using stochastic gradient descent [15, 16, 17] or Nadam optimizer [18].

All these algorithms assume that the rank of the decomposition, namely the relevant number of factors, is known; in practice, this is rarely the case. Indeed, this value is unknown, so only its order of magnitude is estimated, while a bad choice can degrade the factor estimates [19]. Traditionally, two opposing approaches are used to address this situation.

  • Rank estimation. Rank estimation consists in estimating the appropriate CPD rank before the decomposition. Among these methods, we can cite the CORe CONsistency DIAgnostic (CORCONDIA) [20], split-half validation [21], or AutoTen [22]. However, these approaches do not always allow one to decide clearly between several possible rank values.

  • Overfactoring. In this method, an overestimated rank value is chosen, i.e., one higher than the actual rank, and the CPD algorithms are designed to produce the additional factors with zero contribution [19]. Overfactoring is thus, by contrast, an a posteriori rank estimation method, because the appropriate rank value is inferred from the CPD itself.

So far, we have discussed offline CPD. In this chapter, however, we are interested in online CPD, where the data grow with time and the decomposition must follow this growth [23].

In the context of fluorescence data and environmental sciences, online CPD can be summarized as follows:

  • A three-way array is used. This tensor is built by concatenating new matrices along its last mode. At each time interval, we obtain a tensor called a sub-tensor, as shown in Figure 1 (see the code sketch after Figure 1).

  • In practice, some components, and hence the rank of the decomposition, can change from one sub-tensor to another. This phenomenon corresponds to the appearance and/or disappearance of components. We speak of rank variation, but these variations are unknown.

  • The goal of online CPD in this context is to update the CPD factors at each new time interval using the factors estimated previously, without performing the CPD of the whole tensor.

Figure 1.

Example of a sub-tensor.
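To make this construction concrete, the following minimal numpy sketch (all shapes illustrative; the 21 × 36 EEM grid is borrowed from the experiment of Section 4) grows a fluorescence tensor by stacking newly acquired EEMs along the last mode:

```python
import numpy as np

I, J = 21, 36                      # excitation x emission grid (illustrative)
rng = np.random.default_rng(0)

def acquire_eems(k):
    """Stand-in for k newly measured nonnegative EEMs of size I x J."""
    return rng.random((I, J, k))

T = acquire_eems(5)                              # first sub-tensor, shape (21, 36, 5)
T = np.concatenate([T, acquire_eems(3)], axis=2) # the tensor grows with time
print(T.shape)                                   # (21, 36, 8)
```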

In the literature, several online CPD algorithms exist. For example, in [23], the authors introduced two adaptive algorithms: simultaneous diagonalization tracking (SDT), which tracks the SVD of the unfolded tensor, and recursive least squares tracking (RLST). In [24], a grid-based tensor factorization algorithm (GridTF) is introduced for large tensors. In [25], the authors proposed OnlineCP, based on ALS. In [26], the authors introduced an online Tucker decomposition with a fixed rank. This list is not exhaustive, and more online algorithms can be found in [27].

In the rest of the chapter, we propose an approach based on sparse dictionary learning to compute the CPD with sparsity, nonnegativity, and orthogonality constraints. These constraints make it possible to handle unknown rank variations over time and improve convergence. Considering the most general case, we make no assumptions about these variations. The goal of the proposed approach is to combine dictionary learning with the LASSO [28] in a simple way suited to nonnegative online CPD. The factors of the two fixed modes are selected from two dictionaries that are learned and updated throughout the online process. The appropriate number of vectors (i.e., the CPD rank) is selected through a sparsity constraint on the two atom matrices. We derive from this solution three different algorithms (one for offline CPD and two for online CPD). They differ in the way the dictionaries are updated from one sub-tensor to another. We then propose a semi-controlled experiment to obtain real online fluorescence data with rank variations, and we evaluate our approaches against the state of the art.

Notations are introduced in Section 2 while recalling some basic definitions and the online CPD problem in the particular context of fluorescence spectroscopy and environmental sciences. In Section 3, the proposed approach is described in the form of three algorithms for computing the nonnegative CPD of a third-order tensor. Two of these algorithms were already presented in a previous conference paper [18] and a journal paper [27]. In Section 4, an experimental design is proposed to evaluate the approaches; results and comparisons with a reference approach are provided and discussed. Section 5 concludes.


2. Problem formulation

We denote tensors and sub-tensors by the letter T. A tensor T of size I×J×K gathers K matrices of size I×J along its last mode.

2.1 Offline CPD: Example of a fluorescence tensor

In environmental sciences, and especially for fluorescence data, third-order tensors are the most common. The CPD of such a tensor can be used to characterize fluorescent dissolved organic matter (FDOM) in natural water samples [21, 27, 29, 30, 31] (see Figure 2).

Figure 2.

Example of the rank 4 CPD of the fluorescence tensor [18, 27].

Fluorescence data tensors are sets of 2D signals called excitation and emission matrices (EEMs) of fluorescence measured from a set of samples. Each sample is a mixture of an unknown number of fluorescent components (fluorophores), and each entry of an EEM corresponds to the fluorescence intensity of one sample at a given couple of excitation and emission wavelengths. At low concentrations, the nonlinear model based on the Beer–Lambert law can be linearized, and it then follows the CPD model [32]. The CPD can thus recover the individual emission and excitation spectra of the fluorophores present in the different samples, along with their respective contributions. In applications like ours, the decomposition factors have a physical meaning and are known to be nonnegative. This constraint allows better convergence, especially when the columns of the factors are collinear. Nonnegativity can be imposed on the factors during optimization by projecting the values of each factor matrix onto R+ [33]. This projection method sometimes shows its limits because of scale changes; in [34], the authors propose to solve this problem by introducing two factors, ϵ and θ11, which multiply each factor matrix. Nonnegativity can also be imposed through the choice of the optimization method [14, 17, 35] or by an exponential change of variable [31]. In the literature, the projection method is the most widely used, as in [14, 17, 18, 33, 36, 37]. In signal and data processing, the nonnegative CPD is usually used as a mathematical model to fit a data tensor T of size (I, J, K):

\[
\forall\, i,j,k,\quad \mathcal{T}_{i,j,k} \approx \hat{\mathcal{T}}_{i,j,k}(\mathbf{A},\mathbf{B},\mathbf{C}) = \sum_{r=1}^{R} A_{ir}\,B_{jr}\,C_{kr} \tag{1}
\]

The matrices \(\mathbf{A}\in\mathbb{R}_+^{I\times R}\), \(\mathbf{B}\in\mathbb{R}_+^{J\times R}\), and \(\mathbf{C}\in\mathbb{R}_+^{K\times R}\) are named the factor matrices. R is the CPD rank. Each column of the factor matrices A, B, and C defines a CPD factor. If the CPD factors have a physical meaning, we can distinguish two values of R:

  • The rank of the tensor T (the usual tensor rank).

  • The maximal value of R for which all the factors have a physical meaning. We name this value the physical rank of T. It is usually much smaller than the tensor rank.

In the considered application, these matrices are usually full column rank so that the Kruskal condition [38] is fulfilled and guarantees that the decomposition is unique up to trivial scaling and permutation indeterminacy.
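As an illustration of model (1), the sketch below (illustrative shapes, numpy assumed) builds a rank-R tensor from its factor matrices through the classical mode-1 unfolding identity T(1) = A(C ⊙ B)ᵀ and checks it against a direct element-wise evaluation:

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Khatri-Rao product of X (K x R) and Y (J x R) -> (K*J x R)."""
    K, R = X.shape
    J = Y.shape[0]
    return (X[:, None, :] * Y[None, :, :]).reshape(K * J, R)

rng = np.random.default_rng(1)
I, J, K, R = 10, 8, 6, 4
A, B, C = (rng.random((d, R)) for d in (I, J, K))

T1 = A @ khatri_rao(C, B).T                  # mode-1 unfolding, shape (I, J*K)
T = T1.reshape(I, K, J).transpose(0, 2, 1)   # fold back to (I, J, K)

# Sanity check against the element-wise definition of Eq. (1).
assert np.allclose(T, np.einsum('ir,jr,kr->ijk', A, B, C))
```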

The Euclidean (Frobenius) distance is the most commonly used measure of the reconstruction error. With the nonnegativity constraint, the offline CPD problem is thus given by

\[
\min F(\mathbf{A},\mathbf{B},\mathbf{C}) = \left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\right\|_F^2 \quad \text{s.t.}\quad \mathbf{A}\ge 0,\ \mathbf{B}\ge 0,\ \mathbf{C}\ge 0 \tag{2}
\]

2.2 Fluorescence tensor in online CPD

In the online case, the tensor T can be described as follows:

  • It is a composition of sub-tensors Tn of size I×J×Kn acquired at successive time intervals.

  • Two consecutive sub-tensors may have no matrices in common; we call this case the partition with no overlapping.

  • In the opposite case, we speak of the overlapping case. As shown in Figure 3, two consecutive sub-tensors then share some matrices, i.e., Tn also contains the last η EEMs of Tn−1 (see the partitioning sketch after Figure 3). Sub-tensor overlapping should help the factor estimation, but the online decomposition then becomes slower because we end up with a larger number of sub-tensors.

  • In a natural marine environment, rank variations, i.e., fluorophore appearances and disappearances, can be caused by sea currents, pollution events, natural degradation, and so on.

Figure 3.

Example of a sub-tensor.
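The two kinds of partitions can be sketched as follows (a hedged illustration; the sizes mirror the experiment of Section 4, where K = 50 EEMs are split into four sub-tensors):

```python
import numpy as np

def partition_overlapping(T, size, overlap):
    """Split T (I x J x K) along the last mode into sub-tensors of `size`
    EEMs, two consecutive sub-tensors sharing `overlap` EEMs."""
    step = size - overlap
    K = T.shape[2]
    return [T[:, :, s:s + size] for s in range(0, K - size + 1, step)]

T = np.random.default_rng(2).random((21, 36, 50))

# 50% overlapping case: four sub-tensors of 20 EEMs, 10 EEMs in common.
subs = partition_overlapping(T, size=20, overlap=10)

# No-overlapping case: T0 keeps 20 EEMs, then disjoint blocks of 10 EEMs.
subs_disjoint = [T[:, :, :20]] + [T[:, :, k:k + 10] for k in range(20, 50, 10)]
print(len(subs), [s.shape[2] for s in subs_disjoint])   # 4 [20, 10, 10, 10]
```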

The physical rank of sub-tensor Tn is denoted R̃n in this chapter; we recall that it is unknown. The online CPD problem with rank variation has been described in detail in [18, 27, 39].

In short, online CPD consists in solving Eq. (2) for each sub-tensor Tn under the nonnegativity constraint. We speak of NonNegative Online Canonical Polyadic Decomposition (NNOnline CPD). The factor matrices estimated at time tn−1 from the sub-tensor Tn−1 are used to initialize the decomposition of Tn in order to reduce the computational cost.


3. Algorithms for computing the nonnegative online canonical polyadic decomposition with sparsity and orthogonality constraint

3.1 NNCPD for the first tensor acquired at time t0

The first sub-tensor, acquired at time t0, is denoted T0. At this time, we do not know the value of R̃0, but we suppose that \(\forall n,\ \tilde{R}_n \ll R\). We aim to rewrite the matrices A and B as the products of dictionaries DA and DB with sparse atom matrices VA and VB:

\[
\mathbf{A} = \mathbf{D}_A \mathbf{V}_A \quad \text{and} \quad \mathbf{B} = \mathbf{D}_B \mathbf{V}_B \tag{3}
\]

with \(\mathbf{D}_A \in \mathbb{R}^{I\times R}\) and \(\mathbf{D}_B \in \mathbb{R}^{J\times R}\), respectively, and \(\mathbf{V}_A, \mathbf{V}_B \in \mathbb{R}^{R\times R}\). We expect DA and DB to contain the R̃ true components in their columns, the remaining R − R̃ columns having no physical meaning, while VA and VB have R − R̃ null columns, their other columns forming a generalized permutation matrix. For instance, for R = 3 and R̃0 = 2, we could ideally have

\[
\mathbf{A} = \underbrace{\begin{pmatrix} 9 & 7 & 2 \\ 1 & 4 & 2 \\ 2 & 1 & 2 \end{pmatrix}}_{\mathbf{D}_A} \cdot \underbrace{\begin{pmatrix} 0.8 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0.9 & 0 \end{pmatrix}}_{\mathbf{V}_A} = \begin{pmatrix} 7.2 & 1.8 & 0 \\ 0.8 & 1.8 & 0 \\ 1.6 & 1.8 & 0 \end{pmatrix}
\]
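This toy example is easy to check numerically; the minimal sketch below (numpy assumed) reproduces it and estimates R̃0 by counting the non-null columns of A, as discussed next:

```python
import numpy as np

DA = np.array([[9., 7., 2.],
               [1., 4., 2.],
               [2., 1., 2.]])
VA = np.array([[0.8, 0.0, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.9, 0.0]])
A = DA @ VA                       # reproduces the matrix of the example above

def estimated_physical_rank(M, tol=1e-8):
    """Count the non-null columns of M: estimate of the physical rank."""
    return int(np.sum(np.linalg.norm(M, axis=0) > tol))

print(estimated_physical_rank(A))   # 2, i.e. the expected value of R~0
```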

We can thus easily estimate R̃0 by counting the number of non-null columns of A or B, as in the sketch above. A similar problem has been studied in depth by Cohen and Gillis in [40], but for the case in which the dictionary is known and the atom matrix is a selection matrix. To obtain zero columns in the atom matrices, we promote sparsity on VA and VB; to this end, we minimize their L1,1 norms. The sparsity constraint is generally used for sparse tensors or to handle an overestimated rank, as mentioned above; here it serves the latter purpose. The constraint can be applied to the factor matrices in the form of a regularization term:

\[
\min_{x}\ \underbrace{F(x)}_{\text{function}} + \underbrace{R(x)}_{\text{regularization}} \tag{4}
\]

Here, F(x) corresponds to Eq. (2) and is the data-fidelity term. R(x) is the regularization term on the matrices A, B, and C.

The L1,1 norm is classically used to promote sparsity [41]; indeed, it is more tractable than the L0,0 norm. It has been used in similar contexts in [17, 35, 36, 40]. We can thus rewrite our problem as a LASSO regression problem by adding two penalty terms to the cost function (see Appendix A).

For T=T0, we solve

\[
\min F_1(\mathbf{D}_A,\mathbf{V}_A,\mathbf{D}_B,\mathbf{V}_B,\mathbf{C}) \quad \text{s.t.}\quad \mathbf{D}_A,\mathbf{V}_A,\mathbf{D}_B,\mathbf{V}_B,\mathbf{C} \ge 0,
\]
\[
\text{where}\quad F_1 = \frac{1}{2}\left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{D}_A\mathbf{V}_A,\ \mathbf{D}_B\mathbf{V}_B,\ \mathbf{C})\right\|_F^2 + \alpha\|\mathbf{V}_A\|_{1,1} + \alpha\|\mathbf{V}_B\|_{1,1} \tag{5}
\]

α > 0 is a penalty coefficient.
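For reference, the cost F1 of Eq. (5) can be evaluated as follows (a minimal sketch; on nonnegative matrices the L1,1 norm reduces to the sum of the entries):

```python
import numpy as np

def F1(T, DA, VA, DB, VB, C, alpha):
    """Cost of Eq. (5): quadratic data-fit term plus L1,1 penalties
    on the atom matrices VA and VB."""
    That = np.einsum('ir,jr,kr->ijk', DA @ VA, DB @ VB, C)
    fit = 0.5 * np.sum((T - That) ** 2)
    return fit + alpha * (np.abs(VA).sum() + np.abs(VB).sum())
```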

This approach is named sparse nonnegative CPD (SNNCPD). We resort to a stochastic gradient descent (SGD) algorithm called Nadam [42, 43] to solve Eq. (5); details of the Nadam algorithm are given in Appendix B. To ensure the nonnegativity of the matrix entries, all the elements of DA, DB, C, VA, and VB are projected onto R+ at each iteration [44]. On this domain the L1,1 norm is differentiable, and the gradients with respect to the different variables are given by

\[
\begin{aligned}
\frac{\partial F_1}{\partial \mathbf{V}_A} &= -\mathbf{D}_A^{T}\left(\mathbf{T}_{(1)}-\mathbf{D}_A\mathbf{V}_A\mathbf{L}_1^{T}\right)\mathbf{L}_1 + \alpha \mathbf{1}_{R,R}\\
\frac{\partial F_1}{\partial \mathbf{V}_B} &= -\mathbf{D}_B^{T}\left(\mathbf{T}_{(2)}-\mathbf{D}_B\mathbf{V}_B\mathbf{L}_2^{T}\right)\mathbf{L}_2 + \alpha \mathbf{1}_{R,R}\\
\frac{\partial F_1}{\partial \mathbf{D}_A} &= -\left(\mathbf{T}_{(1)}-\mathbf{D}_A\mathbf{V}_A\mathbf{L}_1^{T}\right)\mathbf{L}_1\mathbf{V}_A^{T}\\
\frac{\partial F_1}{\partial \mathbf{D}_B} &= -\left(\mathbf{T}_{(2)}-\mathbf{D}_B\mathbf{V}_B\mathbf{L}_2^{T}\right)\mathbf{L}_2\mathbf{V}_B^{T}\\
\frac{\partial F_1}{\partial \mathbf{C}} &= -\left(\mathbf{T}_{(3)}-\mathbf{C}\mathbf{L}_3^{T}\right)\mathbf{L}_3
\end{aligned} \tag{6}
\]

Matrices T(1), T(2), and T(3) are obtained by unfolding the tensor T with respect to the first, second, and third modes, respectively, and \(\mathbf{L}_1 = \mathbf{C}\odot(\mathbf{D}_B\mathbf{V}_B)\), \(\mathbf{L}_2 = \mathbf{C}\odot(\mathbf{D}_A\mathbf{V}_A)\), and \(\mathbf{L}_3 = (\mathbf{D}_B\mathbf{V}_B)\odot(\mathbf{D}_A\mathbf{V}_A)\). The matrices VA and VB are initialized as identity matrices; DA, DB, and C are initialized with nonnegative random values. The different steps of SNNCPD are summarized in Algorithm 1.
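As an illustration, the first gradient of Eq. (6) can be computed as below (a sketch under the unfolding convention T(1)[i, kJ + j] = T[i, j, k]; the Nadam update of Appendix B, followed by the projection onto R+, would then be applied to each variable):

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Khatri-Rao product."""
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

def grad_VA(T, DA, VA, DB, VB, C, alpha):
    """Gradient of F1 with respect to VA (first line of Eq. (6)),
    with T(1) the mode-1 unfolding and L1 = C (Khatri-Rao) DB VB."""
    I, J, K = T.shape
    T1 = T.transpose(0, 2, 1).reshape(I, K * J)   # mode-1 unfolding
    L1 = khatri_rao(C, DB @ VB)
    R = VA.shape[0]
    return -DA.T @ (T1 - DA @ VA @ L1.T) @ L1 + alpha * np.ones((R, R))

# One projected descent step (Nadam in the chapter, plain gradient here):
# VA = np.maximum(VA - step * grad_VA(T, DA, VA, DB, VB, C, alpha), 0.0)
```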

Algorithm 1. Sparse NonNegative Canonical Polyadic Decomposition (SNNCPD): initialization step

  • Input: T, R overestimated

while a convergence criterion is not reached do

Update the dictionaries DA, DB, and C using the Nadam optimizer.

Update the atoms VA and VB using the Nadam optimizer.

end while

  • Output: (DA, DB, C, VA, VB)

3.2 Online step: NNCPD for the sub-tensors acquired at times tn with n > 0

In this step, we assume that the factor matrices An−1 and Bn−1 have been well estimated.

3.2.1 Algorithm 1

In this approach, we allow some modification of the factors estimated at time tn−1 through linear combinations, as explained below. We first replace all the null columns of An−1 and Bn−1 with columns of random numbers drawn from the standard normal distribution. We then look for An and Bn as

\[
\mathbf{A}_n = \mathbf{U}_A\mathbf{A}_{n-1}\mathbf{V}_A \quad \text{and} \quad \mathbf{B}_n = \mathbf{U}_B\mathbf{B}_{n-1}\mathbf{V}_B \tag{7}
\]

The matrices UA and UB are square, with sizes I and J, respectively. Here, UAAn−1 and UBBn−1 are our dictionaries, while VA and VB are still sparse atom matrices. The optimization problem therefore becomes

\[
\min F_2(\mathbf{U}_A,\mathbf{V}_A,\mathbf{U}_B,\mathbf{V}_B,\mathbf{C}) \quad \text{s.t.}\quad \mathbf{U}_A,\mathbf{U}_B,\mathbf{C},\mathbf{V}_A,\mathbf{V}_B \ge 0,
\]
\[
\text{with}\quad F_2 = \frac{1}{2}\left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{U}_A\mathbf{A}_{n-1}\mathbf{V}_A,\ \mathbf{U}_B\mathbf{B}_{n-1}\mathbf{V}_B,\ \mathbf{C})\right\|_F^2 + \alpha\|\mathbf{V}_A\|_{1,1} + \alpha\|\mathbf{V}_B\|_{1,1} \tag{8}
\]

We call this algorithm online sparse and nonnegative CPD (OSNCPD). The different steps are summarized in Algorithm 2; most details can be found in [18, 27].

Algorithm 2. Online Sparse and NonNegative CPD

  • STEP 1: Initialization phase

Input: T0, R overestimated.

Solve Eq. (5) with T=T0.

Compute matrices A0 and B0 using Eq. (3).

Output: A0,B0 and C0

  • STEP 2: Online phase at time tn with n>0

Input: Tn, An−1, and Bn−1.

Fill the null columns of An−1 and Bn−1 with random numbers and solve Eq. (8).

Update matrices An and Bn using Eq. (7).

Output: (An, Bn, Cn, and R̃n)

  • STEP 3: Return to Step 2 with n = n + 1

The advantage of this approach is therefore its flexibility. Its drawback is that the matrices to be optimized are larger, which makes the algorithm more expensive in time.
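The online step of OSNCPD can be sketched as follows (hypothetical helper names; the optimization of UA, VA, UB, VB, and C itself proceeds as in the initialization phase):

```python
import numpy as np

rng = np.random.default_rng(3)

def refresh_null_columns(M, tol=1e-8):
    """Replace the null columns of the previous factor matrix by columns of
    standard-normal draws, as in Step 2 of Algorithm 2 (the nonnegativity
    projection applied during optimization keeps the iterates in R+)."""
    M = M.copy()
    null = np.linalg.norm(M, axis=0) <= tol
    M[:, null] = rng.standard_normal((M.shape[0], int(null.sum())))
    return M

# A_prev = refresh_null_columns(A_prev)
# ... optimize U_A, V_A (and the B-mode counterparts, C) on the new sub-tensor ...
# A_n = U_A @ A_prev @ V_A        # Eq. (7)
```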

3.2.2 Algorithm 2

Starting from Eq. (7) and the cost function (8), the optimization becomes costly because of the dictionaries UA and UB: no constraint is applied to these matrices to restrict the search space. In this approach, we therefore apply an orthogonality constraint on these dictionaries. An orthogonality constraint on the factors is also useful in certain applications [26, 45]. It can be applied to the factors through Eq. (4): if, for example, A must be orthogonal, then \(\mathbf{I}_A = \mathbf{A}^T\mathbf{A}\). We therefore seek to minimize

\[
\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\right\|_F^2 + \left\|\mathbf{A}^T\mathbf{A}-\mathbf{I}_A\right\|_F + \left\|\mathbf{B}^T\mathbf{B}-\mathbf{I}_B\right\|_F + \left\|\mathbf{C}^T\mathbf{C}-\mathbf{I}_C\right\|_F \tag{9}
\]

To impose the orthogonality constraint on the dictionaries for T = Tn, the cost function becomes

\[
\min E_2(\mathbf{U}_{nA},\mathbf{V}_{nA},\mathbf{U}_{nB},\mathbf{V}_{nB},\mathbf{C}),\quad \text{where}
\]
\[
E_2 = \frac{1}{2}\left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{U}_{nA}\mathbf{A}_{n-1}\mathbf{V}_{nA},\ \mathbf{U}_{nB}\mathbf{B}_{n-1}\mathbf{V}_{nB},\ \mathbf{C})\right\|_F^2 + \alpha\|\mathbf{V}_{nA}\|_{1,1} + \alpha\|\mathbf{V}_{nB}\|_{1,1} + \left\|\mathbf{I}_A-\mathbf{U}_{nA}^T\mathbf{U}_{nA}\right\|_F^2 + \left\|\mathbf{I}_B-\mathbf{U}_{nB}^T\mathbf{U}_{nB}\right\|_F^2 \tag{10}
\]

The gradient of the cost Eq. (10) is given by

With \(\mathbf{Z}_A = \mathbf{C}\odot(\mathbf{U}_{nB}\mathbf{B}_{n-1}\mathbf{V}_{nB})\), \(\mathbf{Z}_B = \mathbf{C}\odot(\mathbf{U}_{nA}\mathbf{A}_{n-1}\mathbf{V}_{nA})\), and \(\mathbf{Z}_C = (\mathbf{U}_{nB}\mathbf{B}_{n-1}\mathbf{V}_{nB})\odot(\mathbf{U}_{nA}\mathbf{A}_{n-1}\mathbf{V}_{nA})\):
\[
\begin{aligned}
\frac{\partial E_2}{\partial \mathbf{V}_{nA}} &= -(\mathbf{U}_{nA}\mathbf{A}_{n-1})^{T}\left(\mathbf{T}_{(1)}-\mathbf{U}_{nA}\mathbf{A}_{n-1}\mathbf{V}_{nA}\mathbf{Z}_A^{T}\right)\mathbf{Z}_A + \alpha\mathbf{1}_{R,R}\\
\frac{\partial E_2}{\partial \mathbf{V}_{nB}} &= -(\mathbf{U}_{nB}\mathbf{B}_{n-1})^{T}\left(\mathbf{T}_{(2)}-\mathbf{U}_{nB}\mathbf{B}_{n-1}\mathbf{V}_{nB}\mathbf{Z}_B^{T}\right)\mathbf{Z}_B + \alpha\mathbf{1}_{R,R}\\
\frac{\partial E_2}{\partial \mathbf{U}_{nA}} &= -\left(\mathbf{T}_{(1)}-\mathbf{U}_{nA}\mathbf{A}_{n-1}\mathbf{V}_{nA}\mathbf{Z}_A^{T}\right)\mathbf{Z}_A(\mathbf{A}_{n-1}\mathbf{V}_{nA})^{T} - 4\,\mathbf{U}_{nA}\left(\mathbf{I}_A-\mathbf{U}_{nA}^{T}\mathbf{U}_{nA}\right)\\
\frac{\partial E_2}{\partial \mathbf{U}_{nB}} &= -\left(\mathbf{T}_{(2)}-\mathbf{U}_{nB}\mathbf{B}_{n-1}\mathbf{V}_{nB}\mathbf{Z}_B^{T}\right)\mathbf{Z}_B(\mathbf{B}_{n-1}\mathbf{V}_{nB})^{T} - 4\,\mathbf{U}_{nB}\left(\mathbf{I}_B-\mathbf{U}_{nB}^{T}\mathbf{U}_{nB}\right)\\
\frac{\partial E_2}{\partial \mathbf{C}} &= -\left(\mathbf{T}_{(3)}-\mathbf{C}\mathbf{Z}_C^{T}\right)\mathbf{Z}_C
\end{aligned} \tag{11}
\]

The advantage of this approach is its speed of convergence, since the search space for the dictionaries is reduced. For example, if An is similar to An−1, the matrix UnA is a diagonal matrix. We call this algorithm orthogonality online sparse CPD (OOSCPD). Its steps are otherwise similar to those of Algorithm 2.
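The orthogonality penalty of Eq. (10) and its gradient (the last terms of Eq. (11)) are straightforward to implement; the sketch below also checks the analytic gradient against a finite difference:

```python
import numpy as np

def orth_penalty(U):
    """Penalty ||I - U^T U||_F^2 pushing the dictionary U towards orthogonality."""
    G = np.eye(U.shape[1]) - U.T @ U
    return np.sum(G ** 2)

def orth_penalty_grad(U):
    """Analytic gradient of the penalty: -4 U (I - U^T U)."""
    return -4.0 * U @ (np.eye(U.shape[1]) - U.T @ U)

# Finite-difference sanity check on one entry of a random square U.
U = np.random.default_rng(4).random((5, 5))
E = np.zeros_like(U); E[0, 1] = 1.0
eps = 1e-6
num = (orth_penalty(U + eps * E) - orth_penalty(U - eps * E)) / (2 * eps)
assert np.isclose(num, orth_penalty_grad(U)[0, 1], rtol=1e-4)
```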


4. Experiment for real fluorescence data acquisition

We designed an experiment to acquire real data. Four well-known fluorophores were injected into a reservoir at different time intervals under quasi-real conditions. In all, we acquired 50 matrices (EEMs) of size 21 × 36 online with a Hitachi F7000 spectrofluorimeter. The corresponding fluorescence tensor was partitioned into four successive sub-tensors (T0 to T3). Two kinds of partitions are considered. In the first partition (50% overlapping), each sub-tensor contains 20 EEMs, and two consecutive sub-tensors have ten EEMs in common. In the second partition (no overlapping, see Figure 3), the sub-tensors have no EEMs in common; the initialization tensor T0 contains 20 EEMs, and the other sub-tensors contain ten EEMs each. During the acquisition, an extra fluorophore appeared in all the EEMs due to the experimental device. A preliminary study showed that this extra fluorophore can be represented by a single additional factor in the CPD model. Thus, the rank of the sub-tensors can be 3, 4, or 5. We recall that SNNCPD is used for the sub-tensor T0 in the initialization phase.

4.1 Results and discussions

OSNCPD has already been compared with other online algorithms in [18, 27]. Here we compare the physical rank values estimated by the two online approaches (OSNCPD and OOSCPD) and by the NN-CPD algorithm proposed in [31] with the actual values. Results are reported in Tables 2 and 3 for the 50% overlapping case and the no-overlapping case, respectively. The penalty coefficient (α) used in our two algorithms and in NN-CPD is also given. NN-CPD overestimates the rank for the last sub-tensors in both cases, which means that it creates duplicate factors. In contrast, our approaches (OSNCPD and OOSCPD) find the correct rank.

Symbol: Definition
T, Tn: tensor, sub-tensor at time tn
A, Aᵀ, a, a: matrix, transposed matrix, column vector, scalar
1R,R: an R × R matrix of ones
IA: identity matrix with the size of matrix A
ℝ: set of real numbers
‖·‖F, ‖·‖1,1: Frobenius norm, L1,1 norm
A ≥ 0: all the elements of matrix A are nonnegative
R̃n, Rn: physical rank of the sub-tensor Tn, CPD rank of the sub-tensor
⊙: Khatri–Rao product

Table 1.

Main notations used in the paper.

Sub-tensor        0    1    2    3     α
True rank         3    4    5    5
NN-CPD            3    4    7    10    0.5
OSNCPD & OOSCPD   3    4    5    5     1

Table 2.

Rank estimation in the overlapping case.

When the null columns of Â and B̂ and the corresponding columns of Ĉ are removed, we can compare the factor matrices estimated by our algorithms with the true factors by means of normalized root mean squared errors:

\[
E_A = \frac{\|\mathbf{A}-\hat{\mathbf{A}}\|_F}{\|\mathbf{A}\|_F},\quad E_B = \frac{\|\mathbf{B}-\hat{\mathbf{B}}\|_F}{\|\mathbf{B}\|_F},\quad E_C = \frac{\|\mathbf{C}-\hat{\mathbf{C}}\|_F}{\|\mathbf{C}\|_F} \tag{12}
\]
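A minimal implementation of these error measures is given below (a sketch only: in practice, the estimated columns must first be matched to the true ones up to the permutation and scaling indeterminacies recalled in Section 2):

```python
import numpy as np

def normalized_error(X, Xhat):
    """Normalized root mean squared error of Eq. (12)."""
    return np.linalg.norm(X - Xhat) / np.linalg.norm(X)

def drop_null_columns(Ahat, Bhat, Chat, tol=1e-8):
    """Remove the null columns of Ahat and Bhat and the corresponding
    columns of Chat before comparison with the true factors."""
    keep = (np.linalg.norm(Ahat, axis=0) > tol) & (np.linalg.norm(Bhat, axis=0) > tol)
    return Ahat[:, keep], Bhat[:, keep], Chat[:, keep]
```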

Results are reported in Tables 4 and 5 for the 50% overlapping case and the no-overlapping case, respectively.

Sub-tensor        0    1    2    3     α
True rank         3    4    5    5
NN-CPD            3    4    5    10    0.1
OSNCPD & OOSCPD   3    4    5    5     0.5

Table 3.

Rank estimation in the no overlapping case.

                 Algorithm 1 (OSNCPD)          Algorithm 2 (OOSCPD)
Sub-tensor       0      1      2      3        0      1      2      3
Mean EA          0.17   0.19   0.19   0.20     0.17   0.18   0.20   0.16
Mean EB          0.14   0.15   0.15   0.16     0.14   0.16   0.14   0.14
Mean EC          0.25   0.25   0.16   0.20     0.25   0.21   0.16   0.18

Table 4.

Mean estimations errors in the overlapping case.

                 Algorithm 1 (OSNCPD)          Algorithm 2 (OOSCPD)
Sub-tensor       0      1      2      3        0      1      2      3
Mean EA          0.17   0.19   0.20   0.20     0.17   0.18   0.21   0.21
Mean EB          0.14   0.15   0.16   0.19     0.14   0.16   0.19   0.23
Mean EC          0.25   0.32   0.31   0.32     0.25   0.31   0.32   0.25

Table 5.

Mean estimations errors in the no overlapping case.

In addition, to give some physical meaning to these error values, which may seem high, we have plotted in Figure 4 the factors obtained with our approach on the sub-tensor T4 along with the true factors. We observe a good agreement between the true and estimated factors.

Figure 4.

Emission and excitation spectra of the fluorophores for T4. Top: emission spectra; bottom: excitation spectra. Red dots: estimated spectra; black lines: true spectra.

In the overlapping case, both algorithms give similar results for all sub-tensors, which can be explained by the fact that they use past information. These results are also close to those obtained on the initialization sub-tensor T0, meaning that the online phase has been performed correctly.

In the no-overlapping case, OSNCPD and OOSCPD also give good results although the sub-tensors do not share any information. The error on matrix C is higher in this case because the algorithms do not benefit from a priori information.


5. Conclusions

We have introduced two algorithms (OSNCPD and OOSCPD) for the online nonnegative canonical polyadic decomposition of sub-tensors, as well as an offline nonnegative algorithm with a sparsity constraint (SNNCPD). Our algorithms are based on dictionary learning and can deal with unknown rank variations. OOSCPD incorporates an orthogonality constraint and can be seen as an extension of OSNCPD, which incorporates nonnegativity and sparsity constraints for better convergence.

These algorithms are presented in the particular case of the nonnegative online CPD of third-order fluorescence tensors, but they are not limited to this application field and can easily be extended to higher-order tensors. A real online fluorescence spectroscopy experiment was conducted in the laboratory to validate our approach and compare it with state-of-the-art approaches. The proposed algorithms correctly follow the rank variations in most of the considered situations, contrary to the reference approaches. These encouraging results make it possible to plan, for example, monitoring in the natural environment in future work.


Conflict of interest

The authors declare no conflict of interest.


Appendix A

It is useful to mention that the CPD can incorporate different constraints depending on the field the data come from: for example, nonnegativity, orthogonality, or sparsity constraints, to name only the most common.

Nonnegativity constraint. A nonnegativity constraint is imposed in the CPD when the data to be processed are linked to physical quantities. We can cite the domain of imagery [46], where nonnegativity holds because optical signals are positive or zero. This is also the case in fluorescence spectroscopy, where the data from the sensor are nonnegative [4, 30, 31, 47]. Nonnegativity can be imposed on the factors during optimization by projecting the values of each factor matrix onto R+ [33]. This projection method sometimes shows its limits because of scale changes; in [34], the authors propose to remedy this problem by introducing two factors, ϵ and θ11, which multiply every factor matrix. Nonnegativity can also be imposed through the choice of the optimization method [14, 17, 35], or by an exponential [48] or square [31] change of variable.

Sparsity constraint. The sparsity constraint is generally used for sparse tensors or to handle an overestimated rank, as mentioned above. This constraint is imposed according to the intended goal, which in our case is handling the overestimated rank. The constraint can be applied to the factor matrices in the form of a regularization term:

\[
\min_{x}\ \underbrace{F(x)}_{\text{function}} + \underbrace{R(x)}_{\text{regularization}} \tag{13}
\]

Here, F(x) corresponds to \(\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\|_F^2\) and is the data-fidelity term. R(x) is the regularization term on the matrices A, B, and C.

In the literature, the L1,1 norm is the most used, as in [17, 35, 41, 46, 49]; it can be defined from Eq. (13) as follows:

\[
\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\right\|_F^2 + \|\mathbf{A}\|_{1,1} + \|\mathbf{B}\|_{1,1} + \|\mathbf{C}\|_{1,1} \tag{14}
\]

The L1,1 norm is conventionally used instead of the L0,0 norm to promote sparsity because it is easier to handle. To minimize the rank of a matrix, one can use the nuclear norm, which imposes a low rank [50, 51, 52] and can be defined from Eq. (13) as follows:

\[
\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\right\|_F^2 + \|\mathbf{A}\|_{*} + \|\mathbf{B}\|_{*} + \|\mathbf{C}\|_{*} \tag{15}
\]

In [53], the authors show that the mixed norm provides a better convex envelope of the rank function than the nuclear norm; it therefore makes it possible to minimize the rank of a matrix. The mixed norm is increasingly used in the literature, as in [54, 55, 56]. It can be defined from Eq. (13) as follows:

\[
\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \left\|\mathcal{T}-\hat{\mathcal{T}}(\mathbf{A},\mathbf{B},\mathbf{C})\right\|_F^2 + \|\mathbf{A}\|_{2,1} + \|\mathbf{B}\|_{2,1} + \|\mathbf{C}\|_{2,1} + \|\mathbf{A}\|_{1,2} + \|\mathbf{B}\|_{1,2} + \|\mathbf{C}\|_{1,2} \tag{16}
\]

Appendix B: Stochastic gradient descent (SGD)

Take a differentiable function f(x). For each iteration k, the update of x in gradient descent is given by

\[
x_{k+1} = x_k - \gamma f'(x_k) \tag{17}
\]

with γ>0, the step-size or the learning rate.

B.1 Adaptive moment estimation (ADAM)

Adaptive moment estimation (Adam) [43] is a variant of SGD. It acts on the gradient using an exponential moving average of past gradients, and on the step-size by dividing it by the square root of v, the exponential moving average of squared gradients.

Consider a function f(x). For each iteration k, the Adam update of x is given by

\[
\begin{aligned}
g_k &= \nabla f(x_k)\\
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, g_k\\
v_k &= \beta_2 v_{k-1} + (1-\beta_2)\, g_k^2\\
\hat{m}_k &= \frac{m_k}{1-\beta_1^{k}}\\
\hat{v}_k &= \frac{v_k}{1-\beta_2^{k}}\\
x_{k+1} &= x_k - \frac{\gamma}{\sqrt{\hat{v}_k}+\epsilon}\, \hat{m}_k
\end{aligned} \tag{18}
\]

with γ > 0 the step-size, β1, β2 ∈ (0, 1), and m and v initialized to 0.

B.2 Nesterov accelerated adaptive moment estimation (Nadam)

The Nadam optimizer is an extension of the Adam optimization algorithm; it was described in a 2016 article by Timothy Dozat [42]. For each iteration k, the Nadam update of x is given by:

\[
\begin{aligned}
g_k &= \nabla f(x_{k-1})\\
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, g_k\\
n_k &= \beta_2 n_{k-1} + (1-\beta_2)\, g_k^2\\
\hat{m}_k &= \frac{\beta_1 m_k}{1-\beta_1^{k+1}} + \frac{(1-\beta_1)\, g_k}{1-\beta_1^{k}}\\
\hat{n}_k &= \frac{\beta_2 n_k}{1-\beta_2^{k}}\\
x_{k+1} &= x_k - \gamma\, \frac{\hat{m}_k}{\sqrt{\hat{n}_k}+\epsilon}
\end{aligned} \tag{19}
\]
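A direct transcription of update (19) reads as follows (a sketch with illustrative hyperparameter values; for the nonnegative CPD of this chapter, each update is followed by a projection onto R+):

```python
import numpy as np

def nadam_step(x, grad, state, gamma=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam update of Eq. (19); `state` carries (m, n, k)."""
    m, n, k = state
    k += 1
    m = beta1 * m + (1 - beta1) * grad
    n = beta2 * n + (1 - beta2) * grad ** 2
    m_hat = beta1 * m / (1 - beta1 ** (k + 1)) + (1 - beta1) * grad / (1 - beta1 ** k)
    n_hat = beta2 * n / (1 - beta2 ** k)
    x = x - gamma * m_hat / (np.sqrt(n_hat) + eps)
    return x, (m, n, k)

# Usage for a factor matrix X with gradient G, keeping X nonnegative:
# X, state = nadam_step(X, G, state)
# X = np.maximum(X, 0.0)
```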

Abbreviations

ALS: alternating least squares

ADMM: alternating direction method of multipliers

CPD: canonical polyadic decomposition

NNCPD: nonnegative canonical polyadic decomposition

ELS: enhanced line search

PARAFAC: parallel factor analysis

DOM: dissolved organic matter

EEM: excitation and emission matrix

pH: potential of hydrogen

NMF: nonnegative matrix factorization

SGD: stochastic gradient descent

Nadam: Nesterov accelerated adaptive moment estimation

F: fluorescence

TRP: tryptophan

5S8HQ: 5-sulfate-8-hydroxyquinine

RHO: rhodamine

SVD: singular value decomposition

References

  1. Harshman RA. Foundation of PARAFAC procedure: Models and conditions for an 'explanatory' multi-mode factor analysis. In: UCLA Working Papers in Phonetics. 1970. pp. 1-84
  2. Wang H, Ahuja N. Compact representation of multidimensional data using tensor rank-one decomposition. Vectors. 2004;1:5
  3. Field AS, Graupe D. Topographic component (parallel factor) analysis of multichannel evoked potentials: Practical issues in trilinear spatiotemporal decomposition. Brain Topography. 1991;3:407-423
  4. Bro R. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems. 1997;38:149-171
  5. Nion D, De Lathauwer L. An enhanced line search scheme for complex-valued tensor decompositions. Application in DS-CDMA. Signal Processing. 2008;88:749-755
  6. Vasilescu MAO, Terzopoulos D. Multilinear analysis of image ensembles: Tensorfaces. In: European Conference on Computer Vision. Springer; 2002. pp. 447-460
  7. Kopriva I, Cichocki A. Blind multispectral image decomposition by 3D nonnegative tensor factorization. Optics Letters. 2009;34:2210-2212
  8. Sanchez E, Kowalski BR. Tensorial resolution: A direct trilinear decomposition. Journal of Chemometrics. 1990;4:29-45
  9. Roemer F, Haardt M. A semi-algebraic framework for approximate CP decompositions via simultaneous matrix diagonalizations (SECSI). Signal Processing. 2013;93:2722-2738
  10. Luciani X, Albera L. Canonical polyadic decomposition based on joint eigenvalue decomposition. Chemometrics and Intelligent Laboratory Systems. 2014;132:152-167
  11. Cichocki A, Zdunek R, Phan AH, Amari SI. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons; 2009
  12. Fu X, Huang K, Sidiropoulos ND, Ma WK. Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Processing Magazine. 2019;36:59-80
  13. Phan AH, Cichocki A. Multi-way nonnegative tensor factorization using fast hierarchical alternating least squares algorithm (HALS). In: Proc. of the 2008 International Symposium on Nonlinear Theory and its Applications. 2008
  14. Liavas AP, Sidiropoulos ND. Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. IEEE Transactions on Signal Processing. 2015;63:5450-5463
  15. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Review. 2009;51:455-500
  16. Tomasi G, Bro R. A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis. 2006;50:1700-1734
  17. Xu Y, Yin W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences. 2013;6:1758-1789
  18. Sanou IW, Redon R, Luciani X, Mounier S. Online nonnegative canonical polyadic decomposition: Algorithms and application. In: 2021 29th European Signal Processing Conference (EUSIPCO). IEEE; 2021. pp. 1805-1809
  19. André R, Luciani X, Albera L, Moreau E. A two-step algorithm for joint eigenvalue decomposition-application to canonical polyadic decomposition of fluorescence spectra. Chemometrics and Intelligent Laboratory Systems. 2020;206:104065
  20. Bro R, Kiers HA. A new efficient method for determining the number of components in PARAFAC models. Journal of Chemometrics. 2003;17:274-286
  21. Stedmon CA, Bro R. Characterizing dissolved organic matter fluorescence with parallel factor analysis: A tutorial. Limnology and Oceanography: Methods. 2008;6:572-579
  22. Papalexakis EE. Automatic unsupervised tensor mining with quality assessment. In: Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM; 2016. pp. 711-719
  23. Nion D, Sidiropoulos ND. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. IEEE Transactions on Signal Processing. 2009;57:2299-2310
  24. Phan AH, Cichocki A. PARAFAC algorithms for large-scale problems. Neurocomputing. 2011;74:1970-1984
  25. Zhou S, Vinh NX, Bailey J, Jia Y, Davidson I. Accelerating online CP decompositions for higher order tensors. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. pp. 1375-1384
  26. Traoré A, Berar M, Rakotomamonjy A. Online multimodal dictionary learning. Neurocomputing. 2019;368:163-179
  27. Sanou IW, Redon R, Luciani X, Mounier S. Online nonnegative and sparse canonical polyadic decomposition of fluorescence tensors. Chemometrics and Intelligent Laboratory Systems. 2022;225:104550
  28. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58:267-288
  29. Mounier S, Redon R. The use of 3-D fluorescence and its decomposition in environmental organic matter studies. In: Encyclopedia of Analytical Chemistry: Applications, Theory and Instrumentation. 2006. pp. 1-16
  30. Murphy KR, Stedmon CA, Graeber D, Bro R. Fluorescence spectroscopy and multi-way techniques. PARAFAC. Analytical Methods. 2013;5:6557-6566
  31. Royer JP, Thirion-Moreau N, Comon P, Redon R, Mounier S. A regularized nonnegative canonical polyadic decomposition algorithm with preprocessing for 3D fluorescence spectroscopy. Journal of Chemometrics. 2015;29:253-265
  32. Lakowicz JR. Principles of Fluorescence Spectroscopy. Vol. 1. 2006. ISBN: 978-0-387-31278-1. DOI: 10.1007/978-0-387-46312-4
  33. Huang K, Sidiropoulos ND, Liavas AP. A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. IEEE Transactions on Signal Processing. 2016;64:5052-5065
  34. Jouni M, Mura MD, Comon P. Some issues in computing the CP decomposition of nonnegative tensors. In: International Conference on Latent Variable Analysis and Signal Separation. Springer; 2018. pp. 57-66
  35. Vu X, Chaux C, Thirion-Moreau N, Maire S, Carstea EM. A new penalized nonnegative third-order tensor decomposition using a block coordinate proximal gradient approach: Application to 3D fluorescence spectroscopy. Journal of Chemometrics. 2017;31:2859
  36. Royer JP, Thirion-Moreau N, Comon P. Computing the polyadic decomposition of nonnegative third order tensors. Signal Processing. 2011;91:2159-2171
  37. Vu XT, Chaux C, Maire S, Thirion-Moreau N. Study of different strategies for the canonical polyadic decomposition of nonnegative third order tensors with application to the separation of spectra in 3D fluorescence spectroscopy. In: 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). IEEE; 2014. pp. 1-6
  38. Kruskal JB. Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications. 1977;18:95-138
  39. Pasricha R, Gujral E, Papalexakis EE. Identifying and alleviating concept drift in streaming tensor decomposition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer; 2018. pp. 327-343
  40. Cohen JE, Gillis N. Dictionary-based tensor canonical polyadic decomposition. IEEE Transactions on Signal Processing. 2017;66:1876-1889
  41. Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. In: Advances in Neural Information Processing Systems. 2007. pp. 801-808
  42. Dozat T. Incorporating Nesterov momentum into Adam. In: Proceedings of the 4th International Conference on Learning Representations, Workshop Track. 2016
  43. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014
  44. Zdunek R, Cichocki A. Fast nonnegative matrix factorization algorithms using projected gradient approaches for large-scale problems. Computational Intelligence and Neuroscience. 2008;2008
  45. Olikier G, Absil PA, De Lathauwer L. Variable projection applied to block term decomposition of higher-order tensors. In: International Conference on Latent Variable Analysis and Signal Separation. Springer; 2018. pp. 139-148
  46. Zhang Q, Wang H, Plemmons RJ, Pauca VP. Tensor methods for hyperspectral data analysis: A space object material identification study. Journal of the Optical Society of America A. 2008;25:3001-3012
  47. Coble PG, Lead J, Baker A, Reynolds DM, Spencer RG. Aquatic Organic Matter Fluorescence. Cambridge University Press; 2014
  48. Royer JP. Identification aveugle de mélanges et décomposition canonique de tenseurs: Application à l'analyse de l'eau [thesis]. Université Nice Sophia Antipolis; 2013
  49. Cohen JE, Gillis N. Dictionary-based tensor canonical polyadic decomposition. IEEE Transactions on Signal Processing. 2018;66:1876-1889. DOI: 10.1109/TSP.2017.2777393
  50. Gu S, Zhang L, Zuo W, Feng X. Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. pp. 2862-2869
  51. Recht B, Fazel M, Parrilo PA. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review. 2010;52:471-501
  52. Yuan M, Zhang CH. On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics. 2016;16:1031-1068
  53. Han X, Albera L, Kachenoura A, Senhadji L, Shu H. Low rank canonical polyadic decomposition of tensors based on group sparsity. In: 2017 25th European Signal Processing Conference (EUSIPCO). IEEE; 2017. pp. 668-672
  54. Chen Y, He W, Yokoya N, Huang TZ. Hyperspectral image restoration using weighted group sparsity-regularized low-rank tensor decomposition. IEEE Transactions on Cybernetics. 2019;50:3556-3570
  55. Gramfort A, Kowalski M, Hämäläinen M. Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods. Physics in Medicine & Biology. 2012;57:1937
  56. de Morais Goulart JH, de Oliveira PMR, Farias RC, Zarzoso V, Comon P. Alternating group lasso for block-term tensor decomposition and application to ECG source separation. IEEE Transactions on Signal Processing. 2020;68:2682-2696
