
Random Processes Defined by Stochastic Differential and Difference Equations

Written By

Pavel Loskot

Submitted: 12 February 2024 Reviewed: 20 February 2024 Published: 21 June 2024

DOI: 10.5772/intechopen.1005516


From the Edited Volume

Exploring the Benefits of Numerical Simulation and Modelling [Working Title]

Dr. Mykhaylo I. Andriychuk


Abstract

Random processes are a key component in modeling stochastic systems. They are often defined by stochastic differential or, in discrete time, difference equations (SDEs), which precisely describe how the instantaneous time-dependent and, for multidimensional processes, also location-dependent values vary. The solutions of SDEs are Markov processes, which is important in their analysis and simulation. The aim of this chapter is to examine various SDE models of random processes and their specific cases, for example, the random processes that arise in statistical filtering of random signals. The problems of discretizing continuous processes and linearizing nonlinear processes are also considered. Since the general Langevin equation is often difficult to solve analytically, in many scenarios, it is sufficient as well as useful to solve the corresponding Fokker-Planck equation or master equation in order to obtain the time-evolving distribution of the system states. After outlining the most common random processes in continuous and discrete time, such as Bernoulli and Poisson sequences, the telegraph signal, and the Wiener and Gaussian processes, numerical methods for solving SDEs are discussed. The chapter concludes by presenting a general method for generating random processes that are defined by their probability distribution and the auto-covariance or auto-correlation function (ACF).

Keywords

  • dynamic system
  • Fokker-Planck equation
  • Kalman filter
  • Langevin equation
  • master equation
  • stochastic differential equation
  • random process

1. Introduction

Random processes have been studied in different forms for decades. They are an integral component of describing stochastic systems and their dynamics. They frequently appear in many applications of statistical signal processing, and they are also used to model phenomena in engineering, computer science, statistical physics, social sciences, and more recently also in computational biology and medicine. Consequently, there exists a very rich literature and, in many cases, also a good theoretical understanding of their models and properties. In practical settings, the choice of a random process is dictated by the specific application as well as by the tractability of mathematical analysis and the feasibility of producing numerical results.

Among different models of random processes, the processes described by stochastic differential (in continuous time) or difference (in discrete time) equations (SDEs) are probably the most common. The SDE models are sufficiently versatile and in many cases offer good theoretical frameworks and efficient numerical algorithms. More importantly, the solutions of these SDEs are Markov processes, which are more amenable to analysis and simulation.

This chapter explores different forms of SDEs that are used to define various random processes. This is clearly a very large topic with many results, which has been evolving for decades. The aim is to highlight the key ideas, while the detailed descriptions can be found in the cited literature. The advanced theoretical concepts are kept to a minimum in order to make the discussion accessible to typical graduate students in engineering. The references consist only of textbooks and monographs on stochastic processes [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. In particular, the reference [4] is a good introductory textbook. The book [17] is recommended reading to gain more advanced theoretical foundations. The fundamental concepts in statistical signal processing are provided in [14], and in estimation theory in [11]. The aspects of computer simulations involving random processes are considered in [16].

The topics that are not covered, but which may nevertheless be highly relevant to stochastic processes, include boundary conditions, absorbing processes, the first passage time and other such metrics, the many methods and software tools for numerically solving SDEs and integro-differential equations, analytical methods for solving SDEs having a specific structure, working with SDEs involving spatial diffusion, and defining and generating complex-valued random processes. Some examples of more recent research topics in stochastic processes that are not covered in this chapter include their use in machine learning models and algorithms, defining stochastic processes in complex topological spaces such as manifolds, stochastic processes on graphs, multivariate stochastic processes involving mixed distributions, and partially defined stochastic processes.

This chapter also assumes that the readers are already familiar with the basic concepts in stochastic processes, such as the probability distributions of different orders, stationarity, ergodicity, the definition of the ACF, and the Markovian property, and that they understand linear auto-regressive modeling of stochastic processes and why the Kalman filter was introduced. It is an advantage to already have a working knowledge of differential equations and the strategies to obtain their solutions.

The rest of this chapter is organized as follows. Section 2 outlines various linear and nonlinear SDE models in continuous and discrete time and also considers the problems of discretization and linearization. Section 3 defines the Langevin, Fokker-Planck and master equations. The common random processes are presented in Section 4. Numerical methods for solving SDEs are discussed in Section 5. Section 6 provides a general method for generating random processes with the defined distribution and the ACF. Section 7 concludes the chapter.

The following mathematical notations are used in this chapter. A scalar, vector and matrix process, respectively, is denoted as $x(t)$, $\mathbf{x}(t)$, and $\mathbf{X}(t)$, in continuous time, $t\in\mathbb{R}$, and as $x_t$, $\mathbf{x}_t$, and $\mathbf{X}_t$, in discrete time, $t\in\mathbb{Z}$. The first time-derivative is denoted as $\dot{x}(t)=dx(t)/dt$, whereas the $k$-th derivatives, $k>1$, are denoted as $x^{(k)}(t)=d^k x(t)/dt^k$. Other used symbols: $|\cdot|$ denotes absolute value, $\mathrm{E}[\cdot]$ denotes expectation, $\mathrm{Prob}[\cdot]$ denotes probability, $\delta(t)$ denotes the Dirac delta function, the Kronecker delta function is $\delta_{ij}=1$, if $i=j$, and $0$ otherwise, $\mathbf{I}$ is an identity matrix, $\mathbf{0}$ is an all-zeros vector or matrix, $(\cdot)^T$ denotes a matrix transpose, $!$ is the factorial, and $\approx$ indicates approximation. Moreover, for simplicity, all signals and values are assumed to be real-valued.


2. Modeling random processes as SDEs

Random processes can be conveniently modeled by SDEs. These models can be linear or nonlinear and defined in continuous time or in discrete time. The continuous-time nonlinear SDEs are preferred for developing theoretical foundations of stochastic processes, since they allow using differential (and integral) calculus. The discrete-time linear SDEs are normally used in generating the samples of stochastic processes in computer simulations and in implementing the methods of digital signal processing.

2.1 Linear SDE models

In continuous time, a linear random process, $x(t)$, is defined by the $m$-th order linear non-homogeneous differential equation,

$$\sum_{i=0}^{m} a_i(t)\,x^{(i)}(t) = s(t) \tag{1}$$

where $a_i(t)$ as well as $s(t)$ are other defined random processes, or $a_i(t)$ can be random constants. It is customary to assume that the driving process (input noise), $s(t)$, of model (1) is generated as a linear combination of higher-order derivatives, i.e.,

$$s(t) = \sum_{i=0}^{M} b_i(t)\,u^{(i)}(t) \tag{2}$$

where $b_i(t)$ and $u(t)$ are other defined random processes. Typically, $b_i(t)$ are deterministic functions or constants, whereas $u(t)$ is a stationary, zero-mean, unit-variance white noise, i.e., its ACF is $\mathrm{E}[u(t)u(t+\tau)] = \delta(\tau)$.

In discrete time, a linear random process, $x_t$, is defined by the $m$-th order linear non-homogeneous difference equation,

$$\sum_{i=0}^{m} a_i(t)\,x_{t-i} = s_t \tag{3}$$

with the driving (process) noise,

$$s_t = \sum_{i=0}^{M} b_i(t)\,u_{t-i}. \tag{4}$$

Similar assumptions about the processes or functions, $a_i(t)$, $b_i(t)$, and $u_t$, are usually adopted as for continuous-time random processes. Since the descriptions in continuous and discrete time are nearly identical, in the sequel of this section, the expressions are mostly provided assuming continuous-time models.

It is common to express the $m$-th order linear model of a continuous-time random process as a first-order vector linear differential equation. In particular, it is straightforward to show that the linear model (1) with the driving noise (2) can be rewritten in the matrix form as

$$\begin{bmatrix}\dot{x}_1(t)\\ \dot{x}_2(t)\\ \vdots\\ \dot{x}_{m-1}(t)\\ \dot{x}_m(t)\end{bmatrix} = \underbrace{\begin{bmatrix}-a_1(t) & 1 & 0 & \cdots & 0\\ -a_2(t) & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \\ -a_{m-1}(t) & 0 & 0 & \cdots & 1\\ -a_m(t) & 0 & 0 & \cdots & 0\end{bmatrix}}_{\mathbf{A}(t)}\begin{bmatrix}x_1(t)\\ x_2(t)\\ \vdots\\ x_{m-1}(t)\\ x_m(t)\end{bmatrix} + \underbrace{\begin{bmatrix}b_0(t)\\ \vdots\\ b_M(t)\\ 0\\ \vdots\\ 0\end{bmatrix}}_{\mathbf{b}(t)}u(t). \tag{5}$$

In discrete time, the equivalent first-order vector difference equation is written as

$$\mathbf{x}_t = \mathbf{A}_t\,\mathbf{x}_{t-1} + \mathbf{b}_t\,u_t \tag{6}$$

assuming discrete-time indices $t\in\mathbb{Z}$.
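As a hedged illustration of the recursion (6) (not part of the original chapter), the following Python sketch simulates a two-dimensional first-order vector difference model with constant, arbitrarily chosen matrices and a white Gaussian driving noise.

```python
# A minimal sketch of simulating the discrete-time vector model (6),
# x_t = A x_{t-1} + b u_t, with constant (illustrative) A and b and a
# zero-mean, unit-variance white Gaussian driving noise u_t.
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # assumed state-transition matrix
b = np.array([1.0, 0.5])     # assumed noise-coupling vector
T = 1000                     # number of discrete-time steps

x = np.zeros((T, 2))
for t in range(1, T):
    u_t = rng.standard_normal()          # white driving noise u_t
    x[t] = A @ x[t - 1] + b * u_t        # recursion (6)

print("sample mean:", x.mean(axis=0))    # should be close to zero
```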

In the literature, model (5) is generalized as ([17], Eq. (1.89)),

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\,\mathbf{x}(t) + \mathbf{B}(t)\,\dot{\mathbf{u}}(t). \tag{7}$$

Importantly, the derivatives in (7) may not be defined, if the underlying processes are not differentiable everywhere; in such a case, the differential Eq. (7) must be formulated with differentials, $d\mathbf{x}(t) = \dot{\mathbf{x}}(t)\,dt$ and $d\mathbf{u}(t) = \dot{\mathbf{u}}(t)\,dt$. Then, conditioned on the matrix, $\mathbf{A}(t)$, a solution of the homogeneous equation is the matrix, $\mathbf{X}(t,s)$, such that $\frac{\partial}{\partial t}\mathbf{X}(t,s) = \mathbf{A}(t)\,\mathbf{X}(t,s)$ and $\mathbf{X}(s,s) = \mathbf{I}$ is an identity matrix. Denoting the initial value as $\mathbf{x}(0) = \mathbf{x}_0$, the overall solution is obtained by the integration, i.e.,

$$\mathbf{x}(t) = \mathbf{X}(t,0)\,\mathbf{x}_0 + \int_0^t \mathbf{X}(t,s)\,\mathbf{B}(s)\,\dot{\mathbf{u}}(s)\,ds. \tag{8}$$

Note that the mean process $\mathrm{E}[\mathbf{x}(t)] = \mathbf{X}(t,0)\,\mathbf{x}_0$, and the mean of its derivative $\mathrm{E}[\dot{\mathbf{x}}(t)] = \frac{d}{dt}\mathrm{E}[\mathbf{x}(t)] = \mathbf{A}(t)\,\mathrm{E}[\mathbf{x}(t)]$. These solutions can be further specialized by conditioning also on the matrix, $\mathbf{B}(t)$. Thus, if the matrices, $\mathbf{A}$ and $\mathbf{B}$, are time-invariant and known, the solution, $\mathbf{X}(t,s) = \exp(\mathbf{A}(t-s))$, can be substituted into Eq. (8).

Given constant matrices, $\mathbf{A}$ and $\mathbf{B}$, the SDE (7) defines different random processes for specific cases of the input process, $\mathbf{u}(t)$. In particular, provided that $\mathbf{u}(t)$ is a white Gaussian noise, the random process (8) represents a multi-dimensional Brownian motion, i.e., a Gaussian process with stationary and independent increments. The Ornstein-Uhlenbeck random process is defined for $\mathbf{u}(t)$ being a Wiener process, which is a continuous Gaussian process with independent increments.

There is also an integral representation of stationary random processes in the frequency domain. It is known as the Cramér spectral representation, and it is written as the inverse Fourier transform. Thus, a random process, $x(t)$, can be generated from the random increments, $dZ(\omega)$, as [7],

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{j\omega t}\,X(\omega)\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{j\omega t}\,dZ(\omega). \tag{9}$$

The random increment process, $dZ(\omega)$, has zero mean, and its ACF defined in the frequency domain is also the spectral density, i.e., $\mathrm{E}[dZ(\omega)\,dZ(\nu)] = S(\omega)\,\delta(\omega-\nu)\,d\omega\,d\nu$.

2.2 Nonlinear SDE models

An effective way to obtain nonlinear models of random processes is to consider a nonlinear transformation, $g(x(t))$, of the linear random process, $x(t)$. Hence, define a random observation process

$$y(t) = g(x(t)) + v(t) \tag{10}$$

where $v(t)$ represents a measurement noise, which is usually assumed to be zero-mean, stationary, and white.

A common strategy for processing nonlinear signals is to linearize the model about a suitable value, $x_0(t)$, at time $t$. This strategy is neither optimum nor the most effective, but it is often used in practice. In particular, the first-order Taylor approximation of the process model (10) is written as

$$y(t) = g(x_0(t)) + \left.\frac{dg(x)}{dx}\right|_{x=x_0(t)}\big(x(t) - x_0(t)\big) + v(t). \tag{11}$$

The value of $x_0(t)$ affects whether the linearization error can be neglected with respect to the strength of the measurement noise, $v(t)$, so it must be updated at every time instant, $t$. The practical choice is to set $x_0(t)$ to be the predicted value of $x(t)$ using the past estimated values, $\hat{x}(t-1), \hat{x}(t-2), \ldots$, or simply letting $x_0(t) = \mathrm{E}[x(t)]$, if the variance, $\mathrm{var}[x(t)]$, is small.

A nonlinear form of the SDE model (7) can be written as ([17], Eq. (4.1)),

$$\dot{\mathbf{x}}(t) = \mathbf{a}(\mathbf{x},t) + \mathbf{B}(\mathbf{x},t)\,\dot{\mathbf{u}}(t) \tag{12}$$

where the vector function, $\mathbf{a}(\mathbf{x},t) = [a_1(\mathbf{x},t),\,a_2(\mathbf{x},t),\,\ldots]^T$, and the elements of the matrix, $\mathbf{B}(\mathbf{x},t)$, are the multivariate processes, $[\mathbf{B}(\mathbf{x},t)]_{ij} = b_{ij}(\mathbf{x},t)$. The corresponding stochastic integral representation is

$$\mathbf{x}(t) = \mathbf{x}_0 + \int_0^t \mathbf{a}(\mathbf{x}(s),s)\,ds + \int_0^t \mathbf{B}(\mathbf{x}(s),s)\,\dot{\mathbf{u}}(s)\,ds. \tag{13}$$

In general, integrating a random process can be performed assuming the Itô, Stratonovich, backward, and other methods including a path integral ([17], Section 4.1). These methods lead to specific numerical algorithms and have different levels of convergence and stability. The choice of the appropriate method mainly depends on the ACF of the random process. In this chapter, the SDE models of random processes are presented assuming the Itô integration (calculus).

More importantly, it can be shown that the solutions of the integral Eq. (13) are Markovian processes. It is then not surprising that many examples of Markovian processes can be found in the literature as well as in practical applications. Moreover, if the values of a Markov process are chosen from a discrete, countably finite or infinite set, such a process is referred to as a Markov chain. Some examples of Markov chain processes include a one-dimensional random walk, a queuing system, inventory modeling, branching processes, and games involving success runs [10].

A scalar Markovian process solving (13) is referred to as a diffusion process, provided that

$$\lim_{\Delta t\to 0}\frac{1}{\Delta t}\,\mathrm{E}\big[x(t+\Delta t)-x(t)\,\big|\,x(t)\big] = a(x,t), \qquad \lim_{\Delta t\to 0}\frac{1}{\Delta t}\,\mathrm{E}\big[(x(t+\Delta t)-x(t))^2\,\big|\,x(t)\big] = b^2(x,t) \tag{14}$$

are continuous and deterministic functions, which are referred to as the drift and the diffusion coefficient, respectively. The diffusion process can be defined similarly in higher dimensions by assuming a drift vector and a diffusion matrix. Moreover, a diffusion process becomes a classical Brownian motion when it has zero drift and a constant diffusion.

2.3 Filtering random processes

Consider the random process defined by Eq. (8). Denote $\bar{\mathbf{x}}(t) = \mathbf{X}(t,0)\,\mathbf{x}_0$, $\mathbf{h}(t,s) = \mathbf{X}(t,s)$, and $\mathbf{B}(s)\,\dot{\mathbf{u}}(s) = \mathbf{y}(s)-\bar{\mathbf{y}}(s)$, where the means $\bar{\mathbf{x}}(t) = \mathrm{E}[\mathbf{x}(t)]$ and $\bar{\mathbf{y}}(t) = \mathrm{E}[\mathbf{y}(t)]$. With these substitutions, the random process (8) can be rewritten as

$$\mathbf{x}(t) = \bar{\mathbf{x}}(t) + \int_0^t \mathbf{h}(t,s)\,\big(\mathbf{y}(s)-\bar{\mathbf{y}}(s)\big)\,ds. \tag{15}$$

Eq. (15) now represents a random process generated from another random process, $\mathbf{y}(t)$. More importantly, provided that (cf. (10)),

$$\mathbf{y}(t) = \mathbf{a}\,\tilde{\mathbf{x}}(t) + \mathbf{v}(t) \tag{16}$$

the random process (15) is a linear estimate of the process, $\tilde{\mathbf{x}}(t)$, in an additive measurement noise, $\mathbf{v}(t)$, from the observations, $\mathbf{y}(t)$, using a linear and generally non-stationary filter with the impulse response, $\mathbf{h}(t,s)$. For example, the minimum mean square error (MMSE) filter must satisfy the integral equation,

$$\int_0^t \mathbf{h}(t,r)\,\mathbf{C}_y(r,s)\,dr = \mathbf{C}_{x,y}(t,s), \qquad s\in[0,t] \tag{17}$$

where $\mathbf{C}_y(t,s)$ is the covariance matrix of the observations, $\mathbf{y}(t)$, and $\mathbf{C}_{x,y}(t,s)$ is the matrix of cross-covariances between the processes, $\mathbf{x}(t)$ and $\mathbf{y}(t)$. Analogous expressions can also be defined in discrete time.

In the sequel of this subsection, a general model of random processes is presented that is used to obtain a discrete-time Kalman filter. The actual Kalman filter is derived, for example, in [11]. The Kalman filter is known to be the best linear unbiased estimator (BLUE) of a linear random process under these conditions:

  • the observations are linear;

  • the measurement noise is additive, even though possibly non-stationary;

  • the random process to be estimated can be also non-stationary, but it is generated by linear filtering of a white process (driving) noise;

  • the process (driving) noise, and hence also the estimated random process, is uncorrelated with the measurement noise.

Let us assume the model of measurements in the form,

$$\mathbf{y}_t = \mathbf{D}_t\,\mathbf{x}_t + \mathbf{d}_t + \mathbf{v}_t, \qquad \mathbf{v}_t = \mathbf{A}_{v,t}\,\mathbf{v}_{t-1} + \mathbf{b}_{v,t} + \mathbf{q}_{v,t} \tag{18}$$

where $\mathbf{D}_t$, $\mathbf{d}_t$, $\mathbf{A}_{v,t}$, and $\mathbf{b}_{v,t}$ are known (deterministic) functions. The measurement noise, $\mathbf{v}_t$, has zero mean, the covariance matrix, $\mathbf{C}_{v,t}$, and it is uncorrelated with the estimated random process, $\mathbf{x}_t$, i.e., $\mathbf{q}_{v,t}$ is a zero-mean white noise, which is uncorrelated with the process noise, $\mathbf{q}_t$. The estimated process is modeled after (6), i.e.,

$$\mathbf{x}_t = \mathbf{A}_t\,\mathbf{x}_{t-1} + \mathbf{b}_t + \mathbf{q}_t \tag{19}$$

where $\mathbf{A}_t$ and $\mathbf{b}_t$ are known (deterministic) functions, and $\mathbf{q}_t$ is a zero-mean white noise having the covariance matrix, $\mathbf{C}_{q,t}$, which is uncorrelated with the measurement noise, $\mathbf{v}_t$.

Since the measurement noise, $\mathbf{v}_t$, is correlated in time, it is predictable, so it can be appended to the estimated process. The extended vector of parameters becomes

$$\begin{bmatrix}\mathbf{x}_t\\ \mathbf{v}_t\end{bmatrix} = \begin{bmatrix}\mathbf{A}_t & \mathbf{0}\\ \mathbf{0} & \mathbf{A}_{v,t}\end{bmatrix}\begin{bmatrix}\mathbf{x}_{t-1}\\ \mathbf{v}_{t-1}\end{bmatrix} + \begin{bmatrix}\mathbf{b}_t\\ \mathbf{b}_{v,t}\end{bmatrix} + \begin{bmatrix}\mathbf{q}_t\\ \mathbf{q}_{v,t}\end{bmatrix} \tag{20}$$

and the corresponding measurements

$$\mathbf{y}_t = \begin{bmatrix}\mathbf{D}_t & \mathbf{I}\end{bmatrix}\begin{bmatrix}\mathbf{x}_t\\ \mathbf{v}_t\end{bmatrix} + \mathbf{d}_t. \tag{21}$$

The random processes (20) and (21) are then used to obtain the Kalman filter.
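As an illustrative sketch (not from the chapter, and with arbitrarily chosen scalar parameters), the following Python code builds the augmented model (20) and (21) for an AR(1) measurement noise and runs a standard Kalman predict-update recursion on the augmented state.

```python
# A hedged sketch of a Kalman filter on the augmented state [x_t; v_t] of
# Eqs. (20)-(21), so that the AR(1) measurement noise is estimated jointly
# with the process. All numerical values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

A, b, Cq    = 0.95, 0.0, 0.1    # scalar process model (19)
Av, bv, Cqv = 0.7, 0.0, 0.05    # AR(1) measurement-noise model (18)
D, d = 1.0, 0.0                 # measurement matrix and offset

# augmented model (20)-(21)
F = np.array([[A, 0.0], [0.0, Av]])
u = np.array([b, bv])
Q = np.diag([Cq, Cqv])
H = np.array([[D, 1.0]])

# simulate measurements from the model
T = 200
x, v, ys = 0.0, 0.0, []
for _ in range(T):
    x = A * x + b + rng.normal(scale=np.sqrt(Cq))
    v = Av * v + bv + rng.normal(scale=np.sqrt(Cqv))
    ys.append(D * x + d + v)

# Kalman filter on the augmented state z = [x_t, v_t]
z = np.zeros(2)
P = np.eye(2)
for y in ys:
    # predict
    z = F @ z + u
    P = F @ P @ F.T + Q
    # update (the measurement noise is already inside the state, so no extra R)
    S = H @ P @ H.T
    K = P @ H.T @ np.linalg.inv(S)
    z = z + K @ (np.array([y - d]) - H @ z)
    P = (np.eye(2) - K @ H) @ P

print("final estimate of [x, v]:", z)
```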

2.4 Discretization of continuous-time models

Digital signal processing requires representing the continuous-time random processes as discrete sequences of random samples. In addition, it is desirable that these samples are statistically uncorrelated. The procedure to obtain the uncorrelated samples is known in the literature as the Karhunen-Loève expansion. In particular, given a finite set of $K$ orthonormal continuous real-valued functions, $\phi_k(t)$, over a finite time interval, $T$, such that

$$\int_T \phi_k(t)\,\phi_l(t)\,dt = \begin{cases}1, & k=l\\ 0, & k\ne l\end{cases} \tag{22}$$

the task is to decompose the random process, $x(t)$, into the sum

$$x(t) = \sum_{k=1}^{K} X_k\,\phi_k(t) \tag{23}$$

such that

$$\mathrm{E}[X_k X_l] = \begin{cases}\lambda_k, & k=l\\ 0, & k\ne l.\end{cases} \tag{24}$$

Consequently, the mean process and the ACF, respectively, are defined as

$$\bar{x}(t) = \mathrm{E}[x(t)] = \sum_{k=1}^{K}\mathrm{E}[X_k]\,\phi_k(t), \qquad C_x(t,s) = \mathrm{E}\big[(x(t)-\bar{x}(t))(x(s)-\bar{x}(s))\big] = \sum_{k=1}^{K}\lambda_k\,\phi_k(t)\,\phi_k(s). \tag{25}$$

In addition to being orthonormal, the basis functions, $\phi_k(t)$, must satisfy the Fredholm integral equation,

$$\int_T C_x(t,s)\,\phi_k(s)\,ds = \lambda_k\,\phi_k(t), \qquad t\in T. \tag{26}$$

The Fredholm integral equations are often encountered in designing statistical signal processing methods. They are difficult to solve analytically, unless the underlying random process is stationary. In practice, it is sufficient to show that the adopted set of orthonormal functions, ϕkt, satisfies condition (26) rather than finding the eigenvalues and the eigenfunctions of the ACF.

It should be noted that ordinary Nyquist sampling of random signals may not yield samples that are uncorrelated. In such a case, the Karhunen-Loève expansion can be used to obtain the uncorrelated samples, where the set of orthonormal functions, $\phi_k(t)$, must be chosen, so that the discrete-time ACF can be computed as

$$C_x(t,s) = \sum_{k=1}^{K}\lambda_k\,\phi_k(t)\,\phi_k(s) \tag{27}$$

for all the indices, $t$ and $s$, from a finite subset of integers. Unlike in continuous time, the values, $\lambda_k$, can be computed from the covariance matrix, $\mathrm{var}[\mathbf{X}] = \mathrm{E}\big[(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})^T\big]$, where $\mathbf{X} = [X_1,\ldots,X_K]^T$ is a column vector of samples, and $\bar{\mathbf{X}} = \mathrm{E}[\mathbf{X}]$. It can be shown that $\lambda_k$ are the roots of the characteristic polynomial defined as the determinant,

$$\det\big(\mathrm{var}[\mathbf{X}] - \lambda_k\,\mathbf{I}\big) = 0, \qquad \forall k. \tag{28}$$

The vectors, $\boldsymbol{\phi}_k$, are then computed, so they lie in the null space of the matrix, $\mathrm{var}[\mathbf{X}] - \lambda_k\,\mathbf{I}$.
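The following Python sketch (an assumed numerical example, not from the chapter) illustrates the discrete-time expansion: the eigenvalues of the covariance matrix solve (28), and projecting the correlated samples onto the eigenvectors yields uncorrelated coefficients with variances $\lambda_k$ as in (24).

```python
# A short sketch of the discrete-time Karhunen-Loeve expansion: eigenvalues
# lambda_k of var[X] solve (28), the eigenvectors phi_k span the corresponding
# null spaces, and the projected samples become uncorrelated as in (24).
import numpy as np

rng = np.random.default_rng(2)

# assumed covariance of a short correlated Gaussian sequence (K = 3 samples)
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])

lam, Phi = np.linalg.eigh(C)           # eigenvalues lambda_k, eigenvectors phi_k

# generate correlated samples X with covariance C, then decorrelate them
L = np.linalg.cholesky(C)
X = rng.standard_normal((10000, 3)) @ L.T
Y = X @ Phi                            # Karhunen-Loeve coefficients

print("target eigenvalues :", lam)
print("empirical variances:", Y.var(axis=0))      # approx. lambda_k
print("cross-covariance   :", np.round(np.cov(Y.T), 2))
```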

Furthermore, there are situations, for example, in describing the critical dynamics of a system, when the model of a stochastic process must involve both the deterministic and random effects. The random variables represent local, microscopic random fluctuations, whereas the deterministic variables model the collective, macroscopic, slowly varying effects. The corresponding SDEs are so-called Langevin equations. They are generally difficult to solve; however, the underlying probability density evolution can be often obtained by solving the corresponding Fokker-Planck equation (FPE) as will be shown in the next section.

A general Langevin equation in one dimension has the SDE form (12), i.e.,

$$\dot{x}(t) = a(x,t) + b(x,t)\,L(t) \tag{29}$$

where $L(t)$ represents the Langevin random force, which is assumed to be a zero-mean white Gaussian noise. If the coefficient, $b(x,t)$, does not depend on $x$, then the noise term in (29) is said to be additive; otherwise, the noise is multiplicative. The multiplicative noise allows describing more complex behaviors including non-equilibrium dynamics.

In $d$ dimensions, the set of general Langevin equations is defined as

$$\dot{x}_i(t) = a_i(\mathbf{x},t) + \sum_{j=1}^{d} B_{ij}(\mathbf{x},t)\,L_j(t), \qquad i = 1,2,\ldots,d \tag{30}$$

where $L_j(t)$ are the zero-mean Langevin random processes having the cross-covariances, $\mathrm{E}[L_i(t)\,L_j(s)] = 2\,\delta_{ij}\,\delta(t-s)$.

For example, a one-dimensional zero-mean Gaussian random process with the ACF, $\mathrm{E}[x(t)x(s)] = a\,\exp(-a|t-s|)$, is described by the Langevin SDE,

$$\dot{x}(t) = -a\,x(t) + a\,L(t) \tag{31}$$

where $L(t)$ is a zero-mean white Gaussian noise. This random process is normally referred to as the Ornstein-Uhlenbeck process. Note also that, in the limit of very large $a$, the process, $x(t)$, becomes white.
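A minimal Euler-Maruyama simulation of the Langevin Eq. (31) can be sketched in Python as follows (illustrative parameter values assumed); the empirical stationary variance should approach the value $a$ implied by the ACF $a\exp(-a|t-s|)$.

```python
# A minimal sketch of simulating the Ornstein-Uhlenbeck Langevin Eq. (31)
# with the Euler-Maruyama scheme; parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(3)

a, dt, N = 2.0, 1e-3, 500_000
x = np.empty(N)
x[0] = 0.0
for i in range(1, N):
    # dx = -a*x*dt + a*dW, with E[L(t)L(s)] = 2*delta(t-s), so dW ~ N(0, 2*dt)
    x[i] = x[i-1] - a * x[i-1] * dt + a * np.sqrt(2 * dt) * rng.standard_normal()

burn = N // 10                               # discard the initial transient
print("empirical variance:", x[burn:].var()) # should be close to a = 2.0
```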


3. Evolution of probability density

By definition, at any given time instant, the value of a random process is a random variable, which is described by a particular probability density function (PDF). It is, therefore, of interest to investigate how such a PDF evolves in time along the realizations of a random process.

Consider a general nonlinear random process described by model (12). As stated previously, the solutions of this SDE are always Markov processes defined by their transition probability density, $p(\mathbf{x}_t,t\,|\,\mathbf{x}_s,s)$, where, for notational convenience, $\mathbf{x}_t\equiv\mathbf{x}(t)$ and $\mathbf{x}_s\equiv\mathbf{x}(s)$. The evolution of the probability density of the random variable, $\mathbf{x}_t$, is described by the forward Kolmogorov equation, which is better known as the FPE. It assumes that $t>s$, and the initial condition $p(\mathbf{x},s) = \delta(\mathbf{x})$. On the other hand, the evolution from the final (terminal) state, $p(\mathbf{x},s) = u_s(\mathbf{x})$, is described by a similar partial differential equation, i.e., the backward Kolmogorov equation, assuming $t\le s$.

The FPE can be effectively formulated using the Fokker-Planck operator. Hence, assuming the PDF, $p(\mathbf{x},t)$, define the operator

$$\mathcal{L}_{\mathbf{x}}\,p = \sum_i \frac{\partial}{\partial x_i}\left[\sum_j \frac{\partial}{\partial x_j}\big(B_{ij}(\mathbf{x},t)\,p\big) - a_i(\mathbf{x},t)\,p\right] \tag{32}$$

where the subscripts, $i$ and $j$, denote the components of vectors or the elements of the process matrix, $\mathbf{B}$. The FPE can be then written as

$$\frac{\partial}{\partial t}\,p(\mathbf{x}_t,t\,|\,\mathbf{x}_s,s) = \mathcal{L}_{\mathbf{x}_t}\,p(\mathbf{x}_t,t\,|\,\mathbf{x}_s,s), \qquad t>s. \tag{33}$$

Note that, in the limit, $\lim_{t\to s} p(\mathbf{x}_t,t\,|\,\mathbf{x}_s,s) = \delta(\mathbf{x}_t-\mathbf{x}_s)$.

Omitting the dependence on the initial state, $\mathbf{x}_s$, at time $t=s$, the FPE of a $d$-dimensional diffusion process defined by the SDE model (12) is written as

$$\frac{\partial p(\mathbf{x},t)}{\partial t} = -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\big[a_i(\mathbf{x},t)\,p(\mathbf{x},t)\big] + \sum_{i,j=1}^{d}\frac{\partial^2}{\partial x_i\,\partial x_j}\big[D_{ij}(\mathbf{x},t)\,p(\mathbf{x},t)\big] \tag{34}$$

where $\mathbf{a}(\mathbf{x},t)$ is the drift vector, and the diffusion coefficients are computed as $D_{ij} = \frac{1}{2}\sum_{k=1}^{d} B_{ik}(\mathbf{x},t)\,B_{jk}(\mathbf{x},t)$.

The one-dimensional Ornstein-Uhlenbeck process, i.e., $\dot{x}(t) = -a\,x(t) + b\,\dot{u}(t)$ (cf. (31)), is described by the FPE,

$$\frac{\partial p(x,t)}{\partial t} = a\,\frac{\partial}{\partial x}\big[x\,p(x,t)\big] + \frac{b^2}{2}\,\frac{\partial^2 p(x,t)}{\partial x^2}. \tag{35}$$

A Brownian motion process in one dimension is a special case of the diffusion having zero drift and the diffusion coefficient equal to $1/2$. In such a case, $\dot{x}(t) = \dot{u}(t)$, where $u(t)$ is a Wiener process, and the corresponding FPE is

$$\frac{\partial p(x,t)}{\partial t} = \frac{1}{2}\,\frac{\partial^2 p(x,t)}{\partial x^2}. \tag{36}$$

Assuming the initial condition, $p(x,0) = \delta(x)$, the FPE (36) has the solution, $p(x,t) = (2\pi t)^{-1/2}\exp\!\big(-x^2/(2t)\big)$, i.e., the process is a non-stationary Gaussian process.
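The FPE (36) can also be integrated numerically. The following Python sketch (grid and time step are assumptions) uses explicit finite differences, starts from a narrow Gaussian as a proxy for the delta-function initial condition, and compares the result with the exact Gaussian solution.

```python
# A hedged sketch of solving the FPE (36) for Brownian motion by explicit
# finite differences, compared with the exact solution
# p(x,t) = (2*pi*t)^(-1/2) * exp(-x^2/(2t)). Grid sizes are assumptions.
import numpy as np

dx, dt = 0.05, 0.001                 # D*dt/dx^2 = 0.2 < 0.5 keeps the scheme stable
x = np.arange(-5.0, 5.0 + dx, dx)

t0 = 0.01                            # narrow Gaussian instead of the delta function
p = np.exp(-x**2 / (2 * t0)) / np.sqrt(2 * np.pi * t0)

t, t_end = t0, 1.0
while t < t_end:
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2
    lap[0] = lap[-1] = 0.0           # boundaries are far from the support
    p = p + 0.5 * dt * lap           # p_t = (1/2) p_xx
    t += dt

exact = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
print("max abs error:", np.abs(p - exact).max())
```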

In general, the FPE can be solved in special cases, such as when the coefficients of the SDE are time-independent. The stationary solution is obtained by assuming that $\partial p(x,t)/\partial t = 0$, for $t > t_{ss}$. The solution of a non-stationary FPE can be obtained by separation or transformation of variables, a variational approach, eigenvalue expansion, numerical integration, and other methods ([12], Ch. 3).

Furthermore, the dynamics of many physical and chemical systems are often described by a so-called master equation. The master equation describes the distribution of the discrete states over time. For these systems, it is desirable to model how likely the system is to be in any given state at different time instances. The switching between the states occurs randomly at random time instances with a certain rate. This can be described by a continuous-time Markov chain.

Specifically, the master equation defines the change of the probability distribution, $\mathbf{P}(t)$, of the system states at time, $t$. It is the first-order matrix SDE,

$$\dot{\mathbf{P}}(t) = \mathbf{A}(t)\,\mathbf{P}(t) \tag{37}$$

where $\mathbf{A}(t)$ denotes the transition (rate) matrix. Its off-diagonal elements, $[\mathbf{A}(t)]_{ij}$, $i\ne j$, are non-negative, whereas the diagonal elements are set so that the columns of $\mathbf{A}(t)$ sum to zero.

The matrix, $\mathbf{A}$, is the Laplacian of a directed weighted graph representing the state transitions of a Markov chain with the weights being equal to the rates of the state transitions. In case the elements, $[\mathbf{A}(t)]_{ij}$, of the transition matrix represent the probabilities of a homogeneous Markov chain to jump from the state, $i$, to the state, $j$, over a time interval, $t$, then by the Chapman-Kolmogorov equation,

$$\mathbf{A}(t+s) = \mathbf{A}(t)\,\mathbf{A}(s). \tag{38}$$

In discrete time, it follows that $\mathbf{P}_t = \mathbf{A}_t\,\mathbf{P}_{t-1}$. In continuous time, the master equation can be expressed element-wise in the gain-loss form as

$$\frac{d}{dt}P_i(t) = \sum_{j\ne i}\big[A_{ij}(t)\,P_j(t) - A_{ji}(t)\,P_i(t)\big]. \tag{39}$$

Importantly, this indicates that the master equation does not depend on the diagonal elements of the transition matrix, $\mathbf{A}$. At equilibrium, the detailed balance requires that

$$A_{ij}\,P_j = A_{ji}\,P_i, \qquad \forall i,j. \tag{40}$$

The corresponding probability distribution of states, $\mathbf{P}$, is referred to as the equilibrium distribution. The equilibrium describes the time-reversibility of microscopic dynamics, which is different from a steady state; the latter occurs when $\frac{d}{dt}P_i(t) = 0$, for $t>t_{ss}$ and all $i$.

The master equation can be used to model a one-dimensional birth-death process. In particular, if the population size, $x(t)$, is of interest, it can increase by one to $x(t+s) = x(t)+1$, due to a birth event, or decrease by one to $x(t+s) = x(t)-1$, when a death occurred, while $x(t)\ge 0$, for all $t$. Assuming that the transitions are Markovian, the time-dependent transition probabilities are defined as

$$P_{ij}(\Delta t) = \mathrm{Prob}\big[x(t+\Delta t) = j \,\big|\, x(t) = i\big]. \tag{41}$$

For small $\Delta t>0$, and the current population size, $i\ge 0$, it can be assumed that at most one birth or death has occurred, so that,

$$P_{i,i+1} = \alpha\,i\,\Delta t,\quad i\ge 0, \qquad P_{i,i-1} = \beta\,i\,\Delta t,\quad i\ge 1, \qquad P_{i,i} = 1-(\alpha+\beta)\,i\,\Delta t,\quad i\ge 1. \tag{42}$$

The coefficients, $\alpha$ and $\beta$, in (42) represent the probabilities that the corresponding events occur per unit of time. Denoting the probability distribution of the population sizes at time $t$ as $P_i(t)$, the resulting master equation is written as

$$\frac{d}{dt}P_i(t) = \alpha(i-1)\,P_{i-1}(t) + \beta(i+1)\,P_{i+1}(t) - (\alpha+\beta)\,i\,P_i(t),\quad i\ge 1, \qquad \frac{d}{dt}P_0(t) = \beta\,P_1(t). \tag{43}$$

The stationary solution of (43) is obtained assuming $\frac{d}{dt}P_i(t) = 0$, $i\ge 0$, i.e., when $P_0=1$, and $P_i=0$, $i\ge 1$; in such a case, the population eventually goes extinct. On the other hand, if $P_0<1$, the population can either become extinct with a certain probability, or it grows without any bounds. More precisely, for a general birth-death process with the birth and death rates, $\alpha_i$ and $\beta_i$, in the state $i$, the steady-state probabilities are computed as

$$P_i = P_0\prod_{j=1}^{i}\frac{\alpha_{j-1}}{\beta_j},\quad i\ge 1, \qquad P_0 = \left[1 + \sum_{i=1}^{\infty}\prod_{j=1}^{i}\frac{\alpha_{j-1}}{\beta_j}\right]^{-1}. \tag{44}$$
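A short Python sketch (with assumed rates $\alpha$ and $\beta$ and a truncated state space) integrates the birth-death master Eq. (43) with forward Euler; with $\beta>\alpha$, the probability mass accumulates in the extinction state.

```python
# A small sketch integrating the truncated birth-death master Eq. (43) with
# forward Euler, starting from a population of size 5. The rates are assumed.
import numpy as np

alpha, beta = 0.8, 1.0        # per-capita birth and death rates (illustrative)
K = 200                       # truncation of the population size
dt, T = 1e-3, 20.0

P = np.zeros(K + 1)
P[5] = 1.0                    # initial population size of 5
i = np.arange(K + 1)

for _ in range(int(T / dt)):
    dP = np.zeros_like(P)
    dP[1:]  += alpha * i[:-1] * P[:-1]     # birth into state i from i-1
    dP[:-1] += beta  * i[1:]  * P[1:]      # death into state i from i+1
    dP      -= (alpha + beta) * i * P      # flow out of state i
    P += dt * dP

print("total probability :", P.sum())      # approx. 1 (leak past K is negligible)
print("extinction prob.  :", P[0])
```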

4. Common random processes

In simulations and modeling of various systems, some random processes are encountered more often than others. The random processes that are preferred in practical applications are those that can be succinctly described, have well-defined properties, and can be readily generated or fitted to measured data. In this section, the most common random processes are outlined starting from the random processes that are defined in discrete time, which are often referred to as random sequences, before introducing the random processes defined in continuous time.

The random sequence, $x_0, x_1, \ldots$, is a martingale, if its increments are independent and have zero mean, i.e., ([18], Ch. 6)

$$\mathrm{E}[x_t\,|\,x_0,x_1,\ldots,x_{t-1}] - x_{t-1} = \mathrm{E}[x_t - x_{t-1}\,|\,x_0,\ldots,x_{t-1}] = 0. \tag{45}$$

More generally, a sequence, $\{y_t\}$, is a martingale with respect to a sequence, $\{x_t\}$, provided that

$$\mathrm{E}[y_t\,|\,x_0,\ldots,x_{t-1}] = y_{t-1}. \tag{46}$$

For example, if the samples, $x_t$, are drawn independently from the same probability distribution, $p(x|\theta)$, the likelihood ratio for the estimated parameter, $\hat{\theta}$, from $T$ observed samples, is

$$\Lambda_T = \prod_{t=1}^{T}\frac{p(x_t\,|\,\hat{\theta})}{p(x_t\,|\,\theta)}. \tag{47}$$

The mean value of the likelihood ratio conditioned on the observations is

$$\mathrm{E}[\Lambda_T\,|\,x_1,\ldots,x_{T-1}] = \Lambda_{T-1} \tag{48}$$

so the sequence of likelihood ratios is a martingale.

In continuous time, the martingale process is defined as

$$\mathrm{E}\big[x(t_2)\,\big|\,x(t),\,t\le t_1<t_2\big] = x(t_1) \tag{49}$$

or, conditioned on another process, as

$$\mathrm{E}\big[y(t_2)\,\big|\,x(t),\,t\le t_1<t_2\big] = y(t_1). \tag{50}$$

The Wiener process is an example of the martingale in continuous time.

The conditional probability of samples in a Markov sequence satisfies

$$P(x_t\,|\,x_1,\ldots,x_{t-1}) = P(x_t\,|\,x_{t-1}). \tag{51}$$

Higher-order Markov sequences can also be defined; however, they are rarely assumed in practical problems. The Markov sequences attaining a finite set of values are often referred to as Markov chains, and their values are referred to as states.

The Markov chain is irreducible, if any state can be reached from any other state in a finite number of steps, i.e.,

$$\forall x_{t_1},\ \exists\, t_2>t_1:\quad P(x_{t_2}\,|\,x_{t_1}) > 0. \tag{52}$$

Assuming the Chapman-Kolmogorov equation, the transition probability over $T$ steps is computed recursively from the transition probabilities over $T-1$ steps as

$$P(x_T=j\,|\,x_1=i) = \sum_k P(x_{T-1}=k\,|\,x_1=i)\,P(x_T=j\,|\,x_{T-1}=k). \tag{53}$$

If the chain is homogeneous, and the one-step transition probabilities are arranged in a matrix, $\mathbf{P}$, then the transition probabilities after $k$ steps are computed as $\mathbf{P}^k$.

The properties of a Markov process in continuous time are identical to those of the Markov chain, provided that the set of states is finite or countably infinite; such a process is referred to as a continuous-time Markov chain. For example, if the process, $x(t)$, can be constructed by independent increments, then it is a Markov process.

A Bernoulli sequence is formed by binary random variables having a fixed probability of outcomes. The number of successes among a given number of trials is binomially distributed, whereas the number of trials required to achieve a given number of successes has a negative binomial distribution. The Bernoulli sequence can be generalized to more than two outcomes; it is then called a Bernoulli scheme sequence. These sequences can be used to define other random sequences. Consider, for example, the sum

$$x_T = \sum_{t=1}^{T} u_t \tag{54}$$

where $u_t\in\{-1,+1\}$, which defines a Bernoulli random walk. It has the mean, $\mathrm{E}[x_T] = (2p-1)\,T$, which can be either increasing or decreasing over time, depending on $p = \mathrm{Prob}[u_t=+1]$. The variance, $\mathrm{var}[x_T] = 4p(1-p)\,T$, is always increasing over time. The ACF can be derived as $\mathrm{E}[x_{T_1}x_{T_2}] = 4p(1-p)\min(T_1,T_2) + (1-2p)^2\,T_1T_2$. It should be noted that the random walk can also be created assuming a birth-death process discussed above.

The case, $u_t\in\{0,1\}$, in (54) defines a binomial counting sequence with the mean value, $\mathrm{E}[x_T] = p\,T$, and the variance, $\mathrm{var}[x_T] = p(1-p)\,T$. The ACF is now obtained as $\mathrm{E}[x_{T_1}x_{T_2}] = p(1-p)\min(T_1,T_2) + p^2\,T_1T_2$.

The binomial counting process can be generalized as a continuous-time Poisson counting process having independent and stationary increments, and the initial value $x(0)=0$. In the literature, these processes are sometimes referred to as renewal processes [10]. The Poisson counting process has the probability of $k\ge 0$ successes occurring over a time duration, $T>0$, approximated by the Poisson distribution, i.e.,

$$\mathrm{Prob}[x(T)=k] \approx \frac{(\lambda T)^k}{k!}\,\exp(-\lambda T) \tag{55}$$

where $\lambda$ denotes the average rate of successes per unit time. If $\lambda$ is time-dependent, such a process is called non-homogeneous. The inter-arrival time between two consecutive successes is a random variable, which can be shown to be exponentially distributed with the same parameter, $\lambda$. The inter-arrival time can be used to obtain the distribution of times until the occurrence of the $k$-th success; this is an Erlang random variable. The ACF of the Poisson process can be shown to be $\mathrm{E}[x(T_1)x(T_2)] = \lambda\min(T_1,T_2) + \lambda^2\,T_1T_2$.
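The Poisson counting process can be simulated directly from its exponentially distributed inter-arrival times, as in the following Python sketch (parameters are illustrative); the empirical mean and variance of the counts both approach $\lambda T$, consistent with (55).

```python
# A brief sketch generating a Poisson counting process from exponential
# inter-arrival times and checking that the count in [0, T] behaves as
# a Poisson(lambda*T) random variable. Parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(7)
lam, T, runs = 3.0, 2.0, 20000

counts = []
for _ in range(runs):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(1.0 / lam)   # inter-arrival time ~ Exp(lambda)
        if t > T:
            break
        k += 1
    counts.append(k)

counts = np.array(counts)
print("empirical mean/var:", counts.mean(), counts.var())  # both approx lambda*T = 6
```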

Poisson process is an example of a general class of Lévy random processes. Another example of the Lévy random process is a gamma process having the inter-arrival times being gamma distributed. Poisson process can be generalized as Hawkes random process. It is a marked point process, which models the random intensity changes in the arrival times as a self-excitation phenomenon ([2], Ch. 11).

The random telegraph signal, $y(t)$, can be obtained as a transformation of the Poisson process, $x(t)$, i.e.,

$$y(t) = (-1)^{x(t)} = \begin{cases}+1, & x(t)\ \text{even}\\ -1, & x(t)\ \text{odd.}\end{cases} \tag{56}$$

Thus, $y(t)$ is a continuous-time Markov jump process having only two states. The probability of the states can be derived to be

$$\mathrm{Prob}[y(t)=+1] = e^{-\lambda t}\cosh(\lambda t), \qquad \mathrm{Prob}[y(t)=-1] = e^{-\lambda t}\sinh(\lambda t). \tag{57}$$

Note that the rate constant, $\lambda$, could be assumed to be different in the transitions to the $+1$ and $-1$ values, respectively. The probability distribution of $y(t)$ is

$$p(y,t) = e^{-\lambda t}\big[\cosh(\lambda t)\,\delta(y-1) + \sinh(\lambda t)\,\delta(y+1)\big] \tag{58}$$

with the mean, $\mathrm{E}[y(t)] = e^{-2\lambda t}$, and the variance, $\mathrm{var}[y(t)] = 1 - e^{-4\lambda t}$. Moreover, the telegraph signal can be randomized by assuming the product, $U\,y(t)$, where $U$ are independent and equally likely values, $+1$ or $-1$, generated whenever the process, $y(t)$, changes its value.
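A compact Python sketch (assumed parameters) checks the mean of the telegraph signal (56) by drawing the number of Poisson switching events in $[0,t]$ and comparing the empirical mean of $(-1)^{x(t)}$ with $\exp(-2\lambda t)$.

```python
# A compact sketch of the random telegraph signal (56) driven by Poisson
# switching, verifying the mean E[y(t)] = exp(-2*lambda*t). Values assumed.
import numpy as np

rng = np.random.default_rng(8)
lam, t, runs = 1.5, 0.4, 200000

x = rng.poisson(lam * t, size=runs)   # number of switching events in [0, t]
y = (-1.0) ** x                       # telegraph value (56)
print("empirical mean :", y.mean())
print("exp(-2*lam*t)  :", np.exp(-2 * lam * t))
```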

The Wiener process is best known as a model of the Brownian motion of particles in a fluid. Its PDF is a solution of the second-order partial differential equation

$$\frac{\partial}{\partial t}p(x,t) = \frac{\alpha}{2}\,\frac{\partial^2}{\partial x^2}p(x,t), \qquad x(0)=0 \tag{59}$$

where $\alpha$ denotes the rate of particle collisions. Solving (59), $p(x,t)$ is the Gaussian distribution with zero mean and the variance equal to $\alpha t$, i.e., this process is non-stationary. The ACF is derived as $\mathrm{E}[x(t_1)x(t_2)] = \alpha\min(t_1,t_2)$. More generally, the Wiener process can also be defined assuming the time increments rather than the absolute times.

A multivariate Gaussian process is probably the most frequently assumed random process. Since these processes have been extensively described and studied in the literature, they are mentioned only very briefly here. The Gaussian process is fully defined by the vector of means and the covariance matrix; the covariance matrix also defines the spatial ACF of this process. The Gaussian distribution has the largest entropy (i.e., it is the worst-case noise) among all continuous distributions. Its widespread occurrence in many practical scenarios and systems is justified by the central limit theorem. It is also the only distribution, which is preserved by linear transformations, and for which uncorrelated variables are equivalent to independent variables. Gaussian processes play an important role in estimating the parameter values. The Gaussian mixture models add another level of flexibility in modeling random processes and in defining the prior and posterior distributions in Bayesian as well as non-Bayesian estimation of parameter values.

More complicated random processes can be built from other random processes. For example, Archimedean survival process is a vector of differences between gamma bridge processes, which are themselves created by scaling the gamma random processes by other random variables ([2], Ch. 10).


5. Numerical methods for solving SDEs

The random processes considered in this chapter are defined by the SDEs. Since many practical SDEs cannot be solved analytically, they must be solved numerically by computer simulations. The caveat is that the existing algorithms for solving ordinary differential equations (ODEs) may not be directly applicable or efficient for solving SDEs. The notable exception is SDEs with the driving noise that is generated independently of the random process values; for these processes, the algorithms developed for ODEs can be directly used for SDEs.

In general, the algorithms for solving SDEs should provide fast convergence, good accuracy, and also stability. The accuracy is affected by discretization, whereas the lack of stability may cause a sudden divergence of computations. Therefore, numerical solvers for SDEs require formal mathematical frameworks and derivations in order to satisfy these requirements.

As an example, consider the integral form of a one-dimensional time-homogeneous SDE

$$x(t) = x(t_0) + \int_{t_0}^{t} a(x(s))\,ds + \int_{t_0}^{t} b(x(s))\,du(s). \tag{60}$$

Using Itô's formula, the Euler method approximates the integral Eq. (60) as

$$x(t) \approx x(t_0) + a(x(t_0))\,(t-t_0) + b(x(t_0))\,\big(u(t)-u(t_0)\big). \tag{61}$$

A more accurate higher-order approximation can be obtained in the form

$$x(t) \approx x(t_0) + a(x(t_0))\,(t-t_0) + b(x(t_0))\int_{t_0}^{t}du(s) + L^1 b(x(t_0))\int_{t_0}^{t}\!\!\int_{t_0}^{s}du(s')\,du(s) \tag{62}$$

where the function operator, $L^1$, is introduced to denote $L^1 f(x) = b(x)\,\frac{d}{dx}f(x)$.

Assuming the time discretization to evaluate the process, $x(t)$, at $N$ discrete-time instances, $t = i\,\Delta t$, $i=0,1,\ldots,N$, so that $N\Delta t = T$, and denoting the Wiener process increments as $\Delta u_i = u((i+1)\Delta t) - u(i\Delta t)$, the Euler algorithm computes one of the following iterations depending on the assumed approximation, i.e.,

$$x_{i+1} = x_i + a(x_i)\,\Delta t + b(x_i)\,\Delta u_i, \qquad x_{i+1} = x_i + a(x_i)\,\Delta t + b(x_i)\,\Delta u_i + \tfrac{1}{2}\,L^0 b(x_i)\,\big(\Delta u_i^2 - \Delta t\big) \tag{63}$$

where the function operator, $L^0$, is defined as $L^0 f(x) = a(x)\,\frac{d}{dx}f(x) + \tfrac{1}{2}\,b^2(x)\,\frac{d^2}{dx^2}f(x)$. A better stability can be obtained by assuming $a(x_{i+1})$ and $b(x_{i+1})$, or $[a(x_i)+a(x_{i+1})]/2$ and $[b(x_i)+b(x_{i+1})]/2$, in (63). However, these latter values do not guarantee the convergence to the correct continuous-time solution even when the time step, $\Delta t$, goes to zero. The convergence problem can be rectified by using $a(\bar{x}_i)$ and $b(\bar{x}_i)$ in (63), where the filtered sequence, $\bar{x}_i = \epsilon\,x_{i+1} + (1-\epsilon)\,x_i$. For $\epsilon = 1/2$, the algorithm to generate the samples, $x_i$, of the random process must repeatedly find the root, $\bar{x}_i$, of the equation

$$2\,(\bar{x}_i - x_i) = a(\bar{x}_i)\,\Delta t + b(\bar{x}_i)\,\Delta u_i. \tag{64}$$

The root mean square error, as a measure of the convergence between the approximately computed values, $x_i$, and the exact values, $x(i\Delta t)$, is of the order between $\sqrt{\Delta t}$ and $\Delta t$. Thus, the approximation accuracy improves by reducing the discretization step size. There are other algorithms, for example, the Milstein algorithm, that were proposed in the literature to improve the accuracy.

There is a notion of commutative and non-commutative noise, depending on whether the driving noise (usually a Wiener process with independent Gaussian increments) is additive or multiplicative, i.e., whether the scaling coefficient of the noise is independent of the random process values or not. This is important, for example, when deriving the FPE for a Langevin equation. If the noise is additive, then the Itô and Stratonovich interpretations of the Langevin equation lead to the same FPE; this is not the case when the noise is multiplicative.

Furthermore, similar discretization methods can be used to generate samples of the vector random processes as explained in ([5], Section 10.5). Subsequently, the vector counterparts of the Euler and Milstein algorithms exist.

In addition to low-order approximations by the Euler and Milstein algorithms, other algorithms relying on higher-order approximations are available. However, the increased accuracy of these algorithms usually comes at the cost of increased computational complexity and possible issues with convergence. In general, a good strategy is to exploit the structure of the specific SDE in order to avoid many of the computational problems. For example, the noise can be quantized to a small number of levels or even be generated as a binary sequence of ±1 values.

The substantial computational problems arise in simulating spatially resolved partial SDEs, since they have to be discretized not only in time, but also in 2D or even 3D space. This may prevent considering any higher-order approximations and the corresponding algorithms. In the literature, the algorithms for spatial SDEs assume, for example, the fast Fourier transforms, and there is also the interaction picture method inspired by Hamiltonians in quantum mechanics.

5.1 Example

Consider the one-dimensional SDE,

$$dx(t) = \mu\,x(t)\,dt + \sigma\,x(t)\,dw(t), \qquad x(0) = x_0 \tag{65}$$

with the constant drift, $\mu$, and the constant diffusion, $\sigma$, driven by the Wiener process, $w(t)$. In this case, the solution can be obtained in the closed form, i.e.,

$$x(t) = x_0\,e^{(\mu-\sigma^2/2)\,t + \sigma\,w(t)} \tag{66}$$

which can be verified by substituting into (65) and using the Itô calculus.

The exact solution (66) can be approximated by computing the approximate values, $\tilde{x}(t_i)\equiv\tilde{x}_i$, of $x(t_i)$, at chosen discrete-time instances, $t_i$. Specifically, the Euler algorithm computes

$$\tilde{x}_i = \tilde{x}_{i-1} + \mu\,\tilde{x}_{i-1}\,\Delta t_i + \sigma\,\tilde{x}_{i-1}\,\Delta w_i, \qquad \tilde{x}_0 = x_0. \tag{67}$$

The samples of the Wiener process increments are generated as $\Delta w_i = \mathcal{N}(0,1)\,\sqrt{\Delta t_i}$, where the time increments, $\Delta t_i = t_i - t_{i-1}$, and $\mathcal{N}(0,1)$ is a zero-mean, unit-variance Gaussian random variable.

The Milstein algorithm approximates the SDE (65) as

$$\tilde{x}_i = \tilde{x}_{i-1} + \mu\,\tilde{x}_{i-1}\,\Delta t_i + \sigma\,\tilde{x}_{i-1}\,\Delta w_i + \frac{\sigma^2}{2}\,\tilde{x}_{i-1}\,\big(\Delta w_i^2 - \Delta t_i\big), \qquad \tilde{x}_0 = x_0 \tag{68}$$

so it introduces an extra term compared to the Euler approximation (67).

It can be shown that both the Euler and Milstein algorithms converge in the mean to the exact solution as the step size, $\Delta t$, is reduced, i.e.,

$$\lim_{\Delta t\to 0}\mathrm{E}\big[|x(t) - \tilde{x}(t)|\big] = \mathcal{O}(\Delta t^{\,n}), \qquad \forall t>0 \tag{69}$$

where $n$ represents the order of convergence. For $L$ random realizations of the values, $\tilde{x}(t)$, the expectation can be evaluated as

$$\mathrm{E}\big[|x(t) - \tilde{x}(t)|\big] \approx \Big|\mathrm{E}[x(t)] - \frac{1}{L}\sum_{l=1}^{L}\tilde{x}^{(l)}(t)\Big|. \tag{70}$$

The order, $n$, is $1/2$ for the Euler algorithm, and $1$ for the Milstein algorithm, respectively. Consequently, in order to reduce the approximation error to one half, the step size, $\Delta t$, must be reduced to $1/4$ for the Euler algorithm, but only to $1/2$ for the Milstein algorithm.

The approximations (67) and (68) of the exact solution (66) are numerically validated in Figure 1 (top) assuming the initial value, $x_0 = 10.0$, the parameters, $\mu = 1.0$ and $\sigma = 0.5$, and $N = 50$ time steps in the interval $[0,1]$. The relative approximation error is expressed as

$$\epsilon(t_i) = 100\% \times \big|x(t_i) - \tilde{x}(t_i)\big| \big/ \big|x(t_i)\big|. \tag{71}$$

Figure 1. Sample realization of the random process (65) and its Euler and Milstein approximations (top), and the absolute mean errors at $t_i = t_N = 1.0$ as a function of the number of time steps, $N$ (bottom).

In addition, Figure 1 (bottom) shows the absolute mean error, $|x(t_i) - \tilde{x}(t_i)|$, as a function of the number of time steps, $N = 2^k$, $k = 2,3,\ldots,10$, in the interval $[0,1]$, for $t_i = t_N = 1.0$. These curves indicate that the absolute mean error decreases as $1/N^{1/2}$ for the Euler algorithm, and as $1/N$ for the Milstein algorithm.
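The following Python sketch (not the code used to produce Figure 1, but written in its spirit) approximates the geometric Brownian motion (65) with the Euler (67) and Milstein (68) schemes and prints the pathwise mean absolute error at $t=1$ for several numbers of time steps; the errors should decrease roughly as $1/N^{1/2}$ and $1/N$, respectively.

```python
# A hedged sketch comparing the Euler (67) and Milstein (68) approximations
# of the geometric Brownian motion (65)-(66); x0=10, mu=1, sigma=0.5 as in
# the text, other settings (L, the list of N) are assumptions.
import numpy as np

rng = np.random.default_rng(9)
x0, mu, sigma, L = 10.0, 1.0, 0.5, 1000

for N in [4, 16, 64, 256, 1024]:
    dt = 1.0 / N
    err_e = err_m = 0.0
    for _ in range(L):
        dw = rng.standard_normal(N) * np.sqrt(dt)     # shared Wiener increments
        xe = xm = x0
        for i in range(N):
            xe += mu * xe * dt + sigma * xe * dw[i]                      # Euler (67)
            xm += mu * xm * dt + sigma * xm * dw[i] \
                  + 0.5 * sigma**2 * xm * (dw[i]**2 - dt)                # Milstein (68)
        exact = x0 * np.exp((mu - 0.5 * sigma**2) + sigma * dw.sum())    # (66) at t=1
        err_e += abs(exact - xe)
        err_m += abs(exact - xm)
    print(f"N={N:5d}  Euler {err_e/L:.4f}  Milstein {err_m/L:.4f}")
```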


6. Generating random processes

The previous section discussed how to simulate random processes that are defined by a given SDE. This section considers the problem of generating random processes of given properties, i.e., the problem how to find the corresponding SDE model that can be used to generate samples of the random process.

In most practical scenarios, the random process is described by the probability distribution of its samples, which also fully defines all statistical moments at any given time instant, and by the second-order pairwise moments between the samples at any two time instances; these moments are referred to as the ACF or the auto-covariance function.

More importantly, the generated random processes are assumed to be ergodic and stationary, so that the statistical mean is equal to the average value in time, i.e.,

$$\mathrm{E}[x(t)] = \mathrm{Av}[x(t)] = \mathrm{Av}\big[\mathrm{E}[x(t)]\big] = \mathrm{E}\big[\mathrm{Av}[x(t)]\big], \qquad \forall t. \tag{72}$$

Non-ergodic processes are rarely encountered in practical applications. A non-stationary process, $x(t)$, can be generated from a stationary process, $u(t)$, using a transformation involving deterministic functions, $a(t)$ and $b(t)$, as

$$x(t) = a(t)\,u(t) + b(t). \tag{73}$$

Note that transformation (73) is nonlinear, unless $b(t) = 0$, for all $t$. Moreover, robust signal processing can be achieved by assuming the values, $\mathrm{Av}\big[\mathrm{E}[x(t)]\big]$ or $\mathrm{E}\big[\mathrm{Av}[x(t)]\big]$, instead of $\mathrm{Av}[x(t)]$ or $\mathrm{E}[x(t)]$.

The simplest, but very often considered random process is a correlated (colored) Gaussian noise. In continuous time, the uncorrelated (white) Gaussian process, $u(t)$, is correlated by a linear, not necessarily stationary filter with the impulse response, $h(t,s)$, i.e.,

$$x(t) = \int_{-\infty}^{\infty} h(t,s)\,u(s)\,ds. \tag{74}$$

The mean and the ACF of the process, $x(t)$, can be computed as

$$\bar{x} = \bar{u}\int_{-\infty}^{\infty} h(t,s)\,ds, \qquad \mathrm{E}[x(t)x(s)] = \int_{-\infty}^{\infty} h(t,r)\,h(s,r)\,dr \tag{75}$$

where $\bar{u} = \mathrm{E}[u(t)]$, and $\mathrm{E}[u(t)u(s)] = \delta(t-s)$.

A vector of mutually uncorrelated Gaussian processes, $\mathbf{u}(t) = [u_1(t), u_2(t),\ldots]^T$, where each $u_i(t)$ can be individually correlated in time, can be transformed into a vector of mutually correlated processes, $\mathbf{x}(t)$, by a linear transformation

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{u}(t). \tag{76}$$

Then, the correlation matrix $\mathrm{E}[\mathbf{x}(t)\mathbf{x}^T(t)] = \mathbf{A}\,\mathrm{E}[\mathbf{u}(t)\mathbf{u}^T(t)]\,\mathbf{A}^T$. If the number of processes, $N$, is large, the elements of $\mathbf{A}$ can be efficiently calculated as

$$\begin{aligned} A_{1,1} &= \sqrt{C_{1,1}}, & A_{k,1} &= \frac{C_{k,1}}{A_{1,1}},\quad k=2,\ldots,N\\ A_{j,j} &= \sqrt{C_{j,j} - \sum_{i=1}^{j-1}A_{j,i}^2}, & A_{k,j} &= \frac{C_{k,j} - \sum_{i=1}^{j-1}A_{k,i}A_{j,i}}{A_{j,j}},\quad k=j+1,\ldots,N\\ A_{N,N} &= \sqrt{C_{N,N} - \sum_{i=1}^{N-1}A_{N,i}^2} && \end{aligned} \tag{77}$$

assuming, for simplicity, that all $u_i(t)$ have zero mean, and $C_{i,j} = \mathrm{E}[x_i(t)\,x_j(t)]$ denote the desired pairwise correlations. This procedure can also be used for a discrete-time signal represented by a column vector, $\mathbf{u}$. For very large $N\gg 1$, the auto-regressive moving average (ARMA) model in continuous [cf. (1)] or discrete time should be used. The initial values of the generated random process can be discarded until the process becomes stationary. Alternatively, a non-stationary ARMA model can be used initially before switching to model (77) in order to speed up the convergence to stationary samples.
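Since (77) is the Cholesky factorization of the target correlation matrix, a library routine can be used instead of the explicit recursion. The following Python sketch (with an assumed target matrix $\mathbf{C}$) generates mutually correlated Gaussian processes via (76) and verifies the empirical correlations.

```python
# A short sketch: Eq. (77) is the Cholesky factorization of the target
# correlation matrix C, so a built-in routine yields A with A @ A.T = C,
# and (76) produces the correlated processes. The matrix C is assumed.
import numpy as np

rng = np.random.default_rng(10)

C = np.array([[1.0, 0.7, 0.2],     # assumed target correlations C_ij
              [0.7, 1.0, 0.5],
              [0.2, 0.5, 1.0]])

A = np.linalg.cholesky(C)              # lower-triangular factor
u = rng.standard_normal((3, 100000))   # uncorrelated zero-mean Gaussian processes
x = A @ u                              # correlated processes, Eq. (76)

print(np.round(np.corrcoef(x), 2))     # approx. C
```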

The above methods involving linear filtering are well justified when the samples are Gaussian. There are, however, many situations when the random process is specified by a general probability distribution and its ACF. This is not a complete statistical description, so different generation methods can yield different higher-order moments. The most straightforward, but in practice rather complicated, method is to generate uncorrelated samples with a certain distribution, so that linear filtering of these samples yields both the desired distribution as well as the ACF.

A much more tractable method first generates Gaussian samples with given correlations and then uses a memoryless nonlinear transformation to obtain the desired distribution and the ACF. In particular, let the zero-mean Gaussian noise, $u(t)$, have the ACF, $C_u(t_1,t_2) = \mathrm{E}[u(t_1)u(t_2)]$, and the pairwise PDF, $p_u(u_1,u_2;t_1,t_2)$. The random process, $x(t) = g(u(t))$, has the one-dimensional and two-dimensional PDF, respectively, [14]

$$p_x(x) = \frac{p_u\big(g^{-1}(x)\big)}{\big|\dot{g}\big(g^{-1}(x)\big)\big|}, \qquad p_x(x_1,x_2) = p_u\big(g^{-1}(x_1),g^{-1}(x_2)\big)\left|\frac{dg^{-1}(x_1)}{dx_1}\right|\left|\frac{dg^{-1}(x_2)}{dx_2}\right|. \tag{78}$$

The corresponding mean and the ACF, respectively, can be computed as

$$\bar{x} = \int g(u)\,p_u(u)\,du, \qquad C_x(t_1,t_2) = \iint g(u_1)\,g(u_2)\,p_u(u_1,u_2;t_1,t_2)\,du_1\,du_2. \tag{79}$$

The generation procedure then proceeds in these steps:

  1. find a nonlinearity, g, so that for the target output distribution, the input distribution is Gaussian;

  2. given g, find the correlations (covariances), Cut1t2, so that the output process, xt, has the desired ACF, Cxt1t2;

  3. given Cut1t2, obtain a linear filter to generate such an ACF from the uncorrelated Gaussian samples.

This procedure is general and can be used to generate different processes with a given distribution and ACF. However, in some cases, when the nature of a random process has been motivated by a specific physical or other phenomenon, it can directly indicate how to generate the underlying process. For example, a continuous-time Poisson process can be generated by a random sequence of Dirac delta pulses with exponentially distributed inter-arrival times, which are then filtered with an integrator. It is, therefore, advisable to check in the relevant literature whether this is the case for a given random process.

6.1 Example

Consider the problem of generating a zero-mean stationary binary random sequence, $x_t$, of $\pm 1$ values with a given ACF. Such a process can be generated, for example, from a discrete-time Gaussian process, $y_t$, using the nonlinearity,

$$x_t = \mathrm{sign}(y_t) = \begin{cases}+1, & y_t\ge 0\\ -1, & y_t<0.\end{cases} \tag{80}$$

The ACF, $C_x(\tau)$, of the output process, $x_t$, can be expressed in terms of the ACF, $C_y(\tau)$, of the input process, $y_t$, as

$$C_x(\tau) = \frac{2}{\pi}\,\arcsin\!\left(\frac{C_y(\tau)}{C_y(0)}\right). \tag{81}$$

For example, assume that $C_x(0) = 1$, $C_x(-1) = C_x(+1) = 1/3$, and $C_x(\tau) = 0$, otherwise. In such a case, the sequence, $y_t$, has a zero mean and the unit variance, and by inverting the ACF (81), the required ACF is $C_y(0) = 1$, $C_y(-1) = C_y(+1) = 1/2$, and $C_y(\tau) = 0$, otherwise. The Gaussian process with this ACF can be obtained by filtering the uncorrelated Gaussian samples, $u_t$, with a discrete-time linear filter having the impulse response, $h_0 = 1/\sqrt{2}$, $h_1 = 1/\sqrt{2}$, and $h_t = 0$, otherwise. Therefore, the process, $x_t$, with the desired ACF is generated from the uncorrelated Gaussian samples as

$$x_t = \mathrm{sign}\!\left(\frac{u_t + u_{t-1}}{\sqrt{2}}\right). \tag{82}$$
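The construction (82) can be verified numerically. The following Python sketch (a minimal check, not from the chapter) filters white Gaussian samples with the impulse response $h_0 = h_1 = 1/\sqrt{2}$ and takes the sign, then estimates $C_x(0)$ and $C_x(1)$.

```python
# A minimal sketch verifying the construction (82): filtering white Gaussian
# samples with h = [1/sqrt(2), 1/sqrt(2)] and taking the sign yields a binary
# sequence with C_x(0) = 1 and C_x(+-1) = 1/3.
import numpy as np

rng = np.random.default_rng(11)
u = rng.standard_normal(1_000_000)

y = (u[1:] + u[:-1]) / np.sqrt(2)   # correlated Gaussian, C_y(1) = 1/2
x = np.sign(y)                      # binary +-1 sequence, Eq. (82)

c0 = np.mean(x * x)
c1 = np.mean(x[1:] * x[:-1])
print("C_x(0) =", c0, " C_x(1) =", c1, " targets:", 1.0, 1/3)
```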

7. Conclusions

This chapter discussed random processes that are usually modeled by SDEs. The resulting SDEs can be linear or nonlinear and defined in continuous time or discrete time. The higher-order differential and difference linear equations can be equivalently represented by first-order vector equations. The Brownian motion and Ornstein-Uhlenbeck random processes are special cases of a linear SDE. There is also the Cramér integral representation of random processes in the frequency domain. The nonlinear models can be obtained by introducing a nonlinear measurement equation with an added measurement noise.

Filtering and estimating random processes often give rise to random processes defined by integral equations. The Langevin equation allows incorporating both deterministic and random effects. In practical applications, it is usually easier to solve the corresponding Fokker-Planck equation and obtain the time evolution of the state distribution. In modeling the dynamics of physical and chemical systems with discrete states, the corresponding evolution equation is referred to as the master equation.


Acknowledgments

This research was funded by a research grant from Zhejiang University.


Abbreviations

ACF

auto-correlation function

ARMA

auto-regressive moving average

BLUE

best linear unbiased estimator

FPE

Fokker-Planck equation

MMSE

minimum mean square error

ODE

ordinary differential equation

PDF

probability density function

SDE

stochastic differential equation

References

  1. Amir A. Thinking Probabilistically: Stochastic Processes, Disordered Systems, and Their Applications. Cambridge, UK: Cambridge University Press; 2021
  2. Bielecki TR, Jakubowski J, Niewȩgłowski M. Structured Dependence between Stochastic Processes. Cambridge, UK: Cambridge University Press; 2020
  3. Leon-Garcia A. Probability and Random Processes for Electrical Engineering. 2nd ed. New York, USA: Addison-Wesley; 1994
  4. Gardner WA. Introduction to Random Processes with Applications. 2nd ed. New York, USA: McGraw-Hill; 1990
  5. Gardiner CW. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences. 3rd ed. New York, NY: Springer; 2004
  6. Grimmett GR, Stirzaker DR. Probability and Random Processes. 3rd ed. Oxford, UK: Oxford University Press; 2001
  7. Haykin S. Adaptive Filter Theory. 5th ed. Harlow, UK: Pearson Education Limited; 2014
  8. Jazwinski AH. Stochastic Processes and Filtering Theory. New York, USA: Academic Press; 1970
  9. van Kampen NG. Stochastic Processes in Physics and Chemistry. 3rd ed. Amsterdam, Netherlands: Elsevier; 2007
  10. Karlin S, Taylor HM. A First Course in Stochastic Processes. 2nd ed. New York, USA: Academic Press Inc.; 1975
  11. Kay SM. Fundamentals of Statistical Signal Processing: Estimation Theory. Vol. I. Upper Saddle River, New Jersey: Prentice Hall; 1993
  12. Kwok SF. Langevin and Fokker-Planck Equations and their Generalizations. Singapore: World Scientific Publishing Co. Pte. Ltd.; 2018
  13. Miller SL, Childers D. Probability and Random Processes with Applications to Signal Processing and Communications. 2nd ed. Waltham, MA, USA: Academic Press; 2012
  14. Papoulis A, Pillai SU. Probability, Random Variables, and Stochastic Processes. 4th ed. New York, NY: McGraw-Hill; 2002
  15. Pavliotis GA. Stochastic Processes and Applications. New York, USA: Springer Science+Business Media; 2014
  16. Ross SM. Stochastic Processes. 2nd ed. New York, USA: Wiley; 1996
  17. Schuss Z. Theory and Applications of Stochastic Processes: An Analytical Approach. New York, NY: Springer; 2010
  18. Shynk JJ. Probability, Random Variables and Random Processes. Hoboken, New Jersey: John Wiley & Sons; 2013
