Open access peer-reviewed chapter

Handling Missing Data in the Time-Series Data from Wearables

Written By

Jay Darji, Nupur Biswas, Lawrence D. Jones and Shashaanka Ashili

Submitted: 21 May 2023 Reviewed: 02 August 2023 Published: 24 August 2023

DOI: 10.5772/intechopen.1002536

From the Edited Volume

Time Series Analysis - Recent Advances, New Perspectives and Applications

Jorge Rocha, Cláudia M. Viana and Sandra Oliveira

Abstract

Wearable technology is being used to track continuous events in many areas of our lives. Wearables contain different types of sensors that can acquire movement data, blood pressure, blood sugar, temperature, and other physiological parameters. These parameters are recorded as continuous univariate or multivariate time-series data. Very often, however, the data contain missing values, which disrupt the continuity of the series and make it difficult to analyze. The missing part of the data needs to be imputed so that the remaining data can be used. Choosing a proper imputation method is crucial for fruitful analysis and for extracting the underlying features of the data. In this context, this chapter discusses the sensors associated with wearable technology that generate time-series data, missing data in wearables’ time-series data, and the imputation methods used for imputing the missing data.

Keywords

  • missing data
  • imputation
  • time-series data
  • wearables
  • digital health

1. Introduction

Wearable technology refers to electronic devices that users can wear. Wearables come in many forms, such as watches, eyeglasses, tattoos, strips, gloves, belts, footwear, and clothes, and can be worn on different parts of the body, such as the wrist, eyes, skin, chest, waist, and foot [1, 2]. These wearables are often called smart devices because they are remotely connected through wireless communication technologies such as the internet and Bluetooth. The data is also stored on cloud platforms and subsequently shared, which helps to synchronize wearables with mobile phones and laptops. This chapter focuses on healthcare applications. The major purpose of wearing these smart devices is to track real-time physiological information from the body. The information gathered by wearables is analyzed by different algorithms to extract features relevant to the user’s health. Finally, it is shared with users and their healthcare providers. This information is used for self-management, monitoring of health conditions, clinical decision-making, and preventive measures for different health conditions [3, 4].

The popularity of wearables is striking, with a forecast annual growth rate of 20% for their market [5]. The major reason for this popularity is that wearables collect data in a non-invasive or minimally invasive manner. For many devices, no intervention by clinicians is required. This advantage has turned wearable devices into smart consumer devices. Among the diverse wearable devices and gadgets, smartwatches are the most popular. Interested users can buy them on the market and use them for self-monitoring. The advancement of technology has also turned them into a helpful supplement for clinicians in healthcare and disease monitoring [6]. Smartwatches can primarily track heart rate, measure calories burnt, count steps, and monitor sleep. Moreover, the use of smartwatches has extended beyond self-monitoring. Smartwatches are being used for routine remote monitoring, for diagnostic purposes, for predicting disease risk factors, and even for treatment. Smartwatch-derived data is being used for diagnosing cardiovascular diseases [7, 8], neurological disorders [9], fatty liver diseases [10], Parkinson’s disease [11], and metabolic disorders including diabetes [12].

Data collection has become feasible due to technological advances in the development of various types of sensors. Wearable devices contain inertial sensors, optical sensors, temperature sensors, barometric sensors, and others. This is a rapidly growing area of research; the associated challenges include improving the sensitivity of sensors, incorporating multiple sensors in a limited space, improving the energy efficiency and connectivity of the sensors, and using lighter materials. Sensors are also being designed for specific purposes [13].

Due to the presence of different types of sensors and their continuous operation, wearable devices generate a huge amount of personalized multivariate data, often considered “big data” [14, 15]. The data consist of values of different parameters recorded continuously at a regular interval of time. The time-series data from smartwatches contain heart rate, step counts, elevation, sleep duration, and sleep quality. Wearables also provide blood pressure, blood oxygen level, electrocardiogram (ECG), and body temperature. The cardiac data from wearables are being used for diagnosing and monitoring several disease conditions such as coronary artery diseases [16], atrial fibrillation [17, 18, 19], and atrial arrhythmias [20]. The gyroscopes used in wearables provide time-series data of rotation about three mutually perpendicular directions. This data can be analyzed further to identify different activities and body postures such as walking, running, sitting, standing, and moving up or down a staircase [21]. It can also be used to detect tremors and bradykinesia in users who have Parkinson’s disease [22].

However, the time-series data generated by wearables very often contain missing data, for several reasons. These include, but are not limited to, the device not being worn for a certain interval of time, the device being worn but the data not being transmitted, and the recorded data being noisy and hence removed [23]. The presence of missing data, or missing values of any recorded parameter, degrades the data quality. Wearable data are very often analyzed by artificial intelligence-based machine learning algorithms, for which continuous data is preferred [24]. Ignoring the timestamps with missing data is not a good choice, as the data at those timestamps are lost completely. Also, multivariate data may have missing values only for a particular parameter. Imputing the missing data is necessary to make the entire dataset useful; missing values are therefore substituted with predicted values. Moreover, different aspects of missing data need to be considered during imputation. Data may or may not be missing at random, and data may be missing over a continuous period or at discrete points [25].

Researchers have concentrated on developing effective ways of imputing missing values in time-series data. This chapter is focused on wearable technology and frequently used imputation methods in imputing missing values of time-series data in the context of wearables.

2. Wearable technology

Wearable technology comprising portable sensors is being used to collect various types of information in a continuous, seamless fashion. It has emerged as a huge source of time-series data. Data is also collected from biofluids such as sweat, tears, saliva, and interstitial fluid.

Wearable devices are loaded with different types of sensors which include accelerometers, gyroscopes, magnetometers, heart rate sensors, oximetry sensors, barometric pressure sensors, ambient temperature sensors, skin conductance sensors, etc.

The most often used sensors are inertial sensors such as accelerometers and gyroscopes. Such sensors can detect the motion of body parts like hands and limbs. The acceleration and angular velocity of the associated body part are measured in three mutually perpendicular directions using combinations of three accelerometers and three gyroscopes. The acceleration is obtained by measuring the displacement of a test mass attached to a suspension system [26]. This information has been analyzed to successfully detect different daily activities like walking, sitting, running, and moving up or down steps [21]. It can even be used to predict the risk of possible falls [27], so that the user and the caregivers can be alerted through remote monitoring. Such sensors are particularly useful for patients with gait-related disorders. Patients suffering from Parkinson’s disease also benefit [28]. Magnetometers are also used together with motion sensors. They measure the earth’s magnetic field to detect the direction of the motion.

Wrist-worn wearable devices such as smartwatches use optical sensors to measure heart rate by photoplethysmography (PPG). They use green LED light because red-colored blood absorbs green light strongly. The photosensors detect the light reflected from the blood and measure the absorbance, which is then related to the volume of the blood. Each cardiac beat changes the blood volume within the vessel. Heart rate is extracted from this volume change, which is measured through the changes in absorbance [29]. Apart from measuring heart rate, smartwatches like KickLL are also designed to measure respiratory rate using the PPG method [30].
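As an illustration of how a pulse rate can be read from such an absorbance trace, the following minimal sketch looks for the strongest periodic component within a plausible heart-rate band. It is not the algorithm of any particular smartwatch: the synthetic signal, the sampling rate `fs`, the band limits, and the function name `heart_rate_from_ppg` are all assumptions made for this example.

```python
import math

def heart_rate_from_ppg(signal, fs, min_bpm=40, max_bpm=210):
    """Return the dominant pulse rate (beats per minute) of a PPG-like signal.

    A direct discrete Fourier transform is evaluated only at the frequency
    bins inside the assumed heart-rate band; the strongest bin is reported.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]        # remove the constant (DC) level
    k_min = math.ceil(min_bpm / 60 * n / fs)     # lowest DFT bin inside the band
    k_max = math.floor(max_bpm / 60 * n / fs)    # highest DFT bin inside the band
    best_bpm, best_power = None, -1.0
    for k in range(k_min, k_max + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = 60 * k * fs / n, power
    return best_bpm

# Synthetic absorbance trace (assumption): 1.2 Hz pulse sampled at 25 Hz for 20 s.
fs = 25
ppg = [1.0 + 0.1 * math.sin(2 * math.pi * 1.2 * i / fs) for i in range(20 * fs)]
estimated_bpm = heart_rate_from_ppg(ppg, fs)
```

Real PPG traces contain motion artifacts and baseline drift, so practical pipelines filter the signal before any such frequency estimate.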

Smartwatches also measure peripheral blood oxygen saturation by measuring the light absorbed by oxygenated and non-oxygenated hemoglobin in blood vessels. Oxygenated and non-oxygenated hemoglobin molecules absorb light of different wavelengths (940 nm and 660 nm, respectively) [31]. Smartwatches use red and infrared light and have built-in sensors that measure blood oxygen levels only when the user is not physically active. Blood with higher oxygen levels absorbs more infrared light relative to red light [32, 33].

Wearables can provide an estimate of body temperature. It is based on the epidermal measurement and hence does not provide the core body temperature of the user. However, long-term, continuous, and accurate measurement is possible [34]. Apart from monitoring regular health conditions, the recorded temperature plays an important role in detecting diseases like epilepsy [9] and COVID-19 [35].

Bioimpedance (BIA) is the response of the body to an externally applied electric signal. Smartwatches are equipped with bioimpedance sensors. These sensors are used to measure body fat, body composition, blood pressure, and body glucose levels [36]. A recent study involving 75 participants reports that smartwatch BIA measurements of body fat are comparable with laboratory measurements [37]. BIA sensors are also being used for blood pressure measurements [38]. The advantage of wearable sensors is that they can measure blood pressure continuously with a comfortable cuffless setup. Kireev et al. have designed self-adhesive graphene-made electronic tattoos for measuring blood pressure [39]. BIA sensors measure pulse transit time, and further machine learning-based analysis estimates both diastolic and systolic blood pressures [40].

Wearables are often coupled with biosensors. A typical biosensor has two parts: a receptor and a transducer. The receptor may consist of a specific enzyme, antibody, DNA, or cell that identifies target analytes through a bio-reaction. The transducer translates the bio-reaction into a measurable electrical signal. Depending on the bio-reaction, the transducer can be of electrochemical, optical, thermal, or piezoelectric type [1].

3. Missing data

Time series data in biology is generated when information on a physiological parameter or a phenomenon is recorded with time. The presence of missing values can lead to the loss of information and statistical power [41], making typical data analysis techniques ineffective or challenging to use and can inject bias into estimates obtained from a statistical model [42, 43]. Missing data typically falls into one of the following three categories according to Little et al. [44]:

  • Missing completely at random (MCAR): In cases where missing data is said to be missing completely at random (MCAR), the missing values are independent of both the observed and unobserved data. This means that there is no systematic relationship between the existing data and the missing values [44, 45].

  • Missing at random (MAR): In situations where missing data is said to be missing at random (MAR), the missing values are systematically related to the observed data, but not the unobserved data. In other words, although they cannot be predicted, the missing values have some relationship with other variables in the dataset that can be used to explain why they are missing [44, 45].

  • Missing not at random (MNAR), also written NMAR: Missing data is considered missing not at random when the missing values are systematically related to the unobserved data, in a way that cannot be explained by other variables in the dataset. This means that there is a pattern to the missing data that is not accounted for by the available information [46].
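The three mechanisms can be made concrete with a small simulation. The sketch below is only illustrative: the two signals (`steps`, `heart_rate`), the logistic form of the missingness probability, and all numeric constants are assumptions chosen for the example, not values from the cited works.

```python
import math
import random

random.seed(0)
n = 2000
# Hypothetical paired signals: step count per minute and heart rate (bpm).
steps = [random.gauss(20, 5) for _ in range(n)]
heart_rate = [60 + 1.5 * s + random.gauss(0, 5) for s in steps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def zscores(values):
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# MCAR: every heart-rate sample has the same 20% chance of being lost.
mcar = [random.random() < 0.2 for _ in range(n)]
# MAR: the chance of losing heart rate depends only on the observed step count.
mar = [random.random() < sigmoid(z - 1.5) for z in zscores(steps)]
# MNAR: the chance of losing heart rate depends on the heart-rate value itself.
mnar = [random.random() < sigmoid(z - 1.5) for z in zscores(heart_rate)]

def observed_mean(values, missing):
    """Mean of the values that survive a given missingness mask."""
    kept = [v for v, m in zip(values, missing) if not m]
    return sum(kept) / len(kept)

true_mean = sum(heart_rate) / n
```

Comparing `observed_mean(heart_rate, mcar)` and `observed_mean(heart_rate, mnar)` with `true_mean` shows why the mechanism matters: under MCAR the surviving samples remain representative, whereas under MNAR the high readings are lost preferentially and the observed mean is pulled downward.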

High data quality is essential for machine learning (ML) applications to provide reliable prediction performance and appropriate use of automated decision-making. Eliminating the missing data is one approach to solving this issue. However, if we just delete the data, we run the risk of losing important information. Imputing the missing data is a preferable course of action. In other words, we must estimate the missing values from the data that is already available. The statistics community has done substantial work on the imputation of missing data, which this chapter discusses in further detail. Techniques for handling missing data have been divided into three groups by Song et al. [47]:

  1. Delete missing data: With this approach, the row with the missing data is simply deleted. Due to its ease of use, this strategy is popular. However, because important details may be left out, it can only be used when the missing data are few compared to the available data.

  2. Tolerate missing data: In contrast to the missing data deletion approach, this method uses the whole dataset for the analysis and replaces the missing data points with the special value NULL. This approach cannot be used when the goal is to forecast the data point.

  3. Impute missing data: The process of imputing missing data entails substituting estimated values for missing data points and then evaluating the entire dataset as if the imputed data were original. This approach is considered better when we have to deal with limited data, which is the case in most practical scenarios. Some techniques used to impute missing data are discussed in the following section.
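The three strategies listed above can be sketched on a toy series as follows. The heart-rate values are invented for illustration, and the imputation shown is last-observation-carried-forward, chosen only because it is the shortest possible example rather than because the chapter recommends it.

```python
import math

nan = math.nan
# Hypothetical heart-rate samples (bpm) with three missing values.
heart_rate = [72.0, 74.0, nan, nan, 79.0, 81.0, nan, 80.0]

# 1. Delete: drop every timestamp whose value is missing.
deleted = [v for v in heart_rate if not math.isnan(v)]

# 2. Tolerate: keep the gaps and skip them inside each computation.
def nan_mean(values):
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)

tolerated_mean = nan_mean(heart_rate)

# 3. Impute: substitute an estimate (here, the last observed value) so that
#    the series keeps its original length and sampling interval.
imputed = []
last_seen = nan
for v in heart_rate:
    if math.isnan(v):
        imputed.append(last_seen)
    else:
        imputed.append(v)
        last_seen = v
```

Deletion shortens the series and breaks the regular sampling interval, while the imputed series keeps one value per timestamp, which is what most time-series models expect.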

4. Imputation methods

Missing data imputation refers to replacing missing values with approximated ones. This section goes through various missing data imputation techniques. Statistical procedures, such as substituting missing values with the mean, median, or mode of the column containing them, are among the most widely used approaches for dealing with missing data. While such substitution is simple and quick, it alters the statistical character of the data set. It not only skews histograms but also understates the variation in the data, because several values are made identical when, in reality, they would evidently differ [48]. Time series data can be seasonal or cyclical in nature, and they can follow a pattern. The standard statistical imputation techniques compromise these components of time series data. Therefore, although this is a popular practice, it should be avoided when handling time-series data.
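The loss of variation caused by mean substitution can be checked numerically. In the sketch below, the cyclic skin-temperature signal and the position of the six-hour gap are assumptions made purely for illustration.

```python
import math
import statistics

# Hypothetical hourly skin-temperature readings (deg C) with a 24-hour cycle.
complete = [33.0 + 1.5 * math.sin(2 * math.pi * h / 24) for h in range(48)]

# Remove a six-hour block around the first daily peak.
missing_hours = set(range(3, 9))
observed = [v for h, v in enumerate(complete) if h not in missing_hours]

# Mean imputation: every missing hour receives the same value.
fill = statistics.mean(observed)
mean_imputed = [fill if h in missing_hours else v for h, v in enumerate(complete)]

var_complete = statistics.pvariance(complete)
var_imputed = statistics.pvariance(mean_imputed)
```

Here `var_imputed` comes out smaller than `var_complete`, and the daily peak is replaced by a flat segment, which is the distortion of the cyclic component described above.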

When mean imputation is not the best fit for the missing values, interpolation is used. Interpolation is a technique for estimating missing values between two known positions. An equally spaced time series can be represented as $y = \{y_t\}$ with $t = t_0, t_0 + a, \ldots, t_0 + Na$:

$$y_{t_0},\; y_{t_0+a},\; y_{t_0+2a},\; \ldots,\; y_{t_0+Na} \tag{1}$$

where the series starts at time $t_0$, $y_t$ is the measured value of the variable at time $t$, the series continues for $N$ sampling intervals after $t_0$, and $a$ is the interval between two consecutive observations, also called the “sampling interval”. If data for a run of consecutive timestamps are missing, the series looks like

$$y_{t_0},\; y_{t_0+a},\; y_{t_0+2a},\; \ldots,\; y_{t_0+(i-1)a},\; \mathrm{nan},\; \mathrm{nan},\; \ldots,\; \mathrm{nan},\; \mathrm{nan},\; y_{t_0+(i+n+1)a},\; \ldots,\; y_{t_0+Na} \tag{2}$$

Here we consider data missing continuously from time $t_0 + ia$ to time $t_0 + (i+n)a$, that is, at the $n + 1$ consecutive timestamps between the last observation before the gap, $y_{t_0+(i-1)a}$, and the first observation after it, $y_{t_0+(i+n+1)a}$. The missing values are represented as nan (not a number).

The following are the most often used algorithms for imputing missing values in time series.

4.1 Linear interpolation

Linear interpolation is the simplest imputation method. In this method, missing data points are estimated considering the linearity between the data points available before and after missing data points. The missing period is replaced by [49, 50]:

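$$y_t = y_{t_0+(i-1)a} + m\,\bigl(t - (t_0 + (i-1)a)\bigr) \tag{3}$$

where $m = \bigl(y_{t_0+(i+n+1)a} - y_{t_0+(i-1)a}\bigr) / \bigl((n+2)a\bigr)$ is the slope of the straight line joining the last observation before the gap and the first observation after it. This method is simple and fast; however, it is not useful if the data is seasonal, because it misses the time-dependent variation in the data.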
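A minimal sketch of this straight-line fill is given below. The function name `linear_interpolate`, the choice to leave leading and trailing gaps untouched, and the sample values are assumptions made for this example.

```python
import math

def linear_interpolate(series, a=1.0):
    """Fill interior runs of NaN on the straight line joining their observed neighbors.

    `a` is the sampling interval. Gaps at the very start or end of the series
    have a neighbor on one side only and are left as NaN.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if not math.isnan(filled[i]):
            i += 1
            continue
        start = i                          # first missing index of this run
        while i < n and math.isnan(filled[i]):
            i += 1
        end = i                            # first observed index after the run
        if start == 0 or end == n:
            continue                       # cannot interpolate without both neighbors
        left, right = start - 1, end
        m = (filled[right] - filled[left]) / ((right - left) * a)   # slope of the line
        for j in range(start, end):
            filled[j] = filled[left] + m * (j - left) * a
    return filled

nan = math.nan
series = [10.0, 12.0, nan, nan, nan, 20.0, 21.0]
filled = linear_interpolate(series)
```

For the sample series, the three missing points lie between 12.0 and 20.0, four sampling intervals apart, so the slope is 2.0 per interval and the gap is filled with 14.0, 16.0, and 18.0.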

4.2 Exponential weighted moving average (EWMA)

The Exponential Weighted Moving Average (EWMA) method estimates the missing values as a weighted average of the historical data points, where the weighting factors decrease exponentially for the older timestamps. The timestamps in the near past have more weightage as compared to the timestamps in the distant past [51]. EWMA relates the predicted value and error in prediction.

$$\mathrm{EWMA} = \hat{y}_{t+1} = \hat{y}_t + \lambda\,(y_t - \hat{y}_t) \tag{4}$$

where $\hat{y}_t$ is the estimate of $y$ at time $t$, $\lambda$ ($0 < \lambda < 1$) is the weight of the data at time $t$, and $y_t$ is the data point at time $t$. It is a recursive process where the weighting constant $\lambda$ determines the memory of the process [51].

Intuitively, the weight λ controls how much “older” data is taken into account. λ = 1 means only the most recent data is considered. A higher value of λ gives more weight to recent data, whereas a lower value of λ gives more weight to older data. An optimum choice of λ is necessary. J. Stuart Hunter recommends a choice of λ between 0.1 and 0.3 [51].
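The recursion can be turned into a simple gap-filling routine, sketched below. How the estimate is initialised and how it behaves inside a gap are assumptions of this example: the estimate starts at the first observed value and is held constant while no new observation arrives.

```python
import math

def ewma_impute(series, lam=0.2):
    """Fill NaN values with the running EWMA estimate built from earlier samples."""
    filled = []
    estimate = None                        # undefined until the first observation
    for y in series:
        if math.isnan(y):
            # No new information: report the current estimate and leave it unchanged.
            filled.append(estimate if estimate is not None else math.nan)
            continue
        filled.append(y)
        if estimate is None:
            estimate = y                   # initialise with the first observed value
        else:
            estimate = estimate + lam * (y - estimate)   # recursive EWMA update
    return filled

nan = math.nan
filled = ewma_impute([10.0, 12.0, nan, nan, 11.0], lam=0.5)
```

With `lam=0.5` the estimate after observing 10.0 and 12.0 is 11.0, so both missing points are filled with 11.0. A smaller `lam` would keep the fill closer to the older value of 10.0.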

According to Wijesekara et al. [52], EWMA outperformed mean imputation and interpolation in data imputation. EWMA resulted in an MSE of 0.297 when imputing 5% of missing data. Some advantages of using EWMA for data imputation include its simplicity and flexibility, making it suitable for a wide range of datasets with different patterns of missing data. Additionally, EWMA is known to preserve trend and seasonality in time-series data. However, EWMA has some limitations as an imputation method. For example, it is sensitive to parameter selection, which can significantly affect the imputed values.

4.3 k-nearest neighbor (kNN)

Another approach to imputing missing values in data is the k-nearest neighbor (kNN) method. This method is used when we have limited knowledge about the distribution of the data [53]. The technique is derived from the nearest neighbor (NN) approach, which uses the value of the closest data point to fill in a missing value. However, the NN approach can be affected by outliers and can result in overfitting. The kNN method addresses this issue by using the average or median of the target values from the k nearest neighbors to make a prediction, rather than relying on a single nearest neighbor. This helps to reduce the risk of overfitting and improves the generalizability of the model [54, 55]. Using the kNN technique, categorical missing values can be imputed using the majority rule among the k neighbors, whereas missing numerical values can be handled by imputing the average value of the k closest neighbors. This is known as the majority/mean rule. kNN first finds the k nearest neighbors and then imputes the missing points with the weighted average of these neighbor points. The number of nearest neighbors k appears as a hyperparameter and is to be tuned in the experiments. For a set of $k$ nearest neighbors, $D_k = \{(t_j, y_j)\}$ where $j = 1, 2, \ldots, k$, the kNN estimator predicts [55]

$$\hat{y} = \arg\max_{v} \sum_{(t_j,\, y_j) \in D_k} \mathbb{1}(y_j = v) \tag{5}$$

when $y$ is categorical. Here $v$ is a value in the domain of the target feature $y$, and $\mathbb{1}(y_j = v)$ is an indicator function that returns 1 when $v = y_j$ and returns 0 otherwise. For numerical outcomes, the kNN estimator predicts [55]

$$\hat{y} = \frac{1}{k} \sum_{j=1}^{k} y_j \tag{6}$$

The advantages of the kNN imputation method include its simplicity and its model-free approach. However, there are challenges associated with the proper choice of k and the subsequent selection of the k nearest neighbors. Lall et al. recommend $k = \sqrt{N}$ for $N > 100$, $N$ being the total number of observations or timestamps [56]. The Minkowski distance is commonly used to define the k nearest neighbors. Generally, this method gives good accuracy, but it requires a lot of calculation and memory to predict the missing values [55]. kNN works well for randomly missing values; however, wearables data are often correlated over time. A lagged kNN method combined with Fourier transform imputation has shown better imputation for biomedical time series data [57]. Several modified kNN methods have been applied to impute ECG signals and hence can be applied to wearables data [58].
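For numerical data, the mean rule can be sketched as follows. This is a deliberately simple variant that uses the timestamp as the only feature and an unweighted mean; the function name `knn_impute`, the default `k`, and the sample values are assumptions for illustration.

```python
import math

def knn_impute(times, values, k=3, p=2):
    """Replace each NaN with the mean of the k observed samples nearest in time."""
    observed = [(t, v) for t, v in zip(times, values) if not math.isnan(v)]
    filled = []
    for t, v in zip(times, values):
        if not math.isnan(v):
            filled.append(v)
            continue
        # Minkowski distance of order p; with a single feature it reduces to |t - t_j|.
        neighbors = sorted(observed, key=lambda tv: (abs(t - tv[0]) ** p) ** (1 / p))[:k]
        filled.append(sum(nv for _, nv in neighbors) / len(neighbors))
    return filled

nan = math.nan
times = [0, 1, 2, 3, 4, 5, 6]
values = [70.0, 72.0, nan, 76.0, 78.0, nan, 82.0]
filled = knn_impute(times, values, k=2)
```

With `k=2`, each gap is filled from its two immediate neighbors in time, giving 74.0 and 80.0 for the sample series. With multivariate wearable data, the distance would normally be computed over several features rather than over time alone.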

Kenyhercz et al. [59] conducted a study on data imputation using various methods, namely kNN, mean imputation, hot deck imputation, and an iterative robust model, with 25% and 50% of the data missing. The study used craniometric data from 352 individuals representing four population groups, which were publicly available in Howell’s dataset. The results showed that kNN performed better than mean imputation, hot deck imputation, and the iterative robust model.

4.4 Expectation maximization (EM)

The EM method is an iterative process that comprises two steps: the expectation step (E-step) and the maximization step (M-step). The E-step calculates expected values with the help of the complete data points. In the M-step, the parameters are optimized for the best estimates [60]. The process repeats until the changes in the expected values between iterations become negligible, that is, until convergence is reached.

Consider the time series data set as $y = (y_{\mathrm{obs}}, y_{\mathrm{mis}})$, where $y_{\mathrm{obs}}$ and $y_{\mathrm{mis}}$ represent the observed and missing parts of the data set, respectively. The data set can be described by a probability or density function $p(y \mid \theta)$ governed by a set of parameters $\theta$. Here $p(y \mid \theta)$ is a function of $y$ for a given $\theta$. The likelihood function is defined from the density function as $L(\theta \mid y) = p(y \mid \theta)$, where $L(\theta \mid y)$ is a function of the parameter $\theta$ for data $y$. The initial parameter estimate is $\theta^{(0)}$. The E-step finds the objective function $Q(\theta \mid \theta^{(t)})$, which is the expected value of the complete-data log-likelihood $l(\theta \mid y)$ given the observed data and the current parameters:

$$Q(\theta \mid \theta^{(t)}) = E\bigl[\, l(\theta \mid y) \mid y_{\mathrm{obs}},\, \theta^{(t)} \bigr] \tag{7}$$

The M-step determines the parameter vector $\theta^{(t+1)}$ that maximizes this expected log-likelihood of the completed data. At each iteration the observed-data likelihood does not decrease, and because it is bounded above, convergence is achieved. For all $\theta$, the objective function satisfies

$$Q(\theta^{(t+1)} \mid \theta^{(t)}) \geq Q(\theta \mid \theta^{(t)}) \tag{8}$$

EM is deemed to be superior to many substitution algorithms such as mean, median, and mode imputation. EM is assumed to produce unbiased estimates for values that are MCAR and slightly biased estimates for values that are MAR. The limitation of the algorithm is that, while it delivers reliable estimates for the missing data, the resulting standard errors are too low, which leads to inaccuracy in several statistical tests (e.g., the t-test). Thus, this method should only be used when the standard error is not crucial, as in the case of factor analysis, which does not rely on p-values [61].
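To make the E-step and M-step concrete, the sketch below applies EM to a deliberately small case: two jointly Gaussian variables, where `x` is fully observed and `y` has gaps. The bivariate normal model, the function name `em_bivariate_impute`, and the synthetic data are assumptions made for this example; they are not the setup of the studies cited in this section.

```python
import math
import random

def em_bivariate_impute(x, y, n_iter=200, tol=1e-9):
    """Impute NaN entries of y from x, assuming (x, y) is bivariate normal."""
    n = len(x)
    obs = [not math.isnan(v) for v in y]
    y_obs = [v for v, o in zip(y, obs) if o]
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    # Initial parameters: moments of the observed y values, zero covariance.
    mu_y = sum(y_obs) / len(y_obs)
    s_yy = sum((v - mu_y) ** 2 for v in y_obs) / len(y_obs)
    s_xy = 0.0
    for _ in range(n_iter):
        # E-step: expected y and y^2 for missing entries given x and current parameters.
        slope = s_xy / s_xx
        cond_var = s_yy - slope * s_xy
        e_y = [v if o else mu_y + slope * (xi - mu_x) for xi, v, o in zip(x, y, obs)]
        e_yy = [ey * ey + (0.0 if o else cond_var) for ey, o in zip(e_y, obs)]
        # M-step: parameters that maximize the expected complete-data log-likelihood.
        new_mu_y = sum(e_y) / n
        new_s_yy = sum(e_yy) / n - new_mu_y ** 2
        new_s_xy = sum(xi * ey for xi, ey in zip(x, e_y)) / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_s_yy - s_yy), abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    slope = s_xy / s_xx
    return [v if o else mu_y + slope * (xi - mu_x) for xi, v, o in zip(x, y, obs)]

# Synthetic example (assumption): y depends linearly on x, 30% of y removed at random.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]
y_true = [1.0 + 2.0 * xi + random.gauss(0, 0.3) for xi in x]
y = [math.nan if random.random() < 0.3 else v for v in y_true]
y_filled = em_bivariate_impute(x, y)
```

The conditional variance added in the E-step is what separates EM from repeatedly regressing and refilling: it keeps the estimated variance of `y` from shrinking, even though the imputed values themselves still sit on the regression line and therefore carry no noise, in line with the caution about standard errors above.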

EM algorithm has been applied to impute missing data of fetal heart rate data [62] as well as to ischemic heart rate data [63]. Aljuaid et al. [64] used the EM (Expectation-Maximization) algorithm to impute missing data in datasets obtained from the UCI machine learning repository. The study compared the performance of the EM algorithm with other imputation methods such as mean imputation, hot deck, kNN, and C5.0. The results showed that the EM algorithm produced the lowest RMSE (Root Mean Square Error) value, with an approximate value of 0.36. These findings suggest that the EM algorithm is a highly effective method for imputing missing data, and can outperform other popular imputation methods in terms of accuracy.

4.5 Kalman prediction

In 1960 R. E. Kalman proposed a method of predicting a signal from its past observations [65]. It is a linear model for discrete data and is widely used for predicting time-series data [66]. It estimates observations based on a set of measurements taken over time that contains noise. This algorithm consists of two processes: a prediction stage and a correction stage. The state and error covariance of the next data point is predicted with the help of the current state and error covariance in the prediction stage. The Kalman filter can be formally expressed as:

  1. Step 1: Prediction process

It predicts the a priori state estimate $\hat{y}^-_{k+1}$ from the a posteriori state estimate $\hat{y}_k$ by the state transition equation,

$$\hat{y}^-_{k+1} = A_k \hat{y}_k \tag{9}$$

Here $A_k$ is an $n \times n$ matrix that relates the state $\hat{y}_k$ at time step $k$ to the state at the next time step $(k + 1)$; it is written as $A$ when it does not change with $k$. The process further estimates the a priori estimate error covariance $P^-_{k+1}$ of the next state from the current a posteriori error covariance $P_k$ and the process noise covariance $Q$:

$$P^-_{k+1} = A P_k A^T + Q \tag{10}$$

  2. Step 2: Correction process

The prediction is then corrected using the Kalman gain $K$ (an $n \times m$ matrix), which weighs the predicted result against the actual measurement $z_{k+1}$.

$$K_{k+1} = P^-_{k+1} H^T \bigl(H P^-_{k+1} H^T + R\bigr)^{-1} \tag{11}$$

Here $R$ is the measurement noise covariance matrix. The matrix $H$ (an $m \times n$ matrix) relates the state to the measurement $z_{k+1}$. The gain matrix updates the estimate by

$$\hat{y}_{k+1} = \hat{y}^-_{k+1} + K_{k+1} \bigl(z_{k+1} - H \hat{y}^-_{k+1}\bigr) \tag{12}$$

The covariance is also updated,

$$P_{k+1} = (I - K_{k+1} H)\, P^-_{k+1} \tag{13}$$

We observe that there is a feedback relation between a priori estimate and a posteriori estimate. In Kalman gain, noise in measurement determines the weightage on the predicted values [67]. Being a recursive relation, the Kalman filter is used for the imputation of missing values by measurement and transition equations [68].
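The prediction and correction steps can be sketched for a scalar state as below. The local-level model (A = 1, H = 1), the noise settings `Q` and `R`, the initial covariance, and the function name `kalman_impute` are all assumptions of this example; in practice these quantities have to be chosen or estimated for the signal at hand.

```python
import math

def kalman_impute(series, A=1.0, H=1.0, Q=0.01, R=1.0):
    """Fill NaN values with the a priori prediction of a scalar Kalman filter."""
    filled = []
    y_hat, P = None, 1.0                   # a posteriori state estimate and covariance
    for z in series:
        if y_hat is None:
            if math.isnan(z):
                filled.append(math.nan)    # nothing to predict from yet
            else:
                y_hat = z / H              # initialise the state from the first measurement
                filled.append(z)
            continue
        # Prediction step: a priori state and error covariance.
        y_prior = A * y_hat
        P_prior = A * P * A + Q
        if math.isnan(z):
            # No measurement: skip the correction and impute the prediction.
            y_hat, P = y_prior, P_prior
            filled.append(H * y_prior)
            continue
        # Correction step: Kalman gain, state update, covariance update.
        K = P_prior * H / (H * P_prior * H + R)
        y_hat = y_prior + K * (z - H * y_prior)
        P = (1 - K * H) * P_prior
        filled.append(z)
    return filled

nan = math.nan
filled = kalman_impute([50.0, 50.0, nan, nan, 50.0])
```

Inside a gap the error covariance keeps growing by `Q` at every step, so the first measurement after the gap receives a larger gain. This constant-level model cannot extrapolate a trend; a state vector with level and slope would be needed for that.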

Kalman prediction is used for various types of physiological data like heart rate variability and body weight variability [69, 70, 71] as well as for other time series data [72]. Luis Alfonso et al. [73] used data imputation techniques on air quality data and evaluated the results using RMSE values. The study compared the performance of the Kalman smoothing algorithm with other imputation methods such as kNN and RF (Random Forest). The results showed that the Kalman smoothing algorithm performed well and outperformed the other methods in terms of accuracy, as evidenced by the lower RMSE values obtained.

The Kalman filter is a popular method for data imputation in time series data. It can incorporate temporal relationships between observations for more accurate imputations. However, it can be computationally intensive and require knowledge of system dynamics and parameters, which can be challenging to estimate. Despite these limitations, the Kalman filter remains effective for time series data imputation.

5. Conclusion

We observe that wearables generate a massive amount of personalized multivariate time-series data. This data is generated by different types of sensors, which include gyroscopes, accelerometers, magnetometers, light sensors, and temperature sensors. Technological advances have improved the performance of these sensors, which are also energy efficient and occupy less space. Advances in computational efficiency have made the analysis of these data feasible. Data is processed in real time and further analyzed by software, which is often powered by artificial intelligence. Apart from real-time monitoring of patients, data extracted from wearable devices have successfully been used for monitoring and diagnosing different diseases and conditions, including cardiological disorders, metabolic disorders, neurological disorders, and sleep-related problems. Robust analysis of this data is a requisite for making the data insightful and beneficial to the users. The analysis demands the availability of complete data without any missing parts. In reality, however, the data contain missing values, compensation for which requires imputation. There are various types of missing data, depending on the randomness of the missingness. The choice of imputation method is critical, as subsequent analysis depends on the imputed data. Several imputation methods are in use, and their success depends on the type, trend, and amount of the available data as well as of the missing data. We have observed that, instead of using an entire day’s data, using data binned around the missing period generates better imputed data [74]. Finally, we can conclude that the missing data are as important as the available time-series data and hence should be imputed carefully.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Kim J, Campbell AS, de Ávila BEF, Wang J. Wearable biosensors for healthcare monitoring. Nature Biotechnology. 2019;37(4):389-406
  2. Rodrigues JJPC, De Rezende Segundo DB, Junqueira HA, Sabino MH, Prince RMI, Al-Muhtadi J, et al. Enabling technologies for the Internet of health things. IEEE Access. 2018;6:13129-13141
  3. Lee JH, Lee KH, Kim HJ, Youk H, Lee HY, Lee JH, et al. Effective prevention and management tools for metabolic syndrome based on digital health-based lifestyle interventions using healthcare devices. Diagnostics. 2022;12(7):1730
  4. Dunn J, Runge R, Snyder M. Wearables and the medical revolution. Personalized Medicine. 2018;15(5):429-448
  5. Ometov A, Shubina V, Klus L, Skibińska J, Saafi S, Pascacio P, et al. A survey on wearable technology: History, state-of-the-art and current challenges. Computer Networks. 2021;193:108074
  6. Sharma A, Badea M, Tiwari S, Marty JL. Wearable biosensors: An alternative and practical approach in healthcare and disease monitoring. Molecules. 2021;26(3):748
  7. Wang YC, Xu X, Hajra A, Apple S, Kharawala A, Duarte G, et al. Current advancement in diagnosing atrial fibrillation by utilizing wearable devices and artificial intelligence: A review study. Diagnostics. 2022;12(3):689
  8. Torres-Soto J, Ashley EA. Multi-task deep learning for cardiac rhythm detection in wearable devices. NPJ Digital Medicine. 2020;3(1):116
  9. Tang J, El Atrache R, Yu S, Asif U, Jackson M, Roy S, et al. Seizure detection using wearable sensors and machine learning: Setting a benchmark. Epilepsia. 2021;62(8):1807-1819
  10. Schneider CV, Zandvakili I, Thaiss CA, Schneider KM. Physical activity is associated with reduced risk of liver disease in the prospective UK Biobank cohort. JHEP Reports. 2021;3(3):100263
  11. Ancona S, Faraci FD, Khatab E, Fiorillo L, Gnarra O, Nef T, et al. Wearables in the home-based assessment of abnormal movements in Parkinson’s disease: A systematic review of the literature. Journal of Neurology. 2022;269:100
  12. Chakrabarti S, Biswas N, Jones LD, Kesari S, Ashili S. Smart consumer wearables as digital diagnostic tools: A review. Diagnostics. 2022;12(9):2110
  13. Vijayan V, Connolly J, Condell J, McKelvey N, Gardiner P. Review of wearable devices and data collection considerations for connected health. Sensors. 2021;21(16):5589
  14. Dai H, Younis A, Kong JD, Puce L, Jabbour G, Yuan H, et al. Big data in cardiology: State-of-art and future prospects. Frontiers in Cardiovascular Medicine. 2022;9:844296
  15. Chen S, Qi J, Fan S, Qiao Z, Yeo JC, Lim CT. Flexible wearable sensors for cardiovascular health monitoring. Advanced Healthcare Materials. 2021;10(17):e2100116
  16. Ukil A, Bandyopadhyay S, Puri C, Pal A, Mandana K. Cardiac condition monitoring through photoplethysmogram signal denoising using wearables: Can we detect coronary artery disease with higher performance efficacy? In: IEEE Computing in Cardiology Conference. Vancouver, BC, Canada; 2016
  17. Tison GH, Sanchez JM, Ballinger B, Singh A, Olgin JE, Pletcher MJ, et al. Passive detection of atrial fibrillation using a commercially available smartwatch. JAMA Cardiology. 2018;3(5):409-416
  18. Bashar SK, Han D, Hajeb-Mohammadalipour S, Ding E, Whitcomb C, McManus DD, et al. Atrial fibrillation detection from wrist photoplethysmography signals using smartwatches. Scientific Reports. 2019;9(1):15054
  19. Inui T, Kohno H, Kawasaki Y, Matsuura K, Ueda H, Tamura Y, et al. Use of a smart watch for early detection of paroxysmal atrial fibrillation: Validation study. JMIR Cardiology. 2020;4(1):e14857
  20. Fedorin I, Slyusarenko K. Consumer smartwatches as a portable PSG: LSTM based neural networks for a sleep-related physiological parameters estimation. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. Mexico: Institute of Electrical and Electronics Engineers Inc; 2021. pp. 849-852
  21. Nemati E, Liaqat D, Rahman MM, Kuang J. A novel algorithm for activity state recognition using smartwatch data. In: 2017 IEEE Healthcare Innovations and Point of Care Technologies, HI-POCT 2017. Bethesda, MD, US. 2017
  22. Khwaounjoo P, Singh G, Grenfell S, Özsoy B, MacAskill MR, Anderson TJ, et al. Non-contact hand movement analysis for optimal configuration of smart sensors to capture Parkinson’s disease hand tremor. Sensors (Basel). 2022;22(12):4613
  23. Wu X, Mattingly S, Mirjafari S, Huang C, Chawla NV. Personalized imputation on wearable-sensory time series via knowledge transfer. International Conference on Information and Knowledge Management, Proceedings. 2020;10:1625-1634
  24. Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. Journal of Big Data. 2021;8(1):1-37
  25. Mack C, Su Z, Westreich D. Types of missing data. In: Managing Missing Data in Patient Registries: Addendum to Registries for Evaluating Patient Outcomes: A User’s Guide. Third ed. Maryland, US: Agency for Healthcare Research and Quality (US); 2018
  26. 26. Sigcha L, Pavón I, Arezes P, Costa N, De Arcas G, López JM. Occupational risk prevention through smartwatches: Precision and uncertainty effects of the built-In accelerometer. Sensors. 2018;18(11):3805
  27. 27. Mauldin TR, Canby ME, Metsis V, Ngu AHH, Rivera CC. SmartFall: A smartwatch-based fall detection system using deep learning. Sensors. 2018;18(10):3363
  28. 28. Powers R, Etezadi-Amoli M, Arnold EM, Kianian S, Mance I, Gibiansky M, et al. Smartwatch inertial sensors continuously monitor real-world motor fluctuations in Parkinson’s disease. Science Translational Medicine. 2021;13:579
  29. 29. Allen J. Photoplethysmography and its application in clinical physiological measurement. Physiological Measurement. 2007;28(3):R1
  30. 30. Hoilett OS, Twibell AM, Srivastava R, Linnes JC. Kick LL: A smartwatch for monitoring respiration and heart rate using Photoplethysmography. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Honolulu, HI, USA: NIH Public Access; 2018. p. 3824
  31. 31. Spaccarotella C, Polimeni A, Mancuso C, Pelaia G, Esposito G, Indolfi C. Assessment of non-invasive measurements of oxygen saturation and heart rate with an Apple smartwatch: Comparison with a standard pulse oximeter. Journal of Clinical Medicine. 2022;11(6):1467
  32. 32. How do I track blood oxygen saturation (SpO2) with my Fitbit device? [Internet]. Available from: https://help.fitbit.com/articles/en_US/Help_article/2459.htm [Accessed: November 2, 2022]
  33. 33. How to use the Blood Oxygen app on Apple Watch – Apple Support (IN) [Internet]. Available from: https://support.apple.com/en-in/HT211027 [Accessed: November 2, 2022]
  34. 34. Magno M, Salvatore GA, Mutter S, Farrukh W, Troester G, Benini L. Autonomous smartwatch with flexible sensors for accurate and continuous mapping of skin temperature. In: IEEE International Symposium on Circuits and Systems. Montreal, QC, Canada: Institute of Electrical and Electronics Engineers Inc.; 2016. pp. 337-340
  35. 35. Gadaleta M, Radin JM, Baca-Motes K, Ramos E, Kheterpal V, Topol EJ, et al. Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms. NPJ Digital Medicine. 2021;4(1):166
  36. 36. Bertemes-Filho P, Morcelles KF. Wearable bioimpedance measuring devices. In: Simini F, Bertemes-Filho P, editors. Medicine-Based Informatics and Engineering. Switzerland: Springer Science and Business Media Deutschland GmbH; 2022. pp. 81-101
  37. 37. Bennett JP, Liu YE, Kelly NN, Quon BK, Wong MC, McCarthy C, et al. Next generation smartwatches to estimate whole body composition using bioimpedance analysis: Accuracy and precision in a diverse multiethnic sample. The American Journal of Clinical Nutrition. 2022;116(5):1418-1429
  38. 38. Huynh TH, Jafari R, Chung WY. A robust bioimpedance structure for smartwatch-based blood pressure monitoring. Sensors. 2018;18(7):2095
  39. 39. Kireev D, Sel K, Ibrahim B, Kumar N, Akbari A, Jafari R, et al. Continuous cuffless monitoring of arterial blood pressure via graphene bioimpedance tattoos. Nature Nanotechnology. 2022;17(8):864-870
  40. 40. Ibrahim B, Jafari R. Continuous blood pressure monitoring using wrist-worn bio-impedance sensors with wet electrodes. In: IEEE Biomedical Circuits and Systems Conference, BioCAS 2018. Cleveland, OH, USA: Institute of Electrical and Electronics Engineers Inc.; 2018
  41. 41. Kim J, Curry J. The treatment of missing data in multivariate analysis. Sociological Methods & Research. 1977;6(2):215-240
  42. 42. Rubin DB. In: Rubin DB, editor. Multiple Imputation for Nonresponse in Surveys. First ed. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 1987 (Wiley Series in Probability and Statistics)
  43. 43. Becker WE, Walstad WB. Data loss from Pretest to Posttest as a sample selection problem. The Review of Economics and Statistics. 1990;72(1):184-188
  44. 44. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 1st ed. New Jersey, US: Wiley; 2019. pp. 1-449
  45. 45. Bennett DA. How can I deal with missing data in my study? Aust N Z J Public Health. 2001;25:464-469
  46. 46. Mack C, Su Z, Westreich D. Managing missing data in patient registries: addendum to registries for evaluating patient outcomes: A user’s guide. 2018. Available from: https://europepmc.org/article/med/29671990 [Accessed: 2023 May 17]
  47. 47. Song Q, Shepperd M. Missing data imputation techniques. International Journal of Business Intelligence and Data Mining. 2007;2(3):261-291
  48. 48. Little RJA. Regression with missing X’s: A review. Journal of the American Statistical Association. 1992;87(420):1237
  49. 49. Junninen H, Niska H, Tuppurainen K, Ruuskanen J, Kolehmainen M. Methods for imputation of missing values in air quality data sets. Atmospheric Environment. 2004;38(18):2895-2907
  50. 50. Zhang Z. Missing data imputation: Focusing on single imputation. Annals of Translational Medicine. 2016;4(1):9
  51. 51. Hunter JS. The exponentially weighted moving average. Journal of Quality Technology. 1986;18(4):203-210
  52. 52. Wijesekara W, Liyanage L. Comparison of imputation methods for missing values in air pollution data: Case study on Sydney air quality index. In: Advances in Information and Communication: Proceedings of the 2020 Future of Information and Communication Conference (FICC). Vol. 2. San Francisco, US. 2020. pp. 257-269
  53. 53. Parvin H, Alizadeh H, Minati B. A modification on K-nearest neighbor classifier. Global Journal of Computer Science and Technology. 2010;10(14):37
  54. 54. Malarvizhi MR, Selvadoss TA. K-nearest neighbor in missing data imputation. International Journal of Engineering Research and Development. 2012;5(1):5-07
  55. 55. Zhang S. Nearest neighbor selection for iteratively kNN imputation. Journal of Systems and Software. 2012;85(11):2541-2552
  56. 56. Lall U, Sharma A. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research. 1996;32(3):679-693
  57. 57. Rahman SA, Huang Y, Claassen J, Heintzman N, Kleinberg S. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data. Journal of Biomedical Informatics. 2015;58:207
  58. 58. Yang F, Du J, Lang J, Lu W, Liu L, Jin C, et al. Missing value estimation methods research for arrhythmia classification using the modified kernel difference-weighted KNN algorithms. BioMed Research International. 2020;2020:7141725
  59. 59. Kenyhercz MW, Passalacqua NV. Missing data imputation methods and their performance with biodistance analyses. In: Biological Distance Analysis. Amsterdam, Netherlands: Elsevier; 2016. pp. 181-194
  60. 60. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological). 1977;39(1):1-38
  61. 61. Molenberghs G, Verbeke G. Multiple imputation and the expectation-maximization algorithm. In: Models for Discrete Longitudinal Data. New York, NY: Springer; 2005. pp. 511-529
  62. 62. Nokas G, Koutras A, Christoyannis I, Georgoulas G, Stylios CH, Groumpos P. Prediction of missing data in Cardiotocograms using the expectation maximization algorithm. In: Scattering and Biomedical Engineering. Singapore: World Scientific Pub Co Pte Lt; 2002. pp. 354-362
  63. 63. Cenitta D, Vijaya Arjunan R, V PK. Engineered science ischemic heart disease multiple imputation technique using machine learning algorithm. Engineered Science. 2022;19:262-272
  64. 64. Aljuaid T, Sasi S. Proper imputation techniques for missing values in data sets. In: 2016 International Conference on Data Science and Engineering (ICDSE). Vol. 1. 2016. p. 5
  65. 65. Kalman RE. A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering. 1960;82:35-45
  66. 66. Sarkka S, Vehtari A, Lampinen J. Time series prediction by Kalman smoother with cross-validated noise density. In: IEEE International Joint Conference on Neural Networks. Budapest, Hungary: Institute of Electrical and Electronics Engineers (IEEE); 2004. pp. 1653-1657
  67. 67. Zhang J, Welch G, Bishop G, Huang Z. A two-stage Kalman filter approach for robust and real-time power system state estimation. IEEE Transactions on Sustainable Energy. 2014;5(2):629-636
  68. 68. Durbin J, Koopman SJ. Time Series Analysis by State Space Methods. Second ed. Oxford; 2012
  69. 69. Turicchi J, O’Driscoll R, Finlayson G, Duarte C, Palmeira AL, Larsen SC, et al. Data imputation and body weight variability calculation using linear and nonlinear methods in data collected from digital smart scales: Simulation and validation study. JMIR Mhealth Uhealth. 2020;8(9):e17977
  70. 70. Tarvainen MP, Georgiadis SD, Ranta-Aho PO, Karjalainen PA. Time-varying analysis of heart rate variability signals with a Kalman smoother algorithm. Physiological Measurement. 2006;27(3):225
  71. 71. Lin S, Wu X, Martinez G, Chawla NV. Filling missing values on wearable-sensory time series data. In: Proceedings of the 2020 SIAM International Conference on Data Mining (SDM). Ohio, US: Society for Industrial and Applied Mathematics Publications; 2020. pp. 46-54
  72. 72. Xie C, Huang C, Zhang D, He W. BiLSTM-I: A deep learning-based long interval gap-filling method for meteorological observation data. International Journal of Environmental Research and Public Health. 2021;18(19):10321
  73. 73. Menéndez Garcia LA, Menéndez Fernández M, Sokoła-Szewioła V, de Prado L, Ortiz Marqués A, Fernández López D, et al. A method of pruning and random replacing of known values for comparing missing data imputation models for incomplete air quality time series. Applied Sciences. 2022;12(13):6465
  74. 74. Chakrabarti S, Biswas N, Karnani K, Padul V, Jones LD, Kesari S, et al. Binned data provide better imputation of missing time series data from wearables. Sensors. 2023;23(3):1454
