Introduction:
Periodic variation of a physical quantity is ubiquitous in nature. The temperature varies from one month to another, electrocardiogram data shows cyclicity, and consumer spending may exhibit repetitive patterns around recurring holidays. In this article we will develop a systematic method to detect anomalous behavior in periodic time series data. For example, in electrocardiogram data it is of great importance to know whether the regular periodic pattern persists or there are signs of anomalies. Naturally the question arises: what qualifies as an anomaly? Obvious anomalies are spikes in a dataset, which because of their extreme values are indicative of out-of-the-ordinary happenings. But there can be subtle anomalies: anomalies which are not characterized by very high or very low signal values. A time series can exhibit anomalous behavior in terms of the unusual ordering of its components. Let’s consider the yearly temperature variation as an example, and for simplicity assume there are only 12 data points for a given year, one for each month. In a “normal” cycle the temperature in December is lower than that of June. But if, for whatever reason, the temperatures of June and December are swapped, it will be a rather unusual situation. Note that the unusualness does not stem from the individual temperatures being out of bounds; it arises from their relative ordering. In the following we will describe how anomalies, subtle or obvious, can be detected in a time series.
Time series and quantum mechanics:
In order to detect anomalies in a time series, we will borrow a concept from the path integral formulation of quantum mechanics. In the path integral picture, a particle can take all possible paths to go from one point to another. For example, in the following figure a particle, say an electron, goes from point A to point B. The points are indicated in the distance versus time graph. The red line is a possible trajectory for the electron; two other possible trajectories are also shown in different colors. In quantum mechanics, unlike a classical particle, the electron takes all possible trajectories to go from point A to point B. This concept is the cornerstone of the path integral formulation of quantum mechanics. Its relevance to our topic is the following.
The distance at a given time (except at the end points) can be considered a variable assuming various possible values. For example, the distance variables corresponding to times $t_1$, $t_2$, etc. are random variables. Any time series which is not completely deterministic can be seen in this light. For example, in temperature data consisting of 12 data points per year, one for each month, over several years, the temperatures corresponding to a given month, say January, can be regarded as realizations of a single random variable. Hence the entire dataset can be thought of as consisting of 12 random variables, and the temperatures of each year as one particular realization of the values of those random variables. The three paths shown in the above figure can be thought of as the temperature data for three different years. Treating a time series this way has the following advantage: from the dataset one can construct a joint probability distribution for all the random variables, and assign a probability to each realization of their values. If a certain realization, i.e. a given cycle (a given year in our temperature dataset), shows any unusual behavior, the value of the corresponding multivariate probability distribution function will be small. This concept is elaborated in the next section.
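As a concrete sketch of this viewpoint (the data and all names here are made up for illustration), a periodic series can be arranged as a matrix in which each row is one cycle and each column collects the values of one random variable:

```python
import numpy as np

# Illustrative stand-in data: 5 years of monthly temperatures as one long series.
rng = np.random.default_rng(0)
n_years, n_months = 5, 12
seasonal = 10 * np.sin(2 * np.pi * np.arange(n_months) / n_months)
series = np.tile(seasonal, n_years) + rng.normal(0, 1.5, n_years * n_months)

# Each row is one "path" (one year); each column is one random variable
# (one month), exactly like the trajectories in the figure above.
cycles = series.reshape(n_years, n_months)
print(cycles.shape)   # (5, 12)
print(cycles[:, 0])   # five realizations of the "January" random variable
```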
Probability Distribution of Random Variables:
The probability distribution function (pdf) of a random variable yields the probability of the variable assuming values within a certain range. For example, if a random variable $X$ has a pdf $f(x)$, the probability that $X$ will assume values between $a$ and $b$ is given by $\int_a^b f(x)\,dx$. A commonly used pdf is the Normal (Gaussian) distribution. A random variable is said to follow a Normal distribution with mean $\mu$ and standard deviation $\sigma$ if its pdf is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (1)$$
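As a quick sanity check, the formula can be evaluated directly and compared with SciPy's implementation (a small sketch; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.5, 2.0
x = np.linspace(-5.0, 5.0, 11)

# The Normal pdf of Eq. (1), written out explicitly.
manual = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# scipy.stats.norm implements the same density.
assert np.allclose(manual, norm.pdf(x, loc=mu, scale=sigma))
```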
The following figure shows three normal distributions. Their widths are all different. The width is controlled by the parameter $\sigma$: the bigger the $\sigma$, the larger the width. Two of the distributions are centered at 0, i.e. have $\mu = 0$, and the third one is centered around 0.5, i.e. $\mu = 0.5$. From the diagram, as well as from the mathematical expression for the pdf of a Normal distribution, it can be seen that the pdf reaches its maximum at the mean $\mu$ and tapers off further and further away from the mean. Next we will consider the joint probability distribution of two independent random variables $X_1$ and $X_2$. Let $X_1$ follow a Normal distribution with mean $\mu_1$ and standard deviation $\sigma_1$, and let $X_2$ follow a Normal distribution with mean $\mu_2$ and standard deviation $\sigma_2$. Since they are assumed to be independent, their joint probability distribution is given by:

$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{|\Sigma|}}\, \exp\left[-\frac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu})^{T}\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu})\right] \qquad (2)$$

with $\mathbf{x} = (x_1, x_2)^T$ and $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$,
where $\Sigma$ is the covariance matrix of the random variables $X_1$ and $X_2$. A covariance matrix has the variances (squares of the standard deviations) of the random variables as its diagonal entries. The two off-diagonal entries are equal to each other and to the covariance of the random variables. In this example, because $X_1$ and $X_2$ are independent, their covariance is zero. The formal definitions of the variance and the covariance are given below. Intuitively, the covariance measures the degree to which the random variables are correlated, and the variances measure their spreads. The variance of a random variable $X$ is given by:

$$\mathrm{Var}(X) = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 \qquad (3)$$

where $x_i$ is the $i$th value of $X$ and $\bar{x}$ is the mean of the $x_i$.
The covariance of two random variables $X$ and $Y$ is given by:

$$\mathrm{Cov}(X, Y) = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) \qquad (4)$$

where $x_i$ and $y_i$ are the $i$th values of $X$ and $Y$. It is noted that $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
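These definitions translate directly into code. A small sketch with made-up data (note that `np.cov` defaults to the $1/(m-1)$ normalization, so `bias=True` is passed to match the $1/m$ convention of Eqs. (3) and (4)):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
x = rng.normal(0.0, 1.0, m)
y = -0.8 * x + rng.normal(0.0, 0.5, m)   # negatively correlated with x

var_x  = np.sum((x - x.mean()) ** 2) / m              # Eq. (3)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / m  # Eq. (4)

# Cross-check against NumPy; bias=True selects the 1/m normalization.
C = np.cov(x, y, bias=True)
assert np.isclose(var_x, C[0, 0])
assert np.isclose(cov_xy, C[0, 1])   # Cov(X, Y) = Cov(Y, X) = C[1, 0]
```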
Although the covariance in this example is zero, it is non-zero if the random variables are correlated. In the following, the surface and contour plots of the probability distribution functions for independent as well as correlated random variables are shown. For the independent variables the contours are circular, whereas the contours for the correlated ones show a clear sign of dependence by being elongated. In the specific example given in the figure below the variables are negatively correlated, which can be seen from the leftward tilt of the elongated ellipse. For both the zero and the non-zero covariance, the probability is maximum at the point (0, 0), the two entries being the means of the two variables in this specific example. The probability falls off as one moves further and further from the center, but the manner in which it decreases differs between the two cases: for the non-zero covariance the probability falls off anisotropically, whereas for the zero covariance the decay is isotropic. We will conclude this section with an expression for the probability distribution of multiple variables, which is a straightforward generalization of the two-variable result. Assuming that there are $n$ variables with the means written in vector form as $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)^T$, the probability distribution function is given by:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{|\Sigma|}}\, \exp\left[-\frac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu})^{T}\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu})\right] \qquad (5)$$

where $\Sigma$ is the $n \times n$ covariance matrix. In the above notation $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$. In the next section we will explain how the concept of the multivariate probability distribution can actually be used in the context of time series data.
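Eq. (5) is available in SciPy as `scipy.stats.multivariate_normal`. The sketch below, with an assumed mean vector and covariance matrix, shows the density peaking at the mean and decaying anisotropically when the covariance is non-zero:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, -0.6],
                  [-0.6, 1.0]])   # negative covariance: tilted contours

pdf = multivariate_normal(mean=mu, cov=Sigma)
print(pdf.pdf([0.0, 0.0]))    # maximum, attained at the mean
print(pdf.pdf([1.0, 1.0]))    # small: against the correlation direction
print(pdf.pdf([1.0, -1.0]))   # larger than at (1, 1): along the tilt
```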
Time Series Data and Multivariate Probability Distribution:
As was described earlier, the time axis of each cycle of a periodic time series can be divided into $n-1$ intervals; hence each cycle contains $n$ time points. Because of the cyclical nature of the data, these $n$ points can be thought of as $n$ ‘cycle independent’ random variables. For example, with regard to the monthly temperature data for several years, the months from January through December are the same for every year, but the temperatures for the month of January differ from year to year. So the temperature in January is a random variable. Corresponding to the $n$ points of a cycle, there will be $n$ random variables assuming various values across cycles. Using the multivariate probability distribution, we will determine the probability of occurrence of a given cycle. This is of key importance, since a low probability value for a cycle will be indicative of its anomalousness. In the following the detailed procedure for obtaining the probability of a cycle is given. To determine the multivariate probability, one first needs the $n$-dimensional mean vector. Assuming there are $m$ cycles, the $n$-component mean vector is given as:

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij}, \qquad j = 1, \ldots, n \qquad (6)$$

where $x_{ij}$ is the value of the $j$th random variable $X_j$ for the $i$th cycle. The elements of the covariance matrix are calculated as follows:

$$\Sigma_{jk} = \frac{1}{m}\sum_{i=1}^{m}\left(x_{ij} - \mu_j\right)\left(x_{ik} - \mu_k\right) \qquad (7)$$

where $\mu_j$ is the mean of $X_j$.
Once the mean vector and the covariance matrix are determined, the multivariate probability can be calculated from Eq. (5). A sketch of the implementation is given below; in the next section we apply it to a few numerical examples.
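A minimal implementation of this procedure might look as follows (a sketch; the function name and the assumption that the data arrive as one row per cycle are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def cycle_probabilities(cycles: np.ndarray) -> np.ndarray:
    """Evaluate the multivariate Normal pdf of Eq. (5) for each cycle.

    cycles: array of shape (m, n) -- m cycles, n points per cycle.
    Returns one density value per cycle; anomalous cycles get small values.
    The number of cycles m should exceed n, otherwise the sample covariance
    is singular and the Mahalanobis distances become uninformative.
    """
    mu = cycles.mean(axis=0)                         # Eq. (6)
    Sigma = np.cov(cycles, rowvar=False, bias=True)  # Eq. (7)
    mvn = multivariate_normal(mean=mu, cov=Sigma, allow_singular=True)
    return mvn.pdf(cycles)
```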
Results:
Let’s consider the following signal, which has 20 cycles, the 6th of which is anomalous.
Using the procedure described in the previous section, the probability of each of the 20 cycles can be computed. If our method works well, the probability corresponding to the anomalous cycle should be much smaller than the rest. Following are the calculated probabilities for the 20 cycles.
As can be seen, the probabilities for all the cycles except the 6th one are quite large. Since it is the 6th cycle which shows anomalous behavior, we can conclude from this well-controlled experiment that the method described in this blog post does a good job of detecting the anomaly. As another example, let there be anomalies in the 6th and the 15th cycles, as shown below.
The probabilities computed for the various cycles using the same technique are given below.
As can be seen from the above diagram, the 6th and the 15th cycles have much smaller probabilities than the rest of the cycles. It is noted here that the absolute probability values are not normalized; hence it is the relative probability values which are important.
As the last example of how our method performs, we will consider the following time series data and evaluate the probabilities of its cycles.
The probabilities for the cycles are given below. All probabilities have been normalized by dividing by the maximum value, which for this example happens to be that of the 8th cycle.
Again the probability values indicate that the 6th and the 15th cycles stand out with very small probabilities, and hence as anomalous. We conclude that the method proposed in this blog works reasonably well for detecting anomalies in a periodic dataset.
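For completeness, here is a self-contained reconstruction of this kind of experiment on synthetic data (the signal, the injected anomaly, and all parameter values are my own stand-ins, not the exact data shown above):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(42)
m, n = 20, 12                                # 20 cycles, 12 points per cycle
base = np.sin(2 * np.pi * np.arange(n) / n)  # one clean cycle
cycles = base + rng.normal(0.0, 0.05, (m, n))
cycles[5, [3, 9]] = cycles[5, [9, 3]]        # swap "June" and "December" in cycle 6

mu = cycles.mean(axis=0)                         # Eq. (6)
Sigma = np.cov(cycles, rowvar=False, bias=True)  # Eq. (7)
p = multivariate_normal(mu, Sigma, allow_singular=True).pdf(cycles)
p /= p.max()                                 # relative probabilities, as in the text

print(np.argmin(p) + 1)                      # -> 6: the anomalous cycle
```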
Cycle Detection:
So far we have shown that our method works well for a periodic dataset whose period is known. This is not a serious constraint: many datasets belong to this category. For example, yearly variations in temperature, sales, or expenditure fit it very well. If the period of a dataset is not known, the standard Fourier series technique can shed some light on its periodicity. In the following we describe another method, which can sometimes be helpful in ascertaining the period of a periodic dataset. A cycle is characterized by the time period after which a signal repeats itself. Our philosophy has been to divide a cycle into n points, treating the value at each time point as a random variable. After one cycle ends and the next begins, the value at the first time point is considered to be drawn from the same random variable as in the previous cycle. For a moment, let’s take the randomness out of the picture. For a deterministic signal, each time point assumes exactly the same value as it had in the previous cycle; hence all the values in the dataset for that particular time point will be equal, and their standard deviation will be zero. This is the key concept. The same holds for every other time point of the cycle. Hence if we add up the standard deviations over all time points, a low total is a direct consequence of the cyclical property of the signal. We can utilize this property to find the period: of all proposed periods, the true one will be that for which the total standard deviation is minimum. This method is demonstrated with an example below.
Example: We consider the following signal
The above signal is the sum of two sinusoids; it is plotted below as a function of time.
It can be seen from the graph, and also verified independently, that the period of the above signal is 10 time units. Hence, if our proposed method of determining the cycle is correct, we should see a dip in the sum of the standard deviations at that cycle length. Following is the plot of the net standard deviation versus the cycle length, and lo and behold, there is indeed a dip in the net standard deviation when the cycle length is 10 time units.
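The following sketch implements this period search (the two sinusoids here are assumed stand-ins with overall period 10, matching the figure; the exact amplitudes are not taken from the original):

```python
import numpy as np

# Assumed stand-in for the plotted signal: two sinusoids, overall period 10.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 10) + 0.5 * np.sin(2 * np.pi * t / 5)

def total_std(signal: np.ndarray, period: int) -> float:
    """Sum of per-position standard deviations when the series is sliced
    into cycles of the proposed length (any trailing remainder is dropped)."""
    m = len(signal) // period
    cycles = signal[: m * period].reshape(m, period)
    return cycles.std(axis=0).sum()

candidates = np.arange(2, 31)
scores = np.array([total_std(signal, p) for p in candidates])

# Multiples of the true period also give (near-)zero totals; taking the
# first minimum picks out the fundamental period.
print(candidates[np.argmin(scores)])   # -> 10
```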
Summary:
To conclude, in this post we have described a method to detect anomalies in a time series. We explained an approach based on the multivariate probability distribution function to assign probabilities to the various cycles of a periodic time series. Through multiple examples we demonstrated how anomalous cycles are identified by their low probability values. This approach can be used for any periodic time series data. We also briefly discussed an alternative to the standard Fourier technique, which can sometimes successfully determine the period of a periodic dataset.