Generalizing Moving Averages 📈

Published 2025-11-02

TL;DR

This post introduces the Mixed Moving Average (MMA), a novel parametrized moving average that unifies the cumulative and exponential moving averages in a single framework. After providing context on existing approaches, I introduce the MMA formula and visualize how its parameters control the balance between stability and responsiveness, both in the weight distributions and on real-world examples.

Moving averages are fundamental tools in time series analysis, used to smooth noise and reveal underlying trends across domains from finance to signal processing. While choosing between different moving average types often involves trade-offs between responsiveness and stability, what if we could parametrically control this trade-off within a single framework?

This post introduces the Mixed Moving Average (MMA) — a novel parametrized approach that encompasses both the cumulative moving average (maximum stability) and the exponential moving average (tunable responsiveness) as special cases, while offering a continuous spectrum of behaviors between them. I present its mathematical foundations along with practical applications in automated feature engineering for time series machine learning.

Moving Average Categories

There are two categories of moving averages:

  • those that keep track of all historical data points, and
  • those that keep track of a window of the last \(n\) data points.

Here's our roadmap through the different moving averages:

Window-Based Moving Averages

Before diving into the historical data approaches, let's briefly review the two main window-based moving averages that operate on a fixed window of the most recent n data points.

Simple Moving Average

The Simple Moving Average (SMA) calculates the arithmetic mean of the last n data points:

$$ \textit{SMA}_t^n(x) = \frac{1}{n} \sum_{i=t-n+1}^{t} x_i = \frac{x_{t-n+1} + x_{t-n+2} + \ldots + x_t}{n} $$

All data points in the window receive equal weight \(\frac{1}{n}\). The SMA provides a simple smoothing effect but can be slow to respond to trend changes.
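
For reference, here's a minimal NumPy sketch of the SMA (the function name and the NaN padding for the warm-up period are my own choices, not from any particular library):

```python
import numpy as np

def sma(x: np.ndarray, n: int) -> np.ndarray:
    """Simple moving average over the last n points.

    Position t holds the mean of x[t-n+1 .. t]; the first n-1
    positions are NaN because the window is not yet full.
    """
    out = np.full(len(x), np.nan)
    # Convolving with a uniform kernel computes the windowed means.
    out[n - 1:] = np.convolve(x, np.ones(n) / n, mode="valid")
    return out

print(sma(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), n=3))  # [nan nan 2. 3. 4.]
```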

Weighted Moving Average

The Weighted Moving Average (WMA) assigns different weights to data points in the window, typically giving more importance to recent observations:

$$ \textit{WMA}_t^n(x) = \frac{1}{\sum_{k=1}^{n} w_k} \sum_{i=t-n+1}^{t} w_{i-(t-n)} \cdot x_i $$

where \(w_k\) are the weights for positions \(k = 1, 2, \ldots, n\) within the window. A common choice is linear weights: \(w_k = k\), giving the oldest point weight \(1\) and the most recent point weight \(n\).
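
A corresponding sketch for the WMA with the linear weights mentioned above (again, the helper and its conventions are my own):

```python
import numpy as np

def wma(x: np.ndarray, n: int) -> np.ndarray:
    """Weighted moving average with linear weights w_k = k:
    the oldest point in the window gets weight 1, the newest gets n."""
    w = np.arange(1, n + 1, dtype=float)
    w /= w.sum()                        # normalize so the weights sum to 1
    out = np.full(len(x), np.nan)
    for t in range(n - 1, len(x)):
        out[t] = w @ x[t - n + 1 : t + 1]
    return out

print(wma(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), n=3))
# [nan nan 2.333... 3.333... 4.333...]
```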

Both SMA and WMA have fixed memory requirements and computational complexity, making them efficient for real-time applications. However, they discard information older than the window size.

Cumulative Moving Average

The Cumulative Moving Average (CMA) calculates the average of all data points up to a given time \(t\). Given a sequence of measurements \(x\), the cumulative moving average of the first \(t\) measurements, \(\textit{CMA}_t(x)\), is

$$ \textit{CMA}_t(x) = \frac{ x_1 + x_2 + … + x_t }{t} = \sum_{i=1}^{t} \frac{1}{t} x_i. $$

The corresponding recursive form is

$$ \begin{aligned} \textit{CMA}_1(x) &= x_1, \\ \textit{CMA}_t(x) &= \frac{1}{t} x_t + \frac{ t-1 }{t} \textit{CMA}_{t-1}(x), \text{for } t > 1. \end{aligned} $$

Therefore, the moving average can be updated as each new data point arrives, storing only the previous time step's average instead of all past values.
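
This recursion translates directly into a constant-memory streaming update; a minimal sketch (the generator and its name are my own):

```python
def cma_stream(xs):
    """Yield the cumulative moving average after each new value,
    storing only the running average and the count."""
    avg = None
    for t, x in enumerate(xs, start=1):
        avg = x if t == 1 else x / t + (t - 1) / t * avg
        yield avg

print(list(cma_stream([4.0, 2.0, 6.0])))  # [4.0, 3.0, 4.0]
```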

Looking at the closed form again, we see that it is a weighted sum over all data points. The weight \(w_{i,t}\) for the past values \(x_i\) is uniform at any fixed time \(t\), but decreases as time progresses and more data points are added:

$$ \begin{aligned} \textit{CMA}_t(x) &= \sum_{i=1}^{t} \frac{1}{t} x_i \\ w_{i,t}^{\textit{CMA}} &= \frac{1}{t} \end{aligned} $$

Figure 1: CMA weight distribution over time. Notice how all historical values receive equal weight \(\frac{1}{t}\). This weight decreases as more data points are accumulated, as shown by the orange line.

Exponential Moving Average

The Exponential Moving Average (EMA) is a type of moving average that gives more weight to recent observations. It is calculated from a smoothing factor \(\alpha\), where \(0 < \alpha \leq 1\), and the previous time step's EMA. Values of \(\alpha\) close to 1 give more weight to recent observations (more responsive), while values close to 0 give more weight to historical observations (smoother).

The formula for the EMA is:

$$ \begin{aligned} \textit{EMA}^\alpha_1(x) &= x_1, \\ \textit{EMA}^\alpha_t(x) &= \alpha x_t + (1 - \alpha)\, \textit{EMA}^\alpha_{t-1}(x), \text{ for } t > 1. \end{aligned} $$

The corresponding closed form for \(t > 1\) is

$$ \begin{aligned} \textit{EMA}^\alpha_t(x) &= \alpha x_t + \alpha (1 - \alpha) x_{t-1} + \alpha (1 - \alpha)^2 x_{t-2} + … \\ &\quad + \alpha (1 - \alpha)^{t-2} x_2 + (1 - \alpha)^{t-1} x_1 \\ &= \sum_{i=2}^{t} \alpha (1 - \alpha)^{t-i} x_{i} + (1 - \alpha)^{t-1} x_1 \end{aligned} $$

Let's have a look at the weight distribution of the closed form, defining

$$ w_{i,t}^{\textit{EMA}^\alpha} = \begin{cases} (1 - \alpha)^{t-1} &\text{if } i = 1, \\ \alpha (1 - \alpha)^{t-i} &\text{if } i > 1. \end{cases} $$

The weights drop exponentially from the newest data point backward. As time progresses, the overall weight distribution shifts to the right, but the weight assigned to the \(n\)-th newest data point (i.e., for constant \(t-i\)) stays the same:

Figure 2: EMA weight distribution over time. Weights decay exponentially, giving much higher importance to recent observations while maintaining a fixed relative weighting pattern.
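
To make this concrete, here's a short sketch (illustrative only; the helpers are my own) checking that the recursive EMA and the closed-form weighted sum agree on sample data:

```python
import numpy as np

def ema(xs, alpha):
    """Recursive EMA over the full sequence."""
    m = xs[0]
    for x in xs[1:]:
        m = alpha * x + (1 - alpha) * m
    return m

def ema_closed(xs, alpha):
    """Closed form: explicit weights that decay exponentially with age."""
    t = len(xs)
    i = np.arange(1, t + 1)
    w = alpha * (1 - alpha) ** (t - i)
    w[0] = (1 - alpha) ** (t - 1)   # the oldest point keeps the remainder
    return float(w @ xs)

xs = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
assert np.isclose(ema(xs, 0.3), ema_closed(xs, 0.3))  # both give ~3.1212
```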

Mixed Moving Average

For my master's thesis, I developed a moving average that can be parametrized to behave as the cumulative moving average, as the exponential moving average, or as a mixture of both. I'll therefore refer to it as the Mixed Moving Average (MMA). It is possible that this or a similar moving average is already known; however, I'm not aware of prior work, so I'm sharing it here.

Recursive Formula

The following recursive formula describes the mixed moving average:

$$ \begin{aligned} \textit{MMA}^{ab}_1(x) &= x_1, \\ \textit{MMA}^{ab}_t(x) &= a t^b x_t + (1 - a t^b)\, \textit{MMA}^{ab}_{t-1}(x), \text{ for } t > 1, \end{aligned} $$

where \(0 < a \leq 1\) and \(-1 \leq b \leq 0\).

Covering CMA and EMA

This formula encompasses both the CMA and the EMA for specific parameters \(a\) and \(b\); the sketch after this list verifies both reductions numerically:

  • For \(a=1\), \(b=-1\):
    \(\textit{MMA}^{a=1,b=-1}_t(x) = t^{-1} x_t + (1 - {t^{-1}}) \textit{MMA}^{a=1,b=-1}_{t-1}(x) = \textit{CMA}_t(x)\)
  • For \(b=0\), \(a=\alpha\):
    \(\textit{MMA}^{a=\alpha,b=0}_t(x) = \alpha x_t + (1 - \alpha) \textit{MMA}^{a=\alpha,b=0}_{t-1}(x) = \textit{EMA}^\alpha_t(x)\)
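
Here's a minimal sketch of the MMA recursion (the helper is my own) together with assertions that it reproduces the CMA and the EMA on sample data:

```python
import numpy as np

def mma(xs, a, b):
    """Mixed moving average via the recursion: weight a * t**b on the new point."""
    m = xs[0]
    for t, x in enumerate(xs[1:], start=2):
        w = a * t ** b
        m = w * x + (1 - w) * m
    return m

xs = [3.0, 1.0, 4.0, 1.0, 5.0]

# a=1, b=-1  ->  cumulative moving average (plain mean of all points)
assert np.isclose(mma(xs, a=1.0, b=-1.0), np.mean(xs))

# b=0, a=alpha  ->  exponential moving average
alpha, m = 0.3, xs[0]
for x in xs[1:]:
    m = alpha * x + (1 - alpha) * m
assert np.isclose(mma(xs, a=alpha, b=0.0), m)
```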

Closed Form & Weight Distribution

The corresponding closed form for \(t > 1\) is

$$ \begin{aligned} \textit{MMA}^{ab}_t(x) &= a t^b x_t + (1 - a t^b) a (t-1)^b x_{t-1} \\ &\quad + (1 - a t^b)(1 - a (t-1)^b) a (t-2)^b x_{t-2} + \ldots \\ &\quad + \prod_{j=3}^{t} (1 - a j^b) a 2^b x_2 + \prod_{j=2}^{t} (1 - a j^b) x_1 \\ &= \sum_{i=2}^{t} \left[ a i^b \underbrace{\prod_{j=i+1}^{t} (1 - a j^b)}_{\text{1 if } i = t} x_i \right] + \prod_{j=2}^{t} (1 - a j^b) x_1 \end{aligned} $$

So we can define a weight distribution \(w_{i,t}^{\textit{MMA}^{ab}}\) as well:

$$ \begin{aligned} \textit{MMA}^{ab}_t(x) &= \sum_{i=1}^{t} ( w_{i,t}^{\textit{MMA}^{ab}} x_i ) \\ \text{where } w_{i,t}^{\textit{MMA}^{ab}} &= \begin{cases} a i^b \prod_{j=i+1}^{t} (1 - a j^b) &\text{if } i > 1, \\ \prod_{j=2}^{t} (1 - a j^b) &\text{if } i = 1. \end{cases} \end{aligned} $$
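
Here's a small sketch (my own helper, for illustration) that builds this weight vector, checks that the weights sum to \(1\), and confirms that the CMA case \(a=1, b=-1\) yields the uniform weights \(\frac{1}{t}\):

```python
import numpy as np

def mma_weights(t, a, b):
    """Weights w_{i,t} of the MMA closed form, for i = 1..t."""
    j = np.arange(1, t + 1, dtype=float)
    decay = 1 - a * j ** b            # factors (1 - a j^b)
    w = np.empty(t)
    for i in range(1, t + 1):
        tail = decay[i:].prod()       # product over j = i+1..t (empty -> 1)
        w[i - 1] = tail if i == 1 else a * i ** b * tail
    return w

w = mma_weights(t=6, a=0.5, b=-0.5)
assert np.isclose(w.sum(), 1.0)                        # weights always sum to 1
assert np.allclose(mma_weights(6, 1.0, -1.0), 1 / 6)   # CMA case: uniform 1/t
```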

The transformation to the EMA weight distribution is straightforward. For the CMA weight distribution, expand the section below:

MMA to CMA weight transformation

The weight distribution of the MMA for \(a=1\) and \(b=-1\) can be transformed to the CMA weight distribution as follows:

\[ \begin{aligned} w_{i,t}^{\textit{MMA}^{a=1,b=-1}} &= i^{-1} \prod_{j=i+1}^{t} (1 - j^{-1}) \\ &= \frac{1}{i} \left(1 - \frac{1}{i+1}\right) \left(1 - \frac{1}{i+2}\right) \cdots \left(1 - \frac{1}{t}\right) \\ &= \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{t-1}{t} \\ &= \frac{1}{\cancel{i}} \cdot \frac{\cancel{i}}{\cancel{i+1}} \cdot \frac{\cancel{i+1}}{\cancel{i+2}} \cdots \frac{\cancel{t-1}}{t} \\ &= \frac{1}{t} = w_{i,t}^{\textit{CMA}} \end{aligned} \]

This uses the telescoping product property.

The visualization below shows how the parameters \(a\) and \(b\) affect the weight distribution:

Weight Visualization

Figure 3: Interactive visualization of MMA weight distributions and parameter constraints. The parameter plane shows monotonicity constraints detailed below.

Parameter Constraints

The parameter plane visualization shows two dashed constraint curves that determine whether MMA weights are monotonic:

  1. Full Monotonicity Constraint: \(a \geq 2^{-b-1}\) (teal dashed line)
    Ensures weights are monotonically increasing for all \(i \geq 1\)
Derivation: Full Monotonicity Constraint

For MMA weights to be non-decreasing from the first weight onward, we need \(w_2 \geq w_1\). This gives us the full monotonicity constraint since ensuring the first ratio is at least 1 is the most restrictive requirement.

From the weight distribution formula, we have:

\[ \begin{aligned} w_1 &= \prod_{j=2}^{t} (1 - a j^b) = (1 - a \cdot 2^b) \prod_{j=3}^{t} (1 - a j^b) \\ w_2 &= a \cdot 2^b \prod_{j=3}^{t} (1 - a j^b) \end{aligned} \]

For \(w_2 \geq w_1\), we need:

\[ \frac{w_2}{w_1} = \frac{a \cdot 2^b \prod_{j=3}^{t} (1 - a j^b)}{(1 - a \cdot 2^b) \prod_{j=3}^{t} (1 - a j^b)} = \frac{a \cdot 2^b}{1 - a \cdot 2^b} \geq 1 \]

Solving for \(a\):

\[ \begin{aligned} \footnotesize\text{multiply both sides by } (1 - a \cdot 2^b):&& a \cdot 2^b &\geq 1 - a \cdot 2^b \\ \footnotesize\text{add } a \cdot 2^b \text{ to both sides}:&& 2a \cdot 2^b &\geq 1 \\ \footnotesize\text{divide by } 2 \cdot 2^b:&& a &\geq \frac{1}{2 \cdot 2^b} = \frac{1}{2^{b+1}} = 2^{-b-1} \end{aligned} \]

Therefore, the full monotonicity constraint is \(a \geq 2^{-b-1}\).

  2. Partial Monotonicity Constraint: \(a \geq \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b\) (purple dashed line)
    Ensures weights are monotonically increasing for \(i \geq 2\)
    This allows the first weight to absorb whatever extra weight is needed for the weights to sum to \(1\), similar to \(w_{1,t}^{\textit{EMA}^{\alpha}}\). This constraint is therefore more relaxed than the full monotonicity constraint and allows the MMA to cover the EMA with \(\alpha < 0.5\).
Derivation: Partial Monotonicity Constraint

For MMA weights to be non-decreasing from the second weight onward, we need \(w_3 \geq w_2\). This gives us the tightest constraint since the ratio \(\frac{w_{i+1}}{w_i}\) is smallest for small values of \(i\) (not shown here).

The general weight ratio for \(i > 1\) is:

\[ \frac{w_{i+1}}{w_i} = \frac{(i+1)^b}{i^b} \cdot \frac{1}{1 - a(i+1)^b} \]

For \(w_3 \geq w_2\) with \(i = 2\), we need:

\[ \frac{w_3}{w_2} = \frac{3^b}{2^b} \cdot \frac{1}{1 - a \cdot 3^b} \geq 1 \]

Solving for \(a\):

\[ \begin{aligned} \footnotesize\text{divide both sides by } \frac{3^b}{2^b}:&& \frac{1}{1 - a \cdot 3^b} &\geq \frac{2^b}{3^b} \\ \footnotesize\text{take reciprocals}:&& 1 - a \cdot 3^b &\leq \frac{3^b}{2^b} \\ \footnotesize\text{subtract 1, multiply by -1}:&& a \cdot 3^b &\geq 1 - \frac{3^b}{2^b} \\ \footnotesize\text{common denominator}:&& a \cdot 3^b &\geq \frac{2^b - 3^b}{2^b} \\ \footnotesize\text{divide by } 3^b:&& a &\geq \frac{2^b - 3^b}{2^b \cdot 3^b} \\ \footnotesize\text{simplify}:&& a &\geq \frac{2^b}{6^b} - \frac{3^b}{6^b} = \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b \end{aligned} \]

Therefore, the partial monotonicity constraint is \(a \geq \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b\).

Parameter combinations on or above these curves produce MMA weights that give more emphasis to recent observations, making the moving average more responsive to new data. Below the curves, the weights decrease toward newer observations, which renders the MMA ineffective for trend following. The parameter space delimited by these constraints is particularly useful for automated parameter tuning in machine learning pipelines.
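
Both constraint boundaries can be sanity-checked numerically: exactly on the teal curve the first two weights are equal, and exactly on the purple curve the second and third are. A quick sketch (the grid, horizon, and tolerance are arbitrary choices of mine; the weight helper from the earlier sketch is restated so this runs standalone):

```python
import numpy as np

def mma_weights(t, a, b):
    """Closed-form MMA weights w_{i,t} for i = 1..t, as defined above."""
    j = np.arange(1, t + 1, dtype=float)
    decay = 1 - a * j ** b
    return np.array([
        (decay[i:].prod() if i == 1 else a * i ** b * decay[i:].prod())
        for i in range(1, t + 1)
    ])

for b in np.linspace(-1.0, -0.1, 5):
    # Full monotonicity boundary: a = 2^(-b-1)  =>  w_1 = w_2
    w_full = mma_weights(10, a=2.0 ** (-b - 1), b=b)
    assert np.isclose(w_full[0], w_full[1])

    # Partial monotonicity boundary: a = (1/3)^b - (1/2)^b  =>  w_2 = w_3
    a_part = (1 / 3) ** b - (1 / 2) ** b
    w_part = mma_weights(10, a=a_part, b=b)
    assert np.isclose(w_part[1], w_part[2])
```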

Real-World Examples

The following interactive visualization shows how different moving averages perform on real-world time series data from three different domains:

Figure 4: MMA behavior on real-world time series data. Experiment with parameters a and b to observe how the MMA adapts to different data characteristics compared to traditional approaches.

Each dataset demonstrates different time series characteristics:

  • Brent Crude Oil Prices: Exhibits high volatility with economic shock events
  • COVID-19 Daily New Cases: Features exponential growth phases, multiple waves, and policy intervention effects
  • CO₂ Atmospheric Concentration: Shows smooth long-term trends with seasonal cycles

Try adjusting the MMA parameters a and b to see how they affect the balance between responsiveness and smoothing. Lowering b means that new data points receive less weight as more data accumulates. This creates interesting dynamics:

  • For datasets with clear long-term trends (like CO₂ concentration), negative values of b are counterproductive because they reduce the influence of recent data points that carry trend information. Such data requires consistent weighting of new observations, like the EMA or window-based moving averages.

  • For volatile datasets without clear trends (like oil prices or COVID cases), lower b values can be beneficial. They allow the moving average to follow data more closely initially when fewer datapoints are available, then provide more stability as additional data arrives.

  • The extreme case is CMA (where b = -1), which treats all datapoints equally regardless of their position in time. While this provides maximum stability, it completely ignores the temporal ordering that makes recent observations potentially more relevant.

This illustrates how b values between -1 and 0 allow the average to follow the data more closely in the beginning and become more stable as more data becomes available.
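
Since the interactive demo can't run here, a small offline sketch on synthetic data (the noisy random walk and the parameter choices are entirely my own, not from the post's datasets) illustrates this early-responsive, late-stable behavior:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a volatile series: a noisy random walk.
xs = np.cumsum(rng.normal(size=300)) + rng.normal(scale=2.0, size=300)

def mma_series(xs, a, b):
    """Full MMA trajectory via the recursion."""
    out = [xs[0]]
    for t, x in enumerate(xs[1:], start=2):
        w = a * t ** b
        out.append(w * x + (1 - w) * out[-1])
    return np.array(out)

ema_like = mma_series(xs, a=0.1, b=0.0)   # EMA: constant weight on new points
cma_like = mma_series(xs, a=1.0, b=-1.0)  # CMA: maximum stability
mixed    = mma_series(xs, a=0.5, b=-0.3)  # responsive early, increasingly stable

# The weight on the newest point shrinks over time for the mixed setting:
print(0.5 * 2 ** -0.3, 0.5 * 300 ** -0.3)  # ~0.41 early vs ~0.09 late
```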

Summary

The following table compares the different moving average approaches:

| Moving Average | Description | Memory | Response | Parameters | Pros | Cons |
| --- | --- | --- | --- | --- | --- | --- |
| SMA (window-based) | Mean over the last \(n\) values | \(n\) values | Responsive | Window size \(n\) | Simple, intuitive | Ignores older data completely |
| WMA (window-based) | Weighted mean over \(n\) values | \(n\) values | Responsive | Weights \(w_k\), window \(n\) | Flexible weighting | Complex parameter tuning |
| CMA (recursive) | Mean over all historical data | None | Unresponsive | None | Uses all data, stable | Slow to adapt to changes |
| EMA (recursive) | Exponentially weighted mean | None | Responsive | Smoothing factor \(\alpha\) | Responsive, memory efficient | Single parameter limits flexibility |
| MMA (recursive) | Parametric mixture of CMA/EMA | None | Tunable | Factor \(a\), decay \(b\) | Highly flexible, covers CMA/EMA | Complex parameter space |

All methods have \(O(1)\) update complexity except the WMA, which requires \(O(n)\) per update for generic weights.

In conclusion, the MMA provides a unified framework that encompasses both CMA (\(a=1, b=-1\)) and EMA (\(b=0\)) while offering additional flexibility through its parameter space constrained by monotonicity requirements. This flexibility makes the MMA particularly valuable for automated parameter tuning in machine learning pipelines, where the optimal balance between responsiveness and stability can be learned from the data characteristics, and can change as more data becomes available.