Generalizing Moving Averages¶
Published 2025-11-02
TL;DR
This post introduces a novel parametrized moving average, the Mixed Moving Average (MMA), which unifies the cumulative and exponential moving averages into a single framework. After providing context on existing approaches, I introduce the MMA formula and visualize how its parameters control the balance between stability and responsiveness, both in terms of its weight distribution and on real-world examples.
Moving averages are fundamental tools in time series analysis, used to smooth noise and reveal underlying trends across domains from finance to signal processing. While choosing between different moving average types often involves trade-offs between responsiveness and stability, what if we could parametrically control this trade-off within a single framework?
This post introduces the Mixed Moving Average (MMA), a novel parametrized approach that encompasses both the cumulative moving average (maximum stability) and the exponential moving average (tunable responsiveness) as special cases, while offering a continuous spectrum of behaviors between them. Beyond its mathematical foundations, the MMA has practical applications in automated feature engineering for time series machine learning.
Moving Average Categories¶
There are two categories of moving averages:
- those that keep track of all historical data points, and
- those that keep track of a window of the last \(n\) data points.
Here's our roadmap through the different moving averages:
- Window-based approaches: We'll first briefly cover the Simple Moving Average (SMA) and Weighted Moving Average (WMA)
- Historical data approaches: Then we'll focus on the two most important ones: The Cumulative Moving Average (CMA) and the Exponential Moving Average (EMA)
- Unified framework: Afterwards, we'll introduce the parametrized Mixed Moving Average (MMA) formula that covers both
Window-Based Moving Averages¶
Before diving into the historical data approaches, let's briefly review the two main window-based moving averages that operate on a fixed window of the most recent n data points.
Simple Moving Average¶
The Simple Moving Average (SMA) calculates the arithmetic mean of the last \(n\) data points:
$$\textit{SMA}_t(x) = \frac{1}{n} \sum_{i=t-n+1}^{t} x_i$$
All data points in the window receive equal weight \(\frac{1}{n}\). The SMA provides a simple smoothing effect but can be slow to respond to trend changes.
Weighted Moving Average¶
The Weighted Moving Average (WMA) assigns different weights to data points in the window, typically giving more importance to recent observations:
$$\textit{WMA}_t(x) = \frac{\sum_{k=1}^{n} w_k \, x_{t-n+k}}{\sum_{k=1}^{n} w_k}$$
where \(w_k\) are the weights for positions \(k = 1, 2, \ldots, n\) within the window. A common choice is linear weights: \(w_k = k\), giving the oldest point weight \(1\) and the most recent point weight \(n\).
Both SMA and WMA have fixed memory requirements and computational complexity, making them efficient for real-time applications. However, they discard information older than the window size.
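To make the window-based definitions concrete, here is a minimal Python sketch (the function names and example values are mine, not from any particular library) that computes the SMA and a linearly weighted WMA over a fixed-size window:

```python
from collections import deque

def sma(window):
    """Simple moving average: arithmetic mean of the window."""
    return sum(window) / len(window)

def wma(window):
    """Weighted moving average with linear weights w_k = k (oldest -> newest)."""
    n = len(window)
    weights = range(1, n + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)

# Usage: a deque with maxlen keeps only the last n values
values = [3.0, 4.0, 5.0, 7.0, 6.0]
window = deque(maxlen=3)
for x in values:
    window.append(x)
    if len(window) == window.maxlen:
        print(f"x={x:.1f}  SMA={sma(window):.3f}  WMA={wma(window):.3f}")
```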
Cumulative Moving Average¶
The Cumulative Moving Average (CMA) is a type of moving average that calculates the average of all data points up to a given point \(t\) in time. Given a sequence of measurements \(x\), the cumulative moving average of the first \(t\) measurements \(\textit{CMA}_t(x)\) is
$$\textit{CMA}_t(x) = \frac{1}{t} \sum_{i=1}^{t} x_i$$
The corresponding recursive form is
$$\textit{CMA}_t(x) = \frac{1}{t} x_t + \left(1 - \frac{1}{t}\right) \textit{CMA}_{t-1}(x)$$
Therefore, the moving average can be updated as each new data point arrives, without storing all previous values; only the moving average of the previous time step (and the count \(t\)) needs to be kept.
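As a minimal sketch of this \(O(1)\) update (the variable and function names are mine):

```python
def cma_update(prev_cma, x_t, t):
    """CMA_t = CMA_{t-1} + (x_t - CMA_{t-1}) / t, algebraically equal to the recursive form above."""
    if t == 1:
        return x_t
    return prev_cma + (x_t - prev_cma) / t

# Only the previous average and the count t are stored, never the full history
cma, data = None, [3.0, 4.0, 5.0, 7.0, 6.0]
for t, x in enumerate(data, start=1):
    cma = cma_update(cma, x, t)
print(cma)  # 5.0, the mean of all five values
```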
Looking at the closed form again, we see that it is a weighted sum over all data points. The weight \(w_{i,t}\) is the same for every past value \(x_i\) at a given time \(t\), but it shrinks as time progresses and more data points are added:
$$w_{i,t}^{\textit{CMA}} = \frac{1}{t}$$
Figure 1: CMA weight distribution over time. Notice how all historical values receive equal weight \(\frac{1}{t}\). This weight decreases as more data points are accumulated, as shown by the orange line.
Exponential Moving Average¶
The Exponential Moving Average (EMA) is a type of moving average that gives more weight to recent values. It is calculated using a smoothing factor \(\alpha\) with \(0 < \alpha \leq 1\) and the previous time step's EMA. Values of \(\alpha\) close to 1 give more weight to recent observations (more responsive), while values close to 0 give more weight to historical observations (smoother).
The formula for the EMA is:
$$\textit{EMA}^{\alpha}_t(x) = \alpha x_t + (1 - \alpha)\, \textit{EMA}^{\alpha}_{t-1}(x), \qquad \textit{EMA}^{\alpha}_1(x) = x_1$$
The corresponding closed form for \(t > 1\) is
$$\textit{EMA}^{\alpha}_t(x) = \sum_{i=2}^{t} \alpha (1 - \alpha)^{t-i} x_i + (1 - \alpha)^{t-1} x_1$$
Let's have a look at the weight distribution of the closed form, defining
$$w_{i,t}^{\textit{EMA}^{\alpha}} = \begin{cases} (1 - \alpha)^{t-1} & \text{for } i = 1 \\ \alpha (1 - \alpha)^{t-i} & \text{for } i > 1 \end{cases}$$
The weights drop exponentially from the newest data point backwards. As time progresses, the overall weight distribution shifts to the right, but the weight of the \(n\)-th newest data point (i.e., for constant \(t-i\)) stays the same:
Figure 2: EMA weight distribution over time. Weights decay exponentially, giving much higher importance to recent observations while maintaining a fixed relative weighting pattern.
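The EMA recursion is just as cheap to update. The sketch below (my own helper names, using the common convention \(\textit{EMA}_1 = x_1\)) also verifies that the closed-form weights reproduce the recursive result and sum to one:

```python
def ema_update(prev_ema, x_t, alpha):
    """EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}."""
    return alpha * x_t + (1 - alpha) * prev_ema

def ema_weights(alpha, t):
    """Closed-form weights w_{1,t}, ..., w_{t,t} of the EMA."""
    return [(1 - alpha) ** (t - 1)] + [
        alpha * (1 - alpha) ** (t - i) for i in range(2, t + 1)
    ]

data, alpha = [3.0, 4.0, 5.0, 7.0, 6.0], 0.4
ema = data[0]                                  # EMA_1 = x_1
for x in data[1:]:
    ema = ema_update(ema, x, alpha)

weights = ema_weights(alpha, len(data))
assert abs(sum(weights) - 1.0) < 1e-12         # the weights sum to 1
assert abs(ema - sum(w * x for w, x in zip(weights, data))) < 1e-12
print(ema)                                     # ~5.53 for this toy series
```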
Mixed Moving Average¶
For my master's thesis, I developed a moving average that can be parametrized to behave as either the cumulative or the exponential moving average, or as a mixture of both. Therefore, I'll refer to it as the Mixed Moving Average (MMA). It is possible that this or a similar moving average is already known; however, I'm not aware of it, so I'm sharing it here.
Recursive Formula¶
The following recursive formula describes the mixed moving average:
$$\textit{MMA}^{a,b}_t(x) = a t^b x_t + \left(1 - a t^b\right) \textit{MMA}^{a,b}_{t-1}(x), \qquad \textit{MMA}^{a,b}_1(x) = x_1$$
Covering CMA and EMA¶
This formula encompasses both the CMA and the EMA for specific parameters \(a\) and \(b\):
- For \(a=1\), \(b=-1\):
  \(\textit{MMA}^{a=1,b=-1}_t(x) = t^{-1} x_t + (1 - {t^{-1}}) \textit{MMA}^{a=1,b=-1}_{t-1}(x) = \textit{CMA}_t(x)\)
- For \(b=0\), \(a=\alpha\):
  \(\textit{MMA}^{a=\alpha,b=0}_t(x) = \alpha x_t + (1 - \alpha) \textit{MMA}^{a=\alpha,b=0}_{t-1}(x) = \textit{EMA}^\alpha_t(x)\)
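A minimal sketch of the recursion (initializing with \(\textit{MMA}_1 = x_1\), consistent with the CMA and EMA conventions above; the function names are mine) makes the two special cases easy to verify numerically:

```python
def mma_update(prev_mma, x_t, t, a, b):
    """MMA_t = a * t^b * x_t + (1 - a * t^b) * MMA_{t-1}, with MMA_1 = x_1."""
    if t == 1:
        return x_t
    factor = a * t ** b
    return factor * x_t + (1 - factor) * prev_mma

def run_mma(data, a, b):
    mma = None
    for t, x in enumerate(data, start=1):
        mma = mma_update(mma, x, t, a, b)
    return mma

data = [3.0, 4.0, 5.0, 7.0, 6.0]
print(run_mma(data, a=1.0, b=-1.0))   # equals the CMA: 5.0
print(run_mma(data, a=0.4, b=0.0))    # equals the EMA with alpha = 0.4
print(run_mma(data, a=0.7, b=-0.5))   # a mixture of both behaviors
```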
Closed Form & Weight Distribution¶
The corresponding closed form for \(t > 1\) is
$$\textit{MMA}^{a,b}_t(x) = \sum_{i=2}^{t} a i^b \prod_{j=i+1}^{t} \left(1 - a j^b\right) x_i + \prod_{j=2}^{t} \left(1 - a j^b\right) x_1$$
So we can define a weight distribution \(w_{i,t}^{\textit{MMA}^{ab}}\) as well:
$$w_{i,t}^{\textit{MMA}^{ab}} = \begin{cases} \prod_{j=2}^{t} \left(1 - a j^b\right) & \text{for } i = 1 \\ a i^b \prod_{j=i+1}^{t} \left(1 - a j^b\right) & \text{for } i > 1 \end{cases}$$
The transformation to the EMA weight distribution is straightforward. For the CMA weight distribution, expand the section below:
MMA to CMA weight transformation
The weight distribution of the MMA for \(a=1\) and \(b=-1\) can be transformed to the CMA weight distribution as follows (for \(i > 1\)):
$$w_{i,t}^{\textit{MMA}^{ab}} = \frac{1}{i} \prod_{j=i+1}^{t} \left(1 - \frac{1}{j}\right) = \frac{1}{i} \prod_{j=i+1}^{t} \frac{j-1}{j} = \frac{1}{i} \cdot \frac{i}{t} = \frac{1}{t} = w_{i,t}^{\textit{CMA}}$$
This uses the telescoping property of the product; the case \(i = 1\) follows analogously, since \(\prod_{j=2}^{t} \frac{j-1}{j} = \frac{1}{t}\).
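The closed-form weights can also be checked numerically. The following sketch (same \(\textit{MMA}_1 = x_1\) convention and naming as above) computes them and confirms that they sum to \(1\) and reduce to the uniform CMA weights for \(a=1\), \(b=-1\):

```python
from math import prod

def mma_weights(a, b, t):
    """Closed-form MMA weights w_{1,t}, ..., w_{t,t}."""
    def tail(i):
        # prod_{j=i+1}^{t} (1 - a * j^b); the empty product is 1
        return prod(1 - a * j ** b for j in range(i + 1, t + 1))
    return [tail(1)] + [a * i ** b * tail(i) for i in range(2, t + 1)]

for a, b in [(1.0, -1.0), (0.4, 0.0), (0.7, -0.5)]:
    w = mma_weights(a, b, t=6)
    print(f"a={a}, b={b}: sum={sum(w):.6f}, weights={[round(v, 4) for v in w]}")
# a=1.0, b=-1.0 yields six equal weights of 1/6, i.e. the CMA
```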
The visualization below shows how the parameters \(a\) and \(b\) affect the weight distribution:
Weight Visualization¶
Figure 3: Interactive visualization of MMA weight distributions and parameter constraints. The parameter plane shows monotonicity constraints detailed below.
Parameter Constraints¶
The parameter plane visualization shows two dashed constraint curves that determine whether MMA weights are monotonic:
- Full Monotonicity Constraint: \(a \geq 2^{-b-1}\) (teal dashed line)
Ensures weights are monotonically increasing for all \(i \geq 1\)
Derivation: Full Monotonicity Constraint
For MMA weights to be non-decreasing from the first weight onward, we need \(w_2 \geq w_1\). This gives us the full monotonicity constraint since ensuring the first ratio is at least 1 is the most restrictive requirement.
From the weight distribution formula, we have:
$$w_{1,t} = \prod_{j=2}^{t} \left(1 - a j^b\right), \qquad w_{2,t} = a 2^b \prod_{j=3}^{t} \left(1 - a j^b\right)$$
For \(w_2 \geq w_1\), we need:
$$a 2^b \geq 1 - a 2^b$$
Solving for \(a\):
$$2 a \cdot 2^b \geq 1 \quad\Longleftrightarrow\quad a \geq \frac{1}{2 \cdot 2^b} = 2^{-b-1}$$
Therefore, the full monotonicity constraint is \(a \geq 2^{-b-1}\).
- Partial Monotonicity Constraint: \(a \geq \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b\) (purple dashed line)
Ensures weights are monotonically increasing for \(i \geq 2\)
This allows the first weight to take up the extra weight needed for the weights to sum to \(1\), similar to \(w_{1,t}^{\textit{EMA}^{\alpha}}\). This constraint is therefore more relaxed than the full monotonicity constraint, and it covers the EMA with \(\alpha < 0.5\).
Derivation: Partial Monotonicity Constraint
For MMA weights to be non-decreasing from the second weight onward, we need \(w_3 \geq w_2\). This gives us the tightest constraint since the ratio \(\frac{w_{i+1}}{w_i}\) is smallest for small values of \(i\) (not shown here).
The general weight ratio for \(i > 1\) is:
$$\frac{w_{i+1,t}}{w_{i,t}} = \frac{a (i+1)^b \prod_{j=i+2}^{t} \left(1 - a j^b\right)}{a i^b \prod_{j=i+1}^{t} \left(1 - a j^b\right)} = \frac{(i+1)^b}{i^b \left(1 - a (i+1)^b\right)}$$
For \(w_3 \geq w_2\) with \(i = 2\), we need:
$$3^b \geq 2^b \left(1 - a 3^b\right)$$
Solving for \(a\):
$$a \cdot 2^b 3^b \geq 2^b - 3^b \quad\Longleftrightarrow\quad a \geq \frac{2^b - 3^b}{2^b 3^b} = \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b$$
Therefore, the partial monotonicity constraint is \(a \geq \left(\frac{1}{3}\right)^b - \left(\frac{1}{2}\right)^b\).
Parameter combinations above or on these curves produce MMA weights that give more emphasis to recent observations, making the moving average more responsive to new data. Decreasing weights would violate the monotonicity constraints, rendering the MMA ineffective for trend following. The parameter space given by the constraints is particularly useful for automated parameter tuning in machine learning pipelines.
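Both constraints are cheap to check programmatically, for example before sampling \((a, b)\) candidates in a tuning pipeline. A small sketch (the function name is mine):

```python
def mma_constraints(a, b):
    """Check the two monotonicity constraints on the MMA parameters."""
    full = a >= 2 ** (-b - 1)                      # w_2 >= w_1: fully monotonic
    partial = a >= (1 / 3) ** b - (1 / 2) ** b     # w_3 >= w_2: monotonic from i = 2
    return {"full": full, "partial": partial}

print(mma_constraints(a=0.7, b=0.0))    # EMA with alpha >= 0.5: both hold
print(mma_constraints(a=0.3, b=0.0))    # EMA with alpha < 0.5: only partial holds
print(mma_constraints(a=1.0, b=-1.0))   # CMA: lies exactly on both boundaries
```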
Real-World Examples¶
The following interactive visualization shows how different moving averages perform on real-world time series data from three different domains:
Figure 4: MMA behavior on real-world time series data. Experiment with parameters a and b to observe how the MMA adapts to different data characteristics compared to traditional approaches.
Each dataset demonstrates different time series characteristics:
- Brent Crude Oil Prices: Exhibits high volatility with economic shock events
- COVID-19 Daily New Cases: Features exponential growth phases, multiple waves, and policy intervention effects
- CO₂ Atmospheric Concentration: Shows smooth long-term trends with seasonal cycles
Try adjusting the MMA parameters \(a\) and \(b\) to see how they affect the balance between responsiveness and smoothing. Lowering \(b\) means that newer data becomes less important as more data points are accumulated. This creates interesting dynamics:
- For datasets with clear long-term trends (like CO₂ concentration), negative values of \(b\) are counterproductive because they reduce the influence of recent data points that carry trend information. Such data requires consistent weighting of new observations, like the EMA or window-based moving averages.
- For volatile datasets without clear trends (like oil prices or COVID cases), lower \(b\) values can be beneficial. They allow the moving average to follow the data more closely initially when fewer data points are available, then provide more stability as additional data arrives.
- The extreme case is the CMA (where \(a = 1\), \(b = -1\)), which treats all data points equally regardless of their position in time. While this provides maximum stability, it completely ignores the temporal ordering that makes recent observations potentially more relevant.
This illustrates how \(b\) values between \(-1\) and \(0\) allow the average to follow the data more closely in the beginning and become more stable as more data becomes available.
Summary¶
The following table compares the different moving average approaches:
| Moving Average | Description | Memory | Response | Parameters | Pros | Cons |
|---|---|---|---|---|---|---|
| SMA (window-based) | Mean over the last \(n\) values | \(n\) values | Responsive | Window size \(n\) | Simple, intuitive | Ignores older data completely |
| WMA (window-based) | Weighted mean over the last \(n\) values | \(n\) values | Responsive | Weights \(w_k\), window size \(n\) | Flexible weighting | Complex parameter tuning |
| CMA (recursive) | Mean over all historical data | \(O(1)\) | Unresponsive | None | Uses all data, stable | Slow to adapt to changes |
| EMA (recursive) | Exponentially weighted mean | \(O(1)\) | Responsive | Smoothing factor \(\alpha\) | Responsive, memory efficient | Single parameter limits flexibility |
| MMA (recursive) | Parametric mixture of CMA/EMA | \(O(1)\) | Tunable | Factor \(a\), decay \(b\) | Highly flexible, covers CMA/EMA | Complex parameter space |
All methods have \(O(1)\) update complexity except WMA which requires \(O(n)\) for generic weights.
In conclusion, the MMA provides a unified framework that encompasses both CMA (\(a=1, b=-1\)) and EMA (\(b=0\)) while offering additional flexibility through its parameter space constrained by monotonicity requirements. This flexibility makes the MMA particularly valuable for automated parameter tuning in machine learning pipelines, where the optimal balance between responsiveness and stability can be learned from the data characteristics, and can change as more data becomes available.