---
title: `simple_interpolation`
keywords: fastai
sidebar: home_sidebar
summary: "A Pandas implementation of the Brownian Bridge interpolation algorithm. Wiener processes are assumed to build `std()`."
description: "A Pandas implementation of the Brownian Bridge interpolation algorithm. Wiener processes are assumed to build `std()`."
nb_path: "index.ipynb"
---
Interpolation rocks, but doing it poorly can alter the original features of your data. A Brownian bridge, if done well, preserves the volatility of the original data. Mixing that with a bit of theory on the stock market (Wiener processes), we built a simple interpolation library.
Read about the algorithm in the "Brownian bridge algo" section below.
pip install simple_interpolation
# (i. e. X column, values 3-5)
df
patched_df = interpolate_gaps( df ) #, plot = True )
patched_df
It can interpolate large gaps while preserving the volatility of the series (which is taken as an input!). Read about it here: "Brownian bridge".
In a Wiener process the variance grows linearly with the elapsed time, $$var = \Delta_t$$ so the standard deviation (the volatility) grows with its square root: $$std = \sqrt{var} = \sqrt{\Delta_t}$$ This sets how the local volatility should be analyzed.
So, if we have $std_{year}$ (or $std_{whole\_series}$), we can get the daily value from: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$
So we can get the "basic building block" of the volatility by getting $std_{minute}$ in our case.
Having $std_{minute}$, we then do a "bottom-up" process building the gap:
{% raw %} $$ std_{gap} = std_{minute} \cdot \sqrt{number\_of\_mins\_in\_gap}$$ {% endraw %}
(Advice from Miguel, my colleague at ING)
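The square-root scaling above can be sketched in a few lines of Python. This is only an illustration, not the library's code: the helper names `std_per_step` and `std_for_gap`, the example value of `std_year`, and the choice of 24 * 60 minutes per day are all assumptions made for the example.

```python
import math


def std_per_step(std_total: float, n_steps: int) -> float:
    """Scale a std measured over n_steps down to a single step (hypothetical helper)."""
    return std_total / math.sqrt(n_steps)


def std_for_gap(std_step: float, n_steps_in_gap: int) -> float:
    """Build the std of a gap "bottom-up" from the per-step std (hypothetical helper)."""
    return std_step * math.sqrt(n_steps_in_gap)


std_year = 0.20                                # illustrative value, not from real data
std_day = std_per_step(std_year, 365)          # std_year / sqrt(365)
std_minute = std_per_step(std_day, 24 * 60)    # assumes a continuous series, 1440 minutes per day
std_gap = std_for_gap(std_minute, 90)          # std for a 90-minute gap
print(std_day, std_minute, std_gap)
```

Scaling down and back up over the same number of steps returns the original value, which is a quick sanity check for the two helpers.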
The `fixed_freq` argument rounds the interpolated X points to a certain timestep. `fixed_freq` defaults to `'min'`. Valid options are the Pandas offset aliases, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases

Implementation of the rounding (you probably don't need to read this)
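As a rough illustration of what rounding a timestamp to a fixed timestep looks like in Pandas, the sketch below uses `pd.Timestamp.floor` with an offset alias. The helper `round_to_freq` is hypothetical and is not claimed to be how the library does it internally; only its default value `'min'` comes from the text above.

```python
import pandas as pd


def round_to_freq(ts, fixed_freq: str = "min") -> pd.Timestamp:
    """Floor a timestamp to a Pandas offset alias (hypothetical helper)."""
    return pd.Timestamp(ts).floor(fixed_freq)


# A point interpolated halfway through a minute is moved back to the whole minute.
print(round_to_freq("2021-01-01 00:01:30"))
# Any other valid offset alias works the same way, e.g. 5-minute steps.
print(round_to_freq("2021-01-01 00:07:30", "5min"))
```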
This constraint takes us outside the pure Brownian bridge, because there we only interpolate the midpoints through:
$$\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + std \end{cases}$$

But if we round to whole minutes, this midpoint $x_m$ may not fall on a minute-exact timestamp (imagine the first interpolated point in a gap of 3 min: it would be at 1.5 min). So we round $x_m$ and look for its associated Y displacement $\Delta y$:
$$\begin{cases} x'_m = x_m + \Delta x_{toroundtomin} \\ y'_m = y_m + \Delta y \end{cases}$$

To get the associated $\Delta y$ we must use the slope (derivative) of the straight line between the points $(x_0, y_0)$ and $(x_1, y_1)$.
So:
1- Round $x_m$ down to the whole minute (`floor()`-like), so we obtain $x'_m$ and $\Delta x_{toroundtomin}$.
2- The deltas on X and Y are related by the derivative, which is constant because we implicitly assume a straight line between the two endpoints of the Brownian bridge, so it's quite straightforward to calculate $\Delta y$:
{% raw %} $$ \Delta y := \frac{dy}{dx} \Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0} \Delta x_{toroundtomin} $$ {% endraw %}
With that, we have everything we need for the Y correction.
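The two steps can be put together in a minimal sketch, assuming X is expressed as a plain number of minutes rather than a timestamp. The function `rounded_midpoint` and its `displacement` parameter (standing in for the $std$ term of the midpoint formula) are hypothetical names for this example, not the library's API.

```python
import math


def rounded_midpoint(x0, y0, x1, y1, displacement=0.0):
    """Midpoint of (x0, y0)-(x1, y1) with X floored to a whole minute
    and Y corrected along the straight line (hypothetical helper)."""
    x_m = (x0 + x1) / 2
    y_m = (y0 + y1) / 2 + displacement

    # Step 1: floor x_m to the whole minute and keep the shift we applied.
    x_m_rounded = math.floor(x_m)
    dx = x_m_rounded - x_m              # Delta x_toroundtomin (zero or negative)

    # Step 2: translate the X shift into a Y shift using the slope of the line.
    slope = (y1 - y0) / (x1 - x0)
    dy = slope * dx

    return x_m_rounded, y_m + dy


# Gap of 3 minutes: the raw midpoint sits at 1.5 min and is moved back to 1 min.
print(rounded_midpoint(0.0, 10.0, 3.0, 16.0))  # (1, 12.0)
```

With no random displacement, the corrected point lands exactly on the straight line between the endpoints (at 1 min the line is at 10 + 2 * 1 = 12), which is what the slope-based correction is meant to guarantee.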