sensortoolkit.datetime_utils._time_averaging.interval_averaging

interval_averaging(df, freq='H', interval_count=60, thres=0.75)

Average DataFrame to the specified sampling frequency (‘freq’).

Numeric columns are averaged for each interval. An average is reported only if the interval meets the completeness threshold (default 75%); otherwise the average is null. Columns of type ‘object’ (i.e. text) are aggregated within each interval by taking the mode of the object values.
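
The mode aggregation for text columns can be illustrated with plain pandas. This is a minimal sketch, not sensortoolkit's implementation: the helper name `sketch_interval_mode` is hypothetical, and taking the first value when several values tie for the mode is an assumption. The interval is written as '60min' rather than 'H' so the sketch runs unchanged across pandas versions.

```python
import pandas as pd


def sketch_interval_mode(text_series, freq="60min"):
    """Hypothetical helper: most frequent value per averaging interval."""

    def first_mode(group):
        # Series.mode() returns all tied values in sorted order;
        # picking the first one is an assumption made for this sketch.
        modes = group.mode()
        return modes.iloc[0] if not modes.empty else None

    return text_series.resample(freq).apply(first_mode)


# Six text readings at 20-minute spacing, covering two full hours.
idx = pd.date_range("2021-01-01 00:00", periods=6, freq="20min")
flags = pd.Series(["ok", "ok", "warn", "warn", "warn", "ok"], index=idx)

hourly_flags = sketch_interval_mode(flags)
print(hourly_flags)
```

The first hour contains two 'ok' readings and one 'warn', so it aggregates to 'ok'; the second hour aggregates to 'warn'.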

Parameters
  • df (pandas DataFrame or pandas Series) – Dataframe or Series for which averages will be computed.

  • freq (str) – The frequency (averaging interval) to which the DataFrame will be averaged. Defaults to ‘H’ (hourly). Pandas refers to these strings as ‘offset aliases’; the full list is in the pandas time series user guide (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

  • interval_count (int) – The number of data points expected within the passed DataFrame for the specified averaging interval (‘freq’). Defaults to 60 for 1-hour averages. E.g., if computing 1-hour averages (freq='H') and the passed DataFrame is for a sensor that recorded measurements at 1-minute sampling frequency, interval_count should be 60 (60 non-null data points are expected per averaging interval).

  • thres (float) – Threshold (ranging from 0 to 1) for ratio of the number of data points recorded within a given averaging interval vs. the number of expected data points. Defaults to 0.75 (i.e., 75%).
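
How interval_count and thres interact can be sketched with plain pandas. With the defaults, an hour needs at least 60 × 0.75 = 45 non-null readings to receive an average. This is an illustration under stated assumptions, not sensortoolkit's code: the helper name `sketch_interval_mean` is hypothetical, and treating an interval that exactly meets the threshold as complete (>=) is an assumption.

```python
import numpy as np
import pandas as pd


def sketch_interval_mean(series, freq="60min", interval_count=60, thres=0.75):
    """Hypothetical helper: interval means, nulled where data are too sparse."""
    grouped = series.resample(freq)
    means = grouped.mean()    # NaN values are skipped when averaging
    counts = grouped.count()  # number of non-null readings per interval
    completeness = counts / interval_count
    # Keep the mean only where the completeness ratio meets the threshold
    # (inclusive comparison is an assumption of this sketch).
    return means.where(completeness >= thres)


# Two hours of 1-minute data; 30 readings in the second hour are missing.
idx = pd.date_range("2021-01-01 00:00", periods=120, freq="min")
values = pd.Series(np.arange(120, dtype=float), index=idx)
values.iloc[60:90] = np.nan

hourly = sketch_interval_mean(values)
print(hourly)
```

The first hour is complete (60 of 60 readings) and keeps its mean of 29.5. The second hour has only 30 of 60 readings, a completeness of 0.5, so its average is null.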

Returns

DataFrame averaged to the DatetimeIndex interval specified by ‘freq’.

Return type

avg_df (pandas DataFrame)