data_loss

pymovements.measure.samples.data_loss(time_column: str, data_column: str, *, sampling_rate: float, start_time: float | None = None, end_time: float | None = None, unit: Literal['count', 'time', 'ratio'] = 'count') → Expr

Measure data loss using an expected, evenly sampled time base.

The measure counts lost samples and returns the result in one of three units:

“count”: total number of lost samples (integer)
“time”: lost time in the units of time_column (count / sampling_rate)
“ratio”: fraction of lost to expected samples in [0, 1]

Lost samples are the sum of:

Missing rows implied by gaps in the time axis, given sampling_rate.
Invalid rows in data_column, where a row is invalid if it is null or contains any null/NaN/inf element (for list columns, any invalid element marks the row invalid).

If start_time/end_time are not provided, the group’s first/last timestamps (min/max of time_column) are used as bounds.
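The counting rule described above can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: the helper names `is_invalid` and `data_loss_sketch` are hypothetical, the input is assumed to be non-empty, and the expected sample count is assumed to be `round((end - start) * sampling_rate) + 1` (an evenly spaced grid that includes both bounds). Under that assumption the sketch reproduces the counts shown in the Examples section.

```python
import math


def is_invalid(value):
    """Return True for None, NaN, inf, or a list containing any such element."""
    if value is None:
        return True
    if isinstance(value, (list, tuple)):
        return any(is_invalid(v) for v in value)
    if isinstance(value, float):
        return not math.isfinite(value)
    return False


def data_loss_sketch(times, data, sampling_rate,
                     start_time=None, end_time=None, unit="count"):
    """Hypothetical sketch of the documented data-loss rule (non-empty input)."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be > 0")
    if unit not in {"count", "time", "ratio"}:
        raise ValueError("unit must be 'count', 'time' or 'ratio'")

    # Fall back to the first/last timestamp when no bounds are given.
    start = min(times) if start_time is None else start_time
    end = max(times) if end_time is None else end_time

    # Assumption: the expected grid includes both bounds.
    expected = int(round((end - start) * sampling_rate)) + 1

    missing = max(expected - len(times), 0)      # rows implied by time gaps
    invalid = sum(is_invalid(v) for v in data)   # rows with null/NaN/inf
    lost = missing + invalid

    if unit == "count":
        return lost
    if unit == "time":
        return lost / sampling_rate
    return lost / expected if expected else 0.0


times = [1, 2, 3, 4, 5, 9]
pixel = [[1, 1], [1, 1], None, None, [1, 1], [1, None]]
print(data_loss_sketch(times, pixel, 1.0, unit="count"))  # 6
print(data_loss_sketch(times, pixel, 1.0, unit="ratio"))
```

Here three rows are missing from the time axis (timestamps 6, 7 and 8) and three rows are invalid (two null rows and one list containing a null), which gives the count of 6.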

Parameters:

time_column (str) – Name of the timestamp column.
data_column (str) – Name of a data column used to count invalid samples due to null/NaN/inf values. For list columns, any null/NaN/inf element marks the whole row as invalid.
sampling_rate (float) – Expected sampling rate in Hz (must be > 0).
start_time (float | None) – Recording start time. If None, uses the group’s first timestamp.
end_time (float | None) – Recording end time. If None, uses the group’s last timestamp.
unit (Literal['count', 'time', 'ratio']) – Aggregation unit for the result.

Returns:

A scalar (per-group) expression with alias data_loss_{unit}.

Return type:

pl.Expr

Raises:

ValueError – If unit is not one of {'count', 'time', 'ratio'} or sampling_rate <= 0.
TypeError – If time_column is not a string.

Examples

>>> import polars as pl
>>> from pymovements import measure as m
>>> df = pl.DataFrame({'time': [0.0, 1.0, 2.0, 4.0]})
>>> df.select(m.data_loss('time', 'time', sampling_rate=1.0, unit='count'))
shape: (1, 1)
┌─────────────────┐
│ data_loss_count │
│ ---             │
│ i64             │
╞═════════════════╡
│ 1               │
└─────────────────┘
>>> # Include invalid rows in a data column
>>> df = pl.DataFrame({
...     'time': [1, 2, 3, 4, 5, 9],
...     'pixel': [[1, 1], [1, 1], None, None, [1, 1], [1, None]],
... })
>>> df.select(m.data_loss('time', 'pixel', sampling_rate=1.0, unit='count'))
shape: (1, 1)
┌─────────────────┐
│ data_loss_count │
│ ---             │
│ i64             │
╞═════════════════╡
│ 6               │
└─────────────────┘
