data_loss
- pymovements.measure.samples.data_loss(time_column: str, data_column: str, *, sampling_rate: float, start_time: float | None = None, end_time: float | None = None, unit: Literal['count', 'time', 'ratio'] = 'count') -> Expr
Measure data loss using an expected, evenly sampled time base.
The measure computes lost samples and returns the result in one of three units:
“count”: total number of lost samples (integer)
“time”: lost time in the units of time_column (count / sampling_rate)
“ratio”: fraction of lost to expected samples in [0, 1]
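The relationship between the three units can be sketched in plain Python. This is only an illustration of the arithmetic described above, not the library's implementation; the helper name convert_loss and its lost/expected arguments are hypothetical.

```python
def convert_loss(lost: int, expected: int, sampling_rate: float, unit: str = "count") -> float:
    """Convert a lost-sample count into the requested unit (hypothetical helper)."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be > 0")
    if unit == "count":
        # Raw number of lost samples.
        return lost
    if unit == "time":
        # Lost time: count divided by the sampling rate.
        return lost / sampling_rate
    if unit == "ratio":
        # Fraction of lost to expected samples; assumed 0.0 when nothing is expected.
        return lost / expected if expected > 0 else 0.0
    raise ValueError(f"unit must be one of 'count', 'time', 'ratio', got {unit!r}")
```

For example, 6 lost samples out of 9 expected at a sampling rate of 2.0 would correspond to a count of 6, a lost time of 3.0, and a ratio of 6/9.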
Lost samples are the sum of:
Missing rows implied by gaps in the time axis, given sampling_rate.
Invalid rows in data_column, where a row is invalid if it is null or contains any null/NaN/inf element (for list columns, any invalid element marks the row invalid).
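The two contributions above can be illustrated with a small pure-Python sketch. It is an assumption-laden model of the rule as stated here (gaps are rounded to whole sample periods, None stands in for null), not the Polars expression the library actually builds; is_invalid and count_lost are hypothetical names.

```python
import math


def is_invalid(value) -> bool:
    """A row is invalid if it is None or contains any None/NaN/inf element."""
    if value is None:
        return True
    if isinstance(value, (list, tuple)):
        # For list rows, a single invalid element marks the whole row invalid.
        return any(is_invalid(element) for element in value)
    return isinstance(value, float) and not math.isfinite(value)


def count_lost(timestamps, values, sampling_rate: float) -> int:
    """Sum missing rows implied by time gaps and invalid rows in the data."""
    # Assumption: a gap of k sample periods between neighbours implies k - 1 missing rows.
    missing = sum(
        max(round((later - earlier) * sampling_rate) - 1, 0)
        for earlier, later in zip(timestamps, timestamps[1:])
    )
    invalid = sum(1 for value in values if is_invalid(value))
    return missing + invalid
```

With timestamps [1, 2, 3, 4, 5, 9] at 1 Hz, the jump from 5 to 9 implies three missing rows; three invalid rows in the data column bring the total to six.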
If start_time/end_time are not provided, the group’s first/last timestamps (min/max of time_column) are used as bounds.
- Parameters:
time_column (str) – Name of the timestamp column.
data_column (str) – Name of a data column used to count invalid samples due to null/NaN/inf values. For list columns, any null/NaN/inf element marks the whole row as invalid.
sampling_rate (float) – Expected sampling rate in Hz (must be > 0).
start_time (float | None) – Recording start time. If None, uses the group’s first timestamp.
end_time (float | None) – Recording end time. If None, uses the group’s last timestamp.
unit (Literal['count', 'time', 'ratio']) – Aggregation unit for the result.
- Returns:
A scalar (per-group) expression with alias data_loss_{unit}.
- Return type:
pl.Expr
- Raises:
ValueError – If unit is not one of {'count', 'time', 'ratio'} or sampling_rate <= 0.
TypeError – If time_column is not a string.
Examples
>>> import polars as pl
>>> from pymovements import measure as m
>>> df = pl.DataFrame({'time': [0.0, 1.0, 2.0, 4.0]})
>>> df.select(m.data_loss('time', 'time', sampling_rate=1.0, unit='count'))
shape: (1, 1)
┌─────────────────┐
│ data_loss_count │
│ ---             │
│ i64             │
╞═════════════════╡
│ 1               │
└─────────────────┘
>>> # Include invalid rows in a data column
>>> df = pl.DataFrame({
...     'time': [1, 2, 3, 4, 5, 9],
...     'pixel': [[1, 1], [1, 1], None, None, [1, 1], [1, None]],
... })
>>> df.select(m.data_loss('time', 'pixel', sampling_rate=1.0, unit='count'))
shape: (1, 1)
┌─────────────────┐
│ data_loss_count │
│ ---             │
│ i64             │
╞═════════════════╡
│ 6               │
└─────────────────┘
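To see how explicit start_time/end_time bounds could change the result, the following plain-Python sketch models the expected sample count over a bounded time base. It assumes the bounds are inclusive and that the expected count is round((end - start) * sampling_rate) + 1; both are assumptions made for illustration and may differ from what the library does. The helper names expected_samples and missing_rows are hypothetical.

```python
def expected_samples(timestamps, sampling_rate: float, start_time=None, end_time=None) -> int:
    """Expected number of samples on an evenly sampled time base (assumed inclusive bounds)."""
    # Fall back to the first/last observed timestamp when a bound is not given.
    start = min(timestamps) if start_time is None else start_time
    end = max(timestamps) if end_time is None else end_time
    return round((end - start) * sampling_rate) + 1


def missing_rows(timestamps, sampling_rate: float, start_time=None, end_time=None) -> int:
    """Rows missing relative to the expected time base (invalid data rows not included)."""
    expected = expected_samples(timestamps, sampling_rate, start_time, end_time)
    return max(expected - len(timestamps), 0)
```

Under these assumptions, timestamps [0.0, 1.0, 2.0, 4.0] at 1 Hz have five expected samples and one missing row with default bounds; extending end_time to 6.0 raises the expected count to seven and the missing rows to three.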