DatasetDefinition
- class pymovements.DatasetDefinition(name: str = '.', *, long_name: str | None = None, has_files: dict[str, bool] | None = None, mirrors: dict[str, Sequence[str]] | None = None, resources: ResourceDefinitions | Sequence[dict[str, Any]] | dict[str, Sequence[dict[str, Any]]] | None = None, experiment: Experiment | None = None, extract: dict[str, bool] | None = None, filename_format: dict[str, str] | None = None, filename_format_schema_overrides: dict[str, dict[str, type]] | None = None, custom_read_kwargs: dict[str, dict[str, Any]] | None = None, column_map: dict[str, str] | None = None, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = None, pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None)
Definition to initialize a Dataset.
- mirrors
A list of mirrors of the dataset. Each entry must be of type str and end with a ‘/’. (default: {})
Deprecated since version v0.24.0: Please use mirrors instead. This field will be removed in v0.29.0.
- resources
A list of dataset resources. Each list entry must be a dictionary with the following keys:
resource: The url suffix of the resource. This will be concatenated with the mirror.
filename: The filename under which the file is saved.
md5: The MD5 checksum of the respective file.
(default: ResourceDefinitions())
- Type:
- experiment
The experiment definition. (default: None)
- Type:
Experiment | None
- extract
Decide whether to extract the data. (default: None)
Deprecated since version v0.22.1: This field will be removed in v0.27.0.
- custom_read_kwargs
If specified, these keyword arguments will be passed to the file reading function. The behavior of this argument depends on the file extension of the dataset files. If the file extension is .csv, the keyword arguments will be passed to polars.read_csv(). If the file extension is .asc, the keyword arguments will be passed to pymovements.utils.parsing.parse_eyelink(). See Notes for more details on how to use this argument. (default: field(default_factory=dict))
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- column_map
The keys are the columns to read, the values are the names to which they should be renamed. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- trial_columns
The names of the trial columns in the input data frame. If the list is empty or None, the input data frame is assumed to contain only one trial. If the list is not empty, the input data frame is assumed to contain multiple trials, and the transformation methods will be applied to each trial separately. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- time_column
The name of the timestamp column in the input data frame. This column will be renamed to time. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- Type:
str | None
- time_unit
The unit of the timestamps in the timestamp column in the input data frame. Supported units are ‘s’ for seconds, ‘ms’ for milliseconds and ‘step’ for steps. If the unit is ‘step’, the experiment definition must be specified. All timestamps will be converted to milliseconds. (default: ‘ms’)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- Type:
str | None
- pixel_columns
The names of the pixel position columns in the input data frame. These columns will be nested into the column pixel. If the list is empty or None, the nested pixel column will not be created. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- position_columns
The names of the dva position columns in the input data frame. These columns will be nested into the column position. If the list is empty or None, the nested position column will not be created. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- velocity_columns
The names of the velocity columns in the input data frame. These columns will be nested into the column velocity. If the list is empty or None, the nested velocity column will not be created. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- acceleration_columns
The names of the acceleration columns in the input data frame. These columns will be nested into the column acceleration. If the list is empty or None, the nested acceleration column will not be created. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- distance_column
The name of the column containing eye-to-screen distance in millimeters for each sample in the input data frame. If specified, the column will be used for pixel to dva transformations. If not specified, the constant eye-to-screen distance will be taken from the experiment definition. This column will be renamed to distance. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
- Type:
str | None
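The time_unit entry above states that all timestamps are converted to milliseconds and that ‘step’ requires an experiment definition. The following is a minimal sketch of such a conversion in plain Python, not the pymovements implementation. The function name to_milliseconds and the sampling_rate argument (in Hz, standing in for what the experiment definition would supply) are assumptions made for illustration.

```python
def to_milliseconds(timestamps, time_unit="ms", sampling_rate=None):
    """Convert a list of timestamps to milliseconds.

    Hypothetical helper: `sampling_rate` (Hz) stands in for the value that
    an experiment definition would provide when the unit is 'step'.
    """
    if time_unit == "ms":
        factor = 1.0
    elif time_unit == "s":
        factor = 1000.0
    elif time_unit == "step":
        if sampling_rate is None:
            # Steps carry no duration on their own, so a rate is required.
            raise ValueError("time_unit 'step' requires a sampling rate")
        factor = 1000.0 / sampling_rate
    else:
        raise ValueError(f"unsupported time_unit: {time_unit!r}")
    return [t * factor for t in timestamps]


seconds_as_ms = to_milliseconds([0, 0.5, 1], time_unit="s")
steps_as_ms = to_milliseconds([0, 1, 2], time_unit="step", sampling_rate=500)
```

With a 500 Hz sampling rate each step spans 2 ms, which is why a step-based time column cannot be interpreted without the experiment information.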
- Parameters:
name (str) – The name of the dataset. (default: ‘.’)
long_name (str | None) – The entire name of the dataset. (default: None)
has_files (dict[str, bool] | None) –
Indicate whether the dataset contains ‘gaze’, ‘precomputed_events’, and ‘precomputed_reading_measures’. (default: None)
Deprecated since version v0.23.0: This field will be removed in v0.28.0.
mirrors (dict[str, Sequence[str]] | None) –
A list of mirrors of the dataset. Each entry must be of type str and end with a ‘/’. (default: None)
Deprecated since version v0.24.0: Please use mirrors instead. This field will be removed in v0.29.0.
resources (ResourceDefinitions | ResourcesLike | None) –
A list of dataset resources. Each list entry must be a dictionary with the following keys:
resource: The url suffix of the resource. This will be concatenated with the mirror.
filename: The filename under which the file is saved.
md5: The MD5 checksum of the respective file.
(default: None)
experiment (Experiment | None) – The experiment definition. (default: None)
extract (dict[str, bool] | None) –
Decide whether to extract the data. (default: None)
Deprecated since version v0.22.1: This field will be removed in v0.27.0.
filename_format (dict[str, str] | None) –
Regular expression, which will be matched before trying to load the file. Named groups will appear in the fileinfo dataframe. (default: None)
Deprecated since version v0.24.1: This field will be removed in v0.28.0.
filename_format_schema_overrides (dict[str, dict[str, type]] | None) –
If named groups are present in the filename_format, this makes it possible to cast specific named groups to a particular datatype. (default: None)
Deprecated since version v0.24.1: This field will be removed in v0.28.0.
custom_read_kwargs (dict[str, dict[str, Any]] | None) –
If specified, these keyword arguments will be passed to the file reading function. The behavior of this argument depends on the file extension of the dataset files. If the file extension is .csv, the keyword arguments will be passed to polars.read_csv(). If the file extension is .asc, the keyword arguments will be passed to pymovements.utils.parsing.parse_eyelink(). See Notes for more details on how to use this argument. (default: None)
Deprecated since version v0.25.0: Please use load_kwargs instead. This field will be removed in v0.30.0.
column_map (dict[str, str] | None) – The keys are the columns to read, the values are the names to which they should be renamed. (default: None)
trial_columns (list[str] | None) – The names of the trial columns in the input data frame. If the list is empty or None, the input data frame is assumed to contain only one trial. If the list is not empty, the input data frame is assumed to contain multiple trials, and the transformation methods will be applied to each trial separately. (default: None)
time_column (str | None) – The name of the timestamp column in the input data frame. This column will be renamed to time. (default: None)
time_unit (str | None) – The unit of the timestamps in the timestamp column in the input data frame. Supported units are ‘s’ for seconds, ‘ms’ for milliseconds and ‘step’ for steps. If the unit is ‘step’, the experiment definition must be specified. All timestamps will be converted to milliseconds. (default: ‘ms’)
pixel_columns (list[str] | None) – The names of the pixel position columns in the input data frame. These columns will be nested into the column pixel. If the list is empty or None, the nested pixel column will not be created. (default: None)
position_columns (list[str] | None) – The names of the dva position columns in the input data frame. These columns will be nested into the column position. If the list is empty or None, the nested position column will not be created. (default: None)
velocity_columns (list[str] | None) – The names of the velocity columns in the input data frame. These columns will be nested into the column velocity. If the list is empty or None, the nested velocity column will not be created. (default: None)
acceleration_columns (list[str] | None) – The names of the acceleration columns in the input data frame. These columns will be nested into the column acceleration. If the list is empty or None, the nested acceleration column will not be created. (default: None)
distance_column (str | None) – The name of the column containing eye-to-screen distance in millimeters for each sample in the input data frame. If specified, the column will be used for pixel to dva transformations. If not specified, the constant eye-to-screen distance will be taken from the experiment definition. This column will be renamed to distance. (default: None)
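The filename_format and filename_format_schema_overrides parameters describe matching filenames against a pattern, collecting named groups as file information, and casting selected groups to a datatype. The sketch below illustrates that idea with the standard library re module only. The pattern, the file names, and the parse_filename helper are hypothetical, and the pattern syntax pymovements itself expects may differ.

```python
import re

# Hypothetical pattern with two named groups; not taken from pymovements.
pattern = re.compile(r"subject_(?P<subject_id>\d+)_trial_(?P<trial_id>\d+)\.csv")

# Cast selected named groups to a datatype, in the spirit of
# filename_format_schema_overrides. Groups not listed stay strings.
schema_overrides = {"subject_id": int, "trial_id": int}


def parse_filename(filename):
    """Return the cast named groups, or None if the name does not match."""
    match = pattern.fullmatch(filename)
    if match is None:
        return None
    return {
        key: schema_overrides.get(key, str)(value)
        for key, value in match.groupdict().items()
    }


filenames = ["subject_01_trial_3.csv", "readme.txt", "subject_12_trial_10.csv"]

# Only files that match the pattern contribute a row of file information.
fileinfo = []
for name in filenames:
    info = parse_filename(name)
    if info is not None:
        fileinfo.append(info)
```

Without the override, subject_id would remain the string '01'. Casting to int is what makes numeric sorting and filtering on the file information behave as expected.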
Notes
Deprecated since version v0.25.0: The custom_read_kwargs attribute is deprecated. Please specify load_kwargs instead. This field will be removed in v0.30.0.
When working with the custom_read_kwargs attribute, there are specific use cases and considerations to keep in mind, especially for reading csv files:
- Custom separator: To read a csv file with a custom separator, you can pass the separator keyword argument to custom_read_kwargs. For example, pass custom_read_kwargs={'separator': ';'} to read a semicolon-separated csv file.
- Reading subset of columns: To read only specific columns, specify them in custom_read_kwargs. For example: custom_read_kwargs={'columns': ['col1', 'col2']}
- Specifying column datatypes: polars.read_csv() infers data types from a fixed number of rows, which might not be accurate for the entire dataset. To ensure correct data types, you can pass a dictionary to the schema_overrides keyword argument in custom_read_kwargs. Use data types from the polars library. For instance: custom_read_kwargs={'schema_overrides': {'col1': polars.Int64, 'col2': polars.Float64}}
Methods
__init__([name, long_name, has_files, ...])
from_yaml(path) – Load a dataset definition from a YAML file.
to_dict(*[, exclude_private, exclude_none]) – Return dictionary representation.
to_yaml(path, *[, exclude_private, exclude_none]) – Save a dataset definition to a YAML file.
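The to_dict() and to_yaml() methods listed above accept exclude_private and exclude_none flags. As a rough illustration of what such flags could do, the sketch below uses a toy dataclass. ToyDefinition and its fields are hypothetical, and the filtering rules (drop keys starting with an underscore, drop values that are None) are an assumption inferred from the flag names, not the documented behavior of DatasetDefinition.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyDefinition:
    """Hypothetical stand-in, not the pymovements class."""

    name: str = "."
    long_name: Optional[str] = None
    _notes: str = "internal"

    def to_dict(self, *, exclude_private=True, exclude_none=True):
        data = asdict(self)
        if exclude_private:
            # Assumption: "private" means the key starts with an underscore.
            data = {k: v for k, v in data.items() if not k.startswith("_")}
        if exclude_none:
            # Assumption: only values that are exactly None are dropped.
            data = {k: v for k, v in data.items() if v is not None}
        return data


compact = ToyDefinition(name="toy").to_dict()
full = ToyDefinition(name="toy").to_dict(exclude_private=False, exclude_none=False)
```

A compact dictionary of this kind keeps a serialized definition short, since unset optional fields do not appear in the output.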
Attributes
filename_format – Regular expression, which will be matched before trying to load the file.
filename_format_schema_overrides – Specifies datatypes of named groups in the filename pattern.
has_resources – Checks for resources in resources.
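As described for the resources field, each resource entry is a dictionary with the keys resource, filename, and md5, and the resource suffix is concatenated with a mirror that ends with ‘/’. The sketch below builds one such entry and checks a payload against its checksum using only the standard library. The mirror URL, file names, payload, and the helper functions are hypothetical and only illustrate the structure.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


# Made-up file content standing in for a downloaded archive.
payload = b"example archive contents"

resource = {
    "resource": "data/gaze.zip",   # hypothetical URL suffix
    "filename": "gaze.zip",        # name under which the file is saved
    "md5": md5_checksum(payload),  # checksum of the file
}

mirror = "https://example.com/dataset/"  # hypothetical mirror, ends with '/'
url = mirror + resource["resource"]


def is_intact(data: bytes, entry: dict) -> bool:
    """Compare the checksum of downloaded bytes with the recorded md5."""
    return md5_checksum(data) == entry["md5"]
```

Because the mirror ends with ‘/’ and the suffix does not start with one, plain string concatenation yields a well-formed URL.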