BSCII
- class pymovements.datasets.BSCII(name: str = 'BSCII', *, long_name: str = 'Beijing Sentence Corpus II', mirrors: dict[str, Sequence[str]] = <factory>, resources: ResourceDefinitions = <factory>, experiment: Experiment | None = <factory>, extract: dict[str, bool] | None = None, custom_read_kwargs: dict[str, dict[str, Any]] | None = None, column_map: dict[str, str] | None = None, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = None, pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None, filename_format: dict[str, str] | None = None, filename_format_schema_overrides: dict[str, dict[str, type]] | None = None)
BSCII dataset [Yan et al., 2025].
The Beijing Sentence Corpus II (BSCII) is a Traditional Chinese sentence corpus of eye-tracking data, based on the original Beijing Sentence Corpus (BSC) in Simplified Chinese. Data was collected from 60 native Traditional Chinese readers. The corpus enables analyses of the effects of word frequency, visual complexity, and predictability on fixation location and duration.
Since the BSCII sentences are nearly identical to those in the BSC, the two corpora together provide a valuable resource for studying cross-script similarities and differences between Simplified and Traditional Chinese.
Eye movements were recorded with an EyeLink 1000 system at 1000 Hz.
Check the respective paper for details [Yan et al., 2025].
- resources
A list of dataset gaze_resources. Each list entry must be a dictionary with the following keys:
  - resource: The URL suffix of the resource. This will be concatenated with the mirror.
  - filename: The filename under which the file is saved.
  - md5: The MD5 checksum of the respective file.
- Type: ResourceDefinitions
- filename_format
Regular expression that is matched before trying to load the file. Named groups will appear in the fileinfo dataframe.
- filename_format_schema_overrides
If named groups are present in the filename_format, this makes it possible to cast specific named groups to a particular datatype.
- trial_columns
The names of the trial columns in the input data frame. If the list is empty or None, the input data frame is assumed to contain only one trial. If the list is not empty, the input data frame is assumed to contain multiple trials, and the transformation methods will be applied to each trial separately.
- column_map
The keys are the columns to read, the values are the names to which they should be renamed.
- custom_read_kwargs
If specified, these keyword arguments will be passed to the file reading function. (default: None)
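As a rough illustration of how filename_format and filename_format_schema_overrides relate, the sketch below uses only Python's standard re module. The pattern, the file name, and the helper parse_filename are hypothetical; they are not taken from the BSCII definition or from pymovements internals. The signature types both parameters as dictionaries, while the sketch uses a single pattern and a single override mapping for brevity.

```python
import re

# Hypothetical pattern and overrides, for illustration only; the real BSCII
# definition may use a different filename layout.
filename_format = r"subject_(?P<subject_id>\d+)_session_(?P<session>\d+)\.csv"
schema_overrides = {"subject_id": int, "session": int}


def parse_filename(filename, pattern, overrides):
    """Return the named groups of a matching filename, cast per overrides."""
    match = re.fullmatch(pattern, filename)
    if match is None:
        return None  # a file that does not match the pattern yields no entry
    groups = match.groupdict()
    # Named groups are captured as strings; cast only those with an override.
    return {key: overrides.get(key, str)(value) for key, value in groups.items()}


info = parse_filename("subject_07_session_2.csv", filename_format, schema_overrides)
print(info)
```

In this sketch each named group becomes one key of the returned dictionary, which loosely mirrors how named groups become columns of the fileinfo dataframe.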
Examples
Initialize your Dataset object with the BSCII definition:
>>> import pymovements as pm
>>>
>>> dataset = pm.Dataset("BSCII", path='data/BSCII')
Download the dataset resources:
>>> dataset.download()
Load the data into memory:
>>> dataset.load()
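To make the role of the resource, filename, and md5 keys in a resources entry more concrete, here is a minimal sketch using only the standard library. The mirror URL, the entry values, and the helper md5_of_file are assumptions made for illustration; this is not how pymovements implements its download step.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical mirror and resource entry with the three documented keys.
content = b"trial,time\n1,0\n"
mirror = "https://example.org/"
entry = {
    "resource": "data/example.csv",
    "filename": "example.csv",
    "md5": hashlib.md5(content).hexdigest(),
}

# The resource suffix is concatenated with the mirror to form the URL.
url = mirror + entry["resource"]

# Stand-in for a download: write the bytes locally, then verify the checksum.
with tempfile.TemporaryDirectory() as tmp:
    local_file = Path(tmp) / entry["filename"]
    local_file.write_bytes(content)
    is_valid = md5_of_file(local_file) == entry["md5"]
```

A mismatch between the computed digest and the md5 value would indicate a corrupted or incomplete file.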
Methods
__init__([name, long_name, mirrors, ...])
from_yaml(path): Load a dataset definition from a YAML file.
to_dict(*[, exclude_private, exclude_none]): Return dictionary representation.
to_yaml(path, *[, exclude_private, exclude_none]): Save a dataset definition to a YAML file.
Attributes
acceleration_columns
distance_column
extract
has_resources: Checks for resources in resources.
pixel_columns
position_columns
time_column
time_unit
velocity_columns
mirrors
experiment