{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Parsing SR Research EyeLink Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## What you will learn in this tutorial:\n",
    "\n",
    "* how to parse raw eye tracking files created with SR Research EyeLink\n",
    "* how to extract experiment information using patterns\n",
    "* how to create a custom dataset definition to load a complete dataset of multiple files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Preparations\n",
    "\n",
    "We import `pymovements` as the alias `pm` for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymovements as pm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "Let's start by downloading a toy dataset `ToyDatasetEyeLink` that contains `*.asc` files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = pm.Dataset(\"ToyDatasetEyeLink\", path='data/ToyDatasetEyeLink')\n",
    "dataset.download()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "This dataset includes `*.asc` files that store raw eye-tracking data along with synchronization messages. Below, we’ll inspect the files included in the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "asc_files = list(dataset.path.glob('**/*.asc'))\n",
    "asc_files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "Let’s display the first 20 lines of one of the files to get a sense of its structure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!head -n 20 data/ToyDatasetEyeLink/raw/aeye-lab-pymovements-toy-dataset-eyelink-a970d09/raw/subject_1_session_1.asc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "We can see that this file is a converted version of an `*.edf` file created by EyeLink.\n",
    "\n",
    "Let’s try loading one of these files directly using `pm.gaze.from_asc`:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "### Loading eye-tracking data from a file\n",
    "Loading eye-tracking data is straightforward. You can load an `.asc` file with a single call to `pm.gaze.from_asc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "gaze = pm.gaze.from_asc(file=asc_files[0])\n",
    "gaze"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "This function automatically loads the raw eye-tracking data and attempts to infer the experimental settings used.\n",
    "\n",
    "Let’s inspect a few rows from the resulting `GazeDataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "gaze.samples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "We can see that timestamps (column time), pupil diameter (column pupil), and raw pixel coordinates (column pixel) are extracted automatically.\n",
    "\n",
    "Let’s now take a look at the experimental metadata that was retrieved:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "gaze.experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "All relevant experimental metadata have\n",
    " been successfully extracted, such as the eye tracker model and the screen resolution used during recording."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "### Loading eye-tracking data along with SR Research recording messages\n",
    "To extract all `MSG`-prefixed SR Research messages, simply pass `True` to the `pm.gaze.from_asc`. The messages are stored in `gaze.messages`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "gaze = pm.gaze.from_asc(file=asc_files[0], messages=True)\n",
    "gaze.messages"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "We can also control which messages are parsed by specifying them in the `messages` argument. For example, to extract only trial-related messages containing the keyword `TRIAL`, we can do the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "gaze = pm.gaze.from_asc(file=asc_files[0], messages=['TRIAL'])\n",
    "gaze.messages"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "### Defining custom patterns for data extraction\n",
    "\n",
    "Now let’s define our own patterns to extract additional information from the `*.asc` files and add them to the `GazeDataFrame`.\n",
    "We can do this using the parameter `patterns` using `pm.gaze.from_asc`.\n",
    "\n",
    "`patterns` accepts either a list of custom patterns to match additional columns or a key identifying predefined and eye-tracker-specific patterns.\n",
    "\n",
    "Let’s define a set of custom patterns to extract more information from parsed messages and show the resulting `GazeDataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "patterns = [\n",
    "    {\n",
    "        'pattern': 'SYNCTIME_READING_SCREEN',\n",
    "        'column': 'task',\n",
    "        'value': 'reading',\n",
    "    },\n",
    "    {\n",
    "        'pattern': 'SYNCTIME_JUDO',\n",
    "        'column': 'task',\n",
    "        'value': 'judo',\n",
    "    },\n",
    "    r'TRIALID (?P<trial_id>\\d+)',\n",
    "]\n",
    "\n",
    "gaze = pm.gaze.from_asc(file=asc_files[0], patterns=patterns)\n",
    "gaze.samples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "The examples above illustrate that patterns can be defined in different forms. Some patterns simply match a message and assign a fixed column value (see the first pattern above), while others use regular expressions to capture dynamic information—for instance, the `trial_id` in the last pattern.\n",
    "\n",
    "Given the patterns defined above, we can see that the columns for `task` and `trial_id` has been added.\n",
    "\n",
    "The `trial_id` was extracted from messages such as `MSG 2762689 TRIALID 0`, while the task value was obtained from messages like `MSG 2814942 SYNCTIME_JUDO`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "### Writing a DatasetDefinition to parse the complete dataset \n",
    "Let’s create a custom `DatasetDefinition` to load all `asc` files, including the patterns we defined earlier.\n",
    "\n",
    "First we create a `ResourceDefinition` that specifies how we want to load our `asc` files.\n",
    "We can use the `patterns` that we identified and specify them as one of the load keyword arguments (`load_kwargs`).\n",
    "\n",
    "In addition, we also define the filename pattern, which represents subject and session information encoded in the filename.\n",
    "The datatypes of the additional metadata parsed from the filename can be specified via `filename_pattern_schema_overrides`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "resource_definition = pm.ResourceDefinition(\n",
    "    content='gaze',\n",
    "    filename_pattern=r'subject_{subject_id:d}_session_{session_id:d}.asc',\n",
    "    filename_pattern_schema_overrides={\n",
    "        'subject_id': int,\n",
    "        'session_id': int,\n",
    "    },\n",
    "    load_kwargs={\n",
    "        'patterns': patterns,\n",
    "        'schema': {'trial_id': int},\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27",
   "metadata": {},
   "source": [
    "Next, we need to define the experiment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = pm.Experiment(\n",
    "    screen_width_px=1280,\n",
    "    screen_height_px=1024,\n",
    "    screen_width_cm=38,\n",
    "    screen_height_cm=30.2,\n",
    "    distance_cm=68,\n",
    "    origin='lower left',\n",
    "    sampling_rate=1000,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "We now use these do write our `DatasetDefinition`. We choose `ToyDatasetEyeLink` as the name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_definition = pm.DatasetDefinition(\n",
    "    name='ToyDatasetEyeLink',\n",
    "    experiment=experiment,\n",
    "    resources=[resource_definition],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "Let’s initialize a new `Dataset` and load the data using the dataset definition we just set up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = pm.Dataset(\n",
    "    definition=dataset_definition,\n",
    "    path='data/ToyDatasetEyeLink',\n",
    ")\n",
    "dataset.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33",
   "metadata": {},
   "source": [
    "Let’s inspect the first `Gaze` in this dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.gaze[0].samples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35",
   "metadata": {},
   "source": [
    "## What you have learned in this tutorial:\n",
    "\n",
    "* how to handle `*.asc` files\n",
    "* how to create a custom dataset loading all files and parsing custom messages\n",
    "* how to load the dataset into your working memory"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}