{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing experimental data\n",
    "\n",
    "This notebook illustrates the import of experimental data in *larvaworld* and the supporting classes and configuration structure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize the larvaworld registry. This loads some components from disc and builds the rest on the fly.\n",
    "\n",
    "We also set VERBOSE=1 to get more info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext param.ipython\n",
    "import panel as pn\n",
    "\n",
    "pn.extension()\n",
    "\n",
    "# You might have to install this module to run pn.Param\n",
    "# !pip install jupyter_bokeh\n",
    "\n",
    "import larvaworld\n",
    "from larvaworld.lib import util, reg, sim\n",
    "from larvaworld.lib.reg.generators import LabFormat\n",
    "\n",
    "# Import the Replay configuration class (for Example III)\n",
    "from larvaworld.lib.reg.generators import ReplayConf\n",
    "\n",
    "larvaworld.VERBOSE = 1\n",
    "\n",
    "# Tutorial safety switches (avoid network / heavy I/O / media generation by default)\n",
    "RUN_SCHLEYER_IMPORT = False\n",
    "RUN_SCHLEYER_IMPORT_MERGED = False\n",
    "RUN_DOWNLOAD_SCHLEYER_EXAMPLE = False  # requires network access\n",
    "RUN_JOVANIC_IMPORT = False\n",
    "RUN_JOVANIC_LOAD = False\n",
    "RUN_PLOTS = False\n",
    "RUN_REPLAY_VIDEOS = False\n",
    "RUN_COMBINE_VIDEOS = False\n",
    "\n",
    "SAVE_MEDIA = False\n",
    "MEDIA_DIR = \"./media/3conditions\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The LabFormat class\n",
    "\n",
    "Raw data can be of diverse lab-specific formats. We will start with the *LabFormat* class which supports them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%params LabFormat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's generate a new instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lf_new = LabFormat(labID=\"MyLab\")\n",
    "print(f\"An instance of {lf_new.__class__}\")\n",
    "\n",
    "\n",
    "%params lf_new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Stored instances of the *LabFormat* class are available through the configuration registry.\n",
    "\n",
    "The registry is retrieved from a dictionary of registry objects by the *LabFormat* key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "LFreg = reg.conf.LabFormat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each lab-specific data-format configuration is stored in the registry's dictionary under a unique ID.\n",
    "\n",
    "Let's print the IDs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lfIDs = LFreg.confIDs\n",
    "print(f\"The IDs of the stored configurations of LabFormat class are :{lfIDs}\")\n",
    "\n",
    "# The registry is supported by a nested dictionary :\n",
    "LFdict = LFreg.dict\n",
    "\n",
    "# The path where the dictionary is stored:\n",
    "print(LFreg.path_to_dict)\n",
    "\n",
    "\n",
    "# The configuration IDs are the keys. They correspond to a nested dictionary :\n",
    "lfID = lfIDs[0]\n",
    "lf0_entry = LFdict[lfID]\n",
    "print()\n",
    "print(f\"An instance of {lf0_entry.__class__.__name__}\")\n",
    "\n",
    "# The configuration dictionary can be retrieved directly by :\n",
    "lf0_entry2 = LFreg.getID(lfID)\n",
    "print()\n",
    "print(lf0_entry == lf0_entry2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The configuration object can be retrieved directly by :\n",
    "lf0 = LFreg.get(lfID)\n",
    "print(f\"The object under the ID : {lfID} is an instance of {lf0.__class__.__name__}\")\n",
    "print()\n",
    "\n",
    "%params lf0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The configuration object can be visualized by :\n",
    "pn.Param(lf0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The configuration dictionary can be retrieved directly from the object :\n",
    "lf0_entry3 = lf0.nestedConf\n",
    "\n",
    "# As well as the parameter keys\n",
    "print(lf0.param_keys)\n",
    "print()\n",
    "\n",
    "# The path where the lab data are stored:\n",
    "print(lf0.path)\n",
    "# print(lf0.raw_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example I : Import datasets\n",
    "\n",
    "Note : The data imported here are part of the core larvaworld package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's inspect one specific lab-format configuration\n",
    "id = \"Schleyer\"\n",
    "Schleyer_lf = LFreg.get(id)\n",
    "\n",
    "%params Schleyer_lf.tracker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both raw and imported experimental data, as well as the simulated data are stored at a specific location in the filestructure that can be accessed easily. Regarding experimental data, each format has its own dedicated directory :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"All data are stored here :\\n{larvaworld.DATA_DIR}\\n\")\n",
    "\n",
    "print(f\"The path to the data of the {id} lab-format :\\n{Schleyer_lf.path}\\n\")\n",
    "\n",
    "print(\n",
    "    f\"Raw data to be imported should be stored here (if not otherwise specified) :\\n{Schleyer_lf.raw_folder}\\n\"\n",
    ")\n",
    "\n",
    "print(\n",
    "    f\"Imported/Processed data will be stored here (if not otherwise specified) :\\n{Schleyer_lf.processed_folder}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can import some datasets. This means we convert from the native lab-specific data-format to the *larvaworld* format while at the same time filter/select specific entries of the data.\n",
    "\n",
    "Here two cases are illustrated : \n",
    " - Tracks from a single dish\n",
    " - Merged tracks from all dishes inder a certain directory\n",
    "\n",
    "The import returns an instance of *LarvaDataset* that can be then used.\n",
    "\n",
    "By default this is not stored to disc, except if we specify *save_dataset = True*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_SCHLEYER_IMPORT:\n",
    "    # Single dish case\n",
    "    folder = \"dish01\"\n",
    "    kws1 = {\n",
    "        \"parent_dir\": f\"exploration/{folder}\",\n",
    "        \"min_duration_in_sec\": 90,\n",
    "        \"id\": folder,\n",
    "        \"refID\": f\"exploration.{folder}\",\n",
    "        \"group_id\": \"exploration\",\n",
    "    }\n",
    "\n",
    "    d1 = Schleyer_lf.import_dataset(**kws1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_SCHLEYER_IMPORT_MERGED:\n",
    "    # Merged case\n",
    "    N = 40\n",
    "    kws2 = {\n",
    "        \"parent_dir\": \"exploration\",\n",
    "        \"merged\": True,\n",
    "        \"max_Nagents\": N,\n",
    "        \"min_duration_in_sec\": 120,\n",
    "        \"refID\": f\"exploration.{N}controls\",\n",
    "        \"group_id\": \"exploration\",\n",
    "    }\n",
    "\n",
    "    d2 = Schleyer_lf.import_dataset(**kws2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\n",
    "    f\"The import method returns an instance of {d1.__class__.__name__} having the ID : {d1.id}\\n\"\n",
    ")\n",
    "\n",
    "s, e, c = d1.data\n",
    "\n",
    "print(\"The timeseries data (dropping NaNs) : \\n\")\n",
    "s.dropna().head()\n",
    "\n",
    "print(\"The endpoint data : \\n\")\n",
    "e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example II : Import downloaded data\n",
    "\n",
    "Now we will illustrate the import functionality by downloading a publically available dataset of *Drosophila* larva locomotion.\n",
    "\n",
    "Go to the website below, download the zipped file and extract in the lab-specific folder indicated above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_DOWNLOAD_SCHLEYER_EXAMPLE:\n",
    "    # URL of the repository. Visit for further information.\n",
    "    link2repo = \"https://doi.gin.g-node.org/10.12751/g-node.5e1ifd/\"\n",
    "\n",
    "    # The name of the zipped file to be downloaded.\n",
    "    filename = \"Naive_Locomotion_Drosophila_Larvae.zip\"\n",
    "\n",
    "    # URL of the file.\n",
    "    link2data = f\"https://gin.g-node.org/MichaelSchleyer/Naive_Locomotion_Drosophila_Larvae/src/master/{filename}\"\n",
    "\n",
    "    # Path to extract the downloaded file\n",
    "    dirname = \"naive\"\n",
    "    print(\n",
    "        f\"The path to extract the downloaded file :\\n{Schleyer_lf.raw_folder}/{dirname}\\n\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_SCHLEYER_IMPORT:\n",
    "    # Single dish case\n",
    "    folder = \"box1-2017-05-18_14_48_22\"\n",
    "    id = \"imported_single_dish\"\n",
    "    kws = {\n",
    "        \"parent_dir\": f\"{dirname}/{folder}\",\n",
    "        \"min_duration_in_sec\": 120,\n",
    "        \"id\": id,\n",
    "        \"refID\": f\"{dirname}.{id}\",\n",
    "        \"group_id\": dirname,\n",
    "    }\n",
    "\n",
    "    d6 = Schleyer_lf.import_dataset(**kws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d6.e.cum_dur.sort_values()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_SCHLEYER_IMPORT_MERGED:\n",
    "    # Merged case\n",
    "    N = 50\n",
    "    kws2 = {\n",
    "        \"parent_dir\": dirname,\n",
    "        \"merged\": True,\n",
    "        \"max_Nagents\": N,\n",
    "        \"min_duration_in_sec\": 160,\n",
    "        \"refID\": f\"{dirname}.{N}controls\",\n",
    "        \"group_id\": dirname,\n",
    "    }\n",
    "\n",
    "    d100 = Schleyer_lf.import_dataset(**kws2)\n",
    "\n",
    "    d100.e.cum_dur.sort_values()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example III : Import data of a different format\n",
    "\n",
    "We will now illustrate the import functionality by importing a set of 3 datasets : Fed, Sucrose and Starved\n",
    "\n",
    "The 3 animal groups have been subjected two different diets and therefore are in different metabolic state at the moment of tracking their locomotion. We want to compare them in order to detect any impact of metabolic state on locomotion.\n",
    "\n",
    "Note : This example requires data existing in the *data/JovanicGroup/raw/ProteinDeprivation* folder\n",
    "\n",
    "Also note that the tracks in the datasets above only include the body's midline and not its contour."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "media_dir = MEDIA_DIR\n",
    "plot_dir = f\"{media_dir}/plots\"\n",
    "video_dir = f\"{media_dir}/videos\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The name of the experiment\n",
    "exp = \"ProteinDeprivation\"\n",
    "\n",
    "# The group IDs\n",
    "gIDs = [\"Fed\", \"Sucrose\", \"Starved\"]\n",
    "\n",
    "# The colors per group\n",
    "palette = {\n",
    "    \"Fed\": \"black\",\n",
    "    \"Sucrose\": \"red\",\n",
    "    \"Starved\": \"purple\",\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we configure the import of the data\n",
    "Jovanic_lf.tracker.dt = 0.1\n",
    "\n",
    "constraints = util.AttrDict(\n",
    "    {\n",
    "        \"match_ids\": False,\n",
    "        \"interpolate_ticks\": True,\n",
    "        \"min_duration_in_sec\": 20,\n",
    "        \"time_slice\": (0, 60),\n",
    "        # 'time_slice':None,\n",
    "    }\n",
    ")\n",
    "\n",
    "enr_kws = util.AttrDict(\n",
    "    {\n",
    "        \"proc_keys\": [\"angular\", \"spatial\"],\n",
    "        \"anot_keys\": [\"bout_detection\"],\n",
    "        \"traj2origin\": True,\n",
    "        # 'recompute' : True,\n",
    "        \"tor_durs\": [20],\n",
    "        \"dsp_starts\": [0],\n",
    "        \"dsp_stops\": [40, 60],\n",
    "    }\n",
    ")\n",
    "\n",
    "\n",
    "kws = {\n",
    "    \"parent_dir\": exp,\n",
    "    \"source_ids\": gIDs,\n",
    "    \"colors\": [palette[gID] for gID in gIDs],\n",
    "    # 'raw_folder': '../raw/',\n",
    "    # 'proc_folder': processed_data_dir,\n",
    "    \"refIDs\": gIDs,\n",
    "    \"merged\": False,\n",
    "    \"save_dataset\": True,\n",
    "    \"enrich_conf\": enr_kws,\n",
    "    **constraints,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following cell actually imports the datasets. \n",
    "\n",
    "This step might take a while. \n",
    "\n",
    "It needs to be performed once when converting the datasets from the raw tracker-specific format (contained in the *raw* folder) to the larvaworld format (stored in the *processed* folder). \n",
    "\n",
    "If the datasets have already been imported they can just be loaded (from the *processed* folder). In this case you can instead run the next cell in order to load them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_JOVANIC_IMPORT:\n",
    "    # Import the datasets (Needs to run only once)\n",
    "    ds = Jovanic_lf.import_datasets(**kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_JOVANIC_LOAD:\n",
    "    # Load the datasets (If they have been imported in a previous session)\n",
    "    ds = [reg.loadRef(id=gID, load=True) for gID in gIDs]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the data, we can generate some plots.\n",
    "\n",
    "We will choose from the available ones :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The available plots by their unique IDs\n",
    "reg.graphs.ks\n",
    "\n",
    "# The keyword arguments for all plots\n",
    "plot_kws = {\"datasets\": ds, \"save_to\": plot_dir, \"show\": False, \"subfolder\": None}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # The trajectories of the larvae\n",
    "    _ = reg.graphs.run(\"trajectories\", **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # The trajectories of the larvae aligned at the origin, colored by the respective color of the group\n",
    "    _ = reg.graphs.run(\"trajectories\", mode=\"origin\", single_color=True, **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # Boxplot of some endpoint metrics\n",
    "    _ = reg.graphs.run(\"endpoint box\", **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # Composite plot summarizing exploration metrics\n",
    "    _ = reg.graphs.run(\"exploration summary\", **plot_kws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say we want to compare the 3 larva groups in terms of their spatial dispersal\n",
    "\n",
    "We will do this in increasingly elaborate ways :\n",
    "\n",
    "1. boxplot of dispersal during the first minute. This will capture only the endpoint situation\n",
    "2. timeplot of dispersal. This will capture the dispersal timecourse (mean and variance)\n",
    "3. video of trajectories aligned to originate from the center of the dish\n",
    "4. combined videos of the 3 groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # 1. Boxplots of dispersal (mean, final, maximum) for the first 60 seconds\n",
    "    _ = reg.graphs.run(\n",
    "        \"endpoint box\", ks=[\"dsp_0_60_mu\", \"dsp_0_60_fin\", \"dsp_0_60_max\"], **plot_kws\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # 2. Dispersal of larvae from their starting point. The default time range is 0-40 seconds.\n",
    "    _ = reg.graphs.run(\"dispersal\", **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # 2. Dispersal of larvae from their starting point. Now plotting the time range is 0-60 seconds.\n",
    "    _ = reg.graphs.run(\"dispersal\", range=(0, 60), **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # 2. Summary of dispersal of larvae from their starting point. The default time range is 0-40 seconds.\n",
    "    _ = reg.graphs.run(\"dispersal summary\", **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_PLOTS:\n",
    "    # 2. Summary of dispersal of larvae from their starting point. Now plotting the time range is 0-60 seconds.\n",
    "    _ = reg.graphs.run(\"dispersal summary\", range=(0, 60), **plot_kws)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3. Run replay simulations and store videos\n",
    "\n",
    "\n",
    "# A method that runs the replay simulation\n",
    "def run_replay(d):\n",
    "    # The display parameters\n",
    "    screen_kws = {\n",
    "        \"vis_mode\": \"video\",\n",
    "        \"show_display\": False,\n",
    "        \"draw_contour\": False,\n",
    "        \"draw_midline\": False,\n",
    "        \"draw_centroid\": False,\n",
    "        \"visible_trails\": True,\n",
    "        \"save_video\": SAVE_MEDIA,\n",
    "        \"fps\": 1,\n",
    "        \"video_file\": d.id,\n",
    "        \"media_dir\": video_dir,\n",
    "    }\n",
    "\n",
    "    # The replay configuration\n",
    "    replay_conf = ReplayConf(\n",
    "        transposition=\"origin\", time_range=(0, 60), track_point=d.c.point_idx\n",
    "    ).nestedConf\n",
    "\n",
    "    rep = sim.ReplayRun(\n",
    "        dataset=d, parameters=replay_conf, id=f\"{d.refID}_replay\", screen_kws=screen_kws\n",
    "    )\n",
    "    # print(rep.refDataset.color)\n",
    "    _ = rep.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_REPLAY_VIDEOS:\n",
    "    # 3. Run the replay simulation for each dataset\n",
    "    for d in ds:\n",
    "        _ = run_replay(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if RUN_COMBINE_VIDEOS:\n",
    "    # 4. Combine the videos\n",
    "    from larvaworld.lib.util.combining import combine_videos\n",
    "\n",
    "    combine_videos(file_dir=video_dir, save_as=\"3conditions.mp4\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "larvaworld_autoversioning",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}