Importing experimental data

This notebook illustrates the import of experimental data in larvaworld and the supporting classes and configuration structure.

Initialize the larvaworld registry. This loads some components from disc and builds the rest on the fly.

We also set VERBOSE=1 to get more detailed output.

%load_ext param.ipython
import panel as pn

pn.extension()

# You might have to install this module to run pn.Param
# !pip install jupyter_bokeh

import larvaworld
from larvaworld.lib import util, reg, sim
from larvaworld.lib.reg.generators import LabFormat

# Import the Replay configuration class (for Example III)
from larvaworld.lib.reg.generators import ReplayConf

larvaworld.VERBOSE = 1

# Tutorial safety switches (avoid network / heavy I/O / media generation by default)
RUN_SCHLEYER_IMPORT = False
RUN_SCHLEYER_IMPORT_MERGED = False
RUN_DOWNLOAD_SCHLEYER_EXAMPLE = False  # requires network access
RUN_JOVANIC_IMPORT = False
RUN_JOVANIC_LOAD = False
RUN_PLOTS = False
RUN_REPLAY_VIDEOS = False
RUN_COMBINE_VIDEOS = False

SAVE_MEDIA = False
MEDIA_DIR = "./media/3conditions"

The LabFormat class

Raw data can be of diverse lab-specific formats. We will start with the LabFormat class which supports them.

%params LabFormat

Let’s generate a new instance

lf_new = LabFormat(labID="MyLab")
print(f"An instance of {lf_new.__class__}")


%params lf_new

Stored instances of the LabFormat class are available through the configuration registry.

The registry is retrieved from a dictionary of registry objects by the LabFormat key.

LFreg = reg.conf.LabFormat

Each lab-specific data-format configuration is stored in the registry’s dictionary under a unique ID.

Let’s print the IDs

lfIDs = LFreg.confIDs
print(f"The IDs of the stored configurations of LabFormat class are :{lfIDs}")

# The registry is supported by a nested dictionary :
LFdict = LFreg.dict

# The path where the dictionary is stored:
print(LFreg.path_to_dict)


# The configuration IDs are the keys. They correspond to a nested dictionary :
lfID = lfIDs[0]
lf0_entry = LFdict[lfID]
print()
print(f"An instance of {lf0_entry.__class__.__name__}")

# The configuration dictionary can be retrieved directly by :
lf0_entry2 = LFreg.getID(lfID)
print()
print(lf0_entry == lf0_entry2)

# The configuration object can be retrieved directly by :
lf0 = LFreg.get(lfID)
print(f"The object under the ID : {lfID} is an instance of {lf0.__class__.__name__}")
print()

%params lf0

# The configuration object can be visualized by :
pn.Param(lf0)

# The configuration dictionary can be retrieved directly from the object :
lf0_entry3 = lf0.nestedConf

# As well as the parameter keys
print(lf0.param_keys)
print()

# The path where the lab data are stored:
print(lf0.path)
# print(lf0.raw_folder)
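
The relation between the stored dictionary entries and the configuration objects can be illustrated with a toy registry. This is only a minimal sketch in plain Python: the classes `ToyLabFormat` and `ToyRegistry` are hypothetical stand-ins that mimic the attribute and method names used above (`confIDs`, `getID`, `get`), and they are not larvaworld's actual implementation.

```python
from dataclasses import dataclass, asdict


@dataclass
class ToyLabFormat:
    """Hypothetical stand-in for a lab-format configuration object."""

    labID: str
    dt: float  # tracker timestep in seconds (illustrative value)


class ToyRegistry:
    """Hypothetical registry: plain dictionaries stored under unique IDs."""

    def __init__(self, entries):
        self.dict = dict(entries)

    @property
    def confIDs(self):
        # The configuration IDs are simply the keys of the dictionary
        return list(self.dict)

    def getID(self, id):
        # Return the stored dictionary entry
        return self.dict[id]

    def get(self, id):
        # Build a configuration object from the stored entry
        return ToyLabFormat(**self.dict[id])


toy_reg = ToyRegistry({"LabA": {"labID": "LabA", "dt": 0.1}})
entry = toy_reg.getID("LabA")
obj = toy_reg.get("LabA")

print(toy_reg.confIDs)
# The object can be converted back to the dictionary it was built from
print(asdict(obj) == entry)
```

The point of the sketch is the round trip: the registry stores dictionaries, objects are built from them on request, and an object can be turned back into an equal dictionary.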

Example I : Import datasets

Note : The data imported here are part of the core larvaworld package

# Let's inspect one specific lab-format configuration
id = "Schleyer"
Schleyer_lf = LFreg.get(id)

%params Schleyer_lf.tracker

Raw and imported experimental data, as well as simulated data, are stored at specific locations in the file structure that can be accessed easily. For experimental data, each lab format has its own dedicated directory :

print(f"All data are stored here :\n{larvaworld.DATA_DIR}\n")

print(f"The path to the data of the {id} lab-format :\n{Schleyer_lf.path}\n")

print(
    f"Raw data to be imported should be stored here (if not otherwise specified) :\n{Schleyer_lf.raw_folder}\n"
)

print(
    f"Imported/Processed data will be stored here (if not otherwise specified) :\n{Schleyer_lf.processed_folder}"
)

Now we can import some datasets. This means converting from the native lab-specific data format to the larvaworld format while filtering/selecting specific entries of the data.

Here two cases are illustrated :

Tracks from a single dish
Merged tracks from all dishes under a certain directory

The import returns an instance of LarvaDataset that can then be used.

By default the dataset is not stored to disc, unless we specify save_dataset = True

if RUN_SCHLEYER_IMPORT:
    # Single dish case
    folder = "dish01"
    kws1 = {
        "parent_dir": f"exploration/{folder}",
        "min_duration_in_sec": 90,
        "id": folder,
        "refID": f"exploration.{folder}",
        "group_id": "exploration",
    }

    d1 = Schleyer_lf.import_dataset(**kws1)

if RUN_SCHLEYER_IMPORT_MERGED:
    # Merged case
    N = 40
    kws2 = {
        "parent_dir": "exploration",
        "merged": True,
        "max_Nagents": N,
        "min_duration_in_sec": 120,
        "refID": f"exploration.{N}controls",
        "group_id": "exploration",
    }

    d2 = Schleyer_lf.import_dataset(**kws2)
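
To make the effect of the two selection arguments concrete, here is a rough sketch in plain Python. The helper `select_tracks` and the toy durations are hypothetical. The rule for choosing which tracks survive the `max_Nagents` cap (keeping the longest ones) is an assumption made for illustration and may differ from what larvaworld does.

```python
def select_tracks(durations, min_duration_in_sec, max_Nagents=None):
    """Return IDs of tracks lasting at least `min_duration_in_sec`.

    `durations` maps a track ID to its duration in seconds.
    Assumption: when more than `max_Nagents` tracks qualify, the longest are kept.
    """
    kept = [tid for tid, dur in durations.items() if dur >= min_duration_in_sec]
    kept.sort(key=lambda tid: durations[tid], reverse=True)
    if max_Nagents is not None:
        kept = kept[:max_Nagents]
    return kept


toy_durations = {"larva_0": 45.0, "larva_1": 130.0, "larva_2": 95.0, "larva_3": 180.0}

# Duration threshold only, as in the single-dish case
single_dish = select_tracks(toy_durations, min_duration_in_sec=90)

# Duration threshold plus a cap on the number of tracks, as in the merged case
merged = select_tracks(toy_durations, min_duration_in_sec=120, max_Nagents=1)

print(single_dish)
print(merged)
```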

if RUN_SCHLEYER_IMPORT:
    # d1 exists only if the single-dish import above was run
    print(
        f"The import method returns an instance of {d1.__class__.__name__} having the ID : {d1.id}\n"
    )

    s, e, c = d1.data

    print("The timeseries data (dropping NaNs) : \n")
    print(s.dropna().head())

    print("The endpoint data : \n")
    print(e)

Example II : Import downloaded data

Now we will illustrate the import functionality by downloading a publicly available dataset of Drosophila larva locomotion.

Go to the website below, download the zipped file and extract it into the lab-specific raw folder indicated above.

if RUN_DOWNLOAD_SCHLEYER_EXAMPLE:
    # URL of the repository. Visit for further information.
    link2repo = "https://doi.gin.g-node.org/10.12751/g-node.5e1ifd/"

    # The name of the zipped file to be downloaded.
    filename = "Naive_Locomotion_Drosophila_Larvae.zip"

    # URL of the file.
    link2data = f"https://gin.g-node.org/MichaelSchleyer/Naive_Locomotion_Drosophila_Larvae/src/master/{filename}"

    # Path to extract the downloaded file
    dirname = "naive"
    print(
        f"The path to extract the downloaded file :\n{Schleyer_lf.raw_folder}/{dirname}\n"
    )
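
Once the zipped file has been downloaded manually, the extraction step can also be scripted with the standard library. The following is a minimal sketch: the helper `extract_archive` is hypothetical, and the demonstration builds a tiny archive in a temporary folder instead of touching the real dataset.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract `zip_path` into `target_dir` and return the extracted file paths."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
        names = zf.namelist()
    return [target_dir / name for name in names]


# Self-contained demonstration on a tiny archive built in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    toy_zip = Path(tmp) / "toy_download.zip"
    with zipfile.ZipFile(toy_zip, "w") as zf:
        zf.writestr("dish_A/track_01.txt", "0.0 1.0 2.0\n")

    extracted = extract_archive(toy_zip, Path(tmp) / "raw" / "naive")
    extracted_ok = all(p.exists() for p in extracted)
    print(extracted_ok)
```

For the real dataset, the target would be the path printed by the cell above, and the archive should only be extracted if it comes from a source you trust.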

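if RUN_SCHLEYER_IMPORT:
    # Single dish case
    # The subfolder is defined here as well, so that this cell does not depend on the download cell
    dirname = "naive"
    folder = "box1-2017-05-18_14_48_22"
    id = "imported_single_dish"
    kws = {
        "parent_dir": f"{dirname}/{folder}",
        "min_duration_in_sec": 120,
        "id": id,
        "refID": f"{dirname}.{id}",
        "group_id": dirname,
    }

    d6 = Schleyer_lf.import_dataset(**kws)

    # Sort the larvae by the cum_dur column of the endpoint data
    print(d6.e.cum_dur.sort_values())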

if RUN_SCHLEYER_IMPORT_MERGED:
    # Merged case
    dirname = "naive"  # same subfolder as in the download cell
    N = 50
    kws2 = {
        "parent_dir": dirname,
        "merged": True,
        "max_Nagents": N,
        "min_duration_in_sec": 160,
        "refID": f"{dirname}.{N}controls",
        "group_id": dirname,
    }

    d100 = Schleyer_lf.import_dataset(**kws2)

    d100.e.cum_dur.sort_values()

Example III : Import data of a different format

We will now illustrate the import functionality by importing a set of 3 datasets : Fed, Sucrose and Starved

The 3 animal groups have been subjected to different diets and are therefore in different metabolic states at the time their locomotion is tracked. We want to compare them in order to detect any impact of metabolic state on locomotion.

Note : This example requires data existing in the data/JovanicGroup/raw/ProteinDeprivation folder

Also note that the tracks in the datasets above only include the body’s midline and not its contour.

media_dir = MEDIA_DIR
plot_dir = f"{media_dir}/plots"
video_dir = f"{media_dir}/videos"

# The name of the experiment
exp = "ProteinDeprivation"

# The group IDs
gIDs = ["Fed", "Sucrose", "Starved"]

# The colors per group
palette = {
    "Fed": "black",
    "Sucrose": "red",
    "Starved": "purple",
}

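# Here we configure the import of the data
# Retrieve the lab-format configuration (assuming it is stored under the ID "Jovanic")
Jovanic_lf = LFreg.get("Jovanic")
Jovanic_lf.tracker.dt = 0.1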

constraints = util.AttrDict(
    {
        "match_ids": False,
        "interpolate_ticks": True,
        "min_duration_in_sec": 20,
        "time_slice": (0, 60),
        # 'time_slice':None,
    }
)

enr_kws = util.AttrDict(
    {
        "proc_keys": ["angular", "spatial"],
        "anot_keys": ["bout_detection"],
        "traj2origin": True,
        # 'recompute' : True,
        "tor_durs": [20],
        "dsp_starts": [0],
        "dsp_stops": [40, 60],
    }
)


kws = {
    "parent_dir": exp,
    "source_ids": gIDs,
    "colors": [palette[gID] for gID in gIDs],
    # 'raw_folder': '../raw/',
    # 'proc_folder': processed_data_dir,
    "refIDs": gIDs,
    "merged": False,
    "save_dataset": True,
    "enrich_conf": enr_kws,
    **constraints,
}

The following cell actually imports the datasets.

This step might take a while.

It needs to be performed only once, when converting the datasets from the raw tracker-specific format (contained in the raw folder) to the larvaworld format (stored in the processed folder).

If the datasets have already been imported, they can simply be loaded from the processed folder. In this case, skip the import cell and run the cell after it in order to load them.

# Placeholder so that the following cells can run even when both switches are off
ds = []

if RUN_JOVANIC_IMPORT:
    # Import the datasets (Needs to run only once)
    ds = Jovanic_lf.import_datasets(**kws)

if RUN_JOVANIC_LOAD:
    # Load the datasets (If they have been imported in a previous session)
    ds = [reg.loadRef(id=gID, load=True) for gID in gIDs]
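
The "import once, load afterwards" workflow described above is a common caching pattern. Below is a generic sketch of it in plain Python. The helper `load_or_import`, the JSON cache file and the toy importer are all hypothetical and only illustrate the idea; they say nothing about how larvaworld stores its processed datasets.

```python
import json
import tempfile
from pathlib import Path


def load_or_import(processed_file, import_func):
    """Load a cached result if present; otherwise import it and cache it."""
    processed_file = Path(processed_file)
    if processed_file.exists():
        return json.loads(processed_file.read_text()), "loaded"
    data = import_func()
    processed_file.parent.mkdir(parents=True, exist_ok=True)
    processed_file.write_text(json.dumps(data))
    return data, "imported"


def toy_import():
    # Stand-in for the slow conversion from the raw format
    return {"group": "Fed", "n_tracks": 3}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "processed" / "Fed.json"
    # First call converts and stores, second call only reads the stored copy
    first, first_source = load_or_import(cache, toy_import)
    second, second_source = load_or_import(cache, toy_import)

print(first_source, second_source)
```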

Now that we have the data, we can generate some plots.

We will choose from the available ones :

# The available plots by their unique IDs
reg.graphs.ks

# The keyword arguments for all plots
plot_kws = {"datasets": ds, "save_to": plot_dir, "show": False, "subfolder": None}

if RUN_PLOTS:
    # The trajectories of the larvae
    _ = reg.graphs.run("trajectories", **plot_kws)

if RUN_PLOTS:
    # The trajectories of the larvae aligned at the origin, colored by the respective color of the group
    _ = reg.graphs.run("trajectories", mode="origin", single_color=True, **plot_kws)

if RUN_PLOTS:
    # Boxplot of some endpoint metrics
    _ = reg.graphs.run("endpoint box", **plot_kws)

if RUN_PLOTS:
    # Composite plot summarizing exploration metrics
    _ = reg.graphs.run("exploration summary", **plot_kws)

Let’s say we want to compare the 3 larva groups in terms of their spatial dispersal

We will do this in increasingly elaborate ways :

1. Boxplot of dispersal during the first minute. This will capture only the endpoint situation.
2. Timeplot of dispersal. This will capture the dispersal timecourse (mean and variance).
3. Video of trajectories aligned to originate from the center of the dish.
4. Combined videos of the 3 groups.

if RUN_PLOTS:
    # 1. Boxplots of dispersal (mean, final, maximum) for the first 60 seconds
    _ = reg.graphs.run(
        "endpoint box", ks=["dsp_0_60_mu", "dsp_0_60_fin", "dsp_0_60_max"], **plot_kws
    )
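
The three endpoint keys plotted above can be read as summaries of a dispersal timecourse. Here is a minimal sketch in plain Python, assuming dispersal is the straight-line distance from the larva's position at the start of the time window. The helper `dispersal` and the toy track are hypothetical, and this is not larvaworld's implementation.

```python
import math


def dispersal(xy, dt, t0, t1):
    """Distance of each position from the position at time `t0`, up to time `t1`.

    `xy` is a list of (x, y) positions sampled every `dt` seconds.
    """
    i0, i1 = round(t0 / dt), round(t1 / dt)
    x0, y0 = xy[i0]
    return [math.hypot(x - x0, y - y0) for x, y in xy[i0 : i1 + 1]]


# Toy track sampled at 1 s: moving away from the start, then partly back
toy_xy = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (3.0, 4.0)]
dsp = dispersal(toy_xy, dt=1.0, t0=0, t1=3)

dsp_mu = sum(dsp) / len(dsp)  # mean dispersal over the window
dsp_fin = dsp[-1]  # dispersal at the end of the window
dsp_max = max(dsp)  # maximum dispersal within the window
print(dsp_mu, dsp_fin, dsp_max)
```

In this toy track the final and maximum values differ because the larva turns back, which is exactly the kind of difference a single endpoint value cannot show.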

if RUN_PLOTS:
    # 2. Dispersal of larvae from their starting point. The default time range is 0-40 seconds.
    _ = reg.graphs.run("dispersal", **plot_kws)

if RUN_PLOTS:
    # 2. Dispersal of larvae from their starting point. Now plotting the time range 0-60 seconds.
    _ = reg.graphs.run("dispersal", range=(0, 60), **plot_kws)

if RUN_PLOTS:
    # 2. Summary of dispersal of larvae from their starting point. The default time range is 0-40 seconds.
    _ = reg.graphs.run("dispersal summary", **plot_kws)

if RUN_PLOTS:
    # 2. Summary of dispersal of larvae from their starting point. Now plotting the time range 0-60 seconds.
    _ = reg.graphs.run("dispersal summary", range=(0, 60), **plot_kws)

# 3. Run replay simulations and store videos


# A method that runs the replay simulation
def run_replay(d):
    # The display parameters
    screen_kws = {
        "vis_mode": "video",
        "show_display": False,
        "draw_contour": False,
        "draw_midline": False,
        "draw_centroid": False,
        "visible_trails": True,
        "save_video": SAVE_MEDIA,
        "fps": 1,
        "video_file": d.id,
        "media_dir": video_dir,
    }

    # The replay configuration
    replay_conf = ReplayConf(
        transposition="origin", time_range=(0, 60), track_point=d.c.point_idx
    ).nestedConf

    rep = sim.ReplayRun(
        dataset=d, parameters=replay_conf, id=f"{d.refID}_replay", screen_kws=screen_kws
    )
    # print(rep.refDataset.color)
    _ = rep.run()
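
The replay above is configured to transpose the tracks to the origin. As a rough illustration of what such an alignment means, the sketch below shifts each trajectory so that it starts at (0, 0). It assumes the alignment is a plain subtraction of each track's first position. The helper `align_to_origin` and the toy tracks are hypothetical and do not reproduce larvaworld's code.

```python
def align_to_origin(xy):
    """Shift a trajectory so that its first point lies at (0, 0)."""
    x0, y0 = xy[0]
    return [(x - x0, y - y0) for x, y in xy]


toy_tracks = {
    "larva_0": [(2.0, 1.0), (3.0, 1.5), (4.0, 3.0)],
    "larva_1": [(-5.0, 4.0), (-4.0, 4.0), (-4.0, 6.0)],
}

# After alignment all tracks share the same starting point and can be overlaid
aligned = {tid: align_to_origin(xy) for tid, xy in toy_tracks.items()}
print(aligned["larva_1"])
```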

if RUN_REPLAY_VIDEOS:
    # 3. Run the replay simulation for each dataset
    for d in ds:
        _ = run_replay(d)

if RUN_COMBINE_VIDEOS:
    # 4. Combine the videos
    from larvaworld.lib.util.combining import combine_videos

    combine_videos(file_dir=video_dir, save_as="3conditions.mp4")