Data Processing

Larvaworld provides a comprehensive data processing pipeline for trajectory analysis. All processing operates on LarvaDataset objects.


Processing Pipeline

Raw Data → Preprocess → Process → Annotate → Plot/Analyze

1. Preprocessing

Purpose: Clean and standardize raw trajectories

dataset.preprocess(
    # Note: drop_collisions requires a "collision_flag" column (typically present in imported datasets)
    drop_collisions=False,
    interpolate_nans=True,     # Fill missing data
    filter_f=3.0,              # Low-pass filter at 3 Hz
    rescale_by=0.001,          # Scale (e.g., mm to m)
    transposition="center"     # Center trajectories
)

Available Options

Parameter

Description

Default

drop_collisions

Remove frames with collisions

False

interpolate_nans

Interpolate missing values

False

filter_f

Low-pass filter cutoff (Hz)

None

rescale_by

Scale factor

None

transposition

Alignment mode ("center", "origin", "arena")

None


2. Processing

Purpose: Compute behavioral metrics

dataset.process(
    proc_keys=["angular", "spatial"],
    dsp_starts=[0],
    dsp_stops=[40, 60],
    tor_durs=[5, 10, 20]
)

Processing Categories

Angular Metrics

proc_keys = ["angular"]

Computes:

  • Orientation angle

  • Angular velocity

  • Angular acceleration

  • Head direction

Spatial Metrics

proc_keys = ["spatial"]

Computes:

  • Linear velocity

  • Linear acceleration

  • Cumulative distance

Forward Components

Forward/sideways components are part of the spatial metrics computed when "spatial" is included in proc_keys. They are not a separate proc_keys entry.

Dispersal

Dispersal metrics are always computed for the combinations of dsp_starts and dsp_stops passed to process:

dataset.process(
    proc_keys=["angular", "spatial"],
    dsp_starts=[0],       # Start times (s)
    dsp_stops=[40, 60],   # Stop times (s)
)

This adds endpoint columns such as dispersion_0_60_mean / std / max and their scaled counterparts (e.g. scaled_dispersion_0_60_mean).

Tortuosity

Tortuosity metrics are controlled by tor_durs:

dataset.process(
    proc_keys=["angular", "spatial"],
    tor_durs=[5, 10, 20],  # Window durations (s)
)

This adds endpoint columns such as tortuosity_5_mean / std, tortuosity_10_mean / std, etc.


3. Annotation

Purpose: Detect behavioral events

dataset.annotate(
    anot_keys=[
        "bout_detection",
        "bout_distribution",
        "interference"
    ]
)

Annotation Types

Bout Detection

anot_keys = ["bout_detection"]

Detects:

  • Strides: Individual peristaltic waves

  • Runs: Chains of strides

  • Pauses: Immobility epochs

  • Turns: Reorientation maneuvers

Bout Distribution

anot_keys = ["bout_distribution"]

Computes:

  • Distribution fitting (exponential, power-law)

  • Duration/length statistics

Interference

anot_keys = ["interference"]

Analyzes:

  • Crawl-turn coupling

  • Phase relationships


Data Structure

LarvaDataset

dataset = run.datasets[0]

# Endpoint data (summary per larva)
print(dataset.e)         # Pandas DataFrame

# Step-wise data (time-series)
print(dataset.s)         # Pandas DataFrame

# Configuration
print(dataset.c)         # AttrDict

Endpoint Metrics

dataset.e.columns

Available:

  • cum_dur: Total duration (s)

  • velocity_mean, scaled_velocity_mean: Mean speed (and scaled variant)

  • dispersion_0_60_mean: Example dispersal metric (if computed)

  • tortuosity_5_mean: Example tortuosity metric (if computed)

Step-wise Data

dataset.s.columns

Available:

  • x, y: Position

  • bend, front_orientation, rear_orientation: Angular kinematics

  • velocity, scaled_velocity: Speed (and scaled variant)


Example Workflow

from larvaworld.lib.sim import ExpRun

# Run experiment
run = ExpRun(experiment="dish", N=3, duration=1.0, screen_kws={}, store_data=False)
run.simulate()

# Get dataset
dataset = run.datasets[0]

# 1. Preprocess
dataset.preprocess(
    interpolate_nans=True,
    filter_f=3.0,
    transposition="center"
)

# 2. Process
dataset.process(
    proc_keys=["angular", "spatial"],
    dsp_starts=[0],
    dsp_stops=[40, 60],
    tor_durs=[5, 10],
)

# 3. Annotate
dataset.annotate(
    anot_keys=["bout_detection", "bout_distribution"]
)

# 4. Analyze
print("=== Summary Statistics ===")
cols = [c for c in ["cum_dur", "velocity_mean", "dispersion_0_60_mean", "tortuosity_5_mean"] if c in dataset.e.columns]
print(dataset.e[cols].describe() if cols else dataset.e.describe())

Saving Processed Data

from larvaworld.lib import reg

# Save dataset to HDF5 and register as a reference ID
dataset.save(refID="my_experiment")

# Load later
dataset = reg.loadRef(id="my_experiment", load=True)