Model Evaluation

The Eval simulation mode enables you to compare multiple larva models against experimental reference datasets using statistical tests. This is essential for model validation and selection.


Purpose

Use Eval mode to:

  • Validate models against real experimental data

  • Select best model from multiple candidates

  • Behavioral fingerprinting across 40+ metrics

  • Hypothesis testing with statistical rigor

For mode comparison, see Simulation Modes.


Quick Start

CLI

larvaworld Eval -refID exploration.30controls --modelIDs explorer navigator

Python

from larvaworld.lib.sim import EvalRun

eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['explorer', 'navigator', 'forager'],
    N=3,                  # agents per model (use larger N for real runs)
    screen_kws={},        # headless
)
eval_run.simulate()
eval_run.plot_results(show=False)

Workflow

1. Select Reference Dataset

Reference datasets are experimental recordings imported into Larvaworld.

Available datasets:

from larvaworld.lib import reg

# List all reference datasets
ref_ids = reg.conf.Ref.confIDs
print(f"Available: {len(ref_ids)} reference datasets")

# Inspect a dataset
ref_conf = reg.conf.Ref.getID("exploration.30controls")
print(ref_conf)

Loading a reference:

from larvaworld.lib import reg

ref_dataset = reg.loadRef(id="exploration.30controls", load=True)
print(f"Reference: {ref_dataset.config.refID}")
print(f"Agents: {len(ref_dataset.agent_ids)}")
print(f"Duration: {ref_dataset.config.duration} min")

For details on importing datasets, see Lab-Specific Data Import.


2. Select Models to Compare

Predefined models:

Model ID

Description

'explorer'

Baseline exploration

'navigator'

Odor-guided navigation

'forager'

Feeding/foraging

'rover'

High-activity forager phenotype

'sitter'

Low-activity forager phenotype

'max_forager'

Maximal feeding rate

'max_feeder'

Feeder-focused behavior

'RLnavigator'

RL-enhanced navigation

'OSNnavigator'

OSN-based navigation

Inspect models:

from larvaworld.lib import reg

# List all models
model_ids = reg.conf.Model.confIDs
print(f"Available: {len(model_ids)} models")

# Inspect model configuration
model_conf = reg.conf.Model.getID("explorer")
print(model_conf)

3. Run Evaluation

from larvaworld.lib.sim import EvalRun

eval_run = EvalRun(
    refID='exploration.30controls',          # Reference dataset
    modelIDs=['explorer', 'navigator'],      # Models to compare
    N=3,                                     # Agents per model (increase for real)
    screen_kws={},                           # headless
)

# Run simulations
eval_run.simulate()

What happens:

  1. Load reference dataset

  2. For each model:

    • Run one simulation with N larvae (per model)

    • Compute 40+ behavioral metrics

  3. Compare model distributions to reference using Kolmogorov-Smirnov (KS) tests


4. Access Results

# Statistical comparison (endpoint metrics)
print(eval_run.error_dicts["pooled"]["end"])

# Statistical comparison (distribution metrics)
print(eval_run.error_dicts["pooled"]["step"])

# Raw datasets per model
for model_id, datasets in eval_run.model_datasets.items():
    print(f"{model_id}: {len(datasets)} runs")

5. Visualize Results

Statistical Comparison Plots

# Aggregate comparison plots
eval_run.plot_results()  # KS D-statistic heatmaps

Generated plots:

  • KS D-statistic heatmap: Models × Metrics

  • Box plots: Metric distributions per model

  • P-value summary: Statistical significance

Model-Specific Visualizations

# Individual model plots
eval_run.plot_models()  # Trajectories, distributions

Generated plots per model:

  • Trajectories: Spatial paths

  • Angular distributions: Orientation, turns

  • Spatial distributions: Velocity, dispersal

  • Bout distributions: Stride/turn/pause durations


Evaluation Metrics

Larvaworld computes 40+ behavioral metrics across three categories:

Endpoint Metrics (Summary Statistics)

Metric

Description

Unit

cum_dur

Total duration

s

cum_sd

Total distance

m

v_mu

Mean linear velocity

mm/s

a_mu

Mean linear acceleration

mm/s²

av_mu

Mean angular velocity

rad/s

fov_mu

Mean forward velocity

mm/s

pau_N

Number of pauses

count

str_N

Number of strides

count

run_N

Number of runs

count

str_f

Stride frequency

Hz

run_t

Average run duration

s

pau_t

Average pause duration

s

Distribution Metrics (Time-Series)

Metric

Description

angular

Orientation, angular velocity/acceleration

spatial

Linear velocity/acceleration distributions

dispersal

Spatial spread over time

tortuosity

Path straightness (sliding windows)

Bout Metrics (Event-Based)

Metric

Description

stride_duration

Distribution of stride durations

turn_amplitude

Distribution of turn amplitudes

pause_duration

Distribution of pause durations

run_distance

Distribution of run distances


Statistical Testing

Kolmogorov-Smirnov (KS) Test

Purpose: Compare distributions between model and reference.

Null Hypothesis: Model and reference are drawn from the same distribution.

KS D-Statistic: Maximum difference between cumulative distributions.

  • Formula: D = max_x |F_model(x) - F_ref(x)|

  • Where F_model(x) is the cumulative distribution of the model and F_ref(x) that of the reference.

Interpretation:

  • D = 0: Perfect match

  • D < 0.2: Good match

  • D > 0.5: Poor match

Computing KS tests manually:

from larvaworld.lib.process.evaluation import eval_fast

# Compare two datasets
ks_results = eval_fast(
    datasets=[model_dataset],
    refDataset=ref_dataset,
    metric_definition="angular"  # or "spatial", "all"
)

print(ks_results['end'])  # Endpoint metrics
print(ks_results['step'])  # Distribution metrics

Example: Rover vs. Sitter Comparison

from larvaworld.lib.sim import EvalRun

# Compare rover vs sitter models (short demo)
eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['rover', 'sitter'],
    N=3,
    screen_kws={},
)

eval_run.simulate()

# Plot comparison
eval_run.plot_results()

# Access D-statistics
ks_end = eval_run.error_dicts["pooled"]['end']
print("Endpoint KS D-statistics:")
for model, metrics in ks_end.items():
    print(f"\n{model}:")
    for metric, d_stat in metrics.items():
        if d_stat < 0.2:
            print(f"  {metric}: {d_stat:.3f} ✅ (good match)")
        else:
            print(f"  {metric}: {d_stat:.3f} ❌ (poor match)")

Custom Metric Selection

By default, Larvaworld auto-selects metrics based on experiment type. You can customize:

from larvaworld.lib.sim import EvalRun

eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['explorer'],
    duration=1.0,  # short demo
    N=5,
    screen_kws={},

    # Custom metric selection
    metric_definition="angular"  # Only angular metrics
    # Options: "angular", "spatial", "spatial+angular", "all"
)

eval_run.simulate()

Parallelization

Currently EvalRun.simulate() runs single-process. For parallel runs, launch multiple EvalRun instances via your own batching (e.g., shell/xargs or a task runner) and combine results manually.


Saving Results

# Save evaluation results
eval_run.store()

# Location: DATA/SimGroup/eval_runs/{refID}/
print(f"Saved to: {eval_run.dir}")

# Load later
from larvaworld.lib.sim import EvalRun
eval_run_loaded = EvalRun.load(path=eval_run.dir)

Advanced: Custom Reference Data

You can use your own experimental data:

Step 1: Import Dataset

from larvaworld.lib import reg

lab = reg.gen.LabFormat(labID="Schleyer")
lab.import_dataset(
    parent_dir="exploration",
    merged=True,
    max_Nagents=30,
    min_duration_in_sec=60,
    id="my_experiment",
    refID="my_experiment",
    save_dataset=True,
)

For details, see Lab-Specific Data Import.

Step 2: Process Dataset

dataset = reg.loadRef(id="my_experiment", load=True)

dataset.preprocess(filter_f=3.0)
dataset.process(proc_keys=["angular", "spatial"])
dataset.annotate(
    anot_keys=["bout_detection", "bout_distribution", "interference"]
)

Step 3: Evaluate Against Custom Reference

eval_run = EvalRun(
    refID='my_experiment',
    modelIDs=['explorer', 'navigator'],
    duration=5.0
)
eval_run.simulate()