Model Evaluation
The Eval simulation mode enables you to compare multiple larva models against experimental reference datasets using statistical tests. This is essential for model validation and selection.
Purpose
Use Eval mode to:
✅ Validate models against real experimental data
✅ Select the best model from multiple candidates
✅ Fingerprint behavior across 40+ metrics
✅ Test hypotheses with statistical rigor
For mode comparison, see Simulation Modes.
Quick Start
CLI
larvaworld Eval -refID exploration.30controls --modelIDs explorer navigator
Python
from larvaworld.lib.sim import EvalRun
eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['explorer', 'navigator', 'forager'],
    N=3,  # agents per model (use larger N for real runs)
    screen_kws={},  # headless
)
eval_run.simulate()
eval_run.plot_results(show=False)
Workflow
1. Select Reference Dataset
Reference datasets are experimental recordings imported into Larvaworld.
Available datasets:
from larvaworld.lib import reg
# List all reference datasets
ref_ids = reg.conf.Ref.confIDs
print(f"Available: {len(ref_ids)} reference datasets")
# Inspect a dataset
ref_conf = reg.conf.Ref.getID("exploration.30controls")
print(ref_conf)
Loading a reference:
from larvaworld.lib import reg
ref_dataset = reg.loadRef(id="exploration.30controls", load=True)
print(f"Reference: {ref_dataset.config.refID}")
print(f"Agents: {len(ref_dataset.agent_ids)}")
print(f"Duration: {ref_dataset.config.duration} min")
For details on importing datasets, see Lab-Specific Data Import.
2. Select Models to Compare
Predefined models:
| Model ID | Description |
|---|---|
|  | Baseline exploration |
|  | Odor-guided navigation |
|  | Feeding/foraging |
|  | High-activity forager phenotype |
|  | Low-activity forager phenotype |
|  | Maximal feeding rate |
|  | Feeder-focused behavior |
|  | RL-enhanced navigation |
|  | OSN-based navigation |
Inspect models:
from larvaworld.lib import reg
# List all models
model_ids = reg.conf.Model.confIDs
print(f"Available: {len(model_ids)} models")
# Inspect model configuration
model_conf = reg.conf.Model.getID("explorer")
print(model_conf)
3. Run Evaluation
from larvaworld.lib.sim import EvalRun
eval_run = EvalRun(
    refID='exploration.30controls',  # Reference dataset
    modelIDs=['explorer', 'navigator'],  # Models to compare
    N=3,  # Agents per model (increase for real runs)
    screen_kws={},  # headless
)
# Run simulations
eval_run.simulate()
What happens:
Load the reference dataset.
For each model:
    Run one simulation with N larvae (per model).
    Compute 40+ behavioral metrics.
Compare model distributions to the reference using Kolmogorov-Smirnov (KS) tests.
4. Access Results
# Statistical comparison (endpoint metrics)
print(eval_run.error_dicts["pooled"]["end"])
# Statistical comparison (distribution metrics)
print(eval_run.error_dicts["pooled"]["step"])
# Raw datasets per model
for model_id, datasets in eval_run.model_datasets.items():
    print(f"{model_id}: {len(datasets)} runs")
5. Visualize Results
Statistical Comparison Plots
# Aggregate comparison plots
eval_run.plot_results() # KS D-statistic heatmaps
Generated plots:
KS D-statistic heatmap: Models × Metrics
Box plots: Metric distributions per model
P-value summary: Statistical significance
Model-Specific Visualizations
# Individual model plots
eval_run.plot_models() # Trajectories, distributions
Generated plots per model:
Trajectories: Spatial paths
Angular distributions: Orientation, turns
Spatial distributions: Velocity, dispersal
Bout distributions: Stride/turn/pause durations
Evaluation Metrics
Larvaworld computes 40+ behavioral metrics across three categories:
Endpoint Metrics (Summary Statistics)
| Metric | Description | Unit |
|---|---|---|
| cum_dur | Total duration | s |
| cum_sd | Total distance | m |
| v_mu | Mean linear velocity | mm/s |
| a_mu | Mean linear acceleration | mm/s² |
| av_mu | Mean angular velocity | rad/s |
| fov_mu | Mean forward velocity | mm/s |
| pau_N | Number of pauses | count |
| str_N | Number of strides | count |
| run_N | Number of runs | count |
| str_f | Stride frequency | Hz |
| run_t | Average run duration | s |
| pau_t | Average pause duration | s |
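As a rough illustration of how summary statistics such as cum_dur, cum_sd and v_mu relate to a raw track, here is a minimal sketch. It assumes (x, y) positions in millimetres sampled at a fixed time step. The function name `endpoint_metrics` and the track are hypothetical, and Larvaworld's own definitions (for example scaling or filtering) may differ.

```python
from math import hypot


def endpoint_metrics(xy_mm, dt):
    """Summarise one track of (x, y) positions in mm sampled every dt seconds."""
    if len(xy_mm) < 2:
        raise ValueError("need at least two positions")
    # Distance covered between each pair of consecutive samples.
    steps_mm = [
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(xy_mm, xy_mm[1:])
    ]
    duration_s = dt * len(steps_mm)
    distance_mm = sum(steps_mm)
    return {
        "cum_dur": duration_s,              # total duration, s
        "cum_sd": distance_mm / 1000.0,     # total distance, m
        "v_mu": distance_mm / duration_s,   # mean linear velocity, mm/s
    }


# Hypothetical track: 1 mm per step along x, sampled at 10 Hz.
track = [(float(i), 0.0) for i in range(11)]
metrics = endpoint_metrics(track, dt=0.1)
print(metrics)
```

Each value is a single number per larva, which is why these metrics are compared as one distribution across agents rather than as time series.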
Distribution Metrics (Time-Series)
| Metric | Description |
|---|---|
| angular | Orientation, angular velocity/acceleration |
| spatial | Linear velocity/acceleration distributions |
| dispersal | Spatial spread over time |
| tortuosity | Path straightness (sliding windows) |
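To illustrate what a sliding-window straightness measure can look like, the sketch below uses one common definition of tortuosity: one minus the ratio of net displacement to path length. This is an assumption for illustration, not necessarily the formula Larvaworld uses, and the function names are hypothetical.

```python
from math import hypot


def tortuosity(xy):
    """Return 1 - (net displacement / path length); 0 for a straight path."""
    path = sum(
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(xy, xy[1:])
    )
    if path == 0:
        return 0.0  # stationary segment: treated as straight by convention
    net = hypot(xy[-1][0] - xy[0][0], xy[-1][1] - xy[0][1])
    return 1.0 - net / path


def sliding_tortuosity(xy, window):
    """Tortuosity of every run of `window` consecutive positions."""
    return [
        tortuosity(xy[i:i + window])
        for i in range(len(xy) - window + 1)
    ]


straight = [(float(i), 0.0) for i in range(5)]
# Out-and-back path: it ends where it started, so net displacement is zero.
out_and_back = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
print(tortuosity(straight), tortuosity(out_and_back))
print(sliding_tortuosity(straight, window=3))
```

The window length trades temporal resolution against noise: short windows react to individual turns, while long windows describe the overall path.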
Bout Metrics (Event-Based)
| Metric | Description |
|---|---|
| stride_duration | Distribution of stride durations |
| turn_amplitude | Distribution of turn amplitudes |
| pause_duration | Distribution of pause durations |
| run_distance | Distribution of run distances |
Statistical Testing
Kolmogorov-Smirnov (KS) Test
Purpose: Compare distributions between model and reference.
Null Hypothesis: Model and reference are drawn from the same distribution.
KS D-Statistic: Maximum difference between cumulative distributions.
Formula:
D = max_x |F_model(x) - F_ref(x)|
where F_model(x) is the cumulative distribution of the model and F_ref(x) that of the reference.
Interpretation:
D = 0: Perfect match
D < 0.2: Good match
D > 0.5: Poor match
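To make the D-statistic concrete, here is a minimal pure-Python sketch of the two-sample statistic defined by the formula above. It is not Larvaworld's implementation; the function name `ks_d_statistic` and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def ks_d_statistic(model_values, ref_values):
    """Two-sample KS D: largest gap between the two empirical CDFs."""
    model_sorted = sorted(model_values)
    ref_sorted = sorted(ref_values)
    if not model_sorted or not ref_sorted:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value present in either sample.
    for x in set(model_sorted) | set(ref_sorted):
        f_model = bisect_right(model_sorted, x) / len(model_sorted)
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        d = max(d, abs(f_model - f_ref))
    return d


# Illustrative values only (hypothetical mean velocities in mm/s).
reference = [0.8, 0.9, 1.0, 1.1, 1.2]
identical = ks_d_statistic(reference, reference)
shifted = ks_d_statistic([v + 5.0 for v in reference], reference)
overlapping = ks_d_statistic([1, 2, 3, 4], [3, 4, 5, 6])
print(identical, shifted, overlapping)
```

Identical samples give D = 0 and fully separated samples give D = 1, which are the two ends of the scale described in the interpretation above.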
Computing KS tests manually:
from larvaworld.lib.process.evaluation import eval_fast
# Compare two datasets
# (model_dataset and ref_dataset are assumed to be already-loaded datasets)
ks_results = eval_fast(
    datasets=[model_dataset],
    refDataset=ref_dataset,
    metric_definition="angular",  # or "spatial", "all"
)
print(ks_results['end']) # Endpoint metrics
print(ks_results['step']) # Distribution metrics
Example: Rover vs. Sitter Comparison
from larvaworld.lib.sim import EvalRun
# Compare rover vs sitter models (short demo)
eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['rover', 'sitter'],
    N=3,
    screen_kws={},
)
eval_run.simulate()
# Plot comparison
eval_run.plot_results()
# Access D-statistics
ks_end = eval_run.error_dicts["pooled"]['end']
print("Endpoint KS D-statistics:")
for model, metrics in ks_end.items():
    print(f"\n{model}:")
    for metric, d_stat in metrics.items():
        if d_stat < 0.2:
            print(f"  {metric}: {d_stat:.3f} ✅ (good match)")
        elif d_stat > 0.5:
            print(f"  {metric}: {d_stat:.3f} ❌ (poor match)")
        else:
            print(f"  {metric}: {d_stat:.3f} ⚠️ (intermediate)")
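One simple way to act on these D-statistics is to rank models by their mean D across metrics. The sketch below assumes the results form a nested mapping of model ID to metric to D value, as the loop above implies. The helper `rank_models`, the model names and the numbers are hypothetical placeholders, and averaging D is only one possible selection rule.

```python
from statistics import mean


def rank_models(ks_by_model):
    """Return (model_id, mean_D) pairs sorted from best (lowest) to worst."""
    scores = {
        model_id: mean(metrics.values())
        for model_id, metrics in ks_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder D-statistics, not real evaluation output.
example_ks = {
    "model_a": {"v_mu": 0.10, "str_f": 0.30, "pau_t": 0.20},
    "model_b": {"v_mu": 0.40, "str_f": 0.50, "pau_t": 0.60},
}

ranking = rank_models(example_ks)
best_model, best_score = ranking[0]
print(f"Best match: {best_model} (mean D = {best_score:.2f})")
```

A plain mean weights every metric equally. If some metrics matter more for your hypothesis, a weighted mean or a per-metric threshold may be a better fit.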
Custom Metric Selection
By default, Larvaworld auto-selects metrics based on experiment type. You can customize:
from larvaworld.lib.sim import EvalRun
eval_run = EvalRun(
    refID='exploration.30controls',
    modelIDs=['explorer'],
    duration=1.0,  # short demo
    N=5,
    screen_kws={},
    # Custom metric selection
    # Options: "angular", "spatial", "spatial+angular", "all"
    metric_definition="angular",  # Only angular metrics
)
eval_run.simulate()
Parallelization
EvalRun.simulate() currently runs in a single process. To parallelize, launch multiple EvalRun instances through your own batching (for example shell/xargs or a task runner) and combine the results manually.
Saving Results
# Save evaluation results
eval_run.store()
# Location: DATA/SimGroup/eval_runs/{refID}/
print(f"Saved to: {eval_run.dir}")
# Load later
from larvaworld.lib.sim import EvalRun
eval_run_loaded = EvalRun.load(path=eval_run.dir)
Advanced: Custom Reference Data
You can use your own experimental data:
Step 1: Import Dataset
from larvaworld.lib import reg
lab = reg.gen.LabFormat(labID="Schleyer")
lab.import_dataset(
    parent_dir="exploration",
    merged=True,
    max_Nagents=30,
    min_duration_in_sec=60,
    id="my_experiment",
    refID="my_experiment",
    save_dataset=True,
)
For details, see Lab-Specific Data Import.
Step 2: Process Dataset
dataset = reg.loadRef(id="my_experiment", load=True)
dataset.preprocess(filter_f=3.0)
dataset.process(proc_keys=["angular", "spatial"])
dataset.annotate(
    anot_keys=["bout_detection", "bout_distribution", "interference"]
)
Step 3: Evaluate Against Custom Reference
from larvaworld.lib.sim import EvalRun

eval_run = EvalRun(
    refID='my_experiment',
    modelIDs=['explorer', 'navigator'],
    duration=5.0,
)
eval_run.simulate()