larvaworld.lib.process.dataset

Basic classes for larvaworld-format datasets

Classes

`DatasetConfig`	The configuration of a LarvaDataset.
`ParamLarvaDataset`	Base class for named objects that support Parameters and message formatting.
`BaseLarvaDataset`	Base class for named objects that support Parameters and message formatting.
`LarvaDataset`	Base class for named objects that support Parameters and message formatting.
`LarvaDatasetCollection`

Module Contents

class larvaworld.lib.process.dataset.DatasetConfig(**kwargs: Any)

Bases: larvaworld.lib.param.RuntimeDataOps, larvaworld.lib.param.SimMetricOps, larvaworld.lib.param.SimTimeOps

The configuration of a LarvaDataset.

Nticks

refID

group_id

color

env_params

larva_group

agent_ids

N

sample

filtered_at

rescaled_by

pooled_cycle_curves

bout_distros

intermitter

modelConfs

EEB_poly1d

property h5_kdic: Returns the keys of the h5 file that store the parameters of the dataset

update_Nagents()

property arena_vertices

get_sample_bout_distros(m)

class larvaworld.lib.process.dataset.ParamLarvaDataset(**kwargs: Any)

Bases: param.Parameterized

Base class for named objects that support Parameters and message formatting.

Automatic object naming: Every Parameterized instance has a name parameter. If the user doesn’t designate a name=<str> argument when constructing the object, the object will be given a name consisting of its class name followed by a unique 5-digit number.

Automatic parameter setting: The Parameterized __init__ method will automatically read the list of keyword parameters. If any keyword matches the name of a Parameter (see Parameter class) defined in the object’s class or any of its superclasses, that parameter in the instance will get the value given as a keyword argument. For example:

class Foo(Parameterized):
    xx = Parameter(default=1)

foo = Foo(xx=20)

in this case foo.xx gets the value 20.

When initializing a Parameterized instance (‘foo’ in the example above), the values of parameters can be supplied as keyword arguments to the constructor (using parametername=parametervalue); these values will override the class default values for this one instance.

If no ‘name’ parameter is supplied, self.name defaults to the object’s class name with a unique number appended to it.

Message formatting: Each Parameterized instance has several methods for optionally printing output. This functionality is based on the standard Python ‘logging’ module: the methods provided here wrap calls to the ‘logging’ module’s root logger and prepend each message with information about the instance from which the call was made. For more information on how to set the global logging level and change the default message prefix, see the documentation for the ‘logging’ module.

config

step_data

endpoint_data

config2

epoch_dict

larva_dicts

validate_IDs()

update_ids_in_data()

update_Nticks()

property c

property ids

property s

property e

property end_ps

property step_ps

property end_ks

property step_ks

property min_tick

timeseries_slice(time_range=None, df=None)

required()

valid(returned=None)

data_exists(ks=[], ps=[], eks=[], eps=[], config_attrs=[], attrs=[])

property chunk_dicts

property epoch_dicts

property fitted_epochs

property pooled_epochs

property cycle_curves

property pooled_cycle_curves

track_par_in_chunk(chunk, par)

epochs_pose_by_ID(chunk, id)

epochs_bearing_by_ID(chunk, id, loc=(0.0, 0.0))

epoch_durs(epochs)

epoch_amps(epochs, a)

epoch_maxs(epochs, a)

epoch_idx(epochs)

comp_chunk_bearing(chunk)

detect_epochs(idx, min_dur=None)

detect_runs(a, vel_thr=0.3, min_dur=0.5)

Annotates crawl-runs in timeseries.

Parameters

a (array): 1D np.array, forward velocity timeseries
vel_thr (float): Maximum velocity threshold
min_dur (float, optional): The minimum required duration for a run

Returns

runs (list): A list of pairs of the start-end indices of the runs.
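
The thresholding described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper names `epochs_from_mask` and `detect_runs_sketch`, the explicit `dt` argument (sampling interval in seconds), and the convention that a run is any stretch where velocity is at or above `vel_thr` are all assumptions.

```python
import numpy as np


def epochs_from_mask(mask, dt, min_dur=None):
    """Return inclusive (start, stop) index pairs of consecutive True samples."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so epochs touching either end of the series are closed.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    d = np.diff(padded)
    starts = np.where(d == 1)[0]
    stops = np.where(d == -1)[0] - 1
    epochs = []
    for s0, s1 in zip(starts, stops):
        dur = (s1 - s0 + 1) * dt  # duration in seconds (assumed definition)
        if min_dur is None or dur >= min_dur:
            epochs.append((int(s0), int(s1)))
    return epochs


def detect_runs_sketch(a, dt, vel_thr=0.3, min_dur=0.5):
    """Hypothetical stand-in: runs are epochs with velocity >= vel_thr."""
    return epochs_from_mask(np.asarray(a, dtype=float) >= vel_thr, dt, min_dur)


velocity = [0.0, 0.5, 0.6, 0.7, 0.1, 0.4, 0.0, 0.5, 0.5, 0.5, 0.5]
runs = detect_runs_sketch(velocity, dt=0.25)
```

With `dt=0.25` and the default `min_dur=0.5`, the single-sample excursion at index 5 is too short to count as a run, so only the two longer stretches remain.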

detect_pauses(a, vel_thr=0.3, runs=None, min_dur=None)

Annotates crawl-pauses in timeseries.

Parameters

a (array): 1D np.array, forward velocity timeseries
vel_thr (float): Maximum velocity threshold
runs (list): A list of pairs of the start-end indices of the runs. If provided, pauses that overlap with runs will be excluded.
min_dur (float, optional): The minimum required duration for a pause

Returns

pauses (list): A list of pairs of the start-end indices of the pauses.
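
The exclusion of pauses that overlap with runs can be illustrated as follows. The function name `exclude_overlapping` is hypothetical, and the sketch assumes that both lists hold inclusive (start, stop) index pairs and that sharing even a single tick counts as overlap.

```python
def exclude_overlapping(pauses, runs):
    """Keep only pauses whose inclusive index range shares no tick with any run."""
    return [
        (p0, p1)
        for p0, p1 in pauses
        # Two inclusive ranges overlap when each one starts before the other ends.
        if not any(p0 <= r1 and r0 <= p1 for r0, r1 in runs)
    ]


candidate_pauses = [(0, 1), (4, 6), (9, 12)]
runs = [(2, 4), (7, 8)]
pauses = exclude_overlapping(candidate_pauses, runs)
```

Here the candidate pause (4, 6) is dropped because it shares tick 4 with the run (2, 4).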

detect_strides(a, vel_thr=0.3, stretch=(0.75, 2.0), fr=None, return_extrema=True)

Annotates strides-runs and pauses in timeseries.

Parameters

a (array): 1D np.array, forward velocity timeseries
vel_thr (float): Maximum velocity threshold
stretch (Tuple[float, float]): The min-max stretch of a stride relative to the default derived from the dominant frequency
fr (float, optional): The dominant crawling frequency.
return_extrema (bool): Whether to additionally return the stride extrema

Returns

strides (list): A list of pairs of the start-end indices of the strides.
i_min (array): Indices of the local minima.
i_max (array): Indices of the local maxima.
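
One plausible reading of the parameters above is that a stride spans two consecutive velocity minima, contains a maximum above `vel_thr`, and lasts between `stretch[0]` and `stretch[1]` times the nominal period `1 / fr`. The sketch below implements that reading with NumPy only. The function name, the explicit `dt` argument and the neighbour-comparison extrema detection are assumptions, not the library's method.

```python
import numpy as np


def detect_strides_sketch(a, dt, fr, vel_thr=0.3, stretch=(0.75, 2.0)):
    """Hypothetical stride detection between consecutive local minima."""
    a = np.asarray(a, dtype=float)
    # Interior local extrema, found by comparing each sample with its neighbours.
    i = np.arange(1, len(a) - 1)
    i_min = i[(a[i] < a[i - 1]) & (a[i] <= a[i + 1])]
    i_max = i[(a[i] > a[i - 1]) & (a[i] >= a[i + 1])]
    i_max = i_max[a[i_max] >= vel_thr]  # keep only sufficiently fast peaks
    # Allowed stride length in ticks, relative to the nominal period 1 / fr.
    n_lo, n_hi = stretch[0] / (fr * dt), stretch[1] / (fr * dt)
    strides = []
    for s0, s1 in zip(i_min[:-1], i_min[1:]):
        has_peak = np.any((i_max > s0) & (i_max < s1))
        if has_peak and n_lo <= s1 - s0 <= n_hi:
            strides.append((int(s0), int(s1)))
    return strides, i_min, i_max


dt = 0.1
t = np.arange(40) * dt
a = 0.5 - 0.5 * np.cos(2 * np.pi * 1.0 * t)  # synthetic 1 Hz crawling velocity
strides, i_min, i_max = detect_strides_sketch(a, dt=dt, fr=1.0)
```

For this synthetic 1 Hz signal sampled at 10 Hz, each detected stride is 10 ticks long, which falls inside the allowed range of 7.5 to 20 ticks.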

detect_stridechains(strides)

Annotates stridechains-runs by concatenating consecutive strides.

Parameters

strides (array): 2D np.array, the start-end ticks of the stride epochs

Returns

runs (list): A list of pairs of the start-end indices of the runs/stridechains.
run_counts (list): Stride-counts of the runs/stridechains.
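
Concatenating consecutive strides into stridechains can be sketched in a few lines. The name `detect_stridechains_sketch` is hypothetical, and the sketch assumes that two strides are consecutive when the stop tick of one equals the start tick of the next.

```python
def detect_stridechains_sketch(strides):
    """Merge back-to-back strides into chains and count strides per chain."""
    runs, run_counts = [], []
    for s0, s1 in strides:
        if runs and runs[-1][1] == s0:
            # This stride starts where the previous chain ended: extend it.
            runs[-1][1] = s1
            run_counts[-1] += 1
        else:
            runs.append([s0, s1])
            run_counts.append(1)
    return runs, run_counts


runs, run_counts = detect_stridechains_sketch([(10, 20), (20, 30), (45, 55)])
```

The first two strides share tick 20 and merge into one chain of two strides, while the third stride forms a chain of its own.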

detect_turns(a, min_dur=None)

Annotates turns in timeseries.

Parameters

a (array): 1D np.array, angular velocity timeseries
min_dur (float, optional): The minimum required duration for a turn

Returns

Lturns (list): A list of pairs of the start-end indices of the Left turns.
Rturns (list): A list of pairs of the start-end indices of the Right turns.
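
A sign-based reading of turn detection is sketched below using only the standard library. The function name, the explicit `dt` argument and the convention that positive angular velocity means a left turn are assumptions made for illustration.

```python
from itertools import groupby


def detect_turns_sketch(a, dt, min_dur=None):
    """Split an angular velocity series into left (+) and right (-) turn epochs."""
    left, right = [], []
    i = 0
    # Group consecutive samples that share the same sign (-1, 0 or +1).
    for sign, group in groupby(a, key=lambda v: (v > 0) - (v < 0)):
        n = len(list(group))
        long_enough = min_dur is None or n * dt >= min_dur
        if sign != 0 and long_enough:
            (left if sign > 0 else right).append((i, i + n - 1))
        i += n
    return left, right


ang_vel = [1.0, 2.0, -1.0, -1.0, -1.0, 0.0, 3.0]
Lturns, Rturns = detect_turns_sketch(ang_vel, dt=0.1, min_dur=0.15)
```

The final single-sample positive excursion lasts only 0.1 s, so it is discarded by the `min_dur` filter.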

crawl_annotation(strides_enabled: bool = True, vel_thr: float = 0.3) → larvaworld.lib.util.AttrDict

turn_annotation(min_dur=None)

turn_mode_annotation()

patch_residency_annotation()

detect_epoch_on_food_overlap(chunk)

detect_bouts(vel_thr=0.3, strides_enabled=True, castsNweathervanes=True)

comp_pooled_epochs()

Compute pooled epochs from chunk dictionaries.

This method processes the chunk_dicts attribute to create epoch_dicts and pooled_epochs. It first extracts unique epoch keys from the chunk dictionaries and then constructs a dictionary of epochs (epoch_dicts) where each key corresponds to a dictionary of chunk data.

The method then defines an inner function get_vs to concatenate values from the dictionaries, handling cases where the values have different shapes. If the majority of the values have a shape of 2 dimensions, it filters out those with a shape of 1 dimension before concatenation.

Finally, it creates the pooled_epochs attribute by concatenating the values for each epoch key, excluding specific keys such as “turn_slice”, “pause_idx”, “run_idx”, and “stride_idx”.

Attributes:

chunk_dicts (dict): A dictionary containing chunk data.
epoch_dicts (AttrDict): A dictionary of epochs with chunk data.
pooled_epochs (AttrDict): A dictionary of concatenated epoch data.

Raises:

Exception: If there is an issue with concatenating the values in get_vs.

Prints:

“Completed bout detection.” upon successful completion.
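
The pooling steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual code: plain dicts stand in for AttrDict, the names `concat_values` and `pool_epochs_sketch` are hypothetical, and `chunk_dicts` is assumed to map each agent ID to a dict of arrays.

```python
import numpy as np

EXCLUDED = {"turn_slice", "pause_idx", "run_idx", "stride_idx"}


def concat_values(values):
    """Concatenate arrays, dropping 1D entries when 2D entries are the majority."""
    arrays = [np.asarray(v) for v in values]
    arrays = [v for v in arrays if v.size > 0]
    if not arrays:
        return np.array([])
    n_2d = sum(v.ndim == 2 for v in arrays)
    if n_2d > len(arrays) / 2:
        arrays = [v for v in arrays if v.ndim == 2]
    return np.concatenate(arrays)


def pool_epochs_sketch(chunk_dicts):
    """Regroup per-agent chunk data by epoch key, then pool across agents."""
    keys = sorted({k for d in chunk_dicts.values() for k in d})
    epoch_dicts = {
        k: {agent: d[k] for agent, d in chunk_dicts.items() if k in d}
        for k in keys
    }
    pooled_epochs = {
        k: concat_values(list(v.values()))
        for k, v in epoch_dicts.items()
        if k not in EXCLUDED
    }
    return epoch_dicts, pooled_epochs


chunk_dicts = {
    "larva_0": {"run_dur": np.array([1.0, 2.0]), "run_idx": np.array([0, 1])},
    "larva_1": {"run_dur": np.array([3.0])},
}
epoch_dicts, pooled_epochs = pool_epochs_sketch(chunk_dicts)
```

In this toy input, `run_dur` is pooled across both agents, while `run_idx` stays available per agent in `epoch_dicts` but is left out of `pooled_epochs`.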

fit_pooled_epochs()

generate_pooled_epochs(mID)

comp_bout_distros()

register_bout_distros()

comp_cycle_curves(Nbins=64)

comp_attenuation(Nbins=64)

comp_interference(Nbins=64)

comp_pooled_cycle_curves()

annotate(anot_keys=['bout_detection', 'bout_distribution', 'interference'], is_last=False, **kwargs)

interpolate_nan_values()

filter(filter_f=2.0, recompute=False)

rescale(recompute=False, rescale_by=1.0)

exclude_rows(flag='collision_flag', accepted=[0], rejected=None)

smaller_dataset(p)

Generate a smaller dataset based on the given ReplayConf parameters.

Args: p (ReplayConf): The configuration for dataset replay.
Returns: LarvaDataset: A subset of the original dataset.

align_trajectories(track_point=None, arena_dims=None, transposition='origin', replace=True)

preprocess(drop_collisions=False, interpolate_nans=False, filter_f=None, rescale_by=None, transposition=None, recompute=False)

merge_configs()

set_data(step=None, end=None, agents=None, **kwargs)

property data

path_to_file(file='data.h5')

property path_to_config

store(df, key, file='data.h5')

save_dict(d, file)

read(key, file='data.h5')

load(step=True, h5_ks=None)

save(refID=None)

save_config(refID=None)

load_traj(mode='default')

load_dicts(type, ids=None)

Load dictionaries based on the specified type and optional IDs.

Args:

type (str): The type of dictionaries to load.
ids (list, optional): A list of IDs to load. If None, uses self.ids.

Returns:

list: A list of dictionaries corresponding to the specified type and IDs.

Notes:

If the specified type and IDs are found in self.larva_dicts, the dictionaries are loaded from there.
Otherwise, the dictionaries are loaded from files located in the directory specified by self.config.data_dir.

store_dicts(type, dicts)

Stores a dictionary of dictionaries to individual files.

Args:

type (str): The type/category of the dictionaries to be stored.
dicts (dict): A dictionary where keys are identifiers and values are dictionaries to be stored.

Example:

>>> store_dicts('example_type', {'id1': {'key1': 'value1'}, 'id2': {'key2': 'value2'}})
This will create files 'id1.txt' and 'id2.txt' in the directory specified by self.config.data_dir/individuals/example_type.
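
The directory layout described above can be reproduced with a small round-trip sketch. The function names are hypothetical, and JSON is used as the on-disk format purely as an assumption, since the serialization used by the library is not stated here.

```python
import json
import os
import tempfile


def store_dicts_sketch(data_dir, dict_type, dicts):
    """Write one '<id>.txt' file per entry under data_dir/individuals/dict_type."""
    folder = os.path.join(data_dir, "individuals", dict_type)
    os.makedirs(folder, exist_ok=True)
    for agent_id, d in dicts.items():
        with open(os.path.join(folder, f"{agent_id}.txt"), "w") as f:
            json.dump(d, f)  # JSON is an assumed serialization format
    return folder


def load_dicts_sketch(data_dir, dict_type, ids):
    """Read the per-ID files back, in the order given by ids."""
    folder = os.path.join(data_dir, "individuals", dict_type)
    loaded = []
    for agent_id in ids:
        with open(os.path.join(folder, f"{agent_id}.txt")) as f:
            loaded.append(json.load(f))
    return loaded


with tempfile.TemporaryDirectory() as data_dir:
    dicts = {"id1": {"key1": "value1"}, "id2": {"key2": "value2"}}
    folder = store_dicts_sketch(data_dir, "example_type", dicts)
    files = sorted(os.listdir(folder))
    restored = load_dicts_sketch(data_dir, "example_type", ["id1", "id2"])
```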

store_larva_dicts()

Stores larva dictionaries by iterating over the items in self.larva_dicts.

This method retrieves each type and its corresponding dictionary from self.larva_dicts and passes them to the store_dicts method for storage.

Returns: None

property contour_xy_data_byID

property midline_xy_data_byID

property traj_xy_data_byID

data_by_ID(data)

property midline_xy_data

property contour_xy_data

empty_df(dim3=1)

apply_per_agent(pars, func, time_range=None, **kwargs)

Apply a function to each subdataframe of a MultiIndex DataFrame after grouping by the agentID.

Parameters

s (pandas.DataFrame): A MultiIndex DataFrame with levels [‘Step’, ‘AgentID’].
func (function): The function to apply to each subdataframe.
**kwargs (dict): Additional keyword arguments to pass to the ‘func’ function.

Returns

numpy.ndarray: An array of dimensions [N_ticks, N_ids], where N_ticks is the number of unique ‘Step’ values, and N_ids is the number of unique ‘AgentID’ values.

Notes

This function groups the DataFrame ‘s’ by the specified ‘level’, applies ‘func’ to each subdataframe, and returns the results as a numpy array.
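
The group-and-apply pattern described in the notes can be sketched with pandas. The function name `apply_per_agent_sketch` is hypothetical, and the sketch assumes that `func` returns one value per row of the per-agent subdataframe.

```python
import numpy as np
import pandas as pd


def apply_per_agent_sketch(s, func, **kwargs):
    """Apply func to each agent's rows and collect results as [N_ticks, N_ids]."""
    ticks = np.sort(s.index.unique("Step").values)
    ids = np.sort(s.index.unique("AgentID").values)
    out = np.full((len(ticks), len(ids)), np.nan)
    for j, agent_id in enumerate(ids):
        sub = s.xs(agent_id, level="AgentID")  # one agent, indexed by Step
        res = np.asarray(func(sub, **kwargs), dtype=float)
        # Place results at the rows matching this agent's ticks.
        out[np.searchsorted(ticks, sub.index.values), j] = res
    return out


index = pd.MultiIndex.from_product(
    [[0, 1, 2], ["a", "b"]], names=["Step", "AgentID"]
)
s = pd.DataFrame({"v": [0.0, 10.0, 1.0, 11.0, 2.0, 12.0]}, index=index)
doubled = apply_per_agent_sketch(s, lambda df, k: df["v"].values * k, k=2)
```

Ticks missing for a given agent would stay NaN in the output, which keeps the array rectangular even when agents have tracks of different lengths.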

midline_xy_1less(mid)

property midline_seg_xy_data_byID

property midline_seg_orients_data_byID

midline_seg_orients_from_mid(mid)

Calculate the orientation of midline segments from midline coordinates.

Parameters: mid (numpy.ndarray): A 3D array of shape (Nticks, N, 2), where Nticks is the number of timesteps, N is the number of midline points, and 2 represents the x and y coordinates of each point.

Returns: numpy.ndarray: A 2D array of shape (Nticks, N-1) containing the orientation angles (in radians) of each segment for each timestep, with values in the range [0, 2π).
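
The shape contract above can be illustrated with a short NumPy sketch. The function name is hypothetical, and the direction convention (each segment vector points from midline point i+1 towards point i) is an assumption.

```python
import numpy as np


def seg_orients_sketch(mid):
    """Orientation of each midline segment, in radians within [0, 2*pi)."""
    mid = np.asarray(mid, dtype=float)
    # Vector from point i+1 to point i for every timestep (assumed convention).
    d = mid[:, :-1, :] - mid[:, 1:, :]
    return np.arctan2(d[..., 1], d[..., 0]) % (2 * np.pi)


# One timestep, three midline points: two segments.
mid = [[[2.0, 0.0], [1.0, 0.0], [1.0, 1.0]]]
orients = seg_orients_sketch(mid)
```

The modulo maps the (-π, π] output of `arctan2` into [0, 2π), so the second segment, which points straight down, comes out as 3π/2 rather than -π/2.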

comp_freq(par, fr_range=(0.0, +np.inf))

Compute the frequency of a parameter for each agent.

This method calculates the dominant frequency of a given parameter for each agent in the dataset. It uses the Fast Fourier Transform (FFT) to find the frequency with the highest amplitude within a specified frequency range.

Parameters:

par (str): The name of the parameter to compute the frequency for.
fr_range (tuple, optional): A tuple specifying the frequency range to consider. Defaults to (0.0, +np.inf).

Returns: None. The result is stored in the endpoint dataframe with the frequency name as the key.
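
Picking the frequency with the highest FFT amplitude inside a range can be sketched as below for a single timeseries. The function name, the explicit `dt` argument and the mean subtraction are assumptions; the method itself operates per agent on the dataset.

```python
import numpy as np


def dominant_freq_sketch(x, dt, fr_range=(0.0, np.inf)):
    """Return the frequency (Hz) with the largest FFT amplitude within fr_range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset so it cannot dominate
    amps = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    in_range = (freqs >= fr_range[0]) & (freqs <= fr_range[1])
    if not in_range.any():
        return np.nan
    return float(freqs[in_range][np.argmax(amps[in_range])])


dt = 0.1
t = np.arange(200) * dt
# Synthetic signal: a weak 1.5 Hz component plus a stronger 0.5 Hz component.
x = np.sin(2 * np.pi * 1.5 * t) + 2.0 * np.sin(2 * np.pi * 0.5 * t)
fast = dominant_freq_sketch(x, dt, fr_range=(1.0, 2.5))
slow = dominant_freq_sketch(x, dt, fr_range=(0.1, 0.8))
```

Restricting the range lets the weaker 1.5 Hz component be picked up even though the 0.5 Hz component dominates the full spectrum.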

comp_freqs()

Compute dominant frequencies for translational and angular velocities. The frequency ranges (in Hz) are (1.0, 2.5) and (0.1, 0.8) respectively.

Parameters: None

Returns: None

comp_orientations(mode='minimal', recompute=False)

Compute the orientations of body segments for each timestep, for each agent in the dataset.

Parameters:

mode (str): Determines whether to compute only front and rear orientations or one for each body segment. Options are “minimal” (default) or “full”.
recompute (bool): If True, recompute the orientations even if they already exist. Default is False.

Returns: None

comp_angular(is_last=False, **kwargs)

Perform angular analysis on the dataset.

This method computes orientations, bends, and angular moments for the dataset. If is_last is set to True, the results are saved after computation.

Parameters:

is_last (bool): Flag to indicate if this is the last computation step. If True, the results are saved.
**kwargs: Additional keyword arguments passed to the computation methods.

Returns: None

comp_bend(mode='minimal', recompute=False)

Compute the body bending angle for each timestep, for each agent in the dataset.

Parameters:

mode (str): Determines whether to compute a single angle or one for each intersegmental joint. Options are “minimal” (default) or “full”.
recompute (bool): If True, forces recomputation of the bending angles even if they are already computed. Default is False.

Raises: Exception: If the bending angle computation method specified in the configuration is not recognized.

Notes:

If the bending angles are already computed and recompute is set to False, a message will be printed and the function will exit without recomputing.
The bending angle can be computed in two ways:
1. “from_vectors”: As the difference between front and rear orientations.
2. “from_angles”: As the sum of the first N front angles, where N is specified in the configuration.
The computed bending angles are stored in the step dataframe.
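
The two computation modes listed in the notes can be sketched as follows. Both function names are hypothetical. The use of radians and the wrapping of the difference into [-π, π) are assumptions, since the units and wrapping used by the library are not stated here.

```python
import numpy as np


def bend_from_vectors_sketch(front_orient, rear_orient):
    """Bend as front minus rear orientation, wrapped into [-pi, pi)."""
    diff = np.asarray(front_orient, dtype=float) - np.asarray(rear_orient, dtype=float)
    return (diff + np.pi) % (2 * np.pi) - np.pi


def bend_from_angles_sketch(angles, n):
    """Bend as the sum of the first n intersegmental angles at each timestep."""
    return np.asarray(angles, dtype=float)[:, :n].sum(axis=1)


front = np.array([0.1, 6.0])
rear = np.array([6.2, 0.2])
bend_v = bend_from_vectors_sketch(front, rear)

angles = np.array([[0.1, 0.2, 0.3], [-0.1, 0.0, 0.4]])
bend_a = bend_from_angles_sketch(angles, n=2)
```

Wrapping matters when the two orientations straddle the 0 / 2π boundary: without it, the first sample would report a bend of about -6.1 rad instead of roughly 0.18 rad.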

comp_ang_moments(pars=None, mode='minimal', recompute=False)

comp_xy_moments(point='', **kwargs)

comp_tortuosity(dur=20, **kwargs)

comp_dispersal(t0=0, t1=60, **kwargs)

comp_operators(pars)

comp_centroid(**kwargs)

comp_length(mode='minimal', recompute=False)

comp_spatial(**kwargs)

scale_to_length(pars=None, keys=None)

comp_source_metrics()

comp_wind()

comp_wind_metrics(woo, wo)

comp_final_anemotaxis(woo)

comp_PI2(xys, x=0.04)

comp_PI(arena_xdim, xs, return_num=False)

comp_dataPI()

process(proc_keys=['angular', 'spatial'], dsp_starts=[0], dsp_stops=[40, 60], tor_durs=[5, 10, 20], is_last=False, **kwargs)

get_par(par=None, k=None, key='step')

sample_larvagroup(N=1, ps=[])

imitate_larvagroup(N=None, ps=None)

property existing_dispersion_ranges

convert_to_pint()

class larvaworld.lib.process.dataset.BaseLarvaDataset(dir: str | None = None, refID: str | None = None, load_data: bool = True, config: larvaworld.lib.util.AttrDict | None = None, step: pandas.DataFrame | None = None, end: pandas.DataFrame | None = None, agents: list[str] | None = None, initialize: bool = False, **kwargs: Any)

Bases: ParamLarvaDataset

Base class for named objects that support Parameters and message formatting.

static initGeo(to_Geo: bool = False, **kwargs: Any) → BaseLarvaDataset

generate_config(**kwargs)

delete()

set_id(id, save=True)

class larvaworld.lib.process.dataset.LarvaDataset(**kwargs: Any)

Bases: BaseLarvaDataset

Base class for named objects that support Parameters and message formatting.

visualize(parameters={}, **kwargs)

enrich(pre_kws={}, proc_keys=[], anot_keys=[], is_last=True, mode='minimal', recompute=False, **kwargs)

property epoch_bound_dicts

get_chunk_par(chunk, k=None, par=None, min_dur=0, mode='distro')

class larvaworld.lib.process.dataset.LarvaDatasetCollection(labels: list[str] | None = None, colors: list[Any] | None = None, add_samples: bool = False, config: larvaworld.lib.util.AttrDict | None = None, **kwargs: Any)

config = None

datasets

labels = None

Ndatasets

colors = None

group_ids

Ngroups

dir

set_dir(dir=None)

property plot_dir

plot(ids=[], gIDs=[], **kwargs)

get_datasets(datasets=None, refIDs=None, dirs=None, group_id=None)

get_colors()

property data_dict

property data_palette

property data_palette_with_N

property color_palette

property Nticks

property N

property labels_with_N

property fr

property dt

property duration

property tlim

trange(unit='min')

property arena_dims

property arena_geometry

concat_data(key)

classmethod from_agentpy_output(output=None, agents=None, to_Geo=False): Convert agentpy output to a LarvaDataset