larvaworld.lib.process.dataset_geo

Script that uses the movingpandas and geopandas libraries to facilitate spatial data processing. The analysis starts from scratch, meaning that it uses only the primary tracked xy coordinates of the midline and contour. The data used for this illustration are loaded from a stored reference dataset identified by its RefID. Only the relevant columns of the double-indexed stepwise pandas DataFrame are used. See below for further explanation.

Classes

GeoLarvaDataset

An alternative, "trajectory" mode of maintaining a dataset, inheriting from "movingpandas.TrajectoryCollection".

Module Contents

class larvaworld.lib.process.dataset_geo.GeoLarvaDataset(step: pandas.DataFrame | None = None, dt: float | None = None, **kwargs: Any)

Bases: larvaworld.lib.process.dataset.BaseLarvaDataset, movingpandas.TrajectoryCollection

An alternative, “trajectory” mode of maintaining a dataset, inheriting from “movingpandas.TrajectoryCollection” and adjusted to the needs of the larva-tracking data format.

It offers the following advantages:

Straightforward computation of distance, velocity, acceleration and other spatial metrics. Scaled metrics are also computed by dividing by the agent’s body length.
Both the timeseries and the endpoint data are kept in pint-pandas DataFrames, which support SI units.
Shapely objects for easy drawing. These are stored as normal columns in the timeseries GeoDataFrames. Every column has a defined unit dtype, except those holding shapely objects, which do not support units.
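
The scaling mentioned in the first point can be illustrated with plain pandas. This is only a sketch and not the class's own implementation: the column names `velocity`, `length` and `scaled_velocity` and all sample values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical timeseries: velocity in m/s for two agents over three steps.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], ["Larva_0", "Larva_1"]], names=["Step", "AgentID"]
)
step = pd.DataFrame(
    {"velocity": [0.001, 0.002, 0.002, 0.004, 0.003, 0.006]}, index=index
)

# Hypothetical endpoint data: mean body length per agent in m.
end = pd.DataFrame(
    {"length": [0.004, 0.008]},
    index=pd.Index(["Larva_0", "Larva_1"], name="AgentID"),
)

# Look up each row's agent length, then divide to get body lengths per second.
lengths = step.index.get_level_values("AgentID").map(end["length"])
step["scaled_velocity"] = step["velocity"] / lengths.to_numpy()
```

Dividing by a per-agent length makes agents of different sizes comparable, which is the point of the scaled metrics.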

init_gdf(step: pandas.DataFrame, dt: float) → geopandas.GeoDataFrame

Initialize GeoDataFrame from step data.

Converts a step DataFrame with trajectory data into a GeoDataFrame with datetime index and geometry column for spatial analysis.

Args:

step: Step DataFrame with columns ‘x’, ‘y’, and one of ‘datetime’, ‘t’, or ‘Step’. Multi-indexed or single-indexed.

dt: Timestep duration in seconds for converting Step to datetime.

Returns:

GeoDataFrame with datetime index, ‘xy’ Point geometry column, and all original columns preserved with pint units where applicable.

Example:

>>> gdf = self.init_gdf(step_df, dt=0.1)
>>> gdf.geometry  # 'xy' column with Point objects
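
The conversion from step ticks to a datetime index can be sketched with plain pandas. The helper name `step_to_datetime`, the Unix epoch as time origin and the sample data are assumptions for illustration; the actual method additionally builds the geometry column and attaches units.

```python
import pandas as pd


def step_to_datetime(step: pd.DataFrame, dt: float) -> pd.DataFrame:
    """Hypothetical helper: replace a (Step, AgentID) index with a datetime index."""
    df = step.reset_index()
    # Tick count times timestep gives elapsed seconds. Going through integer
    # nanoseconds avoids floating-point drift in the resulting timestamps.
    elapsed_ns = (df["Step"] * dt * 1e9).round().astype("int64")
    # Assumption: time zero is the Unix epoch (an arbitrary origin).
    df["datetime"] = pd.to_datetime(elapsed_ns, unit="ns")
    return df.set_index("datetime")


index = pd.MultiIndex.from_product(
    [[0, 1, 2], ["Larva_0", "Larva_1"]], names=["Step", "AgentID"]
)
step_df = pd.DataFrame(
    {"x": [0.0, 1.0, 0.1, 1.1, 0.2, 1.2], "y": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]},
    index=index,
)
timed = step_to_datetime(step_df, dt=0.1)
```

The former index levels survive as the ordinary columns `Step` and `AgentID`, so the rows can still be grouped by agent afterwards.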

init_mpd(step: pandas.DataFrame, dt: float) → None

Initialize movingpandas TrajectoryCollection.

Converts step DataFrame into movingpandas TrajectoryCollection with proper units (pint) and grouped by AgentID.

Args:

step: Step DataFrame with trajectory data (x, y, AgentID).

dt: Timestep duration in seconds.

Side Effects:

Initializes self as TrajectoryCollection with trajectories for each agent, spatial data converted to pint units.

Example:

>>> self.init_mpd(step_df, dt=0.1)
>>> for traj in self:  # Iterate over agent trajectories
...     print(traj.id, traj.get_length())
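
The grouping by AgentID can be pictured with a plain-pandas sketch that splits a multi-indexed step frame into one frame per agent. It does not use movingpandas, and the agent names and values are invented for the example.

```python
import pandas as pd

# Hypothetical step data: Larva_1 is tracked for one step fewer than Larva_0.
index = pd.MultiIndex.from_tuples(
    [(0, "Larva_0"), (0, "Larva_1"), (1, "Larva_0"), (1, "Larva_1"), (2, "Larva_0")],
    names=["Step", "AgentID"],
)
step = pd.DataFrame(
    {"x": [0.0, 1.0, 0.1, 1.1, 0.2], "y": [0.0, 1.0, 0.0, 1.0, 0.0]}, index=index
)

# One DataFrame per agent, indexed by Step only; lengths may differ per agent.
trajectories = {
    agent_id: traj.droplevel("AgentID")
    for agent_id, traj in step.groupby(level="AgentID")
}
n_points = {agent_id: len(traj) for agent_id, traj in trajectories.items()}
```

Keeping agents in separate containers is what allows trajectories of unequal length to coexist without padding.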

set_data(step: pandas.DataFrame | None = None, end: pandas.DataFrame | None = None, **kwargs: Any) → None

Drop the NaN timesteps. This is convenient because geopandas easily handles trajectories of unequal length.

Build three shapely geometries:

The midline xy points are converted into a LineString

The contour xy points are converted into a Polygon

The position xy is converted into a Point

All of these are timeseries over the entire dataset duration. Eventually the position xy timeseries is set as the required geometry of the movingpandas Trajectory instance. By moving the index levels to columns we can specify:

a column named “t” for timing, which is converted to a datetime type using the timestep and the index ticks

a column named “AgentID”, which is used to group the trajectories by agent

The midline length at every timestep is then obtained directly as the LineString length. The mean length is used for scaling.
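
The NaN-dropping and midline-length steps can be sketched as follows. This is not the method's code: the column layout (`m0_x`, `m0_y`, ...) is hypothetical, and the length is computed with numpy rather than shapely, using the fact that the length of a 2D polyline is the sum of the Euclidean distances between consecutive points.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: three midline points per timestep, stored as x/y pairs.
midline_cols = ["m0_x", "m0_y", "m1_x", "m1_y", "m2_x", "m2_y"]
step = pd.DataFrame(
    [
        [0.0, 0.0, 3.0, 4.0, 3.0, 6.0],
        [np.nan] * 6,  # tracking lost at this timestep
        [0.0, 0.0, 0.0, 1.0, 0.0, 3.0],
    ],
    columns=midline_cols,
)

# Drop timesteps where tracking failed instead of padding them.
valid = step.dropna(subset=midline_cols)

# Reshape each row into an (n_points, 2) array of xy pairs.
points = valid[midline_cols].to_numpy().reshape(len(valid), -1, 2)

# Polyline length: sum of the distances between consecutive midline points.
segments = np.diff(points, axis=1)
valid = valid.assign(length=np.linalg.norm(segments, axis=2).sum(axis=1))

# The mean over time is the value that would serve as the scaling length.
mean_length = valid["length"].mean()
```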

property traj_dic

property dt_mag

property iter

add_transposed_traj(name='xy_origin')

Compute the transposed trajectory, which starts from the center. Plot the original and the transposed trajectories.

add_speed(name=None, **kwargs)

Add speed column and values to the trajectories.

Speed is calculated in CRS units per second, unless the CRS is geographic (e.g. EPSG:4326 WGS84), in which case speed is calculated in meters per second.

Parameters

overwrite (bool)

Whether to overwrite existing speed values (default: False)

name (str)

Name of the speed column (default: “speed”)

units (tuple(str))

Units in which to calculate speed

distance (str): Abbreviation for the distance unit (default: CRS units, or meters if geographic)
time (str): Abbreviation for the time unit (default: seconds)

For more info, check the list of supported units at https://movingpandas.org/units

n_processes (int or None, optional)

Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.

n_threads (int, optional)

DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.

Raises

ValueError: If both n_processes and the deprecated n_threads are provided.
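
What "CRS units per second" means for a planar trajectory can be shown with a small finite-difference sketch in pandas. It does not call movingpandas, the sample trajectory is invented, and details such as how the first row is filled are assumptions of the sketch rather than statements about the library.

```python
import numpy as np
import pandas as pd

# Hypothetical trajectory sampled every 100 ms, positions in CRS units.
traj = pd.DataFrame(
    {"x": [0.0, 0.3, 0.6, 0.6], "y": [0.0, 0.4, 0.8, 0.8]},
    index=pd.date_range("1970-01-01", periods=4, freq="100ms"),
)

# Distance covered since the previous sample (NaN for the first row).
step_distance = np.hypot(traj["x"].diff(), traj["y"].diff())

# Elapsed time since the previous sample, in seconds.
elapsed = traj.index.to_series().diff().dt.total_seconds()

# Speed in CRS units per second; the first row has no predecessor and stays
# NaN in this sketch.
traj["speed"] = step_distance / elapsed
```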

add_distance(name=None, **kwargs)

Add distance column and values to the trajectories.

Parameters

overwrite (bool): Whether to overwrite existing distance values (default: False)
name (str): Name of the distance column (default: “distance”)
units (str): Units in which to calculate distance values (default: CRS units). For more info, check the list of supported units at https://movingpandas.org/units
n_processes (int or None, optional): Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.
n_threads (int, optional): DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.

add_acceleration(name=None, **kwargs)

Add acceleration column and values to the trajectories.

Acceleration is calculated in CRS units per second squared, unless the CRS is geographic (e.g. EPSG:4326 WGS84), in which case acceleration is calculated in meters per second squared.

Parameters

overwrite (bool)

Whether to overwrite existing acceleration values (default: False)

name (str)

Name of the acceleration column (default: “acceleration”)

units (tuple(str))

Units in which to calculate acceleration

distance (str): Abbreviation for the distance unit (default: CRS units, or meters if geographic)
time (str): Abbreviation for the time unit (default: seconds)
time2 (str): Abbreviation for the second time unit (default: seconds)

For more info, check the list of supported units at https://movingpandas.org/units

n_processes (int or None, optional)

Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.

n_threads (int, optional)

DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.
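
As a rough illustration of "distance per time per time2", acceleration can be read as the change in speed between consecutive samples divided by the elapsed time. The sketch below uses plain pandas with an invented speed series; the conversion factor at the end assumes the distance unit is meters and is only there to show what choosing a different distance unit would amount to.

```python
import pandas as pd

# Hypothetical speed timeseries in m/s, sampled every 0.5 s.
speed = pd.Series(
    [0.0, 1.0, 3.0, 3.0],
    index=pd.date_range("1970-01-01", periods=4, freq="500ms"),
)

# Elapsed time since the previous sample, in seconds.
elapsed = speed.index.to_series().diff().dt.total_seconds()

# Change in speed per second, in m/s^2 (NaN for the first sample).
acceleration = speed.diff() / elapsed

# The same values expressed in mm/s^2, assuming the input distances are meters.
acceleration_mm = acceleration * 1000.0
```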

scale_to_length(pars=None, ks=None)

get_means(pars=None, ks=None)

property dtypes

drop_xy_Nones()

detect_pauses(max_scaled_diameter=0.3, min_duration=timedelta(seconds=1))

Annotates crawl-pauses in timeseries.


Parameters

a (array): 1D np.array of the forward velocity timeseries
vel_thr (float): Maximum velocity threshold
runs (list): A list of pairs of the start-end indices of the runs. If provided, pauses that overlap with runs will be excluded.
min_dur (float, optional): The minimum required duration for a pause

Returns

pauses (list): A list of pairs of the start-end indices of the pauses.
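
The threshold-based annotation described by these parameters can be sketched with numpy. The helper `find_pauses` and its `dt` argument are hypothetical, the optional exclusion of pauses that overlap with runs is left out, and the actual method may work differently (its signature takes max_scaled_diameter and min_duration instead).

```python
import numpy as np


def find_pauses(a, vel_thr, dt, min_dur=0.0):
    """Hypothetical helper: index pairs of stretches where velocity < vel_thr."""
    below = np.asarray(a) < vel_thr
    # Pad with False so every run of True values has a rising and a falling edge.
    padded = np.concatenate(([False], below, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[::2]
    stops = edges[1::2] - 1  # inclusive end index of each stretch
    # Keep only stretches that last at least min_dur seconds.
    return [
        (int(s), int(e))
        for s, e in zip(starts, stops)
        if (e - s + 1) * dt >= min_dur
    ]


velocity = [0.5, 0.1, 0.1, 0.1, 0.6, 0.05, 0.7]
all_pauses = find_pauses(velocity, vel_thr=0.2, dt=0.1)
long_pauses = find_pauses(velocity, vel_thr=0.2, dt=0.1, min_dur=0.2)
```

Here the single-sample dip at index 5 counts as a pause only when no minimum duration is required.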

property spatial_unit

property spatial_pint_unit

property temporal_pint_unit

cols_exist_in_all_traj(cols)

time_to_datetime(t)

get_locations_at(t)

Returns GeoDataFrame with trajectory locations at the specified timestamp

Parameters

t (datetime.datetime): Timestamp to extract trajectory locations for

Returns

GeoDataFrame: Trajectory locations at timestamp t
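
One way to picture a lookup at a timestamp is the pandas sketch below, which returns the nearest recorded row of every agent whose trajectory spans t. The helper `locations_at`, the nearest-row choice and the sample data are assumptions of this sketch; the text above does not say whether the library picks the nearest sample or interpolates.

```python
import pandas as pd

# Hypothetical per-agent trajectories with a datetime index.
trajectories = {
    "Larva_0": pd.DataFrame(
        {"x": [0.0, 1.0, 2.0], "y": [0.0, 0.0, 0.0]},
        index=pd.to_datetime(
            ["1970-01-01 00:00:00", "1970-01-01 00:00:01", "1970-01-01 00:00:02"]
        ),
    ),
    "Larva_1": pd.DataFrame(
        {"x": [5.0, 6.0], "y": [1.0, 1.0]},
        index=pd.to_datetime(["1970-01-01 00:00:05", "1970-01-01 00:00:06"]),
    ),
}


def locations_at(trajectories, t):
    """Hypothetical helper: nearest recorded row per agent at timestamp t."""
    rows = {}
    for agent_id, traj in trajectories.items():
        # Skip agents that were not being tracked at time t.
        if t < traj.index[0] or t > traj.index[-1]:
            continue
        pos = traj.index.get_indexer([t], method="nearest")[0]
        rows[agent_id] = traj.iloc[pos]
    return pd.DataFrame.from_dict(rows, orient="index")


locations = locations_at(trajectories, pd.Timestamp("1970-01-01 00:00:01.200"))
```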

get_locations_at_tick(tick)

get_segments_between(t1, t2)

Return Trajectory segments between times t1 and t2.

Parameters

t1 (datetime.datetime): Start time for the segments
t2 (datetime.datetime): End time for the segments

Returns

TrajectoryCollection: Extracted trajectory segments
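
Extracting segments between two times amounts to slicing each agent's datetime-indexed rows. A minimal pandas sketch follows; the helper `segments_between`, the rule that a segment needs at least two points, and the data are assumptions for illustration.

```python
import pandas as pd

times = pd.date_range("1970-01-01", periods=5, freq="1s")
# Hypothetical trajectories: Larva_1 is only tracked for the first two seconds.
trajectories = {
    "Larva_0": pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0]}, index=times),
    "Larva_1": pd.DataFrame({"x": [9.0, 8.0]}, index=times[:2]),
}


def segments_between(trajectories, t1, t2):
    """Hypothetical helper: per-agent rows with t1 <= time <= t2."""
    segments = {}
    for agent_id, traj in trajectories.items():
        segment = traj.loc[t1:t2]  # label slicing includes both end points
        if len(segment) >= 2:  # assumption: a single point is not a segment
            segments[agent_id] = segment
    return segments


segments = segments_between(trajectories, times[1], times[3])
```

Larva_1 contributes only one row inside the window, so it is left out of the result.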

get_complete_segments_between(t1, t2)

get_segments_between_ticks(tick1, tick2)

get_length_from_traj_with_nans(traj)

build_endpoint_data(e: pandas.DataFrame | None = None) → pandas.DataFrame

Build endpoint metrics for all trajectories.

Computes summary statistics for each trajectory including cumulative distance, duration, start/end positions, and temporal extent.

Args:

e: Optional existing endpoint DataFrame to update. If None, creates a new DataFrame with AgentID index.

Returns:

DataFrame indexed by AgentID with columns:

cum_d: Cumulative distance traveled (spatial units)
cum_t: Total duration (seconds)
t0, t_fin: Start and end timestamps
x0, x_fin, y0, y_fin: Start and end coordinates
group: Group ID from configuration

Example:

>>> endpoint = self.build_endpoint_data()
>>> print(endpoint[['cum_d', 'cum_t']])  # Distance and time per agent
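
The summary columns listed above can be reproduced on toy data with a per-agent reduction. The sketch below uses plain pandas and numpy; the helper `summarize` and the sample trajectories are hypothetical, and the `group` column is omitted because it comes from the dataset configuration.

```python
import numpy as np
import pandas as pd

times = pd.date_range("1970-01-01", periods=3, freq="1s")
# Hypothetical per-agent trajectories with a datetime index.
trajectories = {
    "Larva_0": pd.DataFrame({"x": [0.0, 3.0, 3.0], "y": [0.0, 4.0, 8.0]}, index=times),
    "Larva_1": pd.DataFrame({"x": [1.0, 1.0], "y": [1.0, 2.0]}, index=times[:2]),
}


def summarize(traj):
    """Hypothetical helper: endpoint metrics for a single trajectory."""
    step_distance = np.hypot(traj["x"].diff(), traj["y"].diff())
    return {
        "cum_d": float(step_distance.sum()),  # NaN of the first row is skipped
        "cum_t": (traj.index[-1] - traj.index[0]).total_seconds(),
        "t0": traj.index[0],
        "t_fin": traj.index[-1],
        "x0": traj["x"].iloc[0],
        "x_fin": traj["x"].iloc[-1],
        "y0": traj["y"].iloc[0],
        "y_fin": traj["y"].iloc[-1],
    }


endpoint = pd.DataFrame.from_dict(
    {agent_id: summarize(traj) for agent_id, traj in trajectories.items()},
    orient="index",
)
endpoint.index.name = "AgentID"
```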

load_midline(drop: bool = True, keep_midline_LineString: bool = False) → None

Load and process midline geometry data.

Computes midline length from tracked midline points, either keeping the LineString geometry or just computing the length metric.

Args:

drop: If True, drops midline xy columns after processing to save memory.
keep_midline_LineString: If True, keeps ‘midline’ column with shapely LineString objects. If False, only computes and stores length values.

Side Effects:

Adds ‘length’ column to each trajectory DataFrame
Adds mean ‘length’ to endpoint_data
Optionally adds ‘midline’ LineString column
Drops midline xy columns if drop=True

Example:

>>> self.load_midline(drop=True, keep_midline_LineString=False)
>>> mean_length = self.endpoint_data['length'].mean()

load_contour(drop=True)

set_dtype(cols, units)

classmethod from_ID(refID, **kwargs)

path_to_file(file='geostep')

save(refID=None)

property df

get_step_data()

property duration

interpolate_traj(dt: float = 0.1) → pandas.DataFrame

Interpolate trajectories to uniform timestep grid.

Resamples all trajectories to a common temporal grid with specified timestep, enabling synchronized multi-agent analysis.

Args:

dt: Target timestep duration in seconds for interpolation. Default is 0.1 seconds (10 Hz).

Returns:

Multi-indexed DataFrame with (Step, AgentID) index containing interpolated x, y coordinates at uniform time intervals. Missing values are NaN for agents not present at those times.

Side Effects:

Updates endpoint_data with tick0, tick1, N_ticks columns indicating valid time ranges for each agent.

Example:

>>> interp_df = self.interpolate_traj(dt=0.1)
>>> interp_df.loc[10]  # All agents at timestep 10
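
Resampling onto a common grid can be sketched with numpy's linear interpolation. This is an illustration under assumptions, not the method's code: the input format (seconds and x positions per agent), the linear scheme and the values are invented, and only the x coordinate is handled to keep the example short.

```python
import numpy as np
import pandas as pd

dt = 0.5  # target timestep in seconds (hypothetical value)

# Hypothetical irregular samples: (time in seconds, x position) per agent.
raw = {
    "Larva_0": (np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0, 4.0])),
    "Larva_1": (np.array([1.0, 2.0]), np.array([5.0, 7.0])),
}

# Common tick grid covering the whole dataset.
t_max = max(t[-1] for t, _ in raw.values())
ticks = np.arange(int(round(t_max / dt)) + 1)
grid = ticks * dt

# left/right=NaN leaves ticks outside an agent's own time range empty.
columns = {
    agent_id: np.interp(grid, t, x, left=np.nan, right=np.nan)
    for agent_id, (t, x) in raw.items()
}

# Assemble a (Step, AgentID) multi-index: ticks vary slowest, agents fastest.
index = pd.MultiIndex.from_product([ticks, list(raw)], names=["Step", "AgentID"])
values = np.column_stack([columns[agent_id] for agent_id in raw]).ravel()
interp = pd.DataFrame({"x": values}, index=index)
```

With this layout, `interp.loc[3]` returns every agent at tick 3, and agents absent at a tick show up as NaN.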

load_traj()

load(**kwargs)

match_ids()

comp_spatial()