larvaworld.lib.process.dataset_geo
Script that uses the movingpandas and geopandas libraries to facilitate spatial data processing. The analysis starts from scratch, meaning that it uses only the primary tracked xy coordinates of the midline and contour. The data used for this illustration are loaded from a stored reference dataset identified by its RefID. Only the relevant columns of the double-indexed stepwise pandas DataFrame are used. See below for further explanation.
Classes
GeoLarvaDataset: An alternative mode to maintain a dataset is this "trajectory" mode inheriting from "movingpandas.TrajectoryCollection"
Module Contents
- class larvaworld.lib.process.dataset_geo.GeoLarvaDataset(step: pandas.DataFrame | None = None, dt: float | None = None, **kwargs: Any)
Bases: larvaworld.lib.process.dataset.BaseLarvaDataset, movingpandas.TrajectoryCollection
An alternative mode to maintain a dataset is this “trajectory” mode inheriting from “movingpandas.TrajectoryCollection” and adjusted to the needs of the larva-tracking data format.
- It has the following powerful aspects:
Straightforward computation of distance, velocity, acceleration and other spatial metrics. The scaled metrics are also computed by dividing by the agent’s body length.
Both the timeseries and endpoint data are kept in pint-pandas dataframes supporting SI units.
Shapely objects for easy drawing. These are stored as normal columns in the timeseries Geo dataframes. All columns have a defined unit dtype, except those holding shapely objects, which do not support units.
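The scaling mentioned in the first point can be sketched without any geospatial dependency. The snippet below is an illustration only, not the class's implementation: the helper name `scaled_speeds`, the sample coordinates and the body length are all hypothetical.

```python
import math

def scaled_speeds(xy, dt, body_length):
    """Return (speeds, scaled) for consecutive xy positions.

    Hypothetical helper: speed is the step distance divided by dt, and the
    scaled speed divides that by the agent's body length, giving body
    lengths per second.
    """
    speeds = []
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)  # Euclidean step length
        speeds.append(distance / dt)
    scaled = [v / body_length for v in speeds]
    return speeds, scaled

# Made-up track: three positions in metres, sampled every 0.1 s
track = [(0.0, 0.0), (0.003, 0.004), (0.003, 0.004)]
speeds, scaled = scaled_speeds(track, dt=0.1, body_length=0.005)
```

Dividing by body length makes agents of different sizes comparable, which is presumably why the mean midline length is kept for scaling.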
- init_gdf(step: pandas.DataFrame, dt: float) geopandas.GeoDataFrame
Initialize GeoDataFrame from step data.
Converts a step DataFrame with trajectory data into a GeoDataFrame with datetime index and geometry column for spatial analysis.
- Args:
- step: Step DataFrame with columns ‘x’, ‘y’, and either ‘datetime’, ‘t’, or ‘Step’. Multi-indexed or single-indexed.
- dt: Timestep duration in seconds for converting Step to datetime.
- Returns:
GeoDataFrame with datetime index, ‘xy’ Point geometry column, and all original columns preserved with pint units where applicable.
- Example:
>>> gdf = self.init_gdf(step_df, dt=0.1)
>>> gdf.geometry  # 'xy' column with Point objects
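The conversion from Step to datetime described above amounts to multiplying each tick by the timestep. A minimal sketch, assuming an arbitrary origin timestamp; the function name `ticks_to_datetimes` and the choice of 1970-01-01 as origin are illustrative and not taken from the library:

```python
from datetime import datetime, timedelta

def ticks_to_datetimes(ticks, dt, origin=datetime(1970, 1, 1)):
    """Map integer tick indices to timestamps spaced dt seconds apart.

    The origin is an arbitrary assumption; only the spacing matters here.
    """
    return [origin + timedelta(seconds=tick * dt) for tick in ticks]

# Ticks 0, 1, 2 and 10 at a 0.1 s timestep
stamps = ticks_to_datetimes([0, 1, 2, 10], dt=0.1)
```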
- init_mpd(step: pandas.DataFrame, dt: float) None
Initialize movingpandas TrajectoryCollection.
Converts step DataFrame into movingpandas TrajectoryCollection with proper units (pint) and grouped by AgentID.
- Args:
- step: Step DataFrame with trajectory data (x, y, AgentID).
- dt: Timestep duration in seconds.
- Side Effects:
Initializes self as TrajectoryCollection with trajectories for each agent, spatial data converted to pint units.
- Example:
>>> self.init_mpd(step_df, dt=0.1)
>>> for traj in self:  # Iterate over agent trajectories
...     print(traj.id, traj.get_length())
- set_data(step: pandas.DataFrame | None = None, end: pandas.DataFrame | None = None, **kwargs: Any) None
Drop the NaN timesteps. This is extremely convenient because geopandas handles trajectories of unequal length easily.
Build three shapely geometries:
The midline xy points are converted into a LineString
The contour xy points are converted into a Polygon
The position xy is converted into a Point
All of these are timeseries over the entire dataset duration. Eventually the position xy timeseries is set to be the required geometry for the movingpandas Trajectory instance. By dropping the index levels to columns we can specify:
a column named “t” for timing. This is converted to a datetime type by using the timestep and the index ticks
a column named “AgentID” that is used to group the trajectories by agent
Straightforward computation of the midline length at every timestep by taking the LineString length. The mean length will be used for scaling.
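The NaN dropping and the midline-length computation can be sketched in plain Python. This is not the library's code: the length of a LineString is the sum of its segment lengths, which is what `midline_length` reproduces here, and the helper names and sample midlines are hypothetical.

```python
import math

def midline_length(points):
    """Length of the polyline through the midline points, head to tail."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def has_nan(points):
    """True if any coordinate of any point is NaN."""
    return any(math.isnan(c) for point in points for c in point)

# Hypothetical midlines for three timesteps; the second has a tracking gap
midlines = [
    [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)],
    [(0.0, 0.0), (float("nan"), float("nan")), (6.0, 8.0)],
    [(0.0, 0.0), (6.0, 8.0), (6.0, 8.0)],
]

valid = [m for m in midlines if not has_nan(m)]  # drop the NaN timesteps
lengths = [midline_length(m) for m in valid]     # length at every kept timestep
mean_length = sum(lengths) / len(lengths)        # value used for scaling
```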
- property traj_dic
- property dt_mag
- property iter
- add_transposed_traj(name='xy_origin')
Also computes the transposed trajectory, which starts from the center. Plots the original and the transposed trajectories.
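A minimal sketch of the transposition step, assuming that "the center" means the origin (0, 0) and that the shift is a pure translation by the first tracked position. The function name `transpose_to_origin` and the sample track are illustrative and are not taken from the library.

```python
def transpose_to_origin(xy):
    """Shift a trajectory so that its first point sits at (0, 0)."""
    x0, y0 = xy[0]
    return [(x - x0, y - y0) for x, y in xy]

# Made-up track starting away from the origin
track = [(2.0, 1.0), (3.0, 1.5), (5.0, 0.0)]
shifted = transpose_to_origin(track)
```

Because the shift is a translation, step distances and headings are unchanged; only the starting point moves.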
- add_speed(name=None, **kwargs)
Add speed column and values to the trajectories.
Speed is calculated as CRS units per second, except if the CRS is geographic (e.g. EPSG:4326 WGS84) then speed is calculated in meters per second.
Parameters
- overwrite : bool
Whether to overwrite existing speed values (default: False)
- name : str
Name of the speed column (default: “speed”)
- units : tuple(str)
Units in which to calculate speed
- distance : str
Abbreviation for the distance unit (default: CRS units, or metres if geographic)
- time : str
Abbreviation for the time unit (default: seconds)
For more info, check the list of supported units at https://movingpandas.org/units
- n_processes : int or None, optional
Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.
- n_threads : int, optional
DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.
Raises
- ValueError
If both n_processes and the deprecated n_threads are provided.
- add_distance(name=None, **kwargs)
Add distance column and values to the trajectories.
Parameters
- overwrite : bool
Whether to overwrite existing distance values (default: False)
- name : str
Name of the distance column (default: “distance”)
- units : str
Units in which to calculate distance values (default: CRS units).
For more info, check the list of supported units at https://movingpandas.org/units
- n_processes : int or None, optional
Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.
- n_threads : int, optional
DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.
- add_acceleration(name=None, **kwargs)
Add acceleration column and values to the trajectories.
Acceleration is calculated as CRS units per second squared, except if the CRS is geographic (e.g. EPSG:4326 WGS84) then acceleration is calculated in meters per second squared.
Parameters
- overwrite : bool
Whether to overwrite existing acceleration values (default: False)
- name : str
Name of the acceleration column (default: “acceleration”)
- units : tuple(str)
Units in which to calculate acceleration
- distance : str
Abbreviation for the distance unit (default: CRS units, or metres if geographic)
- time : str
Abbreviation for the time unit (default: seconds)
- time2 : str
Abbreviation for the second time unit (default: seconds)
For more info, check the list of supported units at https://movingpandas.org/units
- n_processes : int or None, optional
Number of processes to use for computation (default: 1). If set to None, the number of processes will be set to os.cpu_count() (or os.process_cpu_count() in Python 3.13+), enabling full CPU utilization via multiprocessing.
- n_threads : int, optional
DEPRECATED. Use n_processes instead. This parameter will be removed in a future version.
- scale_to_length(pars=None, ks=None)
- get_means(pars=None, ks=None)
- property dtypes
- drop_xy_Nones()
- detect_pauses(max_scaled_diameter=0.3, min_duration=timedelta(seconds=1))
Annotates crawl-pauses in timeseries.
Parameters
- a : array
1D np.array : forward velocity timeseries
- vel_thr : float
Maximum velocity threshold
- runs : list
A list of pairs of the start-end indices of the runs. If provided, pauses that overlap with runs will be excluded.
- min_dur : float, optional
The minimum required duration for a pause
Returns
- pauses : list
A list of pairs of the start-end indices of the pauses.
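The parameter list above describes a threshold-based annotation, which can be sketched as follows. This is an illustration under stated assumptions, not the method's implementation: the function name `find_pauses` and its `dt` argument are hypothetical, a pause is taken to be a contiguous stretch at or below `vel_thr`, and its duration is counted as the number of samples times `dt`.

```python
def find_pauses(a, vel_thr, dt, runs=None, min_dur=0.0):
    """Return [start, end] index pairs where a stays at or below vel_thr.

    Assumptions: dt is the sampling interval in seconds, indices are
    inclusive, and a pause overlapping any run is discarded.
    """
    pauses = []
    start = None
    # The trailing infinity is a sentinel that closes a stretch at the end
    for i, v in enumerate(list(a) + [float("inf")]):
        if v <= vel_thr:
            if start is None:
                start = i
        elif start is not None:
            pauses.append([start, i - 1])
            start = None
    # Keep only stretches that last long enough
    pauses = [p for p in pauses if (p[1] - p[0] + 1) * dt >= min_dur]
    # Exclude stretches that overlap a run (inclusive interval overlap)
    if runs:
        pauses = [
            p for p in pauses
            if not any(p[0] <= r1 and r0 <= p[1] for r0, r1 in runs)
        ]
    return pauses

# Made-up forward velocity sampled at 10 Hz
velocity = [0.5, 0.01, 0.02, 0.01, 0.6, 0.0, 0.7, 0.0, 0.0, 0.0]
all_pauses = find_pauses(velocity, vel_thr=0.05, dt=0.1, min_dur=0.25)
no_run_overlap = find_pauses(
    velocity, vel_thr=0.05, dt=0.1, runs=[[8, 9]], min_dur=0.25
)
```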
- property spatial_unit
- property spatial_pint_unit
- property temporal_pint_unit
- cols_exist_in_all_traj(cols)
- time_to_datetime(t)
- get_locations_at(t)
Returns GeoDataFrame with trajectory locations at the specified timestamp
Parameters
- t : datetime.datetime
Timestamp to extract trajectory locations for
Returns
- GeoDataFrame
Trajectory locations at timestamp t
- get_locations_at_tick(tick)
- get_segments_between(t1, t2)
Return Trajectory segments between times t1 and t2.
Parameters
- t1 : datetime.datetime
Start time for the segments
- t2 : datetime.datetime
End time for the segments
Returns
- TrajectoryCollection
Extracted trajectory segments
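Conceptually, extracting segments between two times is a per-agent filter on timestamps. The sketch below uses plain tuples instead of movingpandas objects. The inclusive bounds, the rule that a segment needs at least two points, and all names (`segment_between`, `Larva_0`, `Larva_1`) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def segment_between(rows, t1, t2):
    """Keep the (timestamp, x, y) rows with t1 <= timestamp <= t2."""
    return [row for row in rows if t1 <= row[0] <= t2]

t0 = datetime(1970, 1, 1)
# Two made-up agents tracked over different, partly overlapping periods
trajs = {
    "Larva_0": [(t0 + timedelta(seconds=s), float(s), 0.0) for s in range(5)],
    "Larva_1": [(t0 + timedelta(seconds=s), 0.0, float(s)) for s in range(3, 8)],
}

t1 = t0 + timedelta(seconds=2)
t2 = t0 + timedelta(seconds=4)
segments = {aid: segment_between(rows, t1, t2) for aid, rows in trajs.items()}
# Assumed rule: fewer than two points cannot form a segment
segments = {aid: seg for aid, seg in segments.items() if len(seg) >= 2}
```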
- get_complete_segments_between(t1, t2)
- get_segments_between_ticks(tick1, tick2)
- get_length_from_traj_with_nans(traj)
- build_endpoint_data(e: pandas.DataFrame | None = None) pandas.DataFrame
Build endpoint metrics for all trajectories.
Computes summary statistics for each trajectory including cumulative distance, duration, start/end positions, and temporal extent.
- Args:
- e: Optional existing endpoint DataFrame to update. If None, creates new DataFrame with AgentID index.
- Returns:
DataFrame indexed by AgentID with columns:
- cum_d: Cumulative distance traveled (spatial units)
- cum_t: Total duration (seconds)
- t0, t_fin: Start and end timestamps
- x0, x_fin, y0, y_fin: Start and end coordinates
- group: Group ID from configuration
- Example:
>>> endpoint = self.build_endpoint_data()
>>> print(endpoint[['cum_d', 'cum_t']])  # Distance and time per agent
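For a single trajectory, the endpoint metrics listed above reduce to a few sums and differences. A dependency-free sketch, assuming time-ordered (timestamp, x, y) rows; the function name `endpoint_row` and the sample data are hypothetical, and the `group` column is omitted because it comes from the configuration.

```python
import math
from datetime import datetime, timedelta

def endpoint_row(rows):
    """Summarise one trajectory given as time-ordered (timestamp, x, y) rows."""
    (t0, x0, y0), (t_fin, x_fin, y_fin) = rows[0], rows[-1]
    # Cumulative distance: sum of the straight-line steps between samples
    cum_d = sum(math.dist(p[1:], q[1:]) for p, q in zip(rows, rows[1:]))
    return {
        "cum_d": cum_d,
        "cum_t": (t_fin - t0).total_seconds(),
        "t0": t0, "t_fin": t_fin,
        "x0": x0, "x_fin": x_fin,
        "y0": y0, "y_fin": y_fin,
    }

start = datetime(1970, 1, 1)
xy = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
rows = [(start + timedelta(seconds=0.5 * i), x, y) for i, (x, y) in enumerate(xy)]
summary = endpoint_row(rows)
```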
- load_midline(drop: bool = True, keep_midline_LineString: bool = False) None
Load and process midline geometry data.
Computes midline length from tracked midline points, either keeping the LineString geometry or just computing the length metric.
- Args:
- drop: If True, drops midline xy columns after processing to save memory.
- keep_midline_LineString: If True, keeps ‘midline’ column with shapely LineString objects. If False, only computes and stores length values.
- Side Effects:
Adds ‘length’ column to each trajectory DataFrame
Adds mean ‘length’ to endpoint_data
Optionally adds ‘midline’ LineString column
Drops midline xy columns if drop=True
- Example:
>>> self.load_midline(drop=True, keep_midline_LineString=False)
>>> mean_length = self.endpoint_data['length'].mean()
- load_contour(drop=True)
- set_dtype(cols, units)
- classmethod from_ID(refID, **kwargs)
- path_to_file(file='geostep')
- save(refID=None)
- property df
- get_step_data()
- property duration
- interpolate_traj(dt: float = 0.1) pandas.DataFrame
Interpolate trajectories to uniform timestep grid.
Resamples all trajectories to a common temporal grid with specified timestep, enabling synchronized multi-agent analysis.
- Args:
- dt: Target timestep duration in seconds for interpolation. Default is 0.1 seconds (10 Hz).
- Returns:
Multi-indexed DataFrame with (Step, AgentID) index containing interpolated x, y coordinates at uniform time intervals. Missing values are NaN for agents not present at those times.
- Side Effects:
Updates endpoint_data with tick0, tick1, N_ticks columns indicating valid time ranges for each agent.
- Example:
>>> interp_df = self.interpolate_traj(dt=0.1)
>>> interp_df.loc[10]  # All agents at timestep 10
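The resampling idea can be sketched for one coordinate as follows; y would be handled the same way as x. Linear interpolation is an assumption, since the documentation does not state the method used, and the names `interpolate_track`, `tracks` and `table` are hypothetical. Each agent is interpolated only within its own time range, so the combined grid holds NaN wherever an agent is absent.

```python
import math

def interpolate_track(times, values, dt):
    """Linearly interpolate values onto ticks k * dt within [times[0], times[-1]].

    Expects at least two strictly increasing times in seconds.
    Returns a dict mapping tick index to interpolated value.
    """
    # Rounding guards against float noise such as 0.3 / 0.1 == 2.999...
    tick0 = math.ceil(round(times[0] / dt, 9))
    tick1 = math.floor(round(times[-1] / dt, 9))
    out = {}
    j = 0
    for tick in range(tick0, tick1 + 1):
        t = tick * dt
        # Advance to the sample interval that contains t
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out[tick] = values[j] + frac * (values[j + 1] - values[j])
    return out

# Made-up agents: (sample times in seconds, x coordinates)
tracks = {
    "Larva_0": ([0.25, 1.0, 2.0], [0.0, 3.0, 5.0]),
    "Larva_1": ([1.0, 3.0], [10.0, 14.0]),
}
interp = {aid: interpolate_track(t, x, dt=0.5) for aid, (t, x) in tracks.items()}

# Valid tick range per agent, analogous to tick0 and tick1 in endpoint_data
tick_ranges = {aid: (min(d), max(d)) for aid, d in interp.items()}

# Common grid: NaN where an agent has no data at that tick
first = min(r[0] for r in tick_ranges.values())
last = max(r[1] for r in tick_ranges.values())
table = {
    tick: {aid: interp[aid].get(tick, float("nan")) for aid in tracks}
    for tick in range(first, last + 1)
}
```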
- load_traj()
- load(**kwargs)
- match_ids()
- comp_spatial()