larvaworld.lib.process.dataset_geo
==================================

.. py:module:: larvaworld.lib.process.dataset_geo

.. autoapi-nested-parse::

   Script that uses the movingpandas and geopandas libraries to facilitate spatial data processing.
   The analysis starts from scratch, meaning it uses only the primary tracked xy coordinates of the midline and contour.
   The data used for this illustration is loaded from a stored reference dataset under the RefID.
   Only the relevant columns of the double-index stepwise pandas dataframe are used. See below for further explanation.


Classes
-------

.. autoapisummary::

   larvaworld.lib.process.dataset_geo.GeoLarvaDataset


Module Contents
---------------

.. py:class:: GeoLarvaDataset(step: Optional[pandas.DataFrame] = None, dt: Optional[float] = None, **kwargs: Any)

   Bases: :py:obj:`larvaworld.lib.process.dataset.BaseLarvaDataset`, :py:obj:`movingpandas.TrajectoryCollection`


   An alternative mode to maintain a dataset is this "trajectory" mode, inheriting from "movingpandas.TrajectoryCollection" and adjusted to the needs of the larva-tracking data format.

   It has the following strengths:

   - Straightforward computation of distance, velocity, acceleration and other spatial metrics. Scaled metrics are also computed by dividing by the agent's body length.
   - Both the timeseries and endpoint data are kept in pint-pandas dataframes supporting SI units.
   - Shapely objects for easy drawing. These are stored as normal columns in the timeseries Geo dataframes. All columns, except shapely objects (which do not support units), have a defined unit dtype.


   .. py:method:: init_gdf(step: pandas.DataFrame, dt: float) -> geopandas.GeoDataFrame

      Initialize GeoDataFrame from step data.

      Converts a step DataFrame with trajectory data into a GeoDataFrame with
      datetime index and geometry column for spatial analysis.

      Args:
          step: Step DataFrame with columns 'x', 'y', and either 'datetime', 't', or 'Step'.
              Multi-indexed or single-indexed.
          dt: Timestep duration in seconds for converting Step to datetime.

      Returns:
          GeoDataFrame with datetime index, 'xy' Point geometry column,
          and all original columns preserved with pint units where applicable.

      Example:
          >>> gdf = self.init_gdf(step_df, dt=0.1)
          >>> gdf.geometry  # 'xy' column with Point objects


   .. py:method:: init_mpd(step: pandas.DataFrame, dt: float) -> None

      Initialize movingpandas TrajectoryCollection.

      Converts step DataFrame into movingpandas TrajectoryCollection with
      proper units (pint) and grouped by AgentID.

      Args:
          step: Step DataFrame with trajectory data (x, y, AgentID).
          dt: Timestep duration in seconds.

      Side Effects:
          Initializes self as TrajectoryCollection with trajectories
          for each agent, spatial data converted to pint units.

      Example:
          >>> self.init_mpd(step_df, dt=0.1)
          >>> for traj in self:  # Iterate over agent trajectories
          ...     print(traj.id, traj.get_length())


   .. py:method:: set_data(step: Optional[pandas.DataFrame] = None, end: Optional[pandas.DataFrame] = None, **kwargs: Any) -> None

      Drop the NaN timesteps. This is convenient because geopandas handles trajectories of unequal length easily.

      Build three shapely geometries:

      - The midline xy points are converted into a LineString
      - The contour xy points are converted into a Polygon
      - The position xy is converted into a Point

      All of these are timeseries over the entire dataset duration.
      Finally, the position xy timeseries is set as the geometry required by the movingpandas Trajectory instance.

      By dropping the index levels to columns we can specify:

      - a column named "t" for timing. This is converted to a datetime type using the timestep and the index ticks
      - a column named "AgentID" that is used to group the trajectories by agent

      The midline length at every timestep is computed directly as the LineString length.
      The mean length is used for scaling.


   .. py:property:: traj_dic


   .. py:property:: dt_mag


   .. py:property:: iter


   .. py:method:: add_transposed_traj(name='xy_origin')

      Also computes the transposed trajectory that starts from the center.
      Plots the original and the transposed trajectories.


   .. py:method:: add_speed(name=None, **kwargs)

      Add speed column and values to the trajectories.

      Speed is calculated as CRS units per second, except when the CRS is
      geographic (e.g. EPSG:4326 WGS84), in which case speed is calculated
      in meters per second.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing speed values (default: False)
      name : str
          Name of the speed column (default: "speed")
      units : tuple(str)
          Units in which to calculate speed

          distance : str
              Abbreviation for the distance unit
              (default: CRS units, or metres if geographic)
          time : str
              Abbreviation for the time unit (default: seconds)

          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1).
          If set to `None`, the number of processes will be set to
          `os.cpu_count()` (or `os.process_cpu_count()` in Python 3.13+),
          enabling full CPU utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.

      Raises
      ------
      ValueError
          If both `n_processes` and the deprecated `n_threads` are provided.


   .. py:method:: add_distance(name=None, **kwargs)

      Add distance column and values to the trajectories.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing distance values (default: False)
      name : str
          Name of the distance column (default: "distance")
      units : str
          Units in which to calculate distance values (default: CRS units)
          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1).
          If set to `None`, the number of processes will be set to
          `os.cpu_count()` (or `os.process_cpu_count()` in Python 3.13+),
          enabling full CPU utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.


   .. py:method:: add_acceleration(name=None, **kwargs)

      Add acceleration column and values to the trajectories.

      Acceleration is calculated as CRS units per second squared,
      except when the CRS is geographic (e.g. EPSG:4326 WGS84), in which
      case acceleration is calculated in meters per second squared.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing acceleration values (default: False)
      name : str
          Name of the acceleration column (default: "acceleration")
      units : tuple(str)
          Units in which to calculate acceleration

          distance : str
              Abbreviation for the distance unit
              (default: CRS units, or metres if geographic)
          time : str
              Abbreviation for the time unit (default: seconds)
          time2 : str
              Abbreviation for the second time unit (default: seconds)

          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1).
          If set to `None`, the number of processes will be set to
          `os.cpu_count()` (or `os.process_cpu_count()` in Python 3.13+),
          enabling full CPU utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.


   .. py:method:: scale_to_length(pars=None, ks=None)


   .. py:method:: get_means(pars=None, ks=None)


   .. py:property:: dtypes


   .. py:method:: drop_xy_Nones()


   .. py:method:: detect_pauses(max_scaled_diameter=0.3, min_duration=timedelta(seconds=1))

      Annotates crawl-pauses in timeseries.

      Parameters
      ----------
      a : array
          1D np.array : forward velocity timeseries
      vel_thr : float
          Maximum velocity threshold
      runs : list
          A list of pairs of the start-end indices of the runs.
          If provided, pauses that overlap with runs are excluded.
      min_dur : float, optional
          The minimum required duration for a pause

      Returns
      -------
      pauses : list
          A list of pairs of the start-end indices of the pauses.


   .. py:property:: spatial_unit


   .. py:property:: spatial_pint_unit


   .. py:property:: temporal_pint_unit


   .. py:method:: cols_exist_in_all_traj(cols)


   .. py:method:: time_to_datetime(t)


   .. py:method:: get_locations_at(t)

      Returns GeoDataFrame with trajectory locations at the specified timestamp

      Parameters
      ----------
      t : datetime.datetime
          Timestamp to extract trajectory locations for

      Returns
      -------
      GeoDataFrame
          Trajectory locations at timestamp t


   .. py:method:: get_locations_at_tick(tick)


   .. py:method:: get_segments_between(t1, t2)

      Return Trajectory segments between times t1 and t2.

      Parameters
      ----------
      t1 : datetime.datetime
          Start time for the segments
      t2 : datetime.datetime
          End time for the segments

      Returns
      -------
      TrajectoryCollection
          Extracted trajectory segments


   .. py:method:: get_complete_segments_between(t1, t2)


   .. py:method:: get_segments_between_ticks(tick1, tick2)


   .. py:method:: get_length_from_traj_with_nans(traj)


   .. py:method:: build_endpoint_data(e: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

      Build endpoint metrics for all trajectories.

      Computes summary statistics for each trajectory including cumulative
      distance, duration, start/end positions, and temporal extent.

      Args:
          e: Optional existing endpoint DataFrame to update. If None,
              creates new DataFrame with AgentID index.

      Returns:
          DataFrame indexed by AgentID with columns:
          - cum_d: Cumulative distance traveled (spatial units)
          - cum_t: Total duration (seconds)
          - t0, t_fin: Start and end timestamps
          - x0, x_fin, y0, y_fin: Start and end coordinates
          - group: Group ID from configuration

      Example:
          >>> endpoint = self.build_endpoint_data()
          >>> print(endpoint[['cum_d', 'cum_t']])  # Distance and time per agent


   .. py:method:: load_midline(drop: bool = True, keep_midline_LineString: bool = False) -> None

      Load and process midline geometry data.

      Computes midline length from tracked midline points, either keeping
      the LineString geometry or just computing the length metric.

      Args:
          drop: If True, drops midline xy columns after processing to save memory.
          keep_midline_LineString: If True, keeps 'midline' column with shapely
              LineString objects. If False, only computes and stores length values.

      Side Effects:
          - Adds 'length' column to each trajectory DataFrame
          - Adds mean 'length' to endpoint_data
          - Optionally adds 'midline' LineString column
          - Drops midline xy columns if drop=True

      Example:
          >>> self.load_midline(drop=True, keep_midline_LineString=False)
          >>> mean_length = self.endpoint_data['length'].mean()


   .. py:method:: load_contour(drop=True)


   .. py:method:: set_dtype(cols, units)


   .. py:method:: from_ID(refID, **kwargs)
      :classmethod:


   .. py:method:: path_to_file(file='geostep')


   .. py:method:: save(refID=None)


   .. py:property:: df


   .. py:method:: get_step_data()


   .. py:property:: duration


   .. py:method:: interpolate_traj(dt: float = 0.1) -> pandas.DataFrame

      Interpolate trajectories to uniform timestep grid.

      Resamples all trajectories to a common temporal grid with specified
      timestep, enabling synchronized multi-agent analysis.

      Args:
          dt: Target timestep duration in seconds for interpolation.
              Default is 0.1 seconds (10 Hz).

      Returns:
          Multi-indexed DataFrame with (Step, AgentID) index containing
          interpolated x, y coordinates at uniform time intervals.
          Missing values are NaN for agents not present at those times.

      Side Effects:
          Updates endpoint_data with tick0, tick1, N_ticks columns
          indicating valid time ranges for each agent.

      Example:
          >>> interp_df = self.interpolate_traj(dt=0.1)
          >>> interp_df.loc[10]  # All agents at timestep 10


   .. py:method:: load_traj()


   .. py:method:: load(**kwargs)


   .. py:method:: match_ids()


   .. py:method:: comp_spatial()
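
The tick-to-datetime conversion and the distance, speed and body-length scaling described for ``GeoLarvaDataset`` can be illustrated without any of the geospatial libraries. The following is a minimal plain-Python sketch of that arithmetic, not the class's implementation. The helper names ``ticks_to_datetimes`` and ``stepwise_metrics``, the sample track, the start timestamp and the body length are all hypothetical. Leaving the first sample's metrics as ``None`` is also an assumption and may differ from what movingpandas does.

.. code-block:: python

   from datetime import datetime, timedelta
   import math


   def ticks_to_datetimes(n_ticks, dt, start=datetime(1970, 1, 1)):
       """Map integer ticks 0..n_ticks-1 to timestamps spaced dt seconds apart."""
       return [start + timedelta(seconds=i * dt) for i in range(n_ticks)]


   def stepwise_metrics(xy, dt, body_length):
       """Return per-step distance, speed and body-length-scaled speed.

       The first sample has no predecessor, so its entries are None
       (an assumption of this sketch, not a movingpandas convention).
       """
       rows = [{"distance": None, "speed": None, "scaled_speed": None}]
       for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
           d = math.hypot(x1 - x0, y1 - y0)  # Euclidean step length
           v = d / dt                        # spatial units per second
           rows.append(
               {"distance": d, "speed": v, "scaled_speed": v / body_length}
           )
       return rows


   # Hypothetical single-agent track sampled every 0.1 s; the agent
   # does not move between the second and third samples.
   xy = [(0.0, 0.0), (0.3, 0.4), (0.3, 0.4), (0.9, 1.2)]
   dt = 0.1
   body_length = 4.0  # hypothetical mean midline length, same unit as xy

   times = ticks_to_datetimes(len(xy), dt)
   metrics = stepwise_metrics(xy, dt, body_length)

   # Endpoint-style summaries for this one agent.
   cum_d = sum(m["distance"] for m in metrics[1:])
   cum_t = (times[-1] - times[0]).total_seconds()

Here ``cum_d`` and ``cum_t`` are named after the endpoint columns listed under ``build_endpoint_data``, computed for a single hypothetical agent. Unit handling through pint is deliberately left out of the sketch.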