larvaworld.lib.process.dataset_geo
==================================

.. py:module:: larvaworld.lib.process.dataset_geo

.. autoapi-nested-parse::

   Script that uses the movingpandas and geopandas libraries to facilitate spatial data processing.
   The analysis starts from scratch: it uses only the primary tracked xy coordinates of the midline and contour.
   The data used for this illustration is loaded from a stored reference dataset identified by its RefID.
   Only the relevant columns of the double-indexed stepwise pandas DataFrame are used.
   See below for further explanation.


Classes
-------

.. autoapisummary::

   larvaworld.lib.process.dataset_geo.GeoLarvaDataset


Module Contents
---------------

.. py:class:: GeoLarvaDataset(step: Optional[pandas.DataFrame] = None, dt: Optional[float] = None, **kwargs: Any)

   Bases: :py:obj:`larvaworld.lib.process.dataset.BaseLarvaDataset`, :py:obj:`movingpandas.TrajectoryCollection`


   An alternative way to maintain a dataset is this "trajectory" mode, which inherits from
   ``movingpandas.TrajectoryCollection`` and is adjusted to the needs of the larva-tracking data format.

   It has the following powerful aspects:

   - Straightforward computation of distance, velocity, acceleration and other spatial metrics.
     Scaled metrics are also computed by dividing by the agent's body length.
   - Both the timeseries and endpoint data are kept in pint-pandas dataframes supporting SI units.
   - Shapely objects for easy drawing. These are stored as normal columns in the timeseries GeoDataFrames.
     All columns have a defined unit dtype, except those holding shapely objects, which do not support units.
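
   The body-length scaling described above can be illustrated without any geospatial dependency.
   This is a minimal sketch only: ``scale_to_body_length`` is a hypothetical helper, not part of the class,
   and it assumes that scaling divides by the mean body length.

   .. code-block:: python

      from statistics import mean


      def scale_to_body_length(values, lengths):
          """Divide each value by the mean body length (hypothetical helper)."""
          body_length = mean(lengths)
          return [v / body_length for v in values]


      velocity = [0.001, 0.002, 0.004]  # m/s
      length = [0.004, 0.004, 0.004]  # body length in m at each timestep
      scaled_velocity = scale_to_body_length(velocity, length)  # body lengths per second

   A scaled velocity of 1.0 then means the agent covers one body length per second.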


   .. py:method:: init_gdf(step: pandas.DataFrame, dt: float) -> geopandas.GeoDataFrame

      Initialize GeoDataFrame from step data.

      Converts a step DataFrame with trajectory data into a GeoDataFrame
      with datetime index and geometry column for spatial analysis.

      Args:
          step: Step DataFrame with columns 'x', 'y', and either 'datetime',
                't', or 'Step'. Multi-indexed or single-indexed.
          dt: Timestep duration in seconds for converting Step to datetime.

      Returns:
          GeoDataFrame with datetime index, 'xy' Point geometry column,
          and all original columns preserved with pint units where applicable.

      Example:
          >>> gdf = self.init_gdf(step_df, dt=0.1)
          >>> gdf.geometry  # 'xy' column with Point objects
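
      The Step-to-datetime conversion can be sketched with the standard library alone.
      The helper name ``steps_to_datetimes`` and the epoch origin are assumptions made for
      illustration; the origin actually used by the method may differ.

      .. code-block:: python

         from datetime import datetime, timedelta

         EPOCH = datetime(1970, 1, 1)  # assumed origin, for illustration only


         def steps_to_datetimes(steps, dt, origin=EPOCH):
             """Map integer step ticks to timestamps, given a constant timestep dt (s)."""
             return [origin + timedelta(seconds=step * dt) for step in steps]


         # Ticks need not be contiguous: tick 10 lands one second after the origin
         stamps = steps_to_datetimes([0, 1, 2, 10], dt=0.1)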


   .. py:method:: init_mpd(step: pandas.DataFrame, dt: float) -> None

      Initialize movingpandas TrajectoryCollection.

      Converts step DataFrame into movingpandas TrajectoryCollection
      with proper units (pint) and grouped by AgentID.

      Args:
          step: Step DataFrame with trajectory data (x, y, AgentID).
          dt: Timestep duration in seconds.

      Side Effects:
          Initializes self as TrajectoryCollection with trajectories
          for each agent, spatial data converted to pint units.

      Example:
          >>> self.init_mpd(step_df, dt=0.1)
          >>> for traj in self:  # Iterate over agent trajectories
          ...     print(traj.id, traj.get_length())
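
      Grouping by AgentID amounts to splitting the flat step rows into one time-ordered
      sequence per agent. A dependency-free sketch of that idea follows; ``group_by_agent``
      and the row layout are hypothetical, not the structures used internally.

      .. code-block:: python

         from collections import defaultdict

         # Flat rows as (AgentID, Step, x, y), interleaved across agents
         rows = [
             ("Larva_0", 0, 0.0, 0.0),
             ("Larva_1", 0, 1.0, 1.0),
             ("Larva_0", 1, 0.3, 0.4),
             ("Larva_1", 1, 1.0, 2.0),
         ]


         def group_by_agent(rows):
             """Collect (Step, x, y) samples per agent, sorted by Step."""
             trajs = defaultdict(list)
             for agent_id, step, x, y in rows:
                 trajs[agent_id].append((step, x, y))
             for samples in trajs.values():
                 samples.sort()
             return dict(trajs)


         trajs = group_by_agent(rows)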


   .. py:method:: set_data(step: Optional[pandas.DataFrame] = None, end: Optional[pandas.DataFrame] = None, **kwargs: Any) -> None

      Drop the NaN timesteps. This is convenient because geopandas easily handles trajectories of unequal length.

      Build three shapely geometries:

      - The midline xy points are converted into a LineString.
      - The contour xy points are converted into a Polygon.
      - The position xy is converted into a Point.

      All of these are timeseries over the entire dataset duration.
      Eventually, the position xy timeseries is set as the geometry required by the movingpandas Trajectory instance.

      By dropping the index levels to columns we can specify:

      - a column named "t" for timing, converted to a datetime type using the timestep and the index ticks
      - a column named "AgentID", used to group the trajectories by agent

      The midline length at every timestep is computed directly as the LineString length.
      The mean length will be used for scaling.
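
      Two of these steps, dropping NaN timesteps and measuring the midline, can be sketched
      in plain Python. The helper names are hypothetical; ``polyline_length`` computes the
      sum of segment lengths, which is the quantity a LineString reports as its length.

      .. code-block:: python

         import math


         def drop_nan_steps(track):
             """Keep only the (tick, x, y) samples whose position is defined."""
             return [(tick, x, y) for tick, x, y in track
                     if not (math.isnan(x) or math.isnan(y))]


         def polyline_length(points):
             """Sum of the straight-line distances between consecutive points."""
             return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


         track = [(0, 0.0, 0.0), (1, math.nan, math.nan), (2, 1.0, 1.0)]
         valid = drop_nan_steps(track)  # tick 1 is removed

         midline = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]  # midline points at one timestep
         length = polyline_length(midline)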


   .. py:property:: traj_dic


   .. py:property:: dt_mag


   .. py:property:: iter


   .. py:method:: add_transposed_traj(name='xy_origin')

      Compute the transposed trajectory, which starts from the center,
      and plot the original and the transposed trajectories.
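
      A minimal sketch of such a transposition, assuming that "center" means the origin
      (0, 0) and that the shift is a pure translation by the first position.
      ``transpose_to_origin`` is a hypothetical helper, not the method's implementation.

      .. code-block:: python

         def transpose_to_origin(xy):
             """Translate a trajectory so that its first point sits at (0, 0)."""
             x0, y0 = xy[0]
             return [(x - x0, y - y0) for x, y in xy]


         track = [(2.0, 3.0), (2.5, 3.0), (2.5, 4.0)]
         shifted = transpose_to_origin(track)

      Because only a translation is applied, distances and headings along the path are unchanged.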


   .. py:method:: add_speed(name=None, **kwargs)

      Add speed column and values to the trajectories.

      Speed is calculated as CRS units per second, except if the CRS is geographic
      (e.g. EPSG:4326 WGS84) then speed is calculated in meters per second.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing speed values (default: False)
      name : str
          Name of the speed column (default: "speed")
      units : tuple(str)
          Units in which to calculate speed

          distance : str
              Abbreviation for the distance unit
              (default: CRS units, or metres if geographic)
          time : str
              Abbreviation for the time unit (default: seconds)

          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1). If set to `None`,
          the number of processes will be set to `os.cpu_count()`
          (or `os.process_cpu_count()` in Python 3.13+), enabling full CPU
          utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.

      Raises
      ------
      ValueError
          If both `n_processes` and the deprecated `n_threads` are provided.
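
      For a projected (non-geographic) CRS, speed reduces to the distance between
      consecutive samples divided by the elapsed time. The sketch below illustrates that
      arithmetic only; ``segment_speeds`` is a hypothetical helper that returns one value
      per segment, and it is not the movingpandas implementation.

      .. code-block:: python

         import math
         from datetime import datetime, timedelta


         def segment_speeds(points):
             """Speed in CRS units per second for each pair of consecutive samples.

             points: time-sorted list of (timestamp, x, y) with distinct timestamps.
             """
             speeds = []
             for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
                 seconds = (t1 - t0).total_seconds()
                 speeds.append(math.hypot(x1 - x0, y1 - y0) / seconds)
             return speeds


         start = datetime(2024, 1, 1)
         points = [
             (start, 0.0, 0.0),
             (start + timedelta(seconds=1), 3.0, 4.0),
             (start + timedelta(seconds=3), 3.0, 10.0),
         ]
         speeds = segment_speeds(points)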


   .. py:method:: add_distance(name=None, **kwargs)

      Add distance column and values to the trajectories.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing distance values (default: False)
      name : str
          Name of the distance column (default: "distance")
      units : str
          Units in which to calculate distance values (default: CRS units)
          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1). If set to `None`,
          the number of processes will be set to `os.cpu_count()`
          (or `os.process_cpu_count()` in Python 3.13+), enabling full CPU
          utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.


   .. py:method:: add_acceleration(name=None, **kwargs)

      Add acceleration column and values to the trajectories.

      Acceleration is calculated as CRS units per second squared,
      except if the CRS is geographic (e.g. EPSG:4326 WGS84) then acceleration is
      calculated in meters per second squared.

      Parameters
      ----------
      overwrite : bool
          Whether to overwrite existing acceleration values (default: False)
      name : str
          Name of the acceleration column (default: "acceleration")
      units : tuple(str)
          Units in which to calculate acceleration

          distance : str
              Abbreviation for the distance unit
              (default: CRS units, or metres if geographic)
          time : str
              Abbreviation for the time unit (default: seconds)
          time2 : str
              Abbreviation for the second time unit (default: seconds)

          For more info, check the list of supported units at
          https://movingpandas.org/units
      n_processes : int or None, optional
          Number of processes to use for computation (default: 1). If set to `None`,
          the number of processes will be set to `os.cpu_count()`
          (or `os.process_cpu_count()` in Python 3.13+), enabling full CPU
          utilization via multiprocessing.
      n_threads : int, optional
          DEPRECATED. Use `n_processes` instead. This parameter will be
          removed in a future version.


   .. py:method:: scale_to_length(pars=None, ks=None)


   .. py:method:: get_means(pars=None, ks=None)


   .. py:property:: dtypes


   .. py:method:: drop_xy_Nones()


   .. py:method:: detect_pauses(max_scaled_diameter=0.3, min_duration=timedelta(seconds=1))

      Annotates crawl-pauses in the trajectory timeseries.

      Parameters
      ----------
      max_scaled_diameter : float
          Maximum spatial extent (diameter) of a pause, scaled to the agent's body length (default: 0.3)
      min_duration : datetime.timedelta
          The minimum required duration for a pause (default: 1 second)

      Returns
      -------
      pauses : list
          A list of pairs of the start-end indices of the pauses.
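
      Going by the signature, a pause appears to be a stretch during which the agent stays
      within a maximum spatial extent (scaled to body length) for at least a minimum
      duration. The sketch below implements a simple greedy detector under that assumption.
      ``find_pauses``, ``diameter`` and ``body_length`` are hypothetical names, and this is
      not the method's actual implementation.

      .. code-block:: python

         import math
         from datetime import datetime, timedelta


         def diameter(points):
             """Largest pairwise distance between the xy positions of the samples."""
             return max(math.dist(p[1:], q[1:]) for p in points for q in points)


         def find_pauses(points, max_diameter, min_duration):
             """Return (start, end) index pairs of pauses in time-sorted (t, x, y) samples."""
             pauses, start, n = [], 0, len(points)
             while start < n:
                 end = start
                 # Grow the window while its spatial extent stays within max_diameter
                 while end + 1 < n and diameter(points[start:end + 2]) <= max_diameter:
                     end += 1
                 if points[end][0] - points[start][0] >= min_duration:
                     pauses.append((start, end))
                     start = end + 1
                 else:
                     start += 1
             return pauses


         t0 = datetime(2024, 1, 1)
         xy = [(0, 0), (1, 0), (2, 0), (2.05, 0), (2, 0.05), (2.02, 0), (3, 0)]
         points = [(t0 + timedelta(seconds=0.5 * i), x, y) for i, (x, y) in enumerate(xy)]

         body_length = 1.0  # assumed mean body length, in the same units as xy
         pauses = find_pauses(points, max_diameter=0.3 * body_length,
                              min_duration=timedelta(seconds=1))

      Here the agent lingers near (2, 0) from sample 2 to sample 5, which is the only pause reported.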


   .. py:property:: spatial_unit


   .. py:property:: spatial_pint_unit


   .. py:property:: temporal_pint_unit


   .. py:method:: cols_exist_in_all_traj(cols)


   .. py:method:: time_to_datetime(t)


   .. py:method:: get_locations_at(t)

      Returns a GeoDataFrame with the trajectory locations at the specified timestamp.

      Parameters
      ----------
      t : datetime.datetime
          Timestamp to extract trajectory locations for

      Returns
      -------
      GeoDataFrame
          Trajectory locations at timestamp t


   .. py:method:: get_locations_at_tick(tick)


   .. py:method:: get_segments_between(t1, t2)

      Return Trajectory segments between times t1 and t2.

      Parameters
      ----------
      t1 : datetime.datetime
          Start time for the segments
      t2 : datetime.datetime
          End time for the segments

      Returns
      -------
      TrajectoryCollection
          Extracted trajectory segments


   .. py:method:: get_complete_segments_between(t1, t2)


   .. py:method:: get_segments_between_ticks(tick1, tick2)


   .. py:method:: get_length_from_traj_with_nans(traj)


   .. py:method:: build_endpoint_data(e: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

      Build endpoint metrics for all trajectories.

      Computes summary statistics for each trajectory including cumulative
      distance, duration, start/end positions, and temporal extent.

      Args:
          e: Optional existing endpoint DataFrame to update. If None,
             creates new DataFrame with AgentID index.

      Returns:
          DataFrame indexed by AgentID with columns:
          - cum_d: Cumulative distance traveled (spatial units)
          - cum_t: Total duration (seconds)
          - t0, t_fin: Start and end timestamps
          - x0, x_fin, y0, y_fin: Start and end coordinates
          - group: Group ID from configuration

      Example:
          >>> endpoint = self.build_endpoint_data()
          >>> print(endpoint[['cum_d', 'cum_t']])  # Distance and time per agent
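
      The per-agent summary can be sketched in plain Python to show how the listed columns
      relate to a trajectory. ``endpoint_metrics`` and the input layout are hypothetical,
      and units and the ``group`` column are omitted for brevity.

      .. code-block:: python

         import math


         def endpoint_metrics(points):
             """Summarise one trajectory given as time-sorted (t_seconds, x, y) samples."""
             (t0, x0, y0), (t_fin, x_fin, y_fin) = points[0], points[-1]
             cum_d = sum(math.dist(p[1:], q[1:]) for p, q in zip(points, points[1:]))
             return {
                 "cum_d": cum_d, "cum_t": t_fin - t0,
                 "t0": t0, "t_fin": t_fin,
                 "x0": x0, "x_fin": x_fin, "y0": y0, "y_fin": y_fin,
             }


         trajs = {
             "Larva_0": [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 10.0)],
             "Larva_1": [(0.5, 1.0, 1.0), (1.5, 1.0, 2.0)],
         }
         endpoint = {agent: endpoint_metrics(pts) for agent, pts in trajs.items()}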


   .. py:method:: load_midline(drop: bool = True, keep_midline_LineString: bool = False) -> None

      Load and process midline geometry data.

      Computes midline length from tracked midline points, either keeping
      the LineString geometry or just computing the length metric.

      Args:
          drop: If True, drops midline xy columns after processing
                to save memory.
          keep_midline_LineString: If True, keeps 'midline' column with
                                   shapely LineString objects. If False,
                                   only computes and stores length values.

      Side Effects:
          - Adds 'length' column to each trajectory DataFrame
          - Adds mean 'length' to endpoint_data
          - Optionally adds 'midline' LineString column
          - Drops midline xy columns if drop=True

      Example:
          >>> self.load_midline(drop=True, keep_midline_LineString=False)
          >>> mean_length = self.endpoint_data['length'].mean()


   .. py:method:: load_contour(drop=True)


   .. py:method:: set_dtype(cols, units)


   .. py:method:: from_ID(refID, **kwargs)
      :classmethod:


   .. py:method:: path_to_file(file='geostep')


   .. py:method:: save(refID=None)


   .. py:property:: df


   .. py:method:: get_step_data()


   .. py:property:: duration


   .. py:method:: interpolate_traj(dt: float = 0.1) -> pandas.DataFrame

      Interpolate trajectories to uniform timestep grid.

      Resamples all trajectories to a common temporal grid with specified
      timestep, enabling synchronized multi-agent analysis.

      Args:
          dt: Target timestep duration in seconds for interpolation.
              Default is 0.1 seconds (10 Hz).

      Returns:
          Multi-indexed DataFrame with (Step, AgentID) index containing
          interpolated x, y coordinates at uniform time intervals.
          Missing values are NaN for agents not present at those times.

      Side Effects:
          Updates endpoint_data with tick0, tick1, N_ticks columns
          indicating valid time ranges for each agent.

      Example:
          >>> interp_df = self.interpolate_traj(dt=0.1)
          >>> interp_df.loc[10]  # All agents at timestep 10
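
      The resampling idea can be sketched for a single agent with linear interpolation,
      which is an assumption here since the interpolation scheme is not stated.
      ``interpolate_to_grid`` is a hypothetical helper; the last two lines mirror the
      tick0, tick1 and N_ticks bookkeeping described above.

      .. code-block:: python

         import math


         def interpolate_to_grid(points, dt, n_ticks):
             """Linearly resample (t, x, y) samples onto ticks 0..n_ticks-1 spaced dt apart.

             Ticks outside the sampled time range get NaN coordinates.
             Assumes time-sorted samples with distinct timestamps.
             """
             grid = []
             for tick in range(n_ticks):
                 t = tick * dt
                 xy = (math.nan, math.nan)
                 for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
                     if t0 <= t <= t1:
                         frac = (t - t0) / (t1 - t0)
                         xy = (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
                         break
                 grid.append(xy)
             return grid


         points = [(0.5, 0.0, 0.0), (2.5, 4.0, 8.0)]  # (seconds, x, y)
         grid = interpolate_to_grid(points, dt=0.5, n_ticks=7)

         # First and last valid tick, and the number of valid ticks, for this agent
         valid = [tick for tick, (x, _) in enumerate(grid) if not math.isnan(x)]
         tick0, tick1, n_valid = valid[0], valid[-1], len(valid)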


   .. py:method:: load_traj()


   .. py:method:: load(**kwargs)


   .. py:method:: match_ids()


   .. py:method:: comp_spatial()