Bout analysis

This brief demo shows bout analysis with the bouts module, which models data generated by mixtures of random Poisson processes.

Set up the environment. Consider loading the logging module and setting up a logger to monitor progress in this section.

# Set up
import importlib.resources as rsrc
import pandas as pd
import matplotlib.pyplot as plt
from skdiveMove import calibrate
import skdiveMove.bouts as skbouts

# Declare figure sizes
_FIG1X1 = (7, 6)
_FIG1X2 = (12, 5)
_FIG3X1 = (11, 11)
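The logging suggestion above could be set up along these lines. This is a minimal sketch using only the standard library; the logger name `skdiveMove` is an assumption about how the package names its loggers, so adjust it if no messages appear.

```python
import logging

# Send log records to the console with a timestamp, level, and logger name.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Assumption: the package logs under a logger named after itself.
logger = logging.getLogger("skdiveMove")
logger.setLevel(logging.INFO)
logger.info("Starting bout analysis demo")
```

Raising the level to `logging.DEBUG` would show more detail, if the package emits debug records.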

Calculate postdive duration

Create a TDR object to easily calculate the necessary statistics:

config_file = (rsrc.files("skdiveMove") / "config_examples" /
               "ag_mk7_2002_022_config.json")
tdr_file = (rsrc.files("skdiveMove") / "tests" /
            "data" / "ag_mk7_2002_022.nc")
tdrX = calibrate(tdr_file, config_file)
stats = tdrX.dive_stats()
stamps = tdrX.stamp_dives(ignore_z=True)
stats_tab = pd.concat((stamps, stats), axis=1)
stats_tab.info()

<class 'pandas.core.frame.DataFrame'>
Index: 426 entries, 1 to 426
Data columns (total 49 columns):
 #   Column               Non-Null Count  Dtype          
---  ------               --------------  -----          
 0   phase_id             426 non-null    int64
 1   beg                  426 non-null    datetime64[ns]
 2   end                  426 non-null    datetime64[ns]
 3   begdesc              426 non-null    datetime64[ns]
 4   enddesc              426 non-null    datetime64[ns]
 5   begasc               426 non-null    datetime64[ns]
 6   desctim              426 non-null    float64
 7   botttim              292 non-null    float64
 8   asctim               426 non-null    float64
 9   divetim              426 non-null    float64
 10  descdist             426 non-null    float64
 11  bottdist             292 non-null    float64
 12  ascdist              426 non-null    float64
 13  bottdep_mean         292 non-null    float64
 14  bottdep_median       292 non-null    float64
 15  bottdep_sd           292 non-null    float64
 16  maxdep               426 non-null    float64
 17  desc_tdist           202 non-null    float64
 18  desc_mean_speed      202 non-null    float64
 19  desc_angle           171 non-null    float64
 20  bott_tdist           292 non-null    float64
 21  bott_mean_speed      292 non-null    float64
 22  asc_tdist            160 non-null    float64
 23  asc_mean_speed       160 non-null    float64
 24  asc_angle            154 non-null    float64
 25  descD_mean           426 non-null    float64
 26  descD_std            426 non-null    float64
 27  descD_min            426 non-null    float64
 28  descD_25%            426 non-null    float64
 29  descD_50%            426 non-null    float64
 30  descD_75%            426 non-null    float64
 31  descD_max            426 non-null    float64
 32  bottD_mean           292 non-null    float64
 33  bottD_std            292 non-null    float64
 34  bottD_min            292 non-null    float64
 35  bottD_25%            292 non-null    float64
 36  bottD_50%            292 non-null    float64
 37  bottD_75%            292 non-null    float64
 38  bottD_max            292 non-null    float64
 39  ascD_mean            426 non-null    float64
 40  ascD_std             425 non-null    float64
 41  ascD_min             426 non-null    float64
 42  ascD_25%             426 non-null    float64
 43  ascD_50%             426 non-null    float64
 44  ascD_75%             426 non-null    float64
 45  ascD_max             426 non-null    float64
 46  postdive_dur         426 non-null    timedelta64[ns]
 47  postdive_tdist       420 non-null    float64
 48  postdive_mean_speed  420 non-null    float64
dtypes: datetime64[ns](5), float64(42), int64(1), timedelta64[ns](1)
memory usage: 166.4 KB

Extract the postdive durations and compute the absolute differences between successive values for further analysis.

postdives = stats_tab["postdive_dur"][stats_tab["phase_id"] == 4]
postdives_diff = (postdives.dt.total_seconds()
                  .diff().iloc[1:].abs())
# Remove isolated dives
postdives_diff = postdives_diff[postdives_diff < 2000]

Non-linear least squares via “broken-stick” model

skdiveMove provides the bouts.BoutsNLS class for fitting non-linear least squares (NLS) models to a modified histogram of a given variable.

The first step is to generate a modified histogram of postdive duration, and this requires choosing the bin width for the histogram.

postdives_nlsbouts = skbouts.BoutsNLS(postdives_diff, 0.1)
print(postdives_nlsbouts)

Class BoutsNLS object
histogram method:    standard
log-frequency histogram:
              x  lnfreq
count    46.000  46.000
mean    307.983  -4.079
std     383.530   2.119
min       0.050  -8.216
25%      62.550  -5.521
50%     132.550  -3.912
75%     351.300  -3.219
max    1449.950   3.091

Two-process model

Assuming a two-process model, calculate starting values, supplying 50 s as a guess for the interdive interval at which the two processes separate.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars2 = postdives_nlsbouts.init_pars([50], plot=True, ax=ax)

Fit the two-process model.

coefs2, pcov2 = postdives_nlsbouts.fit(init_pars2)
# Coefficients
print(coefs2)

        (0.049, 50.0]  (50.0, 1449.95]
a              25.756            8.307
lambda          0.119            0.003

# Covariance between parameters
print(pcov2)

[[+1.180e+02 +1.713e-01 +3.055e-01 +6.586e-05]
 [+1.713e-01 +8.032e-04 +6.607e-03 +1.584e-06]
 [+3.055e-01 +6.607e-03 +1.690e+00 +6.461e-05]
 [+6.586e-05 +1.584e-06 +6.461e-05 +1.542e-07]]
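Approximate standard errors can be read off the covariance matrix as the square roots of its diagonal. The sketch below copies the rounded values printed above; the parameter order (`a1`, `lambda1`, `a2`, `lambda2`) is an assumption inferred from the magnitudes, not something the output states.

```python
import math

# Covariance matrix as printed above (rounded to four significant figures).
pcov2 = [
    [1.180e+02, 1.713e-01, 3.055e-01, 6.586e-05],
    [1.713e-01, 8.032e-04, 6.607e-03, 1.584e-06],
    [3.055e-01, 6.607e-03, 1.690e+00, 6.461e-05],
    [6.586e-05, 1.584e-06, 6.461e-05, 1.542e-07],
]

# Assumed parameter order; check it against the fitting routine's documentation.
names = ["a1", "lambda1", "a2", "lambda2"]

# Standard error of each parameter = sqrt of its variance (diagonal entry).
std_errs = {name: math.sqrt(pcov2[i][i]) for i, name in enumerate(names)}
for name, se in std_errs.items():
    print(f"{name}: {se:.4g}")
```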

Calculate bout-ending criterion.

# `bec` returns ndarray, and we have only one here
print("bec = {[0]:.2f}".format(postdives_nlsbouts.bec(coefs2)))

bec = 41.70
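As a rough cross-check, the two-process model and its bout-ending criterion can be written out by hand. This sketch assumes the conventional parameterization from the bout-analysis literature, where the log frequency is the log of a sum of two exponential terms and the BEC is the interval at which both terms contribute equally. It is not the library's implementation, and the function names are hypothetical.

```python
import math


def log_freq(t, a1, lambda1, a2, lambda2):
    """Log frequency of a two-process mixture at interval length t."""
    fast = a1 * lambda1 * math.exp(-lambda1 * t)
    slow = a2 * lambda2 * math.exp(-lambda2 * t)
    return math.log(fast + slow)


def bec_two_process(a1, lambda1, a2, lambda2):
    """Interval at which the fast and slow components are equal."""
    return math.log((a1 * lambda1) / (a2 * lambda2)) / (lambda1 - lambda2)


# Rounded coefficients as printed above.
a1, lambda1, a2, lambda2 = 25.756, 0.119, 8.307, 0.003
bec_approx = bec_two_process(a1, lambda1, a2, lambda2)
print(f"approximate bec = {bec_approx:.1f}")
```

Because the inputs are rounded to three decimals, the result only approximates the reported 41.70; `lambda2` in particular carries a single significant digit.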

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs2, ax=ax);

Three-process model

Attempt to discern three processes in the data.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars3 = postdives_nlsbouts.init_pars([50, 550], plot=True, ax=ax)

Fit the three-process model.

coefs3, pcov3 = postdives_nlsbouts.fit(init_pars3)
# Coefficients
print(coefs3)

        (0.049, 50.0]  (50.0, 550.0]  (550.0, 1449.95]
a              29.812          7.094             3.913
lambda          0.254          0.014             0.001

# Covariance between parameters
print(pcov3)

[[+2.241e+02 +4.746e-01 -4.352e-01 -1.852e-03 -1.422e-01 -1.187e-04]
 [+4.746e-01 +6.365e-03 +3.179e-02 +9.877e-05 +6.796e-03 +5.591e-06]
 [-4.352e-01 +3.179e-02 +2.749e+00 +1.380e-03 -2.567e-01 -3.133e-04]
 [-1.852e-03 +9.877e-05 +1.380e-03 +2.145e-05 +2.135e-03 +1.889e-06]
 [-1.422e-01 +6.796e-03 -2.567e-01 +2.135e-03 +9.747e-01 +2.138e-04]
 [-1.187e-04 +5.591e-06 -3.133e-04 +1.889e-06 +2.138e-04 +5.340e-07]]

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs3, ax=ax);

Compare the cumulative frequency distributions of two- vs three-process models.

fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
postdives_nlsbouts.plot_ecdf(coefs3, ax=axs[1]);

The three-process model does not follow the observed data as well as the two-process model.

Maximum likelihood estimation

An alternative way to model Poisson mixtures is to fit them via maximum likelihood estimation (MLE), which does not rely on a subjectively constructed histogram and involves fewer parameters. This approach is available in bouts.BoutsMLE.
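To make the idea concrete, here is a minimal sketch of the negative log-likelihood for a two-process mixture of exponentials, with a mixing proportion `p` and rates `lambda1` and `lambda2` (three parameters, against four for the two-process NLS model). The function name and the toy intervals are hypothetical, and this is the textbook form rather than the library's code.

```python
import math


def neg_log_likelihood(intervals, p, lambda1, lambda2):
    """Negative log-likelihood of a two-process exponential mixture."""
    total = 0.0
    for t in intervals:
        # Mixture density: fast process with weight p, slow with 1 - p.
        density = (p * lambda1 * math.exp(-lambda1 * t)
                   + (1 - p) * lambda2 * math.exp(-lambda2 * t))
        total -= math.log(density)
    return total


# Hypothetical intervals in seconds: a few short ones and two long ones.
toy_intervals = [2.0, 5.0, 8.0, 120.0, 400.0]
nll = neg_log_likelihood(toy_intervals, p=0.7, lambda1=0.07, lambda2=0.004)
print(f"negative log-likelihood = {nll:.3f}")
```

An optimizer then searches for the `p`, `lambda1`, and `lambda2` that minimize this quantity, working directly on the observed intervals rather than on binned counts.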

Set up an instance.

postdives_mlebouts = skbouts.BoutsMLE(postdives_diff, 0.1)
print(postdives_mlebouts)

Class BoutsMLE object
histogram method:    standard
log-frequency histogram:
              x  lnfreq
count    46.000  46.000
mean    307.983  -4.079
std     383.530   2.119
min       0.050  -8.216
25%      62.550  -5.521
50%     132.550  -3.912
75%     351.300  -3.219
max    1449.950   3.091

Again, assuming a 2-process model, calculate starting values.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars = postdives_mlebouts.init_pars([50], plot=True, ax=ax)

Fit the two-process model. Supplying bounds is optional, but reasonable bounds help the optimization algorithm, which may otherwise fail to converge. The fitting procedure is done in two steps: first with a reparameterized log-likelihood function, then without it. Two sets of bounds are therefore required, one for each step.

# Bounds for the first fit (reparameterized log-likelihood)
p_bnd = (-2, None)                 # bounds for `p`
lda1_bnd = (-5, None)              # bounds for `lambda1`
lda2_bnd = (-10, None)             # bounds for `lambda2`
bnd1 = (p_bnd, lda1_bnd, lda2_bnd)
# Bounds for the second fit (parameters on their original scale)
p_bnd = (1e-2, None)
lda1_bnd = (1e-4, None)
lda2_bnd = (1e-8, None)
bnd2 = (p_bnd, lda1_bnd, lda2_bnd)
fit1, fit2 = postdives_mlebouts.fit(init_pars,
                                    fit1_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd1),
                                    fit2_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd2))

# First fit
print(fit1)

  message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
  success: True
   status: 0
      fun: 917.8524699061169
        x: [ 8.264e-01 -2.690e+00 -5.629e+00]
      nit: 7
      jac: [-1.478e-04 -1.705e-04  0.000e+00]
     nfev: 36
     njev: 9
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>

# Second fit
print(fit2)

  message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
  success: True
   status: 0
      fun: 917.8524699061167
        x: [ 6.956e-01  6.791e-02  3.592e-03]
      nit: 1
      jac: [-7.049e-04 -2.342e-03  1.579e-02]
     nfev: 28
     njev: 7
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
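The two solutions describe the same optimum on different scales. The printed values are consistent with a logit transform for `p` and a log transform for the rates in the first fit. That mapping is an assumption inferred from the numbers, checked here with the rounded values shown above.

```python
import math

# Solutions as printed above (rounded to four significant figures).
fit1_x = [8.264e-01, -2.690e+00, -5.629e+00]   # transformed scale
fit2_x = [6.956e-01, 6.791e-02, 3.592e-03]     # original scale


def inv_logit(z):
    """Map a real number to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumed mapping: logit for p, natural log for both rates.
back = [inv_logit(fit1_x[0]), math.exp(fit1_x[1]), math.exp(fit1_x[2])]
for name, b, f in zip(("p", "lambda1", "lambda2"), back, fit2_x):
    print(f"{name}: back-transformed {b:.4g}, second fit {f:.4g}")
```

Under that mapping, the negative lower bounds in `bnd1` sit on the transformed scale, while those in `bnd2` apply to the parameters themselves.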

Calculate bout-ending criterion (BEC).

print("bec = {:.2f}".format(postdives_mlebouts.bec(fit2)))

bec = 58.55
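The reported value can be checked by hand from the second fit. The sketch assumes the usual two-process criterion, the interval at which the two weighted exponential densities are equal; the function name is hypothetical and the inputs are the rounded estimates printed above.

```python
import math


def bec_mle(p, lambda1, lambda2):
    """Interval at which the fast and slow mixture components are equal."""
    return math.log((p * lambda1) / ((1 - p) * lambda2)) / (lambda1 - lambda2)


# Estimates from the second fit, as printed above.
p, lambda1, lambda2 = 6.956e-01, 6.791e-02, 3.592e-03
bec_check = bec_mle(p, lambda1, lambda2)
print(f"bec = {bec_check:.2f}")
```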

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_mlebouts.plot_fit(fit2, ax=ax);

Compare the cumulative frequency distribution between NLS and MLE model estimates.

fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
axs[0].set_title("NLS")
postdives_mlebouts.plot_ecdf(fit2, ax=axs[1])
axs[1].set_title("MLE");

Label bouts based on the BEC from the last MLE model. Note that the Timedelta values need to be converted to total seconds to allow comparison with the BEC.

bec = postdives_mlebouts.bec(fit2)
skbouts.label_bouts(postdives.dt.total_seconds(), bec, as_diff=True)

   1
   2
   3
   4
   4
       ..
  51
  51
  51
  52
  52
Name: postdive_dur, Length: 191, dtype: int64
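Conceptually, labelling with `as_diff=True` amounts to starting a new bout whenever the absolute change between successive values exceeds the BEC. The pure-Python sketch below illustrates that idea on made-up numbers. It is an assumption about the behavior rather than the library's implementation, and `label_bouts_sketch` is a hypothetical name.

```python
def label_bouts_sketch(values, bec):
    """Assign increasing bout numbers, splitting where the step exceeds bec."""
    labels = [1]
    for prev, curr in zip(values, values[1:]):
        step = abs(curr - prev)
        # A step larger than the criterion starts a new bout.
        labels.append(labels[-1] + 1 if step > bec else labels[-1])
    return labels


# Hypothetical postdive durations in seconds.
toy_durations = [10.0, 15.0, 200.0, 205.0, 12.0, 14.0]
toy_labels = label_bouts_sketch(toy_durations, bec=58.55)
print(toy_labels)
```

Runs of equal labels, like the repeated values in the output above, mark observations that belong to the same bout.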

Feel free to download a copy of this demo (demo_bouts.py).