Bout analysis#

Here is a brief demo on bout analysis with the bouts module for data generated by mixtures of random Poisson processes.

Set up the environment. Consider loading the logging module and setting up a logger to monitor progress to this section.

 1# Set up
 2import importlib.resources as rsrc
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from skdiveMove import calibrate
 6import skdiveMove.bouts as skbouts
 7
 8# Declare figure sizes
 9_FIG1X1 = (7, 6)
10_FIG1X2 = (12, 5)
11_FIG3X1 = (11, 11)

Calculate postdive duration#

Create a TDR object to easily calculate the necessary statistics:

17config_file = (rsrc.files("skdiveMove") / "config_examples" /
18               "ag_mk7_2002_022_config.json")
19tdr_file = (rsrc.files("skdiveMove") / "tests" /
20            "data" / "ag_mk7_2002_022.nc")
21tdrX = calibrate(tdr_file, config_file)
22stats = tdrX.dive_stats()
23stamps = tdrX.stamp_dives(ignore_z=True)
24stats_tab = pd.concat((stamps, stats), axis=1)
25stats_tab.info()
<class 'pandas.core.frame.DataFrame'>
Index: 426 entries, 1 to 426
Data columns (total 49 columns):
 #   Column               Non-Null Count  Dtype          
---  ------               --------------  -----          
 0   phase_id             426 non-null    int64          
 1   beg                  426 non-null    datetime64[ns] 
 2   end                  426 non-null    datetime64[ns] 
 3   begdesc              426 non-null    datetime64[ns] 
 4   enddesc              426 non-null    datetime64[ns] 
 5   begasc               426 non-null    datetime64[ns] 
 6   desctim              426 non-null    float64        
 7   botttim              292 non-null    float64        
 8   asctim               426 non-null    float64        
 9   divetim              426 non-null    float64        
 10  descdist             426 non-null    float64        
 11  bottdist             292 non-null    float64        
 12  ascdist              426 non-null    float64        
 13  bottdep_mean         292 non-null    float64        
 14  bottdep_median       292 non-null    float64        
 15  bottdep_sd           292 non-null    float64        
 16  maxdep               426 non-null    float64        
 17  desc_tdist           202 non-null    float64        
 18  desc_mean_speed      202 non-null    float64        
 19  desc_angle           171 non-null    float64        
 20  bott_tdist           292 non-null    float64        
 21  bott_mean_speed      292 non-null    float64        
 22  asc_tdist            160 non-null    float64        
 23  asc_mean_speed       160 non-null    float64        
 24  asc_angle            154 non-null    float64        
 25  descD_mean           426 non-null    float64        
 26  descD_std            426 non-null    float64        
 27  descD_min            426 non-null    float64        
 28  descD_25%            426 non-null    float64        
 29  descD_50%            426 non-null    float64        
 30  descD_75%            426 non-null    float64        
 31  descD_max            426 non-null    float64        
 32  bottD_mean           292 non-null    float64        
 33  bottD_std            292 non-null    float64        
 34  bottD_min            292 non-null    float64        
 35  bottD_25%            292 non-null    float64        
 36  bottD_50%            292 non-null    float64        
 37  bottD_75%            292 non-null    float64        
 38  bottD_max            292 non-null    float64        
 39  ascD_mean            426 non-null    float64        
 40  ascD_std             425 non-null    float64        
 41  ascD_min             426 non-null    float64        
 42  ascD_25%             426 non-null    float64        
 43  ascD_50%             426 non-null    float64        
 44  ascD_75%             426 non-null    float64        
 45  ascD_max             426 non-null    float64        
 46  postdive_dur         426 non-null    timedelta64[ns]
 47  postdive_tdist       420 non-null    float64        
 48  postdive_mean_speed  420 non-null    float64        
dtypes: datetime64[ns](5), float64(42), int64(1), timedelta64[ns](1)
memory usage: 166.4 KB

Extract postdive duration for further analysis.

26postdives = stats_tab["postdive_dur"][stats_tab["phase_id"] == 4]
27postdives_diff = (postdives.dt.total_seconds()
28                  .diff().iloc[1:].abs())
29# Remove isolated dives
30postdives_diff = postdives_diff[postdives_diff < 2000]

Non-linear least squares via “broken-stick” model#

skdiveMove provides the bouts.BoutsNLS class for fitting non-linear least squares (NLS) models to a modified histogram of a given variable.

The first step is to generate a modified histogram of postdive duration, and this requires choosing the bin width for the histogram.

31postdives_nlsbouts = skbouts.BoutsNLS(postdives_diff, 0.1)
32print(postdives_nlsbouts)
Class BoutsNLS object
histogram method:    standard
log-frequency histogram:
              x  lnfreq
count    46.000  46.000
mean    307.983  -4.079
std     383.530   2.119
min       0.050  -8.216
25%      62.550  -5.521
50%     132.550  -3.912
75%     351.300  -3.219
max    1449.950   3.091

Two-process model#

Assuming a 2-process model, calculate starting values, providing a guess at 50 s interdive interval.

33fig, ax = plt.subplots(figsize=_FIG1X1)
34init_pars2 = postdives_nlsbouts.init_pars([50], plot=True, ax=ax)
../_images/demo_bouts_5_0.png

Fit the two-process model.

35coefs2, pcov2 = postdives_nlsbouts.fit(init_pars2)
36# Coefficients
37print(coefs2)
        (0.049, 50.0]  (50.0, 1449.95]
a              25.756            8.307
lambda          0.119            0.003
38# Covariance between parameters
39print(pcov2)
[[+1.180e+02 +1.713e-01 +3.055e-01 +6.586e-05]
 [+1.713e-01 +8.032e-04 +6.607e-03 +1.584e-06]
 [+3.055e-01 +6.607e-03 +1.690e+00 +6.461e-05]
 [+6.586e-05 +1.584e-06 +6.461e-05 +1.542e-07]]

Calculate bout-ending criterion.

40# `bec` returns ndarray, and we have only one here
41print("bec = {[0]:.2f}".format(postdives_nlsbouts.bec(coefs2)))
bec = 41.70

Plot the fit.

42fig, ax = plt.subplots(figsize=_FIG1X1)
43postdives_nlsbouts.plot_fit(coefs2, ax=ax);
../_images/demo_bouts_9_0.png

Three-process model#

Attempt to discern three processes in the data.

44fig, ax = plt.subplots(figsize=_FIG1X1)
45init_pars3 = postdives_nlsbouts.init_pars([50, 550], plot=True, ax=ax)
../_images/demo_bouts_10_0.png

Fit three-process model.

46coefs3, pcov3 = postdives_nlsbouts.fit(init_pars3)
47# Coefficients
48print(coefs3)
        (0.049, 50.0]  (50.0, 550.0]  (550.0, 1449.95]
a              29.812          7.094             3.913
lambda          0.254          0.014             0.001
49# Covariance between parameters
50print(pcov3)
[[+2.241e+02 +4.746e-01 -4.352e-01 -1.852e-03 -1.422e-01 -1.187e-04]
 [+4.746e-01 +6.365e-03 +3.179e-02 +9.877e-05 +6.796e-03 +5.591e-06]
 [-4.352e-01 +3.179e-02 +2.749e+00 +1.380e-03 -2.567e-01 -3.133e-04]
 [-1.852e-03 +9.877e-05 +1.380e-03 +2.145e-05 +2.135e-03 +1.889e-06]
 [-1.422e-01 +6.796e-03 -2.567e-01 +2.135e-03 +9.747e-01 +2.138e-04]
 [-1.187e-04 +5.591e-06 -3.133e-04 +1.889e-06 +2.138e-04 +5.340e-07]]

Plot the fit.

51fig, ax = plt.subplots(figsize=_FIG1X1)
52postdives_nlsbouts.plot_fit(coefs3, ax=ax);
../_images/demo_bouts_13_0.png

Compare the cumulative frequency distributions of two- vs three-process models.

53fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
54postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
55postdives_nlsbouts.plot_ecdf(coefs3, ax=axs[1]);
../_images/demo_bouts_14_0.png

The three-process model does not follow the observed data as well as the two-process model.

Maximum likelihood estimation#

Another way to model Poisson mixtures that does not rely on the subjectively created histogram, and involves fewer parameters, requires fitting via maximum likelihood estimation (MLE). This approach is available in bouts.BoutsMLE.

Set up an instance.

56postdives_mlebouts = skbouts.BoutsMLE(postdives_diff, 0.1)
57print(postdives_mlebouts)
Class BoutsMLE object
histogram method:    standard
log-frequency histogram:
              x  lnfreq
count    46.000  46.000
mean    307.983  -4.079
std     383.530   2.119
min       0.050  -8.216
25%      62.550  -5.521
50%     132.550  -3.912
75%     351.300  -3.219
max    1449.950   3.091

Again, assuming a 2-process model, calculate starting values.

58fig, ax = plt.subplots(figsize=_FIG1X1)
59init_pars = postdives_mlebouts.init_pars([50], plot=True, ax=ax)
../_images/demo_bouts_16_0.png

Fit the two-process model. It is important, but optional, to supply reasonable bounds to help the optimization algorithm. Otherwise, the algorithm may fail to converge. The fitting procedure is done in two steps: with and without a reparameterized log-likelihood function. Therefore, there are two sets of bounds required.

60p_bnd = (-2, None)                 # bounds for `p`
61lda1_bnd = (-5, None)              # bounds for `lambda1`
62lda2_bnd = (-10, None)             # bounds for `lambda2`
63bnd1 = (p_bnd, lda1_bnd, lda2_bnd)
64p_bnd = (1e-2, None)
65lda1_bnd = (1e-4, None)
66lda2_bnd = (1e-8, None)
67bnd2 = (p_bnd, lda1_bnd, lda2_bnd)
68fit1, fit2 = postdives_mlebouts.fit(init_pars,
69                                    fit1_opts=dict(method="L-BFGS-B",
70                                                   bounds=bnd1),
71                                    fit2_opts=dict(method="L-BFGS-B",
72                                                   bounds=bnd2))
73# First fit
74print(fit1)
  message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
  success: True
   status: 0
      fun: 917.8524699061169
        x: [ 8.264e-01 -2.690e+00 -5.629e+00]
      nit: 7
      jac: [-1.478e-04 -1.705e-04  0.000e+00]
     nfev: 36
     njev: 9
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
75# Second fit
76print(fit2)
  message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
  success: True
   status: 0
      fun: 917.8524699061167
        x: [ 6.956e-01  6.791e-02  3.592e-03]
      nit: 1
      jac: [-7.049e-04 -2.342e-03  1.579e-02]
     nfev: 28
     njev: 7
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>

Calculate bout-ending criterion (BEC).

77print("bec = {:.2f}".format(postdives_mlebouts.bec(fit2)))
bec = 58.55

Plot the fit.

78fig, ax = plt.subplots(figsize=_FIG1X1)
79postdives_mlebouts.plot_fit(fit2, ax=ax);
../_images/demo_bouts_21_0.png

Compare the cumulative frequency distribution between NLS and MLE model estimates.

80fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
81postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
82axs[0].set_title("NLS")
83postdives_mlebouts.plot_ecdf(fit2, ax=axs[1])
84axs[1].set_title("MLE");
../_images/demo_bouts_22_0.png

Label bouts based on BEC from the last MLE model. Note that Timedelta type needs to be converted to total seconds to allow comparison with BEC.

85bec = postdives_mlebouts.bec(fit2)
86skbouts.label_bouts(postdives.dt.total_seconds(), bec, as_diff=True)
236     1
237     2
238     3
239     4
240     4
       ..
422    51
423    51
424    51
425    52
426    52
Name: postdive_dur, Length: 191, dtype: int64

Feel free to download a copy of this demo (demo_bouts.py).