Simulating bouts

This demo follows the simulation of mixed Poisson distributions in Luque & Guinet (2007), and their comparison of models for characterizing such distributions.

Set up the environment.

# Set up
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skdiveMove.bouts as skbouts

# For figure sizes
_FIG3X1 = (9, 12)

Generate two-process mixture

For a mixture of two random Poisson processes with mixing parameter \(p=0.7\) and density parameters \(\lambda_f=0.05\) and \(\lambda_s=0.005\), we use the random_mixexp function to generate samples.
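
Written out, and assuming the usual two-component exponential mixture for the intervals between events, the density being simulated can be sketched as follows. The symbol \(t\) for the interval length is introduced here only for illustration.

```latex
f(t) = p \, \lambda_f \, e^{-\lambda_f t} + (1 - p) \, \lambda_s \, e^{-\lambda_s t}, \qquad t \ge 0
```

Under this reading, about 70% of the intervals come from the fast process, with mean interval \(1/\lambda_f = 20\), and the remaining 30% from the slow process, with mean interval \(1/\lambda_s = 200\).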

Define the true values described above, grouping the parameters into a Series to simplify further operations.

p_true = 0.7
lda0_true = 0.05
lda1_true = 0.005
pars_true = pd.Series({"lambda0": lda0_true,
                       "lambda1": lda1_true,
                       "p": p_true})

Declare the number of simulations and the number of samples to generate:

# Number of simulations
nsims = 500
# Size of each sample
nsamp = 1000
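
Each simulated sample of size nsamp is drawn from the mixture defined above. As a rough illustration of what drawing such a sample involves, the sketch below picks a process for every observation and then draws an exponential interval with the corresponding rate. The helper sample_two_process is hypothetical and is not how random_mixexp is necessarily implemented.

```python
import numpy as np


def sample_two_process(n, p, lda_fast, lda_slow, rng=None):
    """Draw n intervals from a two-process exponential mixture.

    Hypothetical helper for illustration; not skdiveMove's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Decide, per observation, whether it comes from the fast process
    is_fast = rng.random(n) < p
    # NumPy parameterizes the exponential by its scale (1 / rate)
    scale = np.where(is_fast, 1.0 / lda_fast, 1.0 / lda_slow)
    return rng.exponential(scale)


rng = np.random.default_rng(123)  # seeded only to make the sketch repeatable
x = sample_two_process(100_000, 0.7, 0.05, 0.005, rng=rng)
# Theoretical mean: p / lda_fast + (1 - p) / lda_slow = 14 + 60 = 74
print(x.mean())
```

With a sample this large the printed mean should land close to 74; with nsamp = 1000 the spread around that value is considerably wider.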

Set up variables to accumulate simulations:

# Set up NLS simulations
coefs_nls = []
# Set up MLE simulations
coefs_mle = []
# Fixed bounds fit 1
p_bnd = (-2, None)
lda0_bnd = (-5, None)
lda1_bnd = (-10, None)
opts1 = dict(method="L-BFGS-B",
             bounds=(p_bnd, lda0_bnd, lda1_bnd))
# Fixed bounds fit 2
p_bnd = (1e-1, None)
lda0_bnd = (1e-3, None)
lda1_bnd = (1e-6, None)
opts2 = dict(method="L-BFGS-B",
             bounds=(p_bnd, lda0_bnd, lda1_bnd))
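
The lower bounds for the first fit are negative, which would not make sense for a probability or a rate. This is consistent with the first fit working on transformed parameters (logit of p, log of each lambda); that reading is an assumption here, not something stated above. Under that assumption, the bounds can be mapped back to the original scale:

```python
import numpy as np


def expit(z):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Lower bounds of the first fit, assumed to be on the logit/log scale
p_lower = expit(-2)
lda0_lower = np.exp(-5)
lda1_lower = np.exp(-10)
print(p_lower, lda0_lower, lda1_lower)
```

The back-transformed values are roughly 0.12, 0.0067, and 0.000045, all below the true parameters, so the first fit would not be constrained away from them.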

Perform the simulations in a loop, fitting both the nonlinear least squares (NLS) model and the alternative maximum likelihood estimation (MLE) model at each iteration.

# Set up a random number generator for efficiency
rng = np.random.default_rng()
# Estimate parameters `nsims` times
for i in range(nsims):
    x = skbouts.random_mixexp(nsamp, pars_true["p"],
                              (pars_true[["lambda0", "lambda1"]]
                               .to_numpy()), rng=rng)
    # NLS
    xbouts = skbouts.BoutsNLS(x, 5)
    init_pars = xbouts.init_pars([80], plot=False)
    coefs, _ = xbouts.fit(init_pars)
    p_i = skbouts.bouts.calc_p(coefs)[0][0]  # only one here
    coefs_i = pd.concat([coefs.loc["lambda"], pd.Series({"p": p_i})])
    coefs_nls.append(coefs_i.to_numpy())

    # MLE
    xbouts = skbouts.BoutsMLE(x, 5)
    init_pars = xbouts.init_pars([80], plot=False)
    fit1, fit2 = xbouts.fit(init_pars, fit1_opts=opts1,
                            fit2_opts=opts2)
    coefs_mle.append(np.roll(fit2.x, -1))
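
The np.roll call at the end of the loop only reorders the MLE estimates so that they line up with the NLS columns. A minimal sketch, assuming fit2.x holds the estimates in the same order as the bounds (p, lambda0, lambda1); the array below is a stand-in, not real output:

```python
import numpy as np

# Stand-in for `fit2.x`, assumed ordered as (p, lambda0, lambda1)
fit_x = np.array([0.7, 0.05, 0.005])
# Shift every element one position left; the first element wraps to the end
reordered = np.roll(fit_x, -1)
# Order is now (lambda0, lambda1, p)
print(reordered)
```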

Non-linear least squares (NLS)

Collect and display NLS results from the simulations:

nls_coefs = pd.DataFrame(np.vstack(coefs_nls),
                         columns=["lambda0", "lambda1", "p"])
# Centrality and variance
nls_coefs.describe()

       lambda0    lambda1        p
count  500.000  5.000e+02  500.000
mean     0.047  4.004e-03    0.729
std      0.005  4.165e-04    0.020
min      0.032  3.023e-03    0.663
25%      0.044  3.715e-03    0.715
50%      0.047  3.972e-03    0.729
75%      0.050  4.264e-03    0.743
max      0.062  5.243e-03    0.785
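
The NLS fit works with an amplitude a and a rate lambda per process rather than with a mixing proportion, which is why calc_p is called in the loop. A minimal sketch of the idea, assuming the proportion is the fast-process amplitude divided by the sum of both amplitudes. The function p_from_amplitudes and the amplitude values are hypothetical and not part of skdiveMove.

```python
def p_from_amplitudes(a_fast, a_slow):
    """Share of events attributed to the fast process (assumed definition)."""
    return a_fast / (a_fast + a_slow)


# Hypothetical amplitudes for a sample of 1000 events
p_hat = p_from_amplitudes(700.0, 300.0)
print(p_hat)
```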

Maximum likelihood estimation (MLE)

Collect and display MLE results from the simulations:

mle_coefs = pd.DataFrame(np.vstack(coefs_mle),
                         columns=["lambda0", "lambda1", "p"])
# Centrality and variance
mle_coefs.describe()

       lambda0    lambda1        p
count  500.000  5.000e+02  500.000
mean     0.050  5.022e-03    0.700
std      0.003  3.700e-04    0.022
min      0.043  4.047e-03    0.630
25%      0.048  4.754e-03    0.684
50%      0.050  5.005e-03    0.700
75%      0.052  5.264e-03    0.715
max      0.059  6.106e-03    0.766
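
For orientation, the quantity a maximum likelihood fit of this kind minimizes is the negative log-likelihood of the mixture. The sketch below is an illustration under the assumption of a two-process exponential mixture with parameters ordered (p, lambda_fast, lambda_slow); neg_loglik is a hypothetical name, not the function BoutsMLE uses.

```python
import numpy as np


def neg_loglik(params, x):
    """Negative log-likelihood of a two-process exponential mixture.

    `params` is ordered (p, lambda_fast, lambda_slow); both the ordering
    and the function itself are illustrative assumptions.
    """
    p, lda_f, lda_s = params
    dens = (p * lda_f * np.exp(-lda_f * x)
            + (1 - p) * lda_s * np.exp(-lda_s * x))
    return -np.sum(np.log(dens))


# Simulate one sample from the true mixture
rng = np.random.default_rng(42)
n = 5000
is_fast = rng.random(n) < 0.7
x = rng.exponential(np.where(is_fast, 1 / 0.05, 1 / 0.005))

# The true parameters should score better than a clearly wrong guess
nll_true = neg_loglik((0.7, 0.05, 0.005), x)
nll_off = neg_loglik((0.5, 0.1, 0.001), x)
print(nll_true < nll_off)
```

An optimizer such as L-BFGS-B would search for the parameter values that make this quantity as small as possible, subject to the bounds supplied.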

Comparing NLS vs MLE

The bias relative to the true values of the mixed distribution can be readily assessed for NLS:

nls_coefs.mean() - pars_true

lambda0   -3.029e-03
lambda1   -9.961e-04
p          2.865e-02
dtype: float64

and for MLE:

mle_coefs.mean() - pars_true

lambda0    2.951e-05
lambda1    2.226e-05
p         -1.965e-04
dtype: float64
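
Because the three parameters differ by orders of magnitude, the absolute biases are easier to compare once expressed relative to the true values. The sketch below simply re-uses the bias values printed above; the variable names are introduced here for illustration.

```python
import pandas as pd

pars_true = pd.Series({"lambda0": 0.05, "lambda1": 0.005, "p": 0.7})
# Absolute biases copied from the outputs shown above
bias_nls = pd.Series({"lambda0": -3.029e-03, "lambda1": -9.961e-04,
                      "p": 2.865e-02})
bias_mle = pd.Series({"lambda0": 2.951e-05, "lambda1": 2.226e-05,
                      "p": -1.965e-04})
# Relative bias, as a percentage of each true value
rel_bias = pd.DataFrame({"nls": 100 * bias_nls / pars_true,
                         "mle": 100 * bias_mle / pars_true})
print(rel_bias.round(2))
```

In this set of simulations NLS underestimates lambda1 by about 20% of its true value, whereas every MLE bias stays below 0.5%.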

The estimates obtained throughout the simulations can also be compared visually with density plots, with the true parameter values marked for reference.
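
The plotting code for this comparison is not reproduced here. A minimal sketch of one way to draw it, using normalized histograms as a simple density estimate, might look like the following. The stand-in data frames are drawn from normal distributions with the means and standard deviations reported above, purely so that the block runs on its own; with the real nls_coefs and mle_coefs those lines would be dropped.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

pars_true = pd.Series({"lambda0": 0.05, "lambda1": 0.005, "p": 0.7})
rng = np.random.default_rng(0)
# Stand-ins for `nls_coefs` and `mle_coefs` (means and standard deviations
# taken from the summaries above); replace with the real data frames
nls_coefs = pd.DataFrame({"lambda0": rng.normal(0.047, 0.005, 500),
                          "lambda1": rng.normal(4.004e-3, 4.165e-4, 500),
                          "p": rng.normal(0.729, 0.020, 500)})
mle_coefs = pd.DataFrame({"lambda0": rng.normal(0.050, 0.003, 500),
                          "lambda1": rng.normal(5.022e-3, 3.700e-4, 500),
                          "p": rng.normal(0.700, 0.022, 500)})

# One panel per parameter, stacked vertically
fig, axs = plt.subplots(3, 1, figsize=(9, 12))
for ax, par in zip(axs, pars_true.index):
    # Normalized histograms as a simple density estimate
    ax.hist(nls_coefs[par], bins=30, density=True, alpha=0.6, label="NLS")
    ax.hist(mle_coefs[par], bins=30, density=True, alpha=0.6, label="MLE")
    # True value used to generate the samples
    ax.axvline(pars_true[par], color="k", linestyle="--", label="true")
    ax.set_xlabel(par)
    ax.set_ylabel("Density")
axs[0].legend()
```

A kernel density estimate would give smoother curves, at the cost of an extra dependency.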

Three-process mixture

We generate a mixture of “fast”, “slow”, and “very slow” processes. The probabilities considered for modeling this mixture are \(p_0\) and \(p_1\), representing the proportion of “fast” to “slow” events, and the proportion of “slow” events among “slow” and “very slow” events combined, respectively.
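
To make the two probabilities concrete, the sketch below converts them into the overall share of each process, using \(p_0 = 0.6\) and \(p_1 = 0.7\) (the values used in this demo). It assumes a nested interpretation in which \(p_0\) separates fast events from all slower ones and \(p_1\) then splits the slower ones; that interpretation is an assumption made for illustration.

```python
import numpy as np

p_fast = 0.6  # share of "fast" events
p_svs = 0.7   # share of "slow" among "slow" plus "very slow" events
# Implied overall weight of each process under the nested interpretation
weights = np.array([p_fast,
                    (1 - p_fast) * p_svs,
                    (1 - p_fast) * (1 - p_svs)])
print(weights, weights.sum())
```

Under this assumption the very slow process accounts for only about 12% of the events, which helps explain why its rate is the hardest to estimate.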

p_fast = 0.6
p_svs = 0.7                   # prop of slow to (slow + very slow) procs
p_true = [p_fast, p_svs]
lda_true = [0.05, 0.01, 8e-4]
pars_true = pd.Series({"lambda0": lda_true[0],
                       "lambda1": lda_true[1],
                       "lambda2": lda_true[2],
                       "p0": p_true[0],
                       "p1": p_true[1]})

Mixtures with more than two processes require a careful choice of constraints to avoid numerical issues when fitting the models; even the NLS model may require help.

# Bounds for NLS fit; flattened, two per process (a, lambda).  Two-tuple
# with lower and upper bounds for each parameter.
nls_opts = dict(bounds=(
    ([100, 1e-3, 100, 1e-3, 100, 1e-6]),
    ([5e4, 1, 5e4, 1, 5e4, 1])))
# Fixed bounds MLE fit 1
p0_bnd = (-5, None)
p1_bnd = (-5, None)
lda0_bnd = (-6, None)
lda1_bnd = (-8, None)
lda2_bnd = (-12, None)
opts1 = dict(method="L-BFGS-B",
             bounds=(p0_bnd, p1_bnd, lda0_bnd, lda1_bnd, lda2_bnd))
# Fixed bounds MLE fit 2
p0_bnd = (1e-3, 9.9e-1)
p1_bnd = (1e-3, 9.9e-1)
lda0_bnd = (2e-2, 1e-1)
lda1_bnd = (3e-3, 5e-2)
lda2_bnd = (1e-5, 1e-3)
opts2 = dict(method="L-BFGS-B",
             bounds=(p0_bnd, p1_bnd, lda0_bnd, lda1_bnd, lda2_bnd))
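
Since the data here are simulated, we can check that the bounds bracket the values used to generate them. The sketch below is only a sanity check written for this demo; with real data the true values are unknown and the bounds have to come from prior knowledge of the processes.

```python
import numpy as np

# True values in the order (p0, p1, lambda0, lambda1, lambda2)
true_vals = np.array([0.6, 0.7, 0.05, 0.01, 8e-4])
# Second-fit MLE bounds from above, in the same parameter order
bounds2 = [(1e-3, 9.9e-1), (1e-3, 9.9e-1),
           (2e-2, 1e-1), (3e-3, 5e-2), (1e-5, 1e-3)]
lower, upper = np.array(bounds2).T
inside = (lower < true_vals) & (true_vals < upper)
print(inside.all())

# NLS bounds use a different layout: one sequence of lower bounds and one
# of upper bounds, each flattened as (a, lambda) per process
nls_lower = np.array([100, 1e-3, 100, 1e-3, 100, 1e-6])
nls_upper = np.array([5e4, 1, 5e4, 1, 5e4, 1])
print((nls_lower < nls_upper).all())
```

Note the two layouts: the MLE options take one (lower, upper) pair per parameter, while the NLS options take all lower bounds followed by all upper bounds.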

x = skbouts.random_mixexp(nsamp, [pars_true["p0"], pars_true["p1"]],
                          [pars_true["lambda0"], pars_true["lambda1"],
                           pars_true["lambda2"]], rng=rng)

We fit the three-process data with the two models:

x_nls = skbouts.BoutsNLS(x, 5)
init_pars = x_nls.init_pars([75, 220], plot=False)
coefs, _ = x_nls.fit(init_pars, **nls_opts)

x_mle = skbouts.BoutsMLE(x, 5)
init_pars = x_mle.init_pars([75, 220], plot=False)
fit1, fit2 = x_mle.fit(init_pars, fit1_opts=opts1,
                       fit2_opts=opts2)

Plot both fits and their bout-ending criteria (BECs):

fig, axs = plt.subplots(1, 2, figsize=(13, 5))
x_nls.plot_fit(coefs, ax=axs[0])
x_mle.plot_fit(fit2, ax=axs[1]);
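
For intuition about the criterion marked on these plots, consider the simpler two-process case, where it can be taken as the interval length at which the fast and slow component densities are equal, that is, where \(p \lambda_f e^{-\lambda_f t} = (1 - p) \lambda_s e^{-\lambda_s t}\). The sketch below solves that equality for \(t\); bec_two_process is a hypothetical helper, and the assumption is that analogous criteria separate successive processes in the three-process case.

```python
import numpy as np


def bec_two_process(p, lda_fast, lda_slow):
    """Interval length at which both component densities are equal."""
    return (np.log((p * lda_fast) / ((1 - p) * lda_slow))
            / (lda_fast - lda_slow))


# True values of the two-process mixture simulated earlier
bec = bec_two_process(0.7, 0.05, 0.005)
print(round(bec, 1))
```

For the two-process parameters used earlier this gives a value of about 70, so intervals longer than that would be attributed to the slow process.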

Compare cumulative frequency distributions:

fig, axs = plt.subplots(1, 2, figsize=(13, 5))
axs[0].set_title("NLS")
x_nls.plot_ecdf(coefs, ax=axs[0])
axs[1].set_title("MLE")
x_mle.plot_ecdf(fit2, ax=axs[1]);
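
The same comparison can be made numerically by measuring the largest gap between the empirical cumulative distribution and the cumulative distribution of the mixture. The sketch below does this for a freshly simulated three-process sample and the true parameters. It assumes the nested meaning of \(p_0\) and \(p_1\) and does not use the skdiveMove plotting methods; mixexp_cdf is a hypothetical helper.

```python
import numpy as np

lda = np.array([0.05, 0.01, 8e-4])
p0, p1 = 0.6, 0.7
# Overall weights, assuming the nested meaning of p0 and p1
w = np.array([p0, (1 - p0) * p1, (1 - p0) * (1 - p1)])


def mixexp_cdf(t, w, lda):
    """CDF of an exponential mixture with weights w and rates lda."""
    t = np.asarray(t, dtype=float)[:, np.newaxis]
    return 1 - np.sum(w * np.exp(-lda * t), axis=1)


rng = np.random.default_rng(7)
n = 20_000
# Pick a process for each observation, then draw its interval
proc = rng.choice(lda.size, size=n, p=w)
x = np.sort(rng.exponential(1 / lda[proc]))
# Empirical CDF evaluated at each sorted observation
ecdf = np.arange(1, n + 1) / n
max_gap = np.max(np.abs(ecdf - mixexp_cdf(x, w, lda)))
print(max_gap)
```

A small maximum gap indicates that the sample follows the stated mixture closely; replacing the true parameters with fitted ones gives a simple numeric counterpart to the plots above.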

Feel free to download a copy of this demo (demo_simulbouts.py).