API Reference
This section provides comprehensive documentation for all public APIs.
Core Functions
- kim_convergence.run_length_control(get_trajectory: Callable, get_trajectory_args: dict | None = None, *, number_of_variables: int = 1, initial_run_length: int = 10000, run_length_factor: float = 1.0, maximum_run_length: int = 1000000, maximum_equilibration_step: int | None = None, minimum_number_of_independent_samples: int | None = None, relative_accuracy: float | list[float | None] | ndarray | None = 0.1, absolute_accuracy: float | list[float | None] | ndarray | None = 0.1, population_mean: float | list[float | None] | ndarray | None = None, population_standard_deviation: float | list[float | None] | ndarray | None = None, population_cdf: str | list[str | None] | None = None, population_args: tuple | list[tuple | None] | None = None, population_loc: float | list[float | None] | ndarray | None = None, population_scale: float | list[float | None] | ndarray | None = None, confidence_coefficient: float = 0.95, confidence_interval_approximation_method: str = 'uncorrelated_sample', heidel_welch_number_points: int = 50, fft: bool = True, test_size: int | float | None = None, train_size: int | float | None = None, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, ignore_end: int | float | None = None, number_of_cores: int = 1, si: str = None, nskip: int | None = 1, minimum_correlation_time: int | None = None, dump_trajectory: bool = False, dump_trajectory_fp: str = 'kim_convergence_trajectory.edn', fp: Any = None, fp_format: str = 'txt') str | bool
Control the length of the time series data from a simulation run.
It starts drawing initial_run_length number of observations (samples) by calling the get_trajectory function in a loop to reach equilibration or pass the warm-up period.
Note
get_trajectory is a callback function with a specific signature of get_trajectory(nstep: int) -> 1darray if we only have one variable or get_trajectory(nstep: int) -> 2darray with the shape of (number_of_variables, nstep)
To use extra arguments in get_trajectory, one can use the other specific signature of get_trajectory(nstep: int, args: dict) -> 1darray or get_trajectory(nstep: int, args: dict) -> 2darray with the shape of (number_of_variables, nstep)
where all the required variables can be passed through the args dictionary.
All the values returned from this function should be finite values, otherwise the code will stop with an error message explaining the issue.
Examples:
>>> import numpy as np
>>> rng = np.random.RandomState(12345)
>>> start = 0
>>> stop = 0
>>> def get_trajectory(step):
...     global start, stop
...     start = stop
...     if 100000 < start + step:
...         step = 100000 - start
...     stop += step
...     data = np.ones(step) * 10 + (rng.random_sample(step) - 0.5)
...     return data
or,
>>> targs = {'start': 0, 'stop': 0}
>>> def get_trajectory(step, targs):
...     targs['start'] = targs['stop']
...     if 100000 < targs['start'] + step:
...         step = 100000 - targs['start']
...     targs['stop'] += step
...     data = np.ones(step) * 10 + (rng.random_sample(step) - 0.5)
...     return data
Then it continues drawing observations until some pre-specified level of absolute or relative precision has been reached.
The relative precision is defined as a half-width of the estimator’s confidence interval (CI).
At each checkpoint, an upper confidence limit (UCL) is approximated. The drawing of observations is terminated if the UCL is less than the pre-specified absolute precision absolute_accuracy, or if the relative UCL (UCL divided by the computed sample mean) is less than a pre-specified value, relative_accuracy.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data that is in the stationarity region.
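The termination test described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the samples are treated as independent and the half-width uses a normal approximation, whereas the library's UCL methods account for correlation in the data. The helper names half_width and accuracy_reached are hypothetical and are not part of kim_convergence.

```python
import math
from statistics import NormalDist, mean, stdev


def half_width(samples, confidence_coefficient=0.95):
    """Half-width of a normal-approximation CI for the mean.

    Assumes independent samples (an illustration-only assumption).
    """
    z = NormalDist().inv_cdf(0.5 + confidence_coefficient / 2.0)
    return z * stdev(samples) / math.sqrt(len(samples))


def accuracy_reached(samples, relative_accuracy=0.1, absolute_accuracy=0.1,
                     confidence_coefficient=0.95):
    """Return True if either pre-specified precision level is met."""
    ucl = half_width(samples, confidence_coefficient)
    if absolute_accuracy is not None and ucl < absolute_accuracy:
        return True
    if relative_accuracy is not None:
        sample_mean = mean(samples)
        # The relative UCL is undefined for a zero mean.
        if sample_mean != 0 and ucl / abs(sample_mean) < relative_accuracy:
            return True
    return False


# Deterministic stand-in for a stationary time series centered near 10.
samples = [10.0 + ((i * 7919) % 1000) / 1000.0 - 0.5 for i in range(1000)]
print(accuracy_reached(samples, relative_accuracy=0.01, absolute_accuracy=None))
print(accuracy_reached(samples, relative_accuracy=None, absolute_accuracy=0.001))
```

Passing None for one of the two accuracies disables that criterion in this sketch, which mirrors how the documented parameters are switched off.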
The relative accuracy is the confidence interval half-width or UCL divided by the sample mean. If the ratio is bigger than relative_accuracy, the length of the time series is deemed not long enough to estimate the mean with sufficient accuracy, which means the run should be extended.
To limit the cost of sequential UCL evaluation, this calculation should not be repeated too frequently. Heidelberger and Welch (1981) [heidelberger1981] suggested increasing the run length by a factor run_length_factor > 1.5 each time, so that each estimate includes the same, reasonably large, proportion of new data.
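To see why a multiplicative factor gives each checkpoint the same proportion of new data, the following sketch lists the total run lengths obtained when the total is multiplied by the factor at every checkpoint. It illustrates the Heidelberger and Welch suggestion only; checkpoint_lengths and new_data_fractions are hypothetical helpers, and run_length_control may schedule its extensions differently.

```python
def checkpoint_lengths(initial_run_length, run_length_factor, maximum_run_length):
    """Total run length at each checkpoint, capped at maximum_run_length."""
    if run_length_factor <= 1.0:
        raise ValueError("this sketch needs run_length_factor > 1")
    totals = []
    total = initial_run_length
    while total < maximum_run_length:
        totals.append(total)
        # Always advance by at least one step to guarantee termination.
        total = max(total + 1, int(total * run_length_factor))
    totals.append(maximum_run_length)
    return totals


def new_data_fractions(totals):
    """Fraction of each checkpoint's data that was drawn since the last one."""
    return [(b - a) / b for a, b in zip(totals, totals[1:])]


totals = checkpoint_lengths(1000, 1.5, 10000)
print(totals)
print([round(f, 3) for f in new_data_fractions(totals)])
```

With a factor of 1.5, about one third of the data at each uncapped checkpoint is new; only the final, capped checkpoint deviates.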
The accuracy parameter relative_accuracy specifies the maximum relative error that will be allowed in the mean value of time-series data. In other words, the distance from the confidence limit(s) to the mean (which is also known as the precision, half-width, or margin of error). A value of 0.01 is usually used to request two digits of accuracy, and so forth.
The parameter confidence_coefficient is the confidence coefficient; the value 0.95 is often used. The confidence coefficient can be interpreted as follows:
If thousands of samples of n items are drawn from a population using simple random sampling and a confidence interval is calculated for each sample, the proportion of those intervals that will include the true population mean is confidence_coefficient.
The maximum_run_length parameter places an upper bound on how long the simulation will run. If the specified accuracy cannot be achieved within this time, the simulation will terminate, and a warning message will appear in the report.
The maximum_equilibration_step parameter places an upper bound on how long the simulation will run to reach equilibration or pass the warm-up period. If the equilibration or warm-up period cannot be detected within this time, the simulation will terminate and a warning message will appear in the report.
Note
By default, and if not specified on input, the maximum_equilibration_step is defined as half of the maximum_run_length.
Note
By default, the algorithm will use relative_accuracy as a termination criterion, and in case of failure, it switches to use the absolute_accuracy.
If using the absolute_accuracy is desired, one should set the relative_accuracy to None.
Examples:
>>> run_length_control(get_trajectory,
...                    number_of_variables=1,
...                    relative_accuracy=None,
...                    absolute_accuracy=0.1)
The algorithm converts relative_accuracy and absolute_accuracy floating numbers to arrays with the shape of (number_of_variables, ), when number_of_variables is bigger than one. By default, it uses relative_accuracy as a termination criterion for the corresponding variable number, and in case of failure, it switches to use the absolute_accuracy.
If the absolute_accuracy is desired for one or some variables, one should provide both relative_accuracy and absolute_accuracy as arrays, set the corresponding relative_accuracy entry to None, and set the correct absolute_accuracy at the matching position in the collection.
E.g.,
>>> run_length_control(get_trajectory,
...                    number_of_variables=3,
...                    relative_accuracy=[0.1, 0.05, None],
...                    absolute_accuracy=[0.1, 0.05, 0.1])
or,
>>> run_length_control(get_trajectory,
...                    number_of_variables=3,
...                    relative_accuracy=[None, 0.05, None],
...                    absolute_accuracy=[0.1, 0.05, 0.1])
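The per-variable selection described above can be pictured with a small sketch. It assumes only what the text states: scalars are expanded to one entry per variable, and a None entry in relative_accuracy selects absolute_accuracy for that variable. The helpers per_variable and primary_criteria are hypothetical and do not reproduce the library's internal fallback logic.

```python
def per_variable(value, number_of_variables):
    """Expand a scalar (or None) to one entry per variable."""
    if value is None or isinstance(value, (int, float)):
        return [value] * number_of_variables
    values = list(value)
    if len(values) != number_of_variables:
        raise ValueError("expected one entry per variable")
    return values


def primary_criteria(relative_accuracy, absolute_accuracy, number_of_variables):
    """Pick the first criterion that would be tried for each variable."""
    rel = per_variable(relative_accuracy, number_of_variables)
    ab = per_variable(absolute_accuracy, number_of_variables)
    criteria = []
    for r, a in zip(rel, ab):
        if r is not None:
            criteria.append(("relative", r))
        elif a is not None:
            criteria.append(("absolute", a))
        else:
            raise ValueError("each variable needs at least one accuracy")
    return criteria


print(primary_criteria([0.1, 0.05, None], [0.1, 0.05, 0.1], 3))
print(primary_criteria(0.1, 0.1, 2))
```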
Note
confidence_interval_approximation_method is set to a method to use for approximating the upper confidence limit of the mean.
By default, the uncorrelated_sample approach uses the independent samples in the time-series data to approximate the confidence intervals for the mean. The other methods have different approaches.
E.g., the heidel_welch method requires no such independence assumption. In this spectral approach, the problem of dealing with dependent data is largely avoided by working in the frequency domain with the sample spectrum (periodogram) of the process.
Note
population_mean is the known (true) mean of a variable, i.e., the expected value in the null hypothesis. It is extra information for normally distributed data.
Note
For non-normally distributed data, and as an extra check on the convergence, one should provide the population info using population_cdf, population_args, population_loc, and population_scale for a specific distribution.
- Parameters:
get_trajectory (callback function) –
A callback function with a specific signature of get_trajectory(nstep: int) -> 1darray if we only have one variable or get_trajectory(nstep: int) -> 2darray with the shape of (number_of_variables, nstep)
Note
All the values returned from this function should be finite values, otherwise the code will stop with an error message explaining the issue.
get_trajectory_args (dict, optional) – Extra arguments passed to the get_trajectory function. (default: {}) To use this option, the dictionary may contain start and stop keywords as well as other keywords which are needed in the function.
get_trajectory(nstep, get_trajectory_args) -> 1darray
number_of_variables (int, optional) – number of variables in the corresponding time-series data from get_trajectory callback function. (default: 1)
initial_run_length (int, optional) – initial run length. (default: 10000)
run_length_factor (float, optional) – run length increasing factor. (default: 1.0)
maximum_run_length (int, optional) – the maximum run length represents a cost constraint. (default: 1000000)
maximum_equilibration_step (int, optional) – the maximum number of steps as an equilibration hard limit. If the algorithm finds equilibration_step greater than this limit it will fail. For the default None, the function uses maximum_run_length // 2 as the maximum equilibration step. (default: None)
minimum_number_of_independent_samples (int, optional) – minimum number of independent samples. This is an extra parameter to terminate the run after the pre-specified level of absolute or relative precision has been reached and there is a minimum number of independent samples available for further analysis. (default: None)
relative_accuracy (float, or 1darray, optional) – a relative half-width requirement or the accuracy parameter. Target value for the ratio of half-width to sample mean. If number_of_variables > 1, relative_accuracy can be a scalar to be used for all variables or a 1darray of values of size number_of_variables. (default: 0.1)
absolute_accuracy (float, or 1darray, optional) – a half-width requirement or the accuracy parameter. Target value for the half-width. If number_of_variables > 1, absolute_accuracy can be a scalar to be used for all variables or a 1darray of values of size number_of_variables. (default: 0.1)
population_mean (float, or 1darray, optional) –
variable known (true) mean. Expected value in null hypothesis. (default: None)
Note
For number_of_variables > 1, and if population_mean is provided, it should be a list or array of values. It should be set to None for variables for which we do not intend to use this extra measure.
Examples:
>>> run_length_control(get_trajectory,
...                    number_of_variables=3,
...                    population_mean=[None, 297., None])
population_standard_deviation (float, or 1darray, optional) –
population standard deviation. (default: None)
Note
For number_of_variables > 1, and if population_standard_deviation is provided, it should be a list or array of values. It should be set to None for variables for which we do not intend to use this extra measure.
Examples:
>>> run_length_control(
...     get_trajectory,
...     number_of_variables=3,
...     population_mean=[None, 297., None],
...     population_standard_deviation=[None, 10., None])
population_cdf (str, or 1darray, optional) –
The name of a distribution. (default: None)
Examples:
>>> run_length_control(
...     get_trajectory,
...     number_of_variables=2,
...     population_cdf=[None, 'gamma'],
...     population_args=[None, (1.99,)],
...     population_loc=[None, None],
...     population_scale=[None, None])
or,
>>> run_length_control(
...     get_trajectory,
...     number_of_variables=2,
...     population_mean=[297., None],
...     population_standard_deviation=[10., None],
...     population_cdf=[None, 'gamma'],
...     population_args=[None, (1.99,)],
...     population_loc=[None, None],
...     population_scale=[None, None])
population_args (tuple, or list of tuples, optional) – Distribution parameter. (default: None)
population_loc (float, or 1darray, or None) – location of the distribution. (default: None)
population_scale (float, or 1darray, or None) – scale of the distribution. (default: None)
confidence_coefficient (float, optional) – confidence coefficient (or confidence level). It must be between 0.0 and 1.0 and represents the confidence used in the relative half-width estimation. (default: 0.95)
confidence_interval_approximation_method (str, optional) – Method to use for approximating the upper confidence limit of the mean. One of the ucl_methods approaches. (default: ‘uncorrelated_sample’)
heidel_welch_number_points (int, optional) – the number of points in Heidelberger and Welch’s spectral method that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
test_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. If int, represents the absolute number of test samples. (default: None)
train_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the train split. If int, represents the absolute number of train samples. (default: None)
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – a method to standardize a batched dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – if True, use batched data minus the scale method centering approach. (default: False)
with_scaling (bool, optional) – if True, scale the batched data to the scale method scaling approach. (default: False)
ignore_end (int, or float, or None, optional) – if int, it is the last few (batch) points that should be ignored. If float, it should be in (0, 1) and it is the fraction of last (batch) points that should be ignored. If None, it is set to the batch_size in the batch method and to one fourth of the total number of points elsewhere. (default: None)
number_of_cores (int, optional) – The maximum number of concurrently running jobs, such as the number of Python worker processes or the size of the thread-pool. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all. For number_of_cores below -1, (n_cpus + 1 + number_of_cores) are used. Thus for number_of_cores = -2, all CPUs but one are used. (default: 1)
si (str, optional) – statistical inefficiency method. (default: ‘statistical_inefficiency’)
nskip (int, optional) – the number of data points to skip in estimating ucl. (default: 1)
minimum_correlation_time (int, optional) – The minimum amount of correlation function to compute in estimating ucl. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
dump_trajectory (bool, optional) – if True, dump the final trajectory data to a file dump_trajectory_fp. (default: False)
dump_trajectory_fp (str, object with a write(string) method, optional) – a .write()-supporting file-like object or a name string to open a file. (default: ‘kim_convergence_trajectory.edn’)
fp (str, object with a write(string) method, optional) – if a str equal to 'return', the function will return a string of the analysis results on the length of the time series. Otherwise it must be an object with a write(string) method. If it is None, sys.stdout will be used, which prints objects on the screen. (default: None)
fp_format (str) – one of the txt, json, or edn formats. (default: ‘txt’)
- Returns:
- Union[str, bool]
True if the length of the time series is long enough to estimate the mean with sufficient accuracy or with enough requested sample size; False otherwise. If fp == 'return', a string containing the analysis results is returned instead.
UCL Methods
Upper Confidence Limit (UCL) module.
Upper Confidence Limit (UCL): The upper boundary (or limit) of a confidence interval of a parameter of interest such as the population mean.
A confidence interval is how much uncertainty there is with any particular statistic [nistdiv898]. Confidence limits for the mean are interval estimates. Interval estimates are often desirable because instead of a single estimate for the mean, a confidence interval generates a lower and upper limit. It indicates how much uncertainty there is in our estimation of the true mean. The narrower the gap, the more precise our estimate is. We use a confidence level to express confidence limits. Choosing the confidence level is somewhat arbitrary, but 90 %, 95 %, and 99 % intervals are standard, and 95 % is the most commonly used.
Note
One should note that a 95 % confidence interval does not mean a 95 % probability of containing the true mean. The interval computed from a sample either has the true mean, or it does not. The confidence level is simply the proportion of samples of a given size that may be expected to contain the true mean. For a 95 % confidence interval, if many samples are collected and the confidence interval computed, in the long run, about 95 % of these intervals would contain the true mean.
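The long-run interpretation in the note above can be checked numerically. The sketch below draws many independent normal samples, builds a normal-approximation interval for each, and reports the fraction of intervals that contain the true mean. It uses only the Python standard library; the function name coverage and all of its parameters are illustrative assumptions, not part of kim_convergence.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage(trials=2000, n=50, confidence_coefficient=0.95,
             true_mean=10.0, seed=12345):
    """Fraction of simulated confidence intervals containing true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence_coefficient / 2.0)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / trials


print(coverage())
```

Any single interval either contains the true mean or does not; only the proportion over many repetitions approaches the confidence level. The normal quantile used here slightly undercovers for small n compared with a Student's t quantile.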
- class kim_convergence.ucl.HeidelbergerWelch
Heidelberger and Welch algorithm.
Heidelberger and Welch (1981) [heidelberger1981] Object.
- heidel_welch_set
Flag indicating if the Heidelberger and Welch constants are set.
- Type:
bool
- heidel_welch_k
The number of points that are used to obtain the polynomial fit in Heidelberger and Welch’s spectral method.
- Type:
int
- heidel_welch_n
The number of time series data points or number of batches in Heidelberger and Welch’s spectral method.
- Type:
int
- heidel_welch_p
Probability.
- Type:
float
- a_matrix
Auxiliary matrix.
- Type:
ndarray
- a_matrix_1_inv
The (Moore-Penrose) pseudo-inverse of a matrix for the first degree polynomial fit in Heidelberger and Welch’s spectral method.
- Type:
ndarray
- a_matrix_2_inv
The (Moore-Penrose) pseudo-inverse of a matrix for the second degree polynomial fit in Heidelberger and Welch’s spectral method.
- Type:
ndarray
- a_matrix_3_inv
The (Moore-Penrose) pseudo-inverse of a matrix for the third degree polynomial fit in Heidelberger and Welch’s spectral method.
- Type:
ndarray
- heidel_welch_c1_1
Heidelberger and Welch’s C1 constant for the first degree polynomial fit.
- Type:
float
- heidel_welch_c1_2
Heidelberger and Welch’s C1 constant for the second degree polynomial fit.
- Type:
float
- heidel_welch_c1_3
Heidelberger and Welch’s C1 constant for the third degree polynomial fit.
- Type:
float
- heidel_welch_c2_1
Heidelberger and Welch’s C2 constant for the first degree polynomial fit.
- Type:
float
- heidel_welch_c2_2
Heidelberger and Welch’s C2 constant for the second degree polynomial fit.
- Type:
float
- heidel_welch_c2_3
Heidelberger and Welch’s C2 constant for the third degree polynomial fit.
- Type:
float
- tm_1
t_distribution inverse cumulative distribution function for C2_1 degrees of freedom.
- Type:
float
- tm_2
t_distribution inverse cumulative distribution function for C2_2 degrees of freedom.
- Type:
float
- tm_3
t_distribution inverse cumulative distribution function for C2_3 degrees of freedom.
- Type:
float
- get_heidel_welch_auxilary_matrices() tuple
Get the Heidelberger and Welch auxiliary matrices.
- get_heidel_welch_c1() tuple
Get the Heidelberger and Welch C1 constants.
- get_heidel_welch_c2() tuple
Get the Heidelberger and Welch C2 constants.
- get_heidel_welch_constants() tuple
Get the Heidelberger and Welch constants.
- get_heidel_welch_knp() tuple
Get the heidel_welch_number_points, n, and confidence_coefficient.
- get_heidel_welch_tm() tuple
Get the Heidelberger and Welch t_distribution ppf.
Get the Heidelberger and Welch t_distribution ppf for C2 degrees of freedom.
- is_heidel_welch_set() bool
Return True if the flag is set to True.
- set_heidel_welch_constants(*, confidence_coefficient: float = 0.95, heidel_welch_number_points: int = 50)
Set Heidelberger and Welch constants globally.
Set the constants necessary for application of the Heidelberger and Welch’s [heidelberger1981] confidence interval generation method.
- Parameters:
confidence_coefficient (float) – probability (or confidence level); must be between 0.0 and 1.0. (default: 0.95)
heidel_welch_number_points (int) – the number of points in Heidelberger and Welch’s spectral method that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)
- unset_heidel_welch_constants()
Unset the Heidelberger and Welch flag.
- class kim_convergence.ucl.MSER_m
MSER-m algorithm.
The MSER [white1997] and MSER-5 [spratt1998] rules determine the truncation point as the value of \(d\) that best balances the tradeoff between improved accuracy (elimination of bias) and decreased precision (reduction in the sample size) for the input series. They select a truncation point that minimizes the width of the marginal confidence interval about the truncated sample mean. The marginal confidence interval is a measure of the homogeneity of the truncated series. The optimal truncation point \(d(j)^*\) selected by MSER-m can be expressed as:
\[d(j)^* = \underset{n>d(j) \geq 0}{\text{argmin}} \left[ \frac{1}{(n(j)-d(j))^2} \sum_{i=d}^{n}{\left(X_i(j)- \bar{X}_{n,d}(j) \right )^2} \right]\]
MSER-m applies the equation to a series of batch averages instead of the raw series. The CI estimators can be computed from the truncated sequence of batch means.
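A minimal pure-Python sketch of the rule above: form batch means of size m, evaluate the MSER statistic for every candidate truncation point, and return the minimizer in units of the raw series. The function names are hypothetical, and the default of excluding the last quarter of the batch means from the search is an assumption made for this illustration; this is not the kim_convergence implementation.

```python
import random


def batch_means(data, batch_size):
    """Average consecutive, non-overlapping batches (remainder dropped)."""
    n_batches = len(data) // batch_size
    return [sum(data[i * batch_size:(i + 1) * batch_size]) / batch_size
            for i in range(n_batches)]


def mser_truncation_point(data, batch_size=5, ignore_end=None):
    """Raw-series index that minimizes the MSER statistic of the batch means."""
    means = batch_means(data, batch_size)
    n = len(means)
    if ignore_end is None:
        # Assumption for this sketch: skip the last quarter of the batches,
        # where the statistic is computed from too few points.
        ignore_end = max(1, n // 4)
    best_d, best_stat = 0, float("inf")
    for d in range(n - ignore_end):
        tail = means[d:]
        tail_mean = sum(tail) / len(tail)
        stat = sum((x - tail_mean) ** 2 for x in tail) / len(tail) ** 2
        if stat < best_stat:
            best_d, best_stat = d, stat
    return best_d * batch_size


# Synthetic series: a 100-step transient near 50, then noise around 0.
rng = random.Random(0)
series = [50.0 + rng.uniform(-0.5, 0.5) for _ in range(100)]
series += [rng.uniform(-0.5, 0.5) for _ in range(900)]
print(mser_truncation_point(series))
```

The estimate lands at or shortly after the end of the transient, because dropping a biased batch shrinks the sum of squares far more than it shrinks the squared sample size.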
- estimate_equilibration_length(time_series_data: ndarray | list[float], *, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, ignore_end: int | float | None = None, number_of_cores: int = 1, si: str | float | int | None = None, nskip: int | None = 1, fft: bool = True, minimum_correlation_time: int | None = None) tuple[bool, int]
Estimate the equilibration point in a time series data.
- class kim_convergence.ucl.MSER_m_y
MSER_m_y algorithm.
MSER_m_y [yousefi2011] computes k batch means of size m to evaluate the MSER-m statistic as described in [spratt1998] and detect the truncation point. If the truncation is detected, the point estimator of the mean is the sample mean of all observations in the truncated data set.
To compute the UCL, the MSER_m_y applies the von Neumann randomness test [vonneumann1941], [vonneumann1941b] to the truncated data to find a new batch size \(m^*\) for which the new batch means are approximately independent. It checks the randomness test on successively larger batch sizes until the test is finally passed and the batch means are determined to be approximately independent of each other. It starts by setting the initial batch size m to 1 and calculates the number of batches k’ accordingly.
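The effect of growing the batch size can be illustrated with the von Neumann ratio (mean square successive difference divided by the variance), which is close to 2 for independent data and well below 2 for positively correlated data. This sketch only computes the ratio for a synthetic autocorrelated series at a few batch sizes; it does not apply the critical values of the actual randomness test, and the function names are hypothetical.

```python
import random


def von_neumann_ratio(x):
    """Sum of squared successive differences over sum of squared deviations."""
    n = len(x)
    x_mean = sum(x) / n
    numerator = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1))
    denominator = sum((v - x_mean) ** 2 for v in x)
    return numerator / denominator


def batch_means(data, batch_size):
    """Average consecutive, non-overlapping batches (remainder dropped)."""
    n_batches = len(data) // batch_size
    return [sum(data[i * batch_size:(i + 1) * batch_size]) / batch_size
            for i in range(n_batches)]


# Strongly autocorrelated AR(1) series: x_t = 0.9 * x_{t-1} + noise.
rng = random.Random(2024)
series = []
value = 0.0
for _ in range(20000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    series.append(value)

for batch_size in (1, 10, 50):
    ratio = von_neumann_ratio(batch_means(series, batch_size))
    print(batch_size, round(ratio, 2))
```

As the batch size grows, the ratio moves toward 2, which is the behavior the successive batch-size search relies on.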
- significance_level
Significance level. A probability threshold below which the null hypothesis will be rejected.
- Type:
float
- class kim_convergence.ucl.N_SKART
N-Skart algorithm.
N-Skart [tafazzoli2011] is a nonsequential procedure designed to compute the half-width of the confidence_coefficient% probability interval (confidence interval or credible interval, CI) around the time-series mean.
Note
N-Skart is a variant of the method of batch means.
N-Skart makes some modifications to the confidence interval (CI). These modifications account for the skewness (non-normality), and autocorrelation of the batch means which affect the distribution of the underlying Student’s t-statistic.
- k_number_batches
number of nonspaced (adjacent) batches of size batch_size.
- Type:
int
- kp_number_batches
number of nonspaced (adjacent) batches.
- Type:
int
- batch_size
batch size.
- Type:
int
- number_batches_per_spacer
number of batches per spacer.
- Type:
int
- maximum_number_batches_per_spacer
maximum number of batches per spacer.
- Type:
int
- significance_level
Significance level. A probability threshold below which the null hypothesis will be rejected.
- Type:
float
- randomness_test_counter
counter for applying the randomness test of von Neumann [vonneumann1941] [vonneumann1941b].
- Type:
int
- estimate_equilibration_length(time_series_data: ndarray | list[float], *, si: str | float | int | None = None, nskip: int | None = 1, fft: bool = True, minimum_correlation_time: int | None = None, ignore_end: int | float | None = None, number_of_cores: int = 1, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False) tuple[bool, int]
Estimate the equilibration point in a time series data.
Estimate the equilibration point in a time series data using the N-Skart algorithm.
- Parameters:
time_series_data (array_like, 1d) – time series data.
- Returns:
- tuple[bool, int]
truncated: True if truncation was applied. truncation_point: Index at which to truncate.
Note
If N-Skart does not detect the equilibration, it returns truncated as False and an equilibration index equal to the last index in the time series data.
Note
nskip, ignore_end, and number_of_cores are accepted for API compatibility but are not used by this method.
- class kim_convergence.ucl.UCLBase
Upper Confidence Limit base class.
- ci(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, equilibration_length_estimate: int = 0, heidel_welch_number_points: int = 50, batch_size: int = 5, fft: bool = True, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, test_size: int | float | None = None, train_size: int | float | None = None, population_standard_deviation: float | None = None, si: str | float | int | None = None, minimum_correlation_time: int | None = None, uncorrelated_sample_indices: ndarray | list[int] | None = None, sample_method: str | None = None) tuple[float, float]
Approximate the confidence interval of the mean.
- estimate_equilibration_length(time_series_data: ndarray | list[float], *, si: str | None = None, nskip: int | None = 1, fft: bool = True, minimum_correlation_time: int | None = None, ignore_end: int | float | None = None, number_of_cores: int = 1, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False) tuple[bool, int]
Estimate the equilibration point in a time series data.
- property indices
Get the indices.
- property mean
Get the mean.
- property name
Get the name.
- relative_half_width_estimate(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, equilibration_length_estimate: int = 0, heidel_welch_number_points: int = 50, batch_size: int = 5, fft: bool = True, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, test_size: int | float | None = None, train_size: int | float | None = None, population_standard_deviation: float | None = None, si: str | float | int | None = None, minimum_correlation_time: int | None = None, uncorrelated_sample_indices: ndarray | list[int] | None = None, sample_method: str | None = None) float
Get the relative half width estimate.
- requires_si_computation() bool
Return True if this UCL method requires statistical inefficiency computation.
- property sample_size
Get the sample_size.
- set_indices(time_series_data: ndarray | list[float], *, si: str | float | int | None = None, fft: bool = True, minimum_correlation_time: int | None = None) None
Set the indices.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
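The truncation rule described for minimum_correlation_time can be illustrated with a small estimator. This is a sketch under stated assumptions, not the kim_convergence implementation: it uses a common integrated-autocorrelation form (1 plus twice the sum of normalized autocorrelations), stops at the first non-positive correlation beyond minimum_correlation_time, and then picks every ceil(si)-th index as an approximately uncorrelated sample. All function names here are hypothetical.

```python
import math
import random


def statistical_inefficiency(x, minimum_correlation_time=None):
    """Estimate 1 + 2 * sum_t rho(t) * (1 - t / n), truncated as described."""
    n = len(x)
    x_mean = sum(x) / n
    dev = [v - x_mean for v in x]
    variance = sum(d * d for d in dev) / n
    if variance == 0.0:
        raise ValueError("time series has zero variance")
    min_t = minimum_correlation_time or 0
    si = 1.0
    for t in range(1, n):
        rho = sum(dev[i] * dev[i + t] for i in range(n - t)) / ((n - t) * variance)
        # Stop once the correlation function is no longer positive,
        # but only after min_t lags have been included.
        if rho <= 0.0 and t > min_t:
            break
        si += 2.0 * rho * (1.0 - t / n)
    return max(si, 1.0)


def uncorrelated_indices(x, si):
    """Indices spaced ceil(si) apart (approximately independent samples)."""
    return list(range(0, len(x), math.ceil(si)))


# AR(1) series with coefficient 0.9; the theoretical value is (1 + 0.9) / (1 - 0.9) = 19.
rng = random.Random(11)
series = []
value = 0.0
for _ in range(10000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    series.append(value)

si = statistical_inefficiency(series)
print(round(si, 1), len(uncorrelated_indices(series, si)))
```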
- set_si(time_series_data, *, si: str | float | int | None = None, fft: bool = True, minimum_correlation_time: int | None = None) None
Set the si (statistical inefficiency).
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
- property si
Get the si.
- property std
Get the std.
- ucl(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, equilibration_length_estimate: int = 0, heidel_welch_number_points: int = 50, batch_size: int = 5, fft: bool = True, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, test_size: int | float | None = None, train_size: int | float | None = None, population_standard_deviation: float | None = None, si: str | float | int | None = None, minimum_correlation_time: int | None = None, uncorrelated_sample_indices: ndarray | list[int] | None = None, sample_method: str | None = None) float
Approximate the upper confidence limit of the mean.
UncorrelatedSamples algorithm.
- kim_convergence.ucl.heidelberger_welch_ci(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, heidel_welch_number_points: int = 50, fft: bool = True, test_size: int | float | None = None, train_size: int | float | None = None, obj: HeidelbergerWelch | None = None) tuple[float, float]
Approximate the confidence interval of the mean.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence level); must be between 0.0 and 1.0, and represents the confidence for calculation of the relative half-width estimate. (default: 0.95)
heidel_welch_number_points (int, optional) – the number of points that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)
fft (bool, optional) – Use FFT convolution for long series. (default: True)
test_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. If int, represents the absolute number of test samples. (default: None)
train_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the train split. If int, represents the absolute number of train samples. (default: None)
obj (HeidelbergerWelch, optional) – instance of HeidelbergerWelch (default: None)
- Returns:
- tuple[float, float]
Lower and upper confidence limits for the mean.
- kim_convergence.ucl.heidelberger_welch_relative_half_width_estimate(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, heidel_welch_number_points: int = 50, fft: bool = True, test_size: int | float | None = None, train_size: int | float | None = None, obj: HeidelbergerWelch | None = None) float
Get the relative half width estimate.
The relative half width estimate is the confidence interval half-width or upper confidence limit (UCL) divided by the sample mean.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data, which is in the stationarity region.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
heidel_welch_number_points (int, optional) – the number of points that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
test_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. If int, represents the absolute number of test samples. (default: None)
train_size (int, float, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the train split. If int, represents the absolute number of train samples. (default: None)
obj (HeidelbergerWelch, optional) – instance of HeidelbergerWelch. (default: None)
- Returns:
- float
Relative half width estimate
- kim_convergence.ucl.heidelberger_welch_ucl(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, heidel_welch_number_points: int = 50, fft: bool = True, test_size: int | float | None = None, train_size: int | float | None = None, obj: HeidelbergerWelch | None = None) float
Approximate the upper confidence limit of the mean.
- kim_convergence.ucl.mser_m(time_series_data: ndarray | list[float], *, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, ignore_end: int | float | None = None) tuple[bool, int]
Determine the truncation point using marginal standard error rules.
Determine the truncation point using marginal standard error rules (MSER). The MSER [white1997] and MSER-5 [spratt1998] rules determine the truncation point as the value of \(d\) that best balances the tradeoff between improved accuracy (elimination of bias) and decreased precision (reduction in the sample size) for the input series. They select a truncation point that minimizes the width of the marginal confidence interval about the truncated sample mean. The marginal confidence interval is a measure of the homogeneity of the truncated series. The optimal truncation point \(d(j)^*\) selected by MSER-m can be expressed as:
\[d(j)^* = \underset{n>d(j) \geq 0}{\text{argmin}} \left[ \frac{1}{(n(j)-d(j))^2} \sum_{i=d}^{n}{\left(X_i(j)- \bar{X}_{n,d}(j) \right )^2} \right]\]
MSER-m applies the equation to a series of batch averages instead of the raw series.
- Parameters:
time_series_data (array_like, 1d) – Time series data.
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, center time_series_data using the centering approach of the scale method. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scaling approach of the scale method. (default: False)
ignore_end (int, or float, or None, optional) – If int, the number of last batch points that should be ignored. If float, it should be in (0, 1) and is the fraction of last batch points that should be ignored. If None, it is set to \(Min(batch_size, number_batches / 4)\). (default: None)
- Returns:
- tuple[bool, int]
truncated: True if truncation was applied. truncation_point: Index at which to truncate.
Note
MSER-m sometimes erroneously reports a truncation point at the end of the data series. This is because the method can be overly sensitive to observations at the end of the data series that are close in value. Here, we avoid this artifact, by not allowing the algorithm to consider the standard errors calculated from the last few data points.
Note
If the truncation point returned by MSER-m > n/2, it is considered an invalid value and truncated will return as False. It means the method has not been provided with enough data to produce a valid result, and more data is required.
Note
If the truncation obtained by MSER-m is the last index of the batched data, the MSER-m returns the time series data’s last index as the truncation point. This index can be used as a measure that the algorithm did not find any truncation point.
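The rule above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the library's implementation: the helper name `mser_m_sketch`, the way the default `ignore_end` is rounded, and the treatment of the n/2 validity check are assumptions made for the example.

```python
import numpy as np


def mser_m_sketch(x, batch_size=5, ignore_end=None):
    """Illustrative MSER-m truncation rule (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n_batches = x.size // batch_size
    # Batch averages; trailing points that do not fill a batch are dropped.
    z = x[: n_batches * batch_size].reshape(n_batches, batch_size).mean(axis=1)
    if ignore_end is None:
        # Assumed default, mirroring Min(batch_size, number_batches / 4).
        ignore_end = max(1, min(batch_size, n_batches // 4))
    # MSER statistic for every candidate truncation point d,
    # skipping the last `ignore_end` batches.
    stat = [
        np.sum((z[d:] - z[d:].mean()) ** 2) / (n_batches - d) ** 2
        for d in range(n_batches - ignore_end)
    ]
    d_star = int(np.argmin(stat))
    truncation_point = d_star * batch_size
    # A truncation point beyond half of the data is treated as invalid.
    truncated = truncation_point <= x.size // 2
    return truncated, truncation_point


# Synthetic series: a 50-point linear transient followed by a flat region.
x = np.concatenate([np.linspace(10.0, 0.0, 50), np.zeros(450)])
truncated, truncation_point = mser_m_sketch(x)
```

For this synthetic input the statistic drops to zero as soon as the transient batches are removed, so the sketch reports the end of the transient as the truncation point.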
- kim_convergence.ucl.mser_m_ci(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m | None = None) tuple[float, float]
Approximate the confidence interval of the mean [mokashi2010].
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, center time_series_data using the centering approach of the scale method. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scaling approach of the scale method. (default: False)
obj (MSER_m, optional) – instance of MSER_m. (default: None)
- Returns:
- tuple[float, float]
Lower and upper confidence limits for the mean.
- kim_convergence.ucl.mser_m_relative_half_width_estimate(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m | None = None) float
Get the relative half width estimate.
The relative half width estimate is the confidence interval half-width or upper confidence limit (UCL) divided by the sample mean.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data, which is in the stationarity region.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, center time_series_data using the centering approach of the scale method. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scaling approach of the scale method. (default: False)
obj (MSER_m, optional) – instance of MSER_m. (default: None)
- Returns:
- float
Relative half width estimate.
- kim_convergence.ucl.mser_m_ucl(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m | None = None) float
Approximate the upper confidence limit of the mean.
- kim_convergence.ucl.mser_m_y_ci(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m_y | None = None) tuple[float, float]
Approximate the confidence interval of the mean [mokashi2010].
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, center time_series_data using the centering approach of the scale method. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scaling approach of the scale method. (default: False)
obj (MSER_m_y, optional) – instance of MSER_m_y. (default: None)
- Returns:
- tuple[float, float]
Lower and upper confidence limits for the mean.
- kim_convergence.ucl.mser_m_y_relative_half_width_estimate(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m_y | None = None) float
Get the relative half width estimate.
The relative half width estimate is the confidence interval half-width or upper confidence limit (UCL) divided by the sample mean.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data, which is in the stationarity region.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
batch_size (int, optional) – batch size. (default: 5)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, center time_series_data using the centering approach of the scale method. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scaling approach of the scale method. (default: False)
obj (MSER_m_y, optional) – instance of MSER_m_y. (default: None)
- Returns:
- float
Relative half width estimate.
- kim_convergence.ucl.mser_m_y_ucl(time_series_data: ndarray | list[float], *, confidence_coefficient: float = 0.95, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False, obj: MSER_m_y | None = None) float
Approximate the upper confidence limit of the mean.
- kim_convergence.ucl.n_skart_ci(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, equilibration_length_estimate: int = 0, fft: bool = True, obj: N_SKART | None = None) tuple[float, float]
Approximate the confidence interval of the mean.
- Parameters:
time_series_data (array_like, 1d) – time series data.
equilibration_length_estimate (int, optional) – an estimate for the equilibration length.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
obj (N_SKART, optional) – instance of N_SKART. (default: None)
- Returns:
- tuple[float, float]
Lower and upper confidence limits for the mean.
- kim_convergence.ucl.n_skart_relative_half_width_estimate(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, equilibration_length_estimate: int = 0, fft: bool = True, obj: N_SKART | None = None) float
Get the relative half width estimate.
The relative half width estimate is the confidence interval half-width or upper confidence limit (UCL) divided by the sample mean.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data, which is in the stationarity region.
- Parameters:
time_series_data (array_like, 1d) – time series data.
equilibration_length_estimate (int, optional) – an estimate for the equilibration length.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
obj (N_SKART, optional) – instance of N_SKART. (default: None)
- Returns:
- float
Relative half width estimate.
- kim_convergence.ucl.n_skart_ucl(time_series_data: ndarray | list[float], *, confidence_coefficient=0.95, equilibration_length_estimate: int = 0, fft: bool = True, obj: N_SKART | None = None) float
Approximate the upper confidence limit of the mean.
Approximate the confidence interval of the mean.
If the population standard deviation is known, and population_standard_deviation is given,
\[UCL = t_{\alpha,d} \left(\frac{\text{population standard deviation}}{\sqrt{n}}\right)\]
If the population standard deviation is unknown, the sample standard deviation is estimated and used as sample_standard_deviation,
\[UCL = t_{\alpha,d} \left(\frac{\text{sample standard deviation}}{\sqrt{n}}\right)\]
In both cases, the critical value is taken from the Student's t distribution. This value depends on the confidence_coefficient and the degrees of freedom, which is found by subtracting one from the number of observations.
Confidence limits for the mean are interval estimates. Interval estimates are often desirable because instead of a single estimate for the mean, a confidence interval generates a lower and upper limit. It indicates how much uncertainty there is in our estimation of the true mean. The narrower the gap, the more precise our estimate is.
Confidence limits are defined as \(\bar{Y} \pm UCL,\) where \(\bar{Y}\) is the sample mean, and \(UCL\) is the approximate upper confidence limit of the mean.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
population_standard_deviation (float, optional) – population standard deviation. (default: None)
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. (default: None)
sample_method (str, optional) – sampling method, one of the uncorrelated, random, or block_averaged. (default: None)
obj (UncorrelatedSamples, optional) – instance of UncorrelatedSamples. (default: None)
- Returns:
- tuple[float, float]
Lower and upper confidence limits for the mean. The approximately unbiased estimate of confidence Limits for the mean.
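As a worked illustration of the UCL formulas above, the following sketch computes the half-width for a set of samples that are assumed to be uncorrelated already. It is not the library's code: the helper name `ucl_sketch` is hypothetical, and reading \(t_{\alpha,d}\) as the two-sided critical value with \(d = n - 1\) is an assumption of this example.

```python
import numpy as np
from scipy.stats import t


def ucl_sketch(samples, confidence_coefficient=0.95,
               population_standard_deviation=None):
    """Half-width of the confidence interval of the mean (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Use the known population value if given, otherwise the sample estimate.
    if population_standard_deviation is None:
        std = samples.std(ddof=1)
    else:
        std = population_standard_deviation
    # Assumed two-sided critical value with n - 1 degrees of freedom.
    alpha = 1.0 - confidence_coefficient
    critical = t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return critical * std / np.sqrt(n)


samples = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0])
ucl = ucl_sketch(samples)
# Confidence limits are the sample mean plus or minus the UCL.
lower, upper = samples.mean() - ucl, samples.mean() + ucl
```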
Get the relative half width estimate.
The relative half width estimate is the confidence interval half-width or upper confidence limit (UCL) divided by the sample mean.
The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data, which is in the stationarity region.
- Parameters:
time_series_data (array_like, 1d) – time series data.
confidence_coefficient (float, optional) – probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
population_standard_deviation (float, optional) – population standard deviation. (default: None)
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. (default: None)
sample_method (str, optional) – sampling method, one of the uncorrelated, random, or block_averaged. (default: None)
obj (UncorrelatedSamples, optional) – instance of UncorrelatedSamples. (default: None)
- Returns:
- float
Relative half width estimate
Approximate the upper confidence limit of the mean.
Statistical Functions
stats module.
- class kim_convergence.stats.ZERO_RC(xlo: float, xhi: float, *, abs_tol: float = 1e-50, rel_tol: float = 1e-08)
Zero finding class by reverse communication.
- zero(status: int, x: float, fx: float, xlo: float, xhi: float)
Perform the zero finding.
- Parameters:
status (int) – Status. If 0, other parameters are ignored.
x (float) – Input value at which function f is evaluated.
fx (float) – Function value f(x).
xlo (float) – Lower interval bound.
xhi (float) – Upper interval bound.
- Returns:
- tuple[int, float, float, float]
status: 0 = finished, 1 = needs eval, -1 = error. x: updated candidate. xlo/xhi: refined bracketing interval.
- class kim_convergence.stats.ZERO_RC_BOUNDS(small: float, big: float, abs_step: float, rel_step: float, step_mul: float, *, abs_tol: float = 1e-50, rel_tol: float = 1e-08)
Bound zero finding class by reverse communication.
- zero(status: int, x: float, fx: float)
Bounds the zero of the function.
Bounds the zero of the function and finds zero of the function by reverse communication.
f must be a monotone function, otherwise the results are undefined. If f is monotonically increasing, then the result is bounded by [f(x-tolerance(x)), f(x+tolerance(x))]. If f is monotonically decreasing, then the result is bounded by [f(x+tolerance(x)), f(x-tolerance(x))], where tolerance(x) = Maximum(abs_tol, rel_tol * |x|).
- Parameters:
status (int) – Status. If 0, other parameters are ignored.
x (float) – Input value at which function f is evaluated.
fx (float) – Function value f(x).
- Returns:
- tuple[int, float]
status: 0 = finished without error, 1 = needs another evaluation. x: updated input value.
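Reverse communication means the solver never calls the function itself: it hands back a point, the caller evaluates f there, and passes the value in on the next call. The sketch below shows the pattern with a plain bisection. The class `BisectionRC` is hypothetical and is not `ZERO_RC` or `ZERO_RC_BOUNDS`; it only borrows the status convention described above (0 = start or finished, 1 = needs another evaluation) and assumes f changes sign on [xlo, xhi].

```python
class BisectionRC:
    """Hypothetical reverse-communication bisection solver (not the library class)."""

    def __init__(self, xlo, xhi, abs_tol=1e-12):
        self.xlo, self.xhi, self.abs_tol = xlo, xhi, abs_tol
        self.flo = None  # f(xlo), requested on the first call

    def zero(self, status, x, fx):
        """Return (status, x): status 1 asks the caller for f(x), 0 means done."""
        if status == 0:
            # First call: x and fx are ignored; ask for f at the lower bound.
            return 1, self.xlo
        if self.flo is None:
            self.flo = fx
        elif (fx > 0) == (self.flo > 0):
            # Same sign as the lower bound: the root lies to the right of x.
            self.xlo, self.flo = x, fx
        else:
            self.xhi = x
        midpoint = 0.5 * (self.xlo + self.xhi)
        if self.xhi - self.xlo <= self.abs_tol:
            return 0, midpoint
        return 1, midpoint


solver = BisectionRC(0.0, 2.0)
status, x = solver.zero(0, 0.0, 0.0)
while status == 1:
    # The caller, not the solver, evaluates the function f(x) = x^2 - 2.
    status, x = solver.zero(status, x, x * x - 2.0)
```

The design keeps the solver free of callbacks, which is convenient when each function evaluation is expensive or has to happen in a different part of the program.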
- kim_convergence.stats.auto_correlate(x: ndarray | list[float], *, nlags: int | None = None, fft: bool = False) ndarray
Calculate the auto-correlation function.
Calculate the auto-correlation function for nlags lag for the input array. This estimator is biased.
- Parameters:
x (array_like, 1d) – Time series data.
nlags (int > 0 or None, optional) – Number of lags for which the auto-correlation is returned. (default: None)
fft (bool, optional) – Use FFT convolution for long series. (default: False)
- Returns:
- ndarray
Calculated auto correlation function.
- kim_convergence.stats.auto_covariance(x: ndarray | list[float], *, fft: bool = False) ndarray
Calculate biased auto-covariance estimates.
Compute auto-covariance estimates for every lag for the input array. This estimator is biased.
\[\gamma_k = \frac{1}{N}\sum\limits_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x})\]
Note
Some sources use the following formula for computing the autocovariance:
\[\gamma_k = \frac{1}{N-k}\sum\limits_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x})\]
This definition has less bias than the one used here. However, the \(\frac{1}{N}\) formulation has some desirable statistical properties and is the most commonly used in the statistics literature.
- Parameters:
x (array_like, 1d) – Time series data.
fft (bool, optional) – Use FFT convolution for long series. (default: False)
- Returns:
- 1darray
Estimated autocovariances.
- Raises:
CRError – If input validation fails.
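The biased \(\frac{1}{N}\) estimator above can be written directly with NumPy. This is a minimal sketch of the formula using a direct (non-FFT) correlation, not the library routine; the name `auto_covariance_sketch` is an assumption of the example.

```python
import numpy as np


def auto_covariance_sketch(x):
    """Biased auto-covariance estimates for lags 0 .. N-1 (direct sum, no FFT)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    # "full" mode returns lags -(N-1) .. N-1; keep the non-negative lags
    # and divide every lag by N (the biased normalization).
    return np.correlate(dx, dx, mode="full")[n - 1:] / n


gamma = auto_covariance_sketch([1.0, 2.0, 3.0, 4.0])
# gamma -> [1.25, 0.3125, -0.375, -0.5625]
```

Dividing by N rather than N - k shrinks the estimates at large lags, which is the bias the note above refers to.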
- kim_convergence.stats.beta(a: float, b: float) float
Beta function.
Beta function [numrec2007] is defined as,
\[B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)},\]
where \(\Gamma\) is the gamma function.
- Parameters:
a (float) – First parameter of the beta distribution.
b (float) – Second parameter of the beta distribution.
- Returns:
- float
Beta function value.
- kim_convergence.stats.betacf(a: float, b: float, x: float, *, eps: float = 1e-15, max_iteration: int = 200, _fpmin: float = 1e-30) float
Continued fraction for incomplete beta function by modified Lentz’s method.
Evaluates continued fraction for incomplete beta function by modified Lentz’s method [numrec2007].
- Parameters:
a (float) – First parameter of the beta distribution.
b (float) – Second parameter of the beta distribution.
x (float) – Real-valued such that it must be between 0.0 and 1.0.
eps (float, optional) – Machine precision epsilon. (default: 1e-15, the value of np.finfo(np.float64).resolution)
max_iteration (int, optional) – Maximum number of iterations. (default: 200)
_fpmin (float, optional) – Minimum floating point precision. (default: 1.0e-30)
- Returns:
- float
Continued fraction for incomplete beta function.
- kim_convergence.stats.betai(a: float, b: float, x: float) float
Incomplete beta function.
Incomplete beta function [numrec2007] is defined as,
\[I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^x~t^{a-1}(1-t)^{b-1}~dt,\]
- Parameters:
a (float) – First parameter of the beta distribution.
b (float) – Second parameter of the beta distribution.
x (float) – Real-valued such that it must be between 0.0 and 1.0.
- Returns:
- float
Incomplete beta function value.
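The relation between the continued fraction and the incomplete beta function can be sketched with the standard library alone. The code follows the textbook modified Lentz recurrence; it is an illustration, not the package source, and the names `betacf_sketch` and `betai_sketch` are hypothetical. Unlike a production routine, the sketch silently returns the last iterate if the iteration limit is reached.

```python
from math import exp, lgamma, log


def betacf_sketch(a, b, x, eps=1e-15, max_iteration=200, fpmin=1e-30):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= fpmin else fpmin)
    h = d
    for m in range(1, max_iteration + 1):
        m2 = 2 * m
        # Even step, then odd step of the recurrence.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) >= fpmin else fpmin)
            c = 1.0 + aa / c
            c = c if abs(c) >= fpmin else fpmin
            delta = d * c
            h *= delta
        if abs(delta - 1.0) <= eps:
            break
    return h


def betai_sketch(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x == 0.0 or x == 1.0:
        return x
    # Prefactor x^a (1 - x)^b / B(a, b), evaluated in log space.
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * betacf_sketch(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - bt * betacf_sketch(b, a, 1.0 - x) / b
```

A quick sanity check is the uniform case: with a = b = 1 the integrand is constant, so I_x(1, 1) reduces to x.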
- kim_convergence.stats.betai_cdf(a: float, b: float, x: float) float
Calculate the cumulative distribution of the incomplete beta distribution.
Calculate the cumulative distribution of the incomplete beta distribution with parameters a and b as,
\[\int_0^x \frac{t^{a-1}~(1-t)^{b-1}}{Beta(a,b)}~dt,\]
where \(Beta(a,b)\) is the beta function.
- Parameters:
a (float) – First parameter of the beta distribution.
b (float) – Second parameter of the beta distribution.
x (float) – Upper limit of integration
- Returns:
- float
Cumulative incomplete beta distribution.
- kim_convergence.stats.betai_cdf_ccdf(a: float, b: float, x: float) tuple[float, float]
Calculate the cumulative distribution of the incomplete beta distribution.
Calculate the cumulative distribution of the incomplete beta distribution with parameters a and b as,
\[\int_0^x \frac{t^{a-1}~(1-t)^{b-1}}{Beta(a,b)}~dt,\]
where \(Beta(a,b)\) is the beta function.
- Parameters:
a (float) – First parameter of the beta distribution.
b (float) – Second parameter of the beta distribution.
x (float) – Upper limit of integration
- Returns:
- tuple[float, float]
Cumulative incomplete beta distribution, complement of the cumulative incomplete beta distribution.
- kim_convergence.stats.check_population_cdf_args(population_cdf: str | None, population_args: tuple)
Check the input population_cdf and population_args for correctness.
- Parameters:
population_cdf (str) – The name of a distribution.
population_args (tuple) – Distribution parameter.
- kim_convergence.stats.chi_square_test(sample_var: float, sample_size: int, population_var: float, significance_level: float = 0.050000000000000044) bool
Chi-square test for the variance.
Calculate the chi-square test for the variance. This is a two-sided test. The test statistic is \(T=(N-1)\frac{\text{var}}{\text{var}_0}\), where N is the sample size, var is the sample variance, and var0 is the target (population) variance. The ratio var/var0 compares the sample variance to the target variance. The more this ratio deviates from 1, the more likely we are to reject the null hypothesis.
The null hypothesis is that the variance of a sample of independent observations x is equal to the given population variance, population_var.
- Parameters:
sample_var (float) – Sample variance.
sample_size (int) – Number of samples.
population_var (float) – population variance.
significance_level (float) – Significance level. A probability threshold below which the null hypothesis will be rejected. (default: 0.05)
- Returns:
- bool
True if the variance of a sample of independent observations x equals the given population variance population_var.
- kim_convergence.stats.cross_correlate(x: ndarray | list[float], y: ndarray | list[float] | None, *, nlags: int | None = None, fft: bool = False) ndarray
Calculate the cross-correlation function.
Calculate the cross-correlation function for nlags lag for the input array. This estimator is biased.
- Parameters:
x (array_like, 1d) – Time series data.
y (array_like, 1d) – Time series data.
nlags (int > 0 or None, optional) – Number of lags for which the cross-correlation is returned. (default: None)
fft (bool, optional) – Use FFT convolution for long series. (default: False)
- Returns:
- ndarray
Calculated cross correlation.
- kim_convergence.stats.cross_covariance(x: ndarray | list[float], y: ndarray | list[float] | None, *, fft: bool = False) ndarray
Calculate the biased cross covariance estimate between two time series.
Calculate the cross covariance between two time series for every lag for the input arrays. This estimator is biased.
- Parameters:
x (array_like, 1d) – Time series data.
y (array_like, 1d) – Time series data.
fft (bool, optional) – Use FFT convolution for long series. (default: False)
- Returns:
- 1darray
Calculated cross covariances.
- Raises:
CRError – If input validation fails.
- kim_convergence.stats.get_distribution_stats(population_cdf: str | None, population_args: tuple, population_loc: float | None, population_scale: float | None)
Get the distribution stats from its name.
The stats include, Median, Mean, Variance, and Standard deviation of the distribution.
- Parameters:
population_cdf (str) – The name of a distribution.
population_args (tuple) – Distribution parameter.
population_loc (Optional[float]) – location of the distribution.
population_scale (Optional[float]) – scale of the distribution.
- Returns:
- tuple
median, mean, var, std
- kim_convergence.stats.get_fft_optimal_size(input_size: int) int
Find the optimal size for the FFT solver.
Get the next regular number greater than or equal to input_size [statsmodels]. Regular numbers are composites of the prime factors 2, 3, and 5. Also known as 5-smooth numbers or Hamming numbers, these are the optimal size for inputs to FFT solvers.
- Parameters:
input_size (int) – Size of the input data for the FFT solver. This is the positive integer from which the search starts.
- Returns:
- int
The first 5-smooth number greater than or equal to input_size.
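To make the notion of a regular (5-smooth) number concrete, the sketch below finds the next one by brute force. It only illustrates the definition; the helper name `next_regular_sketch` is hypothetical, and the library may use a faster search.

```python
def next_regular_sketch(input_size):
    """Smallest 5-smooth number >= input_size (brute-force illustration)."""
    candidate = max(int(input_size), 1)
    while True:
        remainder = candidate
        # Strip every factor of 2, 3, and 5; a regular number reduces to 1.
        for prime in (2, 3, 5):
            while remainder % prime == 0:
                remainder //= prime
        if remainder == 1:
            return candidate
        candidate += 1


sizes = [next_regular_sketch(n) for n in (7, 11, 13, 1001)]
# sizes -> [8, 12, 15, 1024]
```

Padding a series of length 1001 to 1024 lets the FFT work with small prime factors only, which is why these sizes are preferred.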
- kim_convergence.stats.int_power(x: ndarray | list[float], exponent: int) ndarray
Array elements raised to the power exponent.
- Parameters:
x (array_like, 1d) – The bases.
exponent (int) – The exponent
- Returns:
- 1darray
Computed power array.
- kim_convergence.stats.kruskal_test(time_series_data: ndarray | list[float], population_cdf: str | None, population_args: tuple, population_loc: float | None, population_scale: float | None, significance_level: float = 0.050000000000000044) bool
Kruskal-Wallis H-test for independent samples.
The Kruskal-Wallis H-test tests the null hypothesis that the median of the time series data is the same as the one from population_cdf.
It is a non-parametric version of ANOVA.
- Parameters:
time_series_data (np.ndarray) – time series data.
population_cdf (Optional[str]) – The name of a distribution.
population_args (tuple) – Distribution parameter.
population_loc (Optional[float]) – location of the distribution.
population_scale (Optional[float]) – scale of the distribution.
significance_level (float, optional) – Probability threshold below which the null hypothesis is rejected. (default: 0.05)
- Returns:
- bool
True if the median of the time-series data equals the median of the specified population distribution.
Examples:
>>> import numpy as np
>>> from kim_convergence.stats import kruskal_test
>>> rng = np.random.RandomState(12345)
>>> a = 1.99
>>> x = rng.gamma(a, 1, size=20)
>>> kruskal_test(x, population_cdf='gamma', population_args=(a,), population_loc=0, population_scale=1, significance_level=0.05)
True
- kim_convergence.stats.ks_test(time_series_data: ndarray | list[float], population_cdf: str | None, population_args: tuple, population_loc: float | None, population_scale: float | None, significance_level: float = 0.050000000000000044) bool
Kolmogorov-Smirnov test for goodness of fit.
Note
This test is only valid for continuous distributions.
It tests the distribution of an observed variable against a given distribution.
The null hypothesis is that the observed samples are drawn from the same continuous distribution as the given distribution with population_loc and population_scale if they are given.
Note
The alternative hypothesis is two-sided: the empirical cumulative distribution function of the observed variables is either less or greater than the cumulative distribution function of the given distribution.
The probability density of the given population distribution is in the standardized form. To shift and/or scale the distribution, the population_loc and population_scale parameters are used. In these cases, the variable x is replaced by y, where y = (x - loc) / scale.
- Parameters:
time_series_data (np.ndarray) – time series data.
population_cdf (Optional[str]) – The name of a distribution.
population_args (tuple) – Distribution parameter.
population_loc (Optional[float]) – location of the distribution.
population_scale (Optional[float]) – scale of the distribution.
significance_level (float, optional) – Probability threshold below which the null hypothesis is rejected. (default: 0.05)
- Returns:
- bool
True if the observed samples are drawn from the same continuous distribution as the given one (two-tailed p-value > significance_level).
- kim_convergence.stats.levene_test(time_series_data: ndarray | list[float], population_cdf: str | None, population_args: tuple, population_loc: float | None, population_scale: float | None, significance_level: float = 0.050000000000000044) bool
Perform modified Levene test for equal variances.
The modified Levene test tests the null hypothesis that one sample input time_series_data is from population population_cdf with the same variance [nistdiv898b].
Note
This test is fixed to use ‘median’ variation of the Levene’s test.
Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power.
Robustness means the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variances are in fact equal.
Power means the ability of the test to detect unequal variances when the variances are in fact unequal.
- Parameters:
time_series_data (np.ndarray) – time series data.
population_cdf (Optional[str]) – The name of a distribution.
population_args (tuple) – Distribution parameter.
population_loc (Optional[float]) – location of the distribution.
population_scale (Optional[float]) – scale of the distribution.
significance_level (float, optional) – Probability threshold below which the null hypothesis is rejected. (default: 0.05)
- Returns:
- bool
True if the sample variance equals the population variance (two-tailed p-value > significance_level).
Examples:
>>> import numpy as np
>>> from scipy.stats import gamma, alpha
>>> from kim_convergence.stats import levene_test
>>> rng = np.random.RandomState(12345)
>>> shape, scale = 2., 2.
>>> x = rng.gamma(shape, scale, size=1000)
>>> levene_test(x, population_cdf='gamma', population_args=(shape,), population_loc=0, population_scale=scale, significance_level=0.05)
True
>>> a = 1.99
>>> x = gamma.rvs(a, size=1000, random_state=rng)
>>> levene_test(x, population_cdf='gamma', population_args=(a,), population_loc=0, population_scale=1, significance_level=0.05)
True
>>> x = alpha.rvs(a, size=1000, random_state=rng)
>>> levene_test(x, population_cdf='gamma', population_args=(a,), population_loc=0, population_scale=1, significance_level=0.05)
False
Reject the null hypothesis at a significance level of 5%, concluding that there is a difference in variance between the time_series_data and the gamma distribution with shape parameter a.
Example:
>>> levene_test(x, population_cdf='alpha', population_args=(a,), population_loc=0, population_scale=1, significance_level=0.05)
True
- kim_convergence.stats.modified_periodogram(x: ndarray | list[float], *, fft: bool = False, with_mean: bool = False) ndarray
Compute a modified periodogram to estimate the power spectrum.
Estimate the power spectrum using a modified periodogram. A periodogram [heidelberger1981] is an estimate of the spectral density of a signal and it is defined as,
\[\left \{ I\left(\frac{k}{n}\right) \right \}_{k = 1, \cdots, \left \lfloor \frac{n}{2} \right \rfloor},\; I\left( \frac{k}{n} \right) = \left| \sum_{j=0}^{n-1} {x(j) e^{-2\pi i j k / n}} \right|^2 / n\]
- Parameters:
x (array_like, 1d) – Time series data.
fft (bool, optional) – Use FFT convolution for long series. (default: False)
with_mean (bool, optional) – If True, use x minus its mean. (default: False)
- Returns:
- 1darray
Computed modified periodogram array.
Note
This function does not return the array of sample frequencies. In case of need, one can compute it as,
\[f = \left \{ \frac{k}{n} \right \}_{k = 1, \cdots, \left \lfloor \frac{n}{2} \right \rfloor}\]
or
>>> f = np.arange(1., x.size//2 + 1) / x.size
- Raises:
CRError – If input validation fails.
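As an illustration of the definition above, the following sketch evaluates \(I(k/n)\) term by term using only the Python standard library. It is not the library's implementation (which may use FFT and other refinements); the name periodogram_direct is hypothetical.

```python
import cmath
import math


def periodogram_direct(x):
    """Evaluate I(k/n) = |sum_j x[j] exp(-2*pi*i*j*k/n)|**2 / n for k = 1..n//2."""
    n = len(x)
    result = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier sum at frequency k/n.
        s = sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        result.append(abs(s) ** 2 / n)
    return result


# A pure cosine with one cycle over n = 8 points.
n = 8
signal = [math.cos(2 * math.pi * j / n) for j in range(n)]
spectrum = periodogram_direct(signal)

# Matching sample frequencies k/n for k = 1..n//2.
frequencies = [k / n for k in range(1, n // 2 + 1)]
```

For this signal all of the power falls on the first frequency, 1/8, and the remaining entries vanish to rounding error.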
- kim_convergence.stats.moment(x: ndarray | list[float], *, moment: int = 1) float
Calculates the nth moment about the mean for a sample.
- Parameters:
x (array_like, 1d) – Time series data.
moment (int, optional) – Order of central moment that is returned. (default: 1)
- Returns:
- float
n-th central moment.
Note
The k-th central moment of a time series data,
\[m_k = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^k,\]where \(n\) is the number of samples and \(\bar{x}\) is the mean.
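The formula in the note can be written directly in plain Python. This is a minimal sketch of the definition, not the library's code; central_moment is a hypothetical name.

```python
def central_moment(x, k=1):
    """k-th central moment: mean of (x_i - mean(x))**k."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** k for v in x) / n


data = [1.0, 2.0, 3.0, 4.0]
first = central_moment(data, 1)   # always zero by construction
second = central_moment(data, 2)  # biased (population) variance
```

The first central moment is zero for any data, which is why the second and higher moments are the informative ones.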
- kim_convergence.stats.normal_interval(confidence_level: float, *, loc: float = 0.0, scale: float = 1.0) tuple[float, float]
Compute the normal distribution confidence interval.
Compute the normal-distribution confidence interval with equal areas around the median.
- Parameters:
confidence_level (float) – Confidence coefficient (must be between 0.0 and 1.0).
loc (float, optional) – Location parameter. (default: 0.0)
scale (float, optional) – Scale parameter. (default: 1.0)
- Returns:
- tuple[float, float]
Lower and upper bounds of the confidence interval that contains \(100 \cdot \text{confidence_level}\%\) of the distribution.
Note
A confidence interval is a range of values that is likely to contain an unknown population parameter.
The confidence level is the percentage of such confidence intervals that contain the population parameter.
The significance level, or alpha, is the probability of rejecting the null hypothesis when it is true. To find alpha, subtract the confidence level from 100%. E.g., the significance level for a 90% confidence level is 100% – 90% = 10%.
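The relation between the confidence level, alpha, and the interval bounds can be sketched with the standard library's statistics.NormalDist. This is only an illustration of an equal-tailed interval; normal_interval_sketch is a hypothetical name and not the library function.

```python
from statistics import NormalDist


def normal_interval_sketch(confidence_level, loc=0.0, scale=1.0):
    """Equal-tailed interval: alpha / 2 of the probability lies in each tail."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be between 0.0 and 1.0")
    alpha = 1.0 - confidence_level
    dist = NormalDist(mu=loc, sigma=scale)
    return dist.inv_cdf(alpha / 2.0), dist.inv_cdf(1.0 - alpha / 2.0)


lower, upper = normal_interval_sketch(0.95)
```

For a 95% confidence level on the standard normal distribution, the bounds are approximately -1.96 and 1.96.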
- kim_convergence.stats.normal_inv_cdf(p: float, *, loc=0.0, scale: float = 1.0) float
Compute the normal distribution inverse cumulative distribution function.
- Parameters:
p (float) – Probability (must be between 0.0 and 1.0).
loc (float, optional) – Location parameter. (default: 0.0)
scale (float, optional) – Scale parameter. (default: 1.0)
- Returns:
- float
Inverse cumulative distribution function: value \(x\) such that \(P(X \le x) = p\).
- kim_convergence.stats.periodogram(x: ndarray | list[float], *, fft: bool = False, with_mean: bool = False) ndarray
Compute a periodogram to estimate the power spectrum.
- Parameters:
x (array_like, 1d) – Time series data.
fft (bool, optional) – Use FFT convolution for long series. (default: False)
with_mean (bool, optional) – If True, use x minus its mean. (default: False)
- Returns:
- 1darray
Computed power spectrum array.
Note
This function does not return the array of sample frequencies. In case of need, one can compute it as,
\[f = \left \{ \frac{k}{n} \right \}_{k = 1, \cdots, \left \lfloor \frac{n}{2} \right \rfloor}\]
or
>>> f = np.arange(1., x.size//2 + 1) / x.size
- kim_convergence.stats.randomness_test(x: ndarray | list[float], significance_level: float) bool
Testing for independence of observations.
The von-Neumann ratio test of independence of variables is a test designed for checking the independence of subsequent observations.
The null hypothesis is that the data are independent and normally distributed.
- Parameters:
x (array_like, 1d) – Time series data.
significance_level (float) – Probability threshold below which the null hypothesis is rejected.
- Returns:
- bool
True if the observations are independent.
Note
Given a series \(x\) of \(n\) data points, the Von-Neumann test [vonneumann1941] [vonneumann1941b] statistic is
\[v = \frac{\sum_{i=2}^{n} (x_i - x_{i-1})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\]
Under the null hypothesis of independence, the mean \(\bar{v} = 2\) and the variance \(\sigma^2_v = \frac{4 (n - 2)}{(n^2-1)}\) (see [williams1941], and [madansky1988] for a simple derivation).
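A minimal sketch of the statistic and of one way to turn it into a decision follows. The ratio, mean, and variance come from the note above; the normal approximation used for the two-sided p-value is an assumption of this sketch, and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def von_neumann_ratio(x):
    """Sum of squared successive differences over sum of squared deviations."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator


def randomness_test_sketch(x, significance_level=0.05):
    """Return True when independence is not rejected (assumed normal approximation)."""
    n = len(x)
    v = von_neumann_ratio(x)
    variance = 4.0 * (n - 2) / (n * n - 1)
    z = (v - 2.0) / math.sqrt(variance)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value > significance_level


# A steadily increasing series is strongly correlated, so v is far below 2.
trend = [float(i) for i in range(100)]
trend_is_independent = randomness_test_sketch(trend)
```

Values of \(v\) well below 2 indicate positive serial correlation, and values well above 2 indicate alternation.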
- kim_convergence.stats.s_normal_inv_cdf(p: float) float
Compute the standard normal distribution inverse cumulative distribution function.
Compute the inverse cumulative distribution function (percent point function or quantile function) for standard normal distribution [pythonstats], [wichura1988].
- Parameters:
p (float) – Probability (must be between 0.0 and 1.0).
- Returns:
- float
Inverse cumulative distribution function: value \(x\) such that \(P(X \le x) = p\).
- kim_convergence.stats.skew(x: ndarray | list[float], *, bias: bool = False) float
Compute the time series data set skewness [zwillinger2000].
skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
- Parameters:
x (array_like, 1d) – Time series data.
bias (bool, optional) – If False, then the calculations are corrected for statistical bias. (default: False)
- Returns:
- float
The skewness
Note
For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution.
The sample skewness is computed as the Fisher-Pearson coefficient of skewness \(g_1 = \frac{m_3}{m_2^{3/2}}\), where \(m_i\) is the biased sample \(i\texttt{th}\) central moment. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.
\[G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2} \frac{m_3}{m_2^{3/2}}.\]
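Both coefficients can be computed from the biased central moments. The sketch below follows the two formulas in the note; skew_sketch is a hypothetical name, and the bias-corrected form needs at least three data points.

```python
import math


def skew_sketch(x, bias=False):
    """Fisher-Pearson skewness g1, or the adjusted coefficient G1 if bias is False."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # biased second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # biased third central moment
    g1 = m3 / m2 ** 1.5
    if bias:
        return g1
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


right_tailed = [1.0, 2.0, 3.0, 10.0]
g1 = skew_sketch(right_tailed, bias=True)
G1 = skew_sketch(right_tailed, bias=False)
```

The single large value pulls the right tail out, so both coefficients are positive, and the bias correction enlarges the estimate for this small sample.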
- kim_convergence.stats.t_cdf(t: float, df: float) float
Compute the cumulative distribution of the t-distribution.
The cumulative distribution of the t-distribution for t > 0, can be written in terms of the regularized incomplete beta function as,
\[\int_{-\infty}^t f(u)\,du = 1 - \frac{1}{2} I_{x(t)}\left(\frac{\nu}{2}, \frac{1}{2}\right),\]
where,
\[x(t) = \frac{\nu}{{t^2+\nu}}.\]
Other t values are obtained by symmetry.
- Parameters:
t (float) – Upper limit of the integration.
df (float) – Degrees of freedom, must be a positive number.
- Returns:
- float
Cumulative t-distribution.
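The symmetry argument can be checked numerically. The sketch below deliberately swaps in a different technique: instead of the regularized incomplete beta function, it integrates the t density over \([0, |t|]\) with the composite Simpson rule and then applies symmetry for negative t. The function names and the intervals parameter are hypothetical.

```python
import math


def t_pdf(u, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + u * u / df) ** (-(df + 1) / 2)


def t_cdf_numeric(t, df, intervals=2000):
    """CDF from Simpson integration on [0, |t|]; intervals must be even."""
    upper = abs(t)
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    cdf_positive = 0.5 + total * h / 3.0
    # Symmetry about zero: F(-t) = 1 - F(t).
    return cdf_positive if t >= 0 else 1.0 - cdf_positive


cdf_at_one = t_cdf_numeric(1.0, 1)         # Cauchy case, exact value 0.75
cdf_at_minus_one = t_cdf_numeric(-1.0, 1)  # exact value 0.25
```

With one degree of freedom the t-distribution is the Cauchy distribution, whose CDF is known in closed form, which makes it a convenient check.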
- kim_convergence.stats.t_cdf_ccdf(t: float, df: float) tuple[float, float]
Compute the cumulative distribution of the t-distribution and its complement.
The cumulative distribution of the t-distribution for t > 0, can be written in terms of the regularized incomplete beta function as,
\[\int_{-\infty}^t f(u)\,du = 1 - \frac{1}{2} I_{x(t)}\left(\frac{\nu}{2}, \frac{1}{2}\right),\]
where,
\[x(t) = \frac{\nu}{{t^2+\nu}}.\]
Other t values are obtained by symmetry.
- Parameters:
t (float) – Upper limit of the integration.
df (float) – Degrees of freedom, must be a positive number.
- Returns:
- tuple[float, float]
cdf: cumulative t-distribution value. ccdf: complement of the cumulative t-distribution (1 - cdf).
- kim_convergence.stats.t_interval(confidence_level: float, df: float, *, loc: float = 0.0, scale: float = 1.0) tuple[float, float]
Compute the t-distribution confidence interval.
Compute the t-distribution confidence interval with equal areas around the median.
- Parameters:
confidence_level (float) – Confidence coefficient (must be between 0.0 and 1.0).
df (float) – Degrees of freedom, must be > 0.
loc (float, optional) – location parameter (default: 0.0)
scale (float, optional) – scale parameter (default: 1.0)
- Returns:
- tuple[float, float]
Lower and upper bounds of the confidence interval that contains \(100 \cdot \text{confidence_level}\%\) of the t-distribution.
Note
A confidence interval is a range of values that is likely to contain an unknown population parameter.
The confidence level is the percentage of such confidence intervals that contain the population parameter.
The significance level, or alpha, is the probability of rejecting the null hypothesis when it is true. To find alpha, subtract the confidence level from 100%. E.g., the significance level for a 90% confidence level is 100% – 90% = 10%.
- kim_convergence.stats.t_inv_cdf(p: float, df: float, *, loc: float = 0.0, scale: float = 1.0, _tol: float = 1e-08, _atol: float = 1e-50, _rtinf: float = 1e+100) float
Compute the t-distribution inverse cumulative distribution function.
Compute the inverse cumulative distribution function (percent point function or quantile function) for t-distributions with df degrees of freedom. Inverse cumulative distribution function finds the value of the random variable such that the probability of the variable being less than or equal to that value equals the given probability.
- Parameters:
p (float) – Probability (must be between 0.0 and 1.0)
df (float) – Degrees of freedom, must be > 1.
loc (float, optional) – location parameter (default: 0.0)
scale (float, optional) – scale parameter (default: 1.0)
- Returns:
- float
Inverse cumulative distribution function: value \(x\) such that \(P(X \le x) = p\).
- kim_convergence.stats.t_test(sample_mean: float, sample_std: float, sample_size: int, population_mean: float, significance_level: float = 0.050000000000000044) bool
T-test for the mean.
Calculate the T-test for the mean. This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations x is equal to the given population mean, population_mean.
- Parameters:
sample_mean (float) – Sample mean.
sample_std (float) – Sample standard deviation.
sample_size (int) – Number of samples.
population_mean (float) – Expected value in the null hypothesis.
significance_level (float) – Significance level. A probability threshold below which the null hypothesis will be rejected. (default: 0.05)
- Returns:
- bool
True if the expected value (mean) of a sample of independent observations x equals the given population mean population_mean.
- kim_convergence.stats.wilcoxon_test(time_series_data: ndarray | list[float], population_cdf: str | None, population_args: tuple, population_loc: float | None, population_scale: float | None, significance_level: float = 0.050000000000000044) bool
Calculate the Wilcoxon signed-rank test.
Here it is used as a non-parametric test to determine whether an unknown population mean is different from a specific value.
- Parameters:
time_series_data (np.ndarray) – time series data.
population_cdf (Optional[str]) – The name of a distribution.
population_args (tuple) – Distribution parameter.
population_loc (Optional[float]) – location of the distribution.
population_scale (Optional[float]) – scale of the distribution.
significance_level (float, optional) – Probability threshold below which the null hypothesis is rejected. (default: 0.05)
- Returns:
- bool
True if the sample is drawn from the specified population distribution.
Examples:
>>> import numpy as np
>>> from scipy.stats import gamma
>>> rng = np.random.RandomState(12345)
>>> shape, scale = 2., 2.
>>> x = rng.gamma(shape, scale, size=1000)
>>> wilcoxon_test(x, population_cdf='gamma', population_args=(shape,), population_loc=0, population_scale=scale, significance_level=0.05)
True
>>> wilcoxon_test(x, population_cdf='gamma', population_args=(shape,), population_loc=0, population_scale=1, significance_level=0.05)
False
Time Series Functions
Time series module.
- kim_convergence.timeseries.estimate_equilibration_length(time_series_data: ndarray | list[float], *, si: str | None = None, nskip: int | None = 1, fft: bool = True, minimum_correlation_time: int | None = None, ignore_end: int | float | None = None, number_of_cores: int = 1, batch_size: int = 5, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False) tuple[int, float]
Estimate the equilibration point in a time series data.
Estimate the equilibration point in a time series data using the statistical inefficiencies [chodera2016], [geyer1992], [geyer2011].
- Parameters:
time_series_data (array_like, 1d) – Time-series data.
si (Optional[str], optional) – Statistical-inefficiency method. (default: None)
nskip (Optional[int], optional) – Number of data points to skip. (default: 1)
fft (bool, optional) – Use FFT convolution for long series. (default: True)
minimum_correlation_time (Optional[int], optional) – Minimum correlation-time window; algorithm stops when correlation first goes negative. (default: None)
ignore_end (Optional[Union[int, float]], optional) – If int, last points to ignore; if float in (0, 1), fraction to ignore; if None, uses one fourth of data. (default: None)
number_of_cores (int, optional) – The maximum number of concurrently running jobs, such as the number of Python worker processes or the size of the thread-pool. If -1, all CPUs are used. If 1, no parallel computing code is used at all. For number_of_cores below -1, (n_cpus + 1 + number_of_cores) are used. Thus for number_of_cores = -2, all CPUs but one are used. (default: 1)
- Returns:
- tuple[int, float]
equilibration_index: index where the equilibrated region starts. statistical_inefficiency: statistical inefficiency estimate of the time series at the estimated equilibration index.
Note
batch_size, scale, with_centering, and with_scaling are accepted for API compatibility but are not used by this method.
- kim_convergence.timeseries.geyer_r_statistical_inefficiency(x: ndarray | list[float], y: ndarray | list[float] | None = None, *, fft: bool = True, minimum_correlation_time: int | None = None) float
Compute the statistical inefficiency.
Compute the statistical inefficiency using the Geyer’s [geyer1992], [geyer2011] initial monotone sequence criterion.
Note
The behavior is updated. Suppose the time series data is an array of (constant) numbers with standard deviation close to zero within abs_tol=1e-18, where abs(a) <= max(1e-9 * abs(a), abs_tol). In that case, this function returns the statistical inefficiency as the size of the time series data array.
Note
The effective sample size is computed by:
\[\begin{split}\hat{N}_{eff} &= \frac{N}{si} \\ si &= -1 + 2 \sum_{t'=0}^m \hat{P}_{t'}\end{split}\]where \(N\) is the number of data points. \(\hat{P}_{t'} = \hat{\rho}_{2t'} + \hat{\rho}_{2t'+1}\), where \(\hat{\rho}_t'\) is the estimated auto-correlation at lag \(t'\), and \(m\) is the last integer for which \(\hat{P}_{t'}\) is still positive (largest \(m\) such that \(\hat{P}_{t'} > 0,~t'=1,\cdots,m\)). The initial monotone sequence is obtained by further reducing \(\hat{P}_{t'}\) to the minimum of the preceding ones so that the estimated sequence is monotone.
The current implementation is similar to Stan [mcstan], which uses Geyer’s initial monotone sequence criterion (Geyer, 1992 [geyer1992]; Geyer, 2011 [geyer2011]).
- Parameters:
x (array_like, 1d) – time series data. Using this method, statistical inefficiency cannot be estimated with fewer than four data points.
y (array_like, 1d, optional) – time series data. If it is passed to this function, the cross-correlation of timeseries x and y will be estimated instead of the auto-correlation of timeseries x. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
- Returns:
- float
estimated statistical inefficiency. \(si >= 1\) is the estimated statistical inefficiency (equal to \(si = -1 + 2 \sum_{t'=0}^m \hat{P}_{t'}\), where \(\hat{P}_{t'} = \hat{\rho}_{2t'} + \hat{\rho}_{2t'+1}\))
Note
minimum_correlation_time is accepted for API compatibility but is not used by this method.
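The estimator in the note above can be sketched in plain Python as follows. This is a direct, O(n²) illustration of the formula and not the library's implementation: the biased autocorrelation estimator, the clamping of the result to at least 1, and the name geyer_si_sketch are assumptions of this sketch. The constant-series branch mirrors the behavior described in the first note.

```python
import math


def geyer_si_sketch(x):
    """si = -1 + 2 * sum of the initial monotone sequence of P_t = rho_2t + rho_(2t+1)."""
    n = len(x)
    if n < 4:
        raise ValueError("at least four data points are required")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    if var == 0.0:
        # Constant series: report the series length, as the note describes.
        return float(n)
    # Biased autocorrelation estimate at every lag (assumption of this sketch).
    rho = [sum(dev[i] * dev[i + t] for i in range(n - t)) / var for t in range(n)]
    total = 0.0
    previous = float("inf")
    t = 0
    while 2 * t + 1 < n:
        p = rho[2 * t] + rho[2 * t + 1]
        if p <= 0.0:
            break  # initial positive sequence ends here
        p = min(p, previous)  # enforce a monotone non-increasing sequence
        total += p
        previous = p
        t += 1
    return max(1.0, -1.0 + 2.0 * total)


# A slowly varying, strongly correlated series.
series = [math.sin(2 * math.pi * j / 50) for j in range(200)]
si = geyer_si_sketch(series)
effective_sample_size = len(series) / si
```

Because neighbouring points of the slow sine wave are nearly identical, the estimated statistical inefficiency is well above 1 and the effective sample size is much smaller than the 200 raw points.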
- kim_convergence.timeseries.geyer_split_r_statistical_inefficiency(x: ndarray | list[float], y: ndarray | list[float] | None = None, *, fft: bool = True, minimum_correlation_time: int | None = None) float
Compute the statistical inefficiency.
Compute the statistical inefficiency using the split-r method of Geyer’s [geyer1992], [geyer2011] initial monotone sequence criterion.
Note
The effective sample size is computed by:
\[\begin{split}\hat{N}_{eff} &= \frac{N}{si} \\ si &= -1 + 2 \sum_{t'=0}^m \hat{P}_{t'}\end{split}\]where \(N\) is the number of data points. \(\hat{P}_{t'} = \hat{\rho}_{2t'} + \hat{\rho}_{2t'+1}\), where \(\hat{\rho}_t'\) is the estimated auto-correlation at lag \(t'\), and \(m\) is the last integer for which \(\hat{P}_{t'}\) is still positive (largest \(m\) such that \(\hat{P}_{t'} > 0,~t'=1,\cdots,m\)). The initial monotone sequence is obtained by further reducing \(\hat{P}_{t'}\) to the minimum of the preceding ones so that the estimated sequence is monotone.
The current implementation is similar to Stan [mcstan], which uses Geyer’s initial monotone sequence criterion (Geyer, 1992 [geyer1992]; Geyer, 2011 [geyer2011]).
- Parameters:
x (array_like, 1d) – time series data. Using this method, statistical inefficiency cannot be estimated with fewer than eight data points.
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
- Returns:
- float
estimated statistical inefficiency. \(si >= 1\) is the estimated statistical inefficiency (equal to \(si = -1 + 2 \sum_{t'=0}^m \hat{P}_{t'}\), where \(\hat{P}_{t'} = \hat{\rho}_{2t'} + \hat{\rho}_{2t'+1}\))
Note
minimum_correlation_time is accepted for API compatibility but is not used by this method.
- kim_convergence.timeseries.geyer_split_statistical_inefficiency(x: ndarray | list[float], y: ndarray | list[float] | None = None, *, fft: bool = True, minimum_correlation_time: int | None = None) float
Compute the statistical inefficiency.
Computes the effective sample size. The value returned is the minimum of effective sample size and the data size times log10(data size).
Note
The effective sample size cannot be estimated with fewer than four samples.
Note
The behavior is updated. Suppose the time series data is an array of (constant) numbers with standard deviation close to zero within abs_tol=1e-18, where abs(a) <= max(1e-9 * abs(a), abs_tol). In that case, this function returns the statistical inefficiency as the size of the time series data array.
- Parameters:
x (array_like, 1d) – time series data.
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
- Returns:
- float
estimated statistical inefficiency. \(si >= 1\) is the estimated statistical inefficiency
Note
minimum_correlation_time is accepted for API compatibility but is not used by this method.
- kim_convergence.timeseries.integrated_auto_correlation_time(x: ndarray | list[float], y: ndarray | list[float] | None = None, *, si: str | float | int | None = None, fft: bool = True, minimum_correlation_time: int | None = None) float
Estimate the integrated auto-correlation time.
The statistical inefficiency \(si\) of the observable \(x\) of a time series \(\left \{X\right \}_{t=0}^n\) is formally defined as, \(si \equiv 1 + 2\tau\), where \(\tau\) denotes the integrated auto-correlation time.
- Parameters:
x (array_like, 1d) – time series data.
y (array_like, 1d, optional) – time series data. (default: None) If it is passed to this function, the cross-correlation of timeseries x and y will be estimated instead of the auto-correlation of timeseries x.
si (float, or str, optional) – estimated statistical inefficiency, or a method of computing the statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
- Returns:
- float
Estimated integrated auto-correlation time \(\tau\).
- kim_convergence.timeseries.statistical_inefficiency(x: ndarray | list[float], y: ndarray | list[float] | None = None, *, fft: bool = True, minimum_correlation_time: int | None = None) float
Compute the statistical inefficiency.
The statistical inefficiency \(si\) of the observable \(x\) of a time series \(\{X\}_{t=0}^n\) is formally defined as,
\[\begin{split}si &\equiv 1 + 2\tau \\ \tau &\equiv \sum_{t=0}^n {\left( 1 - \frac{t}{n} \right) C\left(t\right)} \\ C\left(t\right) &\equiv \frac{<x(X_{t_0})x(X_{t_0+t})> - {<x>}^2}{<x^2>-{<x>}^2}\end{split}\]where \(\tau\) denotes the integrated auto-correlation time and \(C\left(t\right)\) is the normalized fluctuation auto-correlation function of the observable \(x\)
Note
The behavior is updated. Suppose the time series data is an array of (constant) numbers with standard deviation close to zero within abs_tol=1e-18, where abs(a) <= max(1e-9 * abs(a), abs_tol). In that case, this function returns the statistical inefficiency as the size of the time series data array.
- Parameters:
x (array_like, 1d) – time series data.
y (array_like, 1d, optional) – time series data. If it is passed to this function, the cross-correlation of timeseries x and y will be estimated instead of the auto-correlation of timeseries x. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
- Returns:
- float
estimated statistical inefficiency. \(si >= 1\) is the estimated statistical inefficiency (equal to \(1 + 2\tau\), where \(\tau\) denotes the integrated auto-correlation time).
- kim_convergence.timeseries.time_series_data_si(time_series_data: ndarray | list[float], *, si: str | float | int | None = None, fft: bool = True, minimum_correlation_time: int | None = None) float
Helper method to compute or return the statistical inefficiency value.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
- Returns:
- float
estimated statistical inefficiency value. \(si >= 1\) is the estimated statistical inefficiency.
Return average value for each block after blocking the data.
First, break down the time series data into a series of blocks, where each block contains si successive data points. If si (statistical inefficiency) is not provided, it will be computed. Then the average value for each block is determined. This coarse-graining approach is commonly used for thermodynamic properties.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. must be monotonically increasing. If None they are computed automatically. (default: None)
- Returns:
- 1darray
uncorrelated_sample of the time series data. average value for each block after blocking the time series data.
Return random data for each block after blocking the data.
First, break down the time series data into a series of blocks, where each block contains si successive data points. If si (statistical inefficiency) is not provided, it will be computed. Then a single value is taken at random from each block.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. must be monotonically increasing. If None they are computed automatically. (default: None)
- Returns:
- 1darray
uncorrelated_sample of the time series data. random data for each block after blocking the time series data.
Return time series data at uncorrelated sample indices.
Subsample a correlated timeseries to extract an effectively uncorrelated dataset. If si (statistical inefficiency) is not provided it will be computed.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency.
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. must be monotonically increasing. If None they are computed automatically. (default: None)
- Returns:
- 1darray
uncorrelated_sample of the time series data. time series data at uncorrelated sample indices.
Return indices of uncorrelated subsamples of the time series data.
Return indices of the uncorrelated sample of the time series data. Subsample a correlated timeseries to extract an effectively uncorrelated dataset. If si (statistical inefficiency) is not provided it will be computed.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency. (default: None)
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
- Returns:
- 1darray
indices array. Indices of uncorrelated subsamples of the time series data.
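One simple way to picture the subsampling is a fixed stride of about si points between retained samples. The sketch below assumes the stride is si rounded up to an integer; the library may space the indices differently, and both function names are hypothetical.

```python
import math


def uncorrelated_indices_sketch(n, si):
    """Indices 0, s, 2s, ... below n, with stride s = ceil(si) (assumed rule)."""
    stride = max(1, math.ceil(si))
    return list(range(0, n, stride))


def subsample_sketch(data, si):
    """Pick the data points at the uncorrelated sample indices."""
    return [data[i] for i in uncorrelated_indices_sketch(len(data), si)]


indices = uncorrelated_indices_sketch(10, 2.5)
sample = subsample_sketch([float(i) for i in range(10)], 2.5)
```

With si = 1 every point is kept, which corresponds to data that are already uncorrelated.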
Get time series data at the sample_method uncorrelated_sample indices.
Subsample a correlated timeseries to extract an effectively uncorrelated dataset. If si (statistical inefficiency) is not provided it will be computed.
- Parameters:
time_series_data (array_like, 1d) – time series data.
si (float, or str, optional) – estimated statistical inefficiency.
fft (bool, optional) – if True, use FFT convolution. FFT should be preferred for long time series. (default: True)
minimum_correlation_time (int, optional) – minimum amount of correlation function to compute. The algorithm terminates after computing the correlation time out to minimum_correlation_time when the correlation function first goes negative. (default: None)
uncorrelated_sample_indices (array_like, 1d, optional) – indices of uncorrelated subsamples of the time series data. (default: None)
sample_method (str, optional) – sampling method, one of uncorrelated, random, or block_averaged. (default: None)
- Returns:
- 1darray
uncorrelated_sample of the time series data. time series data at uncorrelated sample indices.
Utility Functions
batch
- kim_convergence.batch(time_series_data: ~numpy.ndarray | list, *, batch_size: int = 5, func: ~typing.Callable[[...], ~numpy.ndarray] = <function mean>, scale: str = 'translate_scale', with_centering: bool = False, with_scaling: bool = False) ndarray
Batch the time series data.
- Parameters:
time_series_data (array_like, 1d) – Time series data.
batch_size (int, optional) – batch size. (default: 5)
func (callable, optional) – Reduction function capable of receiving a single axis argument. It is called with time_series_data as first argument. (default: np.mean)
scale (str, optional) – A method to standardize a dataset. (default: ‘translate_scale’)
with_centering (bool, optional) – If True, use time_series_data minus the scale method's centering approach. (default: False)
with_scaling (bool, optional) – If True, scale the data using the scale method's scaling approach. (default: False)
- Returns:
- 1darray
Batched (and optionally rescaled) data.
Note
This function truncates the trailing data points that remain after dividing the number of data points by batch_size.
Note
By default, this method uses np.mean and computes the arithmetic mean.
Example:
>>> import numpy as np
>>> rng = np.random.RandomState(12345)
>>> x = np.ones(100) * 10 + (rng.random_sample(100) - 0.5)
>>> x_batch = batch(x, batch_size=5)
>>> x_batch.size
20
>>> print(x.mean(), x_batch.mean())
10.054804081191616 10.054804081191616
>>> x_batch_scaled = batch(x, batch_size=5, scale='translate_scale', with_scaling=True)
>>> x_batch_scaled.size
20
>>> print(x.mean(), x_batch_scaled.mean())
10.054804081191616 1.0
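The truncation and reduction steps described in the notes can be illustrated without NumPy. This is a minimal sketch that omits the optional scaling; batch_sketch is a hypothetical name and the code is not the library's implementation.

```python
def batch_sketch(data, batch_size=5, func=None):
    """Reduce consecutive blocks of batch_size points, dropping any remainder."""
    # Default reduction is the arithmetic mean, as in the documented default.
    reduce_block = func if func is not None else (lambda block: sum(block) / len(block))
    n_batches = len(data) // batch_size  # trailing remainder is discarded
    return [
        reduce_block(data[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]


values = list(range(12))            # 12 points: two full batches, 2 points dropped
means = batch_sketch(values, 5)
maxima = batch_sketch(values, 5, func=max)
```

Twelve points with a batch size of five give two batches; the last two points are discarded rather than forming a short batch.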
outlier_test
- kim_convergence.outlier_test(x: ndarray | list[float], outlier_method: str = 'iqr') ndarray | None
Detect outliers in the data.
The intuitive definition for the concept of an outlier in the data is a point that significantly deviates from its expected value. Therefore, given a time series (or a random sample from a population), a point can be declared an outlier if the distance to its expected value is higher than a predefined threshold (\(|x_i - E(x)| > \tau\)), where \(x_i\) is the observed data point, and \(E(x)\) is its expected value.
The methods based on this strategy are the most common approaches in the literature. These methods intend to detect outliers, but it is up to the analyst to decide if the detected points are real outliers. Thus it is necessary to characterize standard data points before removing any outliers detected by these approaches.
- Parameters:
x (array_like, 1d) – Time series data.
outlier_method (str, optional) – Method for outlier detection. (default: ‘iqr’)
- Returns:
- Optional[ndarray]
Indices of outliers; None if no outliers found.
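As an illustration of a threshold-based detector, the sketch below applies the common interquartile-range rule using the standard library. The 1.5 × IQR fences and the quartile convention of statistics.quantiles are assumptions of this sketch and may differ from what the 'iqr' method in the library does; iqr_outliers_sketch is a hypothetical name.

```python
from statistics import quantiles


def iqr_outliers_sketch(x, factor=1.5):
    """Return indices of points outside [Q1 - factor*IQR, Q3 + factor*IQR], or None."""
    q1, _, q3 = quantiles(x, n=4)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    indices = [i for i, v in enumerate(x) if v < lower or v > upper]
    return indices or None


found = iqr_outliers_sketch([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
none_found = iqr_outliers_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
```

As the description above stresses, the flagged indices are candidates only; whether they are real outliers remains the analyst's decision.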
Scaler classes
- class kim_convergence.MinMaxScale(*, feature_range: tuple[float, float] = (0, 1))
Standardize/Transform a dataset by scaling it to a given range.
This estimator scales and translates a dataset such that it is in the given range, e.g. between zero and one.
The transformation is given by:
\[\begin{split}x_{\text{std}} = \frac{x - \min(x)}{\max(x) - \min(x)} \\ \text{scaled}_x = x_{\text{std}} \cdot (\text{max} - \text{min}) + \text{min}\end{split}\]where min, max = feature_range.
- Parameters:
feature_range (tuple, optional) – tuple (min, max). (default: (0, 1)). Desired range of transformed data.
Examples:
>>> from kim_convergence import MinMaxScale, minmax_scale
>>> data = [-1., 3.]
>>> mms = MinMaxScale()
>>> scaled_x = mms.scale(data)
>>> print(scaled_x)
[0. 1.]
>>> x = mms.inverse(scaled_x)
>>> print(x)
[-1. 3.]
>>> data = [-1., 3., 100.]
>>> scaled_x = minmax_scale(data)
>>> print(scaled_x)
[0. 0.03960396 1.]
>>> mms = MinMaxScale()
>>> scaled_x = mms.scale(data)
>>> x = mms.inverse(scaled_x)
>>> print(x)
[ -1. 3. 100.]
- inverse(x: ndarray) ndarray
Undo the scaling of a dataset, restoring its original range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Transformed data.
- scale(x: ndarray | list) ndarray
Standardize a dataset by scaling it to a given range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Scaled dataset to a given range.
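The two-step min-max formula above can be written out directly in NumPy. The following is a minimal sketch for illustration, not the code behind MinMaxScale: the function name minmax_sketch is hypothetical, and the sketch assumes that max(x) differs from min(x).

```python
import numpy as np


def minmax_sketch(x, feature_range=(0.0, 1.0)):
    """Apply the two-step min-max formula to a 1d array (illustrative only)."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    # Step 1: map the data onto [0, 1]; assumes max(x) != min(x).
    x_std = (x - x.min()) / (x.max() - x.min())
    # Step 2: stretch and shift [0, 1] onto the requested range.
    return x_std * (hi - lo) + lo


print(minmax_sketch([-1.0, 3.0, 100.0]))
print(minmax_sketch([-1.0, 3.0, 100.0], feature_range=(-1.0, 1.0)))
```

With the default range, the middle value becomes 4/101, which matches the 0.03960396 shown in the example above.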
- class kim_convergence.TranslateScale(*, with_centering: bool = True, with_scaling: bool = True)
Standardize a dataset.
Standardize a dataset by translating it so that \(x[0]=0\) and rescaling it by its overall average so that the values are of O(1) with a good spread.
The translate and scale of a sample x is calculated as:
\[z = \frac{(x - x_0)}{u}\]
where \(x_0\) is \(x[0]\) or \(0\) if with_centering=False, and u is the mean of the translated samples \(x - x_0\) (the reading consistent with the example below) or \(1\) if with_scaling=False.
- Parameters:
with_centering (bool, optional) – If True, use x minus its first element. (default: True)
with_scaling (bool, optional) – If True, scale the data to overall averages so that the numbers are of O(1) with a good spread. (default: True)
Examples:
>>> from kim_convergence import TranslateScale
>>> data = [1., 2., 2., 2., 3.]
>>> tsc = TranslateScale()
>>> scaled_x = tsc.scale(data)
>>> print(scaled_x)
[0. 1. 1. 1. 2.]
>>> x = tsc.inverse(scaled_x)
>>> print(x)
[1. 2. 2. 2. 3.]
- inverse(x: ndarray) ndarray
Undo the scaling of a dataset, restoring its original range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Transformed data.
- scale(x: ndarray | list) ndarray
Standardize a dataset by translating it and rescaling it by its overall average.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Translated and/or scaled dataset.
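The translate-and-scale formula can be checked against the documented example with a few lines of NumPy. This sketch is an assumption-laden illustration, not the TranslateScale source: the name translate_scale_sketch is hypothetical, and u is taken to be the mean of the data after translation, because that choice reproduces the [0. 1. 1. 1. 2.] output shown in the example.

```python
import numpy as np


def translate_scale_sketch(x, with_centering=True, with_scaling=True):
    """Return (z, x0, u) for z = (x - x0) / u (illustrative only)."""
    x = np.asarray(x, dtype=float)
    x0 = x[0] if with_centering else 0.0
    shifted = x - x0
    # Assumption: u is the mean of the shifted data; a zero mean is not handled.
    u = shifted.mean() if with_scaling else 1.0
    return shifted / u, x0, u


z, x0, u = translate_scale_sketch([1.0, 2.0, 2.0, 2.0, 3.0])
print(z)            # translated so that z[0] == 0
print(z * u + x0)   # the inverse transform recovers the input
```

Keeping x0 and u alongside the result is what makes the inverse transform possible.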
- class kim_convergence.StandardScale(*, with_centering: bool = True, with_scaling: bool = True)
Standardize a dataset.
Standardize a dataset by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as:
\[z = \frac{(x - u)}{s}\]
where u is the mean of the samples or \(0\) if with_centering=False, and s is the standard deviation of the samples or \(1\) if with_scaling=False.
Centering and scaling happen independently.
- Parameters:
with_centering (bool, optional) – If True, use x minus its mean, or center the data before scaling. (default: True)
with_scaling (bool, optional) – If True, scale the data to unit variance (or equivalently, unit standard deviation). (default: True)
Note
If with_centering=False is set explicitly, only variance scaling is performed on x. A biased estimator is used for the standard deviation.
Examples:
>>> from kim_convergence import StandardScale
>>> data = [-0.5, 6]
>>> ssc = StandardScale()
>>> scaled_x = ssc.scale(data)
>>> print(scaled_x)
[-1. 1.]
>>> x = ssc.inverse(scaled_x)
>>> print(x)
[-0.5 6. ]
- inverse(x: ndarray) ndarray
Undo the scaling of a dataset, restoring its original range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Transformed data.
- scale(x: ndarray | list) ndarray
Standardize a dataset.
- Parameters:
x (array_like, 1d) – The data to center and scale.
- Returns:
- 1darray
Scaled and/or Centered dataset.
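The standard score and the note about the biased estimator can be illustrated with NumPy, whose np.std defaults to ddof=0 (the biased estimator). The function name standard_scale_sketch is hypothetical, and the sketch assumes that the standard deviation is always computed about the mean, even when centering is switched off. It is a reading of the formula above, not the library's code.

```python
import numpy as np


def standard_scale_sketch(x, with_centering=True, with_scaling=True):
    """Compute z = (x - u) / s with a biased standard deviation (illustrative)."""
    x = np.asarray(x, dtype=float)
    u = x.mean() if with_centering else 0.0
    # np.std uses ddof=0 by default, i.e. the biased estimator.
    s = x.std() if with_scaling else 1.0
    return (x - u) / s


print(standard_scale_sketch([-0.5, 6.0]))                        # [-1.  1.]
print(standard_scale_sketch([-0.5, 6.0], with_centering=False))  # variance scaling only
```

For the data [-0.5, 6] the mean is 2.75 and the biased standard deviation is 3.25, which gives the [-1. 1.] result of the documented example.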
- class kim_convergence.RobustScale(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: tuple[float, float] = (25.0, 75.0))
Standardize a dataset.
Standardize a dataset by centering it on the median and scaling it by the interquartile range (IQR). Both statistics are robust to outliers.
This removes the median and scales the data according to the quantile range. The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
Centering and scaling happen independently.
- Parameters:
with_centering (bool, optional) – If True, center the data before scaling. (default: True)
with_scaling (bool, optional) – If True, scale the data. (default: True)
quantile_range (tuple or list, optional) – (q_min, q_max), 0.0 < q_min < q_max < 100.0 (default: (25.0, 75.0) = (1st quartile, 3rd quartile))
Examples:
>>> from kim_convergence import RobustScale
>>> data = [ 4., 1., -2.]
>>> rsc = RobustScale()
>>> scaled_x = rsc.scale(data)
>>> print(scaled_x)
[ 1.22474487 0. -1.22474487]
>>> x = rsc.inverse(scaled_x)
>>> print(x)
[ 4. 1. -2.]
- inverse(x: ndarray) ndarray
Undo the scaling of a dataset, restoring its original range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Transformed data.
- scale(x: ndarray | list) ndarray
Standardize a dataset using median and quantile range.
- Parameters:
x (array_like, 1d) – The data to center and scale.
- Returns:
- 1darray
Scaled dataset.
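The median and quantile-range description can be rendered in plain NumPy as follows. This is a sketch of the textual description only: the name robust_scale_sketch is hypothetical, the percentiles use NumPy's default linear interpolation (an assumption), and numerical results from RobustScale itself may differ.

```python
import numpy as np


def robust_scale_sketch(x, quantile_range=(25.0, 75.0)):
    """Center on the median and divide by the quantile range (illustrative)."""
    x = np.asarray(x, dtype=float)
    center = np.median(x)
    q_min, q_max = np.percentile(x, quantile_range)
    scale = q_max - q_min  # assumes a non-zero quantile range
    return (x - center) / scale, center, scale


# The extreme value 100 does not move the median or the interquartile range.
scaled, center, scale = robust_scale_sketch([1.0, 2.0, 3.0, 4.0, 100.0])
print(center, scale)
print(scaled)
```

Replacing 100 with 5 in the input leaves center and scale unchanged, which is the sense in which these statistics are robust to outliers.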
- class kim_convergence.MaxAbsScale
Standardize a dataset to the [-1, 1] range.
Standardize a dataset to the [-1, 1] range such that the maximal absolute value in the data set will be 1.0.
Examples:
>>> from kim_convergence import MaxAbsScale
>>> data = [ 4., 1., -9.]
>>> mas = MaxAbsScale()
>>> scaled_x = mas.scale(data)
>>> print(scaled_x)
[ 0.44444444 0.11111111 -1. ]
>>> x = mas.inverse(scaled_x)
>>> print(x)
[ 4. 1. -9.]
- inverse(x: ndarray) ndarray
Undo the scaling of a dataset, restoring its original range.
- Parameters:
x (array_like, 1d) – Time series data.
- Returns:
- 1darray
Transformed data.
- scale(x: ndarray | list) ndarray
Online computation of the maximum absolute value of x for later scaling.
All of x is processed as a single batch. This is intended for cases when fit() is not feasible due to a very large number of n_samples or because X is read from a continuous stream.
- Parameters:
x (array_like, 1d) – The data to scale.
- Returns:
- 1darray
Scaled dataset.
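Scaling to the [-1, 1] range amounts to dividing by the largest absolute value. The sketch below illustrates this with NumPy. The name maxabs_sketch is hypothetical, and the sketch assumes that x contains at least one non-zero entry.

```python
import numpy as np


def maxabs_sketch(x):
    """Divide by the maximum absolute value; return (scaled, max_abs)."""
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x))  # assumes x is not identically zero
    return x / max_abs, max_abs


scaled, max_abs = maxabs_sketch([4.0, 1.0, -9.0])
print(scaled)            # largest magnitude maps to -1 here
print(scaled * max_abs)  # multiplying back recovers the input
```

For the data [4, 1, -9] the divisor is 9, which gives the 0.44444444, 0.11111111, and -1 values of the documented example.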
Convenience functions
minmax_scale
- kim_convergence.minmax_scale(x: ndarray, *, with_centering: bool = True, with_scaling: bool = True, feature_range: tuple[float, float] = (0.0, 1.0)) ndarray
Standardize/Transform a dataset by scaling it to a given range.
This estimator scales and translates a dataset such that it is in the given range, e.g. between zero and one.
The transformation is given by:
\[\begin{split}x_{\text{std}} = \frac{x - \min(x)}{\max(x) - \min(x)} \\ \text{scaled}_x = x_{\text{std}} \cdot (\text{max} - \text{min}) + \text{min}\end{split}\]
where min, max = feature_range.
- Parameters:
x (array_like, 1d) – Time series data.
feature_range (tuple, optional) – Desired range of the transformed data, given as a tuple (min, max). (default: (0, 1))
- Returns:
- 1darray
Scaled dataset to a given range.
Note
with_centering and with_scaling are accepted for API compatibility but are not used by this function.
translate_scale
- kim_convergence.translate_scale(x: ndarray, *, with_centering: bool = True, with_scaling: bool = True) ndarray
Standardize a dataset.
Standardize a dataset by translating it so that \(x[0]=0\) and rescaling it by its overall average so that the values are of O(1) with a good spread.
The translate and scale of a sample x is calculated as:
\[z = \frac{(x - x_0)}{u}\]
where \(x_0\) is \(x[0]\) or \(0\) if with_centering=False, and u is the mean of the translated samples \(x - x_0\) or \(1\) if with_scaling=False.
- Parameters:
x (array_like, 1d) – The data to center and scale.
with_centering (bool, optional) – If True, use x minus its first element. (default: True)
with_scaling (bool, optional) – If True, scale the data to overall averages so that the numbers are of O(1) with a good spread. (default: True)
- Returns:
- 1darray
Scaled dataset.
standard_scale
- kim_convergence.standard_scale(x: ndarray, *, with_centering: bool = True, with_scaling: bool = True) ndarray
Standardize a dataset.
Standardize a dataset by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as:
\[z = \frac{(x - u)}{s}\]
where u is the mean of the samples or \(0\) if with_centering=False, and s is the standard deviation of the samples or \(1\) if with_scaling=False.
- Parameters:
x (array_like, 1d) – The data to center and scale.
with_centering (bool, optional) – If True, use x minus its mean, or center the data before scaling. (default: True)
with_scaling (bool, optional) – If True, scale the data to unit variance (or equivalently, unit standard deviation). (default: True)
- Returns:
- 1darray
Scaled dataset.
Note
If with_centering=False is set explicitly, only variance scaling is performed on x. A biased estimator is used for the standard deviation.
robust_scale
- kim_convergence.robust_scale(x: ndarray, *, with_centering: bool = True, with_scaling: bool = True, quantile_range: tuple[float, float] = (25.0, 75.0)) ndarray
Standardize a dataset.
Standardize a dataset by centering it on the median and scaling it by the interquartile range.
- Parameters:
x (array_like, 1d) – The data to center and scale.
with_centering (bool, optional) – If True, center the data before scaling. (default: True)
with_scaling (bool, optional) – If True, scale the data. (default: True)
quantile_range (tuple or list, optional) – (q_min, q_max), 0.0 < q_min < q_max < 100.0 (default: (25.0, 75.0) = (1st quartile, 3rd quartile))
- Returns:
- 1darray
Scaled dataset.
maxabs_scale
- kim_convergence.maxabs_scale(x: ndarray, *, with_centering: bool = True, with_scaling: bool = True) ndarray
Standardize a dataset to the [-1, 1] range.
Standardize a dataset to the [-1, 1] range such that the maximal absolute value in the data set will be 1.0.
- Parameters:
x (array_like, 1d) – The data to scale.
- Returns:
- 1darray
Scaled dataset.
Note
with_centering and with_scaling are accepted for API compatibility but are not used by this function.
validate_split
- kim_convergence.validate_split(*, n_samples: int, train_size: int | float | None, test_size: int | float | None, default_test_size: int | float | None = None) tuple[int, int]
Validate test/train sizes.
Helper function that checks whether the test/train sizes are meaningful with regard to the size of the data (n_samples).
- Parameters:
n_samples (int) – total number of sample points
train_size (int, float, or None) – train size
test_size (int, float, or None) – test size
default_test_size (int, float, or None, optional) – default test size. (default: None)
- Returns:
- tuple[int, int]
n_train: number of train points. n_test: number of test points.
- Raises:
CRError – If any size is invalid or inconsistent.
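The kind of bookkeeping such a validation performs can be sketched as follows. Everything in this block is an assumption made for illustration: the helper name resolve_split_sizes, the rounding direction for fractional sizes (ceil for test, floor for train), the exact consistency checks, and the use of ValueError as a stand-in for CRError. It is not the library's implementation.

```python
import math


def resolve_split_sizes(n_samples, train_size=None, test_size=None,
                        default_test_size=None):
    """Hypothetical helper: turn int/float/None sizes into (n_train, n_test)."""
    if train_size is None and test_size is None:
        test_size = default_test_size
    if train_size is None and test_size is None:
        raise ValueError("at least one size must be given")

    def to_count(size, rounder):
        # Floats are read as fractions of n_samples, ints as absolute counts.
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError("a float size must lie in (0, 1)")
            return int(rounder(size * n_samples))
        return int(size)

    # Rounding direction (ceil for test, floor for train) is an assumption.
    n_test = None if test_size is None else to_count(test_size, math.ceil)
    n_train = None if train_size is None else to_count(train_size, math.floor)
    # A missing size is the complement of the one that was given.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train

    if n_train <= 0 or n_test <= 0 or n_train + n_test > n_samples:
        raise ValueError("sizes are inconsistent with n_samples")
    return n_train, n_test


print(resolve_split_sizes(100, test_size=0.25))  # (75, 25)
print(resolve_split_sizes(100, train_size=60))   # (60, 40)
```

A request such as train_size=8 and test_size=5 with n_samples=10 asks for more points than exist, so the sketch rejects it.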
train_test_split
- kim_convergence.train_test_split(time_series_data: ndarray | list[float], *, train_size: int | float | None = None, test_size: int | float | None = None, seed: int | RandomState | None = None, default_test_size: int | float | None = 0.1) tuple[ndarray, ndarray]
Split time_series_data into random train and test indices.
- Parameters:
time_series_data (array_like) – time series data, array-like of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.
test_size (int, float, or None, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to default_test_size. (default: None)
train_size (int, float, or None, optional) – if float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. (default: None)
seed (None, int, or np.random.RandomState, optional) – random number seed. (default: None)
default_test_size (int, float, or None, optional) – Default test size. (default: 0.1)
- Returns:
- tuple[np.ndarray, np.ndarray]
ind_train: training indices. ind_test: testing indices.
- Raises:
CRError – If any size is invalid or inconsistent, or if seed has an illegal type.
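A random index split of this kind can be sketched with NumPy's RandomState. The helper name split_indices_sketch and the decision to take the test indices from the front of the permutation are assumptions made for illustration; the library may order or draw the indices differently.

```python
import numpy as np


def split_indices_sketch(n_samples, n_test, seed=None):
    """Hypothetical helper: shuffle 0..n_samples-1 and cut off n_test indices."""
    if isinstance(seed, np.random.RandomState):
        rng = seed
    else:
        rng = np.random.RandomState(seed)  # accepts None or an int
    permutation = rng.permutation(n_samples)
    ind_test = permutation[:n_test]
    ind_train = permutation[n_test:]
    return ind_train, ind_test


ind_train, ind_test = split_indices_sketch(10, n_test=3, seed=0)
# Every index appears exactly once across the two sets.
print(sorted(ind_train.tolist() + ind_test.tolist()))
```

Passing the same integer seed twice yields the same split, which is useful when a convergence analysis has to be reproducible.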
Error Classes
CRError
- exception kim_convergence.err.CRError(msg)
Raise an exception.
It raises an exception when it receives an error message.
- Parameters:
msg (str) – Human-readable error message.
CRSampleSizeError
- exception kim_convergence.err.CRSampleSizeError(msg)
Raise an exception if there are not enough samples.