Upper Confidence Limit (UCL) Module

Overview

Upper Confidence Limit (UCL): The upper boundary (or limit) of a confidence interval of a parameter of interest such as the population mean.

A confidence interval is how much uncertainty there is with any particular statistic [nistdiv898].

The UCL module provides statistical methods for estimating confidence intervals, upper confidence limits, and uncertainty quantification for time series data. These methods are essential for convergence analysis, simulation output analysis, and Markov chain Monte Carlo (MCMC) diagnostics.

Confidence limits for the mean are interval estimates. Interval estimates are often desirable because instead of a single estimate for the mean, a confidence interval generates a lower and upper limit. It indicates how much uncertainty there is in our estimation of the true mean. The narrower the gap, the more precise our estimate is. We use a confidence level to express confidence limits. Choosing the confidence level is somewhat arbitrary, but 90 %, 95 %, and 99 % intervals are standard, and 95 % is the most commonly used.

Note:: One should note that a 95 % confidence interval does not mean a 95 % probability of containing the true mean. The interval computed from a sample either has the true mean, or it does not. The confidence level is simply the proportion of samples of a given size that may be expected to contain the true mean. For a 95 % confidence interval, if many samples are collected and the confidence interval computed, in the long run, about 95 % of these intervals would contain the true mean.

Available Methods

Method	Best For	Key Features
MSER-m	Stationary time series	Batch means, truncation detection
MSER-m-y	Time series with initial transient	Combines MSER-m with randomness testing
N-SKART	Non-normal, correlated data	Handles skewness and autocorrelation
Heidelberger-Welch	Spectral analysis approach	Spectral density at zero frequency
Uncorrelated Samples	Direct statistical estimation	Simple t-distribution based CI

Base Class

All UCL methods inherit from UCLBase, which provides a consistent interface for computing confidence limits.

MSER-m Algorithm

Determine the truncation point using marginal standard error rules.

Determine the truncation point using marginal standard error rules (MSER). The MSER [white1997] and MSER-5 [spratt1998] rules determine the truncation point as the value of \(d\) that best balances the tradeoff between improved accuracy (elimination of bias) and decreased precision (reduction in the sample size) for the input series. They select a truncation point that minimizes the width of the marginal confidence interval about the truncated sample mean. The marginal confidence interval is a measure of the homogeneity of the truncated series. The optimal truncation point \(d(j)^*\) selected by MSER-m can be expressed as:

\[d(j)^* = \underset{n>d(j) \geq 0}{\text{argmin}} \left[ \frac{1}{(n(j)-d(j))^2} \sum_{i=d}^{n}{\left(X_i(j)- \bar{X}_{n,d}(j) \right )^2} \right]\]

MSER-m applies the equation to a series of batch averages instead of the raw series.

param time_series_data:

Time series data.

type time_series_data:

array_like, 1d

param batch_size:

batch size. (default: 5)

type batch_size:

int, optional

param scale:

A method to standardize a dataset. (default: ‘translate_scale’)

type scale:

str, optional

param with_centering:

If True, use time_series_data minus the scale metod centering approach. (default: False)

type with_centering:

bool, optional

param with_scaling:

If True, scale the data to scale metod scaling approach. (default: False)

type with_scaling:

bool, optional

param ignore_end:

if int, it is the last few batch points that should be ignored. if float, should be in (0, 1) and it is the percent of last batch points that should be ignored. if None it would be set to the \(Min(batch_size, number_batches / 4)\). (default: None)

type ignore_end:

int, or float, or None, optional

returns:

tuple[bool, int]: truncated: True if truncation was applied. truncation_point: Index at which to truncate.

Note

MSER-m sometimes erroneously reports a truncation point at the end of the data series. This is because the method can be overly sensitive to observations at the end of the data series that are close in value. Here, we avoid this artifact, by not allowing the algorithm to consider the standard errors calculated from the last few data points.

Note

If the truncation point returned by MSER-m > n/2, it is considered an invalid value and truncated will return as False. It means the method has not been provided with enough data to produce a valid result, and more data is required.

Note

If the truncation obtained by MSER-m is the last index of the batched data, the MSER-m returns the time series data’s last index as the truncation point. This index can be used as a measure that the algorithm did not find any truncation point.

MSER-m-y Algorithm

MSER_m_y algorithm.

MSER_m_y [yousefi2011] computes k batch means of size m to evaluate the MSER-m statistic as described in [spratt1998] and detect the truncation point. If the truncation is detected, the point estimator of the mean is the sample mean of all observations in the truncated data set.

To compute the UCL, the MSER_m_y applies the von Neumann randomness test [vonneumann1941], [vonneumann1941b] to the truncated data to find a new batch size \(m^*\) for which the new batch means are approximately independent. It checks the randomness test on successively larger batch sizes until the test is finally passed and the batch means are finally determined to be approximately independent of each other. It starts by setting the initial batch size m as 1, and calculate the number of batches k’ accordingly.

kim_convergence.ucl.mser_m_y.MSER_m_y.significance_level

Significance level. A probability threshold below which the null hypothesis will be rejected.

Type:: float

Algorithm Flow

Compute k batch means of size m
Apply MSER-m to detect truncation point
Apply randomness test to find new batch size \(m^*\)
Iteratively increase batch size until batch means are independent

Key Features

Automatic detection of independent batch size
Significance level (probability threshold below which the null hypothesis will be rejected) of 0.2 for randomness test
Minimum data requirement: 10 points

N-SKART Algorithm

N-Skart algorithm.

N-Skart [tafazzoli2011] is a nonsequential procedure designed to compute a half the width of the confidence_coefficient% probability interval (CI) (confidence interval, or credible interval) around the time-series mean.

Note

N-Skart is a variant of the method of batch means.

N-Skart makes some modifications to the confidence interval (CI). These modifications account for the skewness (non-normality), and autocorrelation of the batch means which affect the distribution of the underlying Student’s t-statistic.

kim_convergence.ucl.n_skart.N_SKART.k_number_batches

number of nonspaced (adjacent) batches of size batch_size.

Type:: int

kim_convergence.ucl.n_skart.N_SKART.kp_number_batches

number of nonspaced (adjacent) batches.

Type:: int

kim_convergence.ucl.n_skart.N_SKART.batch_size

bacth size.

Type:: int

kim_convergence.ucl.n_skart.N_SKART.number_batches_per_spacer

number of batches per spacer.

Type:: int

kim_convergence.ucl.n_skart.N_SKART.maximum_number_batches_per_spacer

maximum number of batches per spacer.

Type:: int

kim_convergence.ucl.n_skart.N_SKART.significance_level

Significance level. A probability threshold below which the null hypothesis will be rejected.

Type:: float

kim_convergence.ucl.n_skart.N_SKART.randomness_test_counter

counter for applying the randomness test of von Neumann [vonneumann1941] [vonneumann1941b].

Type:: int

Algorithm Highlights

Adjusts for skewness using sample skewness of last 80% of data
Uses von Neumann randomness test with spacer batches
Includes correlation and skewness adjustments in UCL calculation

Heidelberger-Welch Algorithm

Approximate the upper confidence limit of the mean using spectral methods.

The Heidelberger-Welch method [heidelberger1981], [heidelberger1983] provides an approximately unbiased estimate of the upper confidence limit (half-width) of a confidence_coefficient% confidence interval for the time-series mean.

Mathematical Basis

For a covariance stationary time series, the variance of the sample mean can be expressed in terms of the spectral density at zero frequency:

\[\text{Var}(\bar{X}_N) = \frac{2\pi S(0)}{N} + O\left(\frac{1}{N^2}\right)\]

where \(S(0)\) is the spectral density at zero frequency and \(N\) is the sample size.

The method estimates \(S(0)\) from the modified periodogram using polynomial regression and applies bias correction factors \(C_1\) and \(C_2\) from Table 1 of Heidelberger & Welch (1981) [heidelberger1981].

Algorithm Description

Batching: The time series is divided into \(n = 4k\) batches, where \(k\) = heidel_welch_number_points
Periodogram: Compute the modified periodogram of batch means
Polynomial Fit: Fit polynomials of degrees 1-3 to the log periodogram
Model Selection: Choose the best-fitting polynomial degree based on test error (if test/train split is used) or residual norm
Variance Estimation: Estimate \(S(0)\) from the intercept of the best-fitting polynomial
UCL Calculation: Compute the upper confidence limit as:

\[UCL = t_{C_2}(\alpha) \cdot \sqrt{\frac{C_1 \cdot \widehat{S(0)}}{n}}\]

where:
- \(t_{C_2}(\alpha)\) is the t-distribution critical value with \(C_2\) degrees of freedom at confidence level \(\alpha\)
- \(C_1, C_2\) are bias correction constants from Heidelberger & Welch
- \(\widehat{S(0)}\) is the estimated spectral density at zero frequency
- \(n\) is the number of batches

Interpretation

The resulting confidence interval \([\hat{\mu} - UCL, \hat{\mu} + UCL]\) provides an approximate confidence_coefficient% confidence interval for the true population mean. As \(N \to \infty\), the UCL converges to 0 and the sample mean converges to the population mean.

Note

A 95% confidence interval means that if the experiment were repeated many times, approximately 95% of the computed intervals would contain the true population mean. It does not mean there is a 95% probability that the specific computed interval contains the true mean.

Key Parameters

heidel_welch_number_points (k): Controls the frequency resolution and number of points used for polynomial fitting. Larger values provide smoother spectral estimates but require more data.
confidence_coefficient: The desired coverage probability of the confidence interval (e.g., 0.95 for 95% confidence).
fft: Use FFT for periodogram computation (recommended for N > 30).

Performance Characteristics

Time Complexity: \(O(N \log N)\) due to FFT-based periodogram
Space Complexity: \(O(N + k)\) for data and periodogram storage
Minimum Data: Requires at least \(4k\) data points (default: 200 points with k=50)

Example

from kim_convergence.ucl import HeidelbergerWelch
import numpy as np

# Generate sample data
data = np.random.normal(loc=10, scale=2, size=5000)

# Initialize algorithm
hw = HeidelbergerWelch()

# Compute UCL
ucl = hw.ucl(
    data,
    confidence_coefficient=0.95,
    heidel_welch_number_points=50,
    fft=True
)

print(f"95% Upper Confidence Limit: {ucl:.4f}")

param time_series_data:

time series data.

type time_series_data:

array_like, 1d

param confidence_coefficient:

probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)

type confidence_coefficient:

float, optional

param heidel_welch_number_points:

the number of points that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)

type heidel_welch_number_points:

int, optional

param fft:

Use FFT convolution for long series. (default: True)

type fft:

bool, optional

param test_size:

if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. If int, represents the absolute number of test samples. (default: None)

type test_size:

int, float, optional

param train_size:

if float, should be between 0.0 and 1.0 and represent the proportion of the preiodogram dataset to include in the train split. If int, represents the absolute number of train samples. (default: None)

type train_size:

int, float, optional

returns:

float: Upper-confidence limit (approximately unbiased estimate of variance of the sample mean) based on the fitted polynomial degree.

raises CRError:

If inputs are invalid or computation fails

raises CRSampleSizeError:

If insufficient data points

Note

If both test_size and train_size are None, no train-test split is performed (uses entire periodogram for fitting).
In the ucl method, it is mandatory to compute and set the mean and std property.
Constants \(C_1, C_2\) are pre-computed based on heidel_welch_number_points and confidence_coefficient.

Note

equilibration_length_estimate, population_standard_deviation, si minimum_correlation_time, uncorrelated_sample_indices, and sample_method are accepted for API compatibility but are not used by this method.

Key Features

Adaptive polynomial fitting (degrees 1-3) to periodogram
Uses modified periodogram for spectral estimation
Optimal for stationary time series with unknown correlation structure

Uncorrelated Samples Method

Direct method that computes confidence intervals using uncorrelated subsamples of the time series, with optional population standard deviation.

Two Modes of Operation

Population standard deviation known:

\[UCL = t_{\alpha,d} \left(\frac{\sigma}{\sqrt{n}}\right)\]
Population standard deviation unknown:

\[UCL = t_{\alpha,d} \left(\frac{s}{\sqrt{n}}\right)\]

where \(\sigma\) is population standard deviation, \(s\) is sample standard deviation, and \(t_{\alpha,d}\) is the t-distribution critical value. This value depends on the confidence_coefficient and the degrees of freedom, which is found by subtracting one from the number of observations.

Sampling Methods

uncorrelated: Systematic sampling based on statistical inefficiency
random: Random sampling
block_averaged: Block-averaged sampling

Usage Examples

Basic Usage Pattern

from kim_convergence.ucl import MSER_m
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=10, scale=2, size=5000)

# Initialize algorithm
mser = MSER_m()

# Estimate equilibration length
truncated, truncation_point = mser.estimate_equilibration_length(data)

# Compute upper confidence limit
ucl_value = mser.ucl(data, confidence_coefficient=0.95)

# Get confidence interval
lower, upper = mser.ci(data, confidence_coefficient=0.95)

# Get relative half-width
relative_width = mser.relative_half_width_estimate(data, confidence_coefficient=0.95)

Comparing Different Methods

import numpy as np
from kim_convergence.ucl import (
    mser_m_ucl, mser_m_y_ucl, n_skart_ucl,
    heidelberger_welch_ucl, uncorrelated_samples_ucl
)

# Generate autocorrelated data
np.random.seed(42)
n = 5000
phi = 0.7
data = np.zeros(n)
data[0] = np.random.normal(loc=10, scale=2)
for i in range(1, n):
    data[i] = phi * data[i-1] + np.random.normal(loc=10*(1-phi), scale=2*np.sqrt(1-phi**2))

# Compare UCL values
results = {
    'MSER-m': mser_m_ucl(data, confidence_coefficient=0.95),
    'MSER-m-y': mser_m_y_ucl(data, confidence_coefficient=0.95),
    'N-SKART': n_skart_ucl(data, confidence_coefficient=0.95),
    'Heidelberger-Welch': heidelberger_welch_ucl(data, confidence_coefficient=0.95),
    'Uncorrelated Samples': uncorrelated_samples_ucl(data, confidence_coefficient=0.95)
}

Handling Skewed and Correlated Data

from kim_convergence.ucl import N_SKART
import numpy as np

# Create skewed, correlated data
np.random.seed(42)
n = 3000
base_data = np.random.gamma(shape=2.0, scale=1.0, size=n)

# Add autocorrelation
correlated_data = np.zeros(n)
correlated_data[0] = base_data[0]
phi = 0.6
for i in range(1, n):
    correlated_data[i] = phi * correlated_data[i-1] + (1-phi) * base_data[i]

# Use N-SKART
nskart = N_SKART()
truncated, eq_point = nskart.estimate_equilibration_length(correlated_data)
ucl = nskart.ucl(correlated_data, confidence_coefficient=0.95)

Algorithm Selection Guide

Choosing the Right Method

Consideration	MSER-m	MSER-m-y	N-SKART	Heidelberger- Welch	Uncorrelated Samples
Min Data Points	> batch_size	10	1280	200	5
Autocorrelation	Moderate	High	High	High	Low
Skewness	Not robust	Not robust	Robust	Not robust	Not robust
Initial Transient	Manual removal	Automatic	Automatic	Manual removal	Manual removal
Speed	Fast	Fast	Moderate	Fast	Fast

Quick Decision Tree

Is your data approximately normal and stationary? - Yes -> Use MSER-m or Uncorrelated Samples - No -> Go to 2
Does your data have significant initial transient (burn-in)? - Yes -> Use MSER-m-y or N-SKART - No -> Go to 3
Is your data highly skewed or heavy-tailed? - Yes -> Use N-SKART - No -> Use Heidelberger-Welch

Performance Considerations

Memory Usage

Method	Memory Usage	Notes
MSER-m	O(N)	Stores original data and batch means
MSER-m-y	O(N)	Similar to MSER-m with additional test arrays
N-SKART	O(N + k)	Stores data and multiple batch statistics
Heidelberger-Welch	O(N + k)	Stores data and periodogram (k=50 default)
Uncorrelated Samples	O(N)	Stores data and uncorrelated sample indices

Computational Complexity

Method	Complexity	Notes
MSER-m	O(N)	Single pass through data
MSER-m-y	O(N log N)	Multiple iterations with increasing batch size
N-SKART	O(N log N)	Iterative batch size adjustment and testing
Heidelberger-Welch	O(N log N)	FFT-based periodogram computation
Uncorrelated Samples	O(N)	Single pass with optional FFT for SI

Best Practices

Data Preparation

Ensure data is 1-dimensional numpy array or list
Remove obvious outliers using kim_convergence.outlier module
Check stationarity visually or with statistical tests
Consider differencing for non-stationary time series

Parameter Tuning

MSER-m/ MSER-m-y: Set batch_size ≈ 5-10 times the expected correlation time
N-SKART: Let algorithm determine parameters automatically
Heidelberger-Welch: Increase heidel_welch_number_points for smoother spectra
Uncorrelated Samples: Provide population_standard_deviation if known

Validation and Diagnostics

Always check the truncated flag from estimate_equilibration_length()
Compare results from 2-3 different methods for consistency
Use relative_half_width_estimate() to assess precision

Error Handling

All functions raise appropriate exceptions:

CRError: For general errors and invalid inputs
CRSampleSizeError: For insufficient sample sizes
Value errors for out-of-range parameters (e.g., confidence coefficient ∉ (0,1))

Theoretical Background

Batch Means Methods (MSER-m, MSER-m-y, N-SKART)

Batch means methods operate on the principle that for a stationary time series, batch means become approximately independent and normally distributed as batch size increases. The variance of the overall mean can be estimated from the variance of batch means:

\[\text{Var}(\bar{X}) \approx \frac{s_b^2}{k}\]

where \(s_b^2\) is the sample variance of batch means and \(k\) is the number of batches.

Spectral Methods (Heidelberger-Welch)

Spectral methods estimate the variance of the mean via the spectral density function. For a covariance stationary process:

\[\text{Var}(\bar{X}_N) = \frac{1}{N} \sum_{k=-(N-1)}^{N-1} \left(1 - \frac{|k|}{N}\right) \gamma(k)\]

where \(\gamma(k)\) is the autocovariance at lag k.

Direct Methods (Uncorrelated Samples)

Direct methods use the classical formula for the variance of the mean, adjusted for effective sample size when data is correlated:

\[\text{Var}(\bar{X}) = \frac{s^2}{N_{\text{eff}}} = \frac{s^2}{N / g}\]

where \(g\) is the statistical inefficiency and \(N_{\text{eff}}\) is the effective sample size.