Upper Confidence Limit (UCL) Module
Overview
Upper Confidence Limit (UCL): The upper boundary (or limit) of a confidence interval of a parameter of interest such as the population mean.
A confidence interval is how much uncertainty there is with any particular statistic [nistdiv898].
The UCL module provides statistical methods for estimating confidence intervals, upper confidence limits, and uncertainty quantification for time series data. These methods are essential for convergence analysis, simulation output analysis, and Markov chain Monte Carlo (MCMC) diagnostics.
Confidence limits for the mean are interval estimates. Interval estimates are often desirable because instead of a single estimate for the mean, a confidence interval generates a lower and upper limit. It indicates how much uncertainty there is in our estimation of the true mean. The narrower the gap, the more precise our estimate is. We use a confidence level to express confidence limits. Choosing the confidence level is somewhat arbitrary, but 90 %, 95 %, and 99 % intervals are standard, and 95 % is the most commonly used.
- Note:
One should note that a 95 % confidence interval does not mean a 95 % probability of containing the true mean. The interval computed from a sample either has the true mean, or it does not. The confidence level is simply the proportion of samples of a given size that may be expected to contain the true mean. For a 95 % confidence interval, if many samples are collected and the confidence interval computed, in the long run, about 95 % of these intervals would contain the true mean.
Contents
Available Methods
Method |
Best For |
Key Features |
|---|---|---|
MSER-m |
Stationary time series |
Batch means, truncation detection |
MSER-m-y |
Time series with initial transient |
Combines MSER-m with randomness testing |
N-SKART |
Non-normal, correlated data |
Handles skewness and autocorrelation |
Heidelberger-Welch |
Spectral analysis approach |
Spectral density at zero frequency |
Uncorrelated Samples |
Direct statistical estimation |
Simple t-distribution based CI |
Base Class
All UCL methods inherit from UCLBase, which provides a consistent interface
for computing confidence limits.
MSER-m Algorithm
Determine the truncation point using marginal standard error rules.
Determine the truncation point using marginal standard error rules (MSER). The MSER [white1997] and MSER-5 [spratt1998] rules determine the truncation point as the value of \(d\) that best balances the tradeoff between improved accuracy (elimination of bias) and decreased precision (reduction in the sample size) for the input series. They select a truncation point that minimizes the width of the marginal confidence interval about the truncated sample mean. The marginal confidence interval is a measure of the homogeneity of the truncated series. The optimal truncation point \(d(j)^*\) selected by MSER-m can be expressed as:
MSER-m applies the equation to a series of batch averages instead of the raw series.
- param time_series_data:
Time series data.
- type time_series_data:
array_like, 1d
- param batch_size:
batch size. (default: 5)
- type batch_size:
int, optional
- param scale:
A method to standardize a dataset. (default: ‘translate_scale’)
- type scale:
str, optional
- param with_centering:
If True, use time_series_data minus the scale metod centering approach. (default: False)
- type with_centering:
bool, optional
- param with_scaling:
If True, scale the data to scale metod scaling approach. (default: False)
- type with_scaling:
bool, optional
- param ignore_end:
if int, it is the last few batch points that should be ignored. if float, should be in (0, 1) and it is the percent of last batch points that should be ignored. if None it would be set to the \(Min(batch_size, number_batches / 4)\). (default: None)
- type ignore_end:
int, or float, or None, optional
- returns:
- tuple[bool, int]
truncated: True if truncation was applied. truncation_point: Index at which to truncate.
Note
MSER-m sometimes erroneously reports a truncation point at the end of the data series. This is because the method can be overly sensitive to observations at the end of the data series that are close in value. Here, we avoid this artifact, by not allowing the algorithm to consider the standard errors calculated from the last few data points.
Note
If the truncation point returned by MSER-m > n/2, it is considered an invalid value and truncated will return as False. It means the method has not been provided with enough data to produce a valid result, and more data is required.
Note
If the truncation obtained by MSER-m is the last index of the batched data, the MSER-m returns the time series data’s last index as the truncation point. This index can be used as a measure that the algorithm did not find any truncation point.
MSER-m-y Algorithm
MSER_m_y algorithm.
MSER_m_y [yousefi2011] computes k batch means of size m to evaluate the MSER-m statistic as described in [spratt1998] and detect the truncation point. If the truncation is detected, the point estimator of the mean is the sample mean of all observations in the truncated data set.
To compute the UCL, the MSER_m_y applies the von Neumann randomness test [vonneumann1941], [vonneumann1941b] to the truncated data to find a new batch size \(m^*\) for which the new batch means are approximately independent. It checks the randomness test on successively larger batch sizes until the test is finally passed and the batch means are finally determined to be approximately independent of each other. It starts by setting the initial batch size m as 1, and calculate the number of batches k’ accordingly.
- kim_convergence.ucl.mser_m_y.MSER_m_y.significance_level
Significance level. A probability threshold below which the null hypothesis will be rejected.
- Type:
float
Algorithm Flow
Compute k batch means of size m
Apply MSER-m to detect truncation point
Apply randomness test to find new batch size \(m^*\)
Iteratively increase batch size until batch means are independent
Key Features
Automatic detection of independent batch size
Significance level (probability threshold below which the null hypothesis will be rejected) of 0.2 for randomness test
Minimum data requirement: 10 points
N-SKART Algorithm
N-Skart algorithm.
N-Skart [tafazzoli2011] is a nonsequential procedure designed to compute a half the width of the confidence_coefficient% probability interval (CI) (confidence interval, or credible interval) around the time-series mean.
Note
N-Skart is a variant of the method of batch means.
N-Skart makes some modifications to the confidence interval (CI). These modifications account for the skewness (non-normality), and autocorrelation of the batch means which affect the distribution of the underlying Student’s t-statistic.
- kim_convergence.ucl.n_skart.N_SKART.k_number_batches
number of nonspaced (adjacent) batches of size
batch_size.- Type:
int
- kim_convergence.ucl.n_skart.N_SKART.kp_number_batches
number of nonspaced (adjacent) batches.
- Type:
int
- kim_convergence.ucl.n_skart.N_SKART.batch_size
bacth size.
- Type:
int
- kim_convergence.ucl.n_skart.N_SKART.number_batches_per_spacer
number of batches per spacer.
- Type:
int
- kim_convergence.ucl.n_skart.N_SKART.maximum_number_batches_per_spacer
maximum number of batches per spacer.
- Type:
int
- kim_convergence.ucl.n_skart.N_SKART.significance_level
Significance level. A probability threshold below which the null hypothesis will be rejected.
- Type:
float
- kim_convergence.ucl.n_skart.N_SKART.randomness_test_counter
counter for applying the randomness test of von Neumann [vonneumann1941] [vonneumann1941b].
- Type:
int
Algorithm Highlights
Adjusts for skewness using sample skewness of last 80% of data
Uses von Neumann randomness test with spacer batches
Includes correlation and skewness adjustments in UCL calculation
Heidelberger-Welch Algorithm
Approximate the upper confidence limit of the mean using spectral methods.
The Heidelberger-Welch method [heidelberger1981], [heidelberger1983] provides an approximately unbiased estimate of the upper confidence limit (half-width) of a confidence_coefficient% confidence interval for the time-series mean.
Mathematical Basis
For a covariance stationary time series, the variance of the sample mean can be expressed in terms of the spectral density at zero frequency:
where \(S(0)\) is the spectral density at zero frequency and \(N\) is the sample size.
The method estimates \(S(0)\) from the modified periodogram using polynomial regression and applies bias correction factors \(C_1\) and \(C_2\) from Table 1 of Heidelberger & Welch (1981) [heidelberger1981].
Algorithm Description
Batching: The time series is divided into \(n = 4k\) batches, where \(k\) =
heidel_welch_number_pointsPeriodogram: Compute the modified periodogram of batch means
Polynomial Fit: Fit polynomials of degrees 1-3 to the log periodogram
Model Selection: Choose the best-fitting polynomial degree based on test error (if test/train split is used) or residual norm
Variance Estimation: Estimate \(S(0)\) from the intercept of the best-fitting polynomial
UCL Calculation: Compute the upper confidence limit as:
\[UCL = t_{C_2}(\alpha) \cdot \sqrt{\frac{C_1 \cdot \widehat{S(0)}}{n}}\]where:
\(t_{C_2}(\alpha)\) is the t-distribution critical value with \(C_2\) degrees of freedom at confidence level \(\alpha\)
\(C_1, C_2\) are bias correction constants from Heidelberger & Welch
\(\widehat{S(0)}\) is the estimated spectral density at zero frequency
\(n\) is the number of batches
Interpretation
The resulting confidence interval \([\hat{\mu} - UCL, \hat{\mu} + UCL]\) provides an approximate confidence_coefficient% confidence interval for the true population mean. As \(N \to \infty\), the UCL converges to 0 and the sample mean converges to the population mean.
Note
A 95% confidence interval means that if the experiment were repeated many times, approximately 95% of the computed intervals would contain the true population mean. It does not mean there is a 95% probability that the specific computed interval contains the true mean.
Key Parameters
heidel_welch_number_points(k): Controls the frequency resolution and number of points used for polynomial fitting. Larger values provide smoother spectral estimates but require more data.confidence_coefficient: The desired coverage probability of the confidence interval (e.g., 0.95 for 95% confidence).fft: Use FFT for periodogram computation (recommended for N > 30).
Performance Characteristics
Time Complexity: \(O(N \log N)\) due to FFT-based periodogram
Space Complexity: \(O(N + k)\) for data and periodogram storage
Minimum Data: Requires at least \(4k\) data points (default: 200 points with k=50)
Example
from kim_convergence.ucl import HeidelbergerWelch
import numpy as np
# Generate sample data
data = np.random.normal(loc=10, scale=2, size=5000)
# Initialize algorithm
hw = HeidelbergerWelch()
# Compute UCL
ucl = hw.ucl(
data,
confidence_coefficient=0.95,
heidel_welch_number_points=50,
fft=True
)
print(f"95% Upper Confidence Limit: {ucl:.4f}")
- param time_series_data:
time series data.
- type time_series_data:
array_like, 1d
- param confidence_coefficient:
probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)
- type confidence_coefficient:
float, optional
- param heidel_welch_number_points:
the number of points that are used to obtain the polynomial fit. The parameter
heidel_welch_number_pointsdetermines the frequency range over which the fit is made. (default: 50)- type heidel_welch_number_points:
int, optional
- param fft:
Use FFT convolution for long series. (default: True)
- type fft:
bool, optional
- param test_size:
if
float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. Ifint, represents the absolute number of test samples. (default: None)- type test_size:
int, float, optional
- param train_size:
if
float, should be between 0.0 and 1.0 and represent the proportion of the preiodogram dataset to include in the train split. Ifint, represents the absolute number of train samples. (default: None)- type train_size:
int, float, optional
- returns:
- float
Upper-confidence limit (approximately unbiased estimate of variance of the sample mean) based on the fitted polynomial degree.
- raises CRError:
If inputs are invalid or computation fails
- raises CRSampleSizeError:
If insufficient data points
Note
If both
test_sizeandtrain_sizeare None, no train-test split is performed (uses entire periodogram for fitting).In the ucl method, it is mandatory to compute and set the
meanandstdproperty.Constants \(C_1, C_2\) are pre-computed based on
heidel_welch_number_pointsandconfidence_coefficient.
Note
equilibration_length_estimate, population_standard_deviation, si minimum_correlation_time, uncorrelated_sample_indices, and sample_method are accepted for API compatibility but are not used by this method.
Key Features
Adaptive polynomial fitting (degrees 1-3) to periodogram
Uses modified periodogram for spectral estimation
Optimal for stationary time series with unknown correlation structure
Usage Examples
Basic Usage Pattern
from kim_convergence.ucl import MSER_m
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=10, scale=2, size=5000)
# Initialize algorithm
mser = MSER_m()
# Estimate equilibration length
truncated, truncation_point = mser.estimate_equilibration_length(data)
# Compute upper confidence limit
ucl_value = mser.ucl(data, confidence_coefficient=0.95)
# Get confidence interval
lower, upper = mser.ci(data, confidence_coefficient=0.95)
# Get relative half-width
relative_width = mser.relative_half_width_estimate(data, confidence_coefficient=0.95)
Comparing Different Methods
import numpy as np
from kim_convergence.ucl import (
mser_m_ucl, mser_m_y_ucl, n_skart_ucl,
heidelberger_welch_ucl, uncorrelated_samples_ucl
)
# Generate autocorrelated data
np.random.seed(42)
n = 5000
phi = 0.7
data = np.zeros(n)
data[0] = np.random.normal(loc=10, scale=2)
for i in range(1, n):
data[i] = phi * data[i-1] + np.random.normal(loc=10*(1-phi), scale=2*np.sqrt(1-phi**2))
# Compare UCL values
results = {
'MSER-m': mser_m_ucl(data, confidence_coefficient=0.95),
'MSER-m-y': mser_m_y_ucl(data, confidence_coefficient=0.95),
'N-SKART': n_skart_ucl(data, confidence_coefficient=0.95),
'Heidelberger-Welch': heidelberger_welch_ucl(data, confidence_coefficient=0.95),
'Uncorrelated Samples': uncorrelated_samples_ucl(data, confidence_coefficient=0.95)
}
Algorithm Selection Guide
Choosing the Right Method
Consideration |
MSER-m |
MSER-m-y |
N-SKART |
Heidelberger- Welch |
Uncorrelated Samples |
|---|---|---|---|---|---|
Min Data Points |
> batch_size |
10 |
1280 |
200 |
5 |
Autocorrelation |
Moderate |
High |
High |
High |
Low |
Skewness |
Not robust |
Not robust |
Robust |
Not robust |
Not robust |
Initial Transient |
Manual removal |
Automatic |
Automatic |
Manual removal |
Manual removal |
Speed |
Fast |
Fast |
Moderate |
Fast |
Fast |
Quick Decision Tree
Is your data approximately normal and stationary? - Yes -> Use MSER-m or Uncorrelated Samples - No -> Go to 2
Does your data have significant initial transient (burn-in)? - Yes -> Use MSER-m-y or N-SKART - No -> Go to 3
Is your data highly skewed or heavy-tailed? - Yes -> Use N-SKART - No -> Use Heidelberger-Welch
Performance Considerations
Memory Usage
Method |
Memory Usage |
Notes |
|---|---|---|
MSER-m |
O(N) |
Stores original data and batch means |
MSER-m-y |
O(N) |
Similar to MSER-m with additional test arrays |
N-SKART |
O(N + k) |
Stores data and multiple batch statistics |
Heidelberger-Welch |
O(N + k) |
Stores data and periodogram (k=50 default) |
Uncorrelated Samples |
O(N) |
Stores data and uncorrelated sample indices |
Computational Complexity
Method |
Complexity |
Notes |
|---|---|---|
MSER-m |
O(N) |
Single pass through data |
MSER-m-y |
O(N log N) |
Multiple iterations with increasing batch size |
N-SKART |
O(N log N) |
Iterative batch size adjustment and testing |
Heidelberger-Welch |
O(N log N) |
FFT-based periodogram computation |
Uncorrelated Samples |
O(N) |
Single pass with optional FFT for SI |
Best Practices
Data Preparation
Ensure data is 1-dimensional numpy array or list
Remove obvious outliers using
kim_convergence.outliermoduleCheck stationarity visually or with statistical tests
Consider differencing for non-stationary time series
Parameter Tuning
MSER-m/ MSER-m-y: Set
batch_size≈ 5-10 times the expected correlation timeN-SKART: Let algorithm determine parameters automatically
Heidelberger-Welch: Increase
heidel_welch_number_pointsfor smoother spectraUncorrelated Samples: Provide
population_standard_deviationif known
Validation and Diagnostics
Always check the
truncatedflag fromestimate_equilibration_length()Compare results from 2-3 different methods for consistency
Use
relative_half_width_estimate()to assess precision
Error Handling
All functions raise appropriate exceptions:
CRError: For general errors and invalid inputsCRSampleSizeError: For insufficient sample sizesValue errors for out-of-range parameters (e.g., confidence coefficient ∉ (0,1))
Theoretical Background
Batch Means Methods (MSER-m, MSER-m-y, N-SKART)
Batch means methods operate on the principle that for a stationary time series, batch means become approximately independent and normally distributed as batch size increases. The variance of the overall mean can be estimated from the variance of batch means:
where \(s_b^2\) is the sample variance of batch means and \(k\) is the number of batches.
Spectral Methods (Heidelberger-Welch)
Spectral methods estimate the variance of the mean via the spectral density function. For a covariance stationary process:
where \(\gamma(k)\) is the autocovariance at lag k.