Upper Confidence Limit (UCL) Module

Overview

Upper Confidence Limit (UCL): The upper boundary (or limit) of a confidence interval of a parameter of interest such as the population mean.

A confidence interval is how much uncertainty there is with any particular statistic [nistdiv898].

The UCL module provides statistical methods for estimating confidence intervals, upper confidence limits, and uncertainty quantification for time series data. These methods are essential for convergence analysis, simulation output analysis, and Markov chain Monte Carlo (MCMC) diagnostics.

Confidence limits for the mean are interval estimates. Interval estimates are often desirable because instead of a single estimate for the mean, a confidence interval generates a lower and upper limit. It indicates how much uncertainty there is in our estimation of the true mean. The narrower the gap, the more precise our estimate is. We use a confidence level to express confidence limits. Choosing the confidence level is somewhat arbitrary, but 90 %, 95 %, and 99 % intervals are standard, and 95 % is the most commonly used.

Note:

One should note that a 95 % confidence interval does not mean a 95 % probability of containing the true mean. The interval computed from a sample either has the true mean, or it does not. The confidence level is simply the proportion of samples of a given size that may be expected to contain the true mean. For a 95 % confidence interval, if many samples are collected and the confidence interval computed, in the long run, about 95 % of these intervals would contain the true mean.

Contents

  1. Available Methods

  2. Base Class

  3. MSER-m Algorithm

  4. MSER-m-y Algorithm

  5. N-SKART Algorithm

  6. Heidelberger-Welch Algorithm

  7. Uncorrelated Samples Method

  8. Usage Examples

  9. Algorithm Selection Guide

  10. Performance Considerations

  11. Theoretical Background

Available Methods

Method

Best For

Key Features

MSER-m

Stationary time series

Batch means, truncation detection

MSER-m-y

Time series with initial transient

Combines MSER-m with randomness testing

N-SKART

Non-normal, correlated data

Handles skewness and autocorrelation

Heidelberger-Welch

Spectral analysis approach

Spectral density at zero frequency

Uncorrelated Samples

Direct statistical estimation

Simple t-distribution based CI

Base Class

All UCL methods inherit from UCLBase, which provides a consistent interface for computing confidence limits.

MSER-m Algorithm

Determine the truncation point using marginal standard error rules.

Determine the truncation point using marginal standard error rules (MSER). The MSER [white1997] and MSER-5 [spratt1998] rules determine the truncation point as the value of \(d\) that best balances the tradeoff between improved accuracy (elimination of bias) and decreased precision (reduction in the sample size) for the input series. They select a truncation point that minimizes the width of the marginal confidence interval about the truncated sample mean. The marginal confidence interval is a measure of the homogeneity of the truncated series. The optimal truncation point \(d(j)^*\) selected by MSER-m can be expressed as:

\[d(j)^* = \underset{n>d(j) \geq 0}{\text{argmin}} \left[ \frac{1}{(n(j)-d(j))^2} \sum_{i=d}^{n}{\left(X_i(j)- \bar{X}_{n,d}(j) \right )^2} \right]\]

MSER-m applies the equation to a series of batch averages instead of the raw series.

param time_series_data:

Time series data.

type time_series_data:

array_like, 1d

param batch_size:

batch size. (default: 5)

type batch_size:

int, optional

param scale:

A method to standardize a dataset. (default: ‘translate_scale’)

type scale:

str, optional

param with_centering:

If True, use time_series_data minus the scale metod centering approach. (default: False)

type with_centering:

bool, optional

param with_scaling:

If True, scale the data to scale metod scaling approach. (default: False)

type with_scaling:

bool, optional

param ignore_end:

if int, it is the last few batch points that should be ignored. if float, should be in (0, 1) and it is the percent of last batch points that should be ignored. if None it would be set to the \(Min(batch_size, number_batches / 4)\). (default: None)

type ignore_end:

int, or float, or None, optional

returns:
tuple[bool, int]

truncated: True if truncation was applied. truncation_point: Index at which to truncate.

Note

MSER-m sometimes erroneously reports a truncation point at the end of the data series. This is because the method can be overly sensitive to observations at the end of the data series that are close in value. Here, we avoid this artifact, by not allowing the algorithm to consider the standard errors calculated from the last few data points.

Note

If the truncation point returned by MSER-m > n/2, it is considered an invalid value and truncated will return as False. It means the method has not been provided with enough data to produce a valid result, and more data is required.

Note

If the truncation obtained by MSER-m is the last index of the batched data, the MSER-m returns the time series data’s last index as the truncation point. This index can be used as a measure that the algorithm did not find any truncation point.

MSER-m-y Algorithm

MSER_m_y algorithm.

MSER_m_y [yousefi2011] computes k batch means of size m to evaluate the MSER-m statistic as described in [spratt1998] and detect the truncation point. If the truncation is detected, the point estimator of the mean is the sample mean of all observations in the truncated data set.

To compute the UCL, the MSER_m_y applies the von Neumann randomness test [vonneumann1941], [vonneumann1941b] to the truncated data to find a new batch size \(m^*\) for which the new batch means are approximately independent. It checks the randomness test on successively larger batch sizes until the test is finally passed and the batch means are finally determined to be approximately independent of each other. It starts by setting the initial batch size m as 1, and calculate the number of batches k’ accordingly.

kim_convergence.ucl.mser_m_y.MSER_m_y.significance_level

Significance level. A probability threshold below which the null hypothesis will be rejected.

Type:

float

Algorithm Flow

  1. Compute k batch means of size m

  2. Apply MSER-m to detect truncation point

  3. Apply randomness test to find new batch size \(m^*\)

  4. Iteratively increase batch size until batch means are independent

Key Features

  • Automatic detection of independent batch size

  • Significance level (probability threshold below which the null hypothesis will be rejected) of 0.2 for randomness test

  • Minimum data requirement: 10 points

N-SKART Algorithm

N-Skart algorithm.

N-Skart [tafazzoli2011] is a nonsequential procedure designed to compute a half the width of the confidence_coefficient% probability interval (CI) (confidence interval, or credible interval) around the time-series mean.

Note

N-Skart is a variant of the method of batch means.

N-Skart makes some modifications to the confidence interval (CI). These modifications account for the skewness (non-normality), and autocorrelation of the batch means which affect the distribution of the underlying Student’s t-statistic.

kim_convergence.ucl.n_skart.N_SKART.k_number_batches

number of nonspaced (adjacent) batches of size batch_size.

Type:

int

kim_convergence.ucl.n_skart.N_SKART.kp_number_batches

number of nonspaced (adjacent) batches.

Type:

int

kim_convergence.ucl.n_skart.N_SKART.batch_size

bacth size.

Type:

int

kim_convergence.ucl.n_skart.N_SKART.number_batches_per_spacer

number of batches per spacer.

Type:

int

kim_convergence.ucl.n_skart.N_SKART.maximum_number_batches_per_spacer

maximum number of batches per spacer.

Type:

int

kim_convergence.ucl.n_skart.N_SKART.significance_level

Significance level. A probability threshold below which the null hypothesis will be rejected.

Type:

float

kim_convergence.ucl.n_skart.N_SKART.randomness_test_counter

counter for applying the randomness test of von Neumann [vonneumann1941] [vonneumann1941b].

Type:

int

Algorithm Highlights

  • Adjusts for skewness using sample skewness of last 80% of data

  • Uses von Neumann randomness test with spacer batches

  • Includes correlation and skewness adjustments in UCL calculation

Heidelberger-Welch Algorithm

Approximate the upper confidence limit of the mean using spectral methods.

The Heidelberger-Welch method [heidelberger1981], [heidelberger1983] provides an approximately unbiased estimate of the upper confidence limit (half-width) of a confidence_coefficient% confidence interval for the time-series mean.

Mathematical Basis

For a covariance stationary time series, the variance of the sample mean can be expressed in terms of the spectral density at zero frequency:

\[\text{Var}(\bar{X}_N) = \frac{2\pi S(0)}{N} + O\left(\frac{1}{N^2}\right)\]

where \(S(0)\) is the spectral density at zero frequency and \(N\) is the sample size.

The method estimates \(S(0)\) from the modified periodogram using polynomial regression and applies bias correction factors \(C_1\) and \(C_2\) from Table 1 of Heidelberger & Welch (1981) [heidelberger1981].

Algorithm Description

  1. Batching: The time series is divided into \(n = 4k\) batches, where \(k\) = heidel_welch_number_points

  2. Periodogram: Compute the modified periodogram of batch means

  3. Polynomial Fit: Fit polynomials of degrees 1-3 to the log periodogram

  4. Model Selection: Choose the best-fitting polynomial degree based on test error (if test/train split is used) or residual norm

  5. Variance Estimation: Estimate \(S(0)\) from the intercept of the best-fitting polynomial

  6. UCL Calculation: Compute the upper confidence limit as:

    \[UCL = t_{C_2}(\alpha) \cdot \sqrt{\frac{C_1 \cdot \widehat{S(0)}}{n}}\]

    where:

    • \(t_{C_2}(\alpha)\) is the t-distribution critical value with \(C_2\) degrees of freedom at confidence level \(\alpha\)

    • \(C_1, C_2\) are bias correction constants from Heidelberger & Welch

    • \(\widehat{S(0)}\) is the estimated spectral density at zero frequency

    • \(n\) is the number of batches

Interpretation

The resulting confidence interval \([\hat{\mu} - UCL, \hat{\mu} + UCL]\) provides an approximate confidence_coefficient% confidence interval for the true population mean. As \(N \to \infty\), the UCL converges to 0 and the sample mean converges to the population mean.

Note

A 95% confidence interval means that if the experiment were repeated many times, approximately 95% of the computed intervals would contain the true population mean. It does not mean there is a 95% probability that the specific computed interval contains the true mean.

Key Parameters

  • heidel_welch_number_points (k): Controls the frequency resolution and number of points used for polynomial fitting. Larger values provide smoother spectral estimates but require more data.

  • confidence_coefficient: The desired coverage probability of the confidence interval (e.g., 0.95 for 95% confidence).

  • fft: Use FFT for periodogram computation (recommended for N > 30).

Performance Characteristics

  • Time Complexity: \(O(N \log N)\) due to FFT-based periodogram

  • Space Complexity: \(O(N + k)\) for data and periodogram storage

  • Minimum Data: Requires at least \(4k\) data points (default: 200 points with k=50)

Example

from kim_convergence.ucl import HeidelbergerWelch
import numpy as np

# Generate sample data
data = np.random.normal(loc=10, scale=2, size=5000)

# Initialize algorithm
hw = HeidelbergerWelch()

# Compute UCL
ucl = hw.ucl(
    data,
    confidence_coefficient=0.95,
    heidel_welch_number_points=50,
    fft=True
)

print(f"95% Upper Confidence Limit: {ucl:.4f}")
param time_series_data:

time series data.

type time_series_data:

array_like, 1d

param confidence_coefficient:

probability (or confidence interval) and must be between 0.0 and 1.0, and represents the confidence for calculation of relative halfwidths estimation. (default: 0.95)

type confidence_coefficient:

float, optional

param heidel_welch_number_points:

the number of points that are used to obtain the polynomial fit. The parameter heidel_welch_number_points determines the frequency range over which the fit is made. (default: 50)

type heidel_welch_number_points:

int, optional

param fft:

Use FFT convolution for long series. (default: True)

type fft:

bool, optional

param test_size:

if float, should be between 0.0 and 1.0 and represent the proportion of the periodogram dataset to include in the test split. If int, represents the absolute number of test samples. (default: None)

type test_size:

int, float, optional

param train_size:

if float, should be between 0.0 and 1.0 and represent the proportion of the preiodogram dataset to include in the train split. If int, represents the absolute number of train samples. (default: None)

type train_size:

int, float, optional

returns:
float

Upper-confidence limit (approximately unbiased estimate of variance of the sample mean) based on the fitted polynomial degree.

raises CRError:

If inputs are invalid or computation fails

raises CRSampleSizeError:

If insufficient data points

Note

  • If both test_size and train_size are None, no train-test split is performed (uses entire periodogram for fitting).

  • In the ucl method, it is mandatory to compute and set the mean and std property.

  • Constants \(C_1, C_2\) are pre-computed based on heidel_welch_number_points and confidence_coefficient.

Note

equilibration_length_estimate, population_standard_deviation, si minimum_correlation_time, uncorrelated_sample_indices, and sample_method are accepted for API compatibility but are not used by this method.

Key Features

  • Adaptive polynomial fitting (degrees 1-3) to periodogram

  • Uses modified periodogram for spectral estimation

  • Optimal for stationary time series with unknown correlation structure

Uncorrelated Samples Method

Direct method that computes confidence intervals using uncorrelated subsamples of the time series, with optional population standard deviation.

Two Modes of Operation

  1. Population standard deviation known:

    \[UCL = t_{\alpha,d} \left(\frac{\sigma}{\sqrt{n}}\right)\]
  2. Population standard deviation unknown:

    \[UCL = t_{\alpha,d} \left(\frac{s}{\sqrt{n}}\right)\]

where \(\sigma\) is population standard deviation, \(s\) is sample standard deviation, and \(t_{\alpha,d}\) is the t-distribution critical value. This value depends on the confidence_coefficient and the degrees of freedom, which is found by subtracting one from the number of observations.

Sampling Methods

  • uncorrelated: Systematic sampling based on statistical inefficiency

  • random: Random sampling

  • block_averaged: Block-averaged sampling

Usage Examples

Basic Usage Pattern

from kim_convergence.ucl import MSER_m
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=10, scale=2, size=5000)

# Initialize algorithm
mser = MSER_m()

# Estimate equilibration length
truncated, truncation_point = mser.estimate_equilibration_length(data)

# Compute upper confidence limit
ucl_value = mser.ucl(data, confidence_coefficient=0.95)

# Get confidence interval
lower, upper = mser.ci(data, confidence_coefficient=0.95)

# Get relative half-width
relative_width = mser.relative_half_width_estimate(data, confidence_coefficient=0.95)

Comparing Different Methods

import numpy as np
from kim_convergence.ucl import (
    mser_m_ucl, mser_m_y_ucl, n_skart_ucl,
    heidelberger_welch_ucl, uncorrelated_samples_ucl
)

# Generate autocorrelated data
np.random.seed(42)
n = 5000
phi = 0.7
data = np.zeros(n)
data[0] = np.random.normal(loc=10, scale=2)
for i in range(1, n):
    data[i] = phi * data[i-1] + np.random.normal(loc=10*(1-phi), scale=2*np.sqrt(1-phi**2))

# Compare UCL values
results = {
    'MSER-m': mser_m_ucl(data, confidence_coefficient=0.95),
    'MSER-m-y': mser_m_y_ucl(data, confidence_coefficient=0.95),
    'N-SKART': n_skart_ucl(data, confidence_coefficient=0.95),
    'Heidelberger-Welch': heidelberger_welch_ucl(data, confidence_coefficient=0.95),
    'Uncorrelated Samples': uncorrelated_samples_ucl(data, confidence_coefficient=0.95)
}

Handling Skewed and Correlated Data

from kim_convergence.ucl import N_SKART
import numpy as np

# Create skewed, correlated data
np.random.seed(42)
n = 3000
base_data = np.random.gamma(shape=2.0, scale=1.0, size=n)

# Add autocorrelation
correlated_data = np.zeros(n)
correlated_data[0] = base_data[0]
phi = 0.6
for i in range(1, n):
    correlated_data[i] = phi * correlated_data[i-1] + (1-phi) * base_data[i]

# Use N-SKART
nskart = N_SKART()
truncated, eq_point = nskart.estimate_equilibration_length(correlated_data)
ucl = nskart.ucl(correlated_data, confidence_coefficient=0.95)

Algorithm Selection Guide

Choosing the Right Method

Consideration

MSER-m

MSER-m-y

N-SKART

Heidelberger- Welch

Uncorrelated Samples

Min Data Points

> batch_size

10

1280

200

5

Autocorrelation

Moderate

High

High

High

Low

Skewness

Not robust

Not robust

Robust

Not robust

Not robust

Initial Transient

Manual removal

Automatic

Automatic

Manual removal

Manual removal

Speed

Fast

Fast

Moderate

Fast

Fast

Quick Decision Tree

  1. Is your data approximately normal and stationary? - Yes -> Use MSER-m or Uncorrelated Samples - No -> Go to 2

  2. Does your data have significant initial transient (burn-in)? - Yes -> Use MSER-m-y or N-SKART - No -> Go to 3

  3. Is your data highly skewed or heavy-tailed? - Yes -> Use N-SKART - No -> Use Heidelberger-Welch

Performance Considerations

Memory Usage

Method

Memory Usage

Notes

MSER-m

O(N)

Stores original data and batch means

MSER-m-y

O(N)

Similar to MSER-m with additional test arrays

N-SKART

O(N + k)

Stores data and multiple batch statistics

Heidelberger-Welch

O(N + k)

Stores data and periodogram (k=50 default)

Uncorrelated Samples

O(N)

Stores data and uncorrelated sample indices

Computational Complexity

Method

Complexity

Notes

MSER-m

O(N)

Single pass through data

MSER-m-y

O(N log N)

Multiple iterations with increasing batch size

N-SKART

O(N log N)

Iterative batch size adjustment and testing

Heidelberger-Welch

O(N log N)

FFT-based periodogram computation

Uncorrelated Samples

O(N)

Single pass with optional FFT for SI

Best Practices

Data Preparation

  1. Ensure data is 1-dimensional numpy array or list

  2. Remove obvious outliers using kim_convergence.outlier module

  3. Check stationarity visually or with statistical tests

  4. Consider differencing for non-stationary time series

Parameter Tuning

  1. MSER-m/ MSER-m-y: Set batch_size ≈ 5-10 times the expected correlation time

  2. N-SKART: Let algorithm determine parameters automatically

  3. Heidelberger-Welch: Increase heidel_welch_number_points for smoother spectra

  4. Uncorrelated Samples: Provide population_standard_deviation if known

Validation and Diagnostics

  1. Always check the truncated flag from estimate_equilibration_length()

  2. Compare results from 2-3 different methods for consistency

  3. Use relative_half_width_estimate() to assess precision

Error Handling

All functions raise appropriate exceptions:

  • CRError: For general errors and invalid inputs

  • CRSampleSizeError: For insufficient sample sizes

  • Value errors for out-of-range parameters (e.g., confidence coefficient ∉ (0,1))

Theoretical Background

Batch Means Methods (MSER-m, MSER-m-y, N-SKART)

Batch means methods operate on the principle that for a stationary time series, batch means become approximately independent and normally distributed as batch size increases. The variance of the overall mean can be estimated from the variance of batch means:

\[\text{Var}(\bar{X}) \approx \frac{s_b^2}{k}\]

where \(s_b^2\) is the sample variance of batch means and \(k\) is the number of batches.

Spectral Methods (Heidelberger-Welch)

Spectral methods estimate the variance of the mean via the spectral density function. For a covariance stationary process:

\[\text{Var}(\bar{X}_N) = \frac{1}{N} \sum_{k=-(N-1)}^{N-1} \left(1 - \frac{|k|}{N}\right) \gamma(k)\]

where \(\gamma(k)\) is the autocovariance at lag k.

Direct Methods (Uncorrelated Samples)

Direct methods use the classical formula for the variance of the mean, adjusted for effective sample size when data is correlated:

\[\text{Var}(\bar{X}) = \frac{s^2}{N_{\text{eff}}} = \frac{s^2}{N / g}\]

where \(g\) is the statistical inefficiency and \(N_{\text{eff}}\) is the effective sample size.