Other Utilities
Support tools for data preprocessing, error handling, and quality control.
Batch Means
Compute batch means from time series data. The batch function:
Divides data into non-overlapping batches of size batch_size
Applies a reduction function (default: np.mean) to each batch
Supports optional scaling (centering, standardization) after batching
Truncates remainder data points that don’t fit into complete batches
Returns a view (not a copy) when no scaling is requested
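The batching steps listed above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the name batch_means_sketch and its error messages are hypothetical, and scaling is left out.

```python
import numpy as np


def batch_means_sketch(x, batch_size=5, func=np.mean):
    """Reduce consecutive, non-overlapping batches of a 1-D series.

    Hypothetical helper for illustration; not part of kim_convergence.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("x must be one-dimensional")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    n_batches = x.size // batch_size
    if n_batches == 0:
        raise ValueError("not enough data for one complete batch")
    # Drop the trailing points that do not fill a complete batch.
    trimmed = x[: n_batches * batch_size]
    # One row per batch, then reduce along each row.
    return func(trimmed.reshape(n_batches, batch_size), axis=1)


# Ten points with batch_size=3 give three batches; the last point is dropped.
means = batch_means_sketch(np.arange(10.0), batch_size=3)
print(means)
```

Any reduction that accepts an axis argument (np.median, np.max, and so on) can be passed as func in this sketch.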
Scaling Methods
Five scaling methods are provided:
minmax_scale – Scale to a specified feature range (default: [0, 1])
translate_scale – Translate so first element is zero, scale by mean
standard_scale – Remove mean and scale to unit variance
robust_scale – Center to median, scale by interquartile range
maxabs_scale – Scale to [-1, 1] range by maximum absolute value
Each method is available as a function and a class with scale()/inverse() methods.
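To illustrate the class form with scale()/inverse(), here is a rough sketch of robust scaling (center to the median, scale by the interquartile range). The class name RobustScaleSketch, its attributes, and the zero-IQR guard are assumptions made for this example; the library's own classes may differ in signature and behavior.

```python
import numpy as np


class RobustScaleSketch:
    """Center to the median and scale by the interquartile range.

    Hypothetical class for illustration; not part of kim_convergence.
    """

    def scale(self, x):
        x = np.asarray(x, dtype=float)
        self.center_ = np.median(x)
        q1, q3 = np.percentile(x, [25, 75])
        # Assumption: fall back to 1.0 when the IQR is zero (e.g. constant
        # data) so the division below is always defined.
        self.scale_ = (q3 - q1) if q3 > q1 else 1.0
        return (x - self.center_) / self.scale_

    def inverse(self, x_scaled):
        # Undo the transformation using the statistics stored by scale().
        return np.asarray(x_scaled, dtype=float) * self.scale_ + self.center_


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
scaler = RobustScaleSketch()
x_scaled = scaler.scale(x)
x_back = scaler.inverse(x_scaled)
print(x_scaled)
```

Because the median and IQR ignore the extreme value, the four regular points keep a sensible spread after scaling, which is the motivation for preferring a robust scaler when outliers are present.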
Error Handling
Custom exception classes and validation utilities:
CRError – Base exception with caller identification
CRSampleSizeError – Raised for insufficient data samples
cr_warning() – Print warning messages with caller context
cr_check() – Validate variable types and bounds
Decorators: _check_ndim, _check_isfinite for input validation
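The decorator-based validation pattern can be sketched as follows. Everything here is hypothetical and written only to show the idea: SketchError, check_ndim, and check_isfinite are stand-in names, and the library's _check_ndim and _check_isfinite may take different arguments and raise different exceptions.

```python
import functools

import numpy as np


class SketchError(Exception):
    """Hypothetical stand-in for a library-specific base exception."""


def check_ndim(ndim=1):
    """Reject a first argument whose number of dimensions differs from ndim."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(x, *args, **kwargs):
            x = np.asarray(x)
            if x.ndim != ndim:
                # Name the decorated function so the message identifies the caller.
                raise SketchError(
                    f"{func.__name__}: expected {ndim}-D input, got {x.ndim}-D"
                )
            return func(x, *args, **kwargs)
        return wrapper
    return decorator


def check_isfinite(func):
    """Reject a first argument that contains NaN or infinite values."""
    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        x = np.asarray(x, dtype=float)
        if not np.all(np.isfinite(x)):
            raise SketchError(f"{func.__name__}: input contains non-finite values")
        return func(x, *args, **kwargs)
    return wrapper


@check_ndim(1)
@check_isfinite
def total(x):
    """Toy function protected by both validators."""
    return float(np.sum(x))


print(total([1.0, 2.0, 3.0]))
```

Stacking the decorators keeps the validation out of the function body, so each numerical routine states its input requirements in one line.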
Outlier Detection
Seven method names are accepted for identifying outliers (names separated by a slash refer to the same test):
iqr / boxplot – Points beyond 1.5 × IQR from quartiles
extreme_iqr / extreme_boxplot – Points beyond 3 × IQR
z_score / standard_score – \(|Z| > 3\), computed from the mean and standard deviation
modified_z_score – Robust version using median and MAD (\(|Z| > 3.5\))
Returns a 1-D NumPy array of indices or None if no outliers are found.
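The IQR and modified Z-score rules can be sketched in a few lines of NumPy. The function names below are hypothetical, and the 0.6745 consistency constant in the modified Z-score follows the common textbook definition; it is an assumption here, not something confirmed for this library.

```python
import numpy as np


def iqr_outliers_sketch(x, factor=1.5):
    """Indices of points beyond factor * IQR from the quartiles, or None."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    spread = factor * (q3 - q1)
    idx = np.flatnonzero((x < q1 - spread) | (x > q3 + spread))
    return idx if idx.size else None


def modified_z_outliers_sketch(x, threshold=3.5):
    """Indices where the median/MAD-based score exceeds threshold, or None."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0.0:
        # Assumption for this sketch: with zero MAD nothing is flagged.
        return None
    # 0.6745 makes the MAD comparable to the standard deviation for
    # normally distributed data (textbook convention, assumed here).
    score = 0.6745 * (x - median) / mad
    idx = np.flatnonzero(np.abs(score) > threshold)
    return idx if idx.size else None


x = np.array([10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0])
iqr_idx = iqr_outliers_sketch(x)               # 1.5 × IQR rule
extreme_idx = iqr_outliers_sketch(x, factor=3)  # 3 × IQR rule
mod_z_idx = modified_z_outliers_sketch(x)
print(iqr_idx, extreme_idx, mod_z_idx)
```

Both helpers mirror the documented return convention: a 1-D index array when something is flagged, None otherwise.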
Data Splitting
train_test_split randomly partitions indices for training and testing:
Supports absolute counts or fractions for train/test sizes
Validates that splits are feasible given data size
Accepts an optional seed for reproducible splits
Uses NumPy’s random number generator internally
Returns two NumPy index arrays: (train_idx, test_idx)
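A seeded index split of this kind might look like the sketch below. The helper split_indices_sketch is hypothetical, and the use of np.random.default_rng with a rounded fractional size is an assumption; the source only states that NumPy's random number generator is used internally.

```python
import numpy as np


def split_indices_sketch(n_samples, test_size=0.1, seed=None):
    """Return (train_idx, test_idx) as disjoint, shuffled index arrays.

    Hypothetical helper for illustration; not part of kim_convergence.
    """
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("fractional test_size must lie in (0, 1)")
        n_test = int(round(test_size * n_samples))
    else:
        n_test = int(test_size)  # absolute count
    # Feasibility check: both parts must be non-empty.
    if not 0 < n_test < n_samples:
        raise ValueError("split leaves the train or test set empty")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = split_indices_sketch(100, test_size=0.2, seed=42)
print(train_idx.size, test_idx.size)
```

Calling the helper twice with the same seed yields identical index arrays, which is what makes a validation run repeatable.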
Quick Example
import numpy as np
from kim_convergence import batch, standard_scale, outlier_test, train_test_split
data = np.random.randn(1000)
# Batch the data
batched = batch(data, batch_size=10)
# Scale to zero mean, unit variance
scaled = standard_scale(batched)
# Check for outliers
outliers = outlier_test(scaled, outlier_method='iqr')
# Split for validation
train_idx, test_idx = train_test_split(data, test_size=0.2, seed=42)
Usage Hints
Batch means: Use for variance estimation in correlated data
Scaling: Apply robust_scale when outliers are present
Outlier detection: modified_z_score works best for small datasets
Data splitting: Set seed for reproducible cross-validation