Utilities
Utility functions and data structures.
Configuration
load_builders_from_config(config_path: Optional[Union[str, Path]] = None, config_dict: Optional[Dict[str, Any]] = None, config_str: Optional[str] = None, path_key: Optional[str] = None, dataset_names: Optional[List[str]] = None) -> Dict[str, xaitimesynth.TimeSeriesBuilder]
Loads and creates TimeSeriesBuilder instances from various configuration sources.
This function can load configurations from a dictionary, a YAML file path,
or a string containing YAML content. Exactly one of config_path,
config_dict, or config_str must be provided.
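The "exactly one source" rule can be pictured with a small stand-alone check. This is only a sketch of the idea, not the library's code: the helper name `check_exactly_one_source` and the error message are hypothetical.

```python
from typing import Any, Optional


def check_exactly_one_source(
    config_path: Optional[Any] = None,
    config_dict: Optional[dict] = None,
    config_str: Optional[str] = None,
) -> str:
    """Return the name of the single provided source, or raise ValueError.

    Hypothetical helper; the real function may validate differently.
    """
    sources = {
        "config_path": config_path,
        "config_dict": config_dict,
        "config_str": config_str,
    }
    # A source counts as "provided" when it is not None (an empty dict still counts).
    provided = [name for name, value in sources.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"Expected exactly one configuration source, got {len(provided)}: {provided}"
        )
    return provided[0]


chosen = check_exactly_one_source(config_dict={"my_dataset": {}})
```

Passing none of the three arguments, or more than one, raises `ValueError` in this sketch, matching the behaviour described above.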
Args:
- config_path (Optional[Union[str, Path]]): Path to a YAML configuration file.
- config_dict (Optional[Dict[str, Any]]): A dictionary containing the configuration.
- config_str (Optional[str]): A string containing YAML configuration.
- path_key (Optional[str]): A key (or path using '/' as separator) within the configuration dictionary where the dataset definitions are located. If None, assumes the top-level dictionary contains the dataset definitions. Example: "experiments/datasets". Default is None.
- dataset_names (Optional[List[str]]): A list of specific dataset names to load. If None, all datasets found at the specified location are loaded. Default is None.

Returns:
- Dict[str, TimeSeriesBuilder]: A dictionary where keys are the dataset names and values are the configured TimeSeriesBuilder instances.

Raises:
- ValueError: If not exactly one configuration source is provided, if the configuration source is invalid, the path_key does not lead to a dictionary, or required keys are missing.
- FileNotFoundError: If config_path is provided and the file does not exist.
- yaml.YAMLError: If config_str or the file at config_path contains invalid YAML.
- AttributeError: If a specified component function name does not exist in the xaitimesynth package.
Detailed Configuration Structure:
The configuration (whether from file, string, or dict) must ultimately resolve
to a Python dictionary. This dictionary contains dataset definitions, either at
the top level or nested under the path_key.
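How a '/'-separated path_key might be resolved against the parsed dictionary is sketched below. The function name `resolve_path_key` is hypothetical, and raising `ValueError` for a missing segment is an assumption. The source only states that a path_key that does not lead to a dictionary raises `ValueError`.

```python
from typing import Any, Dict, Optional


def resolve_path_key(config: Dict[str, Any], path_key: Optional[str]) -> Dict[str, Any]:
    """Walk a '/'-separated path into a nested dict (illustrative sketch only)."""
    if path_key is None:
        # No path given: dataset definitions are expected at the top level.
        return config
    node: Any = config
    for part in path_key.split("/"):
        if not isinstance(node, dict) or part not in node:
            # Assumption: a missing segment is reported as ValueError.
            raise ValueError(f"path_key {path_key!r} not found at segment {part!r}")
        node = node[part]
    if not isinstance(node, dict):
        raise ValueError(f"path_key {path_key!r} does not lead to a dictionary")
    return node


config = {"experiments": {"datasets": {"dataset_nested": {"n_timesteps": 80}}}}
datasets = resolve_path_key(config, "experiments/datasets")
print(sorted(datasets))  # ['dataset_nested']
```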
Each dataset definition (the value associated with a dataset name key) is a
dictionary specifying the parameters for a `TimeSeriesBuilder` and its components.
Key elements include:
- Builder arguments: `n_timesteps`, `n_samples`, `n_dimensions`, `random_state`, etc.
- `classes` (list, mandatory): A list of dictionaries, each defining a class.
    - `id` (mandatory): The class label.
    - `weight` (float, optional): Sampling weight for the class.
    - `signals` (list, optional): List of signal component dictionaries.
        - `function` (str, mandatory): Name of a signal generator function (e.g., "random_walk").
        - `params` (dict, optional): Parameters for the generator function.
        - `dimensions` (list, optional): Dimensions to apply to.
        - `shared_randomness` (bool, optional).
        - Location keys (optional): `start_pct`, `end_pct`, `length_pct` (float only), `random_location`, `shared_location`. Note: `length_pct` for signals only accepts a scalar float; stochastic forms (tuple/list/range) are not supported.
    - `features` (list, optional): List of feature component dictionaries.
        - `function` (str, mandatory): Name of a feature generator function (e.g., "peak").
        - `params` (dict, optional): Parameters for the generator function.
        - Location keys (optional): `start_pct`, `end_pct`, `length_pct`, `random_location`, `shared_location`. `length_pct` accepts a scalar float, a list of floats (discrete choices), or `{range: [min, max]}` for uniform per-sample sampling.
        - `dimensions` (list, optional): Dimensions to apply to.
        - `shared_randomness` (bool, optional).
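The three `length_pct` forms accepted for features (scalar, list of discrete choices, `{range: [min, max]}`) could be interpreted roughly as follows. This is a sketch under stated assumptions: `sample_length_pct` is a hypothetical helper, and the standard-library `random.Random` stands in for whatever random generator the package actually uses.

```python
import random
from typing import Dict, List, Union

LengthSpec = Union[float, List[float], Dict[str, List[float]]]


def sample_length_pct(spec: LengthSpec, rng: random.Random) -> float:
    """Draw one per-sample length fraction from a length_pct spec (sketch)."""
    if isinstance(spec, dict):
        # {range: [min, max]} -> uniform draw between the two bounds.
        low, high = spec["range"]
        return rng.uniform(low, high)
    if isinstance(spec, (list, tuple)):
        # List of floats -> pick one of the discrete choices.
        return rng.choice(list(spec))
    # Scalar float -> fixed length for every sample.
    return float(spec)


rng = random.Random(42)
fixed = sample_length_pct(0.1, rng)
discrete = sample_length_pct([0.05, 0.1, 0.2], rng)
uniform = sample_length_pct({"range": [0.05, 0.2]}, rng)
```

For signals, only the scalar branch would apply, since the list above states that stochastic forms are not supported there.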
Example YAML Structure (config.yaml):
    # Option 1: Top-level dataset definition (path_key=None)
    my_dataset_1:
      n_timesteps: 150
      n_samples: 200
      n_dimensions: 2
      random_state: 42
      classes:
        - id: 0 # Class 0 definition
          weight: 1.0
          signals:
            - function: random_walk
              params: { step_size: 0.1 }
              dimensions: [0, 1] # Apply to both dimensions
            - function: gaussian_noise
              params: { sigma: 0.05 }
              # dimensions omitted -> applies to all
          features: [] # No specific features for class 0
        - id: 1 # Class 1 definition
          weight: 1.5 # Sample class 1 more often
          signals:
            - { function: random_walk, params: { step_size: 0.1 }, dimensions: [0, 1] }
            - { function: gaussian_noise, params: { sigma: 0.05 } }
          features:
            - function: peak
              params: { amplitude: 1.5, width: 3 }
              length_pct: 0.1 # Feature length is 10% of total timesteps
              random_location: true # Place it randomly
              dimensions: [0] # Only in dimension 0
              shared_location: false # If dim had >1 element, location would differ
            - function: constant
              params: { value: -1.0 }
              start_pct: 0.7
              end_pct: 0.9
              dimensions: [1] # Only in dimension 1

    # Option 2: Nested dataset definitions (path_key="experiments/datasets")
    experiments:
      datasets:
        dataset_nested:
          n_timesteps: 80
          n_samples: 50
          classes:
            - id: 0
              signals: [ { function: seasonal, params: { period: 10 } } ]
            # ... potentially more classes ...
YAML Anchors and Aliases: YAML's anchor/alias feature can be used to reuse configuration across multiple datasets. This is particularly useful for defining common settings, signals, or features.
Example:
```yaml
# Define common settings with anchor (&)
common: &common_settings
  n_timesteps: 100
  n_samples: 1000
  random_state: 42
  normalization: "zscore"

# Define common signal configuration
base_random_walk: &base_signal
  function: random_walk
  params:
    step_size: 0.1

# Use aliases (*) to reference the anchors
dataset_a:
  <<: *common_settings # Merges all common settings
  n_dimensions: 1
  classes:
    - id: 0
      signals:
        - <<: *base_signal # Use the common signal definition

dataset_b:
  <<: *common_settings
  n_samples: 2000 # Override specific settings
  n_dimensions: 2
  classes:
    - id: 0
      signals:
        - <<: *base_signal
          dimensions: [0, 1] # Add dimensions parameter
```
```
The `<<:` syntax is a YAML merge key that merges all key-value pairs from the
referenced anchor into the current mapping.
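In Python terms, the effect of a merge key resembles dictionary unpacking in which keys written explicitly in the mapping win over the merged ones. The snippet below is a plain-dict analogy only. It does not parse YAML, and the variable names simply mirror the example above.

```python
# Stand-in for the mapping anchored as &common_settings.
common_settings = {
    "n_timesteps": 100,
    "n_samples": 1000,
    "random_state": 42,
    "normalization": "zscore",
}

# Rough equivalent of:
#   dataset_b:
#     <<: *common_settings
#     n_samples: 2000
#     n_dimensions: 2
# Explicit keys come last, so they override the merged values.
dataset_b = {**common_settings, "n_samples": 2000, "n_dimensions": 2}

print(dataset_b["n_samples"])        # 2000
print(common_settings["n_samples"])  # 1000
```

Note that the entries carrying the anchors (`common`, `base_random_walk`) remain ordinary top-level keys of the parsed dictionary. Assuming the loader would otherwise treat every top-level key as a dataset, passing `dataset_names=["dataset_a", "dataset_b"]` is one way to restrict loading to the real datasets.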
Example Usage:
```python
from xaitimesynth.parser import load_builders_from_config

# Load all datasets from top level of a file
builders_file = load_builders_from_config(config_path="config.yaml")

# Load only 'dataset_c' from a nested path in a file
builders_c = load_builders_from_config(
    config_path="config.yaml",
    path_key="experiments/datasets",
    dataset_names=["dataset_c"]
)

# Load from a dictionary
my_config = {
    "my_dataset": {"n_timesteps": 10, "classes": [{"id": 0}]}
}
builders_dict = load_builders_from_config(config_dict=my_config)

# Load from a YAML string
yaml_str = "my_data:\n  n_timesteps: 5"
builders_str = load_builders_from_config(config_str=yaml_str)
```
Source code in xaitimesynth/parser.py
Data Structures
TimeSeriesComponents (dataclass)
Stores the separate components of a generated time series.
This dataclass is designed to hold the individual components that constitute a synthetic time series. By storing these components separately, it facilitates ground truth evaluation of XAI (Explainable AI) methods, allowing for a deeper understanding of how each component contributes to the final time series.
Attributes:
| Name | Type | Description |
|---|---|---|
| background | ndarray | Background signal, the base structure component (e.g., constant, random walk). |
| features | Optional[Dict[str, ndarray]] | Dictionary mapping feature names to their vector representations. Defaults to None. |
| feature_masks | Optional[Dict[str, ndarray]] | Dictionary of boolean masks indicating feature locations. Defaults to None. |
| aggregated | Optional[ndarray] | The final aggregated time series after combining components. Defaults to None. |
Source code in xaitimesynth/data_structures.py
__post_init__()
Validate that components have compatible shapes with the background.
Source code in xaitimesynth/data_structures.py
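A minimal sketch of a dataclass with the attributes listed above and a shape check in `__post_init__` might look like this. The class name `ComponentsSketch` is hypothetical, and requiring exact shape equality and raising `ValueError` are assumptions. The source only says that shapes must be compatible with the background.

```python
from dataclasses import dataclass
from typing import Dict, Optional

import numpy as np


@dataclass
class ComponentsSketch:
    """Illustrative stand-in for TimeSeriesComponents (not the library class)."""

    background: np.ndarray
    features: Optional[Dict[str, np.ndarray]] = None
    feature_masks: Optional[Dict[str, np.ndarray]] = None
    aggregated: Optional[np.ndarray] = None

    def __post_init__(self) -> None:
        expected = self.background.shape
        # Assumption: every component must match the background shape exactly.
        for group_name in ("features", "feature_masks"):
            group = getattr(self, group_name) or {}
            for name, array in group.items():
                if array.shape != expected:
                    raise ValueError(
                        f"{group_name}[{name!r}] has shape {array.shape}, expected {expected}"
                    )
        if self.aggregated is not None and self.aggregated.shape != expected:
            raise ValueError(
                f"aggregated has shape {self.aggregated.shape}, expected {expected}"
            )


background = np.zeros(100)
peak = np.zeros(100)
peak[40:50] = 1.5          # feature occupies timesteps 40..49
mask = peak != 0           # boolean ground-truth mask for the feature
parts = ComponentsSketch(background, {"peak": peak}, {"peak": mask}, background + peak)
```

Keeping the mask next to the aggregated series is what makes ground-truth evaluation possible: an attribution map produced by an XAI method can be compared directly against `parts.feature_masks["peak"]`.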
Normalization
normalize(data: np.ndarray, method: str = 'zscore', **kwargs) -> np.ndarray
Normalize data using the specified method.
Supported methods are 'zscore' (standardization), 'minmax' (min-max scaling), and 'none' (no normalization).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Input array to normalize. | required |
| method | str | Normalization method ("zscore", "minmax", or "none"). Defaults to "zscore". | 'zscore' |
| **kwargs | | Additional parameters for specific normalization methods. | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Normalized data according to specified method. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an invalid normalization method is specified. |
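The three methods can be sketched with NumPy as follows. This is an illustration of the standard formulas, not the library's implementation: `normalize_sketch` and its `eps` guard are hypothetical, and computing statistics over the whole array (rather than per sample or per dimension) is an assumption.

```python
import numpy as np


def normalize_sketch(data: np.ndarray, method: str = "zscore", eps: float = 1e-8) -> np.ndarray:
    """Whole-array normalization sketch; eps avoids division by zero."""
    if method == "none":
        return data
    if method == "zscore":
        # Standardization: zero mean, (approximately) unit standard deviation.
        return (data - data.mean()) / (data.std() + eps)
    if method == "minmax":
        # Min-max scaling: map values into (approximately) [0, 1].
        span = data.max() - data.min()
        return (data - data.min()) / (span + eps)
    raise ValueError(f"Unknown normalization method: {method!r}")


series = np.array([1.0, 2.0, 3.0, 4.0])
standardized = normalize_sketch(series, "zscore")
scaled = normalize_sketch(series, "minmax")
```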