
Builder

The TimeSeriesBuilder class provides a fluent API for constructing synthetic time series datasets.

TimeSeriesBuilder

Builder for synthetic time series datasets with known ground truth for XAI.

This class provides a fluent API for building synthetic time series datasets with known ground truth features for explainable AI (XAI) evaluation.

The builder creates time series by combining multiple components:

- Background: The base structure of the time series (e.g., random walk, gaussian noise)
- Features: Discriminative patterns for class separation (e.g., peaks, level changes, ...)

Terminology:

- "Signals" are background components added with add_signal(), stored in background
- Features are components that distinguish between classes, added with add_feature()

Component flexibility:

- Component generators are not strictly limited to their registered role
- A signal generator could be used as a feature or vice versa
- Features can be localized in time or span the entire series
- It's up to the user to ensure features actually create meaningful class differences

Key capabilities:

- Univariate and multivariate time series generation
- Control over feature positions and randomness
- Support for shared patterns across dimensions
- Training/test splits with consistent class distributions
- Built-in visualization and conversion utilities

Advanced usage:

- Components can be configured with various parameters
- Features can be positioned at fixed or random locations
- For multivariate series, components can target specific dimensions
- Shared randomness and locations can be controlled across dimensions
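To make the position parameters concrete, here is a minimal NumPy sketch of how percentage-based locations could map to timestep indices. It assumes the convention documented below (start_pct/end_pct/length_pct are fractions of the series length, and random placement samples a start index that keeps the feature inside the series); it is an illustration, not the library's implementation.

```python
import numpy as np

n_timesteps = 100
rng = np.random.RandomState(42)

# Fixed segment: start_pct/end_pct become index boundaries
start_pct, end_pct = 0.2, 0.5
start_idx = int(start_pct * n_timesteps)   # index 20
end_idx = int(end_pct * n_timesteps)       # index 50 (exclusive end)

# Random location: length_pct fixes the length, the start is sampled
length_pct = 0.3
feature_length = max(1, int(length_pct * n_timesteps))   # 30 timesteps
max_start = n_timesteps - feature_length                 # latest valid start
rand_start = rng.randint(0, max_start + 1)               # somewhere in [0, 70]
```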

When components are not registered, the builder uses default fill values:

- Features: NaN where the feature doesn't exist
- Background: zeros where no background component exists
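The fill-value semantics can be sketched in plain NumPy (independent of the library): a background vector defaults to zeros ("no contribution"), while a feature vector is NaN everywhere it doesn't exist, which lets a boolean mask recover the ground-truth location for XAI evaluation.

```python
import numpy as np

n_timesteps = 10
background = np.zeros(n_timesteps)       # background fill: "no contribution"
feature = np.full(n_timesteps, np.nan)   # feature fill: "doesn't exist"
feature[3:6] = 2.0                       # feature present on timesteps 3-5

# Compose: add the feature only where it exists
series = background + np.where(np.isnan(feature), 0.0, feature)

# Ground-truth mask falls out of the NaN fill value
mask = ~np.isnan(feature)
```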

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| n_timesteps | int | Length of each time series. |
| n_samples | int | Total number of samples to generate. |
| n_dimensions | int | Number of dimensions in each time series. |
| normalization | str | Normalization method for the final time series. |
| normalization_kwargs | dict | Additional parameters for normalization. |
| random_state | int | Random seed for reproducibility. |
| rng | RandomState | Random number generator. |
| feature_fill_value | | Value used for non-existent features (default: np.nan). |
| background_fill_value | | Value used for background when none exists (default: 0.0). |
| class_definitions | list | List of class definitions with components. |
| current_class | dict | Current class being configured. |
| data_format | str | Format of the output tensor data. Either 'channels_last' corresponding to shape [batch, time_steps, channels] or 'channels_first' corresponding to shape [batch, channels, time_steps]. Default is 'channels_first'. |

Source code in xaitimesynth/builder.py
class TimeSeriesBuilder:
    """Builder for synthetic time series datasets with known ground truth for XAI.

    This class provides a fluent API for building synthetic time series datasets with
    known ground truth features for explainable AI (XAI) evaluation.

    The builder creates time series by combining multiple components:
    - Background: The base structure of the time series (e.g., random walk, gaussian noise)
    - Features: Discriminative patterns for class separation (e.g., peaks, level changes, ...)

    Terminology:
    - "Signals" are background components added with add_signal(), stored in background
    - Features are components that distinguish between classes, added with add_feature()

    Component flexibility:
    - Component generators are not strictly limited to their registered role
    - A signal generator could be used as a feature or vice versa
    - Features can be localized in time or span the entire series
    - It's up to the user to ensure features actually create meaningful class differences

    Key capabilities:
    - Univariate and multivariate time series generation
    - Control over feature positions and randomness
    - Support for shared patterns across dimensions
    - Training/test splits with consistent class distributions
    - Built-in visualization and conversion utilities

    Advanced usage:
    - Components can be configured with various parameters
    - Features can be positioned at fixed or random locations
    - For multivariate series, components can target specific dimensions
    - Shared randomness and locations can be controlled across dimensions

    When components are not registered, the builder uses default fill values:
    - Features: NaN where the feature doesn't exist
    - Background: zeros where no background component exists

    Attributes:
        n_timesteps (int): Length of each time series.
        n_samples (int): Total number of samples to generate.
        n_dimensions (int): Number of dimensions in each time series.
        normalization (str): Normalization method for the final time series.
        normalization_kwargs (dict): Additional parameters for normalization.
        random_state (int): Random seed for reproducibility.
        rng (np.random.RandomState): Random number generator.
        feature_fill_value: Value used for non-existent features (default: np.nan).
        background_fill_value: Value used for background when none exists (default: 0.0).
        class_definitions (list): List of class definitions with components.
        current_class (dict): Current class being configured.
        data_format (str): Format of the output tensor data. Either 'channels_last'
            corresponding to shape [batch, time_steps, channels] or 'channels_first'
            corresponding to shape [batch, channels, time_steps]. Default is 'channels_first'.
    """

    def __init__(
        self,
        n_timesteps: int = 100,
        n_samples: int = 1000,
        n_dimensions: int = 1,
        normalization: str = "zscore",
        random_state: Optional[int] = None,
        normalization_kwargs: Optional[Dict[str, Any]] = None,
        feature_fill_value: Any = np.nan,
        background_fill_value: Any = 0.0,
        data_format: str = "channels_first",
    ):
        """Initialize the time series builder.

        Args:
            n_timesteps (int): Length of each time series. Default is 100.
            n_samples (int): Total number of samples to generate. Default is 1000.
            n_dimensions (int): Number of dimensions for multivariate time series. Default is 1 (univariate).
            normalization (str): Normalization method for the final time series.
                Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".
            random_state (int, optional): Seed for random number generation to ensure reproducibility.
            normalization_kwargs (dict, optional): Additional parameters for normalization methods.
                For "minmax": can specify "feature_range" as tuple (min, max).
            feature_fill_value: Value used for non-existent features. Default is np.nan.
                Using NaN makes features only appear where they're defined in visualizations.
            background_fill_value: Value used for background when none exists. Default is 0.0.
                Background typically affects the entire time series, so zeros represent
                "no contribution" rather than "doesn't exist".
            data_format (str): Format of the output tensor data.
                'channels_last': [batch, time_steps, channels] (original XAITimeSynth format)
                'channels_first': [batch, channels, time_steps] (PyTorch/tsai format)
                Default is 'channels_first'.

        Raises:
            ValueError: If n_dimensions is less than 1.
            ValueError: If data_format is not one of ['channels_first', 'channels_last']
        """
        self.n_timesteps = n_timesteps
        self.n_samples = n_samples
        self.n_dimensions = n_dimensions

        # Validate n_dimensions
        if n_dimensions < 1:
            raise ValueError("n_dimensions must be at least 1")

        # Validate data_format
        if data_format not in ["channels_first", "channels_last"]:
            raise ValueError(
                "data_format must be one of ['channels_first', 'channels_last']"
            )
        self.data_format = data_format

        self.normalization = normalization
        self.normalization_kwargs = normalization_kwargs or {}
        self.random_state = random_state
        self.rng = np.random.RandomState(random_state)
        self.feature_fill_value = feature_fill_value
        self.background_fill_value = background_fill_value

        # Initialize class definitions and the current class
        self.class_definitions = []
        self.current_class = None

    def for_class(self, class_label: int, weight: float = 1.0) -> "TimeSeriesBuilder":
        """Set the current class for component assignment.

        Creates a new class definition and makes it the target for subsequent component additions.
        Multiple calls create multiple classes for classification tasks.

        Args:
            class_label (int): Integer label for the class, used as the target value.
            weight (float): Relative weight of this class in the dataset. Controls the
                class distribution in the generated dataset. Default is 1.0.

        Returns:
            TimeSeriesBuilder: Self for method chaining.
        """
        # Create a new class definition
        class_def = {
            "label": class_label,
            "weight": weight,
            "components": {"background": [], "features": []},
        }

        self.class_definitions.append(class_def)
        self.current_class = class_def

        return self

    def _validate_dimensions(self, dimensions: List[int]) -> None:
        """Validate dimension indices against n_dimensions.

        Ensures all provided dimension indices are within valid range for the configured
        number of dimensions in the builder.

        Args:
            dimensions (List[int]): List of dimension indices to validate.

        Raises:
            ValueError: If any dimension index is out of range (0 to n_dimensions-1).
        """
        for d in dimensions:
            if not 0 <= d < self.n_dimensions:
                raise ValueError(
                    f"Dimension {d} is out of range. "
                    f"Valid dimensions are 0 to {self.n_dimensions - 1}."
                )

    def add_signal(
        self,
        component: Dict[str, Any],
        dim: Optional[List[int]] = None,
        shared_randomness: bool = False,
        start_pct: Optional[float] = None,
        end_pct: Optional[float] = None,
        length_pct: Optional[float] = None,
        random_location: bool = False,
        shared_location: bool = True,
    ) -> "TimeSeriesBuilder":
        """Add a signal component to the current class.

        Signals form the background structure of the time series (e.g., random walks,
        gaussian noise, trends). All signals are added to the background component.

        Default behavior: When no location parameters are specified (start_pct, end_pct,
        length_pct all None and random_location=False), the signal spans the entire time
        series length.

        Segment mode: To apply a signal to only part of the time series, either:
        - Specify start_pct and end_pct for a fixed segment, or
        - Set random_location=True with length_pct for a randomly positioned segment.

        Args:
            component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
            dim (List[int]): List of dimension indices where this signal should be applied.
                If None, the signal will be added to all dimensions. Default is None.
            shared_randomness (bool): If True, the same random pattern will be used across all
                specified dimensions. If False, each dimension gets its own random pattern
                (for stochastic components). Default is False.
            start_pct (float, optional): Start position as percentage of time series length (0-1).
                Required together with end_pct for a fixed segment.
            end_pct (float, optional): End position as percentage of time series length (0-1).
                Required together with start_pct for a fixed segment.
            length_pct (float, optional): Length of signal as percentage of time series length (0-1).
                Required when random_location is True.
            random_location (bool): Whether to place the signal at a random location.
                Requires length_pct. Default is False.
            shared_location (bool): If True and random_location is True, the same random
                location will be used across all dimensions. If False, each dimension gets
                its own random location. Default is True.

        Returns:
            TimeSeriesBuilder: Self for method chaining.

        Raises:
            ValueError: If no class is selected or if location parameters are inconsistent.

        Examples:
            # Full time series (default - no location params)
            builder.add_signal(gaussian_noise(sigma=0.1))

            # Fixed segment from 20% to 50% of the series
            builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

            # Random segment of 30% length
            builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)
        """
        if self.current_class is None:
            raise ValueError("No class selected. Call for_class() first.")

        if dim is None:
            dim = list(range(self.n_dimensions))
        self._validate_dimensions(dim)

        # Determine if this is a segment or full-series signal
        has_time_range = (
            start_pct is not None
            or end_pct is not None
            or length_pct is not None
            or random_location
        )

        # Validate location parameters based on mode
        if has_time_range:
            if random_location:
                if length_pct is None:
                    raise ValueError(
                        "length_pct must be provided when random_location is True"
                    )
                if not (0 < length_pct <= 1):
                    raise ValueError("length_pct must be between 0 and 1")
            else:
                # Fixed segment mode - requires both start_pct and end_pct
                if start_pct is None or end_pct is None:
                    raise ValueError(
                        "Both start_pct and end_pct must be provided for a fixed segment"
                    )
                if not (
                    0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct
                ):
                    raise ValueError(
                        "Invalid start_pct or end_pct. Must be between 0 and 1, "
                        "with start_pct < end_pct"
                    )

        # Build the component definition
        component_with_params = component.copy()

        if has_time_range:
            if random_location:
                component_with_params["random_location"] = True
                component_with_params["length_pct"] = length_pct
                component_with_params["shared_location"] = shared_location
            else:
                component_with_params["random_location"] = False
                component_with_params["start_pct"] = start_pct
                component_with_params["end_pct"] = end_pct

        # Add dimensions and randomness settings
        # Use single component when sharing location/randomness or single dimension
        if (
            (has_time_range and shared_location and random_location)
            or shared_randomness
            or len(dim) == 1
        ):
            component_with_params["dimensions"] = dim
            component_with_params["shared_randomness"] = shared_randomness
            component_with_params["shared_location"] = shared_location
            self.current_class["components"]["background"].append(component_with_params)
        else:
            # Create separate component entries for each dimension
            for d in dim:
                component_with_dim = component_with_params.copy()
                component_with_dim["dimensions"] = [d]
                component_with_dim["shared_randomness"] = shared_randomness
                component_with_dim["shared_location"] = shared_location
                self.current_class["components"]["background"].append(
                    component_with_dim
                )

        return self

    def add_feature(
        self,
        component: Dict[str, Any],
        start_pct: Optional[float] = None,
        end_pct: Optional[float] = None,
        length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None,
        random_location: bool = False,
        dim: Optional[List[int]] = None,
        shared_location: bool = True,
        shared_randomness: bool = False,
    ) -> "TimeSeriesBuilder":
        """Add a feature component to the current class.

        Features are distinctive patterns that can differentiate between classes.
        They can be placed at fixed or random locations within the time series.

        Args:
            component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
            start_pct (float, optional): Start position as percentage of time series length (0-1).
                Required when random_location is False.
            end_pct (float, optional): End position as percentage of time series length (0-1).
                Required when random_location is False.
            length_pct (float | tuple | list, optional): Length of feature as percentage of time
                series length. Required when random_location is True. Three forms accepted:
                - float: fixed length, e.g. ``0.5``
                - tuple (min, max): sample uniformly per sample in range, e.g. ``(0.25, 0.75)``
                - list of floats: sample from discrete choices per sample, e.g. ``[0.25, 0.5]``
            random_location (bool): Whether to place the feature at a random location.
                Default is False (fixed position).
            dim (List[int]): List of dimension indices where this feature should be applied.
                If None, the feature will be added to all dimensions. Default is None.
            shared_location (bool): If True and random_location is True, the same random
                location will be used across all dimensions. If False, each dimension gets
                its own random location. Default is True.
            shared_randomness (bool): If True, the same random pattern will be used across
                all dimensions. If False, each dimension gets its own random pattern
                (for stochastic components). Default is False.

        Returns:
            TimeSeriesBuilder: Self for method chaining.

        Raises:
            ValueError: If no class is selected or if location parameters are invalid.
        """
        if self.current_class is None:
            raise ValueError("No class selected. Call for_class() first.")

        if dim is None:
            dim = list(range(self.n_dimensions))
        self._validate_dimensions(dim)

        # Create feature definition
        feature_def = component.copy()

        # Add location parameters
        if random_location:
            if length_pct is None:
                raise ValueError(
                    "length_pct must be provided when random_location is True"
                )
            if isinstance(length_pct, tuple):
                if len(length_pct) != 2 or not (0 < length_pct[0] < length_pct[1] <= 1):
                    raise ValueError(
                        "length_pct tuple must be (min, max) with 0 < min < max <= 1"
                    )
            elif isinstance(length_pct, list):
                if not length_pct or not all(0 < v <= 1 for v in length_pct):
                    raise ValueError(
                        "length_pct list must be non-empty with all values in (0, 1]"
                    )
            else:
                if not (0 < length_pct <= 1):
                    raise ValueError("length_pct must be between 0 and 1")

            feature_def["random_location"] = True
            feature_def["length_pct"] = length_pct
        else:
            if start_pct is None or end_pct is None:
                raise ValueError(
                    "start_pct and end_pct must be provided when random_location is False"
                )
            if not (0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct):
                raise ValueError(
                    "Invalid start_pct or end_pct. Must be between 0 and 1, with start_pct < end_pct"
                )

            feature_def["random_location"] = False
            feature_def["start_pct"] = start_pct
            feature_def["end_pct"] = end_pct

        # Add to feature collection, ensuring the shared location logic is properly observed
        if (shared_location and random_location) or shared_randomness or len(dim) == 1:
            feature_def["dimensions"] = dim
            feature_def["shared_location"] = shared_location
            feature_def["shared_randomness"] = shared_randomness
            self.current_class["components"]["features"].append(feature_def)
        else:
            # Create separate feature entries for each dimension when not sharing
            for d in dim:
                feature_single_dim = feature_def.copy()
                feature_single_dim["dimensions"] = [d]  # Single dimension
                feature_single_dim["shared_location"] = shared_location
                feature_single_dim["shared_randomness"] = shared_randomness
                self.current_class["components"]["features"].append(feature_single_dim)

        return self

    def _generate_component_vector(
        self, component_def: Dict[str, Any], feature_length: Optional[int] = None
    ) -> np.ndarray:
        """Generate a component vector based on its definition.

        Calls the appropriate component generator based on the component type
        and parameters specified in the definition.

        Args:
            component_def (Dict[str, Any]): Component definition dictionary with 'type'
                and parameters for the generator.
            feature_length (Optional[int]): Length of the feature in timesteps.
                Only used for feature components.

        Returns:
            np.ndarray: Generated component vector with specified pattern.
        """
        component_type = component_def["type"]
        component_params = component_def.copy()
        component_params.pop("type")

        # Remove dimension information if present
        component_params.pop("dimensions", None)
        component_params.pop("shared_location", None)
        component_params.pop("shared_randomness", None)

        # If it's a feature, add the feature_length parameter
        if feature_length is not None:
            component_params["length"] = feature_length

        return generate_component(
            component_type, self.n_timesteps, self.rng, **component_params
        )

    def _resolve_length_pct(
        self,
        raw: Union[float, Tuple[float, float], List[float]],
        rng: np.random.RandomState,
    ) -> float:
        """Resolve a length_pct specification to a concrete float for one sample.

        Args:
            raw: Either a fixed float, a (min, max) tuple for uniform sampling, or a list
                of floats for discrete choice sampling.
            rng: Random number generator used for sampling.

        Returns:
            float: Resolved length as a fraction of the series length.
        """
        if isinstance(raw, tuple):
            return rng.uniform(raw[0], raw[1])
        elif isinstance(raw, list):
            return raw[rng.randint(0, len(raw))]
        return raw

    def _generate_feature_vector(
        self,
        feature_def: Dict[str, Any],
        dim_index: Optional[int] = None,
        shared_location_cache: Optional[Tuple[int, int]] = None,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Generate a feature vector and its corresponding mask.

        Creates a feature at the specified location (fixed or random) and returns
        both the vector and a boolean mask indicating the feature's position.

        Args:
            feature_def (Dict[str, Any]): Feature definition dictionary with 'type',
                location parameters, and generator parameters.
            dim_index (Optional[int]): The index in the dimensions list to use for location
                determination. Only used when shared_location is False.
            shared_location_cache (Optional[Tuple[int, int]]): Pre-calculated start and end
                indices for a shared location. Used to ensure consistency across dimensions.

        Returns:
            Tuple[np.ndarray, np.ndarray]: Tuple containing:
                - Feature vector with specified pattern at the determined location
                - Boolean mask indicating the feature's position (True where feature exists)
        """
        # Initialize the feature vector with the fill value
        feature = np.full(self.n_timesteps, self.feature_fill_value)
        mask = np.zeros(self.n_timesteps, dtype=bool)

        # Determine feature location
        if feature_def["random_location"]:
            if shared_location_cache is not None:
                # Use the cached shared location
                start_idx, end_idx = shared_location_cache
            else:
                length_pct = self._resolve_length_pct(
                    feature_def["length_pct"], self.rng
                )
                feature_length = max(1, int(length_pct * self.n_timesteps))

                # Generate random start position
                # If dim_index is provided and shared_location is False, use different
                # random locations for each dimension
                if dim_index is not None and not feature_def["shared_location"]:
                    # Derive a fresh RNG so each dimension gets its own location
                    dim_rng = np.random.RandomState(self.rng.randint(0, 2**32 - 1))
                    max_start = self.n_timesteps - feature_length
                    start_idx = dim_rng.randint(0, max_start + 1)
                else:
                    max_start = self.n_timesteps - feature_length
                    start_idx = self.rng.randint(0, max_start + 1)

                end_idx = start_idx + feature_length
        else:
            start_pct = feature_def["start_pct"]
            end_pct = feature_def["end_pct"]

            start_idx = int(start_pct * self.n_timesteps)
            end_idx = int(end_pct * self.n_timesteps)

            # Ensure at least one timestep is selected
            if start_idx == end_idx:
                end_idx = start_idx + 1

        # Mark the feature region
        mask[start_idx:end_idx] = True

        # Generate the feature vector
        feature_params = feature_def.copy()
        feature_type = feature_params.pop("type")

        # Remove location parameters
        feature_params.pop("random_location", None)
        feature_params.pop("start_pct", None)
        feature_params.pop("end_pct", None)
        feature_params.pop("length_pct", None)
        feature_params.pop("dimensions", None)
        feature_params.pop("shared_location", None)
        feature_params.pop("shared_randomness", None)

        # Generate the component for the feature length
        feature_length = end_idx - start_idx
        feature_values = generate_component(
            feature_type,
            self.n_timesteps,
            self.rng,
            length=feature_length,
            **feature_params,
        )

        # Place the feature in the correct location
        feature[start_idx:end_idx] = feature_values

        return feature, mask
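The fixed-location branch's index arithmetic can be shown in isolation. A minimal sketch with assumed values (100 timesteps, a degenerate range to show the one-timestep guarantee, `np.nan` as the feature fill value, and `1.0` as a stand-in for the generated pattern):

```python
import numpy as np

n_timesteps = 100
start_pct, end_pct = 0.30, 0.30  # degenerate range on purpose

start_idx = int(start_pct * n_timesteps)
end_idx = int(end_pct * n_timesteps)
if start_idx == end_idx:          # ensure at least one timestep is selected
    end_idx = start_idx + 1

feature = np.full(n_timesteps, np.nan)   # NaN where the feature doesn't exist
mask = np.zeros(n_timesteps, dtype=bool)
mask[start_idx:end_idx] = True
feature[start_idx:end_idx] = 1.0         # stand-in for the generated values
```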

    def build(
        self,
        return_components: bool = True,
        deterministic_class_counts: bool = True,
        shuffle: bool = True,
    ) -> Dict[str, Any]:
        """Build the dataset based on the configured class definitions.

        Generates time series data by combining all components for each class according
        to the specified parameters, with options to include component vectors and
        create a train/test split.

        Args:
            return_components (bool): Whether to return the individual component vectors.
                Useful for visualization and analysis. Default is True.
            deterministic_class_counts (bool): If True, class counts will be determined exactly
                by the weights rather than using multinomial sampling. This ensures exact class
                proportions. Default is True.
            shuffle (bool): Whether to shuffle the samples across classes. If True (default),
                samples will be randomly ordered. If False, samples will be grouped by class
                in the order classes were defined.

        Returns:
            Dict[str, Any]: Dictionary containing the generated dataset with keys:
                - 'X': Time series data with shape determined by data_format:
                       - 'channels_last': [n_samples, n_timesteps, n_dimensions]
                       - 'channels_first': [n_samples, n_dimensions, n_timesteps]
                - 'y': Class labels for each sample
                - 'feature_masks': Boolean masks showing feature locations
                - 'metadata': Dataset configuration information
                - 'components': Individual component vectors (if return_components=True)
                If train_test_split is provided, also includes:
                - 'X_train', 'y_train': Training data
                - 'X_test', 'y_test': Testing data

        Raises:
            ValueError: If no class definitions have been provided.
        """
        if not self.class_definitions:
            raise ValueError(
                "No class definitions provided. Call for_class() at least once."
            )

        # Normalize class weights and determine class distribution
        weights = np.array([cd["weight"] for cd in self.class_definitions])
        weights = weights / weights.sum()

        if deterministic_class_counts:
            # Deterministic class counts based on exact weights
            raw_counts = weights * self.n_samples
            # Round to integers and ensure we have exactly n_samples total
            class_counts = np.floor(raw_counts).astype(int)
            remaining = self.n_samples - class_counts.sum()
            # Distribute remaining samples based on fractional parts
            if remaining > 0:
                fractions = raw_counts - class_counts
                indices = np.argsort(fractions)[-remaining:]
                for idx in indices:
                    class_counts[idx] += 1
        else:
            # Probabilistic class counts using multinomial sampling
            class_counts = self.rng.multinomial(self.n_samples, weights)

        # Initialize arrays - always create in channels_last format first (internal format)
        X = np.zeros((self.n_samples, self.n_timesteps, self.n_dimensions))
        y = np.zeros(self.n_samples, dtype=int)
        all_components = []
        feature_masks = {}

        # Generate data for each class
        sample_idx = 0
        for class_def, count in zip(self.class_definitions, class_counts):
            class_label = class_def["label"]

            for _ in range(count):
                # Initialize arrays for this sample with appropriate fill values per dimension
                background = np.full(
                    (self.n_timesteps, self.n_dimensions), self.background_fill_value
                )
                features_dict = {}
                feature_masks_dict = {}

                # Add base structure components
                for base_def in class_def["components"]["background"]:
                    # For signals with time range parameters, generate random location once if shared
                    if "random_location" in base_def and base_def["random_location"]:
                        # Determine signal length
                        length_pct = base_def["length_pct"]
                        signal_length = max(1, int(length_pct * self.n_timesteps))
                        max_start = self.n_timesteps - signal_length

                        # If shared_location is True, generate the location once for all dimensions
                        shared_location = base_def.get("shared_location", True)
                        if shared_location:
                            shared_start_idx = self.rng.randint(0, max_start + 1)
                            shared_end_idx = shared_start_idx + signal_length

                        # Apply to specified dimensions with appropriate location handling
                        for i, dim_idx in enumerate(base_def["dimensions"]):
                            # Create a full-length vector filled with the background fill value
                            base_vector = np.full(
                                self.n_timesteps, self.background_fill_value
                            )

                            # Determine signal location - possibly unique per dimension
                            if shared_location:
                                # Use the shared location for all dimensions
                                start_idx = shared_start_idx
                                end_idx = shared_end_idx
                            else:
                                # Create a unique location for each dimension
                                dim_rng = np.random.RandomState(
                                    self.rng.randint(0, 2**32 - 1)
                                )
                                start_idx = dim_rng.randint(0, max_start + 1)
                                end_idx = start_idx + signal_length

                            # Calculate the actual length of the signal segment
                            signal_length = end_idx - start_idx

                            # Prepare parameters for component generation
                            signal_params = base_def.copy()
                            signal_type = signal_params.pop("type")

                            # Remove location and dimension parameters
                            signal_params.pop("random_location", None)
                            signal_params.pop("length_pct", None)
                            signal_params.pop("shared_location", None)
                            signal_params.pop("dimensions", None)
                            signal_params.pop("shared_randomness", None)

                            # Generate the component only for the specified length
                            signal_values = generate_component(
                                signal_type, signal_length, self.rng, **signal_params
                            )

                            # Place the signal in the correct location
                            base_vector[start_idx:end_idx] = signal_values

                            # Add to background for this dimension
                            background[:, dim_idx] = self._add_vector_handling_nans(
                                background[:, dim_idx], base_vector
                            )
                    else:
                        # Handle non-random location signals (the original behavior)
                        if "random_location" in base_def:
                            # Localized signal with explicit start_pct/end_pct bounds
                            base_vector = np.full(
                                self.n_timesteps, self.background_fill_value
                            )

                            start_pct = base_def["start_pct"]
                            end_pct = base_def["end_pct"]
                            start_idx = int(start_pct * self.n_timesteps)
                            end_idx = int(end_pct * self.n_timesteps)

                            # Ensure at least one timestep is selected
                            if start_idx == end_idx:
                                end_idx = start_idx + 1

                            signal_length = end_idx - start_idx

                            # Generate the component only for the specified length
                            signal_params = base_def.copy()
                            signal_type = signal_params.pop("type")

                            # Remove location parameters
                            signal_params.pop("random_location", None)
                            signal_params.pop("start_pct", None)
                            signal_params.pop("end_pct", None)
                            signal_params.pop("dimensions", None)
                            signal_params.pop("shared_randomness", None)

                            signal_values = generate_component(
                                signal_type, signal_length, self.rng, **signal_params
                            )

                            base_vector[start_idx:end_idx] = signal_values
                        else:
                            # Full-length signal (original behavior)
                            base_vector = self._generate_component_vector(base_def)

                        # Apply to all specified dimensions with the same signal
                        for dim_idx in base_def["dimensions"]:
                            background[:, dim_idx] = self._add_vector_handling_nans(
                                background[:, dim_idx], base_vector
                            )

                # Initialize aggregated time series
                aggregated = background.copy()

                # Add features
                for feature_idx, feature_def in enumerate(
                    class_def["components"]["features"]
                ):
                    # For each dimension in the feature
                    feature_dims = feature_def["dimensions"]

                    # Generate a shared random location once if needed
                    shared_location_cache = None
                    if feature_def.get("random_location", False) and feature_def.get(
                        "shared_location", True
                    ):
                        # Pre-calculate the shared location to ensure it's the same across dimensions
                        length_pct = self._resolve_length_pct(
                            feature_def["length_pct"], self.rng
                        )
                        feature_length = max(1, int(length_pct * self.n_timesteps))
                        max_start = self.n_timesteps - feature_length
                        shared_start_idx = self.rng.randint(0, max_start + 1)
                        shared_end_idx = shared_start_idx + feature_length
                        shared_location_cache = (shared_start_idx, shared_end_idx)

                    for i, dim_idx in enumerate(feature_dims):
                        # Generate feature vector - if shared_location is True and we have a cached location,
                        # pass it; otherwise pass the dimension index for unique locations
                        dim_index = (
                            None
                            if feature_def.get("shared_location", True)
                            else dim_idx
                        )
                        feature, mask = self._generate_feature_vector(
                            feature_def, dim_index, shared_location_cache
                        )

                        # Add to aggregated series for this dimension
                        aggregated[:, dim_idx] = self._add_vector_handling_nans(
                            aggregated[:, dim_idx], feature
                        )

                        # Store components
                        feature_name = (
                            f"feature_{feature_idx}_{feature_def['type']}_dim{dim_idx}"
                        )
                        if feature_name not in features_dict:
                            features_dict[feature_name] = feature
                            feature_masks_dict[feature_name] = mask

                        # Add to global feature masks
                        feature_key = f"class_{class_label}_{feature_name}"
                        if feature_key not in feature_masks:
                            feature_masks[feature_key] = np.zeros(
                                (self.n_samples, self.n_timesteps), dtype=bool
                            )

                        feature_masks[feature_key][sample_idx] = mask

                # Normalize if required (apply to each dimension separately)
                for dim_idx in range(self.n_dimensions):
                    aggregated[:, dim_idx] = normalize(
                        aggregated[:, dim_idx],
                        method=self.normalization,
                        **self.normalization_kwargs,
                    )

                # Store the result
                X[sample_idx] = aggregated
                y[sample_idx] = class_label

                # Store components if needed
                if return_components:
                    all_components.append(
                        TimeSeriesComponents(
                            background=background,
                            features=features_dict,
                            feature_masks=feature_masks_dict,
                            aggregated=aggregated,
                        )
                    )

                sample_idx += 1

        # Shuffle the dataset if requested
        if shuffle:
            # Generate shuffled indices based on the random state
            indices = np.arange(self.n_samples)
            self.rng.shuffle(indices)

            # Shuffle X and y arrays
            X = X[indices]
            y = y[indices]

            # Shuffle components if they were returned
            if return_components:
                all_components = [all_components[i] for i in indices]

            # Shuffle feature masks
            for key in feature_masks:
                feature_masks[key] = feature_masks[key][indices]

        # Convert the tensor format if needed (from channels_last to channels_first)
        if self.data_format == "channels_first":
            # Transpose from [n_samples, n_timesteps, n_dimensions] to [n_samples, n_dimensions, n_timesteps]
            X = np.transpose(X, (0, 2, 1))

        # Prepare result dictionary
        result = {
            "X": X,
            "y": y,
            "feature_masks": feature_masks,
            "metadata": {
                "n_samples": self.n_samples,
                "n_timesteps": self.n_timesteps,
                "n_dimensions": self.n_dimensions,
                "class_definitions": self.class_definitions,
                "normalize": self.normalization,
                "normalization_kwargs": self.normalization_kwargs,
                "random_state": self.random_state,
                "data_format": self.data_format,
                "shuffled": shuffle,
            },
        }

        if return_components:
            result["components"] = all_components

        return result
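The deterministic class-count logic in `build()` is a largest-remainder allocation: floor each raw count, then hand the leftover samples to the classes with the largest fractional parts. A standalone sketch with hypothetical equal weights:

```python
import numpy as np

n_samples = 10
weights = np.array([1.0, 1.0, 1.0])
weights = weights / weights.sum()

raw_counts = weights * n_samples                 # ~3.33 per class
class_counts = np.floor(raw_counts).astype(int)  # [3, 3, 3], one sample short
remaining = n_samples - class_counts.sum()
if remaining > 0:
    # Largest fractional parts receive the leftover samples
    fractions = raw_counts - class_counts
    for idx in np.argsort(fractions)[-remaining:]:
        class_counts[idx] += 1
```

The counts always sum to `n_samples` exactly, unlike multinomial sampling, which only matches the weights in expectation.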

    def to_df(
        self,
        dataset: Dict[str, Any],
        samples: Optional[List[int]] = None,
        classes: Optional[List[int]] = None,
        components: Optional[List[str]] = None,
        dimensions: Optional[List[int]] = None,
        format_classes: bool = False,
    ) -> pd.DataFrame:
        """Convert time series dataset to a long-format pandas DataFrame.

        Creates a DataFrame with one row per timestep per component per sample per dimension,
        suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

        Args:
            dataset (Dict[str, Any]): Dataset dictionary returned by build().
            samples (Optional[List[int]]): List of sample indices to include.
                If None, includes all samples.
            classes (Optional[List[int]]): List of class labels to include.
                If None, includes all classes.
            components (Optional[List[str]]): List of component types to include.
                Default includes all: ["aggregated", "background", "features"]
            dimensions (Optional[List[int]]): List of dimension indices to include.
                If None, includes all dimensions.
            format_classes (bool): If True, format class labels as "Class X".
                Otherwise use numeric labels. Default is False.

        Returns:
            pd.DataFrame: Long-format DataFrame with columns:
                - time: Timestep index
                - value: Component value at that timestep
                - class: Class label (formatted if format_classes=True)
                - sample: Sample index
                - component: Component type
                - feature: Feature name (for feature components)
                - dim: Dimension index

        Raises:
            ValueError: If specified dimensions are out of range.
        """
        # Default components to include (use programming-friendly names)
        default_components = ["aggregated", "background", "features"]
        components_to_include = (
            components if components is not None else default_components
        )

        # Get number of dimensions from metadata or infer from data shape
        n_dims = dataset.get("metadata", {}).get("n_dimensions", 1)
        if n_dims == 1 and len(dataset["X"].shape) == 3:
            n_dims = dataset["X"].shape[2]

        # Default dimensions to include
        if dimensions is None:
            dimensions = list(range(n_dims))
        else:
            # Validate dimensions
            for d in dimensions:
                if not 0 <= d < n_dims:
                    raise ValueError(
                        f"Dimension {d} is out of range (0 to {n_dims - 1})."
                    )

        # Filter by class if specified
        if classes is not None:
            class_indices = np.where(np.isin(dataset["y"], classes))[0]
        else:
            class_indices = np.arange(len(dataset["y"]))

        # Filter by sample if specified
        if samples is not None:
            sample_indices = np.array(samples)
            # Ensure sample indices are within class_indices
            sample_indices = np.intersect1d(sample_indices, class_indices)
        else:
            sample_indices = class_indices

        # Initialize list to hold DataFrames
        dfs = []

        # Process aggregated time series (formerly "Complete Series")
        if "aggregated" in components_to_include:
            # Get all selected samples at once
            X_selected = dataset["X"][sample_indices]
            n_samples = len(sample_indices)
            n_timesteps = X_selected.shape[1]

            # For each dimension
            for dim_idx in dimensions:
                # Create time indices for all samples
                times = np.arange(n_timesteps)

                # Create sample indices repeated for each timestep
                sample_idx_rep = np.repeat(sample_indices, n_timesteps)
                time_idx_rep = np.tile(times, n_samples)

                # Create values array for this dimension
                if len(X_selected.shape) == 3:  # Multivariate case
                    values = X_selected[:, :, dim_idx].flatten()
                else:  # Univariate case (backward compatibility)
                    values = X_selected.flatten()

                # Get class labels
                classes_rep = np.repeat(dataset["y"][sample_indices], n_timesteps)
                if format_classes:
                    class_labels = np.array([f"Class {c}" for c in classes_rep])
                else:
                    class_labels = classes_rep

                # Create DataFrame
                df_agg = pd.DataFrame(
                    {
                        "time": time_idx_rep,
                        "value": values,
                        "class": class_labels,
                        "sample": sample_idx_rep,
                        "component": "aggregated",
                        "feature": None,
                        "dim": dim_idx,
                    }
                )

                dfs.append(df_agg)

        # Process components if available
        if "components" in dataset:
            for component_name in ["background"]:
                if component_name in components_to_include:
                    for dim_idx in dimensions:
                        comp_data = []
                        valid_samples = []

                        # Collect data from all samples
                        for i, idx in enumerate(sample_indices):
                            comp = dataset["components"][idx]
                            if (
                                hasattr(comp, component_name)
                                and getattr(comp, component_name) is not None
                            ):
                                comp_array = getattr(comp, component_name)
                                # Check if component has dimension data
                                if (
                                    len(comp_array.shape) == 2
                                    and comp_array.shape[1] > dim_idx
                                ):
                                    comp_data.append(comp_array[:, dim_idx])
                                    valid_samples.append(idx)
                                elif len(comp_array.shape) == 1 and dim_idx == 0:
                                    # Backward compatibility - 1D array for univariate case
                                    comp_data.append(comp_array)
                                    valid_samples.append(idx)

                        if comp_data:
                            # Stack component data
                            comp_array = np.vstack(comp_data)
                            n_valid = len(valid_samples)
                            n_timesteps = comp_array.shape[1]

                            # Create indices
                            sample_idx_rep = np.repeat(valid_samples, n_timesteps)
                            time_idx_rep = np.tile(np.arange(n_timesteps), n_valid)

                            # Get class labels
                            classes_rep = np.repeat(
                                dataset["y"][valid_samples], n_timesteps
                            )
                            if format_classes:
                                class_labels = np.array(
                                    [f"Class {c}" for c in classes_rep]
                                )
                            else:
                                class_labels = classes_rep

                            # Create DataFrame
                            df_comp = pd.DataFrame(
                                {
                                    "time": time_idx_rep,
                                    "value": comp_array.flatten(),
                                    "class": class_labels,
                                    "sample": sample_idx_rep,
                                    "component": component_name,
                                    "feature": None,
                                    "dim": dim_idx,
                                }
                            )

                            dfs.append(df_comp)

            # Process features - features need special handling since they're stored in a dict
            if "features" in components_to_include:
                feature_dfs = []

                for idx in sample_indices:
                    comp = dataset["components"][idx]
                    if hasattr(comp, "features") and comp.features:
                        for feature_name, feature_values in comp.features.items():
                            # Extract the dimension from the feature name;
                            # default to dimension 0 for backward compatibility
                            if "_dim" in feature_name:
                                dim_idx = int(feature_name.split("_dim")[-1])
                            else:
                                dim_idx = 0
                            if dim_idx not in dimensions:
                                continue

                            # Get class label
                            class_label = dataset["y"][idx]
                            if format_classes:
                                class_str = f"Class {class_label}"
                            else:
                                class_str = class_label

                            # Create feature DataFrame
                            df_feature = pd.DataFrame(
                                {
                                    "time": np.arange(len(feature_values)),
                                    "value": feature_values,
                                    "class": class_str,
                                    "sample": idx,
                                    "component": "features",
                                    "feature": feature_name,
                                    "dim": dim_idx,
                                }
                            )

                            feature_dfs.append(df_feature)

                if feature_dfs:
                    dfs.append(pd.concat(feature_dfs, ignore_index=True))

        # Combine all DataFrames
        if not dfs:
            return pd.DataFrame()

        df = pd.concat(dfs, ignore_index=True)

        # Set up categorical variables for ordered plotting
        components_present = [
            c for c in components_to_include if c in df["component"].unique()
        ]
        df["component"] = pd.Categorical(
            df["component"], categories=components_present, ordered=True
        )

        if format_classes:
            class_labels = sorted(
                df["class"].unique(), key=lambda x: int(x.split()[-1])
            )
            df["class"] = pd.Categorical(
                df["class"], categories=class_labels, ordered=True
            )

        return df
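The long format produced by `to_df()` can be illustrated on a tiny array. A minimal sketch assuming a hypothetical 2-sample, 4-timestep, univariate dataset in channels_last layout:

```python
import numpy as np
import pandas as pd

X = np.arange(8).reshape(2, 4, 1)   # [n_samples, n_timesteps, n_dimensions]
y = np.array([0, 1])
n_samples, n_timesteps, _ = X.shape

# One row per (sample, timestep): repeat sample metadata, tile the time axis
df = pd.DataFrame({
    "time": np.tile(np.arange(n_timesteps), n_samples),
    "value": X[:, :, 0].flatten(),
    "class": np.repeat(y, n_timesteps),
    "sample": np.repeat(np.arange(n_samples), n_timesteps),
    "component": "aggregated",
    "dim": 0,
})
```

This repeat/tile pattern is the same vectorized construction the method uses for the aggregated series, and the result plugs directly into Seaborn's `lineplot(data=df, x="time", y="value", hue="class")`.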

    def _add_vector_handling_nans(
        self, base: np.ndarray, to_add: np.ndarray
    ) -> np.ndarray:
        """Add two vectors while properly handling NaN values.

        Special handling of NaN values during vector addition:
        1. Where both vectors have values (not NaN): Normal addition
        2. Where one vector has NaN: Use the non-NaN value
        3. Where both have NaN: Result remains NaN

        This allows components to only contribute where they're defined.

        Args:
            base (np.ndarray): Base vector to add to.
            to_add (np.ndarray): Vector to add to the base.

        Returns:
            np.ndarray: Combined vector with NaNs handled according to the rules above.
        """
        # Stack arrays and use nansum for element-wise addition that ignores NaNs
        result = np.nansum(np.stack([base, to_add]), axis=0)

        # Fix case where both values are NaN (nansum would return 0, but we want NaN)
        both_nan = np.isnan(base) & np.isnan(to_add)
        result[both_nan] = np.nan

        return result
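The three NaN-handling rules can be verified directly with a standalone copy of the helper:

```python
import numpy as np

def add_handling_nans(base, to_add):
    # nansum treats NaN as 0 during the element-wise addition...
    result = np.nansum(np.stack([base, to_add]), axis=0)
    # ...so restore NaN where *both* inputs were NaN
    both_nan = np.isnan(base) & np.isnan(to_add)
    result[both_nan] = np.nan
    return result

out = add_handling_nans(
    np.array([1.0, np.nan, np.nan]),   # base
    np.array([2.0, 3.0, np.nan]),      # to_add
)
# out -> [3.0, 3.0, nan]: both defined, one defined, neither defined
```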

    @staticmethod
    def convert_data_format(
        dataset: Dict[str, Any], target_format: str
    ) -> Dict[str, Any]:
        """Convert an existing dataset between 'channels_first' and 'channels_last' formats.

        This utility function helps convert datasets between the two supported tensor layouts:
        - 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
        - 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

        Args:
            dataset (Dict[str, Any]): Dataset dictionary returned by build().
            target_format (str): Target format, either 'channels_first' or 'channels_last'.

        Returns:
            Dict[str, Any]: Dataset with X tensor in the target format. The metadata
                is updated to reflect the new format.

        Raises:
            ValueError: If target_format is not one of ['channels_first', 'channels_last'].
            ValueError: If the dataset has no format information in metadata and the
                format cannot be inferred from the data shape.
        """
        # Validate format
        if target_format not in ["channels_first", "channels_last"]:
            raise ValueError(
                "target_format must be one of ['channels_first', 'channels_last']"
            )

        # Create a shallow copy of the dataset
        result = dataset.copy()

        # Get current format from metadata
        if "metadata" not in dataset or "data_format" not in dataset["metadata"]:
            # Try to infer format
            if "X" in dataset and len(dataset["X"].shape) == 3:
                # Assume original format for backward compatibility
                current_format = "channels_last"
            else:
                raise ValueError("Dataset doesn't have format information in metadata")
        else:
            current_format = dataset["metadata"]["data_format"]

        # If already in target format, return dataset as-is
        if current_format == target_format:
            return result

        # Convert by transposing the last two axes; both directions use the
        # same axis swap, and the formats are guaranteed to differ here
        if "X" in result:
            result["X"] = np.transpose(result["X"], (0, 2, 1))

            # Also convert train/test splits if they exist
            if "X_train" in result:
                result["X_train"] = np.transpose(result["X_train"], (0, 2, 1))

            if "X_test" in result:
                result["X_test"] = np.transpose(result["X_test"], (0, 2, 1))

        # Update metadata
        if "metadata" in result:
            result["metadata"] = result["metadata"].copy()
            result["metadata"]["data_format"] = target_format

        return result

    def clone(
        self,
        n_timesteps: Optional[int] = None,
        n_samples: Optional[int] = None,
        n_dimensions: Optional[int] = None,
        normalization: Optional[str] = None,
        random_state: Optional[int] = None,
        normalization_kwargs: Optional[Dict[str, Any]] = None,
        feature_fill_value: Optional[Any] = None,
        background_fill_value: Optional[Any] = None,
        data_format: Optional[str] = None,
    ) -> "TimeSeriesBuilder":
        """Create a new builder with the same class definitions but different parameters.

        This method creates an independent copy of the builder with all its class
        definitions but allows overriding specific parameters. This is particularly
        useful for generating train/test/validation splits with the same underlying
        patterns but different sample counts or random seeds.

        Args:
            n_timesteps: New length of each time series. Defaults to original value.
            n_samples: New number of samples to generate. Defaults to original value.
            n_dimensions: New number of dimensions. Defaults to original value.
            normalization: New normalization method. Defaults to original value.
            random_state: New random seed for reproducibility. Defaults to original value.
            normalization_kwargs: New normalization parameters. Defaults to original value.
            feature_fill_value: New value for non-existent features. Defaults to original value.
            background_fill_value: New value for background. Defaults to original value.
            data_format: New data format ('channels_first' or 'channels_last'). Defaults to original value.

        Returns:
            TimeSeriesBuilder: A new independent builder with copied class definitions
            and potentially updated parameters.

        Example:
            ```python
            # Create base builder with class definitions
            base_builder = (
                TimeSeriesBuilder(n_timesteps=100, random_state=42)
                .for_class(0)
                .add_signal(random_walk(step_size=0.2))
                .for_class(1)
                .add_signal(random_walk(step_size=0.2))
                .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
            )

            # Generate train dataset with 140 samples
            train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

            # Generate test dataset with 60 samples and a different random seed
            test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
            ```
        """
        # Prepare parameters with defaults from current instance when not provided
        params = {
            "n_timesteps": n_timesteps if n_timesteps is not None else self.n_timesteps,
            "n_samples": n_samples if n_samples is not None else self.n_samples,
            "n_dimensions": n_dimensions
            if n_dimensions is not None
            else self.n_dimensions,
            "normalization": normalization
            if normalization is not None
            else self.normalization,
            "random_state": random_state
            if random_state is not None
            else self.random_state,
            "normalization_kwargs": (
                normalization_kwargs
                if normalization_kwargs is not None
                else copy.deepcopy(self.normalization_kwargs)
            ),
            "feature_fill_value": feature_fill_value
            if feature_fill_value is not None
            else self.feature_fill_value,
            "background_fill_value": background_fill_value
            if background_fill_value is not None
            else self.background_fill_value,
            "data_format": data_format if data_format is not None else self.data_format,
        }
        # Create new builder with updated parameters
        new_builder = TimeSeriesBuilder(**params)

        # Copy class definitions (deep copy to ensure complete independence)
        new_builder.class_definitions = copy.deepcopy(self.class_definitions)

        # Set current class if one was selected in the original builder
        if self.current_class is not None:
            # Find the class label of the current class
            for i, class_def in enumerate(self.class_definitions):
                if class_def is self.current_class:
                    new_builder.current_class = new_builder.class_definitions[i]
                    break

        return new_builder

    def to_config(self) -> Dict[str, Any]:
        """Export the builder configuration as a dictionary.

        Converts the builder's internal state to a configuration dictionary
        that can be used with `load_builders_from_config()` or serialized to YAML.

        The output format matches what the parser expects, enabling round-trip
        conversion between Python code and configuration files.

        Returns:
            Dict[str, Any]: Configuration dictionary with builder parameters
            and class definitions.

        Example:
            ```python
            import yaml

            # Build a dataset programmatically
            builder = (
                TimeSeriesBuilder(n_timesteps=100, n_samples=200)
                .for_class(0)
                .add_signal(gaussian_noise(sigma=0.1))
                .for_class(1)
                .add_signal(gaussian_noise(sigma=0.1))
                .add_feature(peak(amplitude=1.0), start_pct=0.3, end_pct=0.6)
            )

            # Export to config dict
            config = builder.to_config()

            # Save to YAML file
            with open("config.yaml", "w") as f:
                yaml.dump({"my_dataset": config}, f)

            # Later, reload from YAML
            builders = load_builders_from_config(config_path="config.yaml")
            dataset = builders["my_dataset"].build()
            ```
        """
        # Keys that should stay at the component level, not in params
        COMPONENT_KEYS = {
            "type",
            "dimensions",
            "shared_randomness",
            "shared_location",
            "start_pct",
            "end_pct",
            "length_pct",
            "random_location",
        }

        def convert_component(comp: Dict[str, Any]) -> Dict[str, Any]:
            """Convert internal component format to config format."""
            result = {}

            # Map 'type' to 'function'
            if "type" in comp:
                result["function"] = comp["type"]

            # Extract params (everything except special keys)
            params = {k: v for k, v in comp.items() if k not in COMPONENT_KEYS}
            if params:
                result["params"] = params

            # Copy over special keys
            if "dimensions" in comp:
                result["dimensions"] = comp["dimensions"]
            if comp.get("shared_randomness"):
                result["shared_randomness"] = True
            if "shared_location" in comp and not comp.get("shared_location", True):
                result["shared_location"] = False

            # Location parameters
            if comp.get("random_location"):
                result["random_location"] = True
                if "length_pct" in comp:
                    lp = comp["length_pct"]
                    # Serialize tuples as {range: [min, max]} for YAML roundtrip fidelity
                    result["length_pct"] = (
                        {"range": list(lp)} if isinstance(lp, tuple) else lp
                    )
            elif "start_pct" in comp or "end_pct" in comp:
                if "start_pct" in comp:
                    result["start_pct"] = comp["start_pct"]
                if "end_pct" in comp:
                    result["end_pct"] = comp["end_pct"]

            return result

        # Build the config dictionary
        config: Dict[str, Any] = {
            "n_timesteps": self.n_timesteps,
            "n_samples": self.n_samples,
            "n_dimensions": self.n_dimensions,
            "normalization": self.normalization,
            "data_format": self.data_format,
        }

        # Only include optional parameters if they have non-default values
        if self.random_state is not None:
            config["random_state"] = self.random_state
        if self.normalization_kwargs:
            config["normalization_kwargs"] = self.normalization_kwargs

        # Convert class definitions
        classes = []
        for class_def in self.class_definitions:
            class_config: Dict[str, Any] = {"id": class_def["label"]}

            if class_def.get("weight", 1.0) != 1.0:
                class_config["weight"] = class_def["weight"]

            # Convert background components to signals list
            signals = []
            for comp in class_def["components"].get("background", []):
                signals.append(convert_component(comp))

            if signals:
                class_config["signals"] = signals

            # Convert features
            features = []
            for comp in class_def["components"].get("features", []):
                features.append(convert_component(comp))

            if features:
                class_config["features"] = features

            classes.append(class_config)

        config["classes"] = classes

        return config
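The NaN-handling rule used by `_add_vector_handling_nans` above can be reproduced standalone. This sketch shows why the explicit both-NaN mask is needed: `np.nansum` alone would turn an all-NaN position into `0.0` instead of keeping it NaN.

```python
import numpy as np

def add_handling_nans(base: np.ndarray, to_add: np.ndarray) -> np.ndarray:
    """NaN-aware addition: NaN means 'no contribution', not a number."""
    result = np.nansum(np.stack([base, to_add]), axis=0)
    # np.nansum yields 0.0 where every input is NaN; restore NaN there
    result[np.isnan(base) & np.isnan(to_add)] = np.nan
    return result

base = np.array([1.0, np.nan, np.nan, 2.0])
to_add = np.array([0.5, 3.0, np.nan, np.nan])
print(add_handling_nans(base, to_add))  # values: 1.5, 3.0, nan, 2.0
```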

__init__(n_timesteps: int = 100, n_samples: int = 1000, n_dimensions: int = 1, normalization: str = 'zscore', random_state: Optional[int] = None, normalization_kwargs: Optional[Dict[str, Any]] = None, feature_fill_value: Any = np.nan, background_fill_value: Any = 0.0, data_format: str = 'channels_first')

Initialize the time series builder.

Parameters:

Name Type Description Default
n_timesteps int

Length of each time series. Default is 100.

100
n_samples int

Total number of samples to generate. Default is 1000.

1000
n_dimensions int

Number of dimensions for multivariate time series. Default is 1 (univariate).

1
normalization str

Normalization method for the final time series. Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".

'zscore'
random_state int

Seed for random number generation to ensure reproducibility.

None
normalization_kwargs dict

Additional parameters for normalization methods. For "minmax": can specify "feature_range" as tuple (min, max).

None
feature_fill_value Any

Value used for non-existent features. Default is np.nan. Using NaN makes features only appear where they're defined in visualizations.

nan
background_fill_value Any

Value used for background when none exists. Default is 0.0. Background typically affects the entire time series, so zeros represent "no contribution" rather than "doesn't exist".

0.0
data_format str

Format of the output tensor data. 'channels_last': [batch, time_steps, channels] (original XAITimeSynth format) 'channels_first': [batch, channels, time_steps] (PyTorch/tsai format) Default is 'channels_first'.

'channels_first'

Raises:

Type Description
ValueError

If n_dimensions is less than 1.

ValueError

If data_format is not one of ['channels_first', 'channels_last'].

Source code in xaitimesynth/builder.py
def __init__(
    self,
    n_timesteps: int = 100,
    n_samples: int = 1000,
    n_dimensions: int = 1,
    normalization: str = "zscore",
    random_state: Optional[int] = None,
    normalization_kwargs: Optional[Dict[str, Any]] = None,
    feature_fill_value: Any = np.nan,
    background_fill_value: Any = 0.0,
    data_format: str = "channels_first",
):
    """Initialize the time series builder.

    Args:
        n_timesteps (int): Length of each time series. Default is 100.
        n_samples (int): Total number of samples to generate. Default is 1000.
        n_dimensions (int): Number of dimensions for multivariate time series. Default is 1 (univariate).
        normalization (str): Normalization method for the final time series.
            Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".
        random_state (int, optional): Seed for random number generation to ensure reproducibility.
        normalization_kwargs (dict, optional): Additional parameters for normalization methods.
            For "minmax": can specify "feature_range" as tuple (min, max).
        feature_fill_value: Value used for non-existent features. Default is np.nan.
            Using NaN makes features only appear where they're defined in visualizations.
        background_fill_value: Value used for background when none exists. Default is 0.0.
            Background typically affects the entire time series, so zeros represent
            "no contribution" rather than "doesn't exist".
        data_format (str): Format of the output tensor data.
            'channels_last': [batch, time_steps, channels] (original XAITimeSynth format)
            'channels_first': [batch, channels, time_steps] (PyTorch/tsai format)
            Default is 'channels_first'.

    Raises:
        ValueError: If n_dimensions is less than 1.
        ValueError: If data_format is not one of ['channels_first', 'channels_last'].
    """
    self.n_timesteps = n_timesteps
    self.n_samples = n_samples
    self.n_dimensions = n_dimensions

    # Validate n_dimensions
    if n_dimensions < 1:
        raise ValueError("n_dimensions must be at least 1")

    # Validate data_format
    if data_format not in ["channels_first", "channels_last"]:
        raise ValueError(
            "data_format must be one of ['channels_first', 'channels_last']"
        )
    self.data_format = data_format

    self.normalization = normalization
    self.normalization_kwargs = normalization_kwargs or {}
    self.random_state = random_state
    self.rng = np.random.RandomState(random_state)
    self.feature_fill_value = feature_fill_value
    self.background_fill_value = background_fill_value

    # Initialize class definitions and the current class
    self.class_definitions = []
    self.current_class = None
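The two `data_format` layouts differ only by a transpose of the last two axes, which is why `convert_data_format` applies the same `np.transpose(..., (0, 2, 1))` in both directions. A quick standalone check:

```python
import numpy as np

X_last = np.zeros((32, 100, 3))            # [batch, time_steps, channels]
X_first = np.transpose(X_last, (0, 2, 1))  # [batch, channels, time_steps]

print(X_first.shape)  # (32, 3, 100)
# The same axis swap converts back, so the operation is its own inverse
assert np.transpose(X_first, (0, 2, 1)).shape == X_last.shape
```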

for_class(class_label: int, weight: float = 1.0) -> TimeSeriesBuilder

Set the current class for component assignment.

Creates a new class definition and makes it the target for subsequent component additions. Multiple calls create multiple classes for classification tasks.

Parameters:

Name Type Description Default
class_label int

Integer label for the class, used as the target value.

required
weight float

Relative weight of this class in the dataset. Controls the class distribution in the generated dataset. Default is 1.0.

1.0

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Source code in xaitimesynth/builder.py
def for_class(self, class_label: int, weight: float = 1.0) -> "TimeSeriesBuilder":
    """Set the current class for component assignment.

    Creates a new class definition and makes it the target for subsequent component additions.
    Multiple calls create multiple classes for classification tasks.

    Args:
        class_label (int): Integer label for the class, used as the target value.
        weight (float): Relative weight of this class in the dataset. Controls the
            class distribution in the generated dataset. Default is 1.0.

    Returns:
        TimeSeriesBuilder: Self for method chaining.
    """
    # Create a new class definition
    class_def = {
        "label": class_label,
        "weight": weight,
        "components": {"background": [], "features": []},
    }

    self.class_definitions.append(class_def)
    self.current_class = class_def

    return self
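Class `weight` values control the class distribution of the generated dataset. A plausible sketch of how relative weights could map to per-class sample counts; the exact allocation inside `build()` is not shown in this excerpt, so proportional flooring with remainder distribution is an assumption here:

```python
import numpy as np

def class_counts(weights, n_samples):
    """Allocate n_samples across classes proportionally to their weights."""
    w = np.asarray(weights, dtype=float)
    counts = np.floor(w / w.sum() * n_samples).astype(int)
    # Hand out any remainder from flooring to the highest-weight classes
    for i in np.argsort(-w)[: n_samples - counts.sum()]:
        counts[i] += 1
    return counts

print(class_counts([1.0, 1.0, 2.0], 100))  # 25, 25, 50 samples
```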

add_signal(component: Dict[str, Any], dim: Optional[List[int]] = None, shared_randomness: bool = False, start_pct: Optional[float] = None, end_pct: Optional[float] = None, length_pct: Optional[float] = None, random_location: bool = False, shared_location: bool = True) -> TimeSeriesBuilder

Add a signal component to the current class.

Signals form the background structure of the time series (e.g., random walks, gaussian noise, trends). All signals are added to the background component.

Default behavior: When no location parameters are specified (start_pct, end_pct, length_pct all None and random_location=False), the signal spans the entire time series length.

Segment mode: To apply a signal to only part of the time series, either: - Specify start_pct and end_pct for a fixed segment, or - Set random_location=True with length_pct for a randomly positioned segment.

Parameters:

Name Type Description Default
component Dict[str, Any]

Component definition dictionary with 'type' and parameters.

required
dim List[int]

List of dimension indices where this signal should be applied. If None, the signal will be added to all dimensions. Default is None.

None
shared_randomness bool

If True, the same random pattern will be used across all specified dimensions. If False, each dimension gets its own random pattern (for stochastic components). Default is False.

False
start_pct float

Start position as percentage of time series length (0-1). Required together with end_pct for a fixed segment.

None
end_pct float

End position as percentage of time series length (0-1). Required together with start_pct for a fixed segment.

None
length_pct float

Length of signal as percentage of time series length (0-1). Required when random_location is True.

None
random_location bool

Whether to place the signal at a random location. Requires length_pct. Default is False.

False
shared_location bool

If True and random_location is True, the same random location will be used across all dimensions. If False, each dimension gets its own random location. Default is True.

True

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no class is selected or if location parameters are inconsistent.

Examples:

Full time series (default - no location params)

builder.add_signal(gaussian_noise(sigma=0.1))

Fixed segment from 20% to 50% of the series

builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

Random segment of 30% length

builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)

Source code in xaitimesynth/builder.py
def add_signal(
    self,
    component: Dict[str, Any],
    dim: Optional[List[int]] = None,
    shared_randomness: bool = False,
    start_pct: Optional[float] = None,
    end_pct: Optional[float] = None,
    length_pct: Optional[float] = None,
    random_location: bool = False,
    shared_location: bool = True,
) -> "TimeSeriesBuilder":
    """Add a signal component to the current class.

    Signals form the background structure of the time series (e.g., random walks,
    gaussian noise, trends). All signals are added to the background component.

    Default behavior: When no location parameters are specified (start_pct, end_pct,
    length_pct all None and random_location=False), the signal spans the entire time
    series length.

    Segment mode: To apply a signal to only part of the time series, either:
    - Specify start_pct and end_pct for a fixed segment, or
    - Set random_location=True with length_pct for a randomly positioned segment.

    Args:
        component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
        dim (List[int]): List of dimension indices where this signal should be applied.
            If None, the signal will be added to all dimensions. Default is None.
        shared_randomness (bool): If True, the same random pattern will be used across all
            specified dimensions. If False, each dimension gets its own random pattern
            (for stochastic components). Default is False.
        start_pct (float, optional): Start position as percentage of time series length (0-1).
            Required together with end_pct for a fixed segment.
        end_pct (float, optional): End position as percentage of time series length (0-1).
            Required together with start_pct for a fixed segment.
        length_pct (float, optional): Length of signal as percentage of time series length (0-1).
            Required when random_location is True.
        random_location (bool): Whether to place the signal at a random location.
            Requires length_pct. Default is False.
        shared_location (bool): If True and random_location is True, the same random
            location will be used across all dimensions. If False, each dimension gets
            its own random location. Default is True.

    Returns:
        TimeSeriesBuilder: Self for method chaining.

    Raises:
        ValueError: If no class is selected or if location parameters are inconsistent.

    Examples:
        # Full time series (default - no location params)
        builder.add_signal(gaussian_noise(sigma=0.1))

        # Fixed segment from 20% to 50% of the series
        builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

        # Random segment of 30% length
        builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)
    """
    if self.current_class is None:
        raise ValueError("No class selected. Call for_class() first.")

    if dim is None:
        dim = list(range(self.n_dimensions))
    self._validate_dimensions(dim)

    # Determine if this is a segment or full-series signal
    has_time_range = (
        start_pct is not None
        or end_pct is not None
        or length_pct is not None
        or random_location
    )

    # Validate location parameters based on mode
    if has_time_range:
        if random_location:
            if length_pct is None:
                raise ValueError(
                    "length_pct must be provided when random_location is True"
                )
            if not (0 < length_pct <= 1):
                raise ValueError("length_pct must be between 0 and 1")
        else:
            # Fixed segment mode - requires both start_pct and end_pct
            if start_pct is None or end_pct is None:
                raise ValueError(
                    "Both start_pct and end_pct must be provided for a fixed segment"
                )
            if not (
                0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct
            ):
                raise ValueError(
                    "Invalid start_pct or end_pct. Must be between 0 and 1, "
                    "with start_pct < end_pct"
                )

    # Build the component definition
    component_with_params = component.copy()

    if has_time_range:
        if random_location:
            component_with_params["random_location"] = True
            component_with_params["length_pct"] = length_pct
            component_with_params["shared_location"] = shared_location
        else:
            component_with_params["random_location"] = False
            component_with_params["start_pct"] = start_pct
            component_with_params["end_pct"] = end_pct

    # Add dimensions and randomness settings
    # Use single component when sharing location/randomness or single dimension
    if (
        (has_time_range and shared_location and random_location)
        or shared_randomness
        or len(dim) == 1
    ):
        component_with_params["dimensions"] = dim
        component_with_params["shared_randomness"] = shared_randomness
        component_with_params["shared_location"] = shared_location
        self.current_class["components"]["background"].append(component_with_params)
    else:
        # Create separate component entries for each dimension
        for d in dim:
            component_with_dim = component_with_params.copy()
            component_with_dim["dimensions"] = [d]
            component_with_dim["shared_randomness"] = shared_randomness
            component_with_dim["shared_location"] = shared_location
            self.current_class["components"]["background"].append(
                component_with_dim
            )

    return self
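A fixed segment is specified in relative terms. A sketch of how `start_pct`/`end_pct` might map to index ranges; the rounding convention is an assumption, since the generator's placement code is not shown in this excerpt:

```python
import numpy as np

def segment_indices(n_timesteps: int, start_pct: float, end_pct: float):
    """Map relative positions to a half-open index range [start, end)."""
    start = int(round(start_pct * n_timesteps))
    end = int(round(end_pct * n_timesteps))
    return start, end

# Place a constant signal on 20%-50% of a 100-step series
n = 100
start, end = segment_indices(n, 0.2, 0.5)
signal = np.zeros(n)
signal[start:end] = 1.0
print(start, end, signal.sum())  # 20 50 30.0
```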

add_feature(component: Dict[str, Any], start_pct: Optional[float] = None, end_pct: Optional[float] = None, length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None, random_location: bool = False, dim: Optional[List[int]] = None, shared_location: bool = True, shared_randomness: bool = False) -> TimeSeriesBuilder

Add a feature component to the current class.

Features are distinctive patterns that can differentiate between classes. They can be placed at fixed or random locations within the time series.

Parameters:

Name Type Description Default
component Dict[str, Any]

Component definition dictionary with 'type' and parameters.

required
start_pct float

Start position as percentage of time series length (0-1). Required when random_location is False.

None
end_pct float

End position as percentage of time series length (0-1). Required when random_location is False.

None
length_pct float | tuple | list

Length of feature as percentage of time series length. Required when random_location is True. Three forms accepted: - float: fixed length, e.g. 0.5 - tuple (min, max): sample uniformly per sample in range, e.g. (0.25, 0.75) - list of floats: sample from discrete choices per sample, e.g. [0.25, 0.5]

None
random_location bool

Whether to place the feature at a random location. Default is False (fixed position).

False
dim List[int]

List of dimension indices where this feature should be applied. If None, the feature will be added to all dimensions. Default is None.

None
shared_location bool

If True and random_location is True, the same random location will be used across all dimensions. If False, each dimension gets its own random location. Default is True.

True
shared_randomness bool

If True, the same random pattern will be used across all dimensions. If False, each dimension gets its own random pattern (for stochastic components). Default is False.

False

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no class is selected or if location parameters are invalid.

Source code in xaitimesynth/builder.py
def add_feature(
    self,
    component: Dict[str, Any],
    start_pct: Optional[float] = None,
    end_pct: Optional[float] = None,
    length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None,
    random_location: bool = False,
    dim: Optional[List[int]] = None,
    shared_location: bool = True,
    shared_randomness: bool = False,
) -> "TimeSeriesBuilder":
    """Add a feature component to the current class.

    Features are distinctive patterns that can differentiate between classes.
    They can be placed at fixed or random locations within the time series.

    Args:
        component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
        start_pct (float, optional): Start position as percentage of time series length (0-1).
            Required when random_location is False.
        end_pct (float, optional): End position as percentage of time series length (0-1).
            Required when random_location is False.
        length_pct (float | tuple | list, optional): Length of feature as percentage of time
            series length. Required when random_location is True. Three forms accepted:
            - float: fixed length, e.g. ``0.5``
            - tuple (min, max): sample uniformly per sample in range, e.g. ``(0.25, 0.75)``
            - list of floats: sample from discrete choices per sample, e.g. ``[0.25, 0.5]``
        random_location (bool): Whether to place the feature at a random location.
            Default is False (fixed position).
        dim (List[int]): List of dimension indices where this feature should be applied.
            If None, the feature will be added to all dimensions. Default is None.
        shared_location (bool): If True and random_location is True, the same random
            location will be used across all dimensions. If False, each dimension gets
            its own random location. Default is True.
        shared_randomness (bool): If True, the same random pattern will be used across
            all dimensions. If False, each dimension gets its own random pattern
            (for stochastic components). Default is False.

    Returns:
        TimeSeriesBuilder: Self for method chaining.

    Raises:
        ValueError: If no class is selected or if location parameters are invalid.
    """
    if self.current_class is None:
        raise ValueError("No class selected. Call for_class() first.")

    if dim is None:
        dim = list(range(self.n_dimensions))
    self._validate_dimensions(dim)

    # Create feature definition
    feature_def = component.copy()

    # Add location parameters
    if random_location:
        if length_pct is None:
            raise ValueError(
                "length_pct must be provided when random_location is True"
            )
        if isinstance(length_pct, tuple):
            if len(length_pct) != 2 or not (0 < length_pct[0] < length_pct[1] <= 1):
                raise ValueError(
                    "length_pct tuple must be (min, max) with 0 < min < max <= 1"
                )
        elif isinstance(length_pct, list):
            if not length_pct or not all(0 < v <= 1 for v in length_pct):
                raise ValueError(
                    "length_pct list must be non-empty with all values in (0, 1]"
                )
        else:
            if not (0 < length_pct <= 1):
                raise ValueError("length_pct must be in (0, 1]")

        feature_def["random_location"] = True
        feature_def["length_pct"] = length_pct
    else:
        if start_pct is None or end_pct is None:
            raise ValueError(
                "start_pct and end_pct must be provided when random_location is False"
            )
        if not (0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct):
            raise ValueError(
                "Invalid start_pct or end_pct. Must be between 0 and 1, with start_pct < end_pct"
            )

        feature_def["random_location"] = False
        feature_def["start_pct"] = start_pct
        feature_def["end_pct"] = end_pct

    # Keep a single multi-dimension entry when the location/randomness is shared
    # (or only one dimension is targeted); otherwise split per dimension below
    if (shared_location and random_location) or shared_randomness or len(dim) == 1:
        feature_def["dimensions"] = dim
        feature_def["shared_location"] = shared_location
        feature_def["shared_randomness"] = shared_randomness
        self.current_class["components"]["features"].append(feature_def)
    else:
        # Create separate feature entries for each dimension when not sharing
        for d in dim:
            feature_single_dim = feature_def.copy()
            feature_single_dim["dimensions"] = [d]  # Single dimension
            feature_single_dim["shared_location"] = shared_location
            feature_single_dim["shared_randomness"] = shared_randomness
            self.current_class["components"]["features"].append(feature_single_dim)

    return self
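
The three accepted `length_pct` forms (fixed float, `(min, max)` tuple, list of discrete choices) can be resolved to a concrete fraction per sample as in this standalone sketch. `resolve_length_pct` is an illustrative helper, not the builder's internal `_resolve_length_pct`, and it uses the stdlib `random` module rather than the builder's NumPy RNG:

```python
import random

def resolve_length_pct(length_pct, rng: random.Random) -> float:
    """Resolve a length_pct spec to a concrete fraction in (0, 1]."""
    if isinstance(length_pct, tuple):        # (min, max): sample uniformly
        lo, hi = length_pct
        return rng.uniform(lo, hi)
    if isinstance(length_pct, list):         # discrete choices
        return rng.choice(length_pct)
    return float(length_pct)                 # fixed value

rng = random.Random(42)
print(resolve_length_pct(0.5, rng))          # fixed -> 0.5
print(resolve_length_pct((0.25, 0.75), rng)) # uniform draw in [0.25, 0.75]
print(resolve_length_pct([0.25, 0.5], rng))  # either 0.25 or 0.5
```

The resolved fraction is then multiplied by `n_timesteps` (with a minimum of one step) to get the feature length.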

build(return_components: bool = True, deterministic_class_counts: bool = True, shuffle: bool = True) -> Dict[str, Any]

Build the dataset based on the configured class definitions.

Generates time series data by combining all components for each class according to the specified parameters, with options to include component vectors and create a train/test split.

Parameters:

Name Type Description Default
return_components bool

Whether to return the individual component vectors. Useful for visualization and analysis. Default is True.

True
deterministic_class_counts bool

If True, class counts will be determined exactly by the weights rather than using multinomial sampling. This ensures exact class proportions. Default is True.

True
shuffle bool

Whether to shuffle the samples across classes. If True (default), samples will be randomly ordered. If False, samples will be grouped by class in the order classes were defined.

True

Returns:

Type Description
Dict[str, Any]

Dict[str, Any]: Dictionary containing the generated dataset with keys:

- 'X': Time series data with shape determined by data_format:
    - 'channels_last': [n_samples, n_timesteps, n_dimensions]
    - 'channels_first': [n_samples, n_dimensions, n_timesteps]
- 'y': Class labels for each sample
- 'feature_masks': Boolean masks showing feature locations
- 'metadata': Dataset configuration information
- 'components': Individual component vectors (if return_components=True)

If train_test_split is provided, also includes:

- 'X_train', 'y_train': Training data
- 'X_test', 'y_test': Testing data

Raises:

Type Description
ValueError

If no class definitions have been provided.
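
The `deterministic_class_counts` option floors each `weight * n_samples` and then hands the leftover samples to the classes with the largest fractional parts, so the counts sum to exactly `n_samples`. A plain-Python sketch of that rounding scheme (mirroring the logic in the source below, without NumPy):

```python
def deterministic_class_counts(weights, n_samples):
    """Floor weight*n_samples per class, then distribute the remainder to
    the classes with the largest fractional parts."""
    total = sum(weights)
    raw = [w / total * n_samples for w in weights]
    counts = [int(r) for r in raw]  # floor (weights are non-negative)
    remaining = n_samples - sum(counts)
    # class indices ordered by fractional part, largest last
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i])
    for i in (order[-remaining:] if remaining > 0 else []):
        counts[i] += 1
    return counts

print(deterministic_class_counts([1, 1, 1], 100))  # [33, 33, 34]
print(deterministic_class_counts([2, 3], 10))      # [4, 6]
```

With `deterministic_class_counts=False`, the counts are instead drawn from a multinomial distribution, so proportions only match the weights in expectation.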

Source code in xaitimesynth/builder.py
def build(
    self,
    return_components: bool = True,
    deterministic_class_counts: bool = True,
    shuffle: bool = True,
) -> Dict[str, Any]:
    """Build the dataset based on the configured class definitions.

    Generates time series data by combining all components for each class according
    to the specified parameters, with options to include component vectors and
    create a train/test split.

    Args:
        return_components (bool): Whether to return the individual component vectors.
            Useful for visualization and analysis. Default is True.
        deterministic_class_counts (bool): If True, class counts will be determined exactly
            by the weights rather than using multinomial sampling. This ensures exact class
            proportions. Default is True.
        shuffle (bool): Whether to shuffle the samples across classes. If True (default),
            samples will be randomly ordered. If False, samples will be grouped by class
            in the order classes were defined.

    Returns:
        Dict[str, Any]: Dictionary containing the generated dataset with keys:
            - 'X': Time series data with shape determined by data_format:
                   - 'channels_last': [n_samples, n_timesteps, n_dimensions]
                   - 'channels_first': [n_samples, n_dimensions, n_timesteps]
            - 'y': Class labels for each sample
            - 'feature_masks': Boolean masks showing feature locations
            - 'metadata': Dataset configuration information
            - 'components': Individual component vectors (if return_components=True)
            If train_test_split is provided, also includes:
            - 'X_train', 'y_train': Training data
            - 'X_test', 'y_test': Testing data

    Raises:
        ValueError: If no class definitions have been provided.
    """
    if not self.class_definitions:
        raise ValueError(
            "No class definitions provided. Call for_class() at least once."
        )

    # Normalize class weights and determine class distribution
    weights = np.array([cd["weight"] for cd in self.class_definitions])
    weights = weights / weights.sum()

    if deterministic_class_counts:
        # Deterministic class counts based on exact weights
        raw_counts = weights * self.n_samples
        # Floor to integers, then top up below so counts sum to exactly n_samples
        class_counts = np.floor(raw_counts).astype(int)
        remaining = self.n_samples - class_counts.sum()
        # Distribute remaining samples based on fractional parts
        if remaining > 0:
            fractions = raw_counts - class_counts
            indices = np.argsort(fractions)[-remaining:]
            for idx in indices:
                class_counts[idx] += 1
    else:
        # Probabilistic class counts using multinomial sampling
        class_counts = self.rng.multinomial(self.n_samples, weights)

    # Initialize arrays - always create in channels_last format first (internal format)
    X = np.zeros((self.n_samples, self.n_timesteps, self.n_dimensions))
    y = np.zeros(self.n_samples, dtype=int)
    all_components = []
    feature_masks = {}

    # Generate data for each class
    sample_idx = 0
    for class_def, count in zip(self.class_definitions, class_counts):
        class_label = class_def["label"]

        for _ in range(count):
            # Initialize arrays for this sample with appropriate fill values per dimension
            background = np.full(
                (self.n_timesteps, self.n_dimensions), self.background_fill_value
            )
            features_dict = {}
            feature_masks_dict = {}

            # Add base structure components
            for base_def in class_def["components"]["background"]:
                # For signals with time range parameters, generate random location once if shared
                if "random_location" in base_def and base_def["random_location"]:
                    # Determine signal length
                    length_pct = base_def["length_pct"]
                    signal_length = max(1, int(length_pct * self.n_timesteps))
                    max_start = self.n_timesteps - signal_length

                    # If shared_location is True, generate the location once for all dimensions
                    shared_location = base_def.get("shared_location", True)
                    if shared_location:
                        shared_start_idx = self.rng.randint(0, max_start + 1)
                        shared_end_idx = shared_start_idx + signal_length

                    # Apply to specified dimensions with appropriate location handling
                    for i, dim_idx in enumerate(base_def["dimensions"]):
                        # Create a full-length vector filled with the background fill value
                        base_vector = np.full(
                            self.n_timesteps, self.background_fill_value
                        )

                        # Determine signal location - possibly unique per dimension
                        if shared_location:
                            # Use the shared location for all dimensions
                            start_idx = shared_start_idx
                            end_idx = shared_end_idx
                        else:
                            # Create a unique location for each dimension
                            dim_rng = np.random.RandomState(
                                self.rng.randint(0, 2**32 - 1)
                            )
                            start_idx = dim_rng.randint(0, max_start + 1)
                            end_idx = start_idx + signal_length

                        # Calculate the actual length of the signal segment
                        signal_length = end_idx - start_idx

                        # Prepare parameters for component generation
                        signal_params = base_def.copy()
                        signal_type = signal_params.pop("type")

                        # Remove location and dimension parameters
                        signal_params.pop("random_location", None)
                        signal_params.pop("length_pct", None)
                        signal_params.pop("shared_location", None)
                        signal_params.pop("dimensions", None)
                        signal_params.pop("shared_randomness", None)

                        # Generate the component only for the specified length
                        signal_values = generate_component(
                            signal_type, signal_length, self.rng, **signal_params
                        )

                        # Place the signal in the correct location
                        base_vector[start_idx:end_idx] = signal_values

                        # Add to background for this dimension
                        background[:, dim_idx] = self._add_vector_handling_nans(
                            background[:, dim_idx], base_vector
                        )
                else:
                    # Handle non-random location signals (the original behavior)
                    if "random_location" in base_def:
                        # Fixed location signal
                        base_vector = np.full(
                            self.n_timesteps, self.background_fill_value
                        )

                        start_pct = base_def["start_pct"]
                        end_pct = base_def["end_pct"]
                        start_idx = int(start_pct * self.n_timesteps)
                        end_idx = int(end_pct * self.n_timesteps)

                        # Ensure at least one timestep is selected
                        if start_idx == end_idx:
                            end_idx = start_idx + 1

                        signal_length = end_idx - start_idx

                        # Generate the component only for the specified length
                        signal_params = base_def.copy()
                        signal_type = signal_params.pop("type")

                        # Remove location parameters
                        signal_params.pop("random_location", None)
                        signal_params.pop("start_pct", None)
                        signal_params.pop("end_pct", None)
                        signal_params.pop("dimensions", None)
                        signal_params.pop("shared_randomness", None)

                        signal_values = generate_component(
                            signal_type, signal_length, self.rng, **signal_params
                        )

                        base_vector[start_idx:end_idx] = signal_values
                    else:
                        # Full-length signal (original behavior)
                        base_vector = self._generate_component_vector(base_def)

                    # Apply to all specified dimensions with the same signal
                    for dim_idx in base_def["dimensions"]:
                        background[:, dim_idx] = self._add_vector_handling_nans(
                            background[:, dim_idx], base_vector
                        )

            # Initialize aggregated time series
            aggregated = background.copy()

            # Add features
            for feature_idx, feature_def in enumerate(
                class_def["components"]["features"]
            ):
                # For each dimension in the feature
                feature_dims = feature_def["dimensions"]

                # Generate a shared random location once if needed
                shared_location_cache = None
                if feature_def.get("random_location", False) and feature_def.get(
                    "shared_location", True
                ):
                    # Pre-calculate the shared location to ensure it's the same across dimensions
                    length_pct = self._resolve_length_pct(
                        feature_def["length_pct"], self.rng
                    )
                    feature_length = max(1, int(length_pct * self.n_timesteps))
                    max_start = self.n_timesteps - feature_length
                    shared_start_idx = self.rng.randint(0, max_start + 1)
                    shared_end_idx = shared_start_idx + feature_length
                    shared_location_cache = (shared_start_idx, shared_end_idx)

                for i, dim_idx in enumerate(feature_dims):
                    # Generate feature vector - if shared_location is True and we have a cached location,
                    # pass it; otherwise pass the dimension index for unique locations
                    dim_index = (
                        None
                        if feature_def.get("shared_location", True)
                        else dim_idx
                    )
                    feature, mask = self._generate_feature_vector(
                        feature_def, dim_index, shared_location_cache
                    )

                    # Add to aggregated series for this dimension
                    aggregated[:, dim_idx] = self._add_vector_handling_nans(
                        aggregated[:, dim_idx], feature
                    )

                    # Store components
                    feature_name = (
                        f"feature_{feature_idx}_{feature_def['type']}_dim{dim_idx}"
                    )
                    if feature_name not in features_dict:
                        features_dict[feature_name] = feature
                        feature_masks_dict[feature_name] = mask

                    # Add to global feature masks
                    feature_key = f"class_{class_label}_{feature_name}"
                    if feature_key not in feature_masks:
                        feature_masks[feature_key] = np.zeros(
                            (self.n_samples, self.n_timesteps), dtype=bool
                        )

                    feature_masks[feature_key][sample_idx] = mask

            # Normalize if required (apply to each dimension separately)
            for dim_idx in range(self.n_dimensions):
                aggregated[:, dim_idx] = normalize(
                    aggregated[:, dim_idx],
                    method=self.normalization,
                    **self.normalization_kwargs,
                )

            # Store the result
            X[sample_idx] = aggregated
            y[sample_idx] = class_label

            # Store components if needed
            if return_components:
                all_components.append(
                    TimeSeriesComponents(
                        background=background,
                        features=features_dict,
                        feature_masks=feature_masks_dict,
                        aggregated=aggregated,
                    )
                )

            sample_idx += 1

    # Shuffle the dataset if requested
    if shuffle:
        # Generate shuffled indices based on the random state
        indices = np.arange(self.n_samples)
        self.rng.shuffle(indices)

        # Shuffle X and y arrays
        X = X[indices]
        y = y[indices]

        # Shuffle components if they were returned
        if return_components:
            all_components = [all_components[i] for i in indices]

        # Shuffle feature masks
        for key in feature_masks:
            feature_masks[key] = feature_masks[key][indices]

    # Convert the tensor format if needed (from channels_last to channels_first)
    if self.data_format == "channels_first":
        # Transpose from [n_samples, n_timesteps, n_dimensions] to [n_samples, n_dimensions, n_timesteps]
        X = np.transpose(X, (0, 2, 1))

    # Prepare result dictionary
    result = {
        "X": X,
        "y": y,
        "feature_masks": feature_masks,
        "metadata": {
            "n_samples": self.n_samples,
            "n_timesteps": self.n_timesteps,
            "n_dimensions": self.n_dimensions,
            "class_definitions": self.class_definitions,
            "normalize": self.normalization,
            "normalization_kwargs": self.normalization_kwargs,
            "random_state": self.random_state,
            "data_format": self.data_format,
            "shuffled": shuffle,
        },
    }

    if return_components:
        result["components"] = all_components

    return result
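
When `shuffle=True`, a single permutation is applied to `X`, `y`, the components, and every feature mask so that samples stay aligned. A minimal list-based sketch of that step (`shuffle_aligned` is an illustrative helper, not part of the library):

```python
import random

def shuffle_aligned(X, y, masks, seed=0):
    """Shuffle with one shared permutation so each sample keeps its
    label and mask entries (mirrors the shuffle step in build())."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    X_s = [X[i] for i in indices]
    y_s = [y[i] for i in indices]
    masks_s = {k: [v[i] for i in indices] for k, v in masks.items()}
    return X_s, y_s, masks_s

X = [[0.1], [0.2], [0.3]]
y = [0, 1, 1]
masks = {"class_1_feat": [False, True, True]}
X_s, y_s, masks_s = shuffle_aligned(X, y, masks, seed=1)
# Every (sample, label, mask) triple survives the permutation intact.
```

The key point is that one index array drives every reordering; shuffling each array independently would break the ground-truth alignment that XAI evaluation depends on.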

clone(n_timesteps: Optional[int] = None, n_samples: Optional[int] = None, n_dimensions: Optional[int] = None, normalization: Optional[str] = None, random_state: Optional[int] = None, normalization_kwargs: Optional[Dict[str, Any]] = None, feature_fill_value: Optional[Any] = None, background_fill_value: Optional[Any] = None, data_format: Optional[str] = None) -> TimeSeriesBuilder

Create a new builder with the same class definitions but different parameters.

This method creates an independent copy of the builder with all its class definitions but allows overriding specific parameters. This is particularly useful for generating train/test/validation splits with the same underlying patterns but different sample counts or random seeds.

Parameters:

Name Type Description Default
n_timesteps Optional[int]

New length of each time series. Defaults to original value.

None
n_samples Optional[int]

New number of samples to generate. Defaults to original value.

None
n_dimensions Optional[int]

New number of dimensions. Defaults to original value.

None
normalization Optional[str]

New normalization method. Defaults to original value.

None
random_state Optional[int]

New random seed for reproducibility. Defaults to original value.

None
normalization_kwargs Optional[Dict[str, Any]]

New normalization parameters. Defaults to original value.

None
feature_fill_value Optional[Any]

New value for non-existent features. Defaults to original value.

None
background_fill_value Optional[Any]

New value for background. Defaults to original value.

None
data_format Optional[str]

New data format ('channels_first' or 'channels_last'). Defaults to original value.

None

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

A new independent builder with copied class definitions and potentially updated parameters.

Example
# Create base builder with class definitions
base_builder = (
    TimeSeriesBuilder(n_timesteps=100, random_state=42)
    .for_class(0)
    .add_signal(random_walk(step_size=0.2))
    .for_class(1)
    .add_signal(random_walk(step_size=0.2))
    .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
)

# Generate train dataset with 140 samples
train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

# Generate test dataset with 60 samples and a different random seed
test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
Source code in xaitimesynth/builder.py
def clone(
    self,
    n_timesteps: Optional[int] = None,
    n_samples: Optional[int] = None,
    n_dimensions: Optional[int] = None,
    normalization: Optional[str] = None,
    random_state: Optional[int] = None,
    normalization_kwargs: Optional[Dict[str, Any]] = None,
    feature_fill_value: Optional[Any] = None,
    background_fill_value: Optional[Any] = None,
    data_format: Optional[str] = None,
) -> "TimeSeriesBuilder":
    """Create a new builder with the same class definitions but different parameters.

    This method creates an independent copy of the builder with all its class
    definitions but allows overriding specific parameters. This is particularly
    useful for generating train/test/validation splits with the same underlying
    patterns but different sample counts or random seeds.

    Args:
        n_timesteps: New length of each time series. Defaults to original value.
        n_samples: New number of samples to generate. Defaults to original value.
        n_dimensions: New number of dimensions. Defaults to original value.
        normalization: New normalization method. Defaults to original value.
        random_state: New random seed for reproducibility. Defaults to original value.
        normalization_kwargs: New normalization parameters. Defaults to original value.
        feature_fill_value: New value for non-existent features. Defaults to original value.
        background_fill_value: New value for background. Defaults to original value.
        data_format: New data format ('channels_first' or 'channels_last'). Defaults to original value.

    Returns:
        TimeSeriesBuilder: A new independent builder with copied class definitions
        and potentially updated parameters.

    Example:
        ```python
        # Create base builder with class definitions
        base_builder = (
            TimeSeriesBuilder(n_timesteps=100, random_state=42)
            .for_class(0)
            .add_signal(random_walk(step_size=0.2))
            .for_class(1)
            .add_signal(random_walk(step_size=0.2))
            .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
        )

        # Generate train dataset with 140 samples
        train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

        # Generate test dataset with 60 samples and a different random seed
        test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
        ```
    """
    # Prepare parameters with defaults from current instance when not provided
    params = {
        "n_timesteps": n_timesteps if n_timesteps is not None else self.n_timesteps,
        "n_samples": n_samples if n_samples is not None else self.n_samples,
        "n_dimensions": n_dimensions
        if n_dimensions is not None
        else self.n_dimensions,
        "normalization": normalization
        if normalization is not None
        else self.normalization,
        "random_state": random_state
        if random_state is not None
        else self.random_state,
        "normalization_kwargs": (
            normalization_kwargs
            if normalization_kwargs is not None
            else copy.deepcopy(self.normalization_kwargs)
        ),
        "feature_fill_value": feature_fill_value
        if feature_fill_value is not None
        else self.feature_fill_value,
        "background_fill_value": background_fill_value
        if background_fill_value is not None
        else self.background_fill_value,
        "data_format": data_format if data_format is not None else self.data_format,
    }
    # Create new builder with updated parameters
    new_builder = TimeSeriesBuilder(**params)

    # Copy class definitions (deep copy to ensure complete independence)
    new_builder.class_definitions = copy.deepcopy(self.class_definitions)

    # Set current class if one was selected in the original builder
    if self.current_class is not None:
        # Find the class label of the current class
        for i, class_def in enumerate(self.class_definitions):
            if class_def is self.current_class:
                new_builder.current_class = new_builder.class_definitions[i]
                break

    return new_builder

to_df(dataset: Dict[str, Any], samples: Optional[List[int]] = None, classes: Optional[List[int]] = None, components: Optional[List[str]] = None, dimensions: Optional[List[int]] = None, format_classes: bool = False) -> pd.DataFrame

Convert time series dataset to a long-format pandas DataFrame.

Creates a DataFrame with one row per timestep per component per sample per dimension, suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

Parameters:

Name Type Description Default
dataset Dict[str, Any]

Dataset dictionary returned by build().

required
samples Optional[List[int]]

List of sample indices to include. If None, includes all samples.

None
classes Optional[List[int]]

List of class labels to include. If None, includes all classes.

None
components Optional[List[str]]

List of component types to include. Default includes all: ["aggregated", "background", "features"]

None
dimensions Optional[List[int]]

List of dimension indices to include. If None, includes all dimensions.

None
format_classes bool

If True, format class labels as "Class X". Otherwise use numeric labels. Default is False.

False

Returns:

Type Description
DataFrame

pd.DataFrame: Long-format DataFrame with columns:

- time: Timestep index
- value: Component value at that timestep
- class: Class label (formatted if format_classes=True)
- sample: Sample index
- component: Component type
- feature: Feature name (for feature components)
- dim: Dimension index

Raises:

Type Description
ValueError

If specified dimensions are out of range.
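
The long format described above (one row per timestep per sample per dimension) can be sketched without pandas. `to_long_rows` is a hypothetical helper that emits plain dicts with the documented columns, assuming channels_last data:

```python
def to_long_rows(X, y, component="aggregated"):
    """Flatten [n_samples][n_timesteps][n_dimensions] data into
    long-format rows with the columns documented for to_df()."""
    rows = []
    for s, series in enumerate(X):
        for t, values in enumerate(series):
            for d, value in enumerate(values):
                rows.append({
                    "time": t, "value": value, "class": y[s],
                    "sample": s, "component": component,
                    "feature": None, "dim": d,
                })
    return rows

# 2 samples, 3 timesteps, 1 dimension -> 6 rows
X = [[[0.0], [1.0], [2.0]], [[3.0], [4.0], [5.0]]]
rows = to_long_rows(X, y=[0, 1])
print(len(rows))  # 6
```

This row-per-observation shape is what makes the result directly usable with Seaborn's `hue`/`col` faceting or Plotly Express.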

Source code in xaitimesynth/builder.py
def to_df(
    self,
    dataset: Dict[str, Any],
    samples: Optional[List[int]] = None,
    classes: Optional[List[int]] = None,
    components: Optional[List[str]] = None,
    dimensions: Optional[List[int]] = None,
    format_classes: bool = False,
) -> pd.DataFrame:
    """Convert time series dataset to a long-format pandas DataFrame.

    Creates a DataFrame with one row per timestep per component per sample per dimension,
    suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

    Args:
        dataset (Dict[str, Any]): Dataset dictionary returned by build().
        samples (Optional[List[int]]): List of sample indices to include.
            If None, includes all samples.
        classes (Optional[List[int]]): List of class labels to include.
            If None, includes all classes.
        components (Optional[List[str]]): List of component types to include.
            Default includes all: ["aggregated", "background", "features"]
        dimensions (Optional[List[int]]): List of dimension indices to include.
            If None, includes all dimensions.
        format_classes (bool): If True, format class labels as "Class X".
            Otherwise use numeric labels. Default is False.

    Returns:
        pd.DataFrame: Long-format DataFrame with columns:
            - time: Timestep index
            - value: Component value at that timestep
            - class: Class label (formatted if format_classes=True)
            - sample: Sample index
            - component: Component type
            - feature: Feature name (for feature components)
            - dim: Dimension index

    Raises:
        ValueError: If specified dimensions are out of range.
    """
    # Default components to include (use programming-friendly names)
    default_components = ["aggregated", "background", "features"]
    components_to_include = (
        components if components is not None else default_components
    )

    # Get number of dimensions from metadata or infer from data shape
    n_dims = dataset.get("metadata", {}).get("n_dimensions", 1)
    if n_dims == 1 and len(dataset["X"].shape) == 3:
        n_dims = dataset["X"].shape[2]

    # Default dimensions to include
    if dimensions is None:
        dimensions = list(range(n_dims))
    else:
        # Validate dimensions
        for d in dimensions:
            if not 0 <= d < n_dims:
                raise ValueError(
                    f"Dimension {d} is out of range (0 to {n_dims - 1})."
                )

    # Filter by class if specified
    if classes is not None:
        class_indices = np.where(np.isin(dataset["y"], classes))[0]
    else:
        class_indices = np.arange(len(dataset["y"]))

    # Filter by sample if specified
    if samples is not None:
        sample_indices = np.array(samples)
        # Ensure sample indices are within class_indices
        sample_indices = np.intersect1d(sample_indices, class_indices)
    else:
        sample_indices = class_indices

    # Initialize list to hold DataFrames
    dfs = []

    # Process aggregated time series (formerly "Complete Series")
    if "aggregated" in components_to_include:
        # Get all selected samples at once
        X_selected = dataset["X"][sample_indices]
        n_samples = len(sample_indices)
        n_timesteps = X_selected.shape[1]

        # For each dimension
        for dim_idx in dimensions:
            # Create time indices for all samples
            times = np.arange(n_timesteps)

            # Create sample indices repeated for each timestep
            sample_idx_rep = np.repeat(sample_indices, n_timesteps)
            time_idx_rep = np.tile(times, n_samples)

            # Create values array for this dimension
            if len(X_selected.shape) == 3:  # Multivariate case
                values = X_selected[:, :, dim_idx].flatten()
            else:  # Univariate case (backward compatibility)
                values = X_selected.flatten()

            # Get class labels
            classes_rep = np.repeat(dataset["y"][sample_indices], n_timesteps)
            if format_classes:
                class_labels = np.array([f"Class {c}" for c in classes_rep])
            else:
                class_labels = classes_rep

            # Create DataFrame
            df_agg = pd.DataFrame(
                {
                    "time": time_idx_rep,
                    "value": values,
                    "class": class_labels,
                    "sample": sample_idx_rep,
                    "component": "aggregated",
                    "feature": None,
                    "dim": dim_idx,
                }
            )

            dfs.append(df_agg)

    # Process components if available
    if "components" in dataset:
        for component_name in ["background"]:
            if component_name in components_to_include:
                for dim_idx in dimensions:
                    comp_data = []
                    valid_samples = []

                    # Collect data from all samples
                    for i, idx in enumerate(sample_indices):
                        comp = dataset["components"][idx]
                        if (
                            hasattr(comp, component_name)
                            and getattr(comp, component_name) is not None
                        ):
                            comp_array = getattr(comp, component_name)
                            # Check if component has dimension data
                            if (
                                len(comp_array.shape) == 2
                                and comp_array.shape[1] > dim_idx
                            ):
                                comp_data.append(comp_array[:, dim_idx])
                                valid_samples.append(idx)
                            elif len(comp_array.shape) == 1 and dim_idx == 0:
                                # Backward compatibility - 1D array for univariate case
                                comp_data.append(comp_array)
                                valid_samples.append(idx)

                    if comp_data:
                        # Stack component data
                        comp_array = np.vstack(comp_data)
                        n_valid = len(valid_samples)
                        n_timesteps = comp_array.shape[1]

                        # Create indices
                        sample_idx_rep = np.repeat(valid_samples, n_timesteps)
                        time_idx_rep = np.tile(np.arange(n_timesteps), n_valid)

                        # Get class labels
                        classes_rep = np.repeat(
                            dataset["y"][valid_samples], n_timesteps
                        )
                        if format_classes:
                            class_labels = np.array(
                                [f"Class {c}" for c in classes_rep]
                            )
                        else:
                            class_labels = classes_rep

                        # Create DataFrame
                        df_comp = pd.DataFrame(
                            {
                                "time": time_idx_rep,
                                "value": comp_array.flatten(),
                                "class": class_labels,
                                "sample": sample_idx_rep,
                                "component": component_name,
                                "feature": None,
                                "dim": dim_idx,
                            }
                        )

                        dfs.append(df_comp)

        # Process features - features need special handling since they're stored in a dict
        if "features" in components_to_include:
            feature_dfs = []

            for idx in sample_indices:
                comp = dataset["components"][idx]
                if hasattr(comp, "features") and comp.features:
                    for feature_name, feature_values in comp.features.items():
                        # Extract dimension from feature name (if present)
                        if "_dim" in feature_name:
                            parts = feature_name.split("_dim")
                            dim_idx = int(parts[-1])
                            if dim_idx not in dimensions:
                                continue
                        else:
                            # For backward compatibility, assume dimension 0
                            dim_idx = 0
                            if dim_idx not in dimensions:
                                continue

                        # Get class label
                        class_label = dataset["y"][idx]
                        if format_classes:
                            class_str = f"Class {class_label}"
                        else:
                            class_str = class_label

                        # Create feature DataFrame
                        df_feature = pd.DataFrame(
                            {
                                "time": np.arange(len(feature_values)),
                                "value": feature_values,
                                "class": class_str,
                                "sample": idx,
                                "component": "features",
                                "feature": feature_name,
                                "dim": dim_idx,
                            }
                        )

                        feature_dfs.append(df_feature)

            if feature_dfs:
                dfs.append(pd.concat(feature_dfs, ignore_index=True))

    # Combine all DataFrames
    if not dfs:
        return pd.DataFrame()

    df = pd.concat(dfs, ignore_index=True)

    # Set up categorical variables for ordered plotting
    components_present = [
        c for c in components_to_include if c in df["component"].unique()
    ]
    df["component"] = pd.Categorical(
        df["component"], categories=components_present, ordered=True
    )

    if format_classes:
        class_labels = sorted(
            df["class"].unique(), key=lambda x: int(x.split()[-1])
        )
        df["class"] = pd.Categorical(
            df["class"], categories=class_labels, ordered=True
        )

    return df
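The method above returns a long-format DataFrame with one row per (sample, timestep) for each component and dimension. A minimal sketch of that schema, built from hypothetical data rather than the library itself:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 samples of 4 timesteps each, one dimension.
n_samples, n_timesteps = 2, 4

# One row per (sample, timestep), mirroring the columns produced above.
df = pd.DataFrame(
    {
        "time": np.tile(np.arange(n_timesteps), n_samples),
        "value": np.zeros(n_samples * n_timesteps),
        "class": np.repeat(["Class 0", "Class 1"], n_timesteps),
        "sample": np.repeat(np.arange(n_samples), n_timesteps),
        "component": "aggregated",
        "feature": None,
        "dim": 0,
    }
)

print(list(df.columns))
# → ['time', 'value', 'class', 'sample', 'component', 'feature', 'dim']
```

This long format is convenient for plotting libraries such as seaborn or plotly, which expect one observation per row.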

convert_data_format(dataset: Dict[str, Any], target_format: str) -> Dict[str, Any] staticmethod

Convert an existing dataset between 'channels_first' and 'channels_last' formats.

This utility function helps convert datasets between the two supported tensor layouts:

- 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
- 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

Parameters:

Name Type Description Default
dataset Dict[str, Any]

Dataset dictionary returned by build().

required
target_format str

Target format, either 'channels_first' or 'channels_last'.

required

Returns:

Type Description
Dict[str, Any]

Dict[str, Any]: Dataset with X tensor in the target format. The metadata is updated to reflect the new format.

Raises:

Type Description
ValueError

If target_format is not one of ['channels_first', 'channels_last'].

ValueError

If dataset doesn't contain a metadata entry with data_format.

Source code in xaitimesynth/builder.py
@staticmethod
def convert_data_format(
    dataset: Dict[str, Any], target_format: str
) -> Dict[str, Any]:
    """Convert an existing dataset between 'channels_first' and 'channels_last' formats.

    This utility function helps convert datasets between the two supported tensor layouts:
    - 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
    - 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

    Args:
        dataset (Dict[str, Any]): Dataset dictionary returned by build().
        target_format (str): Target format, either 'channels_first' or 'channels_last'.

    Returns:
        Dict[str, Any]: Dataset with X tensor in the target format. The metadata
            is updated to reflect the new format.

    Raises:
        ValueError: If target_format is not one of ['channels_first', 'channels_last'].
        ValueError: If dataset doesn't contain a metadata entry with data_format.
    """
    # Validate format
    if target_format not in ["channels_first", "channels_last"]:
        raise ValueError(
            "target_format must be one of ['channels_first', 'channels_last']"
        )

    # Create a shallow copy of the dataset
    result = dataset.copy()

    # Get current format from metadata
    if "metadata" not in dataset or "data_format" not in dataset["metadata"]:
        # Try to infer format
        if "X" in dataset and len(dataset["X"].shape) == 3:
            # Assume original format for backward compatibility
            current_format = "channels_last"
        else:
            raise ValueError("Dataset doesn't have format information in metadata")
    else:
        current_format = dataset["metadata"]["data_format"]

    # If already in target format, return dataset as-is
    if current_format == target_format:
        return result

    # Convert by swapping the last two axes. The (0, 2, 1) transpose is its
    # own inverse for 3-D arrays, so the same operation handles both
    # directions. Train/test splits are converted alongside the full tensor.
    for key in ("X", "X_train", "X_test"):
        if key in result:
            result[key] = np.transpose(result[key], (0, 2, 1))

    # Update metadata
    if "metadata" in result:
        result["metadata"] = result["metadata"].copy()
        result["metadata"]["data_format"] = target_format

    return result
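The conversion relies on a small but useful property: for a 3-D tensor, swapping the last two axes is its own inverse, so a single transpose covers both directions. A standalone sketch of this, using NumPy directly rather than the library:

```python
import numpy as np

# channels_last layout: [batch_size, time_steps, channels]
X = np.arange(24).reshape(2, 4, 3)

# Swap the last two axes to get channels_first: [batch_size, channels, time_steps]
X_cf = np.transpose(X, (0, 2, 1))
assert X_cf.shape == (2, 3, 4)

# Applying the same transpose again restores the original layout,
# which is why convert_data_format needs no separate code path per direction.
X_back = np.transpose(X_cf, (0, 2, 1))
assert np.array_equal(X_back, X)
```

Note that `np.transpose` returns a view where possible; if a framework requires contiguous memory (e.g. before `torch.from_numpy`), call `np.ascontiguousarray` on the result.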