
Builder

The TimeSeriesBuilder class provides a fluent API for constructing synthetic time series datasets.

TimeSeriesBuilder

Builder for synthetic time series datasets with known ground truth for XAI.

This class provides a fluent API for building synthetic time series datasets with known ground truth features for explainable AI (XAI) evaluation.

The builder creates time series by combining multiple components:

- Background: The base structure of the time series (e.g., random walk, gaussian noise)
- Features: Discriminative patterns for class separation (e.g., peaks, level changes, ...)

Terminology:

- "Signals" are background components added with add_signal(), stored in background
- Features are components that distinguish between classes, added with add_feature()

Component flexibility:

- Component generators are not strictly limited to their registered role
- A signal generator could be used as a feature or vice versa
- Features can be localized in time or span the entire series
- It's up to the user to ensure features actually create meaningful class differences

Key capabilities:

- Univariate and multivariate time series generation
- Control over feature positions and randomness
- Support for shared patterns across dimensions
- Training/test splits with consistent class distributions
- Built-in visualization and conversion utilities

Advanced usage:

- Components can be configured with various parameters
- Features can be positioned at fixed or random locations
- For multivariate series, components can target specific dimensions
- Shared randomness and locations can be controlled across dimensions
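To make the position parameters concrete, here is a minimal NumPy sketch of how percentage-based locations could map to timestep indices. It assumes the convention documented below (start_pct/end_pct/length_pct are fractions of the series length, and random placement samples a start index that keeps the feature inside the series); it is an illustration, not the library's implementation.

```python
import numpy as np

n_timesteps = 100
rng = np.random.RandomState(42)

# Fixed segment: start_pct/end_pct become index boundaries
start_pct, end_pct = 0.2, 0.5
start_idx = int(start_pct * n_timesteps)   # index 20
end_idx = int(end_pct * n_timesteps)       # index 50 (exclusive end)

# Random location: length_pct fixes the length, the start is sampled
length_pct = 0.3
feature_length = max(1, int(length_pct * n_timesteps))   # 30 timesteps
max_start = n_timesteps - feature_length                 # latest valid start
rand_start = rng.randint(0, max_start + 1)               # somewhere in [0, 70]
```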

When components are not registered, the builder uses default fill values:

- Features: NaN where the feature doesn't exist
- Background: zeros where no background component exists
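The fill-value semantics can be sketched in plain NumPy (independent of the library): a background vector defaults to zeros ("no contribution"), while a feature vector is NaN everywhere it doesn't exist, which lets a boolean mask recover the ground-truth location for XAI evaluation.

```python
import numpy as np

n_timesteps = 10
background = np.zeros(n_timesteps)       # background fill: "no contribution"
feature = np.full(n_timesteps, np.nan)   # feature fill: "doesn't exist"
feature[3:6] = 2.0                       # feature present on timesteps 3-5

# Compose: add the feature only where it exists
series = background + np.where(np.isnan(feature), 0.0, feature)

# Ground-truth mask falls out of the NaN fill value
mask = ~np.isnan(feature)
```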

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| n_timesteps | int | Length of each time series. |
| n_samples | int | Total number of samples to generate. |
| n_dimensions | int | Number of dimensions in each time series. |
| normalization | str | Normalization method for the final time series. |
| normalization_kwargs | dict | Additional parameters for normalization. |
| random_state | int | Random seed for reproducibility. |
| rng | RandomState | Random number generator. |
| feature_fill_value | | Value used for non-existent features (default: np.nan). |
| background_fill_value | | Value used for background when none exists (default: 0.0). |
| class_definitions | list | List of class definitions with components. |
| current_class | dict | Current class being configured. |
| data_format | str | Format of the output tensor data. Either 'channels_last' corresponding to shape [batch, time_steps, channels] or 'channels_first' corresponding to shape [batch, channels, time_steps]. Default is 'channels_first'. |

Source code in xaitimesynth/builder.py
class TimeSeriesBuilder:
    """Builder for synthetic time series datasets with known ground truth for XAI.

    This class provides a fluent API for building synthetic time series datasets with
    known ground truth features for explainable AI (XAI) evaluation.

    The builder creates time series by combining multiple components:
    - Background: The base structure of the time series (e.g., random walk, gaussian noise)
    - Features: Discriminative patterns for class separation (e.g., peaks, level changes, ...)

    Terminology:
    - "Signals" are background components added with add_signal(), stored in background
    - Features are components that distinguish between classes, added with add_feature()

    Component flexibility:
    - Component generators are not strictly limited to their registered role
    - A signal generator could be used as a feature or vice versa
    - Features can be localized in time or span the entire series
    - It's up to the user to ensure features actually create meaningful class differences

    Key capabilities:
    - Univariate and multivariate time series generation
    - Control over feature positions and randomness
    - Support for shared patterns across dimensions
    - Training/test splits with consistent class distributions
    - Built-in visualization and conversion utilities

    Advanced usage:
    - Components can be configured with various parameters
    - Features can be positioned at fixed or random locations
    - For multivariate series, components can target specific dimensions
    - Shared randomness and locations can be controlled across dimensions

    When components are not registered, the builder uses default fill values:
    - Features: NaN where the feature doesn't exist
    - Background: zeros where no background component exists

    Attributes:
        n_timesteps (int): Length of each time series.
        n_samples (int): Total number of samples to generate.
        n_dimensions (int): Number of dimensions in each time series.
        normalization (str): Normalization method for the final time series.
        normalization_kwargs (dict): Additional parameters for normalization.
        random_state (int): Random seed for reproducibility.
        rng (np.random.RandomState): Random number generator.
        feature_fill_value: Value used for non-existent features (default: np.nan).
        background_fill_value: Value used for background when none exists (default: 0.0).
        class_definitions (list): List of class definitions with components.
        current_class (dict): Current class being configured.
        data_format (str): Format of the output tensor data. Either 'channels_last'
            corresponding to shape [batch, time_steps, channels] or 'channels_first'
            corresponding to shape [batch, channels, time_steps]. Default is 'channels_first'.
    """

    def __init__(
        self,
        n_timesteps: int = 100,
        n_samples: int = 1000,
        n_dimensions: int = 1,
        normalization: str = "zscore",
        random_state: Optional[int] = None,
        normalization_kwargs: Optional[Dict[str, Any]] = None,
        feature_fill_value: Any = np.nan,
        background_fill_value: Any = 0.0,
        data_format: str = "channels_first",
    ):
        """Initialize the time series builder.

        Args:
            n_timesteps (int): Length of each time series. Default is 100.
            n_samples (int): Total number of samples to generate. Default is 1000.
            n_dimensions (int): Number of dimensions for multivariate time series. Default is 1 (univariate).
            normalization (str): Normalization method for the final time series.
                Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".
            random_state (int, optional): Seed for random number generation to ensure reproducibility.
            normalization_kwargs (dict, optional): Additional parameters for normalization methods.
                For "minmax": can specify "feature_range" as tuple (min, max).
            feature_fill_value: Value used for non-existent features. Default is np.nan.
                Using NaN makes features only appear where they're defined in visualizations.
            background_fill_value: Value used for background when none exists. Default is 0.0.
                Background typically affects the entire time series, so zeros represent
                "no contribution" rather than "doesn't exist".
            data_format (str): Format of the output tensor data.
                'channels_last': [batch, time_steps, channels] (original XAITimeSynth format)
                'channels_first': [batch, channels, time_steps] (PyTorch/tsai format)
                Default is 'channels_first'.

        Raises:
            ValueError: If n_dimensions is less than 1.
            ValueError: If data_format is not one of ['channels_first', 'channels_last']
        """
        self.n_timesteps = n_timesteps
        self.n_samples = n_samples
        self.n_dimensions = n_dimensions

        # Validate n_dimensions
        if n_dimensions < 1:
            raise ValueError("n_dimensions must be at least 1")

        # Validate data_format
        if data_format not in ["channels_first", "channels_last"]:
            raise ValueError(
                "data_format must be one of ['channels_first', 'channels_last']"
            )
        self.data_format = data_format

        self.normalization = normalization
        self.normalization_kwargs = normalization_kwargs or {}
        self.random_state = random_state
        self.rng = np.random.RandomState(random_state)
        self.feature_fill_value = feature_fill_value
        self.background_fill_value = background_fill_value

        # Initialize class definitions and the current class
        self.class_definitions = []
        self.current_class = None

    def for_class(self, class_label: int, weight: float = 1.0) -> "TimeSeriesBuilder":
        """Set the current class for component assignment.

        Creates a new class definition and makes it the target for subsequent component additions.
        Multiple calls create multiple classes for classification tasks.

        Args:
            class_label (int): Integer label for the class, used as the target value.
            weight (float): Relative weight of this class in the dataset. Controls the
                class distribution in the generated dataset. Default is 1.0.

        Returns:
            TimeSeriesBuilder: Self for method chaining.
        """
        # Create a new class definition
        class_def = {
            "label": class_label,
            "weight": weight,
            "components": {"background": [], "features": []},
        }

        self.class_definitions.append(class_def)
        self.current_class = class_def

        return self

    def _validate_dimensions(self, dimensions: List[int]) -> None:
        """Validate dimension indices against n_dimensions.

        Ensures all provided dimension indices are within valid range for the configured
        number of dimensions in the builder.

        Args:
            dimensions (List[int]): List of dimension indices to validate.

        Raises:
            ValueError: If any dimension index is out of range (0 to n_dimensions-1).
        """
        for d in dimensions:
            if not 0 <= d < self.n_dimensions:
                raise ValueError(
                    f"Dimension {d} is out of range. "
                    f"Valid dimensions are 0 to {self.n_dimensions - 1}."
                )

    def add_signal(
        self,
        component: Dict[str, Any],
        dim: Optional[List[int]] = None,
        shared_randomness: bool = False,
        start_pct: Optional[float] = None,
        end_pct: Optional[float] = None,
        length_pct: Optional[float] = None,
        random_location: bool = False,
        shared_location: bool = True,
    ) -> "TimeSeriesBuilder":
        """Add a signal component to the current class.

        Signals form the background structure of the time series (e.g., random walks,
        gaussian noise, trends). All signals are added to the background component.

        Default behavior: When no location parameters are specified (start_pct, end_pct,
        length_pct all None and random_location=False), the signal spans the entire time
        series length.

        Segment mode: To apply a signal to only part of the time series, either:
        - Specify start_pct and end_pct for a fixed segment, or
        - Set random_location=True with length_pct for a randomly positioned segment.

        Args:
            component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
            dim (List[int]): List of dimension indices where this signal should be applied.
                If None, the signal will be added to all dimensions. Default is None.
            shared_randomness (bool): If True, the same random pattern will be used across all
                specified dimensions. If False, each dimension gets its own random pattern
                (for stochastic components). Default is False.
            start_pct (float, optional): Start position as percentage of time series length (0-1).
                Required together with end_pct for a fixed segment.
            end_pct (float, optional): End position as percentage of time series length (0-1).
                Required together with start_pct for a fixed segment.
            length_pct (float, optional): Length of signal as percentage of time series length (0-1).
                Required when random_location is True.
            random_location (bool): Whether to place the signal at a random location.
                Requires length_pct. Default is False.
            shared_location (bool): If True and random_location is True, the same random
                location will be used across all dimensions. If False, each dimension gets
                its own random location. Default is True.

        Returns:
            TimeSeriesBuilder: Self for method chaining.

        Raises:
            ValueError: If no class is selected or if location parameters are inconsistent.

        Examples:
            # Full time series (default - no location params)
            builder.add_signal(gaussian_noise(sigma=0.1))

            # Fixed segment from 20% to 50% of the series
            builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

            # Random segment of 30% length
            builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)
        """
        if self.current_class is None:
            raise ValueError("No class selected. Call for_class() first.")

        if dim is None:
            dim = list(range(self.n_dimensions))
        self._validate_dimensions(dim)

        # Determine if this is a segment or full-series signal
        has_time_range = (
            start_pct is not None
            or end_pct is not None
            or length_pct is not None
            or random_location
        )

        # Validate location parameters based on mode
        if has_time_range:
            if random_location:
                if length_pct is None:
                    raise ValueError(
                        "length_pct must be provided when random_location is True"
                    )
                if not (0 < length_pct <= 1):
                    raise ValueError("length_pct must be between 0 and 1")
            else:
                # Fixed segment mode - requires both start_pct and end_pct
                if start_pct is None or end_pct is None:
                    raise ValueError(
                        "Both start_pct and end_pct must be provided for a fixed segment"
                    )
                if not (
                    0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct
                ):
                    raise ValueError(
                        "Invalid start_pct or end_pct. Must be between 0 and 1, "
                        "with start_pct < end_pct"
                    )

        # Build the component definition
        component_with_params = component.copy()

        if has_time_range:
            if random_location:
                component_with_params["random_location"] = True
                component_with_params["length_pct"] = length_pct
                component_with_params["shared_location"] = shared_location
            else:
                component_with_params["random_location"] = False
                component_with_params["start_pct"] = start_pct
                component_with_params["end_pct"] = end_pct

        # Add dimensions and randomness settings
        # Use single component when sharing location/randomness or single dimension
        if (
            (has_time_range and shared_location and random_location)
            or shared_randomness
            or len(dim) == 1
        ):
            component_with_params["dimensions"] = dim
            component_with_params["shared_randomness"] = shared_randomness
            component_with_params["shared_location"] = shared_location
            self.current_class["components"]["background"].append(component_with_params)
        else:
            # Create separate component entries for each dimension
            for d in dim:
                component_with_dim = component_with_params.copy()
                component_with_dim["dimensions"] = [d]
                component_with_dim["shared_randomness"] = shared_randomness
                component_with_dim["shared_location"] = shared_location
                self.current_class["components"]["background"].append(
                    component_with_dim
                )

        return self

    def add_feature(
        self,
        component: Dict[str, Any],
        start_pct: Optional[float] = None,
        end_pct: Optional[float] = None,
        length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None,
        random_location: bool = False,
        dim: Optional[List[int]] = None,
        shared_location: bool = True,
        shared_randomness: bool = False,
    ) -> "TimeSeriesBuilder":
        """Add a feature component to the current class.

        Features are distinctive patterns that can differentiate between classes.
        They can be placed at fixed or random locations within the time series.

        Args:
            component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
            start_pct (float, optional): Start position as percentage of time series length (0-1).
                Required when random_location is False.
            end_pct (float, optional): End position as percentage of time series length (0-1).
                Required when random_location is False.
            length_pct (float | tuple | list, optional): Length of feature as percentage of time
                series length. Required when random_location is True. Three forms accepted:
                - float: fixed length, e.g. ``0.5``
                - tuple (min, max): sample uniformly per sample in range, e.g. ``(0.25, 0.75)``
                - list of floats: sample from discrete choices per sample, e.g. ``[0.25, 0.5]``
            random_location (bool): Whether to place the feature at a random location.
                Default is False (fixed position).
            dim (List[int]): List of dimension indices where this feature should be applied.
                If None, the feature will be added to all dimensions. Default is None.
            shared_location (bool): If True and random_location is True, the same random
                location will be used across all dimensions. If False, each dimension gets
                its own random location. Default is True.
            shared_randomness (bool): If True, the same random pattern will be used across
                all dimensions. If False, each dimension gets its own random pattern
                (for stochastic components). Default is False.

        Returns:
            TimeSeriesBuilder: Self for method chaining.

        Raises:
            ValueError: If no class is selected or if location parameters are invalid.
        """
        if self.current_class is None:
            raise ValueError("No class selected. Call for_class() first.")

        if dim is None:
            dim = list(range(self.n_dimensions))
        self._validate_dimensions(dim)

        # Create feature definition
        feature_def = component.copy()

        # Add location parameters
        if random_location:
            if length_pct is None:
                raise ValueError(
                    "length_pct must be provided when random_location is True"
                )
            if isinstance(length_pct, tuple):
                if len(length_pct) != 2 or not (0 < length_pct[0] < length_pct[1] <= 1):
                    raise ValueError(
                        "length_pct tuple must be (min, max) with 0 < min < max <= 1"
                    )
            elif isinstance(length_pct, list):
                if not length_pct or not all(0 < v <= 1 for v in length_pct):
                    raise ValueError(
                        "length_pct list must be non-empty with all values in (0, 1]"
                    )
            else:
                if not (0 < length_pct <= 1):
                    raise ValueError("length_pct must be between 0 and 1")

            feature_def["random_location"] = True
            feature_def["length_pct"] = length_pct
        else:
            if start_pct is None or end_pct is None:
                raise ValueError(
                    "start_pct and end_pct must be provided when random_location is False"
                )
            if not (0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct):
                raise ValueError(
                    "Invalid start_pct or end_pct. Must be between 0 and 1, with start_pct < end_pct"
                )

            feature_def["random_location"] = False
            feature_def["start_pct"] = start_pct
            feature_def["end_pct"] = end_pct

        # Add to feature collection, ensuring the shared location logic is properly observed
        if (shared_location and random_location) or shared_randomness or len(dim) == 1:
            feature_def["dimensions"] = dim
            feature_def["shared_location"] = shared_location
            feature_def["shared_randomness"] = shared_randomness
            self.current_class["components"]["features"].append(feature_def)
        else:
            # Create separate feature entries for each dimension when not sharing
            for d in dim:
                feature_single_dim = feature_def.copy()
                feature_single_dim["dimensions"] = [d]  # Single dimension
                feature_single_dim["shared_location"] = shared_location
                feature_single_dim["shared_randomness"] = shared_randomness
                self.current_class["components"]["features"].append(feature_single_dim)

        return self

    def _generate_component_vector(
        self, component_def: Dict[str, Any], feature_length: Optional[int] = None
    ) -> np.ndarray:
        """Generate a component vector based on its definition.

        Calls the appropriate component generator based on the component type
        and parameters specified in the definition.

        Args:
            component_def (Dict[str, Any]): Component definition dictionary with 'type'
                and parameters for the generator.
            feature_length (Optional[int]): Length of the feature in timesteps.
                Only used for feature components.

        Returns:
            np.ndarray: Generated component vector with specified pattern.
        """
        component_type = component_def["type"]
        component_params = component_def.copy()
        component_params.pop("type")

        # Remove dimension information if present
        component_params.pop("dimensions", None)
        component_params.pop("shared_location", None)
        component_params.pop("shared_randomness", None)

        # If it's a feature, add the feature_length parameter
        if feature_length is not None:
            component_params["length"] = feature_length

        return generate_component(
            component_type, self.n_timesteps, self.rng, **component_params
        )

    def _resolve_length_pct(
        self,
        raw: Union[float, Tuple[float, float], List[float]],
        rng: np.random.RandomState,
    ) -> float:
        """Resolve a length_pct specification to a concrete float for one sample.

        Args:
            raw: Either a fixed float, a (min, max) tuple for uniform sampling, or a list
                of floats for discrete choice sampling.
            rng: Random number generator used for sampling.

        Returns:
            float: Resolved length as a fraction of the series length.
        """
        if isinstance(raw, tuple):
            return rng.uniform(raw[0], raw[1])
        elif isinstance(raw, list):
            return raw[rng.randint(0, len(raw))]
        return raw

    def _generate_feature_vector(
        self,
        feature_def: Dict[str, Any],
        dim_index: Optional[int] = None,
        shared_location_cache: Optional[Tuple[int, int]] = None,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Generate a feature vector and its corresponding mask.

        Creates a feature at the specified location (fixed or random) and returns
        both the vector and a boolean mask indicating the feature's position.

        Args:
            feature_def (Dict[str, Any]): Feature definition dictionary with 'type',
                location parameters, and generator parameters.
            dim_index (Optional[int]): The index in the dimensions list to use for location
                determination. Only used when shared_location is False.
            shared_location_cache (Optional[Tuple[int, int]]): Pre-calculated start and end
                indices for a shared location. Used to ensure consistency across dimensions.

        Returns:
            Tuple[np.ndarray, np.ndarray]: Tuple containing:
                - Feature vector with specified pattern at the determined location
                - Boolean mask indicating the feature's position (True where feature exists)
        """
        # Initialize the feature vector with the fill value
        feature = np.full(self.n_timesteps, self.feature_fill_value)
        mask = np.zeros(self.n_timesteps, dtype=bool)

        # Determine feature location
        if feature_def["random_location"]:
            if shared_location_cache is not None:
                # Use the cached shared location
                start_idx, end_idx = shared_location_cache
            else:
                length_pct = self._resolve_length_pct(
                    feature_def["length_pct"], self.rng
                )
                feature_length = max(1, int(length_pct * self.n_timesteps))

                # Generate random start position
                # If dim_index is provided and shared_location is False, use different
                # random locations for each dimension
                if dim_index is not None and not feature_def["shared_location"]:
                    # Derive a fresh RNG so each dimension gets its own location
                    dim_rng = np.random.RandomState(self.rng.randint(0, 2**32 - 1))
                    max_start = self.n_timesteps - feature_length
                    start_idx = dim_rng.randint(0, max_start + 1)
                else:
                    max_start = self.n_timesteps - feature_length
                    start_idx = self.rng.randint(0, max_start + 1)

                end_idx = start_idx + feature_length
        else:
            start_pct = feature_def["start_pct"]
            end_pct = feature_def["end_pct"]

            start_idx = int(start_pct * self.n_timesteps)
            end_idx = int(end_pct * self.n_timesteps)

            # Ensure at least one timestep is selected
            if start_idx == end_idx:
                end_idx = start_idx + 1

        # Mark the feature region
        mask[start_idx:end_idx] = True

        # Generate the feature vector
        feature_params = feature_def.copy()
        feature_type = feature_params.pop("type")

        # Remove location parameters
        feature_params.pop("random_location", None)
        feature_params.pop("start_pct", None)
        feature_params.pop("end_pct", None)
        feature_params.pop("length_pct", None)
        feature_params.pop("dimensions", None)
        feature_params.pop("shared_location", None)
        feature_params.pop("shared_randomness", None)

        # Generate the component for the feature length
        feature_length = end_idx - start_idx
        feature_values = generate_component(
            feature_type,
            self.n_timesteps,
            self.rng,
            length=feature_length,
            **feature_params,
        )

        # Place the feature in the correct location
        feature[start_idx:end_idx] = feature_values

        return feature, mask
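The fixed-location branch's index arithmetic can be shown in isolation. A minimal sketch with assumed values (100 timesteps, a degenerate range to show the one-timestep guarantee, `np.nan` as the feature fill value, and `1.0` as a stand-in for the generated pattern):

```python
import numpy as np

n_timesteps = 100
start_pct, end_pct = 0.30, 0.30  # degenerate range on purpose

start_idx = int(start_pct * n_timesteps)
end_idx = int(end_pct * n_timesteps)
if start_idx == end_idx:          # ensure at least one timestep is selected
    end_idx = start_idx + 1

feature = np.full(n_timesteps, np.nan)   # NaN where the feature doesn't exist
mask = np.zeros(n_timesteps, dtype=bool)
mask[start_idx:end_idx] = True
feature[start_idx:end_idx] = 1.0         # stand-in for the generated values
```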

    def build(
        self,
        return_components: bool = True,
        deterministic_class_counts: bool = True,
        shuffle: bool = True,
    ) -> Dict[str, Any]:
        """Build the dataset based on the configured class definitions.

        Generates time series data by combining all components for each class according
        to the specified parameters, with options to include component vectors and
        create a train/test split.

        Args:
            return_components (bool): Whether to return the individual component vectors.
                Useful for visualization and analysis. Default is True.
            deterministic_class_counts (bool): If True, class counts will be determined exactly
                by the weights rather than using multinomial sampling. This ensures exact class
                proportions. Default is True.
            shuffle (bool): Whether to shuffle the samples across classes. If True (default),
                samples will be randomly ordered. If False, samples will be grouped by class
                in the order classes were defined.

        Returns:
            Dict[str, Any]: Dictionary containing the generated dataset with keys:
                - 'X': Time series data with shape determined by data_format:
                       - 'channels_last': [n_samples, n_timesteps, n_dimensions]
                       - 'channels_first': [n_samples, n_dimensions, n_timesteps]
                - 'y': Class labels for each sample
                - 'feature_masks': Boolean masks showing feature locations
                - 'metadata': Dataset configuration information
                - 'components': Individual component vectors (if return_components=True)
                If train_test_split is provided, also includes:
                - 'X_train', 'y_train': Training data
                - 'X_test', 'y_test': Testing data

        Raises:
            ValueError: If no class definitions have been provided.
        """
        if not self.class_definitions:
            raise ValueError(
                "No class definitions provided. Call for_class() at least once."
            )

        # Normalize class weights and determine class distribution
        weights = np.array([cd["weight"] for cd in self.class_definitions])
        weights = weights / weights.sum()

        if deterministic_class_counts:
            # Deterministic class counts based on exact weights
            raw_counts = weights * self.n_samples
            # Round to integers and ensure we have exactly n_samples total
            class_counts = np.floor(raw_counts).astype(int)
            remaining = self.n_samples - class_counts.sum()
            # Distribute remaining samples based on fractional parts
            if remaining > 0:
                fractions = raw_counts - class_counts
                indices = np.argsort(fractions)[-remaining:]
                for idx in indices:
                    class_counts[idx] += 1
        else:
            # Probabilistic class counts using multinomial sampling
            class_counts = self.rng.multinomial(self.n_samples, weights)

        # Initialize arrays - always create in channels_last format first (internal format)
        X = np.zeros((self.n_samples, self.n_timesteps, self.n_dimensions))
        y = np.zeros(self.n_samples, dtype=int)
        all_components = []
        feature_masks = {}

        # Generate data for each class
        sample_idx = 0
        for class_def, count in zip(self.class_definitions, class_counts):
            class_label = class_def["label"]

            for _ in range(count):
                # Initialize arrays for this sample with appropriate fill values per dimension
                background = np.full(
                    (self.n_timesteps, self.n_dimensions), self.background_fill_value
                )
                features_dict = {}
                feature_masks_dict = {}

                # Add base structure components
                for base_def in class_def["components"]["background"]:
                    # For signals with time range parameters, generate random location once if shared
                    if "random_location" in base_def and base_def["random_location"]:
                        # Determine signal length
                        length_pct = base_def["length_pct"]
                        signal_length = max(1, int(length_pct * self.n_timesteps))
                        max_start = self.n_timesteps - signal_length

                        # If shared_location is True, generate the location once for all dimensions
                        shared_location = base_def.get("shared_location", True)
                        if shared_location:
                            shared_start_idx = self.rng.randint(0, max_start + 1)
                            shared_end_idx = shared_start_idx + signal_length

                        # Apply to specified dimensions with appropriate location handling
                        for i, dim_idx in enumerate(base_def["dimensions"]):
                            # Create a full-length vector filled with the background fill value
                            base_vector = np.full(
                                self.n_timesteps, self.background_fill_value
                            )

                            # Determine signal location - possibly unique per dimension
                            if shared_location:
                                # Use the shared location for all dimensions
                                start_idx = shared_start_idx
                                end_idx = shared_end_idx
                            else:
                                # Create a unique location for each dimension
                                dim_rng = np.random.RandomState(
                                    self.rng.randint(0, 2**32 - 1)
                                )
                                start_idx = dim_rng.randint(0, max_start + 1)
                                end_idx = start_idx + signal_length

                            # Calculate the actual length of the signal segment
                            signal_length = end_idx - start_idx

                            # Prepare parameters for component generation
                            signal_params = base_def.copy()
                            signal_type = signal_params.pop("type")

                            # Remove location and dimension parameters
                            signal_params.pop("random_location", None)
                            signal_params.pop("length_pct", None)
                            signal_params.pop("shared_location", None)
                            signal_params.pop("dimensions", None)
                            signal_params.pop("shared_randomness", None)

                            # Generate the component only for the specified length
                            signal_values = generate_component(
                                signal_type, signal_length, self.rng, **signal_params
                            )

                            # Place the signal in the correct location
                            base_vector[start_idx:end_idx] = signal_values

                            # Add to background for this dimension
                            background[:, dim_idx] = self._add_vector_handling_nans(
                                background[:, dim_idx], base_vector
                            )
                    else:
                        # Handle non-random location signals (the original behavior)
                        if "random_location" in base_def:
                            # Localized signal with explicit start_pct/end_pct bounds
                            base_vector = np.full(
                                self.n_timesteps, self.background_fill_value
                            )

                            start_pct = base_def["start_pct"]
                            end_pct = base_def["end_pct"]
                            start_idx = int(start_pct * self.n_timesteps)
                            end_idx = int(end_pct * self.n_timesteps)

                            # Ensure at least one timestep is selected
                            if start_idx == end_idx:
                                end_idx = start_idx + 1

                            signal_length = end_idx - start_idx

                            # Generate the component only for the specified length
                            signal_params = base_def.copy()
                            signal_type = signal_params.pop("type")

                            # Remove location parameters
                            signal_params.pop("random_location", None)
                            signal_params.pop("start_pct", None)
                            signal_params.pop("end_pct", None)
                            signal_params.pop("dimensions", None)
                            signal_params.pop("shared_randomness", None)

                            signal_values = generate_component(
                                signal_type, signal_length, self.rng, **signal_params
                            )

                            base_vector[start_idx:end_idx] = signal_values
                        else:
                            # Full-length signal (original behavior)
                            base_vector = self._generate_component_vector(base_def)

                        # Apply to all specified dimensions with the same signal
                        for dim_idx in base_def["dimensions"]:
                            background[:, dim_idx] = self._add_vector_handling_nans(
                                background[:, dim_idx], base_vector
                            )

                # Initialize aggregated time series
                aggregated = background.copy()

                # Add features
                for feature_idx, feature_def in enumerate(
                    class_def["components"]["features"]
                ):
                    # For each dimension in the feature
                    feature_dims = feature_def["dimensions"]

                    # Generate a shared random location once if needed
                    shared_location_cache = None
                    if feature_def.get("random_location", False) and feature_def.get(
                        "shared_location", True
                    ):
                        # Pre-calculate the shared location to ensure it's the same across dimensions
                        length_pct = self._resolve_length_pct(
                            feature_def["length_pct"], self.rng
                        )
                        feature_length = max(1, int(length_pct * self.n_timesteps))
                        max_start = self.n_timesteps - feature_length
                        shared_start_idx = self.rng.randint(0, max_start + 1)
                        shared_end_idx = shared_start_idx + feature_length
                        shared_location_cache = (shared_start_idx, shared_end_idx)

                    for i, dim_idx in enumerate(feature_dims):
                        # Generate feature vector - if shared_location is True and we have a cached location,
                        # pass it; otherwise pass the dimension index for unique locations
                        dim_index = (
                            None
                            if feature_def.get("shared_location", True)
                            else dim_idx
                        )
                        feature, mask = self._generate_feature_vector(
                            feature_def, dim_index, shared_location_cache
                        )

                        # Add to aggregated series for this dimension
                        aggregated[:, dim_idx] = self._add_vector_handling_nans(
                            aggregated[:, dim_idx], feature
                        )

                        # Store components
                        feature_name = (
                            f"feature_{feature_idx}_{feature_def['type']}_dim{dim_idx}"
                        )
                        if feature_name not in features_dict:
                            features_dict[feature_name] = feature
                            feature_masks_dict[feature_name] = mask

                        # Add to global feature masks
                        feature_key = f"class_{class_label}_{feature_name}"
                        if feature_key not in feature_masks:
                            feature_masks[feature_key] = np.zeros(
                                (self.n_samples, self.n_timesteps), dtype=bool
                            )

                        feature_masks[feature_key][sample_idx] = mask

                # Normalize if required (apply to each dimension separately)
                for dim_idx in range(self.n_dimensions):
                    aggregated[:, dim_idx] = normalize(
                        aggregated[:, dim_idx],
                        method=self.normalization,
                        **self.normalization_kwargs,
                    )

                # Store the result
                X[sample_idx] = aggregated
                y[sample_idx] = class_label

                # Store components if needed
                if return_components:
                    all_components.append(
                        TimeSeriesComponents(
                            background=background,
                            features=features_dict,
                            feature_masks=feature_masks_dict,
                            aggregated=aggregated,
                        )
                    )

                sample_idx += 1

        # Shuffle the dataset if requested
        if shuffle:
            # Generate shuffled indices based on the random state
            indices = np.arange(self.n_samples)
            self.rng.shuffle(indices)

            # Shuffle X and y arrays
            X = X[indices]
            y = y[indices]

            # Shuffle components if they were returned
            if return_components:
                all_components = [all_components[i] for i in indices]

            # Shuffle feature masks
            for key in feature_masks:
                feature_masks[key] = feature_masks[key][indices]

        # Convert the tensor format if needed (from channels_last to channels_first)
        if self.data_format == "channels_first":
            # Transpose from [n_samples, n_timesteps, n_dimensions] to [n_samples, n_dimensions, n_timesteps]
            X = np.transpose(X, (0, 2, 1))

        # Prepare result dictionary
        result = {
            "X": X,
            "y": y,
            "feature_masks": feature_masks,
            "metadata": {
                "n_samples": self.n_samples,
                "n_timesteps": self.n_timesteps,
                "n_dimensions": self.n_dimensions,
                "class_definitions": self.class_definitions,
                "normalize": self.normalization,
                "normalization_kwargs": self.normalization_kwargs,
                "random_state": self.random_state,
                "data_format": self.data_format,
                "shuffled": shuffle,
            },
        }

        if return_components:
            result["components"] = all_components

        return result
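The deterministic class-count logic in `build()` is a largest-remainder allocation: floor each raw count, then hand the leftover samples to the classes with the largest fractional parts. A standalone sketch with hypothetical equal weights:

```python
import numpy as np

n_samples = 10
weights = np.array([1.0, 1.0, 1.0])
weights = weights / weights.sum()

raw_counts = weights * n_samples                 # ~3.33 per class
class_counts = np.floor(raw_counts).astype(int)  # [3, 3, 3], one sample short
remaining = n_samples - class_counts.sum()
if remaining > 0:
    # Largest fractional parts receive the leftover samples
    fractions = raw_counts - class_counts
    for idx in np.argsort(fractions)[-remaining:]:
        class_counts[idx] += 1
```

The counts always sum to `n_samples` exactly, unlike multinomial sampling, which only matches the weights in expectation.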

    def to_df(
        self,
        dataset: Dict[str, Any],
        samples: Optional[List[int]] = None,
        classes: Optional[List[int]] = None,
        components: Optional[List[str]] = None,
        dimensions: Optional[List[int]] = None,
        format_classes: bool = False,
    ) -> pd.DataFrame:
        """Convert time series dataset to a long-format pandas DataFrame.

        Creates a DataFrame with one row per timestep per component per sample per dimension,
        suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

        Args:
            dataset (Dict[str, Any]): Dataset dictionary returned by build().
            samples (Optional[List[int]]): List of sample indices to include.
                If None, includes all samples.
            classes (Optional[List[int]]): List of class labels to include.
                If None, includes all classes.
            components (Optional[List[str]]): List of component types to include.
                Default includes all: ["aggregated", "background", "features"]
            dimensions (Optional[List[int]]): List of dimension indices to include.
                If None, includes all dimensions.
            format_classes (bool): If True, format class labels as "Class X".
                Otherwise use numeric labels. Default is False.

        Returns:
            pd.DataFrame: Long-format DataFrame with columns:
                - time: Timestep index
                - value: Component value at that timestep
                - class: Class label (formatted if format_classes=True)
                - sample: Sample index
                - component: Component type
                - feature: Feature name (for feature components)
                - dim: Dimension index

        Raises:
            ValueError: If specified dimensions are out of range.
        """
        # Default components to include (use programming-friendly names)
        default_components = ["aggregated", "background", "features"]
        components_to_include = (
            components if components is not None else default_components
        )

        # Get number of dimensions from metadata or infer from data shape
        n_dims = dataset.get("metadata", {}).get("n_dimensions", 1)
        if n_dims == 1 and len(dataset["X"].shape) == 3:
            n_dims = dataset["X"].shape[2]

        # Default dimensions to include
        if dimensions is None:
            dimensions = list(range(n_dims))
        else:
            # Validate dimensions
            for d in dimensions:
                if not 0 <= d < n_dims:
                    raise ValueError(
                        f"Dimension {d} is out of range (0 to {n_dims - 1})."
                    )

        # Filter by class if specified
        if classes is not None:
            class_indices = np.where(np.isin(dataset["y"], classes))[0]
        else:
            class_indices = np.arange(len(dataset["y"]))

        # Filter by sample if specified
        if samples is not None:
            sample_indices = np.array(samples)
            # Ensure sample indices are within class_indices
            sample_indices = np.intersect1d(sample_indices, class_indices)
        else:
            sample_indices = class_indices

        # Initialize list to hold DataFrames
        dfs = []

        # Process aggregated time series (formerly "Complete Series")
        if "aggregated" in components_to_include:
            # Get all selected samples at once
            X_selected = dataset["X"][sample_indices]
            n_samples = len(sample_indices)
            n_timesteps = X_selected.shape[1]

            # For each dimension
            for dim_idx in dimensions:
                # Create time indices for all samples
                times = np.arange(n_timesteps)

                # Create sample indices repeated for each timestep
                sample_idx_rep = np.repeat(sample_indices, n_timesteps)
                time_idx_rep = np.tile(times, n_samples)

                # Create values array for this dimension
                if len(X_selected.shape) == 3:  # Multivariate case
                    values = X_selected[:, :, dim_idx].flatten()
                else:  # Univariate case (backward compatibility)
                    values = X_selected.flatten()

                # Get class labels
                classes_rep = np.repeat(dataset["y"][sample_indices], n_timesteps)
                if format_classes:
                    class_labels = np.array([f"Class {c}" for c in classes_rep])
                else:
                    class_labels = classes_rep

                # Create DataFrame
                df_agg = pd.DataFrame(
                    {
                        "time": time_idx_rep,
                        "value": values,
                        "class": class_labels,
                        "sample": sample_idx_rep,
                        "component": "aggregated",
                        "feature": None,
                        "dim": dim_idx,
                    }
                )

                dfs.append(df_agg)

        # Process components if available
        if "components" in dataset:
            for component_name in ["background"]:
                if component_name in components_to_include:
                    for dim_idx in dimensions:
                        comp_data = []
                        valid_samples = []

                        # Collect data from all samples
                        for i, idx in enumerate(sample_indices):
                            comp = dataset["components"][idx]
                            if (
                                hasattr(comp, component_name)
                                and getattr(comp, component_name) is not None
                            ):
                                comp_array = getattr(comp, component_name)
                                # Check if component has dimension data
                                if (
                                    len(comp_array.shape) == 2
                                    and comp_array.shape[1] > dim_idx
                                ):
                                    comp_data.append(comp_array[:, dim_idx])
                                    valid_samples.append(idx)
                                elif len(comp_array.shape) == 1 and dim_idx == 0:
                                    # Backward compatibility - 1D array for univariate case
                                    comp_data.append(comp_array)
                                    valid_samples.append(idx)

                        if comp_data:
                            # Stack component data
                            comp_array = np.vstack(comp_data)
                            n_valid = len(valid_samples)
                            n_timesteps = comp_array.shape[1]

                            # Create indices
                            sample_idx_rep = np.repeat(valid_samples, n_timesteps)
                            time_idx_rep = np.tile(np.arange(n_timesteps), n_valid)

                            # Get class labels
                            classes_rep = np.repeat(
                                dataset["y"][valid_samples], n_timesteps
                            )
                            if format_classes:
                                class_labels = np.array(
                                    [f"Class {c}" for c in classes_rep]
                                )
                            else:
                                class_labels = classes_rep

                            # Create DataFrame
                            df_comp = pd.DataFrame(
                                {
                                    "time": time_idx_rep,
                                    "value": comp_array.flatten(),
                                    "class": class_labels,
                                    "sample": sample_idx_rep,
                                    "component": component_name,
                                    "feature": None,
                                    "dim": dim_idx,
                                }
                            )

                            dfs.append(df_comp)

            # Process features - features need special handling since they're stored in a dict
            if "features" in components_to_include:
                feature_dfs = []

                for idx in sample_indices:
                    comp = dataset["components"][idx]
                    if hasattr(comp, "features") and comp.features:
                        for feature_name, feature_values in comp.features.items():
                            # Extract the dimension from the feature name;
                            # default to dimension 0 for backward compatibility
                            if "_dim" in feature_name:
                                dim_idx = int(feature_name.split("_dim")[-1])
                            else:
                                dim_idx = 0
                            if dim_idx not in dimensions:
                                continue

                            # Get class label
                            class_label = dataset["y"][idx]
                            if format_classes:
                                class_str = f"Class {class_label}"
                            else:
                                class_str = class_label

                            # Create feature DataFrame
                            df_feature = pd.DataFrame(
                                {
                                    "time": np.arange(len(feature_values)),
                                    "value": feature_values,
                                    "class": class_str,
                                    "sample": idx,
                                    "component": "features",
                                    "feature": feature_name,
                                    "dim": dim_idx,
                                }
                            )

                            feature_dfs.append(df_feature)

                if feature_dfs:
                    dfs.append(pd.concat(feature_dfs, ignore_index=True))

        # Combine all DataFrames
        if not dfs:
            return pd.DataFrame()

        df = pd.concat(dfs, ignore_index=True)

        # Set up categorical variables for ordered plotting
        components_present = [
            c for c in components_to_include if c in df["component"].unique()
        ]
        df["component"] = pd.Categorical(
            df["component"], categories=components_present, ordered=True
        )

        if format_classes:
            class_labels = sorted(
                df["class"].unique(), key=lambda x: int(x.split()[-1])
            )
            df["class"] = pd.Categorical(
                df["class"], categories=class_labels, ordered=True
            )

        return df
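The long format produced by `to_df()` can be illustrated on a tiny array. A minimal sketch assuming a hypothetical 2-sample, 4-timestep, univariate dataset in channels_last layout:

```python
import numpy as np
import pandas as pd

X = np.arange(8).reshape(2, 4, 1)   # [n_samples, n_timesteps, n_dimensions]
y = np.array([0, 1])
n_samples, n_timesteps, _ = X.shape

# One row per (sample, timestep): repeat sample metadata, tile the time axis
df = pd.DataFrame({
    "time": np.tile(np.arange(n_timesteps), n_samples),
    "value": X[:, :, 0].flatten(),
    "class": np.repeat(y, n_timesteps),
    "sample": np.repeat(np.arange(n_samples), n_timesteps),
    "component": "aggregated",
    "dim": 0,
})
```

This repeat/tile pattern is the same vectorized construction the method uses for the aggregated series, and the result plugs directly into Seaborn's `lineplot(data=df, x="time", y="value", hue="class")`.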

    def _add_vector_handling_nans(
        self, base: np.ndarray, to_add: np.ndarray
    ) -> np.ndarray:
        """Add two vectors while properly handling NaN values.

        Special handling of NaN values during vector addition:
        1. Where both vectors have values (not NaN): Normal addition
        2. Where one vector has NaN: Use the non-NaN value
        3. Where both have NaN: Result remains NaN

        This allows components to only contribute where they're defined.

        Args:
            base (np.ndarray): Base vector to add to.
            to_add (np.ndarray): Vector to add to the base.

        Returns:
            np.ndarray: Combined vector with NaNs handled according to the rules above.
        """
        # Stack arrays and use nansum for element-wise addition that ignores NaNs
        result = np.nansum(np.stack([base, to_add]), axis=0)

        # Fix case where both values are NaN (nansum would return 0, but we want NaN)
        both_nan = np.isnan(base) & np.isnan(to_add)
        result[both_nan] = np.nan

        return result
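The three NaN-handling rules can be verified directly with a standalone copy of the helper:

```python
import numpy as np

def add_handling_nans(base, to_add):
    # nansum treats NaN as 0 during the element-wise addition...
    result = np.nansum(np.stack([base, to_add]), axis=0)
    # ...so restore NaN where *both* inputs were NaN
    both_nan = np.isnan(base) & np.isnan(to_add)
    result[both_nan] = np.nan
    return result

out = add_handling_nans(
    np.array([1.0, np.nan, np.nan]),   # base
    np.array([2.0, 3.0, np.nan]),      # to_add
)
# out -> [3.0, 3.0, nan]: both defined, one defined, neither defined
```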

    @staticmethod
    def convert_data_format(
        dataset: Dict[str, Any], target_format: str
    ) -> Dict[str, Any]:
        """Convert an existing dataset between 'channels_first' and 'channels_last' formats.

        This utility function helps convert datasets between the two supported tensor layouts:
        - 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
        - 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

        Args:
            dataset (Dict[str, Any]): Dataset dictionary returned by build().
            target_format (str): Target format, either 'channels_first' or 'channels_last'.

        Returns:
            Dict[str, Any]: Dataset with X tensor in the target format. The metadata
                is updated to reflect the new format.

        Raises:
            ValueError: If target_format is not one of ['channels_first', 'channels_last'].
            ValueError: If the dataset has no format information in metadata and the
                format cannot be inferred from the data shape.
        """
        # Validate format
        if target_format not in ["channels_first", "channels_last"]:
            raise ValueError(
                "target_format must be one of ['channels_first', 'channels_last']"
            )

        # Create a shallow copy of the dataset
        result = dataset.copy()

        # Get current format from metadata
        if "metadata" not in dataset or "data_format" not in dataset["metadata"]:
            # Try to infer format
            if "X" in dataset and len(dataset["X"].shape) == 3:
                # Assume original format for backward compatibility
                current_format = "channels_last"
            else:
                raise ValueError("Dataset doesn't have format information in metadata")
        else:
            current_format = dataset["metadata"]["data_format"]

        # If already in target format, return dataset as-is
        if current_format == target_format:
            return result

        # Convert by transposing the last two axes; both directions use the
        # same axis swap, and the formats are guaranteed to differ here
        if "X" in result:
            result["X"] = np.transpose(result["X"], (0, 2, 1))

            # Also convert train/test splits if they exist
            if "X_train" in result:
                result["X_train"] = np.transpose(result["X_train"], (0, 2, 1))

            if "X_test" in result:
                result["X_test"] = np.transpose(result["X_test"], (0, 2, 1))

        # Update metadata
        if "metadata" in result:
            result["metadata"] = result["metadata"].copy()
            result["metadata"]["data_format"] = target_format

        return result

    def clone(
        self,
        n_timesteps: Optional[int] = None,
        n_samples: Optional[int] = None,
        n_dimensions: Optional[int] = None,
        normalization: Optional[str] = None,
        random_state: Optional[int] = None,
        normalization_kwargs: Optional[Dict[str, Any]] = None,
        feature_fill_value: Optional[Any] = None,
        background_fill_value: Optional[Any] = None,
        data_format: Optional[str] = None,
    ) -> "TimeSeriesBuilder":
        """Create a new builder with the same class definitions but different parameters.

        This method creates an independent copy of the builder with all its class
        definitions but allows overriding specific parameters. This is particularly
        useful for generating train/test/validation splits with the same underlying
        patterns but different sample counts or random seeds.

        Args:
            n_timesteps: New length of each time series. Defaults to original value.
            n_samples: New number of samples to generate. Defaults to original value.
            n_dimensions: New number of dimensions. Defaults to original value.
            normalization: New normalization method. Defaults to original value.
            random_state: New random seed for reproducibility. Defaults to original value.
            normalization_kwargs: New normalization parameters. Defaults to original value.
            feature_fill_value: New value for non-existent features. Defaults to original value.
            background_fill_value: New value for background. Defaults to original value.
            data_format: New data format ('channels_first' or 'channels_last'). Defaults to original value.

        Returns:
            TimeSeriesBuilder: A new independent builder with copied class definitions
            and potentially updated parameters.

        Example:
            ```python
            # Create base builder with class definitions
            base_builder = (
                TimeSeriesBuilder(n_timesteps=100, random_state=42)
                .for_class(0)
                .add_signal(random_walk(step_size=0.2))
                .for_class(1)
                .add_signal(random_walk(step_size=0.2))
                .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
            )

            # Generate train dataset with 140 samples
            train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

            # Generate test dataset with 60 samples and a different random seed
            test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
            ```
        """
        # Prepare parameters with defaults from current instance when not provided
        params = {
            "n_timesteps": n_timesteps if n_timesteps is not None else self.n_timesteps,
            "n_samples": n_samples if n_samples is not None else self.n_samples,
            "n_dimensions": n_dimensions
            if n_dimensions is not None
            else self.n_dimensions,
            "normalization": normalization
            if normalization is not None
            else self.normalization,
            "random_state": random_state
            if random_state is not None
            else self.random_state,
            "normalization_kwargs": (
                normalization_kwargs
                if normalization_kwargs is not None
                else copy.deepcopy(self.normalization_kwargs)
            ),
            "feature_fill_value": feature_fill_value
            if feature_fill_value is not None
            else self.feature_fill_value,
            "background_fill_value": background_fill_value
            if background_fill_value is not None
            else self.background_fill_value,
            "data_format": data_format if data_format is not None else self.data_format,
        }
        # Create new builder with updated parameters
        new_builder = TimeSeriesBuilder(**params)

        # Copy class definitions (deep copy to ensure complete independence)
        new_builder.class_definitions = copy.deepcopy(self.class_definitions)

        # Set current class if one was selected in the original builder
        if self.current_class is not None:
            # Find the class label of the current class
            for i, class_def in enumerate(self.class_definitions):
                if class_def is self.current_class:
                    new_builder.current_class = new_builder.class_definitions[i]
                    break

        return new_builder

    def to_config(self) -> Dict[str, Any]:
        """Export the builder configuration as a dictionary.

        Converts the builder's internal state to a configuration dictionary
        that can be used with `load_builders_from_config()` or serialized to YAML.

        The output format matches what the parser expects, enabling round-trip
        conversion between Python code and configuration files.

        Returns:
            Dict[str, Any]: Configuration dictionary with builder parameters
            and class definitions.

        Example:
            ```python
            import yaml

            # Build a dataset programmatically
            builder = (
                TimeSeriesBuilder(n_timesteps=100, n_samples=200)
                .for_class(0)
                .add_signal(gaussian_noise(sigma=0.1))
                .for_class(1)
                .add_signal(gaussian_noise(sigma=0.1))
                .add_feature(peak(amplitude=1.0), start_pct=0.3, end_pct=0.6)
            )

            # Export to config dict
            config = builder.to_config()

            # Save to YAML file
            with open("config.yaml", "w") as f:
                yaml.dump({"my_dataset": config}, f)

            # Later, reload from YAML
            builders = load_builders_from_config(config_path="config.yaml")
            dataset = builders["my_dataset"].build()
            ```
        """
        # Keys that should stay at the component level, not in params
        COMPONENT_KEYS = {
            "type",
            "dimensions",
            "shared_randomness",
            "shared_location",
            "start_pct",
            "end_pct",
            "length_pct",
            "random_location",
        }

        def convert_component(comp: Dict[str, Any]) -> Dict[str, Any]:
            """Convert internal component format to config format."""
            result = {}

            # Map 'type' to 'function'
            if "type" in comp:
                result["function"] = comp["type"]

            # Extract params (everything except special keys)
            params = {k: v for k, v in comp.items() if k not in COMPONENT_KEYS}
            if params:
                result["params"] = params

            # Copy over special keys
            if "dimensions" in comp:
                result["dimensions"] = comp["dimensions"]
            if comp.get("shared_randomness"):
                result["shared_randomness"] = True
            if "shared_location" in comp and not comp.get("shared_location", True):
                result["shared_location"] = False

            # Location parameters
            if comp.get("random_location"):
                result["random_location"] = True
                if "length_pct" in comp:
                    lp = comp["length_pct"]
                    # Serialize tuples as {range: [min, max]} for YAML roundtrip fidelity
                    result["length_pct"] = (
                        {"range": list(lp)} if isinstance(lp, tuple) else lp
                    )
            elif "start_pct" in comp or "end_pct" in comp:
                if "start_pct" in comp:
                    result["start_pct"] = comp["start_pct"]
                if "end_pct" in comp:
                    result["end_pct"] = comp["end_pct"]

            return result

        # Build the config dictionary
        config: Dict[str, Any] = {
            "n_timesteps": self.n_timesteps,
            "n_samples": self.n_samples,
            "n_dimensions": self.n_dimensions,
            "normalization": self.normalization,
            "data_format": self.data_format,
        }

        # Only include optional parameters if they have non-default values
        if self.random_state is not None:
            config["random_state"] = self.random_state
        if self.normalization_kwargs:
            config["normalization_kwargs"] = self.normalization_kwargs

        # Convert class definitions
        classes = []
        for class_def in self.class_definitions:
            class_config: Dict[str, Any] = {"id": class_def["label"]}

            if class_def.get("weight", 1.0) != 1.0:
                class_config["weight"] = class_def["weight"]

            # Convert background components to signals list
            signals = []
            for comp in class_def["components"].get("background", []):
                signals.append(convert_component(comp))

            if signals:
                class_config["signals"] = signals

            # Convert features
            features = []
            for comp in class_def["components"].get("features", []):
                features.append(convert_component(comp))

            if features:
                class_config["features"] = features

            classes.append(class_config)

        config["classes"] = classes

        return config
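The NaN-handling rule used by `_add_vector_handling_nans` above can be reproduced standalone. This sketch shows why the explicit both-NaN mask is needed: `np.nansum` alone would turn an all-NaN position into `0.0` instead of keeping it NaN.

```python
import numpy as np

def add_handling_nans(base: np.ndarray, to_add: np.ndarray) -> np.ndarray:
    """NaN-aware addition: NaN means 'no contribution', not a number."""
    result = np.nansum(np.stack([base, to_add]), axis=0)
    # np.nansum yields 0.0 where every input is NaN; restore NaN there
    result[np.isnan(base) & np.isnan(to_add)] = np.nan
    return result

base = np.array([1.0, np.nan, np.nan, 2.0])
to_add = np.array([0.5, 3.0, np.nan, np.nan])
print(add_handling_nans(base, to_add))  # values: 1.5, 3.0, nan, 2.0
```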

__init__(n_timesteps: int = 100, n_samples: int = 1000, n_dimensions: int = 1, normalization: str = 'zscore', random_state: Optional[int] = None, normalization_kwargs: Optional[Dict[str, Any]] = None, feature_fill_value: Any = np.nan, background_fill_value: Any = 0.0, data_format: str = 'channels_first')

Initialize the time series builder.

Parameters:

Name Type Description Default
n_timesteps int

Length of each time series. Default is 100.

100
n_samples int

Total number of samples to generate. Default is 1000.

1000
n_dimensions int

Number of dimensions for multivariate time series. Default is 1 (univariate).

1
normalization str

Normalization method for the final time series. Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".

'zscore'
random_state int

Seed for random number generation to ensure reproducibility.

None
normalization_kwargs dict

Additional parameters for normalization methods. For "minmax": can specify "feature_range" as tuple (min, max).

None
feature_fill_value Any

Value used for non-existent features. Default is np.nan. Using NaN makes features only appear where they're defined in visualizations.

nan
background_fill_value Any

Value used for background when none exists. Default is 0.0. Background typically affects the entire time series, so zeros represent "no contribution" rather than "doesn't exist".

0.0
data_format str

Format of the output tensor data. 'channels_last': [batch, time_steps, channels] (original XAITimeSynth format) 'channels_first': [batch, channels, time_steps] (PyTorch/tsai format) Default is 'channels_first'.

'channels_first'

Raises:

Type Description
ValueError

If n_dimensions is less than 1.

ValueError

If data_format is not one of ['channels_first', 'channels_last'].

Source code in xaitimesynth/builder.py
def __init__(
    self,
    n_timesteps: int = 100,
    n_samples: int = 1000,
    n_dimensions: int = 1,
    normalization: str = "zscore",
    random_state: Optional[int] = None,
    normalization_kwargs: Optional[Dict[str, Any]] = None,
    feature_fill_value: Any = np.nan,
    background_fill_value: Any = 0.0,
    data_format: str = "channels_first",
):
    """Initialize the time series builder.

    Args:
        n_timesteps (int): Length of each time series. Default is 100.
        n_samples (int): Total number of samples to generate. Default is 1000.
        n_dimensions (int): Number of dimensions for multivariate time series. Default is 1 (univariate).
        normalization (str): Normalization method for the final time series.
            Options: "zscore" (standardization), "minmax" (scale to 0-1), or "none". Default is "zscore".
        random_state (int, optional): Seed for random number generation to ensure reproducibility.
        normalization_kwargs (dict, optional): Additional parameters for normalization methods.
            For "minmax": can specify "feature_range" as tuple (min, max).
        feature_fill_value: Value used for non-existent features. Default is np.nan.
            Using NaN makes features only appear where they're defined in visualizations.
        background_fill_value: Value used for background when none exists. Default is 0.0.
            Background typically affects the entire time series, so zeros represent
            "no contribution" rather than "doesn't exist".
        data_format (str): Format of the output tensor data.
            'channels_last': [batch, time_steps, channels] (original XAITimeSynth format)
            'channels_first': [batch, channels, time_steps] (PyTorch/tsai format)
            Default is 'channels_first'.

    Raises:
        ValueError: If n_dimensions is less than 1.
        ValueError: If data_format is not one of ['channels_first', 'channels_last'].
    """
    self.n_timesteps = n_timesteps
    self.n_samples = n_samples
    self.n_dimensions = n_dimensions

    # Validate n_dimensions
    if n_dimensions < 1:
        raise ValueError("n_dimensions must be at least 1")

    # Validate data_format
    if data_format not in ["channels_first", "channels_last"]:
        raise ValueError(
            "data_format must be one of ['channels_first', 'channels_last']"
        )
    self.data_format = data_format

    self.normalization = normalization
    self.normalization_kwargs = normalization_kwargs or {}
    self.random_state = random_state
    self.rng = np.random.RandomState(random_state)
    self.feature_fill_value = feature_fill_value
    self.background_fill_value = background_fill_value

    # Initialize class definitions and the current class
    self.class_definitions = []
    self.current_class = None
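The two `data_format` layouts differ only by a transpose of the last two axes, which is why `convert_data_format` applies the same `np.transpose(..., (0, 2, 1))` in both directions. A quick standalone check:

```python
import numpy as np

X_last = np.zeros((32, 100, 3))            # [batch, time_steps, channels]
X_first = np.transpose(X_last, (0, 2, 1))  # [batch, channels, time_steps]

print(X_first.shape)  # (32, 3, 100)
# The same axis swap converts back, so the operation is its own inverse
assert np.transpose(X_first, (0, 2, 1)).shape == X_last.shape
```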

for_class(class_label: int, weight: float = 1.0) -> TimeSeriesBuilder

Set the current class for component assignment.

Creates a new class definition and makes it the target for subsequent component additions. Multiple calls create multiple classes for classification tasks.

Parameters:

Name Type Description Default
class_label int

Integer label for the class, used as the target value.

required
weight float

Relative weight of this class in the dataset. Controls the class distribution in the generated dataset. Default is 1.0.

1.0

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Source code in xaitimesynth/builder.py
def for_class(self, class_label: int, weight: float = 1.0) -> "TimeSeriesBuilder":
    """Set the current class for component assignment.

    Creates a new class definition and makes it the target for subsequent component additions.
    Multiple calls create multiple classes for classification tasks.

    Args:
        class_label (int): Integer label for the class, used as the target value.
        weight (float): Relative weight of this class in the dataset. Controls the
            class distribution in the generated dataset. Default is 1.0.

    Returns:
        TimeSeriesBuilder: Self for method chaining.
    """
    # Create a new class definition
    class_def = {
        "label": class_label,
        "weight": weight,
        "components": {"background": [], "features": []},
    }

    self.class_definitions.append(class_def)
    self.current_class = class_def

    return self
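Class `weight` values control the class distribution of the generated dataset. A plausible sketch of how relative weights could map to per-class sample counts; the exact allocation inside `build()` is not shown in this excerpt, so proportional flooring with remainder distribution is an assumption here:

```python
import numpy as np

def class_counts(weights, n_samples):
    """Allocate n_samples across classes proportionally to their weights."""
    w = np.asarray(weights, dtype=float)
    counts = np.floor(w / w.sum() * n_samples).astype(int)
    # Hand out any remainder from flooring to the highest-weight classes
    for i in np.argsort(-w)[: n_samples - counts.sum()]:
        counts[i] += 1
    return counts

print(class_counts([1.0, 1.0, 2.0], 100))  # 25, 25, 50 samples
```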

add_signal(component: Dict[str, Any], dim: Optional[List[int]] = None, shared_randomness: bool = False, start_pct: Optional[float] = None, end_pct: Optional[float] = None, length_pct: Optional[float] = None, random_location: bool = False, shared_location: bool = True) -> TimeSeriesBuilder

Add a signal component to the current class.

Signals form the background structure of the time series (e.g., random walks, gaussian noise, trends). All signals are added to the background component.

Default behavior: When no location parameters are specified (start_pct, end_pct, length_pct all None and random_location=False), the signal spans the entire time series length.

Segment mode: To apply a signal to only part of the time series, either: - Specify start_pct and end_pct for a fixed segment, or - Set random_location=True with length_pct for a randomly positioned segment.

Parameters:

Name Type Description Default
component Dict[str, Any]

Component definition dictionary with 'type' and parameters.

required
dim List[int]

List of dimension indices where this signal should be applied. If None, the signal will be added to all dimensions. Default is None.

None
shared_randomness bool

If True, the same random pattern will be used across all specified dimensions. If False, each dimension gets its own random pattern (for stochastic components). Default is False.

False
start_pct float

Start position as percentage of time series length (0-1). Required together with end_pct for a fixed segment.

None
end_pct float

End position as percentage of time series length (0-1). Required together with start_pct for a fixed segment.

None
length_pct float

Length of signal as percentage of time series length (0-1). Required when random_location is True.

None
random_location bool

Whether to place the signal at a random location. Requires length_pct. Default is False.

False
shared_location bool

If True and random_location is True, the same random location will be used across all dimensions. If False, each dimension gets its own random location. Default is True.

True

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no class is selected or if location parameters are inconsistent.

Examples:

Full time series (default - no location params)

builder.add_signal(gaussian_noise(sigma=0.1))

Fixed segment from 20% to 50% of the series

builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

Random segment of 30% length

builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)

Source code in xaitimesynth/builder.py
def add_signal(
    self,
    component: Dict[str, Any],
    dim: Optional[List[int]] = None,
    shared_randomness: bool = False,
    start_pct: Optional[float] = None,
    end_pct: Optional[float] = None,
    length_pct: Optional[float] = None,
    random_location: bool = False,
    shared_location: bool = True,
) -> "TimeSeriesBuilder":
    """Add a signal component to the current class.

    Signals form the background structure of the time series (e.g., random walks,
    gaussian noise, trends). All signals are added to the background component.

    Default behavior: When no location parameters are specified (start_pct, end_pct,
    length_pct all None and random_location=False), the signal spans the entire time
    series length.

    Segment mode: To apply a signal to only part of the time series, either:
    - Specify start_pct and end_pct for a fixed segment, or
    - Set random_location=True with length_pct for a randomly positioned segment.

    Args:
        component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
        dim (List[int]): List of dimension indices where this signal should be applied.
            If None, the signal will be added to all dimensions. Default is None.
        shared_randomness (bool): If True, the same random pattern will be used across all
            specified dimensions. If False, each dimension gets its own random pattern
            (for stochastic components). Default is False.
        start_pct (float, optional): Start position as percentage of time series length (0-1).
            Required together with end_pct for a fixed segment.
        end_pct (float, optional): End position as percentage of time series length (0-1).
            Required together with start_pct for a fixed segment.
        length_pct (float, optional): Length of signal as percentage of time series length (0-1).
            Required when random_location is True.
        random_location (bool): Whether to place the signal at a random location.
            Requires length_pct. Default is False.
        shared_location (bool): If True and random_location is True, the same random
            location will be used across all dimensions. If False, each dimension gets
            its own random location. Default is True.

    Returns:
        TimeSeriesBuilder: Self for method chaining.

    Raises:
        ValueError: If no class is selected or if location parameters are inconsistent.

    Examples:
        # Full time series (default - no location params)
        builder.add_signal(gaussian_noise(sigma=0.1))

        # Fixed segment from 20% to 50% of the series
        builder.add_signal(constant(value=1.0), start_pct=0.2, end_pct=0.5)

        # Random segment of 30% length
        builder.add_signal(constant(value=1.0), random_location=True, length_pct=0.3)
    """
    if self.current_class is None:
        raise ValueError("No class selected. Call for_class() first.")

    if dim is None:
        dim = list(range(self.n_dimensions))
    self._validate_dimensions(dim)

    # Determine if this is a segment or full-series signal
    has_time_range = (
        start_pct is not None
        or end_pct is not None
        or length_pct is not None
        or random_location
    )

    # Validate location parameters based on mode
    if has_time_range:
        if random_location:
            if length_pct is None:
                raise ValueError(
                    "length_pct must be provided when random_location is True"
                )
            if not (0 < length_pct <= 1):
                raise ValueError("length_pct must be between 0 and 1")
        else:
            # Fixed segment mode - requires both start_pct and end_pct
            if start_pct is None or end_pct is None:
                raise ValueError(
                    "Both start_pct and end_pct must be provided for a fixed segment"
                )
            if not (
                0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct
            ):
                raise ValueError(
                    "Invalid start_pct or end_pct. Must be between 0 and 1, "
                    "with start_pct < end_pct"
                )

    # Build the component definition
    component_with_params = component.copy()

    if has_time_range:
        if random_location:
            component_with_params["random_location"] = True
            component_with_params["length_pct"] = length_pct
            component_with_params["shared_location"] = shared_location
        else:
            component_with_params["random_location"] = False
            component_with_params["start_pct"] = start_pct
            component_with_params["end_pct"] = end_pct

    # Add dimensions and randomness settings
    # Use single component when sharing location/randomness or single dimension
    if (
        (has_time_range and shared_location and random_location)
        or shared_randomness
        or len(dim) == 1
    ):
        component_with_params["dimensions"] = dim
        component_with_params["shared_randomness"] = shared_randomness
        component_with_params["shared_location"] = shared_location
        self.current_class["components"]["background"].append(component_with_params)
    else:
        # Create separate component entries for each dimension
        for d in dim:
            component_with_dim = component_with_params.copy()
            component_with_dim["dimensions"] = [d]
            component_with_dim["shared_randomness"] = shared_randomness
            component_with_dim["shared_location"] = shared_location
            self.current_class["components"]["background"].append(
                component_with_dim
            )

    return self
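A fixed segment is specified in relative terms. A sketch of how `start_pct`/`end_pct` might map to index ranges; the rounding convention is an assumption, since the generator's placement code is not shown in this excerpt:

```python
import numpy as np

def segment_indices(n_timesteps: int, start_pct: float, end_pct: float):
    """Map relative positions to a half-open index range [start, end)."""
    start = int(round(start_pct * n_timesteps))
    end = int(round(end_pct * n_timesteps))
    return start, end

# Place a constant signal on 20%-50% of a 100-step series
n = 100
start, end = segment_indices(n, 0.2, 0.5)
signal = np.zeros(n)
signal[start:end] = 1.0
print(start, end, signal.sum())  # 20 50 30.0
```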

add_feature(component: Dict[str, Any], start_pct: Optional[float] = None, end_pct: Optional[float] = None, length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None, random_location: bool = False, dim: Optional[List[int]] = None, shared_location: bool = True, shared_randomness: bool = False) -> TimeSeriesBuilder

Add a feature component to the current class.

Features are distinctive patterns that can differentiate between classes. They can be placed at fixed or random locations within the time series.

Parameters:

Name Type Description Default
component Dict[str, Any]

Component definition dictionary with 'type' and parameters.

required
start_pct float

Start position as percentage of time series length (0-1). Required when random_location is False.

None
end_pct float

End position as percentage of time series length (0-1). Required when random_location is False.

None
length_pct float | tuple | list

Length of feature as percentage of time series length. Required when random_location is True. Three forms accepted: - float: fixed length, e.g. 0.5 - tuple (min, max): sample uniformly per sample in range, e.g. (0.25, 0.75) - list of floats: sample from discrete choices per sample, e.g. [0.25, 0.5]

None
random_location bool

Whether to place the feature at a random location. Default is False (fixed position).

False
dim List[int]

List of dimension indices where this feature should be applied. If None, the feature will be added to all dimensions. Default is None.

None
shared_location bool

If True and random_location is True, the same random location will be used across all dimensions. If False, each dimension gets its own random location. Default is True.

True
shared_randomness bool

If True, the same random pattern will be used across all dimensions. If False, each dimension gets its own random pattern (for stochastic components). Default is False.

False

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

Self for method chaining.

Raises:

Type Description
ValueError

If no class is selected or if location parameters are invalid.

Source code in xaitimesynth/builder.py
def add_feature(
    self,
    component: Dict[str, Any],
    start_pct: Optional[float] = None,
    end_pct: Optional[float] = None,
    length_pct: Optional[Union[float, Tuple[float, float], List[float]]] = None,
    random_location: bool = False,
    dim: Optional[List[int]] = None,
    shared_location: bool = True,
    shared_randomness: bool = False,
) -> "TimeSeriesBuilder":
    """Add a feature component to the current class.

    Features are distinctive patterns that can differentiate between classes.
    They can be placed at fixed or random locations within the time series.

    Args:
        component (Dict[str, Any]): Component definition dictionary with 'type' and parameters.
        start_pct (float, optional): Start position as percentage of time series length (0-1).
            Required when random_location is False.
        end_pct (float, optional): End position as percentage of time series length (0-1).
            Required when random_location is False.
        length_pct (float | tuple | list, optional): Length of feature as percentage of time
            series length. Required when random_location is True. Three forms accepted:
            - float: fixed length, e.g. ``0.5``
            - tuple (min, max): sample uniformly per sample in range, e.g. ``(0.25, 0.75)``
            - list of floats: sample from discrete choices per sample, e.g. ``[0.25, 0.5]``
        random_location (bool): Whether to place the feature at a random location.
            Default is False (fixed position).
        dim (List[int]): List of dimension indices where this feature should be applied.
            If None, the feature will be added to all dimensions. Default is None.
        shared_location (bool): If True and random_location is True, the same random
            location will be used across all dimensions. If False, each dimension gets
            its own random location. Default is True.
        shared_randomness (bool): If True, the same random pattern will be used across
            all dimensions. If False, each dimension gets its own random pattern
            (for stochastic components). Default is False.

    Returns:
        TimeSeriesBuilder: Self for method chaining.

    Raises:
        ValueError: If no class is selected or if location parameters are invalid.
    """
    if self.current_class is None:
        raise ValueError("No class selected. Call for_class() first.")

    if dim is None:
        dim = list(range(self.n_dimensions))
    self._validate_dimensions(dim)

    # Create feature definition
    feature_def = component.copy()

    # Add location parameters
    if random_location:
        if length_pct is None:
            raise ValueError(
                "length_pct must be provided when random_location is True"
            )
        if isinstance(length_pct, tuple):
            if len(length_pct) != 2 or not (0 < length_pct[0] < length_pct[1] <= 1):
                raise ValueError(
                    "length_pct tuple must be (min, max) with 0 < min < max <= 1"
                )
        elif isinstance(length_pct, list):
            if not length_pct or not all(0 < v <= 1 for v in length_pct):
                raise ValueError(
                    "length_pct list must be non-empty with all values in (0, 1]"
                )
        else:
            if not (0 < length_pct <= 1):
                raise ValueError("length_pct must be in (0, 1]")

        feature_def["random_location"] = True
        feature_def["length_pct"] = length_pct
    else:
        if start_pct is None or end_pct is None:
            raise ValueError(
                "start_pct and end_pct must be provided when random_location is False"
            )
        if not (0 <= start_pct < 1 and 0 < end_pct <= 1 and start_pct < end_pct):
            raise ValueError(
                "Invalid start_pct or end_pct. Must be between 0 and 1, with start_pct < end_pct"
            )

        feature_def["random_location"] = False
        feature_def["start_pct"] = start_pct
        feature_def["end_pct"] = end_pct

    # Keep a single multi-dimension entry when the location/randomness is shared
    # (or only one dimension is targeted); otherwise split per dimension below
    if (shared_location and random_location) or shared_randomness or len(dim) == 1:
        feature_def["dimensions"] = dim
        feature_def["shared_location"] = shared_location
        feature_def["shared_randomness"] = shared_randomness
        self.current_class["components"]["features"].append(feature_def)
    else:
        # Create separate feature entries for each dimension when not sharing
        for d in dim:
            feature_single_dim = feature_def.copy()
            feature_single_dim["dimensions"] = [d]  # Single dimension
            feature_single_dim["shared_location"] = shared_location
            feature_single_dim["shared_randomness"] = shared_randomness
            self.current_class["components"]["features"].append(feature_single_dim)

    return self
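
The three accepted `length_pct` forms (fixed float, `(min, max)` tuple, list of discrete choices) can be resolved to a concrete fraction per sample as in this standalone sketch. `resolve_length_pct` is an illustrative helper, not the builder's internal `_resolve_length_pct`, and it uses the stdlib `random` module rather than the builder's NumPy RNG:

```python
import random

def resolve_length_pct(length_pct, rng: random.Random) -> float:
    """Resolve a length_pct spec to a concrete fraction in (0, 1]."""
    if isinstance(length_pct, tuple):        # (min, max): sample uniformly
        lo, hi = length_pct
        return rng.uniform(lo, hi)
    if isinstance(length_pct, list):         # discrete choices
        return rng.choice(length_pct)
    return float(length_pct)                 # fixed value

rng = random.Random(42)
print(resolve_length_pct(0.5, rng))          # fixed -> 0.5
print(resolve_length_pct((0.25, 0.75), rng)) # uniform draw in [0.25, 0.75]
print(resolve_length_pct([0.25, 0.5], rng))  # either 0.25 or 0.5
```

The resolved fraction is then multiplied by `n_timesteps` (with a minimum of one step) to get the feature length.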

build(return_components: bool = True, deterministic_class_counts: bool = True, shuffle: bool = True) -> Dict[str, Any]

Build the dataset based on the configured class definitions.

Generates time series data by combining all components for each class according to the specified parameters, with options to include component vectors and create a train/test split.

Parameters:

Name Type Description Default
return_components bool

Whether to return the individual component vectors. Useful for visualization and analysis. Default is True.

True
deterministic_class_counts bool

If True, class counts will be determined exactly by the weights rather than using multinomial sampling. This ensures exact class proportions. Default is True.

True
shuffle bool

Whether to shuffle the samples across classes. If True (default), samples will be randomly ordered. If False, samples will be grouped by class in the order classes were defined.

True

Returns:

Type Description
Dict[str, Any]

Dict[str, Any]: Dictionary containing the generated dataset with keys:

- 'X': Time series data with shape determined by data_format:
    - 'channels_last': [n_samples, n_timesteps, n_dimensions]
    - 'channels_first': [n_samples, n_dimensions, n_timesteps]
- 'y': Class labels for each sample
- 'feature_masks': Boolean masks showing feature locations
- 'metadata': Dataset configuration information
- 'components': Individual component vectors (if return_components=True)

If train_test_split is provided, also includes:

- 'X_train', 'y_train': Training data
- 'X_test', 'y_test': Testing data

Raises:

Type Description
ValueError

If no class definitions have been provided.
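
The `deterministic_class_counts` option floors each `weight * n_samples` and then hands the leftover samples to the classes with the largest fractional parts, so the counts sum to exactly `n_samples`. A plain-Python sketch of that rounding scheme (mirroring the logic in the source below, without NumPy):

```python
def deterministic_class_counts(weights, n_samples):
    """Floor weight*n_samples per class, then distribute the remainder to
    the classes with the largest fractional parts."""
    total = sum(weights)
    raw = [w / total * n_samples for w in weights]
    counts = [int(r) for r in raw]  # floor (weights are non-negative)
    remaining = n_samples - sum(counts)
    # class indices ordered by fractional part, largest last
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i])
    for i in (order[-remaining:] if remaining > 0 else []):
        counts[i] += 1
    return counts

print(deterministic_class_counts([1, 1, 1], 100))  # [33, 33, 34]
print(deterministic_class_counts([2, 3], 10))      # [4, 6]
```

With `deterministic_class_counts=False`, the counts are instead drawn from a multinomial distribution, so proportions only match the weights in expectation.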

Source code in xaitimesynth/builder.py
def build(
    self,
    return_components: bool = True,
    deterministic_class_counts: bool = True,
    shuffle: bool = True,
) -> Dict[str, Any]:
    """Build the dataset based on the configured class definitions.

    Generates time series data by combining all components for each class according
    to the specified parameters, with options to include component vectors and
    create a train/test split.

    Args:
        return_components (bool): Whether to return the individual component vectors.
            Useful for visualization and analysis. Default is True.
        deterministic_class_counts (bool): If True, class counts will be determined exactly
            by the weights rather than using multinomial sampling. This ensures exact class
            proportions. Default is True.
        shuffle (bool): Whether to shuffle the samples across classes. If True (default),
            samples will be randomly ordered. If False, samples will be grouped by class
            in the order classes were defined.

    Returns:
        Dict[str, Any]: Dictionary containing the generated dataset with keys:
            - 'X': Time series data with shape determined by data_format:
                   - 'channels_last': [n_samples, n_timesteps, n_dimensions]
                   - 'channels_first': [n_samples, n_dimensions, n_timesteps]
            - 'y': Class labels for each sample
            - 'feature_masks': Boolean masks showing feature locations
            - 'metadata': Dataset configuration information
            - 'components': Individual component vectors (if return_components=True)
            If train_test_split is provided, also includes:
            - 'X_train', 'y_train': Training data
            - 'X_test', 'y_test': Testing data

    Raises:
        ValueError: If no class definitions have been provided.
    """
    if not self.class_definitions:
        raise ValueError(
            "No class definitions provided. Call for_class() at least once."
        )

    # Normalize class weights and determine class distribution
    weights = np.array([cd["weight"] for cd in self.class_definitions])
    weights = weights / weights.sum()

    if deterministic_class_counts:
        # Deterministic class counts based on exact weights
        raw_counts = weights * self.n_samples
        # Floor to integers, then top up below so counts sum to exactly n_samples
        class_counts = np.floor(raw_counts).astype(int)
        remaining = self.n_samples - class_counts.sum()
        # Distribute remaining samples based on fractional parts
        if remaining > 0:
            fractions = raw_counts - class_counts
            indices = np.argsort(fractions)[-remaining:]
            for idx in indices:
                class_counts[idx] += 1
    else:
        # Probabilistic class counts using multinomial sampling
        class_counts = self.rng.multinomial(self.n_samples, weights)

    # Initialize arrays - always create in channels_last format first (internal format)
    X = np.zeros((self.n_samples, self.n_timesteps, self.n_dimensions))
    y = np.zeros(self.n_samples, dtype=int)
    all_components = []
    feature_masks = {}

    # Generate data for each class
    sample_idx = 0
    for class_def, count in zip(self.class_definitions, class_counts):
        class_label = class_def["label"]

        for _ in range(count):
            # Initialize arrays for this sample with appropriate fill values per dimension
            background = np.full(
                (self.n_timesteps, self.n_dimensions), self.background_fill_value
            )
            features_dict = {}
            feature_masks_dict = {}

            # Add base structure components
            for base_def in class_def["components"]["background"]:
                # For signals with time range parameters, generate random location once if shared
                if "random_location" in base_def and base_def["random_location"]:
                    # Determine signal length
                    length_pct = base_def["length_pct"]
                    signal_length = max(1, int(length_pct * self.n_timesteps))
                    max_start = self.n_timesteps - signal_length

                    # If shared_location is True, generate the location once for all dimensions
                    shared_location = base_def.get("shared_location", True)
                    if shared_location:
                        shared_start_idx = self.rng.randint(0, max_start + 1)
                        shared_end_idx = shared_start_idx + signal_length

                    # Apply to specified dimensions with appropriate location handling
                    for i, dim_idx in enumerate(base_def["dimensions"]):
                        # Create a full-length vector filled with the background fill value
                        base_vector = np.full(
                            self.n_timesteps, self.background_fill_value
                        )

                        # Determine signal location - possibly unique per dimension
                        if shared_location:
                            # Use the shared location for all dimensions
                            start_idx = shared_start_idx
                            end_idx = shared_end_idx
                        else:
                            # Create a unique location for each dimension
                            dim_rng = np.random.RandomState(
                                self.rng.randint(0, 2**32 - 1)
                            )
                            start_idx = dim_rng.randint(0, max_start + 1)
                            end_idx = start_idx + signal_length

                        # Calculate the actual length of the signal segment
                        signal_length = end_idx - start_idx

                        # Prepare parameters for component generation
                        signal_params = base_def.copy()
                        signal_type = signal_params.pop("type")

                        # Remove location and dimension parameters
                        signal_params.pop("random_location", None)
                        signal_params.pop("length_pct", None)
                        signal_params.pop("shared_location", None)
                        signal_params.pop("dimensions", None)
                        signal_params.pop("shared_randomness", None)

                        # Generate the component only for the specified length
                        signal_values = generate_component(
                            signal_type, signal_length, self.rng, **signal_params
                        )

                        # Place the signal in the correct location
                        base_vector[start_idx:end_idx] = signal_values

                        # Add to background for this dimension
                        background[:, dim_idx] = self._add_vector_handling_nans(
                            background[:, dim_idx], base_vector
                        )
                else:
                    # Handle non-random location signals (the original behavior)
                    if "random_location" in base_def:
                        # Fixed location signal
                        base_vector = np.full(
                            self.n_timesteps, self.background_fill_value
                        )

                        start_pct = base_def["start_pct"]
                        end_pct = base_def["end_pct"]
                        start_idx = int(start_pct * self.n_timesteps)
                        end_idx = int(end_pct * self.n_timesteps)

                        # Ensure at least one timestep is selected
                        if start_idx == end_idx:
                            end_idx = start_idx + 1

                        signal_length = end_idx - start_idx

                        # Generate the component only for the specified length
                        signal_params = base_def.copy()
                        signal_type = signal_params.pop("type")

                        # Remove location parameters
                        signal_params.pop("random_location", None)
                        signal_params.pop("start_pct", None)
                        signal_params.pop("end_pct", None)
                        signal_params.pop("dimensions", None)
                        signal_params.pop("shared_randomness", None)

                        signal_values = generate_component(
                            signal_type, signal_length, self.rng, **signal_params
                        )

                        base_vector[start_idx:end_idx] = signal_values
                    else:
                        # Full-length signal (original behavior)
                        base_vector = self._generate_component_vector(base_def)

                    # Apply to all specified dimensions with the same signal
                    for dim_idx in base_def["dimensions"]:
                        background[:, dim_idx] = self._add_vector_handling_nans(
                            background[:, dim_idx], base_vector
                        )

            # Initialize aggregated time series
            aggregated = background.copy()

            # Add features
            for feature_idx, feature_def in enumerate(
                class_def["components"]["features"]
            ):
                # For each dimension in the feature
                feature_dims = feature_def["dimensions"]

                # Generate a shared random location once if needed
                shared_location_cache = None
                if feature_def.get("random_location", False) and feature_def.get(
                    "shared_location", True
                ):
                    # Pre-calculate the shared location to ensure it's the same across dimensions
                    length_pct = self._resolve_length_pct(
                        feature_def["length_pct"], self.rng
                    )
                    feature_length = max(1, int(length_pct * self.n_timesteps))
                    max_start = self.n_timesteps - feature_length
                    shared_start_idx = self.rng.randint(0, max_start + 1)
                    shared_end_idx = shared_start_idx + feature_length
                    shared_location_cache = (shared_start_idx, shared_end_idx)

                for i, dim_idx in enumerate(feature_dims):
                    # Generate feature vector - if shared_location is True and we have a cached location,
                    # pass it; otherwise pass the dimension index for unique locations
                    dim_index = (
                        None
                        if feature_def.get("shared_location", True)
                        else dim_idx
                    )
                    feature, mask = self._generate_feature_vector(
                        feature_def, dim_index, shared_location_cache
                    )

                    # Add to aggregated series for this dimension
                    aggregated[:, dim_idx] = self._add_vector_handling_nans(
                        aggregated[:, dim_idx], feature
                    )

                    # Store components
                    feature_name = (
                        f"feature_{feature_idx}_{feature_def['type']}_dim{dim_idx}"
                    )
                    if feature_name not in features_dict:
                        features_dict[feature_name] = feature
                        feature_masks_dict[feature_name] = mask

                    # Add to global feature masks
                    feature_key = f"class_{class_label}_{feature_name}"
                    if feature_key not in feature_masks:
                        feature_masks[feature_key] = np.zeros(
                            (self.n_samples, self.n_timesteps), dtype=bool
                        )

                    feature_masks[feature_key][sample_idx] = mask

            # Normalize if required (apply to each dimension separately)
            for dim_idx in range(self.n_dimensions):
                aggregated[:, dim_idx] = normalize(
                    aggregated[:, dim_idx],
                    method=self.normalization,
                    **self.normalization_kwargs,
                )

            # Store the result
            X[sample_idx] = aggregated
            y[sample_idx] = class_label

            # Store components if needed
            if return_components:
                all_components.append(
                    TimeSeriesComponents(
                        background=background,
                        features=features_dict,
                        feature_masks=feature_masks_dict,
                        aggregated=aggregated,
                    )
                )

            sample_idx += 1

    # Shuffle the dataset if requested
    if shuffle:
        # Generate shuffled indices based on the random state
        indices = np.arange(self.n_samples)
        self.rng.shuffle(indices)

        # Shuffle X and y arrays
        X = X[indices]
        y = y[indices]

        # Shuffle components if they were returned
        if return_components:
            all_components = [all_components[i] for i in indices]

        # Shuffle feature masks
        for key in feature_masks:
            feature_masks[key] = feature_masks[key][indices]

    # Convert the tensor format if needed (from channels_last to channels_first)
    if self.data_format == "channels_first":
        # Transpose from [n_samples, n_timesteps, n_dimensions] to [n_samples, n_dimensions, n_timesteps]
        X = np.transpose(X, (0, 2, 1))

    # Prepare result dictionary
    result = {
        "X": X,
        "y": y,
        "feature_masks": feature_masks,
        "metadata": {
            "n_samples": self.n_samples,
            "n_timesteps": self.n_timesteps,
            "n_dimensions": self.n_dimensions,
            "class_definitions": self.class_definitions,
            "normalize": self.normalization,
            "normalization_kwargs": self.normalization_kwargs,
            "random_state": self.random_state,
            "data_format": self.data_format,
            "shuffled": shuffle,
        },
    }

    if return_components:
        result["components"] = all_components

    return result
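
When `shuffle=True`, a single permutation is applied to `X`, `y`, the components, and every feature mask so that samples stay aligned. A minimal list-based sketch of that step (`shuffle_aligned` is an illustrative helper, not part of the library):

```python
import random

def shuffle_aligned(X, y, masks, seed=0):
    """Shuffle with one shared permutation so each sample keeps its
    label and mask entries (mirrors the shuffle step in build())."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    X_s = [X[i] for i in indices]
    y_s = [y[i] for i in indices]
    masks_s = {k: [v[i] for i in indices] for k, v in masks.items()}
    return X_s, y_s, masks_s

X = [[0.1], [0.2], [0.3]]
y = [0, 1, 1]
masks = {"class_1_feat": [False, True, True]}
X_s, y_s, masks_s = shuffle_aligned(X, y, masks, seed=1)
# Every (sample, label, mask) triple survives the permutation intact.
```

The key point is that one index array drives every reordering; shuffling each array independently would break the ground-truth alignment that XAI evaluation depends on.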

clone(n_timesteps: Optional[int] = None, n_samples: Optional[int] = None, n_dimensions: Optional[int] = None, normalization: Optional[str] = None, random_state: Optional[int] = None, normalization_kwargs: Optional[Dict[str, Any]] = None, feature_fill_value: Optional[Any] = None, background_fill_value: Optional[Any] = None, data_format: Optional[str] = None) -> TimeSeriesBuilder

Create a new builder with the same class definitions but different parameters.

This method creates an independent copy of the builder with all its class definitions but allows overriding specific parameters. This is particularly useful for generating train/test/validation splits with the same underlying patterns but different sample counts or random seeds.

Parameters:

Name Type Description Default
n_timesteps Optional[int]

New length of each time series. Defaults to original value.

None
n_samples Optional[int]

New number of samples to generate. Defaults to original value.

None
n_dimensions Optional[int]

New number of dimensions. Defaults to original value.

None
normalization Optional[str]

New normalization method. Defaults to original value.

None
random_state Optional[int]

New random seed for reproducibility. Defaults to original value.

None
normalization_kwargs Optional[Dict[str, Any]]

New normalization parameters. Defaults to original value.

None
feature_fill_value Optional[Any]

New value for non-existent features. Defaults to original value.

None
background_fill_value Optional[Any]

New value for background. Defaults to original value.

None
data_format Optional[str]

New data format ('channels_first' or 'channels_last'). Defaults to original value.

None

Returns:

Name Type Description
TimeSeriesBuilder TimeSeriesBuilder

A new independent builder with copied class definitions and potentially updated parameters.

Example
# Create base builder with class definitions
base_builder = (
    TimeSeriesBuilder(n_timesteps=100, random_state=42)
    .for_class(0)
    .add_signal(random_walk(step_size=0.2))
    .for_class(1)
    .add_signal(random_walk(step_size=0.2))
    .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
)

# Generate train dataset with 140 samples
train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

# Generate test dataset with 60 samples and a different random seed
test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
Source code in xaitimesynth/builder.py
def clone(
    self,
    n_timesteps: Optional[int] = None,
    n_samples: Optional[int] = None,
    n_dimensions: Optional[int] = None,
    normalization: Optional[str] = None,
    random_state: Optional[int] = None,
    normalization_kwargs: Optional[Dict[str, Any]] = None,
    feature_fill_value: Optional[Any] = None,
    background_fill_value: Optional[Any] = None,
    data_format: Optional[str] = None,
) -> "TimeSeriesBuilder":
    """Create a new builder with the same class definitions but different parameters.

    This method creates an independent copy of the builder with all its class
    definitions but allows overriding specific parameters. This is particularly
    useful for generating train/test/validation splits with the same underlying
    patterns but different sample counts or random seeds.

    Args:
        n_timesteps: New length of each time series. Defaults to original value.
        n_samples: New number of samples to generate. Defaults to original value.
        n_dimensions: New number of dimensions. Defaults to original value.
        normalization: New normalization method. Defaults to original value.
        random_state: New random seed for reproducibility. Defaults to original value.
        normalization_kwargs: New normalization parameters. Defaults to original value.
        feature_fill_value: New value for non-existent features. Defaults to original value.
        background_fill_value: New value for background. Defaults to original value.
        data_format: New data format ('channels_first' or 'channels_last'). Defaults to original value.

    Returns:
        TimeSeriesBuilder: A new independent builder with copied class definitions
        and potentially updated parameters.

    Example:
        ```python
        # Create base builder with class definitions
        base_builder = (
            TimeSeriesBuilder(n_timesteps=100, random_state=42)
            .for_class(0)
            .add_signal(random_walk(step_size=0.2))
            .for_class(1)
            .add_signal(random_walk(step_size=0.2))
            .add_feature(constant(value=1.0), start_pct=0.4, end_pct=0.6)
        )

        # Generate train dataset with 140 samples
        train_dataset = base_builder.clone(n_samples=140, random_state=42).build()

        # Generate test dataset with 60 samples and a different random seed
        test_dataset = base_builder.clone(n_samples=60, random_state=43).build()
        ```
    """
    # Prepare parameters with defaults from current instance when not provided
    params = {
        "n_timesteps": n_timesteps if n_timesteps is not None else self.n_timesteps,
        "n_samples": n_samples if n_samples is not None else self.n_samples,
        "n_dimensions": n_dimensions
        if n_dimensions is not None
        else self.n_dimensions,
        "normalization": normalization
        if normalization is not None
        else self.normalization,
        "random_state": random_state
        if random_state is not None
        else self.random_state,
        "normalization_kwargs": (
            normalization_kwargs
            if normalization_kwargs is not None
            else copy.deepcopy(self.normalization_kwargs)
        ),
        "feature_fill_value": feature_fill_value
        if feature_fill_value is not None
        else self.feature_fill_value,
        "background_fill_value": background_fill_value
        if background_fill_value is not None
        else self.background_fill_value,
        "data_format": data_format if data_format is not None else self.data_format,
    }
    # Create new builder with updated parameters
    new_builder = TimeSeriesBuilder(**params)

    # Copy class definitions (deep copy to ensure complete independence)
    new_builder.class_definitions = copy.deepcopy(self.class_definitions)

    # Set current class if one was selected in the original builder
    if self.current_class is not None:
        # Find the class label of the current class
        for i, class_def in enumerate(self.class_definitions):
            if class_def is self.current_class:
                new_builder.current_class = new_builder.class_definitions[i]
                break

    return new_builder

to_df(dataset: Dict[str, Any], samples: Optional[List[int]] = None, classes: Optional[List[int]] = None, components: Optional[List[str]] = None, dimensions: Optional[List[int]] = None, format_classes: bool = False) -> pd.DataFrame

Convert time series dataset to a long-format pandas DataFrame.

Creates a DataFrame with one row per timestep per component per sample per dimension, suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

Parameters:

Name Type Description Default
dataset Dict[str, Any]

Dataset dictionary returned by build().

required
samples Optional[List[int]]

List of sample indices to include. If None, includes all samples.

None
classes Optional[List[int]]

List of class labels to include. If None, includes all classes.

None
components Optional[List[str]]

List of component types to include. Default includes all: ["aggregated", "background", "features"]

None
dimensions Optional[List[int]]

List of dimension indices to include. If None, includes all dimensions.

None
format_classes bool

If True, format class labels as "Class X". Otherwise use numeric labels. Default is False.

False

Returns:

Type Description
DataFrame

pd.DataFrame: Long-format DataFrame with columns:

- time: Timestep index
- value: Component value at that timestep
- class: Class label (formatted if format_classes=True)
- sample: Sample index
- component: Component type
- feature: Feature name (for feature components)
- dim: Dimension index

Raises:

Type Description
ValueError

If specified dimensions are out of range.
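
The long format described above (one row per timestep per sample per dimension) can be sketched without pandas. `to_long_rows` is a hypothetical helper that emits plain dicts with the documented columns, assuming channels_last data:

```python
def to_long_rows(X, y, component="aggregated"):
    """Flatten [n_samples][n_timesteps][n_dimensions] data into
    long-format rows with the columns documented for to_df()."""
    rows = []
    for s, series in enumerate(X):
        for t, values in enumerate(series):
            for d, value in enumerate(values):
                rows.append({
                    "time": t, "value": value, "class": y[s],
                    "sample": s, "component": component,
                    "feature": None, "dim": d,
                })
    return rows

# 2 samples, 3 timesteps, 1 dimension -> 6 rows
X = [[[0.0], [1.0], [2.0]], [[3.0], [4.0], [5.0]]]
rows = to_long_rows(X, y=[0, 1])
print(len(rows))  # 6
```

This row-per-observation shape is what makes the result directly usable with Seaborn's `hue`/`col` faceting or Plotly Express.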

Source code in xaitimesynth/builder.py
def to_df(
    self,
    dataset: Dict[str, Any],
    samples: Optional[List[int]] = None,
    classes: Optional[List[int]] = None,
    components: Optional[List[str]] = None,
    dimensions: Optional[List[int]] = None,
    format_classes: bool = False,
) -> pd.DataFrame:
    """Convert time series dataset to a long-format pandas DataFrame.

    Creates a DataFrame with one row per timestep per component per sample per dimension,
    suitable for detailed analysis and visualization with libraries like Seaborn or Plotly.

    Args:
        dataset (Dict[str, Any]): Dataset dictionary returned by build().
        samples (Optional[List[int]]): List of sample indices to include.
            If None, includes all samples.
        classes (Optional[List[int]]): List of class labels to include.
            If None, includes all classes.
        components (Optional[List[str]]): List of component types to include.
            Default includes all: ["aggregated", "background", "features"]
        dimensions (Optional[List[int]]): List of dimension indices to include.
            If None, includes all dimensions.
        format_classes (bool): If True, format class labels as "Class X".
            Otherwise use numeric labels. Default is False.

    Returns:
        pd.DataFrame: Long-format DataFrame with columns:
            - time: Timestep index
            - value: Component value at that timestep
            - class: Class label (formatted if format_classes=True)
            - sample: Sample index
            - component: Component type
            - feature: Feature name (for feature components)
            - dim: Dimension index

    Raises:
        ValueError: If specified dimensions are out of range.
    """
    # Default components to include (use programming-friendly names)
    default_components = ["aggregated", "background", "features"]
    components_to_include = (
        components if components is not None else default_components
    )

    # Get number of dimensions from metadata or infer from data shape
    n_dims = dataset.get("metadata", {}).get("n_dimensions", 1)
    if n_dims == 1 and len(dataset["X"].shape) == 3:
        n_dims = dataset["X"].shape[2]

    # Default dimensions to include
    if dimensions is None:
        dimensions = list(range(n_dims))
    else:
        # Validate dimensions
        for d in dimensions:
            if not 0 <= d < n_dims:
                raise ValueError(
                    f"Dimension {d} is out of range (0 to {n_dims - 1})."
                )

    # Filter by class if specified
    if classes is not None:
        class_indices = np.where(np.isin(dataset["y"], classes))[0]
    else:
        class_indices = np.arange(len(dataset["y"]))

    # Filter by sample if specified
    if samples is not None:
        sample_indices = np.array(samples)
        # Ensure sample indices are within class_indices
        sample_indices = np.intersect1d(sample_indices, class_indices)
    else:
        sample_indices = class_indices

    # Initialize list to hold DataFrames
    dfs = []

    # Process aggregated time series (formerly "Complete Series")
    if "aggregated" in components_to_include:
        # Get all selected samples at once
        X_selected = dataset["X"][sample_indices]
        n_samples = len(sample_indices)
        n_timesteps = X_selected.shape[1]

        # For each dimension
        for dim_idx in dimensions:
            # Create time indices for all samples
            times = np.arange(n_timesteps)

            # Create sample indices repeated for each timestep
            sample_idx_rep = np.repeat(sample_indices, n_timesteps)
            time_idx_rep = np.tile(times, n_samples)

            # Create values array for this dimension
            if len(X_selected.shape) == 3:  # Multivariate case
                values = X_selected[:, :, dim_idx].flatten()
            else:  # Univariate case (backward compatibility)
                values = X_selected.flatten()

            # Get class labels
            classes_rep = np.repeat(dataset["y"][sample_indices], n_timesteps)
            if format_classes:
                class_labels = np.array([f"Class {c}" for c in classes_rep])
            else:
                class_labels = classes_rep

            # Create DataFrame
            df_agg = pd.DataFrame(
                {
                    "time": time_idx_rep,
                    "value": values,
                    "class": class_labels,
                    "sample": sample_idx_rep,
                    "component": "aggregated",
                    "feature": None,
                    "dim": dim_idx,
                }
            )

            dfs.append(df_agg)

    # Process components if available
    if "components" in dataset:
        for component_name in ["background"]:
            if component_name in components_to_include:
                for dim_idx in dimensions:
                    comp_data = []
                    valid_samples = []

                    # Collect data from all samples
                    for i, idx in enumerate(sample_indices):
                        comp = dataset["components"][idx]
                        if (
                            hasattr(comp, component_name)
                            and getattr(comp, component_name) is not None
                        ):
                            comp_array = getattr(comp, component_name)
                            # Check if component has dimension data
                            if (
                                len(comp_array.shape) == 2
                                and comp_array.shape[1] > dim_idx
                            ):
                                comp_data.append(comp_array[:, dim_idx])
                                valid_samples.append(idx)
                            elif len(comp_array.shape) == 1 and dim_idx == 0:
                                # Backward compatibility - 1D array for univariate case
                                comp_data.append(comp_array)
                                valid_samples.append(idx)

                    if comp_data:
                        # Stack component data
                        comp_array = np.vstack(comp_data)
                        n_valid = len(valid_samples)
                        n_timesteps = comp_array.shape[1]

                        # Create indices
                        sample_idx_rep = np.repeat(valid_samples, n_timesteps)
                        time_idx_rep = np.tile(np.arange(n_timesteps), n_valid)

                        # Get class labels
                        classes_rep = np.repeat(
                            dataset["y"][valid_samples], n_timesteps
                        )
                        if format_classes:
                            class_labels = np.array(
                                [f"Class {c}" for c in classes_rep]
                            )
                        else:
                            class_labels = classes_rep

                        # Create DataFrame
                        df_comp = pd.DataFrame(
                            {
                                "time": time_idx_rep,
                                "value": comp_array.flatten(),
                                "class": class_labels,
                                "sample": sample_idx_rep,
                                "component": component_name,
                                "feature": None,
                                "dim": dim_idx,
                            }
                        )

                        dfs.append(df_comp)

        # Process features - features need special handling since they're stored in a dict
        if "features" in components_to_include:
            feature_dfs = []

            for idx in sample_indices:
                comp = dataset["components"][idx]
                if hasattr(comp, "features") and comp.features:
                    for feature_name, feature_values in comp.features.items():
                        # Extract dimension from feature name (if present)
                        if "_dim" in feature_name:
                            parts = feature_name.split("_dim")
                            dim_idx = int(parts[-1])
                            if dim_idx not in dimensions:
                                continue
                        else:
                            # For backward compatibility, assume dimension 0
                            dim_idx = 0
                            if dim_idx not in dimensions:
                                continue

                        # Get class label
                        class_label = dataset["y"][idx]
                        if format_classes:
                            class_str = f"Class {class_label}"
                        else:
                            class_str = class_label

                        # Create feature DataFrame
                        df_feature = pd.DataFrame(
                            {
                                "time": np.arange(len(feature_values)),
                                "value": feature_values,
                                "class": class_str,
                                "sample": idx,
                                "component": "features",
                                "feature": feature_name,
                                "dim": dim_idx,
                            }
                        )

                        feature_dfs.append(df_feature)

            if feature_dfs:
                dfs.append(pd.concat(feature_dfs, ignore_index=True))

    # Combine all DataFrames
    if not dfs:
        return pd.DataFrame()

    df = pd.concat(dfs, ignore_index=True)

    # Set up categorical variables for ordered plotting
    components_present = [
        c for c in components_to_include if c in df["component"].unique()
    ]
    df["component"] = pd.Categorical(
        df["component"], categories=components_present, ordered=True
    )

    if format_classes:
        class_labels = sorted(
            df["class"].unique(), key=lambda x: int(x.split()[-1])
        )
        df["class"] = pd.Categorical(
            df["class"], categories=class_labels, ordered=True
        )

    return df
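The method above returns a long-format DataFrame with one row per (sample, timestep) for each component and dimension. A minimal sketch of that schema, built from hypothetical data rather than the library itself:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 samples of 4 timesteps each, one dimension.
n_samples, n_timesteps = 2, 4

# One row per (sample, timestep), mirroring the columns produced above.
df = pd.DataFrame(
    {
        "time": np.tile(np.arange(n_timesteps), n_samples),
        "value": np.zeros(n_samples * n_timesteps),
        "class": np.repeat(["Class 0", "Class 1"], n_timesteps),
        "sample": np.repeat(np.arange(n_samples), n_timesteps),
        "component": "aggregated",
        "feature": None,
        "dim": 0,
    }
)

print(list(df.columns))
# → ['time', 'value', 'class', 'sample', 'component', 'feature', 'dim']
```

This long format is convenient for plotting libraries such as seaborn or plotly, which expect one observation per row.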

convert_data_format(dataset: Dict[str, Any], target_format: str) -> Dict[str, Any] staticmethod

Convert an existing dataset between 'channels_first' and 'channels_last' formats.

This utility function helps convert datasets between the two supported tensor layouts:

- 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
- 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

Parameters:

Name Type Description Default
dataset Dict[str, Any]

Dataset dictionary returned by build().

required
target_format str

Target format, either 'channels_first' or 'channels_last'.

required

Returns:

Type Description
Dict[str, Any]

Dict[str, Any]: Dataset with X tensor in the target format. The metadata is updated to reflect the new format.

Raises:

Type Description
ValueError

If target_format is not one of ['channels_first', 'channels_last'].

ValueError

If dataset doesn't contain a metadata entry with data_format.

Source code in xaitimesynth/builder.py
@staticmethod
def convert_data_format(
    dataset: Dict[str, Any], target_format: str
) -> Dict[str, Any]:
    """Convert an existing dataset between 'channels_first' and 'channels_last' formats.

    This utility function helps convert datasets between the two supported tensor layouts:
    - 'channels_last': [batch_size, time_steps, channels] (original XAITimeSynth format)
    - 'channels_first': [batch_size, channels, time_steps] (PyTorch/tsai format)

    Args:
        dataset (Dict[str, Any]): Dataset dictionary returned by build().
        target_format (str): Target format, either 'channels_first' or 'channels_last'.

    Returns:
        Dict[str, Any]: Dataset with X tensor in the target format. The metadata
            is updated to reflect the new format.

    Raises:
        ValueError: If target_format is not one of ['channels_first', 'channels_last'].
        ValueError: If dataset doesn't contain a metadata entry with data_format.
    """
    # Validate format
    if target_format not in ["channels_first", "channels_last"]:
        raise ValueError(
            "target_format must be one of ['channels_first', 'channels_last']"
        )

    # Create a shallow copy of the dataset
    result = dataset.copy()

    # Get current format from metadata
    if "metadata" not in dataset or "data_format" not in dataset["metadata"]:
        # Try to infer format
        if "X" in dataset and len(dataset["X"].shape) == 3:
            # Assume original format for backward compatibility
            current_format = "channels_last"
        else:
            raise ValueError("Dataset doesn't have format information in metadata")
    else:
        current_format = dataset["metadata"]["data_format"]

    # If already in target format, return dataset as-is
    if current_format == target_format:
        return result

    # Convert by swapping the last two axes. The (0, 2, 1) transpose is its
    # own inverse for 3-D arrays, so the same operation handles both
    # directions. Train/test splits are converted alongside the full tensor.
    for key in ("X", "X_train", "X_test"):
        if key in result:
            result[key] = np.transpose(result[key], (0, 2, 1))

    # Update metadata
    if "metadata" in result:
        result["metadata"] = result["metadata"].copy()
        result["metadata"]["data_format"] = target_format

    return result
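The conversion relies on a small but useful property: for a 3-D tensor, swapping the last two axes is its own inverse, so a single transpose covers both directions. A standalone sketch of this, using NumPy directly rather than the library:

```python
import numpy as np

# channels_last layout: [batch_size, time_steps, channels]
X = np.arange(24).reshape(2, 4, 3)

# Swap the last two axes to get channels_first: [batch_size, channels, time_steps]
X_cf = np.transpose(X, (0, 2, 1))
assert X_cf.shape == (2, 3, 4)

# Applying the same transpose again restores the original layout,
# which is why convert_data_format needs no separate code path per direction.
X_back = np.transpose(X_cf, (0, 2, 1))
assert np.array_equal(X_back, X)
```

Note that `np.transpose` returns a view where possible; if a framework requires contiguous memory (e.g. before `torch.from_numpy`), call `np.ascontiguousarray` on the result.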