Data Structure¶
This notebook walks through the dictionary returned by TimeSeriesBuilder.build(). We build a small two-class dataset and inspect every key — printing shapes and values so you can verify the structure before plugging it into your own XAI workflow.
In [1]:
import numpy as np
from lets_plot import LetsPlot
from xaitimesynth import (
    TimeSeriesBuilder,
    gaussian_noise,
    gaussian_pulse,
    plot_components,
)

LetsPlot.setup_html()
In [2]:
dataset = (
    TimeSeriesBuilder(n_timesteps=100, n_samples=30, random_state=0)
    .for_class(0)
    .add_signal(gaussian_noise(sigma=0.5))
    .for_class(1)
    .add_signal(gaussian_noise(sigma=0.5))
    .add_feature(gaussian_pulse(amplitude=2.0), start_pct=0.3, end_pct=0.7)
    .build()
)

plot_components(dataset)
Out[2]:
Top-level keys¶
| Key | Contents |
|---|---|
| `X` | Time series data: `(n_samples, n_dims, n_timesteps)` — channels-first |
| `y` | Class labels: `(n_samples,)` |
| `feature_masks` | Ground truth boolean masks: `Dict[str, (n_samples, n_timesteps)]` |
| `components` | Per-sample breakdown: `List[TimeSeriesComponents]` |
| `metadata` | Configuration info: `Dict` |
In [3]:
print("X shape: ", dataset["X"].shape) # (n_samples, n_dims, n_timesteps)
print("y shape: ", dataset["y"].shape) # (n_samples,)
print("feature_masks keys: ", list(dataset["feature_masks"].keys()))
print("components length: ", len(dataset["components"])) # n_samples
print("metadata keys: ", list(dataset["metadata"].keys()))
X shape:  (30, 1, 100)
y shape:  (30,)
feature_masks keys:  ['class_1_feature_0_gaussian_pulse_dim0']
components length:  30
metadata keys:  ['n_samples', 'n_timesteps', 'n_dimensions', 'class_definitions', 'normalize', 'normalization_kwargs', 'random_state', 'data_format', 'shuffled']
feature_masks — Ground Truth¶
Dictionary of boolean arrays indicating where each feature is located. This is the ground truth used when evaluating XAI attributions.
Key format: class_{label}_feature_{idx}_{type}_dim{dim}
| Part | Meaning |
|---|---|
| `class_{label}` | Which class this feature belongs to |
| `feature_{idx}` | Feature index (order added via `add_feature()`) |
| `{type}` | Component type (e.g. `peak`, `gaussian_pulse`) |
| `dim{dim}` | Dimension index |
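The key format can also be split programmatically. Below is a minimal sketch; the `parse_mask_key` helper is hypothetical, not part of xaitimesynth. Note that `{type}` may itself contain underscores (e.g. `gaussian_pulse`), so a regex with a greedy middle group is safer than a plain `str.split("_")`:

```python
import re

# Hypothetical helper: parse class_{label}_feature_{idx}_{type}_dim{dim}.
# The greedy middle group absorbs underscores inside {type}.
KEY_PATTERN = re.compile(r"^class_(\d+)_feature_(\d+)_(.+)_dim(\d+)$")

def parse_mask_key(key: str) -> dict:
    m = KEY_PATTERN.match(key)
    if m is None:
        raise ValueError(f"Unrecognized mask key: {key!r}")
    label, idx, kind, dim = m.groups()
    return {"label": int(label), "feature_idx": int(idx), "type": kind, "dim": int(dim)}

print(parse_mask_key("class_1_feature_0_gaussian_pulse_dim0"))
# {'label': 1, 'feature_idx': 0, 'type': 'gaussian_pulse', 'dim': 0}
```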
In [4]:
mask = dataset["feature_masks"]["class_1_feature_0_gaussian_pulse_dim0"]
print("mask shape:", mask.shape) # (n_samples, n_timesteps)
print("mask dtype:", mask.dtype) # bool
# Find feature location for a specific sample
sample_idx = 5
feature_timesteps = np.where(mask[sample_idx])[0]
print(
    f"Sample {sample_idx}: feature at timesteps {feature_timesteps[0]}–{feature_timesteps[-1]}"
)
mask shape: (30, 100)
mask dtype: bool
Sample 5: feature at timesteps 30–69
In [5]:
# Combine masks for all features of a class (OR across features)
class_1_masks = [
    v for k, v in dataset["feature_masks"].items() if k.startswith("class_1")
]
combined = np.any(class_1_masks, axis=0) # shape: (n_samples, n_timesteps)
print("combined mask shape:", combined.shape)
combined mask shape: (30, 100)
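These ground-truth masks plug directly into attribution metrics. Here is a minimal numpy-only sketch of precision@k; the `attribution` array is random noise standing in for the output of a real XAI method, and `mask` mimics the `(30, 100)` mask shape above rather than coming from the dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: random "attribution" scores and a fixed ground-truth window.
# In practice, attribution comes from your XAI method and the mask from
# dataset["feature_masks"][key].
attribution = rng.random((30, 100))   # (n_samples, n_timesteps)
mask = np.zeros((30, 100), dtype=bool)
mask[:, 30:70] = True                 # feature occupies timesteps 30-69

# Precision@k: of the k highest-attributed timesteps per sample,
# what fraction lies inside the ground-truth region?
k = 40
top_k = np.argsort(attribution, axis=1)[:, -k:]  # indices of top-k timesteps
hits = np.take_along_axis(mask, top_k, axis=1)   # True where a top-k index is in-mask
precision_at_k = hits.mean(axis=1)               # one score per sample
print("mean precision@k:", round(float(precision_at_k.mean()), 3))
```

With random attributions this hovers around the mask's coverage (0.4 here); a useful attribution method should score well above that baseline.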
components — Per-Sample Breakdown¶
dataset["components"] is a list of TimeSeriesComponents objects, one per sample. Each object exposes the individual signal parts before they were combined.
| Attribute | Shape | Description |
|---|---|---|
| `background` | `(T, D)` | Base signal (signals added via `add_signal()`) |
| `aggregated` | `(T, D)` | Final combined signal — matches `X[i]` transposed |
| `features` | `Dict[str, (T,)]` | Feature values; NaN outside the feature region |
| `feature_masks` | `Dict[str, (T,)]` | Per-sample boolean masks (1D) |
In [6]:
comp = dataset["components"][sample_idx]
print("background shape: ", comp.background.shape) # (T, D)
print("aggregated shape: ", comp.aggregated.shape) # (T, D)
print("features keys: ", list(comp.features.keys()))
print("feature_masks keys:", list(comp.feature_masks.keys()))
background shape:  (100, 1)
aggregated shape:  (100, 1)
features keys:  ['feature_0_gaussian_pulse_dim0']
feature_masks keys: ['feature_0_gaussian_pulse_dim0']
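Because `comp.features` stores NaN outside the feature region, the window can be recovered from the values alone. A small sketch on a synthetic `(T,)` array; `np.hanning` stands in for the actual pulse shape and is not what xaitimesynth uses internally:

```python
import numpy as np

T = 100
feat = np.full(T, np.nan)            # stand-in for comp.features[key]
feat[30:70] = 2.0 * np.hanning(40)   # synthetic pulse inside the window

# ~np.isnan recovers the same boolean window as comp.feature_masks[key]
in_window = ~np.isnan(feat)
idx = np.where(in_window)[0]
print(f"window: {idx[0]}-{idx[-1]}, length {in_window.sum()}")
# window: 30-69, length 40
```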
In [7]:
# aggregated is (T, D); X[i] is (D, T) — they match after transposing
assert np.allclose(dataset["X"][sample_idx], comp.aggregated.T)
print("aggregated.T matches X[sample_idx]: True")
aggregated.T matches X[sample_idx]: True
metadata¶
In [8]:
for k, v in dataset["metadata"].items():
    print(f"{k}: {v}")
n_samples: 30
n_timesteps: 100
n_dimensions: 1
class_definitions: [{'label': 0, 'weight': 1.0, 'components': {'background': [{'type': 'gaussian_noise', 'mu': 0.0, 'sigma': 0.5, 'dimensions': [0], 'shared_randomness': False, 'shared_location': True}], 'features': []}}, {'label': 1, 'weight': 1.0, 'components': {'background': [{'type': 'gaussian_noise', 'mu': 0.0, 'sigma': 0.5, 'dimensions': [0], 'shared_randomness': False, 'shared_location': True}], 'features': [{'type': 'gaussian_pulse', 'amplitude': 2.0, 'width_ratio': 1, 'center': 0.5, 'random_location': False, 'start_pct': 0.3, 'end_pct': 0.7, 'dimensions': [0], 'shared_location': True, 'shared_randomness': False}]}}]
normalize: zscore
normalization_kwargs: {}
random_state: 0
data_format: channels_first
shuffled: True
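`class_definitions` records the full builder configuration, so it can be traversed to recover what went into each class. A sketch using an abridged, hand-copied version of the dict printed above (only the fields needed here):

```python
# Abridged copy of dataset["metadata"] for illustration; on the real
# dataset, iterate dataset["metadata"]["class_definitions"] directly.
metadata = {
    "class_definitions": [
        {"label": 0, "components": {"features": []}},
        {"label": 1, "components": {"features": [
            {"type": "gaussian_pulse", "amplitude": 2.0,
             "start_pct": 0.3, "end_pct": 0.7, "dimensions": [0]},
        ]}},
    ],
}

for cls in metadata["class_definitions"]:
    kinds = [f["type"] for f in cls["components"]["features"]]
    print(f"class {cls['label']}: features = {kinds}")
# class 0: features = []
# class 1: features = ['gaussian_pulse']
```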
Shape Summary¶
| Array | Shape | Notes |
|---|---|---|
| `dataset["X"]` | `(N, D, T)` | channels-first |
| `dataset["y"]` | `(N,)` | |
| `dataset["feature_masks"][key]` | `(N, T)` | boolean |
| `comp.background` | `(T, D)` | per-sample |
| `comp.aggregated` | `(T, D)` | per-sample; equals `X[i].T` |
| `comp.features[key]` | `(T,)` | NaN outside feature window |
| `comp.feature_masks[key]` | `(T,)` | boolean, per-sample |
Where: N = n_samples, D = n_dimensions, T = n_timesteps
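One practical consequence of the channels-first layout (confirmed by `data_format: channels_first` in the metadata): some frameworks, e.g. Keras recurrent layers, expect channels-last `(N, T, D)` instead. A numpy transpose converts between the two; the zeros array below is a stand-in for `dataset["X"]`:

```python
import numpy as np

X = np.zeros((30, 1, 100))                    # stand-in for dataset["X"], (N, D, T)
X_channels_last = np.transpose(X, (0, 2, 1))  # swap the D and T axes -> (N, T, D)
print(X_channels_last.shape)                  # (30, 100, 1)
```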
In [9]:
class_1_mask = dataset["y"] == 1
X_class_1 = dataset["X"][class_1_mask]
components_class_1 = [c for i, c in enumerate(dataset["components"]) if class_1_mask[i]]
print("X_class_1 shape:", X_class_1.shape)
X_class_1 shape: (15, 1, 100)
Access feature location from top-level masks¶
In [10]:
mask_key = "class_1_feature_0_gaussian_pulse_dim0"
m = dataset["feature_masks"][mask_key]
start = np.where(m[sample_idx])[0][0]
end = np.where(m[sample_idx])[0][-1]
print(f"Sample {sample_idx}: feature at timesteps {start}–{end}")
# Same result via per-sample components
comp_mask = dataset["components"][sample_idx].feature_masks[
    "feature_0_gaussian_pulse_dim0"
]
assert np.array_equal(m[sample_idx], comp_mask)
print("Top-level and per-sample masks agree: True")
Sample 5: feature at timesteps 30–69
Top-level and per-sample masks agree: True