Datasets
Ready-made dataset generators. Each function returns the standard xaitimesynth dictionary with ground-truth feature masks included.
generate_cylinder_bell_funnel(n_samples: int = 300, n_timesteps: int = 128, weights: Optional[List[float]] = None, random_state: Optional[int] = None, normalization: str = 'none', data_format: str = 'channels_first') -> Dict[str, Any]
Generate a Cylinder-Bell-Funnel (CBF) dataset with ground-truth feature masks.
Recreates the classic CBF time series benchmark (Saito, 2000) using
xaitimesynth's builder, so each sample comes with a boolean feature_mask
that marks the exact window where the class-discriminating pattern lives.
The three classes differ only inside a randomly placed window [a, b]:

    Cylinder (0): constant plateau of amplitude (6 + η)
    Bell (1):     linearly increasing ramp 0 → (6 + η)
    Funnel (2):   linearly decreasing ramp (6 + η) → 0

Outside [a, b] all classes share the same Gaussian noise background ε(t) ~ N(0,1). The amplitude noise η ~ N(0,1) is drawn fresh for every sample.
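
As a rough illustration of the three shapes described above, the following NumPy sketch builds one sample by hand. It does not use xaitimesynth's builder: the helper names `cbf_pattern` and `cbf_sample` are hypothetical, the window is treated as the half-open slice `a:b`, and adding the background noise inside the window as well as outside it is an assumption of this sketch.

```python
import numpy as np


def cbf_pattern(label: int, length: int, amplitude: float) -> np.ndarray:
    """Return the class-specific shape over a window of `length` timesteps."""
    if label == 0:
        # Cylinder: constant plateau at the sampled amplitude.
        return np.full(length, amplitude)
    ramp = np.linspace(0.0, amplitude, length)
    if label == 1:
        # Bell: linear ramp from 0 up to the amplitude.
        return ramp
    if label == 2:
        # Funnel: linear ramp from the amplitude down to 0.
        return ramp[::-1]
    raise ValueError(f"label must be 0, 1 or 2, got {label}")


def cbf_sample(label: int, a: int, b: int, n_timesteps: int = 128, rng=None):
    """Build one noisy series plus its boolean ground-truth mask (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.normal()                      # amplitude noise, fresh per sample
    series = rng.normal(size=n_timesteps)   # background noise epsilon(t)
    mask = np.zeros(n_timesteps, dtype=bool)
    mask[a:b] = True                        # assumption: half-open window a:b
    series[a:b] += cbf_pattern(label, b - a, 6.0 + eta)
    return series, mask


series, mask = cbf_sample(label=1, a=20, b=60, rng=np.random.default_rng(0))
```

The mask returned here plays the same role as the feature mask described above: it is True exactly where the class-specific shape was added.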
Approximation vs. original:
The original formulation draws a ~ Uniform[16, 32] (so the window never
starts before timestep 16) and b - a ~ Uniform[32, 96]. This implementation
samples the window length as a fraction of the series
(length_pct=(0.25, 0.75)), which corresponds to [32, 96] timesteps at the
default n_timesteps=128, and places the window at a fully random start
position, so it can begin at timestep 0. The length distribution is
faithful; the start distribution is wider. The difference is intentional:
for XAI benchmarking, what matters is that the ground-truth mask is exact,
not that the start distribution matches the original.
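
The difference between the two window distributions can be made concrete with a small NumPy sketch. The function names are hypothetical, integer sampling is a simplification, and the "fully random start" is modelled here as a uniform draw over all starts that keep the window inside the series. That last point is an assumption about the builder's behaviour, not something this page confirms.

```python
import numpy as np


def sample_window_original(rng):
    """Original CBF: a ~ U[16, 32] and (b - a) ~ U[32, 96], as integers."""
    a = int(rng.integers(16, 33))        # upper bound is exclusive
    length = int(rng.integers(32, 97))
    return a, a + length


def sample_window_random_start(rng, n_timesteps=128, length_pct=(0.25, 0.75)):
    """Length as a fraction of the series, start anywhere the window fits (assumed)."""
    lo = int(round(length_pct[0] * n_timesteps))
    hi = int(round(length_pct[1] * n_timesteps))
    length = int(rng.integers(lo, hi + 1))
    a = int(rng.integers(0, n_timesteps - length + 1))
    return a, a + length


rng = np.random.default_rng(0)
orig_starts = [sample_window_original(rng)[0] for _ in range(1000)]
new_starts = [sample_window_random_start(rng)[0] for _ in range(1000)]
print("earliest start, original:", min(orig_starts))
print("earliest start, random placement:", min(new_starts))
```

Both samplers produce window lengths in the same range; only the set of possible start positions differs.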
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_samples | int | Total number of time series to generate. Default 300. | 300 |
| n_timesteps | int | Length of each time series. Default 128. | 128 |
| weights | list of float | Sampling weight for each of the three classes | None |
| random_state | int | Seed for reproducibility. Default None. | None |
| normalization | str | Normalisation applied to each generated series. | 'none' |
| data_format | str | Output tensor layout. | 'channels_first' |
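
The example further down shows that the default 'channels_first' layout yields X with shape (n_samples, 1, n_timesteps). If downstream code expects the channel axis last, or a plain 2-D array, the data can be rearranged by hand. The sketch below uses a zero-filled stand-in array instead of a generated dataset and makes no claim about which other data_format values the function accepts.

```python
import numpy as np

# Stand-in for dataset["X"] in the default layout: (n_samples, channels, n_timesteps).
X_channels_first = np.zeros((90, 1, 128))

# Move the channel axis to the end: (n_samples, n_timesteps, channels).
X_channels_last = np.transpose(X_channels_first, (0, 2, 1))

# For a univariate series the channel axis can also be dropped entirely.
X_flat = X_channels_first[:, 0, :]

print(X_channels_last.shape, X_flat.shape)
```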
Returns:
| Name | Type | Description |
|---|---|---|
| dict | Dict[str, Any] | Standard xaitimesynth dataset dictionary. The example below reads the keys "X", "y", and "feature_masks". |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
References
Saito, N. (2000). Local feature extraction and its applications using a library of bases. Topics in Analysis and Its Applications: Selected Theses, 269–451. World Scientific.
Example

    >>> import numpy as np
    >>> from xaitimesynth.datasets import generate_cylinder_bell_funnel
    >>> dataset = generate_cylinder_bell_funnel(n_samples=90, random_state=42)
    >>> X, y = dataset["X"], dataset["y"]
    >>> X.shape
    (90, 1, 128)
    >>> np.bincount(y)
    array([30, 30, 30])
    >>> masks = dataset["feature_masks"]

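
Because the masks mark where the discriminating pattern lives, one possible use is scoring an attribution map against them. The sketch below shows one simple, hypothetical metric: the share of the top-k attributed timesteps that fall inside the mask. It runs on stand-in arrays and assumes each mask is boolean with the same shape as the attribution it is compared to, which this page does not confirm for dataset["feature_masks"].

```python
import numpy as np


def topk_mask_precision(attributions, mask, k=None):
    """Fraction of the k largest-magnitude attributions that lie inside the mask."""
    attributions = np.asarray(attributions, dtype=float).ravel()
    mask = np.asarray(mask, dtype=bool).ravel()
    if attributions.shape != mask.shape:
        raise ValueError("attributions and mask must have the same number of elements")
    # By default, look at as many timesteps as the mask marks as relevant.
    k = int(mask.sum()) if k is None else int(k)
    if k == 0:
        return 0.0
    top = np.argsort(-np.abs(attributions))[:k]
    return float(mask[top].mean())


# Stand-in ground truth: pattern occupies timesteps 40..79 of a 128-step series.
mask = np.zeros(128, dtype=bool)
mask[40:80] = True

perfect = mask.astype(float)       # attribution exactly on the window
shifted = np.roll(perfect, 20)     # attribution moved 20 steps to the right

score_perfect = topk_mask_precision(perfect, mask)
score_shifted = topk_mask_precision(shifted, mask)
```

Other metrics are equally reasonable; this one is chosen only because it needs nothing beyond the mask and an attribution array.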
Source code in xaitimesynth/datasets.py