Adding Custom Generators¶
This guide explains how to create custom time series components for use in the TimeSeriesBuilder API. There are two approaches: the manual() component for one-off patterns (see Custom data generation), and registering a proper reusable component, which is covered here.
For defining new custom data generators, the decorator approach below or the manual() function is the quickest and easiest route. For reusable generators integrated into the package, follow the two-function pattern: a generator function in generators.py plus a component function in components.py, then register the component in __init__.py.
Quick Extension with Decorators¶
For quick custom extensions or prototyping, you can use the @register_component_generator decorator. It collapses most of the steps described later in this guide into a single decorator:
# In generators.py or your own module
from typing import Optional

import numpy as np

from xaitimesynth.registry import register_component_generator


@register_component_generator(component_type="both")
def generate_sine_wave(
    n_timesteps: int,
    frequency: float = 0.1,
    amplitude: float = 1.0,
    phase: float = 0.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Generate a sine wave signal."""
    output_length = length if length is not None else n_timesteps
    t = np.arange(output_length)
    # frequency is in cycles per timestep: 0.1 means one cycle every 10 timesteps
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)
You can then use your registered component directly in the TimeSeriesBuilder. Pass the component as a dictionary with the registered type name, or define a small helper function for a cleaner call site:
from xaitimesynth import TimeSeriesBuilder, gaussian_noise

# Option 1: pass a dict directly
dataset = (
    TimeSeriesBuilder(n_timesteps=200, n_samples=50)
    .for_class(0)
    .add_signal({"type": "sine_wave", "frequency": 0.05, "amplitude": 1.5})
    .add_signal(gaussian_noise(sigma=0.1))
    .for_class(1)
    .add_signal(gaussian_noise(sigma=0.1))
    .add_feature({"type": "sine_wave", "frequency": 0.2, "amplitude": 2.0}, start_pct=0.3, end_pct=0.7)
    .build()
)

# Option 2: define a helper for a cleaner API (mirrors the two-function pattern)
def sine_wave(frequency=0.1, amplitude=1.0, phase=0.0, **kwargs):
    return {"type": "sine_wave", "frequency": frequency, "amplitude": amplitude, "phase": phase, **kwargs}

dataset = (
    TimeSeriesBuilder(n_timesteps=200, n_samples=50)
    .for_class(0)
    .add_signal(sine_wave(frequency=0.05, amplitude=1.5))
    .add_signal(gaussian_noise(sigma=0.1))
    .for_class(1)
    .add_signal(gaussian_noise(sigma=0.1))
    .add_feature(sine_wave(frequency=0.2, amplitude=2.0), start_pct=0.3, end_pct=0.7)
    .build()
)
Limitations of the decorator approach:
- Component function docstrings won't be visible to users
- Less control over the component function API
- Not recommended for stable package integration
Best for:
- Quick experiments
- User-defined custom generators
- Prototyping new components before full integration
The Two-Function Pattern¶
⚠️Note: You will likely only need this section if you plan to add a data-generating function to the package more permanently, either in a local copy or as a contribution. Otherwise it is probably more detail than you need; you can use the package productively without knowing its internals.
XAITimeSynth uses a two-function pattern: each component type has one component function and one generator function. This is needed internally because the TimeSeriesBuilder stores the dictionary definitions returned by the component functions and only later passes them to the matching generator functions to create the data.
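The hand-off between the two functions can be pictured with a minimal sketch. The `GENERATORS` table, the `build_component` helper, and the toy `constant` / `generate_constant` pair are hypothetical stand-ins and not the package's actual internals; they only illustrate how a dictionary definition might be resolved into an array at build time.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


def generate_constant(
    n_timesteps: int,
    value: float = 0.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Toy generator (hypothetical): a flat line at `value`."""
    output_length = length if length is not None else n_timesteps
    return np.full(output_length, value, dtype=float)


def constant(value: float = 0.0, **kwargs) -> Dict[str, Any]:
    """Toy component function (hypothetical): returns a definition, no data."""
    return {"type": "constant", "value": value, **kwargs}


# Hypothetical lookup table from type name to generator function.
GENERATORS: Dict[str, Callable[..., np.ndarray]] = {"constant": generate_constant}


def build_component(definition, n_timesteps, rng, length=None) -> np.ndarray:
    """Resolve a definition dict to its generator and call it (illustrative only)."""
    params = {k: v for k, v in definition.items() if k != "type"}
    generator = GENERATORS[definition["type"]]
    return generator(n_timesteps=n_timesteps, rng=rng, length=length, **params)


definition = constant(value=2.5)  # nothing is generated at this point
signal = build_component(definition, n_timesteps=5, rng=np.random.RandomState(0))
```

In this sketch the definition stays a plain dictionary until `build_component` runs, which is the reason the two roles are kept in separate functions.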
Component Functions (User-Facing)¶
- Location: components.py
- Purpose: Provide a clean, user-friendly API for defining components
- Signature: Takes only user-configurable parameters (no internal parameters such as n_timesteps, rng, or length)
- Returns: A dictionary with the component specification
Example:
from typing import Any, Dict


def random_walk(step_size: float = 0.1, **kwargs) -> Dict[str, Any]:
    """Create a definition for a random walk signal component.

    Args:
        step_size: Standard deviation of random steps. Defaults to 0.1.
        **kwargs: Additional parameters.

    Returns:
        Dict defining the 'random_walk' component with its parameters.
    """
    return {"type": "random_walk", "step_size": step_size, **kwargs}
Generator Functions (Internal)¶
- Location: generators.py
- Purpose: Actually create the numpy arrays with the time series data
- Signature: Always follows a standard pattern (see below)
- Returns: A 1D numpy array
Example:
from typing import Optional

import numpy as np


def generate_random_walk(
    n_timesteps: int,
    step_size: float = 0.1,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Generate a random walk time series.

    Args:
        n_timesteps: Nominal length of the time series context.
        step_size: Standard deviation of random steps. Defaults to 0.1.
        rng: Random number generator instance.
        length: Actual desired length (overrides n_timesteps if provided).
        **kwargs: Additional parameters for compatibility.

    Returns:
        A 1D numpy array of the specified length.
    """
    if rng is None:
        rng = np.random.RandomState()
    output_length = length if length is not None else n_timesteps
    # Draw independent Gaussian steps and accumulate them into a walk
    steps = rng.normal(0, step_size, output_length)
    return np.cumsum(steps)
Why Two Functions?¶
This separation provides several benefits:
- Clean API: Users don't see internal parameters like rng, n_timesteps, or length
- Flexibility: Component definitions can be created, stored, modified, and reused before generation
- Type Safety: Component functions can be registered as "signal", "feature", or "both"
- Documentation: Each function can have targeted documentation for its audience
- Direct Access: Advanced users can call generator functions directly if needed
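The flexibility and direct-access points can be illustrated with a short sketch. It redefines compact copies of the `random_walk` and `generate_random_walk` functions shown above so that it runs on its own; the `variants` list is just an illustrative name.

```python
from typing import Any, Dict, Optional

import numpy as np


def random_walk(step_size: float = 0.1, **kwargs) -> Dict[str, Any]:
    return {"type": "random_walk", "step_size": step_size, **kwargs}


def generate_random_walk(
    n_timesteps: int,
    step_size: float = 0.1,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    if rng is None:
        rng = np.random.RandomState()
    output_length = length if length is not None else n_timesteps
    return np.cumsum(rng.normal(0, step_size, output_length))


# Flexibility: definitions are plain dicts, so variants can be derived
# and stored without generating any data.
base = random_walk(step_size=0.1)
variants = [{**base, "step_size": s} for s in (0.1, 0.5, 1.0)]

# Direct access: call the generator yourself when you only need one array.
rng = np.random.RandomState(42)
walk = generate_random_walk(n_timesteps=50, step_size=0.5, rng=rng)
```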
Standard Parameters Explained¶
All generator functions must follow a standardized signature. Here's why each standard parameter exists:
Required Parameters (in order)¶
1. n_timesteps: int¶
- Purpose: The total length of the time series being generated
- Why it's needed:
- Generators may need to scale frequencies or patterns to fit the full series length
- Even when generating a partial feature (using length), knowing the full context helps maintain correct scaling
- Example: A pattern whose period is expressed as a fraction of the series length should keep that period whether it's generated as a full signal or as a shorter localized feature
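As a rough illustration of why the full-series context matters, consider a hypothetical `generate_ramp` generator (not part of the package) whose slope is defined as a total rise over `n_timesteps`. Because it divides by `n_timesteps` rather than by the output length, a short feature keeps the same per-step slope as the full-length signal.

```python
from typing import Optional

import numpy as np


def generate_ramp(
    n_timesteps: int,
    total_rise: float = 1.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Hypothetical generator: linear ramp scaled to the full series length."""
    output_length = length if length is not None else n_timesteps
    t = np.arange(output_length)
    # Scale by n_timesteps (the context), not by output_length (the window).
    return total_rise * t / n_timesteps


full = generate_ramp(n_timesteps=100, total_rise=2.0)
feature = generate_ramp(n_timesteps=100, total_rise=2.0, length=25)
```

Here `feature` matches the first 25 values of `full`; dividing by `output_length` instead would make the feature four times steeper.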
2. Standard Generator-Specific Parameters¶
- These are the parameters that control the generator's behavior
- Examples: step_size for random_walk, mu and sigma for gaussian_noise
- Placed after n_timesteps but before the standard optional parameters
3. rng: Optional[np.random.RandomState]¶
- Purpose: Provides reproducible randomness
- Why it's in every generator:
- Uniform API: All generators can be called the same way, making the internal dispatch simple
- Reproducibility: The builder can pass its RNG to all generators for reproducible datasets
- Future-proofing: Even deterministic generators can be extended with random variations later
- For deterministic generators: Simply ignore this parameter (but still include it in the signature)
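The reproducibility argument can be sketched as follows. The `generate_noise` and `generate_level` functions and the `build_sample` helper are hypothetical; the point is only that one seeded `np.random.RandomState` passed to every generator gives the same output on every run, while a deterministic generator can accept the same argument and ignore it.

```python
from typing import Optional

import numpy as np


def generate_noise(
    n_timesteps: int,
    sigma: float = 1.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Hypothetical random generator: draws from the shared RNG."""
    if rng is None:
        rng = np.random.RandomState()
    output_length = length if length is not None else n_timesteps
    return rng.normal(0.0, sigma, output_length)


def generate_level(
    n_timesteps: int,
    value: float = 1.0,
    rng: Optional[np.random.RandomState] = None,  # accepted but unused
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Hypothetical deterministic generator: same signature, ignores rng."""
    output_length = length if length is not None else n_timesteps
    return np.full(output_length, value, dtype=float)


def build_sample(seed: int):
    """Call every generator the same way with one shared RNG."""
    rng = np.random.RandomState(seed)
    noise = generate_noise(20, sigma=0.1, rng=rng)
    level = generate_level(20, value=1.0, rng=rng)
    more_noise = generate_noise(20, sigma=0.1, rng=rng)
    return noise, level, more_noise


first = build_sample(seed=7)
second = build_sample(seed=7)
```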
4. length: Optional[int]¶
- Purpose: Specifies the actual output length when different from n_timesteps
- Why it's needed:
- Signals vs Features: Signals span the full series; features are localized to a window
- Builder flexibility: The builder can request specific lengths for positioned features
- Standard logic: All generators use the same pattern: output_length = length if length is not None else n_timesteps
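A minimal sketch of how a positioned feature might use `length`. The window arithmetic based on `start_pct` and `end_pct` is an assumption about what a builder could do, not a description of the TimeSeriesBuilder internals, and `generate_level` is a hypothetical generator.

```python
from typing import Optional

import numpy as np


def generate_level(
    n_timesteps: int,
    value: float = 1.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Hypothetical generator: flat line at `value`."""
    output_length = length if length is not None else n_timesteps
    return np.full(output_length, value, dtype=float)


n_timesteps = 200
start_pct, end_pct = 0.3, 0.7

# Assumed window arithmetic: convert percentages to sample indices.
start = int(round(start_pct * n_timesteps))
end = int(round(end_pct * n_timesteps))

# Signal: length is omitted, so the output spans the full series.
series = generate_level(n_timesteps, value=0.0)

# Feature: length is the window size, n_timesteps stays the full context.
feature = generate_level(n_timesteps, value=2.0, length=end - start)
series[start:end] += feature
```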
5. **kwargs¶
- Purpose: Catches any extra parameters passed by the builder
- Why it's needed:
- Forward compatibility: New builder features won't break existing generators
- Flexibility: Users can pass custom parameters without breaking the API
- Tolerates extras: If a component definition has extra keys, they won't cause errors
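The effect of `**kwargs` is easy to see by comparing two hypothetical generators that differ only in whether they accept it. The parameter names `scale` and `label` are made up for the illustration.

```python
import numpy as np


def strict_generator(n_timesteps, scale=1.0, rng=None, length=None):
    """Hypothetical generator without **kwargs."""
    output_length = length if length is not None else n_timesteps
    return scale * np.ones(output_length)


def tolerant_generator(n_timesteps, scale=1.0, rng=None, length=None, **kwargs):
    """Same generator, but extra keys are accepted and ignored."""
    output_length = length if length is not None else n_timesteps
    return scale * np.ones(output_length)


# A definition carrying a key that the generator never reads.
params = {"scale": 2.0, "label": "extra key"}

tolerant = tolerant_generator(n_timesteps=4, **params)

try:
    strict_generator(n_timesteps=4, **params)
    strict_failed = False
except TypeError:
    # Python rejects the unexpected keyword argument 'label'.
    strict_failed = True
```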
Standard Return Type¶
All generators must return np.ndarray - a 1D numpy array of floats with length equal to output_length.
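When developing a generator, a small check along these lines can catch return-type mistakes early. `check_generator_output` is a hypothetical helper written for this guide, not a function provided by the package.

```python
from typing import List

import numpy as np


def check_generator_output(output, expected_length: int) -> List[str]:
    """Return a list of problems; an empty list means the output looks valid."""
    if not isinstance(output, np.ndarray):
        return ["output is not a numpy array"]
    problems = []
    if output.ndim != 1:
        problems.append(f"expected a 1D array, got {output.ndim} dimensions")
    elif len(output) != expected_length:
        problems.append(f"expected length {expected_length}, got {len(output)}")
    if not np.issubdtype(output.dtype, np.floating):
        problems.append(f"expected a float dtype, got {output.dtype}")
    return problems


ok = check_generator_output(np.zeros(10), expected_length=10)
wrong_shape = check_generator_output(np.zeros((2, 5)), expected_length=10)
wrong_dtype = check_generator_output(np.arange(10), expected_length=10)
```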
Adding a New Generator¶
Follow these steps to add a new generator to the package:
Step 1: Implement the Generator Function¶
Add your generator to generators.py. Follow this template:
def generate_your_component(
    n_timesteps: int,
    # Your custom parameters here (with defaults)
    param1: type = default_value,
    param2: type = default_value,
    # Standard optional parameters
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Generate your custom component.

    Brief description of what this generator creates.

    Args:
        n_timesteps: The nominal length of the time series context. The actual output
            length is determined by `length` parameter if provided, otherwise `n_timesteps`.
        param1: Description of parameter 1.
        param2: Description of parameter 2.
        rng: Random number generator instance. [If unused, add: "Included for API
            consistency but unused in this deterministic generator."] Defaults to None.
        length: The exact desired length of the output time series array.
            If provided, this overrides `n_timesteps`. If None, `n_timesteps` is used.
            Typically provided by the TimeSeriesBuilder. Defaults to None.
        **kwargs: Catches unused parameters passed by TimeSeriesBuilder for compatibility.

    Returns:
        np.ndarray: A 1D numpy array of the specified length containing [description].

    Example:
        >>> rng = np.random.RandomState(42)
        >>> generate_your_component(n_timesteps=10, param1=value1, rng=rng)
        array([...])
    """
    # Handle RNG if needed
    if rng is None:
        rng = np.random.RandomState()

    # Determine output length
    output_length = length if length is not None else n_timesteps

    # Your implementation here
    result = np.zeros(output_length)  # Or your actual logic
    # ... calculation logic ...
    return result
Key Requirements:
- Function name must start with generate_
- Parameters must be in the order: n_timesteps, custom params, rng, length, **kwargs
- Must return a 1D numpy array of length output_length
- Must handle both length=None (full signal) and length=N (partial feature) cases
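The requirements above can be checked mechanically with the standard `inspect` module. The `check_generator_signature` helper below is a hypothetical sketch for local use, not something the package ships; `make_noise` is a deliberately non-conforming function for comparison.

```python
import inspect
from typing import Callable, List

import numpy as np


def check_generator_signature(func: Callable) -> List[str]:
    """Return a list of signature problems; empty means the function conforms."""
    problems = []
    if not func.__name__.startswith("generate_"):
        problems.append("name does not start with 'generate_'")
    params = list(inspect.signature(func).parameters.values())
    names = [p.name for p in params]
    if not names or names[0] != "n_timesteps":
        problems.append("first parameter is not 'n_timesteps'")
    if "rng" not in names or "length" not in names or names.index("rng") > names.index("length"):
        problems.append("'rng' and 'length' must both be present, in that order")
    if not params or params[-1].kind is not inspect.Parameter.VAR_KEYWORD:
        problems.append("signature does not end with **kwargs")
    return problems


def generate_random_walk(n_timesteps, step_size=0.1, rng=None, length=None, **kwargs):
    if rng is None:
        rng = np.random.RandomState()
    output_length = length if length is not None else n_timesteps
    return np.cumsum(rng.normal(0, step_size, output_length))


def make_noise(length, sigma=1.0):
    """Deliberately non-conforming function."""
    return np.random.RandomState(0).normal(0, sigma, length)


good = check_generator_signature(generate_random_walk)
bad = check_generator_signature(make_noise)
```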
Step 2: Register in GENERATOR_FUNCS Dictionary¶
At the end of generators.py, add your generator to the lookup dictionary:
GENERATOR_FUNCS = {
    # ...existing generators...
    "your_component": generate_your_component,
}
The key string (e.g., "your_component") is the component type that will be used in component definitions.
Step 3: Create the Component Function¶
Add the user-facing function to components.py:
def your_component(param1: type = default_value, param2: type = default_value, **kwargs) -> Dict[str, Any]:
    """Create a definition for your custom component.

    Brief user-friendly description of what this component does and when to use it.

    Args:
        param1: Description of parameter 1 from a user perspective.
        param2: Description of parameter 2 from a user perspective.
        **kwargs: Additional parameters passed to the generator during build time.

    Returns:
        Dict[str, Any]: A dictionary defining the 'your_component' component with its parameters.

    Example:
        >>> comp = your_component(param1=value1, param2=value2)
        >>> comp['type']
        'your_component'
    """
    return {"type": "your_component", "param1": param1, "param2": param2, **kwargs}
Key Requirements:
- Function name should match the GENERATOR_FUNCS key (that is, the generator function's name without the generate_ prefix)
- Takes only user-configurable parameters (no n_timesteps, rng, or length)
- Returns a dictionary with at least a "type" key matching the generator name
- All parameters should be included in the returned dictionary
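A component function can be checked against its generator in the same spirit. `check_component` is a hypothetical helper that assumes every component parameter has a default value, as in the examples in this guide; `bad_component` is a deliberately wrong definition for comparison.

```python
import inspect
from typing import Callable, List


def check_component(component_func: Callable, generator_func: Callable) -> List[str]:
    """Compare a component function with its generator (illustrative only)."""
    problems = []
    definition = component_func()  # relies on every parameter having a default
    expected_type = generator_func.__name__[len("generate_"):]
    if definition.get("type") != expected_type:
        problems.append(f"'type' should be {expected_type!r}")
    sig = inspect.signature(component_func)
    exposed = {"n_timesteps", "rng", "length"} & set(sig.parameters)
    if exposed:
        problems.append(f"internal parameters exposed: {sorted(exposed)}")
    explicit = [
        name for name, p in sig.parameters.items()
        if p.kind is not inspect.Parameter.VAR_KEYWORD
    ]
    missing = [name for name in explicit if name not in definition]
    if missing:
        problems.append(f"parameters missing from the definition: {missing}")
    return problems


def generate_random_walk(n_timesteps, step_size=0.1, rng=None, length=None, **kwargs):
    raise NotImplementedError  # only the name is needed for this check


def random_walk(step_size=0.1, **kwargs):
    return {"type": "random_walk", "step_size": step_size, **kwargs}


def bad_component(step_size=0.1, rng=None):
    return {"type": "other"}


good = check_component(random_walk, generate_random_walk)
bad = check_component(bad_component, generate_random_walk)
```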
Step 4: Register and Export (for package integration)¶
If adding to the package itself (rather than user-side code), add to __init__.py:
from .components import your_component
register_component(your_component, "signal") # Or "feature" or "both"
# Add "your_component" to __all__
For user-side use, the decorator approach described at the top of this guide is simpler.
Complete Example: Sine Wave Generator¶
Here's a complete example showing how to add a sine wave generator:
In generators.py:¶
from typing import Optional

import numpy as np


def generate_sine_wave(
    n_timesteps: int,
    frequency: float = 0.1,
    amplitude: float = 1.0,
    phase: float = 0.0,
    rng: Optional[np.random.RandomState] = None,
    length: Optional[int] = None,
    **kwargs,
) -> np.ndarray:
    """Generate a sine wave time series.

    Creates a sinusoidal signal with specified frequency, amplitude, and phase.
    The frequency is expressed in cycles per timestep, so the cycle length is
    the same for a full signal and for a localized feature.

    Args:
        n_timesteps: The nominal length of the time series context.
        frequency: Frequency of the sine wave as a fraction of the sampling rate.
            Defaults to 0.1.
        amplitude: Peak amplitude of the sine wave. Defaults to 1.0.
        phase: Phase shift in radians. Defaults to 0.0.
        rng: Random number generator instance. Included for API consistency
            but unused in this deterministic generator. Defaults to None.
        length: The exact desired length of the output. If None, uses n_timesteps.
            Defaults to None.
        **kwargs: Catches unused parameters for compatibility.

    Returns:
        np.ndarray: A 1D array of the specified length containing a sine wave.

    Example:
        >>> import numpy as np
        >>> wave = generate_sine_wave(n_timesteps=100, frequency=0.1, amplitude=2.0)
        >>> wave.shape
        (100,)
        >>> bool(np.all(np.abs(wave) <= 2.0))
        True
    """
    # Determine output length
    output_length = length if length is not None else n_timesteps

    # Generate time indices
    t = np.arange(output_length)

    # Create sine wave (frequency is in cycles per timestep)
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)


# Add to GENERATOR_FUNCS dictionary
GENERATOR_FUNCS = {
    # ...existing generators...
    "sine_wave": generate_sine_wave,
}
In components.py:¶
def sine_wave(
    frequency: float = 0.1, amplitude: float = 1.0, phase: float = 0.0, **kwargs
) -> Dict[str, Any]:
    """Create a definition for a sine wave component.

    Generates a sinusoidal pattern useful for creating periodic signals or
    oscillating features. Can be used as both a full-length signal or a
    localized feature.

    Args:
        frequency: Frequency of the sine wave as a fraction of the sampling rate.
            Defaults to 0.1 (one cycle every 10 timesteps).
        amplitude: Peak amplitude of the sine wave. Defaults to 1.0.
        phase: Phase shift in radians. Use this to offset the starting point
            of the wave. Defaults to 0.0.
        **kwargs: Additional parameters passed to the generator during build time.

    Returns:
        Dict[str, Any]: A dictionary defining the 'sine_wave' component.

    Example:
        >>> from xaitimesynth import TimeSeriesBuilder, sine_wave
        >>> dataset = (
        ...     TimeSeriesBuilder(n_timesteps=100, n_samples=50)
        ...     .for_class(0)
        ...     .add_signal(sine_wave(frequency=0.05, amplitude=2.0))
        ...     .build()
        ... )
    """
    return {
        "type": "sine_wave",
        "frequency": frequency,
        "amplitude": amplitude,
        "phase": phase,
        **kwargs,
    }
Usage:¶
from xaitimesynth import TimeSeriesBuilder, sine_wave, gaussian_noise

# As a signal (full-length background)
dataset = (
    TimeSeriesBuilder(n_timesteps=200, n_samples=100)
    .for_class(0)
    .add_signal(sine_wave(frequency=0.05, amplitude=1.5))
    .add_signal(gaussian_noise(sigma=0.1))
    .build()
)

# As a feature (localized pattern)
dataset = (
    TimeSeriesBuilder(n_timesteps=200, n_samples=100)
    .for_class(0)
    .add_signal(gaussian_noise(sigma=0.5))
    .for_class(1)
    .add_signal(gaussian_noise(sigma=0.5))
    .add_feature(sine_wave(frequency=0.2, amplitude=2.0), start_pct=0.3, end_pct=0.7)
    .build()
)