Using lw_pipeline#
Quickstart#
Rough idea:
Write a config.py specifying the locations of steps, data, and other configurations.
Define the steps of a pipeline in separate .py files and store them in a folder.
Pass the config.py to the pipeline CLI, which will handle the execution order based on a naming convention.
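To illustrate what ordering by a naming convention can look like, here is a minimal sketch that sorts step files by a two-digit numeric prefix. The file names and the exact `NN_name.py` pattern are assumptions made for this example; the sketch does not use lw_pipeline and is not its actual discovery logic.

```python
import re

# Hypothetical step file names; the two-digit prefix is assumed to set the order.
step_files = ["10_filter.py", "00_load.py", "01_convert.py", "helpers.py"]


def ordered_steps(file_names):
    """Return the names that start with a two-digit prefix, sorted by that prefix."""
    pattern = re.compile(r"^(\d{2})_.*\.py$")
    numbered = []
    for name in file_names:
        match = pattern.match(name)
        if match:  # files without a numeric prefix are ignored
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


print(ordered_steps(step_files))
# ['00_load.py', '01_convert.py', '10_filter.py']
```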
Config#
The configuration class Config is used to manage the configuration settings for your pipeline.
If passed a path to a file, all variables defined in that file become available as attributes of the Config instance.
from lw_pipeline import Config
config = Config("some/example/path/config.py")
Note
If there is a “local” configuration file some/example/path/config_local.py it will be loaded after the main configuration
file automatically.
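The loading behaviour described above can be pictured with a small stand-in class. This is only a sketch under assumptions: `SketchConfig` is a hypothetical name, the variables `n_jobs` and `task` are made up, and the idea that values from the local file override those of the main file follows from the load order but is not confirmed here. It does not show how Config is actually implemented.

```python
import runpy
import tempfile
from pathlib import Path


class SketchConfig:
    """Illustrative stand-in, not the lw_pipeline Config implementation."""

    def __init__(self, path):
        path = Path(path)
        local = path.with_name(f"{path.stem}_local{path.suffix}")
        # Main file first, then the optional local file, so later values win.
        for candidate in (path, local):
            if candidate.exists():
                for name, value in runpy.run_path(str(candidate)).items():
                    if not name.startswith("_"):  # skip __name__, __file__, ...
                        setattr(self, name, value)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.py").write_text("n_jobs = 1\ntask = 'rest'\n")
    Path(tmp, "config_local.py").write_text("n_jobs = 4\n")
    config = SketchConfig(Path(tmp, "config.py"))

print(config.n_jobs, config.task)
# 4 rest
```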
Pipeline_Step#
Pipeline_Step is an abstract base class that you can subclass to define individual steps in your pipeline.
Each step should implement the step() method, which contains the processing logic for that step.
from lw_pipeline import Config, Pipeline_Step

class Example_Step(Pipeline_Step):
    def __init__(self):
        super().__init__("This is an example step.", Config())

    def step(self, data):
        return data * 2

example_step = Example_Step()
input_data = 5
output_data = example_step.step(input_data)
print(f"Input: {input_data}, Output: {output_data}")
In practice, a step is rarely instantiated directly, except for demonstration or testing. The CLI part of the pipeline aggregates all steps and handles running them.
If you choose to run the pipeline manually, the Pipeline class can
be used to run the pipeline with a list of steps.
from lw_pipeline import Pipeline, Pipeline_Step, Config

class Multiply_By(Pipeline_Step):
    def __init__(self, multiplier):
        super().__init__(f"This step multiplies by {multiplier}.", Config())
        self.multiplier = multiplier

    def step(self, data):
        return data * self.multiplier

step1 = Multiply_By(2)
step2 = Multiply_By(3)
Pipeline([step1, step2]).run(1)
The run() method accepts an initial input of data and
returns the final output after running through all steps.
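The chaining that run() is described as performing, where each step receives the previous step's output, can be sketched without the library. The helper `run_steps` and the two functions below are hypothetical stand-ins for this illustration, not part of lw_pipeline.

```python
from functools import reduce


def run_steps(steps, data):
    """Feed data through each callable in order and return the final result."""
    return reduce(lambda value, step: step(value), steps, data)


def double(x):
    return x * 2


def triple(x):
    return x * 3


print(run_steps([double, triple], 1))  # 1 -> 2 -> 6
```

With the same multipliers as in the Multiply_By example (2, then 3) and an input of 1, sequential chaining gives 6.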
Pipeline_Data#
The abstract Pipeline_Data class is used to manage the data that flows through the pipeline.
As of now, the pipeline comes with a Pipeline_MNE_BIDS_Data class that is used to handle MNE-BIDS data.
Output Management#
The pipeline provides flexible output management through the Output_Manager class and the @register_output decorator.
Registering Outputs
Use @register_output to declare outputs with automatic existence checking and default parameters:
from lw_pipeline import Pipeline_Step, register_output

class Analysis_Step(Pipeline_Step):
    @register_output(
        "expensive_plot",
        "Resource-intensive visualization",
        check_exists=True,  # Skip if file exists
        extension=".png",   # Default extension
        suffix="analysis"   # Default BIDS suffix
    )
    def create_plot(self):
        # Only runs if file doesn't exist and overwrite_mode allows
        data = expensive_computation()
        # extension and suffix automatically used from decorator
        self.output_manager.save_figure(data, "expensive_plot")
Benefits:
Prevents waste: Skip expensive computations when outputs already exist
DRY principle: Define path parameters once in decorator, not in every save call
CLI control: Enable/disable outputs via --outputs and --skip-outputs flags
Respects overwrite settings: Honors overwrite_mode configuration
See Output Management Guide for details.
We refer to the minimal example for a more detailed explanation.
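The "skip when the output already exists" idea behind check_exists can be illustrated with a plain decorator. This is a simplified sketch: `skip_if_exists` is a hypothetical name, it takes an explicit path instead of resolving one through an output manager, and it ignores overwrite_mode entirely. It is not how @register_output is implemented.

```python
import functools
import tempfile
from pathlib import Path


def skip_if_exists(path):
    """Hypothetical decorator: run the function only when `path` is missing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if Path(path).exists():
                return None  # output already present, skip the work
            return func(*args, **kwargs)
        return wrapper
    return decorator


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "expensive_plot.png")
    calls = []

    @skip_if_exists(target)
    def create_plot():
        calls.append("ran")
        target.write_bytes(b"placeholder")  # stands in for saving a figure

    create_plot()  # file missing: runs and writes it
    create_plot()  # file present: skipped

print(len(calls))
# 1
```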
Command line interface#
The package defines a lw_pipeline CLI accepting the following arguments:
-h, --help: Show the help message and exit.
-v, --version: Show the version of the pipeline and exit.
-r, --run: Run the pipeline.
steps: Positional argument. List of steps to run, separated by commas (only necessary to specify 00-99).
-c, --config: Path to the configuration file.
-l, --list: List all steps in the step directory.
--ignore-questions: Ignore questions, i.e., always respond with the default answer to a question.
--report (with MNE-BIDS data): Generate a report of the pipeline’s derivatives.
--store-report (with MNE-BIDS data): Store the report tables in .tsv files in the derivatives directory (e.g., pipeline_report_bids_dir.tsv, pipeline_report_deriv_dir.tsv).
--full-report (with MNE-BIDS data): Generate a full report (do not limit to subject, session, task specification in the config) of the pipeline’s derivatives.
--outputs: Comma-separated list of outputs to generate. Supports wildcards (e.g., ‘plot*’) and step-scoped syntax (e.g., ‘01:plot,02:*’). If not specified, all enabled outputs are generated (cf. output management).
--skip-outputs: Comma-separated list of outputs to skip. Supports wildcards (e.g., ‘plot*’) and step-scoped syntax (e.g., ‘01:plot,02:*’). Takes precedence over --outputs.
--list-outputs: List all registered outputs in the pipeline steps.
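To make the selection rules for --outputs and --skip-outputs more concrete, the following sketch models them with shell-style wildcards from Python's fnmatch module. The functions `matches` and `is_selected` are hypothetical, the step identifiers are assumed to be the two-digit prefixes, and the notion of "enabled" outputs is left out. The actual matching in lw_pipeline may differ in detail.

```python
from fnmatch import fnmatch


def matches(spec, step_id, output_name):
    """Check one (step, output) pair against a comma-separated pattern list."""
    for pattern in spec.split(","):
        pattern = pattern.strip()
        if not pattern:
            continue
        if ":" in pattern:
            # Step-scoped pattern such as "01:plot" or "02:*".
            step_part, name_part = pattern.split(":", 1)
            if step_part == step_id and fnmatch(output_name, name_part):
                return True
        elif fnmatch(output_name, pattern):
            return True
    return False


def is_selected(step_id, output_name, outputs=None, skip_outputs=None):
    """Skip patterns win over include patterns; no include list means everything."""
    if skip_outputs and matches(skip_outputs, step_id, output_name):
        return False
    if outputs is None:
        return True
    return matches(outputs, step_id, output_name)


print(is_selected("01", "plot", outputs="01:plot,02:*"))    # True
print(is_selected("02", "table", outputs="01:plot,02:*"))   # True
print(is_selected("03", "plot", outputs="01:plot,02:*"))    # False
print(is_selected("01", "plot_raw", outputs="plot*", skip_outputs="01:plot_raw"))  # False
```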