See also
- @collate in the Ruffus Manual
- Use of add_inputs(...) | inputs(...) in the Ruffus Manual
- Decorators for more decorators
@collate( input, filter, replace_inputs | add_inputs, output, [extras,...] )
- Purpose:
Use filter to identify common sets of inputs which are to be grouped or collated together:
Each set of inputs which generates identical output and extras using the formatter or regex (regular expression) filters is collated into one job.
This variant of @collate allows additional inputs or dependencies to be added dynamically to the task, with optional string substitution.
add_inputs nests the original input parameters in a list before adding additional dependencies.
inputs replaces the original input parameters wholesale.
This is a many-to-fewer operation.
Only out of date jobs (comparing input and output files) will be re-run.
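The grouping described above can be sketched in plain Python. This is only an illustration of the idea, not Ruffus's implementation: collate_jobs and its replace flag are hypothetical names, the file names are those of the example below, and the sorted ordering of inputs and jobs is an assumption made to keep the result deterministic.

```python
import re
from collections import defaultdict


def collate_jobs(files, pattern, output_repl, extra_repl, replace=False):
    """Group files whose substituted output name is identical (hypothetical helper).

    replace=False mimics add_inputs: each original input is nested in a list
    together with its extra dependency.
    replace=True mimics inputs: only the substituted dependency is kept.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in sorted(files):
        if not regex.search(name):
            continue  # this sketch simply skips inputs that do not match
        output = regex.sub(output_repl, name)
        extra = regex.sub(extra_repl, name)
        entry = extra if replace else [name, extra]
        if entry not in groups[output]:
            groups[output].append(entry)
    # One (inputs, output) pair per job: many inputs, fewer jobs.
    return [(inputs, output) for output, inputs in sorted(groups.items())]


animal_files = ["tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"]
pattern = r".+\.(.+)$"

added = collate_jobs(animal_files, pattern, r"\1.summary", r"\1.date_of_birth")
replaced = collate_jobs(animal_files, pattern, r"\1.summary",
                        r"\1.date_of_birth", replace=True)

for inputs, output in added + replaced:
    print(inputs, "->", output)
```

Four input files collapse into two jobs in both modes, because the output name after substitution is the only thing that distinguishes one job from another in this sketch.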
Example of add_inputs
regex(r".+\.(.+)$") with the output r"\1.summary" creates a separate summary file for each suffix. We also add date of birth data for each species group:
    from ruffus import collate, regex, add_inputs

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), add_inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:
    summarize([ ["shark.fish",  "fish.date_of_birth"   ],
                ["tuna.fish",   "fish.date_of_birth"   ] ], "fish.summary")
    summarize([ ["cat.mammals", "mammals.date_of_birth"],
                ["dog.mammals", "mammals.date_of_birth"] ], "mammals.summary")

Example of inputs
Using inputs(...) summarises only the dates of birth for each species group:
    from ruffus import collate, regex, inputs

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:
    summarize(["fish.date_of_birth"   ], "fish.summary")
    summarize(["mammals.date_of_birth"], "mammals.summary")

Parameters:
- input = tasks_or_file_names
can be a:
- Task / list of tasks.
File names are taken from the output of the specified task(s)
- (Nested) list of file name strings (as in the example above).
- File names containing *[]? will be expanded as a glob.
E.g. "a.*" => "a.1", "a.2"
- filter = matching_regex
is a Python regular expression string, which must be wrapped in a regex indicator object. See the Python regular expression (re) documentation for details of regular expression syntax.
- filter = matching_formatter
is a formatter indicator object, optionally containing a Python regular expression (re).
- add_inputs = add_inputs(...) or replace_inputs = inputs(...)
Specifies the resulting input(s) to each job.
Positional parameters must be disambiguated by wrapping the values in inputs(...) or add_inputs(...).
Named parameters can be passed the values directly.
Takes:
- Task / list of tasks.
File names are taken from the output of the specified task(s)
- (Nested) list of file name strings.
Strings will be subject to substitution. File names containing *[]? will be expanded as a glob. E.g. "a.*" => "a.1", "a.2"
- output = output
Specifies the resulting output file name(s).
- extras = extras
Any extra parameters are passed verbatim to the task function.
If you are using named parameters, these can be passed as a list, i.e. extras=[...]
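The parameter list above notes that file names containing *[]? are expanded as a glob. A minimal sketch of that behaviour using only the standard library follows; expand_names is a hypothetical helper, the file names are created in a temporary directory purely for illustration, and Ruffus's own expansion may differ in ordering and in how it handles nested lists.

```python
import glob
import os
import tempfile

GLOB_CHARS = set("*[]?")


def expand_names(names):
    """Expand names containing glob characters; pass other names through unchanged."""
    expanded = []
    for name in names:
        if GLOB_CHARS.intersection(name):
            # Sorting is an assumption of this sketch, for a stable order.
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)
    return expanded


with tempfile.TemporaryDirectory() as tmp:
    # Create three empty files: a.1, a.2 and b.1
    for fname in ("a.1", "a.2", "b.1"):
        open(os.path.join(tmp, fname), "w").close()

    result = expand_names([os.path.join(tmp, "a.*"), os.path.join(tmp, "b.1")])
    basenames = [os.path.basename(path) for path in result]

print(basenames)
```

Here "a.*" expands to the two matching files, while "b.1" contains no glob characters and is passed through as given.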
See @collate for more straightforward ways to use collate.