2019-06-25
- Rule name
- Directives:
  - Input
  - Output
  - Shell, Script, or Run
  - Params
- Snakemake helper functions:
  - expand
  - temp
  - protected
  - directory
  - glob_wildcards
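To make the expand helper concrete, here is a minimal pure-Python sketch of its default behaviour: filling a pattern with every combination of the supplied wildcard values. The function name expand_sketch is hypothetical and this is not Snakemake's own implementation; the file pattern is only an illustration.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Illustrative pattern and years; adjust to the real project layout.
paths = expand_sketch("data/raw/yob{year}.txt", year=[1950, 1951, 1952])
print(paths)
```

In a Snakefile the real helper would be called the same way, for example as the input of a rule that needs one file per year.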
Access a rule's input, output, params, and other properties from R or Python scripts via the snakemake object.
concatenate_files.R
if (exists("snakemake")) {
  # Run by Snakemake: read paths and parameters from the snakemake object
  name_files <- snakemake@input[["data"]]
  output_file <- snakemake@output[[1]]
  use_all_data <- as.logical(snakemake@params[["use_all_data"]])
} else {
  # Run interactively: fall back to hard-coded defaults
  name_files <- list.files(path = "data/raw", pattern = "yob.*\\.txt$", full.names = TRUE)
  output_file <- "data/processed/all_names.csv"
  use_all_data <- FALSE
}
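The same fallback pattern works in a Python script. The sketch below mirrors the R code above; it assumes the rule names its input data and defines a use_all_data param, exactly as the R version does, and is not taken from the original project.

```python
from pathlib import Path

if "snakemake" in globals():
    # Run by Snakemake: the script directive injects a `snakemake` object.
    name_files = list(snakemake.input.data)
    output_file = snakemake.output[0]
    use_all_data = bool(snakemake.params.use_all_data)
else:
    # Run interactively: fall back to hard-coded defaults.
    name_files = sorted(str(p) for p in Path("data/raw").glob("yob*.txt"))
    output_file = "data/processed/all_names.csv"
    use_all_data = False
```

Keeping the fallback branch lets you develop and debug the script on its own before wiring it into a rule.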
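-n --dryrun: show what would run without executing anything
-j --cores: number of cores to use
-s --snakefile: path to the Snakefile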
snakemake --dag | dot -Tsvg > dag.svg
open dag.svg
snakemake --configfile config/config_alldata.yaml
snakemake --config start_year=1950
A default target rule (by convention the first rule, often named all) avoids specifying the target file(s) on the command line.
Snakemake deletes a temp() file once every rule that depends on it has finished running.
Snakemake normally expects every input and output to be a file, not a directory. To declare a directory as output (as in our checkpoint rule), wrap the path in directory().
Checkpoints are special rules that force Snakemake to re-evaluate the DAG after they run. Use a checkpoint when you don't know in advance how many output files a rule will produce.
Use the glob_wildcards function to create wildcards from files that exist. Glob is greedy and will try to match anything it can, even across different directories.
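To see why greedy matching crosses directories, here is a small sketch that approximates wildcard matching with a regular expression. It assumes a wildcard behaves like the regex .+ unless constrained; match_wildcard and the file paths are hypothetical, and this is not Snakemake's own code.

```python
import re


def match_wildcard(pattern, paths, constraint=".+"):
    """Return the value of the {name} wildcard for every path that matches."""
    prefix, _, suffix = pattern.partition("{name}")
    regex = re.compile(re.escape(prefix) + "(" + constraint + ")" + re.escape(suffix))
    values = []
    for path in paths:
        match = regex.fullmatch(path)
        if match:
            values.append(match.group(1))
    return values


# Hypothetical files: one directly in data/raw, one in a subdirectory.
paths = ["data/raw/yob1950.txt", "data/raw/old/yob1900.txt"]
greedy = match_wildcard("data/raw/{name}.txt", paths)
constrained = match_wildcard("data/raw/{name}.txt", paths, constraint="[^/]+")
print(greedy)       # the second value contains a slash
print(constrained)  # only the file directly in data/raw
```

In a Snakefile, a similar restriction is usually written as a wildcard constraint inside the braces, for example {name,[^/]+}.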
Refer to the input or output files of other rules (for example, rules.<name>.output) to avoid copy-pasting file paths.
With the benchmark directive, Snakemake records the wall time, memory usage, etc. as a tab-delimited text file.
Snakemake throws an error if you try to overwrite a protected file.
You can only delete a protected file with rm -f.
Two methods:
- Use only one node.
  code/submit_1node.pbs
- Use a profile to automatically submit multiple jobs.
  code/submit_multi.pbs
Edit config/cluster.json.
Set default parameters: email, walltime, processors, etc.
Override the default configuration on a rule-by-rule basis.
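The default-plus-override behaviour can be sketched in a few lines of Python. The JSON below is a hypothetical example of what config/cluster.json might contain: the __default__ key is the one Snakemake's cluster configuration uses, while the rule name, field names, and values are made up for illustration.

```python
import json

# Hypothetical contents of config/cluster.json (values are placeholders).
cluster_json = """
{
  "__default__": {"email": "user@example.com", "walltime": "01:00:00", "procs": 1},
  "concatenate_files": {"walltime": "04:00:00", "procs": 4}
}
"""


def settings_for(rule_name, cluster):
    """Start from the defaults, then apply any rule-specific overrides."""
    merged = dict(cluster.get("__default__", {}))
    merged.update(cluster.get(rule_name, {}))
    return merged


cluster = json.loads(cluster_json)
concat = settings_for("concatenate_files", cluster)
other = settings_for("some_other_rule", cluster)
print(concat)
print(other)
```

A rule with its own entry inherits every default it does not override, so only the settings that differ need to be listed per rule.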