SpacerNet is a tool designed to identify and visualize the shared spacer elements between the CRISPRs of multiple bacterial strains.
SpacerNet requires a combined multi-strain (or species) .GFF file containing CRISPR arrays annotated by CRISPRCasFinder.
This input file can be created by concatenating the .gff outputs generated by running CRISPRCasFinder on each bacterial genome file.
A usage guide and additional details for CRISPRCasFinder can be found here: https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index
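The concatenation step described above can be sketched in Python as follows. This is only an illustration: the function name concatenate_gffs and its arguments are hypothetical and are not part of SpacerNet or CRISPRCasFinder.

```python
from pathlib import Path


def concatenate_gffs(gff_paths, combined_path):
    """Write the contents of each per-genome .gff file, in order, into one file.

    Hypothetical helper for illustration; not part of SpacerNet.
    """
    with open(combined_path, "w") as out:
        for path in gff_paths:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
```

For example, concatenate_gffs(sorted(Path("ccf_results").glob("*.gff")), "combined.gff") would combine every .gff file in a (hypothetical) ccf_results directory.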
Each strain label in the .GFF file must be in the format speciesTag_strainID_int
- ex. ##gff-version 3; 2 CRISPR(s); strain: Pbon_393_1
NOTE: Depending on how CRISPRCasFinder parses the input genome file, these strain labels may need to be manually changed to reflect the above format.
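Before running SpacerNet, it may help to check which strain headers do not fit the speciesTag_strainID_int format. The sketch below is not SpacerNet's own parser. It assumes that the label follows "strain:" on a header line starting with "##", that speciesTag contains no underscore, and that the label ends in an underscore followed by an integer. All function names are hypothetical.

```python
import re

# Assumed shape of a label: speciesTag (no underscore), strainID, trailing integer.
LABEL_RE = re.compile(r"^(?P<species>[^_\s]+)_(?P<strain>\S+)_(?P<index>\d+)$")
# Assumed shape of a header: the label follows "strain:".
HEADER_RE = re.compile(r"strain:\s*(\S+)")


def parse_strain_label(header_line):
    """Return (speciesTag, strainID, int) or None if the label does not fit."""
    header = HEADER_RE.search(header_line)
    if header is None:
        return None
    label = LABEL_RE.match(header.group(1))
    if label is None:
        return None
    return label.group("species"), label.group("strain"), int(label.group("index"))


def find_bad_headers(lines):
    """List (line_number, line) for strain headers that need manual editing."""
    bad = []
    for number, line in enumerate(lines, start=1):
        is_header = line.startswith("##") and "strain:" in line
        if is_header and parse_strain_label(line) is None:
            bad.append((number, line.rstrip("\n")))
    return bad
```

Calling find_bad_headers on the lines of the combined file gives the headers that would need to be renamed by hand.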
Minimal command format:
python spacerNet.py input_file output_prefix
The output_prefix is the prefix used to generate the output file names (e.g. "paraburk" -> paraburk_spacers.fasta, etc.).
Optional Flags:
--min_spacers
- type: int
- usage: Specifies the minimum number of spacers required for an array to be considered valid and included in the BLAST/network.
- default: 1 (all detected spacers included)
--net_perc_id
- type: int
- usage: Percent identity threshold for spacers to be considered 'shared' in the network
- default: 95 (High sequence similarity required for spacers to be considered homologous)
--blast_perc_id
- type: int
- usage: Percent identity threshold for all-against-all spacer BLAST
- default: 50 (Highly permissive spacer hits)
--node_color_mode
- type: string
- options: 'species' or 'strain'
- default: 'species'
- usage: Specifies the coloring scheme for network nodes. 'species' mode colors nodes based on the speciesTag portion of the .gff strain header. 'strain' mode gives a unique node color to every unique speciesTag_strainID in the .gff strain headers.
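The two coloring modes amount to grouping strain labels by different parts of the label. The sketch below illustrates that grouping only. It is not SpacerNet's code, both function names are hypothetical, and it assumes speciesTag contains no underscore.

```python
def color_key(label, mode="species"):
    """Return the part of a speciesTag_strainID_int label that picks the color."""
    if mode == "species":
        # Everything before the first underscore: the speciesTag.
        return label.split("_", 1)[0]
    if mode == "strain":
        # Drop only the trailing _int: speciesTag_strainID.
        return label.rsplit("_", 1)[0]
    raise ValueError(f"unknown node_color_mode: {mode!r}")


def group_labels(labels, mode="species"):
    """Group labels that would share one node color under the given mode."""
    groups = {}
    for label in labels:
        groups.setdefault(color_key(label, mode), []).append(label)
    return groups
```

With the labels Pbon_393_1, Pbon_393_2 and Pbon_859_1, 'species' mode yields a single group (Pbon), while 'strain' mode yields two (Pbon_393 and Pbon_859).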
--min_edge_weight
- type: int
- usage: Specifies the minimum number of shared spacers needed to retain an edge
- default: 1 (all edges retained)
--remove_singletons
- usage: Including this flag will remove all isolated nodes (nodes without edges) after filtering.
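The combined effect of --min_edge_weight and --remove_singletons can be illustrated with plain dictionaries. This is a sketch under assumptions, not SpacerNet's implementation: the network is represented here as a dict mapping node pairs to shared-spacer counts, the threshold is treated as inclusive, and the function name filter_network is hypothetical.

```python
def filter_network(edge_weights, nodes, min_edge_weight=1, remove_singletons=False):
    """Drop light edges, then optionally drop nodes left without any edge.

    edge_weights: dict mapping (node_a, node_b) -> number of shared spacers.
    nodes: iterable of all node names, including isolated ones.
    """
    # Keep an edge only if it carries at least min_edge_weight shared spacers
    # (inclusive comparison is an assumption).
    kept = {pair: w for pair, w in edge_weights.items() if w >= min_edge_weight}
    nodes = list(nodes)
    if remove_singletons:
        # Singletons are judged after edge filtering, as the flag description says.
        connected = {node for pair in kept for node in pair}
        nodes = [node for node in nodes if node in connected]
    return kept, nodes
```

Note that a node can become a singleton only because its edges were filtered out, which is why the edge filter is applied first.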
--net_min_length
- type: int
- usage: Specifies the minimum length required for a spacer BLAST hit to be included in the spacer sharing network
- default: 20 (Minimum 20bp BLAST hit length)
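As an illustration of how the network thresholds act on a single BLAST hit, the sketch below applies --net_perc_id and --net_min_length to one line of BLAST tabular output. It assumes the standard 12-column tabular format (qseqid, sseqid, pident, length, ...) and inclusive thresholds. Neither assumption is confirmed by this README, and the function name is hypothetical.

```python
def passes_network_filter(blast_line, net_perc_id=95, net_min_length=20):
    """Decide whether one tabular BLAST hit would count toward the network.

    Assumes tab-separated columns: qseqid, sseqid, pident, length, ...
    """
    fields = blast_line.rstrip("\n").split("\t")
    percent_identity = float(fields[2])  # column 3: pident
    alignment_length = int(fields[3])    # column 4: length
    # Inclusive comparisons are an assumption of this sketch.
    return percent_identity >= net_perc_id and alignment_length >= net_min_length
```

Because the all-against-all BLAST runs at a lower identity (--blast_perc_id, default 50), the same BLAST results can be re-filtered at different network thresholds without repeating the search.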
--network_mode
- usage: Including this flag bypasses spacer extraction and BLAST and performs only the network creation. When this flag is used, pre-run BLAST results must be supplied as the input_file.
--legend_labels
- type: str
- usage: Allows for the use of a .csv file containing species/strain mappings to construct the network legend. The mapping file must be specified using the file path. The file must contain the columns prefix,label.
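A mapping file with the required prefix,label columns can be checked before use with a short script. The sketch below is an illustration only: load_legend_labels is a hypothetical helper, and the example row is made up.

```python
import csv
import io


def load_legend_labels(handle):
    """Read a prefix,label CSV into a dict, checking the required columns."""
    reader = csv.DictReader(handle)
    missing = {"prefix", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"mapping file is missing column(s): {sorted(missing)}")
    return {row["prefix"]: row["label"] for row in reader}


# Made-up example content; a real file would be opened with open(path, newline="").
example = io.StringIO("prefix,label\nPbon,Example species name\n")
legend = load_legend_labels(example)
```

Here legend maps the prefix Pbon to the text that would appear in the legend.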
--layout_k
- type: float
- usage: Specifies the k value for the network visualization. The k value determines the spacing between nodes (larger k (>1.5) -> more spacing/larger clusters; smaller k (<1.5) -> less spacing/tighter clusters)
- default: 2.0
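To show how the positional arguments and optional flags fit together, the sketch below assembles a command line from the options documented above. The helper build_command is hypothetical, the file names are placeholders, and nothing is executed. The resulting list could be passed to subprocess.run.

```python
import sys


def build_command(input_file, output_prefix, **options):
    """Assemble a spacerNet.py command line as a list of arguments.

    Options set to True are emitted as bare flags (e.g. --remove_singletons);
    other values are emitted as "--name value". False and None are skipped.
    """
    cmd = [sys.executable, "spacerNet.py", input_file, output_prefix]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])
    return cmd


# Placeholder file names; the flag values are arbitrary examples.
command = build_command(
    "combined.gff",
    "paraburk",
    min_spacers=3,
    node_color_mode="strain",
    remove_singletons=True,
)
```

The arguments after the interpreter path are equivalent to typing: python spacerNet.py combined.gff paraburk --min_spacers 3 --node_color_mode strain --remove_singletons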
- CRISPRCasFinder : an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018
- Aric A. Hagberg, Daniel A. Schult and Pieter J. Swart, “Exploring network structure, dynamics, and function using NetworkX”, in Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11–15, Aug 2008