ORIO (Online Resource for Integrative Omics) is an analysis platform for data from next-generation sequencing (NGS). ORIO enables rapid analysis and integration of NGS data sets. ORIO was designed based on three central observations:
- Diverse biological phenomena may be represented by discrete positions in genomic space, such as transcription factor binding sites (transcriptional regulation) or transcription start sites (transcription initiation).
- Despite the wide diversity of NGS experiments and data types, analysis of NGS data often involves consideration and manipulation of genomic read coverage.
- Visual inspection remains a critical component of analysis.
ORIO enables analysis of multiple NGS data sets with respect to a feature list of discrete genomic coordinates. The results of this analysis may be dynamically displayed using the tools in the companion `ORIO-web`_ framework.
An ORIO analysis run consists of two steps. First, the intersections between a feature list of genomic coordinates and a number of NGS data sets are computed. Second, the NGS data sets are correlated based on these intersection values. The output of these steps may be dynamically visualized using `ORIO-web`_.
matrix.py finds the intersection of a feature list of genomic coordinates and an NGS data set. This intersection describes the read coverage from the NGS data set across genomic windows anchored on feature list positions.
python matrix.py [OPTIONS] BIGWIGS FEATURE_BED OUTPUT_MATRIX
BIGWIGS
matrix.py requires read coverage for an NGS data set in bigWig format, and calls bigWigAverageOverBed from the UCSC kentUtils package. If the --stranded_bigwigs flag is set, two bigWig files are required: the first must contain read coverage on the ‘plus’ strand, and the second must contain read coverage on the ‘minus’ strand. If the --stranded_bigwigs flag is not used, a single bigWig file is required.
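For illustration, bigWigAverageOverBed writes one tab-separated line per BED entry with the columns name, size, covered, sum, mean0, and mean; the sum column is the kind of per-bin value that ends up in the output matrix. The sketch below reads such output into a dictionary of sums. It assumes that standard column layout and is not taken from matrix.py; the helper name read_bwaob_sums and the entry names such as geneA_bin0 are hypothetical, and the example lines are made up.

```python
from typing import Dict, Iterable

def read_bwaob_sums(lines: Iterable[str]) -> Dict[str, float]:
    """Map each BED entry name to its 'sum' column (hypothetical helper).

    Assumes the standard bigWigAverageOverBed output columns:
    name, size, covered, sum, mean0, mean.
    """
    sums: Dict[str, float] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        sums[fields[0]] = float(fields[3])  # column 4 holds 'sum'
    return sums

# Two made-up output lines (not real data): mean0 = sum/size, mean = sum/covered.
example = [
    "geneA_bin0\t100\t80\t240.0\t2.4\t3.0\n",
    "geneA_bin1\t100\t100\t50.0\t0.5\t0.5\n",
]
print(read_bwaob_sums(example))  # {'geneA_bin0': 240.0, 'geneA_bin1': 50.0}
```

For a real run of `bigWigAverageOverBed in.bw in.bed out.tab`, the helper could be applied to the lines of out.tab. bigWigAverageOverBed expects the name field of each BED entry to be unique, which is what makes a name-keyed dictionary workable here.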
FEATURE_BED
Genomic windows are generated around entries in a standard BED file. If
the --stranded_bed flag is used, the BED file must have at least six
columns per entry, because strand is given in the sixth BED column.
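The six-column requirement can be made concrete with a small parser. This is a minimal sketch, assuming tab-delimited BED lines; the BedEntry type, the parse_bed_line name, and the coordinate-based fallback name for entries without a fourth column are hypothetical choices, not ORIO's code.

```python
from typing import NamedTuple, Optional

class BedEntry(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    strand: Optional[str]

def parse_bed_line(line: str, stranded: bool = False) -> BedEntry:
    """Parse one BED line; a stranded analysis needs at least six columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED entries need at least chrom, start, and end")
    if stranded and len(fields) < 6:
        raise ValueError("stranded BED entries need at least six columns")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Column 4 is the name; fall back to the coordinates if it is absent.
    name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
    strand = fields[5] if stranded else None  # column 6 holds '+' or '-'
    return BedEntry(chrom, start, end, name, strand)
```

For example, `parse_bed_line("chr1\t1000\t2000\tgeneA\t0\t-", stranded=True)` yields an entry on the minus strand, while the same call on a four-column line raises ValueError.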
-a, --anchor [start | end | center]
-b, --bin_start INTEGER (Default: -2500)
-n, --bin_number INTEGER (Default: 50)
-s, --bin_size INTEGER (Default: 100)
anchor, bin_start, bin_number, and bin_size specify the genomic window created around each genomic feature in the input BED file.
anchor sets the anchor point for each BED range at the range start, end, or center, respecting entry strand (‘start’ is taken as the highest value of the range for entries on the ‘minus’ strand).
bin_start specifies where the genomic window starts relative to the anchor point; negative values place the start upstream of the anchor.
bin_number specifies the number of bins in the genomic window. These bins are the units in which coverage is reported in the output matrix files.
bin_size specifies the size of each bin in base pairs. Currently, all bins must be the same size.
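How these four parameters could combine into bins is sketched below. It is a minimal illustration, not matrix.py's implementation: the helper names are hypothetical, the BED end coordinate is used as the "highest value" for a minus-strand 'start' anchor, and the minus-strand 'end' anchor is assumed to mirror that rule. Minus-strand offsets are mirrored so that larger offsets point downstream.

```python
from typing import List, Tuple

def anchor_point(start: int, end: int, strand: str, anchor: str) -> int:
    """Return the anchor coordinate of one BED range (hypothetical helper)."""
    minus = strand == "-"
    if anchor == "center":
        return (start + end) // 2
    if anchor == "start":
        # On the minus strand, 'start' is the highest value of the range;
        # the BED end coordinate is used for it here (an assumption).
        return end if minus else start
    if anchor == "end":
        # Assumed mirror image of 'start': the lowest value on the minus strand.
        return start if minus else end
    raise ValueError(f"unknown anchor: {anchor!r}")

def window_bins(anchor_pos: int, strand: str, bin_start: int = -2500,
                bin_number: int = 50, bin_size: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) genomic intervals of each bin, most upstream bin first."""
    bins = []
    for i in range(bin_number):
        rel_start = bin_start + i * bin_size  # bin offset from the anchor
        rel_end = rel_start + bin_size
        if strand == "-":
            # Mirror offsets so that positive values point downstream (lower coordinates).
            bins.append((anchor_pos - rel_end, anchor_pos - rel_start))
        else:
            bins.append((anchor_pos + rel_start, anchor_pos + rel_end))
    return bins

plus_bins = window_bins(anchor_point(10000, 12000, "+", "start"), "+")
print(plus_bins[0], plus_bins[-1])  # (7500, 7600) (12400, 12500)
```

With the defaults (-2500, 50 bins of 100 bp), each window spans 5,000 bp, from 2,500 bp upstream to 2,500 bp downstream of the anchor. Bins that would run past a chromosome end are not clipped in this sketch.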
--opposite_strand_fn FILENAME
If specified, read coverage on the strand opposite each BED entry is
used to generate a separate opposite-strand matrix file, FILENAME.
This file uses the same bins and windows as the ‘same-strand’
OUTPUT_MATRIX.
--stranded_bigwigs
If specified, strand-specific read coverage bigWigs are expected.
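With strand-specific bigWigs, each feature's same-strand coverage comes from the bigWig matching its strand, while the opposite-strand matrix written via --opposite_strand_fn draws on the other one. A minimal sketch of that selection, assuming the plus-then-minus file order described under BIGWIGS; the function name strand_bigwigs is hypothetical.

```python
from typing import Tuple

def strand_bigwigs(strand: str, plus_bw: str, minus_bw: str) -> Tuple[str, str]:
    """Return (same_strand_bigwig, opposite_strand_bigwig) for one feature."""
    if strand == "+":
        return plus_bw, minus_bw
    if strand == "-":
        return minus_bw, plus_bw
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

For a minus-strand feature, `strand_bigwigs("-", "sample.plus.bw", "sample.minus.bw")` returns the minus-strand file first; the file names are placeholders.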
--stranded_bed
If specified, a stranded BED file is expected, and the output matrix file is generated with respect to the strand of each entry. In these stranded matrix files, positions farther downstream are represented by higher positive values.
OUTPUT_MATRIX
The output matrix file reports read coverage from the input BIGWIGS file(s) over the features in the FEATURE_BED file. Coverage is reported in bins set by the user-defined window parameters. The first line of OUTPUT_MATRIX is a header describing the bins; the positions given for each bin are relative to the anchor point set by -a, --anchor. Each subsequent row of the matrix corresponds to an individual feature in FEATURE_BED. Read coverage values are reported as sums within each bin. In a strand-specific analysis, read coverage on the opposite strand may be obtained using --opposite_strand_fn FILENAME.
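The matrix layout described above (a bin header followed by one row of per-bin sums per feature) can be sketched as follows. The exact file format of OUTPUT_MATRIX is not specified here, so the tab delimiter, the leading "feature" column, and the "start:end" bin labels are all assumptions made for illustration; the helper names are hypothetical.

```python
from typing import Dict, List

def bin_labels(bin_start: int, bin_number: int, bin_size: int) -> List[str]:
    """Label each bin by its start:end offset relative to the anchor (assumed format)."""
    return [f"{bin_start + i * bin_size}:{bin_start + (i + 1) * bin_size}"
            for i in range(bin_number)]

def format_matrix(rows: Dict[str, List[float]], bin_start: int,
                  bin_number: int, bin_size: int) -> str:
    """Render per-feature bin sums as tab-delimited text with a header line."""
    lines = ["\t".join(["feature"] + bin_labels(bin_start, bin_number, bin_size))]
    for name, sums in rows.items():
        if len(sums) != bin_number:
            raise ValueError(f"{name}: expected {bin_number} bins, got {len(sums)}")
        lines.append("\t".join([name] + [f"{v:g}" for v in sums]))
    return "\n".join(lines) + "\n"

print(format_matrix({"geneA": [240.0, 50.0]}, bin_start=-100, bin_number=2, bin_size=100), end="")
```

Keeping the header derivable from bin_start, bin_number, and bin_size alone is what lets a downstream step check that two matrices share the same window.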
Using matrix files generated by matrix.py,
matrixByMatrix.py computes correlation values between NGS data sets
and clusters the data sets on the basis of these correlation values.
Genomic features are also clustered based on the relative enrichment
of individual NGS data sets.
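The correlation and clustering step can be illustrated with a small, dependency-free sketch. The correlation measure and linkage method used by matrixByMatrix.py are not stated here, so this sketch assumes Pearson correlation over flattened matrices, a distance of 1 - r, and average linkage; it covers only the clustering of data sets, not of features, and all names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Matrix = List[List[float]]  # rows = features, columns = bins

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flatten(m: Matrix) -> List[float]:
    return [v for row in m for v in row]

def linkage_distance(ca: List[str], cb: List[str],
                     dist: Dict[FrozenSet[str], float]) -> float:
    """Average pairwise distance between the members of two clusters."""
    pairs = [frozenset((i, j)) for i in ca for j in cb]
    return sum(dist[p] for p in pairs) / len(pairs)

def cluster_datasets(matrices: Dict[str, Matrix]) -> List[Tuple[str, str, float]]:
    """Average-linkage clustering on distance 1 - r; returns merges in order."""
    vecs = {k: flatten(m) for k, m in matrices.items()}
    dist = {frozenset((a, b)): 1 - pearson(vecs[a], vecs[b])
            for a, b in combinations(vecs, 2)}
    clusters = {k: [k] for k in vecs}  # cluster label -> member data sets
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], dist))
        d = linkage_distance(clusters[a], clusters[b], dist)
        merges.append((a, b, d))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges
```

On three toy matrices where A and B are perfectly correlated and C is anti-correlated with both, A and B merge first at a distance near 0, and C joins last at a distance near 2.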
python matrixByMatrix.py [OPTIONS] MATRIX_LIST_FN WINDOW_START BIN_NUMBER BIN_SIZE OUTPUT_JSON
MATRIX_LIST_FN
The matrix files generated by
matrix.py that are to be analyzed by matrixByMatrix.py are listed in the tab-delimited file MATRIX_LIST_FN. Each row in the file corresponds to an individual matrix file and gives the following fields, in order:

MATRIX_ID DISPLAY_NAME FILE_PATH
FILE_PATH gives the path to the associated matrix file. MATRIX_ID and DISPLAY_NAME are principally used to annotate the OUTPUT_JSON for visualization in `ORIO-web`_.
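A MATRIX_LIST_FN file can be read with the standard csv module. This is a minimal sketch, assuming exactly three tab-separated fields per row; the MatrixEntry type, the read_matrix_list name, and the example IDs and paths are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple

class MatrixEntry(NamedTuple):
    matrix_id: str
    display_name: str
    file_path: str

def read_matrix_list(lines: Iterable[str]) -> List[MatrixEntry]:
    """Read MATRIX_ID, DISPLAY_NAME, FILE_PATH rows from tab-delimited text."""
    entries = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {row!r}")
        entries.append(MatrixEntry(*row))
    return entries

entries = read_matrix_list(["h3k4me3\tH3K4me3 (rep 1)\tmatrices/h3k4me3.txt\n"])
print(entries[0].display_name)  # H3K4me3 (rep 1)
```

With a real file, the lines would come from `open(path, newline="")`, as the csv documentation recommends. Because fields are tab-separated, display names may contain spaces.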
WINDOW_START
BIN_NUMBER
BIN_SIZE
WINDOW_START, BIN_NUMBER, and BIN_SIZE specify the dimensions of the genomic window used in creating the read coverage matrix files. These values should match the --bin_start, --bin_number, and --bin_size values used with matrix.py.
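One way to keep the two steps consistent is to derive both command lines from a single set of window parameters. The sketch below only builds argument lists and runs nothing; the build_commands name and file names are hypothetical. The "--" separators are an assumption: they follow the common end-of-options convention so that a negative WINDOW_START is read as a positional value, and they presume the scripts' command-line parser honours it.

```python
from typing import List

def build_commands(bigwig: str, feature_bed: str, output_matrix: str,
                   matrix_list_fn: str, output_json: str,
                   bin_start: int = -2500, bin_number: int = 50,
                   bin_size: int = 100) -> List[List[str]]:
    """Return argv lists for matrix.py and matrixByMatrix.py sharing one window."""
    window = [str(bin_start), str(bin_number), str(bin_size)]
    matrix_cmd = [
        "python", "matrix.py",
        "--bin_start", window[0],
        "--bin_number", window[1],
        "--bin_size", window[2],
        "--",  # end of options (assumes the CLI honours this convention)
        bigwig, feature_bed, output_matrix,
    ]
    # WINDOW_START, BIN_NUMBER, and BIN_SIZE repeat the matrix.py values;
    # '--' keeps a negative WINDOW_START from being parsed as an option.
    cluster_cmd = ["python", "matrixByMatrix.py", "--",
                   matrix_list_fn, *window, output_json]
    return [matrix_cmd, cluster_cmd]
```

Each list could then be passed in turn to `subprocess.run(cmd, check=True)`, once the matrix list file referencing the generated matrices has been written.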
OUTPUT_JSON
Results of the clustering analysis are reported in OUTPUT_JSON.
These results are designed for visualization in `ORIO-web`_.
--sort_vector SORT_VECTOR_FN
If a sort vector is specified with
--sort_vector, correlations are computed against this user-defined sort vector. The sort vector provides one value for each genomic feature and has the following format:

FEATURE_ENTRY ENTRY_VALUE

When a sort vector is used,
matrixByMatrix.py finds the pairwise correlations between the sort vector and each matrix file specified in MATRIX_LIST_FN. Correlation values are computed between the sort vector values and the read coverage sums in each bin. These correlation values are then used to hierarchically cluster the NGS data sets.
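The per-bin correlation against a sort vector is sketched below. It assumes whitespace-separated FEATURE_ENTRY ENTRY_VALUE lines, a matrix held as a mapping from feature name to per-bin sums, and Pearson correlation; ORIO's actual correlation measure is not specified here, and all names are hypothetical. Bins with constant coverage, such as all-zero bins, return NaN rather than raising.

```python
import math
from typing import Dict, Iterable, List

def read_sort_vector(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'FEATURE_ENTRY ENTRY_VALUE' lines (whitespace-separated, assumed)."""
    vector: Dict[str, float] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, value = fields  # raises ValueError unless exactly two fields
        vector[name] = float(value)
    return vector

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation, or NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def bin_correlations(sort_vector: Dict[str, float],
                     matrix: Dict[str, List[float]]) -> List[float]:
    """Correlate sort-vector values with each bin's coverage sums across features."""
    features = [f for f in matrix if f in sort_vector]  # features present in both
    values = [sort_vector[f] for f in features]
    n_bins = len(next(iter(matrix.values())))
    return [pearson(values, [matrix[f][b] for f in features]) for b in range(n_bins)]
```

The resulting per-bin correlation profile, one list per matrix file, is the kind of vector the data sets could then be clustered on.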
To install in development mode, run the following command from the root directory of the development environment:
pip install -e .


