Allele-specific junction
Allele-Specific Junction Quantification
To quantify allele-specific junction based on the LongcallR-phased BAM, run the following command:
python longcallR-asj.py
-b <phased_bam>
-a <annotation>
-f <reference>
-o <output_prefix>
-t <threads>
-g <gene_types>
-m <min_sup>
High-Confidence ASJ Using Known Genomic SNPs
To increase confidence by requiring allele-specific junction to be supported by both longcallR's RNA SNPs and known genomic SNPs, use the following command:
python longcallR-asj.py
-b <phased_bam>
-a <annotation>
-f <reference>
--dna_vcf <genomic_vcf>
--rna_vcf <rna_vcf>
-o <output_prefix>
-t <threads>
-g <gene_types>
-m <min_sup>
Store Allele-Specific Junctions in BED Format for IGV Visualization
To visualize allele-specific junctions in IGV, extract regions with a p-value below a specified threshold (default: 1e-10) and save them in BED format:
- output.asj.tsv: Input file containing allele-specific junctions - p_value_threshold (optional): Significance threshold (default: 1e-10) - output.asj.bed: Output BED file for IGV visualizationOutput format
Running longcallR-asj.py
will generate three output files output.asj.tsv
, output.asj_gene.tsv
and output.gene_coverage.tsv
. The formats of these files are described below:
1.output.asj.tsv
Column | Description |
---|---|
#Junction |
Junction coordinates in the format chr:start-end |
Strand |
Strand of the junction (+ or - ) |
Junction_set |
Junction set identifier (cluster of related junctions) |
Phase_set |
Phase set ID |
Hap1_absent |
Number of reads from haplotype 1 where the junction is absent |
Hap1_present |
Number of reads from haplotype 1 where the junction is present |
Hap2_absent |
Number of reads from haplotype 2 where the junction is absent |
Hap2_present |
Number of reads from haplotype 2 where the junction is present |
P_value |
P-value for allele-specific junction |
SOR |
Symmetric odds ratio |
Novel |
Whether the junction is novel (TRUE / FALSE ) |
GT_AG |
Whether junction has canonical GT-AG splice sites (TRUE / FALSE ) |
Gene_name |
Associated gene symbol |
2.output.asj_gene.tsv
Column | Description |
---|---|
#Gene_name |
Gene symbol that containing allele-specific junctions |
Chr |
Chromosome |
P_value |
P-value for the most significant allele-specific junction in the gene |
SOR |
Symmetric odds ratio for the most significant allele-specific junction in the gene |
3.output.gene_coverage.tsv
Column | Description |
---|---|
#Gene_name |
Gene symbol |
Chr |
Chromosome |
Start |
start position of gene, 1-based, inclusive |
End |
end position of gene, 1-based, inclusive |
Num_reads |
Number of reads in the gene |