
Lift over error: invalid literal for int() with base 10: '2_KI270773v1_alt' #47

@Dazcam

I'm trying to use `sumstats.py lift` to lift hg19 SNPs in 5 GWAS sumstats files over to hg38. I have already run `sumstats.py csv` to standardise these files. Here is an excerpt from one of them:

SNP	CHR	BP	PVAL	A1	A2	N	Z	OR	BETA	SE
rs11579922	1	1036860	.1662	A	C	50914	-1.3868004	.97278	-.02759733	.0199
rs11579015	1	1036959	.1067	T	C	49514	-1.6133769	.96435	-.03630098	.0225
rs11260592	1	1037303	.1716	T	C	50914	-1.3683987	.97287	-.02750481	.0201
rs11260593	1	1037313	.169	A	G	50914	-1.3730014	.97278	-.02759733	.0201
rs66622470	1	1038088	.1659	C	G	50914	1.3867192	1.02798	.02759571	.0199

However, I'm getting the following error for 2 of the 5 files so far (the others are still running):

Traceback (most recent call last):
  File "python_convert/sumstats.py", line 2212, in <module>
    args.func(args, log)
  File "python_convert/sumstats.py", line 1375, in make_lift
    df.loc[index, cols.CHR] = int(lifted[0][0][3:])
ValueError: invalid literal for int() with base 10: '2_KI270773v1_alt'
Analysis finished at Tue May 18 18:02:06 2021
Total time elapsed: 2.0h:7.0m:48.60999999999967s

This appears to relate to entries in the `hg19ToHg38.over.chain.gz` file, since there are no alt contigs (alt_chrs) in the original GWAS sumstats files. The chain file contains 114 alt_chrs in total.
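The failing line, `int(lifted[0][0][3:])`, drops the first three characters of the lifted contig name (the `chr` prefix) and converts the rest to an integer. When a SNP lifts onto an hg38 alt contig such as `chr2_KI270773v1_alt`, the remainder is `2_KI270773v1_alt`, hence the `ValueError`. A minimal sketch of a more tolerant conversion, using a hypothetical helper `parse_chrom` (not part of sumstats.py), which returns `None` for such contigs instead of raising:

```python
from typing import Optional

# What the failing line in make_lift effectively does:
#   int("chr2_KI270773v1_alt"[3:])
#   -> ValueError: invalid literal for int() with base 10: '2_KI270773v1_alt'


def parse_chrom(name: str) -> Optional[int]:
    """Convert a lifted contig name such as 'chr2' to an integer chromosome.

    Hypothetical helper. Returns None for anything that is not a numeric
    chromosome, e.g. the alt contig 'chr2_KI270773v1_alt', instead of
    raising ValueError.
    """
    bare = name[3:] if name.startswith("chr") else name
    return int(bare) if bare.isdecimal() else None
```

Like the original line, this only handles numeric chromosomes, so `chrX`, `chrY` and `chrM` also come back as `None`; whether to drop those SNPs or recode them is a separate decision.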

I'm wondering if there is a way around this, i.e. can I add a parameter to ignore or otherwise handle these loci? What exactly does `--keep-bad-snps` do? I'm reluctant to use it without fully understanding what it does.
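One way an "ignore these loci" option could work is to keep only liftover hits on primary chromosomes and drop SNPs that have none. A minimal sketch, assuming each hit is a `(chrom, pos, strand, score)` tuple as returned by pyliftover's `LiftOver.convert_coordinate`. The `lifted[0][0]` indexing in the traceback suggests a similar shape, but that is an assumption about sumstats.py's internals, and `lift_primary_only` is a hypothetical name:

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# One liftover hit: (chrom, pos, strand, score). Assumed shape, matching
# pyliftover's convert_coordinate; sumstats.py may use something else.
Hit = Tuple[str, int, str, int]
Converter = Callable[[str, int], Optional[Sequence[Hit]]]


def lift_primary_only(
    snps: Sequence[Tuple[str, int, int]],  # (snp_id, chrom, bp) in the old build
    convert: Converter,
) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Lift each SNP, keeping the first hit on a numeric primary chromosome.

    Returns (lifted, dropped): lifted maps snp_id -> (chrom, bp) in the new
    build; dropped lists SNPs that did not lift at all or only landed on
    alt/unplaced contigs such as 'chr2_KI270773v1_alt'.
    """
    lifted: Dict[str, Tuple[int, int]] = {}
    dropped: List[str] = []
    for snp_id, chrom, bp in snps:
        # A converter may return None (e.g. unknown chromosome) or [].
        hits = convert(f"chr{chrom}", bp) or []
        for hit_chrom, hit_pos, _strand, _score in hits:
            bare = hit_chrom[3:] if hit_chrom.startswith("chr") else hit_chrom
            if bare.isdecimal():
                lifted[snp_id] = (int(bare), hit_pos)
                break
        else:
            # No hit on a primary chromosome: record the SNP as dropped.
            dropped.append(snp_id)
    return lifted, dropped
```

Scanning all hits rather than always taking `lifted[0]` means a SNP whose first hit is on an alt contig can still be placed if a later hit is on the primary assembly, and returning the dropped IDs makes it easy to log how many SNPs were lost. Because the converter is passed in as a plain callable, check whether the one you use expects 0- or 1-based positions.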

Interestingly, this error does not arise when I use the standard liftOver tool, but that requires generating BED files first. sumstats.py would be the neatest option for me.
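For the standalone route, the BED step itself is small. A minimal sketch, assuming the tab-separated `SNP`/`CHR`/`BP` columns shown above with a 1-based `BP`, and a `chr` prefix to match the UCSC chain file; `sumstats_to_bed` is a hypothetical helper:

```python
import csv
from typing import Iterable, Iterator


def sumstats_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield one BED line per SNP from tab-separated sumstats lines.

    Hypothetical helper. Expects SNP, CHR and BP columns with a 1-based BP.
    BED intervals are 0-based and half-open, so a SNP at BP becomes
    [BP - 1, BP). The SNP ID goes in the name column so the lifted
    positions can be joined back onto the sumstats afterwards.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        bp = int(row["BP"])
        # Prefix 'chr' so names match the UCSC chain file (e.g. 'chr1').
        yield f"chr{row['CHR']}\t{bp - 1}\t{bp}\t{row['SNP']}\n"
```

For example, call `dst.writelines(sumstats_to_bed(src))` with `src` open on a standardised sumstats file and `dst` open for writing. The result can then be lifted with `liftOver old.bed hg19ToHg38.over.chain.gz new.bed unmapped.bed` (placeholder file names) and the new positions joined back to the sumstats on the SNP name.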

Here is my Snakemake rule:

rule lift_over:
    input:   SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg19_withZ_sumstats.tsv"
    output:  SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg38_sumstats.tsv"
    message: "Formatting {input} sumstats"
    log:     SCRATCH + "logs/lift_over/{GWAS}_hg38.log"
    params:  SCRATCH + GWAS_DIR + "hg19ToHg38.over.chain.gz"
    shell:
             """

             python python_convert/sumstats.py lift \
             --sumstats {input} \
             --out {output} \
             --chain-file {params} \
             --log {log}

             """

I could also remove these entries from the chain file, but I thought I'd ask if there is a way to deal with them before proceeding.
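Removing the alt entries from the chain file could look like the minimal sketch below. It assumes the standard UCSC chain format, where each block starts with a `chain` header line whose eighth field (`qName`) is the contig in the assembly being lifted to, hg38 here. `filter_chains`, `is_alt` and `write_filtered_chain` are hypothetical helpers:

```python
import gzip
from typing import Callable, Iterable, Iterator


def is_alt(contig: str) -> bool:
    """True for alt-haplotype contigs such as 'chr2_KI270773v1_alt'."""
    return contig.endswith("_alt")


def filter_chains(
    lines: Iterable[str], drop: Callable[[str], bool] = is_alt
) -> Iterator[str]:
    """Yield chain-file lines, skipping every chain whose qName matches drop.

    A block starts with a header line
    'chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id'
    and runs until the next header, so the alignment lines and trailing
    blank line of a dropped chain are skipped along with its header.
    """
    keep = True
    for line in lines:
        if line.startswith("chain"):
            fields = line.split()
            keep = not drop(fields[7])  # fields[7] is qName (new assembly)
        if keep:
            yield line


def write_filtered_chain(src_path: str, dst_path: str) -> None:
    """Copy a gzipped chain file, leaving out chains onto alt contigs."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(filter_chains(src))
```

Any SNP covered only by an alt chain will then fail to lift rather than landing on an alt contig. `is_alt` only matches names ending in `_alt`; pass a broader `drop` predicate if other non-primary contigs cause the same error.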

Many thanks.
