Issue loading refFlat_table and refGene_seq #5

@rjbohlender

I'm running on macOS and just trying to get the tutorial analysis working. The database files load without problems through hadoop, but seqspark reports that the input path for refFlat_table does not exist:

bin/seqspark conf/test.conf
conf file:       conf/test.conf
spark options:
18/09/19 14:22:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/19 14:22:41 WARN seqspark.SingleStudy$: using an existing output directory '/Users/rjbohlender/software/seqspark/demo'
18/09/19 14:22:41 INFO ds.Phenotype$: creating phenotype dataframe from simulated.tsv
18/09/19 14:22:45 INFO worker.Import$: start import ...
18/09/19 14:22:45 INFO worker.Import$: using all variants
18/09/19 14:22:45 INFO worker.Import$: using filter: true
18/09/19 14:22:45 INFO worker.Variants$: decompose multi-allelic variants
18/09/19 14:22:45 INFO worker.Annotation$: annotation
18/09/19 14:22:45 INFO worker.Annotation$: link gene database ...
18/09/19 14:22:45 INFO annot.RefGene$: load RefSeq: coord: /Users/rjbohlender/seqspark-db/refFlat_table seq: /Users/rjbohlender/seqspark-db/refGene_seq
18/09/19 14:22:46 ERROR seqspark.SingleStudy$: Something went wrong, exit
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/rjbohlender/seqspark-db/refFlat_table
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1337)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
	at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1372)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1371)
	at org.dizhang.seqspark.annot.RefGene$.apply(RefGene.scala:61)
	at org.dizhang.seqspark.worker.Annotation$.linkGeneDB(Annotation.scala:109)
	at org.dizhang.seqspark.worker.Annotation$.apply(Annotation.scala:56)
	at org.dizhang.seqspark.worker.Pipeline$.run(Pipeline.scala:91)
	at org.dizhang.seqspark.worker.Pipeline$.apply(Pipeline.scala:51)
	at org.dizhang.seqspark.SingleStudy$.run(SingleStudy.scala:113)
	at org.dizhang.seqspark.SingleStudy$.apply(SingleStudy.scala:51)
	at org.dizhang.seqspark.SeqSpark$.main(SeqSpark.scala:68)
	at org.dizhang.seqspark.SeqSpark.main(SeqSpark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Checking with hadoop fs that both files exist and are readable at the given paths:

 /usr/local/Cellar/hadoop/3.1.1 > hadoop fs -cat /Users/rjbohlender/seqspark-db/refGene_seq | head
2018-09-19 14:25:27,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>NM_001308203.1
tctcttgaatgaaggatgggaggggagaaagagagacggagagagagaga
gacgcacagatgtgcacggaggccacagacactgacatttggaattcctt
caggcggacggaatagacctcagcagcggcgtggtgaggacttagctggg
acctggaatcgtatcctcctgtgttttttcagactccttggaaattaagg
aatgcaattctgccaccatgatggaaggattgaaaaaacgtacaaggaag
gcctttggaatacggaagaaagaaaaggacactgattctacaggttcacc
agatagagatggaattaagaaaagcaatggggcaccaaatggattttatg
cggaaattgattgggaaagatataactcacctgagctggatgaagaaggc
tacagcatcagacccgaggaacccggctctaccaaaggaaagcactttta
 /usr/local/Cellar/hadoop/3.1.1 > hadoop fs -cat /Users/rjbohlender/seqspark-db/refFlat_table | head
2018-09-19 14:26:31,345 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-19 14:26:32,372 INFO namenode.FSEditLog: Number of transactions: 34 Total time for transactions(ms): 5 Number of transactions batched in Syncs: 97 Number of syncs: 24 SyncTimes(ms): 10
geneName	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds
OR4F5	NM_001005484	chr1	+	69090	70008	69090	70008	1	69090,	70008,
OR4F16	NM_001005277	chr1	+	367658	368597	367658	368597	1	367658,	368597,
OR4F3	NM_001005224	chr1	+	367658	368597	367658	368597	1	367658,	368597,
OR4F29	NM_001005221	chr1	+	367658	368597	367658	368597	1	367658,	368597,
OR4F16	NM_001005277	chr1	-	621095	622034	621095	622034	1	621095,	622034,
OR4F3	NM_001005224	chr1	-	621095	622034	621095	622034	1	621095,	622034,
OR4F29	NM_001005221	chr1	-	621095	622034	621095	622034	1	621095,	622034,
SAMD11	NM_152486	chr1	+	861120	879961	861321	879533	14	861120,861301,865534,866418,871151,874419,874654,876523,877515,877789,877938,878632,879077,879287,	861180,861393,865716,866469,871276,874509,874840,876686,877631,877868,878438,878757,879188,879961,
NOC2L	NM_015658	chr1	-	879582	894679	880073	894620	19	879582,880436,880897,881552,881781,883510,883869,886506,887379,887791,888554,889161,889383,891302,891474,892273,892478,894308,894594,	880180,880526,881033,881666,881925,883612,883983,886618,887519,887980,888668,889272,889462,891393,891595,892405,892653,894461,894679,
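
One possible reading of the two outputs above: the exception names the path with a file: scheme, which suggests the lookup went to the local filesystem, whereas hadoop fs resolves a scheme-less path against whatever default filesystem is configured (fs.defaultFS). That is an assumption drawn from the log, not a confirmed diagnosis. A minimal Python sketch for pulling the scheme out of such an error line follows; the helper name path_scheme is hypothetical and not part of seqspark or Hadoop.

```python
import re
from urllib.parse import urlparse


def path_scheme(error_line: str) -> str:
    """Return the URI scheme of the path in an 'Input path does not exist' message.

    Hypothetical helper for reading logs; not part of seqspark or Hadoop.
    """
    match = re.search(r"Input path does not exist: (\S+)", error_line)
    if match is None:
        raise ValueError("no input path found in message")
    # urlparse splits 'file:/Users/...' into scheme 'file' and path '/Users/...'
    return urlparse(match.group(1)).scheme


# The error line copied from the log in this issue.
line = (
    "org.apache.hadoop.mapred.InvalidInputException: "
    "Input path does not exist: file:/Users/rjbohlender/seqspark-db/refFlat_table"
)
print(path_scheme(line))
```

If the scheme printed here differs from the filesystem that hadoop fs -cat is reading from, the two commands are probably not looking at the same storage, even though the path string is identical.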
