Jeremy,

I have another question. When I filter the gene list, the filtered list contains multiple rows per gene. Shouldn't I have one gene per row? I have attached a snapshot of the output, but I am not sure whether the Galaxy server will accept it.

I did see the discussion on another forum: http://seqanswers.com/forums/showthread.php?t=8830 which suggests possible complications in getting one gene per row. My next question is, in that scenario, what would be the best way of representing one FPKM value per gene? Should we take the average FPKM per gene? I think the gene file is still giving the transcript FPKM values, but these values differ from those in the previous file filtered by transcript ID.

Thanks.
Jagat

On Tue, May 3, 2011 at 3:08 AM, shamsher jagat <kanwarjag@gmail.com> wrote:
Jeremy,
I have been trying to follow the steps for filtering Cufflinks output files that you described in one of the previous messages ( http://gmod.827538.n3.nabble.com/Re-downstream-analysis-of-cuffdiff-out-put-... ):
I have shared my history with you, but in summary:
File 35: Filter GTF data by attribute values list on data 11 (combined GTF) and data 33 (the gene expr file). Shouldn't this give one gene per row? It does not.
File 39: Filter GTF data by attribute values list on data 11 and data 38 (Cuffdiff splicing expr) failed. I would assume that it should filter on the basis of TSS ID. The error message is:
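One possible reason for the multiple rows: a GTF file has one line per feature (for example, one per exon), so keeping every line whose gene_id is in the list keeps all feature lines of each matching gene. A minimal sketch for counting rows per gene, assuming a standard nine-column GTF; the function name and the example records are made up for illustration and are not taken from the shared history:

```python
import re
from collections import Counter

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')

def rows_per_gene(lines):
    """Count how many GTF lines carry each gene_id (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up records: one gene with three exon lines, one gene with a single line.
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n',
    'chr1\tCufflinks\texon\t100\t250\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002";\n',
    'chr2\tCufflinks\texon\t500\t600\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000003";\n',
]
gene_counts = rows_per_gene(example_gtf)
```

Any gene with a count above one would appear on several rows after filtering by gene_id.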
Traceback (most recent call last):
File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in
filter( gff_file, attribute_name, ids_file, output_file )
File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
if attributes[ attribute_name ] in ids_dict:
KeyError: 'tss_id'
File 40: Filter GTF data by attribute values list on data 11 and data 34 (TSS group expr) failed, and the error message is:
Traceback (most recent call last):
File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 67, in
filter( gff_file, attribute_name, ids_file, output_file )
File "/var/opt/galaxy/g2test/galaxy_test/tools/filters/gff/gtf_filter_by_attribute_values_list.py", line 57, in filter
if attributes[ attribute_name ] in ids_dict:
KeyError: 'tss_id'
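The KeyError is raised by the lookup `attributes[ attribute_name ]`, which means at least one line of the GTF has no tss_id attribute. A minimal sketch of a filter that skips such lines instead of failing; this is not the Galaxy tool's code, and the function names and example records are hypothetical:

```python
import re

# Matches key "value" pairs in the ninth GTF column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def parse_gtf_attributes(attr_field):
    """Return a dict of attribute name -> value (hypothetical helper)."""
    return dict(ATTR_RE.findall(attr_field))

def filter_gtf_lines(lines, attribute_name, wanted_ids):
    """Keep GTF lines whose attribute value is in wanted_ids.

    Lines that lack the attribute are skipped rather than raising KeyError.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attributes = parse_gtf_attributes(fields[8])
        value = attributes.get(attribute_name)  # None when the attribute is absent
        if value is not None and value in wanted_ids:
            kept.append(line)
    return kept

# Made-up records: the second one has no tss_id.
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; tss_id "TSS1";\n',
    'chr1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n',
]
kept_lines = filter_gtf_lines(example_gtf, "tss_id", {"TSS1", "TSS2"})
```

With a plain `attributes[attribute_name]` lookup, the second record would raise the same KeyError: 'tss_id' as in the traceback.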
I would assume that if one gene has different IDs, then there is splicing.
In contrast, however, the isoform file with transcript ID works fine (File 20).
On a different note, can I convert a GTF file to a tab-delimited text file? I tried to convert file 11 to txt (via Edit attributes), but the resulting file is not properly formatted, especially col-pid and TSS id. Am I doing something wrong?
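For reference, the first eight GTF columns are already tab-separated; only the ninth column packs several key "value" pairs into one field, which is likely why attributes do not line up as columns. A minimal sketch of one way to spread chosen attributes into their own columns outside Galaxy, assuming a standard nine-column GTF; the function name, the example records, and the attribute names p_id and tss_id are illustrative assumptions:

```python
import csv
import io
import re

# Matches key "value" pairs in the ninth GTF column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

FIXED_COLUMNS = ["seqname", "source", "feature", "start", "end",
                 "score", "strand", "frame"]

def gtf_to_tsv(lines, attribute_columns):
    """Return tab-delimited text with one column per requested attribute.

    Attributes missing from a line are written as empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIXED_COLUMNS + list(attribute_columns))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        writer.writerow(fields[:8] + [attrs.get(name, "") for name in attribute_columns])
    return out.getvalue()

# Made-up records: the second one lacks p_id and tss_id.
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; p_id "P1"; tss_id "TSS1";\n',
    'chr1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n',
]
table = gtf_to_tsv(example_gtf, ["gene_id", "transcript_id", "p_id", "tss_id"])
```

Every row of the result has the same number of columns, so a missing p_id or tss_id shows up as an empty cell rather than shifting the columns.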
Thanks.