GFF not recognized in CUFFLINK
by Qian Dong
Dear Team,
I've been having a problem with Cufflinks regarding GFF files. I searched
the mailing list first but could not find an answer. Could you help me
look at this?
I downloaded the genome annotation GFF file from NCBI for my bacterial
RNA-seq data analysis (I soon realized that the NCBI format might be a
problem). My GFF file looks like the following:
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2 RefSeq region 1 4355543 . + . ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2 RefSeq gene 113343 . + . ID=gene0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2 RefSeq CDS 113343 . + 0 ID=cds0;Name=YP_002296275.1;Parent=gene0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
I used this file with Cufflinks, but all the FPKM values were 0. I checked
this link: http://cufflinks.cbcb.umd.edu/gff.html and thought that the
problem might be that my GFF file has no mRNA features. Since I am working
with a bacterial genome, no exon/intron or UTR information is needed.
I therefore modified my GFF file as follows:
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2 RefSeq region 1 4355543 . + . ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2 RefSeq mRNA 113343 . + . ID=mRNA0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2 RefSeq CDS 113343 . + 0 ID=cds0;Name=YP_002296275.1;Parent=mRNA0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
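The manual edit described above (renaming "gene" features to "mRNA" and pointing the CDS Parent at the renamed ID) could be scripted roughly as follows. This is only a sketch that mirrors the hand edit: it assumes tab-separated GFF3 with nine columns and IDs of the form gene0, gene1, and so on. The function name gene_to_mrna and the sample records (chr1, coordinates 100 to 400) are hypothetical, and nothing here establishes that Cufflinks will accept the result.

```python
def gene_to_mrna(lines):
    """Rename 'gene' features to 'mRNA' and keep Parent= references in sync.

    Assumes tab-separated GFF3 records with nine columns and IDs that
    start with 'gene' (as in NCBI's gene0, gene1, ...).
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Directives, comments and blank lines pass through unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Not a well-formed record; leave it untouched.
            out.append(line)
            continue
        attrs = cols[8].split(";")
        if cols[2] == "gene":
            # Column 3 holds the feature type; rename it and its ID.
            cols[2] = "mRNA"
            attrs = [a.replace("ID=gene", "ID=mRNA", 1)
                     if a.startswith("ID=gene") else a for a in attrs]
        else:
            # Child features: repoint Parent=geneN to Parent=mRNAN.
            attrs = [a.replace("Parent=gene", "Parent=mRNA", 1)
                     if a.startswith("Parent=gene") else a for a in attrs]
        cols[8] = ";".join(attrs)
        out.append("\t".join(cols))
    return out


# Hypothetical records, not taken from the real annotation.
example = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t100\t400\t.\t+\t.\tID=gene0;Name=geneA",
    "chr1\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds0;Parent=gene0",
]
converted = gene_to_mrna(example)
for row in converted:
    print(row)
```

Splitting the attribute column on ";" is safe here because literal semicolons inside GFF3 values are escaped as %3B.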
I re-ran Cufflinks, but this time it reported an error. All I can tell
from the report is that there was a segmentation fault; it gives no
further details. The report is as follows:
Error running cufflinks.
return code = 139
Command line:
cufflinks -q --no-update-check -I 100 -F 0.100000 -j 0.150000 -p 4 -G /galaxy/test_pool/pool5/files/000/327/dataset_327777.dat /galaxy/test_database/files/000/325/dataset_325086.dat
[19:41:41] Loading reference annotation.
Segmentation fault
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/global_model.txt': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/isoforms.fpkm_tracking': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/genes.fpkm_tracking': No such file or directory
My questions are:
1. Is there any way to modify an NCBI bacterial genome annotation GFF file
to make it usable with Cufflinks? Our genome annotation is only available
from NCBI, not Ensembl or UCSC, so this is pretty much my only choice.
2. Should I proceed with modifying the GFF file, or should I convert it
into GTF and use the GTF with Cufflinks instead?
I am a biochemist and really new to the computer world, so any advice will
help!
Thanks a lot,
Qian
--
Qian Dong
Bauer Lab, MCBD
Simon Hall: 313-317
212 S. Hawthorne Dr.
Bloomington, IN 47405
Email:dong3@indiana.edu
Lab Phone:812-855-8443
9 years, 9 months
Galaxy for SAGESeq
by Sujoy Ghosh
Hello,
I am interested in finding out whether anyone has experience conducting
SAGESeq analysis in Galaxy. Can the existing RNASeq tools be adapted easily
for SAGESeq? Thanks.
Sujoy
9 years, 9 months
extract genome sequence
by Yan He
Hi everyone,
I have a genome sequence and its gene annotation file. Is there a tool in
Galaxy that can extract, for each gene, the 5,000 bp upstream, the 5,000 bp
downstream, and the genomic sequence of the gene itself (including exons
and introns)? Any suggestions are highly appreciated! Thanks!
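To make the regions in question concrete, the coordinate arithmetic might look like the sketch below. It is not a Galaxy tool: the function name flanking_regions is hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF, the chromosome is treated as linear (flanks are clipped at the ends rather than wrapped), and "upstream" is taken relative to the gene's strand.

```python
def flanking_regions(start, end, strand, seq_len, flank=5000):
    """Return (upstream, downstream) intervals for a gene.

    Coordinates are 1-based and inclusive. Each interval is a
    (start, end) tuple, or None when the gene touches that end of
    the sequence and no flank exists there.
    """
    # Interval to the left of the gene on the forward strand.
    left = (max(1, start - flank), start - 1) if start > 1 else None
    # Interval to the right of the gene on the forward strand.
    right = (end + 1, min(seq_len, end + flank)) if end < seq_len else None
    # On the minus strand, upstream lies to the right of the gene.
    if strand == "-":
        return right, left
    return left, right


# Hypothetical gene at 6000..7000 on a 100,000 bp sequence.
print(flanking_regions(6000, 7000, "+", 100000))
print(flanking_regions(6000, 7000, "-", 100000))
```

The gene body itself is simply start..end; the returned intervals can then be used to slice the genome sequence (subtracting 1 from each start for 0-based slicing).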
Yan
9 years, 9 months
public repository for workflows?
by Kenny Billiau
Hi,
I've browsed the archives briefly, but didn't find a lot of talk about
publicly available workflows or workflow repositories, except the ones
mentioned here: https://main.g2.bx.psu.edu/workflow/list_published
Googling only turns up myexperiment.org, which hosts mostly Taverna
workflows (and a whopping 9 Galaxy ones).
Any chance anyone can point me to some other resources?
wkr,
Kenny
--
======================================================================
Ing. Kenny Billiau Bioinformatics Group
Scientific Programmer
+49 331 567 8626 billiau(a)mpimp-golm.mpg.de
Max Planck Institute for Molecular Plant Physiology
Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
http://bioinformatics.mpimp-golm.mpg.de
======================================================================
9 years, 9 months
How much FPKM can be take into consideration when compare gene expression
by Du, Jianguang
Dear All,
I am comparing gene expression between two cell types by examining the Cufflinks output file "gene differential expression testing". The file lists the FPKM of each gene in the two cell types and the log2 fold change. I want to find genes whose expression is more than 2-fold higher in cell type A than in cell type B. What is the minimum FPKM in cell type A above which a gene can be considered for further analysis?
For example,
The FPKM of gene X is 80 in cell type A and 20 in cell type B, so the fold difference is 4.
The FPKM of gene Y is 4 in cell type A and 1 in cell type B, so the fold difference is also 4.
Is there a minimum FPKM in cell type A for genes to be selected for further analysis?
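The selection being asked about can be written down as a small filter, shown here with the two example genes. This is only an illustration of the arithmetic: the function name fold_change_candidates is hypothetical, and the min_fpkm value of 5 is an arbitrary placeholder, not a recommended cutoff. Genes with zero FPKM in cell type B are skipped in this sketch because their fold change is undefined.

```python
import math


def fold_change_candidates(genes, min_fpkm, min_fold=2.0):
    """Select genes more than min_fold higher in A than in B.

    genes is an iterable of (name, fpkm_a, fpkm_b). Genes whose FPKM
    in A is below min_fpkm are ignored, as are genes with zero FPKM
    in B (the ratio is undefined there).
    """
    selected = []
    for name, fpkm_a, fpkm_b in genes:
        if fpkm_a < min_fpkm or fpkm_b <= 0:
            continue
        fold = fpkm_a / fpkm_b
        if fold > min_fold:
            selected.append((name, fold, math.log2(fold)))
    return selected


# The two genes from the example above.
genes = [("X", 80, 20), ("Y", 4, 1)]

# Placeholder threshold of 5 FPKM: gene Y has the same fold change
# as gene X but is dropped because its FPKM in A is only 4.
print(fold_change_candidates(genes, min_fpkm=5))
```

Both genes have a fold change of 4 (log2 of 2), so only the FPKM floor distinguishes them.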
Thanks.
Jianguang
9 years, 9 months
How to rotate Galaxy log file
by Lukasz Lacinski
Dear All,
I start Galaxy with the init script that ships in its contrib/ subdirectory. The log file
--log-file /home/galaxy/galaxy.log
specified in the script grows really quickly. How can I rotate this file with logrotate?
Thanks,
Lukasz
9 years, 9 months
Cloudman share string not working
by greg
Hi guys,
I entered my share string
"cm-808d863548acae7c2328c39a90f52e29/shared/2012-09-17--19-47" on this
page "https://biocloudcentral.herokuapp.com/launch" in the field
labeled "Shared cluster string" and clicked the button to create my
instance. But when I then log into Cloudman, the "Initial Cluster
configuration" dialog still appears.
I ran the same thing yesterday with an older share string and
everything worked fine.
Any ideas what could be going on? I'm pretty stuck.
Thanks,
Greg
This is all I see in the cluster status log (I entered my share string
again on the dialog, the disk status says 0 / 0 and applications and
data lights are yellow, and don't seem to progress):
13:34:46 - Master starting
13:34:50 - Retrieved file 'shared/2012-09-17--19-47/shared_instance_file_list.txt' from bucket 'cm-808d863548acae7c2328c39a90f52e29' to 'shared_instance_file_list.txt'.
13:41:29 - Retrieved file 'shared/2012-09-17--19-47/shared_instance_file_list.txt' from bucket 'cm-808d863548acae7c2328c39a90f52e29' to 'shared_instance_file_list.txt'.
13:41:30 - Retrieved file 'persistent_data.yaml' from bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7' to 'shared_p_d.yaml'.
13:41:31 - Created a data volume 'vol-7f2cc105' of size 5GB from shared cluster's snapshot 'snap-cfa775ba'
13:41:31 - Saved file 'persistent_data.yaml' to bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7'
13:41:31 - Retrieved file 'persistent_data.yaml' from bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7' to 'pd.yaml'.
9 years, 9 months