September 2012 - galaxy-user - lists.galaxyproject.org

GFF not recognized in CUFFLINK
by Qian Dong 26 Sep '12

Dear Team,

I've been having a problem with Cufflinks regarding GFF files. I searched the mailing list first but could not find an answer. Could you help me look at this?

I downloaded my genome annotation GFF file from NCBI for my bacterial RNA-seq data analysis (I soon realized that the NCBI format may be a problem). My GFF file looks like the following:

##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2 RefSeqregion14355543.+.ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2RefSeqgene113343.+. ID=gene0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2RefSeqCDS113343.+0ID=cds0;Name=YP_002296275.1;Parent=gene0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11

I used this file for Cufflinks, but all the FPKM values are 0. I checked this link: http://cufflinks.cbcb.umd.edu/gff.html and thought that the problem may be that I don't have any mRNA feature in my GFF file. Since I am dealing with a bacterial genome, no exon/intron or UTR information is needed. Therefore I modified my GFF file into the following:

##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2 RefSeqregion14355543.+.ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2RefSeqmRNA113343.+. ID=mRNA0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2RefSeqCDS113343.+0ID=cds0;Name=YP_002296275.1;Parent=mRNA0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11

I re-ran Cufflinks, but this time an error was reported. I can only tell from the report that there was a segmentation fault, with no further details. The report is as follows:

Error running cufflinks. return code = 139
Command line: cufflinks -q --no-update-check -I 100 -F 0.100000 -j 0.150000 -p 4 -G /galaxy/test_pool/pool5/files/000/327/dataset_327777.dat /galaxy/test_database/files/000/325/dataset_325086.dat
[19:41:41] Loading reference annotation.
Segmentation fault
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/global_model.txt': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/isoforms.fpkm_tracking': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/genes.fpkm_tracking': No such file or directory

My questions are:

1. Is there any way to modify an NCBI bacterial genome annotation GFF file to make it usable for Cufflinks? Our genome annotation is only available in NCBI, not Ensembl or UCSC, so this is pretty much my only choice.
2. Should I proceed with modifying the GFF file, or should I convert it into GTF and use the GTF in Cufflinks instead?

I am a biochemist and really new to the computer world, so any advice will help!

Thanks a lot,
Qian

--
Qian Dong
Bauer Lab, MCBD
Simon Hall: 313-317
212 S. Hawthorne Dr.
Bloomington, IN 47405
Email: dong3@indiana.edu
Lab Phone: 812-855-8443
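
For the GTF route raised in question 2, one possible approach is sketched below. It is only an illustration, not a confirmed fix for the segmentation fault: it assumes the input GFF3 is properly tab-separated, and it relies on the GTF convention that each record carries `gene_id` and `transcript_id` attributes. The function name `cds_to_gtf_exons`, the choice to treat every CDS as a single-exon transcript, and the sample coordinates are all hypothetical.

```python
def cds_to_gtf_exons(gff3_lines):
    """Turn tab-separated GFF3 CDS records into GTF exon records.

    Each CDS becomes a single-exon transcript. gene_id comes from the CDS
    Parent attribute and transcript_id from the CDS ID attribute; this
    mapping is an illustrative assumption, not an NCBI or Cufflinks rule.
    """
    out = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip directives, comments, and blank lines
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS records
        # Parse "key=value;key=value" attributes into a dict.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("Parent", attrs.get("ID", "unknown"))
        transcript_id = attrs.get("ID", gene_id)
        gtf_attrs = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
        # seqid, source, "exon", start, end, score, strand, frame ".", attributes
        out.append("\t".join(cols[:2] + ["exon"] + cols[3:7] + [".", gtf_attrs]))
    return out


# Hypothetical input; the coordinates are made up for illustration.
example = [
    "##gff-version 3",
    "chrA\tRefSeq\tgene\t100\t400\t.\t+\t.\tID=gene0;Name=g",
    "chrA\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds0;Parent=gene0",
]
for record in cds_to_gtf_exons(example):
    print(record)
```

Non-coding genes (rRNA, tRNA) have no CDS record and would be dropped by this sketch, so they would need separate handling.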

adding D. yakuba reference genome
by Stern PhD, David L 26 Sep '12

Galaxy for SAGESeq
by Sujoy Ghosh 26 Sep '12

Hello, I am interested in finding out whether anyone has any experience conducting SAGESeq analysis in Galaxy. Can the existing RNASeq tools be formatted easily for SAGESeq? Thanks. Sujoy

extract genome sequence
by Yan He 25 Sep '12

Hi everyone, I have the genome sequence and gene annotation file. Is there a tool on Galaxy to extract the 5,000 bp upstream, 5,000 bp downstream and genome sequences of the genes (including exons and introns) from the genome sequence? Any suggestions are highly appreciated! Thanks! Yan
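
Independently of which Galaxy tool is used, the coordinate arithmetic behind this request can be sketched as follows. This is a minimal illustration, assuming 1-based inclusive gene coordinates as in GFF; the function name `flank_intervals` and its parameters are hypothetical. It only computes intervals; the sequences themselves would still have to be fetched from the genome, and reverse-complemented for minus-strand genes.

```python
def flank_intervals(start, end, strand, chrom_length, flank=5000):
    """Return (upstream, gene, downstream) intervals for one gene.

    Coordinates are 1-based and inclusive. Flanks are clipped to the
    chromosome; a flank with no room left is returned as None.
    "Upstream" follows the gene's strand, so it lies to the right of a
    minus-strand gene.
    """
    def clip(lo, hi):
        lo, hi = max(lo, 1), min(hi, chrom_length)
        return (lo, hi) if lo <= hi else None

    left = clip(start - flank, start - 1)   # lower-coordinate side
    right = clip(end + 1, end + flank)      # higher-coordinate side
    if strand == "-":
        return right, (start, end), left
    return left, (start, end), right


# Hypothetical gene at 10000..12000 on a 1,000,000 bp chromosome.
print(flank_intervals(10000, 12000, "+", 1000000))
print(flank_intervals(10000, 12000, "-", 1000000))
```

The gene interval itself spans exons and introns, since it runs from the gene start to the gene end.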

public repository for workflows?
by Kenny Billiau 25 Sep '12

Hi, I've browsed the archives briefly, but didn't find much discussion of publicly available workflows or workflow repositories, except the ones listed here: https://main.g2.bx.psu.edu/workflow/list_published A Google search only turns up myexperiment.org, which hosts mostly Taverna workflows (and a whopping 9 Galaxy ones). Can anyone point me to some other resources? wkr, Kenny

--
Ing. Kenny Billiau Bioinformatics Group Scientific Programmer +49 331 567 8626 billiau(a)mpimp-golm.mpg.de Max Planck Institute for Molecular Plant Physiology Am Mühlenberg 1, 14476 Potsdam-Golm, Germany http://bioinformatics.mpimp-golm.mpg.de

Galaxy September 20, 2012 Distribution & News Brief
by Jennifer Jackson 21 Sep '12

Galaxy September 20, 2012 Distribution & News Brief
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_20

Highlights: http://wiki.g2.bx.psu.edu/News

* A new Galaxy tool (http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_20#Galaxy_Tool_Factory) that writes other new Galaxy tools! The Tool Factory is in the Galaxy's Main Tool Shed (toolfactory). Try it now!
* Learn how to display multiple versions of a tool in the Galaxy tool panel.
* CloudLaunch Overhaul includes Boto 2.5.2 and simplified instance selection and key generation.
* Release also includes more Tool Shed updates, Framework and API updates, plus Security and Bug fixes.

http://getgalaxy.org
http://bitbucket.org/galaxy/galaxy-dist

new: $ hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
upgrade: $ hg pull -u -r da9d740fce31

Thanks for using Galaxy!
Jennifer Jackson & Galaxy Team
http://galaxyproject.org

How much FPKM can be take into consideration when compare gene expression
by Du, Jianguang 20 Sep '12

Dear All, I am comparing gene expression between two cell types by examining the Cufflinks output file "gene differential expression testing". The file lists the FPKM of each gene in the two cell types and the log2 fold change. I want to look for genes whose expression is more than 2-fold higher in cell type A than in cell type B. What is the minimum FPKM in cell type A above which a gene can be taken into consideration for further analysis? For example, the FPKM of gene X is 80 in cell type A and 20 in cell type B, so the fold difference is 4. The FPKM of gene Y is 4 in cell type A and 1 in cell type B, so the fold difference is also 4. Is there a minimum FPKM in cell type A for genes to be selected for further analysis? Thanks. Jianguang
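
The filtering described in this question can be sketched as below, using the gene X and gene Y values from the post. The function name `select_genes` and, in particular, the `min_fpkm` value of 5.0 are hypothetical placeholders; the sketch shows how such a cutoff would be applied, not what the cutoff should be. Genes with an FPKM of 0 in cell type B are skipped here because the fold change is undefined, which is a design choice that may not suit every analysis.

```python
import math


def select_genes(fpkm, min_fpkm=5.0, min_log2_fold=1.0):
    """Select genes that are higher in cell type A than in cell type B.

    fpkm maps a gene name to (FPKM in A, FPKM in B). A gene is kept when
    its FPKM in A is at least min_fpkm (a hypothetical cutoff) and its
    log2 fold change A/B exceeds min_log2_fold (1.0 means more than 2-fold).
    """
    selected = {}
    for gene, (a, b) in fpkm.items():
        if a < min_fpkm or b <= 0:
            continue  # below the expression cutoff, or fold change undefined
        log2_fold = math.log2(a / b)
        if log2_fold > min_log2_fold:
            selected[gene] = log2_fold
    return selected


values = {"X": (80.0, 20.0), "Y": (4.0, 1.0)}
print(select_genes(values))                # only X passes the 5.0 cutoff
print(select_genes(values, min_fpkm=1.0))  # X and Y both pass
```

Both genes have the same 4-fold difference (log2 fold change of 2), so whether gene Y is kept depends entirely on the chosen minimum FPKM.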

How to rotate Galaxy log file
by Lukasz Lacinski 19 Sep '12

Dear All, I use the init script that comes with Galaxy in the contrib/ subdirectory to start Galaxy. The log file specified in the script (--log-file /home/galaxy/galaxy.log) grows really quickly. How can I rotate this file with logrotate? Thanks, Lukasz

Cloudman share string not working
by greg 18 Sep '12

Hi guys, I entered my share string "cm-808d863548acae7c2328c39a90f52e29/shared/2012-09-17--19-47" on this page "https://biocloudcentral.herokuapp.com/launch" in the field labeled "Shared cluster string" and clicked the button to create my instance. But when I log into Cloudman, the "Initial Cluster configuration" dialog still appears. I ran the same thing yesterday with an older share string and everything worked fine. Any ideas what could be going on? I'm pretty stuck. Thanks, Greg

This is all I see in the cluster status log (I entered my share string again in the dialog; the disk status says 0 / 0, and the applications and data lights are yellow and don't seem to progress):

13:34:46 - Master starting
13:34:50 - Retrieved file 'shared/2012-09-17--19-47/shared_instance_file_list.txt' from bucket 'cm-808d863548acae7c2328c39a90f52e29' to 'shared_instance_file_list.txt'.
13:41:29 - Retrieved file 'shared/2012-09-17--19-47/shared_instance_file_list.txt' from bucket 'cm-808d863548acae7c2328c39a90f52e29' to 'shared_instance_file_list.txt'.
13:41:30 - Retrieved file 'persistent_data.yaml' from bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7' to 'shared_p_d.yaml'.
13:41:31 - Created a data volume 'vol-7f2cc105' of size 5GB from shared cluster's snapshot 'snap-cfa775ba'
13:41:31 - Saved file 'persistent_data.yaml' to bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7'
13:41:31 - Retrieved file 'persistent_data.yaml' from bucket 'cm-c8c215c4c67525d91b3a2598f9e370f7' to 'pd.yaml'.

Error executing tool: Unable to create output dataset: object store is full
by Angela Inácio 18 Sep '12

Hi, I tried to convert my tabular data into FASTQ data, and Galaxy did not allow it: "Error executing tool: Unable to create output dataset: object store is full". Could you please explain what this means? Thanks
