September 2012 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] How can I extract sequence information from cuffdiff files?
by Jennifer Jackson 13 Sep '12

Hi Humberto,

Yes, my apologies, this should have been included in the original reply.

The 'locus' field in the Cuffdiff files refers to a gene bound, not to individual transcripts. To get to the transcripts, the inputs to Cuffdiff need to be accessed. If you used Cuffmerge, the "merged transcripts" GTF file would be the correct file to use as input to "Extract". If you used just Cuffcompare, use the "combined transcripts" GTF.

To know which transcript was associated with which gene bound, compare the Cuffmerge merged transcripts GTF attributes (9th column: gene_id, tss_id, etc.) with Cuffdiff's "gene_id" and "tss_id" values. Depending on the file, these values also appear in the test_id column. The Cuffcompare GTF comparisons will be similar.

You can gain access to the GTF attributes with the tool "Filter and Sort -> Filter GTF data by attribute values_list". Cut out the column of interest in the Cuffdiff file ("Text Manipulation -> Cut"), edit as desired, and use it as a list filter. Or explore the other GFF filter options in the same tool group.

Take care,

Jen
Galaxy team

On 9/13/12 11:14 AM, Humberto Boncristiani wrote:
> Hi
>
> "Fetch Sequences -> Extract Genomic DNA" does not accept Cuffdiff files.
> Should I convert this file to some specific format?
>
> Thanks,
>
> Humberto.
>
> Dr. Humberto Boncristiani
> National Research Council (NRC) Fellow
> Adjunct Research Associate
> Department of Biology
> Univ. North Carolina at Greensboro
> 312 Eberhart Bldg
> Greensboro, NC 27403, USA.
> Tel.:(1) 336-256-2591
> Fax: (1) 336-334-5839
> email: humbfb(a)gmail.com
>
> On Sep 13, 2012, at 2:06 PM, Jennifer Jackson wrote:
>
>> Hello,
>>
>> By no annotation, do you mean species-specific annotation (GTF) was
>> not used? And you want to compare to a protein database like Genbank
>> NR or RefSeq? Then these are the instructions. Please let us know if
>> you had something else in mind.
>>
>> The sequence extraction can be done on Galaxy Main (if that is where
>> you are working), but the BLAST will need to be run on a local or
>> cloud install. To get set up (instance and data), start here:
>> http://getgalaxy.org
>> http://usegalaxy.org/cloud
>>
>> The BLAST+ wrapper recently moved from the distribution to the Tool
>> Shed, but there are installation tools integrated to help get this
>> into your instance. See the latest News Brief for details (Sept 7,
>> 2012) - these are also good to follow as you maintain your instance:
>> http://wiki.g2.bx.psu.edu/News
>> http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_07
>>
>> Questions about local/cloud installs are best directed to the
>> galaxy-dev(a)bx.psu.edu mailing list:
>> http://wiki.g2.bx.psu.edu/Mailing%20Lists
>>
>> To extract the transcript sequences, use the tool 'Fetch Sequences ->
>> Extract Genomic DNA'. This will accept a custom reference genome from
>> the history, if you have been using one, by changing the option
>> "Source for Genomic Data:" to "History".
>>
>> Hopefully this helps,
>>
>> Jen
>> Galaxy team
>>
>> On 9/13/12 10:09 AM, Humberto Boncristiani wrote:
>>> Hi.
>>>
>>> I got cuffdiff files with gene differential expression on it. I don't
>>> have the annotation, therefore I need to extract the sequence
>>> information from the genome coordinates and then blast them to identify
>>> those.
>>> What is the easiest way to do it?
>>>
>>> Thanks.
>>>
>>> Humberto
>>
>> --
>> Jennifer Jackson
>> http://galaxyproject.org

--
Jennifer Jackson
http://galaxyproject.org
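
The matching step Jen describes above (cut an identifier column out of the Cuffdiff file, then use it as a list filter on the merged transcripts GTF) can also be sketched outside Galaxy. The following is only an illustration, not how the Galaxy tools are implemented. It assumes the Cuffdiff table is tab-delimited with a header row containing "gene_id" and "significant" columns, and that GTF attributes in the 9th column are written as key "value"; pairs. The function names and the identifiers in the demo data are made up for this example.

```python
import csv
import io
import re

# Matches one GTF attribute such as: gene_id "XLOC_000001";
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def read_gene_ids(diff_handle, id_column="gene_id", significant_only=True):
    """Collect identifiers from a tab-delimited Cuffdiff table with a header row.

    Assumption: the table has a 'significant' column holding 'yes' or 'no'.
    """
    reader = csv.DictReader(diff_handle, delimiter="\t")
    gene_ids = set()
    for row in reader:
        if significant_only and row.get("significant") != "yes":
            continue
        gene_ids.add(row[id_column])
    return gene_ids


def filter_gtf(gtf_handle, wanted_ids, attribute="gene_id"):
    """Yield GTF lines whose 9th-column attribute value is in wanted_ids."""
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        if attributes.get(attribute) in wanted_ids:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Tiny inline stand-ins for the real files (made-up identifiers).
    diff_example = io.StringIO(
        "test_id\tgene_id\tsignificant\n"
        "XLOC_000001\tXLOC_000001\tyes\n"
        "XLOC_000002\tXLOC_000002\tno\n"
    )
    gtf_example = io.StringIO(
        'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
        'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'
        'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\t'
        'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n'
    )
    wanted = read_gene_ids(diff_example)
    for kept_line in filter_gtf(gtf_example, wanted):
        print(kept_line)
```

The filtered GTF lines are the transcript records that belong to the selected gene bounds, so a file written from them would be the kind of input that "Extract Genomic DNA" expects. To match on tss_id instead, pass id_column="tss_id" and attribute="tss_id", provided the Cuffdiff file in question has that column.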

How can I extract sequence information from cuffdiff files?
by Humberto Boncristiani 13 Sep '12

Hi.

I have Cuffdiff files with differential gene expression results. I don't have the annotation, so I need to extract the sequence information from the genome coordinates and then BLAST the sequences to identify the genes. What is the easiest way to do it?

Thanks.

Humberto

Dr. Humberto Boncristiani
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Tel.:(1) 336-256-2591
Fax: (1) 336-334-5839
email: humbfb(a)gmail.com

Tophat results
by Xiefan Fang 13 Sep '12

Dear Galaxy users,

I aligned my RNA-seq data using Tophat in Galaxy. It generated "Tophat deletions", "Tophat insertions" and "Tophat splice junctions" results. These are all BED files. Does anyone know how to use or analyze these kinds of results?

Also, I used Illumina RNA-seq. Each biological sample has 36-48 million reads, and the data for each sample were divided into 10-12 FASTQ files. When I ran "FASTQ Summary Statistics" and drew a "Boxplot" for each sub-file, the score value was about 9-10. Is that too low? Should I combine the FASTQ files for each biological sample and run the statistics again?

Lastly, does anyone know how to convert a long list of zebrafish genes (500-1000 genes) to human or mammalian orthologs?

Thank you for your replies,

Xiefan Fang
University of Florida

Fwd: Galaxy: RNA-seq analysis problems
by James Taylor 13 Sep '12

Roberta,

I'm traveling right now so I'm forwarding your message to our help list. Thanks.

---------- Forwarded message ----------
From: Roberta Galletti <roberta.galletti(a)ens-lyon.fr>
Date: Tue, Sep 11, 2012 at 5:19 AM
Subject: Re: Galaxy: RNA-seq analysis problems
To: James Taylor <james(a)jamestaylor.org>

Hello James,

Sorry to bother you again, but I have one more question for you. I know that most existing methodologies for analyzing RNA-seq data depend strongly on sequencing depth for their differential expression calls, and that the results might therefore contain a considerable number of false positives. Unfortunately, 1 of the 3 biological replicates in one set of my samples has a much greater sequencing depth than the other two. Do the programs in the Galaxy "NGS: RNA Analysis" section take this problem into account and normalize for it?

Thank you in advance for your help,
Roberta Galletti.

On 6/11/2012 5:36 PM, James Taylor wrote:
Glad to hear it! Thanks!

On Jun 8, 2012, at 9:37 AM, Roberta Galletti wrote:
James, I managed to make it work. Thank you for your help. Roberta.

--
Roberta Galletti, PhD
Laboratoire de Reproduction et Développement des Plantes
Ecole Normale Supérieure de Lyon, UMR 5667
46, allée d'Italie
69364 LYON cedex 07
FRANCE
e-mail 1: roberta.galletti(a)ens-lyon.fr
e-mail 2: ro.galletti(a)tiscalinet.it
Skype contact: roberta1977
-------------------------------------
...A lab is just another place to play....
From 'Dancing naked in the mind field'
Kary B. Mullis, Nobel Prize in Chemistry 1993.

What minimum quality should I set for Filter FASTQ?
by Du, Jianguang 13 Sep '12

Dear All,

I am analysing RNA-seq datasets for differential splicing events between cell types. Some of my reads contain bad nucleotides. Should I run Filter FASTQ to remove these "not so good" reads? If I do need to, what "Minimum Quality" should I set for the filter?

Thanks.

Jianguang

Unable to view BAM data on UCSC or Ensembl browsers
by Joachim Jacob 13 Sep '12

Hi all,

I have aligned RNA-seq reads with Tophat to the Drosophila melanogaster 3 genome. However, I cannot view the alignment in UCSC (error: "Byte-range request was ignored by server"), nor in Ensembl.

Error in Ensembl:

Malformed URL
The URL used to reach this page may be incomplete or out-of-date. A location is required to build this page. For example, chromosomal coordinates: http://www.ensembl.org/Drosophila_melanogaster/Location/View?r=2L:21650001-…

Perhaps somebody can find out what I am doing wrong?

Thanks,
Joachim

--
Joachim Jacob, PhD
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

Peak-calling with MACS from .bowtie file
by Sébastien Vigneau 12 Sep '12

Hi,

I have ChIP-seq alignment files in .bowtie format and would like to perform peak-calling using MACS. However, the .bowtie format doesn't seem to be supported in Galaxy. Is there a workaround to have MACS analyze these files within Galaxy, or is the only option to use MACS on the command line?

Thank you for your help!

Sébastien

Using tool on input data set
by kauerbach＠comcast.net 12 Sep '12

Hello,

I'm new to Galaxy and have a very basic question. If I upload a dataset (say, a protein sequence file in FASTA format) and I want to use one of the tools on this dataset, what do I have to do to make the tool aware of my dataset? When I tried it, the tools did not know that my uploaded data file existed.

Thank you.

Amplicon analysis with Galaxy
by Juha-Pekka Pursiheimo 12 Sep '12

Hi,

I'm very new to Galaxy and its abilities, so I need to ask this: is it possible to use Galaxy to analyse TruSeq Custom Amplicon data? I would like to use it as an additional approach.

Best,
Juha-Pekka

Juha-Pekka Pursiheimo, PhD
Senior Scientist
NGS laboratory
The Finnish Microarray and Sequencing Centre (FMSC)
Turku Centre for Biotechnology
University of Turku and Åbo Akademi University
Tykistökatu 6A
Biocity, 5th floor
FIN-20520 Turku, Finland
Phone: +358-2-333 7697
Mobile: +358-400-617 938
Fax: +358-2-333 8000
E-mail: jpursihe(a)btk.fi
juha-pekka.pursiheimo(a)btk.fi

MACS output files from Galaxy
by peter scot 11 Sep '12

I ran MACS on my ChIP-seq dataset and found various files:

1. Under the HTML report there are two files: one is negative peaks.xls and the second is peaks.xls. The file peaks.xls is the same as the peaks.interval file in the output list on the right, with one bp added to each position; e.g. if the peak coordinates under the HTML report are 99 to 120, then in peaks.interval they are 100 to 121. Which one should be followed?

2. What is the meaning of the negative peaks.interval file?

3. I used control and treated samples to run MACS. There are two wig files, ctrl.wig and treatment.wig. Do these two files belong to the control and treated samples? If so, where are the corresponding BED files?

If someone can direct me to a description of the output we get in Galaxy when using MACS, that would be helpful.

Thanks
