Yes, my apologies, this should have been included in the original reply.
The 'locus' field in the Cuffdiff files refers to a gene bound - not
individual transcripts. To get to the transcripts, the inputs to
Cuffdiff need to be accessed. If you used Cuffmerge, the "merged
transcripts" GTF file would be the correct file to use as input to
"Extract". If you used just Cuffcompare, use the "combined transcripts" GTF.
To find out which transcript is associated with which gene bound, compare
the Cuffmerge merged transcripts GTF attributes (9th column: gene_id,
tss_id, etc.) with Cuffdiff's "gene_id" and "tss_id" values. These values
may also appear in the test_id column, depending on the file. The
Cuffcompare GTF comparisons will be similar.
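Outside of Galaxy, the same comparison can be sketched in a few lines of Python. This is only an illustration, assuming the GTF uses the usual key "value"; pairs in the 9th column; the function names and the example records are made up for this sketch and are not part of any Galaxy or Cufflinks tool.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the 9th GTF column, e.g. gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(field):
    """Return the attributes of one GTF line (9th column) as a dict."""
    return dict(ATTR_RE.findall(field))


def transcripts_by_gene(gtf_lines):
    """Group transcript_id values under the gene_id they belong to."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_gtf_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: sorted(ids) for gene, ids in genes.items()}


# Made-up records in the style of a merged-transcripts GTF (not real data).
EXAMPLE_GTF = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; tss_id "TSS1";',
    'chr1\tCufflinks\texon\t100\t250\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; tss_id "TSS1";',
    'chr1\tCufflinks\texon\t900\t1200\t.\t-\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000003"; tss_id "TSS2";',
]

if __name__ == "__main__":
    for gene, transcripts in transcripts_by_gene(EXAMPLE_GTF).items():
        print(gene, ",".join(transcripts))
```

The resulting gene-to-transcript table can then be matched against the "gene_id" values in the Cuffdiff output.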
You can gain access to the GTF attributes with the tool "Filter and Sort
-> Filter GTF data by attribute values_list". Cut out the column of
interest in the Cuffdiff file ("Text Manipulation -> Cut"), edit as
desired, and use as a list filter. Or explore the other GFF filter
options in the same tool group.
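For anyone working on the command line instead, the cut-then-filter steps above might look roughly like the following Python sketch. It assumes the Cuffdiff file is tab-delimited with a header row containing "gene_id" and "significant" columns (check the header of your own file); the function names and the example data are hypothetical and simplified.

```python
import csv
import io
import re


def significant_ids(diff_text, id_column="gene_id", flag_column="significant"):
    """Collect IDs from Cuffdiff-style tabular text where the flag column is 'yes'."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    return {row[id_column] for row in reader if row.get(flag_column) == "yes"}


def filter_gtf(gtf_lines, wanted_ids, attribute="gene_id"):
    """Keep GTF lines whose 9th column has the attribute set to a wanted ID."""
    pattern = re.compile(r'(?:^|[;\s])' + re.escape(attribute) + r' "([^"]*)"')
    kept = []
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = pattern.search(cols[8])
        if match and match.group(1) in wanted_ids:
            kept.append(line)
    return kept


# Simplified, made-up Cuffdiff-style table (real files have more columns).
DIFF_TEXT = (
    "test_id\tgene_id\tlocus\tsignificant\n"
    "XLOC_000001\tXLOC_000001\tchr1:99-500\tno\n"
    "XLOC_000002\tXLOC_000002\tchr1:899-1200\tyes\n"
)

# Made-up GTF records (not real data).
GTF_LINES = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t900\t1200\t.\t-\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000003";',
]

if __name__ == "__main__":
    for line in filter_gtf(GTF_LINES, significant_ids(DIFF_TEXT)):
        print(line)
```

The filtered GTF lines are the ones that would then go into "Extract".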
On 9/13/12 11:14 AM, Humberto Boncristiani wrote:
> "Fetch Sequences -> Extract Genomic DNA" does not accept Cuffdiff files.
> Should I convert this file to some specific format?
> *Dr. Humberto Boncristiani*
> National Research Council (NRC) Fellow
> Adjunct Research Associate
> Department of Biology
> Univ. North Carolina at Greensboro
> 312 Eberhart Bldg
> Greensboro, NC 27403, USA.
> Tel.:(1) 336-256-2591
> Fax: (1) 336-334-5839
> email: humbfb(a)gmail.com
> On Sep 13, 2012, at 2:06 PM, Jennifer Jackson wrote:
>> By no annotation, do you mean species-specific annotation (GTF) was
>> not used? And you want to compare to a protein database like Genbank
>> NR or RefSeq? Then these are the instructions. Please let us know if
>> you had something else in mind.
>> The sequence extraction can be done on Galaxy Main (if that is where
>> you are working), but the BLAST will need to be run on a local or
>> cloud install. To get set up (instance and data), start here:
>> The BLAST+ wrapper recently moved from the distribution to the Tool
>> Shed, but there are installation tools integrated to help get this
>> into your instance. See the latest News Brief for details (Sept 7,
>> 2012) - these are also good to follow as you maintain your instance:
>> Questions about local/cloud installs are best directed to the
>> galaxy-dev(a)bx.psu.edu mailing list:
>> To extract the transcript sequences, use the tool 'Fetch Sequences ->
>> Extract Genomic DNA'. This will accept a custom reference genome from
>> the history, if you have been using one, by changing the option
>> "Source for Genomic Data:" to "History".
>> Hopefully this helps,
>> Galaxy team
>> On 9/13/12 10:09 AM, Humberto Boncristiani wrote:
>>> I have Cuffdiff files with differential gene expression results. I don't
>>> have the annotation, so I need to extract the sequence
>>> information from the genome coordinates and then BLAST the sequences to
>>> identify them. What is the easiest way to do it?
>>> *Dr. Humberto Boncristiani*
>>> National Research Council (NRC) Fellow
>>> Adjunct Research Associate
>>> Department of Biology
>>> Univ. North Carolina at Greensboro
>>> 312 Eberhart Bldg
>>> Greensboro, NC 27403, USA.
>>> Tel.:(1) 336-256-2591
>>> Fax: (1) 336-334-5839
>>> email: humbfb(a)gmail.com
>>> The Galaxy User list should be used for the discussion of
>>> Galaxy analysis and other features on the public server
>>> at usegalaxy.org. Please keep all replies on the list by
>>> using "reply all" in your mail client. For discussion of
>>> local Galaxy instances and the Galaxy source code, please
>>> use the Galaxy Development list:
>>> To manage your subscriptions to this and other Galaxy lists,
>>> please use the interface at:
>> Jennifer Jackson
I have Cuffdiff files with differential gene expression results. I don't have the annotation, so I need to extract the sequence information from the genome coordinates and then BLAST the sequences to identify them.
What is the easiest way to do it?
Dr. Humberto Boncristiani
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Fax: (1) 336-334-5839
Dear galaxy users,
I aligned my RNA-seq data by using Tophat in galaxy. It generated some
"Tophat deletions", "Tophat insertions" and "Tophat splice junctions"
results. These are all BED files. Does anyone know how to use/analyze these
kinds of results?
Also, I used Illumina RNA-seq. Each biological sample has 36-48 million
reads. The data for each sample were divided into 10-12 FASTQ files. When I
ran "FASTQ Summary Statistics" and drew a "boxplot" for each
sub-file, the quality score was about 9-10. Is that too low? Should I combine the
FASTQ files for each biological sample and run the statistics again?
Finally, does anyone know how to convert a long list of zebrafish genes
(500-1000 genes) to human or mammalian orthologs?
Thank you for your replies,
University of Florida
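On the question of combining the FASTQ sub-files before computing statistics, the idea can be sketched in Python as below. This is a minimal illustration, assuming 4-line FASTQ records and Sanger (Phred+33) quality encoding; the offset, the function names, and the tiny example reads are assumptions made for the sketch, not details from the original data.

```python
from collections import defaultdict


def read_qualities(fastq_lines, offset=33):
    """Yield one list of Phred scores per read from 4-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # every 4th line of a record is the quality string
            yield [ord(ch) - offset for ch in line.rstrip("\n")]


def mean_quality_per_position(chunks, offset=33):
    """Pool several FASTQ chunks and return the mean score at each read position."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for lines in chunks:
        for scores in read_qualities(lines, offset):
            for pos, score in enumerate(scores):
                totals[pos] += score
                counts[pos] += 1
    return [totals[pos] / counts[pos] for pos in sorted(totals)]


# Two made-up sub-files with one read each (not real data).
CHUNK_A = ["@read1", "ACGT", "+", "IIII"]  # 'I' encodes Phred 40 at offset 33
CHUNK_B = ["@read2", "ACGT", "+", "5555"]  # '5' encodes Phred 20 at offset 33

if __name__ == "__main__":
    print(mean_quality_per_position([CHUNK_A, CHUNK_B]))
```

Pooling the sub-files this way gives one per-position summary for the whole biological sample rather than 10-12 separate ones.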
Roberta, I'm traveling right now so I'm forwarding your message to our
help list. Thanks.
---------- Forwarded message ----------
From: Roberta Galletti <roberta.galletti(a)ens-lyon.fr>
Date: Tue, Sep 11, 2012 at 5:19 AM
Subject: Re: Galaxy: RNA-seq analysis problems
To: James Taylor <james(a)jamestaylor.org>
Sorry to bother you again, but I have one more question for you. I know
that most existing methodologies for analyzing RNA-seq data have a
strong dependency on sequencing depth for their differential
expression calls, and that the results might include a considerable
number of false positives. Unfortunately, 1 out of 3 biological
replicates in one set of my samples has a much greater sequencing depth
than the other two. Do the programs in the Galaxy NGS:
RNA Analysis section take this problem into account and normalize for it?
Thank you in advance for your help,
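As background to the depth question, the basic idea behind library-size normalization can be illustrated with FPKM (fragments per kilobase of transcript per million mapped fragments), the unit Cufflinks reports. The Python sketch below only shows the arithmetic; it is not Cuffdiff's actual normalization procedure, and the function name and numbers are made up for illustration.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up numbers: the same transcript in two replicates, where
# replicate B was sequenced four times as deeply as replicate A.
REPLICATE_A = fpkm(fragments=100, transcript_length_bp=2000,
                   total_mapped_fragments=10_000_000)
REPLICATE_B = fpkm(fragments=400, transcript_length_bp=2000,
                   total_mapped_fragments=40_000_000)

if __name__ == "__main__":
    print(REPLICATE_A, REPLICATE_B)
```

In this toy case the raw counts differ fourfold, but the scaled values agree because the count is divided by the total number of mapped fragments in each library.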
On 6/11/2012 5:36 PM, James Taylor wrote:
Glad to hear it! Thanks!
On Jun 8, 2012, at 9:37 AM, Roberta Galletti wrote:
I managed to make it work. Thank you for your help.
Roberta Galletti, PhD
Laboratoire de Reproduction et Développement des Plantes
Ecole Normale Supérieure de Lyon, UMR 5667
46, allée d'Italie
69364 LYON cedex 07
e-mail 1: roberta.galletti(a)ens-lyon.fr
e-mail 2: ro.galletti(a)tiscalinet.it
Skype contact: roberta1977
...A lab is just another place to play....
From 'Dancing naked in the mind field'
Kary B. Mullis, Nobel Prize in Chemistry 1993.
I am analysing RNA-seq datasets for differential splicing events between cell types.
Some of my reads contain bad nucleotides. Should I run Filter FASTQ to remove these "not so good" reads? If so, what "Minimum Quality" should I set for the filter?
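To make the effect of a minimum-quality filter concrete, here is a small Python sketch of the idea. It assumes 4-line FASTQ records with Sanger (Phred+33) encoding and drops any read containing a base below the threshold; the threshold of 20 is purely illustrative, not a recommendation, and the function is a stand-in written for this sketch rather than the Galaxy tool itself.

```python
def filter_fastq(lines, min_quality=20, offset=33):
    """Keep only reads in which every base has a Phred score >= min_quality."""
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (l.rstrip("\n") for l in lines[i:i + 4])
        if all(ord(ch) - offset >= min_quality for ch in qual):
            kept.extend([header, seq, plus, qual])
    return kept


# Two made-up reads: the second has one low-quality base ('#' is Phred 2).
EXAMPLE_FASTQ = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "II#I",
]

if __name__ == "__main__":
    print("\n".join(filter_fastq(EXAMPLE_FASTQ)))
```

A stricter or looser threshold simply changes how many reads survive, so it is worth comparing read counts before and after filtering.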
I have aligned RNA-seq reads with Tophat to the Drosophila melanogaster 3
genome. However, I cannot view the alignment in UCSC (error: "Byte-range
request was ignored by server") or in Ensembl.
Error in Ensembl:
The URL used to reach this page may be incomplete or out-of-date.
A location is required to build this page. For example, chromosomal
Can somebody perhaps find out what I am doing wrong?
Joachim Jacob, PhD
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
I have ChIP-seq alignment files in .bowtie format and would like to perform
peak-calling using MACS. However, .bowtie format doesn't seem to be
supported in Galaxy. Is there a workaround to have MACS analyze these files
within Galaxy, or is the only option to use MACS on the command line?
Thank you for your help!
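One possible workaround is to convert the alignments to BED, a format MACS can read, before uploading. The Python sketch below assumes the files are in Bowtie 1's default tab-delimited output, with columns in the order read name, strand, reference name, 0-based leftmost offset, read sequence, and so on; please check this against your own files. The function name and example line are made up for illustration.

```python
def bowtie_to_bed(lines):
    """Convert Bowtie 1 default-format alignment lines to 6-column BED lines."""
    bed = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip incomplete lines
        name, strand, chrom, offset, seq = cols[:5]
        start = int(offset)      # Bowtie offsets are 0-based, like BED starts
        end = start + len(seq)   # BED ends are exclusive
        bed.append("\t".join([chrom, str(start), str(end), name, "0", strand]))
    return bed


# One made-up alignment line (not real data).
EXAMPLE_BOWTIE = ["read1\t+\tchr2L\t1000\tACGTACGTAC\tIIIIIIIIII\t0\t"]

if __name__ == "__main__":
    print("\n".join(bowtie_to_bed(EXAMPLE_BOWTIE)))
```

The score column is set to 0 as a placeholder, since the default Bowtie output has no mapping quality to carry over.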
I'm new to Galaxy and have a very basic question. If I upload a dataset (say, a protein sequence file in FASTA format) and I want to use one of the tools on this dataset, what do I have to do to make the tool aware of my dataset? When I tried it, the tools did not know that my uploaded data file existed.
I'm very new to Galaxy and its abilities, so I need to ask this: is it possible to use Galaxy to analyse TruSeq Custom Amplicon data? I would like to use it as an additional approach.
Juha-Pekka Pursiheimo, PhD
The Finnish Microarray and Sequencing Centre (FMSC)
Turku Centre for Biotechnology
University of Turku and Åbo Akademi University
Biocity, 5th floor
FIN-20520 Turku, Finland
Phone: +358-2-333 7697
Mobile: +358-400-617 938
Fax: +358-2-333 8000
I ran MACS on my ChIP-seq dataset and found various files:
1. Under the HTML report there are two files: one is negative peaks.xls and
the second is peaks.xls. The peaks.xls file is the same as the peaks.interval
file in the output on the right, with one bp added to each position; e.g., if
the peak coordinates under the HTML report are 99 to 120, then in the
peaks.interval file they are 100 to 121. Which one should be followed?
2. What is the meaning of the negative peaks.interval file?
3. I used control and treated samples to run MACS. There are two wig
files, one ctrl.wig and another treatment.wig. Do these two files belong to
the control and treated samples? If so, where are the corresponding BED files?
If someone can direct me to a description of the output we get in Galaxy when
using MACS, that would be helpful.