Tophat results

Xiefan Fang

10 Sep 2012

12:16 p.m.

Dear galaxy users,

I aligned my RNA-seq data using Tophat in Galaxy. It generated "Tophat deletions", "Tophat insertions" and "Tophat splice junctions" results. These are all BED files. Does anyone know how to use or analyze these kinds of results?

Also, I used Illumina RNA-seq. Each biological sample has 36-48 million reads. The data for each sample were divided into 10-12 FASTQ files. When I ran "FASTQ Summary Statistics" and drew a "boxplot" for each sub-file, the score value was about 9-10. Is that too low? Should I combine the FASTQ files for each biological sample and run the statistics again?

Finally, does anyone know how to convert a long list of zebrafish genes (500-1000 genes) to human or mammalian orthologs?

Thank you for your replies,
Xiefan Fang
University of Florida


Jennifer Jackson

13 Sep

9:47 a.m.

Hello Xiefan,

On 9/10/12 12:16 PM, Xiefan Fang wrote:

Dear galaxy users,

I aligned my RNA-seq data by using Tophat in galaxy. It generated some “Tophat deletions”, “Tophat insertions” and “Tophat splice junctions” results. These are all BED files. Does anyone know how to use/analyze these kind of results?

Please see 'Tools on the Main server': http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results

The RNA-seq tutorial (hosted at Galaxy) and the websites and paper by the tool authors should give you many good ideas for potential protocols.
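As one possible starting point for the splice junctions file, here is a minimal Python sketch. It assumes the 12-column BED layout described in the TopHat manual, where each junction is two connected blocks (the read overhangs on either side) and the score column is the number of alignments spanning the junction. The function names, the example records, and the read-count cutoff are made up for illustration and are not part of Galaxy or TopHat.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # 0-based position of the first intronic base
    intron_end: int    # end-exclusive position of the intron
    strand: str
    reads: int         # BED score column: alignments spanning the junction


def parse_junctions(lines: Iterable[str]) -> Iterator[Junction]:
    """Yield intron coordinates from TopHat-style junction BED lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue  # skip blank lines, the track header, and comments
        fields = line.rstrip("\n").split("\t")
        start = int(fields[1])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # The intron lies between the end of the first block and the
        # start of the second block.
        yield Junction(
            chrom=fields[0],
            intron_start=start + block_sizes[0],
            intron_end=start + block_starts[1],
            strand=fields[5],
            reads=int(fields[4]),
        )


def well_supported(junctions: Iterable[Junction], min_reads: int) -> List[Junction]:
    """Keep junctions supported by at least min_reads alignments (arbitrary cutoff)."""
    return [j for j in junctions if j.reads >= min_reads]


# Two invented records, for illustration only.
EXAMPLE_BED = "\n".join([
    'track name=junctions description="TopHat junctions"',
    "\t".join(["chr1", "100", "400", "JUNC00000001", "12", "+",
               "100", "400", "255,0,0", "2", "50,60", "0,240"]),
    "\t".join(["chr1", "500", "900", "JUNC00000002", "2", "-",
               "500", "900", "255,0,0", "2", "40,30", "0,370"]),
])

if __name__ == "__main__":
    for junction in well_supported(parse_junctions(EXAMPLE_BED.splitlines()), 5):
        print(junction)
```

The resulting intron coordinates could then be compared against annotated introns to separate known from candidate novel junctions.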

Also, I used illumina RNA-seq. Each biological sample has 36-48 million reads. The data for each sample were divided to 10-12 FASTQ files. When I did the “FASTQ Summary Statistics” and draw “boxplot” for each of the sub-file, the score value is about 9-10. Is it too low? Shall I combine the FASTQ files for each biological sample and do the statistics again?

Combining the files will not change the quality values. If this is a Phred+33 scaled quality score, then yes, this is low. The first step is to double-check that 'FASTQ Groomer' was run with the correct options. You may also want to run FastQC to generate broader statistics. See the RNA-seq tutorial for details about running this tool and then trimming sequences to improve overall quality. A direct link is: http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
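To make the scale concrete, here is a small Python sketch of how a FASTQ quality string is decoded and what a score of about 10 means. It relies on the standard Phred definition (error probability = 10^(-Q/10)) and on the two common ASCII offsets, 33 and 64. The helper names are made up, and the offset guess is only a rough heuristic, not what FASTQ Groomer does internally.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode one FASTQ quality line into integer Phred scores."""
    return [ord(char) - offset for char in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def guess_offset(quality_lines: Iterable[str]) -> int:
    """Rough heuristic for the ASCII offset of a set of quality lines.

    Characters below ';' (ASCII 59) do not occur in the +64 encodings,
    so seeing one implies Phred+33. The reverse is not guaranteed:
    high-quality Phred+33 data can also lack low characters.
    """
    lowest = min(ord(char) for line in quality_lines for char in line)
    return 33 if lowest < 59 else 64


if __name__ == "__main__":
    scores = phred_scores("+5I")  # decoded with the default offset of 33
    print(scores, [error_probability(q) for q in scores])
```

Under Phred+33, a score of 10 corresponds to roughly a 1 in 10 chance that the base call is wrong, which is why a median of 9-10 is considered low. If the files were decoded with the wrong offset, every score shifts by 31, so checking the encoding first is worthwhile.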

At last, does anyone know how to convert a long list of zebrafish genes (500-1000 genes) to human or mammalian orthologs?

There are likely many ways to do this. Here are some:

1 - 'Get Data -> UCSC Main'
Track named "Human Proteins" with the primary table (blastHg18KG).

2 - 'Get Data -> BioMart'
Ensembl Genes 68, Danio rerio genes (Zv9). Filters -> Homologs -> Ortholog.

Help: 'Using Galaxy' http://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012
Protocol 1 has examples of extracting data from the UCSC Table Browser and joining data. The methods can be applied to any similar data. If you need to manipulate files, see Protocol 2. The last example is multi-stepped and demonstrates that just about any file can be converted to interval format and utilized.

3 - 'MAF predictions'
'Using Galaxy' (above) Protocol 5 has an alternate method for predicting "orthologs" (or maybe better described as 'syntenically conserved homologs', since function is not evaluated) from conservation tracks. Full details of MAF functions are in our 'Making whole genome alignments usable for biologists' paper: http://main.g2.bx.psu.edu/u/dan/p/maf

The ZF Conservation track is not local to Galaxy, so you will need to obtain the data from UCSC and FTP it to Galaxy to work with it ('Get Data' is not an option). Review the track description in the UCSC browser (track named "Conservation"), then find the data here: http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/multiz8way/

Good luck with whichever options you choose!

Jen
Galaxy team
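For the BioMart route (option 2), the joining step can also be sketched outside Galaxy. The Python below assumes you have exported a tab-separated table with one zebrafish gene column and one human ortholog column. The column names, gene identifiers, and function names are placeholders invented for this example, not actual BioMart attribute names or real genes.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def load_orthologs(tsv_text: str, src_col: str, dst_col: str) -> Dict[str, Set[str]]:
    """Build a source-gene -> set of ortholog genes lookup from TSV text."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        src, dst = row[src_col].strip(), row[dst_col].strip()
        if src and dst:  # rows without an ortholog have an empty target cell
            table[src].add(dst)
    return table


def map_genes(genes: Iterable[str], table: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each gene to its sorted orthologs; unmapped genes get an empty list."""
    return {gene: sorted(table.get(gene, ())) for gene in genes}


# Placeholder table standing in for a BioMart export (all values are made up).
EXAMPLE_TSV = (
    "zebrafish_gene\thuman_ortholog\n"
    "zf_gene_a\tHS_GENE_2\n"
    "zf_gene_a\tHS_GENE_1\n"
    "zf_gene_b\t\n"
)

if __name__ == "__main__":
    lookup = load_orthologs(EXAMPLE_TSV, "zebrafish_gene", "human_ortholog")
    for gene, orthologs in map_genes(["zf_gene_a", "zf_gene_b", "zf_gene_c"], lookup).items():
        print(gene, orthologs or "no ortholog found")
```

A set per gene is used because one zebrafish gene can map to several human genes, and several zebrafish paralogs can share one human ortholog.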

Thank you for your replies,

Xiefan Fang

University of Florida

___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:

http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

-- Jennifer Jackson http://galaxyproject.org
