Question regarding quality filtering of 454 amplicons
by Jackie Lighten
Hi,
I have a question for you guys regarding quality filtering.
I have a data set of double MID tagged 454 amplicons, from which I wish to
select high quality sequences above Q20.
The 454 quality filtering tool seems to work differently from the one provided
for Illumina sequencing, i.e. 454 filtering extracts high quality segments,
while the Illumina (FASTQ) filter can select high quality full reads based on
certain parameters.
OK, so I know that the total length of my amplicon, including primers and
barcodes is around 260bp. If I then set the 454 quality filtering tool to
extract contiguous high quality sequence of >260, it gives me back around
45% of my raw data as meeting this criterion, i.e. all 260bp are above Q20. I
don't necessarily need this high stringency as most bases may not be
informative.
But if I convert my 454 data to FASTQ format and then run the Illumina
filtering system which also allows me to set the number of bases allowed to
deviate from the Q20 criteria, I get back over 90% of my data (allowing 10bp
to deviate from Q20).
I then need to go ahead and convert back to 454 format.
Can you tell me if this is OK?
Will I lose/confuse information somewhere along these conversions?
It seems that if I do this, my barcodes are removed, as amplicons do not
sort properly when I parse them through my barcode filtering program.
Does anyone know of a program to filter 454 data based on average sequence
quality score, which doesn't involve Linux and the Roche off-instrument
program? (I have no experience in Linux!)
Thanks!
--
Jack Lighten,
Ph.D. Candidate,
Bentzen Lab,
Room 6078,
Department of Biology,
Dalhousie University,
Halifax, NS, B3H 4J1
Canada
Office:(902) 494-1398
Email: Jackie.Lighten(a)Dal.Ca
Profile: www.marinebiodiversity.ca/CHONe/Members/lightenj/profile/bio
11 years, 7 months
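A note on the 454 filtering question above: the two criteria mentioned there (average quality, and "at most N bases below Q20") can be sketched in a few lines of Python. This is only an illustration, not the Galaxy or Roche implementation. It assumes four-line FASTQ records with Sanger-style (Phred+33) quality encoding, and the function names (`read_fastq`, `mean_quality`, `count_low`, `filter_reads`) are made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Sanger-style FASTQ (Phred+33)

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed four-line records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line is not needed
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def phred_scores(qual: str) -> List[int]:
    """Convert a quality string into a list of Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def mean_quality(qual: str) -> float:
    """Average Phred score over the whole read."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


def count_low(qual: str, min_q: int = 20) -> int:
    """Number of bases whose score is below min_q."""
    return sum(1 for q in phred_scores(qual) if q < min_q)


def filter_reads(records: Iterable[Record], min_q: int = 20,
                 max_low: int = 10) -> Iterator[Record]:
    """Keep whole reads with at most max_low bases below min_q."""
    for header, seq, qual in records:
        if count_low(qual, min_q) <= max_low:
            yield header, seq, qual
```

With a real file this would be called as `filter_reads(read_fastq(open("reads.fastq")))`, where the file name is a placeholder. The filter only keeps or drops whole reads and never trims them, so barcodes and primers stay attached to every read that passes. If barcodes go missing after a round of format conversions, the trimming is probably happening in one of the conversion steps rather than in a filter of this kind.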
Mapping to only 3 genes / targeted resequencing / SOLiD4 / short reads
by Jochen Seggewiß
Hi!
Following situation:
10 barcoded "samples". Each sample consists of a mix of the sequences of 3
independent genes (2 alleles each).
I would like to map the SOLiD4 reads only to the sequences of those 3
genes, patient by patient.
First, the 10 barcoded samples have to be separated from each other.
Then, the short reads have to be mapped to the sequences of the 3 genes,
which are available in FASTA-format (single) or multi-FASTA-format (all
sequences in one file).
Is this possible using the available GALAXY tools?
How?
Thank you in advance.
Jose
11 years, 9 months
get wig file after tophat
by Ying Zhang
Hi:
I am using TopHat in Galaxy to analyze my paired-end RNA-seq data, and I found
that TopHat no longer produces the wig file that it used to. Do you have any
idea how I can still get the wig file after the TopHat analysis? Thanks a lot!
Best
Ying Zhang, M.D., Ph.D.
Postdoctoral Associate
Department of Genetics,
Yale University School of Medicine
300 Cedar Street,S320
New Haven, CT 06519
Tel: (203)737-2616
Fax: (203)737-2286
11 years, 9 months
2011 Galaxy Community Conference, May 25-26, Lunteren, The Netherlands
by Dave Clements
Hello all,
We are pleased to announce the *2011 Galaxy Community Conference*, being
held *May 25-26 in Lunteren, The Netherlands*. The meeting will feature two
full days of presentations and discussion on extending Galaxy to use new
tools and data sources, deploying Galaxy at your organization, and best
practices for using Galaxy to further your own and your community's
research.
*Link:* http://galaxy.psu.edu/gcc2011/
*Overview*
This event aims to engage a broader community of developers, data
producers, tool creators, and core facility and other research hub staff to
become an active part of the Galaxy community. We'll cover defining
resources in the Galaxy framework, increasing their visibility and making
them easier to use and integrate with other resources, how to extend Galaxy
to use custom data sources and custom tools, and best practices for using
Galaxy in your organization.
Additional topics include, but are not limited to:
* Talks submitted by the Galaxy community
* Integration of tools (including NGS analysis tools) and distributed job
management
* Deployment of Galaxy instances on local resources and on the Cloud
* Management of large datasets with the Galaxy Library System
* Using the Galaxy LIMS functionality at NGS sequencing facilities
* Visualizing Data without leaving Galaxy
* Performing reproducible research
* Performing and sharing complex analyses with Workflows
* An "Introduction to Galaxy" session, offered on May 24, for Galaxy
newcomers.
*Registration*
The conference fee is €100 on or before April 24, and €120 after that. The
meeting is being held at the Conference Centre De Werelt in Lunteren, The
Netherlands, which is also the conference hotel. You are encouraged to
register early, as space at the hotel (and at the "Intro to Galaxy" session)
is limited and is likely to fill up before the conference itself does.
*Link:* http://galaxy.psu.edu/gcc2011/Register.html
*Abstract Submission*
Abstracts are now being accepted for short oral presentations. Proposals
on any topic of interest to the Galaxy community are welcome and
encouraged. The abstract submission deadline is the end of the day on February 28.
*Link:* http://galaxy.psu.edu/gcc2011/Abstracts.html
*Sponsors*
The 2011 Galaxy Community Conference is co-sponsored by the US National
Science Foundation (NSF), and the Netherlands Bioinformatics Centre (NBIC).
NBIC is a collaborative institute of the bioinformatics groups in the
Netherlands. Together, these groups perform cutting-edge research, develop
novel tools and support platforms, create an e-science infrastructure and
educate the next generations of bioinformaticians.
*Links:* http://www.nbic.nl/ and http://www.nsf.gov/
We are looking forward to a great conference and hope to see you in the
Netherlands!
The Galaxy and NBIC Teams
--
http://galaxy.psu.edu/gcc2011/
http://getgalaxy.org
http://usegalaxy.org/
11 years, 9 months
Regarding SNPs
by Nripesh Prasad
I wish to compare SNPs in my mouse sample (SNP file generated from Partek
Genomics Suite) with SNPs from the UCSC browser. How do I do that in Galaxy?
Nripesh Prasad
11 years, 9 months
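For the SNP comparison question above, the underlying operation is a set comparison keyed on chromosome and position. The sketch below shows the idea in Python. It is not a Galaxy tool. The column layout (chromosome in the first column, position in the second) is an assumption about both files, and `load_positions` and `compare` are hypothetical names. The `pos_offset` parameter is there because UCSC database tables store 0-based start coordinates, while many SNP exports are 1-based, so check which convention each file uses before comparing.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based position)


def load_positions(rows: Iterable[Sequence[str]], chrom_col: int = 0,
                   pos_col: int = 1, pos_offset: int = 0) -> Set[Position]:
    """Collect (chrom, position) pairs from already-split table rows.

    pos_offset is added to every position, e.g. +1 to convert a
    0-based start coordinate to a 1-based position.
    """
    positions: Set[Position] = set()
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip empty rows and comment/header rows
        positions.add((row[chrom_col], int(row[pos_col]) + pos_offset))
    return positions


def compare(sample: Set[Position],
            reference: Set[Position]) -> Dict[str, Set[Position]]:
    """Split positions into shared, sample-only and reference-only sets."""
    return {
        "shared": sample & reference,
        "sample_only": sample - reference,
        "reference_only": reference - sample,
    }
```

Rows can come from `csv.reader(handle, delimiter="\t")` for tab-delimited files. Within Galaxy, the same comparison corresponds roughly to converting both SNP lists to interval format on the same genome build and then intersecting or joining them.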
Re: [galaxy-user] Adding the Hydra genome to Galaxy
by Jennifer Jackson
Hello Rob,
We will add this to our to-do list for new genomes. Thanks for sending
the Genbank information!
Next time, if you could send requests to galaxy-user, that would be very
helpful for the team.
Best,
Jen
Galaxy team
On 1/3/11 12:37 PM, Rob Steele wrote:
> Hi Jennifer,
> Would it be possible to get the Hydra genome assembly added to Galaxy?
> It has been published and is available in GenBank under accession number
> ABRM00000000.
>
> Cheers,
> Rob
>
> Rob Steele, Ph.D.
> Professor
> D240 Medical Sciences I
> Department of Biological Chemistry
> School of Medicine
> University of California, Irvine
> Irvine, CA 92697-1700
>
> phone: 949-824-7341
> e-mail: resteele(a)uci.edu
> fax: 949-824-2688
> web: http://polyp.biochem.uci.edu/wiki/index.php/Main_Page
>
--
Jennifer Jackson
http://usegalaxy.org
11 years, 9 months
Re: [galaxy-user] [Genome] UCSC browser, access from Galaxy
by Hiram Clawson
Good Morning Dr. Hunt:
Can you please clarify what upload function from Galaxy you are
trying to perform? Perhaps the Galaxy user help email list
(copied on this email) could direct you to advice on this subject.
The Ensembl genome browser accepts the same types of
upload tracks as does the UCSC genome browser.
--Hiram
----- Original Message -----
From: "Pustulka-Hunt Elzbieta (SystemsX.ch)" <ela.hunt(a)systemsx.ch>
To: genome(a)soe.ucsc.edu
Sent: Thursday, March 31, 2011 5:19:55 AM
Subject: [Genome] USCS browser, access from Galaxy
Connection refused
Couldn't connect to 127.0.0.1 8080
Couldn't open http://127.0.0.1:8080/root/display_as?id=27&display_app=ucsc&authz_method...
Hi,
I have a local installation of Galaxy. Is it at all possible to upload (easily) data from Galaxy to your browser? I have no trouble uploading an extra track to Ensembl. Could you support such access?
Regards
Ela
------------------------------------------------------------------------------------------------
Dr Ela Hunt
SyBIT Deputy Project Manager
Clausiusstr. 45, CLP D 2
ETHZ, CH-8092 Zurich
+41 44 632 93 37
https://wiki.systemsx.ch/display/~ela.hunt@systemsx.ch/Dr+Ela+Hunt
11 years, 9 months
Recall: Upload file size and user authorization
by Paul-Michael Agapow
Paul-Michael Agapow would like to recall the message, "[galaxy-user] Upload file size and user authorization".
11 years, 9 months
storage space/processor speed
by Keith E. Giles
I am trying to perform a "join" on two sets of intervals. There are
~20,000 intervals in one dataset and about 13 million in the other.
This has been running for about 3 days now, and I'm pretty certain
that it's not going to work. Is there a way to know ahead of time whether
there is enough memory available for a given function to run? Also, how
much storage space does each account have available?
I know that one can access cloud space online, through Amazon for
example. However, that seems to be fairly complicated and a bit out of
reach for me in the short term.
11 years, 9 months
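On the interval join above: a join of ~20,000 intervals against ~13 million does not have to compare every pair. One common approach is to index the small set in memory and stream the large set past it one interval at a time. The Python sketch below illustrates that idea only. It does not describe what Galaxy's Join tool does internally. It assumes half-open `(chrom, start, end)` intervals, and `build_index` and `join_overlaps` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open
Index = Dict[str, Tuple[List[int], List[Tuple[int, int]], int]]


def build_index(small: Iterable[Interval]) -> Index:
    """Sort the small set by start, per chromosome, and record the
    longest interval so lookups know how far back to search."""
    by_chrom = defaultdict(list)
    for chrom, start, end in small:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_len = max(e - s for s, e in ivs)
        index[chrom] = (starts, ivs, max_len)
    return index


def join_overlaps(index: Index,
                  big: Iterable[Interval]) -> Iterator[Tuple[Interval, Interval]]:
    """Stream the large set and yield (big, small) pairs that overlap."""
    for chrom, qs, qe in big:
        entry = index.get(chrom)
        if entry is None:
            continue
        starts, ivs, max_len = entry
        # An overlapping interval must start before the query ends ...
        hi = bisect_left(starts, qe)
        # ... and cannot start more than max_len before the query starts.
        lo = bisect_right(starts, qs - max_len)
        for s, e in ivs[lo:hi]:
            if e > qs:
                yield (chrom, qs, qe), (chrom, s, e)
```

Only the small set is held in memory, and the large set can be read line by line from a file, so memory use stays roughly proportional to the 20,000-interval dataset. The lookup stays fast as long as the small set does not contain a few extremely long intervals, since `max_len` widens the search window for every query.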
error on indel analysis
by Evan Schwab
Hi,
I am receiving the following error when I try to run the Indel Analysis tool
on a SAM file. I kept the Frequency threshold at the default 0.015.
Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 227, in <module>
    if __name__=="__main__": __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 84, in __main__
    add_to_mis_matches( mis_matches[ chrom ], pos, bases )
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 34, in add_to_mis_matches
    mis_matches[ pos + j ] = { base: 1 }
MemoryError
Prior to that I tried a Frequency threshold of 0.0, and it ran for days on
end without any results. I've also tried a Frequency threshold of 0.10, and it
gave an error that said "Killed" with no explanation.
Is there an error with the tool?
Thanks
Evan
--
Evan Schwab
Research Associate
Megason Lab
Department of Systems Biology
Harvard Medical School
200 Longwood Ave
Boston, MA 02115
908-938-3779
11 years, 9 months