Help with Summary Statistics
by D. A. Cowart
Hello,
I am attempting to use Galaxy to calculate the mean sequence read
length and identify the range of read lengths for my 454 data. The
data has already been organized and sorted by species. The format of
the data is as follows:
>HD4AU5D01BHBCQCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
>HD4AU5D01A093MCTCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
etc...for each species
I have attempted to use the "Summary Statistics" button; however, it
appears to work only on numerical data, not sequence data. Is this
tool/task available via Galaxy?
Thank you,
Dominique Cowart
User name: dac330
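For readers who want to compute these numbers outside Galaxy, here is a minimal sketch in Python. It assumes the reads are in standard FASTA layout (a ">" header line followed by one or more sequence lines); the function names and the two example reads are made up for illustration and are not a Galaxy tool.

```python
from io import StringIO

def read_lengths(handle):
    """Yield the length of each sequence in a FASTA-formatted text stream."""
    length = None  # None until the first header line has been seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length

def summarize(lengths):
    """Return count, mean, minimum and maximum of the read lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "count": len(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }

# Hypothetical two-read example; the second read spans two lines.
example = StringIO(">read1\nCTCTCT\n>read2\nCTCTGTCG\nCT\n")
stats = summarize(read_lengths(example))
print(stats)
```

To run it per species, the same two functions could be applied to one file (or one open file handle) per species.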
8 years, 8 months
Empty bowtie2 output
by IIHG Galaxy Administrator
In follow-up to http://user.list.galaxyproject.org/Empty-bowtie2-output-tp4656137.html, is there:
- an ETA on when the issue with Bowtie2 in the August 2013 distribution generating empty output will be fixed (if it is not already fixed)?
- a suggested workaround (reverting to an older version of that particular tool, etc.) in the meantime?
Thank you.
Unrelated: I wasn't able to determine how to update that thread to request a status, hence this new one.
9 years, 1 month
SNP finding
by Xiefan Fang
Dear galaxy users,
We have done deep sequencing on some known genomic loci using a
HiSeq 2000. I have already mapped the reads to the reference sequences
using Galaxy. As the next step, I want to find SNPs and calculate the SNP
percentage within the reads. There are 500,000 to 1,000,000 reads per
biological sample. Can I do this with Galaxy? If not, are there other programs
available for Windows, considering that I am not very familiar with
programming?
Thanks,
Xiefan
University of Florida
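To make "SNP percentage within the reads" concrete, here is a minimal sketch in Python of the arithmetic at a single reference position. It assumes the bases observed at that position across the mapped reads have already been collected into a list (for example from a pileup); the function names and the example bases are hypothetical, and this is not a variant caller.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def allele_percentages(bases):
    """Percentage of each base among the A/C/G/T calls at one position."""
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: 100.0 * n / total for base, n in counts.items()}

def non_reference_percent(ref_base, bases):
    """Percentage of calls at this position that differ from the reference base."""
    percentages = allele_percentages(bases)
    return sum(p for base, p in percentages.items() if base != ref_base.upper())

# Hypothetical position: reference is A, eight reads cover it, two show G.
observed = list("AAAAAAGG")
print(allele_percentages(observed))
print(non_reference_percent("A", observed))
```

Ambiguous calls such as N are ignored here; whether that is appropriate depends on the analysis.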
9 years, 1 month
FW: [galaxy-bugs] Galaxy tool error report from bsib@leeds.ac.uk
by Irene Bassano
Hi Jen,
thanks.
I am a bit confused: on Galaxy the only human genome listed is hg_g1k_v37.
So when I uploaded the new data from "Get data", under "Genome" I selected hg_g1k_v37.
Now, all I want is to get Cufflinks output with gene names: which genome am I supposed to use? The only one I knew was hg19 from iGenomes, but it seems I cannot use it. Do I have to select a genome when I upload raw fastq data? I haven't started doing anything so far; it's just raw data.
Thanks a lot,
Irene
________________________________________
From: Jennifer Jackson [jen(a)bx.psu.edu]
Sent: Wednesday, November 27, 2013 10:08 PM
To: Irene Bassano
Cc: galaxy-bugs(a)bx.psu.edu
Subject: Re: [galaxy-bugs] Galaxy tool error report from bsib(a)leeds.ac.uk
Hello,
iGenomes covers the UCSC build, which is named "Human Feb. 2009
(GRCh37/hg19) (hg19)" in full in the UI. The "hg19" key is the
important part: the name may be abbreviated in some tools, but this
key will be in all. The genome with the "hg_g1k_v37" key is slightly
different, and you will have another genome mismatch problem with the
RNA-seq (and most other) tools if you combine this genome with data from
UCSC (the source of hg19) or the wrong iGenomes file.
"hg_g1k_v37" (source: 1000 genomes via GATK) and "hg19" (source: UCSC)
are just about the same, but the identifiers are different. If you want
to examine the differences, both are available on our rsync server and
can be downloaded and compared. On the "Help -> Support" wiki are links
for reference genomes.
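One quick way to see an identifier mismatch is to compare the sequence names used by the annotation with those in the reference. The sketch below is in Python and only illustrative: the function names and the tiny inputs are made up, and it relies on the fact that UCSC-style names carry a "chr" prefix (chr1, chrM) while the b37 build uses bare names (1, MT).

```python
def seqnames_in_gtf(lines):
    """Collect the sequence names (first column) used in GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names

def compare_seqnames(annotation_names, reference_names):
    """Report which sequence names are shared and which appear on one side only."""
    a, r = set(annotation_names), set(reference_names)
    return {
        "shared": sorted(a & r),
        "annotation_only": sorted(a - r),
        "reference_only": sorted(r - a),
    }

# Hypothetical inputs: a UCSC-style annotation against b37-style reference names.
gtf_lines = [
    "chr1\tsource\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "chrM\tsource\texon\t10\t50\t.\t+\t.\tgene_id \"g2\";",
]
report = compare_seqnames(seqnames_in_gtf(gtf_lines), ["1", "2", "MT"])
print(report)
```

An empty "shared" list is the signature of the kind of mismatch described above.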
The iGenomes GTF file for hg19 is on the public Main server, if that is
more convenient for you, or should you just want to be sure you have the
right one. Look in Shared Data -> Data Libraries -> iGenomes.
Best,
Jen
Galaxy team
ps. please try to send new questions to one of our lists, thanks!
On 11/27/13 12:51 PM, Irene Bassano wrote:
> Hi Jen,
> I uploaded some fastq files and selected as Genome from the drop down list "Homo sapiens b37 (hg_g1k_v37)".
>
> Is this the same as the genome listed in UCSC "February 2009 (GRCh37/hg19)"?
>
> I am using iGenomes to get the gene names rather than annotation such as NM_00xxxx and I used the UCSC website choosing the newest genome, February 2009
>
> Thanks,
>
> best,
> Irene
> ________________________________________
> From: Jennifer Jackson [jen(a)bx.psu.edu]
> Sent: Monday, November 18, 2013 7:26 PM
> To: galaxy-bugs(a)bx.psu.edu; Irene Bassano
> Subject: Re: [galaxy-bugs] Galaxy tool error report from bsib(a)leeds.ac.uk
>
> Hi again,
>
> The database mismatch is a problem here as well, for the same reasons. My
> guess is that you intended to run dataset #21 against hg19, and that the
> run against hg18 was a tool form input mistake?
>
> Good luck with the next runs,
>
> Jen
> Galaxy team
>
> On 11/18/13 5:21 AM, galaxy-bugs(a)bx.psu.edu wrote:
>> GALAXY TOOL ERROR REPORT
>> ------------------------
>>
>> This error report was sent from the Galaxy instance hosted on the server
>> "usegalaxy.org"
>> -----------------------------------------------------------------------------
>> This is in reference to dataset id 7081835 from history id 1686699
>> -----------------------------------------------------------------------------
>> You should be able to view the history containing the related history item
>>
>> 37: Cufflinks on data 21 and data 34: assembled transcripts
>>
>> by logging in as a Galaxy admin user to the Galaxy instance referenced above
>> and pointing your browser to the following link.
>>
>> usegalaxy.org/history/view?id=d88c1ef77619eb4f
>> -----------------------------------------------------------------------------
>> The user 'bsib(a)leeds.ac.uk' provided the following information:
>>
>> Same as before but the reference genome is UCSC MAin on Human:refFlat
>> -----------------------------------------------------------------------------
>> job id: 6096851
>> tool id: toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/cufflinks/0.0.6
>> job pid or drm id: 136948
>> -----------------------------------------------------------------------------
>> job command line:
>> python /galaxy/main/migrated_tools/toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/b01956f26c36/cufflinks/cufflinks_wrapper.py --input=/galaxy-repl/main/files/007/074/dataset_7074022.dat --assembled-isoforms-output=/galaxy-repl/main/files/007/081/dataset_7081835.dat --num-threads="8" -I 300000 -F 0.1 -j 0.15 -G /galaxy-repl/main/psufiles/004/831/dataset_4831015.dat -N -b --ref_file="None" --dbkey=hg18 --index_dir=/galaxy/main/server/tool-data -u
>> -----------------------------------------------------------------------------
>> job stderr:
>> Error running cufflinks.
>> return code = 1
>> Command line:
>> cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy-repl/main/psufiles/004/831/dataset_4831015.dat -u -N -b /galaxy/data/hg18/sam_index/hg18.fa /galaxy-repl/main/files/007/074/dataset_7074022.dat
>> Error: cannot open reference GTF file /galaxy-repl/main/psufiles/004/831/dataset_4831015.dat for reading
>>
>>
>> -----------------------------------------------------------------------------
>> job stdout:
>> cufflinks v2.1.1
>> cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy-repl/main/psufiles/004/831/dataset_4831015.dat -u -N -b /galaxy/data/hg18/sam_index/hg18.fa
>>
>> -----------------------------------------------------------------------------
>> job info:
>> None
>> -----------------------------------------------------------------------------
>> job traceback:
>> None
>> -----------------------------------------------------------------------------
>> (This is an automated message).
> --
> Jennifer Hillman-Jackson
> http://galaxyproject.org
>
--
Jennifer Hillman-Jackson
http://galaxyproject.org
9 years, 1 month
tophat issues
by miroslav.sotak
To whom it may concern
I have a problem with TopHat. I can load FASTQ data into my "history"
without trouble, following the RNA-seq Analysis Exercise provided by Jeremy.
We checked the ASCII offset used for the quality scores. I even tried
the "quality data converter" set to 33 (we have data with this ASCII
offset from 2 different sources), but "tophat for Illumina" simply cannot
read the data, either before or after the quality format conversion. We
have no idea what is going on. I am logged in to Galaxy with my current
email address; can you check my data, and is there any converter for the
quality offset?
Sincerely
Miro Sotak
9 years, 1 month
Re: [galaxy-user] Problem loading BAM into IGV browser - invalid GZIP header error message
by Jim Johnson
Is galaxy returning an html page rather than the desired bam file?
Are you using an nginx or apache proxy server to your galaxy server?
I think that may be required, in order to view BAM files in IGV directly from Galaxy.
JJ
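A rough way to test the first question is to look at the first bytes of whatever the server returns for the BAM URL. The sketch below is in Python and only classifies bytes already in hand; the function name and the example payloads are hypothetical. It relies on BAM files being BGZF-compressed, so they begin with the gzip magic bytes 0x1f 0x8b.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and therefore BGZF/BAM) stream

def classify_payload(first_bytes):
    """Classify the start of a downloaded file as 'gzip', 'html', or 'unknown'."""
    if first_bytes[:2] == GZIP_MAGIC:
        return "gzip"
    head = first_bytes.lstrip().lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"
    return "unknown"

# Hypothetical payloads: a gzip-compressed blob and an HTML page.
bam_like = gzip.compress(b"BAM\x01")
login_page = b"<!DOCTYPE html><html><body>Please log in</body></html>"
print(classify_payload(bam_like[:16]))
print(classify_payload(login_page[:16]))
```

If the bytes served at the URL classify as "html", an "Invalid GZIP header" message from the client would be consistent with that.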
On 11/29/13, 11:00 AM, galaxy-user-request(a)lists.bx.psu.edu wrote:
> Today's Topics:
>
> 1. Problem loading BAM into IGV browser - invalid GZIP header
> error message (Vosberg, Sebastian)
> 2. Re: Problem loading BAM into IGV browser - invalid GZIP
> header error message (Jim Robinson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 29 Nov 2013 11:04:48 +0100
> From: "Vosberg, Sebastian" <sebastian.vosberg(a)helmholtz-muenchen.de>
> To: "galaxy-user(a)lists.bx.psu.edu" <galaxy-user(a)lists.bx.psu.edu>
> Subject: [galaxy-user] Problem loading BAM into IGV browser - invalid
> GZIP header error message
> Message-ID:
> <20854588711E4A489A3AD70C9BA5548A01AE291C4614(a)XCH11.scidom.de>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all,
>
>
> sometimes I encounter a problem trying to load BAM files directly from Galaxy into the IGV browser. First I start the IGV browser locally, then click on the appropriate BAM file and on "display with IGV _local_" in Galaxy. In most cases it works, but for some reason not with specific files. The error message says
>
> "Error loading http://_URL-to-file_/galaxy_example.bam: An error occured while accessing http://_URL-to-file_/galaxy_example.bam
> Invalid GZIP header"
>
> What does it mean? And why am I able to download the BAM file and load it from HDD into the IGV?
> The problem comes with all BAM files of one sample cohort, but not with another (but same sample design and workflow used). Rerunning the workflow doesn't help...
>
>
> I would be very thankful for every kind of help!
>
>
> Best,
> Sebastian
>
> Helmholtz Zentrum München
> Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
> Ingolstädter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
> Geschäftsführer: Prof. Dr. Günther Wess, Dr. Nikolaus Blum, Dr. Alfons Enhsen
> Registergericht: Amtsgericht München HRB 6466
> USt-IdNr: DE 129521671
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 29 Nov 2013 09:42:42 -0500
> From: Jim Robinson <jrobinso(a)broadinstitute.org>
> To: "Vosberg, Sebastian" <sebastian.vosberg(a)helmholtz-muenchen.de>,
> "galaxy-user(a)lists.bx.psu.edu" <galaxy-user(a)lists.bx.psu.edu>
> Subject: Re: [galaxy-user] Problem loading BAM into IGV browser -
> invalid GZIP header error message
> Message-ID: <5298A7E2.7000908(a)broadinstitute.org>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hi Sebastian,
>
> Is it possible to share an example bam that exhibits this problem on a
> Galaxy server I can reach? Also, which version of IGV are you using
> (select Help > About... to see the version).
>
> -- Jim
>
>> ___________________________________________________________
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org. Please keep all replies on the list by
>> using "reply all" in your mail client. For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>> http://lists.bx.psu.edu/
>>
>> To search Galaxy mailing lists use the unified search at:
>>
>> http://galaxyproject.org/search/mailinglists/
>
>
> ------------------------------
>
> _______________________________________________
> galaxy-user mailing list
> galaxy-user(a)lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-user
>
>
> End of galaxy-user Digest, Vol 89, Issue 25
> *******************************************
--
James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota
9 years, 1 month
Problem loading BAM into IGV browser - invalid GZIP header error message
by Vosberg, Sebastian
Dear all,
sometimes I encounter a problem trying to load BAM files directly from Galaxy into the IGV browser. First I start the IGV browser locally, then click on the appropriate BAM file and on "display with IGV _local_" in Galaxy. In most cases it works, but for some reason not with specific files. The error message says
"Error loading http://_URL-to-file_/galaxy_example.bam: An error occured while accessing http://_URL-to-file_/galaxy_example.bam
Invalid GZIP header"
What does it mean? And why am I able to download the BAM file and load it from HDD into the IGV?
The problem comes with all BAM files of one sample cohort, but not with another (but same sample design and workflow used). Rerunning the workflow doesn't help...
I would be very thankful for every kind of help!
Best,
Sebastian
Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
Geschäftsführer: Prof. Dr. Günther Wess, Dr. Nikolaus Blum, Dr. Alfons Enhsen
Registergericht: Amtsgericht München HRB 6466
USt-IdNr: DE 129521671
9 years, 2 months
human genome latest annotation
by Irene Bassano
Hi Jen,
I uploaded some fastq files and selected as Genome from the drop down list "Homo sapiens b37 (hg_g1k_v37)".
Is this the same as the genome listed in UCSC "February 2009 (GRCh37/hg19)"?
I am using iGenomes to get the gene names rather than annotation such as NM_00xxxx and I used the UCSC website choosing the newest genome, February 2009
Thanks,
(sorry, I think i sent a mail to bugs report by mistake)
best,
Irene
9 years, 2 months
Problems with Picard and GATK tools
by garzetti
Dear all,
I have been trying to analyze some recently acquired WGS reads
(re-sequencing with MiSeq) but I am having problems with both Picard and
GATK tools and I don't know where the problem is.
My fastq reads are already in the sanger/illumina 1.9 format, as
recognized by the FastQC tool. I have modified the attributes of the
read files from fastq to fastqsanger and successfully performed a BWA
mapping against my reference sequence.
I have then filtered the resulting SAM file with "NGS: SAM Tools, Filter
SAM" to have only paired-mapped reads and reordered the file with "NGS:
Picard, Reorder SAM/BAM", allowing the option Truncate sequence names
after first whitespace.
Since my reads are highly duplicated (from the FastQC output), I have
run the "NGS: Picard, Mark Duplicate reads" tool, obtaining the removal
of only 2 duplicated reads. I went on adding a Read Group with "NGS:
Picard, Add or Replace Groups" and starting the SNP calling with GATK
using the tool Realigner Target Creator. And here I have obtained an
empty file and I have started thinking something is wrong.
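One way to sanity-check the filtering and duplicate-marking steps is to tally the relevant bits of the SAM FLAG field. The sketch below is in Python, with a hypothetical function name and made-up example records; the bit values (0x2 properly paired, 0x4 unmapped, 0x400 PCR or optical duplicate) come from the SAM format specification.

```python
FLAG_PROPER_PAIR = 0x2   # each segment properly aligned
FLAG_UNMAPPED = 0x4      # segment unmapped
FLAG_DUPLICATE = 0x400   # PCR or optical duplicate

def count_flags(sam_lines):
    """Tally reads, properly paired reads, unmapped reads and marked duplicates."""
    counts = {"reads": 0, "proper_pair": 0, "unmapped": 0, "duplicate": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        counts["reads"] += 1
        if flag & FLAG_PROPER_PAIR:
            counts["proper_pair"] += 1
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        if flag & FLAG_DUPLICATE:
            counts["duplicate"] += 1
    return counts

# Hypothetical records: flags 99, 1123 (99 + 1024) and 77.
sam_lines = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t99\tref\t1\t60\t4M\t=\t10\t13\tACGT\tIIII",
    "r2\t1123\tref\t1\t60\t4M\t=\t10\t13\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
tally = count_flags(sam_lines)
print(tally)
```

Comparing the tally before and after each step shows how many reads a step actually changed.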
So, I have tried to perform the mapping again (as suggested by the GATK
wiki when someone got an empty file like me), running the same steps on
different sample reads, but I always get the same strange results from
the de-duplication step and the Realigner tool.
I think there is something wrong during the BWA mapping step, or even in
my fastq reads, but I cannot understand what it is.
Any idea?
And what is the read quality format accepted by Galaxy tools? I know
it's PHRED+33, but what does it look like?
Example 1:
??A????BDDDEDDDDGGGGGGGHHHF##77AEFHIIHIHIIIH##77ACFFHHHIHIIHH#5AEFHHHHHHF#55AFHEAEDHHHHHHFFCFHHH#######64#66=+@DDEGGGGDEDEEBEECCECEEGGEGGGGGGGGEEGGA5C0
or
Example 2:
!!"!!!!#%%%&%%%%((((((()))'!!!!"&')**)*)***)!!!!"$'')))*)**))!!"&'))))))'!!!"')&"&%))))))''$')))!!!!!!!!!!!!!!!%%&((((%&%&&#&&$$&$&&((&((((((((&&(("!$!
I did BWA mapping with both types and it worked, but maybe the problem
lies somewhere here.
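For reference, in PHRED+33 each quality character encodes the score Q as the ASCII code of the character minus 33, so "!" is Q0 and "I" is Q40. The Python sketch below decodes a string under that assumption and applies a common rule of thumb: any character with an ASCII code below 59 cannot come from a 64-based encoding. The function names are hypothetical, and the heuristic is deliberately conservative.

```python
def decode_phred33(quality):
    """Convert a PHRED+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality]

def guess_offset(quality_strings):
    """Return 33 if any character rules out a 64-based encoding, else None (ambiguous)."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters supplied")
    # 64-based encodings never use characters below ASCII 59 (';').
    return 33 if min(codes) < 59 else None

print(decode_phred33("!#I"))
print(guess_offset(["##77AEFHII"]))
print(guess_offset(["hhhhgggg"]))
```

By this rule, characters such as "#" and "5" in Example 1 and "!" in Example 2 all fall below code 59, so neither string looks 64-based.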
I hope someone can help me!
Thank you!!!!
Debora
9 years, 2 months
Re: [galaxy-user] Cufflinks returned 0 value in all RPKMs
by Jennifer Jackson
Hi Dao,
To run the analysis correctly on SOLiD data, a local or cloud Galaxy
would be needed. A cloud Galaxy is web based and if you follow the links
below, you will find exact instructions for getting set up. There are
Amazon fees, but they do have various grant programs that you can review
at their site for help with that. Galaxy itself is always free!!
About tools on the Test server .. these are constantly in flux in terms
of dependencies and such, and we don't support them because this is
truly a development environment for us. You may find that certain tools,
including this one, work on smaller datasets at some point in the
future, but for the public this shouldn't be used for serious work.
One last option is the other public Galaxy servers, although I don't know
for certain whether any of them will accept your datatype and offer enough
quota to do significant work (these also change over time). Each is
supported by the hosting group. A list is here and you can look
through/review the sites to see what is available:
http://wiki.galaxyproject.org/PublicGalaxyServers
Take care,
Jen
Galaxy team
On 11/26/13 7:43 AM, Ly, Dao wrote:
>
> Hi
>
> Thank you very much for your reply. For the time being, I just wanted
> to become familiar with the workflow and the open resources of Galaxy
> Main for analyzing NGS data. If you could advise me how I can obtain the
> RPKM, that would be great. I have tried many ways to map but no luck so far.
> I think I’m at my wits’ end.
>
> I did try tophat2 with Ion Torrent data and it worked fine. This SOLiD
> SRA format is giving me a hard time and I can only work with web-based
> programs. I also tried Tophat for SOLiD on the Test server but it failed!
> Many thanks again
>
> Best regards
>
> Dao
>
> *From:*Jennifer Jackson [mailto:jen@bx.psu.edu]
> *Sent:* November 20, 2013 8:46 PM
> *To:* Ly, Dao; galaxy-user(a)lists.bx.psu.edu
> *Subject:* Re: [galaxy-user] Cufflinks returned 0 value in all RPKMs
>
> Hello,
>
> If the data is RNA from rat, then you will want to be using Tophat
> instead of Bowtie. Otherwise the data will not be mapped as spliced and
> the results will be off in many ways (the fragment counts are a small
> symptom of a larger problem).
>
> You can use 'Tophat for SOLiD' on a suitable local or cloud Galaxy
> instance. It is available on the Test server, but tools are not
> supported there (we test/break things!) and the quotas are just 10G
> with an account. But it may be a place to do a small trial run before
> committing to a cloud server.
> http://getgalaxy.org
> http://usegalaxy.org/cloud
> http://usegalaxy.org/toolshed
>
> More about RNA-seq is in our wiki and public server, including
> link-outs and tutorials, you can get started here:
> Example → RNA-seq analysis tools:
> http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
> See RNA-seq examples: http://wiki.galaxyproject.org/Learn#Other_Tutorials
>
> Best,
>
> Jen
> Galaxy team
>
> On 11/20/13 6:09 AM, Ly, Dao wrote:
>
> Hi
>
> I have been trying to analyze a rat SOLiD SRA dataset but I encountered a
> problem: Cufflinks gave me 0 RPKM for all genes. Here is my workflow:
>
> 1. Get data with EBI SRA: sent the FASTQ file directly to Galaxy
>
> 2. FASTQ Groomer
>
> 3. Mapped with Bowtie for SOLiD (paired-end) with the built-in
> index rat rn5 as reference genome
>
> 4. SAM-to-BAM on the Bowtie mapping result
>
> 5. Cufflinks on the BAM file
>
> All RPKMs for gene expression and transcript expression have a 0
> value even though the RPKM status is OK. I used default settings
> for all jobs. Am I missing something? Any help or suggestion will be
> greatly appreciated. Thank you very much
>
> Best regards
>
> Dao
> --
> Jennifer Hillman-Jackson
> http://galaxyproject.org
--
Jennifer Hillman-Jackson
http://galaxyproject.org
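For readers following the RPKM discussion in this thread, the textbook definition is reads per kilobase of transcript per million mapped reads. The Python sketch below implements only that definition with hypothetical numbers; Cufflinks estimates abundances with its own statistical model, so this is not how its output is computed.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp gene out of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
# A gene with no assigned reads has an RPKM of zero regardless of its length.
print(rpkm(0, 2000, 10_000_000))
```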
9 years, 2 months