February 2012 - galaxy-user - lists.galaxyproject.org

Download multiple files from history
by Adhemar 02 May '12


Hi, I'm trying to download multiple files from a given history but I couldn't figure out how to do it. Is there a way? Thanks, Adhemar


question about uploading data through URL method
by Xiangming Ding 20 Mar '12


Hi Galaxy, I am a new Galaxy user. I ran into a problem and did not find a similar question in the FAQ. I wanted to upload data from the DDBJ DRA dataset to Galaxy through the URL method. The file is around 800 MB. However, after uploading, the FASTQ file was only around 2 MB. Is it possible to upload a large file to Galaxy through the URL method, or should I download the file to my PC and then upload it to Galaxy through the FTP method? Thanks, Xiangming


Speed up uploading into local Galaxy, terribly slow!!
by Alejandra Rougon 08 Mar '12


Hello, I searched the forums, and although this question has appeared many times, I still don't have a solution. I cannot manage to upload big files into my local Galaxy; it just takes ages. Can I not just copy the files into a local directory? Why do I have to upload the files if Galaxy is already installed locally? I do not have a web server, so I cannot use the URL option. If I do it through FTP (locally), what FTP address should I use? ftp localhost:8080? Is there any other option to speed up uploading? It is so slow that it is no longer worth using. Please help me!


description of "Find diagnostic hits"
by William Hsiao 08 Mar '12


Hi, Is there a more detailed explanation of what the "Find diagnostic hits" module in Metagenomic analysis does? Does it take all the hits above the user-specified cutoffs for one query and return the taxon name if and only if all the hits have the same taxon (my understanding)? Or does it look only at the top few hits? Thanks for providing some clarification on the method. Cheers, Will


Re: [galaxy-user] [galaxy-bugs] GI errors in the megablast table of results ?
by Guru Ananda 01 Mar '12


Dear Sandrine,

Thanks for pointing out this issue. The BLAST databases we have on Galaxy are from last year, while those on NCBI website are the latest (Jan 2012). As pointed out on NCBI website (http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html) it appears that each time any change is made to a sequence/database, GI numbers change as well. This is perhaps why you're observing discrepancies in GI numbers and lengths between megablast outputs on Galaxy and NCBI.

I'm currently in the process of downloading the latest BLAST databases from NCBI, and I'll let you know when they're available for use on Galaxy.

Thanks for your patience,

Guru
Galaxy team.

On Wed, Nov 9, 2011 at 8:03 AM, Sandrine Hughes <Sandrine.Hughes(a)ens-lyon.fr> wrote:
> Dear all,
>
> I’m not sure where I need to send my email so I apologize if I’m wrong.
>
> I have a trouble with the Megablast program available in NGS Mapping and I
> hope that you can help. Indeed, I think that there might be a problem with
> the table given in output, and notably a shift between the GI numbers and
> the parameters associated.
>
> Here are the details:
>
> I. First, what I have done :
> I used the program to identify the species that I have in a mix of
> sequences by using the following options:
>   Database                            nt 27-Jun-2011
>   Word size                           16
>   Identity                            90.0
>   Cutoff                              0.001
>   Filter out low complexity regions   Yes
> I run the analyses twice and obtained exactly the same results (I used
> the online version of Galaxy, not a local one).
>
> II. Second, I analysed the data obtained for one of my sequence (1-202).
> The following lines are the beginning of the table that I obtained after
> the megablast and two lines with troubles:
>
> 1-202  312182292     484  99.33  150  1  0  1  150      1    150  2e-75  289.0
> 1-202  312182201     476  99.33  150  1  0  1  150      1    150  2e-75  289.0
> 1-202  308228725     928  99.33  150  1  0  1  150     19    168  2e-75  289.0
> 1-202  308228711     938  99.33  150  1  0  1  150     22    171  2e-75  289.0
> 1-202  308197083     459  99.33  150  1  0  1  150     10    159  2e-75  289.0
> 1-202  300392378     920  99.33  150  1  0  1  150     10    159  2e-75  289.0
> 1-202  300392376     918  99.33  150  1  0  1  150      9    158  2e-75  289.0
> 1-202  300392375     922  99.33  150  1  0  1  150     11    160  2e-75  289.0
> 1-202  300392374     931  99.33  150  1  0  1  150     21    170  2e-75  289.0
> 1-202  300392373     909  99.33  150  1  0  1  150     21    170  2e-75  289.0
> 1-202  300392371    1172  99.33  150  1  0  1  150      9    158  2e-75  289.0
> ...
> 1-202  179366399  151762  98.67  150  2  0  1  150  46880  47029  6e-73  281.0
> 1-202   58617849     511  98.67  150  2  0  1  150     21    170  6e-73  281.0
>
> III. Third, what I’ve noticed:
> My first trouble was that among all the species identified, two were
> very different from the expected ones (2 last lines). So I decided to
> search if that could be possible for that sequence and performed
> independently a megablast on the NCBI with similar options. I was not able
> to find these two species in the results.
> So, I decided to check the hits identified in the table above and
> identified a second trouble. In the table, the second column give the GI of
> the database hit and the third column give the length of the database hit.
> However, when I manually checked in NCBI the length of the GI, this one was
> incorrect. Indeed, for the GI 312182292, the length should be 580 and not
> 484.
> By checking different lines, I noticed that the length that is given
> for a GI corresponds to the length of the GI-1. As you can see in the above
> table, some GI are consecutive (300392376, 300392375,...). When checking
> the length of 300392376 in NCBI, I should have 920. But when I checked
> 300392375, I found 918. And this was true for the following lines :
> 300392374 give normally 922 and 300392373 give 931... My conclusion at that
> point was that there was a shift of –1 between the GI and the other
> parameters of the line (indeed the parameters for the remaining columns are
> in agreement with the length of the GI-1). However, that’s not always
> true.... For some GI given in the table (for example, the two last lines),
> if we check the parameters of the GI-1, the parameters are completely
> different... So, I suppose that there is a trouble in the GI sorting during
> the megablast but I’m not able to clearly define the problem.
>
> IV. Fourth, confirmed with an other dataset
> In order to be sure that the problem was not linked to my data or my
> process, I asked a colleague to do a megablast on independent data. The
> conclusions were similar to mine : a shift in the GI given in the table
> and the parameters associated, that most of the time but not always,
> correspond to GI-1.
>
> Can you confirm that there is a problem with the output of the megablast
> available in Galaxy ? If yes, do you think you can fix it ?
>
> Many thanks for your help,
>
> Best regards,
>
> Sandrine
>

--
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
Penn State University
505 Wartik lab
University Park PA 16802
guru(a)psu.edu


Re: [galaxy-user] Visualise data
by Jennifer Jackson 29 Feb '12


Hi Ateeq,

The BAM/SAM files can be visualized in Trackster, using your custom reference genome (the same dataset as used for Bowtie or TopHat). But, there are no Cufflinks results, and therefore nothing to visualize, due to the parameters used. Since you are working with a bacterial genome, the parameters will need to be tuned to account for the lack of transcript splicing.

The best resources for advice are likely seqanswers or the tool authors, as explained in this prior answer to another bacterial genome/RNA-seq question:
http://galaxy-users-list-archive.2308625.n4.nabble.com/Cufflinks-merging-mo…

Recently, there has been some user discussion about RNA-seq analysis and bacterial genomes on the galaxy-user mailing list. If you want to search and read through the Q&A, using our custom google search is the best way to locate the threads (but, expect to find just a few):
http://galaxy.psu.edu/search/mailinglists/

If anyone else reading this thread has help to offer, please feel free to jump in and share any working knowledge for this type of analysis.

Best wishes for your project,

Jen
Galaxy team

On 2/29/12 12:31 PM, Ateequr Rehman wrote:
> hello Jennifer
> Thanks a lot, here is the link
>
> Best
> ateeq
>
> ------------------------------------------------------------------------
> *From:* Jennifer Jackson <jen(a)bx.psu.edu>
> *To:* Ateequr Rehman <ateeqrr(a)yahoo.com>
> *Cc:* "galaxy-user(a)lists.bx.psu.edu" <galaxy-user(a)lists.bx.psu.edu>
> *Sent:* Wednesday, February 29, 2012 9:24 PM
> *Subject:* Re: [galaxy-user] Visualise data
>
> Hi Ateeq,
>
> Please share a link to your history so that we can provide feedback. Use
> "Options -> Share or Publish", generate the share link (first button),
> copy the link into a reply email, and send that back to me directly.
>
> Also, your last few questions have been sent as replies to other
> questions on the mailing list with a new subject line. This causes them
> to thread/track incorrectly (and potentially be missed). When sending a
> new question, please start with a brand new message, address the "to" as
> "galaxy-user(a)bx.psu.edu <mailto:galaxy-user@bx.psu.edu>" and this will
> reach us correctly.
>
> Thank you and I will watch for your reply,
>
> Jen
> Galaxy team
>
> On 2/29/12 11:30 AM, Ateequr Rehman wrote:
> > Hello Admin and users
> >
> > i wanted to visualize my data, i ran Tophat and converted sam to BAM and
> > then cufflink,
> > but totally unable to see any output data,
> >
> > any suggestion, how i could see my results
> >
> > For administrators, on my account run number 76 to 79 are the run i want
> > to visualize
> >
> >
> > Thanks
> > Ateeq
> >
> >
> > ___________________________________________________________
> > The Galaxy User list should be used for the discussion of
> > Galaxy analysis and other features on the public server
> > at usegalaxy.org. Please keep all replies on the list by
> > using "reply all" in your mail client. For discussion of
> > local Galaxy instances and the Galaxy source code, please
> > use the Galaxy Development list:
> >
> > http://lists.bx.psu.edu/listinfo/galaxy-dev
> >
> > To manage your subscriptions to this and other Galaxy lists,
> > please use the interface at:
> >
> > http://lists.bx.psu.edu/
>
> --
> Jennifer Jackson
> http://usegalaxy.org
> http://galaxyproject.org/wiki/Support
>

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


March 2012 Galaxy Update
by Dave Clements 29 Feb '12


Hello all,

The March 2012 Galaxy Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_02> is now available.

*Galaxy Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates>* is a (mostly) monthly summary of what is going on in the Galaxy community. *Galaxy Updates* complements the *Galaxy Development News Briefs <http://wiki.g2.bx.psu.edu/DevNewsBriefs>* which accompany new Galaxy releases and focus on Galaxy code updates.

*Highlights:*

- 28 New Papers <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_03#New_Papers>
- Open Positions <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_03#Who.27s_Hiring> at four different institutions
- Upcoming Events and Deadlines <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_03#Upcoming_Events_and_Deadlin…>
- GCC2012 Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_03#GCC2012_Update>, including
  - Abstract submission is open.
  - Training Day topics are set.
- Tool Shed Contributions <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_03#Tool_Shed_Contributions>

If you have anything you would like to see in the April *Galaxy Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates>*, please let me know.

Thanks,

Dave C.

--
http://galaxyproject.org/GCC2012 <http://galaxyproject.org/wiki/GCC2012>
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/


Analysis 454 quality data
by Elad Firnberg 28 Feb '12


Hi, I am starting off with 454 read data in an SFF file. I would like to get quality statistics on the data, but I am having trouble getting the tools to work. I first tried to convert to a FASTQ file and use the "Compute Quality Statistics" tool, but I got this error: "An error occurred running this job: fastx_quality_stats: found invalid nucleotide sequence". I then ran the "FASTQ Groomer" and repeated "Compute Quality Statistics", but got the same error. Perhaps it cannot handle the longer 454 sequences? Alternatively, I tried converting the SFF file to a FASTA file and a quality file. I had to manually convert the quality data file to qual454 for the "Build Base Quality Distribution" tool to recognize it, but upon doing that I got this error: "An error occurred setting the metadata for this dataset." The Build Base Quality Distribution tool then also failed. Any help resolving this issue would be appreciated. Thank you, Elad
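One way to narrow down an "invalid nucleotide sequence" error like the one above is to scan the FASTQ file for reads that contain unexpected characters. The following is a minimal sketch, not the tool's own check: it assumes a four-line-per-record FASTQ and assumes that only upper-case A, C, G, T and N are accepted (the thread does not confirm which characters the tool rejects). The function name `find_invalid_reads` is hypothetical.

```python
import io

ALLOWED = set("ACGTN")  # assumption: the alphabet the downstream tool accepts


def find_invalid_reads(handle, allowed=ALLOWED):
    """Yield (read_id, bad_chars) for reads with characters outside `allowed`.

    Assumes a simple four-line-per-record FASTQ (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        bad = sorted(set(seq) - allowed)
        if bad:
            yield header.rstrip("\n").lstrip("@"), "".join(bad)


# Tiny in-memory example; the second read uses lower-case bases.
example = io.StringIO(
    "@read1\nACGTN\n+\nIIIII\n"
    "@read2\nacgtACGT\n+\nIIIIIIII\n"
)
invalid = list(find_invalid_reads(example))
print(invalid)
```

If reads are reported, inspecting the listed characters (for example lower-case bases or stray symbols) may suggest which conversion step introduced them.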


Sam filtering and Header/Sorting issues
by denis puthier 28 Feb '12


Dear All,

I would like to add some filtering steps to my RNA-Seq pipeline. To do so, I took the accepted_hits output from TopHat and applied a filter using NGS: SAM Tools > Filter SAM, selecting reads with bitwise flag 0x0002. This does the job. However, I am unable to run Cufflinks after this step and get the following error message, which seems to indicate that the file contains no header and is unsorted. Is there a workaround?

Thanks a lot

http://main.g2.bx.psu.edu/u/dputhier/h/srx011549

Error running cufflinks.
return code = 1
cufflinks: /lib64/libz.so.1: no version information available (required by cufflinks)
Command line:
cufflinks -q --no-update-check -s 20 -I 300000 -F 0.100000 -j 0.150000 -p 8 -m 200 -g /galaxy/main_pool/pool5/files/003/858/dataset_3858145.dat /galaxy/main_pool/pool1/files/003/858/dataset_3858306.dat
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /galaxy/main_pool/pool1/files/003/858/dataset_3858306.dat doesn't appear to be a valid BAM file, trying SAM...
[14:11:28] Loading reference annotation.
[14:11:28] Inspecting reads and determining fragment length distribution.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at chr10:181061, last one was at chr1:245006405
Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments.
If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position.

--
====================================================================
Denis Puthier
laboratoire INSERM TAGC/INSERM U928
Parc Scientifique de Luminy case 928
163, avenue de Luminy
13288 MARSEILLE cedex 09
FRANCE
Mail: puthier(a)tagc.univ-mrs.fr
Tel: (National) 04 91 82 87 11 / (International) 33 4 91 82 87 11
Fax: (National) 04 91 82 87 01 / (International) 33 4 91 82 87 01
Web: http://tagc.univ-mrs.fr/puthier
http://biologie.univ-mrs.fr/view-data.php?id=245
http://tagc.univ-mrs.fr/tbrowser
====================================================================
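The ordering rule quoted in the Cufflinks error above can be illustrated with a short sketch. This is not the Galaxy "Filter SAM" tool or any samtools command; it is a plain-Python illustration that assumes tab-separated SAM text small enough to hold in memory. It keeps header lines, keeps alignments whose flag has the 0x0002 bit set, and sorts them by @SQ order when @SQ records exist, otherwise lexicographically by reference name and then by position. The names `filter_and_sort_sam` and `make_record`, and all record values, are hypothetical.

```python
def filter_and_sort_sam(lines, required_flag=0x0002):
    """Return header lines followed by filtered, sorted alignment lines."""
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # SAM header lines start with '@'
            header.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if (flag & required_flag) == required_flag:
            # Keep (reference name, 1-based position, original line).
            records.append((fields[2], int(fields[3]), line))

    # Reference order as declared by @SQ records, if any are present.
    sq_order = {}
    for h in header:
        if h.startswith("@SQ\t"):
            for tag in h.split("\t")[1:]:
                if tag.startswith("SN:"):
                    sq_order[tag[3:]] = len(sq_order)

    if sq_order:
        # Follow the header order; unknown references go last.
        records.sort(key=lambda r: (sq_order.get(r[0], len(sq_order)), r[1]))
    else:
        # No @SQ records: lexicographic by name, then by position.
        records.sort(key=lambda r: (r[0], r[1]))
    return header + [r[2] for r in records]


def make_record(name, flag, rname, pos):
    # Minimal 11-column SAM line with placeholder values (illustrative only).
    cols = [name, str(flag), rname, str(pos), "255", "4M", "=", str(pos), "0", "ACGT", "IIII"]
    return "\t".join(cols)


reads = [
    make_record("r1", 99, "chr10", 181061),     # 0x2 set
    make_record("r2", 99, "chr1", 245006405),   # 0x2 set
    make_record("r3", 73, "chr2", 100),         # 0x2 not set, dropped
]

# Without a header: lexicographic order puts chr1 before chr10.
names_no_header = [l.split("\t")[0] for l in filter_and_sort_sam(reads)]

# With @SQ records (placeholder lengths): follow the declared order.
sq = ["@SQ\tSN:chr10\tLN:300000000", "@SQ\tSN:chr1\tLN:300000000"]
names_with_header = [
    l.split("\t")[0] for l in filter_and_sort_sam(sq + reads) if not l.startswith("@")
]
print(names_no_header, names_with_header)
```

For real datasets a dedicated sorting tool is the more practical route; the sketch only shows which order the error message is asking for and that the header has to survive the filtering step.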


FASTQ groomer processing time
by Matthew McCormack 27 Feb '12


I used FASTQ groomer on a 29 Gb Illumina 1.5+ FASTQ file to go from Illumina 1.3-1.7+ to Sanger and it is still processing after over 30 hrs. Is this a normal time frame for a FASTQ file this size?

Matthew

The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
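For context on what the conversion described above involves: Illumina 1.3+ quality strings are Phred+64 encoded and Sanger strings are Phred+33, so the core of the conversion is shifting each quality character down by 31. The sketch below illustrates only that step, streaming line by line so memory use stays constant. It is not the FASTQ Groomer's implementation (which also validates records), it assumes a four-line-per-record file, and the function names are hypothetical.

```python
import io

OFFSET_SHIFT = 64 - 33  # Illumina 1.3+ is Phred+64, Sanger is Phred+33


def illumina13_to_sanger(quality):
    """Convert one Phred+64 quality string to Phred+33."""
    return "".join(chr(ord(c) - OFFSET_SHIFT) for c in quality)


def convert_fastq(src, dst):
    """Stream a four-line-per-record FASTQ, rewriting only the quality lines."""
    for i, line in enumerate(src):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            dst.write(illumina13_to_sanger(line.rstrip("\n")) + "\n")
        else:
            dst.write(line)


# Tiny in-memory example: 'h' is Q40 and 'B' is Q2 in Phred+64.
src = io.StringIO("@read1\nACG\n+\nhhB\n")
dst = io.StringIO()
convert_fastq(src, dst)
converted = dst.getvalue()
print(converted)
```

Because each line is handled independently, run time for a step like this grows roughly linearly with file size; how long a shared server takes in practice also depends on queueing and load, which this sketch says nothing about.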
