October 2013 - galaxy-user - lists.galaxyproject.org

Help with Summary Statistics
by D. A. Cowart 23 May '14

Hello,

I am attempting to use Galaxy to calculate the mean sequence read length and identify the range of read lengths for my 454 data. The data has already been organized and sorted by species. The format of the data is as follows:

>HD4AU5D01BHBCQ
CTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
>HD4AU5D01A093M
CTCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
etc... for each species

I have attempted to use the "Summary Statistics" button; however, it appears to work only on numerical data and not on sequence data. Is this tool/task available via Galaxy?

Thank you,
Dominique Cowart
User name: dac330
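For readers who end up doing this outside Galaxy, the calculation itself is small. The following is only a minimal sketch, assuming standard FASTA input in which each header line starts with ">" and the sequence follows on one or more lines. The function names `read_lengths` and `summarize` and the example records are hypothetical and are not part of Galaxy or of the original post.

```python
from statistics import mean


def read_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # stays None until the first ">" header is seen
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if length is not None:
                yield length  # the previous record is complete
            length = 0
        elif length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # last record in the input


def summarize(fasta_lines):
    """Return read count, mean length, and length range for FASTA input."""
    lengths = list(read_lengths(fasta_lines))
    if not lengths:
        raise ValueError("no FASTA records found")
    return {
        "reads": len(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Made-up records, used only to illustrate the call.
example = [">read_1", "CTCTCTCT", ">read_2", "CTCTGTCG", "CTCT"]
stats = summarize(example)
```

To run it on a real file, an open file handle can be passed in place of the list, for example `summarize(open("species_A.fasta"))`, where the file name is again only a placeholder.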

SNP finding
by Xiefan Fang 03 Dec '13

Dear Galaxy users,

We have done deep sequencing on some known genomic loci using a HiSeq 2000. I have already mapped the reads to the reference sequences using Galaxy. As the next step, I want to find SNPs and calculate the SNP percentage within the reads. There are 500,000 to 1,000,000 reads per biological sample.

Can I do this with Galaxy? If not, are there other programs available for Windows? Please keep in mind that I am not very familiar with programming.

Thanks,
Xiefan
University of Florida
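To make "SNP percentage" concrete: at each reference position it is the share of covering reads whose base differs from the reference. The sketch below shows only that arithmetic, under strong simplifying assumptions: reads are given as hypothetical (0-based start, sequence) pairs, alignments are ungapped, and base qualities are ignored. It is not a substitute for a real variant caller, and the function name `variant_percentages` is made up for this illustration.

```python
from collections import Counter


def variant_percentages(reference, aligned_reads, min_depth=1):
    """Return {position: percent of covering reads that differ from the reference}.

    aligned_reads is an iterable of (start, sequence) pairs with a 0-based
    start on the reference; alignments are assumed to contain no gaps.
    """
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base.upper()] += 1

    result = {}
    for pos, counter in enumerate(counts):
        depth = sum(counter.values())
        if depth < min_depth:
            continue  # too little coverage to report
        mismatches = depth - counter[reference[pos].upper()]
        if mismatches:
            result[pos] = 100.0 * mismatches / depth
    return result


# Made-up reference and reads, used only to illustrate the call.
reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (2, "GTACGT"), (1, "CGAACG"), (0, "ACGTACGT")]
percentages = variant_percentages(reference, reads)
```

In this toy input only position 3 carries a mismatch, in one of the four reads that cover it. With real data, raising `min_depth` keeps thinly covered positions from being reported.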

new cuffdiff with readgroup problem
by Johanna Sandgren 19 Nov '13

Hi,

I have tested the new cuffdiff version in Galaxy and was very eager to now also get the replicate data for each test. Has anyone tried that yet and succeeded with replicates in the downstream R package cummeRbund? Even though I successfully built the database from the now 15 output files (including read group tracking) with no error message, I cannot plot anything with replicates=T:

Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "from": syntax error)

Does anyone know what the problem is?

Thanks,
Johanna

Johanna Sandgren, PhD
Department of Oncology-Pathology
CCK, Karolinska Institutet
SE-171 76 Stockholm, Sweden
+46-8-517 721 35 (office), +46-8-321047 (fax), +46-708 388476 (mobile)

November 2013 Galaxy Update Newsletter
by Dave Clements 31 Oct '13

Hello all,

The November 2013 Galaxy Update is out <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11>:

*Highlights:*

- Two new public Galaxy servers <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#New_Public_Servers>: CoSSci: Complex Social Science Gateway <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#CoSSci:_Complex_Social_…> (which has *nothing* to do with biology), and BioCiphers Lab Galaxy <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#BioCiphers_Lab_Galaxy>.
- 53 new papers <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#New_Papers>, including "Expanding roles in a library-based bioinformatics service program: a case study," "DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data," and "Ten Simple Rules for Reproducible Computational Research"
- Who's hiring <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#Who.27s_Hiring>
- Upcoming Events <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#Other_Events>, including
  - Save these dates! GCC2014: June 30 - July 2, Baltimore <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#GCC2014:_June_30_-_July…>
  - Galaxy Day, December 4, Paris <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#Galaxy_Day.2C_December_…>
  - UC Davis Bioinformatics Boot Camps <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#UC_Davis_Bioinformatics…>
- Lifeportal launched at the University of Oslo <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#Lifeportal_at_the_Unive…>
- Tool Shed contributions <http://wiki.galaxyproject.org/GalaxyUpdates/2013_11#Tool_Shed_Contributions>

If you have anything you would like to see in the next *Galaxy Update* <http://wiki.galaxyproject.org/GalaxyUpdates>, please let us know.

Dave Clements and the Galaxy Team <http://wiki.galaxyproject.org/GalaxyTeam>
--
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://wiki.galaxyproject.org/

mm7 chromosome name
by Kreiling, Jill 31 Oct '13

Hello,

I have a set of coordinates for mm7 that I have been using to try to extract the genomic sequences. However, the tool doesn't recognize the chromosome name column. The chromosomes are currently listed as chr1, chr2, ... chrX. This is the error I get each time I try to extract sequences:

Chromosome by name 'chr1' was not found for build 'mm7'.
Skipped 1181 invalid lines, 1st is #1, "chr1 4558068 4561910 region_0 0 +"

If I change the build to mm10 it works fine, but the coordinates are not the same between builds. Also, mm7 can't be lifted over to mm9 or mm10.

Does anyone know the proper format for chromosome names in mm7?

Thanks,
Jill

Tool XML form building bug with <options from_data_table="..."> ?
by damion＠learningpoint.ca 31 Oct '13

Has anyone run into this? I'm building a general-purpose filter control in my Galaxy tool XML template so that numeric fields can be filtered by >, <, etc. parameters in a user-friendly way. I have a select list <param> driven by a data table:

    ...
    <param name="filter_column" type="select" label="Col">
        <options from_data_table="bccdc_blast_fields" />
    </param>
    ...

This works fine for building a list of fields to select from. Then I add

    <filter type="sort_by" column="1"/>
    <filter type="add_value" name="TESTESTTEST" value="WHWHWHW" />

but nothing happens: the sort order remains incorrect and no extra value appears. The same filters work fine on a previous <param> in the form that is driven by <options from_file="bccdc_blast_bins.loc">. I also tried applying <filter type="static_value" ...> to the from_data_table options tag, to no avail, but with from_file options there is no problem.

So I'm starting to think there's a bug whereby NO filters work on <options from_data_table="bccdc_blast_fields" /> input. I've surveyed the galaxy.tools.parameters.dynamic_options Python code, but can't see a decision point there where the from_data_table vs. from_file choice is made; is it in another script file?

I'm using a BioLinux 2012 install that includes Galaxy. Help appreciated!

Regards,
Damion Dooley
BC Centre for Disease Control
Vancouver, BC

Server down
by Suhana 31 Oct '13

Hi,

I have some data in tab-delimited format and need to convert it to FASTA format. I have been using the Galaxy text formatting tool for this, along with other tools, for a long time. It seems that the server is down and that the tool is not working. Could you please check?

Thank you,
Regards,
Suhana
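While the server is unavailable, the tab-delimited to FASTA conversion can also be done locally. The following is only a minimal sketch, assuming the identifier is in the first column and the sequence in the second; the column positions, the 60-character line width, and the function name `tabular_to_fasta` are assumptions made for this example and are not taken from the Galaxy tool.

```python
def tabular_to_fasta(lines, id_col=0, seq_col=1, width=60):
    """Convert tab-delimited lines (identifier, sequence) to FASTA text."""
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        name, seq = fields[id_col], fields[seq_col]
        out.append(">" + name)
        # Wrap long sequences at a fixed line width.
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "".join(item + "\n" for item in out)


# Made-up rows, used only to illustrate the call.
rows = ["seq1\tACGTACGT\n", "seq2\tGGCC\n"]
fasta_text = tabular_to_fasta(rows)
```

Passing an open file handle instead of the list works the same way, since the function only iterates over lines.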

Trackster Error: needLargeMem: trying to allocate 0 bytes (limit: 100000000000)
by Guest, Simon 31 Oct '13

I'm having problems getting Trackster working on my own Galaxy instance, so I thought I would check on the usegalaxy public server. However, I'm getting the same error there:

Trackster Error: needLargeMem: trying to allocate 0 bytes (limit: 100000000000)

This was reported on this list in July, but there was no follow-up:
http://user.list.galaxyproject.org/Trackster-Error-td4655737.html

My history is at https://usegalaxy.org/u/simon-guest/h/trackster-error

This is just an artificial test I made using a fragment of a reference genome, but I thought it should work OK. Any clues?

cheers,
Simon

bacterial metatranscriptomes
by Soren Franzenburg 31 Oct '13

Hi everybody,

I am trying to perform metatranscriptomic analysis on gut samples. Whereas the host transcriptome was easy, the lack of a reference genome for my mixed bacterial population is a problem. Any suggestions on how to do the alignment? Has anybody done metatranscriptomics on Galaxy before?

A recent paper from Leimena et al. (2013, BMC Genomics) claims that they did the alignment against the complete NCBI prokaryotic genome database. This sounds like a good solution to me, but I am struggling to get it running. I am normally on the wet-lab side of the work, and my experience with programming and R is limited.

The aim is to compare the functional capacity of the bacterial communities in different closely related host species.

Thanks for your help,
Sören

Sören Franzenburg
5130 Comstock Hall
Cornell University
14853 Ithaca, NY

SOLID RNA-Seq De Novo Transcriptome Assembly
by Jennifer Jackson 31 Oct '13

Hi Oscar,

You most likely want to explore tools that are designed specifically for this purpose, if the reference genome you are talking about is the assembled transcriptome. Trinity is one tool, but there are others in the Tool Shed and on some of the Public Servers. Links:

http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
http://wiki.galaxyproject.org/Support#Custom_reference_genome
http://wiki.galaxyproject.org/BigPicture/Choices
http://wiki.galaxyproject.org/Tool%20Shed

Your question is a bit confusing because the 'annotations' may already be what these tools would produce, and I am not sure what you are trying to do next. If it is the assignment of putative function, then there are many paths to follow, some better suited for viral genomes. You'll want to find out what others doing this exact work are using right now and consider the same tools. Start by checking out the public Galaxy servers; many have trial tools that you can later include in a local/cloud instance from the Tool Shed:

http://wiki.galaxyproject.org/PublicGalaxyServers

If your question was misunderstood (the reference genome is in fact a DNA genome, and you have RNA sequence to align), then the RNA-seq pipeline can be used as-is with 'Tophat for SOLiD', Cufflinks, CuffMerge, CuffDiff - all on a local/cloud/slipstream with the reference genome as a cluster reference genome. There is no requirement for reference annotation with any of these tools. It helps to gain full functionality, especially with CuffDiff, but is not required. More assistance is available at tophat.cufflinks(a)gmail.com.

Hopefully this helps,

Jen
Galaxy team

On 10/24/13 6:06 PM, Oscar Aguilar wrote:
> Hi Dr. Jackson,
>
> I'm sorry to bother you, but I have been searching for answers and
> can't seem to find any, and I'm sure that you would be able to answer
> my question.
>
> I am trying to find a novel gene using de novo transcriptome
> assembly, and I see that TopHat might just be able to help me out with
> my dilemma. The viral genome is not available on the Galaxy website, and
> the other issue is that I am using SOLID data. So my question is, can
> I use TopHat with SOLID data by converting to nucleotide base fastq?
> Or do I have to use TopHat2 with a colourspace viral genome? I also
> have to admit that I am completely new to bioinformatics and my
> project has led me here, so I am trying to tackle it on my own.
>
> For the custom genome, I have managed to load it (in fasta, and
> annotation in BED) but I am not sure how to assign the annotations to
> the genome. Also, does TopHat require an annotated genome? I read that
> it doesn't but I'm not sure... I fear that my gene is a spliced one and
> I would like to be able to pull it out from the output data.
>
> I'm sorry to bother you as I'm sure the answer is out there; I just
> really can't seem to find it and am now desperate.
>
> Thank you in advance,
> Oscar
>
>
> --
> Oscar A. Aguilar, M.Sc
> PhD Candidate
> Sunnybrook Research Institute
> Department of Immunology
> University of Toronto
> 416-480-6100x89492
> oscar.aguilar(a)utoronto.ca <mailto:oscar.aguilar@utoronto.ca>

--
Jennifer Hillman-Jackson
http://galaxyproject.org
