July 2012 - galaxy-user - lists.galaxyproject.org

Data from history not showing up in fastq drop down
by Aarti Desai 06 Jul '12

Hi All,

We have a local Galaxy install. Thanks to Carlos's suggestion, I was able to get the reference genome index to show up in the interface. Now I am trying to get the data into Galaxy. I have followed the instructions at the link below to create data libraries.

http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries

I have modified the following sections in the universe_wsgi.ini file:

    # Add an option to the library upload form which allows administrators to
    # upload a directory of files.
    library_import_dir = /media/FreeAgent GoFlex Drive_/HDD1/Project

    # Add an option to the admin library upload tool allowing admins to paste
    # filesystem paths to files and directories in a box, and these paths will be
    # added to a library. Set to True to enable. Please note the security
    # implication that this will give Galaxy Admins access to anything your Galaxy
    # user has access to.
    allow_library_path_paste = True

I created a data library and, using the "Add dataset" function, pasted the path of my data directory into the Galaxy UI and selected the "link to files without copying into galaxy" option. This picked up all the files that were present in the directory and, except for a couple of files, the job seems to have completed successfully.

Now I am not sure how to actually analyze this data. I performed the "Import to current history" operation on two paired-end fastq files I want to analyze. These show up in the history with the appropriate size. But when I choose the "Map with BWA for Illumina" option, the two fastq files do not show up in the FASTQ file drop-down. The files do show up in the list of files for running FastQC. I have also restarted the server after importing the data into the history, but the problem persists.

Any input on how to go about analyzing the data in the local Galaxy instance once it has been brought into the Galaxy framework is highly appreciated.

Thanks for the help.
Regards,
Aarti

Aarti Desai, Ph.D | Domain Specialist - Life Sciences
aarti_desai(a)persistent.co.in | Cell: +91-9673009492 | Tel: + 91-20-67036348
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
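
One possible explanation for the empty drop-down, offered as an assumption since nothing in this thread confirms it: the BWA tool may list only datasets whose datatype is fastqsanger, while FastQC also accepts generic fastq. If the reads really are Sanger-encoded (Phred+33), the datatype can be assigned directly; otherwise FASTQ Groomer converts them. The sketch below is a rough heuristic for guessing the encoding before deciding. It is not part of Galaxy, the function name guess_fastq_offset and the 1000-record sample size are illustrative choices, and it assumes plain four-line FASTQ records.

```shell
# guess_fastq_offset FILE
# Rough heuristic (illustrative, not part of Galaxy): inspect the quality
# lines of the first 1000 records and report the likely encoding offset.
guess_fastq_offset() {
    awk '
        BEGIN {
            # Build a character to ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR > 4000 { exit }              # sample only the first 1000 records
        NR % 4 == 0 {                   # every 4th line holds quality values
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min < 59) { print "phred33" }
            else if (min >= 64 && min < 127) { print "phred64" }
            else { print "unknown" }
        }
    ' "$1"
}
```

Calling it as `guess_fastq_offset reads_1.fastq` (a hypothetical file name) prints phred33, phred64, or unknown. A result of unknown means the sampled qualities fall in a range the heuristic cannot classify, in which case running FASTQ Groomer is the safer route.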

Problem with ftp transfer of large bam files
by Hans Matsson 06 Jul '12

Hi,

I'm using Galaxy (main) in a browser on a Windows 7 PC to get statistics from my sequencing runs. I now have BAM files that are too big to upload from my local hard drive, so I tried to upload them by FTP to main.g2.bx.psu.edu with a client (FileZilla). The transfer seems to complete, but the files do not appear under Get Data/Upload File and I get the message "Your FTP upload directory contains no files". I have tried to upload txt, zip, and bam files by FTP, but nothing worked.

Any suggestions?

Many thanks
/Hans

Hans Matsson, PhD
Karolinska Institutet
Department of Biosciences and Nutrition
Novum
Hälsovägen 7-9
SE-141 83 Huddinge, Sweden
Email: Hans.Matsson(a)ki.se
Phone (office): +46-8-524 81143

Re: [galaxy-user] Getting reference index files in local galaxy install
by Aarti Desai 06 Jul '12

Hi,

We have a local install of Galaxy and I'm trying to add the reference index files for BWA using the information provided at the following link:

http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

I have modified the bwa_index.loc file in the ../tool-data directory by adding the path to where the index is on our server (also attached). However, even after restarting the server, the reference genome does not show up when choosing the "use a built-in index" option. I'm not sure whether the loc file is correctly created and whether any other configuration file needs to be changed/updated.

Any help in the matter would be greatly appreciated.

Thanks,
Aarti

From: galaxy-user-bounces(a)lists.bx.psu.edu On Behalf Of Jennifer Jackson
Sent: Thursday, July 05, 2012 1:23 AM
To: Lindsey Kelly
Cc: galaxy-user(a)lists.bx.psu.edu
Subject: Re: [galaxy-user] Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq data

Hello Lindsey,

Yes, you have this correct. The general path would be to:

- join forward and reverse data per run
- run FASTQ Groomer & FastQC (note: if your data is already in Sanger FASTQ format with Phred+33 quality scaled values, the datatype '.fastqsanger' can be directly assigned and the FASTQ Groomer step skipped. This is likely true if your data is from the latest CASAVA pipeline, but please double check.)
- discard data as needed based on quality
- split forward and reverse data that passes QC
- concatenate all forward reads from a sample into one FASTQ file
- concatenate all reverse reads from a sample into one FASTQ file
- for each sample, run TopHat using the two concatenated FASTQ files

To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.

I am not sure of the actual volume of data, but if these start to get large or TopHat errors with a memory problem, a local or cluster instance would be the recommendation: http://getgalaxy.org

For reference:
http://tophat.cbcb.umd.edu/manual.html
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

Hopefully this helps. Others are welcome to post comments/suggestions.

Jen
Galaxy team

On 7/2/12 11:17 AM, Lindsey Kelly wrote:

I am trying to do RNAseq analysis on paired end data from the HiSeq2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files). I think that I need to:

- convert them into FASTQ Sanger format using the FASTQ Groomer tool
- check the quality using the FastQC tool

I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC for all of the reads for the sample?

Thanks in advance for advice
Lindsey

___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:

http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

--
Jennifer Jackson
http://galaxyproject.org
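
A common stumbling block with .loc files is whitespace: as far as I know, Galaxy expects the columns to be separated by tab characters, so an entry typed with spaces may simply not be picked up. The sketch below checks for that. It assumes the four-column layout (unique build id, dbkey, display name, path to the index base) described in the bwa_index.loc.sample file that ships with Galaxy, so please compare against the sample file in your own install. The function name check_loc is hypothetical.

```shell
# check_loc FILE
# Report non-comment lines that do not have exactly four tab-separated
# fields (assumed layout: build id, dbkey, display name, index path).
# Returns a non-zero status if any such line is found.
check_loc() {
    awk -F '\t' '
        /^#/ || /^[ \t]*$/ { next }     # skip comments and blank lines
        NF != 4 {
            printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running `check_loc tool-data/bwa_index.loc` prints nothing when every entry has four tab-separated fields. If a line is reported, retyping the separators as real tabs and restarting the server is a cheap first thing to try.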

Re: [galaxy-user] tool_path
by Fabien Mareuil 03 Jul '12

Hi,

Thanks for your answer. Here is part of the shell code:

    LOCAL_DIR= #{GALAXY_DIR}/galaxy-dist/tools/AnnotateGenes
    R_DIR= #{R_PATH}/bin
    echo "ChIP:" >$LOG
    if [ -r $REG ]; then
    echo "1: perl $LOCAL_DIR/geneAnnotation.pl -g $LOCAL_DIR/$GENOME.noIdenticalTransc.txt -tf $CHIPFILE -selG $REG -o $OUTSTAT -lp $LEFTPROM -rightp $RIGHTPROM -enh $ENH -dg $DOWNGENE" >> $LOGTMP

and part of the XML:

    <tool id="annotateGenes" name="Annotation of genes with Chip-Seq peaks" version="1.0">
    <description> </description>
    <command interpreter="bash">
    #if $use_reg.use_reg_selector == "no" and $use_control.use_control_selector == "no"
    #annotateGenes_wrapper.sh -f $inputfile -y $log -l $left -o $outputPNG -r $right -d $DownGene -h $EnhLeft -u $stats -v $input_organism.version
    #elif $use_reg.use_reg_selector == "no" and $use_control.use_control_selector == "yes"
    # annotateGene_wrapper.sh -f $inputfile -y $log -c $controlfile -x $statsControl -o $outputPNG -l $left -r $right -d $DownGene -h $EnhLeft -u $stats -v $input_organism.version
    #elif $use_reg.use_reg_selector == "yes" and $use_control.use_control_selector == "no"
    # annotateGenes_wrapper.sh -y $log -f $inputfile -e $regfile -l $left -o $outputPNG -r $right -d $DownGene -h $EnhLeft -u $stats -v $input_organism.version
    #else
    # annotateGenes_wrapper.sh -f $inputfile -c $controlfile -x $statsControl -l $left -y $log -o $outputPNG -r $right -d $DownGene -h $EnhLeft -u $stats -v $input_organism.version -e $regfile
    #end if

These tools are available at: http://nebula.curie.fr/

You can see that the variable LOCAL_DIR holds the path of the tool, so I would like to know whether it is possible to obtain this information without hard-coding it.

Thank you for your answer.

Best Regards,
Fabien Mareuil

> Hello Fabien,
> I don't understand the issue - can you provide a sample tool config that includes these hard-coded paths? This initially sounds like an issue with
> the tool configs, not the tool shed, but I may see the problem with your clarification.
> Thanks,
> Greg Von Kuster
> On Jul 3, 2012, at 9:58 AM, Fabien Mareuil wrote:
>> Hi,
>> I have read the exchange between you and Florent Angly about "Problem with
>> new tool shed" and I have a problem with Nebula Tools:
>> http://nebula.curie.fr/.
>> At the Pasteur Institute, we have 4 Galaxy instances and I would like to
>> use a local tool shed instance for the Nebula installation.
>> However, the Nebula tools contain hard-coded tool paths, which I would like
>> to avoid. Is there a way to use something like ${tool.install_dir} in the XML instead?
>> Thank you for your answer.
>> Best Regards,
>> Fabien Mareuil

tool_path
by Fabien Mareuil 03 Jul '12

Hi,

I have read the exchange between you and Florent Angly about "Problem with new tool shed" and I have a problem with Nebula Tools: http://nebula.curie.fr/.

At the Pasteur Institute, we have 4 Galaxy instances and I would like to use a local tool shed instance for the Nebula installation. However, the Nebula tools contain hard-coded tool paths, which I would like to avoid. Is there a way to use something like ${tool.install_dir} in the XML instead?

Thank you for your answer.

Best Regards,
Fabien Mareuil
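
For shell wrappers like the one quoted in the follow-up above, one way to avoid a hard-coded tool path is to let the script work out its own directory from `$0`. This is a general shell idiom rather than a Galaxy feature, and the sketch assumes that Galaxy launches the wrapper by its path (which is how tools declared with interpreter="bash" are normally run); it does not work if the script is sourced instead of executed. Only geneAnnotation.pl is taken from the quoted excerpt; nothing else about the Nebula tools is assumed.

```shell
#!/bin/sh
# Resolve the directory that contains this script, whatever the caller's
# working directory is. Assumes the script is executed, not sourced.
# CDPATH is cleared so that cd cannot print a path of its own.
LOCAL_DIR=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)

# Helper files that ship next to the wrapper can then be referenced
# relative to it, for example the Perl script from the quoted excerpt.
ANNOTATION_SCRIPT="$LOCAL_DIR/geneAnnotation.pl"
echo "tool directory: $LOCAL_DIR"
```

With this in place the wrapper and its helper files can be moved or installed from a tool shed into any directory without editing LOCAL_DIR by hand.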

cufflinks
by Jennifer Jackson 03 Jul '12

On 7/1/12 8:30 PM, Paul wrote:
> Hello Jennifer,
> I was hoping you could enlighten me about a problem I am currently
> having. I have two rna-seq datasets that I am trying to evaluate
> using cufflinks - I keep getting null sets back with the second set,
> no matter what I do. I am pretty sure that the data are identical in
> nature, just from different conditions. Is there an issue with the
> current cufflinks instance? Or am I screwing up somehow? I am trying
> to evaluate item 149 from my history (which is the filtered and sorted
> set from Bowtie analysis). This should be identical in nature to item
> 113, with the only difference being the read source (same bug,
> different conditions). Both were mapped to the same ref genome (from
> the history) and the same annotation file (again, from the history).
> Any help is much appreciated!
>
> --
> Paul

--
Jennifer Jackson
http://galaxyproject.org

GMOD Summer School application deadline
by Scott Cain 02 Jul '12

Hello,

The deadline to apply for the GMOD Summer School is in one week, July 9th. The application is available as a Google Form:

https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LU…

In the GMOD Summer School (August 24-29, 2012) we will cover the installation, configuration and use of a variety of GMOD tools, including Chado, GBrowse, JBrowse and Galaxy. For more information on the course, see the course web page at

http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of the Amazon Web Service (aka, the Cloud) via a grant from Amazon.

Enrollment is limited to 24 students, and the application process is competitive: the last few years we've received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!

--
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

request for a new genome (Aedes aegypti)
by Laurence Després 02 Jul '12

Dear Galaxy team,

I would like to request that the genome of Aedes aegypti, which is available in VectorBase (http://aaegypti.vectorbase.org/), be added to the insect genomes available in Galaxy.

Thank you,
Laurence Després

--
Laurence Després
Equipe Bases Génétiques de l'adaptation (GBA)
Laboratoire d'Ecologie Alpine (LECA)
UMR CNRS 5553
Université J. Fourier
Domaine universitaire de Saint Martin d'Hères
2233, rue de la piscine
Bât. D Biologie
BP 53, 38041 Grenoble Cedex 9, France
laurence.despres(a)ujf-grenoble.fr
Tel: 33 (0)4 76 63 56 99
Fax: 33 (0)4 76 51 42 79
http://www2.ujf-grenoble.fr/leca/
Master2R Biodiversité-Ecologie-Environnement (BEE)
http://www-biologie.ujf-grenoble.fr/SiteBio/articles.php?lng=fr&pg=280

sanitizer for carriage return
by Katrien Bernaerts 01 Jul '12

Dear all,

I am making a Galaxy application with a text area into which the user can copy/paste sequences. However, all carriage returns (e.g. after the comment line) are converted to XX by Galaxy. I found that a sanitizer can be used for special characters, but I could not figure out how to configure the sanitizer for a carriage return.

Does anyone have an idea how to handle carriage returns in the user input?

Thanks in advance,
--
Katrien Bernaerts
