Re: [galaxy-user] Galaxy error: can't fine fasta file
by Jennifer Jackson
Hi Kenneth,
It is likely that the path is wrong in the .loc file. It has to point to
the actual files, not just the directory. Inside here
/9720/genome_references/ncbi/nr-protein-db
is where all the nr.* files are? In that case, the path should be
/9720/genome_references/ncbi/nr-protein-db/nr
You will want to open the .loc file in a text editor that allows you to
view the whitespace, too, to double check that the columns are single
tab separated and that there are no extra spaces before or after the
data in any individual column. The middle column can contain internal
spaces, but those should be the only spaces in any row. You also do not
want any trailing blank/empty lines in the file.
And yes, restart the server. A checklist is at the top of the wiki to help:
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
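Those whitespace checks can be run from a shell as well. A sketch, assuming GNU awk/grep and that the file in question is blastdb.loc under Galaxy's tool-data directory (adjust the path to your install); blastdb.loc rows are three tab-separated columns:

```shell
# Show tabs as ^I and line ends as $, so stray spaces and trailing
# blank lines become visible:
cat -A tool-data/blastdb.loc

# Flag any non-comment line that is not exactly three tab-separated columns:
awk -F'\t' '!/^#/ && NF != 3 {print "bad column count on line " NR}' tool-data/blastdb.loc

# A space directly next to a tab is suspect (GNU grep -P for \t):
grep -Pn ' \t|\t ' tool-data/blastdb.loc
```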
Best,
Jen
Galaxy team
On 9/17/12 12:46 PM, Kenneth R. Auerbach wrote:
> Hi Jennifer,
>
> I changed the path of the nr files in the .loc file to the following:
>
> nr NCBI NR (non -redundant) /9720/genome_references/ncbi/nr-protein-db
>
> This is the directory that has all those many nr files (index files,
> etc....)
>
> But I still get the same error where the old path is still referenced.
> Does the galaxy server need to be restarted for the new .loc file to go
> into effect?
>
> Thank you.
> Ken.
>
>
>
> On Mon, 2012-09-17 at 10:42 -0700, Jennifer Jackson wrote:
>> Hi Kenneth,
>>
>> Yes, target databases require indexes and *.loc file set-up. Please see
>> this wiki for details. For Genbank data such as NR, FTP the pre-built
>> indexes and use those (generating them with formatdb is not necessary).
>>
>> http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
>>
>> See the second section, 'Tips for Installing Tools -> 'Megablast
>> installation', then down in the wiki again under 'Megablast' for more
>> detail. The same indexes can be used for BLAST+ (both now are based on
>> BLAST+). The location of the data can be where you have it - it seems
>> like Galaxy is looking in the right place for it (it does not go under a
>> genome like the other indexes on the wiki).
>>
>> Also, make sure the data is uncompressed before you use it. And be sure
>> that the path to the data is in the blastdb.loc file (this appears to be
>> done already based on your error message, but double check).
>>
>> Hopefully this helps,
>>
>> Jen
>> Galaxy team
>>
>> On 9/17/12 10:11 AM, Kenneth R. Auerbach wrote:
>>> Hi Jennifer,
>>>
>>> Thank you for that info. I have another question, when I submit my job I
>>> get this error:
>>>
>>> -----
>>> An error occurred running this job: BLAST Database error: No alias or
>>> index file found for protein database
>>> [/9720/genome_references/ncbi/nr-protein-db/nr-newstyle/nr] in search
>>> path
>>> [/9720/galaxyprod/galaxy-dist/database/job_working_directory/1560::]
>>> Return error code 2 from command:
>>> blast
>>> ------
>>> I checked and the 'nr' database file is there in that path and it has
>>> read permissions for everyone. It's in a directory called 'nr-newstyle'
>>> with only its archive file (.gz). There are no other files. Should there
>>> also be 'alias' or 'index' files as well? Are other files needed besides
>>> 'nr'?
>>>
>>> Thank you.
>>>
>>> On Mon, 2012-09-17 at 09:29 -0700, Jennifer Jackson wrote:
>>>> Hello Kenneth,
>>>>
>>>> Are you using BLAST+ in a local install or cloud instance? The problem
>>>> may be that the query dataset needs to have the datatype assigned as
>>>> "fasta". To do this, click on the pencil icon for the dataset to reach
>>>> the Edit Attributes form. Then either scroll down to (or click on the
>>>> tab for) the attribute "Datatype" and change to "fasta" and save. [The
>>>> UI is undergoing some changes, so you may or may not have the new tabs
>>>> style form in your instance yet.]
>>>>
>>>> The best mailing list going forward for local/cloud support is
>>>> galaxy-dev(a)bx.psu.edu.
>>>> http://wiki.g2.bx.psu.edu/Mailing%20Lists
>>>>
>>>> Take care and please let us know if your question has been misunderstood,
>>>>
>>>> Jen
>>>> Galaxy team
>>>>
>>>> On 9/17/12 8:52 AM, Kenneth R. Auerbach wrote:
>>>>> Hello,
>>>>>
>>>>> I'm new to Galaxy.
>>>>> When I read in a fasta file to Galaxy and then try to use it (in a blast
>>>>> search) as the query sequence, I get the error message below, although
>>>>> the uploaded fasta file is present in the history. Can anyone tell me
>>>>> what the problem could be? Is there some other step I need to do?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Error that appears under "nucleotide query sequence":
>>>>> -----
>>>>> "History does not include a dataset of the required format / build"
>>>>> -----
>>>>>
>>>>> ___________________________________________________________
>>>>> The Galaxy User list should be used for the discussion of
>>>>> Galaxy analysis and other features on the public server
>>>>> at usegalaxy.org. Please keep all replies on the list by
>>>>> using "reply all" in your mail client. For discussion of
>>>>> local Galaxy instances and the Galaxy source code, please
>>>>> use the Galaxy Development list:
>>>>>
>>>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>>>
>>>>> To manage your subscriptions to this and other Galaxy lists,
>>>>> please use the interface at:
>>>>>
>>>>> http://lists.bx.psu.edu/
>>>>>
>>>>
>>>
>>>
>>
>
>
--
Jennifer Jackson
http://galaxyproject.org
9 years, 9 months
Re: [galaxy-user] Galaxy error: can't fine fasta file
by Jennifer Jackson
Hi Kenneth,
Yes, target databases require indexes and *.loc file set-up. Please see
this wiki for details. For Genbank data such as NR, FTP the pre-built
indexes and use those (generating them with formatdb is not necessary).
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
See the second section, 'Tips for Installing Tools -> 'Megablast
installation', then down in the wiki again under 'Megablast' for more
detail. The same indexes can be used for BLAST+ (both now are based on
BLAST+). The location of the data can be where you have it - it seems
like Galaxy is looking in the right place for it (it does not go under a
genome like the other indexes on the wiki).
Also, make sure the data is uncompressed before you use it. And be sure
that the path to the data is in the blastdb.loc file (this appears to be
done already based on your error message, but double check).
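For the pre-built route, one sketch (the directory is the one from this thread; file names are the usual NCBI ones, and NCBI's BLAST+ package also ships an update_blastdb.pl helper that automates the download): FTP the nr.*.tar.gz volumes from ftp.ncbi.nlm.nih.gov/blast/db/ and unpack them all in the database directory, so the volume index files and the nr alias file that ties them together sit side by side:

```shell
# Unpack every downloaded volume in place; the archives contain the
# nr.NN.p* index files plus the alias file for the whole set:
cd /9720/genome_references/ncbi/nr-protein-db
for f in nr.*.tar.gz; do
    tar -xzf "$f"
done
```

After that, the blastdb.loc path column should point at the shared basename, i.e. /9720/genome_references/ncbi/nr-protein-db/nr.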
Hopefully this helps,
Jen
Galaxy team
On 9/17/12 10:11 AM, Kenneth R. Auerbach wrote:
> Hi Jennifer,
>
> Thank you for that info. I have another question, when I submit my job I
> get this error:
>
> -----
> An error occurred running this job: BLAST Database error: No alias or
> index file found for protein database
> [/9720/genome_references/ncbi/nr-protein-db/nr-newstyle/nr] in search
> path
> [/9720/galaxyprod/galaxy-dist/database/job_working_directory/1560::]
> Return error code 2 from command:
> blast
> ------
> I checked and the 'nr' database file is there in that path and it has
> read permissions for everyone. It's in a directory called 'nr-newstyle'
> with only its archive file (.gz). There are no other files. Should there
> also be 'alias' or 'index' files as well? Are other files needed besides
> 'nr'?
>
> Thank you.
>
> On Mon, 2012-09-17 at 09:29 -0700, Jennifer Jackson wrote:
>> Hello Kenneth,
>>
>> Are you using BLAST+ in a local install or cloud instance? The problem
>> may be that the query dataset needs to have the datatype assigned as
>> "fasta". To do this, click on the pencil icon for the dataset to reach
>> the Edit Attributes form. Then either scroll down to (or click on the
>> tab for) the attribute "Datatype" and change to "fasta" and save. [The
>> UI is undergoing some changes, so you may or may not have the new tabs
>> style form in your instance yet.]
>>
>> The best mailing list going forward for local/cloud support is
>> galaxy-dev(a)bx.psu.edu.
>> http://wiki.g2.bx.psu.edu/Mailing%20Lists
>>
>> Take care and please let us know if your question has been misunderstood,
>>
>> Jen
>> Galaxy team
>>
>> On 9/17/12 8:52 AM, Kenneth R. Auerbach wrote:
>>> Hello,
>>>
>>> I'm new to Galaxy.
>>> When I read in a fasta file to Galaxy and then try to use it (in a blast
>>> search) as the query sequence, I get the error message below, although
>>> the uploaded fasta file is present in the history. Can anyone tell me
>>> what the problem could be? Is there some other step I need to do?
>>>
>>> Thank you.
>>>
>>> Error that appears under "nucleotide query sequence":
>>> -----
>>> "History does not include a dataset of the required format / build"
>>> -----
>>>
>>
>
>
--
Jennifer Jackson
http://galaxyproject.org
9 years, 9 months
FTP upload problem
by Yan He
Hi everyone,
When I tried to upload my files using Filezilla, I got the error message
"530 Sorry, the maximum number of clients (3) for this user are already
connected." Can anyone give me some suggestions on how to solve this problem?
I have been stuck here for a whole day. Thanks very much!
Yan
9 years, 9 months
Galaxy error: can't fine fasta file
by Kenneth R. Auerbach
Hello,
I'm new to Galaxy.
When I read in a fasta file to Galaxy and then try to use it (in a blast
search) as the query sequence, I get the error message below, although
the uploaded fasta file is present in the history. Can anyone tell me
what the problem could be? Is there some other step I need to do?
Thank you.
Error that appears under "nucleotide query sequence":
-----
"History does not include a dataset of the required format / build"
-----
9 years, 9 months
Galaxy CloudMan - Nodes can't make their own qsub calls?
by greg
Hi guys,
I created a new Galaxy instance web launcher
(https://biocloudcentral.herokuapp.com/launch) and then I ssh'd into
the master node.
I'm trying to run a Perl script that makes several qsub calls to other
perl scripts. Now the catch is that one of those perl scripts makes
its own qsub calls.
And I'm getting this error when it tries to do that:
Unable to run job: denied: host "ip-10-29-176-111.ec2.internal" is no
submit host.
Somehow this works fine on other clusters I've run this code on. Any
idea what could be going on? Do I need to make all of the nodes
"submit hosts"?
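For what it's worth (a guess from the error text, not a verified CloudMan fix): Sun/Oracle Grid Engine only accepts qsub from hosts on its submit-host list, so a worker node whose jobs themselves call qsub must be on that list. On a stock SGE setup the admin commands look like this, run on the master with SGE admin rights; whether CloudMan re-applies such a change to nodes it adds later is something to verify:

```shell
# List the hosts currently allowed to submit jobs:
qconf -ss

# Add the worker named in the error message as a submit host:
qconf -as ip-10-29-176-111.ec2.internal
```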
Thanks a bunch!
-Greg
9 years, 9 months
Does Tophat output *.accepted hits file contain headers?
by Du, Jianguang
Dear All,
I want to use the Tophat output files with ".accepted hits" to do analysis outside Galaxy. However, the program I am using requires the Tophat output to be indexed, sorted BAM files that contain headers. Do the Tophat outputs with ".accepted hits" produced at Galaxy contain headers? Will the headers of BAM files generated by Tophat be universally the same?
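Tophat's accepted_hits.bam does carry a header, and its @SQ lines come from the reference used for the run, so headers are only identical across runs against the same reference. A sketch for checking and preparing the file outside Galaxy (assumes samtools on your PATH; note the 2012-era samtools 0.1.x `sort` takes an output prefix, not a file name):

```shell
# Print the header (@HD/@SQ lines) to confirm it is present:
samtools view -H accepted_hits.bam

# Coordinate-sort and index for tools that require it:
samtools sort accepted_hits.bam accepted_hits.sorted
samtools index accepted_hits.sorted.bam
```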
Thanks,
Jianguang
9 years, 9 months
problem uploading via FTP server
by L.M. Slot
Dear Galaxy,
I am using Galaxy on my work for analysing a whole genome sequencing project.
On the server I wanted to upload two BAM files (ALL and FL). With FileZilla I uploaded them on the FTP server because the files are around 20 GB.
When I tried to upload them into Galaxy, I could see them in the list from the FTP server, and they appeared in the history column on the right, but they stay gray and the comment 'job is waiting to run' stays present instead of 'job running' or 'completed'.
Hopefully you can help me with this problem.
With kind regards,
Linda Slot
Linda Slot, MSc
AMC Amsterdam
Department of Pathology, L2-115
Meibergdreef 9
1105 AZ Amsterdam
The Netherlands
tel.nr. 020-5665638
What to include in a question
1. Where you are using Galaxy: Main<http://wiki.g2.bx.psu.edu/Main>, other public, local, or cloud instance
2. End-user questions from Test<http://wiki.g2.bx.psu.edu/Test> are generally not sent/supported - Test is for breaking
3. If a local or cloud instance, the distribution or galaxy-central hg pull #
4. If on Main<http://wiki.g2.bx.psu.edu/Main>, date/time the initial and re-run jobs were executed
5. If there is an example/issue, exact steps to reproduce
6. What troubleshooting steps (if a problem is being reported) you have tested out
7. If on Main<http://wiki.g2.bx.psu.edu/Main>, you may be asked for a shared history link. Use Options → Share or Publish, generate the link, and email it directly back off-list. Note the dataset #'s you have questions about.
8. IMPORTANT: Get the quickest answer for data questions by leaving all of the input and output datasets in the analysis thread in your shared history undeleted until we have written you back. Use Options → Show Deleted Datasets and click dataset links to undelete to recover datasets if necessary
9. Always reply-all unless sharing a private link
________________________________
AMC Disclaimer : http://www.amc.nl/disclaimer
________________________________
9 years, 9 months
non-availability of genome.fa files through mapping modules on local server
by Sandrine Imbeaud
Hello,
I apologize for this probably very simple question.
We have installed our own Galaxy server and started using in-house the
NGS modules. However, during the mapping procedure using either BWA for
illumina or BFAST tools, no reference genome index is available.
To solve the problem, we have followed the tutorials and have uploaded
the hg19.fa file and put it locally in the Galaxy-dist/database folder.
We also have modified the *_index.loc files indicating the path to the
file. We restarted the Galaxy server. However, still no reference is
available through the NGS mapping modules.
Is there anyone who can help us solve this probably simple problem?
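In case it helps pin this down (a sketch with example paths, not a confirmed diagnosis): for BWA, the .loc line must point at the basename of indexes built with `bwa index` (e.g. `bwa index -a bwtsw hg19.fa`), not at the raw hg19.fa on its own, and bwa_index.loc expects four single-tab-separated columns, e.g.:

```
#<unique_build_id>	<dbkey>	<display_name>	<file_base_path>
hg19	hg19	Human (hg19)	/home/galaxy/galaxy-dist/database/bwa_index/hg19/hg19.fa
```

If the entry still does not appear after a restart, it is worth checking that the columns really are single tabs and that the tool's XML reads the same .loc file you edited.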
Kind regards
/ Sandrine
9 years, 9 months
No output produced.....
by Neil.Burdett@csiro.au
Hi,
I have my own image registration tool that I've created on my own local instance of galaxy.
The method takes in two images in *.nii.gz format, registers them together, and produces one registered *.nii.gz file and a *.trsf matrix file.
The first issue encountered was that the method was expecting *.nii.gz files as inputs but was receiving *.dat files. I worked around this problem as shown by the files below:
<tool id="RegisterAliBabaAffine" name="RegisterAffine">
  <description>two images</description>
  <command interpreter="bash">$__root_dir__/tools/registration/reg-wrapper.sh $moving $fixed $outputTRSF $outputImage</command>
  <inputs>
    <param format="binary" name="moving" type="data" label="Moving Image" />
    <param format="binary" name="fixed" type="data" label="Fixed Image" />
    <param type="hidden" name="outputTRSF" value="output.trsf" label="trsf file" help="Output File must have .trsf extension" />
    <param type="hidden" name="outputImage" value="output.nii.gz" label="Image output file" help="Output Image File must have .nii.gz extension" />
  </inputs>
  <outputs>
    <data format="input" name="output_TRSF" from_work_dir="output.trsf" />
    <data format="input" name="output_Image" from_work_dir="output.nii.gz" />
  </outputs>
  <help>This tool uses Affine Registration to register two images.</help>
</tool>
#!/bin/bash
MOVING=`mktemp --suffix .nii.gz`
FIXED=`mktemp --suffix .nii.gz`
cat $1 > $MOVING
cat $2 > $FIXED
/usr/local/MILXView.12.08.1/BashScripts/RegisterAliBabaAffine -m $MOVING -f $FIXED -t $3 -o $4
RC=$?
if [[ $RC == 0 ]]; then
    OUTPUTTRSF=`mktemp --suffix .trsf`
    OUTPUTIMG=`mktemp --suffix .nii.gz`
    cat $OUTPUTTRSF > $3
    cat $OUTPUTIMG > $4
    rm $OUTPUTTRSF
    rm $OUTPUTIMG
fi
rm $MOVING
rm $FIXED
exit $RC
This allows them to pass the *.nii.gz files that the registration method is expecting.
Everything works fine and I can see output generated in the job_working_dir and the history turns green...
galaxy@bmladmin-OptiPlex-745:~$ ls -lrt ~/galaxy-dist/database/job_working_directory/000/27/
total 2940
-rw------- 1 galaxy nogroup 0 Sep 13 10:15 tmpRfHsOP_stderr
-rw-r--r-- 1 galaxy nogroup 241 Sep 13 10:35 output.trsf
-rw------- 1 galaxy nogroup 80 Sep 13 10:35 tmplmK0V2_stdout
-rw-r--r-- 1 galaxy nogroup 2998272 Sep 13 10:38 output.nii.gz
However, the problem occurs when the files are copied from ~/galaxy-dist/database/job_working_directory/000/27/ to ~/galaxy-dist/database/files/000/. When this happens the files become size = 0.
Any ideas?
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_40.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_41.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat
The output in galaxy.log indicates it is successful:
/home/galaxy/galaxy-dist/tools/registration/reg-wrapper.sh /home/galaxy/galaxy-dist/database/files/000/dataset_23.dat /home/galaxy/galaxy-dist/database/files/000/dataset_20.dat output.trsf output.nii.gz galaxy.jobs DEBUG 2012-09-13 10:38:10,334 The tool did not define exit code or stdio handling; checking stderr for success galaxy.jobs DEBUG 2012-09-13 10:38:10,361 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.trsf to /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat as directed by from_work_dir galaxy.jobs DEBUG 2012-09-13 10:38:10,380 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.nii.gz to /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat as directed by from_work_dir galaxy.jobs DEBUG 2012-09-13 10:38:10,609 job 27 ended
Is the issue copying *.nii.gz files and *.trsf files into *.dat files? Any way around this?
I've also modified ~/galaxy-dist/lib/galaxy/jobs/__init__.py (line 363) to change shutil.move to shutil.copy2 (same results).
I also put in a different output path to copy to. But essentially we have files with size in ~/galaxy-dist/database/job_working_directory/000/id/, and the files are size 0 after the move into ~/galaxy-dist/database/files/000.
Thanks
Neil
9 years, 9 months
Counting RNA-seq reads per class.
by Mohammad Heydarian
Hi All,
I have been trying to count the number of RNA-seq reads that fall into the
various Cufflinks class codes ('=', 'j', 'u', 'x', etc.) and I am curious
how others are counting reads per class.
I first tried the BedTools tool where you "count" the number of reads
overlapping another set of intervals, and later realized that each interval
is extended 1 kb up- and downstream prior to the analysis (by default, and
not adjustable in Galaxy), so the number of reads that were "counted" for
all of the classes was always much more than the number of reads in my BAM
file. I then tried to isolate reads from each class into separate BAM
files, using the BedTools "intersect" tool, and there I consistently end
up with significantly fewer reads than I have in my sample.
I am very curious to find out how others are tackling this problem on
Galaxy.
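One hedged sketch of the per-class split outside Galaxy (bedtools and samtools assumed on PATH; file names are examples, and the attribute match assumes cuffcompare-style `class_code "j";` fields in the GTF). `bedtools window`'s default 1000 bp extension explains the over-counting, while a plain intersect silently drops reads that overlap no interval of that class, which would explain totals below the BAM's read count:

```shell
# Pull out the intervals for one class code from the combined GTF:
awk -F'\t' '$9 ~ /class_code "j"/' cuffcmp.combined.gtf > class_j.gtf

# Keep each read that overlaps a class-j interval at least once (-u),
# with no up/downstream extension, then count the reads:
bedtools intersect -abam accepted_hits.bam -b class_j.gtf -u > class_j.bam
samtools view -c class_j.bam
```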
Thanks for any input!
Cheers,
Mo Heydarian
9 years, 9 months