Re: [galaxy-user] Galaxy error: can't fine fasta file
by Jennifer Jackson
Hi Kenneth,
It is likely that the path is wrong in the .loc file. It has to point to
the actual files, not just the directory. Inside here
/9720/genome_references/ncbi/nr-protein-db
is where all the nr.* files are? In that case, the path should be
/9720/genome_references/ncbi/nr-protein-db/nr
You will want to open the .loc file in a text editor that allows you to
view the whitespace, too, to double check that the columns are single
tab separated and that there are no extra spaces before or after the
data in any individual column. The middle column can contain internal
spaces, but those should be the only spaces in any row. You also do not
want any trailing blank/empty lines in the file.
And yes, restart the server. A checklist is at the top of the wiki to help:
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
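Those whitespace checks can be run from a shell as well. A sketch, assuming GNU awk/grep and that the file in question is blastdb.loc under Galaxy's tool-data directory (adjust the path to your install); blastdb.loc rows are three tab-separated columns:

```shell
# Show tabs as ^I and line ends as $, so stray spaces and trailing
# blank lines become visible:
cat -A tool-data/blastdb.loc

# Flag any non-comment line that is not exactly three tab-separated columns:
awk -F'\t' '!/^#/ && NF != 3 {print "bad column count on line " NR}' tool-data/blastdb.loc

# A space directly next to a tab is suspect (GNU grep -P for \t):
grep -Pn ' \t|\t ' tool-data/blastdb.loc
```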
Best,
Jen
Galaxy team
On 9/17/12 12:46 PM, Kenneth R. Auerbach wrote:
> Hi Jennifer,
>
> I changed the path of the nr files in the .loc file to the following:
>
> nr NCBI NR (non -redundant) /9720/genome_references/ncbi/nr-protein-db
>
> This is the directory that has all those many nr files (index files,
> etc....)
>
> But I still get the same error where the old path is still referenced.
> Does the galaxy server need to be restarted for the new .loc file to go
> into effect?
>
> Thank you.
> Ken.
>
>
>
> On Mon, 2012-09-17 at 10:42 -0700, Jennifer Jackson wrote:
>> Hi Kenneth,
>>
>> Yes, target databases require indexes and *.loc file set-up. Please see
>> this wiki for details. For Genbank data such as NR, FTP the pre-built
>> indexes and use those (generating them with formatdb is not necessary).
>>
>> http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
>>
>> See the second section, 'Tips for Installing Tools -> 'Megablast
>> installation', then down in the wiki again under 'Megablast' for more
>> detail. The same indexes can be used for BLAST+ (both now are based on
>> BLAST+). The location of the data can be where you have it - it seems
>> like Galaxy is looking in the right place for it (it does not go under a
>> genome like the other indexes on the wiki).
>>
>> Also, make sure the data is uncompressed before you use it. And be sure
>> that the path to the data is in the blastdb.loc file (this appears to be
>> done already based on your error message, but double check).
>>
>> Hopefully this helps,
>>
>> Jen
>> Galaxy team
>>
>> On 9/17/12 10:11 AM, Kenneth R. Auerbach wrote:
>>> Hi Jennifer,
>>>
>>> Thank you for that info. I have another question, when I submit my job I
>>> get this error:
>>>
>>> -----
>>> An error occurred running this job: BLAST Database error: No alias or
>>> index file found for protein database
>>> [/9720/genome_references/ncbi/nr-protein-db/nr-newstyle/nr] in search
>>> path
>>> [/9720/galaxyprod/galaxy-dist/database/job_working_directory/1560::]
>>> Return error code 2 from command:
>>> blast
>>> ------
>>> I checked and the 'nr' database file is there in that path and it has
>>> read permissions for everyone. It's in a directory called 'nr-newstyle'
>>> with only its archive file (.gz). There are no other files. Should there
>>> also be 'alias' or 'index' files as well? Are other files needed besides
>>> 'nr'?
>>>
>>> Thank you.
>>>
>>> On Mon, 2012-09-17 at 09:29 -0700, Jennifer Jackson wrote:
>>>> Hello Kenneth,
>>>>
>>>> Are you using BLAST+ in a local install or cloud instance? The problem
>>>> may be that the query dataset needs to have the datatype assigned as
>>>> "fasta". To do this, click on the pencil icon for the dataset to reach
>>>> the Edit Attributes form. Then either scroll down to (or click on the
>>>> tab for) the attribute "Datatype" and change to "fasta" and save. [The
>>>> UI is undergoing some changes, so you may or may not have the new tabs
>>>> style form in your instance yet.]
>>>>
>>>> The best mailing list going forward for local/cloud support is
>>>> galaxy-dev(a)bx.psu.edu.
>>>> http://wiki.g2.bx.psu.edu/Mailing%20Lists
>>>>
>>>> Take care and please let us know if your question has been misunderstood,
>>>>
>>>> Jen
>>>> Galaxy team
>>>>
>>>> On 9/17/12 8:52 AM, Kenneth R. Auerbach wrote:
>>>>> Hello,
>>>>>
>>>>> I'm new to Galaxy.
>>>>> When I read in a fasta file to Galaxy and then try to use it (in a blast
>>>>> search) as the query sequence, I get the error message below, although
>>>>> the uploaded fasta file is present in the history. Can anyone tell me
>>>>> what the problem could be? Is there some other step I need to do?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Error that appears under "nucleotide query sequence":
>>>>> -----
>>>>> "History does not include a dataset of the required format / build"
>>>>> -----
>>>>>
>>>>> ___________________________________________________________
>>>>> The Galaxy User list should be used for the discussion of
>>>>> Galaxy analysis and other features on the public server
>>>>> at usegalaxy.org. Please keep all replies on the list by
>>>>> using "reply all" in your mail client. For discussion of
>>>>> local Galaxy instances and the Galaxy source code, please
>>>>> use the Galaxy Development list:
>>>>>
>>>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>>>
>>>>> To manage your subscriptions to this and other Galaxy lists,
>>>>> please use the interface at:
>>>>>
>>>>> http://lists.bx.psu.edu/
>>>>>
>>>>
>>>
>>>
>>
>
>
--
Jennifer Jackson
http://galaxyproject.org
9 years, 9 months
Re: [galaxy-user] Galaxy error: can't fine fasta file
by Jennifer Jackson
Hi Kenneth,
Yes, target databases require indexes and *.loc file set-up. Please see
this wiki for details. For Genbank data such as NR, FTP the pre-built
indexes and use those (generating them with formatdb is not necessary).
http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
See the second section, 'Tips for Installing Tools -> 'Megablast
installation', then down in the wiki again under 'Megablast' for more
detail. The same indexes can be used for BLAST+ (both now are based on
BLAST+). The location of the data can be where you have it - it seems
like Galaxy is looking in the right place for it (it does not go under a
genome like the other indexes on the wiki).
Also, make sure the data is uncompressed before you use it. And be sure
that the path to the data is in the blastdb.loc file (this appears to be
done already based on your error message, but double check).
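For the pre-built route, one sketch (the directory is the one from this thread; file names are the usual NCBI ones, and NCBI's BLAST+ package also ships an update_blastdb.pl helper that automates the download): FTP the nr.*.tar.gz volumes from ftp.ncbi.nlm.nih.gov/blast/db/ and unpack them all in the database directory, so the volume index files and the nr alias file that ties them together sit side by side:

```shell
# Unpack every downloaded volume in place; the archives contain the
# nr.NN.p* index files plus the alias file for the whole set:
cd /9720/genome_references/ncbi/nr-protein-db
for f in nr.*.tar.gz; do
    tar -xzf "$f"
done
```

After that, the blastdb.loc path column should point at the shared basename, i.e. /9720/genome_references/ncbi/nr-protein-db/nr.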
Hopefully this helps,
Jen
Galaxy team
On 9/17/12 10:11 AM, Kenneth R. Auerbach wrote:
> Hi Jennifer,
>
> Thank you for that info. I have another question, when I submit my job I
> get this error:
>
> -----
> An error occurred running this job: BLAST Database error: No alias or
> index file found for protein database
> [/9720/genome_references/ncbi/nr-protein-db/nr-newstyle/nr] in search
> path
> [/9720/galaxyprod/galaxy-dist/database/job_working_directory/1560::]
> Return error code 2 from command:
> blast
> ------
> I checked and the 'nr' database file is there in that path and it has
> read permissions for everyone. It's in a directory called 'nr-newstyle'
> with only its archive file (.gz). There are no other files. Should there
> also be 'alias' or 'index' files as well? Are other files needed besides
> 'nr'?
>
> Thank you.
>
> On Mon, 2012-09-17 at 09:29 -0700, Jennifer Jackson wrote:
>> Hello Kenneth,
>>
>> Are you using BLAST+ in a local install or cloud instance? The problem
>> may be that the query dataset needs to have the datatype assigned as
>> "fasta". To do this, click on the pencil icon for the dataset to reach
>> the Edit Attributes form. Then either scroll down to (or click on the
>> tab for) the attribute "Datatype" and change to "fasta" and save. [The
>> UI is undergoing some changes, so you may or may not have the new tabs
>> style form in your instance yet.]
>>
>> The best mailing list going forward for local/cloud support is
>> galaxy-dev(a)bx.psu.edu.
>> http://wiki.g2.bx.psu.edu/Mailing%20Lists
>>
>> Take care and please let us know if your question has been misunderstood,
>>
>> Jen
>> Galaxy team
>>
>> On 9/17/12 8:52 AM, Kenneth R. Auerbach wrote:
>>> Hello,
>>>
>>> I'm new to Galaxy.
>>> When I read in a fasta file to Galaxy and then try to use it (in a blast
>>> search) as the query sequence, I get the error message below, although
>>> the uploaded fasta file is present in the history. Can anyone tell me
>>> what the problem could be? Is there some other step I need to do?
>>>
>>> Thank you.
>>>
>>> Error that appears under "nucleotide query sequence":
>>> -----
>>> "History does not include a dataset of the required format / build"
>>> -----
>>>
>>
>
>
--
Jennifer Jackson
http://galaxyproject.org
9 years, 9 months
FTP upload problem
by Yan He
Hi everyone,
When I tried to upload my files using Filezilla, I got the error message
"530 Sorry, the maximum number of clients (3) for this user are already
connected." Can anyone give me some suggestions on how to solve this problem?
I have been stuck here for a whole day. Thanks very much!
Yan
9 years, 9 months
Galaxy error: can't fine fasta file
by Kenneth R. Auerbach
Hello,
I'm new to Galaxy.
When I read in a fasta file to Galaxy and then try to use it (in a blast
search) as the query sequence, I get the error message below, although
the uploaded fasta file is present in the history. Can anyone tell me
what the problem could be? Is there some other step I need to do?
Thank you.
Error that appears under "nucleotide query sequence":
-----
"History does not include a dataset of the required format / build"
-----
9 years, 9 months
Galaxy CloudMan - Nodes can't make their own qsub calls?
by greg
Hi guys,
I created a new Galaxy instance web launcher
(https://biocloudcentral.herokuapp.com/launch) and then I ssh'd into
the master node.
I'm trying to run a Perl script that makes several qsub calls to other
perl scripts. Now the catch is that one of those perl scripts makes
its own qsub calls.
And I'm getting this error when it tries to do that:
Unable to run job: denied: host "ip-10-29-176-111.ec2.internal" is no
submit host.
Somehow this works fine on other clusters I've run this code on. Any
idea what could be going on? Do I need to make all of the nodes
"submit hosts"?
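For what it's worth (a guess from the error text, not a verified CloudMan fix): Sun/Oracle Grid Engine only accepts qsub from hosts on its submit-host list, so a worker node whose jobs themselves call qsub must be on that list. On a stock SGE setup the admin commands look like this, run on the master with SGE admin rights; whether CloudMan re-applies such a change to nodes it adds later is something to verify:

```shell
# List the hosts currently allowed to submit jobs:
qconf -ss

# Add the worker named in the error message as a submit host:
qconf -as ip-10-29-176-111.ec2.internal
```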
Thanks a bunch!
-Greg
9 years, 9 months
Does Tophat output *.accepted hits file contain headers?
by Du, Jianguang
Dear All,
I want to use the Tophat output files with ".accepted hits" to do analysis outside Galaxy. However, the program I am using requires the Tophat output to be indexed, sorted BAM files that contain headers. Do the Tophat outputs with ".accepted hits" produced at Galaxy contain headers? Will the headers of BAM files generated by Tophat be universally the same?
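Tophat's accepted_hits.bam does carry a header, and its @SQ lines come from the reference used for the run, so headers are only identical across runs against the same reference. A sketch for checking and preparing the file outside Galaxy (assumes samtools on your PATH; note the 2012-era samtools 0.1.x `sort` takes an output prefix, not a file name):

```shell
# Print the header (@HD/@SQ lines) to confirm it is present:
samtools view -H accepted_hits.bam

# Coordinate-sort and index for tools that require it:
samtools sort accepted_hits.bam accepted_hits.sorted
samtools index accepted_hits.sorted.bam
```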
Thanks,
Jianguang
9 years, 9 months
problem uploading via FTP server
by L.M. Slot
Dear Galaxy,
I am using Galaxy on my work for analysing a whole genome sequencing project.
On the server I wanted to upload two BAM files (ALL and FL). With FileZilla I uploaded them on the FTP server because the files are around 20 GB.
When I tried to upload them into Galaxy, I could see them in the list from the FTP server, and they appeared in the history column on the right, but they stay gray and the comment 'job is waiting to run' stays present instead of 'job running' or 'completed'.
Hopefully you can help me with this problem.
With kind regards,
Linda Slot
Linda Slot, MSc
AMC Amsterdam
Department of Pathology, L2-115
Meibergdreef 9
1105 AZ Amsterdam
The Netherlands
tel.nr. 020-5665638
What to include in a question
1. Where you are using Galaxy: Main<http://wiki.g2.bx.psu.edu/Main>, other public, local, or cloud instance
2. End-user questions from Test<http://wiki.g2.bx.psu.edu/Test> are generally not sent/supported - Test is for breaking
3. If a local or cloud instance, the distribution or galaxy-central hg pull #
4. If on Main<http://wiki.g2.bx.psu.edu/Main>, date/time the initial and re-run jobs were executed
5. If there is an example/issue, exact steps to reproduce
6. What troubleshooting steps (if a problem is being reported) you have tested out
7. If on Main<http://wiki.g2.bx.psu.edu/Main>, you may be asked for a shared history link. Use Options → Share or Publish, generate the link, and email it directly back off-list. Note the dataset #'s you have questions about.
8. IMPORTANT: Get the quickest answer for data questions by leaving all of the input and output datasets in the analysis thread in your shared history undeleted until we have written you back. Use Options → Show Deleted Datasets and click dataset links to undelete to recover datasets if necessary
9. Always reply-all unless sharing a private link
________________________________
AMC Disclaimer : http://www.amc.nl/disclaimer
________________________________
9 years, 9 months
non-availability of genome.fa files through mapping modules on local server
by Sandrine Imbeaud
Hello,
I apologize for this probably very simple question.
We have installed our own Galaxy server and started using in-house the
NGS modules. However, during the mapping procedure using either BWA for
illumina or BFAST tools, no reference genome index is available.
To solve the problem, we have followed the tutorials and have uploaded
the hg19.fa file and put it locally in the Galaxy-dist/database folder.
We also have modified the *_index.loc files indicating the path to the
file. We restarted the Galaxy server. However, still no reference is
available through the NGS mapping modules.
Is there anyone who can help us solve this probably simple problem?
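In case it helps pin this down (a sketch with example paths, not a confirmed diagnosis): for BWA, the .loc line must point at the basename of indexes built with `bwa index` (e.g. `bwa index -a bwtsw hg19.fa`), not at the raw hg19.fa on its own, and bwa_index.loc expects four single-tab-separated columns, e.g.:

```
#<unique_build_id>	<dbkey>	<display_name>	<file_base_path>
hg19	hg19	Human (hg19)	/home/galaxy/galaxy-dist/database/bwa_index/hg19/hg19.fa
```

If the entry still does not appear after a restart, it is worth checking that the columns really are single tabs and that the tool's XML reads the same .loc file you edited.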
Kind regards
/ Sandrine
9 years, 9 months
No output produced.....
by Neil.Burdett@csiro.au
Hi,
I have my own image registration tool that I've created on my own local instance of galaxy.
The method takes in two images in *.nii.gz format, registers them together, and produces one registered *.nii.gz file and a *.trsf matrix file.
The first issue encountered was that the method was expecting *.nii.gz files as inputs but was receiving *.dat files. I worked around this problem as shown by the files below:
<tool id="RegisterAliBabaAffine" name="RegisterAffine">
  <description>two images</description>
  <command interpreter="bash">$__root_dir__/tools/registration/reg-wrapper.sh $moving $fixed $outputTRSF $outputImage</command>
  <inputs>
    <param format="binary" name="moving" type="data" label="Moving Image" />
    <param format="binary" name="fixed" type="data" label="Fixed Image" />
    <param type="hidden" name="outputTRSF" value="output.trsf" label="trsf file" help="Output File must have .trsf extension" />
    <param type="hidden" name="outputImage" value="output.nii.gz" label="Image output file" help="Output Image File must have .nii.gz extension" />
  </inputs>
  <outputs>
    <data format="input" name="output_TRSF" from_work_dir="output.trsf" />
    <data format="input" name="output_Image" from_work_dir="output.nii.gz" />
  </outputs>
  <help>This tool uses Affine Registration to register two images.</help>
</tool>
#!/bin/bash
MOVING=`mktemp --suffix .nii.gz`
FIXED=`mktemp --suffix .nii.gz`
cat $1 > $MOVING
cat $2 > $FIXED
/usr/local/MILXView.12.08.1/BashScripts/RegisterAliBabaAffine -m $MOVING -f $FIXED -t $3 -o $4
RC=$?
if [[ $RC == 0 ]]; then
    OUTPUTTRSF=`mktemp --suffix .trsf`
    OUTPUTIMG=`mktemp --suffix .nii.gz`
    cat $OUTPUTTRSF > $3
    cat $OUTPUTIMG > $4
    rm $OUTPUTTRSF
    rm $OUTPUTIMG
fi
rm $MOVING
rm $FIXED
exit $RC
This allows them to pass the *.nii.gz files that the registration method is expecting.
Everything works fine and I can see output generated in the job_working_dir and the history turns green...
galaxy@bmladmin-OptiPlex-745:~$ ls -lrt ~/galaxy-dist/database/job_working_directory/000/27/
total 2940
-rw------- 1 galaxy nogroup 0 Sep 13 10:15 tmpRfHsOP_stderr
-rw-r--r-- 1 galaxy nogroup 241 Sep 13 10:35 output.trsf
-rw------- 1 galaxy nogroup 80 Sep 13 10:35 tmplmK0V2_stdout
-rw-r--r-- 1 galaxy nogroup 2998272 Sep 13 10:38 output.nii.gz
However, the problem occurs when the files are copied from ~/galaxy-dist/database/job_working_directory/000/27/ to ~/galaxy-dist/database/files/000/. When this happens the files become size = 0.
Any ideas?
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_40.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_41.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat
-rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat
The output in galaxy.log indicates it is successful:
/home/galaxy/galaxy-dist/tools/registration/reg-wrapper.sh /home/galaxy/galaxy-dist/database/files/000/dataset_23.dat /home/galaxy/galaxy-dist/database/files/000/dataset_20.dat output.trsf output.nii.gz galaxy.jobs DEBUG 2012-09-13 10:38:10,334 The tool did not define exit code or stdio handling; checking stderr for success galaxy.jobs DEBUG 2012-09-13 10:38:10,361 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.trsf to /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat as directed by from_work_dir galaxy.jobs DEBUG 2012-09-13 10:38:10,380 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.nii.gz to /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat as directed by from_work_dir galaxy.jobs DEBUG 2012-09-13 10:38:10,609 job 27 ended
Is the issue copying *.nii.gz files and *.trsf files into *.dat files? Any way around this?
I've also modified ~/galaxy-dist/lib/galaxy/jobs/__init__.py (line 363) to change shutil.move to shutil.copy2 (same results).
I also put in a different output path to copy to. But essentially we have files with size in ~/galaxy-dist/database/job_working_directory/000/id/, and the files are size 0 after the move into ~/galaxy-dist/database/files/000.
Thanks
Neil
9 years, 9 months
Counting RNA-seq reads per class.
by Mohammad Heydarian
Hi All,
I have been trying to count the number of RNA-seq reads that fall into the
various Cufflinks class codes ('=', 'j', 'u', 'x', etc.) and I am curious
how others are counting reads per class.
I first tried the BedTools tool where you "count" the number of reads
overlapping another set of intervals, and later realized that each interval
is extended 1 kb up- and downstream prior to the analysis (by default, and
not adjustable in Galaxy), so the number of reads that were "counted" for
all of the classes was always much more than the number of reads in my BAM
file. I then tried to isolate reads from each class into separate BAM
files, using the BedTools "intersect" tool, and there I consistently end
up with significantly fewer reads than I have in my sample.
I am very curious to find out how others are tackling this problem on
Galaxy.
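One hedged sketch of the per-class split outside Galaxy (bedtools and samtools assumed on PATH; file names are examples, and the attribute match assumes cuffcompare-style `class_code "j";` fields in the GTF). `bedtools window`'s default 1000 bp extension explains the over-counting, while a plain intersect silently drops reads that overlap no interval of that class, which would explain totals below the BAM's read count:

```shell
# Pull out the intervals for one class code from the combined GTF:
awk -F'\t' '$9 ~ /class_code "j"/' cuffcmp.combined.gtf > class_j.gtf

# Keep each read that overlaps a class-j interval at least once (-u),
# with no up/downstream extension, then count the reads:
bedtools intersect -abam accepted_hits.bam -b class_j.gtf -u > class_j.bam
samtools view -c class_j.bam
```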
Thanks for any input!
Cheers,
Mo Heydarian
9 years, 9 months