Re: [galaxy-user] [galaxy-bugs] Galaxy tool error report from taka.nakada@nifty.com

17 Feb 2011

      Hi, Taka,

I noticed that the full Manhattan plot looks odd in the history I have
shared with you, and I think it's because the offsets for some of your
SNPs are wrong.

For example, the very last marker in chr1 in your data is rs11488669.
In your data, the offset is 2147483647, which is way beyond the end of
chr1 - the whole genome is only about 3 billion base pairs - so the
Manhattan plot looks clumpy instead of uniform.

According to genome.ucsc.edu it is at chr1:153517269-153517769
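
One way to spot rows like this before plotting is to scan the offset
column for impossible values. The sketch below is only an illustration:
the function name, the default column positions (chromosome in column 0,
offset in column 2) and the rounded 250 million upper bound on
chromosome length are my assumptions, not part of any Galaxy tool. It
also flags 2147483647 explicitly, because that number is 2**31 - 1, the
largest signed 32-bit integer, which may point to a value being clamped
somewhere rather than a real position.

```python
import csv
import io

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def suspicious_offsets(handle, chrom_col=0, offset_col=2, max_len=250000000):
    """Yield (line_number, chrom, offset) for rows whose offset looks impossible.

    max_len is an assumed, rounded upper bound on chromosome length;
    replace it with real lengths for the genome build you are using.
    """
    reader = csv.reader(handle, delimiter='\t')
    next(reader, None)  # skip the header row if there is one
    for line_number, row in enumerate(reader, start=2):
        offset = int(row[offset_col])
        if offset >= INT32_MAX or offset > max_len:
            yield line_number, row[chrom_col], offset


# Small in-memory example using two markers mentioned in this thread.
sample = io.StringIO(
    "CHR\tSNP\tBP\n"
    "1\trs3094315\t742429\n"
    "1\trs11488669\t2147483647\n"
)
flagged = list(suspicious_offsets(sample))
print(flagged)  # [(3, '1', 2147483647)]
```

To check a real file, pass an open file handle instead of the
in-memory sample.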

I'm going to guess that your data (e.g. the map file) has at some stage
been changed using spreadsheet software such as Excel, which can easily
do strange things to numeric columns.

If all your processing is inside Galaxy, these kinds of errors can be
prevented. I can see you have tried unsuccessfully to upload some
plink lped files in the history you shared - here's some information
that might help you from a previous enquiry on galaxy-user a few weeks
ago:

==============================================
Hi, Sylvian,

The plink/rgenetics lped and pbed (compressed) formats are special
'composite' Galaxy datatypes because the map and pedigree/genotype
files need to be kept together correctly inside Galaxy. As a result,
the upload tool requires that the file type be specified so all of the
components can be properly uploaded and stored together.

For example, to upload pbed data from your local desktop, choose
'Upload file' from the Get Data tools.

When the upload form appears, the trick is that you *must* change the
default 'Autodetect' in the first (filetype) select box to the
specific rgenetics datatype as the very first step: either 'pbed' for
compressed plink data or 'lped' for uncompressed plink genotype data.
Type the first few letters into the first box, and select the right
one from the list that appears.

Once this is done, you will see that the upload tool form changes. For
pbed it shows three separate file upload inputs - one each for the
plink xxx.bim, xxx.bed and xxx.fam files, where xxx is the name you set
when you ran plink to create the files. For uncompressed linkage format
(lped) it shows two separate file upload inputs - the plink .ped and
.map files.

Now you can browse for the corresponding file for each input box from
your local machine - be careful not to mix them up as, unfortunately,
the upload tool is unable to tell.
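
Since the upload tool cannot tell the components apart, it may help to
confirm on your own machine that a complete, matching set exists before
you start browsing. The following is a minimal sketch; the function and
constant names are made up for illustration, and it only checks that
files with the expected extensions exist for one basename, not that
their contents are valid.

```python
import os

# Component files expected for each rgenetics datatype described above.
PBED_EXTENSIONS = ('.bed', '.bim', '.fam')
LPED_EXTENSIONS = ('.ped', '.map')


def missing_components(basename, extensions):
    """Return the expected component files that do not exist for basename."""
    return [basename + ext for ext in extensions
            if not os.path.isfile(basename + ext)]


# 'xxx' stands for whatever name you gave plink when creating the files.
print(missing_components('xxx', PBED_EXTENSIONS))
```

An empty list means all the components for that basename are present.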

At the bottom of the form, I suggest you then change the genome build
to the appropriate one (e.g. hg18 or hg19).

Finally, I'd recommend that you change the 'metadata value for
basename' (which will be the new dataset name) to something that will
remind you what the data are - something more meaningful than the
default 'rgenetics'.

Click 'execute' to upload the data and create the new dataset in your
history.  Compressed (pbed) format is preferred so the upload is
quicker.

Note that some tools will autoconvert between lped and pbed, so there
is a delay the first time some tools are run on a new dataset. There
are also built-in converters (use the pencil icon) if you need them.

I hope this helps - thanks for using Galaxy and Rgenetics - please let
us know how you go and feel free to contact me if you have other
questions.

On Fri, Feb 18, 2011 at 9:26 AM, Ross Lazarus
<ross.lazarus@channing.harvard.edu> wrote:
...
Hi, Taka.
Thanks for trying the tool. Sorry to hear you are having problems with
your data.
Unfortunately, the history associated with this error does not have
any datasets with data in the format required for the Manhattan/QQ
plot tool.
The file you were attempting to use was a bed file. In fact it is not
even a valid bed format file because it has spaces instead of tabs as
delimiters. The tool is unable to parse the header row correctly so
you have the error about an index out of range on the header row.
As the tool form explains, the input required is:
"Tabular Data is a tab delimited header file with chromosome, offset
and p values to be plotted"
I tried changing the datatype from bed to tabular but discovered that
you have spaces as delimiters! So, I downloaded your dataset and
repaired it by converting the delimiters to tabs interactively in
python, so it is now in the required format. I uploaded the first few
thousand rows to the original history and plotted them to check that
the tool works as expected. I also ran the plots for the entire million
rows and the plot is in the history.
I have shared the new history with you and attached is a low-res
version of the resulting output from the first few thousand p values.
You should be able to find the history by choosing 'options' from your
current history then 'histories shared with me'.
If you can ensure that the data are in the correct format (only use
tabs as delimiters and have a header row) then the tool should be able
to perform correctly.
Data in any other format is likely to cause the tool to crash.
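
A quick check of the header row before uploading can catch this kind of
problem. The sketch below is a hypothetical helper, not part of the
tool: it assumes the header is split on tabs and that the chromosome,
offset and p value columns are at zero-based positions 0, 2 and 8,
which is where CHR, BP and P sit in your file.

```python
def header_problem(header_line, needed_cols=(0, 2, 8)):
    """Return a description of what is wrong with the header, or None."""
    fields = header_line.rstrip('\r\n').split('\t')
    if max(needed_cols) < len(fields):
        return None  # every requested column exists after splitting on tabs
    if len(header_line.split()) > len(fields):
        # Splitting on any whitespace finds more fields than splitting on tabs.
        return 'header appears to be space delimited; convert the delimiters to tabs'
    return 'header has only %d tab-separated column(s)' % len(fields)


# Space-delimited header: a problem is reported.
print(header_problem('CHR SNP BP A1 F_A F_U A2 CHISQ P OR\n'))
# Tab-delimited header: prints None.
print(header_problem('CHR\tSNP\tBP\tA1\tF_A\tF_U\tA2\tCHISQ\tP\tOR\n'))
```

With a space-delimited header, splitting on tabs gives a single field,
so asking for column 2 or 8 is out of range, which is consistent with
the index error in the report.
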
Thanks for using Galaxy - I hope it is useful for your research.
In case you need to fix any other defective files, here's what I did:
rerla@rosst61:~/Downloads$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...
...
...
>>> f = 'nakada.tab'
>>> bad = open(f,'r').readlines()
>>> sbad = [x.split() for x in bad]
>>> good = ['\t'.join(x) for x in sbad]
>>> good[:3]
['CHR\tSNP\tBP\tA1\tF_A\tF_U\tA2\tCHISQ\tP\tOR',
'1\trs28659788\t713170\tG\t0.04094\t0.03725\tC\t0.08434\t0.7715\t1.103',
'1\trs3094315\t742429\tG\t0.1754\t0.1533\tA\t0.835\t0.3608\t1.175']
>>> o = open(f,'w')
>>> o.write('\n'.join(good))
>>> o.close()
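
If you would rather run this as a script than type it interactively,
the same idea can be sketched as below. This is an illustrative
variant, not exactly what I ran: the function names are made up, it is
written for Python 3, it skips blank lines, it ends the output with a
newline, and it writes to a separate output file rather than
overwriting the input. Like the session above, it splits on any
whitespace, so it is only suitable when no field contains a space.

```python
def spaces_to_tabs(text):
    """Re-join every whitespace-separated field with a single tab."""
    rows = [line.split() for line in text.splitlines()]
    # Blank lines produce empty rows and are dropped.
    return '\n'.join('\t'.join(row) for row in rows if row) + '\n'


def convert_file(in_path, out_path):
    """Read in_path, convert its delimiters to tabs, and write out_path."""
    with open(in_path) as src:
        fixed = spaces_to_tabs(src.read())
    with open(out_path, 'w') as dst:
        dst.write(fixed)


print(repr(spaces_to_tabs('CHR SNP  BP\n1 rs3094315 742429\n')))
# 'CHR\tSNP\tBP\n1\trs3094315\t742429\n'
```

For example, convert_file('nakada.tab', 'nakada_fixed.tab') would leave
the original file untouched; the second file name is just a suggestion.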
On Fri, Feb 18, 2011 at 8:32 AM,  <galaxy-bugs@bx.psu.edu> wrote:
...
GALAXY TOOL ERROR REPORT
------------------------
This error report was sent from the Galaxy instance hosted on the server
"main.g2.bx.psu.edu"
-----------------------------------------------------------------------------
This is in reference to dataset id 2071026 from history id 485682
-----------------------------------------------------------------------------
You should be able to view the history containing the related history item
12: Manhattan_and_QQ_plots.html
by logging in as a Galaxy admin user to the Galaxy instance referenced above
and pointing your browser to the following link.
main.g2.bx.psu.edu/history/view?id=df22bcb1488553c7
-----------------------------------------------------------------------------
The user 'taka.nakada@nifty.com' provided the following information:
Hi,
I am trying to run Manhattan and QQ plots using PLINK file, but have this error.  Would you please let me know how to solve it.
Thanks
Taka-aki Nakada
-----------------------------------------------------------------------------
job id: 1813212
tool id: rgManQQ1
-----------------------------------------------------------------------------
job command line:
python /galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py '/galaxy/main_database/files/002/070/dataset_2070806.dat' "Manhattan and QQ plots" '/galaxy/main_database/tmp/job_working_directory/1813212/galaxy_dataset_2071026.dat' '/galaxy/main_database/tmp/job_working_directory/1813212/dataset_2071026_files' '0' '2' '8' 'false'
-----------------------------------------------------------------------------
job stderr:
Traceback (most recent call last):
 File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 318, in <module>
   main()
 File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 287, in main
   rlog,flist = doManQQ(input_fname,chrom_col,offset_col,p,title,grey,ctitle,outdir)
 File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 219, in doManQQ
   newhead = [ohead[chrom_col],ohead[offset_col]]
IndexError: list index out of range
-----------------------------------------------------------------------------
job stdout:
-----------------------------------------------------------------------------
job info:
None
-----------------------------------------------------------------------------
job traceback:
None
-----------------------------------------------------------------------------
(This is an automated message).
_______________________________________________
galaxy-bugs mailing list
galaxy-bugs@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-bugs
--
Ross Lazarus MBBS MPH
Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory;
181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850;
Head, Medical Bioinformatics, BakerIDI;  PO Box 6492, St Kilda Rd Central;
Melbourne, VIC 8008, Australia; Tel: +61 385321444