Hi, Taka,

I noticed that the full Manhattan plot looks odd in the history I have shared with you, and I think it's because the offsets for some of your SNPs are wrong. For example, the very last marker on chr1 in your data is rs11488669. In your data, its offset is 2147483647 (the largest value a signed 32-bit integer can hold), which is way beyond the end of chr1 - the whole genome is only about 3 billion base pairs - so the Manhattan plot looks clumpy instead of uniform. According to genome.ucsc.edu, it is at chr1:153517269-153517769.

I'm going to guess that your data (e.g. the map file) has at some stage been edited using spreadsheet software such as Excel, which can easily do strange things to numeric columns. If all your processing is done inside Galaxy, these kinds of errors can be prevented.

I can see you have tried unsuccessfully to upload some plink lped files in the history you shared - here's some information that might help you, from a previous enquiry on galaxy-user a few weeks ago:

==============================================

Hi, Sylvian,

The plink/rgenetics lped and pbed (compressed) formats are special 'composite' Galaxy datatypes because the map and pedigree/genotype files need to be kept together correctly inside Galaxy. As a result, the upload tool requires that the file type be specified so all of the components can be properly uploaded and stored together.

For example, to upload pbed data from your local desktop, choose 'Upload file' from the Get Data tools. When the upload form appears, the trick is that, as the very first step, you *must* change the default 'Autodetect' in the first (filetype) select box to the specific rgenetics datatype: 'pbed' for compressed plink data or 'lped' for uncompressed plink genotype data. Type the first few letters into the first box, and select the right one from the list that appears.
Once this is done, you will see that the upload tool form changes. For pbed it shows three separate file upload inputs - one each for the plink xxx.bim, xxx.bed and xxx.fam files, where xxx is the name you set when you ran plink to create the files. For uncompressed linkage format (lped) it shows two separate file upload inputs - one each for the plink .ped and .map files. Now you can browse for the corresponding file for each input box from your local machine - be careful not to mix them up, as unfortunately the upload tool cannot tell them apart.

At the bottom of the form, I suggest you then change the genome build to the appropriate one (e.g. hg18 or hg19). Finally, I'd recommend that you change the 'metadata value for basename' (which will be the new dataset name) to something that will remind you what the data are - something more meaningful than the default 'rgenetics'. Click 'execute' to upload the data and create the new dataset in your history.

Compressed (pbed) format is preferred because the upload is quicker. Note that some tools will autoconvert between lped and pbed, so there is a delay the first time some tools are run on a new dataset. There are also built-in converters (use the pencil icon) if you need them.

I hope this helps - thanks for using Galaxy and Rgenetics - please let us know how you go and feel free to contact me if you have other questions.

On Fri, Feb 18, 2011 at 9:26 AM, Ross Lazarus <ross.lazarus@channing.harvard.edu> wrote:
Hi, Taka.
Thanks for trying the tool. Sorry to hear you are having problems with your data. Unfortunately, the history associated with this error does not have any datasets with data in the format required for the Manhattan/QQ plot tool.
The file you were attempting to use was a bed file. In fact it is not even a valid bed format file because it has spaces instead of tabs as delimiters. The tool is unable to parse the header row correctly so you have the error about an index out of range on the header row.
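To make the failure concrete, here is a minimal sketch of what happens to a space-delimited header. It assumes the row is split on tab characters (an assumption; the tool's parsing code is not shown in this thread), and the column numbers 0 and 2 are taken from the job command line in the error report below. The header text uses the column names from your file.

```python
# A space-delimited header like the one in the uploaded file.
header = "CHR SNP BP A1 F_A F_U A2 CHISQ P OR"

# Assumption: each row is split on the tab character.
ohead = header.split("\t")
print(len(ohead))  # the whole row comes back as a single field

chrom_col, offset_col = 0, 2  # values seen on the job command line
try:
    newhead = [ohead[chrom_col], ohead[offset_col]]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

# Once the delimiters are tabs, the same lookup succeeds.
fixed = "\t".join(header.split()).split("\t")
print([fixed[chrom_col], fixed[offset_col]])
```

Column 0 still exists in the single-field list, so it is the lookup of column 2 that fails.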
As the tool form explains, the input required is: "Tabular Data is a tab delimited header file with chromosome, offset and p values to be plotted"
I tried changing the datatype from bed to tabular but discovered that you have spaces as delimiters! So I downloaded your dataset and repaired it by converting the delimiters to tabs interactively in Python, so that it is now in the required format. I then uploaded the first few thousand rows to the original history and plotted them to check that the tool works as expected. I also ran the plots for the entire million rows, and that plot is in the history too.
I have shared the new history with you and attached is a low-res version of the resulting output from the first few thousand p values. You should be able to find the history by choosing 'options' from your current history then 'histories shared with me'.
If you can ensure that the data are in the correct format (only use tabs as delimiters and have a header row) then the tool should be able to perform correctly. Data in any other format is likely to cause the tool to crash.
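If it helps, a file can be checked before upload with something like the following rough sketch (Python 3). The function name check_tabular, the default column numbers (0, 2 and 8, taken from the job command line in the error report) and the MAX_OFFSET bound are all assumptions made for illustration; none of this is part of the tool. The bound relies on chr1, the longest human chromosome, being about 249 million bases, so it would also flag a damaged offset such as 2147483647.

```python
import io

# Assumption: no human chromosome is longer than about 250 million bases,
# so anything above this bound is treated as a damaged offset.
MAX_OFFSET = 300000000


def check_tabular(handle, chrom_col=0, offset_col=2, p_col=8):
    """Return a list of problems found in a headed, tab-delimited file."""
    problems = []
    header = handle.readline().rstrip("\r\n").split("\t")
    needed = max(chrom_col, offset_col, p_col) + 1
    if len(header) < needed:
        problems.append("header has %d tab-separated field(s), need at least %d"
                        % (len(header), needed))
        return problems  # nothing else can be checked reliably
    for lineno, line in enumerate(handle, start=2):
        if not line.strip():
            continue  # ignore blank lines
        row = line.rstrip("\r\n").split("\t")
        if len(row) != len(header):
            problems.append("line %d: %d fields, header has %d"
                            % (lineno, len(row), len(header)))
            continue
        try:
            offset = int(row[offset_col])
            float(row[p_col])
        except ValueError:
            problems.append("line %d: offset or p value is not a number" % lineno)
            continue
        if not 0 <= offset <= MAX_OFFSET:
            problems.append("line %d: offset %d is implausible" % (lineno, offset))
    return problems


header = "CHR\tSNP\tBP\tA1\tF_A\tF_U\tA2\tCHISQ\tP\tOR\n"
ok_row = "1\trs28659788\t713170\tG\t0.04094\t0.03725\tC\t0.08434\t0.7715\t1.103\n"
bad_row = ok_row.replace("713170", "2147483647")

print(check_tabular(io.StringIO(header + ok_row)))   # no problems expected
print(check_tabular(io.StringIO(header + bad_row)))  # flags the offset
print(check_tabular(io.StringIO(header.replace("\t", " "))))  # flags the header
```

Note that this sketch reports a missing p value such as 'NA' as "not a number"; whether such rows should be dropped beforehand is a separate decision.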
Thanks for using Galaxy - I hope it is useful for your research.
In case you need to fix any other defective files, here's what I did:
rerla@rosst61:~/Downloads$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = 'nakada.tab'
>>> bad = open(f,'r').readlines()
>>> sbad = [x.split() for x in bad]
>>> good = ['\t'.join(x) for x in sbad]
>>> good[:3]
['CHR\tSNP\tBP\tA1\tF_A\tF_U\tA2\tCHISQ\tP\tOR', '1\trs28659788\t713170\tG\t0.04094\t0.03725\tC\t0.08434\t0.7715\t1.103', '1\trs3094315\t742429\tG\t0.1754\t0.1533\tA\t0.835\t0.3608\t1.175']
>>> o = open(f,'w')
>>> o.write('\n'.join(good))
>>> o.close()
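The same repair can be written as a small reusable function. This is only a sketch in Python 3, and the name spaces_to_tabs and the file names are made up for illustration. Unlike the interactive session, it writes to a separate output file instead of overwriting the input, reads one line at a time so a million-row file need not be held in memory, and ends the file with a newline. Like the session, it splits on any run of whitespace, so it is only suitable when no field contains a space.

```python
import os
import tempfile


def spaces_to_tabs(in_path, out_path):
    """Rewrite a whitespace-delimited text file with single tabs between fields.

    Returns the number of rows written.
    """
    rows = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()  # splits on any run of spaces or tabs
            if not fields:
                continue  # drop blank lines
            dst.write("\t".join(fields) + "\n")
            rows += 1
    return rows


# Small demonstration on a temporary, made-up file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "spaced.txt")
dst_path = os.path.join(workdir, "fixed.tab")
with open(src_path, "w") as fh:
    fh.write(" CHR   SNP    BP\n   1   rs28659788   713170\n")

n_rows = spaces_to_tabs(src_path, dst_path)
with open(dst_path) as fh:
    fixed_text = fh.read()
print(n_rows)
print(repr(fixed_text))
```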
On Fri, Feb 18, 2011 at 8:32 AM, <galaxy-bugs@bx.psu.edu> wrote:
GALAXY TOOL ERROR REPORT
------------------------

This error report was sent from the Galaxy instance hosted on the server "main.g2.bx.psu.edu"
-----------------------------------------------------------------------------
This is in reference to dataset id 2071026 from history id 485682
-----------------------------------------------------------------------------
You should be able to view the history containing the related history item
12: Manhattan_and_QQ_plots.html
by logging in as a Galaxy admin user to the Galaxy instance referenced above and pointing your browser to the following link.
main.g2.bx.psu.edu/history/view?id=df22bcb1488553c7
-----------------------------------------------------------------------------
The user 'taka.nakada@nifty.com' provided the following information:
Hi,
I am trying to run Manhattan and QQ plots using PLINK file, but have this error. Would you please let me know how to solve it.
Thanks
Taka-aki Nakada
-----------------------------------------------------------------------------
job id: 1813212
tool id: rgManQQ1
-----------------------------------------------------------------------------
job command line:
python /galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py '/galaxy/main_database/files/002/070/dataset_2070806.dat' "Manhattan and QQ plots" '/galaxy/main_database/tmp/job_working_directory/1813212/galaxy_dataset_2071026.dat' '/galaxy/main_database/tmp/job_working_directory/1813212/dataset_2071026_files' '0' '2' '8' 'false'
-----------------------------------------------------------------------------
job stderr:
Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 318, in <module>
    main()
  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 287, in main
    rlog,flist = doManQQ(input_fname,chrom_col,offset_col,p,title,grey,ctitle,outdir)
  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 219, in doManQQ
    newhead = [ohead[chrom_col],ohead[offset_col]]
IndexError: list index out of range
-----------------------------------------------------------------------------
job stdout:

-----------------------------------------------------------------------------
job info:
None
-----------------------------------------------------------------------------
job traceback:
None
-----------------------------------------------------------------------------
(This is an automated message).
_______________________________________________
galaxy-bugs mailing list
galaxy-bugs@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-bugs
-- Ross Lazarus MBBS MPH Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory; 181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850; Head, Medical Bioinformatics, BakerIDI; PO Box 6492, St Kilda Rd Central; Melbourne, VIC 8008, Australia; Tel: +61 385321444