The correct link: http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=59919


Previous mail:

I am a non-programmer working on Prochlorococcus (a marine bacterium), for which UCSC and Ensembl do not yet have a genome or transcriptome available. However, the genome and transcriptome of this organism have been sequenced and annotated, and are available on Microbes Online (http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=59919).

I have been trying to run transcriptome analyses using cufflinks, for which I need GTF files of the transcriptome. Microbes Online provides tab-delimited files, and I have been trying to convert them to GTF files using Excel. Basically, I reorganized the data so that the first 8 columns seem fine when uploaded to Galaxy. I do this by saving the file from Excel as tab-delimited text and then, in the file upload option, "telling" Galaxy that it is a GTF file (instead of letting Galaxy identify the file type itself with the auto-detect function). However, when I do this, I can't get the 9th column (attributes) to work.

I have tried separating the attributes in the 9th column of my Excel spreadsheet with either a space or a tab (using concatenation with the char(9) function, which I understand encodes a tab in Excel). In both cases I upload to Galaxy by identifying the .txt file as a .gtf file. When I use the char(9) function, the 9th column splits into columns 9, 10, 11, etc. When I use spaces to separate the attributes, I get an error message from cufflinks (An error occurred running this job: cufflinks v1.0.3 cufflinks -q --no-update-check -I 300000 -F 0.050000 -j 0.050000 -p 8 -G /galaxy/main_database/files/003/377/dataset_3377315.dat Error running cufflinks. [Errno 2] No such file or directory: 'transcripts.gtf').

I would be happy to know whether there is a way to convert my tab-delimited transcriptome file from Microbes Online to a GTF file (either with Excel or with another program), which would enable me to use Galaxy's NGS functions on Prochlorococcus.
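
To make the conversion I am after concrete, here is a minimal Python sketch of it, offered as an illustration under assumptions rather than a tested solution. It assumes the Microbes Online export has a header row with columns named scaffoldId, start, stop, strand and sysName (these names, the "exon" feature type, and the function names are assumptions; adjust them to the actual file). The key point is that GTF has exactly 9 tab-separated columns, and that inside the 9th column the attributes are written as key "value"; pairs separated by spaces, never by tabs. Writing the file from a script also avoids any extra quotation marks that a spreadsheet program might add around cells containing quotes when saving as text.

```python
import csv
import io


def row_to_gtf(row, seqname_col="scaffoldId", id_col="sysName",
               start_col="start", stop_col="stop", strand_col="strand",
               source="MicrobesOnline", feature="exon"):
    """Turn one table row (a dict keyed by column name) into one GTF line.

    All column names are assumptions about the export format.
    """
    # GTF requires start <= end; orientation is carried by the strand column.
    start, stop = sorted((int(row[start_col]), int(row[stop_col])))
    strand = row[strand_col] if row[strand_col] in ("+", "-") else "."
    gene_id = row[id_col]
    # Column 9: key "value"; pairs separated by single spaces, NOT tabs.
    attributes = 'gene_id "{0}"; transcript_id "{0}";'.format(gene_id)
    fields = [row[seqname_col], source, feature,
              str(start), str(stop), ".", strand, ".", attributes]
    return "\t".join(fields)


def table_to_gtf(table_text, **column_names):
    """Convert tab-delimited text (with a header row) into GTF text."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return "\n".join(row_to_gtf(r, **column_names) for r in reader) + "\n"


if __name__ == "__main__":
    # Hypothetical two-gene table, only to show the shape of the output.
    demo = ("scaffoldId\tstart\tstop\tstrand\tsysName\n"
            "scaffold1\t100\t400\t+\tGENE0001\n"
            "scaffold1\t900\t500\t-\tGENE0002\n")
    print(table_to_gtf(demo), end="")
```

The text produced this way can be saved with a .gtf extension and uploaded to Galaxy with the file type set to gtf. Note that the value in the first column has to match the sequence name used in the reference genome that the reads were mapped against.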

Additionally, when I upload my data files I am able to choose the Prochlorococcus genome on Galaxy (genome 213 in the 'upload file' option), but I am unable to choose it from the reference genome list when running tophat on Galaxy. Fixing this may solve the problem (or it may be part of the same issue).

Many thanks,

Noa Sher