The correct link: http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=59919
Previous mail:
I am a non-programmer working on Prochlorococcus (a marine
bacterium) for which UCSC and Ensembl do not yet have a
genome/transcriptome available or uploaded. However, the genome
and transcriptome of this organism have been sequenced and annotated
and are available on MicrobesOnline (http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=59919).
I have been trying to run transcriptome analyses using Cufflinks, for which I need GTF files of the transcriptome. MicrobesOnline provides tab-delimited files, and I have been trying to convert them to GTF files using Excel. Basically, I reorganized the data so that the first 8 columns seem fine when uploaded to Galaxy. The way I have been doing this is to save the file as tab-delimited text from Excel and then upload it to Galaxy, "telling" Galaxy that it is a GTF file (instead of letting Galaxy identify the file type itself with the auto-detect function) in the file upload option. However, when I do this, I can't get the 9th column (attributes) to work.
I have tried separating the attributes in the 9th column of my
Excel spreadsheet with either a space or a tab (using
concatenation with the char(9) function, which I understand encodes
a tab in Excel). In all cases I upload to Galaxy by
identifying the .txt file as a .gtf file. When I use the char(9)
function in Excel, the 9th column splits into columns 9, 10, 11,
etc. When I use spaces to separate the attributes, I get an error
message from Cufflinks (An error occurred
running this job: cufflinks v1.0.3 cufflinks -q
--no-update-check -I 300000 -F 0.050000 -j 0.050000 -p 8 -G
/galaxy/main_database/files/003/377/dataset_3377315.dat Error
running cufflinks. [Errno 2] No such file or directory:
'transcripts.gtf').
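For reference, here is a minimal Python sketch of the conversion being attempted above, done outside Excel. In GTF, the 9th column is a single tab-free field holding key/value pairs such as gene_id "X"; transcript_id "Y"; with each pair ending in a semicolon and the pairs separated by spaces. The input column names (scaffoldId, start, stop, strand, sysName), the source label "MicrobesOnline", the "exon" feature type, and the reuse of one identifier for both gene_id and transcript_id are all assumptions made for illustration; they are not taken from the actual MicrobesOnline export and would need to be adjusted to match it.

```python
import csv
import io


def format_attributes(gene_id, transcript_id):
    """Build the GTF 9th column: pairs separated by spaces, no tabs."""
    return 'gene_id "{}"; transcript_id "{}";'.format(gene_id, transcript_id)


def row_to_gtf(row, source="MicrobesOnline", feature="exon"):
    """Turn one input row (a dict) into a 9-column GTF line.

    The keys used here are hypothetical column names.
    """
    start, stop = int(row["start"]), int(row["stop"])
    if start > stop:
        # GTF requires start <= end; orientation is carried by the strand column.
        start, stop = stop, start
    fields = [
        row["scaffoldId"],   # 1: sequence name
        source,              # 2: source (assumed label)
        feature,             # 3: feature type (assumed)
        str(start),          # 4: start, 1-based
        str(stop),           # 5: end, inclusive
        ".",                 # 6: score (not available)
        row["strand"],       # 7: strand, "+" or "-"
        ".",                 # 8: frame (not available)
        format_attributes(row["sysName"], row["sysName"]),  # 9: attributes
    ]
    # Join by hand rather than with csv.writer so that the double quotes
    # in the attributes column are written exactly as they appear.
    return "\t".join(fields)


def convert(in_handle, out_handle):
    """Read a tab-delimited table with a header row and write GTF lines."""
    for row in csv.DictReader(in_handle, delimiter="\t"):
        out_handle.write(row_to_gtf(row) + "\n")


if __name__ == "__main__":
    # Made-up example row, not real annotation data.
    example = "scaffoldId\tstart\tstop\tstrand\tsysName\nscaffold1\t100\t500\t+\tgeneA\n"
    result = io.StringIO()
    convert(io.StringIO(example), result)
    print(result.getvalue(), end="")
```

Writing the file from a script keeps the quotation marks in the 9th column as typed. Excel can add or double quotation marks around cells that contain double quotes when saving as tab-delimited text, which may be one reason the space-separated version is rejected, so it is worth opening the saved file in a plain text editor to check.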
Many thanks,
Noa Sher