Python error when running Bowtie for Illumina
by Weng Khong Lim
Hi all,
I'm new to next-gen sequencing, so please be gentle. I've just received a
pair of Illumina FASTQ files from the sequencing facility and intend to map
them to the hg19 reference genome. I first used the FASTQ Groomer utility to
convert the reads to Sanger-encoded FASTQ. However, when running Bowtie for
Illumina on the resulting dataset under default settings, I received the
following error:
An error occurred running this job: Error aligning sequence. requested
number of bytes is more than a Python string can hold
Can someone help point out my mistake? My history is accessible at
http://main.g2.bx.psu.edu/u/wengkhong_lim/h/chip-seq-pilot-batch
Appreciate the help!
Weng Khong, LIM
Department of Genetics
University of Cambridge
E-mail: wkl24(a)cam.ac.uk
Tel: +447503225832
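One quick offline sanity check before re-running the aligner is to confirm
that the groomed file really is Sanger-encoded: quality characters below ';'
(ASCII 59) only occur in Phred+33 (Sanger) encodings, never in Phred+64. A
minimal shell sketch, assuming the groomed file is named groomed.fastq (a
hypothetical name):
    # Print the distinct quality characters, sorted from lowest to
    # highest ASCII code; any character below ';' indicates Phred+33.
    awk 'NR % 4 == 0' groomed.fastq | fold -w1 | LC_ALL=C sort -u | tr -d '\n'; echo
If the lowest character printed is '@' or above, the file may still be
Phred+64 and worth re-grooming.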
12 years
The uploaded file contains inappropriate content
by Erick Antezana
Hi,
It seems there is a (serious) bug when "Adding datasets" via the
"Upload directory of files" option if the files are in ZIP format, with or
without copying the data into Galaxy (3545:2590120aed68), i.e. whether the
'No' box is ticked or not. The files actually get erased from the
filesystem.
In the "information" column, I get:
Job error (click name for more info)
Then, after clicking on the file name:
Information about 2010.fastq.zip
Message:
Uploaded by: erick(a)mydomain.com
Date uploaded: 2010-03-23
Build: ?
Miscellaneous information: The uploaded file contains inappropriate content
error
We have no problems adding unzipped files...
thanks,
Erick
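Until this is fixed, a workaround consistent with the observation above (no
problems with unzipped files) is to unpack the archive locally and upload
its members individually, optionally gzip-compressing each one if your
instance accepts gzipped single-file uploads (an assumption worth checking).
A sketch, using the 2010.fastq.zip name from the report:
    # Unpack the ZIP archive into a scratch directory and, optionally,
    # recompress each member on its own before uploading; adjust the
    # *.fastq pattern to whatever the archive actually contains.
    mkdir -p unpacked
    unzip -d unpacked 2010.fastq.zip
    gzip unpacked/*.fastq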
12 years, 8 months
Re: [galaxy-user] downloading huge data sets from history using wget
by Florent Angly
Hi Peter,
Please use 'reply all' so that everyone on the mailing list can
participate in the discussion.
I did not publish my history, so that's probably not what causes
problems for you.
If you can click on the 'save' icon and it starts the download
successfully, then you ought to be able to copy the download link, use it
with wget, and have it work. What happens when you click on 'save'? Does it
start the download?
Florent
On 31/05/10 09:57, pis(a)duke.edu wrote:
> Hi Florent,
>
> Do you think that I need to publish it first as a history and then try
> it again?
> I suspect that may be the reason for the strange behavior.
>
> I will let you know when I get it to work
>
> Thank you very much for your help
>
> Have a nice day
> Peter
>
>
>
> Zitat von Florent Angly <florent.angly(a)gmail.com>:
>
>> Hi Peter,
>>
>> See an example below:
>>> $ wget
>>> http://main.g2.bx.psu.edu/datasets/59a2a6ec00c47fc4/display?to_ext=fasta
>>>
>>> --2010-05-31 09:45:24--
>>> http://main.g2.bx.psu.edu/datasets/59a2a6ec00c47fc4/display?to_ext=fasta
>>>
>>> Resolving main.g2.bx.psu.edu... 128.118.201.93
>>> Connecting to main.g2.bx.psu.edu|128.118.201.93|:80... connected.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 1177056689 (1.1G) [text/plain]
>>> Saving to: `display?to_ext=fasta'
>>> 0% [ ] 224,112 243K/s
>> The download link was copied from the "save" icon.
>>
>> When I try with your link, I get:
>>> $ wget
>>> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs...
>>>
>>> --2010-05-31 09:49:10--
>>> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs...
>>>
>>> Resolving main.g2.bx.psu.edu... 128.118.201.93
>>> Connecting to main.g2.bx.psu.edu|128.118.201.93|:80... connected.
>>> HTTP request sent, awaiting response... 416 Request Range Not
>>> Satisfiable
>>>
>>> The file is already fully retrieved; nothing to do.
>> The "request range not satisfiable" makes me think that your download
>> link is not valid for some reason.
>>
>> Florent
>>
>>
>> On 30/05/10 23:43, pis(a)duke.edu wrote:
>>> Dear Florent Angly,
>>>
>>> Thank you very much for your response. I have actually tried to do
>>> that but it still does not work. When I choose "copy link location"
>>> in Firefox (in my version no "save link location" appears), I get a
>>> URL with a strange data file name such as
>>> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs....
>>> This will not work with wget ("wget: No match"). It does not work
>>> either when I replace the data file name with the name of the file
>>> that appears when I want to download the file using the disk icon. I
>>> would be happy for further advice on that, since it has apparently
>>> worked for you.
>>>
>>> Thank you very much for your help
>>> Peter
>>>
>>> Zitat von Florent Angly <florent.angly(a)gmail.com>:
>>>
>>>> Hi Peter,
>>>> It's pretty easy. In the Galaxy interface, use the "save" icon
>>>> represented as a floppy disk. Instead of clicking on the icon, get
>>>> the URL for it (in Firefox: right click > save link location). Then
>>>> simply copy this URL in your terminal for wget to use.
>>>> Florent
>>>>
>>>> On 30/05/10 07:12, pis(a)duke.edu wrote:
>>>>> Dear Galaxy team,
>>>>>
>>>>> I am doing some standard clipping and trimming using Galaxy and
>>>>> would be happy to download the generated files using a Unix
>>>>> terminal and wget. Is there a way to figure out the exact link of
>>>>> the data file so I can wget it easily? I am talking about file
>>>>> sizes over 5 GB that are really time-consuming and error-prone to
>>>>> download using a web browser.
>>>>>
>>>>> Thank you very much for developing this excellent tool
>>>>> Peter
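For what it's worth, the "wget: No match" error quoted above is
characteristic of shells such as csh/tcsh expanding the '?' in the URL as a
filename glob, and the 416 "file is already fully retrieved" response often
just means wget tried to resume into a leftover local file from an earlier
attempt. Quoting the URL and naming the output avoids both; a sketch using
Florent's example link:
    # Quote the URL so the shell does not glob on '?'; -O names the
    # output file and -c resumes an interrupted download.
    wget -c -O dataset.fasta \
      'http://main.g2.bx.psu.edu/datasets/59a2a6ec00c47fc4/display?to_ext=fasta'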
12 years, 8 months
Re: [galaxy-user] downloading huge data sets from history using wget
by Florent Angly
Hi Peter,
See an example below:
> $ wget
> http://main.g2.bx.psu.edu/datasets/59a2a6ec00c47fc4/display?to_ext=fasta
> --2010-05-31 09:45:24--
> http://main.g2.bx.psu.edu/datasets/59a2a6ec00c47fc4/display?to_ext=fasta
> Resolving main.g2.bx.psu.edu... 128.118.201.93
> Connecting to main.g2.bx.psu.edu|128.118.201.93|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 1177056689 (1.1G) [text/plain]
> Saving to: `display?to_ext=fasta'
> 0% [ ] 224,112 243K/s
The download link was copied from the "save" icon.
When I try with your link, I get:
> $ wget
> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs...
> --2010-05-31 09:49:10--
> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs...
> Resolving main.g2.bx.psu.edu... 128.118.201.93
> Connecting to main.g2.bx.psu.edu|128.118.201.93|:80... connected.
> HTTP request sent, awaiting response... 416 Request Range Not Satisfiable
>
> The file is already fully retrieved; nothing to do.
The "request range not satisfiable" makes me think that your download
link is not valid for some reason.
Florent
On 30/05/10 23:43, pis(a)duke.edu wrote:
> Dear Florent Angly,
>
> Thank you very much for your response. I have actually tried to do
> that but it still does not work. When I choose "copy link location"
> in Firefox (in my version no "save link location" appears), I get a
> URL with a strange data file name such as
> http://main.g2.bx.psu.edu/datasets/c3a8db0a339f7a43/display?to_ext=fastqs....
>
> This will not work with wget ("wget: No match"). It does not work
> either when I replace the data file name with the name of the file
> that appears when I want to download the file using the disk icon. I
> would be happy for further advice on that, since it has apparently
> worked for you.
>
> Thank you very much for your help
> Peter
>
> Zitat von Florent Angly <florent.angly(a)gmail.com>:
>
>> Hi Peter,
>> It's pretty easy. In the Galaxy interface, use the "save" icon
>> represented as a floppy disk. Instead of clicking on the icon, get
>> the URL for it (in Firefox: right click > save link location). Then
>> simply copy this URL in your terminal for wget to use.
>> Florent
>>
>> On 30/05/10 07:12, pis(a)duke.edu wrote:
>>> Dear Galaxy team,
>>>
>>> I am doing some standard clipping and trimming using Galaxy and
>>> would be happy to download the generated files using a Unix
>>> terminal and wget. Is there a way to figure out the exact link of
>>> the data file so I can wget it easily? I am talking about file
>>> sizes over 5 GB that are really time-consuming and error-prone to
>>> download using a web browser.
>>>
>>> Thank you very much for developing this excellent tool
>>> Peter
12 years, 8 months
downloading huge data sets from history using wget
by pis@duke.edu
Dear Galaxy team,
I am doing some standard clipping and trimming using Galaxy and would be
happy to download the generated files using a Unix terminal and wget. Is
there a way to figure out the exact link of the data file so I can wget it
easily? I am talking about file sizes over 5 GB that are really
time-consuming and error-prone to download using a web browser.
Thank you very much for developing this excellent tool
Peter
12 years, 8 months
Problems with Galaxy kindly reply urgently
by Amit Pande
Dear Galaxy,
We have installed Galaxy on one of our Institute's
servers, http://totoro:8080/, but the problem is that whenever a file is
uploaded, for example in BED format, to retrieve sequences with the Fetch
Sequences tool, the message that gets displayed is: "Sequences not
available for the specific build".
Kindly help.
warm regards,
Amit.
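That message usually means the local instance has no sequence data
installed for the build in question: a fresh Galaxy only knows about the
genomes you give it. The exact setup depends on the Galaxy version, but as
a rough sketch (the alignseq.loc name and its "seq <build> <path>" line
format are assumptions based on older releases; check the comments in the
.loc files shipped with your instance):
    # Fetch the hg19 sequence data from UCSC and register it with Galaxy.
    mkdir -p /data/genomes/hg19
    wget -P /data/genomes/hg19 \
      'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit'
    printf 'seq\thg19\t/data/genomes/hg19\n' >> tool-data/alignseq.loc
After editing a .loc file, restart the Galaxy server so the change is
picked up.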
12 years, 8 months
installing galaxy
by pande
Dear Galaxy,
If I install Galaxy on my Institute's server, will I be in a
position to extract the sequences of my interest and use the tools as I
would on your server? I have the ENCODE data, and it is difficult to
upload these files over the internet because of their enormous size.
Kindly help.
regards,
Amit.
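In short, yes: a local Galaxy is the same application as the public server,
so the same tools are available once you install them and their reference
data (see the build issue above), and large files such as ENCODE data can
be placed on the server directly rather than uploaded through a browser. A
minimal sketch of the install as documented at the time, assuming Mercurial
is available:
    # Clone the Galaxy distribution and start it; by default the web
    # interface listens on http://localhost:8080/.
    hg clone https://bitbucket.org/galaxy/galaxy-dist
    cd galaxy-dist
    sh run.sh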
12 years, 8 months
Issue with saving 'manipulate fastq' in workflow; and request for advice dealing with barcoded 454 data
by Pip Griffin
Hi,
I'm a new user, learning how to use Galaxy while I wait for my 454 results.
So I'm not actually playing with any data yet but I'm trying to set up a
draft workflow as practice. Two issues:
Issue 1.
I am having trouble with the 'Manipulate FASTQ' tool. Without it, my
workflow saves quickly and seems fine, but when I include even a (seemingly
simple) 'Manipulate FASTQ' step, it tries to save for many minutes,
unsuccessfully, until I get sick of it and close the window.
Issue 2.
Well this isn't really an issue, just a request for advice! My dataset will
be a barcoded amplicon library, containing 8 different gene regions (which I
can recognise from the amplicon-specific primer sequences) amplified in 64
different individuals (which I can recognise by an individual-specific
barcode sequence). I thought I'd set up a workflow with the following steps:
1) convert to FASTQ format;
2) groom and filter to remove short reads etc.;
3) 'Manipulate FASTQ' to match all sequences containing one of the eight
reverse primer sequences, and reverse-complement them;
4) FASTQ-to-tabular format conversion;
5) eight separate 'select' steps to select sequences with a match to either
the forward primer or the reverse-complemented reverse primer of the
desired gene region.
My question is: does this seem sensible? Is there a more efficient way to do
this that I haven't discovered yet? I was thinking I'd then set up another
workflow to label barcoded individuals, for which I could use each of the
eight gene 'output files' in turn as input.
Thanks so much for this service! The screencasts are especially great.
Pip Griffin
University of Melbourne, Australia
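On step 5 of the proposed workflow: the eight 'select' steps amount to
pattern matching on the sequence column of the tabular file, which is easy
to prototype outside Galaxy too. A minimal sketch with made-up primer
sequences, assuming the FASTQ-to-tabular output has the read sequence in
column 2 (both the primers and the column position are assumptions):
    # Hypothetical forward and reverse-complemented reverse primers for
    # one gene region; repeat (or loop) for each of the eight regions.
    FWD='ACGTACGTACGT'
    RCREV='CCAATTGGCCAA'
    # Keep rows whose sequence column matches either primer.
    awk -v f="$FWD" -v r="$RCREV" '$2 ~ f || $2 ~ r' \
      reads.tabular > gene1.tabular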
12 years, 8 months