Re: [galaxy-user] local sequence storage
by David Goodstein
We identified our local sequence file in tool-data/faseq.loc.
However, that file never shows up as an option under the
"database/build" pulldown (when we "pencil" edit a dataset). What
additional configuration is necessary to have our local files show up
in the database/build pulldowns?
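
One thing that can be checked independently of Galaxy is whether the
.loc file itself is laid out the way we think it is. The following is
only a minimal sketch: it assumes a tab-separated layout with "#"
comment lines, and the function name read_loc, the build names, and the
paths are hypothetical. The authoritative column layout is whatever the
comments in the matching ".loc.sample" file describe, and this is not
Galaxy's own parsing code.

```python
import io


def read_loc(handle):
    """Return the non-comment, non-blank rows of a .loc-style file.

    Assumption: fields are tab-separated and comment lines start with '#'.
    """
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        rows.append(line.split("\t"))
    return rows


# Hypothetical file content; real entries depend on the .loc.sample comments.
sample = (
    "# build<TAB>path (hypothetical layout)\n"
    "hg18\t/data/seq/hg18\n"
    "\n"
    "mm9\t/data/seq/mm9\n"
)
rows = read_loc(io.StringIO(sample))
for fields in rows:
    print(len(fields), fields)
```

A row that comes back with a single field usually means spaces were
typed where tabs were expected.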
regards,
David Goodstein
UCB Center for Integrative Genomics

> On 24 Jun 2008, at 16:47, Rochak Neupane wrote:
>
>> ---------- Forwarded message ----------
>> From: Greg Von Kuster <ghv2(a)psu.edu>
>> Date: Wed, Jun 18, 2008 at 1:06 PM
>> Subject: Re: [galaxy-user] local sequence storage
>> To: Rochak Neupane <rneupane(a)berkeley.edu>
>> Cc: galaxy-user(a)bx.psu.edu
>>
>>
>> Hello Rochak,
>>
>> Yes, this is certainly possible. The Galaxy distribution is
>> configured to look in the ~/tool-data directory for files that point
>> to your locally cached sequences. See the "tool_data_path" setting in
>> the Galaxy config (universe_wsgi.ini). This directory contains
>> ".loc.sample" files, included in the distribution, that show how to
>> point the ".loc" versions of the same files to your own locally
>> cached sequences. The comments at the beginning of each of these
>> files provide this information.
>>
>> Greg Von Kuster
>> Galaxy Development Team
>>
>>
>> Rochak Neupane wrote:
>>>
>>> Hello,
>>>
>>> I recently started using galaxy. I installed a local copy of
>>> galaxy, and am wondering how I can add our own sequences so that
>>> galaxy understands them when fetching. So instead of fetching
>>> from PSU it'll look at a local source. Is this doable? If so,
>>> how can I configure this?
>>>
>>> Thanks,
>>> Rochak
>>> _______________________________________________
>>> galaxy-user mailing list
>>> galaxy-user(a)bx.psu.edu
>>> http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
>>>
>>>
>>>
>
14 years, 7 months
local sequence storage
by Rochak Neupane
Hello,
I recently started using galaxy. I installed a local copy of galaxy,
and am wondering how I can add our own sequences so that galaxy
understands them when fetching. So instead of fetching from PSU it'll
look at a local source. Is this doable? If so, how can I configure this?
Thanks,
Rochak
14 years, 7 months
TFBS search
by Anton Nekrutenko
Tony:
We do not have a specific tool for clustering of TFBS. However, if
you know of good open-source tools that can be used for that purpose,
we will consider integrating them into the Galaxy framework.
Thanks,
anton
> Hello,
>
> I am just starting to use Galaxy, and I was wondering if there is a
> simple way to search genome-wide for clustering of custom
> transcription factor binding sites (preferably with weight
> matrices). All the other programs (~10) I have used so far have
> their limitations: either the length of the sequence they can handle
> is rather limited, they use only in-house transcription factor
> binding sites, or they are not able to check for evolutionary
> conservation.
>
> Thanks,
>
> Tony
Anton Nekrutenko
Asst. Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
Penn State University
anton(a)bx.psu.edu
http://nekrut.bx.psu.edu
814.865.4752
14 years, 7 months
Re: [galaxy-user] galaxy query
by Anton Nekrutenko
Gareth:
Sorry for the delay. There are two ways of dealing with this. I have
included a link to a screencast that highlights the two approaches.
In the first, you upload the datasets into Galaxy and simply run the
join tool. The second approach is to use a new Galaxy tool called
"Annotation profiler". It allows you to compare your set of genomic
features against the entire UCSC database in a single pass (at this
point it can only be used against hg18 annotations).
Try it out and let us know if you have any further questions. The
movie is here:
http://screencast.g2.bx.psu.edu/SNPs_TFBS.mov
anton
On Jun 5, 2008, at 11:21 AM, Whiteley, Gareth wrote:
> Hi Anton,
> That's exactly what I'm trying to do. I'm also trying to intersect
> the SNPs with the UCSC tracks - CpG islands, T-ScanS miRNA, PicTar
> miRNA - and any other regulatory sequence data.
> A step-by-step demo would be wonderful.
> Kind Regards,
> Gareth
>
> -----Original Message-----
> From: Anton Nekrutenko [mailto:anton@bx.psu.edu]
> Sent: 05 June 2008 15:58
> To: Whiteley, Gareth
> Cc: galaxy-user(a)bx.psu.edu
> Subject: Re: [galaxy-user] galaxy query
>
> Gareth:
>
> You can do all the intersects within Galaxy. If I understand
> correctly, you are trying to intersect conserved TFBSs with SNPs,
> right?
>
> Let me know and I will send you a detailed step-by-step demo.
>
> anton
> galaxy team
>
> On Jun 5, 2008, at 8:02 AM, Whiteley, Gareth wrote:
>
>> Hello,
>>
>>
>>
>> Suppose I choose the following search criteria: Group: all tracks,
>> track: SNP, region: chr5:7922217-7954237, then click INTERSECT,
>> choose Group: all tracks, track: TFBS conserved, and select GTF
>> output format. This searches for any SNPs that are in the
>> transcription factor binding sites within the genomic region that
>> codes for the MTRR gene, but the output does not tell me that the
>> SNP rs6868871 is found in the TFBS V$58_01 914. I have to work that
>> out for myself by searching the whole table again, this time
>> starting from the TFBS and intersecting with SNPs.
>>
>>
>>
>> I am trying to use Galaxy to Join two Queries side by side on a
>> specified field. I am trying to relate SNPs to the TFBS they are
>> associated with. For example, SNP rs6868871 is found in TFBS
>> V$58_01 914. However, I cannot seem to get Galaxy to work. I think
>> this is because the SNP site is something like '165878639 -
>> 165878640' and a TFBS site is something like '165878630 -
>> 165878641', and although the positions overlap, Galaxy cannot tell
>> that. Is this the case? Or do you know how I can get around it?
>> Regards,
>> Gareth Whiteley
>>
>>
>> Gareth Whiteley
>>
>> University of Liverpool
>>
>> Department of Pharmacology and Therapeutics
>>
>> The Sherrington Buildings
>>
>> Ashton Street
>>
>> Liverpool L69 7GE
>>
>>
>>
>> Tel: 0151 795 4224
>>
>> E-mail: g.whiteley(a)liverpool.ac.uk
>>
>>
>>
Anton Nekrutenko
Asst. Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
Penn State University
anton(a)bx.psu.edu
http://nekrut.bx.psu.edu
814.865.4752
14 years, 7 months
Storing compressed data files
by Assaf Gordon
Hello,
As users add more and more data files to our galaxy server, disk space
becomes a problem...
What I'd ultimately like is to store the data files in some compressed
manner (at least some of the textual files). How would you suggest
doing that?
A common scenario is:
1. User uploads a big Fastq/solexa file (=> 1.2 GB)
2. FASTQ file converted to FASTA file (=> 0.6 GB)
3. FASTA file trimmed, clipped, stripped, etc. (=> 100 MB)
4. BLAT, Histograms and other reports (=> ~50 MB)
The first three data sets take about 1.9 GB of disk space - and aren't
really needed by the user (as he/she is mostly interested in the
resulting report files). Since these are textual files, they compress
really well.
Currently, I store the FASTQ gzip'ed in galaxy, and my tools know how to
read gzip'ed data.
There are two shortcomings with this method:
1. datasets (green squares) of gzip'ed files don't display any data in
the peek window
2. Other galaxy tools which require FASTQ file as input can't read my file.
Perl has an I/O module (PerlIO::gzip) which makes reading gzipped files
transparent to the rest of the program. I think Python has something
very similar (http://www.python.org/doc/lib/module-gzip.html).
If it's not too much to ask, would it be possible to add support for
reading gzip'ed files, at least in the peek/preview window?
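
To illustrate what such transparent reading could look like with
Python's standard gzip module, here is a minimal sketch. It is not
Galaxy code: the function name peek, the demo file name, and the FASTQ
records are made up for the example. It detects gzip by the two-byte
magic number and otherwise falls back to a plain open.

```python
import gzip
import itertools
import os
import tempfile


def peek(path, max_lines=5):
    """Return the first max_lines lines of a text file, gzip'ed or not."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, max_lines)]


# Demo with a small, made-up FASTQ file written gzip-compressed.
demo_path = os.path.join(tempfile.mkdtemp(), "reads.fastq.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n")

preview = peek(demo_path, max_lines=4)
print(preview)
```

Only the requested lines are decompressed, so a preview of a 1.2 GB
file should not require reading the whole thing.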
Comments are welcome,
Thanks,
Gordon.
14 years, 7 months
Two Administration questions
by Assaf Gordon
Hello,
I have two administration-related questions:
1. I have a tool which uses dynamic options from an external file.
A cron script updates this file, but the updated data doesn't appear in
the tool's list-box until galaxy is restarted.
Is there a way to re-load an option file without restarting galaxy?
2. In order to do some routine galaxy maintenance, I need to stop the
daemon. Is there a way to make sure no jobs are currently running
before stopping the daemon?
Thanks,
Gordon.
14 years, 7 months
Re: [galaxy-user] Storing compressed data files
by Asim Siddiqui
Hi Assaf,
An alternative approach is to use a binary file format specifically
designed to store sequence data compactly. One example is Sequence Read
Format (SRF). SRF has been incorporated into the Illumina and Helicos
pipelines and will be available for the AB platform shortly. SRF
includes support for compression using several schemes, including ZLIB.
This thread has been captured on the genographia website and I've commented
there. There is also a link to more information on SRF. Note: in terms of
implementation, there is a C version (most complete), a C++ prototype (with
a complete C++ implementation coming soon) and an early Java implementation.
http://www.genographia.org/portal/topics/sequence-read-format-srf/galaxy-and-file-size-management
http://www.genographia.org/portal/topics/sequence-read-format-srf/sequence-read-format-srf
Asim
------ Forwarded Message
From: Assaf Gordon <gordon(a)cshl.edu>
Date: Wed, 4 Jun 2008 16:26:30 -0700
To: <galaxy-user(a)bx.psu.edu>
Subject: [galaxy-user] Storing compressed data files
Hello,
As users add more and more data files to our galaxy server, disk space
becomes a problem...
What I'd ultimately like is to store the data files in some compressed
manner (at least some of the textual files). How would you suggest
doing that?
A common scenario is:
1. User uploads a big Fastq/solexa file (=> 1.2 GB)
2. FASTQ file converted to FASTA file (=> 0.6 GB)
3. FASTA file trimmed, clipped, stripped, etc. (=> 100 MB)
4. BLAT, Histograms and other reports (=> ~50 MB)
The first three data sets take about 1.9 GB of disk space - and aren't
really needed by the user (as he/she is mostly interested in the
resulting report files). Since these are textual files, they compress
really well.
Currently, I store the FASTQ gzip'ed in galaxy, and my tools know how to
read gzip'ed data.
There are two shortcomings with this method:
1. datasets (green squares) of gzip'ed files don't display any data in
the peek window
2. Other galaxy tools which require FASTQ file as input can't read my file.
Perl has an I/O module (PerlIO::gzip) which makes reading gzipped files
transparent to the rest of the program. I think Python has something
very similar (http://www.python.org/doc/lib/module-gzip.html).
If it's not too much to ask, would it be possible to add support for
reading gzip'ed files, at least in the peek/preview window?
Comments are welcome,
Thanks,
Gordon.
------ End of Forwarded Message
14 years, 7 months
galaxy query
by Whiteley, Gareth
Hello,
Suppose I choose the following search criteria: Group: all tracks,
track: SNP, region: chr5:7922217-7954237, then click INTERSECT, choose
Group: all tracks, track: TFBS conserved, and select GTF output format.
This searches for any SNPs that are in the transcription factor binding
sites within the genomic region that codes for the MTRR gene, but the
output does not tell me that the SNP rs6868871 is found in the TFBS
V$58_01 914. I have to work that out for myself by searching the whole
table again, this time starting from the TFBS and intersecting with
SNPs.
I am trying to use Galaxy to Join two Queries side by side on a
specified field. I am trying to relate SNPs to the TFBS they are
associated with. For example, SNP rs6868871 is found in TFBS V$58_01
914. However, I cannot seem to get Galaxy to work. I think this is
because the SNP site is something like '165878639 - 165878640' and a
TFBS site is something like '165878630 - 165878641', and although the
positions overlap, Galaxy cannot tell that. Is this the case? Or do you
know how I can get around it?
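
The difference between matching on a field and matching on position can
be sketched in a few lines of Python. This is only an illustration, not
how any Galaxy tool is implemented: the feature names are placeholders,
the two coordinate pairs are the "something like" values quoted above,
and the intervals are assumed to be half-open (start included, end
excluded).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def overlap_join(left, right):
    """Pair each left feature with every right feature it overlaps."""
    pairs = []
    for l_name, l_start, l_end in left:
        for r_name, r_start, r_end in right:
            if overlaps(l_start, l_end, r_start, r_end):
                pairs.append((l_name, r_name))
    return pairs


# Placeholder features: (name, start, end)
snps = [("snp_example", 165878639, 165878640)]
tfbs = [
    ("tfbs_example", 165878630, 165878641),
    ("tfbs_elsewhere", 165879000, 165879010),
]

# Joining on identical coordinates finds nothing ...
exact = [(s[0], t[0]) for s in snps for t in tfbs if s[1:] == t[1:]]
# ... while an overlap test relates the SNP to the site that contains it.
joined = overlap_join(snps, tfbs)
print(exact)
print(joined)
```

A join on a field only pairs rows whose values are identical, which is
why overlapping but unequal coordinates never match.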
Regards,
Gareth Whiteley
Gareth Whiteley
University of Liverpool
Department of Pharmacology and Therapeutics
The Sherrington Buildings
Ashton Street
Liverpool L69 7GE
Tel: 0151 795 4224
E-mail: g.whiteley(a)liverpool.ac.uk
14 years, 7 months
admin password?
by Davide Cittaro
Can anybody use the admin tool to reload modules? I've set the
password in the universe_wsgi.ini file, but I always get an "Invalid
password" error...
thanks
d
/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/
14 years, 8 months