June 2008 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] local sequence storage
by Greg Von Kuster 12 Sep '08


Re: [galaxy-user] local sequence storage
by David Goodstein 25 Jun '08


We identified our local sequence file in tool-data/faseq.loc. However, that file never shows up as an option under the "database/build" pulldown (when we "pencil" edit a dataset). What additional configuration is necessary to have our local files show up in the database/build pulldowns?

regards,
David Goodstein
UCB Center for Integrative Genomics

On 24 Jun 2008, at 16:47, Rochak Neupane wrote:

> ---------- Forwarded message ----------
> From: Greg Von Kuster <ghv2(a)psu.edu>
> Date: Wed, Jun 18, 2008 at 1:06 PM
> Subject: Re: [galaxy-user] local sequence storage
> To: Rochak Neupane <rneupane(a)berkeley.edu>
> Cc: galaxy-user(a)bx.psu.edu
>
> Hello Rochak,
>
> Yes, this is certainly possible. The Galaxy distribution is configured to look in the ~/tool-data directory for files which point to your locally cached sequences. See the "tool_data_path" setting in the Galaxy config ( universe_wsgi.ini ). This directory contains sample ".loc.sample" files that are included in the distribution to provide information about how to point the ".loc" versions of the same files to your own locally cached sequences. The comments at the beginning of each of these files provide this information.
>
> Greg Von Kuster
> Galaxy Development Team
>
> Rochak Neupane wrote:
>> Hello,
>>
>> I recently started using galaxy. I installed a local copy of galaxy, and am wondering how I can add our own sequences so that galaxy understands them when fetching. So instead of fetching from PSU it'll look at a local source. Is this doable? If so, how can I configure this?
>>
>> Thanks,
>> Rochak
>> _______________________________________________
>> galaxy-user mailing list
>> galaxy-user(a)bx.psu.edu
>> http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user


local sequence storage
by Rochak Neupane 18 Jun '08


Hello,

I recently started using Galaxy. I installed a local copy of Galaxy, and am wondering how I can add our own sequences so that Galaxy understands them when fetching. So instead of fetching from PSU it'll look at a local source. Is this doable? If so, how can I configure this?

Thanks,
Rochak


TFBS search
by Anton Nekrutenko 13 Jun '08


Tony:

We do not have a specific tool for clustering of TFBS. However, if you know of good open source tools that can be used for that purpose, we will consider integrating them into the Galaxy framework.

Thanks,

anton

> Hello,
>
> I am just starting to use Galaxy, and I was wondering if there is a simple way to search genome-wide for clustering of custom transcription factor binding sites (preferentially with weight matrices). All other programs (~10) I have used so far have their limitations: either the length of the sequence they can handle is rather limited, they use only in-house transcription factor binding sites, or they are not able to check for evolutionary conservation.
>
> Thanks,
>
> Tony

Anton Nekrutenko
Asst. Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
Penn State University
anton(a)bx.psu.edu
http://nekrut.bx.psu.edu
814.865.4752


Re: [galaxy-user] galaxy query
by Anton Nekrutenko 13 Jun '08


Gareth:

Sorry for the delay. There are two ways of dealing with this. I attached a link to a screencast that highlights the two approaches. In the first, you must upload datasets into Galaxy and simply run the join tool. The second approach is to use a new Galaxy tool called "Annotation profiler". It allows you to compare your set of genomic features against the entire UCSC database in a single pass (at this point it can only be used against hg18 annotations). Try it out and let us know if you have any further questions.

The movie is here: http://screencast.g2.bx.psu.edu/SNPs_TFBS.mov

anton

On Jun 5, 2008, at 11:21 AM, Whiteley, Gareth wrote:

> Hi Anton,
> That's exactly what I'm trying to do. I'm also trying to intersect the SNPs with the UCSC tracks - CpG islands, T-ScanS miRNA, PicTar miRNA - and any other regulatory sequence data.
> A step-by-step demo would be wonderful.
> Kind Regards,
> Gareth
>
> -----Original Message-----
> From: Anton Nekrutenko [mailto:anton@bx.psu.edu]
> Sent: 05 June 2008 15:58
> To: Whiteley, Gareth
> Cc: galaxy-user(a)bx.psu.edu
> Subject: Re: [galaxy-user] galaxy query
>
> Gareth:
>
> You can do all the intersects within Galaxy. If I understand correctly you are trying to intersect conserved TFBSs with SNPs? Right?
>
> Let me know and I will send you a detailed step-by-step demo.
>
> anton
> galaxy team
>
> On Jun 5, 2008, at 8:02 AM, Whiteley, Gareth wrote:
>
>> Hello,
>>
>> If I was to choose the following search criteria Group:all tracks, track:SNP, region: chr5:7922217-7954237, then click INTERSECT and choose Group:all tracks, track:TFBS conserved and select GTF output format. This example searches for any SNPs that are in the Transcription factor binding sites within the genomic region that codes for the MTRR gene, but the output does not tell me that the SNP rs6868871 is found in the TFBS V$58_01 914, I have to work that out for myself by then searching the whole table again but from a TFBS start point and intersecting with SNPs.
>>
>> I am trying to use galaxy to Join two Queries side by side on a specified field. I am trying to relate SNPs to the TFBS they are associated with. For example, SNP rs6868871 is found in TFBS V$58_01 914. However, i can not seem to get galaxy to work, i think this is because the SNP site is something like this ‘165878639 - 165878640’ and a TFBS site is something like this ‘165878630 - 165878641’ and although the positions overlap, galaxy can not tell that. Is this the case? Or do you know of how i can get around it?
>>
>> Regards,
>> Gareth Whiteley
>>
>> University of Liverpool
>> Department of Pharmacology and Therapeutics
>> The Sherrington Buildings
>> Ashton Street
>> Liverpool L69 7GE
>>
>> Tel: 0151 795 4224
>> E-mail: g.whiteley(a)liverpool.ac.uk

Anton Nekrutenko
Asst. Professor
Department of Biochemistry and Molecular Biology
Center for Comparative Genomics and Bioinformatics
Penn State University
anton(a)bx.psu.edu
http://nekrut.bx.psu.edu
814.865.4752


Storing compressed data files
by Assaf Gordon 10 Jun '08


Hello,

As users add more and more data files to our Galaxy server, disk space becomes a problem... What I'd ultimately like is to store the data files in some compressed manner (at least some of the textual files). How would you suggest doing that?

A common scenario is:
1. User uploads a big Fastq/solexa file (=> 1.2 GB)
2. FASTQ file converted to FASTA file (=> 0.6 GB)
3. FASTA file trimmed, clipped, stripped, etc. (=> 100 MB)
4. BLAT, Histograms and other reports (=> ~50 MB)

The first three data sets take about 1.9 GB of disk space and aren't really needed by the user (as he/she is mostly interested in the resulting report files). Since these are textual files, they compress really well.

Currently, I store the FASTQ gzip'ed in Galaxy, and my tools know how to read gzip'ed data. There are two shortcomings with this method:
1. Datasets (green squares) of gzip'ed files don't display any data in the peek window.
2. Other Galaxy tools which require a FASTQ file as input can't read my file.

Perl has an I/O module (PerlIO::gzip) which makes reading gzipped files transparent to the rest of the program. I think Python has something very similar (http://www.python.org/doc/lib/module-gzip.html).

If it's not too much to ask, would it be possible to add support for reading gzip'ed files? At least in the peek/preview window?

Comments are welcome.

Thanks,
Gordon.
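For readers following up on the Python gzip module mentioned above, here is a minimal sketch of what a transparent "peek" could look like. It is not Galaxy code: the function names open_maybe_gzip and peek are hypothetical, and it assumes a current Python 3 standard library rather than the Python of 2008. It only illustrates reading the first lines of a file that may or may not be gzip'ed.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file for reading, whether or not it is gzip-compressed.

    Detection uses the two-byte gzip magic number (0x1f 0x8b) instead of
    the file extension, so a compressed dataset with an arbitrary name
    is still recognised. (Hypothetical helper, not part of Galaxy.)
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        # "rt" asks the gzip module to decompress and decode to text.
        return gzip.open(path, "rt")
    return open(path, "r")


def peek(path, max_lines=5):
    """Return up to max_lines leading lines of a plain or gzip'ed text file."""
    lines = []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if len(lines) >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

Because gzip streams decompress sequentially, a peek like this only has to decompress the start of the file, not the whole 1.2 GB upload.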


Two Administration questions
by Assaf Gordon 10 Jun '08


Hello,

I have two administration-related questions:

1. I have a tool which uses dynamic options from an external file. A cron script updates this file, but the updated data doesn't appear in the tool's list-box until Galaxy is restarted. Is there a way to re-load an option file without restarting Galaxy?

2. In order to do some routine Galaxy maintenance, I need to stop the daemon. Is there a way to make sure no jobs are currently running before stopping the daemon?

Thanks,
Gordon.


Re: [galaxy-user] Storing compressed data files
by Asim Siddiqui 05 Jun '08


Hi Assaf,

An alternative approach is to use a binary file format specifically designed to store sequence data in a compact manner. An example of that is Sequence Read Format (SRF). SRF has been incorporated into the Illumina and Helicos pipelines and will be available for the AB platform shortly. SRF includes support for compression using several schemes including ZLIB.

This thread has been captured on the genographia website and I've commented there. There is also a link to more information on SRF.

Note: in terms of implementation, there is a C version (most complete), a C++ prototype (with a complete C++ implementation coming soon) and an early Java implementation.

http://www.genographia.org/portal/topics/sequence-read-format-srf/galaxy-and-file-size-management
http://www.genographia.org/portal/topics/sequence-read-format-srf/sequence-read-format-srf

Asim

------ Forwarded Message
From: Assaf Gordon <gordon(a)cshl.edu>
Date: Wed, 4 Jun 2008 16:26:30 -0700
To: <galaxy-user(a)bx.psu.edu>
Subject: [galaxy-user] Storing compressed data files

Hello,

As users add more and more data files to our galaxy server, disk space becomes a problem... What I'd ultimately like is to store the data files in some compressed manner (at least some of the textual files), how would you suggest to do that ?

A common scenario is:
1. User uploads a big Fastq/solexa file (=> 1.2 GB)
2. FASTQ file converted to FASTA file (=> 0.6 GB)
3. FASTA file trimmed, clipped, stripped, etc. (=> 100 MB)
4. BLAT, Histograms and other reports (=> ~50 MB)

The first three data sets take about 1.9 GB of disk space - and aren't really needed by the user (as he/she is mostly interested in the resulting report files). Since these are textual files, they compress really well.

Currently, I store the FASTQ gzip'ed in galaxy, and my tools know how to read gzip'ed data. There are two shortcomings with this method:
1. datasets (green squares) of gzip'ed files don't display any data in the peek window
2. Other galaxy tools which require FASTQ file as input can't read my file.

Perl has an I/O module (PerlIO::gzip) which makes reading gzipped files transparent to the rest of the program. I think python has something very similar (http://www.python.org/doc/lib/module-gzip.html)

If it's not too much to ask, would it be possible to add support for reading gzip'ed files ? At least in the peek/preview window ?

Comments are welcomed,
Thanks,
Gordon.

------ End of Forwarded Message


galaxy query
by Whiteley, Gareth 05 Jun '08


Hello,

Suppose I choose the following search criteria: Group:all tracks, track:SNP, region: chr5:7922217-7954237, then click INTERSECT, choose Group:all tracks, track:TFBS conserved, and select GTF output format. This example searches for any SNPs that are in the transcription factor binding sites within the genomic region that codes for the MTRR gene, but the output does not tell me that the SNP rs6868871 is found in the TFBS V$58_01 914. I have to work that out for myself by searching the whole table again, this time from a TFBS start point and intersecting with SNPs.

I am trying to use Galaxy to Join two Queries side by side on a specified field. I am trying to relate SNPs to the TFBS they are associated with. For example, SNP rs6868871 is found in TFBS V$58_01 914. However, I cannot seem to get Galaxy to work. I think this is because the SNP site is something like '165878639 - 165878640' and a TFBS site is something like '165878630 - 165878641', and although the positions overlap, Galaxy cannot tell that. Is this the case? Or do you know how I can get around it?

Regards,
Gareth Whiteley

University of Liverpool
Department of Pharmacology and Therapeutics
The Sherrington Buildings
Ashton Street
Liverpool L69 7GE

Tel: 0151 795 4224
E-mail: g.whiteley(a)liverpool.ac.uk
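The difference between joining on an identical field and joining on overlapping coordinates, as described above, can be sketched in a few lines of Python. This is an illustration only, not how Galaxy's join or intersect tools are implemented. The record names snp_example and tfbs_example and the chromosome label are placeholders, the coordinates are the illustrative ones from the message, and treating intervals as closed (both ends included) is an assumption about the coordinate convention.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if closed intervals [start_a, end_a] and [start_b, end_b] share a position."""
    return start_a <= end_b and start_b <= end_a


def join_by_overlap(snps, sites):
    """Pair every SNP with every site on the same chromosome that it overlaps.

    Each record is (name, chrom, start, end). This is a simple nested
    loop, adequate for a sketch but slow for genome-wide tables.
    """
    pairs = []
    for snp_name, snp_chrom, snp_start, snp_end in snps:
        for site_name, site_chrom, site_start, site_end in sites:
            if snp_chrom == site_chrom and overlaps(
                snp_start, snp_end, site_start, site_end
            ):
                pairs.append((snp_name, site_name))
    return pairs


# Placeholder records using the example coordinates from the message.
snps = [("snp_example", "chrN", 165878639, 165878640)]
sites = [("tfbs_example", "chrN", 165878630, 165878641)]

# An equality join on (start, end) finds nothing, because the fields differ.
exact = [(s[0], t[0]) for s in snps for t in sites if s[2:] == t[2:]]
print(exact)  # prints []

# An overlap join relates the SNP to the site that contains it.
print(join_by_overlap(snps, sites))  # prints [('snp_example', 'tfbs_example')]
```

The overlap test reduces to two comparisons: two intervals intersect exactly when each one starts no later than the other ends.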


admin password?
by Davide Cittaro 03 Jun '08


Can anybody use the admin tool to reload modules? I've set the password in the universe_wsgi.ini file but I always get an "Invalid password" error...

thanks
d

/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/
