I am very new to genomic data analysis and I need to get the upstream and downstream sequence of some chromosome regions of the pig genome. I have about 70 BLAT hits from a query of ca. 100 aa. I need to get 7000 nucleotides both upstream and downstream of this 100 aa region.
I have tried to use Get flanks to get the "new" coordinates, but instead of generating a single set of coordinates spanning about 14000 nucleotides, it generates one interval for the upstream region and then another one for the downstream region.
Is there a way of doing what I need using Galaxy?
I would appreciate any help!
Thanks a lot!
All the best,
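For what it's worth, the single interval described above (7000 nt upstream, the hit itself, and 7000 nt downstream) is just simple arithmetic on the hit coordinates. Here is a minimal sketch, assuming BED-style 0-based, half-open coordinates; the function name `expand_interval`, the `chrom_size` parameter, and the example hits are made up for illustration and are not part of any Galaxy tool.

```python
def expand_interval(chrom, start, end, flank=7000, chrom_size=None):
    """Return one interval covering `flank` bases upstream of the hit,
    the hit itself, and `flank` bases downstream.

    Assumes BED-style coordinates (0-based start, exclusive end).
    """
    new_start = max(0, start - flank)      # do not run off the chromosome start
    new_end = end + flank
    if chrom_size is not None:             # optional clamp at the chromosome end
        new_end = min(new_end, chrom_size)
    return chrom, new_start, new_end


# Hypothetical BLAT hits (chrom, start, end); a 100 aa query spans ~300 nt.
hits = [("chr1", 10000, 10300), ("chr2", 2000, 2300)]
expanded = [expand_interval(*hit) for hit in hits]

for chrom, start, end in expanded:
    print(chrom, start, end, sep="\t")
```

Note that the resulting interval is 14000 nt plus the length of the hit, and is shorter wherever a hit lies within 7000 nt of a chromosome end.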
As we are trying to plan for next year's GMOD meeting, we would like to
decide between two venues as soon as possible. To help us decide,
we've put together a simple survey. We are asking for your help in
choosing between:
* San Diego, California in January before or after the Plant and
Animal Genomes meeting
* Cambridge, England in April before or after the International
Society of Biocurators meeting.
Each option has its upsides: Plant and Animal Genomes is a large
meeting attended by several members of the GMOD community, so it would
likely have a fairly high attendance. On the other hand, having a
meeting in Cambridge would make it easier for European members of the
GMOD community to attend. Please share your thoughts with us and take
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
Earlier I tried to upload larger BAM files (3.5GB, 3.4GB and
4GB) to my Galaxy account, but failed. Your
advice was to use FTP upload at http://wiki.g2.bx.psu.edu/FTPUpload.
I followed the screencast on the website and did exactly as it advised. I
used the FileZilla FTP client, uploaded the files to my Galaxy account and executed.
Now the problem is at the execution step. For example, my 3.5GB file is
uploaded correctly, but once I execute, the file I get is 2.5GB. The file seems
to be somehow truncated. Please advise!
I have a SAM file from running BWA-SW and want to extract the unique
alignments (reads that align only once to the genome) from this file. I read in
other posts that I may be able to use the SAM Tools > Filter SAM option to
filter the file on the bitwise flag. However, I could not find out whether I have
to use the default setting of column 2. When I use the option to add flags, there
are different options for paired reads, but my data is single reads. So,
for a SAM file of single-read alignments, how do we extract the unique reads?
Am I missing something? I can also share my history in order to explain my
point if required.
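To make the question concrete: in the SAM format, column 2 is the bitwise FLAG and column 5 is the mapping quality (MAPQ). The sketch below shows one common way to approximate "unique" single-read alignments outside Galaxy: drop unmapped reads (flag bit 0x4) and secondary alignments (flag bit 0x100), then require a minimum MAPQ. Treating MAPQ 0 as "maps equally well to more than one place" is an assumption that depends on the aligner, so please check how your aligner assigns MAPQ before relying on it. The function names and example records are hypothetical.

```python
def is_unique_alignment(sam_line, min_mapq=1):
    """Heuristic test for a uniquely aligned single-end read.

    Assumption: the aligner reports MAPQ 0 for reads that align equally
    well to several locations, so MAPQ >= min_mapq is used as a proxy.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & 0x4:          # read is unmapped
        return False
    if flag & 0x100:        # secondary (not primary) alignment
        return False
    return mapq >= min_mapq


def filter_unique(lines, min_mapq=1):
    """Yield header lines unchanged plus alignments passing the test."""
    for line in lines:
        if line.startswith("@") or is_unique_alignment(line, min_mapq):
            yield line


# Hypothetical example records (tab-separated SAM fields).
seq, qual = "A" * 50, "I" * 50
example = [
    "@SQ\tSN:chr1\tLN:1000",
    f"r1\t0\tchr1\t100\t37\t50M\t*\t0\t0\t{seq}\t{qual}",   # mapped, MAPQ 37
    f"r2\t16\tchr1\t200\t0\t50M\t*\t0\t0\t{seq}\t{qual}",   # mapped, MAPQ 0
    f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",           # unmapped
]
kept = [line.split("\t")[0] for line in filter_unique(example)]
print(kept)
```

The paired-read flag bits (0x1, 0x2, 0x8, 0x20, 0x40, 0x80) are simply not set for single reads, so they can be ignored in this case.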
I want to make an intersection between a few hundred genomic intervals (predicted translocation sites from SVDetect) and low mappability regions in the genome (we are working with mm9 right now).
UCSC has an excellent mappability track that exactly matches our sequencing data (50 bp kmers), but it seems very difficult to get that data into Galaxy. I want a BED file that summarizes intervals of low mappability (i.e. less than 0.5 on the scale used by UCSC). The UCSC Table Browser has a limit of 10M lines, which seems to give just part of chromosome 1. It would be very messy to try to get the whole genome bit by bit using this method and then stitch it back together using some sort of concatenation.
UCSC Help suggests downloading the mappability data for the whole genome as a bigWig formatted file, then converting it to BED. I gave this a try, but I got a 4 GB file with intervals of just one or two base pairs. Again, it would be a lot of work to get back to the nicer BED that I could make with the UCSC tools over smaller genomic regions. It is also super-painful to upload this huge file to Galaxy, and I am not happy about writing my own parsers to filter and smooth this file.
Any other suggestions? Maybe someone else knows where to find a mappability file (for mm9) that has nice intervals in a Galaxy compatible format.
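In case it helps to see how small the "filter and smooth" step can be, here is a minimal sketch that could be run locally before uploading anything to Galaxy. It assumes the bigWig has already been converted to four-column bedGraph text (chrom, start, end, score), for example with UCSC's bigWigToBedGraph utility, and that the lines are sorted by chromosome and start position. The function name, the `max_gap` parameter, and the example scores are made up for illustration.

```python
def low_mappability_intervals(bedgraph_lines, threshold=0.5, max_gap=0):
    """Keep bedGraph intervals with score < threshold and merge neighbours.

    Assumes input sorted by chromosome and start. Intervals on the same
    chromosome separated by at most `max_gap` bases are merged into one.
    Yields (chrom, start, end) tuples suitable for writing as BED.
    """
    current = None
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue                      # skip blank and header lines
        chrom, start, end, score = line.split()[:4]
        start, end, score = int(start), int(end), float(score)
        if score >= threshold:
            continue                      # mappable enough, not of interest
        if current and current[0] == chrom and start - current[2] <= max_gap:
            current[2] = max(current[2], end)   # extend the open interval
        else:
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


# Hypothetical bedGraph lines with one- and two-base intervals.
example = [
    "chr1\t0\t10\t1.0",
    "chr1\t10\t11\t0.25",
    "chr1\t11\t13\t0.33",
    "chr1\t13\t20\t1.0",
    "chr1\t20\t22\t0.1",
    "chr2\t0\t5\t0.5",
    "chr2\t5\t6\t0.2",
]
merged = list(low_mappability_intervals(example))
for chrom, start, end in merged:
    print(chrom, start, end, sep="\t")
```

Because the function is a generator that reads one line at a time, it can be fed an open file handle for a multi-gigabyte bedGraph without loading it into memory, and the merged BED output should be far smaller than the input.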