Re: [galaxy-user] Question from yesterday
Hi David,

Due to variations between genomes, there is often more than one MAF block per region. You can, however, have 1 stitched FASTA block per region:

- Start with your BED file.
- Go to Fetch Alignments and select the Stitch MAF Blocks tool.
- Set Choose Intervals to your BED file.
- Select your desired MAF source from the locally cached alignments or the alignments in your History.
- Select the desired Species.
- Click Execute.

This new file will now have 1 FASTA alignment block per region, with the total number of sequences equaling (# of regions) x (# of species).

Dan

Managadze, David (NIH/NLM/NCBI) [F] wrote:
Dear Dan,
I just wanted to follow up: did you receive the emails with questions that I sent yesterday?
If not, then you can have a look at my questions here:
Question 1: I am trying to get the alignments of hg, panTro2, rheMac2.
- Input data: I have 877 initial, already merged (unique) regions (hg18).
- I do a multiple alignment and get 38,403 blocks.
Now I need to get the alignments, one per region (which should be close to that 877 value, right?).
- I did "filter MAF blocks by species" and got 38,269 blocks.
- I did "Join MAF blocks by Species" and got 5,098 blocks! Pretty far from 877!
Could you please help me to solve this issue? What should I do to decrease the number of blocks to 877? The data can be found in my history item called "MAF, IR, NEW".
Question 2: I want to calculate pairwise distances between these alignments. Is it possible in Galaxy? I can not find it.
Sorry to bother you with my questions, but I cannot find another way (or another person to ask).
David Managadze, PhD
Postdoctoral Fellow NCBI/NIH Bethesda, MD USA
On Dec 7, 2009, at 4:15 PM, Daniel Blankenberg wrote:
I noticed you are not looking at alignments between human and mouse; however, the multiple-blocks-per-region logic still holds true.
Thanks,
Dan
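To illustrate the multiple-blocks-per-region point above, here is a minimal Python sketch. It uses made-up half-open intervals rather than data from this thread, and the function names are hypothetical (they are not part of Galaxy or its MAF tools). It only shows why the number of blocks can exceed the number of regions.

```python
from collections import Counter

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def blocks_per_region(regions, blocks):
    """Count how many alignment blocks overlap each region."""
    counts = Counter()
    for region in regions:
        counts[region] = sum(1 for block in blocks if overlaps(region, block))
    return counts

# Illustrative data only: two regions, four alignment blocks.
regions = [("chr1", 100, 200), ("chr1", 500, 600)]
blocks = [
    ("chr1", 90, 130),
    ("chr1", 130, 170),
    ("chr1", 170, 220),
    ("chr1", 550, 580),
]

counts = blocks_per_region(regions, blocks)
total_blocks = sum(counts.values())
print(counts)        # the first region is covered by 3 blocks, the second by 1
print(total_blocks)  # 4 blocks for only 2 regions
```

Stitching corresponds to collapsing those per-region blocks into a single alignment per region, so the output count goes back to the number of regions.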
Daniel Blankenberg wrote:
Hi David,
Due to rearrangements etc. between the human and mouse genomes, there are often multiple MAF blocks which overlap a single region.
You can use the stitch MAF blocks tool, however, to create one FASTA alignment block per region. This will create 876 FASTA alignment blocks, each with 2 sequences.
As far as 'I try "Stitch MAF blocks given a set of genomic intervals" and I get 6,632! twice more than join! ', my guess is that you used the MAF file in your previous step as the input intervals (using implicit datatype conversion), resulting in twice as many 'sequences' (2 per alignment block) as input intervals (MAF blocks). Be sure that you are using your original BED file as the interval input.
Let us know if this isn't the issue you are experiencing,
Dan
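The count check described in this reply can be written out as a small Python sketch. The function names are hypothetical and only encode the arithmetic from the message: one stitched block per region with one sequence per species, and the suspicion that the MAF file itself was used as the interval input.

```python
def expected_stitched_sequences(n_regions, n_species):
    """Sequences expected after stitching: one per species for each region."""
    return n_regions * n_species

def looks_like_maf_used_as_intervals(n_output_sequences, n_maf_blocks, n_species):
    """True if the output count matches stitching over MAF blocks instead of BED regions."""
    return n_output_sequences == n_maf_blocks * n_species

# Numbers quoted in the thread: 876 regions, 3,316 MAF blocks, 2 species, 6,632 observed.
print(expected_stitched_sequences(876, 2))               # 1752
print(looks_like_maf_used_as_intervals(6632, 3316, 2))   # True
```

A match on the second check does not prove the cause; it only shows that the observed number is consistent with the explanation given above.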
-------- Original Message --------
Subject: Re: [galaxy-bugs] Galaxy tool error report from david.managadze@nih.gov
Date: Mon, 7 Dec 2009 14:44:02 -0500
From: Managadze, David (NIH/NLM/NCBI) [F] <managdav@ncbi.nlm.nih.gov>
To: Nate Coraor <nate@bx.psu.edu>
References: <20091207190211.9D3171989F@coltrane.bx.psu.edu> <4B1D525F.4010804@bx.psu.edu> <13191C8B-2095-4F72-A23C-C6C97369FF1B@nih.gov> <4B1D5572.7060909@bx.psu.edu>
Thanks Nate,
Sorry, can I ask you another question (I cannot find the answer in the forums)?
I am trying to get the pairwise alignments corresponding to my BED table.
- Input data: I have 876 initial, already merged (unique) regions (hg18).
- I get 3,316 corresponding pairwise alignments (hg18, rheMac2).
Now I need to get the alignments one per region (which should be close to that 876 value, right?)
- I try "Join MAF blocks by Species" and get 3,316 blocks
- I try "Stitch MAF blocks given a set of genomic intervals" and I get 6,632! twice more than join!
Am I doing something wrong? I cannot find the mistake. Which tool should I choose, Join or Stitch? My guess is Join, because I already have an alignment, but why does it not decrease the number of alignments at all, and why does it remain almost 4x more than the initial regions?
Please help me to solve this problem (if you have access to my history, it's in the one called "PAF").
Thanks,
David Managadze, PhD
Postdoctoral Fellow NCBI/NIH Bethesda, MD USA
On Dec 7, 2009, at 2:20 PM, Nate Coraor wrote:
Hi David,
Unfortunately there's no way to do this now, although there's a good reason: you can go back and undelete history items, so numbering would become very confused if you later undeleted an item.
--nate
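The reasoning in this reply can be illustrated with a toy model in Python. This is an assumption-laden sketch, not Galaxy's actual implementation: the `History` class and its methods are hypothetical. It shows that if deleted items keep their numbers and can be undeleted, reusing or resetting numbers would create collisions.

```python
class History:
    """Toy history: item numbers only ever increase, and deletion is reversible."""

    def __init__(self):
        self._next_hid = 1
        self.items = {}  # item number -> {"name": str, "deleted": bool}

    def add(self, name):
        hid = self._next_hid
        self._next_hid += 1  # never reused, even if earlier items are deleted
        self.items[hid] = {"name": name, "deleted": False}
        return hid

    def delete(self, hid):
        self.items[hid]["deleted"] = True  # hidden, but the number stays reserved

    def undelete(self, hid):
        self.items[hid]["deleted"] = False

    def visible_hids(self):
        return [hid for hid, item in sorted(self.items.items()) if not item["deleted"]]


history = History()
for name in ("BED regions", "MAF blocks", "Joined blocks"):
    history.add(name)

history.delete(2)
new_hid = history.add("Stitched FASTA")
print(new_hid)                  # 4, not 2
print(history.visible_hids())   # [1, 3, 4]

history.undelete(2)
print(history.visible_hids())   # [1, 2, 3, 4]
```

Had the new item taken number 2 after the deletion, undeleting the original item 2 would leave two items with the same number.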
Managadze, David (NIH/NLM/NCBI) [F] wrote:
Hi,
I have tried, it worked now :)
Can I ask you a quick question?
I am new to Galaxy. Sometimes I delete some of the steps in a history and do something different, but the history number (I guess to remain unique) continues from the next one, not from the one I deleted. So, I can have history items numbered like 1, 3, 6, 9, 11 etc. instead of 1, 2, 3, 4, 5.
Is it possible to "reset" the numbering of the history items so that it does not count deleted items? Not a big problem but still it would be nice to have such a possibility.
Thanks,
David Managadze, PhD
Postdoctoral Fellow NCBI/NIH Bethesda, MD USA
On Dec 7, 2009, at 2:07 PM, Nate Coraor wrote:
> Hi,
>
> A momentary cluster problem caused this error. Please try
> submitting the job again.
>
> --nate
>
> galaxy-bugs@bx.psu.edu wrote:
>
>> GALAXY TOOL ERROR REPORT
>> ------------------------
>>
>> This error report was sent from the Galaxy instance hosted on the
>> server
>> "main.g2.bx.psu.edu"
>> -----------------------------------------------------------------------------
>>
>> This is in reference to dataset id 950377 from history id 307838
>> -----------------------------------------------------------------------------
>>
>> You should be able to view the history containing the related
>> history item
>>
>> 8: Extract Pairwise MAF blocks on data 7
>> by logging in as a Galaxy admin user to the Galaxy instance
>> referenced above
>> and pointing your browser to the following link.
>>
>> main.g2.bx.psu.edu/history/view?id=37b126e5445be2d0
>> -----------------------------------------------------------------------------
>>
>> The user 'david.managadze@nih.gov' provided the following
>> information:
>>
>>
>> -----------------------------------------------------------------------------
>>
>> job id: 731079
>> tool id: Interval2Maf_pairwise1
>> -----------------------------------------------------------------------------
>>
>> job command line:
>> python /galaxy/home/g2main/galaxy_main/tools/maf/interval2maf.py
>> --dbkey=hg18 --chromCol=1 --startCol=2 --endCol=3 --strandCol=6
>> --mafType=PAIRWISE_hg18_rheMac2
>> --interval_file=/galaxy/home/g2main/galaxy_main/database/files/000/950/dataset_950316.dat
>> --output_file=/galaxy/home/g2main/galaxy_main/database/tmp/job_working_directory/731079/galaxy_dataset_950377.dat
>> --indexLocation=/galaxy/home/g2main/galaxy_main/tool-data/maf_pairwise.loc
>>
>> -----------------------------------------------------------------------------
>>
>> job stderr:
>> None
>> -----------------------------------------------------------------------------
>>
>> job stdout:
>> None
>> -----------------------------------------------------------------------------
>>
>> job info:
>> Unable to queue job for execution. Resubmitting the job may
>> succeed.
>> -----------------------------------------------------------------------------
>>
>> job traceback:
>> None
>> -----------------------------------------------------------------------------
>>
>> (This is an automated message).
>> _______________________________________________
>> galaxy-bugs mailing list
>> galaxy-bugs@lists.bx.psu.edu
>> http://lists.bx.psu.edu/listinfo/galaxy-bugs
>>
galaxy-lab mailing list galaxy-lab@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-lab