line estimation for pileup generation
Hello,
I am curious whether the line estimation shown in the history window for pileup generation is at all accurate. I am using the pileup files to generate expression data from bwa mapping for looking at differential expression, but I am having some trouble understanding the line estimates. For example, for one pileup file, when I cut the reference id column and the number of hits column (columns 1 and 4), the number of lines in the cut file is about 25% that of the pileup file, and for another file it is 5000%. How can the number of lines grow 50x when I am just cutting columns from the file? Shouldn't the line estimate be the same?
Thanks,
Austin
As a first step, please confirm an exact line count for the files. See the "Line/Word/Character count" tool in the Text Manipulation section to do this. If the estimate is significantly off, please share the history with me and I'll take a look to see what happened with those particular datasets.
Thanks!
-Dannon
On Aug 25, 2011, at 6:08 PM, Austin Paul wrote:
> How can the number of lines grow 50x when I am just cutting columns from the file? Shouldn't the line estimate be the same?
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
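For files that are also available outside Galaxy, the exact-count check suggested above can be reproduced locally. The following is only a minimal Python sketch, not the Galaxy tool itself: the helper names `count_lines` and `cut_columns` are hypothetical, and the three-line sample is made-up illustrative data in a tab-separated, pileup-like layout. It counts lines exactly and shows that keeping columns 1 and 4 leaves the line count unchanged.

```python
import io

def count_lines(stream):
    """Return the exact number of lines in a text stream."""
    return sum(1 for _ in stream)

def cut_columns(stream, columns=(1, 4), sep="\t"):
    """Yield each line reduced to the given 1-based columns."""
    for line in stream:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(fields[c - 1] for c in columns) + "\n"

# Made-up sample for illustration only (not real pileup output).
sample = (
    "chr1\t100\tA\t3\t...\tIII\n"
    "chr1\t101\tC\t5\t.,.,.\tIIIII\n"
    "chr2\t7\tG\t1\t.\tI\n"
)

original = count_lines(io.StringIO(sample))
cut = list(cut_columns(io.StringIO(sample)))

# Cutting columns changes the width of each line, never the number of lines.
print(original, len(cut))
```

Because the cut only narrows each line, any difference between the two line figures in the history has to come from the estimate, not from the data.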
Hi Dannon,
Thanks for telling me about that count tool. I had not used it before. So, it seems the line estimates in the history window are a bit screwy. One pileup file I mentioned was estimated at ~4,000,000 lines and the count tool showed 988,000. The other pileup file I mentioned was estimated at ~200,000 lines and the count tool showed 6,382,447. The line totals on the cut files were off as well, but the count tool showed consistent numbers between the pileup files and the cut files, so I feel better. Thanks again.
Austin
On Thu, Aug 25, 2011 at 3:19 PM, Dannon Baker <dannonbaker@me.com> wrote:
> As a first step, please confirm an exact line count for the files.
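To put the figures reported in this reply in proportion, the short Python sketch below divides each history estimate by the exact count. The numbers are the ones quoted in the message (the estimates were given as approximate); the labels and the helper name `estimate_ratio` are hypothetical and used only for illustration.

```python
# (estimated lines from the history panel, exact lines from the count tool)
reported = {
    "pileup file 1": (4_000_000, 988_000),
    "pileup file 2": (200_000, 6_382_447),
}

def estimate_ratio(estimated, actual):
    """Return how many times larger the estimate is than the exact count."""
    return estimated / actual

for name, (estimated, actual) in reported.items():
    ratio = estimate_ratio(estimated, actual)
    print(f"{name}: estimate is {ratio:.2f}x the exact count")
```

One estimate is several times too high and the other is far too low, so the error does not point in a consistent direction.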
Sure, no problem. Those estimates are indeed way off; ideally they're within about 10% of the actual count. Would you mind sharing the history with me at this email address so that I can take a look and figure out where the estimation went wrong?
Thanks!
-Dannon
On 08/25/2011 06:33 PM, Austin Paul wrote:
> One pileup file I mentioned estimated ~4,000,000 lines and the count tool showed 988,000. And the other pileup file I mentioned estimated ~200,000 and the count tool showed 6,382,447.
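The thread does not say how the history estimate is computed. One plausible approach, assumed here purely for illustration and not taken from Galaxy's code, is to read a sample from the start of the file and extrapolate from the average line length. The Python sketch below uses hypothetical names (`estimate_lines`, `sample_bytes`, `within_tolerance`) and a synthetic file to show how such an estimate can fall far outside a 10% tolerance when line lengths at the start of the file differ from the rest.

```python
import os
import tempfile

def estimate_lines(path, sample_bytes=1024):
    """Extrapolate a line count from the first sample_bytes of a file."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    newlines = sample.count(b"\n")
    if newlines == 0 or len(sample) == size:
        # Nothing to extrapolate from, or the sample already covers the file.
        return newlines
    avg_line_len = len(sample) / newlines
    return int(size / avg_line_len)

def exact_lines(path):
    """Count lines by reading the whole file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)

def within_tolerance(estimated, actual, tolerance=0.10):
    """True if the estimate is within the given fraction of the exact count."""
    return abs(estimated - actual) <= tolerance * actual

# Synthetic file: 50 long lines followed by 5000 short ones.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write((b"x" * 199 + b"\n") * 50)
    tmp.write((b"y" * 9 + b"\n") * 5000)
    path = tmp.name

estimated = estimate_lines(path)
actual = exact_lines(path)
os.remove(path)

print(estimated, actual, within_tolerance(estimated, actual))
```

In this synthetic case the sample sees only long lines, so the estimate comes out much lower than the exact count. Pileup lines grow with read depth, so a file whose coverage changes along its length could mislead a sampled estimate in either direction. Whether that explains these particular datasets is not established in the thread.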
participants (2)
- Austin Paul
- Dannon Baker