Hi Tania,

Is the data RNA? Did you filter for properly mapped pairs first ('Filter SAM')? I am guessing not, and that is probably where the error about single-end data is coming from. Unfiltered reads may also be contributing insert size values that skew the mean.

If you plan on using the Tuxedo pipeline, you might be able to skip Picard and just use a sample of the data to obtain this number. The tool authors suggest running TopHat on a sample (a few hundred pairs), as explained in the link below. A mean (and other calculations) can be generated on any tabular column of data using the tools 'Group', 'Compute', or 'Summary Statistics' (use the tool search to find these in the tool panel):
http://tophat.cbcb.umd.edu/faq.shtml#mate_inner_dist
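
If you would rather check the number outside of the tool panel, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not what Picard or TopHat do internally: it assumes a plain-text SAM file, takes the template length from column 9 (TLEN), keeps only properly paired reads, and subtracts the combined read length (2 x 100 bp here). The function names and the READ_LENGTH constant are made up for this example.

```python
import statistics

READ_LENGTH = 100  # assumption: 2x100bp reads, as in the original question


def inner_distances(sam_lines, read_length=READ_LENGTH):
    """Yield one inner distance per properly paired fragment in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])  # column 9: signed template length (TLEN)
        # Keep only reads that are paired (0x1) and mapped in a proper pair (0x2).
        if flag & 0x3 != 0x3:
            continue
        # Drop unmapped reads/mates and secondary or supplementary alignments.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # TLEN is positive for the leftmost mate and negative for the other,
        # so keeping only positive values counts each pair once.
        if tlen <= 0:
            continue
        yield tlen - 2 * read_length


def summarize(sam_lines, read_length=READ_LENGTH):
    """Return (mean, standard deviation) of the inner distances."""
    values = list(inner_distances(sam_lines, read_length))
    if not values:
        raise ValueError("no properly paired reads found")
    return statistics.mean(values), statistics.pstdev(values)


# Example use, with a hypothetical file name:
#     with open("mapped_sample.sam") as handle:
#         mean, sd = summarize(handle)
```

A negative mean here just means that, on average, the fragments are shorter than the two reads combined, so the mates overlap. The two values correspond to what TopHat takes as -r/--mate-inner-dist and --mate-std-dev.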

Hopefully one of these options works for you,

Jen
Galaxy team


On 2/10/14 8:03 AM, Dottorini, Tania wrote:

Hi,

I am trying to determine the mean inner distance between mate pairs, but encountered odd results. Briefly, to calculate the mean inner distance I mapped PE data with 2x100bp read length with Bowtie against the reference transcriptome, and used the Picard Insert Size Metrics tool to calculate the mean insert size. I then subtracted the combined read length (2*100) from the mean insert size, thus obtaining the mean inner distance between mate pairs. In all the cases studied, 4 samples and 4 controls, I have always obtained negative mean inner distance values. In addition, in some cases I had the following error running the Picard tool:

"Unable to find expected pdf file /galaxy/main/jobdir/006/471/.…../InsertSizeHist.pdf

…This always happens if single ended data was provided to this tool"

But in all cases I can confirm I provided paired-end NGS data. For some of the runs that gave me this failure log, I reran them, mapping to the genome instead of the transcriptome. In this case the tool did work, but it always gave me negative distance values.

I would like to know if this is the correct procedure to follow, or if there are other approaches I can use to find these distance values, and whether I can use such negative distance values in TopHat.

Thank you for your help

Tania




-- 
Jennifer Hillman-Jackson
http://galaxyproject.org