I am trying to come up with a nice workflow/tutorial for using Galaxy to search for transcription factor binding sites (TFBS) on a genome-wide scale using pattern search tools. I want to train my students to think genomically and to use clever tools to leverage their abilities.

Galaxy is absolutely awesome for grabbing the upstream promoter regions for all genes from any organism with a whole genome in UCSC. It is also possible to use the integrated EMBOSS tools such as fuzznuc and dreg to search for a known TFBS (or any other simple nucleotide pattern). However, I can't get past the simple search to a more clever information-based search.

In particular, I have the following workflow in mind:

1. Collect upstream regions for all mouse (or human) genes.
2. Search for a published TF binding site, allowing a single base mismatch, using fuzznuc.
3. Make a multiple alignment of the sequences returned by fuzznuc (not possible in any way that I have been able to find).
4. Make a logo from the alignment to identify informative positions and conserved substitutions (not in Galaxy).
5. Make a PSSM profile, HMM profile, or other smart searching tool from the aligned sequences (not in Galaxy).
6. Search the upstream regions again with this more sensitive pattern search method (not in Galaxy).
7. Make a list of genes targeted by this TFBS.
8. Compare the list of genes to microarray data showing co-regulation of this gene set, or to pathways.

I am frustrated at step 3. Even if I bring the fuzznuc results to my desktop, there is no easy way to extract just the sequences and make a multiple alignment. Many of the 'allowed' optional fuzznuc output formats produce an error or no usable output.

Thanks for any suggestions.

Stuart M. Brown, Ph.D.
Associate Professor
Center for Health Informatics and Bioinformatics
NYU School of Medicine
550 First Ave, NY, NY 10016
stuart.brown@med.nyu.edu
(212)263-7689 FAX (212) 263-8139
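
For illustration, steps 3 to 6 of the workflow could be sketched outside Galaxy roughly as follows, in plain Python. This is only a sketch under stated assumptions: the matched sites have already been extracted from the fuzznuc report as plain strings, and they all have the same length (as they would for a fixed-width pattern with mismatches only), so they form a gapless alignment without a separate alignment step. The function names, the pseudocount of 0.5, the uniform background of 0.25, and the example sites are hypothetical choices for the example and are not part of Galaxy or EMBOSS.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds (base 2) scoring matrix from equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        scores = {}
        for b in BASES:
            # Pseudocounts keep unobserved bases from scoring minus infinity.
            freq = (column.count(b) + pseudocount) / total
            scores[b] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def information_content(pssm, background=0.25):
    """Relative entropy in bits per position (roughly a logo column height)."""
    # The frequency is recovered from the score: freq = background * 2**score.
    return [sum(background * 2 ** col[b] * col[b] for b in BASES) for col in pssm]


def scan(seq, pssm, threshold):
    """Return (offset, window, score) for each window scoring >= threshold."""
    seq = seq.upper()
    width = len(pssm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Invented toy sites standing in for the sequences matched in step 2.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA"]
example_pssm = build_pssm(example_sites)
example_bits = information_content(example_pssm)
example_hits = scan("ccccTGACTCAgggg", example_pssm, threshold=5.0)
```

The per-position values in `example_bits` show which columns are informative (step 4), and `scan` can be run over each upstream region to collect the offsets of high-scoring windows (step 6). The threshold is a free parameter that would need tuning against known sites.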