Hello again,
I am trying to randomly select sequences from an uploaded FASTA file, but only about half of the randomly selected sequences actually contain sequence data (see below); the others contain only the sequence name. This happens even though every sequence in the input file has sequence data (I filtered it to keep only sequences longer than 100 bp).
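To make clear what I expect, here is a minimal Python sketch of record-level sampling, where each header stays paired with its sequence. It only illustrates the intended behaviour and says nothing about how the tool works internally; `read_fasta` and `sample_records` are hypothetical helper names, and the toy records are made up.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) records from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def sample_records(lines, k, seed=None):
    """Return up to k whole records chosen at random, without replacement."""
    records = list(read_fasta(lines))
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


# Toy input (made up): three records, one of them wrapped over two lines.
toy = [">a 1.0", "ACGT", ">b 2.0", "GGCC", "TTAA", ">c 3.0", "NNNN"]
for header, seq in sample_records(toy, 2, seed=0):
    print(">" + header)
    print(seq)
```

Sampling whole records this way can never return a header without its sequence, which is the behaviour I was expecting.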
Any suggestions?
This is what the output looks like:
>scaffold1034 2.1
>scaffold1085 1.7
>scaffold1499 1.2
CCTTTGGATGTCACACATGTGCCATCCCGTAGCATTCTTAAAAGAAGGCTAAGGAGCAATTTGCGATTCCCAGTCTAAGTCAATTTACTGTTCGATTTTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGACCTGATAATACTGTGTACACAATGAGAGCGACTTAATGCTTCATCATATGAAGAACTGTAGGCCATTTTTTCTAATCAAGTTTGTGGCGGATTCATCATAGCTGCTATTGGTGACAATTCTTT
CTAAGGTTGCTAGAAATAGTGATGTGGAACACAAGTGCTGCAGGTCATTGCATGTCTCAATCAGTCGTTTGCTTCTCAAAACACGGGCTGTAGGAAGCGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCGAGTTATTGCCATCCTAATTTATCATTTCGTGCGCGATATATATCGACTTTTTTTCGTCCTGTCTGTGCCTCTCCTGCGAAGGCGCCATTCTAATCCCTGCGCGTGACGGCAGATTGACATGACCTCAAG
CAACCTGAACACCCCTATCCCAAATATACTTGAGTCCCCCCTGCACATCCTCCGCATTACTCATGATAACCTGCCACATTGTTCATGGTAGCCCTTTAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCAGACATTACCTAGGATGATGTTTCTAATCGTAGCAAAAATTCTTGTAAATGACGTTCCAGTTGTTTA
CAACCTAAAATTACACACATTAAAACTGCTGGCTAGAATTTACATTGAAACATTAAGATATATTACAAAATATGGACAAATAAATTCGTGACAAATATATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTGTTTGCGATACAAGGAAGGGCTTACGCAAAAATTTCCCAAGAAACCATGTGCGATGAGAAGCGAAACAGTAACTACAGGATTTCTTACCCATTAATTGCTCATTTCTCTAATCTGCATTTCCGTTGATCAATTT
>scaffold2897 2.0
>scaffold2930 3.0
TCCTAAACGTACATATTTCAAACAAAGATGTTTGAAGCCTTAGCAATATTTAGATGACTTTGCTTTCAAGGGTTCTTTTCGTCACCTTTTGTCATCTGCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTACCAACGTTAAAAGAGTTGATTAGTAAGAGTGATATGCCCCTCGATCAAAAGCATCCCCCGGATATTTCAGCGACAGGGCAGCGGAACTT
CCAGAAGTGTAGAGTTGATTTAGTTTTGTTGCAAGGGCTCCCAGATTAATTAGTAAGATATCAAAGTAAATAATAACATAGTTTTTACTCAAAGCAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATTCTCGAAGCTGTTATTTCATTTTTTCCTATGAATTTTTTTAAGTTTTGAAGTATACTTTCGATTTTTGTAAAGCGGGCTATAATCTCGAGGGAAATATTCTAAAGCTGGGTGAAAAATTATTCCATCTCTTGGAAAATATGATTGAAAGGTTCCGTTCGGCAAAGGGTTATCCCTCTTCGGAGTAGCTCTGTTATGAGATGGTTCAACGTTATGTCATTTTTCATTTTCACTGCAGGAGG
GAAGACGTTTCACTGATATATGCATTGCCCTCTGTCACATCGAATCACTGTATGATAATGCCCACGAGAAAAGATAACCCCATCCCAAACTTTTTATAGATGTAACAAATGGTTGAAGAAATTTGGCTATTCATGCTATTATGTTCATCTCTTTAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGTTCATCTCTTTAAAGCTTTCTTTCAGGGAAAAATTCCCATCCTCCAAAATATTCTCACACCGTACCGGGAGCAACAGGAAGGCACATCGACGTTTTATTTGGCACAATGAAAAAATATCACCAGCAATTTCTATTATAACTTGGAGCCTTGTTCTCTGGATTTATTGAGGC
GTCGGAATTTTTCCTGGATCCACCCCTGGATACTTCATGCAGATCAACTTACCCATGCTTTAGCTGTGAGGAATTTATGAAGATTTTCATTGAAAGACATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCAAGAAGTCAGGATCAAATCTTTCAAAATGCTTGCTGAATGTAAACTTCTCAGGCAAGGGCTGAATAG
GCCACAACACCTTACTGCGACTCCATGCTGTACAACGAATTATTGCTGCATTTAATCATTGTGACCATGATTTAATTCACATCGCACCTCCAAATTTGGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGGTAGCAGATTCAAAGCAATTTGCCAGCTTGCACATGTGATGAAAGGATGTGAAGCGATGGTTTCAGTTTCAACTACTAATTGCTGTAAACAATGCTAATAATTTTTCACAATTACTGCTGCACAGCCACAGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGACTCGTAAGAGAGAGTGCAGTCTCATTTTCAGCGACAAGTCCAAAGAGGCACTTGAGTTTCCCAGGACAAAAAAACACAAAAAACAGAAGATATTCAAAAATGCGAAATGAATTATTTCTTAATTGCTATCCACCAAAGTACCAACATATCAACATTAATGTGCTAGGTCAATCGTTTTTTACATTCCTCAAGGTTATG
ATTGATAAAACTGTAAGGACTCTTCTTATACACAAAGGGCGTTTAATCTCTTTCAAGAGTTACACGACATTAGTTTTCAAGCAAATATAAAAGATTTTCANNNNNNNNNNNNNNNNCGATTTCTCACGGGTGAGAGATAGCCTCTGCACATTTATTAGCATGGGTTTGAAAATCTTTAGCATTTGTTTTAAAATCTATGCCTTATAACTATTGAGAGATGTAAAACGCCCATCGTGTATTGTTTTCTGTGAGGGAAGTCCTCGGGATTTGATCAACGTCTTAGGCCCTTTTTCAGTCATTTTGCAATCAT
>scaffold4235 1.6

While the original file looks OK:

>scaffold2 2.0
CAGATGATTCAAACAAAGAGACTGAAGAAATGACTTCATCCGCTGACATAGAGAAAAAGGGCAATGAGAGTGTACATGCAGACTTAGTTATGCAAATAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCCGAGAGCTTTGCAATGGGTAGTTGCCGTTCATTAACTGATATACTTGCCAAATTTAGTGAATTCCGT
>scaffold7 2.5
CTCAAACTGGTTTGAAATATTTAAAATTTCTCCCCGATCTGAGTTGAACTCGGTGTCTGGTCTAAGCCGTTAGAGTGTTTACATGGAAAATGCAACTCAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAAGGTGTGAGATGCAGTGTCTCGGAAGTCTGATTTAGCCTAGTGTTTTACTTGGCGCTCAATGCATGA
>scaffold5 3.1
TTCTCTTCTCAACCCTCATTACGCAATCAGTAACCTTCTTCTTGGTCAACCCTGGACCATCGATGCAGCCAGTGAACGATTCAAACAAGGTAGTGTATTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCTTATTTACAGCTTTCAAGATGTCCTCTGGCTTAACCTTGAATTCATCTGAGAGAAATCCAAATCCTGTCCCGAAG
>scaffold9 2.0
CTTTTACTTCAGGAGAAAATAACCTTTCAAACATCGTGCATTCTTTCTTACTCATAAGGTATAGATAGCTCTTTGTAATAATTCATACGTTCTCTTTCACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTAGTTTGTCTTATCTTAACAGCTCTTGTTACATAGATATATTTGGGAAGGAGTCGGTCAGTCAAGTTTG
>scaffold10 2.4
GTTGTCTCCTGAAATCATAAATTAGTATCATCATTATCATCATTATCATCATTATTATTTTCAAGGAAATATTTGGTCTAAACATCATTAAGATTTCAACNNNN

Thanks,
Daniel
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniel Sher, PhD
Department of Marine Biology
Leon H. Charney School of Marine Sciences
University of Haifa, Mt. Carmel 31905, Haifa, Israel
Office +972-4-8240731
Lab +972-4-8288961
email: dsher@sci.haifa.ac.il