Exercises
Print Strings
Write a series of print statements that print the following (include a blank line between each answer):
- Post hoc ergo propter hoc
- What’s up with scientists using all of this snooty latin?
- 'atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc'. Do this using the * operator to make 15 copies of 'atgc'.
- Darwin’s “On the origin of species” is a seminal work in biology.
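The building blocks for this exercise can be sketched on some made-up strings. This is only an illustration, assuming Python 3; the example text and the variable names `repeated` and `mixed` are hypothetical and not part of the exercise.

```python
# print() writes its argument followed by a newline.
print("Carpe diem")

# Calling print() with no arguments produces a blank line.
print()

# The * operator repeats a string: here, 3 copies of 'ab'.
repeated = "ab" * 3
print(repeated)
print()

# A string containing both an apostrophe and double quotes:
# escape whichever quote character also delimits the string.
mixed = 'It\'s called "Python".'
print(mixed)
```

The same pattern (a print call, then an empty print call) can be repeated for each answer.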
string Functions
Use functions from the string module or from base Python to print the following strings.
- 'species' in all capital letters
- 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
- " Thank goodness it's Friday" without the leading white space (i.e., without the spaces before Thank)
- The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
- Print the length of this DNA sequence: 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'
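A function-style version of these operations might look like the sketch below, run on a short made-up string rather than the exercise data. It assumes Python 3, where the string module no longer provides functions such as upper or replace, so the calls go through the built-in `str` type and `len()` instead. The string `seq` and the other variable names are hypothetical.

```python
# Hypothetical example string with two leading spaces.
seq = "  acgtta"

# Function-style calls: the string is passed in as the first argument.
upper_seq = str.upper(seq)            # all capital letters
swapped = str.replace(seq, "t", "T")  # every 't' replaced with 'T'
trimmed = str.lstrip(seq)             # leading white space removed
n_t = str.count(seq, "t")             # number of 't' characters
length = len(trimmed)                 # number of characters left after trimming

print(upper_seq)
print(swapped)
print(trimmed)
print(n_t)
print(length)
```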
string Methods
Use string methods to print the following strings. Remember that methods work by adding the function to the end of the object name using a ., like
    mystring = 'Hello World'
    print(mystring.lower())
- 'species' in all capital letters
- 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
- " Thank goodness it's Friday" without the leading white space (i.e., without the spaces before "Thank")
- The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
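The method syntax can be tried out on a small made-up string before tackling the list above. This is a minimal sketch assuming Python 3; the string and variable names are hypothetical.

```python
# Hypothetical example string with two leading spaces.
mystring = "  Hello World"

# Each method is attached to the string with a dot and returns a new value;
# the original string is never modified.
shouted = mystring.upper()             # '  HELLO WORLD'
zeroed = mystring.replace("o", "0")    # '  Hell0 W0rld'
trimmed = mystring.lstrip()            # 'Hello World'
n_l = mystring.count("l")              # 3

print(shouted)
print(zeroed)
print(trimmed)
print(n_l)
```

Because strings are immutable, `mystring` still holds its original value after all four calls.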
Long Strings
For the DNA sequence below determine the following properties and print them to the screen (you can cut and paste the following into your code, it’s a lot longer than you can see on the screen, but just select the whole thing and when you paste it into Python you’ll see what it looks like):
dna='ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct'
- How many occurrences of 'gagg' occur in the sequence?
- What is the starting position of the first occurrence of 'atta'? Report the actual base pair position as a human would understand it.
- How long is the sequence?
- What is the GC content of the sequence? The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result as “The GC content of this sequence is XX.XX%” where XX.XX is the actual GC content. Do this using a formatted string.
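The four questions map onto a handful of string operations, sketched here on a short made-up sequence rather than the full one. The sequence `short_dna` and the variable names are hypothetical. The sketch assumes lowercase bases and uses `str.format` for the formatted string.

```python
# Hypothetical 16-base sequence, used only to illustrate the operations.
short_dna = "atgcgaggattagagg"

# count() reports non-overlapping occurrences of a substring.
n_gagg = short_dna.count("gagg")

# find() returns a 0-based index (or -1 if the substring is absent),
# so add 1 to report the position the way a person would count it.
first_atta = short_dna.find("atta") + 1

seq_length = len(short_dna)

# GC content: G and C bases as a percentage of all bases.
# 100.0 keeps the division in floating point.
gc_percent = 100.0 * (short_dna.count("g") + short_dna.count("c")) / seq_length

# {:.2f} formats the number with two digits after the decimal point.
message = "The GC content of this sequence is {:.2f}%".format(gc_percent)

print(n_gagg)
print(first_atta)
print(seq_length)
print(message)
```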
GC Content 1
A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype = str to tell loadtxt() that the data is composed of strings.
Calculate the GC content of each sequence. The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result for each sequence as “The GC content of the sequence is XX.XX%” where XX.XX is the actual GC content.
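One way the loading and looping could fit together is sketched below. Since the real file is not reproduced here, an in-memory `StringIO` object holding three made-up sequences stands in for it; in practice the file name would be passed to `numpy.loadtxt()` instead. The names `fake_file`, `sequences`, and `gc_values` are hypothetical, and the sketch assumes Python 3 with a reasonably recent NumPy.

```python
from io import StringIO

import numpy as np

# Stand-in for the downloaded file: one made-up sequence per line.
fake_file = StringIO("atgc\nggcc\nattta\n")

# dtype=str tells loadtxt() to keep each line as a string.
# atleast_1d() keeps the loop working even if the file has a single line.
sequences = np.atleast_1d(np.loadtxt(fake_file, dtype=str))

gc_values = []
for seq in sequences:
    seq = seq.lower()  # treat upper- and lowercase bases alike
    gc = 100.0 * (seq.count("g") + seq.count("c")) / len(seq)
    gc_values.append(gc)
    print("The GC content of the sequence is {:.2f}%".format(gc))
```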
Split Strings
You have a data file with a single taxonomy column in it. This column contains the family, genus, and species for a single taxonomic group. You need to figure out how to split that information into separate values for family, genus, and species. To solve the basic problem take a single example string, 'Ornithorhynchidae Ornithorhynchus anatinus', split it into three separate strings using a Python command, and then print the family, genus, and species, each on a separate line.
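The splitting step can be sketched with the `split()` string method on a different taxonomy string. The example string is hypothetical and was chosen only to show the pattern; it assumes the three names are separated by white space and contain no spaces themselves.

```python
# Hypothetical example string: family, genus, species separated by spaces.
taxonomy = "Hominidae Homo sapiens"

# split() with no arguments breaks the string on white space and
# returns a list; unpacking assigns the three pieces to three names.
family, genus, species = taxonomy.split()

print(family)
print(genus)
print(species)
```

Unpacking into exactly three names raises a ValueError if the string does not contain exactly three pieces, which helps catch malformed rows.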