It’s time for another ROSALIND project update. In this blog entry I will explain the solutions of the last three solved problems. These have the IDs HAMM, PERM and LEXF. One of them was quite easy. The solutions for the other two took much more effort than the previous problems.
Additionally I changed some organizational stuff in the Visual Studio 2012 solution and refactored some unit tests and code.
Evolution is a sequence of mistakes. When a DNA strand gets replicated, there is a possibility for a mutation. To be more precise, a mutation is just a simple mistake during the creation or copying of a nucleic acid in the replicated DNA strand. The most common type of nucleic acid mutation is a point mutation. This replaces one base with another at a single nucleotide.
There is an easy way to detect different nucleotides between two DNA strands, which is the content of the HAMM problem. The Hamming distance between to strings is the number of corresponding symbols that differ in those strings. The implementation is very simple. I’ve just iterate through the symbols of one DNA strand and compare each symbol with the corresponding symbol on the same position of the second DNA strand. The following data is given and the shown result is expected, to solve this problem.
For every difference I increment a counter, which I return after the iteration. This is implemented in the
HammingDistance method of the
Dna class. But after I’ve written the code, ReSharper showed me the hint, that this code can be transformed to an equal LINQ statement. This statement is just a one liner and looks like this
Symbols.Where((t, i) => t != dna.Symbols[i]).Count(). Thats pretty nice! Thanks to ReSharper for this hint.
Another sort of mutation are the larger genome rearrangements. They move around huge blocks of DNA and have the power to create and differentiate entire species. The background for the PERM problem is to calculate the total number of permutations and a list of all such permutations for a given positive integer. Consider the following input and output data for an example.
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
The first line of the result is the total number of permutations. The list should contain all calculated permutations. I’ve solved this problem with a Heap Permutation. To be exact this is an implementation for an B-heap (binary heap). The following Gist will show my implementation. The Permute method is an extension method for the
IList<T> interface. So you can call this method on every collection that implements
IList<T> and directly calculate the permutations.
[gist id=5052484 file=BHeap.cs]
The associated unit test shows, that this implementation calculates the correct results.
The LEXF problem is some sort of subproblem when a collection of genetic strings needs to be cataloged. To solve this task, it is necessary to calculate all strings of a particular length, that can be formed from an given alphabet. This strings have to be ordered lexicographically.
The following input and output data demonstrates this with an example.
T A G C
The first input line is the given alphabet. The second line is and positive integer. This number determines how long the output combinations should be. The result in this example shows, that every combination for two of those symbols is calculated. This refers to a combination problem with repetition, because the symbols can be used multiple times.
The following Gist shows my solution for this problem. Again I’ve implemented an extension method, but this time for the
IEnumerable<char> interface with no generic type. Maybe I’ll implement a more generic solution but for this ROSALIND task the implementation was quite good.
[gist id=5052611 file=CombineWithRepetition.cs]
Changes in the project
Parallel to this three solutions, I’ve migrated from MSTest to NUnit as test driver for the unit test. The reasons are, that NUnit is much faster, have a very nice fluent syntax for the assertions and I’ve got a problem with the comparison of two nested Lists with MSTest. Under NUnit, these tests worked absolutely well. I think I’ll write another blog entry for this problem. The fact is, that NUnit is very fast and due to the fact, that I’m using ReSharper, there was no real problem in migrating or in the change of my workflow.
Additionally I’ve change the ROSALIND project page on my blog. The ROSALIND team provides more information about the status of each user so that I’ve decided to copy the information and images on the project page to show my current status and progress.
It was quite fun to solve the next three problems of the ROSALIND project. On the other side I need much more effort to solve these problems. The combinational tasks are more difficult than the first simple problems. But I think this is the beginning and there a much more much harder problems to solve. And I’m looking forward to solve them :).