ViennaRNA uses dynamic programming to find minimum free-energy secondary structures or partition functions of RNA molecules. Furthermore, an inverse folding heuristic is implemented to determine sets of structurally neutral sequences.
Principles Underlying ViennaRNA
A secondary structure on a sequence
The relation
Finding the Minimal Free Energy
We can find the minimum free-energy using the following pseudo code:
for i in range(1, n):
for j in range(i+1, n + 1):
minimal_energy_if_paired(i, j) = min(
# if no base pairs interior to (i,j)
energy_of_hairpin(i, j),
# if there exists exactly one branch to the loop interior to (i,j)
min([energy_of_loop(i,j : p,q) + minimal_energy_if_paired(p, q) for p in range(i + 1, j) for q in range(p + 1, j)])
# for multiple branched loops
min([min_energy_of_multiloop(i+1, k) + min_energy_of_multiloop(k+1, j-1) + cc for k in range(i+2, j-2)])
#cc is the base term where free_energy = cc + ci * (number of interior base pairs) + cu * (unpaired)
)
min_energy_of_all = min(
minimal_energy_if_paired(i,j) + ci, # exactly one interior base pair
min_energy_of_all(i+1, j) + cu, #i+1 is unpaired
min_energy_of_all(i, j-1) + cu, #j-1 is unpaired
min([min_energy_of_multiloop(i, k) + min_energy_of_multiloop(k+1, j) for k in range(i + 1, j)]) #all other cases
)
minimum_free_energy(i,j) = min(
minimal_energy_if_paired(i,j),
[minimum_free_energy(i, k) + minimum_free_energy(k, j) for k in range(i+1, j)],
)
minimum_free_energy = minimum_free_energy(1,n)
What is a multiloop? How does this code work?
Dangling, terminal ends are not considered in this code.
Finding the Partition Function
The partition function
Then the partition function can be found by similar principles.
def maxwell_factor(x, T):
return exp(-x/RT)
for i in range(1, n + 1):
for j in range(i + 1, n + 1):
j = i + d
partition_of_subseq_with_paired(i,j) =
maxwell_factor(energy_of_hairpin(i, j), T) +
sum([maxwell_factor(energy_of_loop(p, q), T) * partition_of_subseq_with_paired(p, q)]) +
sum([partition_of_subseq_with_multiloop_left(i+1, k-1) * partition_of_subseq_with_multiloop_right*(k, j-1) * maxwell_factor(cc, T) for k in range(i+2, j-1)])
partition_of_subseq_with_multiloop_left(i, j) = sum([
(maxwell_factor((k-i)*cu, T) + partiton_of_subseq_with_multiloop_left(i, k-1)) * partition_of_subseq_with_multiloop_right(k, j)
for k in range(i+1, j)
])
partition_of_subseq_with_multiloop_right(i,j) = sum([
partition_of_subseq_with_paired(i,k) * maxwell_factor((j-k)*cu, T)*maxwell_factor(ci, T)
for k in range(i+1, j+1)
])
partition_of_subseq(i, j) = 1 + partition_of_subseq_with_paired(i,j) +
sum([partition_of_subseq(i, p-1) * partition_of_subseq_with_paired(p,q) for p in range(i+1, j) for q in range(p+1, j)])
total_partition = partition_of_subseq(1, n)
Solving Inverse Folding
Inverse folding is used to find sequences that form a predefined structure, or to search for sequences for which the probability of the structure being observed is greater than some threshold
The approach is to modify an initial sequence
To minimize the chance of getting stuck in a local minimum, and the amount of computation in calculating the number of foldings, the authors start from a hairpin (which constitutes a substructure) and finds the corresponding subsequence. The sequence is then elongated by 1bp until a loop is reached.