The Surprising Predictability of Long Runs (2012) [pdf]

12 points, posted 8 hours ago
by alexmolas

2 Comments

fastaguy88

an hour ago

One of the major breakthroughs in bioinformatics was the recognition that local similarity scores (which can be thought of as runs of positive sequence similarity) follow an extreme-value distribution.[0] The logic of that discovery uses almost exactly the same mathematical argument as this paper [1]; indeed, I recognized some of the same equations.

It is difficult to overstate the importance of this discovery for biology: today, the vast majority of protein functional inferences for newly sequenced genomes are based on the statistics of long runs of sequence similarity.

[0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
[1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264
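
For anyone who wants to see the "long runs are predictable" effect directly, here is a minimal simulation sketch. It uses fair coin flips rather than sequence similarity scores, so it is only an analogy to the alignment case, not the statistics BLAST actually uses. The function names, the seed, and the trial counts are my own choices for illustration. The longest run of heads in n flips should sit close to log2(n).

```python
import math
import random
from statistics import mean


def longest_run(flips):
    """Return the length of the longest run of truthy values in flips."""
    best = current = 0
    for flip in flips:
        current = current + 1 if flip else 0
        best = max(best, current)
    return best


def simulate_longest_runs(n, trials, seed=0):
    """Longest run of heads in each of `trials` sequences of n fair flips.

    The seed is fixed only to make the illustration repeatable.
    """
    rng = random.Random(seed)
    return [
        longest_run(rng.random() < 0.5 for _ in range(n))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Compare the simulated mean longest run with log2(n).
    for n in (100, 1000, 10000):
        runs = simulate_longest_runs(n, trials=200)
        print(n, round(mean(runs), 2), round(math.log2(n), 2))
```

Multiplying n by 10 adds only about 3.3 flips to the typical longest run, which is why an unusually long run (or an unusually high local similarity score) stands out so clearly against chance.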

nuancebydefault

3 hours ago

I once saw a chart on some website showing the distribution of flat-tire events. Often one goes 10 years without a flat and then suddenly has 2 or 3 within a single year. Mathematically, the chance of such a clustered pattern is quite high.
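
A rough way to check this is to treat flats as a Poisson process. The sketch below assumes, purely for illustration, an average of one flat every 5 years (0.2 per year) and a 50-year driving lifetime split into independent calendar years. Neither number comes from the chart, and the helper names are made up.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k events when the expected count is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def prob_at_least(k, mu):
    """Probability of k or more events when the expected count is mu."""
    return 1.0 - sum(poisson_pmf(i, mu) for i in range(k))


RATE_PER_YEAR = 0.2   # assumed: one flat every 5 years on average
DRIVING_YEARS = 50    # assumed driving lifetime

# Chance that a given 10-year stretch has no flats at all.
p_quiet_decade = poisson_pmf(0, RATE_PER_YEAR * 10)

# Chance that a given calendar year has 2 or more flats.
p_cluster_year = prob_at_least(2, RATE_PER_YEAR)

# Chance that at least one of the calendar years has 2 or more flats.
p_cluster_ever = 1.0 - (1.0 - p_cluster_year) ** DRIVING_YEARS

if __name__ == "__main__":
    print(f"no flats in a given decade:   {p_quiet_decade:.3f}")
    print(f"2+ flats in a given year:     {p_cluster_year:.3f}")
    print(f"2+ flats in some year of {DRIVING_YEARS}:  {p_cluster_ever:.3f}")
```

Under these assumptions a flat-free decade and a year with a cluster of flats are both unremarkable over a driving lifetime, even though events are completely independent.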