teekert
19 hours ago
We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
Yes, it requires chopping the genome into small(er) pieces (than with Nanopore sequencing) and then reconstructing the genome based on a reference (and this has its issues). But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
Nanopore devices are truly cool, small, and comparatively cheap though, and you can compensate for the error rate by just sequencing everything multiple times. I'm not too familiar with the economics of that approach.
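To sketch why multiple passes help (a toy majority vote over already-aligned reads, not ONT's real consensus pipeline; it only works if the errors at a given position are independent):

    from collections import Counter

    def consensus(reads):
        # Majority vote at each position across repeated reads of the same region.
        return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    # Three noisy reads of the same toy sequence, each with one independent error.
    reads = ["ACGTAAGC", "ACCTTAGC", "ACGTTAGA"]
    print(consensus(reads))  # ACGTTAGC -- isolated errors get outvoted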
With SBS technology you could probably sequence your whole genome 30 times (a normal “coverage”) for below 1,000 €/$ with a reputable company. I've seen $180, but I'm not sure I'd trust that.
the__alchemist
2 hours ago
I guess this depends on the application. For whole human genomes? Not the nanopore era. For plasmids? Absolutely.
I'm a nobody, and I can drop a tube into a box at a local university and get the results emailed to me by the next morning for $15 USD. This is thanks to a streamlined nanopore-based workflow.
Metacelsus
19 hours ago
> you can compensate for the error rate by just sequencing everything multiple times.
Usually, but sometimes the errors are correlated.
Overall I agree, short read sequencing is a lot more cost effective. Doing an Illumina whole genome sequence for cell line quality control (at my startup) costs $260 in total.
jefftk
5 hours ago
> We are (still) firmly in the sequencing by synthesis era.
It really depends what your goals are. At the NAO we use Illumina with their biggest flow cell (25B) for wastewater because the things we're looking for (ex: respiratory viruses) are a small fraction of the total nucleic acids and we need the lowest cost per base pair. But when we sequence nasal swabs these viruses are a much higher fraction, and the longer reads and lower cost per run of Nanopore make it a better fit.
bonsai_spool
19 hours ago
> But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
There is no reason for Nanopore to supplant sequencing-by-synthesis for short reads - that's largely solved and getting cheaper all the while.
The future clinical utility will be in medium- and large-scale variation. We don't understand this in the clinical setting nearly as well as we understand SNPs. So Nanopore is being used in the research setting and to diagnose individuals with very rare genetic disorders.
(edit)
> We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
I also strongly disagree.
SBS is very reliable, but being the most common technology doesn't by itself define the era (if Toyota is the most popular car, does that mean we're in the Toyota internal combustion era? Or can Waymo still matter despite its small footprint?).
Novelty in sequencing is coming from ML approaches, RNA-DNA analysis, and combining long- and short-read technologies.
teekert
18 hours ago
I agree with you. Long reads lead to new insights and, over time, to better diagnoses by providing a better understanding of large(r) scale aberrations, and as the tech gets better it will be able to do so more easily. But it's really not there yet. It's mostly research, and I get the feeling it's somehow not improving as much as hoped.
celltalk
9 hours ago
This is wrong; a lot of diagnostic labs are actually going for nanopore sequencing since its prep is overall cheaper than the alternatives. Also, the sensitivity for the relevant regions usually matches qPCR, and it can give you more information, such as methylation, on top of that.
A recent paper on classifying acute leukemia via nanopore: https://www.nature.com/articles/s41588-025-02321-z/figures/8
The timelines are exaggerated, but it still works, and that's what matters in diagnostics.
BobbyTables2
16 hours ago
I’ve always wondered how the reconstruction works.
It would be difficult to break a modest program into basic blocks and then reconstruct it. Same with paragraphs in a book.
How does this work with DNA?
__MatrixMan__
6 hours ago
You align it to a reference genome.
It's like you have an intact 6th edition of a textbook, and several copies of the 7th edition sorted randomly with no page numbers. Programs like BLAST will build an index based on the contents of 6; each page of 7 can then be compared against the index, and you'll learn that a given page of 7 aligns best at character 123456 of 6, or whatever.
Do that for each page in your pile and you get a chart where the X axis is the character index of 6 and the Y axis is the number of pages of 7 that aligned there. The peaks and valleys in that graph tell you about the inductive strength of your assumption that a given read is aligned correctly to the reference genome (plus you score each alignment based on mismatches, insertions and gaps).
So if many of the same pages were chosen for a given locus, yet the sequence differs, then you have reason to trust that there's an authentic difference between your sample and the reference in that location.
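In toy code, that index-and-pileup step looks roughly like this -- not BLAST itself; the reference, reads, and K=5 here are made up, and mismatch/gap scoring is skipped:

    from collections import defaultdict, Counter

    K = 5  # toy seed length; real aligners use longer seeds

    def build_index(reference):
        # Map every K-mer of the reference ("6th edition") to the positions where it occurs.
        index = defaultdict(list)
        for i in range(len(reference) - K + 1):
            index[reference[i:i + K]].append(i)
        return index

    def place_read(read, index):
        # Each seed hit votes for an alignment start (hit position minus seed offset in the read).
        votes = Counter()
        for offset in range(len(read) - K + 1):
            for hit in index.get(read[offset:offset + K], []):
                votes[hit - offset] += 1
        return votes.most_common(1)[0][0] if votes else None

    reference = "the quick brown fox jumps over the lazy dog"
    reads = ["quick brown", "brown fox jum", "over the lazy", "the lazy dog"]

    index = build_index(reference)
    coverage = [0] * len(reference)
    for read in reads:
        start = place_read(read, index)
        if start is not None:
            for i in range(max(start, 0), min(start + len(read), len(reference))):
                coverage[i] += 1
    print(coverage)  # per-position depth: the chart of how many "pages" aligned where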
There are a lot of chemical tricks you can do to induce meaningful non-uniformity in this graph. See ChIP-seq for instance, where peaks show where a targeted protein or histone mark (e.g. a methylation mark) was bound, which typically corresponds with a gene that was enabled for transcription when the sample was taken.
If you don't have a reference genome then you can run the sample on a gel to separate the sequences by length, which will group them by chromosome. From there you've got a much more computationally challenging problem, but as long as you can ensure that the DNA is cut at random locations before reads are taken, you can use overlaps to figure out the sequence, because unlike the textbook page example, the fragment boundaries are not gonna line up (but the chromosome ends are):
Mary had a little
was white as snow
lamb whose fleece was
Mary had
had a little lamb
a little lamb
was white
white as snow
So you can find the start and end based on where no overlaps occur (nothing ever comes before Mary or after snow), and then you can build the rest of the sequence based on overlaps.

If you're working with circular chromosomes (bacteria and some viruses) you can't reason from the ends, but as long as you have enough data there's still gonna be just one way to make a loop out of your reads. (Imagine the above example, but with the song that never ends. You could still manage to build a loop out of it despite not having an end to work from.)
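Here's that overlap reasoning as toy code -- naive greedy merging, not what real assemblers do, and the min_overlap=3 cutoff is arbitrary:

    def overlap(a, b):
        # Length of the longest suffix of a that equals a prefix of b.
        for length in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:length]):
                return length
        return 0

    def greedy_assemble(fragments, min_overlap=3):
        # Repeatedly merge the pair of fragments with the largest overlap.
        frags = list(fragments)
        while len(frags) > 1:
            o, i, j = max(
                ((overlap(a, b), i, j) for i, a in enumerate(frags)
                 for j, b in enumerate(frags) if i != j),
                key=lambda t: t[0],
            )
            if o < min_overlap:
                break  # nothing left that overlaps confidently
            merged = frags[i] + frags[j][o:]
            frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
        return frags

    fragments = [
        "Mary had a little",
        "was white as snow",
        "lamb whose fleece was",
        "Mary had",
        "had a little lamb",
        "a little lamb",
        "was white",
        "white as snow",
    ]
    print(greedy_assemble(fragments))
    # ['Mary had a little lamb whose fleece was white as snow']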
nextaccountic
an hour ago
If you broke a string into overlapping blocks, you could easily reconstruct it. The key here is that the blocks form a sliding window over the string.
If the blocks were non-overlapping then yeah, the problem is much harder, akin to fitting together the pieces of a puzzle. I bet a language model could still do it, though.
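The sliding-window case stays easy even if the blocks arrive shuffled, as long as each (k-1)-character overlap is unique. A toy sketch (k=6, no read errors):

    import random

    def blocks(s, k):
        # Every overlapping window of length k; consecutive windows share k-1 characters.
        return [s[i:i + k] for i in range(len(s) - k + 1)]

    def reconstruct(windows):
        # Rebuild the string, assuming every (k-1)-character overlap is unique.
        k = len(windows[0])
        by_prefix = {w[:k - 1]: w for w in windows}
        suffixes = {w[1:] for w in windows}
        # The first window is the only one whose prefix is not another window's suffix.
        out = next(w for w in windows if w[:k - 1] not in suffixes)
        for _ in range(len(windows) - 1):
            out += by_prefix[out[-(k - 1):]][-1]  # each step adds exactly one character
        return out

    original = "sliding windows make this easy"
    shuffled = blocks(original, 6)
    random.shuffle(shuffled)
    assert reconstruct(shuffled) == original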
jakobnissen
10 hours ago
There are two ways: assembly by mapping, and de novo assembly.
If you already have a human genome file, you can take each DNA piece and map it to its closest match in the genome. If you can cover the whole genome this way, you are done.
The alternative is to exploit overlaps between DNA fragments. If two 1000 bp pieces overlap by 900 basepairs, that's probably because they come from two 1000 bp regions of your genome that overlap by 900 basepairs. You can then merge the pieces. By iteratively merging millions of fragments you can reconstruct the original genome.
Both these approaches are surprisingly and delightfully deep computational problems that have been researched for decades.
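The merge step in the second approach is simple in the error-free case -- a toy sketch with a made-up 1100 bp region and an arbitrary 50 bp minimum overlap (real assemblers also have to handle sequencing errors and repeats):

    import random

    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(1100))  # toy region

    # Two 1000 bp fragments drawn from overlapping positions (they share 900 bp).
    frag_a = genome[0:1000]
    frag_b = genome[100:1100]

    def merge_if_overlapping(a, b, min_overlap=50):
        # Merge a and b if a suffix of a matches a prefix of b for at least min_overlap bases.
        for length in range(min(len(a), len(b)), min_overlap - 1, -1):
            if a.endswith(b[:length]):
                return a + b[length:]
        return None

    merged = merge_if_overlapping(frag_a, frag_b)
    assert merged == genome  # the 900 bp overlap stitches the two pieces back together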
vintermann
12 hours ago
They exploit the fact that so much of our DNA is the same. They basically have the book with no typos, or rather with only the typos they've decided to call canonical.
So given a short sentence excerpt, even with a few errors thrown in, partial string matching is usually able to figure out where in the book it was likely from. Sometimes there are multiple possibilities, but then you can look at overlaps and count how many times a particular variant appears in one context vs. another.
One problem is that DNA contains a lot of copies and repetitive stretches, as if the book had "all work and no play makes Jack a dull boy" repeated end to end for a couple of pages. Then it can be hard to place where a variant actually is. Longer reads help with this.
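A crude illustration of both points -- simple mismatch counting, nothing like a real aligner, with made-up sentences standing in for the reference:

    def best_placements(read, reference):
        # Count mismatches at every possible position; return the best score and where it occurs.
        scores = []
        for start in range(len(reference) - len(read) + 1):
            window = reference[start:start + len(read)]
            scores.append((sum(a != b for a, b in zip(read, window)), start))
        best = min(scores)[0]
        return best, [start for mismatches, start in scores if mismatches == best]

    unique_ref = "the cat sat on the mat while the dog slept by the door"
    repeat_ref = "all work and no play " * 3

    print(best_placements("thw dog", unique_ref))   # lands on "the dog" despite the typo
    print(best_placements("work and", repeat_ref))  # several positions tie: the repeat makes placement ambiguous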
bonsai_spool
16 hours ago
This is very easily googled. There are new algorithmic advances for new kinds of sequencing data, but this is the key idea (it dates from the 70s).
Danjoe4
14 hours ago
Nanopore is good for hybrid sequencing: you can align the higher-quality Illumina reads against its longer contiguous reads.
Onavo
18 hours ago
You can get it pretty damn cheap if you are willing to send your biological data overseas. Nebula Genomics and a lot of other biotechs do this by essentially outsourcing to China. There's no particular technology secret, just cheaper labor and materials.
vintermann
11 hours ago
Can you trust it, though? It'd be trivially easy to do a 1x read, maybe 2x, and then fake the other 28x. And it'd be hard to catch someone doing this without getting another 30x read from someone you trust. There's famously a lot of cheating in medical research; it would be odd if everyone stopped the moment they left academia (there have been scandals with forensic labs cheating too, now that I think about it).
gillesjacobs
9 hours ago
They save money through cheap labour and by batching large quantities for analysis. For the consumer this means long wait times and potentially expired DNA samples.
I tried two samples with Nebula and waited 11 months in total. Both samples failed. I got a refund on the service but spent $50 in postage for the sample kit.