Talk:DNA digital data storage


Information density[edit]

I have definitely seen a chart in a peer-reviewed paper showing that DNA computers will have even higher information densities than quantum computers. This means that for most applications (perhaps not all applications, but most) they will make quantum computing a moot point. The fact that DNA computers will still have classical bits rather than qubits is also an advantage in itself, since one needn't relearn how to program from scratch. At any rate, I am trying to track down that chart again. The Mysterious El Willstro (talk) 06:03, 2 July 2013 (UTC)[reply]

It sounds like you could create a new article about this if you have the inclination to do so. Also, if you want to add a section in this article about DNA computers, I don't see a problem, as long as you provide a couple of reliable sources. ---- Steve Quinn (talk) 06:07, 2 July 2013 (UTC)[reply]
Well, there may be an article about this already. It's hard to say. See DNA computing. — Preceding unsigned comment added by Steve Quinn (talkcontribs) 2 July 2013
Update: Here is the chart I was talking about in my original comment. Link: http://www.kurzweilai.net/how-to-store-a-book-in-dna. The Mysterious El Willstro (talk) 06:23, 2 July 2013 (UTC)[reply]
Keep in mind that this is all very speculative. Information density is not the only important attribute of computation; DNA computing loses on speed, because of all the lab work that needs to be done to make the DNA and read it out. Quantum computing also has the advantage that it can run certain algorithms that classical computers cannot. DNA computing does have advantages in that it can interact with the molecular and biological worlds, and it could open up lots of new applications there, but it is unlikely to render silicon or quantum computing moot. Antony–22 (talkcontribs) 01:20, 3 July 2013 (UTC)[reply]
Thanks Antony. ---- Steve Quinn (talk) 05:53, 3 July 2013 (UTC)[reply]
For now, Antony, yes, but from what I've read it should be possible in principle to make a DNA computer much faster than the early experiments in DNA computing as we know them now. That is, once we figure out how to wire a molecule directly into microcircuitry and use that interface to make changes in the molecule. Quantum computing has some pretty serious disadvantages of its own (the much greater learning curve in programming, for one, but a few other things too), enough that it will most likely never be useful for consumer devices (according to a friend of mine studying quantum computing at UC Berkeley). Security codes and passwords for confidential government or corporate materials, on the other hand, are a place where quantum computing might have some serious uses. The Mysterious El Willstro (talk) 22:51, 6 July 2013 (UTC)[reply]

Retrieval issues[edit]

It's not that hard to pour information into DNA without limit, but getting it back becomes problematic when you have meaningful amounts of data. I would love to see this article expanded to address some of these issues (I don't think I'm qualified to do so, however).

Inside a cell, nature generally relies on random diffusion to "search" for things. Inside a bacterium, for example, it takes a tiny fraction of a second to find a match between a stretch of template DNA (your search query) and a matching location in the bacterium's genome. Not only is it fast, but it requires essentially zero energy! But the time for random diffusion goes up rapidly as the volume to be searched increases. You could probably fit most of humanity's information into a liter of DNA, but you wouldn't be able to find a specific item in there before the heat death of the universe.

So for DNA storage to be viable, you need to keep lots and lots of relatively small storage "cells" (drastically reducing the effective storage density) *and* you have to know in advance which cell the information you want is stored in (unless you have machinery inside each storage cell that can accept an external search request and synthesize the search primer in place, broadcasting every search to every storage cell in the whole system). So you probably need a complete external index of the data in some more accessible storage technology, and you can't have whole-text search without making the DNA storage redundant. The practical applications of DNA data storage may therefore turn out to be much more limited than the raw information density of a flask full of DNA would suggest. 2601:246:4B01:81E9:C13E:B87C:CBBE:1B9B (talk) 22:04, 5 July 2018 (UTC)[reply]
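(Editorial aside: the scaling argument in the comment above can be made concrete with a toy calculation. This is only a sketch: it uses a single-searcher, Smoluchowski-style diffusion-limited estimate, T ≈ V / (4πDa), and all parameter values are assumed order-of-magnitude figures, not measurements. Real cells use many searcher copies and facilitated diffusion, so in-cell search is far faster than this single-searcher model predicts.)

```python
# Back-of-the-envelope estimate of diffusion-limited search time.
# Toy model: the mean time for ONE searcher diffusing in a well-mixed
# volume V to hit a small target of radius a is roughly
#     T ≈ V / (4 * pi * D * a)
# All parameter defaults below are ASSUMED order-of-magnitude values:
#   D ~ 1e-12 m^2/s  (typical diffusion coefficient of a biomolecule)
#   a ~ 1e-9 m       (nanometer-scale binding target)
import math

def search_time_seconds(volume_m3, diffusion_m2_per_s=1e-12, target_radius_m=1e-9):
    """Rough mean time for one diffusing searcher to find one target."""
    return volume_m3 / (4 * math.pi * diffusion_m2_per_s * target_radius_m)

bacterium = search_time_seconds(1e-18)  # ~1 cubic micron of cytoplasm
flask = search_time_seconds(1e-3)       # one liter of DNA solution

print(f"bacterium-sized volume: ~{bacterium:.0e} s")
print(f"one liter: ~{flask:.0e} s (~{flask / 3.15e7:.0e} years)")
```

In this simple model the search time grows in direct proportion to the volume, and even that linear growth is catastrophic: scaling from a femtoliter-scale cell to a liter multiplies the time by fifteen orders of magnitude, landing in the billions of years, which is the point the comment makes about partitioning DNA storage into many small cells.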

updates to state of the art[edit]

Thanks for being so diligent in maintaining the DNA data storage entry. I am writing because it can be improved in several ways:

  • Results in 2015/2016 by UIUC and UW/Microsoft demonstrated the ability to perform random access on data stored in DNA. These results were published in major peer reviewed venue and are cited frequently in recent advances.
  • In 2016, Microsoft/UW encoded an HD video from the band OK Go. It was part of the 200 MB world record that was announced and later published in Nature Biotech in March 2018 (a long editing lag!), the premier scientific venue for biotech-related results.
  • There have been multiple major public research programs (DARPA Molecular Informatics and IARPA Molecular Information Storage Technologies) in the last year that could be featured, to show that there is building momentum.

The reason I care is that people who want to know more go to Wikipedia, but as it stands the article does not point to the really fast developments in this field.

I am happy to help by offering more information and links if you want. I tried editing the page, but Jytdog keeps undoing it :), so it might be better for us to agree on how to improve the article.

Zambraia (talk) 00:47, 1 September 2018 (UTC)[reply]

Thanks for your note! Wikipedia is actually a lagging indicator of notability, and because we are focused on transmitting accepted knowledge to readers (not "news") we often don't have the hottest news. There are also always holes.
Content here should really be driven not by the research papers themselves (what we consider "primary sources" here in Wikipedia) but from things like literature reviews or say book chapters, where experts in the field summarize what is important in the field.
The main section has a tag for "primary sources" - we need fewer of them, not more. We need to find reviews or book chapters. I've been wanting to do that, but have not had time. If you are aware of such sources, would you please cite them?
Thanks again Jytdog (talk) 01:09, 1 September 2018 (UTC)[reply]

Thanks. Well, the Nature Biotech paper has a significant literature-review component, since it makes the case for how the state of the art is being advanced in a significant way. For a journal of that caliber, the level of scrutiny and peer review is very high. So that is why I’m still puzzled by why referencing it and summarizing it in the Wikipedia article is considered inappropriate...

I am an active researcher in this area and am writing a broader review article for a scientific journal, and that is why I decided to update the Wikipedia article. I am happy to help, but having edits bluntly undone makes me feel I’m wasting time... :/.

Maybe I just don’t understand how Wikipedia editing works. You seem to “own” this entry? So all edits need to be negotiated with you? I'm not trying to be confrontational here, just trying to understand, since edits that are genuinely trying to make the article better seem to be shot down. — Preceding unsigned comment added by Zambraia (talkcontribs) 03:13, 1 September 2018 (UTC)[reply]

@Zambraia: Sorry that Wikipedia editors can be brusque sometimes! The problem is that Wikipedia articles are written with a different style than other kinds of academic writing. Think of it as a big pyramid. Original research articles are at the base as "primary" sources. "Secondary" sources in the middle include review articles, book chapters, perspectives pieces you find in Science/Nature, and articles in technical magazines like Chemical & Engineering News; these cite and comment on the primary sources. Wikipedia, as an encyclopedia, is a "tertiary" source at the top, based on citing secondary sources.
Citing secondary sources like reviews instead of primary ones makes for a better article, because secondary sources provide context and show how the original studies relate to each other. Articles based only on primary sources tend to turn into lists of "this article said this, that article said that", with no indication of whether these are the most important results or how other research builds on or refutes them. Let me know if that makes sense to you. Antony–22 (talkcontribs) 04:15, 1 September 2018 (UTC)[reply]
@Antony-22: Thanks for explaining, and it does make some sense. The reluctance to make the article better while waiting for a secondary source still surprises me, though. The edits I had proposed weren't just laundry lists of primary sources. They were additions to the narrative. One was about scaling the results in two primary sources already in the article (Goldman in Nature and Church in Science) by *several* orders of magnitude! And one was on results from the computer architecture field building on these. And as I said before, as an active researcher in this area, I am personally working on a review article for a major journal, and hence my motivation to update the Wikipedia article. But it will take many, many months until that gets published (scientific publishers are really slow...). So it is just too bad that what should be a crowd-sourced, high-update-frequency source like Wikipedia gets gated by waiting for slow-to-publish secondary sources. I appreciate the attention and will stop bugging you two about this; let me know when you are ready to hear more about genuine updates on this and I am happy to help. User:Zambraia
See also WP:EXPERT - that is written to help academics adapt to this environment. Jytdog (talk) 18:37, 2 September 2018 (UTC)[reply]
@Jytdog: Thank you. Well, I do think this article can be a lot better by adding the recent developments in random access, etc. Here are some secondary sources for your consideration. Scientific American, IEEE Spectrum on Random Access. New Scientist on recent advances. Zambraia
@Jytdog: I read through the edit history and now I am even more confused about what you did to this article. It used to have more context and references to work predating Church and Goldman, and good synthesis of many important developments. The article right now does not have a neutral point of view and rather just highlights the work from famous labs, and not the ones with more technical depth. I plan to add remarks using the secondary references above unless you object and would undo the edits anyway :). So please let me know before I spend more time on this. Thank you. Zambraia