Talk:Database storage structures

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Database means ondisk[edit]

I deleted “in memory or” in the intro because the term “database” refers to data that is typically stored on disk. For example, in the database article, in-memory databases are a special kind of database. Of course the data resides in memory at times, also on tape, flash, etc.Lenshapir (talk) 00:58, 9 September 2009 (UTC)[reply]

Why is Citation Needed for Uncontroversial Statement? (and what's this article for, really?)[edit]

It seems odd to me that there is a "citation needed" flag next to this statement:

 "Unordered storage....  Typically these retrieval times are better, however, as most databases use indexes on the primary keys, resulting in retrieval times of O(log-n) or O(1) for keys that are the same as the database row offsets within the storage system."

This does not appear to me controversial to me, and I have a hard time figuring out how a citation would be given for something so obvious.

It'd be like if someone wrote "citation needed" for 1+1=2. I do understand how a citation could be given for that statement -- I believe someone once wrote a book on a lengthy proof that 1+1=2. What I don't see is how a citation is needed in such a case. And if someone hadn't written a book proving that 1+1=2, I'm not clear how such a citation would even be provided. And if a citation were not provided, it would seem that Wikipedia would forever be implying, (because: "citation needed,") that 1+1=2 might be held in some doubt -- that we don't really know that in some meaningful way, because nobody cited anybody on it.

In the case of the unordered storage, -- if you have a heap, and you have handles to the keys that are the index at which the data is stored at, then isolation of the location of the data is an O(1) operation. It's because the data is at the location that the handle is, and the seek is O(1) in the main, or O(log-n) if you get technical about the amount of time it takes for the disk drive to reposition or for the specific RAM chip to be isolated against all of the other RAM chips that could be considered. ("In the main," these log-n times, even for larger values of N, are dwarfed by the time it actually takes to send the signal for the request and send the signal for the response with the actual data.)

The inefficient search time O(n) is based in the premise that the search is linearly searching for the data in the database, to see if the key matches or not.

Typical books on algorithms will outline these dynamics. What they will not do, I believe, is make the specific statement that in unordered data structures, when keys are the same as the rows, that you get this O(1) (or O(log-n)) search behavior. That seems too obvious to me, that book writers would write it. Or if it is NOT too obvious, that it will be a very long k*N search process to find such an entry, for a very large value of k.

I'm unclear what to do with the article:

  1. Find an article or book that makes the very specific point. This seems unlikely to me, though it is plausible that somebody will find such a thing and put it here.
  2. Rewrite the section, so that the point is sufficiently clear that nobody is tempted to doubt it.
  3. Rewrite the article, such that the distinction between ordered and unordered databases are not made. This seems plausible to me, because the most typical unordered storage system that I know of is a heap, and heaps here are described as ordered. (Retrieval is also O(1) when the client of the heap keeps a pointer to the location of the data in the heap. Only a search is inefficient.) I'm not even sure, what, really, an unordered database would be: Isn't a database that "stores records in the order they are inserted" specifically describing an order? (Namely, a chronological order.)
  4. Delete the article entirely. It's not totally clear what cut of describing databases this article is intended to illustrate. If it is literally detailing "database storage structures," then I can imagine radically different ways that the article would go. For example, it might be talking about how disk drives work and store data in sectors, cylinders, heads. Or it might be describing database structures like a digital container format such as Matroska. Right now it seems to be describing "algorithms for structuring data," generally, and missing quite a lot of them. For instance, a Spatial database could very well not be a B+ tree. I'm not clear what niche, specifically, this article is intended to address. Maybe it is supposed to be about, for example, the pragmatics of rotating disk-based databases, vs. in-memory databases. But that would be very different than what the article is, and bill itself differently. When the article Database links here, for instance, it only says "Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional storage is row-oriented, but there are also column-oriented and correlation databases." That seems to me to suggest that digital container format for instance would be something described here, and this "ordered vs/ unordered" distinction (and performance measurements) would be scrapped.

— Preceding unsigned comment added by LionKimbro (talkcontribs) 00:19, 2 December 2018 (UTC)[reply]