Five-minute rule

From Wikipedia, the free encyclopedia

In computer science, the five-minute rule is a rule of thumb for deciding whether a data item should be kept in memory, or stored on disk and read back into memory when required. It was first formulated by Jim Gray and Gianfranco Putzolu in 1985,[1][2] and revised in 1997[3] and 2007[4] to reflect changes in the relative cost and performance of memory and persistent storage.

The rule is as follows:

The 5-minute random rule: cache randomly accessed disk pages that are re-used every 5 minutes or less.

Gray also issued a counterpart one-minute rule for sequential access:[5]

The 1-minute rule: cache sequentially accessed disk pages that are re-used every 1 minute or less.

Although the 5-minute rule was invented in the realm of databases, it has also been applied elsewhere, for example, in Network File System cache capacity planning.[6]

The original 5-minute rule was derived from the following cost-benefit computation:[4]

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)
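The formula above can be evaluated directly. The following is a minimal sketch, not code from any of the cited papers: the function name and the example inputs are assumptions chosen for illustration (prices in US dollars, an 8 KB page for the first case and a 4 KB page for the second) and should be replaced with current hardware figures.

```python
def break_even_interval_seconds(pages_per_mb_of_ram,
                                accesses_per_second_per_disk,
                                price_per_disk_drive,
                                price_per_mb_of_ram):
    """Break-even re-use interval, in seconds, from the cost-benefit formula."""
    # Technology ratio: how much RAM (in pages) one disk access per second "costs".
    technology_ratio = pages_per_mb_of_ram / accesses_per_second_per_disk
    # Economic ratio: price of a disk drive relative to 1 MB of RAM.
    economic_ratio = price_per_disk_drive / price_per_mb_of_ram
    return technology_ratio * economic_ratio


# Illustrative inputs only (assumed for this sketch, not taken from the text above).
# Case A: 8 KB pages (128 per MB), 64 accesses/s per disk, $2000 disk, $15 per MB of RAM.
case_a = break_even_interval_seconds(128, 64, 2000, 15)
# Case B: 4 KB pages (256 per MB), 83 accesses/s per disk, $80 disk, $0.047 per MB of RAM.
case_b = break_even_interval_seconds(256, 83, 80, 0.047)

print(f"Case A: {case_a / 60:.1f} minutes")
print(f"Case B: {case_b / 60:.1f} minutes")
```

With these assumed inputs, case A lands in the range of a few minutes and case B in the range of roughly an hour and a half. This shows how falling RAM prices lengthen the break-even interval even though disks have also become cheaper.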

Applying it to 2007 data yields an interval of approximately 90 minutes for magnetic-disk-to-DRAM caching, 15 minutes for SSD-to-DRAM caching and 2¼ hours for disk-to-SSD caching. The disk-to-DRAM interval was thus somewhat shorter than the "five-hour rule" that Gray and Putzolu had anticipated in 1987 would apply to RAM and disks in 2007.[4]

According to calculations by NetApp engineer David Dale, as reported in The Register, the figures for disk-to-DRAM caching in 2008 were as follows: "The 50KB page break-even was five minutes, the 4KB one was one hour and the 1KB one was five hours. There needed to be a 50-fold increase in page size to cache for break-even at five minutes." Regarding disk-to-SSD caching in 2010, the same source reported that "A 250KB page break even with SLC was five minutes, but five hours with a 4KB page size. It was five minutes with a 625KB page size with MLC flash and 13 hours with a 4KB MLC page size."[7]
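The dependence on page size follows from the cost-benefit formula: larger pages mean fewer pages per megabyte of RAM, and therefore a shorter break-even interval. The sketch below rearranges the formula to find the page size at which a chosen interval is reached. The function name and the input values are hypothetical. The source does not give the prices and access rates behind Dale's figures, so the output is not expected to reproduce his numbers.

```python
def page_size_kb_for_interval(target_seconds,
                              accesses_per_second_per_disk,
                              price_per_disk_drive,
                              price_per_mb_of_ram):
    """Page size (in KB) at which the break-even interval equals target_seconds."""
    # Rearranged formula:
    # PagesPerMBofRAM = interval * AccessesPerSecondPerDisk
    #                   * PricePerMBofRAM / PricePerDiskDrive
    pages_per_mb = (target_seconds * accesses_per_second_per_disk
                    * price_per_mb_of_ram / price_per_disk_drive)
    # 1 MB = 1024 KB, so the page size is the reciprocal of pages per MB.
    return 1024.0 / pages_per_mb


# Assumed inputs for illustration: 83 accesses/s per disk, $80 disk, $0.047 per MB of RAM.
five_minutes = 5 * 60
size_kb = page_size_kb_for_interval(five_minutes, 83, 80, 0.047)
print(f"Page size for a five-minute break-even: {size_kb:.0f} KB")
```

Because the interval is inversely proportional to page size in this model, halving the target interval doubles the required page size.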

In 2000, Gray and Shenoy applied a similar calculation for web page caching and concluded that a browser should "cache web pages if there is any chance they will be re-referenced within their lifetime."[8]

References

  1. Gray, Jim; Putzolu, Franco (May 1985), The 5 Minute Rule for Trading Memory for Disc Accesses and the 5 Byte Rule for Trading Memory for CPU Time (PDF)
  2. Gray, Jim; Putzolu, Gianfranco R. (1987), "The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time", Proceedings of the ACM SIGMOD Conference, pp. 395–398, CiteSeerX 10.1.1.624.3312, doi:10.1145/38713.38755, ISBN 978-0897912365, S2CID 10770251
  3. Gray, Jim; Graefe, Goetz (1997), "The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb", ACM SIGMOD Record, 26 (4): 63–68, arXiv:cs/9809005, doi:10.1145/271074.271094, S2CID 21524661
  4. Graefe, Goetz (2007), "The five-minute rule twenty years later, and how flash memory changes the rules", DaMoN '07: Proceedings of the 3rd international workshop on Data management on new hardware, pp. 1–9, doi:10.1145/1363189.1363198, ISBN 9781595937728, S2CID 14991801. Free version in ACM Queue, September 2008.
  5. René J. Chevance (2004). Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, Storage Solutions. Digital Press. p. 542. ISBN 978-0-08-049229-2.
  6. Gian-Paolo D. Musumeci; Mike Loukides (2002). System Performance Tuning. O'Reilly Media, Inc. p. 263. ISBN 978-0-596-55204-6.
  7. "Flash and the five-minute rule". The Register.
  8. Jim Gray, Prashant Shenoy, "Rules of Thumb in Data Engineering", MS-TR-99-100