Wikipedia:Reference desk/Archives/Computing/2023 October 30



October 30

How do Microsoft Word and Excel store data?

Specifically, is the data stored in arrays, linked lists, binary trees, or something else? And what about simpler applications like WordPad and Notepad? 170.76.231.162 (talk) 18:09, 30 October 2023 (UTC)

It looks like they use the Office Open XML format. RudolfRed (talk) 18:39, 30 October 2023 (UTC)
And that probably means that internally it represents documents as trees, similar to the Document Object Model of HTML documents. --Stephan Schulz (talk) 21:52, 30 October 2023 (UTC)
We can't really know without looking at the source code, and since these applications are closed source, I wouldn't expect a definitive answer here on Wikipedia. The only alternative would be to reverse engineer the in-memory representation from a memory dump.
For Microsoft Word and Excel, documents may be represented internally in a way similar to the original .doc and .xls file formats of 30 years ago, rather than the more modern XML-based formats. The people at LibreOffice and its predecessors have done a decent job of reverse engineering those formats, so you might find some details there. For a text editor like Notepad, which deals only with plain text, there are many ways to do this. One needs a compromise between easy insertion and deletion of characters, easy jumping over many characters, and memory use. For applications that have been around for decades, memory use may once have mattered more than it does today. PiusImpavidus (talk) 09:30, 31 October 2023 (UTC)
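To make the trade-off above concrete, here is a minimal sketch of a gap buffer, one of the classic plain-text editor structures. Nothing here is known to be what Notepad does; the class name GapBuffer and its methods are made up for illustration. The text lives in one array with a block of free slots (the "gap") kept at the cursor, so typing and deleting at the cursor are cheap, jumping far away costs a copy proportional to the distance, and the memory overhead is just the size of the gap.

```python
class GapBuffer:
    """Illustrative text buffer: one array with a movable gap at the cursor."""

    def __init__(self, text="", gap_size=16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)      # first free slot
        self._gap_end = len(self._buf)   # one past the last free slot

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        # Copy characters across the gap, one at a time, until it starts at pos.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, extra):
        # Add `extra` free slots at the end of the gap.
        self._buf[self._gap_end:self._gap_end] = [None] * extra
        self._gap_end += extra

    def insert(self, pos, s):
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        if len(s) > self._gap_end - self._gap_start:
            self._grow(max(len(s), len(self._buf)))
        for ch in s:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count=1):
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError("delete range out of bounds")
        self._move_gap(pos)
        self._gap_end += count  # the deleted characters simply join the gap

    def text(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])


buf = GapBuffer("hello world", gap_size=4)
buf.insert(5, ",")
buf.delete(0, 1)
buf.insert(0, "H")
buf.insert(len(buf), "! This tail is longer than the gap.")
```

Other structures with different trade-offs (ropes, piece tables, arrays of lines) would serve the same purpose; the gap buffer is just the shortest to show.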
I don't work with Word. For Excel, you can easily see how it stores data on disk. An .xlsx file is an ordinary zip file, so unzip it and you will see folders. Go into xl. You will see sharedStrings.xml, which holds the string values shared among sheets. Go into worksheets and there is an XML file for each sheet. The format is easy to look through. The basic concept is that strings are stored once in a shared table, each with an index. A cell containing text stores that index, which is looked up to get the string when the cell is displayed; numbers are stored directly in the cell. You can also see how styles are applied in the styles.xml file. Overall, an Excel file is a collection of XML files all zipped up. 12.116.29.106 (talk) 13:03, 31 October 2023 (UTC)
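The lookup described above can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not a full reader: the function name read_sheet is made up, the sheet path xl/worksheets/sheet1.xml is assumed to be the usual default name, and formulas, dates, inline strings and styles are ignored. The demo at the end builds a tiny hand-written archive in memory rather than using a real workbook, so it is much simpler than what Excel actually writes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts of an .xlsx file.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def read_sheet(xlsx, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell reference: value as text} for one worksheet."""
    with zipfile.ZipFile(xlsx) as zf:
        # Build the shared string table; the list position is the index.
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # Simplification: join every <t> run found inside the <si>.
                shared.append("".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t")))

        cells = {}
        sheet = ET.fromstring(zf.read(sheet_path))
        for c in sheet.iter(f"{{{MAIN}}}c"):
            v = c.find("m:v", NS)
            if v is None or v.text is None:
                continue
            if c.get("t") == "s":
                # t="s" means <v> holds an index into the shared strings.
                cells[c.get("r")] = shared[int(v.text)]
            else:
                # Otherwise <v> holds the stored value itself (e.g. a number).
                cells[c.get("r")] = v.text
        return cells


# Demo: a minimal, hand-built archive standing in for a real workbook.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr(
        "xl/sharedStrings.xml",
        f'<sst xmlns="{MAIN}"><si><t>Name</t></si><si><t>Alice</t></si></sst>',
    )
    zf.writestr(
        "xl/worksheets/sheet1.xml",
        f'<worksheet xmlns="{MAIN}"><sheetData>'
        '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>'
        '<row r="2"><c r="A2" t="s"><v>1</v></c></row>'
        "</sheetData></worksheet>",
    )
demo.seek(0)
result = read_sheet(demo)
```

To try it on a real workbook, pass a file path instead of the in-memory buffer, for example read_sheet("book.xlsx"), where book.xlsx is a placeholder name.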