Wikipedia:Reference desk/Archives/Computing/2021 June 6



June 6

Unknown characters display as symbols

Pound sterling signs, dollar signs, euro signs, etc. What is this called, and do we have an article on it (and if so, where)? Thanks in advance for your help! --Serial 13:03, 6 June 2021 (UTC)

You probably mean Specials (Unicode block)#Replacement character. Some systems may instead show a box containing the Unicode code point of the unavailable character (in hexadecimal). -- Finlay McWalter··–·Talk 13:17, 6 June 2021 (UTC)
The dollar sign has been part of the 7-bit ASCII character set since at least 1965, so I doubt it's a special. --Shantavira|feed me 15:23, 6 June 2021 (UTC)
You may want to look at this article by Marcin Wichary, which I have now cited in that article. Blythwood (talk) 21:07, 6 June 2021 (UTC)
HTML pages can use several character encodings. The de facto standard has become UTF-8, which encodes Unicode code points as sequences of 8-bit bytes. The <meta> element of HTML5 has a charset attribute, and the HTML source code of a page (including this one) will contain, in the head, something like <meta charset="UTF-8"/>. But pages can use a different character encoding, in particular encodings used in MS Windows, such as Windows-1252. Such a page should then contain <meta charset="Windows-1252"/>. If a page does not declare its encoding, browsers have to guess; they may assume UTF-8 when the page is in fact a legacy page using Windows-1252. As long as only ASCII characters are used, you won't see the difference, because the Windows encoding and UTF-8 agree on those. For all other characters, including "curly" quotation marks, most diacritics, and scripts other than Latin, interpreting a Windows encoding as if it were UTF-8 (or the reverse) results in gibberish known as "mojibake" (pronounced with four syllables: mo-gee-bah-keh). --Lambiam 06:35, 7 June 2021 (UTC)