Talk:OCR-A

Typography Mid‑importance

	This article is within the scope of WikiProject Typography, a collaborative effort to improve the coverage of articles related to Typography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has been rated as Mid-importance on the importance scale.

Articles for creation

	This article was reviewed by member(s) of WikiProject Articles for creation. The project works to allow users to contribute quality articles and media files to the encyclopedia and track their progress as they are developed. To participate, please visit the project page for more information.
	This article was accepted on 4 January 2008 by reviewer Graeme Bartlett (talk · contribs).

Someone?

"so someone undertook to create a free font"

Who?? —Preceding unsigned comment added by 208.10.44.2 (talk) 14:45, 3 April 2008 (UTC)

John Sauter. From the ReadMe.txt in the Debian source package.

A site license for the OCR-A font is very expensive

This claim is from John Sauter, the font's creator. I've removed the 'citation needed'. —Preceding unsigned comment added by 128.232.228.174 (talk) 21:15, 12 May 2008 (UTC)

Licensing

The font is not licensed under the GPL, since fonts are not subject to copyright in the United States. The font is free as in no cost. I will edit the article to make this clear.

The Debian packaging is licensed under the GPL. Here is the copyright file from that package. Note that my e-mail address has changed—it is now John_Sauter@systemeyescomputerstore.com.

This package was debianized by Gürkan Sengün <gurkan@phys.ethz.ch> on Wed, 2 May 2007 16:17:15 +0200.

It was downloaded from http://sourceforge.net/projects/ocr-a-font

Upstream Author:

   John Sauter <J_Sauter@Empire.Net>

License:

   Public Domain.

John Sauter (talk) 13:26, 22 October 2008 (UTC)

Spacing

This seems wrong:

   The font is monospaced, with the printer required to place glyphs 0.1 inch apart, 
   and the reader required to accept any spacing between 0.9 and 0.18 inch.

Should that be 0.09, not 0.9? --Mr z (talk) 15:42, 22 January 2009 (UTC)

You are quite correct. I could not find a citation online. I saw it in the ANSI documentation, but that is copyrighted and its owner does not permit it to be visible online. John Sauter (talk) 20:49, 26 January 2009 (UTC)
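For anyone who wants to sanity-check the figures, here is a minimal Python sketch of the tolerance described above. It assumes the corrected range from this thread (0.09 to 0.18 inch); the constant and function names are made up for illustration and are not taken from the ANSI document.

```python
# Figures as discussed in this thread (the lower bound is the corrected 0.09).
NOMINAL_PITCH_IN = 0.1   # spacing the printer is required to use
MIN_PITCH_IN = 0.09      # smallest spacing the reader must accept
MAX_PITCH_IN = 0.18      # largest spacing the reader must accept


def reader_accepts(pitch_in):
    """Return True if a glyph spacing (in inches) falls inside the reader's range."""
    return MIN_PITCH_IN <= pitch_in <= MAX_PITCH_IN


# The nominal printer pitch must itself be acceptable to the reader.
print(reader_accepts(NOMINAL_PITCH_IN))
# With the uncorrected lower bound of 0.9 the nominal pitch would be rejected,
# which is why the figure in the article looked wrong.
print(0.9 <= NOMINAL_PITCH_IN <= MAX_PITCH_IN)
```

The second check shows the problem with the original wording: a range of 0.9 to 0.18 inch is empty, so no printer spacing could satisfy it.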

OCR-A Extended

I have raised the OCR-A Extended section to the top level rather than have it a subhead under Code Points, since it is a separate font. I do not have MS Office Publisher 2003 installed on my computer, so perhaps that is why I do not see the font samples correctly.

It would be better to render the characters of OCR-A Extended as images, so everyone can see what they look like. This is what was done with the characters of OCR-A. It would be sufficient, in my opinion, to show only the characters not present in OCR-A, and give their CP1252 and Mac Roman code points. John Sauter (talk) 10:57, 26 April 2009 (UTC)

I learned about a font that calls itself "OCR-A Extended" at this URL: http://www.de.newfonts.net/index.php?pa=show_font&id=142. Reading it into FontForge, I get these error messages:

 The following table(s) in the font have been ignored by FontForge
   Ignoring 'VDMX' vertical device metrics table
 Glyph 2 is called ".notdef", a singularly inept choice of name (only glyph 0
   may be called .notdef)
   FontForge will rename it.
 The glyph named mu is mapped to U+00B5.
   But its name indicates it should be mapped to U+03BC.
 The glyph named Omega is mapped to U+2126.
   But its name indicates it should be mapped to U+03A9.
 The glyph named Delta is mapped to U+2206.
   But its name indicates it should be mapped to U+0394.
 The glyph named fraction is mapped to U+2215.
   But its name indicates it should be mapped to U+2044.
 The glyph named fi is mapped to U+F001.
   But its name indicates it should be mapped to U+FB01.
 The glyph named fl is mapped to U+F002.
   But its name indicates it should be mapped to U+FB02.
 The glyph named periodcentered is mapped to U+2219.
   But its name indicates it should be mapped to U+00B7.
 The glyph named macron is mapped to U+02C9.
   But its name indicates it should be mapped to U+00AF.
 The glyph named foursuperior is mapped to U+F003.
   But its name indicates it should be mapped to U+2074.

The font appears to have many of the accented letters that are missing from OCR-A, but it does not have the OCR Hook, OCR Chair and OCR Fork characters. Is this font the same as the OCR-A Extended font that is distributed by Microsoft? John Sauter (talk) 17:47, 1 May 2009 (UTC)
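The name/code-point mismatches in the FontForge messages above can be reproduced with a small Python sketch. Both tables below are copied from those messages only; this is not FontForge's own check, and the function names are hypothetical.

```python
# Code points the font actually assigns, copied from the FontForge messages above.
ACTUAL = {
    "mu": 0x00B5,
    "Omega": 0x2126,
    "Delta": 0x2206,
    "fraction": 0x2215,
    "fi": 0xF001,
    "fl": 0xF002,
    "periodcentered": 0x2219,
    "macron": 0x02C9,
    "foursuperior": 0xF003,
}

# Code points the glyph names indicate, also copied from the messages above.
EXPECTED = {
    "mu": 0x03BC,
    "Omega": 0x03A9,
    "Delta": 0x0394,
    "fraction": 0x2044,
    "fi": 0xFB01,
    "fl": 0xFB02,
    "periodcentered": 0x00B7,
    "macron": 0x00AF,
    "foursuperior": 0x2074,
}


def format_cp(code_point):
    """Format an integer code point in the usual U+XXXX notation."""
    return f"U+{code_point:04X}"


def find_mismatches(actual, expected):
    """Return (name, actual_cp, expected_cp) for each glyph mapped differently."""
    mismatches = []
    for name, code_point in sorted(actual.items()):
        wanted = expected.get(name)
        if wanted is not None and wanted != code_point:
            mismatches.append((name, code_point, wanted))
    return mismatches


for name, got, wanted in find_mismatches(ACTUAL, EXPECTED):
    print(f"{name}: mapped to {format_cp(got)}, name indicates {format_cp(wanted)}")
```

Three of the actual assignments (U+F001, U+F002 and U+F003) fall in the Unicode Private Use Area, so text using those glyphs would not survive a switch to a font that follows the named code points.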

None of the other font pages have the fonts displayed like this, so I have removed the font tag and panose information, and also edited the text to be clearer. I couldn't find a reference for the number of glyphs or languages present in the font anywhere. — M3TA info @ 23:20, 24 January 2010 (UTC)

Copyright Violation?

Someone marked the following text as a possible copyright violation:

"In 1968, American Type Founders produced OCR-A, one of the first optical character recognition typefaces to meet the criteria set by the U.S. Bureau of Standards. The design is simple so that it can be easily read by a machine, but it is more difficult for the human eye to read."

The footnote, labeled "background on OCR-A from Adobe", refers to the following paragraph:

"In 1968, American Type Founders produced OCR-A, one of the first optical character recognition typefaces to meet the criteria set by the U.S. Bureau of Standards. The design is simple so that it can be read by a machine, but it is slightly more difficult for the human eye to read."

Notice that in this two-sentence excerpt, two words have been changed. It hardly seems like a copyright violation to extract a two-sentence description, particularly when it is footnoted so there is no question of its source. In the absence of an objection by the copyright owner of this text, I regard the accusation of copyright violation as frivolous.

A Google search for this text reveals several web sites that include it; perhaps it just needs more reference footnotes? John Sauter (talk) 20:24, 21 April 2010 (UTC)

Having heard no objections, I have added two references and removed the accusation of copyright violation. John Sauter (talk) 12:34, 25 April 2010 (UTC)

To John Sauter (and everyone else)

John Sauter has apparently removed my edit. I intended it to note (though I now realize that I used the wrong template for this meaning) that I believe the text of the first sentence, from "In 1968" to "eye to read", is perhaps a copy/paste copyvio from the (first) link provided. Since WP:NOR says "Best practice is to write articles by researching the most reliable sources on the topic and summarizing what they say in your own words" (emphasis added by me), and with regard to WP:COPYPASTE, I think the text should be changed.

Also, even though this is unrelated to this specific page, does anyone know if there is a template that conveys the meaning of the slightly intrusive Template:Copypaste as an inline superscripted comment? —Preceding unsigned comment added by 207.65.109.10 (talk) 08:30, 11 May 2010 (UTC)

You are correct, I have removed your edit suggesting that the reworded two-sentence description is a copyright violation. Before doing so, I posted my reason here in the talk page, and waited for any response from you.

If I copy-and-paste a single letter of the alphabet from a copyrighted work, that is not a copyright violation. At the other extreme, if I copy-and-paste the complete text of a 500-page novel, that is a copyright violation. Where between these extremes copy-and-paste becomes a copyright violation is unclear, and reasonable persons may differ. I believe that the slightly reworded two-sentence description is not a violation of Adobe's copyright, and apparently some font vendors agree with me. See, for example, this web site: click on the “about” tab. John Sauter (talk) 21:15, 11 May 2010 (UTC)

Commercial links

There are a large number of links to off-the-page sales adverts from individual font vendors, some of them masquerading as references. I found this while cleaning up after an SEO / marketeer who had made a significant number of consecutive edits adding links to a commercial site. Guy (Help!) 19:45, 30 May 2010 (UTC)

After reading the Wikipedia article on citation spam at WP:REFSPAM, I think this is a close call. The links provide useful background information about the font, in addition to the opportunity to purchase it. For example, links to commercial vendors are used when discussing the inconsistency in assigning code points to characters by those various vendors. The fact that the article contains a link to a free version also counts against a commercial motive. John Sauter (talk) 21:22, 30 May 2010 (UTC)

Exclamation Point versus Exclamation Mark

Someone changed the name of character U+0021 from Exclamation Mark to Exclamation Point. I am changing it back to Exclamation Mark because that is the name used in the Unicode code charts. See Unicode code charts: ASCII punctuation. John Sauter (talk) 22:07, 5 May 2011 (UTC)

alternate question mark

The alternate question mark as displayed here appears identical to the regular question mark. 72.229.42.246 (talk) 11:32, 4 July 2011 (UTC)

The regular question mark has a dot at the baseline; the alternate question mark instead extends the vertical line to the baseline. This is hard to see in the article, but if you click on the glyph you can see it easily in the magnified view. John Sauter (talk) 18:53, 4 July 2011 (UTC)

Skala code points

I've removed a subsection describing the code point assignments in Matthew Skala's version of OCR-A because most of the information given there was inaccurate. For instance, it describes the character he had encoded at U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS) as "Latin Small letter o"; on examination of the font it's clear that the letter is a modified version of the capital O - as it should be for that code point - and not of the small o, which has a different shape. Similarly, the glyph at U+00DC has no tail at lower right - it is a capital glyph, correct for that code point - but the text I removed from the article claimed that it was a lowercase u and therefore incorrect. Other items on the list depend on assumptions that punctuation glyphs encoded at their official Unicode code points are "nonstandard" because of not being encoded at their legacy ASCII code points, or that the glyph that looks like a black box and is encoded as such should really be encoded as a nonprinting control character (which is an original interpretation of the standards, to say the least). The overall impression given was that the font contained many encoding errors; since that's both false and an original interpretation (the citation given was to the font itself, a primary source, not to anything supporting the claims of incorrect encoding), I think it's inappropriate for a Wikipedia article to go into this level of detail at all. Better not to list encoding errors unless we can both find them in a secondary source and present them accurately. Maybe the other "nonstandard encoding" subsections should be removed too; I haven't dug into their respective fonts in detail yet, but it's not clear that the article should contain this information for any fonts, correctly or not. 184.94.112.2 (talk) 01:31, 26 October 2011 (UTC)

It was not my intention to imply that Matthew Skala's rendition of OCR-A was in error—only that the code points he had chosen were not the same as those detailed in section 4. Since he used the same Metafont sources as I (and others) did, I assumed that his character shapes matched the glyphs in the tables. If his shapes for U+00D6 and U+00DC look like upper-case rather than lower-case letters, then we should present his glyphs to make this clear.

I suspect that U+25A0 (black square) is a better coding for Character Erase than U+007F (delete). Perhaps Matthew Skala's codings should be presented first, and the others as deviations from his choices.

As far as the citation is concerned, what better reference than to the primary source, which is the font itself, presented as a downloadable file? That reference permits anyone to verify the summary information in the article. How could a secondary source possibly be better?

I believe that there is value in noting the different code points used for some characters by the various sources of the OCR-A font. Users of these fonts will be made aware that the various sources code some characters differently, and will not be surprised when documents are printed incorrectly when they switch sources. John Sauter (talk) 01:58, 27 October 2011 (UTC)

I think it's problematic to present any set of codes as a base and then everything else in reference to it; with the best intentions in the world the "base" set of codes would inevitably end up getting inappropriate emphasis. A better approach might be to have a table with, say, the descriptions of the characters as rows and the fonts as columns, showing the code point used for each one without using any as a default. (Presumably limited to the characters on which any disagreement exists - there's no point regurgitating the entire ASCII set.) That kind of table might also be more convenient for readers because it makes it easier to recognize any consensus that may exist among the fonts. However, I still wonder about whether this information should be in this article in any form. It's important information, yes, but it also seems pretty squarely within Wikipedia's definition of "original research" because this kind of comparison is, as far as I can tell, published nowhere else but in this Wikipedia article. Wikipedia policy is that no information should be published only in Wikipedia: it all has to come from elsewhere, not only the individual facts but also any synthesis of them. So unless there's been a published comparison of code points in different OCR-A fonts, there shouldn't be one in Wikipedia even if the existence of one would be a valuable thing.

I think, also, that even if you and I agree that the presentation is neutral and appropriate, the fact that you seem to be (at least, you are using the same name as) the maintainer of one of the fonts in question, creates a huge opening for third parties to raise conflict of interest objections. Understand that I'm not raising that objection myself. My own suggestion, though I realize it'd create a fair bit of extra work for you and you're welcome not to do it, is that it'd be a good thing if you prepared a code point comparison (in whatever form you think best, including if that means using your own or Skala's or any other font as a base) and published it as an article in your own space - not on Wikipedia. You've clearly got the expertise as well as access to some commercial fonts that aren't available in free public space, so your own posting would be an appropriately authoritative secondary source that the Wikipedia article could then reference instead of including verbatim. 184.94.112.2 (talk) 02:48, 28 October 2011 (UTC)
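To make the table proposal above concrete, here is a rough Python sketch of the layout: characters as rows, fonts as columns, limited to the characters on which the fonts disagree, with no font treated as the base. The font labels "Font A" and "Font B" and the sample assignments are placeholders for illustration only (the two Character Erase code points are the ones mentioned in this thread); they are not real comparison data.

```python
def disagreement_table(assignments):
    """Build rows of (character, [code point per font]) where the fonts differ.

    `assignments` maps a font label to a {character name: code point} dict.
    A font that lacks a character contributes None in that column.
    """
    fonts = sorted(assignments)
    characters = sorted({name for mapping in assignments.values() for name in mapping})
    rows = []
    for name in characters:
        code_points = [assignments[font].get(name) for font in fonts]
        if len(set(code_points)) > 1:  # keep only rows with some disagreement
            rows.append((name, code_points))
    return fonts, rows


def format_cell(code_point):
    """Show a code point as U+XXXX, or a dash when the font lacks the character."""
    return "-" if code_point is None else f"U+{code_point:04X}"


# Placeholder data: hypothetical font labels, not an actual survey of fonts.
SAMPLE = {
    "Font A": {"Character Erase": 0x25A0, "Exclamation Mark": 0x0021},
    "Font B": {"Character Erase": 0x007F, "Exclamation Mark": 0x0021},
}

fonts, rows = disagreement_table(SAMPLE)
print("Character".ljust(20) + "".join(font.ljust(10) for font in fonts))
for name, code_points in rows:
    print(name.ljust(20) + "".join(format_cell(cp).ljust(10) for cp in code_points))
```

Rows on which every font agrees (Exclamation Mark in the placeholder data) drop out, which keeps the table small and avoids restating the whole ASCII set.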

Presenting a table of all characters on which there is disagreement over code points would be useful, but I fear that the table would be clumsy because of its size. That's why I presented the information as a series of lists, one per implementation. I do not have access to the commercial fonts; the lists are based on the public documentation of the fonts, and the references lead to that documentation. You are welcome to convert the information in the article into such a table, and judge for yourself whether it is better than the separate lists. Perhaps you can find an effective structure for the table that escapes me.

I also do not agree that Wikipedia policy forbids a synthesis of referenced facts, provided that the synthesis does not advance a point of view not clearly advanced by the sources. See your reference to Original Research for details about this rule. A presentation of the differences between implementations, without any statement that any of them are “bad,” meets this rule, I think. Based on the information in the tables the reader can judge the implementations for himself, of course. John Sauter (talk) 12:33, 29 October 2011 (UTC)

Refocus of article

I have refocused the article on Matthew Skala's implementation of OCR-A, rather than mine, and on Linotype's implementation, which is well documented. I have also fixed some obsolete links. John Sauter (talk) 09:39, 30 July 2015 (UTC)

Conflict of Interest suggestion

It has been suggested that I might have a conflict of interest because I produced and distributed a free implementation of the OCR-A font. I do not sell that implementation—it is freely available on SourceForge, including sources, as noted in the article. My employer at the time I produced and distributed that implementation appreciated the fact that it did not have to pay for an OCR-A font, but otherwise did not care what I did with it, and has received no benefit from my distribution.

I have no interest in OCR-A that conflicts with Wikipedia's aim, which is to produce a neutral, reliably sourced encyclopedia. John Sauter (talk) 12:48, 29 October 2011 (UTC)

Westminster?

The article claims that Westminster was "designed to be machine-readable" and has claimed this for a long time -- since 2008. Is there any proof this is true? Do any reliable references exist for this claim? Although I'm certainly no expert, it seems to me that (given Westminster's proportionally-spaced design) it was not designed to be machine-readable, but merely to visually mimic the E-13B MICR font [which is machine-readable]. The only reason I didn't go ahead and remove it was that I wanted to ask those of you who have more expertise than I do whether there's any proof it actually was designed to be machine-readable. Additionally, the article on Westminster is woefully devoid of references. Microsoft's page about Westminster says it was "made ... along the same lines" [as E-13B] but does not say that Westminster is, nor that it is not, designed to be machine-readable. Does anyone have any thoughts to add? -- 12.218.76.10 (talk) 06:15, 29 May 2013 (UTC)

reference to IDAutomation

The reference to the IDAutomation implementation of OCR-A has been flagged to say that the information on code points is “not in citation given”. The information is visible in the referenced page if you click on the “Product Information” tab. If there is no objection I will improve the reference to make this clear, and remove the “not in citation given” tag. John Sauter (talk) 04:59, 8 December 2015 (UTC)

Thank you, John! Actually, we don't even need any words of clarification; the reference already links to the correct tab when we add the anchor #product-information. — Sebastian 05:13, 8 December 2015 (UTC)

Linotype

The links to Linotype broke, and while I was fixing them I noticed that their two OCR-A fonts do not include either the alternate glyphs or the OCR characters (U+24xx). I can understand omitting the alternate glyphs, but omitting the OCR symbols from an OCR font does not seem reasonable. They claim to have 251 glyphs in the OCR-A Extended font, but I count only 248 in the character charts, and that includes the space character. Does anybody know what is going on with Linotype? I hope their web site is wrong and they actually sell a useful OCR-A font, in which case this article should describe what they sell.