User:Shlomital/Pngtutorial

A Concise PNG Tutorial

The purpose of this relatively brief tutorial is to help people make the most of the PNG image file format.

Most people who know of the PNG file format think mainly of two features: its better compression in comparison to GIF, and its provision for an alpha channel (that is, variable transparency). However, the PNG file format offers many more features. This tutorial will elaborate on the PNG colour types and chunks.

Colour type and bit depth

PNG images have five colour types:

Greyscale (type 0). Every pixel in such an image has the same value for its Red, Green and Blue intensities.
Truecolour, or RGB (type 2). Each pixel may take any value for its Red, Green and Blue intensities within the range (the bit depth).
Greyscale with alpha (type 4). In addition to the grey value, each pixel can have a value determining how much of the background shows through it.
Truecolour with alpha, or RGBA (type 6). As above, but for truecolour images.
Indexed, or paletted colour, or colourmapped (type 3). To save on file size and application memory, the possible colour values each pixel can take are stored in an index at the beginning of the file, and referenced in the image data portion of the file. The price is fewer possible colours or transparency values than with the other types.

The number of colours is determined by the bit depth: the number of bits given to each sample (each channel of a pixel or, in indexed colour, each palette index). The bit depth can only be a power of two (1, 2, 4, 8 or 16), and even then, not all of those values are allowed in all the colour types.

Here is a breakdown of the combinations of colour types and bit depths:

Greyscale

All bit depths are possible in this mode, i.e. the bit depth can be 1, 2, 4, 8 or 16.

Bit depth 1 means an image containing only black (Web hex triplet #000000) and white (#FFFFFF).
Bit depth 2 means an image with greys in intensities of zero (black, #000000), one third (#555555), two thirds (#AAAAAA) and full (white, #FFFFFF).
Bit depth 4 means an image with greys in 16 intensities, from zero to full in steps of one-fifteenth (#000000, #111111, #222222 and so on until #FFFFFF).
Bit depth 8 means an image with 256 shades of grey, the type of image usually thought of as “greyscale”.
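The hex triplets listed above follow from scaling each sample to the 0–255 range. A minimal sketch of that arithmetic (the function name `grey_to_hex` is made up for illustration, and it covers bit depths up to 8 only):

```python
def grey_to_hex(sample: int, bit_depth: int) -> str:
    """Scale a greyscale sample of the given bit depth to a Web hex triplet."""
    max_value = (1 << bit_depth) - 1       # 1, 3, 15 or 255
    level = sample * 255 // max_value      # exact for bit depths 1, 2, 4 and 8
    return "#{0:02X}{0:02X}{0:02X}".format(level)

# List every grey available at the low bit depths.
for depth in (1, 2, 4):
    print(depth, [grey_to_hex(s, depth) for s in range(1 << depth)])
```

The division is exact because 255 is a multiple of 1, 3 and 15, which is why the greys at these bit depths land on such tidy hex values.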

Bit depths 2 and 4 are rare; I haven’t found them in the wild, and I’ve managed to produce them only on purpose, by using a program called ppmdist (of the Netpbm tools), which takes the total number of colours in an image and distributes them in equal intensities. In any case, decoders are required to be able to read them, and I have not encountered any problem in that regard so far.

Bit depth 1 for greyscale and for indexed colour produces the same result (but read about a slight difference later), so it depends on the image editor which one you get in the end.

Bit depth 8 for greyscale and for indexed colour (if the latter has only values in which Red, Green and Blue are the same) produces the same result. Some applications may have trouble coping with the greyscale colour type. A few months ago I handed in a greyscale PNG for digital photo development and it came out garbled (rainbow colours); when I handed in the same image converted to indexed colour, it worked.

Truecolour

Bit depth 8 or 16 is valid for this colour type. Truecolour has three colour channels for each pixel—Red, Green, Blue—so the total number of bits for each pixel is calculated by multiplying the bit depth by three.

Bit depth 8 means 256 intensities for each channel, 24 bits per pixel and 16,777,216 colours available. For the human eye, and if you use a normal-gamut colour space like sRGB (more about that to follow), 8-bit RGB offers rather more colours than we can discern.
Bit depth 16 means 65,536 intensities for each channel, 48 bits per pixel and 281,474,976,710,656 colours available. For regular use this is overkill; however, for doing lots of histogram or levels operations on the image or working in a wide-gamut colour space like Adobe RGB, 16-bit RGB is a must if you want to avoid banding effects (also known as posterisation) on the image. The drawbacks are file size and the need to have an application that supports working with 16-bit RGB.

Greyscale with alpha

Bit depth 8 or 16 is valid for this colour type. Greyscale with alpha has two channels for each pixel: grey and alpha.

Bit depth 8 means 256 shades of grey plus 256 levels of transparency (16 bits per pixel). This is the more common choice.
Bit depth 16 means 65,536 shades of grey plus 65,536 levels of transparency (32 bits per pixel). This may be beneficial for repeated editing operations or wide gamuts, though images with alpha channels are seldom used in such scenarios, since they’re usually not naturalistic.

Truecolour with alpha

Bit depth 8 or 16 is valid for this colour type. Truecolour with alpha has four channels for each pixel: Red, Green, Blue and Alpha.

Bit depth 8 means 16,777,216 available colours plus 256 levels of transparency (32 bits per pixel).
Bit depth 16 means 281,474,976,710,656 available colours plus 65,536 levels of transparency (64 bits per pixel).

Indexed colour

Bit depth 1, 2, 4 or 8 (but not 16) is valid for this colour type. There is no concept of a “channel” in indexed-colour images, and the bit depth also signifies the number of bits per pixel.

Bit depth 1 means 2 possible colours. Unlike in 1-bit greyscale, these may be any colours you choose out of the 16,777,216 colours available in 8-bit truecolour. 1-bit greyscale permits only black and white.
Bit depth 2 means 4 possible colours. Not a lot of images use this bit depth, since it suits only images with 3 or 4 colours. A graph with a white background, black grid and one or two lines (usually red and blue) is an example. Screenshots from CGA games (which had 4 colours in the 320×200 graphics modes) are another common case.
Bit depth 4 means 16 possible colours. This number of colours is usually the minimum for an image depicting something from real life, as well as the number of colours in a lot of old computer screenshots.
Bit depth 8 means 256 possible colours. This is the number of colours needed for realistic but not naturalistic scenes, such as comics and logos.
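The combinations described in the sections above can be summarised in a small lookup table. This is only a sketch: the names `CHANNELS`, `ALLOWED_DEPTHS` and `bits_per_pixel` are invented for illustration, while the numbers are the ones given in the text.

```python
# Samples per pixel for each colour type (type 3 stores one palette index).
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

# Bit depths permitted for each colour type.
ALLOWED_DEPTHS = {
    0: (1, 2, 4, 8, 16),   # greyscale
    2: (8, 16),            # truecolour
    3: (1, 2, 4, 8),       # indexed colour
    4: (8, 16),            # greyscale with alpha
    6: (8, 16),            # truecolour with alpha
}

def bits_per_pixel(colour_type: int, bit_depth: int) -> int:
    """Return the bits per pixel, rejecting combinations PNG does not allow."""
    if bit_depth not in ALLOWED_DEPTHS.get(colour_type, ()):
        raise ValueError(
            f"bit depth {bit_depth} is not valid for colour type {colour_type}")
    return CHANNELS[colour_type] * bit_depth

print(bits_per_pixel(2, 8), 2 ** bits_per_pixel(2, 8))   # truecolour, 8-bit
print(bits_per_pixel(6, 16))                             # truecolour with alpha, 16-bit
```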

Image data (IDAT)

The image data, the actual pixels, are stored in the IDAT chunk. These are almost always compressed, though lots of programs have an option of using zero compression if you really want (for, say, supersensitive archives).

The image data can span multiple IDAT chunks. The whole image is derived from the concatenation of all these chunks. Multiple IDAT chunks do not offer error correction, only error detection: if one of the IDAT chunks is corrupted, so is the whole image from that point onwards. The benefit of multiple IDATs is that each chunk carries its own error-detection code (the CRC checksum, stored at the end of the chunk), so that you don’t have to undergo the frustration of loading a 10 meg PNG to find out only at the very end of the download that it’s corrupted.
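To make the per-chunk checksum concrete, here is a rough sketch of walking the chunks of a PNG held in memory and checking each CRC. It relies on the standard chunk layout (4-byte length, 4-byte type, data, 4-byte CRC) and on Python’s `zlib.crc32`; the helper names `iter_chunks` and `make_chunk` are my own, not part of any library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Assemble one chunk: length, type, data, then the CRC of type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

def iter_chunks(data: bytes):
    """Yield (type, body, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the chunk type and data, but not the length field.
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == stored
        yield ctype.decode("ascii"), body, crc_ok
        pos += 12 + length

def image_stream(data: bytes) -> bytes:
    """Concatenate all IDAT bodies; together they form one zlib stream."""
    return b"".join(body for ctype, body, _ in iter_chunks(data) if ctype == "IDAT")
```

A streaming decoder can run the same check as each chunk arrives and give up at the first bad CRC instead of waiting for the end of the file.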

Palette index (PLTE)

The index for indexed-colour PNG images is stored in the PLTE chunk. Each palette entry is an 8-bit truecolour entry (16,777,216 possible colours for an entry). There cannot be more entries than the bit depth allows, but there can be fewer.

A PLTE chunk can also appear in truecolour and truecolour-alpha (types 2 and 6) images, as a suggested palette for display systems that can’t display truecolour. However, it’s better to use the sPLT (suggested palette) chunk instead, which was designed for that purpose. The sPLT chunk is also good for getting around some limitations of the PLTE chunk.

Transparency (tRNS)

For variable transparency, the alpha colour types (4 and 6) are necessary only if there are very many colours. For images of fewer colours you can use indexed colour and indicate transparency values with the tRNS chunk. The tRNS is a lookup table in addition to PLTE; it indicates, for each entry, what its transparency value is. For example, if entry 4 in PLTE has a value of #6080C0 and tRNS entry 4 has a value of 50%, then any pixel in the image referencing palette entry 4 will be semi-transparent.
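The double lookup described above can be sketched as follows. The function name `resolve_pixel` is hypothetical; `plte` and `trns` stand for the raw data bytes of the two chunks, with the 50% of the example approximated as 128 out of 255.

```python
def resolve_pixel(index: int, plte: bytes, trns: bytes = b""):
    """Return (R, G, B, A) for a palette index, combining PLTE and tRNS."""
    r, g, b = plte[3 * index:3 * index + 3]
    # tRNS may hold fewer entries than PLTE; the missing ones are fully opaque.
    alpha = trns[index] if index < len(trns) else 255
    return r, g, b, alpha

# Five palette entries; entry 4 is #6080C0 and is roughly 50% transparent.
plte = bytes.fromhex("000000" "FFFFFF" "FF0000" "00FF00" "6080C0")
trns = bytes([255, 255, 255, 255, 128])
print(resolve_pixel(4, plte, trns))
```

Because trailing entries default to opaque, it pays to sort the palette so that the transparent entries come first and tRNS can be kept short.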

The tRNS chunk can also be used for greyscale and truecolour (types 0 and 2) images, for indicating a single colour value as fully transparent. This isn’t very useful, because it may result in colour fringes when the image is composited against a background (see Transparency (graphic) for more information).

Colour space information

It is often thought that RGB values like #FF0000 are set in stone. Anyone who’s worked on a number of computer monitors knows it isn’t so, and that one monitor’s #FF0000 (red) may look like another monitor’s #800000 (maroon). The colour space chunks in PNG images do not remedy this situation entirely, for you still have to calibrate your monitor, but if you have done so, images should look the same across various monitors and computing platforms.

RGB values are relative, just as a scale of 0–100 on a thermometer doesn’t mean much by itself. With colour space information, the RGB values are tied to absolute colour measurement, just as the Celsius scale is made absolute by stating that 0–100 signifies a hundred equal parts between the freezing point and boiling point of water.

There are four colour space chunks:

gAMA, for gamma correction. This is a complex subject, so see the link for more. The end result is the brightness of the image.
cHRM, for chromaticity values. The cHRM chunk maps the RGB values according to the absolute CIE 1931 color space.
sRGB signifies that the image is to be interpreted according to the gamma correction and chromaticity values of the sRGB colour space, and also adds the ability to specify a rendering intent.
iCCP is an embedded International Color Consortium colour profile, offering detailed control for professionals.

When to use which

If the image has only black and white, you don’t need any of those chunks.
If the image has only the eight fully-saturated RGB colours (black, blue, green, cyan, red, magenta, yellow, white: #000000, #0000FF, #00FF00, #00FFFF, #FF0000, #FF00FF, #FFFF00, #FFFFFF), then you need cHRM but not gAMA.
If the image has only shades of grey (Red, Green and Blue intensities identical for each pixel), then you need gAMA but not cHRM.
If the image has colours with more intensities than just zero and full, then you need both gAMA and cHRM.

The sRGB and iCCP chunks do the work of both gAMA and cHRM.
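The four rules above amount to two tests on the pixel values. A minimal sketch, assuming 8-bit RGB tuples as input (the function name `needed_colour_chunks` is invented for illustration):

```python
def needed_colour_chunks(pixels):
    """Given an iterable of 8-bit (R, G, B) tuples, return the chunks the rules call for."""
    pixels = set(pixels)
    # Every channel is either zero or full intensity.
    only_extremes = all(c in (0, 255) for p in pixels for c in p)
    # Red, Green and Blue are identical in every pixel.
    only_grey = all(r == g == b for r, g, b in pixels)
    if only_extremes and only_grey:
        return set()               # black and white only
    if only_extremes:
        return {"cHRM"}            # the eight fully saturated colours
    if only_grey:
        return {"gAMA"}            # shades of grey
    return {"gAMA", "cHRM"}

print(needed_colour_chunks([(0, 0, 0), (255, 0, 255)]))
```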

gAMA

PCs have a monitor gamma of 2.2, whereas Macintoshes have a monitor gamma of 1.8. Without gamma correction, an image created on a Mac could look too dark when viewed on a PC, and an image created on a PC could look too bright when viewed on a Mac.

If you have created the PNG image on a Mac, put a gAMA chunk with a value of 0.55555 (or an Apple RGB colour profile in iCCP).
If you have created the PNG image on a PC, put a gAMA chunk with a value of 0.45455 (or an sRGB chunk).
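Inside the file, the gAMA value is stored as a four-byte unsigned integer equal to the value times 100,000. A small sketch of that encoding (the function name `gama_data` is my own, and it returns only the chunk’s data bytes, not a complete chunk):

```python
import struct

def gama_data(gamma: float) -> bytes:
    """Encode a gAMA value such as 0.45455 as the chunk's 4 data bytes."""
    return struct.pack(">I", round(gamma * 100000))

# The gAMA value is the reciprocal of the monitor gamma.
print(struct.unpack(">I", gama_data(1 / 2.2))[0])
print(struct.unpack(">I", gama_data(0.55555))[0])
```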

cHRM

Since the human eye is less sensitive to differences in chromaticity than in intensity, getting the chromaticity values right is less important than doing gamma correction; still, if the chromaticity values differ a lot (as with, for example, sRGB compared to Adobe RGB), the result can be seen. Note also that a lot of applications honour the gAMA chunk, but few honour chromaticity values.

For sRGB chromaticities, used on PCs, cHRM has these values: white point x = 0.3127, white point y = 0.329, red x = 0.64, red y = 0.33, green x = 0.3, green y = 0.6, blue x = 0.15, blue y = 0.06.
For Apple RGB chromaticities, cHRM has these values: white point x = 0.3127, white point y = 0.329, red x = 0.625, red y = 0.34, green x = 0.28, green y = 0.595, blue x = 0.155, blue y = 0.07.
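The cHRM chunk stores these eight numbers in the order white, red, green, blue, each as a four-byte unsigned integer equal to the value times 100,000. A sketch using the values listed above (the names `SRGB`, `APPLE_RGB` and `chrm_data` are made up for this example):

```python
import struct

SRGB = {"white": (0.3127, 0.329), "red": (0.64, 0.33),
        "green": (0.3, 0.6), "blue": (0.15, 0.06)}
APPLE_RGB = {"white": (0.3127, 0.329), "red": (0.625, 0.34),
             "green": (0.28, 0.595), "blue": (0.155, 0.07)}

def chrm_data(chroma: dict) -> bytes:
    """Encode white point and primaries as the 32 data bytes of a cHRM chunk."""
    values = [*chroma["white"], *chroma["red"], *chroma["green"], *chroma["blue"]]
    return struct.pack(">8I", *(round(v * 100000) for v in values))

print(struct.unpack(">8I", chrm_data(SRGB)))
```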

sRGB

The sRGB chunk does the work of gAMA and cHRM; it denotes that the image has a gamma correction value of 0.45455 and the chromaticities written above. However, since many applications, including Web browsers, ignore the sRGB chunk, it is good to include a gAMA chunk with a value of 0.45455 as well. Including a cHRM chunk with the sRGB chromaticities may be good for the sake of completeness, though it is not as important as the inclusion of gAMA, because applications that ignore the sRGB chunk will usually ignore cHRM as well.

In addition to the built-in gamma and chromaticity values, the sRGB chunk holds a single byte denoting the rendering intent of the image. The rendering intent is what is to be done with out-of-gamut colours. Look at screenshots of colour pickers in any computing magazine and you’ll see that the full blue (#0000FF) is darker and slightly greener than that which you see on the screen; this is because full monitor blue is outside the gamut of colour printing inks. With the sRGB chunk you can specify how such colours should be treated.

The rendering intents are:

Absolute colorimetric (rendering intent 3)
Relative colorimetric (rendering intent 1)
Perceptual (rendering intent 0)
Saturation (rendering intent 2)

“Absolute colorimetric” keeps all the values of the source colour space as much as possible in the target colour space, with out-of-gamut colours clipped to the nearest available in the gamut. However, “absolute colorimetric” also keeps the white point of the source colour space. Since the human eye judges all colours in relation to white, this could result in the image looking colourised. “Absolute colorimetric” is useful for proofing images from one colour space against another colour space having a greater gamut. Non-professionals will rarely, if ever, need to use it.

“Relative colorimetric”, like the previous, keeps the colour values as much as possible on translation, clipping out-of-gamut values to the nearest in the range, but unlike the previous, scales the white point from the source to the target, so that the colourisation effect is prevented. “Relative colorimetric” is for when you want to match colours from the source to the target as much as possible. It’s good mainly for logos (where precise colours can be part of the corporate identity) and computer screenshots.

“Perceptual” does not clip out-of-gamut values; instead, it scales all colour values from the source to the target, keeping the relationship between them at the expense of their precise values. It’s good especially for photographs, where the colour values themselves aren’t as important as preserving the overall relationship between all the colours in the image.

“Saturation” is like “perceptual” in scaling all colours from source to target, but it preserves the saturation of the colours at the expense of their hue and lightness. This rendering intent is good for images with high-contrast, fully-saturated colours, such as graphs and charts. It can prevent the printer from dithering the colours.

The above guidelines for using the rendering intents are by no means ironclad. Some photographs may benefit from the relative colorimetric rendering intent, while some logos and screenshots may look better with the perceptual rendering intent. Remember that “relative colorimetric” shifts out-of-gamut colours if there are any, and “perceptual” shifts all colours even if they are all in-gamut. Trial and error is the only sure way to know which is best.
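Since the sRGB chunk holds nothing but the rendering intent, its data is a single byte. A sketch with the four intent numbers listed above (the class name `RenderingIntent` and the function `srgb_data` are invented for illustration):

```python
from enum import IntEnum

class RenderingIntent(IntEnum):
    PERCEPTUAL = 0
    RELATIVE_COLORIMETRIC = 1
    SATURATION = 2
    ABSOLUTE_COLORIMETRIC = 3

def srgb_data(intent: RenderingIntent) -> bytes:
    """Return the one data byte of an sRGB chunk."""
    return bytes([RenderingIntent(intent)])

# A photograph would typically use the perceptual intent.
print(srgb_data(RenderingIntent.PERCEPTUAL))
```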

iCCP

The iCCP chunk, containing a compressed ICC profile, similarly does the work of gAMA and cHRM, and again it is prudent to include gAMA in addition to iCCP.

ICC profiles are a powerful tool for those who know how to create them. For PNG, the iCCP chunk is the only way to specify the rendering intent for colour spaces other than sRGB. If you want to use a colour space such as Adobe RGB but don’t need to specify the rendering intent, you can use gAMA and cHRM with the values of Adobe RGB instead of iCCP.

Note

There can only be one colour profile in a PNG file. Since the sRGB chunk counts as a colour profile in its own right, iCCP and sRGB cannot both appear in the same image. Also, there can be only one gAMA chunk and only one cHRM chunk.

Without colour space chunks, the image is interpreted according to platform defaults.

Significant bits (sBIT)

I like to think of the sBIT chunk as PNG’s escape mechanism. PNG cannot directly encode bit depths other than powers of two, but the sBIT chunk at least provides for denoting that the original image had an unconventional bit depth.

An sBIT chunk contains values less than (or equal to, but that isn’t very useful) the current bit depth of the image. For example, a highcolour image is encoded as an 8-bit RGB PNG, the same as any other image containing more than 256 colours, but the sBIT chunk denotes that in its original form it had fewer bits per pixel. A high-resolution medical scan can be saved only as 16-bit greyscale, but an sBIT chunk can denote that it originally had only 12 bits per pixel.

The sBIT chunk contains as many values as there are channels: one for greyscale, two for greyscale with alpha, three for truecolour or indexed-colour and four for truecolour with alpha. There is no requirement that all the values should be identical; in fact, 16-bit highcolour images have a different value for the green channel than for red and blue.

For indexed-colour images, the sBIT chunk refers to the bit depths of the palette entries themselves, so each value has to be 8 or less. A 256-colour image with sBIT values of 6 still has 256 colours, but each entry is recorded as being originally in the range 0–63 instead of 0–255.
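As a sketch of how the sBIT values might be written and used: the chunk data is one byte per channel, and a decoder that wants the original sample back can shift away the padding bits. The names `SBIT_CHANNELS`, `sbit_data` and `original_sample` are made up for this example.

```python
# Number of sBIT values for each colour type, as listed above.
SBIT_CHANNELS = {0: 1, 2: 3, 3: 3, 4: 2, 6: 4}

def sbit_data(colour_type: int, *bits: int) -> bytes:
    """Return the sBIT data bytes: one significant-bit count per channel."""
    if len(bits) != SBIT_CHANNELS[colour_type]:
        raise ValueError("wrong number of sBIT values for this colour type")
    return bytes(bits)

def original_sample(sample: int, stored_depth: int, significant_bits: int) -> int:
    """Recover the original value by dropping the low-order padding bits."""
    return sample >> (stored_depth - significant_bits)

print(sbit_data(3, 6, 6, 6))            # 256-colour image with 6-bit palette entries
print(original_sample(255, 8, 6))       # full intensity in the 0-63 range
print(original_sample(0xFFF0, 16, 12))  # 12-bit scan stored as 16-bit greyscale
```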

Here are a few real-world uses:

CGA and EGA screenshots

Those graphics cards had four states, i.e. four values for each channel, or two bits for each of red, green and blue. It doesn’t matter whether the image is 4-colour CGA or 16-colour EGA with the PC RGBI palette or 16-colour EGA with the full palette; those cards had a maximum range of 0–3 for each channel, so sBIT should have a value of 2 for each channel.

Images with full RGB only

Images with the eight zero-or-full RGB colours (see section “When to use which” under “Colour space information” above) are originally 1-bit (binary = all or nothing), so sBIT should have a value of 1 for each channel.

VGA screenshots

The VGA standard, at least in DOS, uses 6 bits for each channel, so sBIT should have a value of 6 for each. The Windows 16-colour VGA palette, on the other hand, does not fit this scheme, so sBIT should be left out of old or safe-mode Windows screenshots.

Highcolour

There are two sorts of highcolour image: 15 bits per pixel and 16 bits per pixel. The one favours an equal distribution of bits among the channels, the other a neat power of two for RAM’s sake. For 15-bit highcolour, sBIT should have a value of 5 for each channel. For 16-bit highcolour, the extra bit was usually added to the green channel, because of the human eye’s greater sensitivity to green, so sBIT should have a value of 5 for red and blue but 6 for green.

To check whether a highcolour image is 15 bits or 16 bits, and if 16 bits, which channel has the extra bit (sometimes it was red instead of green, especially for images with lots of skin tones), it is best to view the histogram of the image. In all cases the histogram should look like the teeth of a comb, but while in 15-bit highcolour the teeth have the same density for all channels, in 16-bit highcolour they are denser in one of the channels.
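To illustrate where the comb teeth come from, here is a sketch of expanding a 16-bit highcolour pixel to 8-bit RGB. It assumes the common layout with 5 bits of red in the top bits, 6 bits of green in the middle and 5 bits of blue at the bottom; other layouts exist, and the function name is my own.

```python
def rgb565_to_rgb888(value: int):
    """Expand a 16-bit 5-6-5 highcolour pixel to an 8-bit (R, G, B) tuple."""
    r5 = (value >> 11) & 0x1F
    g6 = (value >> 5) & 0x3F
    b5 = value & 0x1F
    # Replicate the top bits into the padding so that full intensity maps to 255.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))

# The matching sBIT data for the resulting 8-bit RGB PNG: 5, 6 and 5 bits.
SBIT_RGB565 = bytes([5, 6, 5])

# Only 32 red levels but 64 green levels can occur: the teeth of the comb.
red_levels = {rgb565_to_rgb888(v << 11)[0] for v in range(32)}
green_levels = {rgb565_to_rgb888(v << 5)[1] for v in range(64)}
print(len(red_levels), len(green_levels))
```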

Background colour (bKGD)

For images with transparency, whether binary (on/off) or variable, the bKGD chunk specifies the background colour to use if none is specified in any other way (such as through CSS on a Web page). An added benefit of bKGD is that Internet Explorer 6, which disregards the alpha channel and composites the image onto a default grey, honours the bKGD chunk and composites the image onto the colour specified by it. It’s a fallback, but it’s better than IE 6’s default.
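Compositing a pixel onto the bKGD colour is a weighted average of the two. The sketch below is deliberately naive: it blends the 8-bit sample values directly, whereas a careful viewer would convert to linear light first. The function name `composite` is hypothetical.

```python
def composite(pixel, background):
    """Blend an 8-bit (R, G, B, A) pixel over an opaque (R, G, B) background."""
    *rgb, alpha = pixel
    # Weighted average with rounding; alpha 255 keeps the pixel, alpha 0 the background.
    return tuple((c * alpha + b * (255 - alpha) + 127) // 255
                 for c, b in zip(rgb, background))

white = (255, 255, 255)            # a bKGD colour of white
print(composite((255, 0, 0, 255), white))
print(composite((0, 0, 0, 128), white))
```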

Histogram (hIST)

For indexed-colour images, a hIST chunk can be generated (using pnmtopng of the Netpbm suite, for example) that enables selection of the most important colours for display systems with a limited number of colours. I have not seen its effect, mainly because it’s been very long since I used a graphic display not capable of truecolour, but it may be helpful.

Suggested palette (sPLT)

The sPLT chunk is like hIST on steroids: it can be used on an image of any colour type, it can contain any number of entries (up to the general chunk size limit of 2GB), its entries can be in 16-bit truecolour instead of 8-bit truecolour, and it can contain transparency values as well—all in addition to the (optional) histogram values.

The sPLT chunk, then, in addition to being a more powerful version of hIST for helping display on low-capacity display systems, can also be used for getting around the limitations of the PLTE chunk. For example, a colour palette can be specified in 16-bit values for greater precision, if needed. Of course, this chunk lies dormant until an application is written that can take advantage of it.

Physical dimensions (pHYs)

The pHYs chunk specifies the size of the image in meatspace, or at least the size ratio between its pixels. The values are in pixels per metre, but many image editors provide for automatic conversion from the more popular DPI (for example pngcrush has you specify the resolution in DPI and inserts the metric value automatically). Here are some common resolutions:

Dots Per Inch	Pixels Per Metre
72	2835
96	3780
150	5906
200	7874
300	11811
600	23622
1200	47244
2540	100000
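The table follows from one inch being 0.0254 metres. A sketch of the conversion and of the nine data bytes of a pHYs chunk (two four-byte integers plus a unit byte, where 1 means metres); the names `dpi_to_ppm` and `phys_data` are invented for illustration.

```python
import struct

def dpi_to_ppm(dpi: float) -> int:
    """Convert dots per inch to whole pixels per metre."""
    return round(dpi / 0.0254)

def phys_data(x_ppm: int, y_ppm: int) -> bytes:
    """Return the 9 data bytes of a pHYs chunk with the unit set to metres."""
    return struct.pack(">IIB", x_ppm, y_ppm, 1)

for dpi in (72, 96, 150, 200, 300, 600, 1200, 2540):
    print(dpi, dpi_to_ppm(dpi))
```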

Alternatively, the pHYs chunk can denote just the ratio between the width and height of each pixel, without specifying the real-world size. Without pHYs, the real-world size is unknown and the ratio between the width and height of each pixel is assumed to be 1:1 (“square pixels”).

What values to specify

The clearest case is that of scanned images. You know at what resolution you have scanned the source, so you specify that resolution in pHYs. A photograph scanned at 300 dots per inch will have a pHYs chunk with a value of 11811 for both the horizontal and the vertical axis.

For screenshots, image resolution is less meaningful, but can be included if there is a default resolution for the operating system.

Windows screenshots should have a resolution of 96 dpi (3780 pixels per metre).
Macintosh screenshots should have a resolution of 72 dpi (2835 pixels per metre).
For screenshots from the X Window System, older, plain vanilla X had a default of 75 dpi (2953 pixels per metre), but newer environments like GNOME have moved to 96 dpi.

All those graphical environments have an image resolution because it is necessary for determining what size to render fonts. Naturally, operating environments from before the age of graphical user interfaces had no need of that, and therefore no image resolution. For most screenshots from such old sources, the PNG image needs no pHYs chunk at all. Some of those platforms, however, halved the available horizontal or vertical resolution of the screen in order to increase the number of available colours or sprites, resulting in non-square pixels: pixels with twice the horizontal or vertical size. Most famously in the Commodore 64, there was a “multicolour mode” where sprites could have more than one colour in exchange for a halved horizontal resolution, with pixels twice as wide as they are high (called “fat pixels”). The default size for a Commodore 64 image is 320×200; it is possible to scale screenshots in multicolour mode to 160×200, but that looks unnatural, so it’s best to keep the image at 320×200 and add a pHYs chunk denoting a 2:1 ratio for each pixel.
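For a ratio-only pHYs chunk the unit byte is 0, and the two integers give only the relation between the axes. Since the chunk stores pixels per unit, a wider pixel means fewer pixels per unit horizontally, so the numbers appear swapped. A sketch (the function name `phys_aspect_data` is my own):

```python
import struct

def phys_aspect_data(pixel_width: int, pixel_height: int) -> bytes:
    """Return pHYs data bytes that record only the pixel aspect ratio."""
    # pHYs holds pixels per unit on each axis, the inverse of the pixel size,
    # so the width goes into the Y field and the height into the X field.
    return struct.pack(">IIB", pixel_height, pixel_width, 0)

# "Fat pixels" twice as wide as they are high.
x_ppu, y_ppu, unit = struct.unpack(">IIB", phys_aspect_data(2, 1))
print(x_ppu, y_ppu, unit)
```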

Summary

Some of this tutorial may seem esoteric, and all of it geeky, yet I continue to marvel at the versatility of the PNG image file format, its ability to accommodate image data from professional transparencies in 64 bits per pixel with an Adobe RGB colour profile to screenshots of PC games running on the monochrome Hercules graphics card. I hope this tutorial shows how PNG can be used for accurately recording and preserving the information of any image, as well as giving the reader an appreciation of its design beyond the well-known benefits of better compression and variable transparency.