Jump to content

BCJ (algorithm)

From Wikipedia, the free encyclopedia

In data compression, BCJ, short for branch/call/jump, refers to a technique that improves the compression of machine code by replacing relative branch addresses with absolute ones. This allows a Lempel–Ziv compressor to identify duplicate targets and more efficiently encode them. On decompression, the inverse filter restores the original encoding. Different BCJ filters are used for different instruction sets, as each use different opcodes for branching.

A form of BCJ is seen in Microsoft's cabinet file format from 1996, which filters x86 CALL instructions for the LZX compressor.[1] The 7z and xz file formats implement BCJ for multiple architectures.[2] ZPAQ calls its x86 BCJ as "E8E9", after the opcode values.[3]

bsdiff, a tool for delta updates, circumvents the need of writing architecture-specific BCJ tools by encoding bytewise differences. This allows it to be much better than the "match and copy" type tools such as VCDIFF,[4] giving an output size of only 6% for Google Chrome. However, Google's courgette, which adds a layer of explicit disassembly, is able to produce 9× smaller diffs.[5]

Effect

[edit]

For a squashfs image of a Fedora Linux 31 live image, using x86 BCJ saves an extra 30 MB out of the ~1.7 GB compressed size, but doubles the installation time.[6]

References

[edit]
  1. ^ "cabextract: Free Software for extracting Microsoft cabinet files". Retrieved 17 March 2020.
  2. ^ "The .xz File Format".
  3. ^ "ZPAQ". mattmahoney.net.
  4. ^ Colin Percival, Naive differences of executable code, http://www.daemonology.net/bsdiff/, 2003.
  5. ^ "Software Updates: Courgette". www.chromium.org.
  6. ^ "Changes/OptimizeSquashFS - Fedora Project Wiki". fedoraproject.org.