Talk:Byte pair encoding

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

1994?[edit]

I find it a little hard to believe that this algorithm wasn't publicly discussed until 1994. For instance, the basic algorithm was used in the US version of Final Fantasy, which was released in 1990, which can be confirmed by analyzing the game's ROM data. I cannot be certain that they used exactly the same algorithm as Gage, but it seems likely that it was close. This compression method is also found in a number of games from the 8-bit and 16-bit eras (we ROM hackers call it "DTE", for "dual-tile encoding"), though I don't know when the earliest example is from. - furrykef (Talk at me) 15:57, 31 August 2012 (UTC)[reply]

The basic idea is simple enough to be re-invented multiple times by asm and video game developers. You can add mentions and references to similar techniques I suppose. Wqwt (talk) 03:13, 25 January 2022 (UTC)[reply]

Poor example[edit]

The example, although it technically shows the method, is unfortunate in that it does not actually result in less bytes. Jtgd (talk) 07:17, 21 April 2014 (UTC)[reply]

Byte pair encoding doesn't work well on very short strings. You need a few byte pairs that appear several times to get a noticeable improvement, and this might need a kilobyte of text. What's a suitably repetitive string I could use? --Damian Yerrick (talk) 23:02, 11 April 2015 (UTC)[reply]
True. The original article uses a hash table and pair table. Actually, BPE does work for short strings (with repeated pairs) because it's so simple.
The example should be modified to state it is simplified for educational purposes. Wqwt (talk) 03:08, 25 January 2022 (UTC)[reply]