Jump to content

Talk:Asymmetric numeral systems

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

History

[edit]

Here is a summary of history of ANS: http://encode.ru/threads/2612-ELI5-FSE-ANS?p=50332&viewfull=1#post50332 "The first variant (uABS, originally with switched encoder and decoder) was born in my physics MSc thesis mid-2006 (Polish: http://chaos.if.uj.edu.pl/%7Ekarol/prace/Duda06.pdf , alongside Maximal Entropy Random Walk). Both the concept and name was inspired by earlier working on Complex Base Numeral Systems (for my cs and math MScs). I have translated it into English as first version of http://arxiv.org/abs/0710.3861 (October 2007), early tANS variant can be seen in its later versions (end of 2007). The comp.compression discussion started by Mark Nelson was crucial - thanks of which Matt Mahoney found out about and implemented fpaqb and fpaqc (November 2007): https://groups.google.com/forum/#!topic/comp.compression/_cYQFMijqXE Regarding Andrew Polar, I have just found email from him about his implementation: 22nd April 2008. ... 2009 Wolfram Simulator ... ... a few years later intensive: 29.10.2013 Yann's thread: http://encode.ru/threads/1821-Asymetric-Numeral-System 11.11.2013 rewritten arxiv - http://arxiv.org/abs/1311.2540 26.11.2013 rABS: https://groups.google.com/d/msg/comp.compression/ZMKNBKp9rZI/c5CdgBXRx2cJ 16.12.2013 FSE: http://fastcompression.blogspot.com/2013/12/finite-state-entropy-new-breed-of.html 31.12.2013 rANS: https://groups.google.com/d/msg/comp.compression/ZMKNBKp9rZI/DnfEbRrHBUUJ 7.01.2014 updated arxiv with tANS, rANS names (inspired by mRNA, tRNA etc.) 2.02.2014 Fabian's rANS: https://fgiesen.wordpress.com/2014/02/02/rans-notes/ " 91.198.177.113 (talk) 12:48, 23 October 2016 (UTC)[reply]

Explaination of tANS?

[edit]

The current explanation of tANS is scarce, I see someone has added enigmatic example from Andrew Polar article. I have this diagram with example of complete process (used on encode.ru, then https://arxiv.org/abs/1612.04662 ) - it can be used if isn't too complex.

This addition also talks about tANS for adaptive compression - the only adaptive ANS I am aware of use rANS variant (LZNA, LZA), as tANS would require rebuilding the tables. And maybe general remarks could be taken to the "Remarks" section?

Example of generation of tANS tables and applying them for stream decoding for m = 3 size alphabet and L = 16 states.

== crackpot ==

A crackpot wrote this article about his own "work". Consider deletion and banning "Jarek Duda" 92.193.99.247 (talk) 09:53, 31 October 2018 (UTC)[reply]

Hi Michael, thanks for the comment and a dozen of papers about Huffman coding. Your removing information about their modern alternatives from Huffman Wikipedia article will not change the fact that authors of data compressors now choose more efficient methods instead - maybe it is time to move on from this defense of your articles, and work on something people will use. All the best. Jarek Duda (talk) 11:50, 31 October 2018 (UTC)[reply]
@Jarek Duda: please be aware that revealing personal information (right or wrong) is against the policy. See WP:OUTING. Retimuko (talk) 16:00, 31 October 2018 (UTC)[reply]
So how to defend from someone trying to anonymously vandalize to defend own specialty from modern alternatives, like calling crackpottery a method now used to encode data by users of e.g. Apple, Facebook, Linux? He has dominated Huffman coding article, we had a long fight about it I finally gave up - he removed examples of its suboptimality and ways it is now repaired - leaving only mysterious "However, although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods". In contrast, the only situation it is optimal is generic: uncorrelated sequence with power of 1/2 probabilities. Jarek Duda (talk) 16:33, 31 October 2018 (UTC)[reply]
I could have patented ANS and be rich now, instead I have decided to leave it free as it means global savings we all use now - finally the only I got for ANS are nasty relations with Google. And additionally I need to fight for access to objective information about currently used methods with a guy who stayed in the past and cannot move on - please give me a break. Jarek Duda (talk) 16:53, 31 October 2018 (UTC)[reply]
First, it seems to me that both of you have conflict of interest (see WP:COI), and should not edit these articles at all. Second, by revealing the name of an anonymous editor here you violate the outing policy. Retimuko (talk) 17:30, 31 October 2018 (UTC)[reply]
I nearly haven't touched ANS article since its creation, and it would be great if someone competent and objective improve it. I also haven't touched Huffman article for years - which currently is not objective: undermines weaknesses and silence currently used alternatives, due to this nonobjective person pursuing personal interests - whose career depends on people seeing Huffman as the only way. It should be repaired by someone objective and locked from this person. My apologies for revealing personal information - you can remove above discussion to repair it. Jarek Duda (talk) 17:58, 31 October 2018 (UTC)[reply]
Regarding revealing personal information being against policy, notice that mine was in response to his using my personal information - so what is the proper defense from anonymous personal attack? I don't have doubts about its authorship - we have our history and he is probably the last person trying to erase information about weaknesses and alternatives of Huffman. Jarek Duda (talk) 20:31, 31 October 2018 (UTC)[reply]
For your information I'm not "Michael". I have just noticed your frequent questionable posts on reddit and stackexchange (reeking of crackpottery), which lead me to this (which was immediately clear can only be a self-authored) article and ultimately the talk section. No one has revealed your personal information but yourself given that you are quoting yourself in the article and you're using your actual name as your username (to create the article in the first place). So are we going to write wikipedia article now about every paper we publish?. 92.193.26.245 (talk) 11:33, 1 November 2018 (UTC)[reply]
Reading the discussion you linked, it appears more people have a problem with people using wikipedia to promote their own work and push their status. 92.193.26.245 (talk) 11:36, 1 November 2018 (UTC)[reply]
Interesting argumentation from a person wanting to erase article about method currently used by nearly everybody (and personally insulting me still without giving any specific argument) - does it apply to every scientific article? Generally there is a problem with competent authors for specialist articles - often the best person to create such initial article is the author of given method gaining popularity - writing in objective and accessible way, then others should improve it - like it has happened here. And generally scientists have responsibility to criticize or promote what needs it in their area of expertise, as there are no better persons to do that - no mater who is the author: scientists need to be objective, and I do my best for that. And Wikipedia is not a place for anonymous personal attacks: you could introduce and argument in those different places instead. Jarek Duda (talk) 12:04, 1 November 2018 (UTC)[reply]
I disagree. If everyone was to do what you are doing, any PhD student would be writing wikipedia articles about their current papers, while specifically mentioning their name in the introductory sentence ("Asymmetric numeral systems (ANS)[1] is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda[2] from Jagiellonian University") as if it's a milestone in the field. It's blatantly obvious what you are doing. It's amazing that you would think other people are stupid enough to buy the above paragraph of "competent authors for specialist articles", "writing in objective and accessible way", "scientists need to be objective". I've read the discussion you linked and you've been called out for misunderstanding what wikipedia is before. It's not a platform to promote your papers. Content needs to be relevant for wikipedia. I think you are abusing wikipedia for your personal promotion (and feigning injury when people mention this). 92.193.70.49 (talk) 10:05, 2 November 2018 (UTC)[reply]

If you would look in the history of the article, here is the initial version I have prepared two years ago, not mentioning the author, only one late peer-reviewed article I am one of 4 authors, not my 3 earlier articles introducing ANS - I see still aren't there. I don't control how others change this article, only would react if seeing a clear vandalism. I don't care if it directly mentions the author - its sole purpose was objective introduction to currently used methods.

In contrast, here is the text you have removed from Huffman article - objectively describing its weaknesses and currently used alternatives, you have left only misleading understatement: "However, although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods. ":

"Specifically, Huffman coding is optimal only if the probabilities of symbols are natural powers of 1/2. This is usually not the case. As an example, a symbol of probability 0.99 carries only log 2 ⁡ ( 1 / 0.99 ) ≈ 0.014 {\displaystyle \log _{2}(1/0.99)\approx 0.014} {\displaystyle \log _{2}(1/0.99)\approx 0.014} bits of information, but Huffman coding encodes each symbol separately and therefore the minimum length for each symbol is 1 bit. This sub-optimality is repaired in arithmetic coding and recent faster Asymmetric Numeral Systems family of entropy codings. "

While my industry ANS experience was the only "gratitude" being trials to get exclusivity for my work, my earlier academic experience: with people who built career on Huffman and AC, was a few years of pretending that ANS does not exist. But now when it is nearly everywhere, I don't believe a new determined crusader has appeared. Jarek Duda (talk) 10:59, 2 November 2018 (UTC)[reply]

encode.su forum as a source

[edit]

What claim exactly is supposed to be confirmed by this source in the first sentence of the lead? It seems to me that we need a better source. Retimuko (talk) 04:20, 26 July 2019 (UTC)[reply]

This link contains a hundred of ANS related links, and is updated by ANS author - there is no other such source. Also, this forum gathers top data compression specialists, also from Google, Facebook. What is a better place for such dynamic list of related materials? 89.64.56.69 (talk) 05:07, 26 July 2019 (UTC)[reply]
What claim it is supposed to confirm? Retimuko (talk) 05:23, 26 July 2019 (UTC)[reply]
The claim it is a reference for: of being "used in data compression since 2014" - the next paragraph of Wikipedia article contains later popular compressors, encode.su link contains also more historical, like 2014 Yann Collet's zhuff - predecessor of zstd. 89.64.56.69 (talk) 05:33, 26 July 2019 (UTC)[reply]
I would think that we need an independent source that would confirm the claim that this method was in use since 2014. I believe we should leave the request for a better source in place. Maybe this link belongs to the "External links" section, which, by the way, looks quite overloaded, and, probably, is in violation of WP:NOTLINKFARM Retimuko (talk) 05:55, 26 July 2019 (UTC)[reply]
The used practical data compressors do not come from peer-reviewed scientific papers, but blogs, github, forums - especially encode.ru, e.g. here is Yann Collet's thread starting application of ANS: https://encode.ru/threads/1821-Asymetric-Numeral-System . Here is zhuff's webpage, without even a date: http://fastcompression.blogspot.com/p/zhuff.html Jarek Duda (talk) 06:03, 26 July 2019 (UTC)[reply]
I am not sure what point are you trying to make. Sounds like an admission that the subject is not notable enough to be mentioned in some reputable magazine or something like that. Perhaps, we should remove the "used since" claim, that is hard to verify. What does it even mean without an independent source? Used by whom? How extensively? It is one thing to claim that it was published on a particular date, but quite another thing to claim that it was actually used. Retimuko (talk) 06:26, 26 July 2019 (UTC)[reply]
No, it is not hard to verify, for example as we have archive.org, zhuff's: https://web.archive.org/web/20140915000000*/http://fastcompression.blogspot.com/p/zhuff.html In contrast, it is hard to call methods used by nearly all electronic devices as "not notable". These are just two different worlds: practical programmers focused on making working programs, and academics focused on getting points from publications (evaluated by other theoreticians) - these two groups have separate motivations and skillsets. I am in the latter group, but classifying that what is the former group doing as "not notable" due to ways they present their work is not only discriminatory, but just wrong - we are using their work. Jarek Duda (talk) 06:50, 26 July 2019 (UTC)[reply]
The link you gave does not really confirm usage, but just a mention in a blog. So, perhaps, we could claim that the method was known since 2014. But blogs and forums are not reliable sources. Let's consider Wikipedia policies. If something is truly notable there should be no problem to find a mention of it in a reliable secondary source. This is true about both theoretical and practical work. The article should state verifiable facts. If we cannot find an independent secondary source stating that this method is in use since 2014, then we should not make such a claim. Retimuko (talk) 17:41, 26 July 2019 (UTC)[reply]
Peer-review is made by anonymous reviewers and it happens to such papers to be retracted - how can we trust them? Zhuff points FSE Github, which goes back to 2013 with available code. You have documented statements of authors e.g. of zstd, ANS, of Matt Mahoney and many others known in data compression, you can verify their dates with archive.org. Sure there is missing written history, this encode.su list is the best source now - wanting to contribute you can fill this gap. Jarek Duda (talk) 19:46, 26 July 2019 (UTC)[reply]

Are you questioning the peer review process? Should, for instance, reputable journals just accept papers for publication with no reviews? Yes, reliable sources publish corrections and retractions sometimes. And that is one of the reasons we consider them reliable - they have reputation and history of checking things and correcting mistakes if necessary. Blogs, forums and other self-published or user-generated content are not reliable sources. Why are we going in circles about such a basic Wikipedia policy? If this subject is truly notable it must be easy to find a mention of it in an independent reliable source. Claims in the article must be verifiable using such sources. Retimuko (talk) 18:01, 27 July 2019 (UTC)[reply]

This is material documenting a statement - discussions, bogs, additionally verifiable by archive.org, for example Yann Collet's (author of Zstandard) 2014 post on zhuff: http://fastcompression.blogspot.com/2013/12/zhuff-v09-or-first-fse-application.html If you undermine their documentary value: claim that all these people are lying, please provide some support for your belief - of the proper year ANS was first used in data compressors. 1st January 2014 has also started the Google ANS patent story, described e.g. in https://arstechnica.com/features/2018/06/inventor-says-google-is-patenting-work-he-put-in-the-public-domain/ Jarek Duda (talk) 03:59, 28 July 2019 (UTC)[reply]
I am not claiming that they are lying. Please learn basic principles of Wikipedia. We need independent reliable secondary sources. This is a fundamental requirement. Ars Technica is considered a reliable source (see WP:RSP). So if that article you linked above confirms some claims, then let's cite it in the appropriate places. But blogs and forums are not reliable sources. And I do not want to go in circles about this anymore. Retimuko (talk) 06:48, 28 July 2019 (UTC)[reply]
Please explain why you don't considered archive.ore-ed information by authors, public discussions of creators of many currently used compressors as a reliable source? This Wikipedia article requires some placing in time, especially when it has actually started being used in compressor - do you disagree? It needs objective evidence - do you disagree that archive.org provides objective evidence? Does current Wikipedia moderation restricts to "help" in removing evidence? Jarek Duda (talk) 08:01, 28 July 2019 (UTC)[reply]

uABS implementation

[edit]

I've tried to create some javascript code that implements the uABS pseudocode. Here it is:

encode = (bitsText, p) => {

   let x = 0;
   for (let i = 0; i < bitsText.length; i++) {
       let s = bitsText[i];
       if (s === '0') {
           x = Math.ceil((x+1)/(1-p)) - 1;
       } else {
           x = Math.floor(x/p);
       }
   }
   return x;

}

decode = (x, p) => {

   let result = "";
   while (x > 0)
   {
       let s = Math.ceil((x+1)*p) - Math.ceil(x*p)  // 0 if fract(x*p) < 1-p, else 1
       if (s === 0) x = x - Math.ceil(x*p)   // D(x) = (new_x, 0)
       if (s === 1) x = Math.ceil(x*p)  // D(x) = (new_x, 1)
       result += s;
   }
   return result;

}

This seems to work at first glance but always renders an X = 0 for any bitstring that starts with '1' i.e. s = 1 gives x = 0, as does 1111... Perhaps I'm misunderstanding something? — Preceding unsigned comment added by Denshade (talkcontribs) 11:35, 31 December 2020 (UTC)[reply]

Renders? Well, you set x to 0 and then do 0/p which is 0 (or undefined depending on what P is but javascript will throw that away without saying anything because it's a trash language) then take the floor of that which is still 0 then I can't say what's happening since you never actually created a probability table, updated anything but x, and it's questionable whether s==='0' is ever true depending on just exactly what bitsText is. Read the comments and the surrounding text. The pseudocode is garbage but it's obvious it isn't a full implementation. --A Shortfall Of Gravitas (talk) 15:47, 17 May 2024 (UTC)[reply]

Renamed from plural to singular "Asymmetric numeral system"?

[edit]

I have just noticed this change by @Michael Hardy and was a bit surprised, I don't remember seeing this name in singular before, searching for "Asymmetric numeral system" there are shown nearly only plural. This is not a single fixed e.g. decimal numeral system, but plurality is literally at heart of this approach - huge freedom to choose from for each specific situation. Maybe this change deserves at least some justification, discussion?

There are many reasons for plural name: this is a family of methods, starting with widely used very different tANS and rANS variants, also e.g. uABS is very different. If focusing on one of them, there is still gigantic number of choices (e.g. ~2^2048 in popular FSE), usually changed every data block, or even every symbol - e.g. adaptive rANS applications use different practically an unique "Asymmetric numeral system" (singular) for each symbol. Jarek Duda (talk) 07:41, 8 January 2023 (UTC)[reply]

Rich Geldreich "rANS entropy coding is overrated" post

[edit]

Somebody is adding this blog post: https://richg42.blogspot.com/2023/04/rans-entropy-coding-is-overrated.html comparing vectorized RC with non-vectorized rANS. Here it is compared by Hamid Buzidi (dnd) with others e.g. available vectorized rANS: https://encode.su/threads/1821-Asymetric-Numeral-System?p=79253&viewfull=1#post79253

  C Size  ratio%     C MB/s     D MB/s     Name            (bold = pareto) MB=1.000.0000
   62755432    62.8     441.07    1897.57   TurboANX 12     
   63083999    63.1     465.35    2061.05   TurboANX 32     
   63156820    63.2    1070.86    1667.47   TurboHF 16s5    
   63378220    63.4     670.40     813.82   fse 0           
   63394436    63.4    1149.91    1734.06   TurboHF 24s5    
   63423452    63.4    1333.76    1627.15   fsehuf 0 32K     
   63541469    63.5     731.75    1639.02   rans32avx2 0 32K
   63639217    63.6     920.83    1891.04   rans32avx2 0 1M 
   63906072    63.9      90.87     970.30   sserc - RC from blog
Jarek Duda (talk) 03:05, 12 May 2023 (UTC)[reply]
My blog post compares a vectorized SSE 4.1 24-bit interleaved range decoder vs. a well-known (and well-implemented) SSE 4.1 vectorized interleaved rANS decoder. This vectorized rANS implementation is available on github (which I clearly linked to in the blog post):
https://github.com/rygorous/ryg_rans
Specifically, the SSE 4.1 rANS code I am benchmarking is here:
https://github.com/rygorous/ryg_rans/blob/master/rans_word_sse41.h
Quick benchmark on Ice Lake, compiling and running the source code on github:
SIMD Range: 64 streams: 738 MiB/sec., 436,349 bytes
SIMD Range: 16 streams: 619.1 MiB/sec.
SIMD rANS: 556.5 MiB/s, 435,626 bytes
Unrolling the SIMD rANS decoder to 16 streams vs. 8 should help make it more competitive, although I don't know for sure.
It turns out that many techniques used to accelerate vectorized rANS decoding are compatible with fast range and Huffman decoders. The inner loops of both are surprisingly similar: both have to do some expensive table gathers, fetch some bytes from the source and renormalize the lanes using quite similar instructions. After comparing the inner loops of working vectorized Huffman, range and rANS decoders, many of the perceived advantages of rANS for decoding fade away.
More work has been invested creating highly optimized decoders for rANS vs. other solutions, so in my opinion the superiority of claims for rANS for decoding vs. other approaches have been exaggerated. Also note that vectorized Huffman decoding can be easily scaled to decode 2 or 3 symbols per iteration, but I've not seen this capability in any available rANS decoder.
rANS for encoding still seems to have advantages (currently), because interleaving its output is easier/faster. However, on the flip side the patent situation makes using rANS in a commercial context in the US very problematic, according to our IP lawyer. At least one patent from a well-known tech corp describes the rANS algorithm basically to a tee. We can't safely use this algorithm in commercial code, unfortunately.
The "sserc" entry above was for my SSE 4.1 24-bit range coder, which is available in full source code form here. At its core it's a vectorized Pavlov/Subbotin range coder reduced to 24-bits and vectorized in a straightforward way, and fetching interleaved bytes much like the linked to vectorized rANS decoder does. It's full open source code is here:
https://github.com/richgel999/sserangecoding
A straightforward port of this SSE 4.1 range coder to AVX 2 is ~65% faster on Ice Lake. (So if the benchmark above gets 970 MiB/sec., AVX-2 should get roughly 1620 MiB/sec. or so, making it roughly as fast as rANS.) With AVX-2 range coding itself is not the scaling bottleneck, it's renormalizing and feeding the 8 lanes efficiently. Richgel999 (talk) 07:22, 12 May 2023 (UTC)[reply]

How did Microsoft possibly get this patent?

[edit]

I do not know how if

" The USPTO rejected its application in 2018, and Google subsequently abandoned the patent.[24] "

Than how did Microsoft possibly get a patent seeing it could be based on prior artwork, unless somehow a large amount more was added, maybe?

" In June 2019 Microsoft lodged a patent application called "Features of range asymmetric number system encoding and decoding".[25] The USPTO issued a final rejection of the application on October 27, 2020. Yet on March 2, 2021, Microsoft gave a USPTO explanatory filing stating "The Applicant respectfully disagrees with the rejections.",[26] seeking to overturn the final rejection under the "After Final Consideration Pilot 2.0" program.[27] After reconsideration, the USPTO granted the application on January 25, 2022.[28] "

https://en.wikipedia.org/wiki/Patentability#Infringement

https://en.wikipedia.org/wiki/Prior_art

https://www.gnu.org/philosophy/software-patents.html#patent-overturning

I see that it may have happened, though I may wish to

" There is reasonable allowance for speculation, suggestion, and personal knowledge on talk pages, with a view to prompting further investigation "

have further investigation into how it can be that way, and not to

"argue any point that has not met policy requirements."

https://en.wikipedia.org/wiki/Wikipedia:Talk_page_guidelines#Central_points — Preceding unsigned comment added by Other Cody (talkcontribs) 16:03, 19 January 2024 (UTC)[reply]

We have discussed this patent with data compression experts, but we were not able to understand what was supposed to be its difference from standard rANS introduced years earlier, e.g. here: https://encode.su/threads/3863-RANS-Microsoft-wins-data-encoding-patent/ Jarek Duda (talk) 08:18, 20 January 2024 (UTC)[reply]
The USPTO is overworked, staffed by non-experts, probably doesn't care, and may even accept "cash bonuses" although I doubt anything but their incompetence was required. I've read through some patents on pharmaceuticals that were so vague that they cover the drug in question and every organic chemical with a 6 or 7 membered carbon ring anywhere in it. --A Shortfall Of Gravitas (talk) 15:52, 17 May 2024 (UTC)[reply]