Talk:FMA instruction set

Incompatible extension sets FMA4 and FMA3

I have deleted the statement regarding the impossibility of implementing both extensions simultaneously. Either you can or you can't. Near the bottom of the article, it is stated that it "is also possible that future processors will support both forms", which contradicts the deleted statement, so I have gone ahead and made the change. Swapshop1 (talk) 00:54, 23 April 2011 (UTC)

Misspelled name

George Woltmann should be George Woltman. --Michael Rolle 02:04, 6 October 2012 (UTC) -- Preceding unsigned comment added by Mrolle (talk · contribs)

Out of date tag

An editor tagged this article with 'out of date' in June 2012, apparently concerned that we do not indicate whether future Intel processors will support the FMA4 instruction. In my opinion we have accurately stated the level of uncertainty about Intel FMA support based on the references currently in the article, so I have removed the tag. Per WP:CRYSTAL it is not up to Wikipedia to predict what Intel will do in the future. It appears that chips based on the Haswell microarchitecture will be the first Intel chips to support either FMA3 or FMA4. Haswell chips will not be on the market until June 2013. AMD's Piledriver chips are already available and are said to support both FMA3 and FMA4. For example, see New “Bulldozer” and “Piledriver” Instructions (PDF), AMD. EdJohnston (talk) 07:55, 19 March 2013 (UTC)

AMD Zen and FMA4 support

https://sourceware.org/ml/binutils/2015-03/msg00078.html says: "TBM, FMA4, XOP, LWP: ISAs are not supported." Thus it seems that FMA4 is not supported by AMD Zen. Otherwise an explanation needs to be added. Vincent Lefèvre (talk) 23:02, 21 March 2015 (UTC)

Hi Lefevre, I made the edit. It's the first one I've made, so I'm not entirely sure what I should be doing now (with respect to an explanation). The exchange does say that FMA4 has been dropped from Zen, but if you check the patch they are referring to, you will see that FMA4 is actually supported:

Gains CpuADX|CpuRdSeed|CpuSMAP|CpuSHA|CpuXSAVEC|CpuXSAVES|CpuClflushOpt|CpuCLZERO

Loses CpuXOP|CpuLWP|CpuTBM

Please check the patch; it is there, clear as day (it's the znver.patch file you want to read). -- Preceding unsigned comment added by 151.228.184.222 (talk) 11:49, 23 March 2015 (UTC)

I emailed the programmer (Ganesh at AMD) about the FMA4 confusion. He said CpuFMA4 was incorrectly specified in the patch. -- Preceding unsigned comment added by 202.139.145.148 (talk) 00:21, 5 June 2015 (UTC)

I have tested the Ryzen processor. FMA4 works correctly, but CPUID reports FMA4 as not supported. I think we must regard FMA4 as not officially supported on Ryzen. I have updated the history with this. Agnerf (talk) 11:34, 2 May 2017 (UTC)

I wonder whether the fact that CPUID does not report FMA4 as supported is a bug (e.g., they just forgot) or whether there is some reason. For instance, it might work in most cases but fail in some corner cases, or it might simply not have been fully tested or proved. I would not say that it works after just a few tests. Things can be very complex with dynamic rounding modes, interrupts, and so on. Better to rely on official information. Vincent Lefèvre (talk) 17:11, 2 May 2017 (UTC)
I've just updated the article, with a reference. As I guessed, it has been shown that FMA4 can give wrong results on this processor. According to one of the comments, FMA4 might have been dropped late in the process and they may not have had the time to modify the decoder to trigger an interrupt. Vincent Lefèvre (talk) 22:17, 9 May 2017 (UTC)
This probably means that AMD and Intel will converge to FMA3 only. Vincent Lefèvre (talk) 22:20, 9 May 2017 (UTC)
That was the idea with Zen. They removed most of the AMD-only extensions, such as FMA4, XOP and TBM. Carewolf (talk) 10:54, 12 May 2017 (UTC)
Do any of the other removed instructions still work too? Carewolf (talk) 18:33, 2 May 2017 (UTC)

At least one Ryzen Pro processor lists FMA4 as supported. (I was comparing the Ryzen 1700X with the Ryzen Pro 1700X.) MoHaG (talk) 20:10, 20 December 2017 (UTC)

It would be interesting to know whether this is confirmed by the CPUID instruction on this processor. Vincent Lefèvre (talk) 23:02, 20 December 2017 (UTC)

Purpose of the instruction set?

It would be nice to have a section on what the instruction set is for.
I know Wikipedia is kinda like the Microsoft help hotline nowadays: giving you lots of perfectly correct but utterly useless “robot” information that focuses on data memorization and acts as if the concept of understanding itself were foreign. Kinda like the current educational system that breeds such robots. But it would still be nice to have a section for us leftover actual humans.
2A0A:A540:C250:0:200A:BD44:6939:8D (talk) 19:30, 27 September 2018 (UTC)

That is what the link to fused multiply–add is for. Carewolf (talk) 19:44, 1 October 2018 (UTC)

An editor questions this statement for verifiability reasons

The "AMD" bullet point in "CPUs with FMA3" reads:

AMD introduced FMA3 support in processors starting with Piledriver architecture for compatibility reasons.[1] The 2nd generation APU processors based on "Trinity" (32nm) supporting FMA3 instructions were launched May 15, 2012. The 2nd generation Bulldozer processors with Piledriver cores supporting FMA3 instructions were launched October 23, 2012.

Not only does that first sentence not really make sense without further explanation (what "compatibility reasons"?), but it's also unclear how the cited reference is relevant, since it never mentions Piledriver, only Trinity. By chasing wikilinks I was eventually able to discern that Trinity is a Piledriver-based processor. The odd way it's mentioned here seemed to imply the opposite, since the text explicitly notes that second-generation Bulldozer processors have Piledriver cores after making no mention of that same fact when discussing Trinity. In reality, Trinity is simply the first example of a Piledriver-based processor. You could argue I was reading too much into the information presented, and you'd be correct. But long story short, I'm going to cut all that down to:

AMD introduced FMA3 support in processors starting with the Piledriver microarchitecture. Piledriver-based "Trinity" (32nm) APU processors were launched May 15, 2012, and second-generation Bulldozer processors with Piledriver cores followed on October 23, 2012.[2]

I just wanted to leave this note to document the issues that caused me to radically trim the content of that paragraph, and demonstrate how sometimes less really is more. -- FeRDNYC (talk) 18:47, 10 January 2020 (UTC)

I agree that the article is a bit ugly. Perhaps the History section (which explains things) should be moved earlier. The "for compatibility reasons" could also be moved to the History section, where it really belongs, IMHO. Vincent Lefèvre (talk) 21:35, 10 January 2020 (UTC)

References

  1. ^ Maffeo, Robin (March 1, 2012). "AMD and the Visual Studio 11 Beta". AMD. Archived from the original on November 9, 2013. Retrieved 2018-11-07.
  2. ^ Maffeo, Robin (March 1, 2012). "AMD and the Visual Studio 11 Beta". AMD. Archived from the original on November 9, 2013. Retrieved 2018-11-07.

Test code

Check whether fnmadd computes -(a*b + c) or -(a*b) + c. Both explanations can be found.

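#include <stdio.h>
#include <immintrin.h>

// Build with FMA enabled, e.g. -mavx2 -mfma (GCC/Clang) or /arch:AVX2 (MSVC).

// Only lanes 0 and 1 carry test values; lanes 2 and 3 are zero.
static __m256d const A = _mm256_setr_pd (3.16, 1.73, 0.0, 0.0);
static __m256d const B = _mm256_setr_pd (3.17, 1.74, 0.0, 0.0);
static __m256d const C = _mm256_setr_pd (5.00, 1.50, 0.0, 0.0);

// Print a label followed by the four double lanes of r.
static void
print (char const* str, __m256d const& r)
{
	double v[4];
	_mm256_storeu_pd (v, r);  // portable lane access (m256d_f64 is MSVC-only)
	printf ("%-10.10s %10.6G %10.6G %10.6G %10.6G\n", str, v[0], v[1], v[2], v[3]);
}

int main()
{
	__m256d D;

	D = _mm256_fmadd_pd    (A, B, C); print ("fmadd"   , D);
	D = _mm256_fmsub_pd    (A, B, C); print ("fmsub"   , D);
	D = _mm256_fnmadd_pd   (A, B, C); print ("fnmadd"  , D);
	D = _mm256_fnmsub_pd   (A, B, C); print ("fnmsub"  , D);
	D = _mm256_fmaddsub_pd (A, B, C); print ("fmaddsub", D);  // even lanes a*b - c, odd lanes a*b + c
	D = _mm256_fmsubadd_pd (A, B, C); print ("fmsubadd", D);  // even lanes a*b + c, odd lanes a*b - c
	return 0;
}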