User talk:BryceMW-CA/Drafts/Replay System

Potential references

Here is an excerpt from a Stack Overflow answer by Peter Cordes that describes some reasons why replay can happen. All of the links in it point to other Stack Overflow posts, so finding good sources still seems difficult, but I trust it to be reasonably accurate. (Do answers that report actual experiments count as primary sources because they are original research, or as secondary sources because Intel/AMD made the chips and the answers merely document what the chips do?)

The RS can replay uops in a few cases, e.g. for the other half of a cache-line-split load, or if it was dispatched in anticipation of load data arriving, but in fact it didn't. (Cache miss or other conflicts like "Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?") Or when a load port speculates that it can bypass the AGU before starting a TLB lookup to shorten pointer-chasing latency with small offsets - "Is there a penalty when base+offset is in a different page than the base?"
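
To make two of the address conditions in the excerpt concrete, here is a minimal sketch of the arithmetic only. It does not model the replay mechanism itself. The 64-byte line size and 4 KiB page size are assumptions (typical for current x86 parts), and the function names are hypothetical, chosen just for this illustration.

```python
# Illustrative address arithmetic only; not a model of the hardware.
CACHE_LINE_BYTES = 64   # assumption: typical x86 cache-line size
PAGE_BYTES = 4096       # assumption: 4 KiB pages


def is_cache_line_split(addr: int, size: int) -> bool:
    """Return True if a load of `size` bytes at `addr` spans two cache lines."""
    first_line = addr // CACHE_LINE_BYTES
    last_line = (addr + size - 1) // CACHE_LINE_BYTES
    return first_line != last_line


def crosses_page(base: int, offset: int) -> bool:
    """Return True if base+offset lies in a different page than base."""
    return (base // PAGE_BYTES) != ((base + offset) // PAGE_BYTES)


if __name__ == "__main__":
    # An 8-byte load at byte 60 covers bytes 60..67, so it touches two lines.
    print(is_cache_line_split(60, 8))
    # A base 8 bytes below a page boundary plus offset 16 lands in the next page.
    print(crosses_page(0x1FF8, 16))
```

A load for which `is_cache_line_split` is true is the kind the excerpt calls a cache-line-split load, and `crosses_page` is the condition asked about in the last linked question. Whether a replay actually happens in either case depends on the microarchitecture.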

— Bryce (Talk) 14:42, 3 June 2024 (UTC)

"The allocator's job is threefold: 1- specify the port(s) on which a uop should be executed, 2- specify where to fetch the operands of each uop from (ROB or bypass network), 3- allocate for each uop entries in the ROB and the RS (this particular step is called issuing)..." - Hadi Brais — Bryce (Talk) 14:45, 11 June 2024 (UTC)[reply]

"Terminology: the multiply result doesn't go into the ROB. It goes over the forwarding network to whatever other uops read it, and goes into the PRF." - Peter Cordes — Bryce (Talk) 13:58, 13 June 2024 (UTC)[reply]