Comparison of CPU microarchitectures

The following is a comparison of CPU microarchitectures.

Microarchitecture	Year	Pipeline stages	Features
Elbrus-8S	2014		VLIW, Elbrus (proprietary, closed) version 5, 64-bit
AMD K5	1996	5	Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming^[a]
AMD K6	1997	6	Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming^[b]
AMD K6-III	1999		Branch prediction, speculative execution, out-of-order execution^[1]
AMD K7	1999		Out-of-order execution, branch prediction, Harvard architecture
AMD K8	2003		64-bit, integrated memory controller, 16-byte instruction prefetching
AMD K10	2007		Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching
ARM7TDMI (-S)	2001	3
ARM7EJ-S	2001	5
ARM810		5	Static branch prediction, double-bandwidth memory
ARM9TDMI	1998	5
ARM1020E		6
XScale PXA210/PXA250	2002	7
ARM1136J(F)-S		8
ARM1156T2(F)-S		9
ARM Cortex-A5		8	Multi-core, single issue, in-order
ARM Cortex-A7 MPCore		8	Partial dual-issue, in-order, 2-way set associative level 1 instruction cache
ARM Cortex-A8	2005	13	Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode
ARM Cortex-A9 MPCore	2007	8–11	Out-of-order, speculative issue, superscalar
ARM Cortex-A15 MPCore	2010	15	Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar
ARM Cortex-A53	2012		Partial dual-issue, in-order
ARM Cortex-A55	2017	8	In-order, speculative execution
ARM Cortex-A57	2012		Deeply out-of-order, wide multi-issue, 3-way superscalar
ARM Cortex-A72	2015
ARM Cortex-A73	2016		Out-of-order superscalar
ARM Cortex-A75	2017	11–13	Out-of-order superscalar, speculative execution, register renaming, 3-way
ARM Cortex-A76	2018	13	Out-of-order superscalar, 4-way pipeline decode
ARM Cortex-A77	2019	13	Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache
ARM Cortex-A78	2020	13	Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instructions per cycle, branch prediction, L3 cache
ARM Cortex-A710	2021	10
ARM Cortex-X1	2020	13	5-wide decode out-of-order superscalar, L3 cache
ARM Cortex-X2	2021	10
ARM Cortex-X3	2022	9
ARM Cortex-X4	2023	10
AVR32 AP7		7
AVR32 UC3		3	Harvard architecture
Bobcat	2011		Out-of-order execution
Bulldozer	2011	20	Shared multithreaded L2 cache, multithreading, multi-core, around 20-stage pipeline, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, Virtualization, Turbo Core, FlexFPU which uses simultaneous multithreading^[2]
Piledriver	2012		Shared multithreaded L2 cache, multithreading, multi-core, around 20-stage pipeline, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, Virtualization, FlexFPU which uses simultaneous multithreading,^[2] up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core
Steamroller	2014		Multi-core, branch prediction
Excavator	2015	20	Multi-core
Zen	2017	19	Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache
Zen+	2018	19	Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 2	2019	19	Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 3	2020	19	Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache
Zen 4	2022		Multi-chip module, multi-core, superscalar, L3 cache
Crusoe	2000		In-order execution, 128-bit VLIW, integrated memory controller
Efficeon	2004		In-order execution, 256-bit VLIW, fully integrated memory controller
Cyrix Cx5x86	1995	6^[3]	Branch prediction
Cyrix 6x86	1996		Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution
DLX		5
eSi-3200		5	In-order, speculative issue
eSi-3250		5	In-order, speculative issue
EV4 (Alpha 21064)			Superscalar
EV7 (Alpha 21364)			Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller
EV8 (Alpha 21464)			Superscalar design with out-of-order execution
65k			Ultra-low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds
P5 (Pentium)	1993	5	Superscalar
P6 (Pentium Pro)	1995	14	Speculative execution, register renaming, superscalar design with out-of-order execution
P6 (Pentium II)	1997	14^[4]	Branch prediction
P6 (Pentium III)	1999	14^[4]
Intel Itanium "Merced"	2001		Single core, L3 cache
Intel Itanium 2 "McKinley"	2002	11^[5]	Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology
Intel NetBurst (Willamette)	2000	20	2-way simultaneous multithreading (Hyper-threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order
NetBurst (Northwood)	2002	20	2-way simultaneous multithreading
NetBurst (Prescott)	2004	31	2-way simultaneous multithreading
NetBurst (Cedar Mill)	2006	31	2-way simultaneous multithreading
Intel Core	2006	12	Multi-core, out-of-order, 4-way superscalar
Intel Atom		16	2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming
Intel Atom Oak Trail			2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache
Intel Atom Bonnell	2008		SMT
Intel Atom Silvermont	2013		Out-of-order execution
Intel Atom Goldmont	2016		Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache
Intel Atom Goldmont Plus	2017		Multi-core
Intel Atom Tremont	2019		Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Intel Atom Gracemont	2021		Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Nehalem	2008	14	2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Sandy Bridge	2011	14	2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, 2 threads per core, Turbo Boost
Intel Haswell	2013	14–19	SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme)
Broadwell	2014	14–19	Multi-core, multithreading
Skylake	2015	14–19	Multi-core, L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models, on-package PCH on U, Y, m3, m5 and m7 models, 5-wide superscalar/5-issue
Kaby Lake	2016	14–19	Multi-core, L4 cache on certain low and ultra-low power models (Kaby Lake-U and Kaby Lake-Y)
Intel Sunny Cove	2019	14–20	Multi-core, 2-way multithreading, large out-of-order execution engine, 5-wide superscalar/5-issue
Intel Cypress Cove	2021	14	Multi-core, 5-wide superscalar/6-issue, large out-of-order execution engine, big core design
Intel Willow Cove	2020		Multi-core
Intel Golden Cove	2021		Multi-core
Intel Xeon Phi 7120x	2013	7-stage integer, 6-stage vector	Multi-core, multithreading, 4 hardware-based simultaneous threads per core which can't be disabled unlike regular HyperThreading, Time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, Floating-point accelerator, 512-bit wide Vector-FPU
LatticeMico32	2006	6	Harvard architecture
Nvidia Denver	2014		Multicore, superscalar, 2-way decode, L2
Nvidia Carmel	2018		Multicore, 10-way superscalar, L3
POWER1	1990		Superscalar, out-of-order execution
POWER3	1998		Superscalar, out-of-order execution
POWER4	2001		Superscalar, speculative execution, out-of-order execution
POWER5	2004		2-way simultaneous multithreading, out-of-order execution, integrated memory controller
IBM POWER6	2007		2-way simultaneous multithreading, in-order execution, up to 5 GHz
IBM POWER7+			Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit
IBM POWER8	2013	15–23	Superscalar, L4 cache
IBM POWER9	2017	12–16	Superscalar, out-of-order execution, L4 cache
IBM Power10	2021		Superscalar
IBM Cell	2006		Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution
IBM Cyclops64			Multi-core, multithreading, 2 threads per core, in-order
IBM zEnterprise zEC12	2012	15/16/17	Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache
IBM A2		15	Multi-core, 4-way simultaneous multithreading
PowerPC 401	1996	3
PowerPC 405	1998	5
PowerPC 440	1999	7
PowerPC 470	2009	9	Symmetric multiprocessing (SMP)
PowerPC e300		4	Superscalar, branch prediction
PowerPC e500		Dual 7 stage	Multi-core
PowerPC e600		3-issue 7 stage	Superscalar out-of-order execution, branch prediction
PowerPC e5500	2010	4-issue 7 stage	Out-of-order, multi-core
PowerPC e6500	2012		Multi-core
PowerPC 603		4	5 execution units, branch prediction, no SMP
PowerPC 603q	1996	5	In-order
PowerPC 604	1994	6	Superscalar, out-of-order execution, 6 execution units, SMP support
PowerPC 620	1997	5	Out-of-order execution, SMP support
PWRficient PA6T	2007		Superscalar, out-of-order execution, 6 execution units
R4000	1991	8	Scalar
StrongARM SA-110	1996	5	Scalar, in-order
SuperH SH2		5
SuperH SH2A	2006	5	Superscalar, Harvard architecture
SPARC			Superscalar
hyperSPARC	1993		Superscalar
SuperSPARC	1992		Superscalar, in-order
SPARC64 VI/VII/VII+	2007		Superscalar, out-of-order^[6]
UltraSPARC	1995	9
UltraSPARC T1	2005	6	Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU
UltraSPARC T2	2007	8	Open source, multithreading, multi-core, 8 threads per core
SPARC T3	2010	8	Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator
Oracle SPARC T4	2011	16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order, 4 MB L3 cache, hardware random number generator
Oracle SPARC T5	2013	16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, RAS features, 16 cryptography units per chip, hardware random number generator
Oracle SPARC M5		16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order execution, 48 MB L3 cache, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator
Fujitsu SPARC64 X			Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 24 MB L2 cache, RAS features
Imagination Technologies MIPS Warrior
VIA C7	2005		In-order execution
VIA Nano (Isaiah)	2008		Superscalar out-of-order execution, branch prediction, 7 execution units
WinChip	1997	4	In-order execution

Notes

[a] According to AMD's K5 data sheet. The design incorporates many ideas and functional parts from AMD's Am29000 32-bit RISC microprocessor design.
[b] According to AMD's K6 data sheet. The design is based on NexGen's Nx686 and is therefore not a direct successor to the K5.

References

[1] "Products We Design". amd.com. Retrieved 19 January 2014.
[2] "wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer". cdn3.wccftech.com. Archived from the original on 17 October 2013. Retrieved 19 January 2014.
[3] "Cyrix 5x86 ("M1sc")". pcguide.com. Retrieved 19 January 2014.
[4] "Computer Science 246: Computer Architecture" (PDF). Harvard University. Archived from the original (PDF) on 24 December 2013. Retrieved 23 December 2013. P6 pipeline.
[5] Intel Itanium 2 Processor Hardware Developer's Manual (2002). p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf. Retrieved 28 November 2011.
[6] "Multi Core Processor SPARC64 Series : Fujitsu Global". fujitsu.com. Retrieved 19 January 2014.
