Manycore processors are specialist multi-core processors designed for a high degree of parallel processing. They contain a large number of simpler, independent processor cores (e.g. tens, hundreds, or thousands). Manycore processors are used extensively in embedded computers and high-performance computing. As of July 2016, the world's fastest supercomputer (as ranked by the TOP500 list), the Chinese Sunway TaihuLight, derives its performance from 40,960 SW26010 manycore processors, each containing 260 cores.
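Multiplying the processor count by the cores per processor gives TaihuLight's total core count:

```latex
40{,}960 \ \text{processors} \times 260 \ \frac{\text{cores}}{\text{processor}} = 10{,}649{,}600 \ \text{cores}
```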
Contrast with multicore architecture
Manycore processors differ from multi-core processors in that they are optimised from the outset for a higher degree of explicit parallelism and for higher throughput (or lower power consumption), at the expense of higher latency and lower single-thread performance.
The broader category of multi-core processors, by contrast, is usually designed to run both parallel and serial code efficiently. These processors therefore place more emphasis on shared memory and on high single-thread performance, for example by devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches. These techniques spend hardware resources at run time on extracting implicit parallelism from a single thread. Such processors are used in systems where they have evolved continuously (with backward compatibility) from single-core processors. They usually have only a 'few' cores (e.g. 2, 4, or 8), and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.
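The trade-off between a few fast cores and many simpler ones can be illustrated with Amdahl's law, a standard model that this article does not itself use. The sketch below compares two hypothetical chips: a multicore chip with 4 fast cores, and a manycore chip with 64 simpler cores that are assumed, for illustration only, to deliver twice the aggregate throughput. The function `run_time` and all core counts and speeds are invented for this example and do not describe any real processor.

```python
def run_time(serial_fraction, cores, per_core_speed):
    """Amdahl-style execution time for a job that takes 1.0 time unit on a
    baseline core of speed 1.0. The serial part runs on a single core; the
    parallel part is divided evenly over all cores."""
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction / per_core_speed
            + parallel_fraction / (cores * per_core_speed))

# Hypothetical chips (invented numbers): 4 fast cores versus 64 simpler
# cores assumed to give twice the aggregate throughput (64 * 0.5 = 32 vs 16).
MULTICORE = {"cores": 4, "per_core_speed": 4.0}
MANYCORE = {"cores": 64, "per_core_speed": 0.5}

for f in (0.0, 0.01, 0.1, 0.5):
    t_multi = run_time(f, **MULTICORE)
    t_many = run_time(f, **MANYCORE)
    winner = "manycore" if t_many < t_multi else "multicore"
    print(f"serial fraction {f:4.2f}: multicore {t_multi:.4f}, "
          f"manycore {t_many:.4f} -> {winner}")
```

With these assumed figures the manycore chip wins only while the serial fraction stays below 1/57 (about 1.75%). Beyond that point the few fast cores finish first, which is why general-purpose multicore designs invest in single-thread performance.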
Cache coherency is an issue that limits the scaling of multicore processors to large core counts. Manycore processors may avoid it with techniques such as message passing, scratchpad memory, DMA, a partitioned global address space, or read-only/non-coherent caches. A manycore processor that uses a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (as seen, for example, in the tooling developed for TrueNorth).
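Optimising the spatial layout of tasks can be framed as a small placement problem. The sketch below assumes a hypothetical 2×2 mesh network on a chip with dimension-ordered (XY) routing, so the hop count between two cores is their Manhattan distance, and an invented four-task graph with per-link message volumes. It searches every placement for the one that minimises volume × hops. This is not how TrueNorth's tooling works; the names `TRAFFIC`, `MESH`, `communication_cost`, and `best_placement` are illustrative only.

```python
from itertools import permutations

# Hypothetical task graph: (sender, receiver) -> message volume.
TRAFFIC = {("A", "B"): 100, ("B", "C"): 100, ("C", "D"): 100, ("A", "D"): 1}

# Cores of an assumed 2x2 mesh, identified by (x, y) coordinates.
MESH = [(x, y) for x in range(2) for y in range(2)]

def hops(p, q):
    """Hops between two cores under XY routing: the Manhattan distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def communication_cost(placement):
    """Sum of message volume times hop distance for a task -> core mapping."""
    return sum(volume * hops(placement[src], placement[dst])
               for (src, dst), volume in TRAFFIC.items())

def best_placement(tasks, cores):
    """Try every assignment of tasks to distinct cores. Exhaustive search
    is only feasible for tiny examples like this one."""
    best_cost, best_map = None, None
    for chosen in permutations(cores, len(tasks)):
        placement = dict(zip(tasks, chosen))
        cost = communication_cost(placement)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, placement
    return best_cost, best_map

naive = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
print(communication_cost(naive))                    # 501
cost, placement = best_placement(["A", "B", "C", "D"], MESH)
print(cost)                                         # 301
```

Placing the heavily communicating pairs on neighbouring cores cuts the weighted hop count from 501 to 301. The remainder is unavoidable here, since every edge between distinct cores costs at least one hop.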
GPUs may be considered a form of manycore processor. They have many shader processing units and are suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).
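The compute-kernel style that suits GPUs can be sketched in Python. A kernel describes the work for a single index, and a launch runs it over every index independently. `launch` is a hypothetical helper built on `concurrent.futures.ThreadPoolExecutor`, not a real GPU API. Because of CPython's global interpreter lock, these threads illustrate the programming model rather than deliver a parallel speedup. The `running_sum` function shows the kind of serial, loop-carried code that does not fit this model as written.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_kernel(i, a, x, y, out):
    """Work for one 'thread': touches only element i, so every index can
    run independently and in any order."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args, workers=8):
    """Hypothetical stand-in for a GPU kernel launch: run kernel(i, *args)
    for every i in range(n) on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every call and re-raises any worker exception.
        list(pool.map(lambda i: kernel(i, *args), range(n)))

def running_sum(values):
    """Counter-example: each step depends on the previous one, so this loop
    cannot be split across cores without restructuring the algorithm."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```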
Suitable programming models
- Message Passing Interface
- OpenCL or other APIs supporting compute kernels
- Partitioned global address space
- Actor model
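Message passing and the actor model have a key property in common. Each unit of execution owns its data and interacts only by sending messages, so no shared cache lines need to be kept coherent. The sketch below is a hypothetical illustration built on Python's `threading` and `queue` modules. It is not MPI, and the `Actor` class and `chain_sum` function are names invented for this example. Each actor stands in for a core with local memory, and a running total travels down a chain of actors as a message.

```python
import queue
import threading

class Actor(threading.Thread):
    """A minimal actor: private state plus a mailbox. Other actors never
    read its state directly; they can only send it messages."""

    def __init__(self, name, local_data):
        super().__init__(name=name, daemon=True)
        self.local_data = local_data   # analogue of a core's local memory
        self.mailbox = queue.Queue()
        self.next = None               # downstream neighbour, set later
        self.result = None

    def run(self):
        partial = self.mailbox.get()           # wait for the running total
        partial += sum(self.local_data)        # add this actor's share
        if self.next is None:
            self.result = partial              # last actor keeps the answer
        else:
            self.next.mailbox.put(partial)     # pass it on as a message

def chain_sum(chunks):
    actors = [Actor(f"core{i}", chunk) for i, chunk in enumerate(chunks)]
    for left, right in zip(actors, actors[1:]):
        left.next = right
    for actor in actors:
        actor.start()
    actors[0].mailbox.put(0)   # inject the initial message
    for actor in actors:
        actor.join()
    return actors[-1].result

print(chain_sum([[1, 2], [3, 4], [5, 6], [7, 8]]))  # 36
```

The same structure maps onto MPI point-to-point sends and receives, where each rank holds its own data and only the partial total moves between ranks.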
Specific manycore architectures
- ZettaScaler, systems built from Japanese PEZY Computing 2,048-core chips; currently the most energy-efficient systems on the Green500 list, and used in the fourth-fastest supercomputer
- Sunway TaihuLight, a Chinese supercomputer and the world's fastest supercomputer as of July 2016, using a home-grown manycore architecture
- GPUs, which can be described as manycore vector processors
- Xeon Phi coprocessor, referred to as MIC (Many Integrated Cores)
- Adapteva Epiphany Architecture, a manycore chip using PGAS scratchpad memory
- Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
- Movidius Myriad 2, a manycore Vision processing unit
- Kalray, a manycore PCI-e accelerator for data-intensive tasks
- Teraflops Research Chip, a manycore processor using message passing
- TrueNorth, a neuromorphic processor with a manycore network-on-a-chip architecture
- Massively parallel processor array
- Asynchronous array of simple processors
- GreenArrays, a manycore processor using message passing, aimed at low-power applications
- Eyeriss, a manycore processor designed for running convolutional neural networks in embedded vision applications
- XMOS Software Defined Silicon quad-core XS1-G4
See also
- Vector processor
- High-performance computing
- Computer cluster
- Vision processing unit
- Memory access pattern
References
- Mattson, Tim (January 2010). "The Future of Many Core Computing: A tale of two processors" (PDF).
- Hendry, Gilbert; Kretschmann, Mark. "IBM Cell Processor" (PDF).
- Olofsson, Andreas; Nordström, Tomas; Ul-Abdin, Zain (2014). "Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany". arXiv: [cs.AR].
- Amir, Arnon (June 11, 2015). "IBM SyNAPSE Deep Dive Part 3". IBM Research.
- "Cell architecture". "The Cell architecture is like nothing we have ever seen in commodity microprocessors, it is closer in design to multiprocessor vector supercomputers"
- Merritt, Rick (June 20, 2011). "OEMs show systems with Intel MIC chips". EE Times (www.eetimes.com).
- Barker, J; Bowden, J (2013). "Manycore Parallelism through OpenMP". OpenMP in the Era of Low Power Devices and Accelerators. IWOMP. Lecture Notes in Computer Science, vol 8122. Springer.
- Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel; Sze, Vivienne (2016). "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks". IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers. pp. 262–263.
External links
- Architecting solutions for the Manycore future, published February 19, 2010 (more than one link in the slides is dead)
- Eyeriss architecture