Talk:Transport triggered architecture

Interrupt latency

How to solve the interrupt latency problem?

Reading a result too early returns the result of a previously triggered operation or, if no operation was triggered previously, an undefined value. On the other hand, the result must be read early enough that the next operation's result does not overwrite it in the output port.
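A minimal sketch of this timing window, in Python. The `FunctionalUnit` class, its fixed latency, the adder operation, and the use of `None` for an undefined port value are all assumptions made for illustration, not a model of any particular TTA implementation.

```python
class FunctionalUnit:
    """Hypothetical pipelined adder FU with a fixed latency and one result port."""

    def __init__(self, latency):
        self.latency = latency
        self.cycle = 0
        self.in_flight = []   # (ready_cycle, value), oldest first
        self.result = None    # None stands in for "undefined"

    def trigger(self, a, b):
        # Writing the trigger port starts an operation in the current cycle.
        self.in_flight.append((self.cycle + self.latency, a + b))

    def tick(self):
        # Advance one cycle; a finished operation overwrites the result port.
        self.cycle += 1
        while self.in_flight and self.in_flight[0][0] <= self.cycle:
            self.result = self.in_flight.pop(0)[1]

    def read(self):
        return self.result


fu = FunctionalUnit(latency=3)
fu.trigger(1, 2)         # cycle 0, result due at cycle 3
fu.tick()                # cycle 1
too_early = fu.read()    # port still holds the undefined initial value
fu.trigger(10, 20)       # cycle 1, result due at cycle 4
fu.tick()
fu.tick()                # cycle 3
on_time = fu.read()      # 1 + 2
fu.tick()                # cycle 4: second result overwrites the first
too_late = fu.read()     # 10 + 20, the first result is gone
```

With two operations in flight the first result is only readable during cycle 3 in this model, so the valid window is a single cycle.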

For performance reasons there should be several operations in flight at any given moment. But the requirement not to read too early or too late means that interrupts must be disabled while operations are being performed.
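To illustrate how an interrupt taken between a trigger and its read breaks a static schedule, here is a rough Python sketch. The function `run`, its parameters, and the assumption that the FU keeps running while the handler executes are hypothetical; a real design could instead freeze the FU pipeline or save its state.

```python
def run(latency, interrupt_at=None, interrupt_len=0):
    """Execute a fixed two-operation schedule and return the values read.

    Assumed model: the FU keeps completing operations during an interrupt,
    while every program move scheduled at or after interrupt_at is delayed
    by interrupt_len cycles.
    """
    # Compiler-generated schedule: (cycle, action, operands).
    schedule = [
        (0, "trigger", (1, 2)),
        (1, "trigger", (10, 20)),
        (latency, "read", None),       # expects 1 + 2
        (latency + 1, "read", None),   # expects 10 + 20
    ]

    def actual_cycle(c):
        if interrupt_at is not None and c >= interrupt_at:
            return c + interrupt_len
        return c

    in_flight = []   # (ready_cycle, value), oldest first
    port = None      # result port, undefined until the first completion
    reads = []
    for c, action, operands in schedule:
        now = actual_cycle(c)
        # Retire every operation that finished by now; each overwrites the port.
        while in_flight and in_flight[0][0] <= now:
            port = in_flight.pop(0)[1]
        if action == "trigger":
            a, b = operands
            in_flight.append((now + latency, a + b))
        else:
            reads.append(port)
    return reads


undisturbed = run(latency=3)
interrupted = run(latency=3, interrupt_at=2, interrupt_len=5)
```

In this model the undisturbed run reads both sums, while the interrupted run reads the second sum twice because the first was overwritten during the handler.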

One approach might be to handle interrupts not in the TTA processor but in an I/O processor optimized for interrupt handling, much like the ND-500.

--RogerJL 23:46, 6 November 2006 (UTC)

Yes, this is another of the worst drawbacks of the TTA approach. Interrupts are costly to implement. The potentially very wide instruction word is another problematic spot, which has been overcome by the use of instruction compression. I added some text about interrupts to the wiki page.

PekkaJ 19:06, 9 June 2007 (UTC)

I think part of what confuses people (and could be discussed on this page) is that there are really at least two and perhaps more "classes" of CPUs that are of commercial interest. If you are looking to replace the Pentium 2000 or whatever version they are at, then yes, you need superscalar out-of-order execution, etc. However, if you consider embedded systems, these are often not useful (and even undesirable) features. Where TTA seems especially interesting to me is as an embedded core for FPGAs. Two reasons for this: first, I can easily add "instructions" to the CPU even if I don't really grok CPU design. Second, on a large enough FPGA or ASIC I can get parallelism by sticking more than one core in the FPGA. In fact, we are seeing this with Intel's new architecture, where you get a large number of cores but they are simpler and don't do out-of-order execution, etc.

So my point is, if you want to make a big superscalar out-of-order pipelined CPU, then TTA interrupts can be difficult (among other things). Not impossible, mind you, but difficult. But if you are not worried about these things, then interrupts are highly doable (the One-Der CPU has interrupts, although most of its FUs are not multi-cycle). In the case of multi-cycle FUs there are several possible options, but as always you are trading against complexity. Criticizing TTA for this is sort of like criticizing a chopstick because it has no tines.

I was playing with some text, but decided I didn't like it. But my 3 points were: (1) TTA is simple for simple processors which may be useful when customizing or optimizing power consumption for embedded systems or systems where parallelism is achieved through multiple cores. (2) Adding complex functional units makes it harder for the TTA to save its state to handle interrupts and may require different strategies (e.g., handling interrupts autonomously in a functional unit). (3) Multiple execution has similar issues to VLIW and requires additional buses (which may not fully connect to each FU). This drives up complexity, chip real estate demands, and compiler scheduling complexity.

I may find some words I'm happy with and edit the page, but if someone else beats me to it, it won't hurt my feelings.

Wd5gnr (talk) 14:15, 26 November 2009 (UTC)

Copper?

Would a mention of the "Copper" display-synchronized coprocessor in Amiga computers be appropriate here? True, it's not a pure TTA, since it had two additional instructions for synchronizing to the display hardware, but it still had all the basic features (including write access to its own program counter) and was part of a widely sold desktop computer architecture. --Ilmari Karonen (talk) 23:10, 12 June 2008 (UTC)

Mali 200/400?

According to http://limadriver.org/Lima+ISA/ it seems not to be strictly a TTA, although it is a more "exposed datapath architecture" than a regular "operation triggered" VLIW. It enables the software bypassing optimization (as one can refer to FU ports in the instruction word) but doesn't provide transport freedom (one cannot move the operand data in programmer-selected cycles).
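For readers unfamiliar with software bypassing, a small Python sketch of the general idea follows. The move lists, the port names such as `add.out` and `mul.in`, and the helper `count_rf_accesses` are hypothetical and are not taken from the Lima ISA; they only show how referring to FU ports directly can save register file traffic.

```python
def count_rf_accesses(moves):
    """Count register file reads and writes in a list of (source, destination) moves.

    Assumed convention: names starting with "r" are registers, anything else
    is a functional unit port.
    """
    reads = sum(1 for src, _ in moves if src.startswith("r"))
    writes = sum(1 for _, dst in moves if dst.startswith("r"))
    return reads, writes


# Compute (r1 + r2) * r3 and store the product in r4.
without_bypass = [
    ("r1", "add.in"),
    ("r2", "add.trigger"),
    ("add.out", "r5"),        # the sum takes a round trip through the register file
    ("r5", "mul.in"),
    ("r3", "mul.trigger"),
    ("mul.out", "r4"),
]
with_bypass = [
    ("r1", "add.in"),
    ("r2", "add.trigger"),
    ("add.out", "mul.in"),    # the sum moves directly from port to port
    ("r3", "mul.trigger"),
    ("mul.out", "r4"),
]

plain = count_rf_accesses(without_bypass)
bypassed = count_rf_accesses(with_bypass)
```

In this toy example the bypassed version saves one register read, one register write, and one move.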

PekkaJ (talk) 15:46, 27 September 2012 (UTC)