Computing has come a long way , remember the cray-1, it could do one Gigaflop . Recently AMD announced Teraflop in a box : One Opteron processor and two R600 Gpu’s combining to dish out more than 1 trillion floating-point calculations per second using a general “multiply-add” calculation . Intel at their Beijing IDF showed off their concept 80 core teraflop processor . The Cell B.E is a multicore processor of sorts with one PPE and eight SPE’s capable of crunching out 256Gflops . Teraflop scale processing is already happening on the cell with folding@home and the PS3 and according to reports its showing some great results.

With the industry moving towards this parallel processing revolution , the development on the software side of things seems to be remarkably slow , What I mean here is that the software which uses these multicore processors should be optimized for the same , Most of the ISV’s don’t have in-house programming expertize to do multithreading applications.Switching from one core to many core presents its own set of developmental , and debugging challenges and a large percentage of mission-critical enterprise applications are not “multi-core optimized ” leading to applications not showing any kind of performance boost when switched to multicore processors and in some cases showing even poorer performance, this happens because a single-threaded application can’t utilize the additional cores in the processor efficiently without sacrificing ordered processing. This results in a huge drop in performance due to cores being not optimally utilized . Currently there are no automated compilers for multicore processors , then there is the problem of application priority, there are many such issues which need to be sorted out before the transition to multicore can be complete.

Moving further,The Cell B.E is going mainstream slowly with IBM announcing a new line of servers powered by the Cell B.E . Researchers have already released initial details about the EDGE processor architecture , which stands for Explicit Data Graph Execution.Instead of one instruction at a time, EDGE handles large blocks of data all at once. Using many copies of a small number of replicated tiles, the target for TRIPS ( the first prototype chip on the EDGE architecture ) by 2009 is to hit 5 TFLOPs on 32nm manufacturing. Intel is also looking at Larabee to give them similar numbers.
In the future , there is a good chance that we will be moving to some non X-86 ISA processor , which does the job in a better more efficient way.

More on EDGE and TRIPS
A whitepaper on EDGE