Results Summary: Our parallel ray-casting implementation delivers
close to a 5.8x performance improvement on a quad-core Nehalem
over an optimized scalar baseline running on a single Harpertown
core. This enables us to render a large 750x750x1000 dataset
in 2.5 seconds. In comparison, our optimized Nvidia GTX280 implementation
achieves a 5x to 8x speed-up over the scalar baseline.
In addition, we show, via detailed performance simulation, that
a 16-core Intel Larrabee [26] delivers around a 10x speed-up over a single
Harpertown core, which is on average 1.5x higher performance
than a GTX280 at half the flops. At higher core counts, performance
is dominated by the overhead of data transfer, so we developed a lossless
SIMD-friendly compression algorithm that allows a 32-core Intel
Larrabee to achieve a 24x speed-up over the scalar baseline.