I wonder what algorithm it uses...

Mathematica 6.0 computes 750000! in well under a second, single-threaded, on any i7.

Yet it takes over a minute using CUDA?
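
For what it's worth, I don't know what Mathematica actually does internally. A common trick for big factorials (my assumption, not confirmed for Mathematica) is to multiply the factors as a balanced product tree. The expensive multiplications then happen between operands of similar size, where fast multiplication (Karatsuba, FFT) pays off. The naive approach multiplies one huge running product by one small number at a time. Here is a minimal Python sketch; the function names are just illustrative:

```python
def product_range(lo, hi):
    """Multiply all integers in [lo, hi) by balanced binary splitting."""
    if hi - lo == 1:
        return lo
    if hi - lo == 2:
        return lo * (lo + 1)
    mid = (lo + hi) // 2
    # Both halves are non-empty, so the two operands end up with
    # roughly the same number of digits at every level of the tree.
    return product_range(lo, mid) * product_range(mid, hi)


def factorial(n):
    """n! computed as the product tree over 2..n."""
    if n < 2:
        return 1
    return product_range(2, n + 1)


print(factorial(20))  # 2432902008176640000
```

Recursion depth is only about log2(n), so even 750000 is no problem. How fast it runs depends entirely on the big-integer multiplication underneath, which is the part a GPU version would have to get right.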
*Sorry for being off topic.