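Turning on SSE is easy. Vectorizing and optimizing can be hard, especially organizing memory on 16-byte alignment boundaries. There are a bunch of techniques that help, like choosing an SoA (structure of arrays) layout over AoS (array of structures), and the AVX family adds gather/scatter instructions (gather in AVX2, scatter in AVX-512).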
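To make the layout point concrete, here is a rough sketch of AoS versus SoA with 16-byte-aligned arrays. The names `ParticleAoS`, `ParticlesSoA`, `shift_x` and the count `N` are made up for illustration, and it assumes a C11 compiler. On targets without SSE it falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

#define N 8  /* hypothetical particle count, kept a multiple of 4 */

/* AoS: x, y, z of one particle sit together, so four consecutive
 * x values are NOT contiguous in memory. */
struct ParticleAoS { float x, y, z; };

/* SoA: each field is its own 16-byte-aligned array, so four
 * consecutive x values can be loaded with one aligned load. */
struct ParticlesSoA {
    _Alignas(16) float x[N];
    _Alignas(16) float y[N];
    _Alignas(16) float z[N];
};

/* Adds dx to every x, four floats per iteration when SSE is available. */
static void shift_x(struct ParticlesSoA *p, float dx)
{
    /* _mm_load_ps / _mm_store_ps require 16-byte-aligned addresses. */
    assert(((uintptr_t)p->x % 16) == 0);
#if HAVE_SSE
    __m128 d = _mm_set1_ps(dx);            /* broadcast dx to 4 lanes */
    for (size_t i = 0; i < N; i += 4) {
        __m128 v = _mm_load_ps(&p->x[i]);  /* aligned load of 4 floats */
        _mm_store_ps(&p->x[i], _mm_add_ps(v, d));
    }
#else
    for (size_t i = 0; i < N; ++i)         /* scalar fallback */
        p->x[i] += dx;
#endif
}
```

With the AoS struct the same update would touch every third float, which is the kind of access pattern that gather/scatter instructions are meant to help with.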
Btw, you should read the "Why x87?" page of the article. Double precision isn't needed.




