It makes it clear to a human reader what's going on, but more importantly it makes it really clear to the compiler's optimizer how the data is being accessed. The optimizer can then use that information to replace entire spans of code with auto-parallelized equivalents that take advantage of SSEx instructions and/or multiple threads.
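
As a rough sketch of the idea, assuming C++ and a simple element-wise operation: both functions below compute the same result, but the second states the access pattern (read each input once, write each output once, no dependency between elements) directly. The names `scale_indexed` and `scale_transform` are made up for illustration, and whether a given compiler actually vectorizes either version depends on the compiler, its version, and the optimization flags in use.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale every element using an explicit index loop.
// The optimizer has to work out for itself that iterations are independent.
std::vector<float> scale_indexed(const std::vector<float>& in, float k) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * k;
    }
    return out;
}

// Hypothetical helper: the same operation expressed with std::transform.
// The element-by-element, input-to-output access pattern is explicit.
std::vector<float> scale_transform(const std::vector<float>& in, float k) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [k](float x) { return x * k; });
    return out;
}
```

With a modern optimizing compiler both forms may well end up as the same SIMD code, which fits the point below about older compilers.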

I think the compiler in use at the time just wasn't very smart.