Running GPU blocks in parallel is a terrible idea. Think about what happens when you split water flow across two channels, and about why we try to get the highest flow rate in the loop in the first place, and you'll realize why.
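For a rough picture of the split, here is a minimal sketch. It assumes each block behaves like a turbulent restriction with pressure drop dP = k * Q^2, and it holds the total flow fixed. In a real loop the pump curve would shift the total too. Both branches see the same pressure drop, so each block's share of the flow is proportional to 1/sqrt(k). The `k` values are hypothetical, not measured numbers for any real block.

```python
import math

def parallel_split(total_gpm, k_a, k_b):
    """Split a fixed total flow between two parallel branches.

    Assumes each branch obeys dP = k * Q**2 (hypothetical restriction
    coefficients k_a, k_b). Equal dP across both branches gives
    Q_i = sqrt(dP / k_i), so each share is proportional to 1 / sqrt(k_i).
    """
    w_a = 1.0 / math.sqrt(k_a)
    w_b = 1.0 / math.sqrt(k_b)
    q_a = total_gpm * w_a / (w_a + w_b)
    q_b = total_gpm * w_b / (w_a + w_b)
    return q_a, q_b

# Hypothetical restriction coefficients (arbitrary units)
print(parallel_split(1.0, 1.0, 1.0))  # identical blocks: 0.5 gpm each
print(parallel_split(1.0, 1.0, 4.0))  # 4x more restrictive block: about 0.67 / 0.33 gpm
```

Even with identical blocks, each one sees only half the flow. With mismatched blocks, the more restrictive one gets starved further.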
There is NO PERFORMANCE REASON for turning the Storm in any one direction. Natural convection (heat carried off by the coolant's own buoyancy) only matters in still or nearly still liquid. With ultra-turbulent flow at 1+ gpm through the tiny channels in a Storm, that whole argument is risible, I'm afraid.
There is one reason to care about orientation if you have a Rev. 1 Storm: bubbles tend to get trapped in the top part if the outlet is below the inlet. This does not happen with Rev. 2 Storms.
There is also no measurable performance difference between loop orders for end users, because trying to force your loop into some idealized order will always screw you over.
The 0.1c you might gain by putting the rad right after the pump, to shed those 9W of heat before the coolant hits the CPU, gets overshadowed by the 0.5c you lose to the extra tubing and bends. Not that you could measure anything like that with onboard sensors anyway. Any temperature difference under 2-3c measured with onboard sensors can safely be attributed to ambient variations and the like, unless you're measuring ambient with a calibrated probe, running multiple samples, and averaging the deltas.
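For a sense of scale, the sketch below does two things. It estimates how much 9W changes the coolant temperature at 1 gpm, using dT = P / (m_dot * c_p). It also shows the "average the deltas over ambient" approach. It assumes plain water (density about 1 kg/L, specific heat about 4186 J/(kg*K)). The function names are made up for illustration. Note that the first number is the coolant's temperature change, not a CPU temperature.

```python
from statistics import mean, stdev

GPM_TO_L_PER_S = 3.785411784 / 60.0  # 1 US gallon = 3.785411784 L
WATER_CP = 4186.0                    # J/(kg*K), approximate specific heat of water
WATER_DENSITY = 1.0                  # kg/L, assumed (ignores additives and temperature)

def coolant_temp_rise(heat_w, flow_gpm):
    """Coolant temperature change for a given heat load: dT = P / (m_dot * c_p)."""
    mass_flow = flow_gpm * GPM_TO_L_PER_S * WATER_DENSITY  # kg/s
    return heat_w / (mass_flow * WATER_CP)

def delta_over_ambient(component_temps, ambient_temps):
    """Mean and sample std-dev of (component - ambient) over paired readings."""
    if len(component_temps) != len(ambient_temps) or len(component_temps) < 2:
        raise ValueError("need at least two paired readings of equal length")
    deltas = [c - a for c, a in zip(component_temps, ambient_temps)]
    return mean(deltas), stdev(deltas)

# 9 W at 1 gpm changes the coolant temperature by only about 0.034 C
print(round(coolant_temp_rise(9.0, 1.0), 3))
```

To compare two loop layouts, run `delta_over_ambient` on several paired readings for each one. Only treat the difference in mean deltas as real if it is clearly larger than the spread between runs.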
Bottom line: always use series, and always go for the shortest loop with the fewest sharp bends and no kinks. Everything else is meaningless complication with no real effect on performance.
