Thanks, informal and JF-AMD, for clearing that up. Now I was wondering how STORE is handled in Bulldozer: if, say, the result of a 256-bit operation has to be stored, can that be done in one cycle, or will it need two cycles?

Is a 256-bit STORE possible in one cycle, or does it have to be broken into two 128-bit STOREs?