
Thread: CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug

  1. #1
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213

    CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug

    ---------------------------------------------------------------------------------------------------

    This project looks at CL-based performance and scaling on Sandy Bridge (SB), taking into account the effects of CL/WCL/RTL.

    The aim is to give meaningful context to CL performance and performance change, based on testing and observation.

    For quick reference, see tables in Post 2. The tables briefly summarise the contents of Posts 3 and 4.

    ---------------------------------------------------------------------------------------------------

    CL = CAS Latency

    WCL = CAS Write Latency (also known as CWL or WL)

    RTL = Round Trip Latency


    ---------------------------------------------------------------------------------------------------
    Index
    ---------------------------------------------------------------------------------------------------

    Post 1 :

    • Index.


    Post 2 :

    • Hyper & PSC : CL and WCL scaling tables for RTL, vDIMM, AIDA Read/Latency, 32M ranking.


    Post 3 :

    • PSC WCL > CL performance bug.


    Post 4 :

    • Hyper & PSC : CL/WCL/RTL 32M performance scaling @ x-10-7-26 (direct comparison timing & subtiming).

    • table+chart format to show setting-to-setting scaling for CL/WCL/RTL changes.

    • 32M + AIDA screenshots.


    Post 5 :

    • Hyper : CL10 to CL6 1067/933/800 32M performance scaling @ x-7-5-20 + optimal WCL.

    • charts and 32M + AIDA screenshots.


    Post 6 : reserved.


    ---------------------------------------------------------------------------------------------------


    by cheapseats 2011-2012.
    Last edited by cheapseats; 04-06-2012 at 03:01 PM.

    Maximus V GENE [0086] :: 3770K L212B244 :: H70 :: EB3103A (PSC)

    CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug



  2. #2
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    CL | WCL | RTL : Tables (Hyper & PSC)

    Tables show a generalised summary of the behaviour found (SB, LGA 1155).

    Tables assume* a WCL range of 8 (max) to 6 (min).
    (*as found for the test hardware)


    RTL | vDIMM | AIDA read : scaling table for CL & WCL combinations





    ---------------------------------------------------------------------------------------------------


    Superpi 32M : performance ranking table for CL & WCL combinations

    Ranking position based on 32M test results (see Post 4)

    Note: where PSC WCL > CL, significant overall performance limitations are found in 32M.


    Last edited by cheapseats; 04-06-2012 at 03:02 PM.




  3. #3
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    PSC WCL > CL performance bug

    (also see tables in Post 2 and charts in Post 4)

    ---------------------------------------------------------------------------------------------------

    Please note:

    This 'analysis' is based on my observations and test results. It is limited in depth and is non-technical.

    The general principles and behaviour described here for PSC on SB may also be found with other IC types (not tested) and/or other memory controllers, e.g. on SB-E, IB, AMD (not tested).

    ---------------------------------------------------------------------------------------------------


    Overview

    The issue can be shown as the key difference in CL-based scaling behaviour found between PSC and Hyper based DIMMs on SB.

    With PSC, significant performance limitations apply when WCL value is greater than CL value (WCL > CL).

    PSC WCL > CL conditions produce a negative offset in net read performance scaling, and restrict overall performance.

    The negative effect on net read performance is due to increased min RTL per CL where PSC WCL > CL.
    This is compared to min RTL per CL where PSC WCL = CL, or compared to min RTL per CL with Hyper.
    (effects of RTL changes shown in AIDA Read and Latency, for example)

    The PSC WCL effect on RTL is in addition to the write-based effects of WCL changes, which are the same for PSC and Hyper.
    (effects of WCL changes shown in AIDA Copy, for example)


    RTL change





    General behaviour:

    • RTL changes directly affect net read performance (at IMC) and significantly affect min vDIMM.


    • CL changes do not directly affect net read performance, and do not directly affect min vDIMM (except marginally).


    • CL changes which affect net read performance do so indirectly, by allowing (forcing) RTL change.


    • CL changes which do not allow RTL change, do not affect net read performance.


    • CL changes either allow or do not allow RTL changes, as determined by conditions and IC-type.


    • WCL changes either allow or do not allow RTL changes, as determined by conditions and IC-type.


    IC-based differences in RTL behaviour:

    • Hyper CL changes allow (force) RTL changes, regardless of WCL.

    • Hyper WCL changes do not affect RTL.



    • PSC CL changes either allow or do not allow RTL changes, as determined by (initial and final) CL and WCL conditions.

    • PSC WCL changes either allow or do not allow RTL changes, as determined by (initial and final) CL and WCL conditions.


    In general for PSC compared to Hyper:

    if WCL > CL, then SB min RTL with PSC is higher per CL than with Hyper.
    if WCL = CL, then SB min RTL with PSC is the same per CL as with Hyper.



    RTL 'thresholds' with PSC CL and WCL

    • Threshold values can be described in terms of CL and WCL values in DRAM clocks.

    • Thresholds are crossed by CL or WCL changes between threshold values of CL-WCL or WCL-CL (DRAM clocks).

    • Crossing or not crossing of thresholds by WCL or CL changes determines effect on RTL.


    If a threshold is crossed:

    PSC CL +/-1 change does not allow RTL change.
    PSC WCL +/-1 change allows (forces) RTL change.

    If a threshold is not crossed:

    PSC CL +/-1 change allows (forces) RTL change.
    PSC WCL +/-1 change does not allow RTL change.


    PSC WCL changes which affect min RTL:

    Thresholds are crossed by WCL changes between:

    WCL-CL = 1 and WCL-CL = 0 (DRAM clocks)

    and (@ CL10) between:

    WCL-CL = -3 and WCL-CL = -4

    or if described by WCL value in terms of CL value, between:

    WCL = CL+1 and WCL = CL

    and (@ CL10) between:

    WCL = CL-3 and WCL = CL-4





    PSC CL changes which do not affect min RTL:

    Thresholds are crossed by CL changes between:

    CL-WCL = 0 and CL-WCL = -1 (DRAM clocks)

    and between:

    CL-WCL = 3 and CL-WCL = 4

    or if described by CL value in terms of WCL value, between:

    CL = WCL and CL = WCL-1

    and between:

    CL = WCL+3 and CL = WCL+4
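The threshold rules above can be written down as a small lookup. This is only a sketch of the behaviour described in this post for PSC on SB, assuming single-step (+/-1) changes to either CL or WCL. The names `THRESHOLDS`, `crosses_threshold` and `psc_rtl_changes` are hypothetical and introduced here for illustration; they are not part of any tool.

```python
# Sketch of the PSC threshold rules described above (SB, test hardware only).
# A threshold sits between two adjacent values of the offset (WCL - CL).
# (0, 1):   between WCL = CL and WCL = CL+1.
# (-4, -3): between WCL = CL-4 and WCL = CL-3 (reported at CL10).
THRESHOLDS = [(0, 1), (-4, -3)]


def crosses_threshold(cl_before, wcl_before, cl_after, wcl_after):
    """Return True if the change moves (WCL - CL) across a listed threshold."""
    before = wcl_before - cl_before
    after = wcl_after - cl_after
    return any({before, after} == {low, high} for low, high in THRESHOLDS)


def psc_rtl_changes(cl_before, wcl_before, cl_after, wcl_after):
    """Return True if the rules above say this +/-1 change forces an RTL change."""
    cl_step = abs(cl_after - cl_before)
    wcl_step = abs(wcl_after - wcl_before)
    if sorted((cl_step, wcl_step)) != [0, 1]:
        raise ValueError("change exactly one of CL or WCL by one step")
    crossed = crosses_threshold(cl_before, wcl_before, cl_after, wcl_after)
    # WCL step: RTL changes only if a threshold is crossed.
    # CL step: RTL changes only if no threshold is crossed.
    return crossed if wcl_step == 1 else not crossed


print(psc_rtl_changes(7, 8, 7, 7))  # WCL8 -> WCL7 at CL7
print(psc_rtl_changes(8, 8, 7, 8))  # CL8 -> CL7 at WCL8
```

Under these rules, going from CL7 WCL8 to CL7 WCL7 crosses the WCL = CL+1 / WCL = CL threshold and so forces an RTL change, while going from CL8 WCL8 to CL7 WCL8 crosses the same threshold by CL and so leaves RTL unchanged. For Hyper the equivalent rule would simply be that any CL step forces an RTL change and a WCL step never does.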





    Variable RTL and vDIMM per CL

    Due to the restriction (per CL) of min RTL and net read performance for PSC @ WCL > CL, it follows that PSC min vDIMM (per CL) is significantly lower where WCL > CL than where WCL = CL (or WCL < CL).

    This is not found with Hyper as changes to min RTL are forced by CL changes, regardless of WCL.

    For PSC at CL values which are lower than max WCL value but are equal to or higher than min WCL value, it is found that min RTL (per CL) and min vDIMM (per CL) are both significantly variable with WCL.

    therefore:

    if WCL8 = max, then 'binning' of PSC @ CL7 or CL6 is not meaningful without consideration of WCL.
    (due to variability in min RTL and min vDIMM, with WCL changes between WCL > CL and WCL = CL)

    if WCL6 = min, then PSC @ CL < 6 is not meaningfully possible.
    (due to restriction of min RTL, by min WCL)
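The two consequences above can be sketched as a simple classification of a PSC CL value against the available WCL range. The default range of WCL 6 to 8 is the one found for the test hardware; the function name `classify_psc_cl` and the returned labels are hypothetical and only restate the text.

```python
def classify_psc_cl(cl, wcl_min=6, wcl_max=8):
    """Label a PSC CL value by how it sits against the available WCL range."""
    if cl < wcl_min:
        # Every available WCL is greater than CL, so min RTL stays restricted.
        return "not meaningfully possible"
    if cl < wcl_max:
        # Both WCL > CL and WCL = CL can be set, so min RTL and
        # min vDIMM vary with the WCL chosen.
        return "WCL-dependent"
    # No available WCL is greater than CL.
    return "WCL > CL not available"


for cl in (9, 8, 7, 6, 5):
    print(cl, classify_psc_cl(cl))
```

With the default range, CL7 and CL6 come out as WCL-dependent, which is why a PSC 'bin' at those CL values needs the WCL stated alongside it.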


    PSC WCL > CL Superpi 32M performance summary

    (comparisons for CL/WCL/RTL changes, all other timing and subtiming unchanged)


    • PSC @ CL7 WCL8 has 8% (approx) higher min vDIMM but lower 32M performance than CL9 WCL6.

    • PSC @ CL6 WCL8 (or WCL7) has 11% (approx) higher min vDIMM but lower 32M performance than CL8 WCL6.


    General conclusions

    In the context of both performance and vDIMM scaling derived from CL/WCL/RTL changes,
    the conditions created by PSC WCL > CL are significantly restrictive, misleading, and are effectively bugged.

    As such, the use of PSC WCL > CL on SB should be viewed only as a performance-limiting exploit to minimise vDIMM per CL.

    This may be seen as useful for CPU-Z validation of PSC @ CL7 and CL6, but is shown to be counter-productive and nonsensical in terms of performance optimisation.



    ---------------------------------------------------------------------------------------------------

    Scaling behaviour and 32M performance ranking tables in Post 2.
    32M performance tests and scaling charts in Post 4.
    Last edited by cheapseats; 04-06-2012 at 03:17 PM.




  4. #4
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    CL | WCL | RTL : Superpi 32M performance scaling charts

    Hyper & PSC @ x-10-7-26

    ---------------------------------------------------------------------------------------------------

    2600K / P67A-UD5 (B2) F7e / 2x2GB Corsair CMGTX2 (Hyper) / 2x2GB Exceleram EB3103A (PSC)

    100.0 x 50 (5GHz)

    DDR3-2133 (1067MHz) / DDR3-1866 (933MHz) / DDR3-1600 (800MHz)

    BIOS manual set : x-10-7-26 / tRC 33 / tRFC 72 / tWTR 4 / tRRD 4 / tRTP 4 / tWR 6 / tFAW 16 / CR1

    XP Pro sp2 + Spitweaker v1.0 (copy waza)
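For context, the CL settings used at the three memory clocks listed above can be converted from DRAM clocks to absolute time with the standard relation: latency (ns) = CL x 1000 / clock (MHz). The snippet below is a minimal sketch of that conversion only. The figures it prints are not measurements from this thread, and they say nothing about RTL or net read performance. The names `CLOCKS_MHZ` and `cas_latency_ns` are hypothetical.

```python
# Memory clocks used in these tests, as listed above (MHz, rounded as in the post).
CLOCKS_MHZ = {"DDR3-2133": 1067, "DDR3-1866": 933, "DDR3-1600": 800}


def cas_latency_ns(cl, clock_mhz):
    """Convert CAS latency from DRAM clock cycles to nanoseconds."""
    return cl * 1000.0 / clock_mhz


for name, mhz in CLOCKS_MHZ.items():
    row = ", ".join(
        f"CL{cl}: {cas_latency_ns(cl, mhz):.2f} ns" for cl in (9, 8, 7, 6)
    )
    print(f"{name} ({mhz}MHz) -> {row}")
```

This makes it easier to compare, for example, a lower CL at 800MHz against a higher CL at 1067MHz in the same units.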

    ---------------------------------------------------------------------------------------------------

    Scaling shown in 3 ways:

    (1) by 'sequential' CL & WCL change :
    eg from CL9 @ WCL8, 7, 6; to CL8 @ WCL8, 7, 6; etc
    (zig-zag line)

    (2) by fixed WCL with CL change :
    eg CL9-CL6 @ WCL8; CL9-CL6 @ WCL7; CL9-CL6 @ WCL6
    (blue, orange, green lines)

    (3) grouped by WCL offset to CL :
    eg 'WCL = CL+1' grouping; 'WCL = CL, & WCL = CL-1' grouping; 'WCL = CL-2' grouping
    ('parallel' lines)
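As a rough illustration of the three orderings described above, the sketch below enumerates the CL/WCL combinations for CL9 to CL6 and WCL8 to WCL6. All names are hypothetical. Note that grouping (3) here is by exact WCL-CL offset, whereas the charts combine some offsets (e.g. WCL = CL and WCL = CL-1) into one grouping.

```python
from collections import defaultdict

CLS = [9, 8, 7, 6]   # CL values, highest to lowest
WCLS = [8, 7, 6]     # WCL values, highest to lowest


def sequential():
    """(1) Step through every WCL at each CL in turn (the zig-zag line)."""
    return [(cl, wcl) for cl in CLS for wcl in WCLS]


def fixed_wcl():
    """(2) One series per WCL value, each running CL9 to CL6."""
    return {wcl: [(cl, wcl) for cl in CLS] for wcl in WCLS}


def by_offset():
    """(3) Group combinations by the offset WCL - CL (the 'parallel' lines)."""
    groups = defaultdict(list)
    for cl, wcl in sequential():
        groups[wcl - cl].append((cl, wcl))
    return dict(groups)


print(sequential())
print(fixed_wcl())
print(by_offset())
```

The offset +1 group from `by_offset()` corresponds to the 'WCL = CL+1' grouping, i.e. the WCL > CL conditions discussed in Post 3.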

    ---------------------------------------------------------------------------------------------------

    Hyper : 1067MHz : CL9 to CL6 (x-10-7-26)

    (1) (2) (3)

    ---------------------------------------------------------------------------------------------------

    PSC : 1067MHz : CL9 to CL6 (x-10-7-26)

    (1) (2) (3)

    ---------------------------------------------------------------------------------------------------

    PSC : 933MHz : CL9 to CL6 (x-10-7-26)

    (1) (2) (3)

    ---------------------------------------------------------------------------------------------------

    PSC : 800MHz : CL9 to CL6 (x-10-7-26)

    (1) (2) (3)


    ---------------------------------------------------------------------------------------------------

    Hyper & PSC : CL9 to CL6 (x-10-7-26)

    single image for each scaling type (as columns^)

    (1) 'sequential'
    (2) 'fixed WCL'
    (3) 'parallel'

    note : 1327 x 4040 px / ~300kb (per image)


    (1) (2) (3)


    Hyper & PSC : CL10 to CL6 (x-10-7-26)

    note : 1513 x 4536 px / ~380kb (per image)


    (1) (2) (3)


    ---------------------------------------------------------------------------------------------------
    ---------------------------------------------------------------------------------------------------


    32M Screenshots

    note: all screens with AIDA v1.50 1200
    (approx. offsets for newer AIDA versions: Read = -100 to -150 MB/s; Copy = -1 GB/s; Latency = same)

    ---------------------------------------------------------------------------------------------------


    Hyper : 1067MHz : CL10 to CL6 (x-10-7-26)

    ---------------------------------------------------------------------------------------------------


    PSC : 1067MHz : CL10 to CL6 (x-10-7-26)

    ---------------------------------------------------------------------------------------------------


    PSC : 933MHz : CL10 to CL6 (x-10-7-26)

    ---------------------------------------------------------------------------------------------------


    PSC : 800MHz : CL10 to CL6 (x-10-7-26)
    Last edited by cheapseats; 04-06-2012 at 03:07 PM.




  5. #5
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    Hyper CL | RTL : Superpi 32M performance scaling charts

    (x-7-5-20 + optimal WCL)

    ---------------------------------------------------------------------------------------------------

    2600K / P67A-UD5 (B2) F7e / 2x2GB Corsair CMGTX2 (Hyper)

    100.0 x 50 (5GHz)

    DDR3-2133 (1067MHz) / DDR3-1866 (933MHz) / DDR3-1600 (800MHz)

    BIOS manual set : x-7-5-20 / tRC 25 / tRFC 55 / tWTR 4 / tRRD 4 / tRTP 4 / tWR 6 / tFAW 16 / CR1

    XP Pro sp2 + Spitweaker v1.0 (copy waza)

    ---------------------------------------------------------------------------------------------------

    Hyper : 1067MHz : CL10 to CL6 (x-7-5-20)



    ---------------------------------------------------------------------------------------------------

    Hyper : 933MHz : CL10 to CL6 (x-7-5-20)



    ---------------------------------------------------------------------------------------------------

    Hyper : 800MHz : CL10 to CL6 (x-7-5-20)



    ---------------------------------------------------------------------------------------------------

    1414 x 3018px / ~220kb



    ---------------------------------------------------------------------------------------------------
    ---------------------------------------------------------------------------------------------------

    32M Screenshots

    note: all screens with AIDA v1.50 1200
    (approx. offsets for newer AIDA versions: Read = -100 to -150 MB/s; Copy = -1 GB/s; Latency = same)

    ---------------------------------------------------------------------------------------------------

    Hyper : 1067/933/800MHz : CL10 to CL6 (x-7-5-20)

    Last edited by cheapseats; 04-06-2012 at 03:09 PM.




  6. #6
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    reserved.
    Last edited by cheapseats; 04-06-2012 at 03:10 PM.




  7. #7
    Aussie God
    Join Date
    Feb 2005
    Location
    Copenhagen, Denmark
    Posts
    4,702
    Great job, and that is A LOT of testing indeed!
    Thanks for heads up.
    Competition ranking;
    2005; Netbyte, Karise/Denmark #1 @ PiFast
    2008; AOCM II, Minfeld/Germany #2 @ 01SE/AM3/8M (w. Oliver)
    2009; AMD-OC, Viborg/Denmark #2 @ max freq Gigabyte TweaKING, Paris/France #4 @ 32M/01SE (w. Vanovich)
    2010: Gigabyte P55, Hamburg/Germany #6 @ wprime 1024/SPI 1M (w. THC) AOCM III, Minfeld/Germany #6 @ 01SE/AM3/1M/8M (w. NeoForce)

    Spectating;
    2010; GOOC 2010 Many thanks to Gigabyte!


  8. #8
    Brotherhood
    Join Date
    Apr 2005
    Location
    Land Of KADISOKA
    Posts
    1,231
    Great experience this awesome really...
    Quote Originally Posted by LardArse View Post
    i think you are asking the wrong person about safety limits, but

  9. #9
    PIfection
    Join Date
    May 2006
    Posts
    1,002
    seriously sick testing dude, when i get time i will read the whole thing, thanks for sharing the hard work

  10. #10
    I am Xtreme
    Join Date
    Aug 2008
    Posts
    5,584
    Thanks ChiefSeats
    Last edited by Hondacity; 04-06-2012 at 06:53 PM.


  11. #11
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    Thanks very much for your comments guys




  12. #12
    Xtreme Enthusiast
    Join Date
    Jan 2005
    Location
    Italy
    Posts
    960
    Quote Originally Posted by Hazzan View Post
    Great experience this awesome really...
    +1!!!
    Puni's rig (prime stable OC): Cooler Master HAF X NVIDIA EDITION | Asus M4A89TD PRO/USB3 + AMD FX-8350 @210x21=4,42Ghz blue voltages (NB/HT @2320Mhz/2730Mhz) + Asus/Zotac GTX580 SLI @851-1702/4204Mhz 1,063V + 8GB of Samsung 30nm DDR3 @1966Mhz 9-9-9-27 1,5V | Caviar Black 3TB SATA III RAID0 | BenQ XL2420TX NVIDIA 3D Vision 2 Lightboost ready | Win7x64

  13. #13
    3D Team Captain
    Join Date
    May 2007
    Location
    Munich, Germany
    Posts
    4,130
    Wow, that must have been a lot of work! Thanks for posting!

    Quote Originally Posted by chew* View Post
    You can never have enough D9's.

  14. #14
    Xtreme Member
    Join Date
    Aug 2007
    Location
    Germany/Bavaria
    Posts
    463
    Awesome thread, thank you for sharing the information!
    I'll take a deeper look later ...
    The Initial value finished
    ...
    PI calculation is done!


    Quote Originally Posted by Zeus View Post
    Software tweaks are not for sale.

  15. #15
    Memory Addict
    Join Date
    Aug 2002
    Location
    Brisbane, Australia
    Posts
    11,801
    wow, I am absent from forums for a while, and such great info gets posted!

    awesome stuff cheapseats !
    ---

  16. #16
    I am Xtreme
    Join Date
    Jan 2005
    Posts
    4,812
    Whoah! One of the better threads I've seen here in the last year or so.

    Where courage, motivation and ignorance meet, a persistent idiot awakens.

  17. #17
    Xtreme Enthusiast
    Join Date
    Sep 2008
    Posts
    518
    damn, that's some serious testing
    thanks for sharing

  18. #18
    Xtreme Member
    Join Date
    Feb 2010
    Location
    Sweden
    Posts
    285
    Very helpful, thanks a lot! I tried 8-9-6 wcl 6 instead of 7-9-7 wcl 8(min with cl7) with ripjaws bbse and got around 1 sec in 32m
    Overclocking, it's a lifestyle

  19. #19
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Surrey, UK
    Posts
    213
    Thanks to all

    Quote Originally Posted by Calathea View Post
    Very helpful, thanks a lot! I tried 8-9-6 wcl 6 instead of 7-9-7 wcl 8(min with cl7) with ripjaws bbse and got around 1 sec in 32m
    nice to hear you got a boost

    WCL 7 or 6 should also be possible at CL7, but setting either of those WCL values
    will force RTL decrease compared with RTL @ CL7 WCL8.

    The RTL decrease will require a significant increase in min vdimm, eg ~11%
    so CL7 WCL7 (or WCL6) will be unbootable until vdimm is raised sufficiently.




  20. #20
    Xtreme X.I.P.
    Join Date
    Feb 2006
    Posts
    2,745
    Woah, lots of data; that must have taken a huge chunk of time. Bottom line is that Hyper ICs could handle higher IO at a given clock frequency (where signal integrity is not a limiting factor) than the newer parts.

    The RTL changes simply restrict the peak IO when multiple transactions are made and provide a little bit of breathing room; they are also affected by any IO buffer delays and clock skew. What you are effectively playing with is the combination of setting adjustments needed to remain within the IO capabilities of the memory ICs at a given level of VDIMM.

    The overall balance for benchmark scores, as far as adjustments to WCL versus CL go, comes from the associated read-to-write delays, some of which are limited by the architecture and by electrical mechanisms.
    ASUS North America Technical Marketing - If you are based outside North America and require technical assistance or have a query please contact ASUS Support for your region.


    Rampage IV Extreme tweaking guide

    ASUS Z77 UEFI Tuning Guide for overclocking

    Maximus 5 Gene OC Guide

    Maximus VI Series UEFI OC Guide

  21. #21
    Aussie God
    Join Date
    Feb 2005
    Location
    Copenhagen, Denmark
    Posts
    4,702
    The thread should definitely be stickied...


  22. #22
    Registered User
    Join Date
    Aug 2010
    Posts
    6
    Respect for sharing the result of your hard work Thumbs up!

    Thank you man
