
Thread: [AMD] Flip Queue Size / [Nvidia] Maximum Pre-rendered Frames

  1. #1
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060

    Question [AMD] Flip Queue Size / [Nvidia] Maximum Pre-rendered Frames

    I stumbled across this option in RadeonPro...
    Supposedly both Nvidia's and AMD's versions do about the same.
    Found this thread about the issue (worth reading, IMO).
    The default is 3, and increasing the number will usually decrease your framerate and increase response time. The flip side is that everything will look smoother. Setting it to 0 will make things respond a little faster, but makes things choppy, from my understanding. It is basically the number of frames the GPU renders in advance before displaying them on the screen.
    I read a bunch of threads. Some say that setting it to 2 removes stuttering in BFBC2; some say that unless you set it to 0, Oblivion is bound to lag...
    I would assume the number of frames requested by this setting is in addition to the double-buffering that games typically use.
    Is this statement correct? This means that setting it to 0 would override Triple Buffering...
    Or does it work like this:
    I assume that the conversion works like this, though the wording is confusing:
    0 frames ahead = single buffering (not always possible, never a good idea).
    1 frame ahead = double buffering (minimum suggested for 3D graphics).
    2 frames ahead = triple buffering (probably the best setting for most users).

    3+ frames ahead will not increase frame rate, but will use extra video ram and increase latency.
    Then again, I don't think single buffering is really possible; I gave a couple of games a try at "0" and couldn't notice a dramatic difference...
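    A rough way to put numbers on the latency side of this: if the queue stays full, each extra queued frame waits about one frame time before it is shown. The sketch below only does that multiplication. It is a simplified model (it ignores scanout, the monitor's own processing and whatever the driver does internally), and `queue_latency_ms` is a hypothetical helper name.

```c
/* Rough extra display latency from a full render-ahead queue.
 * Assumption: every queued frame waits one full frame time (1000 / fps ms)
 * before it reaches the screen. Returns 0 for an empty queue or bad input. */
double queue_latency_ms(int queued_frames, double fps)
{
    if (queued_frames <= 0 || fps <= 0.0)
        return 0.0;
    return queued_frames * (1000.0 / fps);
}
```

    At 60 FPS that is about 16.7 ms per queued frame, so a full queue of 3 adds roughly 50 ms compared to 0; at 30 FPS the same queue adds about 100 ms.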

    One more quote from that thread to add some confusion:
    It is not clear to me if the "flip-queue" is the same as nVidia's "frames to render ahead" setting, but it might be. For one thing, "render" implies something the GPU is doing, while the post indicates that "flip-queue" size sets how many frames the CPU can progress ahead of the GPU, or in other words, something the CPU is doing.

    If they are the same thing, then I was totally wrong, and "frames to render ahead" has nothing to do with single/double/triple buffering; furthermore, it would always be a valid setting regardless of VSYNC. Of course, in that case it is named incorrectly, since how many frames the CPU can get ahead of the video card has nothing to do with how many frames are rendered ahead... which is why the only real way to figure out what the setting means is to test it.
    OK, so my situation is: I have 2 Radeon cards in CFX, I force VSync, and I force Triple Buffering. What would be the best Flip Queue Size setting for me? And will I have to reconsider Triple Buffering? (That depends on the answer to the previous question, I guess.)
    I hate tearing (hence VSync on), but having extra input lag is definitely not something I want (hence the whole issue).

    Anyway, feel free to share your ideas and experience.
    Last edited by zalbard; 02-01-2011 at 12:29 PM.
    Quote Originally Posted by jayhall0315
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  2. #2
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Okay, found some more information about the subject: [AnandTech] Triple Buffering: Why We Love It
    There has been a lot of discussion in the comments of the differences between the page flipping method we are discussing in this article and implementations of a render ahead queue. In render ahead, frames cannot be dropped. This means that when the queue is full, what is displayed can have a lot more lag. Microsoft doesn't implement triple buffering in DirectX, they implement render ahead (from 0 to 8 frames with 3 being the default).

    The major difference in the technique we've described here is the ability to drop frames when they are outdated. Render ahead forces older frames to be displayed. Queues can help smoothness and stuttering as a few really quick frames followed by a slow frame end up being evened out and spread over more frames. But the price you pay is in lag (the more frames in the queue, the longer it takes to empty the queue and the older the frames are that are displayed).

    In order to maintain smoothness and reduce lag, it is possible to hold on to a limited number of frames in case they are needed but to drop them if they are not (if they get too old). This requires a little more intelligent management of already rendered frames and goes a bit beyond the scope of this article.

    Some game developers implement a short render ahead queue and call it triple buffering (because it uses three total buffers). They certainly cannot be faulted for this, as there has been a lot of confusion on the subject and under certain circumstances this setup will perform the same as triple buffering as we have described it (but definitely not when framerate is higher than refresh rate).

    Both techniques allow the graphics card to continue doing work while waiting for a vertical refresh when one frame is already completed. When using double buffering (and no render queue), while vertical sync is enabled, after one frame is completed nothing else can be rendered out which can cause stalling and degrade actual performance.

    When vsync is not enabled, nothing more than double buffering is needed for performance, but a render queue can still be used to smooth framerate if it requires a few old frames to be kept around. This can keep instantaneous framerate from dipping in some cases, but will (even with double buffering and vsync disabled) add lag and input latency. Even without vsync, render ahead is required for multiGPU systems to work efficiently.
    Since we can force Triple Buffering in D3D these days, I see no reason not to set Flip Queue Size to 0 to reduce the lag.

    We need some testing done using Multi-GPU setups, to see if Flip Queue Size affects performance (and CF / SLI scaling). I suppose this would be best done with VSync off.
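    To see the difference the article describes between a render-ahead queue (frames can't be dropped) and a scheme that drops outdated frames, here is a toy simulation. The assumptions are all mine: every frame takes a fixed `render_ms` to render, a frame's input is sampled when its rendering starts, the display refreshes every `refresh_ms`, and the names (`avg_displayed_age_ms`, `present_mode`, etc.) are hypothetical. It is not how any actual driver is implemented.

```c
#include <stdbool.h>

/* FIFO = render-ahead queue; NEWEST = keep only the latest finished frame. */
typedef enum { RENDER_AHEAD_FIFO, NEWEST_FRAME_WINS } present_mode;

#define MAX_QUEUE 8

/* Average age (ms since the shown frame started rendering) over the second
 * half of the run. queue_cap only matters in RENDER_AHEAD_FIFO mode. */
double avg_displayed_age_ms(present_mode mode, int queue_cap,
                            int render_ms, int refresh_ms, int total_ms)
{
    int queue[MAX_QUEUE];   /* start times of finished frames, oldest first */
    int count = 0;
    int cur_start = 0;      /* start time of the frame currently rendering */
    bool stalled = false;   /* FIFO only: a finished frame is waiting for a slot */
    long age_sum = 0;
    int samples = 0;

    if (render_ms < 1 || refresh_ms < 1)
        return 0.0;
    if (queue_cap < 1) queue_cap = 1;
    if (queue_cap > MAX_QUEUE) queue_cap = MAX_QUEUE;

    for (int t = 1; t <= total_ms; t++) {
        /* 1. A frame finishes render_ms after it started, unless rendering is stalled. */
        if (!stalled && t - cur_start >= render_ms) {
            if (mode == NEWEST_FRAME_WINS) {
                queue[0] = cur_start;     /* replaces (drops) any older unshown frame */
                count = 1;
                cur_start = t;
            } else if (count < queue_cap) {
                queue[count++] = cur_start;
                cur_start = t;
            } else {
                stalled = true;           /* queue full: no new frame can start */
            }
        }
        /* 2. On every refresh, show the oldest queued frame, if any. */
        if (t % refresh_ms == 0 && count > 0) {
            int shown = queue[0];
            for (int i = 1; i < count; i++)
                queue[i - 1] = queue[i];
            count--;
            if (t > total_ms / 2) {       /* skip the warm-up half */
                age_sum += t - shown;
                samples++;
            }
            if (stalled) {                /* a slot opened: queue the waiting frame */
                queue[count++] = cur_start;
                cur_start = t;
                stalled = false;
            }
        }
    }
    return samples > 0 ? (double)age_sum / samples : 0.0;
}
```

    With frames rendering every 5 ms, a 16 ms refresh and a queue of 3, the FIFO mode settles at showing frames that are 64 ms old (three queued frames plus one finished frame stalled waiting for a slot), while the drop-old mode shows frames only 5 to 9 ms old. That is the "lag when the queue is full" effect from the quote.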
    Last edited by zalbard; 02-01-2011 at 01:26 PM.

  3. #3
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    The performance difference people are talking about is simply not there.
    FQS = 3 (default) => P8427 in 3DMark 11, 1397 in Heaven.
    FQS = 0 => P8418 in 3DMark 11, 1396 in Heaven.
    Setting FQS to 0 is a performance-free way to reduce input lag.
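    For scale, the change from FQS 3 to FQS 0 in those runs works out as follows. This is just a small helper; `percent_change` is a name made up for this example.

```c
/* Relative change, in percent, going from score `base` to score `other`. */
double percent_change(double base, double other)
{
    return (other - base) / base * 100.0;
}
```

    percent_change(8427, 8418) gives about -0.11% for 3DMark 11, and percent_change(1397, 1396) about -0.07% for Heaven. That is small enough that a single run of each can't really separate it from ordinary run-to-run variation.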

  4. #4
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    On Nvidia cards you can go up to 255 frames of render-ahead.
    What happens is that you run a refresh rate lower than what the card can actually put out.
    So you can compensate and force the card to run at full speed by using that option.
    But this sometimes creates a shiz load of mouse lag.

    There's a way around this by messing with a few blitting queue options, but the timing gets messed up.
    Set it to "0".

    The default option on Nvidia cards is 3, which will cause you major mouse lag in some games; set it to 0.

    For benching, max that sucker if you want lol.
    But only for benching.

    Edit:
    My notes.

    Code:
    ;Prerender Limit
    HKR,	,	D3D_98764205,	%REG_BINARY%,	00,00,00,00	;0 Frames
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	01,00,00,00	;1 Frame ,1st Limit Shuts off
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	02,00,00,00	;2 Frames ,Barely Noticeable Mouse Lag
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	0A,00,00,00	;10 Frames ,Low Low ,High High
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	0B,00,00,00	;11 Frames ,High Low ,Low High ,2nd Limit Shuts off
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	0F,00,00,00	;15 Frames
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	19,00,00,00	;25 Frames
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	1A,00,00,00	;26 Frames
    ;HKR,	,	D3D_98764205,	%REG_BINARY%,	FF,00,00,00	;255 Frames
    
    ;Prerender Frame Limit
    HKR,	,	OGL_MaxFramesAllowed,	%REG_BINARY%,	00,00,00,00	;0 Frames
    ;HKR,	,	OGL_MaxFramesAllowed,	%REG_BINARY%,	03,00,00,00	;3 Frames
    ;HKR,	,	OGL_MaxFramesAllowed,	%REG_BINARY%,	0B,00,00,00	;11 Frames
    ;HKR,	,	OGL_MaxFramesAllowed,	%REG_BINARY%,	FF,00,00,00	;255 Frames
    Back in the day, when Nvidia was new, it used to work well at 255...
    They have had timing issues with their driver for a very long time now, though.
    It's a shame.
    You can actually get a lot of FPS from it if your card is being held back by your monitor.

    Oh, and btw, if you're thinking of page flipping... it's not.
    Only Quadros can do page flipping. Well, that's not 100% true, but that's how it's supposed to be for the average person.
    Last edited by NEOAethyr; 02-02-2011 at 05:00 AM.

  5. #5
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    Out of curiosity I gave this a try with Trackmania United since my GTX480 has poor performance with the game (4870x2 is smooth as butter on it at way higher settings) and found that although my FPS did not increase my playability did. I went from the default of 3 to 2, btw.

    Now if I could just learn to drift, I'd cut the final second or two off my lap times.

    All along the watchtower the watchmen watch the eternal return.

  6. #6
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    I was just thinking of this thread...

    Ehhh, if I keep posting stuff like this I might piss off someone from Nvidia. I didn't sign an NDA, but still.
    I'll refrain from posting too much source...
    I probably shouldn't be posting any.

    Code:
            /*
             * Check how many frames we are ahead
             *  we want to limit this since we have serious lag effect on fast CPUs
             */
            // free count for d3d stuff must be accurate
    Code:
    // Prerender limits
    #define D3D_REG_PRERENDERLIMIT_STRING                   "PRERENDERLIMIT"
    #define D3D_REG_PRERENDERLIMIT_MIN                      1
    #define D3D_REG_PRERENDERLIMIT_MAX                      1000
    Code:
        reg_entry szPreRenderLimitString;              // NOVSYNCPRERENDERLIMIT
    Code:
        // update frame tracker if appropriate
        if ((dwPbdFlags & DDBLT_LAST_PRESENTATION)
            ||
            // DX7 - if registry key is set, allow at most "regPreRenderLimit" blits to primary with the
            // DDBLT_WAIT flag to queue up. this prevents horrible lag if we queue too many.
            ((getDC()->nvD3DRegistryData.regD3DEnableBits1 & D3D_REG_LIMITQUEUEDFBBLITSENABLE_ENABLE) &&
             (dst.dwCaps & (DDSCAPS_FRONTBUFFER | DDSCAPS_PRIMARYSURFACE | DDSCAPS_VISIBLE))))
        {
            nvCheckQueuedBlits();
            nvUpdateBlitTracker();
        }
    Code:
    void nvCheckQueuedBlits (void)
    {
        // max queued frames is regPreRenderLimit (defaults to 3 - PC99 spec)
        DWORD dwMaxQueuedBlits = getDC()->nvD3DRegistryData.regPreRenderLimit;
    
        // read HW blit #
        DWORD dwCompletedBlit = getDC()->pBlitTracker->get();
    
        // have we progressed too far?
        while ((getDC()->dwCurrentBlit - dwCompletedBlit) > dwMaxQueuedBlits)
        {
            // kick off buffer
            nvPusherStart (TRUE);
            // wait for HW to catch up
            nvDelay();
            dwCompletedBlit = getDC()->pBlitTracker->get();
        }
    }
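    Stripped of the driver plumbing, the loop in nvCheckQueuedBlits() is a counter comparison: keep going while the CPU is no more than the limit ahead of the GPU, otherwise wait. Here is a self-contained sketch of that idea. The struct and function names are made up, and the GPU is faked by retiring one frame each time the CPU waits (the real code kicks off the pushbuffer and re-reads the hardware blit counter instead).

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver context in the fragments above. */
typedef struct {
    uint32_t submitted;   /* like dwCurrentBlit: frames handed to the GPU */
    uint32_t completed;   /* like pBlitTracker->get(): frames the GPU finished */
    uint32_t max_queued;  /* like regPreRenderLimit */
    uint32_t waits;       /* how many times the CPU had to wait */
} frame_tracker;

/* Fake GPU progress: retire one outstanding frame per call. */
static void gpu_retire_one(frame_tracker *ft)
{
    if (ft->completed != ft->submitted)
        ft->completed++;
}

/* Same shape as nvCheckQueuedBlits(): wait while the CPU is more than
 * max_queued frames ahead. Unsigned subtraction stays correct on wraparound. */
void check_queued_frames(frame_tracker *ft)
{
    while ((uint32_t)(ft->submitted - ft->completed) > ft->max_queued) {
        ft->waits++;
        gpu_retire_one(ft);
    }
}

/* CPU side: submit a finished frame, then apply the limit. */
void submit_frame(frame_tracker *ft)
{
    ft->submitted++;
    check_queued_frames(ft);
}
```

    With a limit of 3, the CPU can get up to 3 frames ahead, so whatever you do with the mouse only shows up after those queued frames are displayed. With a limit of 0, the sketch waits for each frame to finish before continuing (note the leaked header above sets a minimum of 1 for this key).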

    This explains the lag when raising the values.
    I should go back and check out this kind of stuff again to see if I can put together my alt renderer...
    It's not 100% related to the above but it's interesting.

    Code:
    // SSYNCENABLE
    // This is a major hack to work around input lag in stupid applications that
    // want to use blits instead of flips to do there screen updates but then don't
    // make any getblitstatus calls to see if the blit has completed before beginning
    // to render the next frame.
    // This is not something that you want to have enabled unless you absolutely need
    // to have it enabled.
    #define D3D_REG_SSYNCENABLE_STRING                      "SCENESYNCENABLE"
    #define D3D_REG_SSYNCENABLE_MASK                        (1 << D3D_REG_BIT_SSYNCENABLE)
    #define D3D_REG_SSYNCENABLE_DISABLE                     (0 << D3D_REG_BIT_SSYNCENABLE)
    #define D3D_REG_SSYNCENABLE_ENABLE                      (1 << D3D_REG_BIT_SSYNCENABLE)
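    The _MASK/_ENABLE/_DISABLE defines follow a common C pattern for packing on/off options into one registry DWORD (the same pattern as the regD3DEnableBits1 & ..._ENABLE test in the earlier fragment). Here is a sketch of how such a bit is tested and set. The bit position is hypothetical, because the value of D3D_REG_BIT_SSYNCENABLE isn't shown, and the EXAMPLE_ names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real D3D_REG_BIT_SSYNCENABLE is not shown. */
#define EXAMPLE_BIT_SSYNCENABLE     5u
#define EXAMPLE_SSYNCENABLE_MASK    (1u << EXAMPLE_BIT_SSYNCENABLE)
#define EXAMPLE_SSYNCENABLE_DISABLE (0u << EXAMPLE_BIT_SSYNCENABLE)
#define EXAMPLE_SSYNCENABLE_ENABLE  (1u << EXAMPLE_BIT_SSYNCENABLE)

/* Test one option inside a packed DWORD of enable bits. */
static inline bool ssync_enabled(uint32_t enable_bits)
{
    return (enable_bits & EXAMPLE_SSYNCENABLE_MASK) == EXAMPLE_SSYNCENABLE_ENABLE;
}

/* Turn the option on or off without touching the other bits. */
static inline uint32_t set_ssync(uint32_t enable_bits, bool on)
{
    enable_bits &= ~EXAMPLE_SSYNCENABLE_MASK;
    return enable_bits | (on ? EXAMPLE_SSYNCENABLE_ENABLE : EXAMPLE_SSYNCENABLE_DISABLE);
}
```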
    They cuss here and there in their source; it's funny lol... (literally, the "stupid" comment is not quite what I'm talking about, but you get the idea)
    Their encryption keys are funny too...

    This is a mix of comments and source from different drivers, some as old as the NV4.
    (I have code up to the GF3 and that's all; 1st encrypted driver.)

    I use the source code to help put together my driver INF once in a while.
    A long time ago I used to be able to dump the values from the binaries by hand, but it wasn't easy.
    My love of Nvidia got me this source from Microsoft Japan... (probably one of the only cool things Microsoft has done for me lol)

    Blah blah lol.
    Anyway, it's really best to set it to 0; no lag that way.
    Last edited by NEOAethyr; 02-04-2011 at 12:16 AM.

  7. #7
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Quote Originally Posted by STEvil
    Out of curiosity I gave this a try with Trackmania United since my GTX480 has poor performance with the game (4870x2 is smooth as butter on it at way higher settings) and found that although my FPS did not increase my playability did. I went from the default of 3 to 2, btw.
    Sounds about right.
    Should decrease input lag without any FPS difference or side effects.

  8. #8
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    Now if only the game would use more than 36% of my GTX480, so I could get a decent framerate with 20-50+ cars on screen.


  9. #9
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    AnandTech: Exploring Input Lag Inside and Out
    For everyone without multiGPU solutions, we recommend setting flip queue or max pre-rendered frames to either 1 or 0. Set it to 1 if framerate is always less than monitor refresh and set it to 0 if framerate is always greater than or equal to monitor refresh.
    Quite interesting read.
    Improperly handling vsync (enabling or disabling a 1 frame flip queue at the wrong time) can degrade performance by at least one additional whole frame. But with multiGPU options, we really don't have a choice. With more than one GPU in the system, you will want to leave maximum pre-rendered frames set to the default of 3 and allow the driver to handle everything. Input lag with multiGPU systems is something we will want to explore at a later time.
    I'm not sure I agree since I've never noticed that... This needs more testing.
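    Written out as a rule, the article's advice looks like this (the multi-GPU branch is the part questioned above). `suggested_flip_queue` is a hypothetical helper that only encodes the quoted recommendations; it returns -1 when the framerate crosses the refresh rate, since the article gives no single answer for that case.

```c
/* Encodes the AnandTech rule of thumb quoted above. Returns the suggested
 * flip queue / max pre-rendered frames value, or -1 if the quote gives none. */
int suggested_flip_queue(int gpu_count, double min_fps, double max_fps,
                         double refresh_hz)
{
    if (gpu_count > 1)
        return 3;    /* multi-GPU: leave the driver default */
    if (min_fps >= refresh_hz)
        return 0;    /* framerate always at or above refresh */
    if (max_fps < refresh_hz)
        return 1;    /* framerate always below refresh */
    return -1;       /* framerate crosses refresh: no rule given */
}
```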

  10. #10
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Well, did some testing... Results are pretty funny.
    This is for multi-GPU: CFX 2x 6970 2GB.
    Played an MMO called "Rift: Planes of Telara". DX9.0c based.
    VSync = Enabled.
    Flip Queue Size = 1: around 50% performance loss (from 60FPS to under 30...), a lot of input lag.
    Flip Queue Size = 2: FPS seemed a lot better (compared to 1), less input lag for sure.
    Flip Queue Size = 3 (which is driver default): FPS seemed a little higher (compared to 2). Hard to tell about input lag reduction...
    Flip Queue Size = 0: seems like I gained 1-3 extra FPS compared to 3 (might be a placebo effect), and it seems to have the least input lag of all of the above (I think it's actually a little better than 3, but that might be a placebo effect once again).

    So I'm sticking to 0 which so far seems to net me the best results.

    P.S. After doing some reading, apparently FQS = 0 can cause tearing when FPS > refresh rate, but this is irrelevant if you're using VSync... And quite frankly, this happens with the default FQS setting anyway.
    Last edited by zalbard; 06-11-2011 at 05:04 AM.

  11. #11
    Xtreme Addict
    Join Date
    Oct 2006
    Posts
    2,141
    I was testing some of this, and with a single 6970 and a flip queue size of 5, Guild Wars lags out pretty badly. People's movements are all jerky, and they just teleport 5 ft at a time when moving, with no animations playing. The same thing happens to a lesser extent with an FQS of 4, and 3 and under were all exactly the same.
    Rig 1:
    ASUS P8Z77-V
    Intel i5 3570K @ 4.75GHz
    16GB of Team Xtreme DDR-2666 RAM (11-13-13-35-2T)
    Nvidia GTX 670 4GB SLI

    Rig 2:
    Asus Sabertooth 990FX
    AMD FX-8350 @ 5.6GHz
    16GB of Mushkin DDR-1866 RAM (8-9-8-26-1T)
    AMD 6950 with 6970 bios flash

    Yamakasi Catleap 2B overclocked to 120Hz refresh rate
    Audio-GD FUN DAC unit w/ AD797BRZ opamps
    Sennheiser PC350 headset w/ hero mod
