Thread: [News] AMD Says The Windows Thread Scheduler is "operating properly" for Ryzen.

  1. #1
    Join XS BOINC Team StyM
    Join Date
    Mar 2006
    Location
    Tropics
    Posts
    9,468

    [News] AMD Says The Windows Thread Scheduler is "operating properly" for Ryzen.

    https://www.techpowerup.com/231472/a...erly-for-ryzen

    In a blog post that is sure to stun many users expecting a "thread scheduler patch" in modern Windows versions for AMD Zen-based CPUs, AMD has apparently investigated the reports of thread scheduling issues and found that "the Windows 10 thread scheduler is operating properly for "Zen," and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."

    So, if you were expecting a Windows 10 or maybe even Windows 7 patch to address some performance concerns, don't hold your breath. The company notes that they tested both Windows 10 and Windows 7 and they "do not believe there is an issue with scheduling differences between the two versions of Windows." In other words, Windows 7 is already fine as far as scheduling goes; no patch is required.
    The company does still recommend users utilize the "High Performance" plan in their Windows setup for best performance, claiming the software management of CPU speed interferes with Ryzen's native management. There may be an update forthcoming for the Windows "Balanced" plan to fix how it operates with Ryzen, but no scheduler update is planned as of now.

  2. #2
    Registered User
    Join Date
    May 2009
    Location
    Caldas da Rainha, Portugal
    Posts
    38
    I would like AMD to tell me whether what happens here is also normal:

    https://forums.anandtech.com/threads...#post-38789965

    Pay special attention to the second video, but watch the first one first.

  3. #3
    Moderator
    Join Date
    Oct 2007
    Location
    Oregon - USA
    Posts
    830
    Yeah, this needs to be addressed.
    Asus Rampage IV Extreme
    4930k @4.875
    G.Skill Trident X 2666 Cl10
    Gtx 780 SC
    1600w Lepa Gold
    Samsung 840 Pro 256GB


  4. #4
    I am Xtreme zanzabar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    It has the same problem all MCM parts have: https://www.youtube.com/watch?v=6laL-_hiAK0

    They need to let Windows know that they have one CPU and 2 NUMA nodes, but Windows for consumers only lets you have one NUMA node and one CPU for Home, and one NUMA node and 2 CPUs for Pro. It is also why Intel stopped making MCM parts and why AMD did put out the single-socket version of the G34.
    Last edited by zanzabar; 03-14-2017 at 12:42 PM.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi

  5. #5
    Moderator
    Join Date
    Oct 2007
    Location
    Oregon - USA
    Posts
    830
    Which version of Windows would you need to take advantage of multiple NUMA nodes? Server?
    Asus Rampage IV Extreme
    4930k @4.875
    G.Skill Trident X 2666 Cl10
    Gtx 780 SC
    1600w Lepa Gold
    Samsung 840 Pro 256GB


  6. #6
    Brilliant Idiot
    Join Date
    Jan 2005
    Location
    Hell on Earth
    Posts
    11,015
    Enterprise, I guess?
    heatware chew*
    I've got no strings to hold me down.
    To make me fret, or make me frown.
    I had strings but now I'm free.
    There are no strings on me

  7. #7
    Xtreme Enthusiast
    Join Date
    Oct 2012
    Posts
    687


    Intel 5960X@4.2Ghz[Prime stable]@4.5 [XTU stable] 1.24v NB@3.6ghz Asrock X99 Extreme 3 4x8GB Corsair Vengeance@3200 16-17-17
    Sapphire nitro+ VEGA 56 Samsung SSD 850 256GB Crucial MX100 512GB HDD:WD10TB WD:8TB Seagate8TB

  8. #8
    I am Xtreme zanzabar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    Quote Originally Posted by Ace123 View Post
    Which version of Windows would you need to take advantage of multiple NUMA nodes? Server?
    With 2008 R2 you need Server Enterprise for, I think, 4, and you need Datacenter with the cluster-related features/roles for up to, I think, 64. On 2008 R2, editions like Web Server allowed one per socket. With Server 2012 it looks like you can have up to 320 cores per Windows instance and 10(?) NUMA nodes.

    The problems with having a single-socket home system split in half into NUMA nodes are that 1) Windows does not like scheduling non-NUMA-aware programs across multiple nodes, so if a game needed 8 threads it would pull the logical cores from one node instead of using the physical cores, and 2) you have to assign memory, but there is only one memory controller. It creates an interesting problem that might need a whitelist driver like the one CrossFire/SLI need.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi
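
A toy model may make the first problem in the post above easier to see. This is only a sketch under stated assumptions, not a description of how Windows actually schedules: the topology (2 nodes, 4 physical cores each, 2 SMT threads per core) and the logical CPU numbering (CPUs 2k and 2k+1 share physical core k) are hypothetical, and the two placement functions are invented for illustration.

```python
from collections import Counter

# Hypothetical topology (an assumption, not read from real hardware):
# 2 nodes x 4 physical cores x 2 SMT threads = 16 logical CPUs.
NODES = 2
CORES_PER_NODE = 4
THREADS_PER_CORE = 2
LOGICAL_CPUS = list(range(NODES * CORES_PER_NODE * THREADS_PER_CORE))


def physical_core(cpu):
    """Physical core that a logical CPU belongs to (assumed numbering)."""
    return cpu // THREADS_PER_CORE


def node_of(cpu):
    """Node that a logical CPU belongs to (assumed numbering)."""
    return cpu // (CORES_PER_NODE * THREADS_PER_CORE)


def place_within_one_node(n_threads, node=0):
    """Stay on one node: use each core once, then fall back to SMT siblings."""
    cpus = [c for c in LOGICAL_CPUS if node_of(c) == node]
    ordered = sorted(cpus, key=lambda c: (c % THREADS_PER_CORE, c))
    return ordered[:n_threads]


def place_across_nodes(n_threads):
    """Use one thread per physical core on every node before any sibling."""
    ordered = sorted(LOGICAL_CPUS, key=lambda c: (c % THREADS_PER_CORE, c))
    return ordered[:n_threads]


def shared_cores(placement):
    """Number of physical cores that ended up running more than one thread."""
    counts = Counter(physical_core(c) for c in placement)
    return sum(1 for n in counts.values() if n > 1)


one_node = place_within_one_node(8)
both_nodes = place_across_nodes(8)
print(one_node, shared_cores(one_node))      # [0, 2, 4, 6, 1, 3, 5, 7] 4
print(both_nodes, shared_cores(both_nodes))  # [0, 2, 4, 6, 8, 10, 12, 14] 0
```

In this model, eight threads confined to one node double up on all four of its cores, while spreading them over both nodes gives every thread its own physical core at the cost of crossing the node boundary.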

  9. #9
    Xtreme Member
    Join Date
    Jul 2008
    Location
    NYC
    Posts
    325
    Quote Originally Posted by vario View Post


    Ok, so if I have 10 channels with virtual instruments as well as effects running in my digital audio workstation, how does Windows know where to park a thread?

    If my software essentially gives all processing on any given track its own thread, then that's easy enough. But now let's say I choose to drive part of my processing on channel 9 from a signal sent from channel 3? Channel 3 needs to complete before data is sent to 9 so that it can complete. How does Windows know this?

    Now, if Ryzen had had just one large CCX then Windows moving threads around would have been no big deal, but as it stands now does it really know that Channel 9 needs data from Channel 3, and that it is mission critical to get that quickly? Does it know this automatically or does it need to be told to?
    Win XP Pro x64 / Win 7 x64 / Phenom II / Asus m3a79-t Deluxe / 8x2 GB GSkill and some other stuff.....
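
The channel 3 to channel 9 dependency described above is something the application knows from its own routing graph. A minimal sketch of how a host could derive a processing order from that graph, using Python's standard `graphlib` module (Python 3.9+); the `routing` table and the `processing_waves` helper are hypothetical and not taken from any real audio workstation.

```python
from graphlib import TopologicalSorter

# Hypothetical routing table: channel -> set of channels whose output it needs.
routing = {channel: set() for channel in range(1, 11)}
routing[9] = {3}  # channel 9 is driven by a signal sent from channel 3


def processing_waves(deps):
    """Group channels into waves.

    Channels inside one wave do not depend on each other and could run on
    separate threads; a wave can only start once the previous one is done.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = processing_waves(routing)
print(waves)  # [[1, 2, 3, 4, 5, 6, 7, 8, 10], [9]]
```

The ordering here comes entirely from the application's data. An operating system scheduler would normally only see the resulting threads waiting on and waking each other, which is consistent with the question of whether it "needs to be told".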

  10. #10
    Xtreme Enthusiast
    Join Date
    Oct 2012
    Posts
    687
    Quote Originally Posted by MattiasNYC View Post
    Ok, so if I have 10 channels with virtual instruments as well as effects running in my digital audio workstation, how does Windows know where to park a thread?

    If my software essentially gives all processing on any given track its own thread, then that's easy enough. But now let's say I choose to drive part of my processing on channel 9 from a signal sent from channel 3? Channel 3 needs to complete before data is sent to 9 so that it can complete. How does Windows know this?

    Now, if Ryzen had had just one large CCX then Windows moving threads around would have been no big deal, but as it stands now does it really know that Channel 9 needs data from Channel 3, and that it is mission critical to get that quickly? Does it know this automatically or does it need to be told to?
    First of all, props for using XP 64.
    As for your question, this is a very complicated thing to answer, because software can do a lot of things in a lot of ways. The simple truth is that the Windows scheduler doesn't have a clue about CCXs. That doesn't automatically mean such software will be unusable on Ryzen, though, because even then it can be better to have more cores and big L3 caches instead of Kaby Lake's 4 cores. You can also mitigate the problem to some degree by using fast RAM, since that speeds up not only the RAM but also the uncore and the Infinity Fabric, which connects the two CCXs. One thing is for sure: the power plan in Windows should be set to High Performance, because the cores will be much quicker to ramp up when needed (if I remember correctly, 1 ms vs 30 ms).
    Or just lock it at the frequency you need.
    Somebody should properly test the things you are talking about, that's for sure.
    And to keep things light,
    Intel's marketing department be like:
    Intel 5960X@4.2Ghz[Prime stable]@4.5 [XTU stable] 1.24v NB@3.6ghz Asrock X99 Extreme 3 4x8GB Corsair Vengeance@3200 16-17-17
    Sapphire nitro+ VEGA 56 Samsung SSD 850 256GB Crucial MX100 512GB HDD:WD10TB WD:8TB Seagate8TB
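
One application-side workaround that follows from the post above is to keep threads that talk to each other on a single CCX by restricting CPU affinity. A minimal sketch that only computes the bitmask. It assumes a hypothetical layout in which logical CPUs 0-7 belong to CCX 0 and 8-15 to CCX 1 (4 cores with 2 SMT threads each); the real numbering has to be checked on the machine, and whether pinning helps a given workload is untested here.

```python
def ccx_affinity_mask(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Bitmask with one bit set per logical CPU of the given CCX.

    Assumes logical CPUs are numbered contiguously per CCX, which is a
    guess about the layout rather than a documented guarantee.
    """
    cpus_per_ccx = cores_per_ccx * threads_per_core
    first_cpu = ccx_index * cpus_per_ccx
    mask = 0
    for cpu in range(first_cpu, first_cpu + cpus_per_ccx):
        mask |= 1 << cpu
    return mask


print(hex(ccx_affinity_mask(0)))  # 0xff
print(hex(ccx_affinity_mask(1)))  # 0xff00
```

On Windows, a hexadecimal mask like this is the kind of value accepted by the `/affinity` option of the `start` command in cmd, or by the Win32 `SetProcessAffinityMask` call.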

  11. #11
    Xtreme Member
    Join Date
    Jul 2008
    Location
    NYC
    Posts
    325
    Quote Originally Posted by vario View Post
    First of all, props for using XP 64.
    Lol... talk about old sig...!

    Quote Originally Posted by vario View Post
    As for your question, this is a very complicated thing to answer, because software can do a lot of things in a lot of ways. The simple truth is that the Windows scheduler doesn't have a clue about CCXs.
    Well, that's sort of what I meant. It isn't really 'broken' and it's not really to blame for CCX communication "problems". It was designed a certain way, and now it's faced with a new CPU that it hasn't been programmed for.

    Quote Originally Posted by vario View Post
    That doesn't automatically mean such software will be unusable on Ryzen, though, because even then it can be better to have more cores and big L3 caches instead of Kaby Lake's 4 cores. You can also mitigate the problem to some degree by using fast RAM, since that speeds up not only the RAM but also the uncore and the Infinity Fabric, which connects the two CCXs. One thing is for sure: the power plan in Windows should be set to High Performance, because the cores will be much quicker to ramp up when needed (if I remember correctly, 1 ms vs 30 ms).
    Or just lock it at the frequency you need.
    Somebody should properly test the things you are talking about, that's for sure.
    Well that's the thing; it's been done already.

    Techreport did "half" of a test and came to the correct conclusion that it was an awesome value. That "half" test simply pushed the workload that was more "linear", in other words one where there is a stream of audio per channel, along with processing on that channel, something the workstation software by default contains within one thread. It's a far bigger issue when there's cross-communication going on, and I think the "other half" of that test indicates that. The result of that other test was that stacked processing showed a clearly lower capacity than the 7700K at low audio latency. To me at least, this hints at exactly this cross-CCX communication being the issue.

    So, to recap: I don't think there's anything wrong with the scheduler, it's just that some workloads expose the issue of having two CCXs. The only solution then seems to me to be on the software developer's side, not Microsoft's.... though I could be wrong....

    Quote Originally Posted by vario View Post
    And to keep things light
    Intel's marketing department be like:
    Yep, pretty much....

    I should add that people who actually do media creation for a living in a lot of cases aren't concerned with spending an extra $200 by choosing Intel over AMD, because the entire system with add-in cards and peripherals is far more expensive, and that $200 is earned in anywhere from half an hour to four hours, depending on the market.
    Win XP Pro x64 / Win 7 x64 / Phenom II / Asus m3a79-t Deluxe / 8x2 GB GSkill and some other stuff.....
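
For readers wondering why "low audio latency" matters in the post above: the audio buffer size sets a hard deadline for each processing pass, so any extra delay between dependent channels eats into a very small time budget. A rough worked example; the buffer sizes and the 48 kHz sample rate are illustrative assumptions, not the settings of the test being discussed.

```python
def buffer_deadline_ms(buffer_samples, sample_rate_hz):
    """Time available to process one audio buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


# Illustrative buffer sizes at an assumed 48 kHz sample rate.
for size in (64, 256, 1024):
    print(size, round(buffer_deadline_ms(size, 48_000), 2))
# 64 1.33
# 256 5.33
# 1024 21.33
```

At the smallest buffer, every channel in a dependency chain has to finish within roughly a millisecond, which is why a fixed communication cost that is invisible at large buffers can limit capacity at small ones.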

  12. #12
    I am Xtreme zanzabar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    AMD has been selling a many-core (or many-thread, since it was not many-core when BD came out) MCM platform since 2010; you would think they would have this worked out by now. It also looks like AM4 is messing it up, since it needs more RAM channels so that it won't essentially have to run pooled memory.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi

  13. #13
    Xtreme Enthusiast
    Join Date
    Oct 2012
    Posts
    687
    Well, one thing is that this whole problem would be solved by a Crystalwell-like L4 cache, and it would give them a big boost where they need it the most.
    Broadwell showed that: it can still match or beat Kaby Lake even though it has lower IPC and much lower clocks.
    AMD must have known about this problem pretty much in the design phase. So either they are stupid (I doubt that) or something went a bit wrong and there is already a solution for this in the next silicon, but they are not talking about it because they need to sell the thing they have right now. It would also explain why they aren't really willing to solve it through other means like the scheduler, a driver, or the app.
    Intel 5960X@4.2Ghz[Prime stable]@4.5 [XTU stable] 1.24v NB@3.6ghz Asrock X99 Extreme 3 4x8GB Corsair Vengeance@3200 16-17-17
    Sapphire nitro+ VEGA 56 Samsung SSD 850 256GB Crucial MX100 512GB HDD:WD10TB WD:8TB Seagate8TB

  14. #14
    I am Xtreme zanzabar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    Quote Originally Posted by vario View Post
    Well, one thing is that this whole problem would be solved by a Crystalwell-like L4 cache, and it would give them a big boost where they need it the most.
    Broadwell showed that: it can still match or beat Kaby Lake even though it has lower IPC and much lower clocks.
    AMD must have known about this problem pretty much in the design phase. So either they are stupid (I doubt that) or something went a bit wrong and there is already a solution for this in the next silicon, but they are not talking about it because they need to sell the thing they have right now. It would also explain why they aren't really willing to solve it through other means like the scheduler, a driver, or the app.
    Didn't they talk a couple of years ago about how they could use HBM as a high-speed cache? Maybe that is what is missing.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi

  15. #15
    Xtreme Enthusiast
    Join Date
    Oct 2012
    Posts
    687
    Quote Originally Posted by zanzabar View Post
    Didn't they talk a couple of years ago about how they could use HBM as a high-speed cache? Maybe that is what is missing.

    Oh yeah, HBM2 has been missing for some time now; Vega was supposed to launch last year.
    As for cache, I don't know. I know that they were planning to use HBM on APUs, though.
    I have a theory about clock speeds and a bug. A few months back, and also almost a year ago, there were reports that they couldn't get the clock speeds where they wanted them, and also of a bug that drops performance big time in some scenarios. Maybe it was about uncore/Infinity Fabric clock speeds and general IMC problems, which they "kinda" sorted for launch, but not entirely. This whole IMC and memory support thing is pretty weird, like they fixed it JUST enough for launch but not fully. I also read that there is variance between chips: not all of them can do more than 2666, and it's not really memory related, but CPU related.
    It somewhat resembles the Phenom I launch: a bug and low clocks, and later on there was another revision that was much better.
    But anyhow, for most people this Achilles' heel won't make any difference, so it's a bit overblown by my standards.
    Intel 5960X@4.2Ghz[Prime stable]@4.5 [XTU stable] 1.24v NB@3.6ghz Asrock X99 Extreme 3 4x8GB Corsair Vengeance@3200 16-17-17
    Sapphire nitro+ VEGA 56 Samsung SSD 850 256GB Crucial MX100 512GB HDD:WD10TB WD:8TB Seagate8TB
