
Ryzen-Test & Stress-Run Cause Segmentation Faults On Zen CPUs [Update: AMD Confirmed]



StyM
08-04-2017, 12:33 PM
http://www.phoronix.com/scan.php?page=news_item&px=Ryzen-Test-Stress-Run



Besides using ryzen-test to hammer the CPU and easily reproduce the issue, I also decided to use phoronix-test-suite stress-run. The stress-run command within the Phoronix Test Suite has been used by enterprise customers for stress testing / burn-ins of hardware and for checking stability. Rather than benchmarking for performance, stress-run executes multiple test profiles in parallel to fully load the system with whatever workloads you would like. Running PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php build-apache pgbench apache redis will have the Phoronix Test Suite continually running four different benchmarks simultaneously for a period of 60 minutes. As soon as one test finishes, another is fired up. The stress-run algorithm randomly picks tests from your set to run, but it does look at the test profiles so that, if the tests stress multiple subsystems, all of those subsystems are always kept under load. The Phoronix Test Suite's stress-run functionality isn't advertised as much as its other features, but it is very useful for loading up a system with plenty of real-world workloads concurrently.
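To make the scheduling behaviour described above more concrete, here is a minimal Python sketch of that kind of loop. It is not the Phoronix Test Suite's actual code (PTS is written in PHP and reads real metadata from its test profiles); the subsystem tags, run times and the names TESTS, pick_next and stress_run are hypothetical, chosen only for illustration. It keeps a fixed number of distinct tests running, starts a replacement as soon as one finishes, stops launching new tests once the time budget is used up, and picks randomly among the idle tests whose subsystem currently has the fewest running tests.

```python
import heapq
import random
from collections import Counter

# HYPOTHETICAL subsystem tags and run times in minutes; the real
# test profiles carry their own metadata, which this sketch does not read.
TESTS = {
    "build-linux-kernel": ("cpu", 6),
    "build-php": ("cpu", 3),
    "build-apache": ("cpu", 2),
    "pgbench": ("disk", 5),
    "apache": ("network", 4),
    "redis": ("memory", 3),
}


def pick_next(running, rng):
    """Randomly pick an idle test, preferring the least-loaded subsystem."""
    load = Counter(TESTS[name][0] for name in running)
    idle = [name for name in TESTS if name not in running]
    least = min(load[TESTS[name][0]] for name in idle)
    candidates = [name for name in idle if load[TESTS[name][0]] == least]
    return rng.choice(candidates)


def stress_run(concurrent=4, total_minutes=60, seed=0):
    """Simulate the loop; return (start_minute, test_name) for each launch."""
    rng = random.Random(seed)
    running = []  # names of tests currently executing
    heap = []     # (finish_minute, name) for running tests
    log = []      # (start_minute, name) for every test launched

    def launch(t):
        name = pick_next(running, rng)
        running.append(name)
        heapq.heappush(heap, (t + TESTS[name][1], name))
        log.append((t, name))

    for _ in range(concurrent):
        launch(0)
    while heap:
        now, finished = heapq.heappop(heap)
        running.remove(finished)
        # Refill the freed slot only while the time budget remains.
        if now < total_minutes:
            launch(now)
    return log


if __name__ == "__main__":
    for start, name in stress_run()[:8]:
        print(f"t={start:>2} min  start {name}")
```

With four slots and the four hypothetical subsystems above, the first four picks always land on four different subsystems, which mirrors the "keep every subsystem stressed" idea rather than any guaranteed property of PTS itself.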

We'll see now whether AMD will provide public comments or investigate further, as they now have another reproducible test case that slams Ryzen chips hard within just a few minutes, even with SMT disabled and the memory running at DDR4-2133. Whether this affects only Ryzen or also Threadripper and Epyc remains unclear. While many Windows reviewers now have Threadripper, it doesn't look like AMD will be sending any Threadripper samples to Phoronix, at least in the immediate days ahead, but I have asked whether I can at least get SSH access to a TR system for a few hours to run some Linux benchmarks. We'll see. As for the Epyc server processors, no samples are available either, according to a motherboard vendor that has been trying to get them on my behalf.

Just to reiterate: while this problem is easy to trigger under very heavy workloads, I haven't run into any Ryzen problems under normal Linux desktop workloads or even normal benchmarking. I will be running some more Ryzen stress tests today.

Dr_Swizz
08-05-2017, 01:40 PM
Thank you for posting this.
I was just about to buy a 1300X, but since I use Linux for everything except benchmarking, I will read up on this issue further before I buy anything.

dave_graham
08-05-2017, 11:04 PM
I haven't had any issues under Linux with Epyc. Currently running Ubuntu 16.04.2 with the 4.10 kernel just fine.

drmrlordx
08-07-2017, 09:47 AM
https://www.reddit.com/r/Amd/comments/6runcc/reported_epyc_segfault_might_not_be_true/

Something to think about.

StyM
08-08-2017, 04:12 AM
AMD Confirms Ryzen Marginality Performance Issue Under Linux, TR and EPYC Clear (https://www.techpowerup.com/235923/amd-confirms-ryzen-marginality-performance-issue-under-linux-tr-and-epyc-clear)

vario
08-08-2017, 03:43 PM

Seems weird.
So AMD is saying Ryzen is affected, but only early CPUs, and not TR and EPYC, even though TR also uses the B1 stepping. So there had to be some changes to the silicon, either hardware-wise or programming-wise, even within the same revision.

AlleyViper
08-10-2017, 09:28 AM
Really weird. If it was deliberately corrected at some point and there won't be a BIOS fix for the affected CPUs, they could at least identify the affected batches for a recall. It won't be nice for an early Ryzen adopter to discover a defective CPU a couple of years later, when some other heavily threaded app starts to error out.