The Dhrystone and Whetstone scores show something close to the maximum possible performance, using simple calculations that fit entirely into cache (i.e., the instructions fit in L1, so nothing has to be fetched from slower levels of memory). Since the work units are made up of much more complex code that doesn't fit entirely into L1 cache, architecture differences come into play far more. Actual performance (as measured in PPD) will vary with the particular work units being processed. Each work unit type runs different code, and some may be better optimized than others (generating more PPD). Linux runs most, if not all, work units faster than Windows, typically by 10-20% from what I can tell. That speedup isn't reflected in the PPD output, and I have no idea why one would net less PPD on Linux despite the increased performance.
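To make the cache point above a bit more concrete, here is a rough sketch (not how Dhrystone, Whetstone, or the folding cores actually work) that times the same dependent lookup loop over working sets of different sizes. The names `make_cycle` and `ns_per_step` and the slot counts are made up for illustration. In CPython the interpreter overhead hides much of the difference, so a compiled language would likely show it more clearly. The general idea is that the small working set stays in cache while the large one does not.

```python
import random
import time


def make_cycle(n, seed=0):
    """Build a random single-cycle permutation of n slots.

    Following nxt[i] repeatedly visits every slot once before returning
    to the start, so each lookup depends on the previous one.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]
    return nxt


def ns_per_step(n, steps=200_000):
    """Walk the cycle for `steps` lookups and return average ns per lookup."""
    nxt = make_cycle(n)
    i = 0
    start = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]
    elapsed = time.perf_counter() - start
    return elapsed / steps * 1e9


if __name__ == "__main__":
    # Slot counts are arbitrary: one small enough to stay cached,
    # one in between, one far larger than typical CPU caches.
    for n in (1_000, 100_000, 2_000_000):
        print(f"{n:>9} slots: {ns_per_step(n):6.1f} ns/step")
```

The code executed per step is identical in every run; only the amount of memory touched changes. That is roughly the difference between a tiny synthetic benchmark and a real work unit.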
I'm sure others here can explain/fill in details far better than I.