The CDT and copywaza lab

**Supertim0r** · 11-26-2007, 07:39 PM

hey KTE

are you using opb tweaks ? (those 4 files)

**KTE** · 11-26-2007, 08:57 PM

Supertim0r: nope, not at all.

I haven't hidden anything from my runs which someone else needs to replicate or understand them. What I do before a run is covered. Pretty much nothing but the tweak being tested.

**Supertim0r** · 11-26-2007, 09:28 PM

give them a try, it's worth it

**dinos22** · 11-26-2007, 10:10 PM

Originally Posted by massman

Oh, I forgot to post this here:

http://youtube.com/watch?v=3oFmSuswKDk

something odd occurred around 7:30

have a look at his Physical Memory in that window (supposedly after applying maxmem 600 before reboot)

either he mixed in the wrong video, or there is something odd going on with two different XP setups and he went into the other XP setup to do the file compression

strange lol

**T_M** · 11-26-2007, 10:10 PM

Originally Posted by KTE

T_M: Something I've explained many times in this thread. 384-384 or 512-512 makes no difference in 4M-32M from my testing posted early so I choose 384-384 because it's better at 1M for me and so I don't have to reboot to change it time and time again.

But if we are trying to test CDT (the point of the thread) then we should at least be following what we are told is the procedure.
Notice that 512 is a close number to what cache ends up?

**mrlobber** · 11-27-2007, 12:20 AM

Eldonko, I actually think that if Kevin wants the CDT secret to remain unsolved, then such drops of information from others are the best way to do it - since now, for example, the latest pictures massman posted show a completely different CDT than what has been shown before

T_M, actually, cache never ends up at 512 MB. Even on a completely stock XP install, PF usage is usually around 70 MB, so when doing CDT or copywaza properly, cache would end up at around 530 MB (maxmem=600 minus 70). Available memory is a different story: here you can get any figure in the range from 500 to 550 MB, depending on what kind of copywaza or CDT you are doing.
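To make that arithmetic explicit, here is a minimal Python sketch. The function name is made up for illustration, and the 70 MB PF usage is only the rough stock-XP figure quoted above, not a measured constant.

```python
def expected_cache_mb(maxmem_mb, pf_usage_mb=70):
    """Rough system cache ceiling: the memory left after PF usage.

    pf_usage_mb defaults to the ~70 MB estimate for a stock XP install;
    this is an assumption, so read the real figure from Task Manager.
    """
    return maxmem_mb - pf_usage_mb


if __name__ == "__main__":
    # maxmem=600 minus ~70 MB of PF usage
    print(expected_cache_mb(600))
```

With maxmem=600 this gives 530, which is why a cache reading of exactly 512 would be a coincidence rather than something tied to a 512-512 pagefile.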

Supertimor, you say those OPB files are worth it. Why not back up your argument with some testing?

First of all, that OPB cleaner is indeed a nice garbage cleaning tool; however, show me it gaining more than half a second in 32M (hint: fresh XP install, run 32M, then reboot, run OPBcleaner, run 32M, compare).
Second, opbspi.bat is a batch file which stops several services - nothing you can't do through services.msc. Moreover, with my own set of services disabled, I get better times than when relying on opbspi.bat only, and it seems that opbspi.bat is intended for Win2003 Server or a very special XP install, because a couple of the services listed in the batch file couldn't be found on default XP at all.
opbtweak.reg is much more interesting stuff - but do you know why? Because I have seen at least 3 different variants of this reg file floating around, including one Kevin sent to me by PM himself. Now which one is the right one? Anyway, this reg file contains the known registry tweaks, such as LSC, DisablePagingExecutive etc., and in my testing I've not seen any notable difference in 32M times from using only one of the reg variants, or all three of them... or even from writing the known tweaks manually myself.

And now to the last part: my last late night thoughts on the information massman posted from coolaler.

I say: this isn't the full CDT tweak again, it just can't be. People are saying "understand what this is all about". However, understanding by itself cannot make the system run faster unless that understanding involves some more steps of the tweak we still don't know... or doing the steps we know in a different way. Why am I saying it's not the full tweak? Look again at the pictures massman posted yesterday.

Let's focus on the 2nd set first. This is basically 1) cut a 1.85 GB file from D:\ to C:\ and 2) copy it back 3 times. Sounds like a copywaza, right? Tell me one reason why it doesn't. You say that's because "D: is balanced"? But OPB himself has said in the 1st set of pictures that all the balancing is a pre-tweak job which doesn't need to be repeated when doing the CDT later.

Which brings me to the question about the 1st set: if we do the copying of the small-cdt files to "simulate the D balancing", rar them together... and then reboot, how could Windows know later that we did the copying and balancing in that tricky way instead of just copying the file 2 times onto itself, or in any of 100 other possible ways? After all, the structure and principles of files being written to an empty HDD should remain the same, shouldn't they - especially if the HDD was completely empty before? (Correct me if my technical knowledge isn't enough to draw this conclusion.)

And the next question: even if the "D: balance" is there, how and why does the mechanism of achieving the balance on C: (the whole system) work? Because obviously, even if you get available memory balanced with system cache (by the numbers in Task Manager - this is achievable by copying CDT-style), this doesn't mean a successful CDT tweak at all. So what does? Is the one and only indicator a superb 32M time? Or an even better 1M time?

And what about the difference between the CDT descriptions? (Remember: D:\ -> C:\ x3, D:\ -> D:\cdt x3, C:\ -> D:\ x3 versus now D:\ -> C:\ and C:\ -> D:\ x3.) And what about what KTE has been doing (C:\ -> D:\ x3, C:\ -> C:\cdt x3, D:\ -> C:\ x3), since he has something working there? Is this the CDT tweak he has got to work, or has he accidentally stumbled on something very close to it? And why is this not repeatable on some other systems even if done 100% to the instructions? And not even on KTE's own system? Do our default XP installations have an unknown difference between them?

These are the questions burning for answers. I'm asking them to understand what's behind the words "system balancing". I don't want to blindly reproduce CDT by accident (and I haven't succeeded at that either)... even more so because what I have to face is that people actually have gotten the CDT tweak to work, at least partially. I see it confirmed with my own eyes, but not on my system, although I've tried every guide out here as well as some hints from some friendly OCX guys, and have been at it for a month... And again, other OCX guys (very good tweakers who haven't got CDT-IV/CDT-V working either) have told me their personal view is that you need some not-yet-revealed registry tweak (obviously not a part of opbtweak.reg) and/or good karma for CDT to work 100%

So the answers... they're out there, we just haven't grabbed them, and have to stick with our karma - those who have it good, have also a partially working CDT tweak, those who don't (like me), are left wondering... why?

One more thing: why did I say "partially working CDT tweak"? Because nobody has duplicated the performance we saw at 3.6 GHz - where the gain from CDT, roughly speaking, is at least 15 seconds, not 7 and not even 10 (compared to gandalfone's run, which has the closest settings to OPB's run I managed to find). And actually, we have only got a couple of runs with a semi-working CDT tweak - always taken from other forums, probably out of the context of the settings they were run at - and none of the respectable overclockers who have said they have CDT working (Masell, elmor, to name a few) have bothered to participate in this thread, at least with a comparison at 3600 MHz of their own... And Zeus, please, if you ran your 514x7 PL=6 4:5 5-5-5-5 12m58s result with CDT, could you compare it with copywaza at similar settings?

Thus, I have even more respect for the testing KTE has been doing despite the problems he's been facing - so far he has been the engine and fuel of this thread

And, of course, massman (I wonder where he digs it all out)

Nevertheless, despite starting to lose faith, I'll be back with more testing anyway

**KTE** · 11-27-2007, 01:38 AM

Originally Posted by Supertim0r

give them a try, it's worth it

OK mate. Just keep in mind that I know the code of the files already so I'm aware of what they do. Crap Cleaner works wonders too.

In fact, there's another piece of software called "cleaner", which totally erases the space marked as empty on the HDD. It seems to help benches a little and I would normally use it. I try to keep things simple so they are reproducible. I don't have hours to do this; I bet I have less than half the daily free time of most XS visitors, and I'm quite averse to staying indoors much.

Originally Posted by T_M

But if we are trying to test CDT (the point of the thread) then we should at least be following what we are told is the procedure.

I already have. I've been a volunteer tester since the beginning of this thread, after seeing how reluctant most others were to do it; I'm someone who normally doesn't even run Super Pi, ever, and I travelled 2x 1000 miles to build a system just for this testing. Don't you reckon I would've tried out what was said, word for word, many times until I become a fly?

Just because I didn't write out 512-512 doesn't mean I didn't test it on every testing to know there is no difference in 32M for me. It's not like I haven't tested 512-512 to not know. However, if it makes you happy that I only keep 512-512 from now on on every single test, then fair enough.

Notice that 512 is a close number to what cache ends up?

Yep, but it makes zero difference in my case since around 50x 32M tries now and the cache ends up >515 with me regardless of what pagefile I set, even if I set it to no pagefile. 1M is definitely slower with 512-512 PF for me, repeatedly.

I'm going to be changing to the Abit board in ~2 hours so if you want me to run any other tests on DS4, then please mention it now. Won't be building it back again.

1M slows down with this CDT method quite considerably (for me now) and it slows down with CW too (it slows with anything that increases cache, including extracting a big RAR file). Same CDT/CW that gave gains before. I had a feeling from the start that the slowdown is caused by one particular service being disabled. I haven't run many tests to confirm this yet, though I have experienced it before. Which service? Plug and Play, and yes, that's odd. Another one I already know about is the Themes service. If disabled on this build it will slow the time down very much, yet not on another XP install I have.

Join us in part two when we unravel the mysteries of Super Pi after it sleeps with CDT...

**dinos22** · 11-27-2007, 04:31 AM

it would be really nice to see someone show results similar to OPB particularly in 1M

**kiwi** · 11-27-2007, 05:45 AM

Here is the link but I don't understand their language, sorry

http://forum.coolaler.com/showthread.php?t=158520

**CapFTP** · 11-27-2007, 07:20 AM

Originally Posted by kiwi

Here is the link but I don't understand their language, sorry

http://forum.coolaler.com/showthread.php?t=158520

neither do I...but basically it seems to say the same things as always:
the .reg file, the net stop which kills the services

yes, there are a couple of processes that I don't recognize...I tested the same batch file and reg yesterday evening.

I'm testing on my system exactly as reported by massman (i had these pics by kind courtesy of OPB).
On my system i clearly have some problem getting it to run as it (probably) should....even with this method i always have available mem = system cache minus 25/30 MB and I don't succeed in balancing...

i'll go on testing again, hope I can understand the point.....

**KTE** · 11-27-2007, 10:02 AM

Originally Posted by dinos22

it would be really nice to see someone show results similar to OPB particularly in 1M

Before I could even begin to test that, I need high FSB which I can't get. I'm also not sure what other tweaks he tried because I've not spoken to him. Is that the only tweak which got him that very fast time? Everything could make a difference, and I know many of the "Super Pi" guys believe they should never give away to anyone else a secret which gives them an edge over another. If that's the case, then no one has any way to know everything OPB did, unfortunately, thus no replication.

Originally Posted by CapFTP

On my system i clearly have some problem getting it to run as it (probably) should....even with this method i always have available mem = system cache minus 25/30 MB and I don't succeed in balancing...

Make sure to recheck the LSC value right before the run. When I was experiencing this, it was being reset to 0 automatically and causing low numbers.
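That recheck can be scripted. A minimal sketch in Python, assuming a Python interpreter is available on the bench OS (not something anyone in this thread has said they use); the function names are made up for illustration. `LargeSystemCache` lives under the standard Memory Management key, and `winreg` is the standard-library registry module on Windows.

```python
# Standard location of the LargeSystemCache (LSC) value under HKEY_LOCAL_MACHINE.
MEMORY_MANAGEMENT_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
)


def read_large_system_cache():
    """Read the current LargeSystemCache DWORD (Windows only)."""
    import winreg  # imported here so the module still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MEMORY_MANAGEMENT_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "LargeSystemCache")
    return value


def lsc_still_enabled(value):
    """True if LSC is still 1, i.e. it has not been reset to 0."""
    return value == 1


if __name__ == "__main__":
    current = read_large_system_cache()
    print("LargeSystemCache =", current)
    if not lsc_still_enabled(current):
        print("LSC was reset; set it back to 1 before the run")
```

Run it right before starting Super Pi; it only reads the value and changes nothing.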

Also try two different drives rather than two different partitions. It may help, it did with me.

I'm going to mess about more tonight to see what it can do.

**Supertim0r** · 11-27-2007, 11:28 AM

I already posted this somewhere (don't remember where)

***please keep in mind i'm really BAD at tweaking for spi. I don't know anything about CW or CDT (and I don't want to )***

this was a brief test with the 4 files/tweaks shown in my last post.

stock, fresh install daily os = 13m21.547s

stock, fresh install daily os BUT with ONLY those 4 tweaks applied = 13m04.562s

**KTE** · 11-27-2007, 11:41 AM

So where can I get the 4 files from...

Need to test this.

**mrlobber** · 11-27-2007, 12:58 PM

Originally Posted by Supertim0r

this was a brief test with the 4 files/tweaks shown in my last post.

stock, fresh install daily os = 13m21.547s

stock, fresh install daily os BUT with ONLY those 4 tweaks applied = 13m04.562s

The gain you're seeing is mostly from the LargeSystemCache=1 tweak, which is disabled by default in WinXP

**CapFTP** · 11-27-2007, 04:49 PM

Originally Posted by mrlobber

The gain you're seeing is mostly from the LargeSystemCache=1 tweak, which is disabled by default in WinXP

yep....seems so..

KTE, really LSC was auto set to 0 ??...strange..i'll check.

by different drives do you mean....copying between 2 HDDs, or changing the HDD (e.g. to a faster one) and building the same system all over again?

I'm planning to do the second...i want to settle this doubt

**kiwi** · 11-27-2007, 05:00 PM

Originally Posted by KTE

So where can I get the 4 files from...

Need to test this.

Check my link, it has 2 files

This is cleaner:
http://www.pctools.com/forum/archive...p/t-45783.html

The 4th file, the throttling one, I assume is the one that AMD users could use when the X2 came out. Check the AMD section, this throttling reg tweak must be in there somewhere

**dinos22** · 11-27-2007, 06:44 PM

Originally Posted by KTE

Before I could even begin to test that, I need high FSB which I can't get. I'm also not sure what other tweaks he tried because I've not spoken to him. Is that the only tweak which got him that very fast time? Everything could make a difference, and I know many of the "Super Pi" guys believe they should never give away to anyone else a secret which gives them an edge over another. If that's the case, then no one has any way to know everything OPB did, unfortunately, thus no replication.
Make sure to recheck the LSC value right before the run. When I was experiencing this, it was being reset to 0 automatically and causing low numbers.

Also try two different drives rather than two different partitions. It may help, it did with me.

I'm going to mess about more tonight to see what it can do.

superpi is not that mysterious and the best superpi benchers traditionally have always been Japanese (no offence intended to anyone else) and we all know how open Japanese benchers were, sharing absolutely everything, and people match/beat them in efficiency now.

OPB has put up a CDT thread with those benches as examples of how much it impacts performance hence why i believe that is what was used but no matter how much i try i just cannot seem to do better than norm (best tweaks possible aside from those runs)

we've all been hammering it hard and getting the times on the board with the 32M challenge thread for example and no bencher has even come remotely close to the 32M efficiency shown there....12m 39s 32M with cas5 and ~55x on RAM which is absolutely amazing to say the least.

i've done some testing and it isn't making much difference. I am not saying it isn't possible but since it was shared one would assume that it would be possible to replicate or beat those times if you have 600MHz CAS4 on RAM compared to those runs right......i have CPUs capable of doing 600MHz FSB as well no worries but those times i am just scratching my head and wondering what's up.....it's a monster tweak and so far no one has replicated it. relative difference means nothing to me. I am interested in seeing those times or better times with same or better system configuration. I know that you and a couple of others have done a great job testing so far but it is hard to conclude a tweak works when you are producing numbers i can get without any tweaks

i don't have exact same system but that should not matter....if it did the runs could be bugged as you should be able to replicate it on any half decent system and i've got one i think

**KTE** · 11-27-2007, 09:39 PM

OK thanks guys...

Yeah I understand what you're saying dinos22, I would feel the same way. These times are slow but on the setup I have I can't make them better (at those settings) no matter what tweak I try.

They can be faster but the timings/clocks/volts needed are one-offs and I can't test repeatedly for hours at them. OPB's time was at quite high timings, which even I can drop lower at the same frequency, with the CDT as I understand it, but it won't even come close to that time; it would end up where you expect it to: above the times of those with higher frequencies/lower subtimings. If 500FSB was doable these RAMs could very easily run 514x7 3600 4:5 642 4-4-4-4 tRFC 39 PL5 tWTR 10 tWTP 10 at 2.35V. More volts (2.60V) would get you tRFC 29 32M stable max but I'm not risking them, have already done it once.

I noticed a problem on this setup which gives slow times. I lean strongly towards Super Pi performance being completely NB strap, motherboard and chipset based first and OS based a very close second. When I boot up below 475 FSB and below DDR2-1000 I can only get PL7 minimum. When I boot up at ~DDR2-1080 to DDR2-1180 I can get PL6. When I boot up at DDR2-1200 to DDR2-1290 I can get PL5. I cannot get PL5/6 at low RAM speed because PL seems to change with the BIOS value of tRFC.

So if I set a tRFC above 42 in BIOS at bootup for DDR2-900, it should give PL5... but it's not that simple for some reason.

Also 2:3 ratio is very quick for me compared to the rest.

Look at this - all BIOS values;
400 FSB DDR2-1200 4-4-4-4 PL5 tRFC 45 (all else constant)
450 FSB DDR2-1126 4-4-4-4 PL6 tRFC 40 (all else constant)
450 FSB DDR2-900 4-4-4-4 PL7 tRFC 25 (all else constant)

This sort of PL changing by BIOS values is definitely strap based but it also improves performance quite a bit.
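For anyone cross-checking the memory speeds quoted in this post, here is a small Python sketch of the FSB/divider arithmetic. It assumes the divider is written FSB:RAM (as in "4:5") and that the DDR2 rating is twice the memory clock. The dividers used for the three BIOS lines above are inferred from the numbers, not stated by KTE, and the function names are made up for illustration.

```python
from fractions import Fraction


def memory_clock_mhz(fsb_mhz, ratio):
    """Memory clock in MHz for an FSB:RAM divider written like '4:5'."""
    fsb_part, ram_part = (int(part) for part in ratio.split(":"))
    return float(fsb_mhz * Fraction(ram_part, fsb_part))


def ddr2_rating(fsb_mhz, ratio):
    """DDR2-xxxx rating: two transfers per memory clock."""
    return 2 * memory_clock_mhz(fsb_mhz, ratio)


def cpu_clock_mhz(fsb_mhz, multiplier):
    """Core clock in MHz from FSB and CPU multiplier."""
    return fsb_mhz * multiplier


if __name__ == "__main__":
    # The "514x7 3600 4:5 642" setting mentioned earlier in the post
    print(cpu_clock_mhz(514, 7), memory_clock_mhz(514, "4:5"))
    # The three BIOS lines above, with inferred dividers
    for fsb, ratio in [(400, "2:3"), (450, "4:5"), (450, "1:1")]:
        print(fsb, ratio, ddr2_rating(fsb, ratio))
```

Under those assumptions 450 FSB at 4:5 works out to DDR2-1125, which lines up with the DDR2-1126 the board reports, and 400 FSB at 2:3 gives DDR2-1200.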

Does tRFC make that much a difference?... we'll see.

Here, for your benefit. I did tests at lower RAM frequencies for effects of stock/small window/disabled theme/OPBCleaner/CW/CDT/tRFC/tRP/tWTP/tWTR.

Small window = when you make the SPi window small so only the bar remains while the calc runs.
Disabled Theme = when the Theme service is disabled (all else constant)

SETUP
Maxmem=520, LSC=1, that totally white theme, etc.
Pagefile 512-512 for all but 384-384 for the CW run.

3600 - 450 (1:1) - 4-4-4-4 PL7 tRFC 25 tWTP 10 tWTR 10

Stock: 13m 49.094s

Smallwindow: 13m 48.625s

OPBtweak(theme on)/smallwindow: 13m 48.812s

OPBtweak(theme off)/smallwindow: 13m 49.406s

OPBtweak(theme on)/smallwindow/tRFC 45: 13m 50.265s

OPBtweak(theme on)/smallwindow/4-5-5-5-5: 13m 57.625s

After that, I kept the theme running even with OPBtweak/cleaner because it slows everything down if off.

CW4GB/OPBtweak/smallwindow: 13m 43.625s

CDT/OPBtweak/smallwindow: 13m 41.437s

CDT/OPBCleaner/smallwindow/tRP2/tWTP9/tWTR9: 13m 38.578s

So you know what effect which values can have.
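To read the list above as gains and losses against the stock run, a short Python sketch like the following does the subtraction. The times are copied from the list; the labels are shortened and the function names are made up for illustration.

```python
import re

# 32M times from the list above, in the order they were posted.
RESULTS = [
    ("Stock", "13m 49.094s"),
    ("Smallwindow", "13m 48.625s"),
    ("OPBtweak (theme on) / smallwindow", "13m 48.812s"),
    ("OPBtweak (theme off) / smallwindow", "13m 49.406s"),
    ("OPBtweak (theme on) / smallwindow / tRFC 45", "13m 50.265s"),
    ("OPBtweak (theme on) / smallwindow / 4-5-5-5-5", "13m 57.625s"),
    ("CW4GB / OPBtweak / smallwindow", "13m 43.625s"),
    ("CDT / OPBtweak / smallwindow", "13m 41.437s"),
    ("CDT / OPBCleaner / smallwindow / tRP2 tWTP9 tWTR9", "13m 38.578s"),
]


def to_ms(text):
    """Convert a Super Pi time such as '13m 49.094s' to whole milliseconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)\.(\d{3})s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes, seconds, millis = (int(group) for group in match.groups())
    return (minutes * 60 + seconds) * 1000 + millis


def deltas_vs_first_ms(results):
    """Difference of each run from the first (stock) run, in milliseconds."""
    baseline = to_ms(results[0][1])
    return [(label, to_ms(time) - baseline) for label, time in results]


if __name__ == "__main__":
    for label, delta_ms in deltas_vs_first_ms(RESULTS):
        print(f"{label:52s} {delta_ms / 1000:+.3f} s")
```

Negative numbers are gains over stock. Working in whole milliseconds avoids floating-point noise in the third decimal.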

-Can a chipset be better on one board than another? Yup.
-Can each motherboard be a little more tweaked than another of the same make/model/rev? Of course, look at CPU oc, MB FSB oc and RAM oc to see this. ICs always have these differences.

I've seen people run my exact timings/clocks and get far faster. If I was a doubting Tom and couldn't understand what I posted above, I'd say "cheat! make video for me or I don't believe it's possible" -> because I'd be comparing everything to my board's performance; after all, we have similar CPU/OS/RAM.

To stop wasting time and pages over and over any time someone gets a quick time, not just now but for years to come, all we need is better verification and some authentic method of verifying the clocks/speeds of a run rather than purely trusting others' words. In a competition, this "online trust" is very easy to bypass.

**massman** · 11-28-2007, 01:42 AM

A few days ago, I ran some tests on the way the ram timings affect the SPI run (I tested the basic timings, not subtimings). My conclusion: no effect on how 32m is calculated.

Though, I spent several hours on this, so I'm just gonna post these :p

400MHz 3-3-3-9
400MHz 4-4-4-10
400MHz 5-5-5-15
500MHz 4-4-4-10
500MHz 5-5-5-15
600MHz 5-5-5-15

**mrlobber** · 11-28-2007, 02:09 AM

Originally Posted by massman

A few days ago, I ran some tests on the way the ram timings affect the SPI run (I tested the basic timings, not subtimings). My conclusion: no effect on how 32m is calculated.

Though, I spent several hours on this, so I'm just gonna post these :p

In the Excel graph, I wouldn't suggest smoothing out the lines... one could better spot the differences between individual loops that way

Btw, it seems my original hypothesis about timings impacting the Pi output curve doesn't hold ground anymore

Instead, I suspect I now know what might cause this, but without more testing on my own system I wouldn't like to comment on it yet

**Zeus** · 11-28-2007, 02:44 AM

For most this will be obvious but still for some it might be of some interest.

Here's a little math about memtimings:

To know which setting is fastest, the first thing we need is the cycle time, since most memory settings are just a number of clock cycles; for example, CAS3 means 3 cycles.

Now to get that cycle time we do a little math: t (cycle time) = 1/f (frequency).

So, if frequency varies, cycle time does as well.

Here goes:

400MHz (DDR800) 1/0.4GHz = 2.5ns (nanoseconds)
500MHz (DDR1000) 1/0.5GHz = 2ns
600MHz (DDR1200) 1/0.6GHz = 1.67ns

Now running DDR800 with, for example, cas4, we have 4x2.5ns = 10ns latency; running cas5 at that same speed gives 12.5ns latency, and so on. Needless to say, the more latency, the slower the run will be.

Now it becomes obvious that running 5-5-5 at 600MHz is almost as fast as 500MHz 4-4-4 and 500 5-5-5 is as fast as 400 4-4-4.
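The same math as a small Python sketch, for checking other clock/CAS combinations. The function names are made up for illustration, and it only models CAS latency in nanoseconds, nothing else about the memory subsystem.

```python
def cycle_time_ns(clock_mhz):
    """Cycle time t = 1/f, with f in MHz and t in nanoseconds."""
    return 1000.0 / clock_mhz


def cas_latency_ns(clock_mhz, cas_cycles):
    """CAS latency in nanoseconds: number of cycles times the cycle time."""
    return cas_cycles * cycle_time_ns(clock_mhz)


if __name__ == "__main__":
    for clock, cas in [(400, 4), (400, 5), (500, 4), (500, 5), (600, 5)]:
        print(f"{clock} MHz CAS{cas}: {cas_latency_ns(clock, cas):.2f} ns")
```

This gives 10 ns for both 400MHz cas4 and 500MHz cas5, and about 8.33 ns for 600MHz cas5 against 8 ns for 500MHz cas4, which is the "almost as fast" comparison above.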

Hope this helps a little.

**~~Gaayam~~** · 11-28-2007, 03:56 AM

well, he may have done some of that CDT in this run

http://valid.x86-secret.com/show_oc.php?id=274308

sorry for the OT guys

**mrlobber** · 11-28-2007, 04:11 AM

Originally Posted by Gaayam

sorry for the OT guys

Well, this is indeed offtopic to this thread and should deserve its own topic in Xtreme Overclocking section if you think so.

But may I ask you the name you're registered with at OCX?

I strongly suspect it is different there and (probably) much more widely known than the one you're posting with here?

**massman** · 11-28-2007, 07:02 AM

Originally Posted by mrlobber

Btw, it seems my original hypothesis about timings impacting the Pi output curve doesn't hold ground anymore

Instead, I suspect I now know what might cause this, but without more testing on my own system I wouldn't like to comment on it yet

Not all timings have an effect on that curve, that's for sure. If I can help, please PM.

@ ZEUS: Thanks for the explanation, although my tests were not intended to find the fastest solution. I wanted to see whether changing those timings had any effect on the curve.

**KTE** · 11-28-2007, 08:11 AM

Originally Posted by massman

A few days ago, I ran some tests on the way the ram timings affect the SPI run (I tested the basic timings, not subtimings). My conclusion: no effect on how 32m is calculated.

Thanks mate.

No effect at all? Are you talking about how the graph pattern looks or the times in seconds? Your subtimings were changing too so I can never be sure what is causing the lowering of times because those were the exact subtimings which were giving me lower/higher times.

Beware, SPi is strongly divider/strap based in my experience and can easily lead one to false conclusions (it did with me anyway). In 32M I can have one setting and run CAS4 4-4-4-4 and get time x, and then change it to CAS5 5-5-5-8 and still get x+1 second, depending on which strap I'm on. At a quicker strap this won't be the case. However, at faster straps the PL can't be set below 7, and the board is running quicker (more efficiently) than at higher speeds/lower PL/higher tRFC (same other settings). You'd have to boot at every strap and divider to see this, and it takes about a week... I know, I tried it.

I'm not using the DS4 now to compare.

The obvious point of all this is to compare with OPB's gain, and his timings were quite high, so we have to work out what SPi behavior, and why, could allow a 1 second drop in 1M at lower frequencies/higher timings compared with a run at higher frequencies/lower timings. That's looking very tricky and I'm stumped.

Originally Posted by Zeus

Now running DDR800 with for example cas4, we have 4x2.5ns=10 ns latency, running cas5 at that same speeds gives 12.5ns latency and so on, needless to say, the more latency the slower the run will be.

Now it becomes obvious that running 5-5-5 at 600MHz is almost as fast as 500MHz 4-4-4 and 500 5-5-5 is as fast as 400 4-4-4.

Thanks Zeus.

That's exactly what I look for to see if my times are faster or slower than they should be. On the DS4 that pattern was not being observed in 1M/2M unfortunately. 620 4-4-4-4 was just as fast as 450 4-4-4-4. 32M however does show the difference quite clearly.
