Calmatory
08-08-2010, 09:26 AM
I wrote a small benchmark few years ago which used to brute force method for finding prime numbers. Back then I knew very little about algorithm and code optimizing, so now with loads more experience under my belt I decided to give the algorithm a go, and see what I've learnt during the past 30 months or so. The unoptimized code resembles my early implementation quite well.
A glance towards prime number benchmarks. All written in C using gcc v. "4.5.0 20100610 (prerelease) (GCC)" and Intel Celeron M 420 x86 Processor running at 1.6 GHz, with 2 GB of RAM(533 MHz DDR2) and using 32-bit Arch Linux with linux kernel 2.6.34-ARCH.
All the runs are compiled with gcc using -Wall and particular -O<n> flags.
The run times are gathered by using the time tool found in practically all unix-like operating systems (time ./a.out).
Some source code can be found from the bottom.
Using basic brute force method of finding primes under 131072(or maybe better known as 128K, 2^17). There are 12251 such primes.
So let's see... The actual optimization can be seen [like this].
No optimizations:
-O0 92.317 s
-O1 93.001 s
-O2 93.894 s
-O3 94.237 s
-Os 91.467 s
Best: 91.467 s
Worst: 94.237 s
Difference between best and worst: 2.93 %
Only check non-even numbers:
[for(i=2;i<num;i+=2)]
-O0 46.160 s
-O1 46.724 s
-O2 46.404 s
-O3 46.990 s
-Os 45. 760 s
Best: 45.760 s
Worst: 46.990 s
Difference between best and worst: 2.61 %
Only check numbers against non-even numbers:
[for(j=3;j<i;j+=2)]
-O0 23.085 s
-O1 23.348 s
-O2 23.495 s
-O3 23.695 s
-Os 23.078 s
Best: 23.078 s
Worst: 23.695 s
Difference between best and worst: 2.60 %
Only check numbers up to the square root of the number:
[for(j=3;j*j<=i;j++]
-O0 0.270 s
-O1 0.263 s
-O2 0.260 s
-O3 0.260 s
-Os 0.253 s
Best: 0.270 s
Worst: 0.253 s
Difference between best and worst: 6.29 %
At this point I made a huge mistake and the rest of the measurements got messed up. What I did was not applying the "only check numbers against non-even numbers" optimization to the code(I had to do some debugging so I reverted it and so it was left out by accident).
There is still much more to optimize:
- Apply the "only check numbers against non-even numbers" optimization
- break out from the inner loop if the number is proven to be a non-prime
- Move the tmp=0 assignment to else-clause of the if(tmp==0)
- Store the i and j variables in the CPU registers for faster acces(register keyword)
With all of these applied I was able to squeeze the running time down to 0.030s. Quite a leap from the inital ~1.5 minutes. ;)
Here is some of the source code:
Basic brute force method with no optimizations:
for(i=2;i<max;i++) {
for(j=2;j<i;j++) {
if((i%j)==0) { tmp=1;}
}
if(tmp==0) { primes++; }
tmp=0;
}
Whole brute force function with all possible optimizations I can think of applied:
unsigned int brute_primes(unsigned int max) {
register unsigned int i=0;
register unsigned int j=0;
unsigned int tmp=0;
unsigned int primes=1; /* 2 is a special case so we start from 3 */
for(i=3;i<=max;i+=2) {
for(j=3;j*j<=i;j+=2) {
if((i%j)==0) { tmp=1; break;}
}
if(tmp==0) { primes++; } else tmp=0;
}
return primes;
}
Well, I admit that this post is a complete mess. My intention was to gather all the possible optimizations and apply them one-by-one to see the benefits. After I realized half of the measurements were wrong, I decided to post the results as they were, and exclude the bad measurements.
Maybe I'll make another post about another prime number algorithm - sieve of eratosthenes. The basic implementation can be made quite much faster too, mainly applying the same optimizations since they apply to that algorithm aswell.
A glance towards prime number benchmarks. All written in C using gcc v. "4.5.0 20100610 (prerelease) (GCC)" and Intel Celeron M 420 x86 Processor running at 1.6 GHz, with 2 GB of RAM(533 MHz DDR2) and using 32-bit Arch Linux with linux kernel 2.6.34-ARCH.
All the runs are compiled with gcc using -Wall and particular -O<n> flags.
The run times are gathered by using the time tool found in practically all unix-like operating systems (time ./a.out).
Some source code can be found from the bottom.
Using basic brute force method of finding primes under 131072(or maybe better known as 128K, 2^17). There are 12251 such primes.
So let's see... The actual optimization can be seen [like this].
No optimizations:
-O0 92.317 s
-O1 93.001 s
-O2 93.894 s
-O3 94.237 s
-Os 91.467 s
Best: 91.467 s
Worst: 94.237 s
Difference between best and worst: 2.93 %
Only check non-even numbers:
[for(i=2;i<num;i+=2)]
-O0 46.160 s
-O1 46.724 s
-O2 46.404 s
-O3 46.990 s
-Os 45. 760 s
Best: 45.760 s
Worst: 46.990 s
Difference between best and worst: 2.61 %
Only check numbers against non-even numbers:
[for(j=3;j<i;j+=2)]
-O0 23.085 s
-O1 23.348 s
-O2 23.495 s
-O3 23.695 s
-Os 23.078 s
Best: 23.078 s
Worst: 23.695 s
Difference between best and worst: 2.60 %
Only check numbers up to the square root of the number:
[for(j=3;j*j<=i;j++]
-O0 0.270 s
-O1 0.263 s
-O2 0.260 s
-O3 0.260 s
-Os 0.253 s
Best: 0.270 s
Worst: 0.253 s
Difference between best and worst: 6.29 %
At this point I made a huge mistake and the rest of the measurements got messed up. What I did was not applying the "only check numbers against non-even numbers" optimization to the code(I had to do some debugging so I reverted it and so it was left out by accident).
There is still much more to optimize:
- Apply the "only check numbers against non-even numbers" optimization
- break out from the inner loop if the number is proven to be a non-prime
- Move the tmp=0 assignment to else-clause of the if(tmp==0)
- Store the i and j variables in the CPU registers for faster acces(register keyword)
With all of these applied I was able to squeeze the running time down to 0.030s. Quite a leap from the inital ~1.5 minutes. ;)
Here is some of the source code:
Basic brute force method with no optimizations:
for(i=2;i<max;i++) {
for(j=2;j<i;j++) {
if((i%j)==0) { tmp=1;}
}
if(tmp==0) { primes++; }
tmp=0;
}
Whole brute force function with all possible optimizations I can think of applied:
unsigned int brute_primes(unsigned int max) {
register unsigned int i=0;
register unsigned int j=0;
unsigned int tmp=0;
unsigned int primes=1; /* 2 is a special case so we start from 3 */
for(i=3;i<=max;i+=2) {
for(j=3;j*j<=i;j+=2) {
if((i%j)==0) { tmp=1; break;}
}
if(tmp==0) { primes++; } else tmp=0;
}
return primes;
}
Well, I admit that this post is a complete mess. My intention was to gather all the possible optimizations and apply them one-by-one to see the benefits. After I realized half of the measurements were wrong, I decided to post the results as they were, and exclude the bad measurements.
Maybe I'll make another post about another prime number algorithm - sieve of eratosthenes. The basic implementation can be made quite much faster too, mainly applying the same optimizations since they apply to that algorithm aswell.