PDA

View Full Version : Who wants to work on fast prime number algorithms?



eshbach
12-08-2005, 05:17 PM
Just for kicks, a friend and I are working on making a really fast prime number generator. Our current algorithm is faster than teh sieve of erastonaes, but slower than the sieve of atkin... we want to stay away from sieve of atkin because it takes sooo much memory for large primes (say, the 10 trillionth prime).... much more memory than we have in our pc's.

So anyone know of any good ones or want to help us modify and optimize ours?

here's the current code for our algorithm, we can do 1 million primes in less than 30 seconds on an A64 3000:


int MakePrimes(int limit)
{
int i=0,j=2,k=0,p=5;
primes[0]=2,primes[1]=3,primes[2]=5;
start=time(0);
int maybeprime = primes[j]+2;
while (limit > 0) {
bool done = false;
int tempindex = 0;
while (!done) {
double root = sqrt(maybeprime + 0.0) + 1;
int currentprime = primes[tempindex];
if (currentprime > root) {
primes[++j] = maybeprime;
tempindex = 0;
done = true;
maybeprime = maybeprime + 2;
}
if (maybeprime % currentprime == 0) {
maybeprime = maybeprime + 2;
tempindex = -1;
}
tempindex++;
}
limit--;
}
end=time(0);
return primes[j];
}

Calderbrae
12-11-2005, 12:45 AM
Is that java code there sir??

Vapor
12-11-2005, 07:10 AM
Looks like C++ to me. And that code looks really good to me, but I'm not much of a coder, so if you think there's a way to improve it, there might be :D

Although the compiler might take care of it and optimize it in this sense....being a little more verbose might actually be faster.

snq
12-11-2005, 11:03 AM
There's a small error in your code ;)
For example MakePrimes(5) returns 19 while the fifth prime actually is 11..

I'm trying to optimize it a bit, I'll post my code as soon as I get some improvement :)

snq
12-11-2005, 11:29 AM
This is twice as fast :banana:

Edit: btw both this code and your original code overflow when looking 1M primes.. Can be solved by using 64 bit ints (long int as i use for the square numbers) but it'll be about twice as slow.


int MakePrimes3(int limit)
{
primes[0]=2, primes[1]=3, primes[2]=5;
DWORD start = GetTickCount();

int maybeprime = primes[2] + 2;
int *primePtr = primes;
for(int j=3; j<limit; j++)
{
for(;;)
{
int currentprime = *(primePtr++);
long int currentprime2 = currentprime * currentprime;

if(currentprime2 > maybeprime)
{
primes[j] = maybeprime;
primePtr = primes + 1;
maybeprime += 2;
break;
}

if(maybeprime % currentprime == 0)
{
maybeprime += 2;
primePtr = primes + 1;
}
}
}

DWORD end = GetTickCount();
printf("%d ms\n", end-start);

return primes[limit-1];
}

snq
12-11-2005, 01:07 PM
Here's an asm version of my code.. It's only about 4-5% faster than the C++ code tho, VC8 seems to do a good job optimizing :)


int MakePrimes4(int limit)
{
primes[0]=2, primes[1]=3, primes[2]=5;
DWORD start = GetTickCount();

int *primePtr = (int*)primes;
__asm
{
mov ecx, 3
mov ebx, 7
mov esi, primePtr
lea edi, [esi+4*ecx]

$bigloop:

push ecx

align 16
$loop:
mov eax, [esi]
add esi, 4
mov ecx, eax

imul eax

// test high dword
test edx, edx
jnz $isprime
// compare low dword
cmp eax, ebx
jg $isprime

//$noprime:
xor edx, edx
mov eax, ebx
div ecx

test edx, edx
jnz short $loop

add ebx, 2
mov esi, primePtr
add esi, 4

jmp short $loop

$isprime:
pop ecx

mov esi, primePtr
add esi, 4

mov [edi], ebx
add edi, 4

add ebx, 2

inc ecx
cmp ecx, limit
jl $bigloop
}

DWORD end = GetTickCount();
printf("%d ms\n", end-start);

return primes[limit-1];
}

eshbach
12-11-2005, 02:28 PM
cool SNQ, your code is really about twice as fast :)

unfortunantly i need to get somewhere on the order of 1000x faster to do any really big primes :D

I think I'm going to have to come up with a completely different algorithm to accomplish that. you've optimized my original algorithm quite a bit already and i don't see much more room for improvement there.

we're going to really start working on this when my friend is done with finals (friday i think)... i'll post here again when we have a new algorithm and you're more than welcome to help us keep optimizing :banana:

faster primes!

snq
12-11-2005, 09:51 PM
Sure :)

Btw if you like mathematical problems and programming I can recommend this site, it's great fun :)
http://mathschallenge.net/

Gimmpy224
12-12-2005, 06:49 AM
C++ and JAVA = damn near the same thing, its probably C++ since its a program that wont be running online ;)

Maybe we could use threads? would that allow us to get more primes going?

eshbach
12-12-2005, 07:06 PM
C++ and JAVA = damn near the same thing, its probably C++ since its a program that wont be running online ;)

Maybe we could use threads? would that allow us to get more primes going?

C++ and java have syntactical similarities, but they're really very different when you look past the syntax... but that's a topic for another thread.

..speaking of threads, using more would get more primes, but the threads would have to communicate their latest results to the other threads in order to keep going...

and i don't have any machines with more than 2 cores, so it really wouldn't help that much.

Axylone
12-12-2005, 08:00 PM
good recommendation snq, now I'm going to be occupied for a week. Pff forget homework

Gimmpy224
12-12-2005, 10:00 PM
no doubt they are very different past the syntax :D I was just refering to the syntax though.

snq
12-13-2005, 09:36 PM
Here's a multithreaded version :D
Pretty much twice as fast on my X2 :woot:

For half a million primes:
Asm version: 3329 ms
Multithreaded: 1703 ms
Now I can finally go get some rest :D


const int numthreads = 2;
const int partsize = 10000;

int limit = 0;
int *primes;

int currentIndex = 0;
int currentMax = 0;


bool IsPrime(int x)
{
//if(x<=2) return (x==2);
//if((x&1)==0) return false;

for(int i=1;; i++)
{
int p = primes[i];
long int p2 = p*p;
if(p2>x) return true;
if((x%p)==0) return false;
}
}


void __cdecl PrimeThread(void *_threadNum)
{
static volatile int whoseWriteTurn = 0;
int myPrimes[partsize];
int threadNum = (int)_threadNum;

while(currentIndex<limit)
{
int start = currentMax;
currentMax += partsize;
int end = currentMax;

int myIndex = 0;
for(int candidate=start; candidate<end; candidate+=2)
{
if(IsPrime(candidate))
myPrimes[myIndex++] = candidate;
}

while((whoseWriteTurn%numthreads)!=threadNum) ;
int ci = currentIndex;
if(currentIndex+myIndex>limit)
myIndex = limit-currentIndex;
currentIndex += myIndex;
memcpy(primes+ci, myPrimes, myIndex*sizeof(int));
whoseWriteTurn++;
}
}


int MakePrimes5(int l)
{
limit = l;
primes = new int[limit];
memset((void*)primes, 0, sizeof(primes));

int start = clock();

primes[0]=2, primes[1]=3, primes[2]=5;

// Calc first part so the threads have something to work with
currentIndex = 3;
for(int i=7; i<partsize; i+=2)
{
if(IsPrime(i))
primes[currentIndex++] = i;
currentMax = i + 2;
}

// Create threads
for(int i=0; i<numthreads; i++)
_beginthread(PrimeThread, 0, (void*)i);

// Wait for :banana::banana::banana::banana: to finish
while(!primes[limit-1])
Sleep(1);

int end = clock();
printf("%d ms\n", 1000*(end-start)/CLOCKS_PER_SEC);

return primes[limit-1];
}

snq
12-14-2005, 12:16 PM
Optimized it a bit by hardcoding the first 100 primes (decreases memory access quite a bit).. Time for 500k primes is down to <1500 msec now :D

Also it detects the # of CPUs/cores you have now and adjusts the # of threads.. It becomes slow if you try running more threads than you have cores :) I wonder if would actually be 8 times faster on a machine with 8 cores :)

partSize cannot be too small because it will give slightly incorrect results.. Not sure why, but it does. Probably something with multiple threads processing the same batch.


const int partSize = 10000;
int numThreads = 1; // This is adjusted later on
int limit = 0;
int *primes = NULL;

volatile int currentIndex = 0;
volatile int currentMax = 0;

bool IsPrime(int x)
{
if(x>100)
{
if((x%3)==0) return false;
if((x%5)==0) return false;
if((x%7)==0) return false;
if((x%11)==0) return false;
if((x%13)==0) return false;
if((x%17)==0) return false;
if((x%19)==0) return false;
if((x%23)==0) return false;
if((x%29)==0) return false;
if((x%31)==0) return false;
if((x%37)==0) return false;
if((x%41)==0) return false;
if((x%43)==0) return false;
if((x%47)==0) return false;
if((x%53)==0) return false;
if((x%59)==0) return false;
if((x%61)==0) return false;
if((x%67)==0) return false;
if((x%71)==0) return false;
if((x%73)==0) return false;
if((x%79)==0) return false;
if((x%83)==0) return false;
if((x%89)==0) return false;
if((x%97)==0) return false;

for(int i=24;; i++)
{
int p = primes[i];
long int p2 = p*p;
if(p2>x) return true;
if((x%p)==0) return false;
}
}
else
{
if(x<=2) return (x==2);
if((x&1)==0) return false;

for(int i=1;; i++)
{
int p = primes[i];
long int p2 = p*p;
if(p2>x) return true;
if((x%p)==0) return false;
}
}
}


void __cdecl PrimeThread(void *_threadNum)
{
static volatile int whoseWriteTurn = 0;
int myPrimes[partSize];
int threadNum = (int)_threadNum;

while(currentIndex<limit)
{
int end = (currentMax+=partSize);
int start = end - partSize;

int myIndex = 0;
for(int candidate=start; candidate<end && currentIndex+myIndex<limit; candidate+=2)
{
if(IsPrime(candidate))
myPrimes[myIndex++] = candidate;
}

// Wait until its our turn to write
// Theoretically we shouldn't have to wait at all
while(whoseWriteTurn!=threadNum) ;
memcpy(primes+currentIndex, myPrimes, myIndex*sizeof(int));
currentIndex += myIndex;
whoseWriteTurn = (whoseWriteTurn+1)%numThreads;
}
}


int MakePrimes5(int l)
{
// Find number of CPUs/cores
SYSTEM_INFO si;
GetSystemInfo(&si);
numThreads = si.dwNumberOfProcessors;

limit = l;
primes = new int[limit];
memset(primes, 0, limit*sizeof(int));

int start = clock();

primes[0]=2, primes[1]=3;

// Calc first part so the threads have something to work with
currentIndex = 2;
for(int i=primes[currentIndex-1]+2; i<partSize; i+=2)
{
if(IsPrime(i))
primes[currentIndex++] = i;
currentMax = i + 2;
}

// Create threads
for(int i=0; i<numThreads; i++)
_beginthread(PrimeThread, 0, (void*)i);

// Wait for :banana::banana::banana::banana: to finish
while(!primes[limit-1])
Sleep(1);

int end = clock();
printf("%d ms\n", 1000*(end-start)/CLOCKS_PER_SEC);

return primes[limit-1];
}

eshbach
12-14-2005, 02:19 PM
snq, I really like the multithreaded approach, but I am getting access violations when using a limit bigger than 10,000.

Do i need to use a special compiler option?

process.h is the header you're using for threading, right?

snq
12-14-2005, 02:48 PM
Hmm, that's odd..
What compiler are you using? I'm using VC 8.0 (aka .NET 2005)

I changed quite some options..
Detect 64-bit portability issues: No (gets rid of annoying warning ;))
Favor size or speed: Favor fast code
Omit frame pointers: Yes
Enable enhanced instruction set: SSE2
Runtime library: Multi-threaded
Buffer security check: No
Other than those everything is at the default setting.

If that doesn't help, try running it in debug mode and see where exactly it crashes.

The headers I'm using:

#include <windows.h>
#include <math.h>
#include <stdio.h>
#include <process.h>
#include <time.h>

eshbach
12-14-2005, 03:25 PM
i'm using .net 2005 as well.

edit: compiler options worked, all is well now.

Gimmpy224
12-15-2005, 07:53 AM
so did my thread idea help at all? if so i feel important now lol.

finaly me studying programming helped a great cause *tear*

i960
12-17-2005, 06:42 PM
snq,

Does windows have an equivalent of pthread_join() ?
That sleep(1) in the main thread is kinda clumbsy, but I can't see it taking away that many cycles. pthread_join() was designed for that type of thing, however.

snq
12-17-2005, 10:09 PM
WaitForSingleObject is supposed to do the same as pthread_join, but I haven't tested that. Take a look at the last example at:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt__beginthread.2c_._beginthreadex.asp

You can wait for a thread to finish like this:

HANDLE hThread = (HANDLE)_beginthread(....);
WaitForSingleObject(hTread, INFINITE);

One problem is that we don't know which one of the threads will calculate the final value (we can predict that tho) so we might end up waiting for the wrong thread and resuming too early or too late. I doubt it'll be any faster, but I agree it would be a nicer solution :)

Sleep(1) in reality sleeps more than 1 msec tho, it really doesn't take any cpu time at all. I'm pretty sure that WaitForSingleObject does the same thing, but instead of checking if the value has been calculated yet it'll check validity of the given handle.. Which would probably be more time consuming :D

i960
12-17-2005, 11:04 PM
sleep(1) historically means 1 second anyways. But the typical way to wait for multiple threads is to duplicate the for loop used to create them and instead of a (I'm going to use POSIX terms here) pthread_create(), one uses a pthread_join() instead. main() (or the calling thread to pthread_join()) will just join the next thread (assuming one uses an array of pthread_t 's with an interrated loop variable) as each pthread_join() returns. I don't speak windows so :)

snq
12-17-2005, 11:25 PM
sleep() with lower case s doesn't exist on Windows :)
Sleep() with upper case S does and it wants milliseconds instead of seconds.. When I'm porting code I usually have a #define Sleep(x) usleep(1000*(x)) somewhere because I'm so used to Windows.. There's no usleep in Windows either I think, and besides I usually stay away from the standard libs as much as possible unless the app might need to be ported some day, libc makes things bloated and I like size optimizing ;)

Anyway, I just found out there's also WaitForMultipleObjects(), that could be used to wait for all threads to finish. Still I think it'd be slower but I'm too tired/lazy to test right now :)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjects.asp

i960
12-18-2005, 12:14 AM
libc is typically a shared library. It will definitely not result in any size bloat. In unix, almost all libs, even static ones, use some form of memory sharing to prevent redundant use of memory. If the library is shared, and another executable is using the same shared library, it will reference the in-memory copy - no additional memory overhead. If the library is linked statically, the executable will still share common elements with other executables loaded into memory with the same statically linked libraries. When linking, statically, the linker will not link in any more function or library code than what is absolutely necessary for the executable to function. If linked shared, it's moot, and the savings is inherit. Basically, there's no size bloat to libc unless one linked statically (pointless for libc) and basically used every function within it (highly unlikely).

If you want size opt, if there's anything you should be staying away from - it's Microsoft Windows. :) gcc -Os will beat MS on size optimization anyways - not to mention the primitive shared library methodology that Windows uses (dlls, etc). On MS platforms, I don't even use MS compilers. I use gcc + cygwin/mingw and write everything POSIX. Builds are portable across unix and windows unless I'm using something windows specific.

WaitForMultipleObjects(), if done right, won't be slower - because it will use signals to be notified of thread termination. This will definitely take less cycles than sleeping for 1 ms.

snq
12-18-2005, 12:41 AM
Check out the size of the exes in my "useful tools" thread ;) All tools were written in C++, with perhaps a few lines of inline asm here and there, but not much. So I doubt very much that size optimizing is that impossible on Windows :) I'm not saying it's better than gcc, but I doubt gcc would do a much better job.

Anyway about libc.. On windows you have 2 alternatives, either use a statically linked version which makes the app big (in my eyes anyway). Or use the DLL version (msvcrt.dll) which actually results in very small exes. However there's a small problem with msvcrt, MS has decided to make new versions of it that do not come with windows by default. So in order for your app to work on someone elses PC they'd probably need to have some kind of update installed, or you'd have to supply the DLL with the app, which doesnt make sense at all imo.

Anything can be done on windows without ever having to use libc, everything one could need comes in some DLL :) I personally prefer developing for Windows over developing for other OSs any day.

The thing I truly hate about linux is the incompatibility between different versions of libraries. Say I link an app on OS A with libc version A. That same binary will (usually?) not work on a machine with the same OS A but with libc version B, unless you link it statically.. Which in its turn usually results in a 500k+ binary. That's really worthless imo. Come to think of it, that's probably the reason why opensource is so popular nowadays, otherwise everyone would have to link their apps statically :D (j/k)
And instead of a nice descriptive error message such as "X.dll could not be found on your system" all you get to see is "segmentation fault". At least that's what has happened to me several times :)

Sorry ;) Not trying to turn this into an OS battle or anything, but all these frustrations I have had in the past just came up :)

i960
12-18-2005, 12:51 AM
Usually when there is a library run-time issue, the run-time loader will indicate a missing symbol, incompatible symbol, etc. Anyways, for unix, most people do not distribute binaries. They distribute a source package and assume the user is smart enough to do a "./configure; make". The advantage here is that portable code will compile everywhere. The disadvantage is that some people do not want to share their source code - which means it won't work out that type of system. Alternatives are for the developer to cross-compile executables for different unix platforms. Typically this is done merely to be convenient for the user, not to hide source code. Sometimes it's done for the latter. In almost all cases it's more of a pain in the ass to cross-compile than it is to distribute portable code with a standard configure script.

It's pretty strange that Microsoft stopped making msvcrt.dll available by default. Might have something to do with them actually wanting developers to write code which is NOT quickly portable across other OSes - hence keeping people on Windows.

i960
12-18-2005, 12:58 AM
Actually from what I can tell it seems that msvcrt.dll used to NOT be available, but theyre distributing it with most modern Windows versions now (atleast since 2k). So I guess it's the opposite.

snq
12-18-2005, 01:24 AM
They didn't stop including msvcrt.dll in windows, but when I compile something with VC 8.0 it needs msvcr80.dll which isn't available.. I didn't even have it in my system32 dir after installing VS 2005.. Same goes for anything newer than VC 6.0 afaik. It's a shame really because it was a lot easier to create small apps with that that ran pretty much everywhere with VC 6.0, but that's old and the newer IDEs are so much nicer to work with.
Afaik msvcrt.dll is comes with windows 98 and newer (may require that IE is installed on 98).

Don't get me started on the ./configure madness :D For work I had to compile PHP with certain modules added on a 64 bit version of redhat. Nothing I hadn't done before. No strange modules either, only stuff that comes with the PHP source distribution. After a whole day of updating libs and such to try to get it to compile neither me nor my boss could get it working while configure didn't give any errors or warnings. We finally ended up reinstalling the server with a 32 bit OS because we really needed to get it up and running that day, and it compiled just fine straight out of the box :)

i960
12-18-2005, 01:45 AM
Welp, that's a 64-bit OS implementation issue, not a configure issue :P.

Besides, NON-portable code is where you start to see problems on 64-bit vs 32-bit because of assumptions being made. That's why it's best to start as portable as possible and not try to be slick and cut corners in places (likely what some of the code in the php build was doing).

i960
12-18-2005, 01:47 AM
What you need, snq, is a nice Solaris x86 box. You'll then a well-done OS without any memory management issues and running quite very lean and ready to take hard load.

snq
12-18-2005, 07:26 AM
Hehe, enough about this now ;)

After getting some sleep (actual sleep, not sleep() :D) I changed the prime code to use WaitForMultipleObjects(). As expected it didn't make it any faster but not slower either, for a 1M run times were exactly the same.

Here's the changed code:

// Create threads
HANDLE handles[32]; // max 32 threads
for(int i=0; i<numThreads; i++)
handles[i] = (HANDLE)_beginthread(PrimeThread, 0, (void*)i);

// Wait for :banana::banana::banana::banana: to finish
//while(!primes[limit-1])
// Sleep(1);
WaitForMultipleObjects(numThreads, handles, true, INFINITE);