Big milestone:
Name: XtremeSystems
URLs crawled: 30,011,997,950
Data (MB)*: 766,566,997
Congratulations to everyone who helped us get here!! :):cheer::cheer::party::cheer::cheer:
Oh come on, I can't be the only one impressed :p:
It's alright I guess...
Yay! Missed this completely... WELL DONE TEAM :D :up:
There we go :D
We need to catch up with refic at least... he has reached the PB.
LOL! A little late to reply to that, but better late than never.
This hare isn't snoozing, it's crawling as fast as it can. I know it's not much, and my monthly output is only a fraction of what I used to output in an hour, but that's all I can do without my 100 Mbps line :(
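For scale, a quick back-of-the-envelope shows what a 100 Mbps line can move in theory (illustrative only; real crawl output is lower due to protocol overhead and politeness delays):
Code:
# Rough upper bound on data moved by a 100 Mbps line.
LINE_MBPS = 100                                # assumed line speed, megabits/s

bytes_per_hour = LINE_MBPS * 1e6 / 8 * 3600    # bits/s -> bytes/s, then x 3600 s
print(f"Per hour:  {bytes_per_hour / 1e9:.0f} GB")              # ~45 GB
print(f"Per month: {bytes_per_hour * 24 * 30 / 1e12:.1f} TB")   # ~32 TB
No wonder a monthly figure on a slower line looks like a fraction of an old hourly one.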
And if it regains its speed......http://www.karin-lisbeth.dk/images/e/elmer/001.jpg
:D
Either way, it's still good to have you back, Hixie :up:
I've been crawling for quite a while... just not at large figures, and I've been too busy to post here.
BTW Dave, why aren't you going to Vegas? I'll be there with a booth this year.
Finally conquered a quarter of the pie: http://i.imgur.com/jy5qi.gif Tasty URLs! :D
http://i.min.us/ibHDzQ.png
1.7 final node released:
You can increase max reserve buckets from 1 to 3 for more sustained crawling :up: (see the sketch after the changelog)
Quote:
v1.7.0 5/02/11
+ Support for new generation of central server in parallel with current
! Better handling of redirects
! Better control of domain counts during crawling
! Improved analysis of crawl errors
! Fixed rare issue with empty indexed data written incorrectly
! Mono - support for alternative spawning of archiver, .NET 2.0 build is now the only one available
! Mono builds - Less junk in log on communication errors
! Mono - better logic for handling multiple crawlers on same box
! Added MaxPriorityBuckets parameter to options
! Max reserved (pre-cache) buckets is forced to 1 in order to enable more efficient crawling on the whole distributed network
! New SQLite build used (once run, the database won't be backwards compatible with the 1.6.x series)
! Bundled 64-bit SQLite build with Mono distributions
! Mono builds now support https protocol crawling
! Reduced number of messages printed by default (can still be shown if Warnings mode is on)
! Put a limit on barrel archiving to avoid creating too many temporary files in odd data sets
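The reserve-bucket change is essentially prefetch buffering: the node caches extra work buckets locally so crawling doesn't stall while waiting on a central-server round-trip. A minimal sketch of the idea in Python (hypothetical names; the node itself is .NET/Mono, so this shows the concept, not its actual code):
Code:
import queue
import threading
import time

RESERVE_BUCKETS = 3   # 1.7 raises the cap from 1 to 3

reserve = queue.Queue(maxsize=RESERVE_BUCKETS)

def fetch_buckets():
    """Producer: keeps the local reserve topped up from the central server."""
    n = 0
    while True:
        n += 1
        # Stand-in for a server round-trip: usually fast, occasionally slow.
        time.sleep(1.0 if n % 5 == 0 else 0.1)
        reserve.put(f"bucket-{n}")   # blocks once the reserve is full

def crawl():
    """Consumer: cached buckets absorb the slow round-trips."""
    for _ in range(10):
        bucket = reserve.get()       # usually ready immediately
        print("crawling", bucket)
        time.sleep(0.3)              # stand-in for the actual crawl work

threading.Thread(target=fetch_buckets, daemon=True).start()
crawl()
With a reserve of one bucket, every slow fetch stalls the crawler; with three cached, short server delays are absorbed and the connection stays busy.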
I expect he is in the same boat as the rest of us (apart from being away just now): waiting for the new server to be up in order to redo settings and maximise output. I see we can start in on this tomorrow... he said, hopefully.
Well done on the numbers, by the way.
Two issues:
First, I haven't been able to get more than 6 Mbit total from 3 machines since Alex made those changes.
Second, I have one machine down.
I swapped in CPUs that had been working fine in that machine, and now it won't POST...
Clueless as to why, and that was my main machine. :shrug:
Dave: use a Linux box (Ubuntu). Installation takes 5 minutes at most with no previous knowledge.
Also: refic has made a script to install Mono (the toughest part of the Linux install).
We can guide you, and you can call DF, who knows it quite a bit :P.
The setup of the Ubuntu box (for MJ12) could take you 30-40 minutes, and you can use all 30 Mbit from a single quad core (an old Q6600 would do it with no sweat).
Dave, under the 'More Crawler' tab in options, try setting 'Maximum deep crawl buckets' to 0 and 'Maximum priority buckets' to 50 (or even 100 if 50 works OK). Also, Alex just released the 1.7 final version, which lets you raise your reserved buckets from 1 to 3; that should use up more of your connection while crawling, so upgrading should give you a boost.
Regarding Linux: it's very easy to install and run, BUT it's a bit of a pain in the a$$ because Mono is less stable than .NET under Windows, so you constantly have to watch for bugs (not fun).
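If you do run under Mono, one way to cope with the instability is a small watchdog that restarts the node whenever it dies. A sketch in Python, assuming the node is launched as 'mono MJ12node.exe' (executable name and path are hypothetical; substitute your own):
Code:
import subprocess
import time

# Hypothetical command line; point it at your actual node executable.
NODE_CMD = ["mono", "MJ12node.exe"]

while True:
    proc = subprocess.Popen(NODE_CMD)   # start the node
    proc.wait()                         # block until it exits or crashes
    print(f"node exited with code {proc.returncode}, restarting in 10s...")
    time.sleep(10)                      # brief back-off before restarting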
http://i.imgur.com/L7WRY.png
Congrats on 11 Billion Dave! :up::up:
I got hit by a nasty concoction of viruses and malware last week, and just finished painstakingly reinstalling Windows and everything else (though I have made a clone image of my HD and am going to do monthly backups in case something goes wrong again! :)).
Sh.. DF.. I always have such sh.. I wonder why Alex would not include an AV/malware scanner... he could even sell a list of websites containing that sh..