Very interesting study, though I have a couple of comments and observations:

First, per your article you were using two ARC-1680IX-12s with LVM2 striping at the OS level. What stripe size did you use, both for LVM2 and for the arrays? The LVM2 stripe size should be a multiple of the sub-array stripe size, probably equal to the full stripe width of the base array, so that each controller receives enough data to write full stripes. That helps for large sequential I/O, though not necessarily for 8K database blocks, so some testing would be needed; at minimum it should be a multiple of the array stripe size. Also, are you testing with a partition table on the array, or raw (i.e., any alignment issues)? Normally I test with no partition table: the raw RAID volume gets added to LVM, and the filesystem goes directly on the LV.
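For reference, a raw-volume setup like the one I describe (no partition table anywhere, RAID volumes straight into LVM) might look like the sketch below. It's a dry run that just prints the commands; device names, volume group names, and the 64KB-across-10-data-disks geometry are all placeholder assumptions, not numbers from the article.

```shell
#!/bin/sh
# Hypothetical device names for the two controller volumes -- adjust to taste.
PV1=/dev/sdb
PV2=/dev/sdc

# Example geometry: 64KB controller stripe across 10 data disks per array,
# so a full stripe on each controller is 640KB.
STRIPE_KB=64
DATA_DISKS=10
FULL_STRIPE_KB=$((STRIPE_KB * DATA_DISKS))
echo "full controller stripe: ${FULL_STRIPE_KB}KB"

# Dry run: printing the commands instead of executing them (drop the echo
# prefix on a real system, as root). No partition table -- the raw volumes
# go straight into LVM, which sidesteps alignment issues.
echo pvcreate "$PV1" "$PV2"
echo vgcreate vg_data "$PV1" "$PV2"
# -i 2: stripe across both controllers; -I: LVM stripe size in KB, set to
# the full stripe width of each sub-array. (Older LVM2 versions require a
# power-of-two -I value, so you may have to round.)
echo lvcreate -i 2 -I "$FULL_STRIPE_KB" -l 100%FREE -n lv_data vg_data
# Filesystem directly on the LV, no partition table here either.
echo mkfs.xfs /dev/vg_data/lv_data
```

Whether the full-stripe-width LVM stripe actually wins for 8K random I/O is exactly the thing that needs testing, as above.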

In your write-up you mentioned that you got ~3200MB/s out of two cards? That, to me, screams RAM caching rather than disk testing. I could see around 1800MB/s with two controllers, but not double that, unless I've misread what you were doing there.

As for IOPS, from your results it really looks like a RAID controller limit (the IOP34x series) more than anything, but there are a couple of things that can be tested:

- Create a RAM disk (system RAM) and run your tests against that, with LVM2 and a filesystem on top. This would completely rule out the drive/controller subsystem while still exercising your OS/filesystem stack. If you're still hitting 65K IOPS, then it's something in kernel space (though honestly I've never heard of an issue like that before).
- If possible, try cards that do NOT have a built-in SAS expander (the -IX line from Areca does, so you're going through that chip on every I/O). The plain ARC-1680 doesn't have the expander, which may improve performance a bit (and may also help resolve some of the incompatibility issues you mentioned in the article).
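The RAM-disk test in the first bullet could be sketched roughly as below, again as a dry run that only prints the commands (remove the echo prefixes and run as root for real). The brd sizes and the fio job parameters are placeholders I've picked for illustration, not anything from the article.

```shell
#!/bin/sh
# Two 4GB kernel RAM disks via the brd module (rd_size is in KB).
RD_SIZE_KB=$((4 * 1024 * 1024))
echo modprobe brd rd_nr=2 rd_size="$RD_SIZE_KB"

# Same LVM + filesystem stack as the real arrays, just backed by RAM,
# so the drive/controller subsystem drops out of the picture entirely.
echo pvcreate /dev/ram0 /dev/ram1
echo vgcreate vg_ram /dev/ram0 /dev/ram1
echo lvcreate -i 2 -I 64 -l 100%FREE -n lv_ram vg_ram
echo mkfs.xfs /dev/vg_ram/lv_ram
echo mount /dev/vg_ram/lv_ram /mnt/ramtest

# 8K random-read IOPS run against the filesystem (placeholder fio job).
# If this also tops out near 65K, the bottleneck is above the controllers.
echo fio --name=iops --directory=/mnt/ramtest --rw=randread \
    --bs=8k --direct=1 --size=1g --numjobs=4 --iodepth=32 --group_reporting
```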