Finally, found a solution to intermittent server performance issues.
At all times, we have a lot of bots accessing ACDB. If Google Analytics is to be trusted, of our 500,000 - 1,000,000 daily page views, roughly 10% come from humans. The other 90% is a legion of bots set upon us.
Our web server works a little weirdly in that for each connection, it allocates roughly 256 file descriptors. So it was giving me a useless "out of fds" error as it crashed. I set up trips and traps to monitor, at the kernel level, how many fds were actually in use as tracked by the OS, and it never went much beyond 3,000. So how in the world could a limit of more than 5,000 have been exceeded by the web server? It's fractional-reserve-banking-style counting, is what it is.
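If you want to sanity-check the kernel's view of fd usage yourself, here's a minimal sketch. It assumes a Linux /proc filesystem; the PID would be whatever your web server process reports (I use the script's own PID below just as a demonstration).

```python
import os

def count_fds(pid: int) -> int:
    """Count the file descriptors currently open for a process,
    as tracked by the kernel via /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Demonstration: count this process's own open fds.
print(count_fds(os.getpid()))
```

Polling this in a loop while the server is under load is what showed me the OS-level count never matched the server's internal accounting.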
Even then, all I learned was another config value, one that turned a hard crash (non-recoverable) into a soft crash (recoverable).
Still, performance was shit. And it made no sense to be soft crashing over and over and over again.
So I turned my attention to the firewall. It's quite amazing how, even after all these years, you can't have a firewall rule with two limit conditions:
A) limit the connections per IP
B) limit the connections per rule
but not both.
I cannot say: a max of X IPs get Y connections each, up to a max of Z connections total. Nope.
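For illustration, the rule I wanted is trivial to express in code. This is a hypothetical sketch of the missing combined limit, not any real firewall's API; the class and parameter names are my own.

```python
from collections import Counter

class ConnLimiter:
    """Accept a connection only if BOTH limits hold:
    per_ip - max simultaneous connections from any single IP,
    total  - max simultaneous connections across the whole rule."""

    def __init__(self, per_ip: int, total: int):
        self.per_ip = per_ip
        self.total = total
        self.conns = Counter()  # active connections per IP

    def try_accept(self, ip: str) -> bool:
        if sum(self.conns.values()) >= self.total:
            return False  # rule-wide cap hit
        if self.conns[ip] >= self.per_ip:
            return False  # per-IP cap hit
        self.conns[ip] += 1
        return True

    def release(self, ip: str) -> None:
        if self.conns[ip] > 0:
            self.conns[ip] -= 1
```

With, say, `per_ip=2, total=3`, a single greedy IP gets cut off at two connections, and the rule as a whole never admits more than three. That's the whole feature.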
Made no difference anyway: more connections STILL came in than the server could handle. There was no 1:1 ratio here.
Thinking my only recourse would be to hack the code myself to get more info, I came across a better solution using mod_status. At one glance I understood the problem, the solution, and the action to take.
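For anyone wanting to try the same, here's a minimal sketch of enabling the mod_status scoreboard, assuming an Apache httpd 2.4 setup (the module path and allowed network below are placeholders for your own):

```apache
# Load the module if it isn't already (path varies by distro).
LoadModule status_module modules/mod_status.so

# Per-request detail (URL, client, state) instead of bare counters.
ExtendedStatus On

<Location "/server-status">
    SetHandler server-status
    Require ip 192.0.2.0/24   # restrict to your admin network
</Location>
```

Hitting `/server-status` then shows every worker's state at a glance, which is exactly the view that made the problem obvious.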
We're now serving 30 requests per second with 80% slack capacity. Bring on the traffic!