Well, it looks like our anti-Chia tool clobbered everything last night. We had a clever way of detecting the plots and, without going into a lot of detail, it required us to log a whole lot of specifics about the size of read requests. The logging itself wasn’t the problem, but the database it was going into became an issue at about 1am.
When the database failed, the logging daemons we’d written (which run locally on each storage server) dropped back into their debug mode and started logging to syslog, rapidly filling the /var/log partition on every storage node.
We’re cleaning them out now. We rushed this tool out to combat Chia without testing it as heavily as we should have and without removing some debugging options. I still think it was the right thing to do, since we HAVE to keep the plots off the network, but in the future we’ll make it handle a full database more elegantly.
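For the curious, here is a rough sketch of what "more elegantly" could look like. This is not our actual daemon: the class name `ReadSizeLogger`, its parameters, and the `db_write` and `warn` callables are all made up for illustration. The idea is that when the database write fails, records go into a bounded in-memory buffer (oldest dropped first) and the fallback warning is rate-limited, so a dead database can't flood local disk.

```python
import time
from collections import deque


class ReadSizeLogger:
    """Hypothetical per-node logger that degrades without flooding local logs."""

    def __init__(self, db_write, warn, max_buffered=10_000,
                 warn_interval=60.0, clock=time.monotonic):
        self.db_write = db_write      # callable(record); assumed to raise on failure
        self.warn = warn              # callable(message), e.g. a syslog wrapper
        self.buffer = deque(maxlen=max_buffered)  # evicts the oldest record when full
        self.warn_interval = warn_interval        # minimum seconds between warnings
        self.clock = clock
        self.last_warn = None
        self.dropped = 0

    def log(self, record):
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1         # the append below evicts the oldest record
        self.buffer.append(record)
        self.flush()

    def flush(self):
        # Write buffered records in order; stop at the first failure.
        while self.buffer:
            try:
                self.db_write(self.buffer[0])
            except Exception as exc:
                self._warn_rate_limited(f"database write failed: {exc}")
                return
            self.buffer.popleft()

    def _warn_rate_limited(self, message):
        # Emit at most one warning per warn_interval seconds.
        now = self.clock()
        if self.last_warn is None or now - self.last_warn >= self.warn_interval:
            self.last_warn = now
            self.warn(f"{message} (buffered={len(self.buffer)}, dropped={self.dropped})")
```

On a real node, `warn` could be something like Python's `syslog.syslog`. The trade-off is that some read-size records get lost during a long outage, which we would take over a full /var/log any day.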
Expect service to return to normal in 2 hours.