The issue is resolved.

The problem was that the region servers ran out of memory. Although we have memory limit monitoring on these systems, our alarms were not triggered. We will investigate why the alarms did not fire and make appropriate improvements.

Sorry for the inconvenience.