The recent Intel CPU bugs led us to overhaul the hardware across most of our infrastructure. Unfortunately, the bugs not only weakened the security of our infrastructure but also reduced its performance significantly.

DB Cluster:
On all of our services we always keep a CPU margin of 20% that is reserved for emergencies or temporary expansion. Across all nodes of our DB cluster, that margin added up to 2.6 servers more than we needed. The DB cluster suffered the most, with an average 42% drop in performance, which in essence meant we needed an extra 3 servers just to cope with current requirements.
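The arithmetic behind these figures can be sketched roughly as follows. This is only an illustration under stated assumptions: it treats the 20% margin as a share of total cluster capacity (which would imply about 13 nodes, a number not given above), and the function name `capacity_deficit` is hypothetical.

```python
def capacity_deficit(extra_servers, margin, perf_drop):
    """Estimate the shortfall, in pre-bug server equivalents.

    Assumption: `margin` is the fraction of TOTAL capacity held in reserve,
    so total nodes = extra_servers / margin.
    """
    total_nodes = extra_servers / margin          # implied cluster size
    required = total_nodes - extra_servers        # capacity actually needed
    remaining = total_nodes * (1 - perf_drop)     # capacity left after the drop
    return total_nodes, required, remaining, required - remaining


total, required, remaining, deficit = capacity_deficit(2.6, 0.20, 0.42)
print(f"implied nodes: {total:.1f}")
print(f"required:      {required:.2f} server equivalents")
print(f"remaining:     {remaining:.2f} server equivalents")
print(f"deficit:       {deficit:.2f} server equivalents")
```

Under these assumptions the deficit comes out just under 3 server equivalents, which is in line with the "extra 3 servers" figure.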

Solution:
We added 3x new AMD EPYC 7401P servers (24 cores/48 threads each) to the cluster. Together they provide the same core count as 12x Xeon E5 (6 cores/12 threads each) and 75% more computing power. The new cluster of only 3 servers runs at 47% load on average. We will add another server in mid-2018 purely to increase redundancy, and another one, as planned, in mid-2019.
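The core-count equivalence can be checked with a few lines. This is a minimal sketch; the helper name `total_cores` is hypothetical, and the per-server core and thread counts are the ones quoted above.

```python
def total_cores(servers, per_server):
    """Total cores (or threads) across a group of identical servers."""
    return servers * per_server


epyc_cores = total_cores(3, 24)     # 3x EPYC 7401P, 24 cores each
xeon_cores = total_cores(12, 6)     # 12x Xeon E5, 6 cores each
epyc_threads = total_cores(3, 48)   # 48 threads each
xeon_threads = total_cores(12, 12)  # 12 threads each

print(epyc_cores, xeon_cores, epyc_cores == xeon_cores)        # 72 72 True
print(epyc_threads, xeon_threads, epyc_threads == xeon_threads)  # 144 144 True
```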

Status: DB Cluster is fully functional and ready.

VPS:
In the summer of 2017 (2017/08/09) we switched all of our nodes from Xeon E5 6/12 to EPYC 7401P at a ratio of 3:1. That was done purely for performance reasons. At the same time we completely eliminated magnetic storage on our VPS service and lowered prices dramatically.

Status: VPS Service is fully functional and ready.

Shared Hosting:

Under our original schedule we would have started upgrading our platform in mid-2019, but the Intel bug has forced us to expedite the process. We switched all Xeon E5 dual-socket servers (effective 12/24 per server) to EPYC 7601 (effective 64/128 per server), and upgraded the RAM from 128GB to 256GB. Finally, we moved our hybrid shared hosting to pure SSD hosting. With this change we quintupled our cores, thus octupling our computing power, doubled our memory capacity, quadrupled our disk bandwidth and octupled our IOPS capacity.
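The per-server ratios for cores and memory follow directly from the figures above. The sketch below assumes a one-for-one server replacement, which the post does not state explicitly; the dictionary names are illustrative. The computing power, disk bandwidth and IOPS multipliers cannot be derived from the published figures, so they are left out.

```python
# Per-server specs as quoted above (cores/threads are "effective" dual-socket totals).
old_server = {"cores": 12, "threads": 24, "ram_gb": 128}   # dual Xeon E5
new_server = {"cores": 64, "threads": 128, "ram_gb": 256}  # dual EPYC 7601

# Ratio of new to old for each resource, assuming 1:1 server replacement.
ratios = {key: new_server[key] / old_server[key] for key in old_server}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

The core ratio works out to about 5.3x, consistent with "quintupled", and the RAM ratio is exactly 2x.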

Status: We are starting to move all accounts to their new home.


We will post an update with more information on the bugs and mitigation issues.

Regards,
eSG NOC



Tuesday, January 9, 2018