Note: No client services were affected at any point during this incident, since they run on top of Halcyon 6.4.

A number of cascading issues emerged during the maintenance. During the software stack upgrade (Halcyon 6.5 Beta 20181206-19243) our engineers failed to identify a major library issue. Once the update was finished the changes were pushed to Halcyon, and at that point we started having misconfiguration issues.

Unfortunately, by that point Coeus had already started replicating the changes to all of our nodes, which in effect made all instances of our members area application inaccessible. The engineer on call decided to restore the backup of the last version, which unfortunately was based on a completely different schema. As a result our wiki and web site became inaccessible too, since the new DB had not yet been fully populated from the members area application as intended.

At 2018/12/08 @03:41 UTC+2 two more of our engineers were brought in and started working on a fix to make Halcyon work with the older library. Approximately 2 hours later the fix was pushed (Halcyon 6.5 Beta 20181206-19245) and Coeus did an emergency push to all nodes running the beta.

At 05:02 all of our services were fully functional again and by 05:23 we were serving proper data to all of our nodes.

During this maintenance window we had planned to transfer the members area data to Mnemosyne (Alpha) and to update the Coeus rules and the PHP engine, but due to the issues we are postponing these changes to a future maintenance window.

Saturday, December 8, 2018
