Previous incidents
EU cluster Realtime is down
Resolved Mar 21 at 06:45pm CET
EU cluster Realtime recovered.
1 previous update
EU cluster frontend is down
Resolved Feb 10 at 03:51pm CET
EU cluster frontend recovered.
1 previous update
EU cluster API and EU cluster frontend are down
Resolved Feb 09 at 10:09pm CET
EU cluster frontend recovered.
5 previous updates
Brief downtime
Resolved Feb 02 at 05:43pm CET
Two short periods of degraded service occurred today, from 15:51 to 15:57 CET and from 16:13 to 16:21 CET. API, front-end, and webhook responses were slow during these periods, and some customers may have experienced timeouts.
No data loss resulted.
The cause was flows abusing the limits of the storage (Maester) and queueing services. These flows have been suspended, and we are working with the client to optimise them.
All services are restored and working normally....
EU cluster API, EU cluster frontend, EU cluster Realtime, and 1 other service...
Resolved Feb 02 at 04:21pm CET
EU cluster frontend recovered.
11 previous updates
EU cluster API, EU cluster frontend, EU cluster Realtime, and 1 other service...
Resolved Feb 01 at 07:58pm CET
EU cluster frontend recovered.
11 previous updates
EU cluster API, EU cluster frontend, EU cluster Realtime, and 1 other service...
Resolved Jan 31 at 08:10pm CET
EU cluster frontend recovered.
10 previous updates
EU cluster API, EU cluster frontend, EU cluster WebHook, EU cluster Realtime,...
Resolved Jan 23 at 11:21am CET
Post-mortem
Issue
On Friday, 20 January 2023, a significant spike in data volume was observed, accompanied by an increase in data package size. The subsequent increase in data processing times, the increased message queue processing times, and the data volume combined to overwhelm our Kubernetes master node.
At 16:02 CET, the platform service became unavailable.
Google support were unable to correct this, so it was necessary to recreate the platform. During the downt...
8 previous updates
EU cluster API, EU cluster frontend, EU cluster WebHook, EU cluster Realtime,...
Resolved Jan 20 at 10:59pm CET
Most of the flows have started. We are monitoring the situation.
11 previous updates
Flow sample retrieval is not working
Resolved Jan 05 at 06:02pm CET
Issue
The recent issues on the platform were caused by our core scheduling service becoming temporarily overwhelmed by a high volume of messages, combined with many errors generated by integration flows transporting large batches of data.
Issues experienced included:
* Slow and failing sample retrieval
* Slow processing of message queues
Status/Resolution
The issues are now resolved, and no data loss has been observed ...
3 previous updates