ℹ️ Service Impact Announcement ℹ️ Code Red incident that occurred on August 15th 2022 and impacted the US cloud device connectivity service (API & UI)

Lytx employees,

We have prepared an internal messaging document regarding the latest Code Red incident, which occurred on August 15th 2022 and impacted the US cloud device connectivity service (API & UI).

Root Cause of Issue:

The latest deployment to production included a fix for the AI-14 device.

This fix added a connectivity status check through the connectivity service, and the check ran for every device rather than only for AI-14 devices.

The connectivity service was deployed with only one instance, while the monolith runs about 10 instances. As a result, the single connectivity service instance received a large number of requests from multiple monolith instances, which caused the service to fail and restart multiple times. This, in turn, caused multiple failed requests.

Corrective Action:

We increased the replica count for the connectivity service from 1 to 10, which spread the load and resolved the pending requests almost immediately.

We created a hotfix that adds a try/catch around the new section of the monolith that checks device status through the connectivity service.

We added a condition so that the connectivity service is checked only for AI-14 devices, since a single request might include both AI-12 and AI-14 devices.
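
For engineering readers, the two hotfix changes above can be sketched roughly as follows. This is an illustrative sketch only, not the production code: the function name `check_connectivity`, the `fetch_status` callback, and the device fields `id` and `model` are all hypothetical, and Python's try/except stands in for the try/catch mentioned above.

```python
from typing import Callable, Dict, Iterable, Optional


def check_connectivity(
    devices: Iterable[dict],
    fetch_status: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Return a status per device ID; None means not checked or unavailable.

    `fetch_status` is a hypothetical stand-in for the call from the monolith
    to the connectivity service.
    """
    statuses: Dict[str, Optional[str]] = {}
    for device in devices:
        device_id = device["id"]

        # Condition: only AI-14 devices are checked, so AI-12 devices in the
        # same request generate no traffic to the connectivity service.
        if device.get("model") != "AI-14":
            statuses[device_id] = None
            continue

        # Guard: a failure in the connectivity service must not fail the
        # whole request, so the error is contained per device.
        try:
            statuses[device_id] = fetch_status(device_id)
        except Exception:
            statuses[device_id] = None
    return statuses


# Usage with a fake service call that fails for one device.
calls = []


def fake_fetch(device_id: str) -> str:
    calls.append(device_id)
    if device_id == "dev-3":
        raise ConnectionError("connectivity service unavailable")
    return "online"


devices = [
    {"id": "dev-1", "model": "AI-12"},
    {"id": "dev-2", "model": "AI-14"},
    {"id": "dev-3", "model": "AI-14"},
]
result = check_connectivity(devices, fake_fetch)
```

In this sketch the AI-12 device is skipped entirely, and the failing AI-14 device yields an empty status instead of an error for the whole request.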

Time Started: August 15th 2022 10:04 AM EST

Time Ended: August 15th 2022 11:37 AM EST

Please use this messaging when communicating with partners.

This notice was delivered to internal employees only.

For more information, please reach out to the Technical Support team.