Medchat - System Inaccessible – Incident details

System Inaccessible

Resolved
Major outage
Started about 1 year ago
Lasted about 1 hour

Affected

Authentication

Major outage from 4:35 PM to 5:21 PM

Medchat Auth Application

Major outage from 4:35 PM to 5:21 PM

Google SSO

Major outage from 4:35 PM to 5:21 PM

Custom OIDC SSO

Major outage from 4:35 PM to 5:21 PM

Custom SAML SSO

Major outage from 4:35 PM to 5:21 PM

Live Chat

Major outage from 4:35 PM to 5:21 PM

Updates
  • Resolved

    Throughout the morning of 10/3/23 and in the weeks leading up to this incident, the support team received automated alerts from the system regarding elevated (though not critical) DB resource consumption. Each time, consumption quickly returned to acceptable levels without affecting any users.

    After observing the pattern over several weeks, the support team submitted a request to scale out the DB, an operation that was not expected to cause any downtime. The scaling operation took approximately 18 minutes, during which the system was largely inaccessible to MedChat users. Although the databases remained available during the scaling operation, they appear to have had reduced capacity (further investigation is still in progress). Once scaling completed, the system returned to normal operation.

    In the future, all DB scaling will be performed off-hours to avoid the potential for service disruptions.
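
    As an illustration only, an off-hours policy like the one above could be enforced with a simple time-window check before a scaling request is submitted. This is a minimal sketch, not the team's actual tooling: the window (22:00 to 06:00), the names `OFF_HOURS_START`, `OFF_HOURS_END`, `is_off_hours`, and `may_scale_db` are all hypothetical, and the report does not define what "off-hours" means.

```python
from datetime import datetime, time

# Hypothetical off-hours window; the incident report does not specify one.
OFF_HOURS_START = time(22, 0)
OFF_HOURS_END = time(6, 0)


def is_off_hours(now: datetime) -> bool:
    """Return True if `now` falls inside the off-hours window.

    Handles windows that wrap past midnight (start later than end).
    """
    t = now.time()
    if OFF_HOURS_START <= OFF_HOURS_END:
        # Window lies within a single day.
        return OFF_HOURS_START <= t < OFF_HOURS_END
    # Window wraps past midnight.
    return t >= OFF_HOURS_START or t < OFF_HOURS_END


def may_scale_db(now: datetime) -> bool:
    """Allow a DB scaling request only during the off-hours window."""
    return is_off_hours(now)
```

    Under these assumed values, a request made at 4:35 PM would be rejected, while one made at 11:00 PM would be allowed.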

  • Monitoring

    The root cause appears to be downtime caused by scaling out DB resources. The team is continuing to investigate, since the scaling operation was not expected to cause an outage.

    The scaling operation is now complete and the system appears to be back to normal operations. Continuing to monitor.

  • Investigating

    We are currently investigating this incident. Users have been logged out of the application and are no longer able to sign in.