High availability for gateways
This is an Early Access feature. To enable it, contact Okta Support.
Gateways use a variety of techniques to ensure high availability, which provides improved reliability over traditional SSH bastions.
- Gateway load balancing
- Gateway status checks
- Create temporary access to a server without using a gateway
- Troubleshoot gateways
Gateway load balancing
If a project can use multiple gateways, one of them is selected at random each time a client attempts to access a server. This automatically balances requests across all gateways that are valid for the project. If one gateway goes down, the other gateways continue to serve requests. However, without additional configuration, requests sent to the downed gateway fail.
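To illustrate why random selection alone isn't enough, the following minimal sketch simulates picking a gateway uniformly at random while one gateway is down. The gateway names and the `pick_gateway` and `failure_rate` helpers are hypothetical and aren't part of Advanced Server Access. The sketch only models the behavior described above.

```python
import random


def pick_gateway(gateways, rng=random):
    """Pick one gateway uniformly at random from the valid gateways."""
    return rng.choice(gateways)


def failure_rate(gateways, down, attempts=10_000, seed=0):
    """Estimate the share of attempts that land on a downed gateway.

    `down` is the set of gateway names that are currently unreachable.
    A fixed seed keeps the simulation repeatable.
    """
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(attempts) if pick_gateway(gateways, rng) in down
    )
    return failures / attempts


if __name__ == "__main__":
    gateways = ["gw-a", "gw-b", "gw-c"]  # hypothetical gateway names
    rate = failure_rate(gateways, down={"gw-b"})
    print(f"Share of requests sent to the downed gateway: {rate:.1%}")
```

With three gateways and one down, roughly a third of attempts are sent to the downed gateway. Placing the gateways behind a load balancer with health checks addresses this.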
To ensure that requests are properly handled, you should configure gateway load balancing similarly to how you'd configure a web application server behind a load balancer. Configure standard load balancing by doing the following:
- Ensure that clients can connect to the load balancer on port 7234 or another port that you configure.
- Ensure that the load balancer can connect to all your gateways on port 7234 or another port that you configure.
- Ensure that your gateways can connect to your target servers using SSH on port 22.
- Configure all of your gateways to use the same AccessAddress. This must be the address that's used to access the load balancer, which is usually a domain name or static IP. Different ports are treated as different addresses.
- Clients use this setting to know how to connect to the gateway. It must also be accurate so that the host certificate that proves the gateway's identity to the client is considered valid.
- Host certificates for multiple gateways are also considered valid if the gateways share the same default address, which is detected from cloud instance metadata on Amazon Web Services (AWS) or Google Cloud Platform (GCP).
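Before rollout, you might check that every gateway is configured with the same access address and port as the load balancer. The following is a minimal sketch of such a check. It assumes you have already collected each gateway's configured address and port into a dictionary. The `mismatched_gateways` helper, the gateway names, and the example address are hypothetical.

```python
def mismatched_gateways(load_balancer, gateways):
    """Return the names of gateways whose endpoint differs from the load balancer's.

    `load_balancer` is an (address, port) tuple.
    `gateways` maps a gateway name to its configured (address, port) tuple.
    The port is compared too, because different ports are treated as
    different addresses.
    """
    return sorted(
        name for name, endpoint in gateways.items() if endpoint != load_balancer
    )


if __name__ == "__main__":
    lb = ("gateways.example.com", 7234)  # hypothetical load balancer address
    configured = {
        "gw-a": ("gateways.example.com", 7234),
        "gw-b": ("gateways.example.com", 7235),  # same host, different port
        "gw-c": ("10.0.0.12", 7234),             # bypasses the load balancer
    }
    print(mismatched_gateways(lb, configured))
```

The comparison is exact, so a difference in letter case or a trailing dot in a domain name is reported as a mismatch.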
After you set up your gateways behind a load balancer, add health checks that remove a gateway from the load balancer pool when it's unhealthy or unreachable. Examples of useful health checks:
- Ensure that the gateway is listening on the correct port (7234 by default).
- Ensure that the gateway has sufficient storage to store logs if using local storage, or to temporarily store logs before moving to cloud storage.
- Ensure that gateway memory and CPU utilization are not excessive.
See your cloud provider and load balancer documentation for how to implement health checks for your platform and tools.
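As a rough illustration, the checks listed above could be sketched with the Python standard library as follows. This isn't an Okta-provided tool. The function names, the thresholds, and the log directory are assumptions to adapt to your environment. The CPU check relies on `os.getloadavg`, which is available only on Unix-like systems, and a memory check is omitted because the standard library has no portable way to read memory usage.

```python
import os
import shutil
import socket


def port_is_open(host, port=7234, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem that holds `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def load_is_acceptable(max_load_per_cpu=1.0):
    """Return True if the 1-minute load average per CPU is at or below the limit."""
    load_1min, _, _ = os.getloadavg()  # Unix-like systems only
    return load_1min / (os.cpu_count() or 1) <= max_load_per_cpu


def gateway_is_healthy(host, log_dir, min_free_bytes=1 << 30):
    """Combine the individual checks; the 1 GiB default is an arbitrary example."""
    return (
        port_is_open(host)
        and has_free_space(log_dir, min_free_bytes)
        and load_is_acceptable()
    )
```

A script like this would typically run on the gateway host itself, for example behind a small HTTP endpoint that the load balancer polls. The port check on its own can also be run from the load balancer's side of the network.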
Gateway status checks
Gateways automatically send data about their health to Advanced Server Access, which you can use to manage them. You can see the most recent health data for a gateway by viewing its details in the Advanced Server Access dashboard.
In addition to helping manage your gateways, gateway status checks control whether a particular gateway is selected when a user attempts to access a server using a gateway.
Gateways report their status every two minutes. If more than five minutes have passed since a gateway last reported its status, it's automatically removed from the pool. If no gateway has reported its status in the past five minutes, connections are sent to the gateway that reported its status most recently within the past 24 hours. If no gateway has reported its status in the past 24 hours, the connection fails.
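The selection rules above can be sketched as a small function. This is an illustrative model of the documented behavior, not the actual Advanced Server Access implementation. The `select_gateway` name, the dictionary input, and the handling of reports that are exactly five minutes or exactly 24 hours old are assumptions.

```python
import random
from datetime import datetime, timedelta

FRESH_WINDOW = timedelta(minutes=5)    # gateways older than this leave the pool
FALLBACK_WINDOW = timedelta(hours=24)  # oldest report still usable as a fallback


def select_gateway(last_reports, now, rng=random):
    """Choose a gateway from a mapping of gateway name to last status report time.

    1. Pick at random among gateways that reported within the last five minutes.
    2. Otherwise, use the gateway that reported most recently within 24 hours.
    3. Otherwise, fail the connection.
    """
    fresh = [name for name, seen in last_reports.items()
             if now - seen <= FRESH_WINDOW]
    if fresh:
        return rng.choice(fresh)

    recent = {name: seen for name, seen in last_reports.items()
              if now - seen <= FALLBACK_WINDOW}
    if recent:
        return max(recent, key=recent.get)

    raise ConnectionError("no gateway has reported its status in the past 24 hours")


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    reports = {  # hypothetical gateways and report times
        "gw-a": now - timedelta(hours=2),
        "gw-b": now - timedelta(hours=1),
    }
    print(select_gateway(reports, now))
```

In the usage example, neither gateway has reported within five minutes, so the function falls back to `gw-b`, which reported most recently.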
Create temporary access to a server without using a gateway
In the unlikely event that all of the mechanisms used to support high availability fail and you need to allow restricted access to a server without going through a gateway, you can do the following to allow temporary, audited server access:
- Create a temporary group that contains only the users who need to be granted temporary access.
- Remove all groups from the project in which the target server is enrolled.
- Add the temporary group to the project.
The new authorization configuration is synchronized to your servers. This process can take a few minutes, after which members of the temporary group can access the server. After the incident is resolved, you can restore the old configuration.
Note: Connections made to the server using this method don't pass through a gateway, so the sessions themselves aren't logged. However, the identity of each user who connected and the time of each connection are still logged to the Advanced Server Access events log.
Troubleshoot gateways
You can configure Advanced Server Access to allow authorized users to SSH to a gateway to troubleshoot it. To do so, use the Advanced Server Access server agent to manage access to the gateway, and enroll the gateway in a project that doesn't require gateways. Provided that the server hosting the gateway is active and accessible, users who belong to that project can connect to it using SSH.