This topic describes best practices for deploying the Okta RADIUS Server agent. The examples use Cisco ASA as the VPN device and F5 as the load balancer, but customers may substitute comparable products, provided they are configured equivalently.
For installation information, see Installing and Configuring the Okta RADIUS Server Agent.
For information on the Okta RADIUS app, see Using the Okta RADIUS App. The app lets Okta distinguish between different RADIUS-enabled apps and support them concurrently: you set up one Okta RADIUS app for each configuration, create policies for it, and assign RADIUS authentication to groups. With the Okta RADIUS App, you can also configure a RADIUS-enabled app to use only the second factor in multifactor authentication (passwordless mode).
The Okta RADIUS Server agent:
- Delegates authentication to Okta using single-factor authentication (SFA) or multi-factor authentication (MFA).
- Installs as a Windows or Linux service.
- Supports the Password Authentication Protocol (PAP), Extensible Authentication Protocol Tunneled Transport Layer Security (EAP/TTLS), and Extensible Authentication Protocol/Generic Token Card (EAP/GTC).
- Communicates via UDP, over default port 1812, and supports multiple ports simultaneously.
Use the Okta RADIUS Server agent when authentication is performed by:
- VPN devices that don’t support SAML
- Virtual Desktops and Reverse Proxies that don’t support SAML
Active Directory user, Okta Verify OTP
- User sends credentials to VPN device connected to Okta via RADIUS
- VPN device forwards user credentials to the Okta RADIUS Server Agent
- Okta RADIUS Server Agent uses Okta APIs to validate credentials
- Okta validates user credentials
- Okta APIs respond with MFA challenge based on configured policy
- RADIUS Server Agent sends challenge to VPN device
- VPN device presents RADIUS challenge to end user
- VPN device sends RADIUS challenge response to Okta RADIUS
- Okta RADIUS sends response to Okta APIs to be validated
- Okta APIs respond with correct/incorrect for the response
- Okta RADIUS sends ACCEPT or REJECT to the VPN device
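The flow above can be sketched as a small simulation. This is only an illustration of the order of events, not the agent's implementation: `FakeOkta`, `authenticate`, and `prompt_user` are hypothetical names, and the in-memory checks stand in for the real Okta API calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeOkta:
    """Hypothetical stand-in for the Okta APIs; holds the expected values in memory."""
    password: str
    otp: str

    def verify_primary(self, username: str, password: str) -> bool:
        # Okta validates the user credentials (primary authentication).
        return password == self.password

    def verify_factor(self, username: str, response: str) -> bool:
        # Okta validates the response to the MFA challenge.
        return response == self.otp


def authenticate(okta: FakeOkta, username: str, password: str,
                 prompt_user: Callable[[str], str]) -> str:
    """Walk through the flow and return the final RADIUS result."""
    # VPN device -> RADIUS agent -> Okta: validate the primary credentials.
    if not okta.verify_primary(username, password):
        return "REJECT"
    # Okta answers with an MFA challenge; the agent relays it to the VPN
    # device, which presents it to the end user and returns the response.
    response = prompt_user("Enter your Okta Verify code")
    # The agent sends the challenge response to Okta for validation.
    if okta.verify_factor(username, response):
        return "ACCEPT"
    return "REJECT"
```

For example, `authenticate(FakeOkta("pw", "123456"), "bob", "pw", lambda msg: "123456")` ends in an ACCEPT, while a wrong password never reaches the challenge step.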
See the RADIUS Server Agent Throughput And Scaling section for sizing guidance.
Active-Passive failover behind a VPN such as Cisco ASA
This is the simplest deployment model and is sufficient for environments that don’t have high throughput requirements beyond what a single active Okta RADIUS Server Agent can provide.
In this approach, configure one Okta RADIUS Server Agent as the active server on the VPN device, along with another Okta RADIUS Server as passive failover. The total throughput is capped by what a single RADIUS Server Agent can achieve.
Active-Active behind a load balancer – high throughput
Some examples and terms assume F5 load balancer with Cisco ASA VPN client
For best throughput and availability, we recommend deploying two or more Okta RADIUS Server Agents behind a load balancer. This approach allows horizontal scaling: you add RADIUS Server Agents to the load balancing pool and distribute the traffic load evenly between them. The number of RADIUS Server Agents required depends on the anticipated volume and peak transactions per minute.
- Virtual networks
- Session Persistence
- Load balancing method
- Health check
Set up a separate Virtual Server for each device sending RADIUS requests. Create a separate server pool for each virtual server.
Load balancing should use session persistence (also known as sticky sessions) based on the end user’s VPN client or IP address to optimize performance, especially when the user responds to the 2FA challenge out of band (for example, Okta Verify with Push). The Okta RADIUS Server Agent de-duplicates requests from the originating RADIUS client. However, if those requests are spread across multiple agents, they are de-duplicated only on the Okta service side, which results in unnecessary load. See Load Balancer Session Persistence Notes below for more detail.
Recommended configuration for stickiness is generally using the Calling-Station-ID combined with the Framed-IP. Calling-Station-ID for many VPNs will be the client IP address of the originating client. If a different RADIUS attribute is storing the client IP address, then configure the load balancer to use that attribute instead.
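The idea behind this stickiness rule can be sketched as a deterministic mapping from the two attributes to one agent. This is an assumption-laden illustration: real load balancers typically keep a persistence table rather than hashing, and `pick_agent` and the agent names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_agent(calling_station_id: str, framed_ip: str,
               agents: Sequence[str]) -> str:
    """Map a (Calling-Station-ID, Framed-IP) pair to one agent, deterministically."""
    # Combine both attribute values into a single persistence key.
    key = f"{calling_station_id}|{framed_ip}".encode("utf-8")
    # Hash the key so the same client always lands on the same index.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(agents)
    return agents[index]
```

Because the result depends only on the key, every retry from the same client reaches the same agent, which is what lets the agent drop duplicates locally.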
We recommend the Least Connections load balancing method, where available, to distribute load across the active RADIUS Server Agents.
Use the load balancer’s health check function with synthetic logins so that, if a RADIUS Server Agent has a problem, failover occurs seamlessly and with minimal user impact. Each Virtual Server should have its own health check over its respective port. To configure your load balancer or RADIUS client to perform health checks, create a user account that is used only for this purpose. We recommend:
- Create a user with no assigned groups, no application access, and no privileges beyond a basic user account.
- Create a strong, unique password for the health check user account
Note: the password and username cannot contain a hash (#) character
- Create a custom RADIUS application for triaging this inbound health check
- Assign this user to the RADIUS application (thereby allowing access)
The purpose of this account is to validate that the RADIUS client can reach the Okta service and that an authentication request is handled appropriately. Typically, the health check should involve only primary authentication, since second-factor transactions usually require some form of user input or dynamic response.
Set the load balancer to take a RADIUS server out of rotation after 2 consecutive failures, and to add it back into rotation after 1 successful response from the server.
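The rotation rule above (out after 2 consecutive failures, back in after 1 success) can be expressed as a small state tracker. This is a minimal sketch of the logic only; `RotationState` is a hypothetical name, and an actual load balancer exposes these thresholds as monitor settings rather than code.

```python
class RotationState:
    """Track whether one RADIUS server should receive traffic."""

    def __init__(self, fail_threshold: int = 2, success_threshold: int = 1):
        self.fail_threshold = fail_threshold
        self.success_threshold = success_threshold
        self.in_rotation = True
        self._failures = 0   # consecutive failed health checks
        self._successes = 0  # consecutive successful health checks

    def record(self, healthy: bool) -> bool:
        """Record one health check result and return the new rotation status."""
        if healthy:
            self._failures = 0
            self._successes += 1
            if not self.in_rotation and self._successes >= self.success_threshold:
                self.in_rotation = True
        else:
            self._successes = 0
            self._failures += 1
            if self.in_rotation and self._failures >= self.fail_threshold:
                self.in_rotation = False
        return self.in_rotation
```

A single failed check followed by a success leaves the server in rotation, because the failure counter resets on every success.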
For best overall system availability, consider a redundant system configuration for the load balancer to avoid a single point of failure. Please see load balancer vendor documentation for recommendations.
For F5’s overall recommendation for RADIUS load balancing, please refer to the F5 RADIUS Load Balancing documentation. F5 supports an iApp for managing RADIUS volume. This iApp also supports automated healthchecks via synthetic transactions to ensure that the end users are able to authenticate.
Use Session Persistence where possible
When deploying the Okta RADIUS Server Agent with a load balancer, Okta recommends using session persistence (also known as sticky sessions) based on the end user’s VPN client or IP address to optimize performance, especially when the user responds to the 2FA challenge out of band (for example, Okta Verify with Push). The Okta RADIUS Server Agent de-duplicates requests from the originating RADIUS client. However, if the requests are spread across multiple agents, they are de-duplicated only on the Okta service side, which results in unnecessary load for both the RADIUS Server Agents and the Okta service; this extra load also counts against your rate limits on the Okta service. The recommended configuration for stickiness is generally the Calling-Station-ID combined with the Framed-IP. For many VPNs, the Calling-Station-ID is the IP address of the originating client. If a different RADIUS attribute stores the client IP address, configure the load balancer to use that attribute instead.
Caveats when Session Persistence is not set up
We recommend a load balancer because it provides high availability and horizontal scale. It is possible to deploy the RADIUS Server Agent behind a load balancer without persistence, and this is still preferable to not using a load balancer at all. Be aware, however, that this model forfeits most of the benefits of the request de-duplication that the Okta RADIUS Server Agent performs at the agent level.
RADIUS uses the connectionless UDP protocol, and most clients automatically resend a request at a periodic interval until they receive a response from the RADIUS Server Agent. If these retries are load balanced to different RADIUS Server Agents, each agent does the same work simultaneously (processing the same RADIUS request), and the first one to get a response from Okta and send a response back to the client "wins".
Normally the first RADIUS Server Agent to receive the original request will be the first to respond, because it will make the call to Okta and get the response back before a retry from the client is ever issued. However, when using Okta Verify with Push factor, the RADIUS Server Agent which receives the request will sit and poll Okta until the user confirms/denies the push request on their phone. During this time, the RADIUS client is likely to send retries of the same push MFA request. In this scenario, if the retries are sent to the same RADIUS Server Agent, then the agent is smart enough to recognize those as duplicate packets and drop them immediately. But if the retry is load balanced to a different RADIUS Server Agent, that agent will process the request as a net new request and initiate the push notification again. In order to minimize the effects of this behavior, Okta recommends that you set the RADIUS client retry interval to 30 seconds or higher if you deploy in a load-balanced environment that does not support stickiness. This generally gives the end-user enough time to receive the push notification and respond to it before the RADIUS client starts sending retries.
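The effect described above can be illustrated with a small duplicate filter. This is a sketch, not the agent's actual logic: it assumes duplicates are recognized by client source IP, source UDP port, and RADIUS Identifier within a short window (the approach RFC 2865 describes), and `DuplicateFilter` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Tuple


class DuplicateFilter:
    """Per-agent memory of recently seen RADIUS requests."""

    def __init__(self, window_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen: Dict[Tuple[str, int, int], float] = {}

    def is_duplicate(self, client_ip: str, client_port: int,
                     identifier: int) -> bool:
        now = self._clock()
        # Forget requests that are older than the de-duplication window.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self._window}
        key = (client_ip, client_port, identifier)
        if key in self._seen:
            return True  # a retry of a request this agent is already handling
        self._seen[key] = now
        return False
```

Each agent keeps its own filter, so a retry that the load balancer sends to a second agent is not recognized there and is processed as a new request, which is how a second push notification gets triggered.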
Another possibility for race conditions (in the absence of load balancer persistence) arises when a particular RADIUS Server Agent becomes backlogged with a large queue of requests. This can happen if too few worker threads are configured on the agent, or if those threads are all consumed by long-running requests. Examples include Okta Verify with Push, and slow responses from the Okta service, such as when the Okta service has to make a round trip to your on-premises Active Directory agent to authenticate the user before responding to the RADIUS Server Agent, which then responds to the RADIUS client. In this case, retries are again a concern: if they are load balanced to other agents, the outcome depends on which agent processes the request first. This is generally safe no matter which agent "wins", but it makes the system as a whole harder to debug.
The benchmarking results below can be used to determine the type of server (or servers) needed to support the peak authentication-events per minute your environment is being designed to accommodate.
The Okta RADIUS Server Agent has been benchmarked on an AWS t2.medium instance (see https://aws.amazon.com/ec2/instance-types/), which represents a modest baseline of hardware specs (2 vCPU cores, 4GiB memory). The benchmark was performed at Okta RADIUS Server Agent default settings, which are typically suitable for most customers.
These benchmarks were run using JMeter to simulate a real end-user login flow via the Web VPN login (browser) > Cisco ASA > RADIUS Server Agent > Okta.
System specification: Amazon EC2 t2.medium (2 vCPU, 4 GiB memory), Windows Server 2012, Okta RADIUS Server Agent v.2.7.0 (thread count: 15, connection pool size: 20)
|Arrival Rate (per second)|Factor|Error rate % (Primary/MFA)|CPU % (peak)|Memory Use MB (peak)|
|---|---|---|---|---|
|6.5|Okta Verify w/ Push|0 / 0|3|20|
|25|Security Question|0 / 0|3|20|
The RADIUS Agent has a pool of worker threads and accepts incoming requests via a queue. Because the throughput depends on a lot of factors both internal and external to the agent (how many authentication threads are in the worker pool, how long each request to the Okta service takes, how long an end-user takes to respond to a push MFA notification, etc.), actual results can vary. It is important to test throughput in your own deployment and tune the agent according to how it performs in your own environment.
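As a rough starting point for sizing, the benchmarked arrival rates can be converted into a per-minute figure and compared with your expected peak. The sketch below assumes that the benchmarked rate can be treated as the sustainable rate of one agent on comparable hardware, which your own testing should confirm; `agents_needed` is a hypothetical helper.

```python
import math


def agents_needed(peak_auths_per_minute: float,
                  per_agent_rate_per_second: float) -> int:
    """Estimate how many agents are needed to absorb a peak load."""
    # Convert the per-second benchmark figure to authentications per minute.
    per_agent_per_minute = per_agent_rate_per_second * 60
    # Round up, and never return fewer than one agent.
    return max(1, math.ceil(peak_auths_per_minute / per_agent_per_minute))
```

For example, at 6.5 authentications per second one agent covers 390 per minute, so a peak of 1,000 Okta Verify with Push authentications per minute works out to three agents. The estimate ignores redundancy, so a deployment behind a load balancer would still use at least two agents.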
The RADIUS Agent connects to the Okta Service via REST APIs, and is subject to the same rate limits as any other HTTP client. If your capacity requirements are high, you can horizontally scale by adding additional RADIUS agents and spreading load between them, but note that they will each count independently against your total allowed API calls on the Okta Service. See https://help.okta.com/en/prev/Content/Topics/Security/API.htm#api_rate_limiting for more information. If the RADIUS Agent is rate limited, it will return an ACCESS-REJECT response for those requests.
Problem: The RADIUS Server Agent will not install
- Ensure you are installing on one of the supported Windows or Linux versions for Okta RADIUS.
The Okta RADIUS agent can be installed on the following Windows Server versions:
- Windows Server 2003 R2
- Windows Server 2008 R2
- Windows Server 2008 R2 Core
- Windows Server 2012 R2
- Windows Server 2016
Windows v2008 is not supported.
The Okta RADIUS agent has been tested on the following Linux versions:
- Red Hat Enterprise Linux release 8.0
- CentOS 7.6
- Ubuntu 18.04.4
- Use the full Okta URL under “Custom” instead of just the subdomain under “Production” in the installer.
- Check for the presence of a proxy server; the RADIUS Server Agent installer is sensitive to proxies.
- Check for an SSL interception device such as a Palo Alto or FireEye appliance. This is related to certificate pinning and affects all agents.
- Try a different server in the environment to rule out local machine issues.
- Make sure there are no leftover files under c:\program files (x86)\Okta\Okta RADIUS\ from a previous failed install.
- Check Windows services.msc to make sure there isn’t a bad Okta RADIUS service left over from a previous install (rare).
- Try another version of the RADIUS Server Agent, such as the newest EA version.
Problem: VPN device can’t reach RADIUS Server Agent
- The RADIUS Server Agent is running but the RADIUS client device cannot reach it (note: different than failing logins)
- Check the Okta RADIUS logs under C:\Program Files (x86)\Okta\Okta RADIUS Agent\current\logs\ to see if any connections are being made. Any connection, even failed ones, should show up.
- Double check the server name/server IP entered into the VPN device, just to make sure it was keyed in correctly.
- Verify the status of the Windows firewall on the Okta RADIUS Server Agent server to make sure it is not blocking the connection.
- Verify that the VPN device and the server can reach each other via ping, or ask a network admin to verify network connectivity.
- Configure the RADIUS server using the IP address instead of the hostname. There are networks where DNS is limited and hostnames will not resolve.
- Work with a network engineer to determine whether network-layer issues are preventing the connection (NTRADPing can be helpful here).
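If no RADIUS test tool is at hand, a reachability probe can be sketched with the standard library. The code below builds a PAP Access-Request as laid out in RFC 2865 and waits for any reply over UDP. The host, user, password, shared secret, and the `okta-probe` NAS-Identifier are placeholders, and whether the agent replies to a request with a wrong secret or bad credentials is an assumption to verify in your environment.

```python
import hashlib
import os
import socket
import struct
from typing import Optional


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obfuscate a PAP password as described in RFC 2865, section 5.2."""
    # Pad with NULs to a multiple of 16 bytes (at least one block).
    size = max(16, -(-len(password) // 16) * 16)
    padded = password.ljust(size, b"\x00")
    hidden = b""
    previous = authenticator
    for i in range(0, size, 16):
        mask = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ m for p, m in zip(padded[i:i + 16], mask))
        hidden += block
        previous = block  # each block is chained to the previous ciphertext
    return hidden


def build_access_request(username: str, password: str, secret: bytes,
                         identifier: int = 1,
                         authenticator: Optional[bytes] = None) -> bytes:
    """Build a minimal Access-Request (code 1) carrying PAP credentials."""
    if authenticator is None:
        authenticator = os.urandom(16)
    user = username.encode("utf-8")
    nas_id = b"okta-probe"  # placeholder NAS-Identifier value
    hidden = hide_password(password.encode("utf-8"), secret, authenticator)
    attributes = (
        bytes([1, len(user) + 2]) + user          # User-Name
        + bytes([2, len(hidden) + 2]) + hidden    # User-Password
        + bytes([32, len(nas_id) + 2]) + nas_id   # NAS-Identifier
    )
    length = 20 + len(attributes)  # 20-byte header plus attributes
    return struct.pack("!BBH", 1, identifier, length) + authenticator + attributes


def probe(host: str, packet: bytes, port: int = 1812,
          timeout: float = 5.0) -> bool:
    """Send the packet and report whether any reply arrived before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            sock.recvfrom(4096)
            return True
        except OSError:  # includes timeouts and ICMP-driven errors
            return False
```

A call such as `probe("192.0.2.10", build_access_request("healthcheck", "example-password", b"shared-secret"))` returning `True` suggests the UDP path to the agent is open; `False` only shows that no reply arrived, which can also mean the request was silently dropped.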
Problem: Correct credentials fail to authenticate
- The RADIUS Server Agent is rejecting valid login attempts
- Verify the user is assigned to the RADIUS App in Okta.
- Verify the user is enrolled in MFA.
- Verify the shared secret on both the Okta RADIUS Server Agent and on the VPN device. A mismatch will cause all authentications to fail.
- Check the local RADIUS logs.
- Also look for any errors that could indicate the API token expired.
- If you see a malformed username in the logs (for example, the user sent “bob” but the log shows “Á”), the server is using MSCHAPv2 to encode the username. Check the VPN device configuration to make sure only PAP authentication is enabled.
- Check the Okta syslog to see why the connection was rejected.
- Check VPN device for any settings that could/would restrict login.
Problem: User not prompted for preferred factor
- The server or client doesn’t support RADIUS challenge
- OpenVPN server does support RADIUS challenge but the free client that is included with it does not support the method and fails.
- Some versions of Cisco’s AnyConnect VPN client have issues with challenge. It is sporadic and upgrading to the latest version usually fixes it.
- VMWare View prior to version 5.1 does not support RADIUS challenge.
- This is not true two-factor auth unless it is paired with AD/LDAP auth! This may or may not be a concern.
- For information on 2FA (to use only the second factor in MFA), see Using the Okta RADIUS App.
Problem: Changes to RADIUS agent config.properties not taking effect
- Changes have been made to RADIUS agent config.properties file, but these changes are not being reflected in the RADIUS Agent.
- The RADIUS Agent must be restarted after making any changes to the config.properties file.
- Changes made in the associated app in the Okta org do NOT require an agent restart. However, the agent may take a few minutes before it retrieves the updated configuration.
- For more information about RADIUS Agent properties see the Additional Properties section in Install and configure the Okta RADIUS Server agent.
RADIUS logs are helpful when troubleshooting
- Windows logs can be found in:
C:\Program Files (x86)\Okta\Okta RADIUS Agent\current\logs
okta_radius.log contains authentication messages, errors, etc.
- Linux logs can be found in:
To gather all logs together use a command similar to:
$ tar -zcvf logs.tar.gz /opt/okta/ragent/logs
- Okta Syslog
- Device logs (Cisco/F5/Netscaler/etc)
To increase the logging level:
- Open the log4j.properties file from the installation folder
Windows: C:\Program Files (x86)\Okta\Okta RADIUS Agent\current\user\config\radius\
- Change all three instances of info to debug. When updated, the entries should resemble:
- log4j.logger.app=debug, app
- log4j.logger.access=debug, access
- log4j.rootLogger=debug, app, stdout
The Okta logs will let you know if we are passing the credentials to an AD agent.
Look for keywords, such as username used to authenticate via RADIUS, and then error messages or warnings.
Logging levels can be managed by editing the log4j.properties file.
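The logging-level change can also be scripted. The sketch below rewrites the level on the three logger entries named in this topic and leaves every other line untouched. It assumes each entry has the form `key=level, appenders`; `set_log_level` is a hypothetical helper, and you should back up log4j.properties before overwriting it.

```python
# The three entries whose level this topic says to change.
LOGGER_KEYS = ("log4j.logger.app", "log4j.logger.access", "log4j.rootLogger")


def set_log_level(properties_text: str, level: str = "debug") -> str:
    """Return the properties text with the level replaced on the logger lines."""
    updated = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in LOGGER_KEYS:
            # The value looks like "info, app, stdout": keep the appenders,
            # replace only the level that precedes the first comma.
            _old_level, comma, appenders = value.partition(",")
            line = f"{key.strip()}={level}{comma}{appenders}"
        updated.append(line)
    return "\n".join(updated) + "\n"
```

Passing `level="info"` to the same function restores the original levels once troubleshooting is finished.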