Capacity planning and sizing
Determining the required capacity of your Okta Access Gateway implementation is crucial to achieving acceptable performance.
For concepts on capacity planning and sizing, see Capacity planning and sizing.
Estimating access rates
Average access rates represent a lower bound on how many accesses a given instance of Access Gateway needs to support. You can estimate average access rates by looking at the sets of users that access the system.
To estimate the average access rate, determine:
- Total users: How many users does this instance serve in total? This value represents the total number of users who might ever access the gateway.
- Estimated daily users: The percentage of users who use an application on a given day.
- Estimated daily accesses: The number of times a specific user accesses an application on a specific day.
- Page accesses for each session: For a given set of authenticated users, what is the expected number of page accesses during a single session?
With these concepts in mind, you can estimate an average authentication rate as:
- Average users = Total users * Estimated daily users
- Average accesses = Average users * Estimated daily accesses
Extrapolate overall accesses by examining:
- Overall accesses = Average accesses * Page accesses.
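The following sketch shows one way to combine these estimates in Python; the input values are purely illustrative assumptions, not Okta-documented figures:

```python
# Minimal sketch of the average access-rate estimate described above.
# All input values are illustrative assumptions.

def estimate_overall_accesses(total_users: int,
                              daily_user_pct: float,
                              daily_accesses_per_user: float,
                              pages_per_session: float) -> float:
    """Estimate daily page accesses handled by a single Access Gateway instance."""
    average_users = total_users * daily_user_pct             # users active on a given day
    average_accesses = average_users * daily_accesses_per_user
    return average_accesses * pages_per_session               # overall page accesses per day

# Example: 10,000 users, 50% active per day, 3 accesses per user, 10 pages per session.
print(estimate_overall_accesses(10_000, 0.50, 3, 10))         # 150000.0
```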
Consider grouping users by the frequency that they access the system:
- Frequent users: Frequent users access the system regularly, typically multiple times per day.
- Infrequent users: Infrequent users access the system on occasion but with a much lower frequency.
- Rare users: Rare users access the system a maximum of one to three times a week.
The following sample demonstrates how to produce an estimate:
- Assume the total number of users is 10,000.
- Frequent users = 10,000 * the percentage of frequent users.
If we assume that 50% of users are frequent users, then we have a baseline of 5,000 frequent users.
Frequent users typically access the system at least five times per day. Use this to calculate the number of frequent accesses as:
- Frequent accesses = frequent users * accesses per day = 5,000 * 5 = 25,000
Infrequent users access the system two or three times a day and represent another 25% of the user base.
- Infrequent users = 10,000 * 25% = 2,500. Each infrequent user accesses the system three times per day, for a total of 7,500 accesses per day.
Rare users represent the remaining 25% of the user base. These users access the system a maximum of once a day, but typically only access the system every several days.
- Rare accesses = total rare users * accesses per day * rarity of access (once every other day) = 2,500 * 1 * 0.5 = 1,250 total accesses.
You can then estimate total daily accesses as:
- Frequent accesses: 25,000
- Infrequent accesses: 7,500
- Rare accesses: 1,250
- This produces a total of 33,750 daily accesses.
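A short sketch of this grouped estimate, using the same illustrative percentages and access frequencies as the example above:

```python
# Grouped daily access estimate using the illustrative values from the example above.
TOTAL_USERS = 10_000

groups = {
    # name: (share of the user base, average accesses per user per day)
    "frequent":   (0.50, 5),    # at least five accesses per day
    "infrequent": (0.25, 3),    # roughly three accesses per day
    "rare":       (0.25, 0.5),  # about once every other day
}

daily_accesses = sum(TOTAL_USERS * share * rate for share, rate in groups.values())
print(daily_accesses)  # 33750.0
```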
Sizing
When calculating sizing, consider the following areas:
- Cores - Total number of CPUs/cores.
- RAM - Typical memory requirements.
- Storage - Required disk space, primarily for logging purposes.
- NIC - Network throughput requirements.
Terms and definitions:
Term | Description
---|---
Application | A reference to an application as defined in the Access Gateway Admin UI console and listed on the Applications page.
Authentication | Authentication, or AuthN, is the process of establishing identity. Authentication occurs the first time a user accesses a given application within Access Gateway. Authentication may also occur at other times when accessing application resources.
Authorization | Authorization, or AuthZ, is the process of determining access rights to a given page or resource. Authorization occurs every time a user attempts to access an application resource, such as a page.
Session | An Access Gateway session, or simply session, refers to the information maintained and used by Access Gateway. Typically this includes all the information in a traditional HTTP(S) session as well as Access Gateway-specific session data, such as attributes and possibly Kerberos tickets (when in use).
Memory Sizing
Access Gateway appliance memory use is divided into:
- OS, Access Gateway engine, and micro-services, with 1.5GB considered the minimum for production environments.
- Cached Sessions: 128MB minimum.
Because OS, core Access Gateway, and micro-services memory is fixed, determining memory requirements is primarily a matter of sizing the session cache.
To determine memory size, examine:
- Total sessions - The maximum number of in-memory sessions at any given time.
- Average session size - The average expected size of any given session.
Total sessions are calculated using:
- Number of users
- Percentage of user sign-in events per day
- Applications accessed
- Total sessions = #users x % sign in per day x applications accessed
Session size is a function of:
- Application session and application attributes, with a default size of ~1024B
- Kerberos tickets (where applicable), ~1024B each, but often larger depending on the number of IIS applications accessed
Session cache then becomes:
- Session cache = Total sessions * (average session size * 2)
For example:
Web Application - Session Cache

Users | Percentage sign-in events/day | Applications accessed | Total sessions (Users * % sign-ins/day * applications accessed) | Session size | Session cache
---|---|---|---|---|---
5,000 | 50% | 5 | 12,500 | 1024B * 2 | ~25MB
10,000 | 75% | 10 | 75,000 | 1024B * 2 | ~150MB
25,000 | 50% | 100 | 125,000 | 1024B * 2 | ~500MB
Kerberos, when used, adds additional caching requirements:
Kerberos Apps - Reserved

Users | Percentage sign-in events/day | IIS applications accessed | Total sessions (Users * % sign-ins/day * applications accessed) | Session size | Session cache
---|---|---|---|---|---
10,000 | 50% | 5 | 25,000 | 1024B | ~50MB
Total appliance memory should then include at least 1.5GB for fixed requirements, plus the session cache and any Kerberos requirements.
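A small sketch of this memory estimate, using the session cache formula and the illustrative values from the tables above (adjust the inputs for your environment):

```python
# Memory-sizing sketch based on the session cache formula described above.
# Input values mirror illustrative table rows, not measured figures.

FIXED_MEMORY_GB = 1.5             # OS, Access Gateway engine, and micro-services (minimum)
WEB_SESSION_BYTES = 1024 * 2      # application session and attributes, doubled per the formula
KERBEROS_TICKET_BYTES = 1024      # additional per-session cost when Kerberos is in use

def session_cache_bytes(users: int, signin_pct: float, apps: int, per_session: int) -> float:
    """Session cache = total sessions * per-session size."""
    total_sessions = users * signin_pct * apps
    return total_sessions * per_session

web_cache = session_cache_bytes(10_000, 0.75, 10, WEB_SESSION_BYTES)          # the ~150MB row
kerberos_cache = session_cache_bytes(10_000, 0.50, 5, KERBEROS_TICKET_BYTES)  # Kerberos-reserved row

total_gb = FIXED_MEMORY_GB + (web_cache + kerberos_cache) / 1024**3
print(f"Estimated appliance memory: {total_gb:.2f} GB")
```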
Session considerations:
- Sessions are cleared using a Least Recently Used (LRU) algorithm. When the cache is full and new sessions are created, the oldest idle session is removed.
- Session monitoring: the logger raises alerts for cache near-full and full conditions, and you can find statistics in the management console. Consider increasing appliance memory to reduce cache-full situations.
- Always consider peak session usage and plan accordingly. Size for peak conditions, such as the time of year when employees are enrolled, or morning and after-lunch sign-in events.
Hard Disk Sizing
Overview
Access Gateway requires hard disk for software, system logs/log archives, and backups.
Disk use comprises:
- Software, including operating system, and Access Gateway.
- Backups, performed nightly and retained for 30 days.
- System log output, spooled to local disk, and including Audit, Access and All Log files.
- Log archives, maintained for 30 days, rolled and compressed.
Software and backup size requirements are typically small, making log size the primary consideration.
Log Entries
Log entries primarily contain session information, Authentication (AuthN), and Authorization (AuthZ) content. Typically, there is one entry per HTTP(S) request.
To correctly size a disk, the number and size of these entries over a given time period must be determined.
To determine the size of log entries, consider the number of system users, the number of times users access applications (AuthN entries), and the number of subsequent page views for each application (AuthZ events).
Overall, the composition of a log entry is based on:
- Session - Each log entry includes Access Gateway session information.
- AuthN - Authentication audit and logging information.
- AuthZ - Authorization log information, including resource accessed and policy rules.
From a disk use perspective, the size of each of these components is the other important consideration.
Examples
Let's look at an example.
Assume a user base of 10,000 users, of which 75% access the system each day, and each user accesses 10 applications on average. You can determine average daily accesses as:
- Active Users = total users * access percentage
10,000 * 75% = 7500 active users.
Each user accessing 10 applications:
- Accesses per day = Active Users * applications accessed
7500 * 10 = 75,000 accesses per day.
Assuming that each access requires a session, authentication, and authorization, you can then determine an estimated log entry size as:
- Log entry size = Access Gateway session size + AuthN size + AuthZ size + some small formatting overhead.
For a more realistic example, let's assume that Access Gateway session, AuthZ, and AuthN sizes are roughly the same, ~1024B each. Each access would then require approximately 3 KB.
Assuming that access is more or less evenly distributed over the course of 24 hours, there are approximately 75,000/24 or 3125 accesses per hour.
If each access produces a 3 KB log entry, then hourly log growth becomes 3 KB * 3125, or roughly 9.4 MB/hour and approximately 230 MB a day. Assuming a consistent access pattern, you would need ~7,000 MB of disk per month for logs and log-related content.
A reasonable rule is to allocate twice the expected consumption plus additional overhead space for software updates, configuration, and backups. In this example, that produces a disk requirement of approximately 14GB plus 10-20%, or roughly 15-17GB/month.
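A brief sketch of this disk estimate, using decimal (1000-based) units to match the rounded figures above; the entry size, access counts, and overhead factor are illustrative assumptions:

```python
# Disk-sizing sketch following the log-growth example above (decimal units for simplicity).
LOG_ENTRY_KB = 3            # session + AuthN + AuthZ content per access (~1024B each)
ACCESSES_PER_DAY = 75_000   # 10,000 users * 75% active * 10 applications
OVERHEAD = 0.15             # headroom for software updates, configuration, and backups

daily_mb = ACCESSES_PER_DAY * LOG_ENTRY_KB / 1000     # ~225 MB of logs per day
monthly_gb = daily_mb * 30 / 1000                     # ~6.8 GB of logs per month
recommended_gb = monthly_gb * 2 * (1 + OVERHEAD)      # 2x expected use plus overhead

print(f"Daily log growth: {daily_mb:.0f} MB")
print(f"Recommended monthly disk allocation: {recommended_gb:.1f} GB")
```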
Hard Disk considerations:
- Monitor logger alerts to avoid low disk conditions during maximum or peak request loads. The disk check runs hourly, warning at 70% usage and alerting at 90%.
- Every HTTP request results in audit and access logs.
- Faster disk IO improves throughput.
- Session size affects audit logging because authorization and audit logs contain session contents.
- Don't be conservative in hard disk sizing. Allocate 2x the estimated disk requirements to avoid low disk warnings caused by bursts and large page requests.
CPU Sizing
The Access Gateway engine automatically scales across CPUs, resulting in one worker per CPU. Each additional core adds a worker thread, allowing for additional parallel processing.
CPU considerations:
- More CPUs/cores improve capacity.
- Network throughput is typically the bottleneck.
Throughput Sizing
Throughput is a direct function of AuthN, AuthZ, and return content.
- AuthNs = SAML assertions processed (from Okta to each application).
- AuthZs = Policy check per HTTP request (all HTTP requests).
Assuming the following values (see logs for actual values):
AuthN Bytes | AuthZ Bytes | Returned Data
---|---|---
1024B | 1024B | 2048B
Network throughput is a function of:
- Sign-ins per second * AuthN size
- AuthZ requests per second * AuthZ size
- Requests per second * average returned data size
Total network throughput then becomes approximately:
- Total throughput = requests/sec * (AuthN size + AuthZ size + average returned data size)
Assuming that the dominant factor in network throughput is the amount of data returned per request, and that an average response is ~20KB, 500 requests per second results in 20KB * 500, or 10 MB/s.
Simplifying:
- Average network bandwidth = Average response size * Average request arrival rate.
Network Requirements

Requests/second | AuthN Size | AuthZ Size | Returned Data | Total
---|---|---|---|---
500 | 1024B | 1024B | 2048B | ~2MB/s
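A brief sketch of the simplified bandwidth calculation, using the illustrative request rate and per-request sizes from the table (check the AuthN and All logs for actual values):

```python
# Network throughput sketch using the simplified bandwidth formula above.
REQUESTS_PER_SECOND = 500
AUTHN_BYTES = 1024       # SAML assertion processing per sign-in
AUTHZ_BYTES = 1024       # policy check per HTTP request
RETURNED_BYTES = 2048    # average returned data per request

bytes_per_second = REQUESTS_PER_SECOND * (AUTHN_BYTES + AUTHZ_BYTES + RETURNED_BYTES)
print(f"Estimated throughput: {bytes_per_second / 1_000_000:.1f} MB/s")  # ~2.0 MB/s
```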
Exact timing information can be found in the AuthN and All logs. Total time to perform a request and return data is also tracked.
Instance Sizing
Consider the following table when sizing instances:
Use | Physical/virtual hardware | AWS Equivalent
---|---|---
Proof of concept | 1 instance of | t2.medium
Small | 2 instances of 2 cores at 4G memory, 220G HD, each with a single 1 Gbps NIC | t2.medium
Medium | 3 instances of 2 cores at 8G memory, 500G HD, each with a single 1 Gbps NIC | m4.large
Large | 3 instances of 4 cores at 16G memory, 500G HD, each with a single 1 Gbps NIC | m4.xlarge

See AWS Instance Types.
Scaling
Scaling is the process of increasing or decreasing an Access Gateway cluster size.
Clusters can be:
- Scaled vertically - Adding or removing memory, disk or CPU from a given instance.
- Scaled horizontally - Adding or removing Access Gateway instances from a cluster.
Okta recommends defining all Access Gateway high availability cluster members similarly with the same CPU, memory, and disk configurations.
When examining cluster performance, consider the following:
- For a given instance, the best performance increases can be made by adding CPU or using solid state disk.
- To improve overall cluster throughput, Okta recommends horizontal scaling or adding additional Access Gateway instances.
For example, for a two-node cluster that handles 1500 requests, you can double the capacity by adding two additional nodes with the same CPU, memory, and disk configuration.
- In general, horizontal scaling is linear due to Access Gateway's use of sticky sessions (session affinity). Access Gateway does not share sessions between nodes.
Capacity may be limited by other factors not related to Access Gateway, such as network throughput or the back-end application performance.
See Network interfaces for more information on Access Gateway networking and expanding networking throughput.