Capacity planning and sizing
Determining the required capacity of your Okta Access Gateway implementation is crucial to achieving acceptable performance.
For concepts on capacity planning and sizing, see Capacity planning and sizing.
Estimating access rates
Average access rates represent a lower bound on how many accesses a given instance of Access Gateway needs to support. You can estimate average access rates by looking at the sets of users that access the system.
To estimate the average access rate, determine:
- Total users: How many users does this instance serve in total? This value represents the total number of users who might ever access the gateway.
- Estimated daily users: The percentage of users who use an application on a given day.
- Estimated daily accesses: The number of times a specific user accesses an application on a specific day.
- Page accesses for each session: For a given set of authenticated users, what is the expected number of page accesses during a single session?
With these concepts in mind, you can estimate an average authentication rate as:
- Average users = Total users * Estimated daily users
- Average accesses = Average users * Estimated daily accesses
Extrapolate overall accesses by examining:
- Overall accesses = Average accesses * Page accesses.
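The following sketch shows one way to combine these estimates in Python; the input values are purely illustrative assumptions, not Okta-documented figures:

```python
# Minimal sketch of the average access-rate estimate described above.
# All input values are illustrative assumptions.

def estimate_overall_accesses(total_users: int,
                              daily_user_pct: float,
                              daily_accesses_per_user: float,
                              pages_per_session: float) -> float:
    """Estimate daily page accesses handled by a single Access Gateway instance."""
    average_users = total_users * daily_user_pct             # users active on a given day
    average_accesses = average_users * daily_accesses_per_user
    return average_accesses * pages_per_session               # overall page accesses per day

# Example: 10,000 users, 50% active per day, 3 accesses per user, 10 pages per session.
print(estimate_overall_accesses(10_000, 0.50, 3, 10))         # 150000.0
```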
Consider grouping users by the frequency that they access the system:
- Frequent users: Frequent users access the system regularly, typically multiple times per day.
- Infrequent users: Infrequent users access the system on occasion but with a much lower frequency.
- Rare users: Rare users access the system a maximum of one to three times a week.
The following sample demonstrates how to produce an estimate:
- Assume the total number of users is 10,000.
- Frequent users = 10,000 * the percentage of frequent users.
If we assume that 50% of users are frequent users, then we have a baseline of 5,000 frequent users.
Frequent users typically access the system at least five times per day. Use this to calculate the number of frequent accesses as:
- Frequent accesses = frequent users * accesses per day = 5,000 * 5 = 25,000
Infrequent users access the system two or three times a day and represent another 25% of the user base.
- Infrequent users = 10,000 * 25% = 2,500. Each infrequent user accesses the system three times per day, for a total of 7,500 accesses per day.
Rare users represent the remaining 25% of the user base. These users access the system a maximum of once a day, but typically only access the system every several days.
- Rare accesses = total rare users * accesses per day * rarity of access (once every other day) = 2,500 * 1 * 0.5 = 1,250 total accesses.
You can then estimate total daily accesses as:
- Frequent accesses: 25,000
- Infrequent accesses: 7,500
- Rare accesses: 1,250
- This produces a total of 33,750 daily accesses.
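A short sketch of this grouped estimate, using the same illustrative percentages and access frequencies as the example above:

```python
# Grouped daily access estimate using the illustrative values from the example above.
TOTAL_USERS = 10_000

groups = {
    # name: (share of the user base, average accesses per user per day)
    "frequent":   (0.50, 5),    # at least five accesses per day
    "infrequent": (0.25, 3),    # roughly three accesses per day
    "rare":       (0.25, 0.5),  # about once every other day
}

daily_accesses = sum(TOTAL_USERS * share * rate for share, rate in groups.values())
print(daily_accesses)  # 33750.0
```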
Sizing
When calculating sizing, consider the following areas:
- Cores - Total number of CPUs/cores.
- RAM - Typical memory requirements.
- Storage - Required disk space, primarily for logging purposes.
- NIC - Network throughput requirements.
Terms and definitions:
Term | Description
---|---
Application | A reference to an application as defined in the Access Gateway Admin UI console and listed on the Applications page.
Authentication | Authentication, or AuthN, is the process of establishing identity. Authentication occurs the first time a user accesses a given application within Access Gateway. Authentication may also occur at other times when accessing application resources.
Authorization | Authorization, or AuthZ, is the process of determining access rights to a given page or resource. Authorization occurs every time a user attempts to access an application resource, such as a page.
Session | An Access Gateway session, or simply session, refers to the information maintained and used by Access Gateway. Typically this includes all the information in a traditional HTTP(S) session as well as Access Gateway-specific session data, such as attributes and possibly Kerberos tickets (when in use).
Memory Sizing
Access Gateway appliance memory use is divided into:
- OS, Access Gateway engine, and micro-services, with 1.5GB considered the minimum for production environments.
- Cached Sessions: 128MB minimum.
Because OS, core Access Gateway, and micro-services memory is fixed, determining memory requirements is primarily a matter of sizing the session cache.
To determine memory size, examine:
- Total sessions - The maximum number of in-memory sessions at any given time.
- Average session size - The average expected size of any given session.
Total sessions are calculated using:
- Number of users
- Percentage of user sign-in events per day
- Applications accessed
- Total sessions = #users x % sign in per day x applications accessed
Session size is a function of:
- Application session and application attributes, with a default size of ~1024B
- Kerberos tickets (where applicable), ~1024B each, but often larger depending on the number of IIS applications accessed
Session cache then becomes:
- Session cache = Total sessions * (average session size * 2)
For example:
Web Application - Session Cache

Users | Percentage sign-in events/day | Applications accessed | Total sessions (Users * % sign-ins/day * applications accessed) | Session size | Session cache
---|---|---|---|---|---
5,000 | 50% | 5 | 12,500 | 1024B * 2 | ~25MB
10,000 | 75% | 10 | 75,000 | 1024B * 2 | ~150MB
25,000 | 50% | 100 | 125,000 | 1024B * 2 | ~500MB
Kerberos, when used, adds additional caching requirements:
Kerberos Apps - Reserved

Users | Percentage sign-in events/day | IIS applications accessed | Total sessions (Users * % sign-ins/day * applications accessed) | Session size | Session cache
---|---|---|---|---|---
10,000 | 50% | 5 | 25,000 | 1024B | ~50MB
Total appliance memory should then include at least 1.5GB for fixed requirements, plus the session cache and any Kerberos requirements.
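A small sketch of this memory estimate, using the session cache formula and the illustrative values from the tables above (adjust the inputs for your environment):

```python
# Memory-sizing sketch based on the session cache formula described above.
# Input values mirror illustrative table rows, not measured figures.

FIXED_MEMORY_GB = 1.5             # OS, Access Gateway engine, and micro-services (minimum)
WEB_SESSION_BYTES = 1024 * 2      # application session and attributes, doubled per the formula
KERBEROS_TICKET_BYTES = 1024      # additional per-session cost when Kerberos is in use

def session_cache_bytes(users: int, signin_pct: float, apps: int, per_session: int) -> float:
    """Session cache = total sessions * per-session size."""
    total_sessions = users * signin_pct * apps
    return total_sessions * per_session

web_cache = session_cache_bytes(10_000, 0.75, 10, WEB_SESSION_BYTES)          # the ~150MB row
kerberos_cache = session_cache_bytes(10_000, 0.50, 5, KERBEROS_TICKET_BYTES)  # Kerberos-reserved row

total_gb = FIXED_MEMORY_GB + (web_cache + kerberos_cache) / 1024**3
print(f"Estimated appliance memory: {total_gb:.2f} GB")
```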
Session considerations:
- Sessions are cleared using a Least Recently Used (LRU) algorithm. When the cache is full and new sessions are created, the oldest idle session is removed.
- Session monitoring: the logger raises alerts for cache near-full and full conditions, and you can find statistics in the management console. Consider increasing appliance memory to reduce cache-full situations.
- Always consider peak session usage and plan accordingly. Size for peak conditions, such as the time of year when employees are enrolled, or morning and after-lunch sign-in events.
Hard Disk Sizing
Overview
Access Gateway requires hard disk for software, system logs/log archives, and backups.
Disk use comprises:
- Software, including operating system, and Access Gateway.
- Backups, performed nightly and retained for 30 days.
- System log output, spooled to local disk, and including Audit, Access and All Log files.
- Log archives, maintained for 30 days, rolled and compressed.
Software and backup size requirements are typically small, making log size the primary consideration.
Log Entries
Log entries primarily contain session information, Authentication (AuthN), and Authorization (AuthZ) content. Typically, there is one entry per HTTP(S) request.
To correctly size a disk, the number and size of these entries over a given time period must be determined.
To determine the size of log entries, consider the number of system users, the number of times users access applications (AuthN entries), and the number of subsequent page views for each application (AuthZ events).
Overall, the composition of a log entry is based on:
- Session - Each log entry includes Access Gateway session information.
- AuthN - Authentication audit and logging information.
- AuthZ - Authorization log information, including resource accessed and policy rules.
From a disk use perspective, the size of each of these components is the other important consideration.
Examples
Let's look at an example.
Assume a user base of 10,000 users, of which 75% access the system each day, and each user accesses 10 applications on average. You can determine average daily accesses as:
- Active Users = total users * access percentage
10,000 * 75% = 7500 active users.
Each user accessing 10 applications:
- Accesses per day = Active Users * applications accessed
7500 * 10 = 75,000 accesses per day.
Assuming that each access requires a session, authentication, and authorization, you can then determine an estimated log entry size as:
- Log entry size = Access Gateway session size + AuthN size + AuthZ size + some small formatting overhead.
For a more realistic example, let's assume that Access Gateway session, AuthZ, and AuthN sizes are roughly the same, ~1024B each. Each access would then require approximately 3 KB.
Assuming that access is more or less evenly distributed over the course of 24 hours, there are approximately 75,000/24 or 3125 accesses per hour.
If each access produces a 3 KB log entry, then hourly log growth becomes 3 KB * 3125, or roughly 9.4 MB/hour and approximately 230 MB a day. Assuming a consistent access pattern, you would need ~7,000 MB of disk per month for logs and log-related content.
A reasonable rule is to allocate twice the expected consumption plus additional overhead space for software updates, configuration, and backups. In this example, that produces a disk requirement of approximately 14GB plus 10-20%, or roughly 15-17GB/month.
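A brief sketch of this disk estimate, using decimal (1000-based) units to match the rounded figures above; the entry size, access counts, and overhead factor are illustrative assumptions:

```python
# Disk-sizing sketch following the log-growth example above (decimal units for simplicity).
LOG_ENTRY_KB = 3            # session + AuthN + AuthZ content per access (~1024B each)
ACCESSES_PER_DAY = 75_000   # 10,000 users * 75% active * 10 applications
OVERHEAD = 0.15             # headroom for software updates, configuration, and backups

daily_mb = ACCESSES_PER_DAY * LOG_ENTRY_KB / 1000     # ~225 MB of logs per day
monthly_gb = daily_mb * 30 / 1000                     # ~6.8 GB of logs per month
recommended_gb = monthly_gb * 2 * (1 + OVERHEAD)      # 2x expected use plus overhead

print(f"Daily log growth: {daily_mb:.0f} MB")
print(f"Recommended monthly disk allocation: {recommended_gb:.1f} GB")
```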
Hard Disk considerations:
- Monitor logger alerts to avoid low disk conditions during maximum or peak request loads. The disk check runs hourly, warning at 70% usage and alerting at 90%.
- Every HTTP request results in audit and access logs.
- Faster disk IO improves throughput.
- Session size affects audit logging because authorization and audit logs contain session contents.
- Don't be conservative in hard disk sizing. Allocate 2x the estimated disk requirements to avoid low disk warnings caused by bursts and large page requests.
CPU Sizing
The Access Gateway engine automatically scales across CPUs, resulting in one worker per CPU. Each additional core adds a worker thread, allowing for additional parallel processing.
CPU considerations:
- More CPUs/cores improve capacity.
- Network throughput is typically the bottleneck.
Throughput Sizing
Throughput is a direct function of AuthN, AuthZ, and return content.
- AuthNs = SAML assertions processed (from Okta to each application).
- AuthZs = Policy check per HTTP request (all HTTP requests).
Assuming the following values (see logs for actual values):
AuthN Bytes | AuthZ Bytes | Returned Data
---|---|---
1024B | 1024B | 2048B
Network throughput is a function of:
- Sign-ins per second * AuthN size
- AuthZ requests per second * AuthZ size
- Requests per second * average returned data size
Total network throughput then becomes approximately:
- Total throughput = requests/sec * (AuthN size + AuthZ size + average returned data size)
Assuming that the dominant factor in network throughput is the amount of data returned per request, and that an average response is ~20KB, 500 requests per second results in 20KB * 500, or 10 MB/s.
Simplifying:
- Average network bandwidth = Average response size * Average request arrival rate.
Network Requirements

Requests/second | AuthN Size | AuthZ Size | Returned Data | Total
---|---|---|---|---
500 | 1024B | 1024B | 2048B | ~2MB/s
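A brief sketch of the simplified bandwidth calculation, using the illustrative request rate and per-request sizes from the table (check the AuthN and All logs for actual values):

```python
# Network throughput sketch using the simplified bandwidth formula above.
REQUESTS_PER_SECOND = 500
AUTHN_BYTES = 1024       # SAML assertion processing per sign-in
AUTHZ_BYTES = 1024       # policy check per HTTP request
RETURNED_BYTES = 2048    # average returned data per request

bytes_per_second = REQUESTS_PER_SECOND * (AUTHN_BYTES + AUTHZ_BYTES + RETURNED_BYTES)
print(f"Estimated throughput: {bytes_per_second / 1_000_000:.1f} MB/s")  # ~2.0 MB/s
```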
Exact timing information can be found in the AuthN and All logs. Total time to perform a request and return data is also tracked.
Instance Sizing
Consider the following table when sizing instances:
Use | Physical/virtual hardware | AWS Equivalent
---|---|---
Proof of concept | 1 instance of | t2.medium
Small | 2 instances of 2 cores at 4G memory, 220G HD, each with a single 1 Gbps NIC | t2.medium
Medium | 3 instances of 2 cores at 8G memory, 500G HD, each with a single 1 Gbps NIC | m4.large
Large | 3 instances of 4 cores at 16G memory, 500G HD, each with a single 1 Gbps NIC | m4.xlarge

See AWS Instance Types.
Scaling
Scaling is the process of increasing or decreasing an Access Gateway cluster size.
Clusters can be:
- Scaled vertically - Adding or removing memory, disk or CPU from a given instance.
- Scaled horizontally - Adding or removing Access Gateway instances from a cluster.
Okta recommends defining all Access Gateway high availability cluster members similarly with the same CPU, memory, and disk configurations.
When examining cluster performance, consider the following:
- For a given instance, the best performance increases can be made by adding CPU or using solid state disk.
- To improve overall cluster throughput, Okta recommends horizontal scaling or adding additional Access Gateway instances.
For example, for a two-node cluster that handles 1500 requests, you can double the capacity by adding two additional nodes with the same CPU, memory, and disk configuration.
- In general, horizontal scaling is linear due to Access Gateway's use of sticky sessions (session affinity). Access Gateway does not share sessions between nodes.
Capacity may be limited by other factors not related to Access Gateway, such as network throughput or the back-end application performance.
See Network interfaces for more information on Access Gateway networking and expanding networking throughput.