Capacity planning and sizing

Determining the required capacity of your Okta Access Gateway implementation is crucial to achieving acceptable performance.

For concepts on capacity planning and sizing, see About Access Gateway capacity planning and sizing.

Estimating access rates

Average access rates represent a general lower bound on how many accesses a given instance of Access Gateway needs to support. We can estimate average access rates by looking at sets of users that access the system.

Estimating the average access rate requires the determination of:

  • Total users - The total number of users served by this instance, including all users who might ever access the gateway.
  • Estimated daily users - The percentage of users who actually use an application in a given day.
  • Estimated daily accesses - The number of times a given user accesses an application in a given day.
  • Page accesses per login - The number of page accesses expected from an authenticated user during a single session.

With these concepts in mind, we can estimate an average authentication rate as:

Average users = Total users * Estimated daily users

Average accesses = Average users * Estimated daily accesses

We can then extrapolate overall accesses as:

Overall accesses = Average accesses * Page accesses per login
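The three formulas above can be sketched in Python as follows. The function and parameter names are illustrative and not part of Access Gateway, and the inputs in the usage line (2 logins per active user, 20 page accesses per login) are hypothetical planning values.

```python
def estimate_accesses(total_users: int, daily_user_fraction: float,
                      accesses_per_user_per_day: float,
                      page_accesses_per_login: float) -> dict:
    """Estimate average daily users, accesses, and page accesses.

    Hypothetical helper: inputs are planning estimates, not values
    read from Access Gateway.
    """
    # Average users = Total users * Estimated daily users
    average_users = total_users * daily_user_fraction
    # Average accesses = Average users * Estimated daily accesses
    average_accesses = average_users * accesses_per_user_per_day
    # Overall accesses = Average accesses * Page accesses per login
    overall_accesses = average_accesses * page_accesses_per_login
    return {
        "average_users": average_users,
        "average_accesses": average_accesses,
        "overall_accesses": overall_accesses,
    }


# Hypothetical inputs: 10,000 users, 50% active daily,
# 2 logins per active user, 20 page accesses per login.
print(estimate_accesses(10_000, 0.5, 2, 20))
```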

For example, consider three groups of users, each accessing the system at a different level:

  • Frequent users - Frequent users access the system regularly, typically multiple times per day.
  • Infrequent users - Infrequent users access the system occasionally, at a much lower frequency.
  • Rare users - Rare users access the system at most once a day, and typically only once every several days.

Assume a total of 10,000 users.

If 50% of users are frequent users, then we have a baseline of:

Frequent users = 10,000 * 50% = 5,000

Frequent users typically access the system no less than 5 times per day. We can calculate frequent accesses as:

Frequent accesses = Frequent users * accesses/day = 5,000 * 5 = 25,000


Infrequent users access the system 2-3 times a day and represent another 25% of the user base, or 2,500 users. Using the upper end of that range, 3 accesses per day:

Infrequent accesses = Infrequent users * accesses/day = 2,500 * 3 = 7,500

Rare users represent the remaining 25% of users, or 2,500. These users access the system at most once a day, but typically only every several days. Assuming one access every other day (a rarity factor of 0.5):

Rare accesses = Rare users * accesses/day * rarity of access = 2,500 * 1 * 0.5 = 1,250

We can then estimate total daily accesses as:

  • Frequent accesses: 25,000
  • Infrequent accesses: 7,500
  • Rare accesses: 1,250
  • Total accesses per day: 33,750
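The worked example can be reproduced with a short Python sketch. The group shares and per-day rates are the example's assumptions (50%/25%/25% of 10,000 users at 5, 3, and 0.5 accesses per day); `GROUPS` and `daily_accesses` are illustrative names only.

```python
# Each tuple: (group name, share of total users, accesses per user per day).
# Assumed values from the worked example: rare users access the system
# about once every other day, modeled as 0.5 accesses per day.
GROUPS = [
    ("frequent", 0.50, 5),
    ("infrequent", 0.25, 3),
    ("rare", 0.25, 0.5),
]


def daily_accesses(total_users: int, groups=GROUPS) -> dict:
    """Return estimated accesses per day for each user group plus a total."""
    result = {}
    for name, share, per_day in groups:
        users = total_users * share
        result[name] = users * per_day
    # Sum the per-group values before the "total" key is added.
    result["total"] = sum(result.values())
    return result


print(daily_accesses(10_000))
# {'frequent': 25000.0, 'infrequent': 7500.0, 'rare': 1250.0, 'total': 33750.0}
```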

Sizing

When calculating sizing, we consider the following areas:

  • Cores - Total number of CPUs/cores.
  • RAM - Typical memory requirements.
  • Storage - Disk space requirements, primarily for logging.
  • NIC - Network throughput requirements.

Terms and definitions:

  • Application - A reference to an application as defined in the Access Gateway UI and listed on the Applications page.
  • Authentication - Authentication, or AuthN, is the process of establishing identity. Authentication occurs the first time a user accesses a given application within Access Gateway, and may also occur at other times when accessing application resources.
  • Authorization - Authorization, or AuthZ, is the process of determining access rights to a given page or resource. Authorization occurs every time a user attempts to access an application resource, such as a page.
  • Session - An Access Gateway session, or simply session, refers to the information maintained and used by Access Gateway. Typically this includes all the information in a traditional HTTP(S) session as well as Access Gateway-specific session data, such as attributes and, when in use, Kerberos tickets.

Memory Sizing

Access Gateway appliance memory use is divided into:

  • OS, Access Gateway engine, and micro-services, with 1.5GB considered the minimum for production environments.
  • Cached Sessions: 128MB minimum.

Because OS, core Access Gateway, and micro-services memory is fixed, determining memory requirements focuses primarily on session cache sizing.


To determine memory size, examine:

  • Total sessions - The maximum number of in-memory sessions at any given time.
  • Average session size - The average expected size of any given session.

Total sessions are calculated from:

  • Number of users
  • Percentage of users logging in per day
  • Applications accessed

That is:

  • Total sessions = #users x % login per day x applications accessed

Session size is a function of:

  • Application session and application attributes, with a default size of ~1024B
  • Kerberos tickets (where applicable), ~1024B, but often larger depending on the number of IIS applications accessed

Session cache then becomes:

  • Session cache = Total sessions * (average session size * 2)
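A minimal Python sketch of the two formulas above follows. It assumes the ~1024B default session size and decimal megabytes for display; `session_cache_bytes` is an illustrative name, not an Access Gateway API. The loop reproduces the first two rows of the example table that follows.

```python
def session_cache_bytes(users: int, login_fraction: float,
                        apps_accessed: int, session_size: int = 1024) -> int:
    """Estimate session cache size in bytes (hypothetical helper).

    Total sessions = users * login fraction per day * applications accessed
    Session cache  = total sessions * (average session size * 2)
    """
    total_sessions = round(users * login_fraction * apps_accessed)
    return total_sessions * session_size * 2


# Decimal megabytes (10**6 bytes), matching the "~25MB" style of this section.
for users, pct, apps in [(5_000, 0.50, 5), (10_000, 0.75, 10)]:
    cache = session_cache_bytes(users, pct, apps)
    print(f"{users} users: {cache / 1_000_000:.1f} MB")
# 5000 users: 25.6 MB
# 10000 users: 153.6 MB
```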

For example:

Web Application - Session Cache

Users | Login % per day | Applications accessed | Total sessions (users * login % * applications) | Session size | Session cache
5,000 | 50% | 5 | 12,500 | 1024B * 2 | ~25MB
10,000 | 75% | 10 | 75,000 | 1024B * 2 | ~150MB
25,000 | 50% | 100 | 1,250,000 | 1024B * 2 | ~2.5GB

Kerberos, when used, adds additional caching requirements:

Kerberos Apps - Reserved

Users | Login % per day | IIS applications accessed | Total sessions (users * login % * applications) | Session size | Session cache
10,000 | 50% | 5 | 25,000 | 1024B * 2 | ~50MB

Total appliance memory should then include, at a minimum, 1.5GB for fixed requirements plus the session cache and any Kerberos cache requirements.
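A minimal Python sketch of this total-memory rule follows, using the 1.5GB fixed minimum and the 128MB cached-session minimum listed earlier. Applying the 128MB floor to the web session cache only, and using decimal megabytes, are assumptions of this sketch; `total_memory_bytes` is a hypothetical name. The example inputs correspond to the 10,000-user web row (75,000 sessions) and the Kerberos row (25,000 sessions) in the tables above.

```python
MB = 1_000_000                  # decimal megabyte (assumption for this sketch)
FIXED_BYTES = 1_500 * MB        # OS, engine, and micro-services (1.5GB minimum)
MIN_SESSION_CACHE = 128 * MB    # minimum cached-session allocation


def total_memory_bytes(session_cache: int, kerberos_cache: int = 0) -> int:
    """Fixed requirement + session cache (at least 128MB) + Kerberos cache.

    Hypothetical helper; the 128MB floor is applied to the web session
    cache only, which is an assumption.
    """
    return FIXED_BYTES + max(session_cache, MIN_SESSION_CACHE) + kerberos_cache


# 75,000 sessions * 2048B = 153.6MB web cache;
# 25,000 sessions * 2048B = 51.2MB Kerberos cache.
need = total_memory_bytes(153_600_000, 51_200_000)
print(f"{need / MB:.1f} MB")  # 1704.8 MB
```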

Note

Session considerations:

  • Sessions are cleared using a Least Recently Used, or LRU, algorithm.
    When the cache is full and a new session is created, the least recently used (oldest idle) session is removed.
  • The Session Monitoring logger raises alerts for cache-near-full and cache-full conditions.
    Statistics can be found in the management console.
    Consider increasing appliance memory to reduce cache-full situations.
  • Always plan for peak session usage.
    Size for peak conditions, such as the time of year when employees are enrolled, or morning and after-lunch logins.
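To illustrate the eviction behavior described above, here is a generic least-recently-used cache in Python built on `collections.OrderedDict`. It is a hypothetical sketch, not Access Gateway's session store; `LRUSessionCache` and the session IDs are illustrative.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUSessionCache:
    """Illustrative LRU session cache (hypothetical, not Access Gateway code).

    When the cache is full, adding a new session evicts the least
    recently used session.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._sessions = OrderedDict()

    def get(self, session_id: str) -> Optional[Any]:
        """Return session data and mark it as most recently used."""
        if session_id not in self._sessions:
            return None
        self._sessions.move_to_end(session_id)
        return self._sessions[session_id]

    def put(self, session_id: str, data: Any) -> Optional[str]:
        """Store a session; return the evicted session ID, if any."""
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)
        self._sessions[session_id] = data
        if len(self._sessions) > self.capacity:
            # The front of the OrderedDict is the least recently used entry.
            evicted_id, _ = self._sessions.popitem(last=False)
            return evicted_id
        return None


cache = LRUSessionCache(capacity=2)
cache.put("s1", {"user": "alice"})
cache.put("s2", {"user": "bob"})
cache.get("s1")                            # s1 becomes most recently used
print(cache.put("s3", {"user": "carol"}))  # s2
```

Ordering entries by last use means eviction is a constant-time removal from the front, which is why an ordered map is a common way to express LRU.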

Hard Disk Sizing

Overview

Access Gateway requires hard disk space for software, system logs and log archives, and backups.
Disk use consists of:

  • Software, including the operating system and Access Gateway.
  • Backups, performed nightly and retained for 30 days.
  • System log output, spooled to local disk, including Audit, Access, and All log files.
  • Log archives, maintained for 30 days, rolled and compressed.

Software and backup size requirements are typically small, making log size the primary consideration.

Log Entries

Log entries primarily contain session information and Authentication (AuthN) and Authorization (AuthZ) content, typically one entry per HTTP(S) request.
To size disk correctly, the number and size of these entries over a given time period must be determined.
To determine the size of log entries, consider the number of system users, the number of times users access applications (AuthN entries), and the number of subsequent page views for each application (AuthZ events).

Overall, log entries are composed of:

  • Session - Each log entry includes Access Gateway session information.
  • AuthN - Authentication audit and logging information.
  • AuthZ - Authorization log information, including resource accessed and policy rules.

From a disk use perspective, the size of each of these components is the other important question.

Examples

Let's look at an example.
If a given user base is 10,000 users, of which 75% access the system per day, and each user accesses, on average, 10 applications, then we can determine average daily accesses as:

  • Active users = Total users * access percentage, or 10,000 * 75% = 7,500 active users.

Each user accessing 10 applications:

  • Accesses per day = Active users * applications accessed, or 7,500 * 10 = 75,000 accesses per day.

If we assume that each access requires a session plus authentication and authorization, we can estimate the size of each log entry as:

  • Log entry size = Access Gateway session size + AuthN size + AuthZ size + a small formatting overhead.

For a more realistic example, assume Access Gateway session, AuthN, and AuthZ sizes are roughly the same, at ~1024B each. Each access then requires approximately 3KB (3 * 1024B = 3,072B).

Assuming access is roughly evenly distributed over 24 hours, we see approximately 75,000 / 24 = 3,125 accesses per hour.

At ~3KB per access, hourly log growth is 3,072B * 3,125, or about 9.6MB/hour, which is approximately 230MB a day. Assuming a consistent access pattern, we would expect to need ~7,000MB of disk per month for logs and log-related content.

A reasonable rule is to allocate twice the expected consumption plus additional overhead space for software updates, configuration, and backups. In this example, that gives a disk requirement of approximately 14GB plus 10-20%, or roughly 15-17GB/month.
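The arithmetic above can be checked with a short Python sketch. It assumes evenly distributed traffic, ~3,072B per log entry, decimal MB/GB for display, and a 30-day window matching the 30-day log archive retention; `log_disk_estimate` is a hypothetical helper name.

```python
def log_disk_estimate(total_users: int, active_fraction: float,
                      apps_per_user: int, entry_bytes: int,
                      retention_days: int = 30) -> dict:
    """Estimate log growth and a disk allocation (hypothetical helper)."""
    active_users = total_users * active_fraction        # 7,500 in the example
    accesses_per_day = active_users * apps_per_user     # 75,000
    accesses_per_hour = accesses_per_day / 24           # evenly spread over 24h
    bytes_per_hour = accesses_per_hour * entry_bytes
    bytes_per_day = bytes_per_hour * 24
    retained = bytes_per_day * retention_days           # ~one month of logs
    doubled = 2 * retained                              # "allocate twice" rule
    return {
        "bytes_per_hour": bytes_per_hour,
        "bytes_per_day": bytes_per_day,
        "retained": retained,
        "allocate_low": doubled * 1.10,                 # +10% overhead
        "allocate_high": doubled * 1.20,                # +20% overhead
    }


est = log_disk_estimate(10_000, 0.75, 10, entry_bytes=3 * 1024)
print(f"{est['bytes_per_hour'] / 1e6:.1f} MB/hour")    # 9.6 MB/hour
print(f"{est['bytes_per_day'] / 1e6:.1f} MB/day")      # 230.4 MB/day
print(f"{est['retained'] / 1e6:,.0f} MB per 30 days")  # 6,912 MB per 30 days
print(f"allocate ~{est['allocate_low'] / 1e9:.1f}-"
      f"{est['allocate_high'] / 1e9:.1f} GB")          # allocate ~15.2-16.6 GB
```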

Note

Hard disk considerations:

  • The monitor logger checks for low disk hourly, raising a warning at 70% usage and an alert at 90%.
    To avoid low-disk warnings, size for maximum or peak request volumes.
  • Every HTTP request results in audit and access logs.
  • Faster disk I/O will improve throughput.
  • Session size affects audit logging, because authorization and audit logs contain session contents.
  • Size disks generously: allocate 2x the estimated disk requirements to avoid low-disk warnings caused by bursts and large page requests.
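The 70%/90% thresholds can be mirrored in a local check such as the following Python sketch. It is hypothetical and does not reproduce the Access Gateway monitor logger; it uses the standard `shutil.disk_usage` call, checks the root filesystem as an assumption, and treats usage at or above each threshold as crossing it.

```python
import shutil

WARN_AT = 0.70   # warning threshold described above
ALERT_AT = 0.90  # alert threshold described above


def disk_status(used: int, total: int) -> str:
    """Classify disk usage against the 70% warning / 90% alert thresholds."""
    ratio = used / total
    if ratio >= ALERT_AT:
        return "alert"
    if ratio >= WARN_AT:
        return "warning"
    return "ok"


# Assumption: check the root filesystem of the host running this script.
usage = shutil.disk_usage("/")
print(disk_status(usage.used, usage.total))
```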

CPU Sizing

The Access Gateway engine autoscales across CPUs, running one worker per CPU. Each additional core adds a thread, allowing additional concurrent processing.

Note

CPU considerations:

  • More CPU/Cores will improve capacity.

  • Network throughput is typically the bottleneck.

Throughput Sizing

Throughput is a direct function of AuthN, AuthZ, and returned content.

  • AuthNs = SAML assertions processed (from Okta to each application)
  • AuthZs = Policy checks per HTTP request (all HTTP requests)

Assuming the following sizes (see logs for actual values):

AuthN bytes | AuthZ bytes | Returned data
1024B | 1024B | 2048B

Network throughput becomes a function of:

  • Logins/second * AuthN size
  • AuthZ requests/second * AuthZ size
  • Requests/second * average returned data size

Total network throughput is then the sum of these terms:

  • Logins/sec * AuthN size + AuthZ requests/sec * AuthZ size + Requests/sec * returned data size

Assuming that the dominant factor in network access is the amount of data returned per request, and that an average response is ~20KB, then for 500 requests/second the result is 20KB * 500, or 10MB/s.

Simplifying:

  • Average network bandwidth = Average response size * Average request arrival rate.

Network Requirements

Requests/second | AuthN size | AuthZ size | Returned data | Total
500 | 1024B | 1024B | 2048B | ~2MB/s (~16Mb/s)
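The total-throughput terms above can be sketched in Python as below. Treating every request as carrying an AuthN is a deliberate worst-case assumption (AuthN normally happens once per application login, so logins/second is usually far lower than requests/second); `network_bytes_per_sec` is a hypothetical name, and MB means 10^6 bytes.

```python
def network_bytes_per_sec(logins_per_sec: float, authn_bytes: int,
                          authz_per_sec: float, authz_bytes: int,
                          requests_per_sec: float, returned_bytes: int) -> float:
    """Total bytes/second = AuthN traffic + AuthZ traffic + returned data."""
    return (logins_per_sec * authn_bytes
            + authz_per_sec * authz_bytes
            + requests_per_sec * returned_bytes)


# Worst case for 500 requests/second: assume every request carries an AuthN,
# an AuthZ check, and 2048B of returned data (sizes from the table above).
total = network_bytes_per_sec(500, 1024, 500, 1024, 500, 2048)
print(f"{total / 1e6:.3f} MB/s ({total * 8 / 1e6:.1f} Mb/s)")  # 2.048 MB/s (16.4 Mb/s)

# Simplified form: average response size * average request arrival rate.
print(f"{20_000 * 500 / 1e6:.1f} MB/s")  # 10.0 MB/s
```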

Note

Exact timing information can be found in the AuthN and All logs. Total time to perform a request and return data is also tracked.

Instance Sizing

Consider the following table when sizing instances.

Use | Physical/virtual hardware | AWS equivalent
Proof of Concept | 1 instance of 2 cores, 2G memory, 220G (default) HD, single 1 Gbps NIC | t2.medium
Small | 2 instances of 2 cores, 4G memory, 220G HD, each with a single 1 Gbps NIC | t2.medium
Medium | 3 instances of 2 cores, 8G memory, 500G HD, each with a single 1 Gbps NIC | m4.large
Large | 3 instances of 4 cores, 16G memory, 500G HD, each with a single 1 Gbps NIC | m4.xlarge

See AWS Instance Types for more information.

Scaling

Scaling is the process of increasing or decreasing an Access Gateway cluster's size.

Clusters can be:

  • Scaled vertically - Adding or removing memory, disk, or CPU from a given instance.
  • Scaled horizontally - Adding or removing Access Gateway instances from a cluster.

Okta recommends defining all Access Gateway high availability cluster members similarly with the same CPU, memory and disk configurations.

When examining cluster performance, consider the following:

  • For a given instance, the largest performance gains come from adding CPU or using solid-state disks.
  • To improve overall cluster throughput, Okta recommends horizontal scaling, or adding Access Gateway instances.
    For example, a two-node cluster handling 1,500 requests can double its capacity by adding two additional nodes with the same CPU, memory, and disk configuration.
  • In general, horizontal scaling is linear because Access Gateway uses sticky sessions (session affinity). Access Gateway does not share sessions between nodes.
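Under the linear-scaling behavior described above, node count can be estimated with a simple calculation. This Python sketch assumes identical nodes and ignores limits outside Access Gateway, such as network throughput or back-end application performance; `nodes_needed` is a hypothetical helper, and the per-node rate is derived from the example rather than measured.

```python
import math


def nodes_needed(target_rps: float, per_node_rps: float) -> int:
    """Nodes required, assuming capacity grows linearly with node count."""
    return math.ceil(target_rps / per_node_rps)


# From the example: two identical nodes handle 1,500 requests,
# so each node handles roughly 750.
per_node = 1_500 / 2
print(nodes_needed(3_000, per_node))  # 4
```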

Note

Capacity may be limited by factors unrelated to Access Gateway, such as network throughput or back-end application performance.
See About network interfaces for more information on Access Gateway networking and expanding networking throughput.