2.11 Commerce Release Guide - Performance Improvements Phase I

Overview

Findings and Recommendations

Performance Statistics Before Improvements

Performance Statistics After Improvements

Azure Application Service Levels, Database Service Levels, and Scaling

Unity Minimum Server Requirements and Log Level Verbosity

Performance Improvements Implemented In This Release

Overview

In the 2.11 release, Aspenware began a multi-phase project to analyze performance across Aspenware Commerce, including Azure application services, databases, and functions, the commerce application itself, Unity, and the round trips to the fulfillment systems (RTP|ONE and Siriusware). The purpose of this initiative, which continues throughout Fall 2020, is to improve overall performance and user experience while maintaining product stability and minimizing risk.

In this first phase of the project, we focused on analysis and implementing low-risk enhancements. In the second phase of the project, which we will release in October 2020, we will tackle some larger targets, including a major enhancement to inventory management.

We’ll be publishing additions to this document in future releases as we continue this initiative. More guidance for golf customers is also forthcoming.

Aspenware will offer professional services to help resorts select and implement appropriate service levels.

Findings and Recommendations

We began this initiative by analyzing baseline performance using Application Insights and App Optics in multiple customer environments. We then created a dedicated development environment specifically for load and performance testing, and conducted baseline tests in that environment.

We created load tests for various shopping experiences, including complex tickets with dynamic pricing and inventory, season passes, and very simple shopping flows using products with no inventory and no dynamic pricing. We then analyzed the results of each scenario tested, identified poorly performing areas of the solution, and ranked them by frequency of calls, worst-performing calls, risk to implement, and level of effort. We then started work on optimizing the ‘worst offenders’ first (see Improvements Implemented in This Release).

As a part of the process, we first ran baseline tests, then ran the same tests after implementing improvements. We used a site running at Service Level 3, scaled out to 3 application services, with database level 4, simulating 600 virtual users.

To illustrate some results of the improvements, here’s a table showing the percentage improvement in call time for a few of the ‘worst offenders’:

Call                                                     % improvement
GET Checkout/GetGuestInfoInitRequest                     97.73%
POST UnityShoppingCart/ProductDetails_AttributeChange    69.84%
POST UnityCustomer/UnityLoginV2                          83.42%

Azure App Service and Azure SQL Database service levels and scaling suggestions

The two primary architectural components of Aspenware Commerce are an Azure Application Service and an Azure SQL Database. Both need to be considered, among other things, when configuring Azure to support increased shopper volume.

Azure App Service service and SQL Database considerations

  1. Recommendations for Azure service levels and scaling are not an exact science. Google Analytics (GA) and other web analytics tools are great for reporting traffic data, such as user visit statistics and average session duration. However, it’s difficult to determine the exact number of concurrent virtual users (VU) on a site from historical GA data: it’s easy to see historical visits per hour, but the site may only have a few concurrent visitors at any one time. GA’s real-time dashboard shows concurrent users at the current moment, but it keeps no history. The following formula can be used to estimate concurrent users from GA data: Concurrent users = (Hourly Sessions x Avg. Session Duration in minutes) / 60

    1. For example, if GA shows you had 1,000 hourly users during a peak and the average session duration was 7 minutes, that’s an average of 116 concurrent users. Actual peaks may exceed that number, but it is a good estimate. 10,000 users in one hour with the same session duration averages 1,166 concurrent users.

    2. These kinds of peaks in shopper traffic are often short-lived and last minutes or hours, not days or weeks.
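The GA-based estimate above can be sketched as a short script (a minimal sketch; the formula and example figures come from the text, and the function name is ours):

```python
def estimate_concurrent_users(hourly_sessions, avg_session_minutes):
    """Estimate average concurrent users from Google Analytics data.

    Concurrent users = Hourly Sessions x Avg. Session Duration (minutes) / 60
    """
    return hourly_sessions * avg_session_minutes / 60

# Examples from the text: 1,000 and 10,000 hourly sessions, 7-minute average session.
print(int(estimate_concurrent_users(1_000, 7)))   # ~116 concurrent users
print(int(estimate_concurrent_users(10_000, 7)))  # ~1,166 concurrent users
```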

  2. The minimum service levels for Aspenware Commerce are:

    1. App Service: P2v2

      1. During our load testing analysis, we found Pxv2 is more efficient than S2 or S3 under load.

      2. Also, some resorts may currently be configured on Sx service levels. Aspenware will work with resorts to adjust this to Pxv2 levels with the 2.11 release.

    2. Azure SQL Database: S2

  3. Currently, sites are scaled manually by Azure administrators to meet expected heavy load days/hours during anticipated high volume periods. We’re analyzing auto-scaling options now and will publish additional information on auto-scaling in the next release.

    1. Azure App Services can be either “Scaled Up/Down” or “Scaled Out”.

      1. Scaling Up/Down: Scaling Up adds more resources to the application service hosting the commerce site. Performed manually by an Azure administrator, this needs to be scheduled with the admin in advance of the required event and requires a brief outage for the site.

        1. Scaling Up/Down is typically used less frequently than Scaling Out.

      2. Scaling Out: Scaling Out adds additional app service instances. This can be done manually, can be scheduled based on business rules, or driven automatically based on demand. It does not require an outage.

        1. Scaling Out is typically the best option for reacting to increased shopper traffic.

        2. Scaling out application services has a near-linear impact on the number of users that the site can support, although site responsiveness may noticeably degrade at higher levels.

        3. Azure allows application services to be scaled out to a maximum of 30 app service instances.

        4. Scaling out multiplies the hosting costs by the number of instances.

          1. For example, if you are using a P2v2 app service level that costs $292/month, scaling to 3 instances for an entire month would cost $876/month.

            1. However, Azure bills for service levels hourly. So, if you had a major sale launching at 10 am on a Tuesday, you could schedule scaling out to start at 8 am and stop at 12 pm that day. In this scenario, scaling out to 3 instances for those 4 hours would bill for 4 of the approximately 720 hours in a month, which equates to a charge of less than $10.00.

        5. In the coming weeks, Aspenware will publish guidelines on rules-based auto-scaling so sites can scale out automatically and scale back in after demand dies down.

          1. Note that auto-scaling has a “reaction time” of 5-10 minutes and is not recommended when a planned, sudden load is expected. Instead, scheduling or manually scaling out is recommended in those scenarios.

          2. Note that scaling out with the PayEezy payment gateway requires Aspenware Commerce version 2.11 or higher.
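The hourly billing arithmetic in the cost example above can be sketched as follows (an illustrative calculation only, using the $292/month P2v2 figure and ~720 hours/month from the text; actual Azure rates vary by region and plan):

```python
HOURS_PER_MONTH = 720  # approximate, as used in the example above

def scale_out_cost(monthly_rate, instances, hours):
    """Cost of running `instances` app service instances for `hours`,
    given a per-instance monthly rate and Azure's hourly billing."""
    hourly_rate = monthly_rate / HOURS_PER_MONTH
    return hourly_rate * instances * hours

# P2v2 at $292/month, 3 instances for a full month:
print(round(scale_out_cost(292, 3, HOURS_PER_MONTH), 2))  # 876.0
# Scaling to 3 instances for just a 4-hour sale window:
print(round(scale_out_cost(292, 3, 4), 2))  # 4.87 -- well under $10
```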

  4. Azure SQL Databases also play a large role in performance. Database service levels can be manually adjusted as detailed in the table below. Changing DB Service levels requires a brief outage, typically just a few seconds.

    1. Aspenware will also publish more detailed guidelines on scaling the Azure SQL database in the coming weeks.

The table below outlines conservative guidelines for suggested application service levels and database levels based on the estimated number of users on the store for the period of time being evaluated.

Application Service Level   Number of scaled application services   Database level   Conservative estimated range of users
P2v2                        1                                       S3               1-100
P2v2                        2                                       S3               101-300
P2v2                        3                                       S4               301-400
P2v2                        6                                       S7               401-800
P2v2                        10                                      S7               800-1500
-                           -                                       -                1500+, implement Queue-It
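The guidelines in the table above can be encoded as a simple lookup, which may help when scripting environment changes (an illustrative helper only, not part of the product; the tier data comes from the table):

```python
# (max_users, app_service_level, scaled_instances, db_level) -- from the table above
TIERS = [
    (100,  "P2v2", 1,  "S3"),
    (300,  "P2v2", 2,  "S3"),
    (400,  "P2v2", 3,  "S4"),
    (800,  "P2v2", 6,  "S7"),
    (1500, "P2v2", 10, "S7"),
]

def recommend(concurrent_users):
    """Return (app service level, instances, DB level) for an estimated
    concurrent-user count, per the conservative table above."""
    for max_users, level, instances, db in TIERS:
        if concurrent_users <= max_users:
            return level, instances, db
    return None  # 1500+: implement Queue-It

print(recommend(250))   # ('P2v2', 2, 'S3')
print(recommend(2000))  # None -> Queue-It territory
```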

Unity minimum server requirements and recommended log levels

  1. Unity application server requirements

    1. Minimum server requirements

      1. Windows Server 2016 or 2019

      2. Processor - 2.60 GHz, 64-bit

      3. Installed memory (RAM) - 16 GB

      4. Disk space - 158 GB

  2. Unity scaling recommendations

    1. During our load testing, we experimented with multiple Unity application services, but performance did not improve significantly with additional instances.

    2. Instead, we found significant performance improvements could be obtained by setting log level verbosity to “warning” or above. While debug messages are sometimes helpful for troubleshooting, the performance benefits of setting log levels to warning outweighed them. Furthermore, since we are systematically updating every client to run Application Insights as a part of the performance enhancements project, we no longer need local Unity log files to have debug levels for logging.

    3. Steps to update Unity log levels:

Update nlog.config and appsettings.json to set all log levels to the Warning level.

nlog.config – change all of the <logger> items under <rules> to have minlevel="Warn"

e.g., <logger name="*" minlevel="Warn" writeTo="allfile" />

appsettings.json – change all of the Logging sections to use a default log level of "Warning"

 

"Logging": {
  "IncludeScopes": false,
  "Debug": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "Console": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "EventLog": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "LogLevel": {
    "Default": "Warning"
  }
}

 

 

Improvements Implemented in This Release

  1. DeleteGuests and KeepAlive schedule tasks - These schedule tasks, especially DeleteGuests, were negatively affecting performance every time they ran. For some sites, DeleteGuests was pegging the server every hour. Furthermore, the DeleteGuests schedule task wasn’t functioning properly and didn’t always delete the right records. These schedule tasks have been disabled and moved to Azure Functions.

  2. GetProductVariants - This call, used extensively in the dynamic pricing calendar, was called thousands of times in a typical 24-hour period. The call has been made asynchronous, which has improved performance in the calendar by x (from an average of 54.3 seconds to an average of xx seconds).

  3. GetImage - For customer images, the application was always requesting images from RTP, even if they did not exist. We’ve modified this call to include a flag for customer images, so the application now only requests an image URL if one exists for a customer. This has improved performance everywhere customer images are displayed, including My Account and checkout.

  4. Remove extra database language calls - The application was previously calling for localized customer languages more often than needed to return the customer language. This call is now cached for better performance.

  5. GetUtcTime - Similar to the language calls, the application was requesting UTCTime more often than needed.

  6. Cache default LOB - The default line of business is now cached for better performance.

  7. Cache metadata tags - Metadata tags, used for product classifications, are now cached.

  8. Reduce calls for last activity date on the customer record - Previously, the application requested the last activity date on the customer record more often than necessary; these calls have been reduced.

  9. Optional CDN - An optional content delivery network (CDN) is now available for static CSS and JavaScript files. We are currently working with beta resorts to collect performance data on the CDN’s benefits, and will publish findings in the next major release.
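Several of the items above (customer languages, default LOB, metadata tags) are caching wins. As a minimal sketch of the general pattern only, not Aspenware’s actual implementation, the idea is to memoize a rarely-changing lookup so repeated requests skip the database:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show how often the "database" is hit

@lru_cache(maxsize=1)
def get_default_lob():
    """Stand-in for a database lookup of the default line of business.

    With the cache in place, the underlying query runs only once; all
    later calls are served from memory."""
    CALLS["count"] += 1
    return "RETAIL"  # hypothetical value

for _ in range(5):
    get_default_lob()
print(CALLS["count"])  # 1 -- four of the five calls hit the cache
```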