Part XV - Caching in OpenText Content Server

Caching is an effective way to boost the performance of an OpenText Content Server module. Caching works by persisting the return value of an operation (such as an expensive function or SQL call), and reusing the value later without having to execute the operation again.

There are a few ways to implement caching in Content Server, but this post will focus on Memcached.

# Using Memcached in OpenText Content Server

Memcached is an open source caching system that was added to Content Server in v10.0. The Memcached website sums up what it does:

> Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
>
> Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Sounds great! Content Server provides an OScript API to read and write data to Memcached. Once a value is written to the cache it becomes available to future requests and is accessible from all threads in a cluster.

It's important to remember that Memcached isn't a persistent data store; its purpose is to temporarily store transient values to boost performance. A general rule is never to assume Memcached contains a cached value. Memcached purges cached values using a least recently used policy when its memory threshold is reached. For this reason a developer should always first check if a value exists in the cache before trying to use it.

The API for communicating with Memcached is exposed via $LLIAPI.MemcachedUtil, which has the following functions (other functions are available, but these are the important ones):

  • SetValue() - write a value to the cache;
  • GetValue() - get a value from the cache; and
  • DeleteKey() - remove a value from the cache.

Each function returns an Assoc with the status of the call (using the standard ok & errMsg keys; see Part XIV on error handling for more information).
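For example, a defensive call could check the standard status keys before continuing. This is just a sketch; the namespace, key, and value are illustrative:

Assoc status = $LLIAPI.MemcachedUtil.SetValue(prgCtx, "MyModule", "myKey", myValue)

if !status.ok
	// The write failed (e.g., Memcached is unavailable). A cache miss
	// is never fatal, so log the error and carry on.
	echo("Memcached write failed: ", status.errMsg)
end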

Let's briefly discuss each.

SetValue()

The SetValue() function writes data to Memcached and has the following interface:

function Assoc SetValue(Object prgCtx, String namespace, Dynamic key, Dynamic value, Integer timeout=0)

A few things to note:

  • Care must be taken to choose a namespace/key pair that uniquely maps to the cached value. This is essential to prevent conflicts with other modules that use the cache. Behind the scenes the API concatenates the namespace and key together, which means the pairs hello/world & hellow/orld are effectively the same. It's a small bug, but I have yet to see it cause problems.
  • The value being cached may only consist of types that are serialisable to a string (e.g., String, Integer, List, Record, RecArray, Assoc, etc.). Non-serialisable types (e.g., Object, Frame, DAPINODE, etc.) cannot be used with the cache.
  • The Undefined type cannot be cached. Caching Undefined may seem like a strange thing to want, but it's useful when Undefined is a legitimate return value of an operation (a workaround for this limitation exists in RHCore).
  • An optional timeout parameter sets how long data should live in the cache before expiring. It's not required, but is useful when no other method to invalidate a cached value exists. More on this later.
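Putting these points together, a typical write might look like this (the namespace, key, and value are illustrative; reportRows is assumed to be a serialisable type such as a RecArray):

// A namespace unique to this module/function, plus a key built from
// the inputs, should uniquely identify the cached value.
String namespace = "MyModule.GetReportData"
List key = {userID, reportID}

$LLIAPI.MemcachedUtil.SetValue(prgCtx, namespace, key, reportRows)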

GetValue()

The GetValue() function returns a cached value and has the following interface:

function Assoc GetValue(Object prgCtx, String namespace, Dynamic key)

The return Assoc contains a Boolean found key, which indicates whether the namespace and key exist in the cache. If found, the value key contains the cached value.
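For example, a minimal lookup follows this pattern (the namespace and key match the earlier sketch):

Dynamic value
List key = {userID, reportID} // the same key used when the value was cached
Assoc results = $LLIAPI.MemcachedUtil.GetValue(prgCtx, "MyModule.GetReportData", key)

if results.found
	value = results.value
else
	// Nothing in the cache (or the value was evicted), so the
	// expensive operation must be performed and its result cached.
end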

DeleteKey()

The DeleteKey() function removes a value from the cache and can be used to expire a value that is no longer valid. It has the following interface:

function Assoc DeleteKey(Object prgCtx, String namespace, Dynamic key)

# What can be cached?

Any return value from an operation or function in Content Server can be cached as long as it meets the following criteria:

  • the operation doesn't mutate the state of the system (i.e., it's read-only);
  • the value being cached isn't too large (Memcached has a default limit of 1 MB per cached item, which can be configured);
  • the value being cached consists of serialisable data types; and
  • there is a policy to invalidate cached values when they are no longer valid.

Cache invalidation is probably the most difficult part of caching and warrants its own discussion.

# Cache Invalidation

> There are only two hard things in Computer Science: cache invalidation and naming things. - Phil Karlton

Cache invalidation is about managing the conditions that keep a cached value sufficiently fresh. Doing this isn't always obvious: how do you know whether the return value of an operation has changed without running the operation again? And wouldn't running the operation again defeat the purpose of caching?

As far as I know there are three strategies for cache invalidation (if you know of others please tell me), and which approach is applicable or best depends directly on the makeup of the operation being cached. They are:

  1. a key can be constructed that uniquely maps to the value (also known as key-based expiration);
  2. events to invalidate the cache are known, and callbacks can be implemented to delete the value when these events occur; or
  3. it is satisfactory to expire the cache after a timeout, and stale data during this time isn't a concern.

Let's discuss each.

# Key-based expiration

In some cases a key can be constructed that uniquely maps to the return value of the operation being cached. A simple example is a pure function, which is a function that:

  • always has the same return value for the same inputs; and
  • doesn't mutate the state of the system.

For example, consider a simple sum() function (ignoring that you'd never need to cache a function like this):

function Integer sum(Integer a, Integer b)
	return a + b
end

This is a pure function since the same inputs for a and b will always return the same value. Knowing this, we can construct a unique namespace and key from the parameters and lazily load the value as follows:

function Integer sumCache(Integer a, Integer b)

	String namespace = "sumCache" // something unique for this function
	List key = {a,b} // a key based on the input parameters

	Integer sumValue

	Assoc results = $LLIAPI.MemcachedUtil.GetValue(.fPrgCtx, namespace, key)

	// check if a cached value is found
	if results.found
		// we have a cached value
		sumValue = results.value
	else
		// no cached value, so compute
		sumValue = .sum(a, b) // call the original function

		// cache it for the next time sumCache is called with a & b
		$LLIAPI.MemcachedUtil.SetValue(.fPrgCtx, namespace, key, sumValue)
	end

	return sumValue

end

Pure functions don't require cache invalidation handling since the same inputs will always have the same output (e.g., a + b doesn't change for the same values of a and b).

But in reality, most operations are not pure functions and require more care. For example, consider a function to return a category value from a node:

function Assoc GetCategoryValue(Object prgCtx, \
		DAPINODE node, \
		String categoryName, \
		String attributeName)

The implementation details are not important, but assume it's a costly operation. At first glance it might seem like something we can cache using the same pattern as before with the following namespace and key:

String namespace = "GetCategoryValue"
List key = {node.pID, categoryName, attributeName}

Of course, this will not work since the function will have a different return value once the attribute value has changed. But despite this difference, can we still construct a key that uniquely maps to the return value of the function?

In many Content Server instances the modified date of a node is updated whenever an attribute value is changed (this is configurable on the admin.index page, but ignore it for the moment). We can use this information to construct a key that also contains the modified date:

List key = {node.pID, node.pModifyDate, categoryName, attributeName}

This ensures a unique key whenever a category value is changed (since the modified date gets updated), and forces the next GetCategoryValue() call to fetch and cache the updated value.

This approach works well when it's possible, but unfortunately that's not always the case. In many situations there is no equivalent of a "modified date" or anything else to indicate a value has changed. For these cases we need another strategy.

# Manually invalidate a cached value

Cached values can be manually invalidated with the $LLIAPI.MemcachedUtil.DeleteKey() function. This can be called from a callback (or elsewhere) to respond to events that might alter the cached value.

Consider our previous example, and say Content Server is configured not to update the modified date on a category update. The modified date would then no longer change when a category value changes, so the key would no longer uniquely map to the value. Let's fix this by first simplifying the key to drop the modified date (since it's no longer relevant):

List key = {node.pID, categoryName, attributeName}

We can then implement the $LLIAPI.NodeCallbacks.CBCategoriesUpdatePre() callback (which is executed when a category is updated) to manually delete the old value from the cache when a category update occurs. This will force the next GetCategoryValue() call to fetch and cache the updated value.
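A minimal sketch of the idea follows. The exact signature of CBCategoriesUpdatePre() may differ between Content Server versions, and the category and attribute names here are hypothetical:

function Assoc CBCategoriesUpdatePre(Object prgCtx, DAPINODE node)

	// Assumption: we know which category/attribute pairs our module
	// caches; they are hard-coded here for illustration.
	String namespace = "GetCategoryValue"
	List key = {node.pID, "My Category", "My Attribute"}

	// Remove the stale value; the next GetCategoryValue() call will
	// recompute and re-cache it.
	$LLIAPI.MemcachedUtil.DeleteKey(prgCtx, namespace, key)

	// Return the standard status Assoc (see Part XIV).
	Assoc result = Assoc.CreateAssoc()
	result.ok = true
	return result

end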

# Expiring a cached value after a timeout

There are sometimes too many dependencies or unknown factors to efficiently invalidate a cached value. As a last resort you can use the timeout parameter in the SetValue() call to expire the value after a given number of seconds. The compromise is balancing how often the expensive operation is allowed to execute against how long a stale value can be tolerated. It's not the best choice, but it's sometimes the easiest.
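For example, to let a cached value live for at most five minutes (the namespace, key, and value are illustrative; 300 is the timeout in seconds):

// Until the timeout elapses all callers get the cached copy; the
// first caller afterwards recomputes and re-caches the value.
$LLIAPI.MemcachedUtil.SetValue(prgCtx, namespace, key, reportRows, 300)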

# Caching strategies in RHCore

RHCore provides some useful extensions to assist in caching. RHNode (see Part I for information on RHNode) has a cacheKey() method, which returns a unique string that can be used to construct a cache key for the node. It returns the same value until a delta operation is performed on the node (e.g., Records Management data is updated, a category value is changed, the node is renamed, a user is added to the ACL, etc.). After any such event it returns a new unique value until the next delta operation.

We can use this method with our previous example as follows (which doesn't require any callback to be implemented):

String namespace = "GetCategoryValue"
List key = {node.cacheKey(), categoryName, attributeName}

A similar method exists in RHModel (see Part II for an introduction to RHModel), and has a few additional options for more advanced and complex caching scenarios.

# HTML fragment caching

One of my favourite uses of caching is HTML fragment caching, which allows blocks of HTML rendering code to be cached so that subsequent requests can render them quickly. I don't believe Weblingo or WebReports support this, but it's very easy to do with RHTemplate (see Part III for information on template rendering).

For example, say we had a table to display some Records Management information:

<table>
  <tbody>
    {% for node in nodes %}
    <tr>
      <td>{{ node.name|escape }}</td>
      <td>{{ node.recman.classifyInfo.Status|escape }}</td>
      <td>{{ node.recman.PhysicalObjectInfo.UniqueID|escape }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

This is a heavy operation, since fetching the status and unique ID requires multiple database hits that are executed on each iteration. Imagine if this were rendering thousands of rows.

A quick and easy way to improve performance is to add caching. This can be done directly in the template by surrounding the block with the {% cache %} template tag. The tag accepts zero or more keys, which should be chosen in a way that uniquely maps to the content of the block being rendered. For example:

<table>
  <tbody>
    {% for node in nodes %} {% cache node.cachekey %}
    <tr>
      <td>{{ node.name|escape }}</td>
      <td>{{ node.recman.classifyInfo.Status|escape }}</td>
      <td>{{ node.recman.PhysicalObjectInfo.UniqueID|escape }}</td>
    </tr>
    {% endcache %} {% endfor %}
  </tbody>
</table>

This simple addition makes a huge improvement to the rendering time, and is a technique I regularly use in my development.

# Wrapping up

Caching can give a massive boost to the performance of a Content Server module. I'm finding new ways of using it and am delighted with how much of an improvement it makes. There is almost no reason not to use it.