Part XIV - Error Handling in OpenText Content Server

OpenText Content Server OScript does not support exception handling. I used to consider this a limitation, but after learning more I no longer do.

Exception handling is one way to handle errors, and it has its critics. One criticism is that exceptions that are not immediately caught can create unpredictable paths through your code. This can lead to problems such as an inconsistent program state or data corruption.

Another common approach to error handling is error checking. Error checking maintains the flow of code by having functions return a special value when an error occurs. This is the approach taken by OScript.

Content Server OScript provides two approaches for error checking. One approach is baked into the language and the other is a convention. Let's look at each.

# The Error Package

The Error class (or "package") makes it possible to return an error from a function regardless of the function signature. It's often used in lower-level API calls such as CAPI.IniGet() or DAPI.GetNodeByID(). For example, the File.Open() function (used to open a file on the filesystem) has the following return value (from the documentation):

A File representing the open file if successful; Error otherwise.

The "Error" referred to here is of the Error class, and can be checked for by using the IsError() or IsNotError() function. For example:

File f = File.Open("c:/temp/myfile.txt", File.ReadMode)

if IsNotError(f)
	// we have successfully opened the file
else
	// oops, an error occurred
end

An Error object can also be defined and returned from a custom OScript function (using the Error.Define() function), but this is rarely used.

# The OScript Return Value Error Checking Convention

Content Server has a convention of wrapping most function return values in an Assoc datatype with the following keys:

  • ok - a boolean indicating if the function call was successful;
  • errMsg - a string containing a verbose error message if the function call was unsuccessful (this often gets echoed back to the user);
  • apiError - an Error object usually from a failed lower level API call (if applicable); and
  • anything else pertinent to the function, if successful.

I tend to call an Assoc with this structure a return value Assoc. If ok is false then I call it an error Assoc.
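As a rough illustration of the convention, the following sketch wraps File.Open() in a function that returns a return value Assoc and converts the lower-level Error into an error Assoc. ReadTextFile() is a hypothetical name used only for this example, and the read loop assumes File.Read() returns the next line as a String and an Error once the end of the file is reached.

```oscript
// Hypothetical example: ReadTextFile() is not a Content Server function.
// Assumes File.Read() returns the next line, or an Error at the end of the file.
function Assoc ReadTextFile(String path)
	Assoc rtn = Assoc.CreateAssoc()
	List lines = {}
	Dynamic line
	Dynamic f = File.Open(path, File.ReadMode)

	if IsError(f)
		// Convert the lower-level Error into an error Assoc.
		rtn.ok = false
		rtn.errMsg = "Could not open the file " + path
		rtn.apiError = f
	else
		line = File.Read(f)

		// Stops at the end of the file (or at the first read error).
		while IsNotError(line)
			lines = {@lines, line}
			line = File.Read(f)
		end

		File.Close(f)

		// Success: ok is true and the extra "lines" key carries the result.
		rtn.ok = true
		rtn.lines = lines
	end

	return rtn
end
```

The caller never has to know that File.Open() returns an Error; it only checks ok, and can still inspect apiError if it needs the original Error.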

With this convention you'll find the following pattern throughout Content Server:

Assoc results = somefunction()

if results.ok
	// Great! The call to somefunction() was successful. Keep going.
else
	// Oops, something went wrong.
end

It's the responsibility of the calling function to handle the error. This usually means ceasing operations and returning the error Assoc to its own caller, so the error gets passed up the call stack until it's finally handled.
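Passing an error up the call stack usually amounts to an early return. In the following sketch, ValidateInput() and SaveRecord() are hypothetical functions assumed to follow the return value Assoc convention (SaveRecord() is also assumed to return a recordID key on success). The caller stops at the first failure and hands the error Assoc back to its own caller unchanged.

```oscript
// Hypothetical example: ValidateInput() and SaveRecord() are placeholders for
// any functions that follow the return value Assoc convention.
function Assoc ProcessRequest(Assoc input)
	Assoc results
	Assoc rtn = Assoc.CreateAssoc()

	results = .ValidateInput(input)

	if !results.ok
		// Cease operations and pass the error Assoc up to our caller.
		return results
	end

	results = .SaveRecord(input)

	if !results.ok
		return results
	end

	// Every step succeeded, so build a successful return value Assoc.
	rtn.ok = true
	rtn.recordID = results.recordID

	return rtn
end
```

Returning the callee's Assoc as-is preserves its errMsg and apiError, so whoever finally handles the error sees the original message.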

What I like about this pattern is its simplicity and consistency. By adopting it a developer can return the error from most function calls and know it'll be understood and handled up the call stack. The pattern also makes it difficult for a developer to overlook or ignore error handling.

A limitation of this type of error handling is that errMsg (which often gets echoed back to the user) only tells you what went wrong; it provides little context about where the error occurred or why. This is particularly annoying when the error is only reproducible on a production system. To collect more information we need a more aggressive way to catch errors, which I'll get to later. Let's first discuss database transactions.

# Database Transactions

A Content Server request often performs multiple database queries to update the database. For example, adding a new document adds table records to DTreeCore, DVersData, DAudit, ProviderData, etc. What happens if an error occurs part way through updating the tables? We certainly don't want the integrity of the system to be compromised by having only some tables updated. For this we have database transactions.

A database transaction allows a developer to wrap a group of database calls into a single transaction. A transaction can then be committed at once (i.e., all database queries made in the transaction are committed) or rolled back in the event of an error. This prevents a group of queries from being partially applied if an error occurs part way through the request.

Database transactions are started and ended with the StartTrans() and EndTrans() functions. A typical usage pattern is as follows:

if prgCtx.fDbConnect.StartTrans()
	results = ... // do a bunch of db inserts and updates
	prgCtx.fDbConnect.EndTrans(results.ok)
else
	results = ... // error, transaction could not be started
end

The StartTrans() call returns true if the database transaction could be started. From that point on, every database query (regardless of where in the call stack it's made) is part of the transaction. Only after EndTrans() is called are the cumulative queries either committed (by passing in true) or rolled back (by passing in false). In the example I passed in results.ok, which ties in with the previous section on error handling.

There are a few things to consider when using a database transaction:

  • Transactions can be nested, but only the outermost transaction determines if all queries are committed or rolled back (this is why it's important for errors to be passed back up the call stack).
  • Every successful StartTrans() call must be balanced with a call to EndTrans(). Failure to do so will leave the database transaction open and cause unexpected behaviour in the current and subsequent requests. More on this later.
  • Most override points (i.e., where a developer adds code to Content Server such as in a request handler or WebNodeAction) will execute without an open transaction. It's the responsibility of the developer to start the transaction, do appropriate error handling, and close the transaction as required.
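Putting these points together, a function that starts its own transaction might look like the following sketch. CreateRecord() and AuditRecord() are hypothetical placeholders for functions that make database calls and follow the return value Assoc convention. The important detail is that, barring a fatal error, EndTrans() is reached on every path where StartTrans() succeeded, and results.ok decides between commit and rollback.

```oscript
// Hypothetical example: CreateRecord() and AuditRecord() are placeholders for
// functions that make database calls and return a return value Assoc.
function Assoc CreateAndAudit(Object prgCtx, Assoc data)
	Assoc results

	if prgCtx.fDbConnect.StartTrans()
		results = .CreateRecord(prgCtx, data)

		// Only continue while every step succeeds.
		if results.ok
			results = .AuditRecord(prgCtx, data)
		end

		// Always balance StartTrans(): commit if everything succeeded, otherwise roll back.
		prgCtx.fDbConnect.EndTrans(results.ok)
	else
		results = Assoc.CreateAssoc()
		results.ok = false
		results.errMsg = "Could not start a database transaction."
	end

	return results
end
```

If AuditRecord() fails, the record inserted by CreateRecord() is rolled back along with it, so the tables are never left partially updated.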

# Fatal errors, crashes, stack traces, server did not respond

The return value Assoc pattern works well for errors that have been anticipated and checked for. The error gets caught, is passed up the call stack, and is handled.

Fatal errors occur when something unexpected happens from which the system can't recover. Examples include:

  • attempting to divide a number by zero;
  • passing an incorrect number of parameters into a function;
  • attempting to assign a value to a variable of a different type (e.g., String text = 5); etc.

Fatal errors will:

  • immediately halt further processing (i.e., crash the thread);
  • generate a trace file on the server containing debug information;
  • display a jarring "Server did not respond" (aka "SDNR") error to the user (sort of like a 500 error); and
  • fail to close any open database transactions (because the thread is immediately halted and EndTrans() isn't called).

Fatal errors are always an indication of a bug. Although a "Server did not respond" error is unsettling for the user, it's useful because it captures debug information in the trace file containing:

  • the full stack trace showing the execution path to the code that failed;
  • the reason the error occurred; and
  • the local variable state at every level in the call stack.

This information is usually enough for a developer to analyse, debug, and fix the issue.

The unsettling part of a fatal error is that it can leave a database transaction open. Database connections (and hence database transactions) are persisted per thread, and are not automatically cleaned up when a crash occurs. This means if a request crashes with an open transaction then the following request on that thread will begin with the transaction still open.

This corrupted state would persist if it weren't for the ResetTransactionsIfNecessary() function, which gets called at the end of most requests. This function acts as a cleanup by rolling back any database transaction that may have been left open by some buggy code (i.e., each StartTrans() wasn't balanced with a call to EndTrans()).

Because the ResetTransactionsIfNecessary() call sits at the end of the request, it never runs when a crash occurs. It only gets called at the end of the following request on that thread. This could be a completely unrelated request by a different user, whose database transaction will be unexpectedly rolled back. This can lead to odd behaviour in that request and to data loss.

It's unsettling to have the error from one request bleed into the next, but it explains why a subsequent request after a crash may sometimes behave strangely. A possible solution for OpenText might be to move or copy the ResetTransactionsIfNecessary() call to the start of each request to guarantee it begins with no leftover open transactions.

# Crash early, use assertions

Something I learned from The Pragmatic Programmer is to crash early. The idea is:

A dead program normally does a lot less damage than a crippled one.

This may seem counterintuitive, but the idea is to crash immediately when an error is detected that could cause more damage if the program were allowed to continue. Many programming languages provide an "assert" function for this, which forces a crash when a condition is false. The interface is usually something like this:

assert(condition, errorMessage)

The function does nothing if condition is true, but crashes the program if condition is false. The Pragmatic Programmer states this is useful for situations where you might think to yourself: "...but of course that could never happen."

When you think something could never happen then why not back it up with an assert()? If for whatever reason it does happen you'll immediately be made aware of it. An assert() call isn't a replacement for proper error handling; instead, assertions are there to catch conditions that should never happen.

OScript doesn't have an assert() function, and so I added one to RHCore. The interface is simply:

function Void assert(Boolean condition, String errMsg)

The function does a little more than just crash the thread. If condition is false it:

  • closes and rolls back any open database transaction;
  • logs errMsg to the debug window;
  • crashes the request; and
  • generates a trace file.
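The following is not the RHCore implementation; it's a stripped-down, hypothetical sketch covering only two of those steps (logging and crashing). It assumes echo() writes to the debug window and forces a fatal error with a division by zero, one of the fatal errors listed earlier. Unlike the RHCore version it does not roll back open database transactions, so on its own it would still leave a transaction open when it crashes.

```oscript
// Hypothetical sketch, not the RHCore implementation. It logs and crashes,
// but does NOT roll back open database transactions as RHCore's version does.
function Void assert(Boolean condition, String errMsg)
	Integer zero = 0
	Integer crash

	if !condition
		echo("Assertion failed: " + errMsg)

		// Deliberately trigger a fatal error (division by zero) so the thread
		// halts and a trace file is generated.
		crash = 1 / zero
	end
end
```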

For example, consider a function that accepts an Assoc or Record datatype:

function Void MyCustomFunction(Dynamic assocOrRecord)
	// ...
end

Since the argument type is Dynamic it's technically possible to call the function with a value of another type (e.g., MyCustomFunction({5})). But how do we know this won't compromise the integrity of our data or cause even bigger problems? To play it safe we can add an assertion to the function:

function Void MyCustomFunction(Dynamic assocOrRecord)
	.assert(Type(assocOrRecord) in {Assoc.AssocType, RecArray.RecordType}, 'Not an Assoc or Record.')
	// ...
end

Assertions are useful during development to test assumptions and find bugs in your code. However, they are also useful in production environments to catch errors that may have slipped through testing or are only reproducible in that environment.

Crashing a thread may seem like an aggressive thing to do. I've been told more than once: "A thread should never crash!" Correct, it shouldn't. But if a fatal error is going to happen I would prefer to do it in a controlled way that provides me with a trace file and doesn't leave an open database transaction.

# Form Validation Errors

An unfortunate limitation in Content Server is that form validation errors are treated like any other error. The validation error occurs, the error is passed up the call stack, the transaction is rolled back, and a generic error page is presented to the user.

The user must use the back button to recover, which is not obvious and generally discouraged. With some luck the form will return to its previous state where the user can fix the error and submit the form again.

Form validation errors should be handled differently than other Content Server errors. Users should be given the chance to correct their input without having to use the browser back button. Some progress has been made in this area (e.g., the login page), but the majority of forms still don't support friendly validation.

For more information on form validation see my blog post: Part IV: OpenText Content Server Forms.

# Wrapping Up

It's important not to overlook the significance of error handling. Errors will always happen, but with some care they can be controlled to minimise their impact.

Questions or comments? Please leave a comment below.