The Journey to retry_assert()

I was quite annoyed with bash (no, not because I was new to it; I had used it professionally, daily, for 12 years before starting NGS). One of the pain points was the constant need to reimplement small, silly things like a retry() function (or to copy it across projects and adjust it as needed).

It is obvious to me that retry() belongs in the standard library, alongside other frequently needed functions like log(), debug(), error(), exit(message), etc.

retry() background

In NGS, retry() has several named parameters. The most important of them is body: the code to run repeatedly until it succeeds. The body reports success or failure through its return value. A truthy value means success, and retry() finishes. A falsy value means failure, and retry() tries again (unless it has reached the maximum number of attempts).

On success, retry() returns the value from the body. In some cases this value is not needed, while in others it’s very handy and eliminates additional code.

On failure, retry() throws an exception.
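For readers coming from other languages, the contract described above can be approximated in Python. This is a minimal sketch, not the NGS implementation: the parameter names times, sleep and body mirror the NGS examples later in this post, while RetryFail is a hypothetical exception name chosen for illustration.

```python
import time


class RetryFail(Exception):
    """Hypothetical exception raised when all attempts fail (name is illustrative)."""


def retry(body, times=60, sleep=1.0):
    """Call body() up to `times` times until it returns a truthy value.

    Sketch of the contract described above, not the NGS source:
    - truthy return value: success, retry() returns that value;
    - falsy return value: failure, sleep and try again;
    - every attempt falsy: raise RetryFail.
    """
    for attempt in range(times):
        result = body()
        if result:
            return result  # success: hand the body's value back to the caller
        if attempt < times - 1:
            time.sleep(sleep)  # wait before the next attempt
    raise RetryFail(f"all {times} attempts failed")


# Usage: the body succeeds on its third call, and its value is returned.
attempts = []


def flaky_body():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


print(retry(flaky_body, times=5, sleep=0))  # prints: ready
```

Returning the body's value is what makes the "eliminates additional code" point work: the caller gets the successful result directly instead of fetching it again.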

assert() background

In NGS, the straightforward way to assert a condition is the assert() function, called either as assert(data, "message") or as assert(data, pattern, "message"). On success it returns the original data. It throws an AssertFail exception if data is falsy (first form) or if data does not match pattern (second form).
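The two call forms can be mimicked in Python, where assert is a keyword, so the hypothetical name assert_ is used instead. The matcher is a deliberate simplification and an assumption: a dict pattern matches when every key in the pattern is present in the data with a matching value (which is how the {'status': 'RUNNING'} examples in this post use patterns), and anything else is compared with ==. NGS patterns are considerably richer than this.

```python
class AssertFail(Exception):
    """Stands in for NGS's AssertFail in this sketch."""


_MISSING = object()  # sentinel: tells assert_(data, msg) apart from assert_(data, pattern, msg)


def matches(data, pattern):
    """Simplified matcher (assumption): dict patterns match key by key,
    recursively; any other pattern is compared with ==."""
    if isinstance(pattern, dict):
        return isinstance(data, dict) and all(
            k in data and matches(data[k], v) for k, v in pattern.items()
        )
    return data == pattern


def assert_(data, pattern_or_message, message=_MISSING):
    """assert_(data, message) or assert_(data, pattern, message).

    Returns data unchanged on success, so calls can be chained."""
    if message is _MISSING:
        # Two-argument form: data itself must be truthy.
        if not data:
            raise AssertFail(pattern_or_message)
    elif not matches(data, pattern_or_message):
        # Three-argument form: data must match the pattern.
        raise AssertFail(message)
    return data


session = {"id": "s-1", "status": "RUNNING"}
print(assert_(session, {"status": "RUNNING"}, "Session did not resume")["id"])  # prints: s-1
```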

Take 1: retry() handles exceptions

Originally, retry() caught exceptions thrown in body. The rationale was simple: you are trying something that might fail; that's the reason you are using retry() to begin with.

Then reality showed, as on many other occasions, that catching exceptions indiscriminately is a bad idea. Some exceptions are unrelated to the retry logic and should not be retried, much like some HTTP status codes (a 404 Not Found, for instance) signal that repeating the same request will not help.

retry(
  ...
  body = {
    1 / 0  # good luck retrying
  }
)
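The cost of the Take 1 behavior is easy to demonstrate. In this Python sketch (retry_catch_all is a hypothetical name, not an NGS function), a deterministic bug is retried as if it were a transient condition, burning every attempt before the real error finally surfaces as the cause of a generic failure.

```python
import time


def retry_catch_all(body, times=5, sleep=0.0):
    """Take 1 behavior, sketched: any exception counts as a failed attempt."""
    last_error = None
    for attempt in range(times):
        try:
            result = body()
            if result:
                return result
        except Exception as e:  # indiscriminate: bugs are "retried" too
            last_error = e
        if attempt < times - 1:
            time.sleep(sleep)
    raise RuntimeError(f"all {times} attempts failed") from last_error


calls = 0


def buggy_body():
    global calls
    calls += 1
    return 1 / 0  # a bug, not a transient condition: no retry will fix it


try:
    retry_catch_all(buggy_body, times=5)
except RuntimeError as e:
    print(calls, type(e.__cause__).__name__)  # prints: 5 ZeroDivisionError
```

With times = 60 and sleep = 1, as in the session example below, the same bug would take about a minute to report itself.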

Take 2: retry() doesn’t handle any exceptions

The swing in the other direction was also "reasonable": orthogonality is not a new idea either. retry() handles the retry logic; exceptions and errors are handled by the body itself or by something outside retry(). But this breaks code like the following:

retry(
  times = 60
  sleep = 1
  title = 'Waiting for session to be resumed'
  ...
  body = {
    get_session(_session.id).assert({'status': 'RUNNING'}, 'Session did not resume')
  }
)

assert() throws an exception, nothing catches it, and the program terminates on the first failed check instead of waiting for the session to resume. To fix the above, the code becomes:

retry(
  times = 60
  sleep = 1
  title = 'Waiting for session to be resumed'
  ...
  body = {
    get_session(_session.id) =~ {'status': 'RUNNING'}
  }
  fail_cb = {
    # all attempts failed
    throw Error('Session did not resume')
  }
)

… and that's just too verbose for what it does. Also, the error message is no longer adjacent to the pattern.
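Both halves of Take 2 can be reproduced in a short Python sketch. The retry() below lets exceptions from body escape untouched, and it accepts a fail_cb parameter mirroring the one in the NGS example. Two things are assumptions for illustration: that fail_cb's return value (if it returns at all) becomes retry()'s result, and the helper make_status_source, which stands in for repeated get_session() calls.

```python
import time


class AssertFail(Exception):
    pass


def retry(body, times=60, sleep=1.0, fail_cb=None):
    """Take 2 sketch: exceptions from body are NOT caught; only falsy results are retried.

    fail_cb runs once all attempts have failed (assumption: its result is returned)."""
    for attempt in range(times):
        result = body()  # an exception raised here escapes immediately
        if result:
            return result
        if attempt < times - 1:
            time.sleep(sleep)
    if fail_cb is not None:
        return fail_cb()
    raise RuntimeError(f"all {times} attempts failed")


def make_status_source(*values):
    """Hypothetical stand-in for successive get_session(...)['status'] lookups."""
    it = iter(values)
    return lambda: next(it)


# Broken: the assertion escapes on the very first attempt.
first = make_status_source("PAUSED", "RUNNING")


def body_with_assert():
    if first() != "RUNNING":
        raise AssertFail("Session did not resume")
    return True


try:
    retry(body_with_assert, times=3, sleep=0)
except AssertFail as e:
    print("gave up on first attempt:", e)

# Working but verbose: a plain comparison in body, the message moved to fail_cb.
second = make_status_source("PAUSED", "RUNNING")


def fail():
    raise RuntimeError("Session did not resume")


print(retry(lambda: second() == "RUNNING", times=3, sleep=0, fail_cb=fail))  # prints: True
```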

Take 3: retry() uses Result type

The idea was that body should return a subtype of Result: either a Success or a Failure instance. There were two issues, though.

  • This change breaks existing code. Supporting both arbitrary data returned by existing code and Result values would not lead to anything elegant.
  • It makes body inelegant and verbose: Result({ assert(...) })

Given these two issues, I abandoned this line of thought and did not implement the approach.
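To make the verbosity argument concrete, here is roughly what the rejected design could look like, sketched in Python. Success, Failure, to_result() and retry_result() are hypothetical names for illustration only; NGS never shipped this approach. Note how every body has to be wrapped (the analogue of Result({ assert(...) })), and how a body returning plain data is no longer recognized as a success.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class Success:
    value: Any


@dataclass
class Failure:
    error: Exception


Result = Union[Success, Failure]


def to_result(fn: Callable[[], Any]) -> Result:
    """Hypothetical wrapper: turn 'returns or raises' into an explicit Result."""
    try:
        return Success(fn())
    except Exception as e:
        return Failure(e)


def retry_result(body: Callable[[], Result], times: int = 60, sleep: float = 1.0) -> Any:
    """Retry until body returns Success.

    Plain truthy data from an existing body is not a Success, so it would be
    treated as a failure: the compatibility problem from the first bullet."""
    last = None
    for attempt in range(times):
        last = body()
        if isinstance(last, Success):
            return last.value
        if attempt < times - 1:
            time.sleep(sleep)
    cause = last.error if isinstance(last, Failure) else None
    raise RuntimeError(f"all {times} attempts failed") from cause


statuses = iter(["PAUSED", "RUNNING"])


def check():
    status = next(statuses)
    if status != "RUNNING":
        raise ValueError("Session did not resume")
    return status


# The wrapping the post calls verbose: lambda: to_result(check)
print(retry_result(lambda: to_result(check), times=3, sleep=0))  # prints: RUNNING
```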

Take 4: retry() with retry_assert()

The problem with using assert() in body (or any code that it calls) is semantic ambiguity:

  • a "normal" assertion, where a failure signifies an unexpected, likely permanent condition that will not change on retry.
  • an assertion about a temporary condition that we are waiting to change on one of the next retries.

The solution was to introduce retry_assert(), which is almost identical to assert(). The difference is that retry_assert() throws RetryAssertFail while assert() throws AssertFail.

The code becomes:

retry(
  times = 60
  sleep = 1
  title = 'Waiting for session to be resumed'
  ...
  body = {
    get_session(_session.id).retry_assert({'status': 'RUNNING'}, 'Session did not resume')
  }
)

… which is concise again, and the error message sits right next to the pattern the data is tested against. retry_assert() throws an exception that retry() catches.
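The split between the two assertion kinds comes down to which exception type is raised. The Python sketch below models only the three-argument form, with a simplified dict-subset matcher (an assumption; NGS patterns are richer). How RetryAssertFail relates to AssertFail in the NGS exception hierarchy is not covered in this post; the sketch deliberately keeps them as unrelated siblings, so a handler for one never catches the other.

```python
class AssertFail(Exception):
    """Assertion about a condition that is not expected to change."""


class RetryAssertFail(Exception):
    """Assertion about a temporary condition that retry() may wait out.

    Deliberately NOT a subclass of AssertFail in this sketch."""


def _matches(data, pattern):
    # Simplified matcher (assumption): dict patterns match key by key, anything else by ==.
    if isinstance(pattern, dict):
        return isinstance(data, dict) and all(
            k in data and _matches(data[k], v) for k, v in pattern.items()
        )
    return data == pattern


def _check(data, pattern, message, exc_type):
    if not _matches(data, pattern):
        raise exc_type(message)
    return data  # like assert(), hand the data back so calls can be chained


def assert_(data, pattern, message):
    """Permanent-condition check (hypothetical name; assert is a Python keyword)."""
    return _check(data, pattern, message, AssertFail)


def retry_assert(data, pattern, message):
    """Temporary-condition check: same logic, different exception type."""
    return _check(data, pattern, message, RetryAssertFail)


try:
    retry_assert({"status": "PAUSED"}, {"status": "RUNNING"}, "Session did not resume")
except RetryAssertFail as e:
    print(type(e).__name__, e)  # prints: RetryAssertFail Session did not resume
```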

Let's take a brief look at an excerpt from the retry() implementation in the standard library.

try {
  result = body()
} catch(ra:RetryAssertFail) {
  guard i < times - 1
  # Ignoring the assert as long as it's not last iteration
} catch(e:Exception) {
  logger("$title Exception $e")
  throw e
}

If "guard" succeeds (this is not the last attempt), we do nothing, and the loop around this code continues to the next iteration.

If "guard" fails, meaning we are at the last attempt, the exception falls through to the second "catch" clause, which logs it and rethrows it.

I'm fond of the fact that the implementation is simple and straightforward.
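For comparison, here is a rough Python rendering of the same loop. It is a sketch, not the NGS source; the names mirror the NGS excerpt, and the falsy-result handling comes from the retry() background section. One difference is forced by the language: NGS's guard lets the exception fall through to the next catch clause, while a Python exception raised inside an except block is never caught by a sibling clause, so the last-attempt branch logs and re-raises on its own.

```python
import logging
import time

logger = logging.getLogger("retry")


class RetryAssertFail(Exception):
    """Raised by retry_assert()-style checks for conditions worth waiting on."""


def retry(body, times=60, sleep=1.0, title="retry"):
    """Sketch of the excerpt above, not the NGS source.

    - truthy result: return it;
    - falsy result, or RetryAssertFail before the last attempt: try again;
    - RetryAssertFail on the last attempt, or any other exception: log and re-raise.
    """
    for i in range(times):
        try:
            result = body()
        except RetryAssertFail as ra:
            if i < times - 1:
                result = None  # like `guard i < times - 1`: ignore it and go around again
            else:
                logger.warning("%s Exception %r", title, ra)
                raise
        except Exception as e:  # unrelated exceptions are never retried
            logger.warning("%s Exception %r", title, e)
            raise
        if result:
            return result
        if i < times - 1:
            time.sleep(sleep)
    raise RuntimeError(f"{title}: all {times} attempts failed")


states = iter(["PAUSED", "PAUSED", "RUNNING"])


def body():
    status = next(states)
    if status != "RUNNING":
        raise RetryAssertFail("Session did not resume")
    return status


print(retry(body, times=5, sleep=0, title="Waiting for session to be resumed"))  # prints: RUNNING
```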

Future

While retry_assert() appears to be very useful and ergonomic, it is still new in NGS. Practice may show that some adjustments are needed.

The plan is to give more thought to retry()able vs. non-retry()able exceptions in general. In particular:

  • Can any code called from body throw RetryAssertFail? Should it?
  • Marking other exceptions as retry()able
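To make the second question more tangible, here is one possible direction, sketched in Python purely as an illustration: a hypothetical Retryable marker base class that exception types opt into. Neither the name nor the mechanism exists in NGS; ServiceUnavailable is an invented example of an error (say, an HTTP 503 response) that a caller might reasonably want retried.

```python
import time


class Retryable(Exception):
    """Hypothetical marker: exceptions deriving from this are worth retrying."""


class RetryAssertFail(Retryable):
    """retry_assert() failure, modeled here as just one kind of Retryable."""


class ServiceUnavailable(Retryable):
    """Invented example of another error type opting in to retries."""


def retry(body, times=60, sleep=1.0):
    """Retry on falsy results and on Retryable exceptions; anything else propagates at once."""
    for i in range(times):
        last_attempt = i == times - 1
        try:
            result = body()
        except Retryable:
            if last_attempt:
                raise  # out of attempts: surface the last retryable error
            result = None
        if result:
            return result
        if not last_attempt:
            time.sleep(sleep)
    raise RuntimeError(f"all {times} attempts failed")


calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ServiceUnavailable("503")
    return "ok"


print(retry(flaky, times=5, sleep=0))  # prints: ok
```

A marker class keeps the decision with whoever defines the exception; the alternative of passing a list of retryable types to each retry() call keeps it with the caller. Which one fits NGS better is exactly the open question.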

Please share your thoughts. It's easy to miss some other interesting perspectives. Have a better name for retry_assert()? Have a suggestion for improving its behavior? Know about similar functionality in other languages? I'll be glad to hear.

For the curious: why "was" annoyed with bash and not "am"? Well, I still am, but it hurts much less now, since most of the coding I previously did in bash is now done in NGS.
