Predicates, Upgraded

This post describes the thought process and implementation of moving from predicates to patterns while keeping this feature consistent with the rest of the language and compatible with existing code.

Minimalistic example:

filter(instances, F(i) i.state == "running") # old
filter(instances, {"state": "running"})      # new

Background

While implementing the language, I’ve added higher order functions that seemed “obvious” to have in a language:

  • first(list, predicate) – first element passing the predicate test
  • filter(list, predicate) – only elements passing the predicate test
  • reject(list, predicate) – only elements failing the predicate test
  • the_one(list, predicate) – added a bit later, returns the only element that is satisfies the predicate, throws exception if there isn’t exactly one element passing the predicate test, it was later pointed out to me that I was not the first that noticed the use case.

Observation

What I’ve noticed after adding the higher order functions that take predicates is that in many cases, the predicate functions are similar in structure. Example:

filter(instances, F(i) i.state == "running")

The predicate checks the element against a pattern. That’s how I’ve arrived at what one can call the “classical” pattern matching, known for decades.

Design

For the curious, there is earlier design document which corresponds somewhat to the implementation. It has “Prior Art” section which you might find interesting.

Implementation

This should be the interesting part. How to organically absorb a known technique, pattern matching, into a new programming language.

Multiple Dispatch

NGS uses multiple dispatch all over. Therefore filter(list, predicate) becomes filter(list, pattern) as opposed to for example adding a new function filter_pattern(list, pattern).

Higher Order Functions

Switch from Predicate to Pattern – Take 1

In my first attempt to upgrade from predicates to patterns, I’ve modified the implementation of higher order functions as follows:

predicate = Pred(pattern) # added

# ... the check stays the same
predicate(element)

That seemed logical at the time. You get a pattern, convert it to predicate using Pred() function and continue the business as usual. This felt aligned with the functional programming style. I wish I would have documented why exactly it was wrong. I vaguely remember that keeping the pattern was somehow better due to semantic: it’s a pattern, keep it as a pattern. I also do remember that the code was looking bad. I’ll update this post if the specifics come back to me.

Switch from Predicate to Pattern – Take 2

The (cleaner) implementation inside the higher order functions was changing from checking whether an element satisfies the predicate to checking whether the element matches against the given pattern:

# Old check
F higher_order(list, predicate) {
  ...
  predicate(element)
  ...
}

# New check
F higher_order(list, pattern) {
  ...
  element =~ pattern
  ...
}

“Filling The Matrix”

What is filling the matrix? In context of multiple dispatch, for lack of proper known terminology, “filling the matrix” is a term that a coined. It means that if for a particular method, if a combination of arguments of particular types makes sense – this method should be implemented for this types. The programmer should not be surprised by “hmm… I am using this method for other types, why doesn’t it work for these types?”

In order to fill the the matrix for the filter() method, we should ask what filter(list, predicate) should do then?

filter(list, predicate), specifically where the predicate is a function, should continue working exactly as before because (1) it’s the “obvious” behavior and (2) it’s compatible with existing code. This leads to a design decision that the predicate is a valid subset of pattern, which again plays nicely with multiple dispatch. The implementation is straightforward:

# Implementation, simplified for brevity
F =~(x, f:Fun) f(x).Bool()

# Usage:
some_val =˜ pred # Same as pred(some_val).Bool()

Consistency

Patterns are recursive, as in many (all?) other implementations. A pattern works (with rare exceptions) exactly the same whether it’s at the top level or it’s nested.

The Patterns

Sample patterns examples follow:

# A type
filter([1, 2, "abc"], Int)  # [1,2]

# Escaping - "Lit"eral
filter([1, 2, "abc"], Lit(Int))  # []
filter([1, 2, Int], Lit(Int))  # [<Type Int>]

# Literal
filter([1, 2, "abc"], 2)  # [2]

# Hash (with nested literal)
filter([{"a": 1}, {"b": 2}], {"a": 1})  # [{"a": 1}]

# Hash (with nested type)
filter([{"a": 1}, {"b": 2}], {"a": Int})  # [{"a": 1}]

# Regex
filter(["ab", "cd"], /^a/)  # ["ab"]

# Repeat
[1,2,3] =~ Repeat(Int)  # <MatchSuccess ...>

Result

As expected from any similar pattern matching facility, the result is a powerful and concise way to check whether given data matches a particular pattern.

Example from JSON-RPC implementation in NGS:

VALID_JSON_RPC_REQUEST = {
	'jsonrpc': '2.0'
	'method': Str
	'id': IfExists(AnyOf(Num, Str, Null))
	'params': IfExists(AnyOf(Arr, Hash))
}
...
if (req !~ VALID_JSON_RPC_REQUEST) send_error(...)

Notes:

VALID_JSON_RPC_REQUEST, the variable storing the pattern, is a hash. Therefore the expected data structure in req is a hash (JS object).

IfExists pattern means that the associated key does not have to exists but if it does, the value associated with the key must match the provided pattern.

Now imagine the above example using the “classical” predicates only.

Difference From Other Languages

In NGS, as of now, the pattern is a data structure, not a syntactic construct. As you’ve seen in the example above, the pattern is just an expression which can be stored in a variable and passed to a function. In Python for example, the pattern is a syntactic construct, which precludes storing and passing (usefulness of which could be argued but at least it keeps the language simple).

Future Work

  • Big missing chunk of functionality is capturing the values. This might require a new syntax, specific to patterns or more general (so it could be useful for other features too)
  • Fix Repeat() – it should match sub-sequences, not the whole list. Edit: repeat has proven useful as it is. Matching sub-sequences is likely to be implemented if Pfx(), Ifx(), and Sfx().
  • Something like Check(func, pattern). It would run func on current subject at check the output of func against the pattern. For example, one could then filter by length: my_arr_of_strings.filter(Check(len, 3)).

Conclusion

NGS has implemented underappreciated and underused concept of pattern matching, tailored to the language. The benefit is clarity and brevity of code. NGS will be extending the pattern matching facility.


Have a nice day!

Leave a comment