The Case for Semantics in the Shell

Should the shell have semantic understanding of external programs that it runs? By “semantic” I mean specifically an “understanding” of inputs and outputs at a level that allows something smarter than byte and text manipulation.

My claim is that semantic understanding by the shell is necessary in order to be productive. Let’s delve into the reasons behind this perspective.

The shell has a few places where semantic understanding of external programs is possible (did I miss any?).

  1. Command line arguments. Semantic understanding here is already done, popular, and valued. Welcome to command line completion. Blows the “shell is not supposed to do that” argument out of the water.
  2. Input files and the standard input. I don’t have anything to say about this one at the moment because it was never a practical concern for me personally.
  3. Exit codes. Implemented in Next Generation Shell and shows good results. Opens the door for proper error handling as opposed to the atrocity we have today. More below.
  4. Output of programs – stdout. Very partially implemented in NGS, where it already improves ergonomics. When done fully, it at the very least opens the door for better completion and real interactivity. More below.
  5. Output of programs – stderr. It could, and probably should, be used to determine the error and throw an appropriate exception. I didn’t get to this yet so I can’t comment. Finer-grained exceptions sound good though, and more semantically correct.
  6. Output of programs – files. I didn’t get to this one either yet so can’t comment.

Viewpoint

The shell is a programming language. Whether it was supposed to be one, or how it became one, are irrelevant questions. De facto, it is used as a programming language today. Therefore, I’m comparing it to other programming languages. Side note: aside from a few domain-specific features, the shell doesn’t look good in comparison.

The shell is also a UI and therefore I compare it to other UIs. Side note: the upside of CLI is repeatability, simplicity, and textual representation while every other aspect is underwhelming in comparison.

Yes, I’m comparing the shell to modern programming languages and to modern UIs. That’s because this is not a historical exercise to explain why things are as they are, nor to justify anything. It’s a look at the direction in which we should move to be more productive and less frustrated.

I do have a lot of accumulated frustration with the programming language and the UI of the shell. In fact, that’s the reason I started working on Next Generation Shell in 2013. The frustration with classical shells is still there but I now use NGS for most of the scripting that I need. For my needs, it sucks way less than bash.

I’m also a bit tired of “use Python instead” and similar, where the ergonomics of working with external programs sucks.

Exit Codes

What do we Have Today?

Error handling in the shell is aligned with how C is handling errors. The reason that practically no modern programming language does that anymore is because this approach is fucked up. Here is why:

  1. Both in C and in the shell, semantic understanding of what’s an error and what’s not is not standardized across the language but rather scattered everywhere in your code. You can’t handle your errors in a generic way; you need to know what success looks like for each function that you call.
  2. Both in C and in the shell you can ignore errors. This will lead to execution of the rest of the program in an unexpected manner. You don’t want that. You are supposed to handle every error, which makes concise scripts impossible. In practice, for many of my scripts, I would rather have them fail with an exception and exit with a non-zero exit code than be verbose. With current shells that’s not an option.
  3. -e is an attempt to fix the situation. While I do use it, I need to periodically refresh my memory about all the edge cases. People will keep exploding on the -e bombs.
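
One of those edge cases, as a minimal sketch: -e is ignored for everything that runs as part of an if condition, including the body of a function called there. The names deploy and errexit_demo are made up for illustration.

```shell
# errexit_demo runs in a subshell, so `set -e` does not leak into the caller.
errexit_demo() (
  set -e

  deploy() {
    false            # fails, but -e is ignored while deploy runs as an `if` condition
    echo "deployed"  # still runs; its exit code becomes the exit code of deploy
  }

  if deploy; then
    echo "reported success"
  fi
)

errexit_demo
```

Both lines are printed even though false failed in the middle of deploy. The failure is silently swallowed, which is exactly the kind of bomb described above.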

Note that some of the modern shells and libraries continue this “agnostic” approach by pretty much ignoring exit codes. Even if they do convert between exit codes and exceptions, it’s the simplistic zero/non-zero exit code approach by default.

What Could we Have?

If the shell knew where errors occur, proper error handling, aligned with modern programming languages, would be possible: throw an exception or handle the error in some other way that is unified across the language. Semantic exit code handling is already implemented in Next Generation Shell.

Comparison

Please don’t beat yourself up about how many bombs you have in your shell scripts. With bash and the like, it’s inevitable.

# bash, no semantic understanding of exit codes

if grep -q str file; then
  # found
else
  # bomb - we are here because:
  # * str was not found or
  # * grep was unable to open the file
  # * grep had syntax error or
  # * grep had some other error
  # * we had typo in "grep" and it didn't run
  # * ???
fi


# NGS, semantic understanding of exit codes

if $(grep -q str file) {
  # found
} else {
  # not found
}
# no bomb, if anything goes wrong - it's an exception

Please let me know if you want me to elaborate about this comparison or something is not clear.

Output of Programs – stdout

What do we have today?

Classical shells, along with all modern shells that I’m aware of, completely ignore outputs of the programs they run.

Let’s ignore for a moment the consequence in classical shells, which is an unbounded mess of stdout and stderr of several programs mixed together in your terminal.

Let’s talk about “interactive” in the “interactive shell”. Most of the screen is treated like text printed on a paper – no interaction possible. The interaction happens mostly on one line, “the command line”.

What Could we Have?

Better completion. For example, when I’m typing aws ec2 stop-instances --instance-ids _ , which completion is more likely: an arbitrary EC2 instance, or one that was listed in the output of the previous aws ec2 describe-instances command (which used built-in filters or pipeline filters after it)?
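
A rough sketch of the idea in shell, assuming the output of the previous command was saved somewhere the completion code can read it (classical shells don’t do that for you). The function name instance_id_candidates is hypothetical, and recognizing IDs by their shape (i- followed by 8 or 17 hex digits) is only a stand-in for real semantic understanding.

```shell
# Reads text on stdin (for example, saved output of `aws ec2 describe-instances`)
# and prints the unique EC2 instance IDs found in it, one per line.
instance_id_candidates() {
  tr -cs 'a-z0-9-' '\n' |                       # split into tokens of lowercase letters, digits, hyphens
    grep -xE 'i-([0-9a-f]{8}|[0-9a-f]{17})' |   # keep tokens that look exactly like instance IDs
    sort -u
}
```

A completion function could offer these candidates first and fall back to listing all instances only when there are none.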

Interactive UI. Imagine this amazing technology which allows you to interact (click, right click, menu, etc.) with objects on the screen, you know, like web browsers have had for decades now. That requires semantic understanding of what’s on the screen of course, including which output can be used as input to which other programs.

Partial Implementation Today

.Reservations[].Instances – when you are tired of repeating this, you might want to try NGS, which understands, among other commands, aws ec2 describe-instances and returns an array of instances at the top level (only when you use the special syntax for run-and-parse). You are welcome.

Conclusion

Semantic understanding in the shell, while hard to implement, is a necessity if we would like to be more productive with the shell.

The “shell is not supposed to do that” argument is invalid. As elsewhere, semantics matters. The more semantic “understanding” a tool has, the more powerful that tool can be. The overwhelming majority of programmers don’t use Notepad (dogmatically, because “text editing must be pure”) but use IDEs. That’s because IDEs “understand” the programming languages, allowing higher-level manipulation such as refactoring, which can’t be done properly with sed, for example.


Update following Reddit discussion. 2023-05-31.

PowerShell

Interesting ideas, some of which seem to have appeared in PowerShell

u/gumnos

I am not an expert in PowerShell. Here is my best understanding and comparison of what’s in PowerShell to my vision of the shell UI.

PowerShell got Right

  • Flow of typed objects through the pipeline
  • Separating the filtering and formatting (aligned with the idea of typed objects). In Unix, compared to that, it is frequent to see both filtering and formatting of the output spread across many utilities.

Missing in PowerShell

PowerShell doesn’t take the typed objects (which are already there!) to the logical conclusion: interaction with them. PowerShell, like any other shell I have heard of, treats the output of the cmdlets as something not to be looked into. (I assume that if it had something like the interaction I’m talking about, we would see it all over the internet / YouTube, hence I don’t think I missed it.)

I’m not aware of any attempts to wrap existing programs in cmdlets and make typed data out of them. Note that you can, for example, parse CSV, but the output of that is PSCustomObject, which is structured data, not typed data in the sense of having the semantics of the object. I mean that if you would like to implement some interaction with that object, you wouldn’t know what it is.

Error Handling / grep

But the grep example is a poor choice, since POSIX grep does define those exception cases, so that rewrite would look something like

if grep -q '…' file.txt
then
  # success
else
  if [ #? -eq 1 ]
  then
    # not found
  else
    # error/exception whether a bad regex, file/input read-errors, etc
  fi
fi

u/gumnos

( assuming typo in #?, should have been $? )

My main point was not whether the exit codes are documented but rather that the shell doesn’t do anything with them.

Yes, you potentially could handle the errors in your code as above.

In practice I have seen the error handling as shown above exactly zero times. It’s too verbose and I estimate many will not even consider this (unaware of the issues). The shell “pushes” you to write incorrect code. Possible does not equal practical here. In NGS, the correct handling of errors is shorter and is the straightforward way. NGS “pushes” you to write correct code.

Specifically, the example above shifts the problem of if not being able to have 3 branches (yes, no, error) to another if. If you have a syntax error in the second if (ex: if [ $? --eq 1 ]), you end up in the second else, which is again a problem. The way to avoid this (in bash, not sure about other shells) is to use set -e and if [[ ... ]] (not in POSIX) to avoid getting into the else branch over a syntax error. Note that -e has its own corner cases which some people are not aware of and others are not particularly fond of, so overall it’s not a stellar feature either.
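
Another way to contain the three-way logic in plain shell is a per-program helper. This is a minimal sketch; grep_found is a made-up name, and such a helper has to be written for every program, which is exactly the work a semantic shell would take over.

```shell
# Hypothetical helper with three outcomes for grep -q:
#   returns 0 - found
#   returns 1 - not found
#   anything else (unreadable file, bad regex, grep missing) - print a message and exit
grep_found() {
  grep_found_status=0
  grep -q "$@" || grep_found_status=$?
  case "$grep_found_status" in
    0|1) return "$grep_found_status" ;;
    *)   echo "grep failed with exit code $grep_found_status" >&2
         exit "$grep_found_status" ;;
  esac
}
```

With it, if grep_found str file; then ...; else ...; fi has only “found” and “not found” branches, and any real error ends the script instead of landing in else.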

Semantics Support in Programs

That said, it’s not just the shell that would need to grow these semantic features, but also every shell utility ever developed that wants to play in this ecosystem.

u/gumnos

Until that happens, and if we are realistic it’s “if that happens”, the strategy is exactly like command line completion. We don’t expect the upstream to do it. Either the shell or a 3rd party can provide a wrapper/handler for a program. That handler will provide two (at the moment) main pieces of semantic information:

  1. The meaning of the exit code (currently: was that an exception or not)
  2. The meaning of output (parse the output, convert to typed data)

Implementation-wise, I was thinking about a repository with declarative information about exit codes and how to parse the output. I did no PoC on that and I’m not sure how practical it would be. It might be that only code will cut it. If the repository does work though, each shell can have its own implementation or, even better, use an external program that extracts the semantics from exit codes and the output.
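
To make the exit code half of that idea concrete, here is a minimal sketch. The function names ok_exit_codes and classify_exit and the table format are assumptions for illustration, not an existing repository or NGS code. The three listed programs are ones where POSIX defines exit code 1 as a non-error result.

```shell
# Hypothetical declarative table: for each program, the exit codes that are NOT errors.
# Programs that are not listed get the usual default: only 0 is fine.
ok_exit_codes() {
  case "$1" in
    grep|diff|cmp) echo "0 1" ;;   # POSIX: 1 means "no match" / "files differ"
    *)             echo "0" ;;
  esac
}

# Prints "ok" or "exception" for a program name and the exit code it returned.
classify_exit() {
  for code in $(ok_exit_codes "$1"); do
    if [ "$code" = "$2" ]; then
      echo "ok"
      return 0
    fi
  done
  echo "exception"
}
```

For example, classify_exit grep 1 prints ok while classify_exit grep 2 prints exception. The table is the part that could live in a shared repository; the lookup is trivial for each shell to reimplement.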

libxo and Friends

And over on FreeBSD at least, a number of utilities have grown libxo support, allowing rich machine-parsable output (XML, JSON, & HTML in addition to text) to convey metadata downstream in the pipeline.

u/gumnos

I welcome tools like libxo and jc. On the quest for a semantic UI, they solve half of the problem. I’m really happy when somebody eliminates for me the need to parse text. print $8 in awk, which is a symptom of a complete absence of understanding, is solved by these tools. We are in much better shape and are able to perform operations on the structured data which comes out. That’s great!

Why is parsing text only half of the problem then? PSCustomObject. That’s where these tools get us. We understand which fields we have but not what the object or its fields mean. You can’t have right click and a menu with object-specific operations if you don’t know what the object is.

Example: is the string i-123456 just a string or an instance ID in EC2? The parsers won’t tell you this.

The semantic wrappers/helpers are still needed but much of the dirty and tedious work can be outsourced to these tools. For that I’m very grateful.
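
As a minimal sketch of what such a wrapper adds on top of a parser: the semantic type comes from knowing which command produced a field, not from the value itself. The function name semantic_type and the type names are hypothetical; InstanceId and ImageId are fields in the output of aws ec2 describe-instances.

```shell
# Hypothetical mapping from (command, field name) to a semantic type.
# A parser gives you the field name and value; this table says what the value is.
semantic_type() {
  case "$1:$2" in
    "aws ec2 describe-instances:InstanceId") echo "ec2-instance-id" ;;
    "aws ec2 describe-instances:ImageId")    echo "ec2-image-id" ;;
    *)                                       echo "unknown" ;;
  esac
}
```

With this, semantic_type "aws ec2 describe-instances" InstanceId prints ec2-instance-id, which is the piece of information a UI would need in order to offer instance-specific operations.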

