The Shell vs the Web

Context

I’m working on the UI of Next Generation Shell. While working on it, I’m comparing the AWS Web Console to the shell with the AWS CLI. In particular, I’m looking at UX. The aim of this comparison, and of the steps that will follow, is to design the most effective UI that I can think of.

The Obvious

Starting with the obvious to get it out of the way.

The Web Console

Clicks have no (customer-usable) record, hence zero repeatability. OK, almost zero; I don’t count going to CloudTrail and figuring out what happened. Example: navigating from a failed CodePipeline build to the actual error somewhere in the log is repetitive but can’t be recorded/replayed.

IaC took over and “you are not supposed to” create resources through the Web. The reasons for that are all over the internet, so I’m not repeating them.

Assuming IaC, what’s the typical use of the AWS Management Console? Debugging (i.e. navigating through resources), discovering, and small scale tinkering.

AWS CLI

AWS CLI exposes all the APIs, therefore it can be used to do anything. Since it’s a CLI, it’s scriptable and therefore repeatable (with some investment).
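
For example, an interactive session can be captured as a script and replayed identically later. A minimal sketch — the tag value is made up, and the `aws` shell function is a stub standing in for the real AWS CLI so the snippet runs without credentials (`--filters` and `--query` are real AWS CLI syntax):

```shell
#!/bin/sh
# stop-test-instances.sh -- saved once, replayed any time: that's the repeatability.

# Stub standing in for the real AWS CLI so the sketch runs without credentials;
# delete this function to run against a real account.
aws() {
  printf 'i-0123456789abcdef0\n'   # canned result of the query below
}

# Find instances by tag (hypothetical tag value "test01").
ids=$(aws ec2 describe-instances \
  --filters 'Name=tag:Name,Values=test01' \
  --query 'Reservations[].Instances[].InstanceId' \
  --output text)
echo "Would stop: $ids"
# aws ec2 stop-instances --instance-ids $ids   # the real (destructive) step, kept commented
```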

The UX of AWS CLI for interactive scenarios is apparently not a priority. You can reach this conclusion just by briefly looking at the tabular output.
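
To illustrate: the table mode is wide and hard to post-process, and getting readable output takes effort (a hand-written JMESPath query). A sketch — the `aws` function below is a stub emitting roughly what each output mode looks like, so the snippet runs without an account; `--output` and `--query` are real AWS CLI flags:

```shell
#!/bin/sh
# Stub standing in for the real AWS CLI; delete it to run against a real account.
aws() {
  case "$*" in
    *'--output table'*) cat <<'EOF'
---------------------------------------------------
|                DescribeInstances                |
+------------------------+------------------------+
|  i-0123456789abcdef0   |  running               |
+------------------------+------------------------+
EOF
    ;;
    *) printf 'i-0123456789abcdef0\trunning\n' ;;
  esac
}

# The tabular experience: an ASCII table, hard to read and harder to script against.
aws ec2 describe-instances --output table

# Usable output takes effort: a hand-written JMESPath query.
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
  --output text
```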

A Deeper Look

Most SDK (API) calls have roughly zero semantics beyond basic CRUD. As a consequence, the AWS Console:

  • Issues more than one API (SDK) call to display a single page, then aggregates the data.
  • Has filtering defaults. Example: show only active CloudFormation stacks, not all of them.
  • Has field defaults. Example: EC2 instances have far more fields than a human can comfortably view and process; the default is to display a predefined subset.
  • Has wizards for setups that would otherwise be cumbersome.
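
As one concrete example, replicating the Console’s “active stacks only” default on the CLI requires the user to carry the semantic knowledge themselves. A sketch — `--stack-status-filter` is a real flag of `aws cloudformation list-stacks` with real status values; the stack name is made up and the `aws` function is a stub so the snippet runs without credentials:

```shell
#!/bin/sh
# Stub standing in for the real AWS CLI; returns one canned, abridged stack summary.
aws() {
  echo '{"StackSummaries":[{"StackName":"prod-vpc","StackStatus":"CREATE_COMPLETE"}]}'
}

# The Console hides deleted stacks by default. On the CLI, the user must know
# and spell out the status whitelist themselves.
aws cloudformation list-stacks \
  --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE UPDATE_ROLLBACK_COMPLETE
```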

[Screenshot: sample AWS Console page (censored) showing multiple (in this case, 1 + N) API calls.]

What’s common to all these features of the Console? Semantic understanding, formed as a layer above the API.

While NGS provides the facility to display clickable objects on the screen, the originally planned naive approach won’t work. The plan was to semantically understand the output and generate the next command when the user interacts with an object (click, open menu, etc.) displayed on the screen. That won’t work because the “next command” is often not a single command. The updated vision is deeper semantic involvement, on par with the Web UI. It’s harder to implement, but it also means the potential productivity gain is even higher than originally expected.

Conclusion

Quickly checking a CodePipeline failure, for example, is currently far more convenient in the Web UI because of the semantics, which the Web UI is aware of and the AWS SDK and the shell are not. Side note: the Web UI sucks for other reasons; maybe that’s a topic for another post.
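
The CLI version of that check looks roughly like the walk-through below. The subcommands are real AWS CLI; the pipeline, build, and log names are made up, and the `aws` function is a stub with abridged canned JSON so the sketch runs without an account. Note how each hop requires the user to read JSON and re-type identifiers — the semantics live entirely in the user’s head:

```shell
#!/bin/sh
# Stub standing in for the real AWS CLI; delete it to run against a real account.
aws() {
  case "$2" in
    get-pipeline-state) echo '{"stageStates":[{"stageName":"Build","latestExecution":{"status":"Failed"}}]}' ;;
    batch-get-builds)   echo '{"builds":[{"logs":{"groupName":"/aws/codebuild/my-project","streamName":"abc123"}}]}' ;;
    get-log-events)     echo '{"events":[{"message":"ERROR: compilation failed"}]}' ;;
  esac
}

# Step 1: which stage/action failed?
aws codepipeline get-pipeline-state --name my-pipeline
# Step 2: where is that build's log?
aws codebuild batch-get-builds --ids my-project:0000
# Step 3: finally, the actual error.
aws logs get-log-events \
  --log-group-name /aws/codebuild/my-project \
  --log-stream-name abc123
```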

In order for the shell to compete with the Web UI, it must be at least as useful and as powerful. This kind of power can only come from semantic understanding of the manipulated objects. A pub/sub architecture (currently being considered) would allow a composable UI, where different pieces of information and functionality come from different plugins.

The role of the shell is to provide the infrastructure where all this semantic processing (“understanding” and displaying) can happen, like a browser that shows pages authored by 3rd parties.

The hope is to provide a UI which is as usable as the Web while also having the strengths of the shell.

The Plan

  • bring the quality of (the good parts of) Web experience to NGS
  • combine it with repeatability of the shell
  • embrace semantics as the way to productivity

Questions

Is your shell all about AWS?

No. It’s just the first “tenant” as that’s what I use a lot at work. When working on NGS, I typically prioritize features needed at work.

Objections

By being involved in the semantics, you are limiting what can be done in your shell.

In a well-designed system, you get a boost when working with objects that the system knows about, but that doesn’t preclude you from working with anything else; it will just be harder, like today. A sample of good layered design: AWS CDK with its L1, L2, and L3 constructs.

The shell is not supposed to do that.

I hope the argument about command line completion serves as an answer here.

You can do all this in the shell, just different. Write scripts.

The idea is to bring the quality of shell interaction up to par with other technologies that have been out there for decades, not to give up on the shell and compromise on a subpar “interactive” experience.

The question is not whether something can or cannot be done but how productive doing it would be. That’s why developers use IDEs rather than Notepad and prefer higher-level programming languages. It’s all about semantics again: an IDE “knows” the programming language and is therefore more helpful; a higher-level programming language has “more semantics” per amount of code.

Response to u/m-faith

It’s so frustrating having to do:

  1. Run command (and get useless blob of text printed to the terminal)
  2. Utilize/expend X amount of brain power reading and interpreting the blob of text
  3. Run subsequent command, having to retype some of the characters from the previous command’s output in order to provide an argument to this command

Yes, this UX of current shells is one of the main reasons behind developing NGS.

Note that copying parts of the output of the previous command to construct the next command is not implemented in the shell; it’s implemented in the terminal. The shell doesn’t look at the output at all.

It’s amazing to me how many users and authors of other shells are content with this fundamentally broken UX. The ability to do better has existed since the 1970s, when terminals gained the capability to move the cursor; Bill Joy responded shortly afterwards with vi. The shell never embraced the idea beyond some very limited use in command line editing and completion.

…instead of:

  1. Run command… and get DATA!!!
  2. Run subsequent now more easily!

Yes, please! Note how far current shells are from this: to get data, the shell would first need to at least look at the output, and none of the shells do that. They just dump stdout and stderr to your terminal, in unlimited amounts, potentially mixed with the output of background processes, essentially turning your terminal into a dumpster.

The plan for NGS is:

  1. Look at the output data.
  2. Allow plugins to parse it and provide semantic meaning (until collaborating utilities provide semantically meaningful output directly).
  3. Allow plugins to help construct a menu from which the user picks the next command. All proposed commands are based on understanding the output of the previous command.
  4. After the user makes a selection, present the choice on the screen. Example: “You have chosen to shut down EC2 instance i-123456.” (Not a repeatable command, because it’s too specific.)
  5. Have a framework for automation/scripting that would allow modifying/generalizing the above into, for example:
    • You have chosen to shut down the most recently launched EC2 instance.
    • You have chosen to shut down the EC2 instance with tag “Name” being “test01”.
    • You have chosen to shut down the EC2 instance that was launched by user “developer-that-forgot-test-instance-running”.
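
Each of those generalizations can be approximated by hand today — e.g. “most recently launched” becomes a JMESPath sort that the user must write and remember. A sketch — `sort_by` and `&LaunchTime` are real AWS CLI / JMESPath syntax; the instance id is made up, and the `aws` function is a stub so the snippet runs without credentials:

```shell
#!/bin/sh
# Stub standing in for the real AWS CLI; canned result of the query below.
aws() {
  printf 'i-0aaabbbcccdddeee1\n'
}

# "Most recently launched instance", hand-written in JMESPath: flatten all
# instances, sort by LaunchTime, take the last one.
latest=$(aws ec2 describe-instances \
  --query 'sort_by(Reservations[].Instances[], &LaunchTime)[-1].InstanceId' \
  --output text)
echo "You have chosen to shut down EC2 instance ${latest}"
# aws ec2 stop-instances --instance-ids "$latest"   # destructive step, kept commented
```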

Have a nice day!
