Reading the readme, I find myself wondering what problems this solves for. The e...

andyk · on June 4, 2024

Hey, project lead here. I had a very specific use case in mind: I’m playing with using LLM agent frameworks for software engineering - like MemGPT, swe-agent, LangChain and my own hobby project called headlong (https://github.com/andyk/headlong). Headlong is focused on making it easy for a human to edit the thought history of an agent via a webapp. The longer term goal of headlong is collecting large-ish human curated datasets that intermix actions/observations/inner-thoughts and then using that data to fine-tune models to see if we can improve their reasoning.

While working on headlong I tried out and implemented a variety of ‘tools’ (i.e., functions) like editFile(), findFile(), sendText(), checkTime(), searchWeb(), etc., which the agents call using LLM function calling.

A bunch of these ended up being functions that interacted with an underlying terminal. This is similar to how swe-agent works actually.

But I figured instead of writing a bunch of functions that sit between the LLM and the terminal, maybe let the LLM use a terminal more like a human does, i.e., by “typing” input into it and looking at snapshots of the current state of it. Needed a way to get those stateful text snapshots though.
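
For anyone wondering what "typing into a terminal and reading back what it shows" looks like at the lowest level, here is a minimal Python sketch using only the standard library's `pty` module. It is not ht's API: `run_in_pty` is a hypothetical helper, and it returns the raw byte stream (escape sequences included) rather than a rendered screen. Turning that stream into a stateful text snapshot is the part ht is meant to provide.

```python
import os
import pty
import select
import subprocess


def run_in_pty(argv, keys, timeout=2.0):
    """Start argv on a pseudo-terminal, "type" keys into it, return the raw output bytes.

    Hypothetical helper for illustration only; no terminal emulation is done here.
    """
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child keeps its own copy of the slave end

    os.write(master_fd, keys)  # this is the "typing"

    chunks = []
    while True:
        ready, _, _ = select.select([master_fd], [], [], timeout)
        if not ready:
            break  # nothing new for `timeout` seconds
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break  # Linux reports EIO once the child has closed the terminal
        if not data:
            break
        chunks.append(data)

    if proc.poll() is None:
        proc.terminate()
    proc.wait()
    os.close(master_fd)
    return b"".join(chunks)


if __name__ == "__main__":
    # Type a line into `cat`, then Ctrl-D (0x04) so it exits.
    print(run_in_pty(["cat"], b"hello\n\x04"))
```

The output of a full-screen program run this way is mostly cursor-movement and colour escape codes, which is why a plain byte stream is not enough for an agent to "look at" the screen.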

I first tried using tmux and also looked to see if any existing libs provide the same functionality. Couldn’t find anything so teamed up with Marcin to design and make ht.

Playing with the agent using the terminal directly has evolved into a hypothesis that I’ve been exploring: the terminal may be the “one tool to rule them all” - i.e., if an agent learns to use a terminal well it can do most of what humans do on our computers. Or maybe terminal + browser are the “two tools to rule them all”?

Not sure how useful ht will be for other use cases, but maybe!

MobiusHorizons · on June 4, 2024

This makes a lot of sense. I would call that out, because it's really surprising out of context. Hopefully you can see how unusual it is to drive human interfaces from code when, in at least the majority of cases, a programmatic interface already exists for the task and would be much less bug-prone and finicky. I guess the analogy would be choosing to use WebDriver to interact with a service that already has an API.

andyk · on June 4, 2024

done! in the "Alternatives and related projects" section I just added to the ht readme -- https://github.com/andyk/ht/blob/main/README.md#alternatives...

loa_in_ · on June 5, 2024

I'm surprised you didn't have luck with tmux, as there's a built-in buffer-to-file command.
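
For reference, a rough sketch of the tmux route in Python. `send-keys` and `capture-pane -p` are real tmux subcommands (`capture-pane -p` prints the pane to stdout; `save-buffer` is the variant that writes a paste buffer to a file). The helper names and the session name `agent` are made up for illustration, and this assumes a tmux server with that session is already running.

```python
import subprocess


def send_keys_cmd(target, text):
    """Build the tmux command that types `text` into a pane and presses Enter."""
    return ["tmux", "send-keys", "-t", target, text, "Enter"]


def capture_cmd(target):
    """Build the tmux command that prints the visible pane contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target]


def type_and_snapshot(target, text):
    """Type a line into the pane and return what the pane shows right afterwards.

    The snapshot is taken immediately, so slow commands may not have
    produced their output yet; a real caller would wait or poll.
    """
    subprocess.run(send_keys_cmd(target, text), check=True)
    result = subprocess.run(capture_cmd(target), check=True,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Assumes something like `tmux new-session -d -s agent` was run beforehand.
    print(type_and_snapshot("agent", "ls"))
```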

mariocesar · on June 4, 2024

I had this issue of needing to control Docker containers in a VPS, without sharing access to the server itself. It seems like it will be easy to create a simple web service that can communicate with the ht API, list my containers, show me the stats, and restart containers if I want to. I can manage all security in the web service.

This could be a nice case.

shanemhansen · on June 4, 2024

That would work but wouldn't making http requests to the docker socket itself be a little easier?

Example:

    curl --silent --unix-socket /run/user/1000/docker.sock http://v1.41/version

From: https://dev.to/smac89/curl-to-docker-through-sockets-1mhe

You could even do something like a reverse proxy to very limited paths although I tend to think that would ultimately be a bad idea and making your own http calls is probably better.

mbreese · on June 4, 2024

You probably don’t want to expose that service to the internet…

I see this as something like the console management for VPS. Back in the day, I remember reading about how prgmr.com had set up a console that you’d directly SSH into. That’s now this interface [1] (and a company name change), but I could see how programmatically working with this would be helpful.

[1] https://tornadovps.com/documentation/vps-console

johnmaguire · on June 4, 2024

The comment you're replying to mounted the Docker daemon as a local socket, accessible only on the machine. (It exposes an HTTP server still.)

I don't see why one would be any more comfortable exposing a shell to the internet than the Docker daemon. It grants _more_ capabilities. Either should likely be protected by authentication.

mbreese · on June 4, 2024

My understanding was that there was a server running Docker containers where the admin wanted to allow others to control/start containers without giving them access to the machine through a local login. The idea proposed was to make the docker port accessible to the outside world (authenticated, somehow).

I'm not sure I'd want to expose the Docker port to the outside world (or outside of a strictly firewalled subnet). Even if it is wrapped, this seems too dangerous to me.

The service I talked about is not a shell. It's a command line program that operates as the shell when you log in via SSH. Instead of bash/zsh/etc, this program runs instead. The purpose is to give the VM admin access for out-of-band management (serial console, reinstalling the OS, etc). I'm a big fan of this approach, where you don't necessarily get full access to the host, but you do have enough access to do the work that's needed (and still SSH encrypted). No more, no less. To me, this seems like a great approach for something like restricted VPS or container admin.

I ended up doing something similar for an SSH jump box a few jobs back where you could set up some basic admin things (like uploading SSH keys) using a CLI program that was used as an SSH shell.

To bring it back to the original post -- like others, I had a hard time seeing what the OP could be used for. Until I thought about this OOB CLI. It would be great for scripting access to something like this.

shanemhansen · on June 4, 2024

I think I was very unclear. I didn't mount anything, the docker daemon by default is accessible over http (over a unix domain socket rather than the typical TCP).

I was proposing that this person's web app not do any sort of subprocess automation via something like ht, and instead take in requests and talk to the docker daemon on the client's behalf, since that allows any sort of authentication or filtering that needs to happen.
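
A rough sketch of that idea in Python, standard library only. The three allowed routes are the Docker Engine API endpoints for listing containers, one-shot stats and restart. The helper names, the allowlist itself and the default socket path (copied from the curl example above) are assumptions for illustration, and authentication of the caller is left out entirely.

```python
import http.client
import re
import socket

# Hypothetical allowlist: only the operations the web service should expose.
ALLOWED = [
    ("GET", re.compile(r"/containers/json")),
    ("GET", re.compile(r"/containers/[A-Za-z0-9_.-]+/stats")),
    ("POST", re.compile(r"/containers/[A-Za-z0-9_.-]+/restart")),
]


def is_allowed(method, path):
    """Return True only for method/path pairs on the allowlist."""
    return any(m == method.upper() and rx.fullmatch(path) for m, rx in ALLOWED)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_request(method, path, socket_path="/run/user/1000/docker.sock"):
    """Forward one allowlisted request to the Docker daemon; returns (status, body)."""
    if not is_allowed(method, path):
        raise PermissionError(f"{method} {path} is not on the allowlist")
    # The stats endpoint streams by default; ask for a single sample instead.
    target = path + "?stream=false" if path.endswith("/stats") else path
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method.upper(), target)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

The web app would call `docker_request("GET", "/containers/json")` after doing its own authentication, and anything not on the list is rejected before the daemon is ever contacted.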

I wasn't really seriously proposing the straight reverse proxy setup. That's one of those layer violations like PostgREST that is either genius or lunacy. I haven't figured out which one.

woodrowbarlow · on June 4, 2024

to me this seems essentially like 'screen/tmux without the multiplexing features' which is useful because most of us do 'terminal multiplexing' via our window manager and we're really just using screen because we want to detach the process from the terminal session (e.g. as a glorified nohup wrapper). another similar tool is `nq`.

ComputerGuru · on June 5, 2024

That’s “most of us who aren’t ssh’d into a remote machine” because ssh once and tmux thereafter beats ssh’ing in every window/tab.

andrewshadura · on June 4, 2024

Try dtach

hitchstory · on June 4, 2024

I wrote an integration testing framework which I wanted to integrate with a tool exactly like this so it could be used to, e.g. test a command line app like vim.

Expect is what I tried to integrate with first. It falls over quite quickly with any kind of app that does anything mildly complicated with the terminal.

andyk · on June 4, 2024

Interesting. When we decided to build ht we didn't compare it to expect (which I hadn't heard of or used) but I'm comparing the two now as they seem related.

How exactly did `expect` fall over?

From what I can tell, expect does not provide the functionality of a stateful terminal server/client under the hood for you so it isn't as easy to grab "text" screenshots of a Terminal User Interface, which is one of the main motivations behind ht (will update the readme to make this main use-case more clear)

hitchstory · on June 9, 2024

Screenshots were one thing that didn't work that I needed, but I think lots of control characters used by command line apps also messed up pexpect.

I built my own probably not very good equivalent of your thing called icommandlib. I'm going to investigate ripping it out and replacing it with your tool.

colinsane · on June 4, 2024

here's a terrible script which runs as root on all my boxes (as `redirect-tty /dev/tty1 unl0kr`): https://git.uninsane.org/colin/nix-files/src/commit/9189f18c...

none of the Linux greeters meet all my needs, so i fall back to `login`. but i still need a graphical program for actually entering in my password -- particularly because some of my devices don't have a physical keyboard (i.e. my phone). so i take the output of a framebuffer-capable on-screen-keyboard [1] and pipe that into `login`. but try actually doing that. try `cat mypassword.txt | login MobiusHorizons`. it doesn't work: `login` does some things on its stdin which only work on vtty. so instead i run login on /dev/tty1, and pipe the password into /dev/tty1 for the auth.

yes, this solution is terrible. a lot of things would make it less terrible. i could fix one of the greeters to work the way i need it (tried that). i could patch `login` (where it probably won't ever be upstreamed). i could integrate the OSK into the same input system the ttys use... or i could reach for `ht`. everything except the last one is a day or more of work.

1: https://gitlab.com/postmarketOS/buffybox/-/tree/master/unl0k...

tstack · on June 4, 2024

As others have kinda alluded to, it could be useful for testing TUI applications. I develop a logfile viewer for the terminal (https://lnav.org) and have a similar application[1] for testing, but it's a bit flaky. It produces/checks snapshots like [2]. I think the problems I run into are more around different versions of ncurses producing slightly different outputs.

[1] - https://github.com/tstack/lnav/blob/master/test/scripty.cc [2] - https://github.com/tstack/lnav/blob/master/test/tui-captures...

wolrah · on June 4, 2024

The immediate thought I had upon reading the description was "this would be great for Minecraft servers".

Most of us running Minecraft servers on Linux have it wrapped in screen or tmux because the CLI is the only way to issue certain commands including stopping it properly.

This could provide an alternative.

Xelynega · on June 5, 2024

What I typically do is create a systemd service for game servers and attach a TTY. That way it starts with the rest of my web services, and Linux already handles "i/o to processes" via files that other processes can access (e.g. /run/minecraft/{stdin,stdout,stderr}).

wolrah · on June 5, 2024

Would you mind expanding on that, or can you point me at some relevant documentation?

I've never seen a systemd service example for Minecraft which allowed for sending commands to the server CLI and seeing the result without involving screen/tmux/etc. The top result on Google just doesn't allow command input at all, running the service "headless", the one on the official MC wiki uses screen, and the only other options I've seen use RCON which is neither secure nor does it show the responses you'd get on the MC console.

If there's a way to run just the straight Minecraft JAR as a background service and still be able to interact with it in the occasional cases where I need to I'm very interested.

Xelynega · on June 6, 2024

Oh yea one more comment, this stdin redirection isn't really necessary in minecraft from the last decade.

The Minecraft server has a built-in RCON server running on a separate port that can be enabled (https://wiki.vg/RCON), and once enabled it can be interacted with using an RCON client (like https://github.com/Tiiffi/mcrcon).

So instead of redirecting stdin to a systemd process, you can also just leave stdin disconnected and use the built-in RCON server to do commands every so often.
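
For a sense of how small the RCON protocol is, here is a sketch of the packet framing in Python, following the layout described on the wiki.vg page linked above (little-endian length, request id and type, then a null-terminated payload plus one padding byte). The function and constant names are made up for illustration, and the login/command exchange over the TCP socket is left out.

```python
import struct

TYPE_LOGIN = 3      # first packet sent, payload is the RCON password
TYPE_COMMAND = 2    # payload is the console command
TYPE_RESPONSE = 0   # server reply to a command


def encode_packet(request_id, packet_type, payload):
    """Frame one RCON packet: length, request id, type, payload, two null bytes."""
    body = (struct.pack("<ii", request_id, packet_type)
            + payload.encode("utf-8")
            + b"\x00\x00")
    # The length field counts everything after itself.
    return struct.pack("<i", len(body)) + body


def decode_packet(data):
    """Inverse of encode_packet for a single complete packet."""
    (length,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    payload = data[12:4 + length - 2].decode("utf-8")
    return request_id, packet_type, payload


if __name__ == "__main__":
    packet = encode_packet(1, TYPE_COMMAND, "list")
    print(packet.hex(), decode_packet(packet))
```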

Xelynega · on June 6, 2024

Basically, you set up the standard minecraft service and then create a "socket" in systemd to use as stdin for the process (relevant documentation in systemd.socket and systemd.exec).

For me this looks like

-- /etc/systemd/system/minecraft.service --

  [Unit]
  Description="Minecraft server service"
  
  [Service]
  Environment=JAVA_HOME="/usr/lib/jvm/java-22-openjdk"
  WorkingDirectory=/home/steam/minecraft/1.20.6/
  ExecStart=/usr/lib/jvm/java-22-openjdk/bin/java -Xmx4096M -Xms1024M -jar /home/steam/minecraft/1.20.6/server.jar nogui
  User=steam
  Group=steam

  Sockets=minecraft.socket
  StandardInput=socket
  StandardOutput=journal
  StandardError=journal

  [Install]
  WantedBy=multi-user.target

-- /etc/systemd/system/minecraft.socket --

  [Socket]
  ListenFIFO=%t/minecraft.stdin
  Service=minecraft.service

-------------

What this will do is add a systemd dependency so that minecraft.socket starts before minecraft.service (which creates the FIFO `/run/minecraft.stdin`), then set up minecraft.service to use this socket for its StandardInput (while leaving stdout and stderr pointing at the journal).

The service can then be started and set to automatically start on boot (`systemctl daemon-reload && systemctl enable --now minecraft`). While running, data can be written to the socket file with `echo` and redirection (e.g. `echo "help" > /run/minecraft.stdin`), and the output will be visible in the journal (`journalctl -xef --unit minecraft.service`).
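
If a script rather than a shell is writing to that FIFO, the equivalent in Python could look like the sketch below. `send_command` is a hypothetical helper; opening with `O_NONBLOCK` is a choice made here so the call fails fast (with `ENXIO`) instead of hanging when nothing holds the FIFO open for reading.

```python
import os


def send_command(fifo_path, command):
    """Write one console command, newline-terminated, to the server's stdin FIFO."""
    # Without O_NONBLOCK, open() would block until some process opens the FIFO for reading.
    fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, (command.rstrip("\n") + "\n").encode("utf-8"))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Path taken from the unit files above; needs write permission on the FIFO.
    send_command("/run/minecraft.stdin", "help")
```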

If you set stderr/out to go over the socket as well, then you can attach something like `screen` to it and use it like a typical TTY(or `telnet`).

This uses the file `/run/minecraft.stdin` as the socket, but the documentation for systemd.socket shows that this can also be a TCP port to listen on for connections (and systemd.exec shows using regular files, but then you have to set them up manually).