Spriteful TErminal GrAphics Protocol
STEGAP is my (incomplete!) proposal for a terminal graphics protocol facilitating bitmapped sprites combined with terminal glyphs and styling. Just being able to blit a bitmap into a terminal is of little use for libraries like Notcurses. Unfortunately, that's about all that Sixel gives you. Useful background reading might include my Theory and Practice of Sprixels.
Previous work includes:
- DEC's Sixel protocol, the most widely-implemented (and poorly-specified) of the bunch
- Kovid Goyal's Kitty graphics protocol
- Autumn Lamonte's Jexer image sequence
- iTerm2's Image Protocol
nota bene: this was a learning exercise. as the result ended up looking a good deal like Kovid's protocol (see above), i encourage terminal developers to implement that protocol, as it is a major advance over Sixel. if you want to extend it with things discussed below, hit me up.
Goals as a toolkit developer
Ideally, I want to be able to:
- discover support for the protocol at runtime
- provide a given bitmap in as few bytes as possible
- associate a bitmap with an identifier. this bitmap might not be wholly opaque—transparent pixels are of critical importance, translucency less so. i ought be able to reload the bitmap (keeping the size constant), and have it redrawn without flicker.
- i ought be able to have an identifier generated for me if i so desire
- draw text atop the bitmap without a background color, so that the graphic is not obscured except where the glyph is defined
- move the bitmap in a flicker-free way elsewhere in the visible area
- update glyphs (partially-)obscured by a bitmap without disturbing the bitmap
- destroy the bitmap with a single escape, ideally yielding whatever had been obscured by said bitmap
- drive multiple visible bitmaps without grotesque drops in performance
I do not require the ability to:
- stack text atop text within a cell, with or without intermediate graphics.
  - this is equivalent to saying "all text is at the same z-index", so long as we can place graphics at z indices below and above the text layer
- make use of accelerators unavailable over a network
- feed arbitrary image containers to the terminal
A solution must not:
- require me to render text myself, nor read font tables etc.
- require me to write some image container format if i have raw pixels
I am happy to provide extra information when it can substantially simplify the terminal's job.
Assumptions
- An eight-bit, UTF-8 environment. If we want to support seven-bit environments, we'll need to further encode our BGRA.
- ESC (0x1b, 27) starts a new control sequence, terminating any ongoing one. This is necessary to conform to widespread existing behavior, but it is unfortunate, as it means we can't blithely write arbitrary bytes.
- We mustn't tread on any defined XTerm control sequences.
The control sequence
FIXME do we piggyback on an existing control sequence? perhaps a new P2 using sixel?
Want to indicate:
- Command (load, reload, delete, move)
- Whether our graphic is entirely opaque (à la Sixel's P2=0); this can facilitate powerful optimizations in some terminals
- (maybe) Geometry, x and y in pixels
- Whether scrolling is in play
- Origin offset, x and y in pixels
- Data format
Loads
Allocates a graphic identifier, and loads it with data. The graphic will be drawn subject to placement rules, unless it would be entirely off-screen, in which case it is not drawn (but still loaded, and possibly brought on-screen with a move). A graphic which is entirely off-screen is never entered into the scrollback region, and needn't (but may) be preserved when entering or leaving the alternate screen. If the identifier is already in use, no action is taken. If the identifier is -1, the terminal ought return an unused identifier. FIXME what if all possible identifiers are in use?
Parameters: identifier, opacity, geometry, scrolling, origin, data format, bulk data
Reloads
Reloads the specified graphic with new data. Any part of the graphic which is visible must be updated; this update should be performed without flicker. The new data must be the same geometry as the old data, but it needn't be the same format. The entirety of the data must be provided. FIXME we'll want some kind of animation-friendly partial reload. If the specified graphic does not exist, no action is taken. If the provided data is a container, and the container yields geometry different from that associated with the existing graphic, no action is taken.
Parameters: identifier, data format, bulk data
Deletes
Erases the specified graphic, making visible any content which it had obscured. If the identifier does not exist, no action is taken. If the identifier is -1, all graphics are deleted, and their identifiers recycled.
Moves
A move erases the graphic from its current location (see Deletes, above), and draws it at the current cursor position (as modified by the origin offset, as described in Placement, below), subject to the current direction of text (not the direction when originally drawn). If the identifier does not exist, no action is performed.
Parameters: identifier, origin offset, scrolling.
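The identifier rules for the four commands can be modeled in a few lines. This is only a toy sketch of the terminal-side bookkeeping as I read it from the descriptions above; the class name, the `max_id` bound, and returning `None` when identifiers are exhausted are all my own assumptions (the proposal leaves that last case as a FIXME), and `reload` takes the new data's geometry as an argument even though the wire command would derive it from the data.

```python
class GraphicStore:
    """Toy model of load/reload/delete/move identifier semantics (hypothetical)."""

    def __init__(self, max_id=255):  # max_id is an assumption, not from the proposal
        self.max_id = max_id
        self.graphics = {}

    def load(self, ident, geometry, data):
        if ident == -1:
            # The terminal picks an unused identifier on our behalf.
            free = (i for i in range(self.max_id + 1) if i not in self.graphics)
            ident = next(free, None)
            if ident is None:
                return None  # exhaustion is unspecified; None is a placeholder
        elif ident in self.graphics:
            return None  # identifier already in use: no action is taken
        self.graphics[ident] = {"geometry": geometry, "data": data, "pos": None}
        return ident

    def reload(self, ident, geometry, data):
        graphic = self.graphics.get(ident)
        if graphic is None or graphic["geometry"] != geometry:
            return False  # unknown identifier or changed geometry: no action
        graphic["data"] = data
        return True

    def delete(self, ident):
        if ident == -1:
            self.graphics.clear()  # delete everything, recycling all identifiers
        else:
            self.graphics.pop(ident, None)  # unknown identifier: no action

    def move(self, ident, pos):
        graphic = self.graphics.get(ident)
        if graphic is None:
            return False  # unknown identifier: no action
        graphic["pos"] = pos  # erase at the old position, redraw at the new one
        return True
```

Every failure path here is a silent no-op, matching the text; whether the terminal ought report such failures back to the application is a separate question.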
Support for 6-bit greyscale pixels with a bit of alpha channel, transmitted as bias-32 bytes in row-major order, is mandatory. If 64 levels of greyscale are not available, at a minimum, the 6-bit range must be partitioned into a non-empty range mapping to black, and a non-empty range mapping to white. Each byte contains a value between 32 (0x20) and 96 (0x60), inclusive. This value is equal to 32 + the greyscale value for values less than 96, or transparent if equal to 96. This leaves 31 legal 7-bit-clean values on either side (remember, ESC is prohibited).
- Alternatively, we could use bias-31 bytes. This leaves 30 legal 7BC values on the low end, and 32 on the high end. We lose the property that all defined values are printable if we do this, along with some pleasing (but in the end, meaningless) bit-pattern properties.
- Alternatively, we could provide a 7-bit greyscale. We lose the property of being seven-bit-clean if we do so.
Using a unit smaller than a byte would require defining a padding mechanism at row's end, and complicates calculations for necessary length.
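As a sanity check on the bias-32 scheme, here is a minimal sketch of an encoder and decoder. The function names are mine, and representing a transparent pixel as `None` is just a convenience for the example.

```python
TRANSPARENT = 96  # 0x60, the single transparent value
BIAS = 32         # 0x20, so greyscale 0..63 maps to bytes 32..95

def encode_grey6(levels):
    """Encode an iterable of greyscale levels (0..63, or None for transparent)."""
    out = bytearray()
    for level in levels:
        if level is None:
            out.append(TRANSPARENT)
        elif 0 <= level <= 63:
            out.append(BIAS + level)
        else:
            raise ValueError(f"greyscale level out of range: {level}")
    return bytes(out)

def decode_grey6(data):
    """Inverse of encode_grey6; rejects bytes outside 32..96."""
    levels = []
    for byte in data:
        if byte == TRANSPARENT:
            levels.append(None)
        elif BIAS <= byte < TRANSPARENT:
            levels.append(byte - BIAS)
        else:
            raise ValueError(f"illegal greyscale byte: {byte}")
    return levels
```

With one byte per pixel, the payload length is simply width times height, and every emitted byte is a printable ASCII character.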
Palette-indexed color is not supported. Use PNG with a PLTE chunk, or WebP.
To RGB or not to RGB?
I like 24-bit RGB; if you don't need the alpha channel, you're saving 25% of your bytes. Introducing a single bit of alpha is nasty, though (pretty much requires a secondary channel, or you lose everything you gained).
- If the user indicates an all-opaque graphic via the control sequence (à la P2=0 in Sixel), RGB probably ought be available (and BGRA perhaps not available). Otherwise, BGRA ought be available (and RGB perhaps not available).
Support for 32-bit 8bpc BGRA pixels, in row-major order, is mandatory. Our BGRA is byte-oriented, not word-oriented, and thus there ought be no question of endianness: the 8 bits of the Blue channel must be the first byte of each pixel. Assuming a 32-bit BGRA pixel to be natively composed via ((A << 24) | (R << 16) | (G << 8) | B), a little-endian machine is storing BGRA, and a big-endian machine is storing ARGB; the big-endian machine would thus need a swizzle to properly output this wire format (alternately, it could work on individual 8-bit units stored as BGRA). This byte order corresponds to FFmpeg's AV_PIX_FMT_BGRA; the native-endian AV_PIX_FMT_RGB32 (formerly PIX_FMT_RGB32) matches it only on little-endian machines. We cannot simply use the native ordering, because the terminal emulator and application might exist in different endianness domains.
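To make the byte order concrete, here is a small sketch of producing wire-format pixels, either from separate channels or from a word composed as described above. Both function names are hypothetical.

```python
def pack_bgra(r, g, b, a):
    """Wire format from individual 8-bit channels: Blue is always the first byte."""
    return bytes((b, g, r, a))

def word_to_wire(pixel):
    """Wire format from a word composed as (A << 24) | (R << 16) | (G << 8) | B.

    Serializing explicitly as little-endian yields B, G, R, A regardless of
    the host's native byte order, so no separate swizzle step is needed.
    """
    return pixel.to_bytes(4, "little")
```

Writing the word out with an explicit byte order, rather than dumping native memory, is what keeps the application and the terminal in agreement across endianness domains.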
If the terminal emulator cannot fully implement RGBA for some reason (perhaps it has only 256 colors at its disposal), it ought itself sensibly quantize the graphic. Likewise, in the absence of composed translucency, the Alpha channel can be partitioned into a wholly transparent range and a wholly opaque range. It is mandatory that a pixel with an Alpha value of 0 not obscure an existing pixel.
Were it not for the need to avoid embedded ESC characters, communicating the graphic's geometry would be sufficient to avoid any need for further encoding. Unfortunately, we've assumed that an ESC (decimal 27) will abort the graphic. Most protocols use base64 encoding to work around this, resulting in a flat 33% increase in bytes transmitted. STEGAP instead defines a byte of 25 to be an internal escape. The byte following a 25 (0x19) byte can have only two values: 0 (to indicate 25), or 1 (to indicate 27). Each 25 or 27 in the source data thus costs two bytes on the wire, a 100% increase for 2 of the 256 possible byte values. The best case is 0% overhead (no 25 nor 27 values), the expected overhead for a uniform distribution of bytes is 0.78% (2/256), the overhead stays below that of base64 until a third of all values are 25 or 27, and the (pathological) worst case suffers 100% overhead.
For a 1000x1000 pixel BGRA bitmap (4,000,000 bytes), assuming a uniform random distribution of byte values, this is the difference between transmitting roughly 5.09MiB (base64) and 3.84MiB (STEGAP).
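The escaping scheme is simple enough to sketch in full. The function names below are mine; the treatment of a malformed sequence (a 25 followed by anything other than 0 or 1) as an error is an assumption, since the proposal doesn't say what a terminal ought do with one.

```python
INTERNAL_ESCAPE = 25  # 0x19
ESC = 27              # 0x1b, must never appear in the payload

def stegap_escape(raw):
    """Replace each 25 with (25, 0) and each 27 with (25, 1)."""
    out = bytearray()
    for byte in raw:
        if byte == INTERNAL_ESCAPE:
            out += bytes((INTERNAL_ESCAPE, 0))
        elif byte == ESC:
            out += bytes((INTERNAL_ESCAPE, 1))
        else:
            out.append(byte)
    return bytes(out)

def stegap_unescape(encoded):
    """Inverse of stegap_escape; raises ValueError on a malformed sequence."""
    out = bytearray()
    stream = iter(encoded)
    for byte in stream:
        if byte != INTERNAL_ESCAPE:
            out.append(byte)
            continue
        following = next(stream, None)
        if following == 0:
            out.append(INTERNAL_ESCAPE)
        elif following == 1:
            out.append(ESC)
        else:
            raise ValueError("25 must be followed by 0 or 1")
    return bytes(out)

# Expected sizes for 1000x1000 BGRA with uniformly distributed byte values.
RAW_BYTES = 1000 * 1000 * 4
BASE64_BYTES = 4 * ((RAW_BYTES + 2) // 3)              # 5,333,336
EXPECTED_ESCAPED_BYTES = RAW_BYTES + RAW_BYTES * 2 // 256  # 4,031,250
```

Note that the encoded length depends on the data, so a sender can't announce the wire length up front without first scanning (or encoding) the pixels.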
- If this worst possible case is unacceptable, the semantics of the byte following a 25 could be expanded such that a new escape character was specified, but this seems kinda silly (and wouldn't help if the pixels are being generated on the fly, anyway, or if the offending values were the unmappable 27).
Beyond this inferior worst case, downsides include that base64 is very well known, whereas this is a strange and inelegant encoding, that this encoding is not seven-bit-clean, and that it's difficult to know how many bytes this will require.
- It might be desirable to define deeper samples than 8bpc (10- and 16-bit, presumably).
We do not define any compression schemes. Image compression researchers have done well in this area.
At a minimum, we must allow WebP. Ought we require a MIME type identifying the type of the bitmap? If so, need the emulator make its supported formats discoverable?
Beyond the visible area
It is only possible to have pixels logically "above" or "to the left of" the visible area if the origin of the bitmap has a logically negative component. This is only possible in the presence of cursor-relative positioning; see below.
Pixels beyond the horizontal bounds of the visible area must not be displayed. In particular, pixels to the right of the visible area must not be placed on lower rows. The terminal may make such pixels visible if the visible area is made wider, but this is not required. Such pixels must not result in scrolling.
Pixels above the top of the visible area must be ignored.
Pixels beyond the bottom of the visible area depend on whether scrolling is in play.
Placement
Graphic placement depends on the current text direction:
- LTR: graphics are placed with their topmost, leftmost pixel at the current location of the cursor, starting in the top left pixel of the cell.
- RTL: graphics are placed with their topmost, rightmost pixel at the current location of the cursor, starting in the top right pixel of the cell.
- If text is being drawn top-to-bottom, you can fuck off for now (and probably already are).
Two arguments allow for pixel offsets to be applied to this implicit origin; the offsets may be negative. Note that this might lead to a negative horizontal and/or vertical coordinate for the graphic's origin; see above to deal with this situation.
After the graphic is drawn, the cursor is updated, once again depending on text direction:
- LTR: cursor is placed in the lowermost, rightmost cell in which pixels were defined
- RTL: cursor is placed in the lowermost, leftmost cell in which pixels were defined
In both cases, the extreme cell where pixels were defined is used, not necessarily where pixels were drawn. I.e. even if a cell was wholly transparent pixels, if the graphic was there, it counts.
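The placement and cursor rules can be worked through numerically. The sketch below is my reading of them, not a normative definition: the function and parameter names are hypothetical, coordinates are pixels and cells measured from the top left of the visible area, offsets are assumed to be added in that same coordinate space for both directions, and clipping to the visible area is ignored.

```python
def place(direction, cursor_col, cursor_row, cell_w, cell_h,
          width, height, xoff=0, yoff=0):
    """Return ((left, top) pixel origin, (col, row) final cursor cell)."""
    top = cursor_row * cell_h + yoff
    if direction == "ltr":
        # Topmost, leftmost pixel lands on the cell's top left pixel.
        left = cursor_col * cell_w + xoff
    elif direction == "rtl":
        # Topmost, rightmost pixel lands on the cell's top right pixel.
        right_edge = (cursor_col + 1) * cell_w - 1 + xoff
        left = right_edge - width + 1
    else:
        raise ValueError("only 'ltr' and 'rtl' are handled")
    right = left + width - 1
    bottom = top + height - 1
    # The cursor ends in the extreme cell where pixels were defined,
    # whether or not those pixels were transparent.
    end_row = bottom // cell_h
    end_col = right // cell_w if direction == "ltr" else left // cell_w
    return (left, top), (end_col, end_row)
```

For example, with 10x20 cells, a 25x30 graphic placed LTR at column 2, row 1 starts at pixel (20, 20) and leaves the cursor in column 4, row 2.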
Pixels drawn in the lowermost, rightmost cell of the visible area must not, in and of themselves, result in scrolling.
Pixels logically below the visible area must be ignored if scrolling is disabled. If scrolling is enabled (see below), such pixels must cause scrolling by a number of rows equal to the ceiling of (pixel rows beyond the bottom / cell height in pixels). For instance, if graphics scrolling is enabled, and cells are 10 pixels tall, and the cursor is on the bottom row when the graphic is emitted, and the graphic is 32 pixels tall, the terminal ought scroll up three rows (10 can be displayed, 32 - 10 = 22, ⌈22 / 10⌉ = 3).
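The scrolling arithmetic is a one-liner, sketched here with hypothetical names. `visible_pixels` is the number of pixel rows between the graphic's top and the bottom of the visible area.

```python
def rows_to_scroll(graphic_height, visible_pixels, cell_height, scrolling=True):
    """Number of text rows the terminal scrolls to fit the graphic."""
    if not scrolling:
        return 0  # pixels below the visible area are simply ignored
    overflow = max(0, graphic_height - visible_pixels)
    # Integer ceiling division: ceil(overflow / cell_height).
    return -(-overflow // cell_height)
```

Plugging in the example above, `rows_to_scroll(32, 10, 10)` gives 3.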
If scrolling is disabled, the terminal must not make such pixels visible if the visible area is made taller. Rationale: this could result in graphics being drawn over prompts, to the confusion of the user.
A change to the cell-pixel geometry (the geometry, in pixels, of a terminal cell) must result in a SIGWINCH, just as a change to the overall visible geometry does. The TIOCGWINSZ ioctl(2) must accurately fill in the ws_xpixel and ws_ypixel fields.
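From the application's side, picking up the cell-pixel geometry might look like the following sketch (Unix only). The helper names are mine; treating zeroed pixel fields as "unknown" is an assumption about terminals that don't fill them in.

```python
import fcntl
import signal
import struct
import termios

def unpack_winsize(buf):
    """struct winsize is four unsigned shorts: rows, cols, xpixel, ypixel."""
    return struct.unpack("HHHH", buf)

def cell_geometry(rows, cols, xpixel, ypixel):
    """Cell size in pixels as (width, height), or None if fields are zero."""
    if not (rows and cols and xpixel and ypixel):
        return None
    return xpixel // cols, ypixel // rows

def query_cell_geometry(fd=1):
    """Ask the terminal on fd for its geometry via TIOCGWINSZ."""
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return cell_geometry(*unpack_winsize(buf))

def watch_resizes(callback, fd=1):
    """Invoke callback with the new cell geometry on every SIGWINCH."""
    def handler(signum, frame):
        callback(query_cell_geometry(fd))
    signal.signal(signal.SIGWINCH, handler)
```

This only works if the terminal both fills in the pixel fields and signals when they change, which is exactly why the requirement is stated as a must.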
FIXME ought graphics be rescaled on cell-pixel geometry change?
Display of a graphic must not affect the current text direction. Display of a graphic must not affect the current text palette, nor the color of any displayed text.
FIXME need to know control sequence--are we piggybacking atop sixel? FIXME image formats? by MIME type? FIXME extensions? FIXME operating parameters?
Coexistence with text and other graphics protocols
Clears and resets
When the screen is cleared, all graphics even partially intersected by the visible area must be destroyed, and their identifiers recycled. Graphics wholly within the scrollback region should not be affected.
When the terminal is reset, all graphics even partially intersected by the visible area must be destroyed, and their identifiers recycled. Behavior in the scrollback region should match the effects of said reset on text in the scrollback region.
If a line of text is available for scrollback, any graphics present on that line should be stored in common, and faithfully reproduced during a scrollback, unless they are deleted or moved. The identifiers of any such graphics remain reserved until the graphics are erased, or no longer relevant to the scrollback buffer, unless the terminal doesn't support graphics in the scrollback buffer, in which case their identifiers must be recycled. If such an offscreen graphic is reloaded, the reload should be faithfully reproduced.
If scrollback is preserved to disk, graphics in the preserved region should be likewise preserved. If graphics are preserved, their identifiers should be preserved, and made available to new contexts. The means of referencing them from within the preserved content is entirely up to the terminal.
A sufficiently negative placement offset can move or even construct a graphic logically in the scrollback region. Such a construction should not be faithfully reproduced. Instead, such a graphic should be considered "wholly off-screen", and subject to the rules described above in Loads.
- There is no native expression of palette-indexed color, nor HSV (Sixel)
- There is no filesystem-based data exchange (Kitty, iTerm)
- There is no terminal-side scaling (Kitty)