Spriteful TErminal GrAphics Protocol: Difference between revisions

Revision as of 21:35, 7 June 2021

STEGAP is my proposal for a terminal graphics protocol facilitating bitmapped sprites combined with terminal glyphs and styling. Just being able to blit a bitmap into a terminal is of little use for libraries like Notcurses. Unfortunately, that's about all that Sixel gives you. Useful background reading might include my Theory and Practice of Sprixels.

Goals as a toolkit developer

Ideally, I want to be able to:

  • discover support for the protocol at runtime
  • provide a given bitmap in as few bytes as possible
  • associate a bitmap with an identifier. This bitmap might not be wholly opaque; transparent pixels are of critical importance, translucency less so. I ought be able to reload the bitmap (keeping the size constant), and have it redrawn without flicker.
  • draw text atop the bitmap without a background color, so that the graphic is not obscured except where the glyph is defined
  • move the bitmap in a flicker-free way elsewhere in the visible area
  • update glyphs (partially-)obscured by a bitmap without disturbing the bitmap
  • destroy the bitmap with a single escape, ideally yielding whatever had been obscured by said bitmap

I do not require the ability to:

  • stack text atop text within a cell, with or without intermediate graphics

A solution must not:

  • require me to render text myself

Assumptions

  • An eight-bit, UTF-8 environment.
  • ESC (0x1b, 27) starts a new control sequence, terminating any ongoing one. This is necessary to conform to widespread existing behavior, but it is unfortunate, as it means we can't blithely write arbitrary bytes.
  • We mustn't tread on any defined XTerm control sequences.

Data formats

Unencoded greyscale

Support for 6-bit greyscale pixels with a single bit of alpha, transmitted as bias-32 bytes in row-major order, is mandatory. If 64 levels of greyscale are not available, at a minimum, the 6-bit range must be partitioned into a non-empty range mapping to black, and a non-empty range mapping to white. Each byte contains a value between 32 (0x20) and 96 (0x60), inclusive. A value less than 96 is 32 plus the greyscale value; a value of 96 means transparent. This leaves 31 legal 7-bit-clean values on either side: 0 through 31 below (remember, ESC is prohibited), and 97 through 127 above.
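The bias-32 scheme above can be sketched in a few lines of Python. This is only an illustration under my reading of the paragraph: the function names `encode_grey` and `decode_grey` are hypothetical, and representing a transparent pixel as `None` is an assumption made for the example, not part of the proposal.

```python
BIAS = 32         # added to every 6-bit grey level
TRANSPARENT = 96  # 0x60, the single value reserved for transparency


def encode_grey(pixels):
    """Encode row-major grey levels (0..63), or None for transparent, to bytes."""
    out = bytearray()
    for p in pixels:
        if p is None:
            out.append(TRANSPARENT)
        elif 0 <= p <= 63:
            out.append(BIAS + p)
        else:
            raise ValueError(f"grey level out of range: {p!r}")
    return bytes(out)


def decode_grey(data):
    """Inverse of encode_grey; rejects any byte outside 32..96."""
    pixels = []
    for b in data:
        if b == TRANSPARENT:
            pixels.append(None)
        elif BIAS <= b < TRANSPARENT:
            pixels.append(b - BIAS)
        else:
            raise ValueError(f"illegal byte: {b:#04x}")
    return pixels
```

Every byte produced lies in the printable range 0x20 through 0x60, so ESC can never appear in the payload.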

Alternatively, we could use bias-31 bytes. This leaves 30 legal 7-bit-clean values on the low end, and 32 on the high end. We lose the property that all defined values are printable if we do this, along with some pleasing (but in the end, meaningless) bit-pattern properties.

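Alternatively, we could provide a 7-bit greyscale. We lose the property of being seven-bit-clean if we do so: 128 grey levels plus a transparent value need 129 distinct values, one more than seven bits can hold.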

Using a unit smaller than a byte would require defining a padding mechanism at each row's end, and would complicate calculating the necessary length.

Palette-indexed

...is not supported. Use PNG with a PLTE chunk, or WebP.

To RGB or not to RGB?

I like 24-bit RGB; if you don't need the alpha channel, you're saving 25% of your bytes. Introducing a single bit of alpha is nasty, though (pretty much requires a secondary channel, or you lose everything you gained).

Minimally-encoded BGRA

Support for 32-bit 8bpc BGRA pixels, in row-major order, is mandatory. Our BGRA is byte-oriented, not word-oriented, and thus there ought be no question of endianness: the 8 bits of the Blue channel must be the first byte of each pixel. Assuming a 32-bit BGRA pixel to be natively composed via ((A << 24) | (R << 16) | (G << 8) | B), a little-endian machine is storing BGRA, and a big-endian machine is storing ARGB; the big-endian machine would thus need a swizzle to properly output this wire format (alternately, it could work on individual 8-bit units stored as BGRA). On a little-endian machine, this definition corresponds to PIX_FMT_RGB32 as emitted by FFmpeg; that name is a native-endian alias, and the byte-ordered equivalent on any host is FFmpeg's BGRA format (AV_PIX_FMT_BGRA in current releases). We cannot simply use the native ordering, because the terminal emulator and application might exist in different endianness domains.
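To make the endianness argument concrete, here is a small Python sketch that packs one natively composed word both ways and compares the result with the wire order. The helper names `compose` and `wire_bytes` are hypothetical, and the channel values are arbitrary.

```python
import struct


def compose(a, r, g, b):
    """Native 32-bit word, as in ((A << 24) | (R << 16) | (G << 8) | B)."""
    return (a << 24) | (r << 16) | (g << 8) | b


def wire_bytes(word):
    """Emit B, G, R, A byte by byte, independent of host endianness."""
    return bytes([word & 0xFF,
                  (word >> 8) & 0xFF,
                  (word >> 16) & 0xFF,
                  (word >> 24) & 0xFF])


word = compose(0xAA, 0x11, 0x22, 0x33)   # A=0xAA, R=0x11, G=0x22, B=0x33
little = struct.pack("<I", word)          # memory layout on a little-endian host
big = struct.pack(">I", word)             # memory layout on a big-endian host

# little is 33 22 11 AA (B G R A): already in wire order.
# big is AA 11 22 33 (A R G B): needs its bytes reversed before sending.
```

Shifting and masking, as `wire_bytes` does, yields the same output on either kind of host, which is the "work on individual 8-bit units" option.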

If the terminal emulator cannot fully implement RGBA for some reason (perhaps it has only 256 colors at its disposal), it ought itself sensibly quantize the graphic. Likewise, in the absence of composed translucency, the Alpha channel can be partitioned into a wholly transparent range and a wholly opaque range. It is mandatory that a pixel with an Alpha value of 0 not obscure an existing pixel.
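A terminal without real translucency might partition the Alpha channel roughly as follows. This is a sketch only: the threshold of 128 is an assumption (the proposal does not fix one), and `composite` is a hypothetical helper. The one hard requirement it preserves is that an Alpha of 0 leaves the existing pixel alone.

```python
OPAQUE_THRESHOLD = 128  # assumed cut-off; any value from 1 to 255 keeps alpha 0 transparent


def composite(existing, incoming):
    """Pick the visible colour for one pixel.

    existing: (b, g, r) currently on screen
    incoming: (b, g, r, a) from the bitmap
    """
    b, g, r, a = incoming
    if a < OPAQUE_THRESHOLD:
        return existing      # wholly transparent range: keep what was there
    return (b, g, r)         # wholly opaque range: replace it
```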

Were it not for the need to avoid embedded ESC characters, communicating the graphic's geometry would be sufficient to avoid any need for further encoding. Unfortunately, we've assumed that an ESC (decimal 27) will abort the graphic. Most protocols use base64 encoding to work around this, resulting in a flat 33% increase in bytes transmitted. STEGAP instead defines a byte of 25 (0x19) to be an internal escape. The byte following a 25 can have only two values: 0 (to indicate 25), or 1 (to indicate 27). This is a 100% increase in bytes transmitted for 2 of 256 possible byte values. The best case is thus 0% overhead (no 25 or 27 values), the expected overhead for a uniform distribution of bytes is 2/256, or 0.78%, the overhead stays below that of base64 until a third of all values are 25 or 27, and the (pathological) worst case suffers 100% overhead.
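The escaping rule described above is small enough to sketch in full. The names `stegap_escape` and `stegap_unescape` are hypothetical, and treating a raw ESC or an unknown escape code as an error on decode is my assumption about how a receiver might behave.

```python
ESC = 27    # must never appear in the payload
INNER = 25  # STEGAP's internal escape byte


def stegap_escape(data: bytes) -> bytes:
    """Replace 25 with (25, 0) and 27 with (25, 1); pass everything else through."""
    out = bytearray()
    for b in data:
        if b == INNER:
            out += bytes([INNER, 0])
        elif b == ESC:
            out += bytes([INNER, 1])
        else:
            out.append(b)
    return bytes(out)


def stegap_unescape(data: bytes) -> bytes:
    """Inverse of stegap_escape."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            raise ValueError("raw ESC inside payload")
        if b != INNER:
            out.append(b)
            continue
        code = next(it, None)  # the byte following the internal escape
        if code == 0:
            out.append(INNER)
        elif code == 1:
            out.append(ESC)
        else:
            raise ValueError(f"bad escape code: {code!r}")
    return bytes(out)
```

Encoding all 256 byte values once grows the payload by exactly two bytes, matching the 2/256 expected overhead.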

For a 1000x1000 pixel bitmap (4,000,000 bytes of BGRA), assuming a uniform random distribution of byte values, this is the difference between transmitting roughly 5.09MiB with base64 and 3.84MiB with this scheme.
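The arithmetic behind that comparison can be reproduced as below. I'm assuming 4 bytes per pixel, padded base64 at its exact 4/3 ratio rather than a rounded 33%, and sizes expressed in MiB; the function names are hypothetical.

```python
MIB = 1024 * 1024


def base64_len(n):
    """Length of padded base64 output for n input bytes."""
    return 4 * ((n + 2) // 3)


def expected_escaped_len(n):
    """Expected STEGAP length for n uniformly random bytes:
    2 of every 256 values cost one extra byte each."""
    return n * (1 + 2 / 256)


raw = 1000 * 1000 * 4                        # BGRA, 4 bytes per pixel
b64_mib = base64_len(raw) / MIB              # a little under 5.1 MiB
esc_mib = expected_escaped_len(raw) / MIB    # a little over 3.84 MiB
```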

If this worst possible case is unacceptable, the semantics of the byte following a 25 could be expanded such that a new escape character was specified, but this seems kinda silly (and wouldn't help if the pixels are being generated on the fly, anyway). Other downsides: base64 is very well known, whereas this is a strange and inelegant encoding; this encoding is not seven-bit-clean; and it's difficult to know in advance how many bytes a given bitmap will require.

It might be desirable to define deeper samples than 8bpc (10- and 16-bit, presumably).

Compressed data

We do not define any compression schemes. Image compression researchers have done well in this area.

At a minimum, we must allow WebP. Ought we require a MIME type identifying the type of the bitmap? If so, need the emulator make its supported formats discoverable?

Terminal obligations

A change to the cell-pixel geometry (the geometry, in pixels, of a terminal cell) must result in a SIGWINCH, just as a change to the overall visible geometry does. The TIOCGWINSZ ioctl(2) must accurately fill in the ws_xpixel and ws_ypixel fields.
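From the application's side, reading these fields might look like the following Python sketch for a POSIX system. `query_winsize` and `cell_pixel_geometry` are hypothetical helper names, and treating a zero in any field as "not reported" is my assumption about how to handle terminals that leave the pixel fields unset.

```python
import fcntl
import struct
import termios


def query_winsize(fd=1):
    """Return (ws_row, ws_col, ws_xpixel, ws_ypixel) for the terminal on fd."""
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return struct.unpack("HHHH", buf)  # struct winsize: four unsigned shorts


def cell_pixel_geometry(rows, cols, xpixel, ypixel):
    """Return (cell_width, cell_height) in pixels, or None if unreported."""
    if 0 in (rows, cols, xpixel, ypixel):
        return None
    return (xpixel // cols, ypixel // rows)
```

An application would typically call `cell_pixel_geometry(*query_winsize())` once at startup and again from its SIGWINCH handler, which is why the signal must also fire when only the cell size changes.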