Theory and Practice of Sprixels
Revision as of 08:55, 29 March 2021
dankblog! 2021-03-29, at the danktower
I've spent significant time over the past few weeks adding "sprixel" support to Notcurses, culminating (thus far) in yesterday's 2.2.4 release. What's a sprixel? Numerous terminal emulators support one or another "graphic protocol", by which bitmaps of arbitrary size can be written to the glyph-based viewing area. Showing that dynamic control of the font is approximately equivalent in power to such graphic protocols is left as an exercise for the reader.
Sprixel flavors
There are at least four major methods:
- The venerable DEC Sixel protocol, introduced on the VT125, and also present on at least the VT240 and VT330. Implemented by at least XTerm, Mlterm, and foot, with patches out for at least VTE, alacritty, and Windows Terminal.
- Kitty's protocol, advanced by Kovid Goyal.
- The "1337" iTerm2 protocol, also implemented by wezterm.
- You can shit con brio all over the Linux framebuffer directly when in a framebuffer console.
Sixel is both the oldest and the most widely supported, though the majority of this support has come in the past year. Unfortunately, while passable for controlling early-Eighties nine-pin color dot matrix printers like the DEC LA50, it's pretty undesirable from a console bitmap standpoint. Nonetheless, combined with "sprite", it has loaned its name to "sprixels".
These protocols differ from each other in fundamental, important ways, and forming an abstraction from them isn't trivial. Integrating them wholly into the z-ordered semantics of Notcurses was still more difficult. Notcurses works via piles (rendering contexts) of planes (drawing surfaces), totally ordered on a z axis. Higher planes obscure or blend with lower ones. When Notcurses renders a pile, it projects these planes down onto a single plane, solving a single surface; it rasterizes this virtual surface to the physical viewing area by encoding it as a set of terminal escape codes and UTF-8 data. Only cells which have changed need to be rasterized, and eliding unchanged cells is a critical optimization, usually cutting the output size down by 90% or more. Every output byte is multiplied several times before becoming visible (we must copy it to a kernel buffer, the terminal must read it from a kernel buffer, and the terminal must display the resulting cell), so trading some computation for fewer output bytes is well worth it.
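The eliding of unchanged cells can be sketched as a comparison between the newly solved frame and the frame we believe is on screen. This is a minimal illustration, not Notcurses code: the `cell` struct and `find_damage()` are hypothetical names, and a real cell carries more state than a codepoint and two colors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified cell: one codepoint plus packed colors.
typedef struct {
  uint32_t codepoint;
  uint32_t fg, bg;
} cell;

static int cell_equal(const cell* a, const cell* b){
  return a->codepoint == b->codepoint && a->fg == b->fg && a->bg == b->bg;
}

// Compare the freshly solved frame 'next' against 'last', our record of what
// is on screen. Sets damage[i] to 1 for each cell that must be rasterized,
// copies those cells into 'last', and returns how many cells were damaged.
static size_t find_damage(cell* last, const cell* next, uint8_t* damage,
                          size_t count){
  assert(last && next && damage);
  size_t damaged = 0;
  for(size_t i = 0 ; i < count ; ++i){
    damage[i] = !cell_equal(&last[i], &next[i]);
    if(damage[i]){
      last[i] = next[i];
      ++damaged;
    }
  }
  return damaged;
}
```

Only cells flagged in `damage` would then be encoded as escape codes and UTF-8; everything else costs zero output bytes.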
The elegant model of planes of fixed-width, independent cells solving for a single cell matrix already breaks down in the presence of Unicode wide glyphs, perhaps foremost among them U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM ﷽, a single glyph occupying anywhere from one to who-knows (I've seen 9) cells, depending on font, font engine, and terminal emulator. Wide characters cannot be printed in part, and printing another character anywhere atop a wide character obliterates all of the old character. Even restricting ourselves to characters two cells wide, a single character can annihilate four columns containing two characters:
- column 0: left side of character A
- column 1: right side of character A
- column 2: left side of character B
- column 3: right side of character B
Print a wide character C at column 1, and we end up with
- column 0: space
- column 1: left side of character C
- column 2: right side of character C
- column 3: space
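The annihilation above can be modeled in a few lines. This is a toy sketch under the assumption that every glyph is exactly two columns wide; `column`, `obliterate()`, and `put_wide()` are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical model of one terminal row holding only two-column glyphs.
// Each column records which glyph covers it (' ' if blank) and which half
// of that glyph it shows.
typedef enum { HALF_NONE, HALF_LEFT, HALF_RIGHT } half;
typedef struct { char glyph; half side; } column;

// Blank the column at x, along with the other half of any wide glyph there.
static void obliterate(column* row, size_t cols, size_t x){
  if(row[x].side == HALF_LEFT && x + 1 < cols){
    row[x + 1] = (column){ ' ', HALF_NONE };
  }else if(row[x].side == HALF_RIGHT && x > 0){
    row[x - 1] = (column){ ' ', HALF_NONE };
  }
  row[x] = (column){ ' ', HALF_NONE };
}

// Print a two-column glyph with its left half at column x.
static void put_wide(column* row, size_t cols, size_t x, char glyph){
  assert(x + 1 < cols);
  obliterate(row, cols, x);
  obliterate(row, cols, x + 1);
  row[x] = (column){ glyph, HALF_LEFT };
  row[x + 1] = (column){ glyph, HALF_RIGHT };
}
```

Printing A at column 0, B at column 2, and then C at column 1 leaves blanks in columns 0 and 3, matching the list above.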
This isn't a problem with a single virtual plane, but imagine we have two planes, P0 and P1, with P0 above P1. They are of equal size, and P0 is transparent by default. P0 has a wide character at column 1. P1 has a wide character at column 0. What do we render? It's impossible to render only the first half of P1's character, despite that being the logical render. Allah, the All-Powerful, has fucked us again!
This difficulty is ratcheted up significantly in the case of sprixels. In the most complicated case, a single sprixel might need to be both "under" (obscured by) and "over" (partially obscuring) text, and the set of cells obscuring and being obscured can change every frame. Imagine, for instance, that we have the following friendly octopus (512x357px, and note that it has a transparent channel):
Let's assume a cell geometry of 11x20, which just happens to be my current cell geometry. 512 pixels then occupy a little over 46 columns (512 == 46 * 11 + 6), while 357 pixels occupy just about 18 rows (357 == 17 * 20 + 17). We want to be able to draw this happy fellow atop a background of text:
Note that the text "underneath" the octopus can change while the octopus is present, and this change must be reflected if the text is visible. We also want to draw on top of the octopus, and this text too can change while the octopus is displayed, or even go away, in which case the octopus must be restored.
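The cell arithmetic for the octopus is just integer division rounded up. A minimal sketch (the helper name `cells_needed()` is my own):

```c
#include <assert.h>

// Number of cells needed to cover 'pixels' pixels when each cell spans
// 'cellpx' pixels along that axis: integer division, rounded up.
static unsigned cells_needed(unsigned pixels, unsigned cellpx){
  assert(cellpx > 0);
  return (pixels + cellpx - 1) / cellpx;
}
```

With an 11x20 cell geometry, `cells_needed(512, 11)` is 47 and `cells_needed(357, 20)` is 18, so the rightmost column and bottom row are only partially covered by the image.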
We might want to write atop the sprixel, but with a transparent background. Unfortunately, this is only really possible with Kitty (I'm excluding ideas like sampling the glyph and interpolating an RGB background for the text cell), where we can place an image at "z=-1" in the internal kitty image Z-axis to place it underneath text. Here I've done so, placing the sprixel underneath the layer of 'a's, but still annihilating the sprixel with some other text.
Sixel doesn't really provide this capability; writing text over a sixel graphic will annihilate affected cells in their entirety. Here we come to the most meaningful difference between Sixel and Kitty/iTerm: the former is temporal, the latter positional.
Temporal vs Positional
Sixel writes the provided glyph as a unit, starting at the current cursor position (assuming we've entered DECSDM aka "Sixel Display Mode", which we always do; otherwise all sixels are emitted at the upper left corner of the display). Bits not specified in the sixel are transparent (assuming the Device Control String has used the value 1 for parameter P2; with 0 or 2, unset pixels are painted in the background color), and transparent pixels leave undisturbed whatever was already present at the time of emission. Writing an entirely transparent region is a no-op. Writing a new character (or new Sixel pixels) anywhere in the graphic's region will wipe out the entirety of that cell. The Sixel data thus annihilated is not recoverable. This means that updating the text underneath a Sixel will blow the obscuring graphics away, requiring that the affected Sixel pixels be rerendered. In the event of stacked sprixels, we would potentially need to rerender the entire stack. In a naive implementation, changing one text cell "underneath" the octopus could result in painting 781 cells (the text cell actually being changed, and the 780 cells of the octopus). This is plainly undesirable.
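To make "bits not specified" concrete: each Sixel data character encodes a column of six vertically stacked pixels for the currently selected color register. The following sketch packs such a column; `sixel_char()` is a hypothetical helper, not a Notcurses function.

```c
#include <assert.h>
#include <stdint.h>

// Pack a column of six vertically stacked pixels into one sixel data
// character. on[i] is nonzero if pixel i (0 = topmost) is drawn in the
// currently selected color register. Bit i of the sixel value corresponds
// to pixel i, and the value is offset by 0x3f ('?') to land in printable
// ASCII.
static char sixel_char(const uint8_t on[6]){
  assert(on);
  unsigned bits = 0;
  for(int i = 0 ; i < 6 ; ++i){
    if(on[i]){
      bits |= 1u << i;
    }
  }
  return (char)(0x3f + bits);
}
```

A column with no bits set encodes as `?`; when the graphic is emitted in transparent mode, those pixels keep whatever was already on screen.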
With Kitty, a graphic is positional. All text is assumed to be at z=0. Graphics with non-negative z-indices obscure text. Graphics with negative z-indices sit underneath text. Note that this means we must choose whether a Kitty sprixel is going to have text above it, or under it, but no arbitrary single sprixel can do both.
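As a rough illustration of the positional model, here is a sketch that formats a single Kitty "transmit and display" command with an explicit z-index. The function name and its parameters are hypothetical; it assumes the RGBA pixels have already been base64-encoded and fit in one chunk (the protocol wants larger payloads split into chunks of at most 4096 bytes, which is not handled here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Format a Kitty graphics command that transmits and displays (a=T) an
// RGBA image (f=32) of pxwidth x pxheight pixels under image id 'id'.
// 'payload' is the base64 encoding of the pixel data. A negative z-index
// places the graphic beneath text; z=0 draws it over text.
// Returns the length written, or -1 if the payload needs chunking or
// 'buf' is too small.
static int kitty_display_cmd(char* buf, size_t buflen, unsigned id,
                             unsigned pxwidth, unsigned pxheight,
                             int under_text, const char* payload){
  assert(buf && payload);
  if(strlen(payload) > 4096){
    return -1; // multi-chunk transmission is out of scope for this sketch
  }
  int z = under_text ? -1 : 0;
  int n = snprintf(buf, buflen, "\x1b_Ga=T,f=32,i=%u,s=%u,v=%u,z=%d;%s\x1b\\",
                   id, pxwidth, pxheight, z, payload);
  return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The z-index is fixed when the graphic is placed, which is why a single sprixel ends up either wholly above or wholly below the text layer.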
Cell wiping
Thankfully, unlike Sixel, we have the ability to delete graphics in the Kitty protocol (Sixel graphics can only be cleared along with a larger screen region, or entirely obscured by other output), so we can "cut" chunks out of our bitmap, delete the visible sprixel, and load the new one; doing so effectively allows us to make text underneath a "z=-1" sprixel dynamically available. We cut the chunk out by setting the α (alpha) parameter to 0 across the breadth of the affected RGBA region. This function, sprite_kitty_cell_wipe(), can be found in src/lib/kitty.c. Easy peasy. We must keep the original α values somewhere, in case the damage need be recovered (we needn't duplicate the RGB values; we leave those as they are). Since we normalize the α values on input to either 0 or 255, we only need to keep one bit of state for each pixel, and map back that bit times 0xff. For a 1024x1024 pixel image (4MiB), this would be a maximum 1Mib (128KiB) bit vector, representing 3.125% overhead (one bit per 32-bit pixel).
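A sketch of the wipe and its one-bit-per-pixel backup follows. This is not sprite_kitty_cell_wipe() itself: it assumes a raw, row-major RGBA buffer with alpha already normalized to 0 or 255, and `cell_wipe()` / `cell_restore()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Zero the alpha of every pixel in cell (cy, cx) of a pxw x pxh RGBA image
// (4 bytes per pixel: R, G, B, A). Before zeroing, record in 'cut' one bit
// per pixel, set if the pixel was opaque. 'cut' must provide at least
// (pxw * pxh + 7) / 8 bytes, zeroed before first use.
static void cell_wipe(uint8_t* rgba, uint8_t* cut, size_t pxw, size_t pxh,
                      size_t cellw, size_t cellh, size_t cy, size_t cx){
  assert(cellw > 0 && cellh > 0);
  for(size_t y = cy * cellh ; y < (cy + 1) * cellh && y < pxh ; ++y){
    for(size_t x = cx * cellw ; x < (cx + 1) * cellw && x < pxw ; ++x){
      size_t px = y * pxw + x;
      if(rgba[px * 4 + 3]){
        cut[px / 8] |= (uint8_t)(1u << (px % 8));
      }
      rgba[px * 4 + 3] = 0;
    }
  }
}

// Undo cell_wipe() for the same cell: each alpha becomes its saved bit
// times 0xff, and the saved bit is cleared.
static void cell_restore(uint8_t* rgba, uint8_t* cut, size_t pxw, size_t pxh,
                         size_t cellw, size_t cellh, size_t cy, size_t cx){
  assert(cellw > 0 && cellh > 0);
  for(size_t y = cy * cellh ; y < (cy + 1) * cellh && y < pxh ; ++y){
    for(size_t x = cx * cellw ; x < (cx + 1) * cellw && x < pxw ; ++x){
      size_t px = y * pxw + x;
      unsigned bit = (cut[px / 8] >> (px % 8)) & 1u;
      rgba[px * 4 + 3] = (uint8_t)(bit * 0xff);
      cut[px / 8] &= (uint8_t)~(1u << (px % 8));
    }
  }
}
```

The RGB bytes are never touched, so restoring a cell only has to put the alpha back.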
Color quantization
The Kitty protocol accepts arbitrary RGBA 32-bit words in the sRGB color space; our natural internal format is 24bpp RGB plus two bits of alpha, so that's great: our colorspaces match. Sixel is a palette-indexed format. Common sizes are 16-, 256-, and 1024-entry palettes (in XTerm, this is controlled with the numColorRegisters X resource; palettes are shared across graphics unless the privateColorRegisters resource is enabled). A color quantization step is thus necessary (note that these quantizations can be performed in perfect parallelism when split across multiple frames/images, but not necessarily within a single frame). There are five essential color quantization algorithms:
- Uniform chunking aka popularity algorithms / Voronoi clustering. Any child can come up with this scheme. Autumn Lamonte's Jexer uses a clever variant on this. Very fast, but the least accurate on difficult images.
- Median cut, with or without a histogram. Heckbert in SIGGRAPH 1982. Used by libsixel by way of netpbm.
- Octrees, introduced by Gervautz and Purgathofer in 1988.
- Kohonen neural nets as introduced by Dekker in NeuQuant. Used by the color_quant crate.
- K-means clustering, perhaps best known from exoquant.
Notcurses uses my own take on Voronoi, a scheme I call "relaxation", as seen in the refine_colors() function in src/lib/sixel.c. I'll likely be changing this up, though. In the meantime, it's very robust, working properly on images that trip up libsixel. For heavy-duty testing, the fascinating images at AllRGB have been invaluable.
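For a feel of the simplest family, here is a sketch of plain uniform chunking into a 256-entry palette, plus the rescaling Sixel needs because its color registers take each channel on a 0..100 scale. This is explicitly not the "relaxation" scheme in refine_colors(); both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

// Uniform chunking into a 256-entry palette: keep the top 3 bits of red and
// green and the top 2 bits of blue, and use them directly as the index.
static unsigned quantize_332(uint8_t r, uint8_t g, uint8_t b){
  return ((unsigned)(r >> 5) << 5) | ((unsigned)(g >> 5) << 2)
         | (unsigned)(b >> 6);
}

// Rescale an 8-bit channel to the 0..100 range used when defining a Sixel
// color register in RGB, rounding to nearest.
static unsigned sixel_channel(uint8_t c){
  unsigned scaled = (c * 100u + 127u) / 255u;
  assert(scaled <= 100);
  return scaled;
}
```

Every color lands in a fixed bucket regardless of the image's actual distribution, which is why this approach is fast but does poorly on difficult images.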
The T-A Matrix
Associated with a sprixel (technically with the plane on which the sprixel is blitted, so it can be reused across frames) is a "transparency-annihilation matrix" having dimensions equivalent to the sprixel's area in cells. These matrix entries are loaded when the sprixel is first encoded, as either 0 (a wholly opaque cell) or 1 (a cell region with at least one transparent pixel). These values are used to determine whether or not a cell underneath the sprixel needs be updated to reflect a change in the render; if the cell is entirely obscured by the sprixel, there's no need to change the text in the cell (and this saves us an expensive redraw when using Sixel). When we cut out a cell, the T-A matrix takes the value 2 (annihilation). Only with Kitty do we actually cut the cell out of the encoded glyph (which we then delete and replay), but we always want to track this, because we often reuse the T-A matrix across frames. When we encode an image with a preseeded T-A matrix in hand, we implicitly drop the annihilated pixels. Using this method, we never obscure text atop a series of frames, eliminating flicker we would otherwise suffer.
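Loading the matrix from pixel data might look roughly like the following. This is a sketch under assumptions: a raw row-major RGBA buffer, hypothetical names (`tam_load()`, `text_visible_under()`), and my own choice to treat a cell that extends past the image edge as transparent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Entry values as described in the text.
enum { TA_OPAQUE = 0, TA_TRANSPARENT = 1, TA_ANNIHILATED = 2 };

// Fill 'tam' (row-major, one entry per cell) from a pxw x pxh RGBA image.
// A cell is TA_TRANSPARENT if any pixel in it has zero alpha, or (an
// assumption of this sketch) if the cell extends past the image edge.
static void tam_load(uint8_t* tam, const uint8_t* rgba, size_t pxw, size_t pxh,
                     size_t cellw, size_t cellh){
  assert(cellw > 0 && cellh > 0);
  size_t cols = (pxw + cellw - 1) / cellw;
  size_t rows = (pxh + cellh - 1) / cellh;
  for(size_t cy = 0 ; cy < rows ; ++cy){
    for(size_t cx = 0 ; cx < cols ; ++cx){
      uint8_t state = TA_OPAQUE;
      for(size_t y = cy * cellh ; y < (cy + 1) * cellh ; ++y){
        for(size_t x = cx * cellw ; x < (cx + 1) * cellw ; ++x){
          if(y >= pxh || x >= pxw || rgba[(y * pxw + x) * 4 + 3] == 0){
            state = TA_TRANSPARENT;
          }
        }
      }
      tam[cy * cols + cx] = state;
    }
  }
}

// Text beneath the sprixel only needs redrawing where it can show through.
static int text_visible_under(const uint8_t* tam, size_t cols,
                              size_t cy, size_t cx){
  return tam[cy * cols + cx] != TA_OPAQUE;
}
```

A change to text under a wholly opaque cell can be skipped outright, which is where the savings on Sixel come from.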
As you might have already guessed, the cut vector mentioned above is hung off the T-A matrix (though it must be refreshed on each frame change). When an annihilated cell is uncovered, it must be reconstructed; the T-A matrix value is changed to 4 (anastasis). At render time, the resurrected cell can either be rebuilt using the cut vector directly into the image (Kitty), or the Sixel-based sprixel can be reconstructed in toto. It might be desirable to revive into cell-sized sprixels, but see below.
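The per-cell lifecycle described here can be written as a tiny state machine. The numeric values come from the text; the function names and the exact transition rules (in particular, recomputing opaque/transparent after a rebuild) are my assumptions for the sake of illustration.

```c
#include <assert.h>
#include <stdint.h>

// 0 opaque, 1 transparent, 2 annihilated, 4 anastasis (awaiting rebuild).
enum { TA_OPAQUE = 0, TA_TRANSPARENT = 1, TA_ANNIHILATED = 2, TA_ANASTASIS = 4 };

// Text has been drawn over this cell of the sprixel.
static void tam_annihilate(uint8_t* entry){
  *entry = TA_ANNIHILATED;
}

// The text over this cell went away. Returns 1 if the cell must be rebuilt
// at the next render, 0 if it was never annihilated.
static int tam_uncover(uint8_t* entry){
  if(*entry != TA_ANNIHILATED){
    return 0;
  }
  *entry = TA_ANASTASIS;
  return 1;
}

// The cell's pixels have been restored (from the cut vector, or by
// re-encoding the whole sprixel); record whether it has transparent pixels.
static void tam_rebuilt(uint8_t* entry, int has_transparent_pixel){
  assert(*entry == TA_ANASTASIS);
  *entry = has_transparent_pixel ? TA_TRANSPARENT : TA_OPAQUE;
}
```

Keeping the "needs rebuild" state separate from "annihilated" lets the renderer batch all resurrections for a sprixel into one re-emission per frame.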
As noted, the T-A matrix would seem to be a property of a sprixel, but is really a property of the sprixel-plane (it must sometimes have lifetime beyond the sprixel, but we must check its suitability for further sprixels). The T-A matrix is absolutely key to getting decent flicker-free performance while supporting arbitrary damage to sprixels.
External Links
- The VT330/VT340 Programmer Reference Manual Volume 2: Graphics Programming from vt100.net
- The Kitty graphics protocol
- The iTerm2 graphics protocol
- Thomas Dickey's XTerm Control Sequences document
- Jexer, Autumn Lamonte's Java library inspired by Borland's "Turbo" line of products, is the only thing that comes close AFAIK
previously: "spooky tmux at a distance" 2021-02-20