Spout Effects
Spout Effects is a Windows program that can take in a video stream from a variety of input sources, apply visual effects, then stream the resulting video to one of many output methods. It currently can receive video input from webcams and programs compatible with the Spout2 library, and it can output as a virtual webcam or as a Spout2 source. Current VFX include a monochrome edge detection filter, a Sobel outline filter, and an ASCII filter.
History
I initially made a simple ASCII converter in Python for fun. It looked great, but it had several issues. As a command-line application, it was stuck with a lot of undesirable limitations. Most terminal programs are fairly slow for a variety of reasons - a great video on that can be found here - and the Windows console is an especially egregious offender. The demo of the Python ASCII converter below is fairly optimized, several times faster than my naive first attempt, yet it still struggled to hit ~25 fps at a canvas size of 200x100. Other features I wanted to add, such as color, would hurt performance even more: adding ANSI color codes would require printing roughly 4x the data. This clearly wasn't going to work well enough.
So naturally, in order to fix these small problems, I began a total rewrite and decided to teach myself OpenGL while I was at it. Because that's what a normal person would do.
Starting Over
The result was Spout Effects - written (mostly) from scratch in C++/OpenGL and the product of a lot of trial and error. I went in blind and looked up as little help as possible. I started out with naive approaches and designed optimizations as needed. My first implementation made one draw call for each character and created a separate texture for every glyph I was drawing with. I eventually discovered instanced rendering and texture arrays, which let me draw each frame in a single draw call while binding only a single texture array for all of the glyphs.
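For context, a minimal GLSL vertex shader in that style might look like the sketch below: one unit quad per character, positioned purely from gl_InstanceID, so a single instanced draw call covers the whole character grid. The names (uGridSize, aQuadPos, vCell) are illustrative, not the project's actual identifiers.

```glsl
// Hypothetical instanced vertex shader: one quad per character cell.
#version 430 core

layout(location = 0) in vec2 aQuadPos;  // unit quad corner in [0,1]
uniform ivec2 uGridSize;                // characters across / down the screen

out vec2 vUV;          // position within this character's quad
flat out ivec2 vCell;  // which character cell this instance covers

void main() {
    // Convert the flat instance index into a (column, row) cell.
    ivec2 cell = ivec2(gl_InstanceID % uGridSize.x, gl_InstanceID / uGridSize.x);

    // Scale the unit quad down to one cell and place it in normalized device coordinates.
    vec2 cellSize = 2.0 / vec2(uGridSize);
    vec2 pos = vec2(-1.0) + (vec2(cell) + aQuadPos) * cellSize;

    vUV = aQuadPos;
    vCell = cell;
    gl_Position = vec4(pos, 0.0, 1.0);
}
```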
I finished reimplementing and optimizing my original Python program, allowing me to move on to those extra features I'd mentioned wanting to add. First up: edge detection. A big problem with ASCII filters is the loss of detail - you're essentially downscaling the image by 8 times or more, depending on the size of the characters. Because of this, more detailed scenes can look "busy" and be difficult to understand. Luckily for me, Acerola, a very talented graphics programmer, noticed the same problem around the same time and had a great idea for a solution. I decided to model my approach after his.
Essentially, by using straight-line characters ( _ | / \ ) at the edges of objects, we can more cleanly separate distinct objects and improve perceived visual fidelity at lower resolutions. To correctly place these characters, we need to identify both the location of edges and their direction. For that, we use two different filters: the extended Difference of Gaussians for edge detection, and the Sobel filter for finding the direction of these edges.
The Difference of Gaussians is a method of edge detection that works by subtracting two Gaussian blurs with different strengths. The Extended Difference of Gaussians modifies this by adding several new parameters that allow for fine-tuning the output for more aesthetically pleasing edges. Passing my webcam into a DoG shader I wrote and tweaking the parameters results in the below image. You can see that the edges are well-defined and there is very little noise. With this, we have our edges - the next stage will take this image and calculate the corresponding directions.
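As a rough illustration (not my actual shader), a single-pass GLSL fragment shader for the basic Difference of Gaussians could look like the sketch below. A real implementation would use separable blur passes and the extra extended-DoG parameters; names like uSigma1 and uThreshold are assumptions.

```glsl
// Minimal single-pass Difference of Gaussians sketch.
#version 430 core

in vec2 vUV;
out vec4 fragColor;

uniform sampler2D uInput;  // grayscale luminance of the source frame
uniform float uSigma1;     // narrow blur strength
uniform float uSigma2;     // wide blur strength
uniform float uThreshold;  // edge cutoff after subtraction

float gaussianBlur(vec2 uv, float sigma) {
    vec2 texel = 1.0 / vec2(textureSize(uInput, 0));
    float sum = 0.0, weightSum = 0.0;
    int r = int(ceil(sigma * 2.0));
    for (int x = -r; x <= r; x++) {
        for (int y = -r; y <= r; y++) {
            float w = exp(-float(x * x + y * y) / (2.0 * sigma * sigma));
            sum += w * texture(uInput, uv + vec2(x, y) * texel).r;
            weightSum += w;
        }
    }
    return sum / weightSum;
}

void main() {
    // Subtract the wide blur from the narrow blur; strong differences are edges.
    float dog = gaussianBlur(vUV, uSigma1) - gaussianBlur(vUV, uSigma2);
    float edge = dog > uThreshold ? 1.0 : 0.0;
    fragColor = vec4(vec3(edge), 1.0);
}
```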
To calculate the direction of the edges, we use the Sobel filter. This filter calculates the gradient of the image intensity - in other words, for each pixel, it outputs a vector pointing in the direction of greatest change in pixel color. Applied to our black-and-white edge image, the only vectors with non-negligible magnitudes will be along the white edges, and their directions will run perpendicular to the edges themselves. We take the angles of these vectors and sort them into four buckets: one for each of the directional characters we are going to use. The image below uses color to visualize these buckets. You can see that vertical lines are colored green, horizontal lines are colored blue, and diagonal lines are colored red and yellow. We encode this data as an integer for use in the next stage.
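A hedged GLSL sketch of that Sobel pass is below. The exact bucket encoding and thresholds are assumptions; the key idea is computing the gradient from a 3x3 neighborhood and quantizing its angle into four direction buckets.

```glsl
// Sobel pass over the DoG output, quantizing edge direction into four buckets.
#version 430 core

in vec2 vUV;
out vec4 fragColor;

uniform sampler2D uEdges;  // black/white DoG output

void main() {
    vec2 texel = 1.0 / vec2(textureSize(uEdges, 0));

    // 3x3 neighborhood luminance
    float tl = texture(uEdges, vUV + texel * vec2(-1,  1)).r;
    float  t = texture(uEdges, vUV + texel * vec2( 0,  1)).r;
    float tr = texture(uEdges, vUV + texel * vec2( 1,  1)).r;
    float  l = texture(uEdges, vUV + texel * vec2(-1,  0)).r;
    float  r = texture(uEdges, vUV + texel * vec2( 1,  0)).r;
    float bl = texture(uEdges, vUV + texel * vec2(-1, -1)).r;
    float  b = texture(uEdges, vUV + texel * vec2( 0, -1)).r;
    float br = texture(uEdges, vUV + texel * vec2( 1, -1)).r;

    // Sobel kernels: gradient of image intensity
    float gx = (tr + 2.0 * r + br) - (tl + 2.0 * l + bl);
    float gy = (tl + 2.0 * t + tr) - (bl + 2.0 * b + br);
    vec2  g  = vec2(gx, gy);

    int bucket = 0;  // 0 = no edge
    if (length(g) > 0.5) {
        // Fold the gradient angle into [0, 180): opposite gradients describe the same edge.
        float angle = degrees(atan(g.y, g.x));
        angle = mod(angle + 180.0, 180.0);

        if      (angle < 22.5 || angle >= 157.5) bucket = 1;  // ~horizontal gradient -> vertical edge
        else if (angle < 67.5)                   bucket = 2;  // one diagonal
        else if (angle < 112.5)                  bucket = 3;  // ~vertical gradient -> horizontal edge
        else                                     bucket = 4;  // the other diagonal
    }

    // Store the bucket as a normalized value for the downscaling pass to read.
    fragColor = vec4(float(bucket) / 4.0, 0.0, 0.0, 1.0);
}
```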
If you remember, I mentioned earlier that ASCII images are essentially downscaled: each character acts as one "pixel" yet takes up 8x8 actual pixels. If we let the GPU do the downscaling, the result is a very poor image with much of the detail lost. Instead, I opted to use a compute shader to do some custom downscaling. If you're unfamiliar, compute shaders are special shaders that can perform arbitrary computations and write the results to a texture. The developer chooses at run-time how many instances ("work groups") of the shader to run, and each work group runs some number of threads (its "local size"). In our case, we dispatch one work group for each character in our output image, and each work group contains 8x8 threads, one for each pixel that contributes to that character. Each thread samples one pixel from the previous Sobel filter output, records which direction it was, and adds it to a shared array. After every thread is done, the first thread tallies the number of pixels of each edge type, selects the most common type, and writes it to the output. This means that any noisy areas in the Sobel filter image will be narrowed down to whichever direction was most common. We also have each thread store the brightness of its pixel, this time sampled from the original input image, in a second shared array. The first thread calculates the average luminance and stores it in the output texture for later use when deciding which ASCII character to draw.
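A simplified GLSL compute shader illustrating this layout might look like the following. Binding points, texture formats, and the exact output encoding are assumptions, but it shows the 8x8 local size, the shared arrays, and the first-thread voting and averaging step.

```glsl
// One work group per character cell, one thread per contributing pixel.
#version 430 core

layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0)        uniform sampler2D uLuminance;   // original frame (grayscale)
layout(binding = 1)        uniform sampler2D uSobel;       // quantized edge directions
layout(binding = 0, rgba8) uniform writeonly image2D uOut; // one texel per character

shared int   sEdgeType[64];
shared float sLuma[64];

void main() {
    uint local  = gl_LocalInvocationIndex;          // 0..63 within this character cell
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);  // source pixel this thread owns

    // Each thread records its pixel's edge bucket and brightness.
    sEdgeType[local] = int(texelFetch(uSobel, pixel, 0).r * 4.0 + 0.5);
    sLuma[local]     = texelFetch(uLuminance, pixel, 0).r;

    barrier();

    // The first thread tallies the votes and averages the brightness.
    if (local == 0u) {
        int counts[5] = int[5](0, 0, 0, 0, 0);
        float lumaSum = 0.0;
        for (int i = 0; i < 64; i++) {
            counts[sEdgeType[i]]++;
            lumaSum += sLuma[i];
        }

        // Pick the most common edge direction among threads that saw an edge.
        int best = 0, bestCount = 0;
        for (int d = 1; d <= 4; d++) {
            if (counts[d] > bestCount) { best = d; bestCount = counts[d]; }
        }

        ivec2 cell = ivec2(gl_WorkGroupID.xy);
        imageStore(uOut, cell, vec4(lumaSum / 64.0, float(best) / 4.0, 0.0, 1.0));
    }
}
```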
The final step for our ASCII filter is the actual drawing. We issue an instanced draw call with one instance for each character. Each instance samples the output of the compute shader, retrieving the brightness and edge type for its area of the screen. If the character is not an edge, the brightness is used to index into a texture array containing our glyph "palette": the brighter the pixel, the higher the index, and the larger/brighter the glyph. Once every character has sampled and drawn its glyph, the frame is complete. The finished product can be seen below. The full pipeline is highly optimized, taking approximately 0.1 ms per frame and running almost entirely on the GPU. This makes it ideal for use with things like livestreaming, where the CPU is usually under heavy load from other tasks and every bit of saved CPU time makes a big difference in performance.
That's a pretty good result. You can see that the effect is able to capture details that would be lost by typical ASCII filters. My glasses, my door, and even my headphone wire (at the bottom right of my head) are captured and well-defined. The filter also holds up well in motion, and it looks great at high framerates.
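Tying the drawing step together, here is a hedged GLSL fragment-shader sketch of that final lookup. The packing of brightness and edge type into the compute shader's output, and the idea that the edge glyphs live in the same texture array after the brightness ramp, are assumptions for illustration.

```glsl
// Per-character fragment shader: read this cell's data, pick a glyph layer.
#version 430 core

in vec2 vUV;          // position within this character's quad
flat in ivec2 vCell;  // which character cell this instance covers
out vec4 fragColor;

uniform sampler2D      uCellData;    // output of the downscaling compute shader
uniform sampler2DArray uGlyphs;      // glyph palette + directional characters
uniform int            uPaletteSize; // number of brightness-ramp glyphs

void main() {
    vec4 cell  = texelFetch(uCellData, vCell, 0);
    float luma = cell.r;
    int edge   = int(cell.g * 4.0 + 0.5);

    int layer;
    if (edge == 0) {
        // Not an edge: brighter cells index further into the glyph ramp.
        layer = int(luma * float(uPaletteSize - 1) + 0.5);
    } else {
        // Edge: use one of the four directional characters ( _ | / \ ).
        layer = uPaletteSize + (edge - 1);
    }

    fragColor = texture(uGlyphs, vec3(vUV, float(layer)));
}
```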