Some open thoughts:
In practice, the best image processing routine seems to be upscaling -> binarization -> Tesseract OCR, which is the routine currently in use. However, as powerful as Leptonica can be, its upscaling leaves much to be desired. nnedi3 and waifu2x are very good, with xBR (scale by rules) being decent as well. nnedi3 was developed as an image deinterlacer, for the kind of combing artifacts you see on old televisions. Because a deinterlaced frame is only half the image, nnedi3 uses a neural network (with pre-computed weights, important for later) to interpolate the missing pixels. Because of this, nnedi3 also works extremely well as an image doubler. waifu2x was developed later, also using neural networks, and has a pleasant "trace" or signature of the kind many image interpolation algorithms leave behind. However, it does not use hard-coded weights and cannot run in real time, whereas nnedi3 can. Still, Lanczos gives pretty good results in my opinion, but since the source image will be upscaled by more than 2x, using an image doubler will improve the quality dramatically.
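For reference, the weighting function behind a Lanczos upscaler is simple enough to sketch in a few lines. This is a minimal illustration, not anyone's production resampler; it assumes the common a=3 ("Lanczos3") window, and the 1-D resampler with edge clamping is my own toy helper.

```python
import math

def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos weight: sinc(x) * sinc(x/a) for |x| < a, else 0.

    sinc here is the normalized form sin(pi*x) / (pi*x).
    """
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return (math.sin(px) / px) * (math.sin(px / a) / (px / a))

def lanczos_resample_1d(samples, t: float, a: int = 3) -> float:
    """Resample a 1-D signal at fractional position t by summing
    kernel-weighted neighbors; indices are clamped at the borders."""
    base = math.floor(t)
    total = 0.0
    for i in range(base - a + 1, base + a + 1):
        j = min(max(i, 0), len(samples) - 1)  # clamp at the edges
        total += samples[j] * lanczos_kernel(t - i, a)
    return total
```

Because the kernel is 1 at zero and 0 at every other integer, resampling at an integer position reproduces the original sample exactly; the ringing that gives Lanczos its sharpness comes from the negative lobes between the integers.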
Image binarization is very good. Leptonica uses Sauvola (and the Wolf et al. variant), which is arguably the best binarization algorithm out there. There is a possibility that Leptonica performs binarization first, then upscaling, which gives inferior results.
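Sauvola's formula itself is compact: each pixel gets a local threshold T = m * (1 + k * (s/R - 1)), where m and s are the mean and standard deviation of a window around the pixel. Below is a minimal per-pixel sketch; real implementations such as Leptonica's use integral images for speed, and k=0.2, R=128 are conventional defaults I've assumed, not values read out of Leptonica.

```python
import math

def sauvola_binarize(img, window: int = 3, k: float = 0.2, R: float = 128.0):
    """Binarize a grayscale image (list of rows, values 0-255) using
    Sauvola's local threshold T = m * (1 + k * (s / R - 1)).
    Returns 0 for foreground (ink) and 255 for background."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the window, clamped to the image borders.
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(max(0.0, sum(v * v for v in vals) / len(vals) - m * m))
            T = m * (1 + k * (s / R - 1))
            out[y][x] = 0 if img[y][x] <= T else 255
    return out
```

The key property: in flat background regions s is near zero, so the threshold drops well below the local mean and noise survives as background, while genuine dark strokes sit far enough below m to be kept as ink.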
After the binarization step, a clean-up step could be introduced. This might be as simple as denoising small particles, or as involved as using MSER (Maximally Stable Extremal Regions) to detect text regions. The question is whether such a complex step would be better as the first step: SIFT/MSER -> crop -> upscaling -> binarization -> denoise -> OCR. Regardless, it would probably require OpenCV, or I'd have to compile the code myself.
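The "denoise small particles" step, at least, doesn't need OpenCV. A minimal sketch, assuming the usual approach: flood-fill each connected component of foreground pixels and erase any component smaller than a size threshold (OpenCV's connectedComponentsWithStats would do the same job faster; min_size here is an arbitrary illustrative value).

```python
from collections import deque

def remove_small_particles(binary, min_size: int = 4):
    """binary: list of rows with 0 = foreground (ink), 255 = background.
    Erases any 4-connected foreground component smaller than min_size."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 0 or seen[y][x]:
                continue
            # Flood-fill one foreground component.
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and not seen[ny][nx] and out[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) < min_size:
                for cy, cx in comp:  # erase the speck
                    out[cy][cx] = 255
    return out
```

Run after binarization, this strips the salt-and-pepper specks that binarization tends to amplify while leaving letter-sized components alone.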
What's especially interesting is how these algorithms were developed. xBR is a pixel shader made by and for people who love to emulate old PS1 games. waifu2x is used to upscale low-quality images of "waifus", the 2D girls its users find attractive. Tesseract is an open source project, now led by Google, but Google's Vision API is far superior to whatever can come out of Tesseract; still, it's possible they are using Tesseract in some form. Leptonica is a C library written for image processing studies. Lanczos is a sinc-based, i.e. sin(x)/x, upscaler; it's better than bicubic and Catmull-Rom. Most of these algorithms are implemented in madVR, a video renderer which can be found on the doom9 forums. The doom9 forums are full of videophiles who love video quality, and there are lots of really good projects on there. SIFT (Scale Invariant Feature Transform) and MSER are modern descendants of DoG (Difference of Gaussians), which is itself an approximation of LoG (Laplacian of Gaussian), and of Harris corners. These algorithms detect "hotspots", the interesting parts of an image, and those points can then be tracked in real time, for example on Microsoft's HoloLens mixed reality headset. Still, it's quite ironic that many of these techniques were developed by people who love watching anime, people who love their old video games, and people interested in math. Unless these ideas are mathematically pure, in the sense that someone would stumble upon the mathematics sooner or later, very few of these techniques were developed with a sense of purpose beyond "I want to enjoy my video games/movies/anime more."
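The DoG-approximates-LoG claim is easy to check numerically in 1-D: the Gaussian satisfies the heat equation dG/dsigma = sigma * laplacian(G), so G(x, k*sigma) - G(x, sigma) is approximately (k - 1) * sigma^2 * LoG(x, sigma) for k near 1. A quick sketch of that check (my own verification, not taken from any library):

```python
import math

def gaussian(x: float, sigma: float) -> float:
    """1-D Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def log_1d(x: float, sigma: float) -> float:
    """Analytic 1-D Laplacian of Gaussian (second derivative of G)."""
    s2 = sigma * sigma
    return (x * x - s2) / (s2 * s2) * gaussian(x, sigma)

def dog(x: float, sigma: float, k: float) -> float:
    """Difference of Gaussians at scales k*sigma and sigma."""
    return gaussian(x, k * sigma) - gaussian(x, sigma)

# For k close to 1, DoG tracks (k - 1) * sigma^2 * LoG at every point.
sigma, k = 1.0, 1.05
worst = max(abs(dog(x / 10, sigma, k) - (k - 1) * sigma ** 2 * log_1d(x / 10, sigma))
            for x in range(-40, 41))
```

With k = 1.05 the worst-case gap over [-4, 4] is on the order of 1e-3, i.e. small relative to the LoG's own peak, which is why detectors subtract two blurred images instead of computing the Laplacian directly.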
EDIT: possible deskewer? Also, I'll probably take a break from this. It's a pretty elaborate pet project, but still a pet, despite being warm and fuzzy.