Show HN: Fast and Exact Algorithm for Image Merging

113 points, posted a day ago
by C-Naoki

30 Comments

scottdupoy

a day ago

Interesting to see something like this!

My computer science master's thesis was based on the same goal. I used a 2D convolution, which meant you could merge images with inexact overlaps. I had to run a high-pass filter first to limit the image detail to edges only, or else the convolution would incorrectly match bright areas.
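Roughly, the spatial-domain version looked like this (a simplified sketch, not my actual thesis code; the Laplacian here is just one possible high-pass):

    import numpy as np
    from scipy import ndimage, signal

    def find_offset(img_a, img_b):
        # High-pass both images first (here with a Laplacian) so the
        # correlation locks onto edges rather than broad bright regions.
        ea = ndimage.laplace(img_a.astype(float))
        eb = ndimage.laplace(img_b.astype(float))
        # Full 2D cross-correlation; the peak location gives the relative shift.
        corr = signal.correlate2d(ea, eb, mode="full")
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        return peak[0] - (img_b.shape[0] - 1), peak[1] - (img_b.shape[1] - 1)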

In reality, merging pictures is further complicated because the source images may be slightly rotated relative to each other, and slightly warped by lens distortion.

My supervisor wanted me to do a PhD on the topic!

gsliepen

a day ago

I used this for several applications. Note that 2D convolution can be done efficiently using FFTs, and the filtering comes almost for free: if you see your high-pass filter as a convolution of its own, you can pre-calculate its FFT and just multiply it in the frequency domain with the two images you want to convolve.
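A minimal NumPy sketch of that trick (assuming grayscale float images; hp_kernel is whatever high-pass kernel you prefer):

    import numpy as np

    def fft_filtered_correlation(img_a, img_b, hp_kernel):
        # Pad to a common size so the correlation is linear, not circular.
        shape = (img_a.shape[0] + img_b.shape[0] - 1,
                 img_a.shape[1] + img_b.shape[1] - 1)
        Fa = np.fft.rfft2(img_a, shape)
        Fb = np.fft.rfft2(img_b, shape)
        Fh = np.fft.rfft2(hp_kernel, shape)  # pre-computed once, reused per image pair

        # Filter both images and cross-correlate entirely in the frequency
        # domain; the conjugate turns convolution into correlation.
        corr = np.fft.irfft2((Fa * Fh) * np.conj(Fb * Fh), shape)

        # The peak of the correlation surface is the most likely relative offset.
        return np.unravel_index(np.argmax(corr), corr.shape)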

scottdupoy

a day ago

That's exactly how it worked: hand-rolled FFT and filtering, following the method in "Numerical Recipes in C".

gsliepen

16 hours ago

Oh, Numerical Recipes is nice but their algorithm implementations are not really state-of-the-art. I highly recommend using FFTW (https://fftw.org/) as it will likely give you a substantial performance improvement.

C-Naoki

a day ago

Thank you for your comments! For sure, a CNN is expressive when it comes to learning the characteristics of images. However, in this project I deliberately avoided deep learning, because I believe it is important to provide fast, consistent results without the need for training data. If you are particularly interested in this app, I would be glad if you could open a pull request to extend the algorithm.

jdhwosnhw

a day ago

The parent comment said nothing about using deep learning. Convolution is not the same as using a CNN. I interpreted their comment as meaning they used a 2D convolution (presumably a 2D cross-correlation, actually) to find regions of overlap.

scottdupoy

a day ago

Yes, you're right, it was a 2D cross-correlation, which is very analogous to a convolution.

r_hanz

20 hours ago

If memory serves… the only difference is that one of the two kernels is reversed (flipped in both axes) for convolution.
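Easy to sanity-check with SciPy:

    import numpy as np
    from scipy.signal import convolve2d, correlate2d

    img = np.random.rand(6, 6)
    k = np.random.rand(3, 3)

    conv = convolve2d(img, k, mode="same")
    # Cross-correlating with the kernel flipped in both axes gives the same result.
    corr = correlate2d(img, k[::-1, ::-1], mode="same")

    print(np.allclose(conv, corr))  # True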

sitkack

a day ago

The images might not be coplanar, so the overlapping composition should really treat them as 2D planes in 3D space, or go full Gaussian splat.

mightyham

a day ago

What are the practical applications for this tool? Typically stitching images for something like panoramas requires significantly more advanced image processing algorithms because the pixels do not perfectly overlap.

jdiff

a day ago

Even in web browsers that support screenshotting an entire page, websites often unload elements that are off-screen. A solution like this can take a bunch of screen-length images and stitch them into a full view of the document.

hackernewds

a day ago

There are Chrome extensions that do this well already

C-Naoki

a day ago

Thank you for your comments! Certainly, this application may not be able to handle every kind of image. However, I tried to stitch images without using deep learning, so the strength of this app is that given the same images it always produces consistent results. In the future, I will try to develop a more effective image-merging method for more generalized scenarios.

jasonjmcghee

a day ago

Is deep learning state of the art for something like this?

Would have expected it to just be kernel-based.

Regardless, you can have fully deterministic deep learning approaches. You can use integers, run on a CPU, and seed everything.

tobr

a day ago

Interesting! The example shows two images that appear to have a pixel-perfect matching region. Is that a requirement or does it work with images that are only somewhat similar?

asadm

a day ago

Seems to be doing a mean-squared-error comparison to find the best matching region.
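Something like this, presumably (a brute-force sketch of the idea for vertically stacked screenshots, not necessarily the repo's actual code):

    import numpy as np

    def best_overlap(top, bottom, min_overlap=10):
        # Try every candidate overlap height and keep the one with the
        # lowest mean-squared error between the bottom rows of `top`
        # and the top rows of `bottom`.
        best_h, best_err = min_overlap, np.inf
        for h in range(min_overlap, min(top.shape[0], bottom.shape[0]) + 1):
            err = np.mean((top[-h:].astype(float) - bottom[:h].astype(float)) ** 2)
            if err < best_err:
                best_h, best_err = h, err
        return best_h  # stitch by dropping the first best_h rows of `bottom`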

therobot24

a day ago

Look at those for loops! Should look into FFT-based correlation; you can even do it with a Mellin transform for scale and a circular harmonic transform for rotation.
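scikit-image has the pieces for the scale/rotation part, if anyone wants to play with it (a rough sketch of the log-polar / Fourier-Mellin idea, grayscale frames assumed):

    import numpy as np
    from skimage.registration import phase_cross_correlation
    from skimage.transform import warp_polar

    def rotation_and_scale(img_a, img_b):
        # FFT magnitude spectra are translation-invariant; in log-polar
        # coordinates, rotation and scale become plain shifts that phase
        # correlation can recover.
        fa = np.abs(np.fft.fftshift(np.fft.fft2(img_a)))
        fb = np.abs(np.fft.fftshift(np.fft.fft2(img_b)))
        radius = min(img_a.shape) // 2
        pa = warp_polar(fa, radius=radius, scaling="log")
        pb = warp_polar(fb, radius=radius, scaling="log")
        shift, _, _ = phase_cross_correlation(pa, pb)
        angle = shift[0] * 360.0 / pa.shape[0]                   # rows span 0-360 degrees
        scale = np.exp(shift[1] * np.log(radius) / pa.shape[1])  # cols span log-radius
        return angle, scale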

I’ve been looking for something like this for creating surveys using drone footage - extract every nth frame from the video, then stitch ’em up somehow to make a “layer”.

There’s existing software for this kind of work, but I’ve been in the mood to reinvent the wheel a bit for some strange reason.

martinmaly21

a day ago

Nice work!

What's the latest state of the art in image stitching these days? From what I can tell, there was a bunch of research done on it in the past, but with all the recent advancements in AI, not much has changed on this front. I'd love to be wrong though!

sorenjan

a day ago

Related to this, is there a name for the effect when you stitch together video frames into a static background while keeping the moving objects moving? The best example I can think of is this Bigfoot video[0, 1], where the shaky footage has been combined into a bigger canvas with "Bigfoot" moving through it. It's a combination of video stabilization and image panorama, but with some smarts to only keep one version of the moving object in each finished frame.

[0] https://www.youtube.com/watch?v=Q60mSMmhTZU [1] https://x.com/rowancheung/status/1641519493447819268

iamjackg

a day ago

A long time ago I did some work to do exactly this in an automated fashion using ffmpeg. It wasn't perfect, but it was better than nothing. I tried going back through my bash history, and the last related entry was this command line:

    ffmpeg -i C0119.MP4 -vf vidstabtransform=interpol=no:crop=black:optzoom=0:zoom=0:smoothing=0:debug=1:input="weirdzoom.trf",unsharp=5:5:0.8:3:3:0.4 kittens-stabilized.mp4

I think the trick was to set all the stabilization parameters to 0 and crop=black to force ffmpeg to move the image around as much as necessary and zoom everything out.

EDIT: nevermind, it was more complicated than that. I actually wrote a Python script that modified the motion tracking information generated by ffmpeg to reduce the zoom amount and fit everything within a 1920x1080 frame. Man, I wish I'd added comments to this.

The https://www.reddit.com/r/ImageStabilization/ subreddit has a lot of posts in that style, but from the research I did it seems like it's mostly done manually by lining up each frame as a separate layer and then rendering an animation that adds one layer per frame.

wmanley

a day ago

See also: Hugin, the panorama photo stitcher. I used to use it a lot back in ~2006 for making panoramas. It automatically finds "control points" in your photos, figures out which ones are shared between the photos, and uses that information to determine the relative positions of the photos and your lens parameters.

Once it does that it can stitch the photos together. It does this by projecting the photos onto a sphere, and then taking a picture of that sphere using whatever lens parameters you want.

https://hugin.sourceforge.io/

kouru225

16 hours ago

This app never works for me and I don’t know why. Photoshop’s auto merge always works but this one doesn’t

tsumnia

a day ago

Nicely done, and keep up the practice. I recall during my master's needing to translate facial landmark points from a Cartesian coordinate system into points that would appear on their respective images. It wasn't for anything major, I just wanted a visual representation of my work. It's these little "neat" projects that help build larger breakthroughs.

lugao

a day ago

Why did a naive pixel-matching library get so many likes here?

debo_

a day ago

Because it's nice to see people trying things for themselves, even if they aren't novel?

kouru225

16 hours ago

Oh shit thank you so much. I’ve always had to use photoshop for this

C-Naoki

14 hours ago

I'm glad it was helpful!