umjunsik132
12 hours ago
Hi HN, author here.

For years, it bothered me that convolution (the king of vision) and matrix multiplication / self-attention (the engine of Transformers) were treated as completely separate, specialized tools. It felt like we were missing a more fundamental principle.

This paper is my attempt to find that principle. I introduce a framework called GWO (Generalized Windowed Operation) that describes any neural operation using just three simple, orthogonal components:

Path: Where to look

Shape: What form to look for

Weight: What to value

Using this "grammar", you can express both a standard convolution and self-attention, and see them as just different points in the same design space.

But the most surprising result came when I analyzed operational complexity. I ran an experiment where different models were forced to memorize a dataset (achieving ~100% training accuracy). The results were clear: complexity used for adaptive regularization (like in Deformable Convolutions, which dynamically change their receptive field) resulted in a dramatically smaller generalization gap than "brute-force" complexity (like in Self-Attention). This suggests that how an operation uses its complexity is more important than how much it has.

I'm an independent researcher, so getting feedback from a community like this is invaluable. I'd love to hear your thoughts and critiques. Thanks for taking a look.

The paper is here: https://doi.org/10.5281/zenodo.17103133
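To make the "same design space" idea concrete, here is a rough sketch in NumPy. It is only an illustration under my own simplifying assumptions, not the paper's formal definition: the names `windowed_op`, `local_path`, `global_path`, `fixed_weight`, and `attention_weight` are hypothetical, and Shape is folded into Path (a fixed 3-tap circular window versus a global window) rather than modeled separately.

```python
import numpy as np

def windowed_op(x, path, weight):
    """Generic windowed operation on x of shape (n, d).

    path(i)        -> integer indices of the positions output i looks at
    weight(i, idx) -> one coefficient per looked-at position
    (Hypothetical interface for illustration only.)
    """
    out = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        idx = path(i)
        out[i] = weight(i, idx) @ x[idx]  # weighted sum over the window
    return out

def local_path(n, offsets=(-1, 0, 1)):
    # Convolution-style path: a small neighborhood, wrapped circularly.
    offsets = np.array(offsets)
    return lambda i: (i + offsets) % n

def global_path(n):
    # Attention-style path: every position looks at every position.
    return lambda i: np.arange(n)

def fixed_weight(kernel):
    # Convolution-style weight: the same kernel at every position.
    return lambda i, idx: kernel

def attention_weight(x):
    # Attention-style weight: softmax of scaled dot products,
    # using x itself as query, key, and value (no learned projections).
    d = x.shape[1]
    def w(i, idx):
        scores = x[idx] @ x[i] / np.sqrt(d)
        e = np.exp(scores - scores.max())
        return e / e.sum()
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
kernel = np.array([0.25, 0.5, 0.25])

conv_out = windowed_op(x, local_path(len(x)), fixed_weight(kernel))
attn_out = windowed_op(x, global_path(len(x)), attention_weight(x))
```

In this sketch the two operations differ only in which `path` and `weight` functions are passed in: the convolution uses a local path with content-independent weights, while self-attention uses a global path with weights computed from the input.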
CuriouslyC
4 hours ago
I'm also an independent researcher, and I just wanted to say it's exciting to see other individuals making real contributions! One thing I've noticed is that as I'm discovering some very deep stuff, the imposter syndrome is hitting me hard because I don't have a research group to vibe off of. I have scientific training and 17 years of ML experience, but I think it's still natural to question yourself when you're pushing past the SOTA and finding deep patterns that the field has missed.
If it's useful to you, I'm happy to be a sounding board/vibes partner for your research. My contact info is in my profile.
rf15
6 hours ago
Very good find, thank you for writing it down. For some time I had the impression that they could be unified; I just never bothered trying.