Typically, when we photograph small objects at very close range, the depth of field is narrow: only a thin slice of the scene is in focus and the rest of the image appears blurred. The further other parts of the scene are from the focal plane, the more they blur. This shallow focus helps us to understand scale and depth.
However, in these pictures, the artist has cleverly avoided the blurring effect by combining multiple pictures taken at different focal distances into a single image. The resulting pictures look crisp and clear throughout and, as a result, lack the usual depth cues we are accustomed to in macro photography. That's why these pictures resemble photographs of large halls!
A similar effect can be observed in ray tracing as well, where we are free to construct entirely imaginary scenes. While defining a scene that we want to be perceived as small, we need to remember to add focal blur [1] carefully. If we forget to do so, the resulting scene can produce the exact opposite impression, that of a vast space.
[1]: https://github.com/susam/pov25#focal-blur
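To put rough numbers on how quickly the in-focus zone shrinks at close range, here is a minimal sketch using the standard hyperfocal-distance approximation for a thin lens. The lens (100 mm at f/8), the 0.03 mm circle of confusion, and the function name are assumptions picked for illustration, and the approximation itself gets loose at true macro magnifications, so treat the output as an order-of-magnitude estimate only.

```python
import math

def depth_of_field_mm(focal_mm, f_number, subject_mm, coc_mm=0.03):
    """Approximate total depth of field in mm (thin-lens, hyperfocal formulas).

    coc_mm is the circle of confusion; 0.03 mm is a common full-frame
    convention and is an assumption here. Expects subject_mm > focal_mm.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    if subject_mm >= hyperfocal:
        # Beyond the hyperfocal distance the far limit is at infinity.
        return math.inf
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near

# Hypothetical 100 mm lens at f/8, focused at increasing distances.
for distance_mm in (300, 1000, 5000):
    dof = depth_of_field_mm(focal_mm=100, f_number=8, subject_mm=distance_mm)
    print(f"subject at {distance_mm} mm: about {dof:.1f} mm in focus")
```

With these assumed values the in-focus zone is only a few millimetres deep at 30 cm but more than a metre deep at 5 m, which is the gap that focus stacking (or omitting focal blur in a ray tracer) removes.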
I think that's the most interesting part! From the article:
> Every part of his process is intentional because he doesn't want the images to look like miniatures. The focus stacking helps him avoid the typical aesthetic of macro photography by reducing the amount of background blur and focal compression. Creating an image that looks like it was taken with an ultra-wide-angle lens also results in leading lines we associate with normal-sized things, like streets and buildings, which tricks your brain into thinking the subject is not small. He also uses lighting to make it look like the sun is shining down, emphasizing the feeling that you are standing inside something.
Even though a macro lens is physically quite close to the subject, the ratio of subject-size-in-frame to distance-to-subject is usually still quite small (the angle of view for macro lenses is generally much smaller than what the focal length at infinity would suggest).
So for us, macro shots tend to have two characteristics: 1) perspective approaching that of an isometric drawing, and 2) a usually narrow depth of field.
These shots, on the other hand, were made with a very wide field of view, and focus stacking produces a deep depth of field. I'm sure that if you worked out the angles and distances in e.g. the violin shot, the ratios would be basically the same as in your typical 2.5-story architecture shot or subway architecture done with something in the 14-20mm FF range, because the photographer went to great lengths to make it look like that.
There are also other cues, like the height of the camera relative to the floor and ceiling of the room, and of course the light.
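The angle-of-view point can be checked with a little arithmetic. The sketch below assumes a simple thin lens that focuses by moving away from the sensor, so its effective focal length grows by a factor of (1 + magnification); real macro lenses with internal focusing will deviate from this. The 36 mm sensor width is the full-frame horizontal dimension, and the function names and the 100 mm macro lens are made up for illustration.

```python
import math

def angle_of_view_deg(focal_mm, sensor_mm=36.0, magnification=0.0):
    """Horizontal angle of view in degrees for a thin lens.

    magnification=0 means focused at infinity; magnification=1 means 1:1 macro.
    Assumes the lens focuses by extension (an idealization).
    """
    effective_mm = focal_mm * (1.0 + magnification)
    return math.degrees(2 * math.atan(sensor_mm / (2 * effective_mm)))

def width_to_distance_ratio(focal_mm, sensor_mm=36.0, magnification=0.0):
    """Framed subject width divided by the lens-to-subject distance."""
    return sensor_mm / (focal_mm * (1.0 + magnification))

# Wide-angle lenses from the 14-20mm FF range versus a hypothetical 100 mm macro lens.
for label, focal, mag in [
    ("14 mm at infinity", 14, 0.0),
    ("20 mm at infinity", 20, 0.0),
    ("100 mm at infinity", 100, 0.0),
    ("100 mm at 1:1", 100, 1.0),
]:
    aov = angle_of_view_deg(focal, magnification=mag)
    ratio = width_to_distance_ratio(focal, magnification=mag)
    print(f"{label}: {aov:.1f} degrees, width/distance = {ratio:.2f}")
```

Under these assumptions the macro lens at 1:1 sees roughly half the angle it would at infinity, and its width-to-distance ratio is more than ten times smaller than that of a 14 mm lens, which is why ordinary macro shots read as flat and these wide-angle ones read as rooms.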