darhodester
20 days ago
Hi,
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).
henjodottech
19 days ago
I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3D clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.
darhodester
19 days ago
There's a ton of fun tricks you can perform with Gaussian splatting!
You're right that you can intentionally under-construct your scenes; this can create a dream-like effect.
It's also possible to stylize your Gaussian splats to produce NPR effects. Check out David Lisser's amazing work: https://davidlisser.co.uk/Surface-Tension.
Additionally, you can intentionally introduce view-dependent ghosting artifacts. In other words, if you take images from a certain angle that contain an object, and remove that object for other views, it can produce a lenticular/holographic effect.
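If you want to see the mechanism behind that last trick, here's a minimal numpy sketch (my own illustration, not GSOPs code) of how a splat's color is evaluated from degree-0/1 spherical harmonics; the degree-1 terms are what let appearance change with viewing angle:

    import numpy as np

    # Real-valued SH basis constants for degrees 0 and 1.
    SH_C0 = 0.28209479177387814
    SH_C1 = 0.4886025119029199

    def splat_color(sh_coeffs, view_dir):
        """Evaluate a splat's view-dependent RGB from degree-0/1 SH.

        sh_coeffs: (4, 3) array, one DC term plus three degree-1 terms per channel.
        view_dir:  (3,) unit vector from the camera toward the splat center.
        """
        x, y, z = view_dir
        color = SH_C0 * sh_coeffs[0]
        color -= SH_C1 * y * sh_coeffs[1]
        color += SH_C1 * z * sh_coeffs[2]
        color -= SH_C1 * x * sh_coeffs[3]
        return np.clip(color + 0.5, 0.0, 1.0)

A splat trained on views that disagree about what's there simply learns different colors (and effectively different visibility) per direction, which is where the lenticular/holographic effect comes from.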
echelon
19 days ago
Y'all did such a good job with this. It captivated HN and was the top post for the entire day, and will probably last for much of tomorrow.
If you don't know already, you need to leverage this. HN is one of the biggest channels of engineers and venture capitalists on the internet. It's almost pure signal (minus some grumpy engineer grumblings - we're a grouchy lot sometimes).
Post your contact info here. You might get business inquiries. If you've got any special software or process in what you do, there might be "venture scale" business opportunities that come your way. Certainly clients, but potentially much more.
(I'd certainly like to get in touch!)
--
edit: Since I'm commenting here, I'll expand on my thoughts. I've been rate limited all day long, and I don't know if I can post another response.
I believe volumetric is going to be huge for creative work in the coming years.
Gaussian splats are a huge improvement over point clouds and NeRFs in terms of accessibility and rendering, but the field has so many potential ways to evolve.
I was always in love with Intel's "volume", but it was impractical [1, 2] and got shut down. Their demos are still impressive, especially from an equipment POV, but A$AP Rocky's music video is technically superior.
During the pandemic, to get over my lack of in-person filmmaking, I wrote Unreal Engine shaders to combine the output of several Kinect point clouds [3] to build my own lightweight version inspired by what Intel was doing. The VGA resolution of consumer volumetric hardware was a pain, and I was faced with either FPGA solutions for higher real-time resolution or going 100% offline.
World Labs and Apple are doing exciting work with image-to-Gaussian models [4, 5], and World Labs created the fantastic Spark library [6] for viewing them.
I've been leveraging splats to do controllable image gen and video generation [7], where they're extremely useful for consistent sets and props between shots.
I think the next steps for Gaussian splats are good editing tools, segmenting, physics, etc. The generative models are showing a lot of promise too. The Hunyuan team is supposedly working on a generative Gaussian model.
[1] https://www.youtube.com/watch?v=24Y4zby6tmo (film)
[2] https://www.youtube.com/watch?v=4NJUiBZVx5c (hardware)
[3] https://www.twitch.tv/videos/969978954?collection=02RSMb5adR...
[4] https://www.worldlabs.ai/blog/marble-world-model
[5] https://machinelearning.apple.com/research/sharp-monocular-v...
[7] https://github.com/storytold/artcraft (in action: https://www.youtube.com/watch?v=iD999naQq9A or https://www.youtube.com/watch?v=f8L4_ot1bQA )
darhodester
19 days ago
First, all credit for the execution and vision of Helicopter goes to A$AP, Dan Streit, and Grin Machine (https://www.linkedin.com/company/grin-machine/about/). Evercoast and Wild Capture were also involved.
Second, it's very motivating to read this! My background is in video game development (only recently transitioning to VFX). My dream is to make a Gaussian splatting content creation and game development platform with social elements. One of the most exciting aspects of Gaussian splatting is that it democratizes high quality content acquisition. Let's make casual and micro games based on the world around us and share those with our friends and communities.
bininunez
19 days ago
Thanks darhodester! It was definitely a broad team effort: it started with Rocky and Streit's creative genius, was made possible by Evercoast's software to capture and generate all the 4D splat data (www.evercoast.com), and then flowed to the incredible people at Grin Machine and Wild Capture, who used GSOPs and OctaneRender.
eMPee584
19 days ago
What do you think about the sparse voxel approach? Shouldn't it be more compute-efficient than computing zillions of ellipsoids? My understanding of CGI is probably too shallow, but I wonder why it hasn't caught on much.
darhodester
19 days ago
I believe most of the "voxel" approaches also require some type of inference (MLP). This limits the use case and ability to finely control edits. Gaussian splatting is amazing because each Gaussian is just a point in space with a rotation and non-uniform scale.
The most expensive part of Gaussian splatting is depth sorting.
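To make that concrete, here's a numpy sketch (illustrative only, not any particular renderer's code) of both points: the explicit per-splat data, and the per-frame depth sort, assuming a computer-vision camera convention with +z pointing into the scene:

    import numpy as np

    # Each Gaussian is explicit data, no MLP needed to decode it:
    # positions (N, 3), rotations (N, 4) quaternions, scales (N, 3),
    # opacities (N,), and color/SH coefficients.

    def back_to_front_order(positions, world_to_camera):
        """Order splats by view-space depth for correct alpha blending.

        world_to_camera: 4x4 view matrix, +z pointing into the scene.
        Real renderers do this as a GPU radix sort over tile/depth keys
        every frame, which is why the sort dominates the cost.
        """
        n = positions.shape[0]
        homo = np.concatenate([positions, np.ones((n, 1))], axis=1)
        depths = (homo @ world_to_camera.T)[:, 2]
        return np.argsort(-depths)  # farthest first, for painter's-order compositing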
darhodester
19 days ago
The ghost effect is pretty cool, too! https://www.youtube.com/watch?v=DQGtimwfpIo
jofzar
19 days ago
https://youtu.be/eyAVWH61R8E?t=3m53s
Superman is what comes to mind for this
kqr
19 days ago
I remember splatting being introduced as a way to capture real-life scenes, but one of the links you have provided in this discussion seems to have used a traditional polygon mesh scene as training input for the splat model. How common is this, and why would one do it that way over e.g. vertex shader effects that give the mesh a splatty aesthetic?
darhodester
19 days ago
Yes, it's quite trivial to convert traditional CG to Gaussian splats. We can render our scenes/objects just as we would capture physical spaces. An additional benefit of using synthetic data is 100% accurate camera poses (alignment), which means the structure-from-motion (SfM) step can be bypassed.
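As a concrete example (a minimal sketch; the exact schema depends on your trainer, this assumes the Blender/NeRF-style transforms.json that several splat trainers accept), you can write your renderer's exact camera poses straight to disk and skip SfM entirely:

    import json
    import numpy as np

    def write_transforms(poses, fov_x_radians, out_path="transforms.json"):
        """Write ground-truth synthetic camera poses so training skips SfM.

        poses: list of (image_filename, 4x4 camera-to-world matrix)
        taken directly from the CG renderer, so alignment is exact.
        """
        frames = [
            {"file_path": name, "transform_matrix": np.asarray(c2w).tolist()}
            for name, c2w in poses
        ]
        with open(out_path, "w") as f:
            json.dump({"camera_angle_x": fov_x_radians, "frames": frames}, f, indent=2)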
It's also possible to splat from textured meshes directly; see https://github.com/electronicarts/mesh2splat. This approach yields high-quality, PBR-compatible splats, though it is not quite as efficient as a traditional training workflow. It will likely become mainstream in third-party render engines moving forward.
Why do this?
1. Consistent, streamlined visuals across a massive ecosystem, including content creation tools, the web, and XR headsets.
2. High-fidelity, compressed visuals. With SOGs compression, splats are going to become the dominant 3D representation on the web (see https://superspl.at).
3. E-commerce (product visualizations, tours, real estate, etc.).
4. Virtual production (replace green screens with giant LED walls).
5. View-dependent effects without (traditional) shaders or lighting.
It's not just about the aesthetic, it's also about interoperability, ease of use, and the entire ecosystem.
sbierwagen
20 days ago
From the article:
>Evercoast deployed a 56 camera RGB-D array
Do you know which depth cameras they used?
bininunez
20 days ago
We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs. But for this, RealSense did the job. The higher end the camera, the more expensive and time consuming everything is. We have a cloud platform to scale rendering, but it’s still overall more costly (time and money) to use high res. We’ve worked hard to make even low res data look awesome. And if you look at the aesthetic of the video (90s MTV), we didn’t need 4K/6K/8K renders.
bredren
19 days ago
You may have explained this elsewhere, but if not: what kind of post-processing did you do to upscale or refine the RealSense video?
Can you add any interesting details on the benchmarking done against the RED camera rig?
spookie
19 days ago
This is a great question, would love some feedback on this.
I assume they stuck with RealSense for proper depth maps. However, those are limited to roughly a 6-meter range, and their depth imaging isn't able to resolve features smaller than the native resolution allows (it gets worse past 3 m too, as there is less and less parallax, among other issues). I wonder how they approached that as well.
darhodester
20 days ago
Aha: https://www.red.com/stories/evercoast-komodo-rig
So likely RealSense D455.
darhodester
20 days ago
I was not involved in the capture process with Evercoast, but I may have heard somewhere they used RealSense cameras.
I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.
secretsatan
20 days ago
Couldn’t you just use iPhone Pros for this? I developed an app specifically for photogrammetry capture using AR and the depth sensor, as it seemed like a cheap alternative.
EDIT: I realize a phone is not on the same level as a RED camera, but I just saw iPhones as a massively cheaper option than the alternatives in the field I worked in.
F7F7F7
20 days ago
ASAP Rocky has a fervent fanbase who's been anticipating this album. So I'm assuming that whatever record label he's signed to gave him the budget.
And when I think back to another iconic hip hop video (iconic for that genre) where they used practical effects and military helicopters chasing speedboats in the waters off of Santa Monica... I bet they had change to spare.
cwillu
19 days ago
Is there any reason to think https://thebaffler.com/salvos/the-problem-with-music doesn't apply here?
numpad0
20 days ago
A single camera only captures the side of the object facing the camera. Knowing how far away the camera-facing side of a Rubik's Cube is helps if you're making educated guesses (novel view synthesis), but it won't solve the problem of actually photographing the back side.
There are usually six sides on a cube, which means you need a minimum of six iPhones around an object to capture all sides of it and then be able to freely move around it. You might as well seek open-source alternatives rather than relying on Apple surprise boxes for that.
In cases where your subject is static, such as a building, you can of course wave around a single iPhone and get a result comparable to more expensive rigs.
antidamage
18 days ago
The minimum is four RGB-only cameras (if you want RGB data) but adding lidar really helps.
The standard pipeline can infer a huge amount of data, and there are a few AI tools now for hallucinating missing geometry and backfaces based on context recognition, which can then be converted back into a splat for fast, smooth rendering.
darhodester
20 days ago
I think it's because they already had proven capture hardware and harvesting/processing workflows.
But yes, you can easily use iPhones for this now.
secretsatan
20 days ago
Looks great by the way. I was wondering if there’s a file format for volumetric video captures?
darhodester
19 days ago
Some companies have a proprietary file format for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.
Check this project, for example: https://zju3dv.github.io/freetimegs/
Unfortunately, these formats are currently closed behind cloud processing, so adoption is rather low.
Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).
itishappy
19 days ago
https://developer.apple.com/av-foundation/
https://developer.apple.com/documentation/spatial/
Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like cinematic mode use a monocular depth map, and their lidar outputs raw point cloud data.
numpad0
19 days ago
A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data across its whole distance range.
You just can't see the back of a thing by knowing the shape of the front side with current technologies.
itishappy
19 days ago
Right! My terminology may be imprecise here, but I believe there is still an important distinction:
The depth map stored for image processing is image metadata, meaning it calculates one depth per pixel from a single position in space. Note that it doesn't have the ability to measure that many depth values, so it measures what it can using LIDAR and focus information and estimates the rest.
On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position; in theory the device could be moved around to capture additional angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.
I also note that these distinctions start to vanish when dealing with video or using more than one capture device.
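To make the distinction concrete, here's a minimal back-projection sketch (numpy; pinhole intrinsics assumed, values are placeholders): a depth map plus intrinsics is exactly one single-viewpoint point cloud, the "skin" of whatever faced the sensor:

    import numpy as np

    def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
        """Back-project a per-pixel depth map into a point cloud.

        depth: (H, W) metric depths; fx, fy, cx, cy: pinhole intrinsics.
        Every point comes from one camera position, so you only ever
        recover the surface that faced the sensor.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]  # drop invalid/zero-depth pixels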
numpad0
18 days ago
No, LIDAR data are necessarily taken from a single position. They are 3D, but literally single-eyed. You can't tell from LIDAR data if you're looking at a half-cut apple or an intact one. This becomes obvious the moment you try to rotate a LIDAR capture: it's just the skin. You need depth maps from all angles to reconstruct the complete skin.
So you have to have minimum two for front and back of a dancer. Actually, the seams are kind of dubious so let's say three 120 degrees apart. Well we need ones looking down as well as up for baggy clothing, so more like nine, 30 degrees apart vertically and 120 degrees horizontally, ...
and ^ this goes far enough that installing a few dozen identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for a video.
secretsatan
19 days ago
Recording point clouds over time, I guess I mean. I’m not going to pretend to understand video compression, but could it be possible to handle the frame-to-frame motion aspect in 3D the same way as in 2D?
fastasucan
20 days ago
Why would they go for the cheapest option?
secretsatan
20 days ago
It was more the point that the technology is much cheaper. The company I worked for had completely missed it while trying to develop in-house solutions.
brcmthrowaway
20 days ago
Azure Kinect
dostick
20 days ago
Could such a plugin be possible for DaVinci Resolve, to merge scenes captured from two iPhones with spatial data into a 3D scene? With the M4 that shouldn’t be a problem?
darhodester
20 days ago
Yes: https://irrealix.com/plugin/gaussian-splatting-davinci-resol...
(I'm not the author.)
You can train your own splats using Brush or OpenSplat.
jeffgreco
20 days ago
Great work! I’d love to see a proper BTS or case study.
darhodester
19 days ago
I do believe a BTS is being developed.
tokymegz
19 days ago
Stay tuned
c-fe
19 days ago
Hi David, have you looked into alternatives to 3DGS like https://meshsplatting.github.io/ that promise better results and faster training?
darhodester
19 days ago
I have. Personally, I'm a big fan of hybrid representations like this. An underlying mesh helps with relighting, deformation, and effective editing operations (a mesh is a sparse node graph for an otherwise unstructured set of data).
However, surface-based constraints can prevent thin surfaces (hair/fur) from reconstructing as well as vanilla 3DGS. It might also inhibit certain reflections and transparency from being reconstructed as accurately.
moralestapia
20 days ago
Random question, since I see your username is green.
How did you find out this was posted here?
Also, great work!
darhodester
20 days ago
My friend and colleague shared a link with me. Pretty cool to see this trending here. I'm very passionate about Gaussian splatting and developing tools for creatives.
And thank you!
npkk2
19 days ago
I've been mesmerized by the visuals of Gaussian splatting for a while now; congratulations on your great work!
Do you have any benchmarks on the geometric precision of these reproductions?
darhodester
19 days ago
Thank you!
Geometric analysis for Gaussian splatting is a bit like comparing apples and oranges. Gaussian splats are not really discrete geometry, and their power lies in overlapping semi-transparent blobs. In other words, their benefit is as a radiance field and not as a surface representation.
However, assuming good camera alignment and real-world scale enforced at the capture and alignment steps, the splats should match real-world units quite closely (mm to cm accuracy). See: https://www.xgrids.com/intl?page=geomatics.
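For illustration (my own sketch, not xgrids' pipeline), enforcing real-world scale can be as simple as a similarity transform computed from one surveyed distance on the capture stage:

    import numpy as np

    def enforce_metric_scale(positions, scales, p_a, p_b, true_distance_m):
        """Rescale a splat scene so a known real-world distance is respected.

        p_a, p_b: reconstructed 3D positions of two points whose physical
        separation (true_distance_m) was measured at capture time.
        Splat centers and per-splat extents must be scaled together.
        """
        reconstructed = np.linalg.norm(np.asarray(p_b) - np.asarray(p_a))
        s = true_distance_m / reconstructed
        return positions * s, scales * s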
tamat
19 days ago
nice work.
I can see that relighting is still a work in progress, as the virtual spotlights tend to look flat and fake. I understand that you are just making the splats that fall inside the spotlight cone brighter and darkening the ones behind lots of splats.
Do you know if there are plans for Gaussian splats to capture unlit albedo, roughness and metalness? So we can relight in a more realistic manner?
Also, environment radiosity doesn't seem to translate to the splats, am I right?
Thanks
darhodester
19 days ago
Thank you!
There are many ways to relight Gaussian splats. However, the highest quality results currently come from raytracing/path-tracing render engines (such as Octane and V-Ray), with 2D diffusion models in second place. Relighting with GSOPs nodes does not yield results of the same quality, but it can be baked into the model and exported elsewhere; it is the only approach that stores the relit information in the original splat scene.
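As a rough illustration of the simple approach you describe (a toy sketch, not the actual GSOPs nodes), a baked spotlight relight just scales each splat's color by a cone test with soft falloff; the absence of occlusion and bounce light is exactly why it reads flatter than a path-traced relight:

    import numpy as np

    def spotlight_gain(centers, light_pos, light_dir, cone_angle, softness=0.1):
        """Per-splat brightness gain for a naive virtual spotlight.

        centers: (N, 3) splat positions. Splats inside the cone get a
        smooth gain in [1, 2]; nothing casts shadows, so the result
        looks flat compared to ray/path-traced relighting.
        """
        to_splat = centers - light_pos
        to_splat /= np.linalg.norm(to_splat, axis=1, keepdims=True)
        cos_theta = to_splat @ (light_dir / np.linalg.norm(light_dir))
        t = np.clip((cos_theta - np.cos(cone_angle)) / softness, 0.0, 1.0)
        return 1.0 + t * t * (3.0 - 2.0 * t)  # smoothstep falloff at the cone edge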
That said, you are correct that in order to relight more accurately, we need material properties encoded in the splats as well. I believe this will come sooner rather than later, with inverse rendering and material decomposition, or technology like Beeble SwitchLight (https://beeble.ai). This data can ultimately be predicted from multiple views and trained into the splats.
"Also, environment radiosity doesnt seem to translate to the splats, am I right?"
Splats do not have their own radiosity in that sense, but if you have a virtual environment, its radiosity can be translated to the splats.
darhodester
19 days ago
This may interest you: https://www.linkedin.com/posts/radiancefields_in-case-you-we...
Syzygies
19 days ago
Back in 2001 I was the math consultant for "A Beautiful Mind". One spends a lot of time waiting on a film set. Eventually one wonders why.
The majority of wait time was the cinematographer lighting each scene. I imagined a workflow where secondary digital cameras captured 3D information, and all lighting took place in post production. Film productions hemorrhage money by the second; this would be a massive cost saving.
I described this idea to a venture capitalist friend, who concluded one already needed to be a player to pull this off. I mentioned this to an acquaintance at Pixar (a logical player) and they went silent.
Still, we don't shoot movies this way. Not there yet...
mmaaz
19 days ago
Really cool work!
chrisjj
19 days ago
[flagged]
dagmx
19 days ago
Is it possible you didn’t comprehend which parts were 3D?
Or if you did, perhaps a critique would be better than just a low-effort diss.
chrisjj
19 days ago
I viewed on a flat monitor, so perhaps I missed some 4D and 5D too.
/i
darhodester
19 days ago
That's hurtful.
GrowingSideways
19 days ago
Take the money and never admit to selling this shit. Why would you ever willingly associate your name with this?
darhodester
19 days ago
Read the room. Plenty of people are interested in the aesthetics and the technology.
GrowingSideways
19 days ago
Just because people want to give you money doesn't mean you toss your dignity out the window.