Terrible render performance due to Spatial Understanding meshes (?)

trzy · May 2017

I'm seeing terrible (30-40 FPS) and inconsistent frame rates when using spatial understanding (I think this is the problem, but I'm still trying to fully understand why). I've done some profiling, and the number of triangles is not horrible even in the worst case (around 60K). The number of draw calls can be as high as 60, however, given the number of small meshes being produced.

Has anyone else experienced this or found a way to mitigate it?

I use spatial understanding for occlusion -- so I set the occlusion material once the room scan is finalized and shut off spatial mapping updates. And yes, quality settings are set to Fastest.

Thank you!

EDIT: I've tried a couple of things, with very little to show for it:

1. Enabled static batching of the spatial understanding meshes. This maybe helped a tiny bit, but not much. The profiler indicates it is indeed working, too.
2. Enabled stereo single pass rendering.

I also tried profiling and I don't see anything weird. Most of the CPU time is spent waiting for the GPU and the GPU profiler doesn't indicate anything out of the ordinary. Look at how light this load is! What could possibly be happening here?

mark_grossnickle · May 2017

Maybe check your camera clipping? Set the max plane to about 5 (meters) or less... unless you really need more. It may be trying to map too large of an area.

I'd also check your shaders. Don't use Standard or Standard (fast). Try to use the HoloToolkit or mobile ones.

Is that profiler capture from the app running on your PC, or from the app running on the device? If it's the PC, that performance doesn't seem great, especially those spikes. If you click on a spike, can you drill down and figure out what is causing it?

trzy · May 2017

Thanks, Mark, for the response!

I'm using only two shaders: HoloToolkit/VertexLitConfigurable and HoloToolkit/Occlusion

The profiler is running on the PC but connected to the app running on the physical device (app built in Release mode).

I would like to have the entire room mapped for occlusion. True, I could do some fancy things to eliminate portions of the spatial mesh that will never occlude anything (floor, ceiling, etc.) if I want to get clever.

But, everything I've read indicates that Unity apps should be able to push 100k tris per frame at 60Hz. I can't get close to that performance.
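To put rough numbers on that gap, here is a back-of-the-envelope calculation. It uses only figures quoted in this thread (the 100k triangles at 60Hz rule of thumb, and the roughly 60K triangles at 30-40 FPS I'm observing); nothing is measured by the script, and the function names are just for illustration.

```python
# Figures quoted in this thread; nothing here is measured by the script.
TARGET_FPS = 60
TARGET_TRIS_PER_FRAME = 100_000    # the "100k tris at 60Hz" rule of thumb
OBSERVED_TRIS_PER_FRAME = 60_000   # worst case reported in the first post
OBSERVED_FPS_RANGE = (30, 40)      # frame rates reported in the first post


def tris_per_second(tris_per_frame: int, fps: int) -> int:
    """Triangle throughput implied by a triangle count and a frame rate."""
    return tris_per_frame * fps


def frame_time_ms(fps: int) -> float:
    """Time available (or taken) per frame, in milliseconds."""
    return 1000.0 / fps


target = tris_per_second(TARGET_TRIS_PER_FRAME, TARGET_FPS)
observed = [tris_per_second(OBSERVED_TRIS_PER_FRAME, f) for f in OBSERVED_FPS_RANGE]

print(f"target:   {target:,} tris/s, {frame_time_ms(TARGET_FPS):.1f} ms per frame")
for fps, tps in zip(OBSERVED_FPS_RANGE, observed):
    print(f"observed: {tps:,} tris/s at {fps} FPS, {frame_time_ms(fps):.1f} ms per frame")
```

By this estimate the observed throughput is well under half of what the rule of thumb implies, which is why triangle count alone doesn't seem to explain the frame rate.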

I'll try to drill down into those spikes when I get home tonight. I was having some difficulty with the UI (and trying to enable the frame debugger crashes the app on the HoloLens).

If you're willing to take a look at the project it's at: http://trzy.org/tmp/Game-Crane2.zip

When I build and deploy, I change "Windows.Universal" to "Windows.Holographic" in the Visual Studio project's Package.appxmanifest file (not sure if that's still needed these days, though).

You map the room first and then air tap. After a few seconds the spatial meshes will be invisible and a couple of cubes will be placed. In the editor, you can use wasd+arrow keys to move and look around, enter to air tap (I have a scan of my room loaded up in the editor).

mark_grossnickle · May 2017

I think running Frame Debugger while running the app in Unity's game window should be sufficient to determine draw calls. I don't see why it would be different on the device than the editor.

I think you can enable Holographic in the player settings, so you shouldn't need to manually change the package manifest, but that sounds harmless in any case.

Do you get a steady 60fps when spatial mapping is disabled?

trzy · May 2017

I have used the Frame debugger in the editor, but it doesn't reveal anything interesting that I can see.

I did run a couple times with spatial meshes disabled. It appeared to be running at 60fps with ever so slight hiccups. I'll double check tonight but I almost expect to find that when simple objects enter the frustum, there will be noticeable hiccups. Which is quite odd.

My understanding is that RoboRaid uses spatial meshes for occlusion (although it clearly does not use spatial understanding) and was written in Unity, so I feel what I'm doing should easily be achievable. Would prefer not to move to C++/Direct3D but there appears to be confirmation that much higher performance can be achieved if rendering is done thoughtfully (hundreds of thousands of triangles per frame).

mark_grossnickle · May 2017

There are two things I use the frame debugger for:

1. Looking to see how those 60 draw calls are broken down and attempting to get that lower through batching. For example, I may notice that multiple 2D sprites are not being batched and then go make sure they are using the same tag.
2. Preventing overdraw. If we have a large background that is mostly covered up by something in the foreground, then make sure via sorting layers that the foreground is drawn first so that area doesn't get redrawn.

Speaking of overdraw, is there much transparency in the scene?

I should note that we aren't always hitting 60fps either. Which isn't ideal but as long as the fps is steady it seems passable. Not that I am recommending that approach... just would consider how important it is to your experience before recreating anything in Direct3D.

trzy · May 2017

I think I have pretty decent batching -- most of what I'm drawing are spatial meshes. Spatial understanding produces a large number of meshes (~70?) because each patch is very small. The upside is better culling. These can be statically batched.
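For anyone unfamiliar with the idea, merging many small patches into one mesh boils down to concatenating the vertex lists and shifting each patch's triangle indices by the number of vertices already added. The sketch below shows only that general idea with a made-up `Mesh` representation; it is not Unity's static batching implementation, and `combine_meshes` is a hypothetical helper.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Mesh = Tuple[List[Vertex], List[int]]  # (vertices, flat list of triangle indices)


def combine_meshes(meshes: List[Mesh]) -> Mesh:
    """Merge several meshes into one by offsetting triangle indices."""
    vertices: List[Vertex] = []
    indices: List[int] = []
    for verts, tris in meshes:
        offset = len(vertices)  # index of this patch's first vertex in the merged list
        vertices.extend(verts)
        indices.extend(i + offset for i in tris)
    return vertices, indices


# Two single-triangle patches, each indexed from 0 in its own vertex list.
patch_a: Mesh = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
patch_b: Mesh = ([(2, 0, 0), (3, 0, 0), (2, 1, 0)], [0, 1, 2])

merged_vertices, merged_indices = combine_meshes([patch_a, patch_b])
print(len(merged_vertices), merged_indices)
```

The trade-off is the one mentioned above: a merged mesh means fewer objects to submit, but a single large mesh can no longer be culled patch by patch.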

There are no transparencies at all. Vertex Lit Configurable (opaque) and Occlusion are the only shaders.

trzy · May 2017

Further profiling information attached. Nothing interesting during these performance spikes. It's just stuck waiting for the GPU to complete, evidently.

I can confirm that not rendering the spatial meshes gets me back to 60FPS.

Right click these and open in a new tab to view them full sized.

mark_grossnickle · May 2017

Looks like a GPU issue then:

https://forum.unity3d.com/threads/major-vr-performance-issue-oculuswaitforgpu-running-on-cpu.328442/

I realize that is oculus but it still applies.

Your rendering screenshot shows 97k verts. I realize 100k is listed as the max but we start to hit performance issues much before then. Try to get that down and see if it helps.

trzy · May 2017

Yeah, I guess so. Very strange. By the way, I thought it was 100K triangles that was the limit. You've seen problems at 100K verts? That's extremely low.

Looks like occlusion will have to be achieved more thoughtfully. I guess I'll have to determine which spatial mesh objects are between the camera and the game objects and render only those. Many spatial meshes can safely be discarded because there's no way anything will end up outside of them.
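A minimal sketch of that selection step, assuming each spatial mesh patch can be approximated by an axis-aligned bounding box and each game object by a single point: a patch is kept only if the line segment from the camera to some object passes through its box (the standard slab test). All names here are hypothetical and the code is engine-agnostic, not HoloToolkit API.

```python
from typing import Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner) of an axis-aligned box


def segment_hits_box(start: Vec3, end: Vec3, box: Box) -> bool:
    """Slab test: does the segment from start to end intersect the box?"""
    t_min, t_max = 0.0, 1.0  # parametric range covering the whole segment
    for axis in range(3):
        origin = start[axis]
        delta = end[axis] - start[axis]
        lo, hi = box[0][axis], box[1][axis]
        if abs(delta) < 1e-9:
            # Segment is parallel to this slab: it must already lie inside it.
            if origin < lo or origin > hi:
                return False
            continue
        t0 = (lo - origin) / delta
        t1 = (hi - origin) / delta
        if t0 > t1:
            t0, t1 = t1, t0
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return False
    return True


def potential_occluders(camera: Vec3, targets: Iterable[Vec3], patches: List[Box]) -> List[int]:
    """Indices of patches lying between the camera and at least one target."""
    target_list = list(targets)
    return [
        i for i, box in enumerate(patches)
        if any(segment_hits_box(camera, t, box) for t in target_list)
    ]


camera = (0.0, 0.0, 0.0)
cubes = [(0.0, 0.0, 4.0)]
patches = [
    ((-1.0, -1.0, 1.9), (1.0, 1.0, 2.1)),  # wall patch between camera and cube
    ((-1.0, -1.0, 5.0), (1.0, 1.0, 5.2)),  # wall patch behind the cube
    ((3.0, -1.0, 1.9), (5.0, 1.0, 2.1)),   # wall patch off to the side
]
print(potential_occluders(camera, cubes, patches))
```

Testing a single point per object is only an approximation. Real objects have extent, so in practice the boxes would need to be padded, or several points per object tested, to avoid dropping a patch that partially occludes something.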

mark_grossnickle · May 2017

Yeah, 100k verts unfortunately. Perhaps you could get more if you went native but I can't speak to that.

Jarrod1937 · May 2017

I'll be doing testing myself. I also plan on using spatial understanding for both a collision mesh and a good occlusion mesh. However, for speed, I've considered generating the spatial understanding mesh for collision only, hiding its rendering, and using the lower-triangle-count spatial mapping mesh for occlusion. That should be faster; the two meshes just may not line up exactly.
