The Mixed Reality Forums here are no longer being used or maintained.
There are a few other places we would like to direct you to for support, both from Microsoft and from the community.
The first way we want to connect with you is our mixed reality developer program, which you can sign up for at https://aka.ms/IWantMR.
For technical questions, please use Stack Overflow, and tag your questions using either hololens or windows-mixed-reality.
If you want to join in discussions, please do so in the HoloDevelopers Slack, which you can join by going to https://aka.ms/holodevelopers, or in our Microsoft Tech Communities forums at https://techcommunity.microsoft.com/t5/mixed-reality/ct-p/MicrosoftMixedReality.
And always feel free to hit us up on Twitter @MxdRealityDev.
Render 2000+ Gameobjects at the same time
Hi, all!
I am implementing a data visualization tool using a 3D scatter plot. I am using a sphere to represent each dot in the chart because I need to dynamically change the radius and position of each dot.
However, I have 2000+ data dots in the chart and right now the frame rate is super low (around 15 - 20 FPS) when I look at the chart.
I have optimized shaders (VertexLit), gotten rid of all the colliders, had all the spheres use the same material, and tried all the optimization approaches described in https://www.reddit.com/r/HoloLens/comments/5b959y/low_framerate_with_unity_builds/. But I still cannot boost performance to an acceptable level.
Are there any suggestions for optimization? Or is it that the HoloLens simply cannot render 2000+ spheres at the same time?
Best Answers
Patrick (mod)
The default sphere in Unity has 768 triangles... times 2000, that's a lot of rendering. You might be able to do something with instancing. I've not done this with Unity, TBH, but I have in native DX for rendering the Kinect depth stream as 200k+ cubes. Instancing made a huge difference.
Another approach would be to see if you can solve your problem with sprites or a particle system. By default, particles have 2 triangles each.
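A minimal sketch of the particle-system idea (the component setup and values here are illustrative, not from this thread): hand the system one particle per data point via SetParticles and skip GameObjects entirely.

```csharp
using UnityEngine;

// Illustrative: render N data points as billboarded particles, no per-dot GameObjects.
// Assumes the ParticleSystem's own emission module is disabled in the inspector,
// so only the particles we inject below exist.
[RequireComponent(typeof(ParticleSystem))]
public class DotCloud : MonoBehaviour
{
    void Start()
    {
        var ps = GetComponent<ParticleSystem>();
        var particles = new ParticleSystem.Particle[2000];
        for (int i = 0; i < particles.Length; i++)
        {
            particles[i].position = Random.insideUnitSphere;  // placeholder data
            particles[i].startSize = 0.01f;                   // dot radius in meters
            particles[i].startColor = Color.cyan;
            // Large lifetimes so the dots never expire on their own.
            particles[i].startLifetime = particles[i].remainingLifetime = 100000f;
        }
        ps.SetParticles(particles, particles.Length);
    }
}
```

To move or resize dots later, call SetParticles again with updated positions/sizes; that is one upload instead of 2000 transform updates.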
===
This post provided as-is with no warranties and confers no rights. Using information provided is done at own risk.
(Daddy, what does 'now formatting drive C:' mean?)
thebanjomatic
You absolutely would want to do some form of instancing. While just enabling instancing on the shader you are using would likely help, you probably also want to avoid the overhead of having 2000+ game objects. Instead, you could have a single game object that uses DrawMeshInstanced and an array of matrix transforms (scale and translation can be represented in those).
Depending on the size of the points you are plotting (most likely small) you might want to consider a lower-poly mesh as well.
Using a particle system might also help as Patrick suggests, but you might run into issues with fill-rate if you are relying on alpha blending or additive sampling.
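A sketch of the single-GameObject DrawMeshInstanced idea (class and field names are hypothetical, and the material must have GPU instancing enabled):

```csharp
using UnityEngine;

// Illustrative: one GameObject draws every data point with GPU instancing.
public class PointCloudRenderer : MonoBehaviour
{
    public Mesh pointMesh;          // a low-poly sphere or cube
    public Material pointMaterial;  // "Enable GPU Instancing" must be checked

    Matrix4x4[] matrices;           // one TRS matrix per data point, built once

    void Start()
    {
        matrices = new Matrix4x4[2000];
        for (int i = 0; i < matrices.Length; i++)
        {
            Vector3 pos = Random.insideUnitSphere;     // placeholder data
            float radius = Random.Range(0.005f, 0.02f);
            matrices[i] = Matrix4x4.TRS(pos, Quaternion.identity,
                                        Vector3.one * radius);
        }
    }

    void LateUpdate()
    {
        // DrawMeshInstanced accepts at most 1023 instances per call,
        // so split the 2000 points into batches.
        for (int i = 0; i < matrices.Length; i += 1023)
        {
            int count = Mathf.Min(1023, matrices.Length - i);
            var batch = new Matrix4x4[count];
            System.Array.Copy(matrices, i, batch, 0, count);
            Graphics.DrawMeshInstanced(pointMesh, 0, pointMaterial, batch, count);
        }
    }
}
```

The per-frame batch allocations could be cached too; they are kept inline here only to keep the sketch short.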
Answers
What is your vert/polygon count in the scene? Perhaps consider using a low poly sphere if it is high.
Are there lights in the scene? I would try to avoid those on the spheres.
What is your draw count and batch count? Are you changing the material on the fly in any way (changing its color or applying the material at runtime)?
I am getting the sense that the number of triangles and number of batches does not affect my performance significantly. It seems to be mostly a matter of the number of gameobjects in the scene.
I tried using low poly spheres and got test results like the following:
Using low poly spheres:
Number of spheres: 2000
Batch: 35
Tris: 94k
FPS: 20 - 30

Number of spheres: 400
Batch: 24
Tris: 20.2k
FPS: 45 - 55

Using primitive Unity spheres:
Number of spheres: 2000
Batch: 1965
Tris: 1.5M
FPS: 15 - 20

Number of spheres: 400
Batch: 410
Tris: 300.2k
FPS: 30
Banjo makes a very good point about not having all of these individual game objects. For instancing to be useful this would also be necessary. Your tests above are reinforcing that point. Are the spheres doing anything in their update loop?
I am not updating each sphere in the update loop. But when the user makes a certain selection, a co-routine is called and each dot shifts (lerps) to a different position and possibly changes its own scale.
We did something similar with point cloud data (20,000 points). As a test we tried to create a GameObject for each point - but it died real quick. But if we just drew each point through the Unity rendering APIs, it worked fine with no issues at all.
What do you mean by just drawing each point?
Do you mean using Graphics.DrawMeshInstanced like Banjo proposed?
Hmm. I'm concerned you may have a coding bottleneck as well. Are you seeing any consistent overhead or spikes while using the deep profiler?
With only ~20k tris and 400 objects I wouldn't think you would be stuck at 30 fps unless there is additional overhead. These stats make me think something else is at play:
Number of spheres: 400
Batch: 24
Tris: 20.2k
FPS: 30
Those numbers (the batch numbers in particular) make me think that only dynamic batching is being used and not instancing. Is instancing enabled on the material used by the spheres?
Thank you for pointing that out. It turns out that I had turned on some unnecessary components when testing that one. The frame rate did go up to 45 - 55 FPS with these stats. But it does not change the fact that it is still laggy when rendering 2000 gameobjects at the same time.
No, instancing is not enabled on the material for the spheres. I just tried setting enableInstancing to true for the material in Start(), but it does not seem to help performance in a significant way. (BTW, I am using the shader "HoloToolkit/VertexLit Configurable", if that's helpful.)
Focusing on this one then:
Number of spheres: 2000
Batch: 35
Tris: 94k
FPS: 20 - 30
2000 game objects is overhead, but that alone wouldn't make you go below 30 fps. The tris are on the high side but the batches are fine.
Any lights in the scene? Turn those off, or at least disable lighting on these spheres.
Is that coroutine you mentioned being created on each sphere? If so that could be the issue. A master class that contained a list of all of the spheres and ran through them would perform better.
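A sketch of that master-class idea (all names here are hypothetical): one MonoBehaviour holds every sphere's transform and lerps them all from a single Update, instead of running 2000 separate coroutines.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical manager that animates all data points from one Update loop.
public class DotAnimator : MonoBehaviour
{
    public List<Transform> dots;        // assigned when the chart is built
    List<Vector3> startPos;
    List<Vector3> targetPos;
    float t = 1f;                       // 1 = animation finished
    const float Duration = 0.5f;        // seconds per transition

    // Called once when the user makes a selection.
    public void MoveTo(List<Vector3> targets)
    {
        startPos = new List<Vector3>();
        foreach (var d in dots) startPos.Add(d.localPosition);
        targetPos = targets;
        t = 0f;
    }

    void Update()
    {
        if (t >= 1f) return;            // nothing to animate this frame
        t = Mathf.Min(1f, t + Time.deltaTime / Duration);
        for (int i = 0; i < dots.Count; i++)
            dots[i].localPosition = Vector3.Lerp(startPos[i], targetPos[i], t);
    }
}
```

Scaling could be handled the same way with a second list of start/target scales.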
I looked into this a little bit last week, but I haven't had the chance to deploy to the device and do actual performance testing. The main thing that I tried was having a single GameObject that calls DrawMeshInstanced 2 times (1000 instances each batch) as part of LateUpdate. My original code built up the matrices using something like this:
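The original snippet did not survive the archive; the following is a hedged reconstruction of what the per-frame matrix building could have looked like. Every name here is an assumption, not the poster's actual code.

```csharp
using UnityEngine;

// Reconstruction (not the original code): rebuild every instance matrix each frame.
public class InstancedPlot : MonoBehaviour
{
    public Mesh mesh;
    public Material material;               // instancing-enabled material
    public Vector3[] points = new Vector3[2000];
    public float[] radii = new float[2000];

    Matrix4x4[] batchA = new Matrix4x4[1000];
    Matrix4x4[] batchB = new Matrix4x4[1000];

    void LateUpdate()
    {
        // Rebuilding 2000 TRS matrices (and multiplying each by the parent
        // transform) every frame is the kind of work that dominated the
        // profiler; caching these and rebuilding only on change fixes it.
        Matrix4x4 parent = transform.localToWorldMatrix;
        for (int i = 0; i < 1000; i++)
        {
            batchA[i] = parent * Matrix4x4.TRS(points[i],
                Quaternion.identity, Vector3.one * radii[i]);
            batchB[i] = parent * Matrix4x4.TRS(points[i + 1000],
                Quaternion.identity, Vector3.one * radii[i + 1000]);
        }
        // Two calls of 1000 instances each, matching the post above.
        Graphics.DrawMeshInstanced(mesh, 0, material, batchA, 1000);
        Graphics.DrawMeshInstanced(mesh, 0, material, batchB, 1000);
    }
}
```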
When I was updating all 2000 matrices every frame, and even on my desktop, it was bringing things down to almost 60fps and about 98% of the frame time was spent doing this when I looked at it in the profiler.
I then changed things around to calculate these values only once and only perform the two rendering calls each frame and performance shot way back up to a several hundred FPS on my desktop.
There are a lot of inefficiencies in my naive implementation. For example, if I used transform.localToWorldMatrix instead of doing the TRS manually, it would save 32000 matrix multiplications. Additionally, if I calculated each individual data point's TS matrix ahead of time, that would save 1*2000 more. In theory just doing that would make the code 5 times faster. Additionally, some simple checks can be done to only update the transforms when values have changed, etc.

[Update]
@Wanze
Having made the above changes (reducing things to one matrix multiplication per data point), it is back in the acceptable range for performance. It's still slower than not re-calculating (300 fps vs 900 fps on my PC), but that is to be expected.
I was able to deploy to the HoloLens this morning and test, and performance-wise, it still sucked. With the default sphere mesh, I was getting about 15 fps still; however, using cubes it was hitting a steady 60 fps. I think at that point you are running into vertex processing bottlenecks, as the sphere mesh has 64x more triangles than the cube, but the pixel fill rate should actually be smaller for the spheres.
You might want to re-evaluate whether or not you need actual spheres, or if you could get by with something less triangle dense or even using textured quads (at which point a particle system might be a good choice).
Let me know if you'd like me to upload my test project or the script I used. I had to modify the shader to enable and support instancing also.
Thanks so much for your efforts! There are lots of good insights here. And you are right that I don't need actual spheres. I tried low poly objects with fewer triangles and reduced the number of gameobjects the HoloLens needs to render in the view. The performance is still not stable and occasionally drops to 30ish. I will continue developing my main functionality and leave further optimization as a topic for later.