Render 2000+ Gameobjects at the same time

Hi, all!

I am implementing a data visualization tool using a 3D scatter plot. I am using a sphere to represent each dot in the chart because I need to dynamically change the radius and position of each dot.

However, I have 2000+ data dots in the chart and right now the frame rates are super low (around 15 - 20 FPS) when I look at the chart.

I have optimized the shaders (vertexLit), removed all the colliders, made all the spheres use the same material, and tried all the optimization approaches described in https://www.reddit.com/r/HoloLens/comments/5b959y/low_framerate_with_unity_builds/. But I still cannot boost performance to an acceptable level.

Are there any suggestions on optimization? Or is it that HoloLens simply cannot render 2000+ spheres at the same time?

Answers

  • What is your vert/polygon count in the scene? Perhaps consider using a low poly sphere if it is high.

    Are there lights in the scene? I would try to avoid those on the spheres.

    What is your draw count and batch count? Are you changing the material on the fly in any way (changing its color or applying the material at runtime)?

    Taqtile

  • WanzeWanze
    edited August 2017

    I am getting the sense that the number of triangles and the number of batches do not affect my performance significantly. It seems to be only a matter of the number of GameObjects in the scene.

    I tried using low poly spheres and got the following test results:

    Using low poly spheres:

    Number of Spheres: 2000
    Batch: 35
    Tris: 94k
    FPS: 20 - 30 FPS

    Number of spheres: 400
    Batch: 24
    Tris: 20.2k
    FPS: 45 - 55 FPS

    Using primitive unity spheres:

    Number of Spheres: 2000
    Batch: 1965
    Tris: 1.5 M
    FPS: 15 - 20 FPS

    Number of Spheres: 400
    Batch: 410
    Tris: 300.2k
    FPS: 30 FPS

  • Banjo makes a very good point about not having all of these individual game objects. For instancing to be useful this would also be necessary. Your tests above are reinforcing that point. Are the spheres doing anything in their update loop?

    ===
    This post provided as-is with no warranties and confers no rights. Using information provided is done at own risk.

    (Daddy, what does 'now formatting drive C:' mean?)

  • I am not updating each sphere in the update loop. But when the user makes a certain selection, a coroutine is called and each dot shifts (lerps) to a different position and possibly changes its own scale.

  • Peter_NZPeter_NZ ✭✭✭

    We did something similar with point cloud data (20,000 points). As a test we tried to create a GameObject for each point, but it died real quick. When we just drew each point using the Unity rendering calls, it worked fine with no issues at all.

  • WanzeWanze
    edited August 2017

    @Peter_NZ said:
    We did something similar with point cloud data (20,000 points). As a test we tried to create a GameObject for each - but it died real quick. But if we just drew each point using the Unity rendering it worked fine with no issues at all.

    What do you mean by just drawing each point?

    Do you mean using Graphics.DrawMeshInstanced like Banjo proposed?

  • Hmm. I'm concerned you may have a coding bottleneck as well. Are you seeing any consistent overhead or spikes while using the deep profiler?

    With only 20k tris and 400 objects I wouldn't expect you to be stuck at 30 fps unless there is additional overhead. These stats make me think something else is at play.

    Number of spheres: 400
    Batch: 24
    Tris: 20.2k
    FPS: 30 FPS

    Taqtile

  • Those numbers (the batch numbers in particular) make me think that only dynamic batching is being used and not instancing. Is instancing enabled on the material used by the spheres?

  • @mark_grossnickle said:
    Hmm. I'm concerned you may have a coding bottleneck as well. Are you seeing any consistent overhead or spikes while using the deep profiler?

    With only 20k tris and 400 objects I wouldn't expect you to be stuck at 30 fps unless there is additional overhead. These stats make me think something else is at play.

    Number of spheres: 400
    Batch: 24
    Tris: 20.2k
    FPS: 30 FPS

    Thank you for pointing that out. It turns out I had some unnecessary components turned on when testing this case. With them off, the frame rate goes up to 45 - 55 FPS with these stats. But that does not change the fact that it is still laggy with 2000 GameObjects on screen at the same time.

  • @thebanjomatic said:
    Those numbers (the batch numbers in particular) make me think that only dynamic batching is being used and not instancing. Is instancing enabled on the material used by the spheres?

    No, instancing is not enabled on the material for the spheres. I just tried setting enableInstancing to true for the material in Start(), but it does not seem to help performance in a significant way. (btw I am using the shader "HoloToolkit/VertexLit Configurable" if that's helpful)
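
    For reference, a minimal sketch of turning instancing on from script, assuming a hypothetical `sharedDotMaterial` field that holds the one material shared by all spheres. `Material.enableInstancing` and `SystemInfo.supportsInstancing` are Unity APIs; the class and field names are illustrative only. The flag only has an effect when the shader itself is compiled with instancing support, which may be why it made little difference here.

    ```csharp
    using UnityEngine;

    // Hypothetical helper: enables GPU instancing on the material shared by all dots.
    public class EnableDotInstancing : MonoBehaviour
    {
        // Assumption: every sphere uses this single shared material.
        public Material sharedDotMaterial;

        void Awake()
        {
            if (!SystemInfo.supportsInstancing)
            {
                Debug.LogWarning("GPU instancing is not supported on this device.");
                return;
            }

            // Has no visible effect unless the shader declares instancing variants
            // (for example via #pragma multi_compile_instancing).
            sharedDotMaterial.enableInstancing = true;
        }
    }
    ```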

  • Focusing on this one then:
    Number of Spheres: 2000
    Batch: 35
    Tris: 94k
    FPS: 20 - 30 FPS

    2000 game objects is overhead but that alone wouldn't make you go below 30fps. The tris are on the high side but the batches are fine.

    Any lights in the scene? Turn those off, or at least disable lighting on these spheres.

    Is that coroutine you mentioned being created on each sphere? If so that could be the issue. A master class that contained a list of all of the spheres and ran through them would perform better.

    Taqtile
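
    The "master class" idea above could look roughly like the sketch below: one component holds every dot and runs a single coroutine, instead of each sphere starting its own. All names (`ScatterPlotAnimator`, `dots`, `MoveTo`) are hypothetical, and it assumes the target lists have one entry per dot.

    ```csharp
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

    // Hypothetical manager: one coroutine animates all dots.
    public class ScatterPlotAnimator : MonoBehaviour
    {
        // Assumption: filled elsewhere with the transform of every dot.
        public List<Transform> dots = new List<Transform>();

        public void MoveTo(IList<Vector3> targetPositions, IList<float> targetScales, float duration)
        {
            StopAllCoroutines(); // cancel any animation still in flight
            StartCoroutine(Animate(targetPositions, targetScales, duration));
        }

        private IEnumerator Animate(IList<Vector3> targetPositions, IList<float> targetScales, float duration)
        {
            int count = dots.Count;
            var startPositions = new Vector3[count];
            var startScales = new Vector3[count];
            for (int i = 0; i < count; i++)
            {
                startPositions[i] = dots[i].localPosition;
                startScales[i] = dots[i].localScale;
            }

            float elapsed = 0f;
            while (elapsed < duration)
            {
                elapsed += Time.deltaTime;
                float t = Mathf.Clamp01(elapsed / duration);
                for (int i = 0; i < count; i++)
                {
                    dots[i].localPosition = Vector3.Lerp(startPositions[i], targetPositions[i], t);
                    dots[i].localScale = Vector3.Lerp(startScales[i], Vector3.one * targetScales[i], t);
                }
                yield return null; // resume on the next frame
            }

            // Snap to the exact targets; this also covers a zero or negative duration.
            for (int i = 0; i < count; i++)
            {
                dots[i].localPosition = targetPositions[i];
                dots[i].localScale = Vector3.one * targetScales[i];
            }
        }
    }
    ```

    This still pays the per-GameObject cost discussed elsewhere in the thread; it only removes the overhead of 2000 separate coroutines.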

  • thebanjomaticthebanjomatic ✭✭✭
    edited August 2017

    I looked into this a little bit last week, but I haven't had the chance to deploy to the device and do actual performance testing. The main thing that I tried was having a single GameObject that calls DrawMeshInstanced 2 times (1000 instances each batch) as part of LateUpdate. My original code built up the matrices using something like this:

    for (int i = 0; i < dataPoints.Count; i++)
    {
       matricesToDraw[i] = ... // A few matrix multiplications to combine the parent game object's transform with the individual dataPoint's data.
    }
    

    When I updated all 2000 matrices every frame, even on my desktop it brought things down to almost 60fps, and the profiler showed about 98% of the frame time being spent on this.

    I then changed things around to calculate these values only once and perform just the two rendering calls each frame, and performance shot way back up to several hundred FPS on my desktop.

    There are a lot of inefficiencies in my naive implementation. For example, if I used transform.localToWorldMatrix instead of doing TRS manually, it would save 3*2000 matrix multiplications. Additionally, if I calculated the individual datapoint's TS matrix ahead of time, that would save 1*2000 more. In theory, just doing that would make the code 5 times faster, since it goes from five multiplications per data point to one. Some simple checks can also be added to only update when the transforms or values have changed, etc.

    [Update]
    @Wanze
    Having made the above changes (reducing things to one matrix multiplication per data point), it is back in the acceptable range for performance. It's still slower than not recalculating (300fps vs 900fps on my PC), but that is to be expected.

    I was able to deploy to the HoloLens this morning and test, and performance-wise, it still sucked. With the default sphere mesh, I was still getting about 15fps; using cubes, however, it was hitting a steady 60fps. I think at that point you are running into vertex processing bottlenecks, as the sphere mesh has 64x more triangles than the cube, while the pixel fill rate should actually be smaller for the spheres.

    You might want to re-evaluate whether or not you need actual spheres, or if you could get by with something less triangle dense or even using textured quads (at which point a particle system might be a good choice).

    Let me know if you'd like me to upload my test project or the script I used. I had to modify the shader to enable and support instancing also.
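
    The script itself was not posted in the thread, so the following is only a sketch of the approach as described: a single GameObject that caches one local matrix per data point, recombines it with the parent transform only when something changed, and issues one Graphics.DrawMeshInstanced call per batch in LateUpdate. The class, field, and method names are hypothetical. It assumes the material has instancing enabled and uses a shader that supports it; otherwise the draw call fails.

    ```csharp
    using UnityEngine;

    // Hypothetical sketch: draws all dots without one GameObject per dot.
    public class InstancedScatterPlot : MonoBehaviour
    {
        public Mesh dotMesh;         // e.g. a low poly sphere or a cube
        public Material dotMaterial; // must have instancing enabled

        // Unity limits one DrawMeshInstanced call to 1023 instances.
        private const int BatchSize = 1000;

        private Matrix4x4[] localMatrices; // per-point translate * scale, precomputed
        private Matrix4x4[][] batches;     // world matrices, grouped per draw call
        private Matrix4x4 lastParentMatrix;
        private bool dirty;

        // Call whenever the data changes (assumes scales has one entry per position).
        public void SetData(Vector3[] positions, float[] scales)
        {
            int n = positions.Length;
            localMatrices = new Matrix4x4[n];
            for (int i = 0; i < n; i++)
            {
                localMatrices[i] = Matrix4x4.TRS(positions[i], Quaternion.identity, Vector3.one * scales[i]);
            }

            int batchCount = (n + BatchSize - 1) / BatchSize;
            batches = new Matrix4x4[batchCount][];
            for (int b = 0; b < batchCount; b++)
            {
                batches[b] = new Matrix4x4[Mathf.Min(BatchSize, n - b * BatchSize)];
            }
            dirty = true;
        }

        void LateUpdate()
        {
            if (batches == null) return;

            Matrix4x4 parent = transform.localToWorldMatrix;
            if (dirty || parent != lastParentMatrix)
            {
                // One matrix multiplication per data point, only when needed.
                for (int i = 0; i < localMatrices.Length; i++)
                {
                    batches[i / BatchSize][i % BatchSize] = parent * localMatrices[i];
                }
                lastParentMatrix = parent;
                dirty = false;
            }

            foreach (Matrix4x4[] batch in batches)
            {
                Graphics.DrawMeshInstanced(dotMesh, 0, dotMaterial, batch, batch.Length);
            }
        }
    }
    ```

    With 2000 points this results in two draw calls of 1000 instances each, matching the setup described above. Swapping `dotMesh` for a cube or another low-triangle mesh is the change that made the difference on the device.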

  • @thebanjomatic said:
    I looked into this a little bit last week, but I haven't had the chance to deploy to the device and do actual performance testing. [...]

    Thanks so much for your efforts! There are lots of good insights here. You are right that I don't need actual spheres. I tried low poly objects with fewer triangles and reduced the number of GameObjects in the view that the HoloLens needs to render. The performance is still not stable and occasionally drops to around 30 FPS. I will continue developing my main functionality and leave further optimization for later.
