The Mixed Reality Forums here are no longer being used or maintained.
There are a few other places we would like to direct you to for support, both from Microsoft and from the community.
The first way we want to connect with you is our mixed reality developer program, which you can sign up for at https://aka.ms/IWantMR.
For technical questions, please use Stack Overflow, and tag your questions using either hololens or windows-mixed-reality.
If you want to join in discussions, please do so in the HoloDevelopers Slack, which you can join by going to https://aka.ms/holodevelopers, or in our Microsoft Tech Communities forums at https://techcommunity.microsoft.com/t5/mixed-reality/ct-p/MicrosoftMixedReality.
And always feel free to hit us up on Twitter @MxdRealityDev.
A bunch of questions regarding using the HoloLens and object recognition
Greetings,
I'm working for my professor on a potential paper on using the HoloLens in an edge-computing context. However, we have a few questions we haven't been able to find an answer to so far.
Our work is based on a HoloLens application developed by Carnegie Mellon. It uses object recognition to detect real-world objects and render holograms on top of them. We want to improve upon this process and share the virtual world between more than one device.
Which brings me to our questions:
- How precise is spatial mapping? The article on spatial mapping mentions different levels of detail available for the mesh generated by the mapping process. So far, I've only roughly checked the precision by examining the triangle mesh the HoloLens displays when performing an air tap on nothing. Is there any official source, or any experience, on how big (or rather, how small) an object has to be for the HoloLens to create a halfway decent mapping of it?
- Sharing World Anchors. If I read the articles on coordinate systems correctly, using a WorldAnchor is the only good way of establishing a shared virtual environment between two HoloLenses (without building a different system from scratch). Is that correct? And if so, do I also read the article correctly in assuming that a WorldAnchor works by uploading part of the spatial mapping of the first HoloLens (the one establishing the anchor), with every other HoloLens determining its position in space by matching that part against its own mapping?
- Establishing a coordinate system. Still referring to the article on coordinate systems: it states that "Room-scale VR apps today typically establish this kind of absolute room-scale coordinate system with its origin on the floor." Is this true for the HoloLens, too? Is there a source that details how the HoloLens constructs this initial coordinate system, and whether it ever changes?
- Prior work. Does anyone happen to know whether somebody else has already performed hologram placement via object recognition? The solution for finding the object's position implemented by CMU is not optimal, and we were thinking about trying different strategies, such as casting rays at the spatial mapping (hence the question about its precision).
Thank you for any potential help on those points!
Best Answers
dbarrett ✭✭✭
Hello @SNKaupe! I will try to answer your questions as best as I can based on my experience.
If you haven't figured it out yet, Spatial Mapping is not very accurate and cannot be reliably used for applications like object recognition. I can't say exactly how small an object can be before the HoloLens stops picking it up, but Spatial Mapping has a really hard time with small objects. As a ballpark, I'd say an object needs to be at least an inch across, with a high-contrast background, for it to be picked up. You will also have trouble picking up black objects in general, objects in dark spaces, and anything in bright sunlight, since the HoloLens uses IR cameras to help with the Spatial Mapping.
As far as sharing goes, World Anchors are your best bet. However, if you have not developed a multiplayer application before, make sure you get the concepts of that down first. Basically, how World Anchors work is that the first HoloLens designates a point as the anchor and then uploads it to every other HoloLens that joins the session. However, it does not change the other HoloLenses' coordinate systems; it only tells them where the anchor is relative to their space. Therefore, when moving an object over the server, you have to make the anchor the point that everything is relative to. Many people miss that last point and struggle, but lucky for you, I have answered many questions regarding this issue on here.
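To illustrate that "everything relative to the anchor" idea, here is a minimal plain-Python sketch (not actual HoloLens/Unity API code, and it ignores rotation, which real anchors also handle): each device stores object positions as offsets from the shared anchor, so the same physical spot can be reconstructed in any device's own coordinate frame.

```python
# Illustrative sketch of anchor-relative positioning (plain Python,
# not HoloLens/Unity API). Positions are (x, y, z) tuples in metres.

def to_anchor_space(world_pos, anchor_pos):
    """Express a position as an offset from the shared anchor."""
    return tuple(p - a for p, a in zip(world_pos, anchor_pos))

def from_anchor_space(offset, anchor_pos):
    """Reconstruct a position in a device's own coordinate frame,
    given where that device sees the anchor."""
    return tuple(o + a for o, a in zip(offset, anchor_pos))

# Device A places a hologram and shares its anchor-relative offset.
anchor_in_a = (1.0, 0.0, 2.0)      # where device A sees the anchor
hologram_in_a = (1.5, 0.2, 2.5)
offset = to_anchor_space(hologram_in_a, anchor_in_a)

# Device B's origin is wherever *its* app started, so it sees the
# anchor at different coordinates, yet recovers the same physical spot.
anchor_in_b = (-0.5, 0.0, 1.0)
hologram_in_b = from_anchor_space(offset, anchor_in_b)
# hologram_in_b == (0.0, 0.2, 1.5)
```

The point of the sketch: neither device ever has to agree on a global origin; only the anchor is shared.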
Okay, so the HoloLens doesn't work like VR, where the coordinate system is put on the floor. The HoloLens puts the 0,0,0 point wherever the application starts, and the camera (i.e., the HoloLens itself) always starts at that 0,0,0 point.
I am currently working on an application dealing with object and text recognition. The solution I came up with was to integrate OpenCV with the HoloLens and use the methods that have been developed there. You will have to do some serious optimization on the HoloLens, because it takes up a lot of memory. We basically took the video frames and made the FOV a little smaller so that it wasn't so hard on the HoloLens. Once you get the optimization sorted, it is a matter of knowing more about OpenCV and the different things it can do. We developed an app that can pick up phrases, and if it detects one we are looking for, it can give the user more information about it. We can also use it to find where the text is located in space and place an anchor there. We also used it for detecting the biggest shape, for facial recognition, and even for handwriting. As for detecting objects: if you are shooting for full-blown object recognition, you would need to create a neural network and teach your program what you're looking for.
There is also Vuforia, but my client did not want to pay the outrageous amount of money it costs to use. However, since you are working on a school application, it sounds like you might be able to use the free version.
I hope this answered all your questions!
AR Developer
dbarrett ✭✭✭
- Alright, as far as sharing goes, there are two types of sharing: the Sharing Service, which involves having a dedicated server, and Sharing with UNET (Unity Networking). I myself haven't really messed with the Sharing Service side, because most of the time I am dealing with situations where there may be no internet, or no way to carry around a server. Sharing with UNET is basically where one HoloLens becomes the host/server and all the other HoloLenses become clients of that device. The way sharing with UNET works is that when a HoloLens first starts the application, it searches for potential IP addresses that may be hosts; if it doesn't find any, it becomes the host itself. When it is the host and someone connects, it sends the anchor data to the client; the client then takes the anchor data and places the anchor in relation to its own Unity coordinate system.
If you aren't aware yet, there is a spatial map at the OS level that keeps track of basically where everything is, and it is constantly running. Inside a Unity application you have an entirely different grid system, if that makes sense.
So when you send the client the anchor data, you are sending the OS-level location in space and applying it to your Unity application's grid system. Then everything is a matter of making sure your objects derive their location from that shared point.
I hope that might have explained a bit more. There are examples of both types of applications on the Mixed Reality Toolkit.
https://github.com/Microsoft/MixedRealityToolkit-Unity/tree/master/Assets/HoloToolkit
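The "search for a host, else become the host" flow described above can be sketched like this (illustrative plain Python, not UNET API; real UNET discovery uses network broadcasts, and `found_hosts` here is a hypothetical placeholder for whatever the scan returned):

```python
def join_or_host(found_hosts):
    """Decide this device's role: connect to the first discovered
    host, or become the host if none were found."""
    if found_hosts:
        # Client: connect, then wait to receive the anchor data.
        return ("client", found_hosts[0])
    # Host: will send anchor data to each joining client.
    return ("host", None)

# The first device on the network finds nobody and becomes the host...
role_a = join_or_host([])            # ("host", None)
# ...later devices discover it and join as clients.
role_b = join_or_host(["192.168.0.12"])  # ("client", "192.168.0.12")
```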
- Basically, with the method we used, we gave it a known text width, and from there it used triangle-based calculations to work out how far away the text was in space, so we could accurately place it in z (the method could already get x and y). We are really just trying to push the HoloLens to its limits, and we couldn't go with edge computing in our application because there was no guarantee the client would have internet connectivity at all times.
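Assuming a simple pinhole camera model (the original post doesn't spell out the exact formula), the "known width + triangle calculation" amounts to similar triangles: distance ≈ focal_length_px × real_width / pixel_width. A sketch with purely illustrative numbers, not actual HoloLens camera intrinsics:

```python
def distance_from_known_width(real_width_m, pixel_width, focal_length_px):
    """Pinhole-camera similar-triangles estimate of distance along z:
    an object of known physical width spanning `pixel_width` pixels
    sits roughly focal_length_px * real_width_m / pixel_width
    metres from the camera."""
    return focal_length_px * real_width_m / pixel_width

# Illustrative numbers: 20 cm wide text spanning 100 px,
# with an assumed focal length of 1000 px.
d = distance_from_known_width(0.20, 100, 1000)
# d == 2.0 (metres); the same text at 200 px would be 1.0 m away
```

The focal length in pixels would come from the camera's calibration/intrinsics in a real application.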
AR Developer
Answers
Hi @dbarrett
Thanks for your comment, you've answered some of my previous questions. Two things still remain open, though:
Again, thank you for your earlier help, and I hope to hear from you once more (or from anyone else who might have something to contribute).