It's using visual descriptors to generate a point cloud. Buildings and text are really great for creating descriptors, so when they change you lose key points for "localizing" (i.e. getting your position). The map needs to be updated as those buildings change.
You also need a day/night dataset (although some newer descriptors are day/night resistant)
I had assumed this was referring to street level flying drones.
Like you see in drone races. But with a little bomb attached.
Even with a half-life, that could be years, depending on changes. Old neighborhoods probably haven't changed. And I'm not sure whether I read that the deal had a cutoff or not. Could they continue getting updates?
There was a startup that pitched the idea of using satellite data to do ground-based navigation (https://sturfee.com/vps). They didn't get bought out by Google, Niantic or Facebook, so it can't have worked that well.
Niantic's stuff is a pre-built map that the client will reference to get a position. It's essentially a massive feature matching exercise. The problem with using airborne photos is that you miss a bunch of features you can't see from above. (Same thing when trying to match ground features from the air.)
The lens calibration issue isn't actually that much of a problem _for the client_. If you have a rough idea of the lens (EXIF data really helps there) then you can still get meter-accurate position (and heading to within a few degrees). It's a bit more of a problem for generating the initial map, but structure from motion with good motion priors goes a long way to make it less of a problem.
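To put a number on the "rough idea of the lens" point, here is a minimal sketch of turning EXIF-style metadata into approximate pinhole intrinsics. The function names and the example sensor width are my own assumptions for illustration (sensor width usually has to come from a spec sheet rather than EXIF), and the principal point is simply assumed to sit at the image centre.

```python
import math

def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a physical focal length to pixels using the sensor width."""
    return focal_mm * image_width_px / sensor_width_mm

def rough_intrinsics(focal_mm, sensor_width_mm, width_px, height_px):
    """Approximate 3x3 pinhole matrix K, assuming square pixels, no skew,
    no distortion, and a principal point at the image centre."""
    f = focal_length_px(focal_mm, sensor_width_mm, width_px)
    return [
        [f, 0.0, width_px / 2.0],
        [0.0, f, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]

def horizontal_fov_deg(f_px, width_px):
    """Horizontal field of view implied by a focal length in pixels."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * f_px)))

# Hypothetical phone camera: 4.25 mm lens, 5.6 mm wide sensor, 4032x3024 image.
K = rough_intrinsics(4.25, 5.6, 4032, 3024)
fov = horizontal_fov_deg(K[0][0], 4032)
```

A guess like this is usually a starting point that the pose solver can refine, which is why it matters less on the client than when building the map.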
Now, Niantic are proposing that you can train a model that can relocalize generally without a detailed map. I think that's a bit far-fetched, especially at any large scale (i.e. bigger than a cubic kilometer).
No, because they are different things for different purposes.
Visual navigation is prone to degradation. Keeping the "map" updated requires constant visits. (I know because my team worked on the patent for a method for updating said maps.)
Also, a pacification bot would be run by the military, who most likely have GPS.
Finally, for ground-based bots, SLAM is actually more useful than pre-built map-based navigation.
1) VPS is not new; the startup I worked at had a working public system in 2018.
2) The hard part about a VPS is not actually the navigation, it's generating and querying the map.
How does the VPS work?
You build a point cloud of features (for us, we paid people to go and record videos in cities; Tesla/Waymo/Toyota/Google drove cars; Niantic got its players to take videos/pictures).
Align that point cloud to the 3D world and store it in a way that can be queried quickly (doing that quickly and at scale is still an area of research).
Then your client needs to extract the keypoints from an image and perform triangulation against the map to see where the camera was when the image was taken (there are calibration issues, but we ain't got time for that).
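As a toy illustration of that last step, here is a sketch of localizing against a map in 2D. It assumes the matching is already done (each observation is paired with a known landmark) and that the camera heading is known, so each match gives an absolute bearing. A real VPS solves a full 6-DoF pose from 2D-3D matches (typically PnP with RANSAC); this is a simplification of mine, and the function name is hypothetical.

```python
import math

def locate_from_bearings(landmarks, bearings_deg):
    """Least-squares 2D camera position from bearings to known landmarks.

    landmarks:    list of (x, y) map positions.
    bearings_deg: direction from camera to each landmark, measured
                  counterclockwise from the +x axis.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), bearing in zip(landmarks, bearings_deg):
        t = math.radians(bearing)
        # Normal to the sight line; the camera c satisfies n . (c - p) = 0.
        nx, ny = -math.sin(t), math.cos(t)
        rhs = nx * px + ny * py
        # Accumulate the 2x2 normal equations (sum n n^T) c = sum n (n . p).
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * rhs
        b2 += ny * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("sight lines are parallel; position is not constrained")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Made-up map and a camera at (3, 4); bearings are simulated from the truth.
true_cam = (3.0, 4.0)
landmarks = [(10.0, 4.0), (3.0, 12.0), (-5.0, -2.0)]
bearings = [
    math.degrees(math.atan2(ly - true_cam[1], lx - true_cam[0]))
    for lx, ly in landmarks
]
estimate = locate_from_bearings(landmarks, bearings)
```

With noise-free bearings the estimate lands back on the true camera position; with noisy ones the least-squares solve averages the disagreement between sight lines.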
Now.
Niantic, from what I can see (and it's been a while), has a database of key landmarks, but not of the areas in between. For decent navigation I would say that this is a massive problem.
I know Niantic are pushing the whole "spatial world model", but frankly I don't think that scales. The stuff they have released is memory-bound in vGPUs, which isn't that useful for real-time querying.
I strongly suspect that they actually have a different system, much more traditional, along the lines of COLMAP, or hloc, or something with a feedforward model in it.
However, for the drone use case, what you actually want is SLAM, which is a very different problem. For SLAM you need to build the map whilst you are moving, and then do loop closure or some other method to stop drift. Once you've gone there and back you can use that map for relocalisation.
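To show what drift and loop closure mean in practice, here is a toy sketch. It dead-reckons around a square with a small heading bias, then "closes the loop" by assuming the last pose is the same place as the first and spreading the end-point error linearly along the path. That linear spread is a crude stand-in of mine for the pose-graph optimisation a real SLAM system would run, and all names and numbers are made up.

```python
import math

def dead_reckon(steps, heading_bias_deg=0.0):
    """Integrate (distance_m, turn_deg) odometry steps from the origin.

    heading_bias_deg is added at every step to simulate a drifting heading.
    """
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for dist, turn_deg in steps:
        heading += math.radians(turn_deg + heading_bias_deg)
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
        path.append((x, y))
    return path

def close_loop(path):
    """Spread the start/end mismatch linearly over the path.

    Assumes a loop closure was detected between the first and last pose.
    """
    n = len(path) - 1
    ex = path[-1][0] - path[0][0]
    ey = path[-1][1] - path[0][1]
    return [(x - ex * i / n, y - ey * i / n) for i, (x, y) in enumerate(path)]

# Square loop: four 10 m sides in 1 m steps, turning 90 degrees at each corner.
steps = []
for side in range(4):
    for i in range(10):
        turn = 90.0 if (i == 0 and side > 0) else 0.0
        steps.append((1.0, turn))

true_path = dead_reckon(steps)                          # ends back at the start
drifted = dead_reckon(steps, heading_bias_deg=0.5)      # 0.5 degrees of bias per step
corrected = close_loop(drifted)

drift_before = math.dist(drifted[-1], drifted[0])
drift_after = math.dist(corrected[-1], corrected[0])
```

Without the closure the drone thinks it finished metres away from where it started; with it, the ends meet again and the intermediate poses are pulled back towards something usable as a map.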
I'm not sure they need the in between areas, so long as the landmarks are inclusive of similar features. In fact, they probably only need high-quality 3D scans of primitive features to perform classification (walls, buildings, intersections, etc). I haven't played recently, so I'm unsure how distinct each landmark is.
> I'm not sure they need the in between areas, so long as the landmarks are inclusive of similar features.
That gets you good navigation around landmarks, but when you go further away you get fewer usable feature points, and because they appear closer together in the image you get more position error (or need higher-resolution cameras).
The intermediate places give you the precise, consistent navigation.
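A rough back-of-the-envelope version of that, borrowing the standard stereo depth-error relation: if the matched features span a width b and sit at distance Z, a pixel-level matching error of δ with focal length f (in pixels) gives a range error of about Z²·δ / (f·b). The function name and numbers below are made up for illustration, and this ignores everything except that one geometric effect.

```python
def range_error_m(distance_m, feature_spread_m, focal_px, pixel_noise=1.0):
    """Approximate range uncertainty when localizing off distant features.

    distance_m:       distance from camera to the landmark's features.
    feature_spread_m: physical width spanned by the matched features.
    focal_px:         focal length in pixels (grows with image resolution).
    pixel_noise:      keypoint matching error in pixels.
    """
    return distance_m ** 2 * pixel_noise / (focal_px * feature_spread_m)

near = range_error_m(100.0, 20.0, 1000.0)       # 100 m from a 20 m wide facade
far = range_error_m(200.0, 20.0, 1000.0)        # twice as far away
far_hi_res = range_error_m(200.0, 20.0, 2000.0) # same spot, double the resolution
```

Doubling the distance quadruples the error, and doubling the resolution only halves it, which is why the areas in between landmarks matter.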
And changes happen at pretty much all levels of scale. Even once you get well past startup size, the types of structure and processes required for a 10,000 or 20,000 person company are much different from those of a 1,000 or 2,000 person company.
Only a few people can adequately explain themselves through slack.
It doesn't help that a lot of managers are _bad_ managers, and don't/can't/don't know how to run a tight 1:1.
The point of the 1:1 is to provide a high-bandwidth way of getting worries and steers from employees to management, and direction back to employees. If there is nothing to talk about, then cut the meeting short.