Customers in the autonomous vehicle, defense, delivery, and logistics robotics spaces all work with point cloud data and classifier models on a regular basis. Learn from Elliot Branson, Scale’s Director of Engineering, and John Hurliman, Maker at Foxglove, how best to work with point clouds, and best practices for designing good UX around tooling for point clouds. 3D shouldn’t be intimidating!
Summary: Foxglove Robotics’ John Hurliman shares tips about working with robot-generated 3-D data, the improvements he’s seen recently, and advice for how to set up a streaming infrastructure for that data.

What you’ll learn from reading this article:
- Tips for setting up a streaming architecture for 3-D data
- Common problems when scaling robotics use cases
- Where it’s most productive to spend your time when working with robotics
By Johanna Ambrosio
Generating, streaming, and manipulating robot-generated 3-D data have all made strides in recent years, and those tasks continue to become easier. But challenges remain, including handling data from multiple robots and some of the “classic traditional problems” such as dealing with humans moving around.
So says John Hurliman (https://www.linkedin.com/in/jhurliman/), a Maker at Foxglove Robotics, who is focused on improving the developer experience for robotics companies. He’s led software engineering teams in both corporate and open-source roles for over two decades.
John recently sat down with Elliot Branson (https://www.linkedin.com/in/elliot-branson/), Director of Machine Learning and Engineering at Scale AI, to discuss trends in robot-generated 3-D data, tips for most effectively working with this data, and what some of Foxglove’s most innovative customers are doing.
Working with 3D Data Requires Understanding Its Full Context
Regardless of how complex the data streams are, you really need to see the data and “understand it in its original context,” John says. Otherwise it’s going to be very difficult to troubleshoot even basic things, no less perform more challenging tasks. If you’re doing sensor fusion, “how do you get your images and your point clouds, just the very first step, displaying at the same time and getting the time sequencing right?”
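Getting the time sequencing right usually starts with pairing messages from two streams by nearest timestamp. Here is a minimal Python sketch of that idea; the function name, tolerance, and timestamps are illustrative, not from Foxglove’s tooling:

```python
import bisect

def nearest_pairs(image_stamps, cloud_stamps, tolerance_s=0.05):
    """Pair each image timestamp with the nearest point-cloud timestamp.

    Both inputs are sorted lists of seconds. Pairs further apart than
    `tolerance_s` are dropped rather than mismatched.
    """
    pairs = []
    for t in image_stamps:
        i = bisect.bisect_left(cloud_stamps, t)
        # Candidates: the cloud stamp just before and just after t.
        candidates = cloud_stamps[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c - t))
        if abs(best - t) <= tolerance_s:
            pairs.append((t, best))
    return pairs

images = [0.00, 0.10, 0.20, 0.30]
clouds = [0.02, 0.11, 0.27]
print(nearest_pairs(images, clouds))  # [(0.0, 0.02), (0.1, 0.11), (0.3, 0.27)]
```

Note that the image at 0.20 is dropped: its nearest cloud is 0.07 seconds away, outside the tolerance, which is usually better than silently fusing mistimed data.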
His suggestion: Get all of this unified into a single 3-D scene, so you can understand if your alignment is off or your transforms are wrong and you need to go back to calibration. Or perhaps you’re seeing far fewer points than you expected, and you need to discern whether this is a data problem or a sensor problem. You need to “just get it into its natural state, which for 3-D data is going to be a 3-D rendering most of the time,” John explains.
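Checking whether your transforms are wrong often comes down to applying the supposed extrinsic transform to the points and eyeballing the result in the shared scene. A small NumPy sketch, using a purely hypothetical lidar-to-camera extrinsic (a 90-degree yaw plus a small mounting offset):

```python
import numpy as np

def transform_points(points, R, t):
    """Apply a rigid-body transform (rotation R, translation t) to an
    (N, 3) array of points, e.g. lidar frame -> camera frame."""
    return points @ R.T + t

# Hypothetical extrinsics for illustration only.
yaw = np.pi / 2
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
t = np.array([0.1, 0.0, 0.2])

pts = np.array([[1.0, 0.0, 0.0]])
print(transform_points(pts, R, t))  # approximately [[0.1, 1.0, 0.2]]
```

If the transformed cloud doesn’t land on top of the camera imagery in the unified scene, that’s the cue to revisit calibration rather than the perception code.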
The process is similar to analyzing text-based data, John says, which he did while in the financial services industry. “You need to see your data. You need to get in and understand that we can write unit tests all day. We can check the tokenizer does this thing in this edge case just right.” But, regardless, you need to open up the text in its “natural context,” and “read it, process it, transform it, and read it again.”
It’s much the same for images. “We’re troubleshooting cameras, and you go, ‘Oh, the camera’s all blown out.’ Okay, we create a statistic to measure that. We catch it. And you’re working, working, working, and eventually you’re like, ‘Wait, why isn’t it detecting anything, right? Oh, the camera’s flipped upside down.’”
Dealing With Today’s Tooling Gap
Although the requirement to understand the full context may be similar, the tools are not. Unlike other forms of data, such as spreadsheets or word-processing documents, most computers aren’t equipped with a standard set of tools to view and process 3-D images.
“You get a dump or recording of files from your LiDAR and the point clouds that you’ve collected from your flash LiDAR or from various systems, and you go, ‘Okay, well, this one produced this file format. This one is just a raw packet capture. What do I do? Do I somehow figure out how to bring it into Unity or a game engine? Do I write my own viewer? What’s the next step?’” John explains.
Solving that tooling gap “is going to help push robotics forward a lot” by giving people even basic ways of visualizing 3-D images in context. Another must-have for tools, John says, is an operator dashboard that shows what the robot is doing in real time, or allows for offline debugging in another platform. A diagnostic platform must provide both simple gauges and the ability to plot data.
“We deal with customers from underwater AVs to mining equipment to drones to everything else, and trying to build one perfect user interface for everything is a challenge,” John explains. “And so we give people a handful of the starter tools that are really common” and provide an API so they can build any custom tools they might need.
Reducing the Iteration Loop
In the meantime, one way of making 3-D information easier and faster to work with, John says, is reducing the iteration loop. “I want to be able to scrub back and forth. I want it to play at the frame rate. I want live streaming. It's just getting that data in quickly and being able to use it at my fingertips without waiting on the computer, without waiting on things to complete.”
The first step is getting to a holistic, end-to-end proof of concept or prototype of whatever you’re trying to do, as early as possible. Don’t spend too much time trying to get your algorithm, your CI pipeline, or your experimental techniques to work perfectly, he advises.
At the end of the day, you might need to use a different algorithm or a simpler approach, and any premature polishing will have been wasted time. Instead, he says, you need to find out what your actual problems are and where your actual slowdowns are. That’s going to give you more actionable information.
On a complex project like building robotics, “figuring out where to spend your time is half the battle,” as is knowing where to start and knowing what went wrong in the field. “Looking at the whole problem holistically and often visually is going to be the most helpful thing, I've found.”
Scale AI’s Elliot agrees. “Get back to basics,” he says. “If your tool's able to make it very easy for you to just open up something and see it and play with it and interact with it, that's the most crucial thing.”
Static Scene Visualization vs. Streaming Requirements
There are several ways to view high-definition data. One might be static scene visualization, such as a high-definition map of a city that you want to fly through in simulation; another might be snapshot data; and a third might be real-time streaming, where you’re seeing data exactly as the robot is. The point cloud stream might be coming in at 30 frames per second, or even 60. This might allow you to really understand how the sensor works by getting live visual feedback.
There is “a lot of overlap” in the types of tools, techniques, and technology required to do all of these, John explains. “It comes back to the same principles,” and many of the same efficiencies are required for tasks like rendering. “That’s always going to be a fundamental.”
The divergence is in how much pre-processing time is required for static visualization versus real-time streaming. Different approaches might be needed to build an elaborate bounding volume hierarchy, for instance, or different offline rendering methods might be called for. Another factor is how much time is required to do something with or to the data and get it to the next stop on its journey, whether that means another application or onto the screen for a user to view.
Tips For Setting Up Your Streaming Infrastructure
Your configuration will, of course, depend on the scale of what you need to build, and you don’t need to do everything at once, John advises. But your first step should be to get all your data into a container or a data warehouse, he suggests, where you can index it. This can involve tagging or a folder organization convention, but you need some way of making sense of your data and making it easy to find what you need.
The next step is to start splitting up the data, via selective streaming. If you don't need the whole 50 gigabyte-per-minute stream to visualize just a couple of things, set it up to stream only what you need to see on your monitor.
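Selective streaming can be as simple as filtering the message stream by topic before it leaves the server, so the viewer never receives data it won’t render. A toy Python sketch; the topic names and tuple layout are hypothetical:

```python
def select_topics(messages, wanted):
    """Yield only the messages on topics the viewer actually needs,
    instead of shipping the entire multi-gigabyte log to the client.

    `messages` is any iterable of (topic, timestamp, payload) tuples.
    """
    wanted = set(wanted)
    for topic, stamp, payload in messages:
        if topic in wanted:
            yield topic, stamp, payload

# A tiny fake log standing in for a 50 GB/min recording.
log = [
    ("/lidar/points", 0.0, b"..."),
    ("/camera/image", 0.0, b"..."),
    ("/diagnostics",  0.1, b"..."),
    ("/lidar/points", 0.1, b"..."),
]
subset = list(select_topics(log, ["/lidar/points"]))
print(len(subset))  # 2
```

Because `select_topics` is a generator, the filtering happens lazily, which matters when the full stream would never fit in memory.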
After that, you need to accommodate what you need for specific use cases. “There’s a big difference between, ‘I need to know the exact color of this point because I’m troubleshooting the sensor data,’ versus, ‘I just want to get a gist of what this thing was doing because I’m quickly scrubbing through a lot of data,’” he says.
You can stream point clouds by using a variety of compression or decimation techniques. Video images have their own compression methods. And you can take advantage of those once you’ve split apart all the streams and semantically understand what's in them.
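One common decimation technique is a voxel grid filter: snap each point to a coarse 3-D grid and keep one representative point per cell. This is a generic sketch of the idea in NumPy, not Foxglove’s implementation, and the voxel size is an arbitrary example:

```python
import numpy as np

def voxel_decimate(points, voxel_size):
    """Decimate an (N, 3) point cloud by keeping one point per voxel.

    A lossy technique suitable for preview-quality streaming: quantize
    each point to a voxel index, then keep the first point seen per voxel.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 1.0, size=(10_000, 3))
preview = voxel_decimate(cloud, voxel_size=0.2)
print(cloud.shape[0], "->", preview.shape[0])
```

With a 0.2 m voxel over a 1 m cube, at most 125 points survive regardless of input size, which is exactly the property you want for the quick-scrubbing use case, and exactly why the result must be labeled as degraded data in the UI.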
Finally, he says, there’s a need to create a front-end interface that makes it clear whether users are seeing summarized data, possibly degraded data, or raw sensor data. We need to inform users, he says: “Hey, that exact pixel you clicked on? Warning: that might not be accurate.” People also need the ability to drill down to see what they need.
The Importance of Browsers and GPUs
One of the interesting developments over the last few years in front-end development has been how powerful browsers have gotten. “It has gone from having these really good” HTML readers “to these pretty powerful sandbox environments where you can access a lot of system APIs,” John says. Consequently, browser access to the GPU and to networking is getting more powerful. You no longer need to choose between building a powerful application and building a complex, interactive one, he says.
That said, it is important to have “some level” of GPU. Even a very inexpensive laptop can give you a reasonably smart machine that can render graphics fairly quickly. “It's easy to understand the data wherever you are,” Scale AI’s Elliot says. “You can debug something while you're traveling. You can just jump in a car and stream the data.”
“But if you’re doing more R&D, and you want to start training models or do more advanced processing of graphical data, that’s where you see the value of the higher-end GPUs when working with point clouds,” John says.
Some Ups and Downs of Working With Robots
Robotics is a complex field, but there are some areas where it is becoming easier, John says. If your robots and sensors are indoors, you’re not dealing with weather issues, and “you can make the sensor fusion problem a lot more tractable when you can control the lighting to some degree.” In a factory environment, you can eliminate a lot of uncertainty about training issues by keeping the floor uncluttered.
But still, at the end of the day, you’re dealing with “uncertain worlds,” he says. “Things move around on floors. Humans are constantly moving around. The unexpected will still happen. And so, a lot of the classic problems in robotics of dealing with the long tail of reality are still there, even if you bring the thing indoors.”
One of the more fundamental challenges in robotics, he says, is the lack of transferability of the knowledge that is embedded into the robots, whether it's in the form of an ML model or a “really great controls algorithm.” Taking that knowledge and then pulling it into something with a slightly different sensor configuration or a slightly different operating environment “usually goes terribly wrong.”
Those projects “are always underestimated,” and he’s seen the issue first-hand in some agricultural settings. Say the farmer wants to move from one crop to another or from one plot of land to the adjacent plot. “Now I need to go collect all new data because the soil color's a little different. The sun sets in a slightly different position,” John explains. Even these seemingly minor things “can send you back to the drawing board sometimes.”
Of course, as the size and configuration of a team shifts or grows, so do data and other needs. In general, robotics groups evolve from doing live troubleshooting to a need for logging and log replay, and maybe some simulation and other offline techniques.
Then, usually, comes change control. At this point, “progress is not linear anymore,” John explains. “It's getting better, and then it got much worse, and then it gets better. And how do we measure performance over time? How do we objectively measure each commit and whether this made the code base better or worse? And these problems continue to change shape as you get to each magnitude and size.”
And once you have a data set or a problem statement that involves different versions of robots, where some of the data is captured from actual robots and some is synthetic, that changes things also.
A lot of people store data without the schemas of that data embedded in it. Decoding that data then requires going to a file system or back to some git commit, pulling out the definitions of what data you’re storing, and then decoding it.
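One way to avoid that dependency is to make each recording self-describing by embedding the schema alongside the payload. Here is a toy length-prefixed layout in Python; this is a made-up format for illustration, not any real robotics container:

```python
import io
import json
import struct

def write_record(stream, schema_name, schema, payload):
    """Write one length-prefixed record that carries its own schema,
    so a reader six months from now doesn't need the right git commit
    to decode the bytes."""
    blob = json.dumps({"schema_name": schema_name,
                       "schema": schema,
                       "payload": payload}).encode()
    stream.write(struct.pack("<I", len(blob)))  # 4-byte length prefix
    stream.write(blob)

def read_records(stream):
    """Yield decoded records until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (size,) = struct.unpack("<I", header)
        yield json.loads(stream.read(size).decode())

buf = io.BytesIO()
write_record(buf, "Pose", {"x": "float", "y": "float"}, {"x": 1.0, "y": 2.0})
buf.seek(0)
rec = next(read_records(buf))
print(rec["schema_name"], rec["payload"])  # Pose {'x': 1.0, 'y': 2.0}
```

Real formats amortize the cost by writing each schema once and referencing it by ID, but the principle is the same: the log can be decoded with no external context.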
This may work in early stages, John says. “But that really falls apart when you have three different branches of your robot being worked on in parallel, and they have different messages flowing around, or you have log files from six months ago that you need to go back and revisit, and you’re going, ‘Wait, we’ve completely changed our architecture in the planning system. How do I play this data now?’”
Code evolution, code maturity, code rot are all common problems as systems grow, John says.
Building a New Video Container Format For Robotics
“To unlock a lot of what we’ve done on the back end, we actually had to take a bit of a detour and first look at the file formats used for storage and recording,” John says. The Robot Operating System (ROS) bag format already exists for recording, playing back, and manipulating images, and that works until “you leave the ROS world or you go to the newer ROS 2,” John explains. (https://www.science.org/doi/10.1126/scirobotics.abm6074 - about ROS 2) When that happens, there’s a “big gap in what is the correct way to record my data. We saw everything from people just concatenating files together to inventing their own file formats.”
To answer the need, Foxglove developed a format called MCAP (https://foxglove.dev/blog/introducing-the-mcap-file-format), a way to multiplex time-based streams of data into a container with different serializations. The format accommodates message-passing systems used in robotics, parallel streams coming from different sensors, and the requirement to use different levels of your stack to process data to create final outputs.
One of the top priorities for MCAP was that recording needs to be as low-overhead as possible, because it has to run on the robots themselves. So they created the format to very quickly capture data from a robot’s RAM or GPU to a drive, “and figured out how to do indexing so that you can still have scrubbing, you can still have summarization, but with a really low impact on the performance overhead of your robot.”
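The combination John describes, cheap sequential appends on the write path plus a timestamp index for scrubbing on the read path, can be illustrated with a toy log class. This is emphatically not the real MCAP format, just a sketch of the design idea:

```python
import bisect
import io
import struct

class IndexedLog:
    """Toy append-only log: writes are a single sequential append, and a
    timestamp -> byte-offset index enables O(log n) scrubbing on read.
    (In this toy, do all appends before any reads, since reads move the
    stream cursor.)"""

    def __init__(self, stream):
        self.stream = stream
        self.index = []  # (timestamp, offset) pairs, in append order

    def append(self, timestamp, payload):
        self.index.append((timestamp, self.stream.tell()))
        # 8-byte float timestamp + 4-byte payload length, then the bytes.
        self.stream.write(struct.pack("<dI", timestamp, len(payload)))
        self.stream.write(payload)

    def seek_to(self, timestamp):
        """Return the payload of the last record at or before `timestamp`."""
        stamps = [t for t, _ in self.index]
        i = bisect.bisect_right(stamps, timestamp) - 1
        if i < 0:
            return None
        _, offset = self.index[i]
        self.stream.seek(offset)
        _, size = struct.unpack("<dI", self.stream.read(12))
        return self.stream.read(size)

log = IndexedLog(io.BytesIO())
for i in range(5):
    log.append(i * 0.1, f"scan-{i}".encode())
print(log.seek_to(0.25))  # b'scan-2'
```

The real format additionally chunks and optionally compresses records, and writes the index as a footer so the recorder never has to seek backward while the robot is running.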
The tradeoff was doing something that was “really optimized” while allowing “nice read properties at the end of the day.”
Leading-Edge Use Cases, and Progress in Some Areas
Some of the most interesting robotics applications are in agriculture, John says, as well as on the factory floor. “These are places where you would generally see fixed-function robots working on an assembly pipeline, and now you're seeing more autonomous bots that are able to do slightly less structured work and navigate the floors and coexist with humans,” he explains.
NASA’s ground control operator interface is another fascinating story, he says. There was a lot of custom extension development to do some astronomical math. Also, the Formula 1 racing division has built one-tenth scale, fully autonomous vehicles. (https://www.linkedin.com/pulse/how-1-tenth-size-f1-autonomous-car-racing-rapidly-michael-coraluzzi/)
Going forward, technology innovations will continue to make more applications possible. Today’s faster, more powerful computers and software are making it much simpler to bring together multiple visualizations, to see images and 2-D annotations on those images, to project flat bounding boxes into the 3-D space together with all 3-D data, to bring in more and different types of data, and then to scale all that up.
Contributing to these improvements are better video compression, adaptive video compression streams, point cloud compression, point cloud decimation, and “building out the server side in a more intelligent way,” John says. “It's all just back to the fundamentals of ‘I just want to be able to look at my data quickly and do fast iteration.’ But that's the stuff that really excites me because I know when I'm working on a robot, it's avoiding those stumbling blocks of ‘Oh, now I have to wait for this,’ or, ‘This isn't fast enough.’ Just having it seamlessly work is such a pleasure.”