Events

October 14, 2021

Highlights of TransformX (Part 2)

A review of some of the key conference takeaways

This post is a continuation of Part 1.

Good AI Needs Good Training Data, and Good Training Data needs Humans-in-the-Loop

Dr. Andrew Ng, Founder of DeepLearning.AI and Founder and CEO of Landing AI, started his keynote by laying out the case for data-centric AI development, ‘And I think it's time for us to shift to the data-centric approach to AI, in which you can even hold the code fixed, but instead find systematic tools and methodologies and principles and algorithms to systematically engineer the data so that when the data is trained on the code, it gives you the performance you need’. In short, ‘iterate more on the data, and less on the model’. To give a real-world example, he highlighted the case of a steel plant where a model was used to identify defects with ~70% accuracy. By investing time and effort in the training data, as opposed to focussing on the model, his team was able to improve the accuracy to 90%.

Your (Non-AI) Domain Experts Are Valuable - Use Them

As if that wasn’t compelling enough, Dr. Ng called out another advantage of data-centric AI: it’s easier to involve domain experts, even if they have no AI experience. By ensuring that the training data is well labeled, domain experts can contribute directly to the model’s success: ‘[...] but taking a data-centric approach, it puts all of us as AI practitioners in a better position to empower even non-AI specialists, such as the staff in a steel manufacturing plant, to engineer the data systematically to feed to the algorithm and that results in a much bigger performance improvement’.

- Dr. Andrew Ng, Founder of DeepLearning.AI and Founder and CEO of Landing AI

Pay Close Attention to the Long Tail

Dimitri Dolgov, co-CEO of Waymo, observed that the increasing size and complexity of models require thoughtfully curated, high-quality training data: ‘As the models get bigger you need more data, a lot of it was supervised ML with human labeling’. As models and datasets scale, ‘a human looking at the long tail helps you out tremendously’. In fact, he highlighted that you must constantly mine for the long-tail scenarios where your model is performing badly so that you can improve it through data-centric development. ‘All of those [long tail] examples tend to be very interesting and very informative. We pay close attention to them. Whenever we find them, it’s part of the data-mining strategy - you bring them into your data sets’, said Dimitri. His advice to other enterprises is to be intentional about understanding all of the scenarios their models will face: ‘You need to be bold to evaluate the performance of your system. You need to invest in data mining to find interesting examples that are representative of that [long] tail part of the distribution’.

- Dimitri Dolgov, co-CEO of Waymo

When it comes to developing models for Autonomous Vehicles (AVs) at Waymo, Dimitri shared a few examples from his own ‘long tail’: ‘And in terms of really rare stuff, we see a lot, things like a drunk cyclist weaving through traffic with a stop sign on his back. Halloween is always a good source of interesting data. You see people wearing Halloween costumes, witches, ghosts, spiders, dinosaurs, all kinds of animals, animals on the road, horses, other animals doing animal things. We recently saw a Bubble Truck. It's a truck that drives around making bubbles…’. In short, when you build AI models for the real world, you have to know how they will perform across all scenarios, even those that occur very rarely.
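The data-mining idea described above (find rare scenarios, pay special attention to the ones the model gets wrong, and bring them into the dataset) can be sketched in a few lines of Python. This is only an illustration and not Waymo's pipeline: the `scenario` and `correct` fields, the `mine_long_tail` helper, and the 15% rarity threshold are all assumptions made for the example.

```python
from collections import Counter


def mine_long_tail(examples, rare_threshold=0.15):
    """Return ids of rare-scenario examples, model failures first.

    `examples` is a list of dicts with hypothetical keys:
    'id', 'scenario' (a scenario tag) and 'correct' (did the model get it right).
    """
    counts = Counter(ex["scenario"] for ex in examples)
    total = len(examples)

    # Keep only examples whose scenario makes up a small share of the data.
    rare = [ex for ex in examples if counts[ex["scenario"]] / total <= rare_threshold]

    # Failures first (False sorts before True), then the rarest scenarios.
    rare.sort(key=lambda ex: (ex["correct"], counts[ex["scenario"]]))
    return [ex["id"] for ex in rare]


# Toy data: mostly ordinary cars, plus a few long-tail sightings.
examples = (
    [{"id": f"car-{i}", "scenario": "car", "correct": True} for i in range(17)]
    + [
        {"id": "horse-0", "scenario": "horse", "correct": True},
        {"id": "horse-1", "scenario": "horse", "correct": False},
        {"id": "bubble-truck-0", "scenario": "bubble_truck", "correct": False},
    ]
)

review_queue = mine_long_tail(examples)
print(review_queue)
```

In practice the scenario tags would themselves come from human review or an upstream model; the point of the sketch is only that rarity and model failure together decide what a person looks at first.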

Debug Your Data and Automate the Easy Stuff

Dimitri’s advice to other enterprises building their own AI was to use as much automation as they can to speed up the way they ‘debug’ their data: ‘Your data iteration has to be as fast and efficient as possible. The same goes for your ML infrastructure. So your talented people, your research engineers can work on other things’.

Richard Socher - 'You Will Need to Have Humans in the Loop'

Richard Socher, CEO of You.com and the fifth most-cited researcher in NLP, described the importance of humans-in-the-loop when labeling your data for AI, with this example of training a chatbot: ‘Both the running experiments, collecting data, labeling data, helping you understand problems in the data, biases that you may have in your data sets, and issues you have dealing with distribution shifts where over time new things happen. You release a new product, your chatbot doesn't yet know how to respond to those kinds of questions, and having continuous integration tests because as you automate more and more harder and harder intellectual tasks that are still somewhat repetitive, you will need to have humans in the loop[...]’.

- Richard Socher, CEO of You.com

AI is Enabling New Ways for Robots to Perceive and Interact with the Physical World

Dr Fei-Fei Li, Sequoia Professor of Computer Science @ Stanford University, used a historical perspective of evolutionary biology to demonstrate how our intelligence is linked to the ability to see or perceive the world around us. This is particularly important as we conduct research to help machines perceive the world, and then interact with it. Those machines could be manufacturing robots, autonomous vehicles or even something that loads your dishwasher. As Dr Li shared, ‘Vision is a cornerstone of human intelligence, whether biological or artificial’. Dr Li also demonstrated how robots can learn to perform complex ‘long-horizon tasks’ such as clearing a table or loading a dishwasher. In a fascinating segment, she showed how robots could learn in much the same way that a young child might: through experimentation and play. As she says, ‘Moving around in the world is both explorative and exploitative…this helps the [robotic] agent generalize’.

- Dr Fei-Fei Li, Sequoia Professor of Computer Science @ Stanford University

From ‘Blind’ Robots, to Robots That Perceive, Reason and Act

Marc Segura, Group Senior Vice President Managing Director Consumer Segments & Service Robotics, illustrated how robots can grow beyond the ‘blind’ ones that perform static, repetitive tasks, such as those on a production line, where the placement of materials and objects is pre-programmed and there is no need for perception, reasoning or planning. He highlighted that true robot ‘skills’ actually require a sequence of ‘actions and decisions’. For example, ‘[...] if you want to pick something from a box and put it into another box [i.e. for eCommerce order fulfillment], you need to break it down. First thing is you need to localize the box. That could be one skill. Then you need to localize the object inside the box, segmented out from the rest, and then you need to decide how to pick it, actually pick it and then comes a dropping process’.

Cobots - Robots That Learn to Work With You

Here he explained how AI is enabling advances over prior ‘blind’ robots: ‘What AI is bringing is a great possibility to develop robot skills that are learning over time’. In fact, this can now give us cobots, i.e. robots that work collaboratively with humans. Marc described a possible cobot scenario: ‘We're going to have more and more mixed work in between human operators and robots. So if the robots get to know and can measure, for example, the average stack time of a person, or if a person is behaving in a certain way, and they can adapt to the person and optimize the process, this is really something’. As Marc says, ‘You don't need to separate robots and people anymore’.

- Marc Segura, Group Senior Vice President Managing Director Consumer Segments & Service Robotics

In another example of how robots can change the way we think about work, Marc described how robots could be used in ‘low batch automation’, such as you might find in ‘a laboratory in a hospital’ where there may be many tasks with low numbers of repetitions. An example might be loading different types of test samples into different testing machines: each one is a slightly different task that may only have to be repeated 10 or 20 times. Rather than being dedicated to a single task, a robot could move between workstations to perform different tasks as needed. A (human) lab manager could then simply direct robots to whichever workstation had a queue of work to be done. The kinds of AI advances required to make this vision a reality bear a striking resemblance to the learning through play and exploration for long-horizon tasks described by Dr. Fei-Fei Li.

Explainable AI is Hard (and that’s ok)

For many industries, it’s important to not just build a high-performing AI model, but also to be able to explain why that model behaves as it does. This is especially true for highly regulated enterprises where regulators require controls that mitigate bias and discrimination. However, according to Ilya Sutskever, Co-Founder and Chief Scientist at OpenAI, ‘The difficulty of understanding what neural networks do is not a bug, but a feature’. He went on to explain, ‘neural networks are as successful as they are precisely because they are difficult to reason with mathematically’.

In fact, Ilya noted that when it comes to how humans see, hear, and understand language, ‘We can’t explain how we do the cognitive functions that we do’. Therefore, ‘if computers can produce objects that are similarly difficult to understand, not impossible but similarly difficult, it means you’re on the right track’.

- Ilya Sutskever, Co-Founder and Chief Scientist at OpenAI

Data-Centric AI - Debug Your Model by Debugging the Data

Because AI models are not directly interpretable, practitioners are often left to explore alternate avenues to better understand their model’s performance. Dr Andrew Ng described how practitioners should take an iterative approach to exploring model performance within the context of the data used to train it. Using this ‘data-centric’ method, AI teams can identify areas where the model performs well or poorly and connect that level of performance directly to particular slices of their training data. For example, Dr Ng explained how his team helped diagnose poor model performance in a defect-detection model at a steel manufacturing plant. He showed that correlating performance with training data not only explained why a model performed a particular way, but also helped identify slices of training data that could be engineered to improve performance. Tools like Scale Nucleus help practitioners sort predictions by error metrics or explore interactive confusion matrices, to explain model failures by identifying contributory training data samples.
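To make the idea of connecting performance to slices of data concrete, here is a minimal Python sketch that ranks data slices by error rate and builds a simple confusion matrix. It is not the Scale Nucleus API or Dr Ng's tooling: the `(slice, true label, predicted label)` record format, the function names, and the toy defect data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def error_rate_by_slice(records):
    """Rank slices by error rate, worst first.

    `records` is an iterable of (slice_name, true_label, predicted_label)
    tuples; this record layout is a hypothetical one chosen for the example.
    """
    totals = Counter()
    errors = Counter()
    for slice_name, truth, pred in records:
        totals[slice_name] += 1
        if truth != pred:
            errors[slice_name] += 1
    rates = {name: errors[name] / totals[name] for name in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def confusion_matrix(records):
    """Count predictions per true label: matrix[true_label][predicted_label]."""
    matrix = defaultdict(Counter)
    for _, truth, pred in records:
        matrix[truth][pred] += 1
    return {truth: dict(preds) for truth, preds in matrix.items()}


# Toy predictions from an imaginary defect detector, tagged by defect type.
records = [
    ("scratch", "defect", "defect"),
    ("scratch", "defect", "ok"),
    ("scratch", "defect", "ok"),
    ("dent", "defect", "defect"),
    ("dent", "defect", "defect"),
    ("clean", "ok", "ok"),
    ("clean", "ok", "defect"),
    ("clean", "ok", "ok"),
    ("clean", "ok", "ok"),
]

worst_first = error_rate_by_slice(records)
matrix = confusion_matrix(records)
print(worst_first)
print(matrix)
```

Here the "scratch" slice surfaces as the weakest, which is the kind of signal that would prompt a team to re-examine the labels or collect more examples for that slice before touching the model.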

The Geopolitical Landscape of AI is Changing in Ways That Will have Far-Reaching Consequences

In more than one discussion, China's position in the race to lead in AI was presented as a serious concern. As Eric Schmidt noted, ‘This is a national security challenge for the United States. If you want, for the next 20 or 30 years, for American values, American technology, American startups to be global platforms, we need to get our act together now. Because our competitor is doing that already’. Eric went on to share how he himself had underestimated the pace of AI development in China: ‘In March, we said that we were one to two years ahead of China in AI. In June, they demonstrated a universal model of a size similar to that of OpenAI's GPT-3, which is a significant accomplishment on China's part. Now, maybe it's not as good, but the important point is they know what they're doing and they're on their way’.

TikTok - A Sign of China's Accelerating AI Progress

Eric also held up TikTok as an example of China’s accelerating progress, which surprised even him: ‘TikTok is a good example of the first really breakout platform from China. By the way, it’s a high-quality platform and much of its apparent success is because it has a different AI algorithm for matching. It matches not who your friends are, but rather what your interests are, using a very special algorithm. That is an example where I would have said that would not occur for another 5 years. So we have relatively little time - maybe a year or two. Not 5 or 10, to get ourselves organized’.

We Need To Change The Way We Fund AI, or Risk Falling Behind

How do we meet this challenge? Part of the answer is how we fund the increasingly fast pace of AI research through government funding. As Mac Thornberry, Former U.S. Representative for Texas's 13th Congressional District @ US House of Representatives, noted, ‘We've got to act with a much greater sense of urgency. And, and I would say, in just as one example, the sort of two-year budget cycle that we've used to for DOD budgets is just not going to cut it with technology that changes this quickly and adversaries that are moving this quickly’. He suggested that we should fund ‘[...] some pool of money, for example, related to artificial intelligence, where there is greater flexibility in spending it with full transparency to Congress in how it is spent’. This was also one of the recommendations that the National Security Commission on Artificial Intelligence (NSCAI), chaired by Eric Schmidt, made in its report earlier this year.

- Mac Thornberry, Former U.S. Representative for Texas's 13th Congressional District @ US House of Representatives

Mac reminded us why it’s important to accelerate the pace of AI for national defense, ‘There is always a gap between the development of new technologies and their adoption by militaries, but the fate of nations is decided within that gap’.

To Be Continued!

There are so many more highlights of TransformX that we would love to share with you - especially when it comes to the panel discussions and informative breakout sessions. Do stay tuned for more updates as we bring you more key takeaways from TransformX.

What were your favorite takeaways? Let us know here!