The Week in AI is a roundup of key AI/ML research and news to keep you informed in the age of high-tech velocity. This week: Transformers attempt to take over AI, OpenAI releases new GPT-3 edit and insert features in beta, AI acquires common sense to reason like humans, and traditional algorithms get a boost from machine learning.
In 2017, the paper “Attention Is All You Need” introduced a new artificial neural network (NN) architecture, the Transformer, which has since revolutionized how AI models approach language. Today, Transformers are taking computer vision by storm, and some observers have started to wonder whether they will take over artificial intelligence as a whole. Exploring this hypothesis points to an interesting path forward for the architecture.
In the early days of deep learning, language tasks saw little progress while computer vision became dominant, as autonomous driving systems began to rely on neural nets. Transformers changed this by quickly becoming the go-to method for use cases such as word recognition, which focus on analyzing and predicting text. Since then, we've seen the emergence of tools such as OpenAI's Generative Pre-trained Transformer 3 (GPT-3), which was trained on hundreds of billions of words and produces new text with high consistency and accuracy.
In addition, the success of Transformers has pushed researchers to discover their potential for processing multiple kinds of input at once while allowing neural nets that use them to become more accurate than those that don’t. This versatility reveals the possibility of a convergence, giving rise to universal models that may be able to tackle multiple tasks, all while surpassing benchmark performances.
In 2020, we witnessed the launch of the Vision Transformer (ViT). It sports an architecture nearly identical to that of the original Transformer, with only minor changes allowing it to analyze images instead of words. The model classified images with over 90% accuracy, a better-than-expected result that made it a top performer and a strong competitor to convolutional neural networks (CNNs) in ImageNet classification contests.
To further push ViT’s evolution, some researchers picked it apart to better understand how it “sees” and gains a competitive advantage over CNNs. Others built models that can generate new, realistic images as convincing as those created by CNNs. However, these emerging technologies often come at a steep cost because of the high computational power required during the pretraining phase.
As a result, researchers have been looking for creative ways to build hybrid architectures that rely on the strengths of both Transformers and CNNs. This new area of active exploration suggests that future successful models will likely integrate both.
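To give a feel for the "minor changes" involved: the ViT paper describes cutting each image into fixed-size patches and feeding the flattened patches to the Transformer as a sequence, much like words in a sentence. The sketch below shows only that patch-splitting step on a tiny grayscale image stored as nested lists. The function name and the 4x4 example image are illustrative assumptions, not part of any ViT codebase.

```python
from typing import List

# A grayscale image as rows of pixel values (illustrative representation).
Image = List[List[int]]


def image_to_patches(image: Image, patch_size: int) -> List[List[int]]:
    """Split an image into non-overlapping square patches, each flattened to a list."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image patch by patch, left to right, top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)
```

Each entry in `patches` plays the role a word token plays in a language model. A real ViT would additionally project each patch to an embedding and add position information, which this sketch leaves out.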
OpenAI released new versions of GPT-3 and Codex that allow users to insert and edit content in existing text, instead of just completing text. The new features make it practical to leverage the OpenAI API to revise existing content, such as rewriting a paragraph of text or refactoring code.
With these added capabilities, OpenAI can tackle new use cases and improve existing ones. As an example, the company is already piloting insertion in GitHub Copilot, and the results are promising.
Traditionally, GPT-3 and Codex have added text to the end of existing content, based on the preceding text, limiting the iterative nature of text revision. The new insert capability adds relevant text in the middle of existing content, while taking context into consideration.
To improve the quality of completions, users can also provide future context to the model when writing long-form text, transitioning between paragraphs, following an outline, or guiding the model toward an ending.
In addition, the insert feature, which is available in the API today in beta mode via the new interface in Playground, is particularly useful for writing code. In fact, Codex was OpenAI’s original motivation behind launching this feature, since software development often requires inserting code in the middle of existing files as part of revision processes.
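The idea of inserting text between a prefix and a suffix can be sketched without calling any API. In the example below, the `[insert]` marker, the helper names, and the `fake_model` function are all assumptions made for illustration; `fake_model` is a hard-coded stand-in for the model and makes no network request.

```python
from typing import Tuple

# Assumed placeholder for the insertion point; check the API docs for the real convention.
INSERT_MARKER = "[insert]"


def split_at_marker(text: str, marker: str = INSERT_MARKER) -> Tuple[str, str]:
    """Split a draft into the text before and after the insertion point."""
    prefix, found, suffix = text.partition(marker)
    if not found:
        raise ValueError(f"no {marker!r} marker found in text")
    return prefix, suffix


def fake_model(prefix: str, suffix: str) -> str:
    """Hypothetical stand-in for the model: returns text meant to fit between both contexts."""
    return "    return a + b\n"


def insert_completion(draft: str) -> str:
    """Fill the marked gap using the context on both sides of it."""
    prefix, suffix = split_at_marker(draft)
    inserted = fake_model(prefix, suffix)
    return prefix + inserted + suffix


# A function with a missing body, followed by code that already uses it.
draft = "def add(a, b):\n[insert]\nprint(add(2, 3))\n"
result = insert_completion(draft)
```

The point of the design is that the model sees the code after the gap as well as the code before it, so the inserted body can be written to match how the function is later called.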
Meanwhile, the team has released another new endpoint in beta, edits, that allows users to change existing text or code via an instruction instead of completing it. Imagine instructing GPT-3 to format a block of text as a letter and requesting GPT-3 to sign it at the end. The edit possibilities for text applications are promising. The endpoint can be used to change the tone or structure of text or make targeted changes such as fixing spelling.
Moreover, the edits endpoint is particularly useful for writing code, because programmers often need to refactor, add documentation, translate between programming languages, or change coding styles. The edits endpoint is currently free to use, so I suggest you give it a try, maybe doing as this user did, testing the system by asking GPT-3 to generate an academic essay on ethics and AI.
Joshua S. Rule, a professor of computational cognitive science at the Massachusetts Institute of Technology (MIT), believes that intuitive physics and psychology could help close the gap in building the general-purpose AI systems that scientists have been dreaming of for decades. Neuro-symbolic AI is among the solutions being explored to make AI generalize better, rather than remain limited to a narrower use case.
The idea is to equip the system with fundamental aspects of intelligence that humans and many animals share: intuitive physics and theory of mind.
To achieve humanlike intelligence, Rule devised a three-way interaction between neural, symbolic, and probabilistic modeling and inference. If successful, AI could go beyond recognizing patterns in data and approximating functions and move toward all things humans do to model the world, such as explaining and understanding the things we see, imagining things we can’t see but could happen, and turning them into goals that we can achieve by planning actions and solving problems.
So far, neuro-symbolic approaches have been successful in providing models with prior knowledge of intuitive physics. However, designing AI systems that learn these intuitive physics concepts the way children do remains a challenge. We need to find new techniques for learning, since physics engines are more complicated than the weights of traditional neural networks.
To demonstrate the way humans develop building blocks of knowledge, Rule and his colleagues Joshua B. Tenenbaum and Steven T. Piantadosi published “The Child as Hacker,” where they used programming as an example of how we explore solutions across different dimensions such as accuracy, efficiency, usefulness, and modularity. The paper also explores other concepts, such as how we gather bits of information and develop them into new symbols.
While it seems as if general-purpose AI is still many years away, digging deeper into this field of research seems to be our best chance to crack the code of common sense in neuro-symbolic AI.
Researchers are taking a fresh look at traditional algorithms with the help of machine learning and thus are reimagining the building blocks of computing. The new approach, called algorithms with predictions, takes advantage of the insights ML tools can provide in the data that traditional algorithms handle. The method promises to bridge two fundamentally different computing tactics: ML and traditional algorithms.
A 2018 paper by an MIT computer scientist and a team of Google researchers is at the center of the recent explosion of interest in this approach. The paper, written by Tim Kraska, Alex Beutel, Ed H. Chi, Jeff Dean, and Neoklis Polyzotis, explored how a traditional algorithm called a Bloom filter can leverage ML to improve its performance. For example, a Bloom filter can help a company's IT department quickly and accurately check if employees are visiting any of a long list of websites that pose a security risk.
Even though Bloom filters don’t produce false negatives, they do produce false positives, restricting users from visiting websites they should have access to. These false positives can be costly for a company.
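To make the false-positive behavior concrete, here is a minimal Bloom filter sketch. The bit-array size, the number of hashes, the salted SHA-256 hashing scheme, and the example URLs are all illustrative choices, not details taken from the paper.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: str):
        # Derive one bit position per hash by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical blocklist, for illustration only.
blocklist = [f"malware-{i}.example" for i in range(200)]
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for url in blocklist:
    bloom.add(url)

# Every blocked URL is reported as blocked, so this list stays empty.
missed = [url for url in blocklist if url not in bloom]

# Some safe URLs are flagged anyway: these are the false positives.
safe_urls = [f"intranet-{i}.example" for i in range(2000)]
false_positives = [url for url in safe_urls if url in bloom]
false_positive_rate = len(false_positives) / len(safe_urls)
```

Adding an item only ever sets bits, which is why a stored URL can never be missed. A safe URL is wrongly flagged when all of its bit positions happen to have been set by other URLs, and making the bit array larger lowers that rate at the cost of memory.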
To address this, the paper’s contributors developed an algorithm called a “learned Bloom filter,” which combines a small Bloom filter with a recurrent neural network (RNN). To run at high performance, the RNN was trained to learn what malicious URLs look like after being exposed to hundreds of thousands of safe and unsafe websites.
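The structure can be sketched roughly as follows: a model scores each query first, and a small backup Bloom filter stores only the blocked URLs that the model fails to flag. The paper's model was an RNN; the `toy_score` keyword check below is a hypothetical stand-in for it, and the class names, threshold, and URLs are likewise assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter used here as the small backup structure."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def toy_score(url: str) -> float:
    """Hypothetical stand-in for the trained model: a crude 'looks malicious' score."""
    suspicious = ("malware", "phish", "free-prize")
    return 1.0 if any(token in url for token in suspicious) else 0.0


class LearnedBloomFilter:
    """Model first, backup Bloom filter second."""

    def __init__(self, keys, score, threshold, backup_bits, num_hashes):
        self.score = score
        self.threshold = threshold
        self.backup = BloomFilter(backup_bits, num_hashes)
        # Keys the model fails to flag go into the backup filter so none are lost.
        for key in keys:
            if score(key) < threshold:
                self.backup.add(key)

    def __contains__(self, item: str) -> bool:
        if self.score(item) >= self.threshold:
            return True  # The model flags it as malicious.
        return item in self.backup  # Otherwise fall back to the backup filter.


# Hypothetical blocklist: two URLs the toy model catches, two it misses.
blocklist = [
    "malware-download.example",
    "phish-login.example",
    "quiet-botnet.example",
    "innocuous-looking.example",
]
learned = LearnedBloomFilter(blocklist, toy_score, 0.5, backup_bits=64, num_hashes=3)
all_found = all(url in learned for url in blocklist)
backup_keys = [url for url in blocklist if toy_score(url) < 0.5]
```

Because the backup filter only has to hold the URLs the model misses, it can be much smaller than a Bloom filter over the whole blocklist, while every blocked URL is still caught by one of the two stages.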
The combined structure achieved faster and more accurate results, reducing false positives while still avoiding false negatives. Since then, algorithms with predictions have followed a familiar cycle: innovative ideas such as the learned Bloom filter inspire rigorous mathematical results and understanding, which in turn lead to more new ideas.
As a result, the past few years have seen the rise of algorithms with predictions incorporated into scheduling algorithms, chip design, and DNA-sequence searches.
We are still in the early years: programs that use ML to augment their algorithms typically do so in a limited way, with a single learned element. In contrast, more evolved systems could include several separate pieces that are backed by algorithms with predictions and whose interactions are governed by prediction-enhanced components.
Although Transformers open the door to models that can handle multiple diverse tasks, neuro-symbolic approaches that promise to give AI humanlike intelligence may yield progress in the pursuit of general-purpose AI.
Meanwhile, enhancing text- and code-writing efficiency, as well as improving traditional algorithms, will free up more time for developers to focus on creativity. With luck, the prolonged chip shortage won’t hinder progress in the computationally heavy AI space.
Until next time, stay informed and get involved!