The Week in AI is a roundup of high-impact AI/ML research and news to keep you up to date in the fast-moving world of enterprise machine learning. From an AI system that analyzes sources cited in Wikipedia to a precision medicine tech that predicts the effectiveness of immunotherapy, here are this week’s highlights.
Meta developed an AI system that scans Wikipedia articles, analyzes the sources cited in the articles, and identifies those that should be modified. It is releasing the code under an open-source license.
Checking the sources cited in a Wikipedia article is time-consuming. Editors must ensure accuracy and that the citations are put into the right context for millions of Wikipedia pages, each of which may contain hundreds of citations.
To address this issue, Meta’s AI system assists Wikipedia editors by automating citation review. The system can spot when information has been backed up by questionable citations and recommends more appropriate sources.
Meta’s AI system trained on 4 million snippets of Wikipedia text to learn how to detect incorrect citations. To help the system, Meta created the Sphere dataset, which contains 134 million documents from the open web. When the AI system detects questionable citations in a Wikipedia article, it searches Sphere for documents that are more relevant.
To speed up the time-consuming task of searching for potential citations in those 134 million documents, researchers developed a collection of specialized indices—shortcuts that make it faster to find specific pieces of information. Once the system finds a document that’s more relevant, it extracts the appropriate passages from that document. It also proactively discovers when multiple documents could be cited as a source and presents them to the editors.
To identify which text snippets from the Sphere documents are more appropriate for a Wikipedia article passage, the system creates mathematical representations of the source (Sphere) and target (article) texts and compares them. Meta is releasing the Sphere dataset, along with the indices it developed to make the dataset easier to search.
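As a rough illustration of matching a passage to candidate sources through numerical representations, here is a minimal sketch. It assumes a toy bag-of-words vector in place of the learned representations Meta's system uses, and the function names (`embed`, `cosine`, `rank_sources`) and sample texts are hypothetical, not taken from Meta's code.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(claim, candidates):
    """Return (score, passage) pairs sorted from most to least similar."""
    claim_vec = embed(claim)
    scored = [(cosine(claim_vec, embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical article claim and candidate source passages.
claim = "the bridge opened to traffic in 1937"
candidates = [
    "a recipe for sourdough bread with rye flour",
    "the bridge opened to vehicle traffic in may 1937",
    "the city council met to discuss parking fees",
]
ranked = rank_sources(claim, candidates)
best_score, best_passage = ranked[0]
```

Scoring every candidate like this is a brute-force search. At the scale of 134 million documents, that is the step a specialized index is meant to speed up.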
To make it possible to run indices across multiple servers, rather than on a single machine, the team is also publishing the code for an internal tool called Facebook AI Similarity Search (Faiss).
DeepMind released an AI-based system called Physics Learning through Auto-encoding and Tracking Objects (PLATO) that can learn simple physical rules by observing how objects move around and expresses “surprise” when the rules are violated.
With human babies, developmental psychologists often test how they grasp the motion of objects by tracking their eye movements. For example, when they watch a video of a moving ball suddenly disappearing, they express surprise by staring in a specific direction.
To develop a similar test for AI, the team trained the neural-network-based PLATO with animated videos of simple objects, such as cubes and balls. In the team’s research, recently published in Nature Human Behaviour, PLATO not only ingested raw images from the videos it trained on, but also had access to versions that highlighted each object in a particular scene. This allowed the model to develop internal representations of physical properties of objects, including their velocities and positions.
During tens of hours of video training, PLATO developed the ability to predict how objects would behave in different situations by observing simple mechanisms, such as a ball rolling down a slope or two balls bouncing off each other.
More precisely, the system could detect three types of patterns: continuity, which is explained by objects moving in a continued direction without “skipping” from one position to another; solidity, a characteristic that prevents objects from blending with each other; and persistence, which captures objects’ shapes given environmental conditions.
PLATO’s predictions become more accurate as it gets further into a video session. When shown videos of objects that suddenly disappear, the program measures the difference between the video and its own prediction, providing a measure of surprise. While PLATO isn’t designed as a model for infant behavior, it could be a step toward an AI that can test hypotheses about how human babies learn, the researchers said.
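The idea of surprise as the gap between a prediction and what is actually observed can be sketched in a few lines. This is only an illustration: PLATO learns its predictor from video, whereas the sketch below assumes a hand-written constant-velocity rule, and the function names and trajectories are hypothetical.

```python
def predict_next(positions):
    """Constant-velocity guess: next = last + (last - previous)."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def surprise_scores(observed):
    """Squared distance between each observed position and the
    prediction made from the two frames before it."""
    scores = []
    for t in range(2, len(observed)):
        px, py = predict_next(observed[:t])
        ox, oy = observed[t]
        scores.append((ox - px) ** 2 + (oy - py) ** 2)
    return scores


# A ball moving steadily to the right.
plausible = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
# A ball that jumps between frames, violating continuity.
implausible = [(0, 0), (1, 0), (2, 0), (7, 0), (8, 0)]

plausible_surprise = surprise_scores(plausible)
implausible_surprise = surprise_scores(implausible)
```

The steady trajectory yields zero error at every frame, while the jump produces a spike in the score at the frame where the rule is broken.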
MIT researchers developed EquiBind, a deep-learning model that is 1,200 times faster than QuickVina2-W, one of the fastest computational molecular modeling systems. EquiBind’s method makes it possible to predict how druglike molecules bind to key proteins. The paper, which explains how the model could potentially reduce the chances and costs of drug trial failures, will be presented at the 2022 International Conference on Machine Learning (ICML).
There are potentially a novemdecillion (10^60) molecules that could be combined into drugs. Currently, the drug development process for fast-spreading diseases such as COVID-19 is prolonged because this number is far beyond what existing drug design models can compute.
Drug researchers must spend significant time testing and selecting compatible molecules, called ligands, that can successfully lock onto targeted proteins. Once selection is complete, the drug development can begin.
Companies typically spend billions of dollars to develop drugs, and new discoveries can take more than 10 years to be developed and tested in addition to the time required to gain final approval from the U.S. Food and Drug Administration. When testing a promising candidate, researchers look for molecules’ ability to block earmarked proteins from working. For example, an effective drug must be able to neutralize the protein of bacteria that invaded a human body.
Unfortunately, after all of that development effort, trial participants often either see no effect from the drug or suffer too many side effects. Some 90% of all drugs fail during trials, and companies raise the prices of successful drugs to make up for those losses.
Today’s top-performing drug molecule modelers can’t escape the computationally heavy requirements of finding, comparing, selecting, and refining the optimal ligands that best match with targeted proteins.
To cope with the difficult ligand binding process, EquiBind has geometric reasoning, which helps it learn the underlying physics of molecules. This feature lets it successfully generalize on new, unseen data to improve predictions.
Industry professionals such as Pat Walters, chief data officer of Relay Therapeutics, are monitoring these developments and promising better drug discovery results powered by AI, according to MIT. During a trial suggested by Walters, EquiBind successfully bound the ligands of an existing drug and protein used for lung cancer, leukemia, and gastrointestinal tumors.
Researchers from Pohang University of Science & Technology in South Korea developed a precision medicine technology based on AI that predicts immunotherapy response in cancer patients.
Immunotherapy, a newer class of cancer treatment, has many advantages over traditional approaches. For example, patients experience fewer side effects because the process leverages the body’s immune system to fight cancer cells, sparing the body the harmful exposure that comes with chemotherapy or radiotherapy.
However, it’s difficult to predict the effectiveness of immunotherapy treatments on patients. On average, 30% of cancer patients experience successful results, while 70% end up resorting to other traditional methods, the researchers said. To address this issue, the team used ML to improve its accuracy in predicting patient response to immune checkpoint inhibitors.
By discovering and analyzing more effective neural-based biomarkers, which capture what is happening in a cell or organism at any given moment, the team developed an AI system that can predict the response to an anti-cancer treatment. In extensive testing, the system’s predictions outperformed conventional methods.
To train the L2 regularized logistic regression (LR) model selected for the AI system, researchers classified patients as responders or non-responders based on signals detected from the posttreatment neural-based biomarkers. They also trained and tested other models, such as a support vector classifier (SVC), a random forest (RF), and a deep neural network (DNN) to compare with the LR model.
Researchers selected the optimal hyperparameters for the LR model using the GridSearchCV function from the Python package scikit-learn. The result: The SVC and RF models performed at similar levels when measured against the LR model, while the LR model more accurately generalized on an unseen dataset than the DNN model.
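For readers unfamiliar with this workflow, here is a minimal sketch of tuning an L2-regularized logistic regression with scikit-learn’s GridSearchCV. It is not the researchers’ code: the synthetic data, the grid of `C` values, the five-fold cross-validation, and the ROC AUC scoring are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for biomarker features and responder / non-responder labels.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# C is the inverse of the regularization strength; this grid is arbitrary.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# LogisticRegression applies an L2 penalty by default.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
# With scoring="roc_auc", score() reports ROC AUC on the held-out split.
test_auc = search.score(X_test, y_test)
```

Holding out a test split that the grid search never sees is what allows a statement about generalization to unseen data, which is the comparison the researchers describe.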
Recently, the same research team developed ML models to better predict the effectiveness of chemotherapy treatments on gastric or bladder cancer patients. With this newly created neural-based biomarker model, they hope to predict positive responses to immunotherapy treatment across many types of cancer.
Meta’s researchers hope that their new AI system and the Sphere dataset will form part of a new tech ecosystem that could potentially support many use cases, such as certificate authentication. These models are the first components of editors that could help verify documents in real time. The tools allow users to auto-complete text and apply proofreading corrections at scale.
Meanwhile, the researchers of the ML-based immunotherapy response study hope the new discovery will help detect patients who will respond to treatment in advance. The goal is to help establish prescription plans that result in customized precision medicine with more beneficial cancer treatments for patients.
Until next time, stay informed and get involved!