The Week in AI is a roundup of high-impact AI/ML research and news to keep you up to date in the fast-moving world of enterprise machine learning. From a language model solving mathematical problems to a synthetic data generator that helps improve model performance, here are this week’s highlights.
A research team at Google recently introduced Minerva, a language model that uses sequential reasoning to answer mathematical and scientific problems. Traditionally, large language models have been adopted for a range of natural language tasks, such as question answering, summarization, and common-sense reasoning. However, quantitative reasoning, which includes solving problems in mathematics, physics, and engineering, remains underexplored by scientists.
Despite being an intriguing application for language models, quantitative reasoning has its challenges. For example, solving mathematical and scientific problems requires the ability to accurately parse a question that mixes natural language and mathematical notation, recall pertinent formulas and constants, and produce step-by-step answers involving numerical computation and symbolic manipulation. To achieve this, language models would need significant improvements in model architecture and training methods.
Researchers help Minerva tackle such challenges by concentrating on gathering quality training data, training at scale, and using best-in-class inference approaches to improve performance on a range of quantitative reasoning tasks. Minerva was trained on a 118GB dataset of scientific papers from the arXiv preprint service and web pages with mathematical expressions in LaTeX, MathJax, or other formats.
To communicate using conventional mathematical notation, the model retains the symbols and formatting information from the training data, which is crucial to the semantic meaning of mathematical equations. Like most language models, Minerva generates answers stochastically; it samples several candidate solutions to each question and selects the most frequent final answer by majority voting. When tested for numeric reasoning skills using STEM benchmarks ranging in difficulty from grade school-level challenges (for example, GSM8K) to graduate-level coursework (such as OCWCourses), Minerva consistently produced state-of-the-art results.
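The majority-voting step can be sketched in a few lines of Python. This is a minimal illustration, not Minerva's actual inference code, and it assumes each sampled completion has already been reduced to its final answer string:

```python
from collections import Counter

def majority_vote(sampled_answers):
    """Pick the most frequent final answer among stochastically sampled candidates."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical: five sampled completions for the same math question.
samples = ["42", "42", "41", "42", "40"]
print(majority_vote(samples))  # -> 42
```

The intuition is that while any single sample may follow a flawed chain of reasoning, correct reasoning paths tend to converge on the same final answer more often than incorrect ones do.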
As next steps, the researchers will continue to address the model’s limitations, including its potential to arrive at answers through flawed reasoning processes, while hoping to unlock new opportunities in quantitative reasoning for students and professional practitioners.
Deep Longevity, in collaboration with Harvard Medical School, has developed a new deep learning approach that aims to help people find the shortest path to happiness. In a paper published in Aging-US, the researchers outlined two digital models of human psychology based on data from the “Midlife in the United States” study.
The first model is an ensemble of deep neural networks that predicts respondents’ psychological well-being in 10 years using information from a psychological survey. In depicting the trajectories of the human mind as it ages, it demonstrates that the capacity to form meaningful connections, as well as mental autonomy and environmental mastery, develops with age. It also suggests that the emphasis on personal growth steadily declines with age, while the sense of having a purpose in life fades only after ages 40 to 50. With these results, researchers can now grow the body of knowledge on socioemotional selectivity and hedonic adaptation in the context of adult personality development.
The second model is a self-organizing map the researchers created to serve as the foundation for a recommendation engine for mental health applications. The unsupervised learning algorithm splits all respondents into clusters depending on their likelihood of developing depression and determines the shortest path toward a cluster of mental stability for any individual. This method is a departure from existing mental health applications, which offer generic advice that applies to everyone.
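The shortest-path idea can be illustrated with a standard graph search. The sketch below is hypothetical: it stands in for the paper's self-organizing map with a made-up graph of psychological clusters, where edge weights loosely represent how hard it is to move a respondent from one cluster to another, and uses Dijkstra's algorithm to find the easiest route to stability:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's shortest path over a weighted graph.
    graph: {node: [(neighbor, cost), ...]}"""
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(pq, (nd, nxt))
    # Reconstruct the path by walking predecessors back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Hypothetical cluster graph (names and weights are illustrative only).
clusters = {
    "high_risk": [("moderate", 2), ("anxious", 1)],
    "anxious": [("moderate", 3), ("high_risk", 1)],
    "moderate": [("stable", 1), ("high_risk", 2), ("anxious", 3)],
    "stable": [("moderate", 1)],
}
print(shortest_path(clusters, "high_risk", "stable"))
# -> ['high_risk', 'moderate', 'stable']
```

A recommendation engine built on this structure would then suggest interventions associated with each hop along the returned path.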
To demonstrate the system’s potential, Deep Longevity released FuturSelf, a free online application that lets users take the psychological test described in the original publication. At the end of the assessment, users receive a report with insights aimed at improving their long-term mental well-being. They can also enroll in a guidance program that provides them with an exhaustive list of AI-chosen recommendations. The researchers will use the data obtained on FuturSelf to further develop Deep Longevity’s digital approach to mental health, in addition to working on a follow-up study on the effect of happiness on physiological measures of aging.
Researchers from the Department of Diagnostic and Interventional Radiology and Neuroradiology at University Hospital Essen in Essen, Germany, recently published a study in The Lancet Digital Health highlighting how a decision-referral approach, in which radiologists work with AI models to evaluate breast cancer screenings, achieves better results than either clinicians or algorithms can achieve alone. As the rise of AI in medical imaging continues to spur significant research into accurate cancer screening algorithms, these researchers decided to combine the strengths of radiologists and AI to improve accuracy.
To measure the viability of such an approach while adding more data to this area of medical research, the study’s authors set out to evaluate the performance of an AI model, a radiologist, and the two together when tasked with breast cancer screening. The AI model was built using a retrospective dataset of 1,193,197 full-field digital mammography studies carried out between January 1, 2007, and December 31, 2020. The mammograms were sourced from 453,104 patients at eight screening centers in Germany.
Data from six of the sites was used for model development and internal testing, and data from the other two was used for model validation and external testing. The internal-test dataset consisted of 1,670 screen-detected cancers and 19,997 normal mammography exams, while the external-test dataset contained 2,793 screen-detected cancers and 80,058 normal exams. The researchers used annotations from radiological findings and biopsy information as training labels for the model to classify the images.
After development, the model was placed in a simulated environment to classify each image as normal or suspicious for cancer, while providing an indication of its confidence in its classification. Images that were deemed suspicious or classified with low confidence were referred to the radiologist without any indication of the AI’s inferences.
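The triage logic of a decision-referral setup can be sketched as a simple thresholding rule. This is a toy illustration, not the study's implementation, and the cutoff value is hypothetical: the AI signs off only on confidently normal exams, while suspicious or low-confidence exams go to the radiologist with no AI output attached.

```python
def triage(model_prob, normal_cutoff=0.1):
    """Decision-referral triage (hypothetical threshold): return 'normal'
    only when the model is confident the exam is normal; otherwise refer
    the exam to the radiologist without revealing the AI's inference."""
    return "normal" if model_prob < normal_cutoff else "refer_to_radiologist"

# Hypothetical per-exam probabilities of being suspicious for cancer.
exams = [0.02, 0.05, 0.4, 0.92, 0.08]
decisions = [triage(p) for p in exams]
print(decisions)
# -> ['normal', 'normal', 'refer_to_radiologist', 'refer_to_radiologist', 'normal']
referral_rate = decisions.count("refer_to_radiologist") / len(decisions)
print(referral_rate)  # -> 0.4
```

In practice the cutoff would be tuned on validation data to trade off the radiologist's workload against the risk of the AI missing a cancer in the exams it clears on its own.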
On its own, the AI model achieved a sensitivity of 84.2% and a specificity of 89.5% on internal-test data and a sensitivity of 84.6% and a specificity of 91.3% on external-test data. However, the radiologist achieved a sensitivity of 85.7% and a specificity of 93.4% on the internal-test dataset and a sensitivity of 87.2% and a specificity of 93.4% on the external-test dataset.
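Sensitivity and specificity are standard confusion-matrix rates. As a quick reference, the snippet below computes both; the counts are illustrative only, chosen so that they reproduce the AI's internal-test rates, not taken from the paper:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): the fraction of cancers detected.
    Specificity = TN / (TN + FP): the fraction of normal exams cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative counts only (the study reports rates, not this confusion matrix).
sens, spec = sensitivity_specificity(tp=842, fn=158, tn=895, fp=105)
print(f"{sens:.1%}, {spec:.1%}")  # -> 84.2%, 89.5%
```

In screening, sensitivity governs how many cancers are missed, while specificity governs how many healthy patients are recalled unnecessarily, which is why the study tracks both.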
Although the radiologist outperformed the AI when each worked alone, the researchers found that the combined decision-referral approach achieved the highest performance, improving both sensitivity and specificity. The researchers say radiologists could use this AI partnership to reduce their workload without affecting their performance.
IBM researchers have developed Task2Sim, an AI model that learns to generate synthetic, task-specific data for pretraining image-classification models. The researchers made use of ThreeDWorld, a simulation platform built on the Unity graphics engine that renders images of realistic objects and scenes. Deep learning models learn to make predictions and decisions based on patterns extracted from billions of real-world examples. However, health information, financial information, consumer information, and online material are all covered by copyright, ethical, and privacy rules.
Moreover, other forms of data carry high curation costs, biases, and built-in vulnerabilities, which have led to well-known failures such as chatbots making offensive remarks about gender and ethnicity and résumé screeners excluding competent job applicants. To address these issues, the researchers turned to synthetically generated data, which will represent 60% of the data used in training AI models by 2024, according to a Gartner prediction published in The Wall Street Journal.
There are two ways to generate synthetic images: using generative models, AI systems that learn from data and can drastically cut the time it takes to find new options to test, and using graphics engines, image-rendering systems whose output is used to train task-specific models. The latter technology is widely used to train self-driving cars and warehouse robots.
The IBM researchers chose the latter method and demonstrated some advantages of using synthetic data: First, in a virtual environment, creating images from scratch presents fewer challenges than manipulating real-world data, such as the tiresome task of categorizing what is in each picture. Second, you can control the parameters of synthetic data—the background, lighting, and the way objects are posed. Third, you can generate unlimited training data, and you get labels for free.
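The "controlled parameters, unlimited data, free labels" point can be made concrete with a deliberately tiny example. Task2Sim itself drives a full graphics engine (ThreeDWorld); the NumPy sketch below is only a toy stand-in that renders squares or disks with randomized position and size, where the class label is known by construction and never needs manual annotation:

```python
import numpy as np

def make_example(rng, size=32):
    """Render one synthetic grayscale image of a square or a disk at a
    random position, and return (image, label). Because we generated the
    shape ourselves, the label comes for free -- no annotation step."""
    img = rng.uniform(0.0, 0.2, (size, size))  # controllable background noise
    label = int(rng.integers(2))               # 0 = square, 1 = disk
    cx, cy = rng.integers(8, size - 8, 2)      # controllable object position
    r = int(rng.integers(3, 6))                # controllable object size
    if label == 0:
        img[cx - r:cx + r, cy - r:cy + r] = 1.0
    else:
        yy, xx = np.ogrid[:size, :size]
        img[(xx - cx) ** 2 + (yy - cy) ** 2 <= r * r] = 1.0
    return img, label

rng = np.random.default_rng(0)
images, labels = zip(*(make_example(rng) for _ in range(100)))
print(len(images), images[0].shape, sorted(set(labels)))
```

Every knob in the generator (lighting stands in for background noise here, pose for position and radius) is explicit and scriptable, which is exactly what makes graphics-engine data attractive for pretraining.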
In the next stage, the researchers will investigate whether classifiers trained on Task2Sim-generated data can surpass those trained on real data. They also plan to use synthetic data for more difficult vision tasks, such as understanding how people and animals interact with their environment in different situations.
Machine learning models are excellent tools in many scientific fields, but they’re frequently only used to solve particular problems. For quantitative reasoning, however, Minerva has no explicit underlying mathematical structure. This makes it challenging for researchers to automatically verify and identify any flawed reasoning processes the model may have used to arrive at its final responses. However, the researchers are redoubling their efforts to address this drawback.
Moreover, synthetic data may offer a safer method of learning about the physical world. The complexity and unpredictability of our world may be better understood by AI models if they can be taught how people, objects, and animals behave in a virtual environment. AI models trained on synthetic data from these simulations will need the participation of many stakeholders (data practitioners, social scientists, behavioral economists, psychologists, and more) to help interpret and assess the viability of such systems.
Until next time, stay informed and get involved!