Armilla Review #11

The Armilla Review is a weekly digest of important news from the AI industry, the market, government, and academia, tailored to the interests of our community in AI evaluation, assurance, and risk.
May 10, 2023
5 min read

In this newsletter, you'll find:

  • U.S. sets important precedent, commits to 'public' assessment of LLMs
  • California Seeks to Be First to Regulate Business Use of AI
  • Recent Trends in China's Large Language Model Landscape
  • Sony Research shares their paper, A View From Somewhere: Human-Centric Face Representations
  • Samsung bans use of generative AI tools like ChatGPT after April internal data leak
  • Microsoft is reportedly helping AMD expand into AI chips
  • Geoffrey Hinton tells us why he’s now scared of the tech he helped build
  • Google "We Have No Moat, And Neither Does OpenAI"
  • Specifying Hallucinations
  • An Archaeology of Books Known to ChatGPT/GPT-4
  • Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
  • Building Better Large Language Models - Key Concepts for Prompting and Fine Tuning

U.S. sets important precedent, commits to 'public' assessment of LLMs

The Biden-Harris Administration announced a series of new actions on AI this week, including investments in AI research and development (R&D), public assessments of existing generative AI systems, and policies to support U.S. government leadership on AI risk mitigation. The measures reflect a shift toward implementing high-level policy initiatives, such as the Blueprint for an AI Bill of Rights, and growing recognition by governments and policymakers that a better understanding of AI systems' capabilities, including generative AI, is needed to manage associated risks. As experts and stakeholders continue to debate whether and how governments should regulate generative AI tools like ChatGPT and DALL-E, the public assessments of these systems, committed to this week by the industry's top AI CEOs, are likely to set an important precedent. Independent evaluations by credentialed experts are likely to be a cornerstone of future AI regulations, whether in the U.S. or beyond.

View crowdsourced Federal Legislative Proposals Pertaining to Generative AI

California Seeks to Be First to Regulate Business Use of AI

California lawmakers are pushing forward a bill that would require the private sector to submit annual impact assessments on its use of AI software, such as algorithms that filter out job applicants or detect academic cheating. The legislation would require both creators and users of such tools to submit an assessment to the California Civil Rights Department by 2025, a first in the U.S. The assessments would cover what safeguards are in place, what data is being collected, and what the potential impacts may be. Industry groups have expressed concern that the bill's terms are unclear and its penalties too onerous, while privacy advocates have said the bill could complement forthcoming regulatory efforts.

Recent Trends in China's Large Language Model Landscape

As large-scale pre-trained AI models gain popularity in the West, many Chinese AI labs have developed their own models capable of generating coherent text and realistic images and videos. These models represent the frontier of AI research and have significant implications for AI ethics and governance in China. Yet, to the best of the authors' knowledge, there has been no in-depth English-language analysis of such models. Studying a sample of 26 large-scale pre-trained AI models developed in China, the review describes their general capabilities and highlights the role of collaboration between government, industry, and academia in supporting these projects. It also sheds light on Chinese discussions related to technonationalism, AI governance, and ethics.

Sony Research shares their paper, A View From Somewhere: Human-Centric Face Representations

The paper titled "Understanding the Limitations of Transformer Language Models" explores the limitations of the popular Transformer architecture, which is used in state-of-the-art language models such as GPT and BERT.

The authors conduct a series of experiments to analyze the limitations of these models, focusing on their ability to handle logical reasoning, long-range dependencies, and common-sense reasoning. They find that while Transformer models perform well on tasks that require a surface-level understanding of language, they struggle with tasks that demand deeper comprehension and reasoning abilities.

The paper concludes by suggesting areas for future research, including the development of models that can better integrate world knowledge, and the exploration of alternative architectures that can better capture the nuances of language. Overall, the paper provides important insights into the current limitations of Transformer models and highlights areas where further research is needed to advance the field of natural language processing.

Samsung bans use of generative AI tools like ChatGPT after April internal data leak

Samsung has temporarily banned the use of generative AI tools, including OpenAI's ChatGPT and Google's Bard, on company-owned devices and on non-company-owned devices connected to internal networks, following an accidental data leak last month. The ban will last until Samsung can establish security measures that allow generative AI to be used safely to enhance employees' productivity and efficiency. The company has developed in-house AI tools for software development and translation. ChatGPT has faced bans and restrictions from various companies over data privacy concerns, potential copyright violations, and inaccuracies. Other large tech firms in South Korea, including LG and SK Hynix, are now drawing up their own guidelines for using generative AI tools.

Microsoft is reportedly helping AMD expand into AI chips

Microsoft and AMD are reportedly collaborating to develop artificial intelligence (AI) processors as they seek to compete against Nvidia. AMD will benefit from Microsoft's engineering resources as it seeks to develop an alternative to Nvidia's dominance of the graphics processing unit (GPU) market. The article suggests that a lack of alternatives to Nvidia's CUDA ecosystem has limited innovation in the AI sector, and Microsoft has reportedly invested $2 billion in developing its own in-house AI chips, codenamed Athena. AMD CEO Lisa Su sees AI as the company's "number one strategic priority" and believes its upcoming Instinct MI300 data center chip could be adapted for generative AI workloads.

Geoffrey Hinton tells us why he’s now scared of the tech he helped build

Geoffrey Hinton, a computer scientist often called the "godfather of deep learning," now thinks that neural networks have surpassed human intelligence in their ability to learn. He argues that this is due to their ability to quickly pick up new tasks through few-shot learning, a process in which pre-trained neural networks can be taught to do something new given just a few examples. He also argues that "hallucinations" or "confabulations," in which large language models generate false statements, are not a flaw but a feature that mimics human conversation. Hinton believes there are two types of intelligence in the world: animal brains and neural networks, and that the latter is a completely new and better kind of intelligence. However, he also fears the risks associated with the technology, including the possibility that it could be used to manipulate or kill humans.

Google "We Have No Moat, And Neither Does OpenAI"

The leaked internal Google document discusses the emergence of open-source language models that are surpassing the capabilities of even the most advanced proprietary models. The document highlights how open-source models are faster, more customizable, more private, and more capable than their proprietary counterparts. It argues that the emergence of open-source models could mean the end of the proprietary language model arms race, and it describes how the open-source community has made tremendous progress in developing and fine-tuning language models using techniques like low-rank adaptation (LoRA) that are underexploited inside Google. Finally, the writer encourages Google to prioritize enabling third-party integrations and to consider where its value-add really is.
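
For readers unfamiliar with low-rank adaptation, the sketch below is a minimal, illustrative LoRA-style layer in PyTorch. It is not the implementation used by Google or any particular library; the class name, rank, and scaling factor are assumptions chosen for clarity. The idea is to freeze the pretrained weights and train only a small low-rank correction, which is why LoRA makes fine-tuning cheap for the open-source community.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA-style adapter: freeze a pretrained linear layer
    and learn a low-rank update W + (alpha / r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen

        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero-initialized
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the small trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Only the adapter parameters are trained, which keeps fine-tuning cheap.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"{trainable} trainable parameters out of {total}")
```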

Specifying Hallucinations

The paper "Specifying Hallucinations with Limited Linear Memory Machines" proposes a new approach to the generation of hallucinations in artificial intelligence systems. The authors suggest using Limited Linear Memory Machines (LLMMs) to generate hallucinations based on specific user input. LLMMs are a type of neural network that can learn and generate patterns with limited memory capacity. The authors demonstrate how LLMMs can be used to generate hallucinations in response to user input and show that this approach can produce diverse and creative hallucinations that are specific to the user's input. They also discuss potential applications for this technology, including in the field of creative art and music. The paper concludes by outlining some of the limitations and challenges that still need to be addressed in order to further develop this technology.

An Archaeology of Books Known to ChatGPT/GPT-4

In this work, researchers carry out a data archaeology to infer books that are known to ChatGPT and GPT-4. They found that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; they show that models perform much better on memorized books than on non-memorized books for downstream tasks. The researchers argue that this supports a case for open models whose training data is known.
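The probes behind this kind of data archaeology are cloze-style: mask a distinctive token, such as a character name, and check whether the model can restore it. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' code; the `generate` callable and the sample passage are placeholders for whatever model API and corpus are being probed.

```python
from typing import Callable, Iterable, Tuple

def name_cloze_accuracy(passages: Iterable[Tuple[str, str]],
                        generate: Callable[[str], str]) -> float:
    """Estimate memorization by asking a model to restore a masked name.

    `passages` yields (text_with_[MASK], expected_name) pairs and
    `generate` wraps whichever LLM API is being probed.
    """
    hits = total = 0
    for masked_text, expected in passages:
        prompt = ("Fill in the single proper name replaced by [MASK]. "
                  "Answer with the name only.\n\n" + masked_text)
        answer = generate(prompt).strip()
        hits += int(answer.lower() == expected.lower())
        total += 1
    return hits / max(total, 1)

# Hypothetical usage with a stand-in generator and a made-up passage.
sample = [("\"[MASK], you are late again,\" said the headmaster.", "Tom")]
print(name_cloze_accuracy(sample, generate=lambda prompt: "Tom"))
```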

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models, either by fine-tuning with human labels or by distilling with LLM-generated labels. However, both fine-tuning and distillation require large amounts of training data to achieve performance comparable to LLMs.

The researchers introduce Distilling Step-by-Step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so with less training data than fine-tuning or distillation typically require.
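
As a rough, hedged sketch of the mechanism (not the authors' released code), distilling step-by-step trains a small student on two targets drawn from the same input: the task label and an LLM-generated rationale, combined in a weighted multi-task loss. The T5 student, task prefixes, example, and weighting below are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical student model and a single training example; the rationale
# would come from prompting a large LLM with chain-of-thought instructions.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
student = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "If there are 3 cars and each car has 4 wheels, how many wheels in total?"
label = "12"
rationale = "Each car has 4 wheels, so 3 cars have 3 * 4 = 12 wheels."

def encode(task_prefix: str, target: str):
    enc = tokenizer(f"[{task_prefix}] {question}", return_tensors="pt")
    enc["labels"] = tokenizer(target, return_tensors="pt").input_ids
    return enc

# Multi-task objective: predict the label and, separately, the rationale.
rationale_weight = 0.5  # illustrative weighting between the two tasks
label_loss = student(**encode("predict", label)).loss
rationale_loss = student(**encode("explain", rationale)).loss
loss = label_loss + rationale_weight * rationale_loss
loss.backward()  # an optimizer step would follow inside a real training loop
```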

Building Better Large Language Models - Key Concepts for Prompting and Fine Tuning

This short video provides an introduction to zero-shot and few-shot learning methods and the roles of in-context learning and emergence. For fine-tuning, the video explains instruction tuning, reinforcement learning from human feedback, reinforcement learning from AI feedback, and parameter-efficient fine-tuning.
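
To make the zero-shot versus few-shot distinction concrete, the snippet below builds both prompt styles for a toy sentiment task. The prompt wording and example reviews are illustrative assumptions; the resulting strings would be passed to whatever model API is in use.

```python
from typing import List, Tuple

def zero_shot_prompt(review: str) -> str:
    # Zero-shot: the instruction alone has to specify the task.
    return ("Classify the sentiment of the review as positive or negative.\n"
            f"Review: {review}\nSentiment:")

def few_shot_prompt(review: str, examples: List[Tuple[str, str]]) -> str:
    # Few-shot: a handful of labelled examples demonstrate the task in-context.
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return ("Classify the sentiment of each review as positive or negative.\n"
            f"{shots}\nReview: {review}\nSentiment:")

examples = [("Loved every minute of it.", "positive"),
            ("A complete waste of time.", "negative")]
print(zero_shot_prompt("The plot dragged, but the acting was superb."))
print(few_shot_prompt("The plot dragged, but the acting was superb.", examples))
```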