
What is a Large Language Model?


Large Language Models (LLMs) are silicon brains that can produce and analyse language. They do this by learning the statistical associations between billions of words and phrases. LLMs are the foundation for all AutogenAI’s Language Engines.
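To make “statistical associations” concrete, here is a deliberately tiny sketch in Python: it counts which word tends to follow which in a toy corpus and uses those counts to predict the next word. Real LLMs learn far richer patterns with neural networks rather than simple counts, and the corpus and function names below are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Toy illustration of "learning statistical associations between words".
# Real LLMs use neural networks with billions of parameters, but the core
# idea -- predicting the next word from the words that came before -- is the same.
corpus = (
    "the tender was strong the tender was clear the tender was late "
    "the proposal was strong the proposal was weak"
).split()

# Count how often each word follows each other word in the corpus.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the toy corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # 'tender' (follows 'the' three times vs twice for 'proposal')
print(predict_next("was"))  # 'strong' (follows 'was' twice, more than any other word)
```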

Large Language Models (LLMs)

Cutting-edge LLMs are pre-trained on nearly the entire corpus of digitised human knowledge. This includes Common Crawl, Wikipedia, digital books and other internet content.

The table below summarises what modern LLMs have ‘read’…

Data Source | Number of words
Common Crawl | 580,000,000,000
Books | 26,000,000,000
Wikipedia | 94,000,000,000
Other web text | 26,000,000,000
TOTAL | 726,000,000,000

LLMs have ‘read’ over 700,000,000,000 words. A human reading one word every second, non-stop, would take around 23,000 years to achieve the same feat.
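For readers who want to check the arithmetic, here is a quick back-of-the-envelope calculation using the figures in the table above:

```python
# Back-of-the-envelope check of the "23,000 years" claim, using the word
# counts from the table above.
total_words = 580e9 + 26e9 + 94e9 + 26e9   # 726 billion words in total
words_per_year = 60 * 60 * 24 * 365        # one word per second, non-stop

print(f"{total_words / words_per_year:,.0f} years")  # roughly 23,000 years
```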

LLMs are now capable of generating text that is sophisticated enough to complete scientific papers. They can output computer code that is superior to that written by many expert developers. Elon Musk has described LLMs as “the most important advance in Artificial Intelligence”.

Large Language Model timeline and growth
The first modern large language models appeared in 2018, and they have been growing exponentially in size (measured by the number of parameters they contain) ever since.

Exponential growth means the number of parameters added each year keeps getting larger. Parameters are the numbers a model learns during training, and the largest models now contain hundreds of billions of them; it is this scale that gives them their impressive capabilities.
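As a rough illustration of where such parameter counts come from, the sketch below applies a commonly used approximation for transformer-based models (parameters ≈ 12 × layers × width²) to GPT-3’s publicly reported dimensions; it ignores embeddings and other smaller components, so treat the result as indicative only.

```python
# Rough estimate of a large transformer's parameter count using the common
# approximation: parameters ~ 12 x layers x width^2 (embeddings ignored).
# The dimensions below are GPT-3's publicly reported figures.
n_layers = 96      # number of transformer layers
d_model = 12288    # width of each layer's internal representation

approx_params = 12 * n_layers * d_model ** 2
print(f"about {approx_params / 1e9:.0f} billion parameters")  # about 174 billion
```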

In 1965, American engineer Gordon Moore famously predicted that the number of transistors (electronic switches) per silicon chip (the building block of all modern electronic devices) would double every year.

The new “Moore’s Law” for LLMs suggests an approximately eight-fold increase in the number of parameters every year, potentially yielding a similar increase in performance (although recent evidence suggests this might not be the case and that we might have reached the limit of performance improvements purely due to size).
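Purely to illustrate how quickly an eight-fold annual increase compounds, here is a small calculation; the 1.5-billion-parameter starting point and three-year horizon are arbitrary assumptions, not a forecast.

```python
# Compounding an eight-fold annual increase in parameter count.
# The 1.5-billion-parameter starting point is an arbitrary baseline
# chosen only to show how quickly the growth snowballs.
params = 1.5e9
for year in range(1, 4):
    params *= 8
    print(f"after year {year}: {params / 1e9:,.0f} billion parameters")

# after year 1: 12 billion parameters
# after year 2: 96 billion parameters
# after year 3: 768 billion parameters
```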


Performance and training

Most meaningful assessments of the output quality of current LLMs are subjective. This is perhaps inevitable when dealing with language. LLMs have written convincing articles, such as this one published in The Guardian.

LLMs have surpassed expert human-level performance in some quantitatively assessed writing tasks, such as machine translation, next-token prediction (content generation), and even some computer programming assignments.

It currently costs approximately $10m to train an LLM, and the process takes around a month. The research and development costs for the most sophisticated models, such as OpenAI’s GPT-3, are unknown but are likely much higher. Microsoft invested $1bn in OpenAI in 2019.

The high training costs of LLMs are primarily related to the huge amounts of computing power required to find the best model parameters across such vast amounts of data.
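To see why the compute bill is so large, a commonly quoted rule of thumb estimates total training compute as roughly six floating-point operations per parameter per training token; the sketch below applies it to GPT-3’s reported scale, and the figures are indicative rather than exact.

```python
# Rule-of-thumb estimate of total training compute: ~6 floating-point
# operations per parameter per training token. The figures below are
# GPT-3's reported scale and are indicative only.
params = 175e9   # model parameters
tokens = 300e9   # words/tokens seen during training

flops = 6 * params * tokens
print(f"{flops:.2e} floating-point operations")  # 3.15e+23
```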

Other models, such as recurrent neural networks and classical machine learning models, have had access to the same amount of data as LLMs, but none has managed to surpass human-level performance on these tasks in the way that LLMs have.

What does the future hold?

LLMs look set to transform industries that rely on text, including translation services and copywriting. They are already being deployed in next-generation chatbots and virtual assistants. The business I work for, AutogenAI, is deploying specifically trained, enterprise-level large language models to speed up and improve the production of tenders, bids and proposals.