Language Engines are powered by large language models. These models are trained to produce the highest probability next word based on the preceding words. For example, given the words “The cat sat on the”, the highest probability next word is “mat”.
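To make this concrete, the sketch below uses the openly available GPT-2 model via the Hugging Face transformers library – our choice purely for illustration, not the model behind any particular Language Engine – to list the most probable next words for that phrase.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small, openly available model purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The cat sat on the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# Turn the scores at the final position into probabilities for the next word.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```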

Sometimes the highest probability next word will be morally undesirable. This can happen when the predicted word reflects biased, offensive or gendered assumptions – for example, assuming that a professor is male.

Language Engines can be configured to try and avoid these biases using a number of techniques. At AutogenAI we are working to build Language Engines that reflect the modern world as we would want it to be – diverse, inclusive and welcoming. Below are some of the technical ways that we and other machine learning engineers are doing this:

Inclusive data

Large Language Models (LLMs) are trained using only a subset of all available text data. The job of cleaning up the “Common Crawl corpus” – a vast web archive consisting of petabytes of data collected since 2011 – into what is known as “The Colossal Clean Crawled Corpus” is detailed in this paper. Removing offensive and inaccurate content before training the models goes some way to eliminating bias.
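As a rough illustration of what that clean-up involves, the sketch below applies a few heuristic filters in the spirit of those described in the C4 paper – sentence-like lines, a minimum page length, a word blocklist. The word list and thresholds here are placeholders; the real pipeline runs many more rules at petabyte scale.

```python
from typing import Optional

# Placeholder blocklist; the real pipeline uses a published list of
# offensive words and many additional quality filters.
BLOCKLIST = {"offensiveword1", "offensiveword2"}

def clean_page(text: str) -> Optional[str]:
    """Return cleaned page text, or None if the page should be discarded."""
    lines = [line.strip() for line in text.splitlines()]
    # Keep lines that look like real sentences: reasonably long and
    # ending in terminal punctuation.
    lines = [
        line for line in lines
        if len(line.split()) >= 5 and line.endswith((".", "!", "?", '"'))
    ]
    if len(lines) < 3:
        return None  # too little usable text on the page
    cleaned = "\n".join(lines)
    if any(word in cleaned.lower() for word in BLOCKLIST):
        return None  # drop pages containing blocked words
    return cleaned
```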

Human evaluation and re-training

Humans are increasingly being used to manually evaluate the output produced by LLMs for accuracy and inclusivity. This human feedback is fed back into the models, training them to make fewer factual errors, avoid offensive and toxic language, and produce more relevant and diverse responses.

OpenAI trained their large language model using the following three steps:

Evaluation and retraining of an LLM

A diagram illustrating the three steps to evaluate and retrain an LLM: (1) supervised fine-tuning, (2) reward model training, and (3) reinforcement learning via proximal policy optimization.

Credit: OpenAI
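To give a flavour of step (2), the sketch below shows the pairwise ranking loss commonly used to train a reward model on human preference data. Here reward_model is assumed to be any network that scores a (prompt, response) pair; this is a simplified illustration, not OpenAI’s actual training code.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_model, prompt, better_response, worse_response):
    """Pairwise ranking loss for step (2): human labellers preferred
    `better_response` over `worse_response` for the same prompt."""
    score_better = reward_model(prompt, better_response)  # scalar score
    score_worse = reward_model(prompt, worse_response)    # scalar score
    # Push the preferred response's score above the rejected one.
    return -F.logsigmoid(score_better - score_worse).mean()
```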

Word bias neutralisation

Machine learning engineers neutralise well-known biases in certain words. For example, removing gender bias from words like “babysitter” and “doctor” ensures that they are equally likely to be chosen when describing a woman’s or a man’s occupation.

This is achieved through a process that adjusts the relative positions of word vectors so that only words with intentionally gendered meanings (such as “king” and “queen” or “he” and “she”) lie along the gender direction.

Word vector bias neutralisation
An example of how the words “Doctor” and “Babysitter” can be gender-neutralised by using known gendered word vectors.
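A minimal sketch of this neutralisation step is shown below, using made-up three-dimensional vectors purely for illustration (real embeddings are learned and have hundreds of dimensions): the gender direction is estimated from an intentionally gendered pair, and the component of “doctor” along that direction is removed.

```python
import numpy as np

# Toy vectors for illustration only; real word embeddings are learned
# and have hundreds of dimensions.
he = np.array([0.8, 0.1, 0.3])
she = np.array([0.2, 0.9, 0.3])
doctor = np.array([0.7, 0.3, 0.5])

# Estimate the gender direction from an intentionally gendered pair.
gender_direction = he - she
gender_direction /= np.linalg.norm(gender_direction)

# Remove the component of "doctor" that lies along the gender direction,
# so the word no longer leans towards either end of the he-she axis.
doctor_neutral = doctor - np.dot(doctor, gender_direction) * gender_direction

print(np.dot(doctor_neutral, gender_direction))  # ~0.0: gender component removed
```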

Prompt Engineering

Altering the text used to interact with large language models – a technique known as “Prompt Engineering” – has a huge impact on the output. By continuously testing how the model responds to different inputs, AutogenAI’s Prompt Engineers remove a significant amount of output bias. Using the same technique, we can also incorporate our clients’ corporate language, win themes, values and priorities to produce company-specific outputs.
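By way of illustration, the sketch below wraps a user’s request in standing instructions before it reaches the model. The complete() call stands in for whichever LLM API is being used, and the instruction text is a hypothetical example rather than an actual AutogenAI prompt.

```python
# Hypothetical standing instruction prepended to every request.
DEBIAS_INSTRUCTION = (
    "Write in inclusive, gender-neutral language. Do not assume a person's "
    "gender, ethnicity or background from their job title or role."
)

def build_prompt(user_request: str, house_style: str = "") -> str:
    """Prepend standing instructions (and optional house style) to the user's request."""
    parts = [DEBIAS_INSTRUCTION]
    if house_style:
        parts.append(f"House style: {house_style}")
    parts.append(user_request)
    return "\n\n".join(parts)

prompt = build_prompt(
    "Write a short profile of a professor of engineering.",
    house_style="Plain English, positive and client-focused.",
)
# response = complete(prompt)  # `complete` stands in for the LLM API of your choice
```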

Conclusion

Language Engines are rapidly becoming key co-producers of the written content that we consume. It is vital that those of us building Language Engines work to ensure that the text they produce is inclusive and reflects the diversity of the society we live in.

AutogenAI’s team of fine-tuners, prompt engineers, developers and writing specialists works with our clients to use Language Engines responsibly, filtering out plagiarism, bias and inaccuracy, while embedding company beliefs and values into all content produced.