Transformer (AI Architecture)

Simple Definition

The transformer is a type of neural network architecture that has become the foundation for almost all modern AI language models. GPT, Claude, Gemini, and Llama all use transformer-based architectures.

It was introduced in the 2017 paper “Attention Is All You Need” by researchers at Google, originally for machine translation, and it quickly became the dominant architecture in natural language processing.

What Made Transformers Different

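Before transformers, most leading language models were recurrent neural networks (RNNs), such as LSTMs, which processed text sequentially: one word at a time, left to right. Each step had to wait for the previous one, which made training slow. Information about an early word also had to be carried through every intermediate step, which made long-range relationships between words hard to capture.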
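To make the sequential bottleneck concrete, here is a minimal sketch of a toy recurrent update. The scalar hidden state and the fixed weights `w_h` and `w_x` are illustrative assumptions; real recurrent networks such as LSTMs use vectors, learned weight matrices, and gating.

```python
import math


def rnn_hidden_states(inputs, w_h=0.5, w_x=1.0):
    """Toy scalar recurrent network; w_h and w_x are hypothetical fixed weights.

    Each hidden state h_t is computed from the previous state h_{t-1},
    so step t cannot start until step t-1 has finished.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)  # the new state depends on the old one
        states.append(h)
    return states


# Feed a signal only at the first position, then zeros.
states = rnn_hidden_states([1.0, 0.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in states])
```

In this toy setup the first input's influence on the hidden state shrinks at every step, an exaggerated picture of why distant relationships were hard to carry through a long chain.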

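Transformers are built around self-attention: every word (more precisely, every token) in a sequence can directly look at every other word in a single step, instead of receiving information passed along a chain. In GPT-style models, which generate text left to right, each token looks only at the tokens before it. This helps a model handle a sentence like “The trophy didn’t fit in the bag because it was too big,” where working out that “it” refers to “trophy” rather than “bag” requires connecting words across the whole sentence.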
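A minimal sketch of the all-pairs computation in plain Python. As a simplifying assumption, queries and keys are the raw token vectors; a real transformer first multiplies them by learned projection matrices and runs several attention heads in parallel. The `causal` flag (a hypothetical parameter name) mimics the mask GPT-style models apply so each token sees only earlier tokens.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(vectors, causal=False):
    """Return an n x n matrix; row i says how much token i attends to each token j.

    Simplification (assumption): queries and keys are the raw vectors themselves.
    """
    d = len(vectors[0])
    n = len(vectors)
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: token i may not see later token j
            else:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                scores.append(dot / math.sqrt(d))  # scaled dot-product score
        weights.append(softmax(scores))
    return weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = self_attention_weights(tokens)               # every token sees every token
masked = self_attention_weights(tokens, causal=True)  # GPT-style: only earlier tokens
```

Each row of `full` is a distribution over all three tokens, while row 0 of `masked` puts all its weight on the first token, because that token has nothing earlier to look at.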

Key Component: Attention

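The “attention mechanism” is the core building block. For each word, the model scores how relevant every word it can attend to is, turns those scores into weights that sum to 1, and builds an updated representation of the word as a weighted mix of information from those words.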
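For reference, the 2017 paper defines this weighting as scaled dot-product attention. In the single-head case, $X$ holds one row per token, $W^Q$, $W^K$, $W^V$ are learned matrices, and $d_k$ is the dimension of the key vectors; the softmax is applied row by row, so each token's weights sum to 1.

```latex
Q = X W^{Q}, \qquad K = X W^{K}, \qquad V = X W^{V}

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```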

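For example, in “The cat sat on the mat,” a model that can see the whole sentence might learn to have “sat” attend strongly to “cat” (who’s sitting?) and “mat” (sat on what?). Real attention patterns are learned from data and are rarely this clean.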
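A toy numeric version of this example, as a sketch only. The key vectors and the query for “sat” are hand-picked, hypothetical values chosen to make the idea visible; they do not come from any trained model, and real learned vectors are not this interpretable.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Hypothetical, hand-picked key vectors (NOT taken from any trained model).
# Dimension 0 ~ "could be the one sitting", dimension 1 ~ "could be a place",
# dimension 2 ~ "function word".
keys = {
    "The": [0.0, 0.0, 1.0],
    "cat": [1.0, 0.0, 0.0],
    "sat": [0.0, 0.0, 0.0],
    "on": [0.0, 0.0, 1.0],
    "the": [0.0, 0.0, 1.0],
    "mat": [0.0, 1.0, 0.0],
}

# Hypothetical query for "sat": it "looks for" a sitter (dim 0) and a place (dim 1).
query_sat = [2.0, 2.0, 0.0]

d = len(query_sat)
scores = [sum(q * k for q, k in zip(query_sat, keys[t])) / math.sqrt(d) for t in tokens]
weights = dict(zip(tokens, softmax(scores)))

for t in tokens:
    print(f"{t:>4}: {weights[t]:.2f}")
```

With these made-up vectors, “cat” and “mat” each receive about 0.31 of the attention weight and every other token about 0.10.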

Why Transformers Scale So Well

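During training, a transformer processes all tokens of a sequence in parallel, whereas a recurrent model must step through them one at a time. This maps well onto GPUs and other accelerators built for large matrix multiplications, so transformers train much faster on the same hardware. That efficiency is a major reason it became practical to train models on hundreds of billions of tokens or more. Generating new text is still sequential, one token at a time.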
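A minimal sketch of the property that makes parallel training possible, assuming the whole training text is known in advance. The helper `attend_position` is hypothetical, and as a simplification the raw input vectors serve as queries, keys, and values. Because each position's output depends only on the inputs, never on other positions' outputs, the positions can be computed in any order with identical results.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_position(i, xs):
    """Causal self-attention output for position i (hypothetical helper).

    Simplification: raw input vectors serve as queries, keys, and values.
    The result depends only on the inputs xs[0..i].
    """
    d = len(xs[0])
    scores = [sum(a * b for a, b in zip(xs[i], xs[j])) / math.sqrt(d) for j in range(i + 1)]
    w = softmax(scores)
    # Weighted mix of the visible input vectors, one component at a time.
    return [sum(w[j] * xs[j][k] for j in range(i + 1)) for k in range(d)]


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Compute every position left to right...
in_order = [attend_position(i, xs) for i in range(len(xs))]

# ...and again in a random order. No position waits on another, so the
# results match; in practice frameworks compute all positions at once.
order = list(range(len(xs)))
random.shuffle(order)
shuffled = {i: attend_position(i, xs) for i in order}

assert all(shuffled[i] == in_order[i] for i in range(len(xs)))
```

Contrast this with a recurrent loop, where computing the state for position 3 requires the state for position 2 to exist first.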

Related Terms

LLM — large language models built on transformers
Neural Network — the broader category transformers belong to
Deep Learning — the field transformers are central to
GPT — OpenAI’s transformer-based model family

Last updated: May 28, 2026
