The Components of LLMs.
The Immense Potential.
The underlying technology of LLMs is called the transformer neural network, often referred to simply as a transformer.
Transformers have provided a significant leap in the capabilities of LLMs.
Without them, the current generative AI revolution wouldn’t be possible.
Transformers build on the same encoder-decoder architecture used by earlier recurrent and convolutional neural networks. Such an architecture aims to discover statistical relationships between the tokens of a text.
This is done through a combination of embedding techniques. Embeddings are representations of units of text, such as tokens, sentences, paragraphs, or documents, in a high-dimensional vector space, where each dimension corresponds to a learned feature or attribute of the language.
The embedding process takes place in the encoder.
Due to the huge size of LLMs, creating these embeddings takes extensive training and considerable resources.
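To make this concrete, here is a minimal sketch of an embedding layer, assuming PyTorch. The vocabulary size and embedding dimension are arbitrary illustrative values, not those of any real LLM.

```python
# A minimal sketch of token embeddings, assuming PyTorch.
# The vocabulary size and embedding dimension are illustrative, not real LLM values.
import torch
import torch.nn as nn

vocab_size = 10_000      # number of distinct tokens the model knows
embedding_dim = 512      # each token becomes a 512-dimensional vector

embedding = nn.Embedding(vocab_size, embedding_dim)

# A toy "sentence" of token IDs (in practice produced by a tokenizer).
token_ids = torch.tensor([[12, 845, 97, 3021]])

vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 4, 512]) -- one learned vector per token
```

During training, the values in each vector are adjusted so that tokens used in similar contexts end up with similar representations.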
What makes transformers different from previous neural networks is that the embedding process is highly parallelizable, which enables more efficient processing.
This is possible thanks to the attention mechanism.
Recurrent and convolutional neural networks make their word predictions based exclusively on previous words.
In this sense, they can be considered unidirectional.
The attention mechanism allows transformers to predict words bidirectionally, that is, based on both the previous and the following words. The goal of the attention layer, which is incorporated into both the encoder and the decoder, is to capture the contextual relationships between the different words in the input sentence.
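The sketch below shows the scaled dot-product attention at the heart of this mechanism, again assuming PyTorch. The dimensions and the random projection matrices are placeholders; a real transformer learns these projections and uses multiple attention heads. Because there is no causal mask, every token attends to every other token, before and after it, which is what makes the encoder's view of context bidirectional.

```python
# A rough sketch of scaled dot-product self-attention, assuming PyTorch.
# Dimensions and weights are illustrative; real models learn the projections
# and use multiple attention heads.
import math
import torch
import torch.nn.functional as F

seq_len, d_model = 4, 512                 # illustrative sizes
x = torch.randn(1, seq_len, d_model)      # stand-in for the token embeddings

# Learned projections in a real model; random weights here for illustration.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.transpose(-2, -1) / math.sqrt(d_model)  # token-to-token affinities
weights = F.softmax(scores, dim=-1)                    # attention weights sum to 1 per token
context = weights @ V                                  # context-aware representation per token

print(weights.shape)   # torch.Size([1, 4, 4]) -- one weight per pair of tokens
print(context.shape)   # torch.Size([1, 4, 512])
```

Because the attention weights for all tokens are computed as a single matrix product, the whole sentence can be processed in parallel rather than word by word, which is what the earlier point about parallelizability refers to.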
Training transformers involves two steps: pre-training and fine-tuning.
In the pre-training phase, transformers are trained on large amounts of raw text data.
The Internet is the primary data source.
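A conceptual sketch of what pre-training on raw text looks like is shown below, assuming PyTorch. `TinyLanguageModel` is a hypothetical placeholder, the batch of token IDs is random stand-in data, and causal masking is omitted for brevity; the point is only to show the next-token-prediction objective used on raw text.

```python
# A conceptual sketch of next-token-prediction pre-training, assuming PyTorch.
# `TinyLanguageModel` is a hypothetical placeholder, not a real library class,
# and the batch below is random stand-in data rather than real Internet text.
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512

class TinyLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        return self.lm_head(self.encoder(self.embed(token_ids)))

model = TinyLanguageModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One toy batch of "raw text" already converted to token IDs.
batch = torch.randint(0, vocab_size, (2, 16))
inputs, targets = batch[:, :-1], batch[:, 1:]   # each position predicts the next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(loss.item())
```

Fine-tuning then repeats this kind of loop on a much smaller, task-specific dataset, adapting the pre-trained weights rather than learning them from scratch.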