The 2023 AI landscape has been massively dominated by Large Language Models, starting with the release of ChatGPT in late 2022, which broke the tech adoption record with the fastest-growing user base of any consumer application.
Large Language Models are a subset of Generative AI, which, as the name states, refers to systems that can generate content in different formats: tabular data, text, images, etc. LLMs are trained on large amounts of textual data, which makes them very efficient at generating human-like answers.
In this article, I will discuss the definition of LLMs and their evolution and applications, with a special focus on the main concepts, from pre-training and fine-tuning to prompt engineering.
The summary is as follows:
- LLMs
- Pre-training
- Fine-tuning
- Prompt engineering
- Applications
LLMs
LLMs are deep learning models trained on large amounts of textual data (300 billion words, roughly 570 GB, for ChatGPT) scraped from the internet, allowing them to capture massive patterns within languages and outperform all previous techniques and models for next-word prediction.
Language models have evolved massively over the last five years through different architectures, as shown in the illustration below:

Where:
- Encoders: transform a sequence of words into a sequence of numerical vectors (embeddings)
- Encoder-decoders: take text as input and generate a new sequence of words as output (translation, for example)
- Decoders: generate text as output from a given context
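To make these three families concrete, here is a minimal sketch using the Hugging Face transformers library; the specific checkpoints (bert-base-uncased, t5-small, gpt2) are illustrative choices of mine:

```python
from transformers import pipeline

# Encoder: turn a sequence of words into a sequence of vectors (embeddings).
encoder = pipeline("feature-extraction", model="bert-base-uncased")
embeddings = encoder("The capital of France is Paris.")
print(len(embeddings[0]), "token embeddings of size", len(embeddings[0][0]))

# Encoder-decoder: text in, new text out (here, translation).
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The capital of France is Paris.")[0]["translation_text"])

# Decoder: generate a continuation from a given context.
generator = pipeline("text-generation", model="gpt2")
print(generator("The capital of France is", max_new_tokens=10)[0]["generated_text"])
```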
Decoders in particular have been at the center of attention since 2020: their Transformer architecture unlocked unprecedented performance through massive unsupervised learning, large amounts of textual data, and deep layers (GPT-1, 117 million parameters). The parameters refer to the weights and biases that are learned by the neural network.
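As a quick illustration of that last sentence, the snippet below (assuming the transformers library; "openai-gpt" is the hub id under which GPT-1 is published) counts those learned weights and biases:

```python
from transformers import AutoModelForCausalLM

# GPT-1 is hosted on the Hugging Face hub under the id "openai-gpt".
model = AutoModelForCausalLM.from_pretrained("openai-gpt")

# The parameters are all the weight and bias tensors of the network.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 117 million
```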
With that being said, during training, LLMs learn two main capabilities:

- Knowledge: coming from all the information and data the model has been trained on. For example, ChatGPT 3.5 has a knowledge base scraped from across the internet up to January 2022.

- Reasoning: captured from the different patterns within the data. The LLM is able to perform classical human tasks such as information/insight extraction, problem-solving, etc.
LLMs trained on large datasets of code (e.g., GitHub) show better reasoning capabilities, which intuitively come from the patterns and logic learned from programming languages.
LLMs can also be categorized into two types:
- Base LLM: the ‘raw’ version of the model, which serves for next-word prediction. It is obtained by pre-training the deep learning model (see the Pre-training section below).
For example, if you ask the question “What is the capital of France?”, the LLM will probably output: “What is the capital of Spain?”, since it simply continues the text with likely next words, as if completing a list of quiz questions.
GPT-3 is a base large language model.
- Instruct LLM: a fine-tuned version (see the Fine-tuning section below), usually trained on a dataset of questions and answers, which mainly serves for chat tasks. For example, ChatGPT is the instruct version of GPT-3.
In this case, given the same question as before, the LLM will return: “The capital of France is Paris”.
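A rough way to see this difference in practice, using small open models as stand-ins (gpt2 for the base behavior and the instruction-tuned google/flan-t5-small; exact outputs will vary):

```python
from transformers import pipeline

prompt = "What is the capital of France?"

# Base LLM: raw next-word prediction, no instruction tuning.
base = pipeline("text-generation", model="gpt2")
print(base(prompt, max_new_tokens=15)[0]["generated_text"])
# Often just continues the text, e.g. with more quiz-style questions.

# Instruct LLM: fine-tuned to follow instructions and answer directly.
instruct = pipeline("text2text-generation", model="google/flan-t5-small")
print(instruct(prompt, max_new_tokens=15)[0]["generated_text"])
# Typically returns something like "Paris".
```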

As mentioned above, LLMs are trained for next-word prediction, which also makes them subject by default to hallucination, where the generated content is grammatically and semantically correct but does not reflect a real-life fact.
It’s a trade-off between objectivity and creativity, controlled by a hyperparameter called temperature: a higher temperature flattens the next-word probability distribution, yielding more creative but less reliable output. Hallucination can be a serious issue since it spreads false information and can have significant consequences depending on how the LLM is used.
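A minimal sketch of the temperature knob, assuming the transformers library and gpt2 as a stand-in model (the completions themselves will vary from run to run):

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # fix the random seed so the sampling is reproducible

prompt = "The first person to walk on the Moon was"

# Low temperature: sharper next-word distribution, more predictable output.
print(generator(prompt, max_new_tokens=15, do_sample=True,
                temperature=0.2)[0]["generated_text"])

# High temperature: flatter distribution, more creative but more
# hallucination-prone output.
print(generator(prompt, max_new_tokens=15, do_sample=True,
                temperature=1.5)[0]["generated_text"])
```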
Pre-training
Pre-training is the primary task of training the deep learning model on vast amounts of textual data, scraped in most cases from the internet. The model learns the structure of the language, its grammar, and the common sense embedded within it. This is the first learning step of the deep learning architecture, and it is crucial to meticulously clean the training data to prevent the model from learning biased information.
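Here is a toy sketch of the pre-training objective itself, using PyTorch; it keeps only the shift-by-one next-word-prediction loss and replaces the deep Transformer stack with a single embedding and output layer:

```python
import torch
import torch.nn.functional as F

# Toy next-word-prediction training step. A real LLM would have a deep
# Transformer between the embedding and the output head; this sketch
# keeps only the objective: predict token t+1 from tokens up to t.
vocab_size, d_model = 1000, 64
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))   # one tokenized sentence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets shifted by one

logits = head(embed(inputs))                     # (1, 15, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()  # gradients flow into the weights and biases (the parameters)
```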
It’s also worth mentioning that training a Large Language Model is very time- and money-consuming. While there is no official data, it has been estimated that GPT-4 was trained on 570 GB of data using 25,000 Nvidia A100 GPUs for approximately 100 days.
Given the distribution of subjects and domains within overall internet data, the LLM will most likely perform well on general language and struggle with domain-heavy tasks.
Fine-tuning
Fine-tuning is the task of further training the model in order to enhance its knowledge, its reasoning, or both. It comes in handy when dealing with tasks that are either:
- Singular: Q&A, summarization, etc.
- Vocabulary-specific: medical, finance, etc.
For instance, ChatGPT was obtained by fine-tuning the GPT-3 base LLM.
The fine-tuning dataset is usually structured in two main formats:
- 1st structure: (Instruction, Input, Output), which intuitively enhances the reasoning capabilities of the LLM (see the sketch after this list).
- 2nd structure: (Input, Output), which enlarges its knowledge.
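As an illustration, here are two hypothetical records, one per structure; the texts are invented for the example, and the first follows the popular Alpaca-style layout:

```python
# 1st structure: (Instruction, Input, Output), Alpaca-style record.
instruction_record = {
    "instruction": "Summarize the text below in one sentence.",
    "input": "Large Language Models are deep learning models trained on text...",
    "output": "LLMs are deep learning models trained on large text corpora.",
}

# 2nd structure: plain (Input, Output) pair.
knowledge_record = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
}
```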
Fine-tuning is an iterative process that yields higher performance, as it increases the coherence and reliability of the model and reduces hallucination. It is also much less expensive than pre-training and allows more control and transparency over the new knowledge.
Prompt Engineering
Prompts are the inputs provided to a Large Language Model (LLM) to improve the quality of its generated output. Employing prompts is a strategic approach to presenting a problem to the LLM, guiding its reflection and reasoning process toward optimal, precise results. The technique of formulating prompts is known as prompt engineering.
The prompt can have the following structure:

where:
- Context: optional textual data from which the information can be extracted; it is also used to define the tone and the role/agent of the LLM.
- Question: the query that needs to be answered.
- Instructions: the steps the LLM should follow to answer the question; also used to specify the format of the output, its length, etc.
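A minimal sketch of such a prompt assembled in Python; all the values are hypothetical, and only the Context/Question/Instructions layout matters:

```python
# Hypothetical values; only the three-part layout matters.
context = (
    "You are a financial analyst. Use only the quarterly report below.\n"
    "<report text>"
)
question = "How did revenue evolve over the last quarter?"
instructions = (
    "Answer in at most two sentences and return the result as JSON "
    "with the keys 'answer' and 'evidence'."
)

prompt = f"Context:\n{context}\n\nQuestion:\n{question}\n\nInstructions:\n{instructions}"
```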
Below are some prompting principles:
- Write clear instructions
- Check whether conditions are satisfied
- Use delimiters
- Request the output in a structured format (JSON, Markdown, …)
- Few-shot prompting: give samples of (input, output) tuples, then query a new input (see the sketch after this list)
- Thinking process:
  - Specify the steps to follow for the problem-solving task
  - Instruct the model to work out its own solution before rushing to a conclusion
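Here is a small example of few-shot prompting, with invented reviews; the model is expected to complete the last line following the pattern of the two worked examples:

```python
# Few-shot prompt: two worked (input, output) examples, then a new query.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day, I love it."
Sentiment: Positive

Review: "It broke after two days."
Sentiment: Negative

Review: "Setup was painless and the screen is gorgeous."
Sentiment:"""
```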
Prompting is an iterative process in which we clarify and refine instructions to achieve better results, a dynamic and ongoing effort to enhance the guidance given to the model.

It’s interesting to note that the choice between prompt engineering, fine-tuning, and pre-training depends primarily on the specific use case and its requirements; the necessary data and resources increase correspondingly as you move from the first to the last.
Applications
LLMs have found applications in many domains, given their capacity to generate textual content in a very human-like style. The graph below summarizes some of the most famous uses of LLMs:

- Summarization: the task of shrinking the size of a textual input; it can be based on a word limit, targeted information, or a specific audience.
- Inference: the task of extracting insights from the text, with the option of specifying the format of the output.
- Transformation: updating/re-writing the input, or even translating it.
- Expansion: one of the most interesting productivity-gain features of LLMs, allowing users to generate emails, communications, and even ideas for brainstorming sessions (see the prompt sketches after this list).
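A few illustrative prompt skeletons of my own, one per family; the placeholders in braces stand for user-supplied texts:

```python
# Illustrative prompts for the four application families.
summarize = "Summarize the review below in at most 20 words, for a non-technical audience.\n\n{review}"
inference = "Extract the sentiment (Positive/Negative) and the product name from the review below, as JSON.\n\n{review}"
transform = "Translate the email below into formal French.\n\n{email}"
expand = "Write a short, friendly reply to the customer complaint below, thanking them for their feedback.\n\n{complaint}"
```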
These applications showcase the high potential of LLMs and project the impact they could have at the technical level when dealing with NLP tasks, which could also reflect at the business level given the new and significant features and possibilities they unlock.
Conclusion
Large Language Models (LLMs) are shaping a new paradigm in our approach to NLP tasks, showcasing unprecedented flexibility and accuracy. This dynamic field is marked by continuous research and development, with new LLMs consistently emerging and being ranked on almost a weekly basis.
2024 will most likely be the year of small/nano LLMs that run on small devices and specialize in specific tasks. Phi-2 and the Rabbit R1 are promising signs of this year’s trend, and I’m personally looking forward to what’s coming next!