
As ChatGPT, Perplexity, Gemini, and other large language models (LLMs) gain popularity, brands have one fear: disappearing from search engines. By providing answers directly to users, these generative AIs are disrupting the SEO strategies of many companies.
But to respond to users with complete transparency, LLMs must provide their sources. And that’s where generative AI optimisation comes in. With this new generation of natural referencing, you can become visible on any AI-powered search engine. Let’s take a closer look.
Becoming familiar with natural language processing (NLP) and large language models (LLMs) is an essential step in anticipating developments in search engine optimisation, digital branding, and content strategies. A thorough understanding of these technologies allows their full potential to be exploited.
The ideas presented here are based on a decade of work in semantic search, in-depth studies of the scientific literature, and analysis of patents related to generative AI.
Before adopting approaches such as GEO (Generative Engine Optimisation), it is essential to understand the technological foundations of LLMs. Just as mastering search engines helps you avoid ineffective practices, investing a few hours in learning these concepts can save time and resources by ruling out irrelevant strategies.
Language models such as GPT, Claude, and LLaMA represent a major transformation in the way generative AI and search engines process queries.
They do not simply search for textual matches; they generate nuanced and contextually rich responses, thanks to their advanced linguistic comprehension and reasoning capabilities. For example, research such as Microsoft’s “Large Search Model: Redefining Search Stack in the Era of LLMs” highlights their role in reshaping search technologies.
LLMs adopt a unified approach, treating all search-related tasks as text generation problems.
Raw data is transformed into tokens, the fundamental units of models. These tokens represent various types of information (words, entities, images, etc.) depending on the application.
Tokens are then converted into vectors, an essential step in transformer-based models such as Google’s. These vectors, which are numerical representations of tokens, capture their specific attributes and allow semantic relationships to be measured using methods such as cosine similarity or Euclidean distance.
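As a rough illustration of how this vector comparison works, here is a minimal Python sketch. The four-dimensional embeddings are invented for the example; real models use vectors with hundreds or thousands of dimensions produced by a trained embedding model.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means the vectors point in the same
    # direction (semantically related); close to 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Euclidean distance: 0.0 for identical vectors, larger when they differ.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy embeddings, invented for illustration only.
king  = [0.9, 0.8, 0.1, 0.2]
queen = [0.88, 0.82, 0.15, 0.2]
apple = [0.1, 0.2, 0.9, 0.7]

print(cosine_similarity(king, queen))  # high: semantically close
print(cosine_similarity(king, apple))  # much lower: unrelated
```

The exact numbers are irrelevant; what matters is the relative ordering, which is how retrieval systems decide which tokens or documents belong together.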
This vector- and transformer-based approach revolutionised generative AI and remains a key factor in the widespread adoption of LLMs today.

Decoding is the stage where the model interprets the probabilities associated with each possible next token (word or symbol). The aim is to produce a fluid and natural sequence.
To achieve this objective, several methods are used:

These methods influence the model’s level of creativity. For example, a “strict” model favours the most probable options, producing consistent and predictable responses, while a “more flexible” model explores alternatives, generating varied responses.
The choice of decoding method explains why the same prompt can produce different results. With greater creative freedom, models consider a wider range of potential words, which encourages more original and nuanced responses.
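A toy sketch makes the mechanism concrete; the token list and logit values below are invented for illustration. Greedy decoding always takes the single most probable token, while temperature-controlled sampling widens or narrows the range of candidates considered.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution (the "strict" model);
    # higher temperature flattens it (the "more flexible" model).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(tokens, logits):
    # Greedy decoding: always pick the single most probable next token.
    return tokens[logits.index(max(logits))]

def sample(tokens, logits, temperature):
    # Sampling: draw the next token according to the (tempered) probabilities.
    probs = softmax(logits, temperature)
    return random.choices(tokens, weights=probs, k=1)[0]

tokens = ["running", "walking", "sprinting", "banana"]
logits = [3.2, 2.9, 2.1, -1.0]

print(greedy(tokens, logits))        # always "running"
print(sample(tokens, logits, 0.3))   # almost always "running"
print(sample(tokens, logits, 1.5))   # noticeably more varied
```

This is why the same prompt can yield different answers across sessions: with a higher temperature, lower-probability tokens get a real chance of being chosen.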
Although encoding and decoding processes are mainly associated with text processing, they also apply to other formats such as audio and visuals. These contents are first converted into tokens before being processed by the models.
For GEO applications, this multimedia capability is less relevant. However, the expanded context window of LLMs provides a better understanding of the relationships between primary and secondary entities in sentences, thereby enriching the results produced.
Large language models (LLMs) face three main challenges:
To remedy this, retrieval-augmented generation (RAG) is emerging as an effective solution.
RAG enriches LLMs by providing them with subject-specific data, integrated in the form of documents or vectors representing complex semantic relationships (as in knowledge graphs).
This approach enables:
RAG provides opportunities for approaches such as GEO by facilitating access to relevant sources and enabling customisation of the data used.
However, the challenge remains in determining how platforms select and evaluate the quality of sources, a key factor in ensuring the relevance and reliability of responses.
Retrieval models play a fundamental role in retrieval-augmented generation (RAG) systems, acting as “specialised librarians” capable of identifying relevant information in huge data sets.
These models use advanced algorithms to evaluate and select the most useful data, incorporating external knowledge into text generation. This enables:
Retrieval systems rely on several underlying technologies, including vector similarity search and knowledge graphs.
Not all AI systems incorporate sophisticated retrieval mechanisms, which poses challenges for optimising RAG architectures.
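To illustrate the overall flow, here is a deliberately simplified RAG sketch. The `embed` function is a hypothetical stand-in (a character-frequency vector) where a real system would call a trained embedding model, and the document snippets are invented for the example.

```python
import math

def embed(text):
    # Hypothetical toy embedding: normalised character-frequency vector.
    # A real RAG system would use a trained embedding model instead.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(c) for c in alphabet]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def retrieve(query, documents, k=2):
    # Rank documents by cosine similarity to the query and keep the top k.
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        score = sum(x * y for x, y in zip(q, d))
        scored.append((score, doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(query, documents):
    # The retrieved passages are prepended as context before the question,
    # so the model can ground its answer in external data.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Cushioned running shoes reduce impact for heavier runners.",
    "Knowledge graphs encode relationships between entities.",
    "Stability shoes support long-distance training.",
]
print(build_prompt("best running shoes for long distance", docs))
```

Whatever the retrieval technology, the shape is the same: score candidate sources against the query, keep the best ones, and inject them into the generation step.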
OpenAI recently introduced SearchGPT, combining several technologies to enrich its responses:
Although ChatGPT shows similarities with Bing’s rankings for certain queries, its model uses diverse sources, ensuring a certain degree of independence in its responses.
The evaluation of a retrieval-augmented generation (RAG) system relies on several key metrics to measure its performance and reliability.
In RAG systems, the prompt is transformed into a background query to efficiently query databases while preserving the original context, improving the accuracy and relevance of responses.
To bridge the gap between the two, RAG adapts complex prompts into queries while preserving critical context, ensuring optimal retrieval of relevant sources.
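This transformation can be pictured with a small, purely illustrative sketch. Real systems typically delegate the rewriting to the LLM itself; here, simple rules extract numeric context attributes (so they are preserved) and reduce the question to a compact search query. The regular expression and stopword list are assumptions made for this example.

```python
import re

def rewrite_prompt(prompt):
    # Hypothetical rule-based rewriting: capture numeric context attributes
    # and condense the final question into a short search query.
    units = r"years old|kg|cm|kilometres|times a week"
    context = {unit: int(num)
               for num, unit in re.findall(rf"(\d+)\s*({units})", prompt)}
    question = prompt.rstrip("?. ").split(". ")[-1]
    stopwords = {"i", "am", "what", "are", "the", "for", "me", "and"}
    query = " ".join(w for w in question.lower().split() if w not in stopwords)
    return query, context

prompt = ("I am 47 years old, weigh 95 kg, am 180 cm tall and run 3 times a week "
          "for 6 to 8 kilometres. What are the best jogging shoes for me?")
query, context = rewrite_prompt(prompt)
print(query)    # "best jogging shoes"
print(context)  # the numeric attributes, e.g. {"kg": 95, ...}
```

The point is the split: a compact query drives retrieval, while the preserved attributes (age, weight, mileage) keep the original context available for the generation step.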
The objectives of GEO (Generative Engine Optimisation) vary according to the ambitions of the stakeholders:
Although these objectives are complementary, they require distinct approaches. In both cases, it is essential to establish a strong presence among the sources favoured or frequently consulted by language models.
Each AI assistant uses specific criteria to select and recommend entities. This involves:
For professionals accustomed to focusing on Google, this requires a strategic shift, incorporating a variety of platforms and constantly monitoring market developments.
AI systems use contextual attributes to structure their responses. For example:
Example prompt:
“I am 47 years old, weigh 95 kg, am 180 cm tall and run 3 times a week for 6 to 8 kilometres. What are the best jogging shoes for me?”

Products or services frequently associated with these contexts are more likely to be mentioned by AI systems.
Platforms such as ChatGPT and Perplexity demonstrate how AI systems identify and cite content based on queries:
Working upstream to align your content with AI assistant criteria is therefore crucial to maximising visibility and mentions in the responses generated.
Copilot analyses attributes such as age and weight to contextualise responses.
Based on the data provided, it can deduce an overweight context by referring to the sources cited.

The sources cited come exclusively from informative content, such as tests, reviews and rankings, rather than e-commerce pages or detailed product descriptions.
ChatGPT takes into account attributes such as distance travelled and weight. Based on the referenced sources, it deduces a context of being overweight and long-distance running.

The sources cited come exclusively from informative content, such as tests, reviews and rankings, rather than from e-commerce pages, such as product descriptions or categories.
Perplexity takes into account the weight attribute and deduces an overweight context from the referenced sources.
Sources include informative content such as tests, reviews, and rankings, as well as traditional e-commerce pages.

Gemini does not directly provide sources in its results. However, further analysis shows that it also takes age and weight contexts into account.
Gemini’s response and sources for the “best running shoes” prompt.
Each major LLM recommends different products, but one shoe is consistently suggested by all four AI systems tested.


All systems show some creativity, suggesting different products during various sessions. Copilot, Perplexity, and ChatGPT favour non-commercial sources, such as review sites or test pages, in line with the prompt’s objective.
Although Claude suggests shoe models, it relies solely on its initial training data, without access to real-time data or a retrieval system.

Each LLM has its own process for selecting sources and content, making the GEO challenge more complex. Recommendations are influenced by co-occurrence frequency and context, which increases the likelihood of certain tokens during decoding.
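The co-occurrence effect can be illustrated with a small sketch that counts which terms appear near a brand name in a corpus; the corpus, brand name, and window size below are invented for the example.

```python
from collections import Counter

def cooccurrence_counts(corpus, brand, window=8):
    # Count terms appearing within `window` tokens of the brand name.
    # Frequent co-occurrence in training data raises the probability that
    # a model associates the brand with those terms during decoding.
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok == brand.lower():
                lo, hi = max(0, i - window), i + window + 1
                for neighbour in tokens[lo:i] + tokens[i + 1:hi]:
                    counts[neighbour] += 1
    return counts

# Invented mini-corpus for illustration.
corpus = [
    "Brooks shoes are ideal for heavier runners needing cushioning",
    "For long distance runs many reviewers recommend Brooks cushioning",
    "Minimalist shoes suit light fast runners",
]
counts = cooccurrence_counts(corpus, "Brooks")
print(counts.most_common(3))
```

A brand repeatedly mentioned next to attributes like "cushioning" or "long distance" in authoritative sources becomes statistically linked to those contexts, which is precisely what GEO tries to influence.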
GEO emphasises the strategic positioning of products, brands, and content in the datasets used to train LLMs. Understanding the training process of these models in detail is crucial for identifying integration opportunities.
The information presented below is drawn from studies, patents, scientific publications, EEAT research, and personal analysis. The main questions raised are:
Recent work, particularly on the selection of sources for tools such as AI Overviews, Perplexity, and Copilot, highlights significant overlaps in the sources used.
For example, analyses conducted by Rich Sanger, Authoritas, and Surfer show that the snippets generated by Google AI display approximately 50% overlap in their selection of sources.

The margin of fluctuation across studies remains significant. At the beginning of 2024, the overlap rate stood at around 15%, while it reached up to 99% in some analyses.
Retrieval systems appear to influence nearly 50% of the results of AI-generated overviews, indicating ongoing experiments aimed at optimising performance. These observations corroborate criticisms about the variable quality of the responses produced by these systems.
The selection of sources used in AI responses reveals strategic opportunities to position brands or products in a contextually relevant manner. However, it is crucial to distinguish between:
Examining the model training process sheds light on these distinctions. For example, Gemini, Google’s large multimodal model, processes diverse data such as text, images, audio, video, and code. It uses web documents, books, code, and multimedia content for training, enabling it to handle complex tasks efficiently.
Finally, analyses of AI Overviews and the most frequently referenced sources offer valuable insights into the indexes and Knowledge Graph used by Google during model pre-training. This paves the way for alignment strategies to include relevant content.

In the RAG process, domain-specific sources are integrated to enhance the contextual relevance of responses.
A key feature of Gemini is its use of a mixture-of-experts (MoE) architecture. Unlike traditional transformers, which rely on a single neural network, a MoE model is composed of smaller, specialised “expert” networks. This model selectively activates the most relevant expert paths based on the input data, optimising the efficiency and performance of the system. It is likely that the RAG process is integrated into this architecture.
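The routing idea behind MoE can be sketched in a few lines of Python. The router weights and the scalar "experts" below are invented toys standing in for full neural networks; the point is only to show how a router scores experts and combines the top few.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    # Toy router: score each expert against the input, keep the top_k,
    # and combine their outputs weighted by renormalised routing probabilities.
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in router_weights]
    probs = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in top)
    return sum(probs[i] / total * experts[i](x) for i in top)

# Three "experts", here reduced to simple scalar functions of the input.
experts = [
    lambda x: sum(x) * 0.5,
    lambda x: max(x),
    lambda x: min(x),
]
# Invented router weights: each row scores one expert.
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

print(moe_forward([2.0, 1.0], experts, router_weights, top_k=2))
```

Only the selected experts do any work for a given input, which is why MoE models can grow total capacity without a proportional increase in compute per token.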
Developed by Google, Gemini undergoes several training stages, using publicly available data and advanced techniques to maximise the relevance and accuracy of the content generated:
Pre-training
Like other large language models (LLMs), Gemini is first pre-trained on a variety of public data sources. Google applies various filters to ensure data quality and avoid undesirable content. This phase also involves flexible selection of probable words, allowing the model to generate more creative and contextually appropriate responses.
Supervised fine-tuning (SFT)
After pre-training, the model is refined using high-quality examples, either created by experts or generated by other models and then reviewed by specialists. This process is similar to learning good textual structure by observing examples of well-written texts.
Reinforcement learning from human feedback (RLHF)
The model is then refined through human evaluations. A reward system based on user preferences helps Gemini recognise and learn preferred response styles and content types.
Extensions and augmentation through retrieval
Gemini can search external data sources, such as Google Search, Maps, YouTube, or other specific extensions, to enrich responses with up-to-date contextual information. For example, to answer questions about the weather or current events, Gemini can query Google Search directly to find reliable, recent information and then incorporate it into its response.
The model filters search results to include only the most relevant information based on the context of the query. For example, for a technical question, it will prioritise scientific or technical results over general information available on the web.
Gemini combines the retrieved data with its internal knowledge to generate an optimised response. This process includes creating a logically structured and readable response, followed by a final review to ensure that it meets Google’s quality standards and does not contain inappropriate content. This quality control is reinforced by a ranking system that prioritises the most relevant responses. The model then presents the highest-ranked version to the user.
User feedback and continuous optimisation
Google constantly takes into account feedback from users and experts to refine the model and correct any weaknesses. One possibility being considered is that AI applications could access existing search systems and integrate their results.
Some studies suggest that a high ranking in search engines increases the likelihood that a source will be cited in AI applications connected to them. However, as analyses have shown, current overlaps do not yet reveal a clear link between the highest rankings and the sources used.
Another criterion seems to influence the choice of sources: Google’s approach favours compliance with quality standards when selecting sources for pre-training and the RAG process. In addition, the use of classifiers is also cited as a determining factor in this process.

When classifiers are mentioned, a link can be drawn to the concept of EEAT, where quality classifiers also play a key role.
The information provided by Google on post-training also mentions the use of EEAT to classify sources according to their quality.

The reference to assessors points to the role of quality assessors in the EEAT evaluation.

Rankings in most search engines depend on the relevance and quality of information, both in terms of the document, the domain, and the author or source entity.

Sources are often selected not only for their relevance, but also for their quality in terms of the subject area and the source entity.
This can be explained by the fact that more complex queries must be rewritten in the background to generate appropriate search queries for querying the rankings. Although relevance varies depending on the query, quality remains a constant criterion.
This distinction helps to explain the weak correlation between search engine rankings and the sources referenced by generative AI, and why lower-ranked sources may sometimes be included.
To evaluate quality, search engines such as Google and Bing use classifiers, including Google’s EEAT framework. Google specifies that EEAT can vary depending on the domain, thus requiring strategies tailored to each subject, particularly in GEO.
The sources used differ depending on the sector or subject, with platforms such as Wikipedia, Reddit and Amazon playing various roles, as indicated by a study by BrightEdge.
Consequently, industry- and subject-specific factors must be taken into account in positioning strategies.
As mentioned earlier, there is still no tangible evidence regarding the direct influence on the results of generative AI. Platform operators themselves seem uncertain about how to qualify the sources chosen during the RAG process.
These factors highlight the importance of identifying areas where optimisation efforts should be focused, particularly by determining which sources are sufficiently reliable and relevant to be prioritised.
The next challenge is to understand how to position yourself as one of these authoritative sources.
The research paper entitled “GEO: Generative Engine Optimization” introduced the concept of GEO, exploring how generative AI outcomes can be influenced and identifying the key factors responsible for this influence.
According to the study, GEO visibility and effectiveness can be optimised through the following elements:
These factors vary depending on the domain, suggesting that integrating targeted, sector-specific personalisation is essential for increasing the visibility of web pages.
The following tactical measures for GEO and LLMO can be derived from this document:
Furthermore, certain tactical practices should be avoided:
According to BrightEdge’s research, here are the strategic considerations to take into account:
AI Overviews and Perplexity favour distinct sets of domains depending on the sector.
In the fields of health and education, these platforms favour reliable sources such as mayoclinic.org and coursera.com, making them key targets for effective SEO strategies.
In contrast, in the e-commerce and finance sectors, Perplexity favours sites such as reddit.com, yahoo.com, and marketwatch.com. Tailoring your SEO efforts to these preferences using backlinks and co-citations can significantly improve performance.
AI-based search strategies must be tailored to each sector.
For example, Perplexity’s preference for reddit.com shows the importance of community information in e-commerce, while AI Overviews favours sites such as consumerreports.org and quora.com for reviews and Q&As.
Marketers and SEO specialists should align their content strategies with these trends by creating detailed product reviews or supporting Q&A forums for e-commerce brands.
SEO specialists must closely monitor Perplexity’s preferred sites, particularly the growing influence of reddit.com for community content.
Google’s partnership with Reddit could influence Perplexity’s algorithms, giving it greater priority.
SEO specialists must remain proactive and adjust their strategies to adapt to changes in citation preferences, ensuring the relevance and effectiveness of their actions.
The future of GEO, particularly in the context of AI-powered search such as ChatGPT, marks a significant shift for corporate marketing strategies. It is largely driven by how consumer search behaviours are evolving and how companies are leveraging new technologies to position themselves in these environments.
Here is a summary of the key points of this development and what it means for brands:
Businesses must focus not only on traditional SEO, but also on the co-occurrence of their brand with relevant entities and attributes in authoritative media. To succeed in a world dominated by generative AI, it will be crucial to optimise relationships on recognised platforms and invest in digital authority management to influence public perception on a large scale.
This means that brands will need to adapt their strategies to ensure they are well positioned in both traditional search engines and AI-generated results, while anticipating rapid changes in the digital landscape.
SEO optimisation on LLMs involves making yourself visible on new search engines.
Whether it’s ChatGPT, Perplexity, Microsoft Copilot, or Gemini, every large language model uses a database to provide answers to internet users. With a few exceptions, this database is simply made up of the vast universe of the internet. In other words, LLMs draw their knowledge from the hundreds of thousands of websites on the web. Among them is undoubtedly yours.
Some language models use content without specifying the source to the user. However, for the sake of transparency, more and more of them are citing the source of the information they provide. Certain particularly relevant sites are thus able to make themselves visible on LLM tools. Users who want to find out more can then click on the site link and access its content.
But to do that, you still need to master SEO optimisation on LLMs.
Discover our tips to help you stand out and become visible on next-generation search engines.
Before embarking on generative AI optimisation, it is essential to fully understand the technologies behind LLMs. This will enable you to identify the future potential of SEO.
These artificial intelligence assistants go far beyond simple text matching. In other words, they do not just extract information from existing documents. Instead, they generate comprehensive, accurate, relevant, nuanced, and contextually rich responses.
This is possible thanks to their linguistic comprehension and reasoning abilities (natural language processing, or NLP).

Please note, however, that LLMs do not ‘understand’ internet users’ queries in the human sense of the term. They process data statistically.
In essence, the main principles of SEO remain unchanged. It is always necessary to maintain the technical aspects of your website, boost its popularity with a netlinking strategy, and produce quality content.
But here, the quality requirements are even higher than before. SEO on LLMs does not seek information from sites that all say the same thing with different turns of phrase. Instead, generative AI wants unique and accurate content, with concrete examples, figures, infographics, case studies, explicit sources, and so on.
Ultimately, the more generic your content is, the less likely it is to meet the requirements of generative AI optimisation.
Concerned about the relevance of their sources, large language models will not promote unknown websites. They favour sites that already have a certain authority and popularity. This obviously involves a netlinking strategy, but not only that. Nowadays, you need to define a real brand identity: why are you unique?
Ultimately, generative AI optimisation goes even further than traditional SEO.
Please note that, at present, there is no proven and approved method for generative AI optimisation. In fact, each model produces different results in response to the same query. It is therefore not a 100% exact science.
That said, these key principles will maximise your chances of appearing on new search engines.
With the Christmas shopping season approaching, SEO optimisation for LLMs is becoming essential. To capture consumers’ attention on platforms such as SearchGPT, brands must adjust their SEO strategies by incorporating optimisations tailored to generative AI. By focusing on content quality, relevant co-occurrences, and visibility on reliable sources, businesses will be able to improve their positioning during this crucial period.
For many people, Christmas shopping is not always a pleasant experience. So when an AI-based tool offers to provide a list of ideas tailored to a relatively specific request, including the set budget and key information about the recipient, it would be a shame not to take advantage of it. Although this practice has not yet been adopted by all shoppers, who still overwhelmingly favour Google’s search engine, it is an increasingly significant trend for brands, who have a growing interest in appearing among the first options offered by these platforms.
For advertisers, it is therefore important to monitor their own mentions on AI-based search platforms and implement an effective SEO strategy to strengthen their online presence. This is a much more complex task than it is for search engines. On platforms such as ChatGPT, search is conversational, which makes results more contextual and personalised, and therefore more difficult to categorise.
SEO tools have recently emerged to enable brands to obtain information about how they are mentioned in conversations. Profound, in particular, is one of the leaders in this market. This tool generates multiple conversations, using variations to run them several times, in order to identify the brands most recommended by different platforms. Through this technology, brands can learn more about their own visibility in the responses provided by platforms and adapt their SEO strategy accordingly, targeting relevant keywords.
Based on a study conducted on toy retailers, the Search Engine Land website revealed some relevant information provided by Profound. The analysis tool can, in particular, provide a ranking of brand visibility on a given topic, based on conversations generated on AI-powered search platforms. The percentage shown therefore corresponds to the rate of AI responses that mention the brand in question:
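The underlying measurement is simple to sketch. The responses and the brand name below are invented; the point is only to show how a visibility percentage of this kind is computed from repeated prompt variations.

```python
def mention_rate(responses, brand):
    # Share of AI responses that mention the brand at least once
    # (case-insensitive substring match).
    hits = sum(1 for r in responses if brand.lower() in r.lower())
    return hits / len(responses)

# Hypothetical responses collected from repeated prompt variations.
responses = [
    "For durable wooden toys, Lego and BrandX are popular choices.",
    "Top picks this year include BrandX and Melissa & Doug.",
    "Parents often recommend classic board games.",
    "BrandX consistently ranks well in toy reviews.",
]
print(f"{mention_rate(responses, 'BrandX'):.0%}")  # 75%
```

Running many such prompt variations per topic and per platform is, in essence, how tools of this kind turn conversational answers into a trackable visibility metric.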

Please note that the visibility of each brand may differ depending on the platform, as the models used are not the same:

It is also possible to determine the position of each competing brand for each of the different themes provided to Profound:

But then, how did the brands mentioned above become benchmarks in their field? It’s all about SEO, so it’s worth taking a look at the sources used by AI to provide its answers.
For example, on the subject of toys, it was mainly the content on the TinyBeans website, aimed at parents, that had the greatest influence on the results. In other words, this site is considered the leading authority on toy-related content, according to various AI-based search models.

Among the top four results, there are no fewer than three content publishers, namely TinyBeans, Forbes, and Parents. Conversely, content provided by the biggest brands in this market, such as Lego and Melissa & Doug, only appears at the bottom of the top 10.
While the analysis carried out with the Profound tool is not sufficient to draw a definitive conclusion about how referencing works, a clear trend seems to be emerging. Unlike search engines such as Google, which favour the quality of content provided by the brand in order to highlight it on results pages, it is third-party content that seems to take precedence on platforms such as SearchGPT.
Consequently, for a brand to perform well in the results delivered by these platforms, it must first and foremost be promoted within recognised sources. In addition to the SEO recommendations already mentioned, brands would be well advised to implement digital public relations strategies and work in collaboration with quality content publishers in order to gain a competitive edge over their rivals.
If you would like to increase your visibility on generative AI engines, please do not hesitate to contact our SEO agency to optimise your website.