Can we trust artificial intelligence (AI) tools? In other words, will they soon be able to replace humans? This is the question that is being asked today in many fields. The financial sector is obviously no exception.
A study conducted by the firm Reputation Age provides interesting insight in this respect. It sought to measure the degree of reliability of ChatGPT, Claude (paid version), and Gemini regarding the financial indicators of CAC 40 companies. The results are staggering.
Simple questions
The firm questioned these tools using two methods. First, it requested the generation of a table presenting four key indicators for all groups in the Parisian index (except Pernod Ricard due to its non-calendar fiscal year): revenue, net income, operating income and net debt.
In a second phase, a Reputation Age agent questioned each model more precisely: one question per indicator sought for each company, totaling 156 questions (4 questions for 39 companies).
The answers were then compared with the financial indicators published in the companies' press releases.
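The article does not say how an answer was judged correct. Purely as an illustration, the comparison step could be sketched as below, assuming a small relative tolerance to absorb rounding differences. The function names, the 0.5% tolerance, the company name and all figures are hypothetical placeholders, not data from the study.

```python
from math import isclose

def is_correct(answer, published, rel_tol=0.005):
    """True when the model's figure matches the published one within rel_tol.

    The 0.5% tolerance is an assumption; the study does not state its criterion.
    """
    if answer is None:  # the model gave no figure for this indicator
        return False
    return isclose(answer, published, rel_tol=rel_tol)

def score(answers, reference, rel_tol=0.005):
    """Return (correct, total, rate) for one model against the reference figures."""
    correct = sum(
        is_correct(answers.get(key), value, rel_tol)
        for key, value in reference.items()
    )
    total = len(reference)
    return correct, total, correct / total

# Hypothetical placeholder values, in millions of euros (not real company data).
reference = {
    ("ExampleCo", "revenue"): 1000.0,
    ("ExampleCo", "net_income"): 120.0,
    ("ExampleCo", "operating_income"): 180.0,
    ("ExampleCo", "net_debt"): -300.0,
}
answers = {
    ("ExampleCo", "revenue"): 1000.0,    # exact match
    ("ExampleCo", "net_income"): 150.0,  # wrong figure
    # operating_income missing: counted as incorrect
    ("ExampleCo", "net_debt"): -300.5,   # within tolerance
}

correct, total, rate = score(answers, reference)
```

A stricter or looser tolerance would change the counts, which is one reason the matching rule matters when reading accuracy figures of this kind.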
Results that call for caution, to say the least
In the first exercise, ChatGPT was unable to generate the requested table. Claude produced a table, but with only 11 correct indicators (an accuracy rate of 7%). Gemini ranked first in this exercise, yet its accuracy rate was limited to 12.8% (20 out of 156).
One could attribute these inaccuracies to requests that are too vague. However, the second exercise did not yield better results. The correct response rates were 7.1% for Claude, 9.6% for ChatGPT and 12.8% for Gemini.
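The percentages can be checked against the 156-question total with simple arithmetic. The short sketch below recomputes the rates from the counts the article gives (11 and 20 correct answers) and, as an inference only, searches for the number of correct answers that each reported rate of the second exercise would imply. The helper names are hypothetical.

```python
COMPANIES = 39   # CAC 40 minus Pernod Ricard
INDICATORS = 4   # revenue, net income, operating income, net debt
TOTAL = COMPANIES * INDICATORS  # 156 questions per model

def rate(correct, total=TOTAL):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

def implied_counts(percentage, total=TOTAL):
    """Counts of correct answers whose rounded rate equals the reported one."""
    return [k for k in range(total + 1) if rate(k, total) == percentage]

# Counts stated in the article for the first exercise.
claude_table = rate(11)   # 11 of 156
gemini_table = rate(20)   # 20 of 156

# Counts inferred from the rates reported for the second exercise.
second_exercise = {
    "Claude": implied_counts(7.1),
    "ChatGPT": implied_counts(9.6),
    "Gemini": implied_counts(12.8),
}
```

Under these assumptions, the second-exercise rates are consistent with 11, 15 and 20 correct answers out of 156 for Claude, ChatGPT and Gemini respectively.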
In detail, the rate of correct answers is very poor regardless of the indicator sought (8% to 17%), and it is particularly disappointing for operating income. ChatGPT and Claude did not produce a single valid operating income figure. Gemini fared slightly better, but its accuracy rate on this indicator was only 11%.
Precision currently out of reach
The conclusion of the firm behind the study is clear. "Financial precision remains out of reach for AI," the study states, pointing to "dangerous approximations."
Naturally, one might wonder about the reasons for this inability to produce reliable results. Reputation Age provides some answers. The main weakness of these tools lies in the fact that they do not use official websites as their sole reference. The multiplicity of sources and the lack of prioritization necessarily lead to inaccuracies. Furthermore, this work highlighted some confusion regarding the fiscal years taken into account.
Unsurprisingly, Reputation Age takes the opportunity to remind CAC 40 companies of the importance of being better identified by AI tools and of the need for support from specialists in this field. In the nearer term, however, this study is primarily a warning to the general public. AI is an undeniable aid for searching, sorting, and selecting information in an increasingly massive data universe. But trusting it blindly, without even minimal verification, would be foolish, especially in matters of investment. In this respect, AI is not yet able to replace human agents capable of providing certification or a guarantee.