OpenAI seems to have run out of quality data for training GPT-5

By: Nastya Bobkova | 26.12.2024, 13:50

OpenAI's development of GPT-5 has been seriously hampered by high costs and technical problems, even though work on the project has been under way for more than 18 months.

Here's What We Know

Microsoft had hoped that the new model would be ready by mid-2024, but that deadline was missed.

According to The Wall Street Journal, each GPT-5 training run costs the company more than $500 million in computing power alone, but the results have not yet met expectations. The improvements over GPT-4 have been minor, and they are not enough to justify such a huge expenditure.

One of the main problems is a shortage of high-quality data to train the model. The public internet cannot provide enough diverse, high-quality material to achieve the desired results. To solve this problem, OpenAI has brought in experts to create new training materials, such as software code and mathematical problems. However, this process is very slow.

For comparison, GPT-4 reportedly required about 13 trillion tokens for training, a volume of text that cannot be produced or collected in a short time.

The company's internal problems have also exacerbated the situation: more than two dozen key executives left OpenAI in 2024, including Chief Scientist Ilya Sutskever and CTO Mira Murati. In addition to GPT-5, the company is working on other projects such as o1 and Sora.

OpenAI CEO Sam Altman has confirmed that GPT-5 will not be released in 2024, a significant setback for the company's AI development plans.

Source: WSJ