ByteDance is using GPT-4 to train its models

ByteDance got busted secretly using responses from OpenAI's models to develop its own chatbot. That violates the terms of service set by OpenAI and Microsoft. The Verge's Alex Heath broke the story, and OpenAI has since suspended ByteDance's account on its platform.

What is going on here?

ByteDance leaned hard on OpenAI's API to develop Project Seed, despite knowing this broke OpenAI's terms of service.

What does this mean?

ByteDance relied on OpenAI's API throughout the development of its chatbot, Project Seed, for tasks including training and performance testing. OpenAI's terms of service prohibit using its models' output to develop competing AI models, and many other AI companies have similar rules.

Even after ByteDance told its team to stop using GPT-generated text for training, the team kept quietly using the API to evaluate Project Seed's performance. The team is under heavy pressure to match GPT-3.5's performance by the end of 2023 and GPT-4's by mid-2024.

In response, OpenAI suspended ByteDance's API account. Notably, most of ByteDance's access to OpenAI's models went through Microsoft Azure, which is why Microsoft's rules apply as well.

Why should I care?

For months, startups have trained their models on synthetic data generated by GPT-4 without much resistance from OpenAI. Likewise, open-source models fine-tuned on GPT responses have faced little pushback. That could change now that a company as large as ByteDance has been caught doing the same thing. OpenAI may take a stricter line as bigger players use its models this way.