OpenAI has recently announced a significant change in its approach to user data. The company has confirmed that it will no longer use customer data sent via its APIs to train its large language models, such as GPT-4.
In a recent interview with CNBC, OpenAI CEO Sam Altman confirmed this change, which was implemented on March 1, 2023. OpenAI has updated its terms of service to reflect this new commitment to user privacy.
Altman stated, "Customers clearly want us not to train on their data, so we've changed our plans: We will not do that."
APIs, or application programming interfaces, allow customers to connect directly to OpenAI's software. Altman mentioned that OpenAI has not been using API data for model training "for a while," suggesting that this official announcement formalizes an existing practice.
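To make the distinction concrete, here is a minimal sketch of what an API request to OpenAI looks like from a customer's side. The endpoint, header, and payload shape reflect OpenAI's publicly documented chat completions API; the prompt and helper function are illustrative, and the sketch only builds the request payload rather than sending it.

```python
import json

def build_chat_request(prompt, model="gpt-4"):
    """Build the JSON payload a customer would POST to
    https://api.openai.com/v1/chat/completions, along with an
    'Authorization: Bearer <API key>' header. Under the updated
    terms, data sent this way is not used for model training."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarize our quarterly report.")
print(json.dumps(payload, indent=2))
```

Text typed into the ChatGPT web interface, by contrast, never passes through a customer-controlled request like this, which is why it falls under different data-usage rules.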
Implications For Business Customers
This change by OpenAI has broad implications, especially for its business customers, which include companies like Microsoft, Salesforce, and Snapchat. These companies are more likely to utilize OpenAI's API capabilities for their operations, so the shift in privacy and data protection is particularly relevant to them.
However, it's important to note that the new data protection measures apply only to customers using OpenAI's API services. The updated terms of service make clear that OpenAI may still use content submitted through its other services.
For example, text entered into the popular chatbot ChatGPT may still be used by OpenAI unless the data is shared through the API.
Broader Industry Impact
OpenAI's policy shift comes at a time when industries are grappling with the potential impacts of large language models, like OpenAI's ChatGPT, replacing content traditionally created by humans.
Recently, the Writers Guild of America went on strike after negotiations with movie studios broke down. The Guild had been advocating for restrictions on using OpenAI's ChatGPT for script generation or rewriting.
OpenAI's decision not to use customer data for training marks a pivotal moment in the ongoing conversation about data privacy and AI. As companies continue to explore and push the boundaries of AI technology, ensuring user privacy and maintaining trust will likely remain central to these discussions.
The Evolution of ChatGPT: GPT-3 To GPT-4
OpenAI's commitment to not using customer data for training applies to its latest language model, GPT-4, which was released on March 14, 2023.
GPT-4 introduces several improvements over its predecessor, GPT-3. Chief among them is a much larger context window: GPT-4 can handle roughly 25,000 words of input, compared with about 3,000 words for ChatGPT. It also offers improved reasoning and comprehension capabilities.
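The context-window difference above can be illustrated with a quick back-of-the-envelope conversion. The word figures are the approximate limits reported for each model; the tokens-per-word ratio used here is a common rule of thumb for English text, not an official OpenAI number.

```python
# Approximate input limits reported for each model (in words).
WORD_LIMITS = {"ChatGPT": 3_000, "GPT-4": 25_000}

def approx_tokens(words, tokens_per_word=1.33):
    """Rough word-to-token estimate; ~1.33 tokens per English word
    is a common heuristic, not an exact tokenizer count."""
    return round(words * tokens_per_word)

for model, words in WORD_LIMITS.items():
    print(f"{model}: ~{words:,} words \u2248 {approx_tokens(words):,} tokens")
```

By this rough estimate, GPT-4's ~25,000-word limit corresponds to a context window in the low tens of thousands of tokens, an order of magnitude beyond what ChatGPT initially offered.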
Another notable feature of GPT-4 is its multi-modality, which allows it to understand and infer information from images in addition to text. The latest model generates more human-like text and incorporates features like emojis for a more personalized feel.
The exact size and architecture of GPT-4 remain undisclosed, leading to speculation about the details of the model. OpenAI's CEO has denied specific claims about the model's size.
In terms of performance, GPT-4 has demonstrated strengths in text generation but also has some limitations. It scored in the 54th percentile on the Graduate Record Examination (GRE) Writing and performed in the 43rd – 59th percentile on the AP Calculus BC exam.
Additionally, GPT-4 performed well on easy Leetcode coding tasks, but its performance declined with increased task difficulty.
While the specifics of GPT-4's training process are not officially documented, it is known that GPT models generally involve large-scale machine learning with a diverse range of internet text.
Under OpenAI's updated data usage policy, the training data for its language models excludes information shared via the API unless customers explicitly opt in to contributing it for that purpose.
As this technology continues to improve and play a larger role in our lives, it will be worth watching how companies pivot and respond to concerns about data privacy and user trust.