OpenAI has unveiled a new web crawler, “GPTBot,” to gather data for future ChatGPT models. According to a recent blog post, web pages crawled with the GPTBot user agent may be used to improve the accuracy and capabilities of upcoming iterations.
A web crawler, also known as a web spider, is a bot that indexes content from websites across the internet. Major search engines such as Google and Bing use web crawlers to populate their search results with relevant websites. GPTBot gathers publicly available data from the web and filters out sources that are paywalled, that collect personally identifiable information, or that contain text violating OpenAI’s policies.
Website owners can deny access to GPTBot by adding a “Disallow” rule for its user agent to robots.txt, the standard file that tells crawlers which parts of a site they may visit. This gives owners control over whether their content is collected.
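As a rough illustration of how such a rule behaves, the Python sketch below parses a sample robots.txt body that disallows the GPTBot user agent and checks it with the standard library's urllib.robotparser module. The rules shown, the example.com URL, and the helper name is_allowed are assumptions made for this example, not taken from OpenAI's announcement, and the check only shows how a compliant parser reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: block GPTBot site-wide, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Under these sample rules the check fails for GPTBot and passes for other crawlers. A robots.txt file is a request rather than an enforcement mechanism, so the outcome depends on the crawler choosing to honor it.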
Notably, OpenAI’s move to develop GPTBot comes shortly after the company filed a trademark application for “GPT-5,” the anticipated successor to the current GPT-4 model. The application, submitted to the United States Patent and Trademark Office, covers using “GPT-5” in AI-based human speech and text software, audio-to-text conversion, and voice and speech recognition.
However, observers may need to be patient: OpenAI’s co-founder and CEO, Sam Altman, has said that GPT-5 training is not imminent and that thorough safety audits must be completed before it begins.
Amid the excitement over GPTBot’s potential, concerns have been raised about OpenAI’s data collection practices, and questions of copyright and consent have drawn regulatory attention. In June, Japan’s privacy watchdog warned OpenAI against collecting sensitive data without proper permission. Italy had temporarily banned ChatGPT in April, alleging a breach of European Union privacy laws.
These concerns escalated in late June, when 16 plaintiffs filed a class-action lawsuit against OpenAI. The suit alleges that OpenAI accessed private information from ChatGPT user interactions. If the allegations hold, OpenAI and its partner, Microsoft, could be in violation of the Computer Fraud and Abuse Act, a law that has previously been applied in web-scraping cases.
Despite the ongoing challenges, OpenAI remains committed to responsible AI development and aims to address these concerns transparently while pushing the boundaries of language models to enhance human-computer interactions.