Last year, Stack Overflow became one of the first websites to announce that it would charge AI giants for access to the content used to train chatbots. The popular Q&A platform for programmers has now signed Google as its first customer, marking the start of a “meaningful” new revenue stream, according to CEO Prashanth Chandrasekar.
The agreement is noteworthy because it remains unclear how much Google and other AI developers are willing to pay for the content their AI projects require. Millions of books and websites have gone into building AI systems, yet most publishers have not been paid, and some are suing over alleged misuse. ChatGPT and other generative AI tools appear to threaten many publishers, including Stack Overflow, because they can answer questions that would previously have sent coders to those sites.
As part of the agreement, Google’s cloud division will use Stack Overflow questions and answers about Google Cloud services to offer technical support and coding help through a Google Gemini chatbot. Google Cloud customers will also be able to ask questions through Google Cloud’s command-line interface. “We have a tremendous ability to help complete that loop, and their AI may not have all the answers,” Chandrasekar says. “We are the largest repository for vetted and curated community knowledge.”
Gemini will summarize answers from Stack Overflow in its own words and include the company’s branding, a reference to the source content, and the username of the site user who provided it. A demonstration is scheduled for Google Cloud Next, the search giant’s annual cloud conference, in April, with the system set to launch shortly afterward.
According to Chandrasekar, there are no substantial limits on how Google Cloud may use Stack Overflow data to train large language models and other AI systems. “Trust, accuracy, quality, and attribution back to the sources of these AI outputs are where we want to stand firm,” he says.
He declined to disclose how much Google is paying Stack Overflow for the data. Chandrasekar says the deal will be a significant commercial proposition for the company in the short, medium, and long term.
Covert Scraping
Previously, Google and other AI developers quietly collected data from Stack Overflow and other websites. As demand for generative AI technology has grown and the valuations of the companies developing it have soared, the websites that supply the underlying material have begun seeking what they consider their fair share. Fortunately for Stack Overflow, Chandrasekar says, potential clients have gotten the message. “We’re not having to chase people,” he says.
Stack Overflow’s data is especially valuable for AI systems that generate computer code, which have proven popular with software developers and have become a substantial source of revenue for Microsoft and OpenAI.
The new Stack Overflow deal was signed barely a week after Google and Reddit, the operator of discussion forums, reached a licensing agreement covering data that helps make chatbots more conversational. Reddit had announced last year, before Stack Overflow did, that it would begin charging for data access.
The prices Stack Overflow charges for its OverflowAPI depend on the kind of data supplied. Beyond its basic repository of 59 million questions and answers, the site charges extra for layers of metadata, such as post categories and the voting histories of user-submitted answers; for trends in the kinds of questions being asked; and for customized cuts of information, like questions about a particular coding language, to aid in fine-tuning. “It’s more about what level of the data they have access to,” Chandrasekar says. “It’s less about how often they ask for information.”
Internal testing demonstrates the potential value of Stack Overflow data, he says. After the company used its data to fine-tune open-source language models from Meta and the AI company Mistral, the accuracy of their answers to technical questions rose by 20 percentage points, according to Chandrasekar.
The Google deal will also test a new way of generating content for Stack Overflow among users of the Gemini integration for Google Cloud. If the chatbot cannot provide a suitable answer, users will have the option to submit their questions to Stack Overflow, where, once approved by moderators, they will be available for the site’s user community to answer. As they prepare for the April demo, the companies are also discussing letting users submit improved answers to Stack Overflow.