Resisting OpenAI's "freeloading", but Reddit gets "bombed" by its own users first?

Source: Silicon Starman (ID: guixingren123)

Author: Li Hezi

Who knew Reddit was like this

I don't know how many people still remember Reddit's announcement in April this year that it would start charging for its API.

The short version: because companies such as OpenAI and Google, not content with their own platforms' data, have been using Reddit's data to train their large models, Reddit decided to start charging companies that call its API.

It recently emerged that, in the fallout from this decision, three of Reddit's largest subreddits, r/aww, r/pics and r/gifs (with 34.1 million, 30 million and 21.6 million subscribers respectively), have been flooded with pictures of John Oliver.

Because posts in a subreddit are sorted by upvotes, anyone entering these three subreddits finds the screen filled with John Oliver's beaming face...

spez is the Reddit username of Reddit CEO Steve Huffman

r/gifs and r/aww also renamed themselves "GIFs of John Oliver" and "A subreddit for cute and cuddly John Oliver pictures."

John Oliver is the host of the well-known talk show "Last Week Tonight", which is famous for its satire of current affairs. This collective piece of performance art is clearly meant to use his memes to express users' dissatisfaction with Reddit.

For example, one picture with 222,000 upvotes shows John Oliver posing with three main characters from "Sesame Street", captioned: John Oliver with the CEO and executives of Reddit.

What's even more impressive is that many of the materials used in these memes were provided to netizens by John Oliver himself.

So what the hell is going on here?

Reddit sowing discord?

Reddit announced through the media on April 18 this year that it would charge companies that call its API for data usage. At the time, Reddit CEO Steve Huffman made the target clear: "The Reddit corpus of data is really valuable. But we don't need to give all of that value to some of the largest companies in the world for free."

At first glance, this decision was aimed at companies that develop large models, such as OpenAI and Google, but developers in other fields soon realized that they might be the ones hit hardest.

The biggest flashpoint came on the 8th of this month, when Apollo, a third-party Reddit client for iOS, announced that it would shut down on June 30.

Reddit's own mobile app has long been considered poor, which has led to many third-party apps. They use the free API provided by Reddit to let users browse Reddit content more conveniently, and Apollo is one of the most popular of them.

Explaining the shutdown, Apollo developer Christian Selig said that under the new API policy, Reddit will charge $12,000 for every 50 million API requests. At Apollo's volume of about 7 billion API requests per month, that comes to roughly $1.68 million a month, or about $20 million a year paid to Reddit.

This sky-high fee is simply unaffordable for an individual developer like Christian Selig and for Apollo, which is positioned as a free app.

Christian Selig negotiated with Reddit many times to no avail and finally decided to shut the app down. Had things ended there, everyone could have moved on. It is reasonable for Reddit to charge for its API; what really made users angry was the series of moves Reddit made next.
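The figures quoted by Selig can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a simple linear price with no volume discounts or free tier; the function and constant names are illustrative and do not come from Reddit's API terms.

```python
# Back-of-the-envelope check of the Apollo figures quoted above.
# Assumption: pricing is strictly linear at $12,000 per 50 million requests.

PRICE_PER_BLOCK_USD = 12_000      # price quoted for one block of requests
REQUESTS_PER_BLOCK = 50_000_000   # size of one priced block


def monthly_api_cost(requests_per_month: int) -> float:
    """Return the monthly bill in USD for a given request volume."""
    return requests_per_month / REQUESTS_PER_BLOCK * PRICE_PER_BLOCK_USD


def annual_api_cost(requests_per_month: int) -> float:
    """Return the yearly bill in USD, assuming constant monthly volume."""
    return monthly_api_cost(requests_per_month) * 12


if __name__ == "__main__":
    apollo_requests = 7_000_000_000  # about 7 billion requests per month
    print(f"Monthly: ${monthly_api_cost(apollo_requests):,.0f}")  # Monthly: $1,680,000
    print(f"Annual:  ${annual_api_cost(apollo_requests):,.0f}")   # Annual:  $20,160,000
```

Under these assumptions, 7 billion requests correspond to 140 priced blocks per month, which matches the $1.68 million monthly and roughly $20 million yearly figures.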

While Christian Selig was still negotiating with Reddit, he suddenly received a message one day asking him to comment on a claim circulating inside Reddit that "Apollo tried to threaten Reddit for $10 million to quell the dispute".

What Reddit didn't expect was that Christian Selig had recorded his conversations with them. He posted the transcript and audio of the relevant part of the call online and said that Reddit was "blatantly lying."

One might have expected Reddit to rethink its pricing after this, but it not only insisted on pushing ahead with the new API policy, it also kept criticizing Christian Selig: "Talk one thing to us, but it is completely another thing to the outside world... Recorded and leaked private calls so that I don't know how we should do business with him."

This poor response eventually led more than 7,000 subreddits to join the protest against Reddit. Some blacked out most of their content, some went private, and others chose to close down.

At one point I couldn't even find r/funny, the largest subreddit on Reddit

Although this wave of protests almost paralyzed Reddit, CEO Steve Huffman still made some startling remarks. He likened the unpaid volunteer moderators who run Reddit's communities to "landed gentry" whom ordinary members have no choice but to obey, and said: "It's like a city where protests go on for too long and the rest of the townspeople want to get on with their lives... If they could comment, I bet the group would say 'Turn it off, it's annoying'."

That is what led to the users' performance art described at the beginning of this article.

To rebut Steve Huffman's remarks, the moderators of these subreddits asked their members to vote on the future of each community. The options were: A, return to normal; B, only allow John Oliver memes.

Option B won by an overwhelming margin.

Everything starts with large models

Reddit is actually not the first platform to change its API pricing because large models were pulling its data. In February of this year, Musk announced that Twitter's API access would be put behind a paywall.

According to a document published by a Twitter customer representative in early March, the company plans to offer developers three levels of Enterprise packages:

The Small Package, the cheapest of the three, costs $42,000 a month and gives access to 50 million tweets. The higher tiers give researchers or businesses access to 100 million and 200 million tweets, at $125,000 and $210,000 a month respectively.

In other words, developers have to pay Twitter at least $500,000 a year (and even 50 million tweets is far from enough data to train a large model).

On April 19 (the day after Reddit announced that it would charge for API usage), Microsoft announced that its advertising platform would no longer support Twitter, unhappy that it would no longer be able to access Twitter data for free.
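The "at least $500,000 a year" figure follows directly from the cheapest tier. The sketch below is only illustrative: the tier labels "medium" and "large" are hypothetical names, and it assumes the tweet counts are monthly quotas, which the quoted document does not state explicitly.

```python
# Rough comparison of the three Twitter Enterprise tiers described above.
# Assumptions: tweet counts are monthly quotas; "medium" and "large" are
# made-up labels for the two unnamed higher tiers.

TIERS = {
    "small": (42_000, 50_000_000),     # (monthly fee in USD, tweets)
    "medium": (125_000, 100_000_000),
    "large": (210_000, 200_000_000),
}


def annual_fee(monthly_fee_usd: int) -> int:
    """Return the yearly cost of a tier in USD."""
    return monthly_fee_usd * 12


def usd_per_thousand_tweets(monthly_fee_usd: int, tweets: int) -> float:
    """Return the effective price of 1,000 tweets within one tier."""
    return monthly_fee_usd * 1000 / tweets


if __name__ == "__main__":
    for name, (fee, tweets) in TIERS.items():
        print(
            f"{name}: ${annual_fee(fee):,} per year, "
            f"${usd_per_thousand_tweets(fee, tweets):.2f} per 1,000 tweets"
        )
```

The cheapest tier works out to $504,000 a year, which is where the rounded "$500,000" comes from.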

Then Musk tweeted the next day that he might sue Microsoft, accusing it of "illegally" using Twitter data to train AI.

In addition, Getty Images also sued Stability AI in February this year, claiming that it violated the copyright of Getty Images' pictures.

However, when the same issue reached Reddit, the situation looked different. First, Reddit did not choose to sue the large model companies. Second, after Reddit's API pricing was revealed, the large model companies (especially OpenAI) remained silent.

Unknown to many, Sam Altman, now the CEO of OpenAI, was an early investor in Reddit.

Loopt, the mobile app Altman built in his first startup, was in the same batch of companies funded by the veteran incubator Y Combinator as Reddit. That venture ultimately failed; after selling Loopt in 2012, Altman joined Y Combinator as a part-time partner.

In 2014, Y Combinator founder Paul Graham chose Altman, who is 20 years his junior, as president of Y Combinator. Later, in September 2014, Y Combinator under Altman led Reddit's Series B round of financing.

He even served as interim CEO of Reddit for 8 days after Reddit CEO Yishan Wong resigned in 2014.

For more than seven years after that, Altman sat on Reddit's board of directors, until he announced his departure in January 2022. When he left, he said, "I love Reddit as a user, and I love the years I spent on the board. The team led by Steve and the rest of the board are great, and the whole company is full of very capable people."

Some people therefore speculate that, given these seven years of "friendship", Reddit may have discussed its decision to charge for API usage with OpenAI beforehand.

Data is considered one of the key factors in future large-model competition, especially since the release of Meta's open-source large language model LLaMA. Not long ago, the claim that "Google and OpenAI have no moat" against open-source large models was also widely discussed, and one of its conclusions is that the quality of the data used to train large models matters more than its size.

Whether the measure is data quality or data size, Reddit has an advantage. First, it is the 11th most visited website in the world (6th in the United States). Second, it produces constantly updated, real discussions of the hottest events of the moment. However you look at it, it is an ideal dataset for training large models.

It was previously reported that Reddit plans to IPO later this year, which means that Reddit, whose income is still dominated by advertising and which has not yet turned a profit, urgently needs to find more ways to make money. OpenAI, which is not short of money, is obviously a more attractive customer than individual developers.

Altman has also said that OpenAI is actively working with content companies to obtain licenses, and that it is willing to pay high prices for high-quality data in specific fields.

One wants data, the other wants money: it looks like a perfect match. Some people also speculate that Reddit may integrate large models into its own product in the future.

Judging from Reddit's current hard line, it does not seem inclined to accommodate individual developers: between its users and its commercial interests, it has chosen the latter. The contradiction is that the data Steve Huffman says can train large models to the best results, data with both "novelty and relevance", is created by one Reddit user after another.

But as his line about "a city where protests go on for too long and the rest of the townspeople want to get on with their lives" suggests, Steve Huffman seems pretty convinced that users won't leave.