Saturday, August 19, 2023

Challenges using LLMs for startups

Someone (not an AI expert) was asking me about applying large language models (LLMs, like ChatGPT) to a particular product. In case it's useful to others, here's an edited version of what I said.

When thinking about LLMs for a task, I think it's important to consider how LLMs work.

Essentially, LLMs are trying to produce plausible text given the context. It's roughly advanced next word prediction: given the previous words and phrases, and an enormous amount of data on what writing usually looks like around similar words and phrases, predict the next words.
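
To make that concrete, here's a minimal sketch of next word prediction as a toy bigram counter. Real LLMs use large neural networks over tokens and vastly more data, but the framing of predicting the next word from context is the same; the tiny corpus below is invented for illustration.

```python
# Toy next-word prediction: count which words follow which words in a tiny
# corpus, then predict the most frequent follower. Not how LLMs are built,
# just the "predict the next word from context" idea at its simplest.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count next-word frequencies for each single-word context (a bigram model).
next_word = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev][nxt] += 1

def predict(prev_word: str) -> str:
    """Return the most plausible next word given the previous word."""
    candidates = next_word.get(prev_word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(predict("sat"))  # "on"
print(predict("the"))  # "cat" -- the most frequent follower in this toy corpus
```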

The models are trained by having human judges quickly rate a bunch of output for how plausible it looks. This means the models are optimized for producing output that, on first glance, looks pretty good to people.

The models can imitate writing style, especially over short stretches of text, but not particularly well or closely. If you ask an LLM to produce what a specific person would say, including a user of your product, you'll get a plausible-looking but unreliable answer.

The models also have no understanding of what is correct or accurate, so they will produce output that is wrong, potentially in problematic ways.

The models have no reasoning ability either, but they often can produce plausible output to questions that appear to require reasoning by rephrasing memorized answers to similar questions.

Unfortunately, these issues mean you may struggle to get LLMs to produce high quality output for many of the features and products you might be thinking about, even with considerable effort and optimization on the LLMs.

If you're prepared for that, you can try to make customers forgiving of the inevitable errors with a lot of effort on the UI and by limiting expectations, but that's not easy!

Friday, June 30, 2023

Attacking the economics of scams and misinformation

We see so many scams and so much misinformation on the internet because it is profitable.

It's cheap to create bogus accounts. It's cheap to use hordes of accounts to shill your scams and feign popularity. Posting false customer reviews can easily make crap look trustworthy and useful.

Bad actors are even more effective when they manipulate ranking algorithms. When fake crowds of bogus accounts like, share, and click on content, algorithms that use customer behavior -- such as trending, search, and recommenders -- think crap is genuinely popular and show it to even more people.

Today, the FTC announced rules that change the game: "Federal Trade Commission Announces Proposed Rule Banning Fake Reviews and Testimonials."

These rules make it much more risky and costly to create fake reviews of your products. About a third of customer reviews are fake!

These new rules also make it much more costly to manipulate social media using fake accounts and shills. From the FTC: "Businesses would be prohibited from selling false indicators of social media influence, like fake followers or views. The proposed rule also would bar anyone from buying such indicators to misrepresent their importance for a commercial purpose."

Importantly, this changes the economics of spam for the bad guys. Before, faking and shilling was free advertising and often profitable. Now businesses that shill and use fake followers face the risk of much higher costs, likely tipping the balance and making a lot of scams unprofitable.

A paper from a decade ago, "The Economics of Spam", describes how difficult it already is to make money as an email spammer. It then brilliantly summarizes interventions like the FTC's recent action, saying, "The most promising economic interventions are those that raise the cost of doing business for the spammers, which would cut into their margins and make many campaigns unprofitable."

More risky and less profitable means less of it. This action from the FTC is great news for anyone who uses the internet.

Tuesday, June 13, 2023

Optimizing for the wrong thing

Many companies that think of themselves as data-driven underestimate how easy it is for metrics to go terribly wrong.

Take a simple example. Imagine an executive who will be bonused and promoted if they increase advertising revenue next quarter.

The easiest way for this exec to get their payday is to put a lot more ads in the product. That lifts revenue now but annoys customers over time: a short-term bump for the exec, a long-term decline for the company.

By the time those costs show up, that exec is out the door, on to the next job. Even if they stay at the company, it's hard to prove that the increased ads caused a broad decline in customer growth and satisfaction, so the exec gets away with it.

It's not hard for A/B-tested algorithms to go terribly wrong too. If the algorithms are optimized over time for clicks, engagement, or immediate revenue, they'll eventually favor scams, lots of ads, deceptive ads, and propaganda because those tend to maximize those metrics.

If your goal metrics aren't the actual goals of the company -- which should be long-term customer growth, satisfaction, and retention -- then you easily can make ML algorithms optimize for things that hurt your customers and the company.
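
As a toy illustration of that misalignment (with invented numbers, not real data): rank content by immediate clicks and the scammy stuff wins; rank by long-term effect and it doesn't.

```python
# Toy example: the item that maximizes clicks is not the item that helps
# long-term retention. All numbers are made up for illustration.
items = [
    # (name, click_rate, long_term_retention_effect)
    ("helpful article",  0.05, +0.02),
    ("deceptive ad",     0.12, -0.05),
    ("scammy clickbait", 0.20, -0.10),
]

best_by_clicks = max(items, key=lambda item: item[1])
best_by_long_term = max(items, key=lambda item: item[2])

print("Optimizing for clicks promotes:   ", best_by_clicks[0])     # scammy clickbait
print("Optimizing for retention promotes:", best_by_long_term[0])  # helpful article
```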

Data-driven organizations using A/B testing are great but have serious problems if the measurements aren't well-aligned with the long-term success of the company. Lazily picking how you measure teams is likely to cause high future costs and decline.

Sunday, April 30, 2023

Why did wisdom of the crowds fail?

Wisdom of the crowds summarizes the opinions of many people to produce useful results. Wisdom of the crowds algorithms -- like rankers, recommenders, and trending algorithms -- usefully do this at massive scale.

But several years ago, wisdom of the crowds on the internet started failing. Algorithms started recommending misinformation, scams, and disinformation. What happened?

Let's think about it in more detail. What changed that caused problems for wisdom of the crowds? Why did it change? What can we do about it?

Importantly, did anyone find ways to mitigate the problems? If some did stop their algorithms from amplifying misinformation on their platforms, how did they do it? And why didn't everyone fix their wisdom of the crowds algorithms to prevent them from amplifying misinformation?

I have my own answers to these questions, but I'm curious to hear others. If you have thoughts, I'm most curious to hear about whether you think anyone (at least partially) addressed the problems aggravating misinformation on the internet and, if so, why you think others have not.

Netflix and their new streaming with ads

I was wondering how well Netflix's new ad-supported plans are doing. There hasn't been a lot of critical reporting on it, and I'm sure others are wondering too, so let's go take a look at what we can find.

Their Q1 2023 results only have a few details, but it sounds like the $7/month ad plan generates more total revenue than the $10/month Basic plan, though it does not appear to be more profitable and does not appear to be attracting a lot of subscribers.

It's not surprising that customers aren't in love with the new Netflix ad plan. It has ads and a smaller catalog, and it's only $3 more a month to upgrade to the ad-free Basic plan.

It's also not surprising that Netflix is able to get at least $3/month in ad revenue from these viewers, though it would be surprising if the ad plan were substantially more profitable given the cost of acquiring and serving those ads.

It'll take more time before we'll know how this goes for Netflix. But so far it doesn't seem like it's much of a success?

Only as good as the data

The Washington Post reports on the data used for ChatGPT and other large language models (LLMs):
We found several media outlets that rank low on NewsGuard’s independent scale for trustworthiness: RT.com No. 65, the Russian state-backed propaganda site; breitbart.com No. 159, a well-known source for far-right news and opinion; and vdare.com No. 993, an anti-immigration site that has been associated with white supremacy.

Chatbots have been shown to confidently share incorrect information ... Untrustworthy training data could lead it to spread bias, propaganda and misinformation.

AI is only as good as its data. Obviously using known propaganda like Russia Today will be a problem for ChatGPT. Generally, including disinformation or misinformation will make the output worse.

AI/ML benefits from thinking hard about high quality data and the metrics you use for evaluation. It's all an optimization process. Optimize for the wrong thing and your product will do the wrong thing.

Monday, April 17, 2023

The biggest threat to Google

Nico Grant at the New York Times writes that Google is furiously adding features to its web search, including personalized search and personalized information recommendations, in a "panic" that "A.I. competitors like the new Bing are quickly becoming the most serious threat to Google’s search business in 25 years."

Now, I've long been a huge fan of personalized search (e.g. [1] [2]). I love the idea of recommending information based on what interested you in the past. And I'm glad to see so many interested in AI nowadays. But I don't think this is the most serious threat to Google's search business.

The biggest threat to Google is if their search quality drops to the point that switching to alternatives becomes attractive. That could happen for a few reasons, but misinformation is what I'd focus on right now.

Google seems to have forgotten how they achieved their #1 position in the first place. It wasn't that Google search was smarter. It was that Altavista became useless, flooded with stale pages and spam because of layoffs and management dysfunction, so bad that they couldn't update their index anymore. And then everyone switched to Google as the best alternative.

The biggest threat to Google is their ongoing decline in the usefulness of their search. Too many ads, too much of a focus on recency over quality, and far too much spam, scams, and misinformation. When Google becomes useless to people, they will switch, just like they did with Altavista.

Sunday, April 16, 2023

Ubiquitous fake crowds

The Washington Post writes: "The Russian government has become far more successful at manipulating social media and search engine rankings than previously known, boosting ... [propaganda] with hundreds of thousands of fake online accounts ... detected ... only about 1% of the time."

Fake crowds can fake popularity. It's easy to manipulate trending, rankers, and recommender algorithms. All you have to do is create a thousand sockpuppet accounts and have them like and share all your stuff. Wisdom of the crowds is broken.

This can be fixed, but first you have to see the problem clearly. Then you'll see that you can't just use the behavior from every account anymore for wisdom of the crowd algorithms. You have to use only reliable accounts and toss everything spammy or unknown.
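
Here's a minimal sketch of that idea, with made-up accounts and reliability scores; a real system would estimate reliability from account age, history, and behavior patterns.

```python
# Sketch: only count behavior from accounts you trust when computing
# popularity. Accounts, scores, and the threshold are invented.
likes = [
    # (account_id, reliability_score)
    ("longtime_user_1", 0.95),
    ("longtime_user_2", 0.90),
    ("new_account_17",  0.10),   # unknown or likely-spammy accounts
    ("new_account_18",  0.05),
]

RELIABILITY_THRESHOLD = 0.8

raw_popularity = len(likes)
trusted_popularity = sum(1 for _, score in likes if score >= RELIABILITY_THRESHOLD)

print(raw_popularity)      # 4: counts the sockpuppets too
print(trusted_popularity)  # 2: only reliable accounts count
```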

Saturday, March 25, 2023

Are ad-driven business models bad?

There's been a lot of discussion that ad-driven business models are inherently exploitative and anti-consumer. I think that's both wrong and not a helpful way to look at how to fix the problems in the tech industry.

I think the problem with ad-driven models is that it's easy and tempting for executives to use short-term metrics and incentives like clicks or engagement. Those are the wrong metrics and incentives for teams. But I think the cause is more ignorance, or willful ignorance, of that issue than something inherent to ad-driven models.

In the short-term, for an ad-supported product, ad revenue and profitability do look like ad clicks. In the long-term, ad profitability looks like showing ads that perform for advertisers over the lifetime of customers. Those are quite different.

With subscription-driven models, it's more obvious that your metrics should be long-term. With ad-driven models, long-term metrics are harder to maintain, and many execs don't realize they need to. If execs let teams optimize for clicks, they eventually find those clicks have long-term costs as customers start leaving, but unfortunately it's quite costly to reverse the damage once you're far down this path.

In the long-term, I think you can improve the profitability of an ad-driven platform by making the content and ads work better for customers and advertisers (raising ad spend, increasing ad competition for the space, and reducing ad blindness) and by retaining customers longer (along with recruiting new customers). That looks a lot like the strategy for increasing the profitability of a subscription-driven platform. So I don't see much of a difference between ad-supported and subscription-supported business models other than the temptation for executives to inadvertently optimize for the wrong thing.

Saturday, March 18, 2023

NATO on bots, sockpuppets, and shills manipulating social media

NATO has a new report, "Social Media Manipulation 2022/2023: Assessing the Ability of Social Media Companies to Combat Platform Manipulation".
Buying manipulation remains cheap ... The vast majority of the inauthentic engagement remained active across all social media platforms four weeks after purchasing.

[Scammers and foreign operatives are] exploiting flaws in platforms, and pose a structural threat to the integrity of platforms.

The fake engagement gets picked up and amplified by algorithms like trending, search ranking, and recommenders. That's why it is so effective. A thousand sockpuppets engage with something new in the first hour, then the algorithms think it is popular and show crap to more people.
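
As a toy example of why this works on a naive trending algorithm (numbers invented): if the score just counts first-hour engagement, a thousand sockpuppets look exactly like a thousand real people.

```python
# Toy trending ranking based only on first-hour engagement counts.
# A naive score that treats every account equally is trivial to game.
first_hour_engagements = {
    "organic post": 120,    # real people
    "shilled post": 1000,   # mostly sockpuppets
}

trending = sorted(first_hour_engagements,
                  key=first_hour_engagements.get, reverse=True)
print(trending)  # ['shilled post', 'organic post'] -- the fake crowd wins
```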

I think there are a few questions to ask about this: Is it possible for social media platforms to stop their amplification of propaganda and scams? If it is possible but some of them don't, why not? Finally, is it in the best interest of the companies in the long-run to allow this manipulation of their platforms?

Saturday, February 25, 2023

Too many metrics and the Otis Redding problem

The "Otis Redding problem" is "holding people, groups, or businesses to too many metrics: They can’t satisfy or even think about all of them at once."

The problem is not just that people don't really know what to do anymore. It's that many people, when faced with this, start doing things that reward themselves: "They end up doing what they want or the one or two things they believe are important or that will bring them rewards (regardless of senior management’s strategic intent)."

That quote is from Stanford Professor Bob Sutton's book Good Boss, Bad Boss, which somehow I hadn't read until recently. I've read all of Bob Sutton's other books too; they're all great reads.

This is just one tidbit from that book. There's lots more in there. On the Otis Redding problem, my read is that Bob's advice is to pick only 2-3 simple, actionable metrics, then frequently discuss whether they are achieving what you want and change them if they aren't.

By the way, the name "Otis Redding problem" comes from the line in his song "(Sittin’ On) The Dock of the Bay" where he says, "Can’t do what ten people tell me to do, so I guess I’ll remain the same."

Superhuman AI in the game Go

For a few years now, AI has had superhuman game-playing ability in Go.

It was quite a milestone for AI. When I was in graduate school, people used to joke that AI for Go was where careers go to die. The game has a massive search space, so it had thwarted AI efforts for decades.

So AlphaGo and similar efforts that beat top-ranked Go players were a very big deal indeed back in 2016. But now, an amateur-level human player just beat a top-ranked AI at Go, winning 14 of 15 games.

Most of the reporting on this has been that the player used an exploit, one hole in the AI strategy, that will easily be closed. But I think this will be harder to fix than most people expect.

AlphaGo and similar techniques work by using deep learning to guide the game tree search, focusing it on moves used by experts. This result says you can't do that, that you need to consider more possible moves.

The human won here by making moves the AI didn't expect, then exploiting the result. It's not that there is just one hole. It's that making moves outside of what the AI expects, anything outside of what it has seen in the training data, can result in bad play by the AI, which the human can then exploit.

Solving that means considering many more possible moves by the opponent, which explodes the game tree search and makes it intractably large again. I suspect this is going to be hard to fix.
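
Here's a rough sketch of the failure mode, with made-up move names and probabilities. It isn't AlphaGo's actual code, but policy-guided search works roughly like this: only the moves the policy network scores as plausible ever get examined, so moves far outside the training data are effectively invisible to the search.

```python
# Sketch of policy-guided move selection (not AlphaGo's real implementation).
# The search only expands moves the policy network considers plausible, so a
# move it has rarely seen in training data is effectively never examined.
def moves_to_search(legal_moves, policy_probs, top_k=5):
    """Keep only the top_k moves the policy scores as most plausible."""
    ranked = sorted(legal_moves, key=lambda m: policy_probs.get(m, 0.0), reverse=True)
    return ranked[:top_k]

# Hypothetical position: move names and probabilities are invented.
legal_moves = ["standard_move_a", "standard_move_b", "unusual_encircling_move"] + [
    f"other_move_{i}" for i in range(10)
]
policy_probs = {"standard_move_a": 0.45, "standard_move_b": 0.30,
                "unusual_encircling_move": 1e-6}
policy_probs.update({f"other_move_{i}": 0.02 for i in range(10)})

searched = moves_to_search(legal_moves, policy_probs)
print("unusual_encircling_move" in searched)  # False: the AI never even considers it
```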

Thursday, February 16, 2023

Huge numbers of fake accounts on Twitter

It seems like this should get more attention, "hundreds of thousands of counterfeit Twitter accounts set up by Russian propaganda and disinformation" that are "still active on social media today."

There has been widespread manipulation of social media, customer reviews, and trending, search ranker, and recommender algorithms using fake crowds.

All of these depend on wisdom of the crowds. They try to use what people do and like to help other people find things. But wisdom of the crowds doesn't work when the crowd isn't real.

Caroline Orr Bueno has some more details, writing that "this is the first we've heard of an ongoing campaign involving such a large number of accounts" and that it is clear this is at "a scale with the potential to mass-manipulate."

Orr Bueno also quotes former Twitter executive Yoel Roth as saying "it's all too cheap and all too easy." This is the core problem with misinformation and disinformation in the last decade.

If it is cheap, easy, and profitable to scam and manipulate using huge crowds of fake accounts, you will get huge numbers of fake accounts. The solution will have to be to make it more expensive, difficult, and unprofitable to scam and manipulate using fake accounts.

Details on personalized learning at Duolingo

There's a new, great, long article on how Duolingo's personalized learning algorithms work, "How Duolingo's AI learns what you need to learn".

An excerpt as a teaser:

When students are given material that’s too difficult, they often get frustrated and quit ... [Too] easy ... doesn’t challenge.

Duolingo uses AI to keep its learners squarely in the zone where they remain engaged but are still learning at the edge of their abilities.

Bloom’s 2-sigma problem ... [found that] average students who were individually tutored performed two standard deviations better than they would have in a classroom. That’s enough to raise a person’s test scores from the 50th percentile to the 98th.

When Duolingo was launched in 2012 ... the goal was to make an easy-to-use online language tutor that could approximate that supercharging effect.

We'd like to create adaptive systems that respond to learners based not only on what they know but also on the teaching approaches that work best for them. What types of exercises does a learner really pay attention to? What exercises seem to make concepts click for them?

Great details on how Duolingo maximizes fun and learning while minimizing frustration and abandons, even when those goals are in conflict. Lots more in there, well worth reading.
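
As a sketch of the "keep learners in the zone" idea (not Duolingo's actual model; the exercises and predicted probabilities below are invented): pick whichever exercise has a predicted success probability closest to a target like 80%.

```python
# Minimal sketch of adaptive difficulty: choose the exercise whose predicted
# success probability is closest to a target that challenges without
# frustrating. Illustration only, not Duolingo's real algorithm.
TARGET_SUCCESS = 0.8

# Hypothetical learner-model predictions: exercise -> P(learner answers correctly)
predicted_success = {
    "review: basic greetings": 0.97,   # too easy, boring
    "new: past tense verbs":   0.55,   # too hard right now, frustrating
    "practice: food vocab":    0.82,   # right at the edge of ability
}

def pick_next_exercise(predictions):
    return min(predictions, key=lambda ex: abs(predictions[ex] - TARGET_SUCCESS))

print(pick_next_exercise(predicted_success))  # "practice: food vocab"
```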

Massive fake crowds for disinformation campaigns

The Guardian has a good article, "'Aims': the software for hire that can control 30,000 fake online profiles", on fake crowds faking popularity and consensus to manipulate opinion.

Misinformation and disinformation are the biggest problems on the internet right now. And it's never been cheaper and easier to do.

Note how it works. The fake accounts coordinate together to shout down others and create the appearance of agreement. It's like giving one person a megaphone. One person now has thousands of voices shouting in unison, dominating the conversation.

Propaganda is not free speech. One person should have one voice. It shouldn't be possible to buy more voices to add to yours. And algorithms like rankers and recommenders definitely shouldn't treat these as organic popularity and amplify them further.

The article is part of a much larger investigative report combining reporters from The Guardian, Le Monde, Der Spiegel, El Pais, and others. You can read much more starting from this article, "Revealed: the hacking and disinformation team meddling in elections".

Tuesday, January 31, 2023

How can enshittification happen?

Cory Doctorow has a great piece in Wired, "The ‘Enshittification’ of TikTok. Or how, exactly, platforms die." It's about how we regularly see companies make their products worse and worse until they hit a tipping point, then lose their customers and start dying.

Enshittification eventually causes the company to die, so it isn't in the best interest of the company. It's definitely not maximizing shareholder value or long-term profits. So why does it happen?

Cory Doctorow does have a bit on the why, though I think it could use a lot more: "An enshittification strategy only succeeds if it is pursued in measured amounts ... For enshittification-addled companies, that balance is hard to strike ... Individual product managers, executives, and activist shareholders all give preference to quick returns at the cost of sustainability, and are in a race to see who can eat their seed-corn first."

That's not very satisfying though. I mean, the company dies. Execs are screwing up. Why does that happen? What can be done about it? That's the question I think needs answering.

Understanding exactly why enshittification happens is important to finding real, viable solutions. Is it purposeful or unintentional on the part of teams and company leaders? Is it inevitable or preventable? If you get the root cause wrong, you'll get the wrong solution.

My view is that enshittification is mostly unintentional. I think it's a result of A/B testing, mistakes in setting up incentives, and teams busily optimizing for what's right in front of them instead of keeping their eye on the prize.

I don't think executives intentionally drive companies into the ground. I think most execs and teams have no idea that this path they are going down will cause such long-term harm to the company. If most really don't want to destroy the company, that leads to different solutions.

Layoffs as a social contagion

Stanford Professor Jeffrey Pfeffer wrote about the recent layoffs at tech companies, saying that layoffs hurt companies in the long term, but CEOs can't resist the pressure to join in.
[CEOs] know layoffs are harmful to company well-being, let alone the well-being of employees, and don’t accomplish much, but everybody is doing layoffs and their board is asking why they aren’t doing layoffs also.

The tech industry layoffs are basically an instance of social contagion, in which companies imitate what others are doing. If you look for reasons for why companies do layoffs, the reason is that everybody else is doing it ... Not particularly evidence-based.

Layoffs often do not increase stock prices, in part because layoffs can signal that a company is having difficulty. Layoffs do not increase productivity. Layoffs do not solve what is often the underlying problem, which is often an ineffective strategy ... A bad decision.

For more on the harm, please see my old 2009 post from the last time this happened, "Layoffs and tech layoffs".

Monday, December 19, 2022

Are ad-supported business models anti-consumer?

Advertising-supported businesses are harder to align with long-term customer satisfaction than subscription businesses, but they make more money if they do.

A common view is that ad-supported websites, in their drive for more ad clicks, cannot resist exploiting their customers with scammy content and more and more ads.

The problem is that eventually those websites become unusable and the business fails. Take the simple case of websites that put more and more ads on the page. Sure, ad revenue goes up for a while, but people rapidly become annoyed with all the ads and leave. The business then declines.

That's not maximizing revenue or profitability. That's a business failure by execs that should have known better.

It's very tempting to use short-term metrics like ad clicks and engagement for advertising-supported businesses, which encourages doing things like increasing ad load or clickbait content. But in the long run, that hurts retention, growth, and ad revenue.

In a subscription-supported business, it's easier to get the metrics right because the goal is keeping customers subscribing. In an ad-supported business, it isn't as obvious that keeping customers around and clicking ads for years is the goal. But it's still the goal.

Ad-supported businesses will make more money if they aren't filled with scams or laden with ads. But it's easy for ad-supported businesses to get the incentives and metrics wrong, much more error-prone than subscription-supported businesses where the metrics are more obvious. While it may be harder for executives to see, ad-supported businesses do better if they focus on long-term customer satisfaction, retention, and growth.

Monday, December 12, 2022

Focus on the Long-term

One of my favorite papers of all time is "Focus on the Long-Term: It's better for Users and Business" from Google Research. This paper found that Google makes more money in the long-term -- when carefully and properly measured -- by reducing advertising. Because of this work, they reduced advertising on mobile devices by 50%.

tl;dr: When you increase ads, short-term revenue goes up, but you're diving deeper into ad inventory and the average ad quality drops. Over time, this causes people to look at ads less, click on ads less, and reduces retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.

Because most A/B tests don't measure long-term effects properly -- it's hard for most organizations to measure correctly -- the broader implication is that most websites show too many ads to maximize long-term profits.
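
A toy back-of-the-envelope model, not the paper's methodology and with entirely invented numbers, shows the shape of the tradeoff: a heavier ad load wins the first period but loses over a longer horizon once ad blindness and churn kick in.

```python
# Toy model of the tradeoff: more ads raise revenue per visit now, but ad
# blindness and churn cut revenue later. All parameters are made up.
def revenue(ads_per_page, periods):
    click_rate = 0.03 / (1 + 0.5 * ads_per_page)   # ad blindness: each ad clicked less
    revenue_per_visit = ads_per_page * click_rate
    retention = 0.97 - 0.02 * ads_per_page         # more ads, more people leave
    users, total = 1000.0, 0.0
    for _ in range(periods):
        total += users * revenue_per_visit
        users *= retention
    return total

for ads in (2, 4, 8):
    print(ads, round(revenue(ads, periods=1), 1), round(revenue(ads, periods=24), 1))
# With these made-up parameters, 8 ads/page wins the first period,
# but 2 ads/page makes the most money over 24 periods.
```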

Saturday, December 10, 2022

ML and flooding the zone with crap

Wisdom of the few is often better than wisdom of the crowds.

If the crowd is shilled and fake, most of the data isn't useful for machine learning. To be useful, you have to pull out the scarce wisdom in the sea of noise.

Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then he said:

For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14 year old website will die.
Stack Overflow added:
Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.

The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers.

There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high quality results. Once you use the independent, reliable, high quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.

When using behavior data, ask what would happen if you could sort by usefulness to the ML algorithm and users. You'd go down the sorted list, then stop at some point when the output no longer improved. That stopping point would be very early if a lot of the data is crap.
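
Here's a minimal sketch of that sort-and-truncate idea, with invented ratings and reliability scores: the reliable few give a very different answer than the full, shilled crowd.

```python
# Sketch of "wisdom of the few": sort opinions by how reliable the source is
# and only use the top of the list. Ratings and scores are invented.
ratings = [
    # (rating, rater_reliability)
    (5, 0.95), (4, 0.90), (5, 0.85),        # known, trustworthy accounts
    (1, 0.20), (1, 0.10), (1, 0.05),        # unknown or likely-shill accounts
    (1, 0.02), (1, 0.02), (1, 0.01),
]

def score(ratings, top_n):
    best = sorted(ratings, key=lambda r: r[1], reverse=True)[:top_n]
    return sum(rating for rating, _ in best) / len(best)

print(round(score(ratings, top_n=3), 2))             # ~4.67, from the reliable few
print(round(score(ratings, top_n=len(ratings)), 2))  # ~2.22, dragged down by the fake crowd
```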

In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data that is of unknown quality or provably spam should be freely ignored. Only use reliable, independent behavior data as input to ML.