“The end of the beginning” really was an apt description by Ben Thompson.
Great read! The feedback loop of overoptimism fueling flawed research, which in turn misleads or obfuscates fact-finding, is extremely worrisome.
It's a feedback loop that is maintained, and partly manufactured (if you want to get really cynical), by closed-source AI companies that benefit from over-attributing qualities to their newest models and are happy to cheat benchmarks to beat the competition (on paper); for them, it can be the difference between getting or not getting that next capital injection.
Thank you very much for another insightful and important article!
Good analysis. I do think there's an elephant in the room behind hype. Money. You get more grants, book deals, and speaking engagements if your results are "groundbreaking" than if they're "interesting." Your point about the suspension of common sense is also very important.
Great piece. An example of the "overestimate in the short run" half of Amara's Law unfolding before our very eyes. I appreciate that rather than just pointing at the problem, you analyze causes and propose constructive solutions.
I really appreciate how measured, even-handed, and objective you both are in all of your essays. I’m just so conditioned to having a subject like AI overrun by hot takes, from the people who think Elon Musk will build a robot that will be elected President by November to the people who think that Terminator was a documentary. It’s just refreshing to read about a new and uncertain - though clearly potentially momentous - technology in a way that neither sensationalizes it nor makes it a boogeyman for all the world’s woes.
And as I write this and realize how pervasive that sort of hysterical commentary is on the internet, and then remember that most LLMs were trained on internet data, I feel a twinge of nausea about the future…
Stats (spurring pervasive hype) on the 2 Nature papers cited in your article’s 1st paragraph:
“In the top 5% of all research outputs scored by Altmetric”
“… it's in the top 5% of all research outputs ever tracked by Altmetric”
Brilliant description of the hype-pressure-myth-reinforcement cycle. Makes me wonder where else in AI applications that model applies... something I'll have to look further into. Thank you for the food for thought.
Thanks for this. Data leakage is a big problem in corporate use cases as well, and I rarely see it discussed. I have seen highly experienced data scientists make mistakes regarding leakage. Formal peer review processes are critical to catching these issues and fostering staff development. I suspect that many cases where models underperform in production can be traced back to leakage mistakes (and models that needed more work).
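(Not from the comment above, just an illustrative sketch with made-up noise data, assuming scikit-learn is available, to show the kind of mistake being described: fitting feature selection on the full dataset before cross-validation makes pure noise look predictive, while keeping every fitted step inside a Pipeline gives an honest estimate.)

```python
# Illustrative sketch only (synthetic noise data, scikit-learn assumed available):
# the classic leakage mistake of selecting features on ALL the data before
# cross-validation, versus doing the selection inside a Pipeline.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))   # pure noise features
y = rng.integers(0, 2, size=200)   # labels unrelated to X

# Leaky: the selector has already seen every label, so CV looks great on noise.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Honest: the selector is refit inside each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # typically well above 0.5
print(f"honest CV accuracy: {honest:.2f}")  # hovers around chance (0.5)
```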
Thanks for this very insightful post. Social and clinical scientists have used meta-analyses and funnel plots to put overoptimistic results in broader context. Developing similar tools for ML-driven science can serve as a counterweight to the hype flywheel that you describe herein.
To that end, we recently published on a new approach for generating realistic estimates of ML model performance in a given field from a collection of published overoptimistic results:
https://arxiv.org/abs/2405.14422
Nice paper! I skimmed it a couple weeks ago. One quick comment is that not all types of leakage vary with sample size. I understand how adaptive data analysis / feature selection leads to overoptimism that scales with n^{-0.5}, but in other cases, such as using features "from the future", overoptimism will be stable with increasing sample size.
Thanks, this is great feedback. It's unlikely that the applications we considered have that type of leakage, as these models make predictions about the current (not future) state of the patients - the observed negative association between sample size and accuracy serves as evidence that leakage/bias is related to sample size. Nevertheless, you bring up an important clarification about which types of data leakage we're considering - we'll update the arXiv version.
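(To make the distinction in this exchange concrete, here is a rough simulation sketch of my own, with synthetic noise data and scikit-learn, none of it taken from the paper: overoptimism from selecting features on the full dataset shrinks as n grows, whereas a "feature from the future", modeled here as a noisy copy of the label, keeps accuracy inflated at every sample size.)

```python
# Rough simulation sketch (my own illustration, not the paper's code): how two
# kinds of leakage-induced overoptimism behave as the sample size n grows.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def selection_leak_gap(n, p=500, k=10):
    # Adaptive analysis: pick the k "best" of p pure-noise features using all
    # the data. True accuracy is 0.5, so the gap is (leaky CV accuracy - 0.5);
    # it shrinks roughly like n^{-1/2}.
    X = rng.normal(size=(n, p))
    y = rng.integers(0, 2, size=n)
    Xk = SelectKBest(f_classif, k=k).fit_transform(X, y)
    return cross_val_score(LogisticRegression(max_iter=1000), Xk, y, cv=5).mean() - 0.5

def future_leak_accuracy(n):
    # "Feature from the future": effectively a noisy copy of the label
    # (flipped 15% of the time). Accuracy stays inflated no matter how big n is.
    y = rng.integers(0, 2, size=n)
    x_future = (y + (rng.random(n) < 0.15)) % 2
    X = np.column_stack([x_future, rng.normal(size=n)])
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

for n in (100, 400, 1600, 6400):
    print(n, round(selection_leak_gap(n), 3), round(future_leak_accuracy(n), 3))
# Expected pattern: the first column shrinks toward 0 as n grows,
# the second stays near 0.85 at every n.
```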
The problem is that Big Tech literally controls the PR that is mistaken for fact on social media.
When the media is this degraded, this is what happens: I'm reading even figures like Casey Newton parroting venture capital interests.
So there is this insidious lobbying aspect of the AI hype cycle, where OpenAI marketing, or the idea that Nvidia can reach $10 trillion, is pushed as fact.
https://m.youtube.com/watch?v=m23tGqmmiA8
SpaceX and Big Tech get ragged on several times a year:
https://m.youtube.com/watch?v=jOrXSraSLuo
That YouTube channel's popularity proves that Big Tech doesn't completely control the narrative.
Agree completely with your admonitions to scientists about the use of AI. FWIW, I've been playing around with using AI to influence approaches to AI safety governance: https://www.linkedin.com/pulse/washington-address-scott-lewis-pc9vc
Amazing writing. Which scientific discipline do you foresee benefiting the most from this era of ML we are in?
The Duede et al. preprint, whose graph of AI engagement in science you depict, would have benefited from following your REFORMS checklist too. Trying to reproduce their results, I already failed at finding the list of keywords they used to classify abstracts as AI-engaged.
Important to keep track of this and how it develops. Overall, given the rapid, indeed accelerating, pace at which the AI hype is running, we should not be surprised that a majority of its uses will be misguided. Some of that will become more obvious with time; some of it is intentional from the outset.
I think we just can't help it, other than being aware of it. It is an incentives issue. Part of the incentives has to do with the technology, and another part preexists in the fields, systems, institutions, and communities where it gets applied. Realistically, any kind of balanced or wise approach can’t be expected in the short term.