AI evaluation
New paper: AI agents that matter
Rethinking AI agent benchmarking and evaluation
Jul 3, 2024 • Sayash Kapoor and Arvind Narayanan
AI Snake Oil
Scientists should use AI as a tool, not an oracle
How AI hype leads to flawed research that fuels more hype
Jun 3, 2024 • Arvind Narayanan and Sayash Kapoor
AI leaderboards are no longer useful. It's time to switch to Pareto curves.
What spending $2,000 can tell us about evaluating AI agents
Apr 30, 2024 • Sayash Kapoor and Arvind Narayanan
Will AI transform law?
The hype is not supported by current evidence
Jan 24, 2024 • Arvind Narayanan and Sayash Kapoor
How Transparent Are Foundation Model Developers?
Introducing the Foundation Model Transparency Index
Oct 18, 2023 • Sayash Kapoor
Evaluating LLMs is a minefield
Annotated slides from a recent talk
Oct 4, 2023 • Arvind Narayanan and Sayash Kapoor
Does ChatGPT have a liberal bias?
A new paper making this claim has many flaws. But the question merits research
Aug 18, 2023 • Arvind Narayanan and Sayash Kapoor
Introducing the REFORMS checklist for ML-based science
ML-based science is in trouble. Clear reporting standards for researchers could help.
Aug 16, 2023 • Sayash Kapoor and Arvind Narayanan
Is GPT-4 getting worse over time?
A new paper going viral has been widely misinterpreted
Jul 19, 2023 • Arvind Narayanan and Sayash Kapoor
Quantifying ChatGPT’s gender bias
Benchmarks allow us to dig deeper into what causes biases and what can be done about it
Apr 26, 2023 • Sayash Kapoor and Arvind Narayanan
OpenAI’s policies hinder reproducible research on language models
LLMs have become privately-controlled research infrastructure
Mar 22, 2023 • Sayash Kapoor and Arvind Narayanan
GPT-4 and professional benchmarks: the wrong answer to the wrong question
OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots.
Mar 20, 2023 • Arvind Narayanan and Sayash Kapoor