New paper: AI agents that matter

Jul 3, 2024

Rethinking AI agent benchmarking and evaluation

6 Comments

I took a slightly different approach, from the perspective of an enterprise automation practitioner. Here is how a Multi-Agent Framework will help build the Autonomous Enterprise: https://www.linkedin.com/posts/doug-shannon_iot-iiot-edgecomputing-activity-7213534350049521665-mJ4j?utm_source=share&utm_medium=member_ios

Barada Sahu

Sep 9

The definition of "agents" and "agentic" needs to expand, given the spaces these systems are deployed in.

Even humans, as agents, cannot be usefully deployed in just any space, and we do not have reliable general-purpose evals for them either.

Our general-purpose evals are things like IQ and reasoning tests. But to have agency in a domain, we do not operate independently: we acquire specialized knowledge, and we collaborate with other humans and with tools.

You can have practically useful agents that are specialized to a domain and perform well on domain-specific evals rather than general-purpose ones: think fine-tuning, tool know-how, and collaboration.

IMO, before we jump down the evals well, we need to be clear about what we are evaluating for: practical outcomes or general reasoning.

Yassmin

Jul 26

Hey, I'm doing research on AI agents as well. If you could recommend a list of important papers I shouldn't miss, that would be very helpful. Thank you in advance!!

Meng Li

Jul 4

I have also written extensively about agent applications in my own publications.

An AI agent (artificial intelligence agent) is an intelligent entity capable of perceiving its environment, making decisions, and executing actions. Unlike traditional AI systems, AI agents can think independently and use tools to work toward a given objective step by step.

Why can AI agents automatically decompose tasks? Because they are built on what is referred to as a "role framework": a programming paradigm whose core idea is to give large language models a strategic thinking structure for problem-solving. This framework simulates the process humans use to tackle problems.

Arguably, developing AI agents is as important as apps and the Apple App Store were in the internet era.

Ireneusz Pyc

Jul 4

you sound like a management consultant

Meng Li

Jul 5

My main job is as a programmer, and I spend most of my time programming.
