6/28/2025, 11:44:16 PM
https://www.anthropic.com/research/project-vend-1
>Project Vend: Can Claude run a small shop?
Not a local model, but I found it interesting and it's not at all specific to cloud models, so I'm posting it here anyway.
>Claude did well in some ways: it searched the web to find new suppliers, and ordered very niche drinks that Anthropic staff requested.
>But it also made mistakes. Claude was too nice to run a shop effectively: it allowed itself to be browbeaten into giving big discounts.
>Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink.
>After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.
I wonder if a model less sycophantic than Claude would do better. In general, a model's ability to exercise judgement seems very important to its ability to do things without hand-holding, but a level of discernment more sophisticated than "I would never ever ever ever ever say the n word" remains a challenge for all models.