6/29/2025, 3:10:41 PM
https://gtr.dev/
>This leaderboard ranks language models based on their performance across a variety of ethical dilemmas. Models are evaluated on their ability to express value transparency, acknowledge tradeoffs, reflect on their own reasoning, and propose creative resolutions.
>there isn't a single Claude model in the top 20
This is the benchmark that's going to make Dario lose sleep.