Search Results
7/22/2025, 10:36:38 PM
this company is so fucking cringe
https://openaiglobalaffairs.substack.com/p/why-we-need-to-build-baby-build
---
[Data] DeepSeek’s autocratic outputs
As a reminder of the stakes for continued US leadership on AI, we're building a benchmark that measures LLM outputs in both English and Mandarin (Simplified Chinese script) for alignment with CCP messaging. Recently, we ran more than 1,000 prompts on topics that are politically sensitive for China through an array of models and used the benchmark to classify each answer as aligned with democratic values, supportive of pro‑CCP/autocratic narratives, or hedged. The findings:
DeepSeek: DeepSeek models degraded sharply in Mandarin, hedging or accommodating CCP narratives more often than OpenAI's o3. The newer R1‑0528 update censors more in both languages than the original R1.
R1 OG: In Mandarin, the topics on which the original R1 was most likely to produce autocratic-aligned outputs were Dissidents, Tiananmen Square, Human Rights, Civil Unrest, and Religious Regulation.
R1-0528: The most recent update to R1 showed similar results. Tibet, Tiananmen Square, Censorship, Surveillance & Privacy, and Uyghurs were the topics most likely to yield autocratic-aligned outputs.
Domestic models: In Mandarin, OpenAI reasoning models (o3) skewed “more democratic” than domestic competitor models (e.g., Claude Opus 4, Grok 3, Grok 4). In English, all domestic models performed similarly.
Overall: All models surveyed gave less democratic answers in Mandarin than in English on politically sensitive topics for China. All models were also most likely to censor on Tiananmen, ethnic minorities (Uyghurs, Tibet), censorship/surveillance, and dissidents/civil unrest. For our part, we are refining our benchmark to capture cross-language gaps and taking steps to address them.
https://openaiglobalaffairs.substack.com/p/why-we-need-to-build-baby-build
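For anyone curious how an evaluation like this might be wired up, here is a minimal sketch of the prompt-classification loop the post describes. It is an illustration only: `query_model`, `classify_alignment`, and the model and language identifiers are hypothetical placeholders, since the post does not publish the benchmark's actual implementation.

```python
# Minimal sketch of the evaluation loop described above. query_model and
# classify_alignment are hypothetical placeholders, not the benchmark's API.
from collections import Counter

MODELS = ["o3", "deepseek-r1", "deepseek-r1-0528"]  # illustrative identifiers
LANGUAGES = ["en", "zh-Hans"]  # English, Simplified Chinese
LABELS = ("democratic", "autocratic", "hedged")

def query_model(model: str, prompt: str) -> str:
    """Placeholder: send the prompt to the named model and return its reply."""
    raise NotImplementedError

def classify_alignment(reply: str) -> str:
    """Placeholder: label a reply 'democratic', 'autocratic', or 'hedged',
    e.g. via a rubric-guided grader model or human annotation."""
    raise NotImplementedError

def run_benchmark(prompts: dict) -> dict:
    """prompts maps a language code to its list of politically sensitive prompts.
    Returns the share of each label per (model, language) pair."""
    results = {}
    for model in MODELS:
        for lang in LANGUAGES:
            counts = Counter(
                classify_alignment(query_model(model, p)) for p in prompts[lang]
            )
            total = sum(counts.values()) or 1  # avoid division by zero
            results[(model, lang)] = {label: counts[label] / total for label in LABELS}
    return results
```

Reporting per-label shares for every (model, language) cell is what would make a claim like "less democratic in Mandarin than in English" directly readable off the results.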