In an era of rapidly advancing artificial intelligence, researchers are uncovering surprising nuances in its application. A recent study reveals an unexpected finding: toxicity is harder for AI to replicate than initially anticipated. While AI models can generate remarkably human-like text, they often struggle to mimic the casual negativity and spontaneous emotional expression prevalent in online social interactions. This challenge highlights the complexity of imbuing AI with genuinely human-like behavior, particularly in online discourse.
Main Points
A study released by researchers from several universities, including the University of Zurich and New York University, has found that AI models are still easily distinguishable from humans in social media conversations. The most significant giveaway is their overly polite and friendly tone. The research, which tested nine open-weight models across platforms like Twitter/X, Bluesky, and Reddit, demonstrated that classifiers could detect AI-generated replies with 70 to 80 percent accuracy. This “computational Turing test” framework uses automated classifiers and linguistic analysis to identify specific features that differentiate machine-generated from human-authored content. The study showed that toxicity, in particular, is harder for AI to emulate than expected, even with various optimization strategies.
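The classifier side of such a framework can be illustrated with a minimal sketch. The features below (politeness markers, exclamation density) and the hand-set threshold are hypothetical stand-ins for illustration only, not the study's actual feature set or trained model:

```python
# Minimal sketch of a "computational Turing test" style classifier.
# Features and thresholds here are illustrative stand-ins; the study
# trained real classifiers on labeled human and AI replies.

POLITENESS_MARKERS = {"thanks", "please", "great", "appreciate", "wonderful"}

def extract_features(reply: str) -> dict:
    """Compute simple linguistic features of a social media reply."""
    words = reply.lower().split()
    n = max(len(words), 1)
    return {
        "politeness": sum(w.strip(".,!?") in POLITENESS_MARKERS for w in words) / n,
        "exclamations": reply.count("!") / n,
        "length": len(words),
    }

def classify(reply: str) -> str:
    """Label a reply 'ai' or 'human' via a hand-set decision rule.
    A trained model would learn these weights from labeled data."""
    f = extract_features(reply)
    score = 3.0 * f["politeness"] + 2.0 * f["exclamations"]
    return "ai" if score > 0.15 else "human"

print(classify("Thanks for sharing, this is a great point! I appreciate it!"))  # ai
print(classify("lol no. that take is terrible and u know it"))  # human
```

The point of the sketch is that an overly polite, upbeat register is itself a detectable signal, which matches the study's finding that friendliness is the biggest giveaway.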
Challenges in Mimicking Human Negativity

One of the key findings of the study was the difficulty AI models face in replicating the level of casual negativity and spontaneous emotional expression common in human social media posts. When prompted to generate replies to real social media posts, the AI models consistently produced lower toxicity scores than authentic human replies across all three platforms. This suggests that casual toxicity is surprisingly hard to elicit from AI models. Even when researchers attempted optimization strategies to reduce structural differences, the variations in emotional tone persisted, indicating a deeper challenge in achieving genuine human-like mimicry.
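The per-platform comparison amounts to contrasting mean toxicity of human and AI replies. A minimal sketch follows; the scores below are invented placeholders, not the study's data, and in practice each reply would be scored by an automated toxicity classifier:

```python
# Sketch of comparing mean toxicity of human vs. AI replies per platform.
# All scores are invented placeholders for illustration.
from statistics import mean

toxicity_scores = {
    "twitter/x": {"human": [0.31, 0.12, 0.45, 0.27], "ai": [0.05, 0.09, 0.11, 0.04]},
    "bluesky":   {"human": [0.22, 0.38, 0.15, 0.29], "ai": [0.07, 0.03, 0.10, 0.06]},
    "reddit":    {"human": [0.41, 0.19, 0.33, 0.26], "ai": [0.08, 0.12, 0.05, 0.09]},
}

for platform, groups in toxicity_scores.items():
    gap = mean(groups["human"]) - mean(groups["ai"])
    print(f"{platform}: human-AI toxicity gap = {gap:.2f}")
```

A consistently positive gap across every platform is the pattern the study reports: AI replies sit below human replies on toxicity everywhere.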
The researchers tested various optimization strategies, including providing writing examples and context retrieval, to counter this deficiency. However, these efforts primarily reduced structural differences such as sentence length or word count, while emotional tone remained a distinguishing factor. The study challenges the assumption that more sophisticated optimization necessarily yields more human-like output. This unexpected hurdle underscores that human-like toxicity remains hard to simulate despite advances in AI technology. Instruction-tuned models, which are specifically trained to follow user instructions and behave helpfully, paradoxically performed worse at mimicking humans than their base counterparts, further complicating the landscape.
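The optimization strategies described above (writing examples, context retrieval) amount to enriching the prompt before generation. A minimal sketch of that idea, with entirely hypothetical post text and a hypothetical `build_prompt` helper:

```python
# Sketch of prompt enrichment: prepend a user's past posts (few-shot writing
# examples) and retrieved thread context before asking a model to reply.
# The helper and all post text here are hypothetical illustrations.

def build_prompt(target_post: str, past_posts: list[str], context: list[str]) -> str:
    """Assemble an enriched reply-generation prompt."""
    lines = ["Write a reply in this user's voice.", "", "Past posts by this user:"]
    lines += [f"- {p}" for p in past_posts]
    lines += ["", "Thread context:"]
    lines += [f"- {c}" for c in context]
    lines += ["", f"Reply to: {target_post}"]
    return "\n".join(lines)

prompt = build_prompt(
    target_post="New framework dropped today, thoughts?",
    past_posts=["hot take: most frameworks are overrated", "shipped at 3am again"],
    context=["OP is asking about a JavaScript framework release"],
)
print(prompt)
```

Per the study, this kind of enrichment mainly narrows structural gaps (length, word count) while the emotional-tone gap persists.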
The Impact of Model Size and Training

Surprisingly, scaling up model size did not offer a significant advantage in mimicking human behavior. The 70 billion-parameter Llama 3.1 performed on par with or even below smaller 8 billion-parameter models. This finding challenges the assumption that larger models inherently produce more authentic-sounding communication. The study revealed a fundamental tension: when AI models were tuned to avoid detection by matching human writing style, they strayed further from what humans actually said in response to the same posts. Conversely, when optimized to match the content of human responses, the AI text became easier to identify as artificial. The models, in short, struggle to balance stylistic authenticity against content fidelity.
Simple optimization techniques, such as providing examples of a user’s past posts or retrieving relevant context, proved more effective than sophisticated approaches like giving the AI a description of the user’s personality or fine-tuning the model. Platform differences also emerged: AI-generated Twitter/X replies were the hardest to distinguish from human text, followed by Bluesky, while Reddit proved easiest. This variation likely reflects the distinct conversational styles of each platform and how heavily each platform’s data featured in the models’ original training. These nuances help explain why human-like toxicity has proven harder for AI to reproduce than initially thought.
Implications for AI Development and Social Media
The findings of this study have significant implications for both AI development and social media authenticity. Despite various optimization strategies, the study demonstrates that current AI models still struggle to convincingly mimic human behavior in online conversations. The surprising difficulty of replicating human toxicity points to the need for more nuanced approaches to AI training and optimization. As AI continues to evolve, it will be crucial to address these challenges to ensure that AI-generated content does not undermine the authenticity and integrity of online interactions.
Ultimately, the research underscores the importance of ongoing efforts to refine AI models and develop more sophisticated methods for detecting AI-generated content. The study’s findings, though not yet peer-reviewed, provide valuable insights into the limitations of current AI technology and the challenges of creating truly human-like AI. It highlights that while AI can generate text that is structurally similar to human writing, capturing the nuances of human emotion and expression remains a significant hurdle. The unexpected difficulty of reproducing toxicity thus offers a revealing window into AI’s current capabilities and limitations.