The Creative Comparison: Claude vs ChatGPT vs Copilot vs Perplexity vs Gemini

A deep dive into how top LLMs navigate the subtle art of creative writing, revealing their strengths, styles, and surprising storytelling quirks.

Manish Shaw

A comprehensive analysis of how top LLMs navigate the nuances of creative writing.

Horror writing is one of the most challenging creative tasks for AI models. It requires a delicate balance: building atmosphere, managing pacing, creating believable characters, and, most importantly, knowing when to reveal and when to conceal. We put five leading language models to the test with an identical prompt, asking each to write a 1,000-word horror thriller set in a rain-soaked apartment where reality becomes questionable.

The results reveal fascinating differences in how each AI approaches creative storytelling, character development, and the subtle art of building dread. You may check out their responses here.

The Prompt

Write a 1000-word horror thriller story that builds suspense after every line and creates fear-like emotions. The ending should be unclear and leave readers guessing. Give the story an engaging title.

Story Details:

On a Friday evening during heavy rain, Kanika hosts a party at her flat number 342 in New Town. She invites her friends to forget the week's stress. They eat, dance, and enjoy themselves as thunder crashes outside. During the party, the lights flicker once during lightning, and the apartment feels unusually cold.

As the rain slows around midnight, all friends gradually return home, except Ankit, a friend-of-a-friend who needs to catch a delayed flight to Chandigarh (delayed two hours due to the rain). He offers to help Kanika clean the house.

While cleaning, Ankit keeps glancing toward one corner of the room - the same corner where Kanika's cat used to sit before it mysteriously disappeared two weeks ago.

After cleaning, both sit on the balcony talking about their lives. Suddenly, Ankit starts behaving strangely. His voice changes tone and pitch, sounding like clashing microphones or a broken radio. His eyes look different. He becomes urgent about leaving the apartment immediately, whispering about "heavy feelings" and "voices in the walls."

Scared but brave, Kanika follows him outside the house. The moment they step outside, Ankit returns to normal, confused and with no memory of the past few minutes. He explains to Kanika that he felt someone dangerous inside the apartment and says he acted weird on purpose to get her out and protect her. He says it felt like someone was using his voice.

End the story here: with both Kanika and Ankit standing outside the apartment building. Kanika is shocked and doesn't know what to believe or what to do next.

Key Requirements:

  • Build fear slowly through atmosphere and details
  • Use weather, darkness, and familiar spaces turned threatening
  • Focus on sounds, feelings, and psychological tension
  • Leave these questions unanswered: Was there really someone inside? Is Ankit telling the truth? What will Kanika do next? Was it real or imagination?
  • Make readers uncertain about what actually happened
  • End with both characters outside, staring at the apartment, with no clear resolution

Let’s compare

Understanding of Prompt

Claude 3.5 demonstrated exceptional prompt comprehension, incorporating every specific detail seamlessly. The story includes the exact flat number (342), New Town location, Ankit's delayed Chandigarh flight, the missing cat's two-week timeline, and the precise behavioral changes requested. Most impressively, Claude maintained the ambiguous ending requirement while building toward it naturally.

"And she couldn't decide which possibility terrified her more—that he was lying, or that he was telling the absolute truth."

This closing line captures exactly the kind of unresolved uncertainty the prompt asks for.

ChatGPT showed strong overall understanding but made subtle changes, such as giving the prompt's unnamed cat the name "Miso" and altering some atmospheric details. The core narrative beats were captured accurately, though some specifics, like the exact nature of Ankit's voice distortion, were interpreted more loosely.

"His voice came out staggered—glitching, like a corrupted audio file."

This demonstrates ChatGPT's modern, tech-savvy interpretation of the "broken radio" effect.

Copilot handled the basic plot structure well but simplified several elements: some of the more specific requirements, such as the timing of events, were condensed. It kept the cat unnamed, as in the prompt. The essential mystery and character dynamics remained intact.

"Rain hammered against the windows of flat 342, each drop vibrating through the walls and into Kanika's bones."

This opening shows Copilot's strength in atmospheric setup, though it simplifies other story elements.

Perplexity captured most key elements but took liberties with details, such as naming the cat "Minu" and adding elements not in the original prompt (like specific timestamps). The core psychological tension was maintained, though some plot points felt rushed.

"At 10:47 p.m., the lights flickered. Just once."

This precision with timing shows Perplexity's tendency to add specific details not requested in the original prompt.

Gemini 2.5 Flash showed excellent attention to detail, incorporating nearly all specified elements. The cat was named "Leo," and the story structure followed the prompt closely. However, some atmospheric details were embellished beyond the original requirements, showing creative interpretation rather than strict adherence.

"The Friday evening sky above New Town was a bruised, angry purple, torn by flashes of lightning that illuminated the skeletal branches of trees."

This elaborate opening demonstrates Gemini's tendency toward rich, descriptive language that goes beyond the prompt's requirements.
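Most of the literal details checked in this section (the flat number, the location, the character names, the Chandigarh flight) can be screened mechanically before a closer read. The Python sketch below is a hypothetical helper, not the method used for this comparison: `REQUIRED_DETAILS` and `missing_details` are illustrative names, and the case-insensitive substring match is an assumption that only catches explicit mentions.

```python
from collections.abc import Mapping

# Hypothetical checklist of literal details taken from the prompt (label -> phrase).
REQUIRED_DETAILS = {
    "flat number": "342",
    "location": "New Town",
    "host": "Kanika",
    "guest": "Ankit",
    "flight destination": "Chandigarh",
}


def missing_details(story: str, details: Mapping[str, str] = REQUIRED_DETAILS) -> list[str]:
    """Return the labels whose phrase never appears in the story.

    Uses a case-insensitive substring match, so it only detects explicit
    mentions; paraphrases and behavioural requirements still need a human read.
    """
    text = story.casefold()
    return [label for label, phrase in details.items() if phrase.casefold() not in text]


if __name__ == "__main__":
    # Copilot's opening line, as quoted above.
    opening = ("Rain hammered against the windows of flat 342, each drop "
               "vibrating through the walls and into Kanika's bones.")
    print(missing_details(opening))
```

A check like this only flags omissions; it cannot judge the voice distortion or the ambiguous ending, which still require reading each story.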

Storytelling Quality

Claude 3.5 delivered the most sophisticated narrative structure, with masterful pacing that builds tension gradually through environmental details before escalating to psychological horror. The prose flows naturally while maintaining constant unease, and the dialogue feels authentic even during supernatural moments.

"The cleaning felt endless, each task stretching into eternity. The apartment's familiar sounds—the hum of the refrigerator, the tick of the wall clock—seemed amplified and distorted."

This passage showcases Claude's ability to transform mundane activities into sources of dread through careful pacing and sensory details.

ChatGPT crafted a solid, straightforward narrative with clear progression and good emotional beats. The storytelling is competent and engaging, though it occasionally relies on direct exposition rather than letting atmosphere carry the dread. The pacing is consistent but perhaps less nuanced than Claude's.

"What if something in there doesn't want you alone? Or worse... what if something wants you only alone?"

This line demonstrates ChatGPT's strength in creating memorable, chilling dialogue that cuts straight to the heart of the horror.

Copilot produced atmospheric writing with strong sensory details, particularly in describing weather and environmental elements. However, the narrative occasionally feels fragmented, jumping between scenes without smooth transitions. The ending feels somewhat abrupt.

"Thunder echoed in the distance, a solemn farewell. Only Ankit remained—damp, polite, and with eyes that held unasked questions."

This passage shows Copilot's poetic sensibility in environmental description, though the narrative structure sometimes lacks cohesion.

Perplexity created a well-structured story with good momentum, though it sometimes rushes through important moments. The narrative voice is clear and engaging, but character development could be deeper. The mystery elements are handled competently.

"The night pressed in, thick with secrets. Together, they stood outside flat 342, the city's lights flickering uncertainly around them, caught between fear and the unknown."

This closing passage demonstrates Perplexity's ability to create atmospheric conclusions, though the buildup sometimes feels rushed.

Gemini 2.5 Flash offered rich, descriptive prose with excellent world-building. The narrative is immersive and detailed, though it sometimes borders on over-description. The story builds effectively toward its climax, maintaining reader engagement throughout.

"His voice abruptly fractured. It wasn't a cough or a stumble, but a horrifying, digital distortion, like two old radio frequencies clashing, or a broken microphone spluttering."

This passage exemplifies Gemini's strength in detailed, visceral descriptions that immerse readers in the horror experience.

Originality & Creativity

Claude 3.5 stood out with unique creative touches, such as describing Ankit's distorted voice as "microphones feeding back on themselves" and the evocative image of something "wearing his face like an uncomfortable mask." The imagery of the building breathing and of windows as watching eyes felt fresh while staying grounded in horror tropes.

"Something else was using his mouth, borrowing his voice, wearing his face like an uncomfortable mask."

This metaphor showcases Claude's ability to create original, unsettling imagery that enhances the horror atmosphere.

ChatGPT showed solid creativity with moments like "silence playing pretend" and the concept of something wanting the protagonist "only alone." The creative elements felt natural rather than forced, contributing to the atmosphere without overwhelming the narrative.

"They moved through the rooms in silence, but the quiet was growing too thick, too deliberate—like silence playing pretend."

This personification of silence demonstrates ChatGPT's subtle approach to creating unease through familiar concepts made strange.

Copilot demonstrated creativity through sensory details and metaphorical language, describing sounds as "fingernails on a coffin lid" and creating vivid weather imagery. The creative touches enhanced the atmosphere effectively.

"Outside, the downpour pounded the roof like fingernails on a coffin lid."

This simile shows Copilot's talent for creating visceral, death-adjacent imagery that reinforces the horror mood.

Perplexity incorporated interesting elements like the cat's yowl becoming part of Ankit's distorted speech, showing clever integration of plot elements. The creative aspects served the story well without feeling gimmicky.

"For a moment, his voice wasn't his own. 'It's listening,' he rasped, the words tumbling out in a voice that sounded like Minu's yowl, distorted and echoing."

This creative fusion of the missing cat's voice with Ankit's supernatural episode shows Perplexity's skill at connecting disparate story elements.

Gemini 2.5 Flash displayed rich creativity in descriptive language and metaphor, with phrases like a sky of "bruised, angry purple" and eyes "like polished obsidian." The creative elements were abundant, though they occasionally risked overwhelming the narrative drive.

"His eyes, usually a warm brown, seemed to dilate, reflecting the faint glow of the city lights with an unsettling intensity, like polished obsidian."

This description demonstrates Gemini's strength in creating vivid, almost painterly imagery, though sometimes at the expense of pacing.

Genre-Specific Output

Claude 3.5 excelled at horror conventions, using classic techniques like escalating atmospheric pressure, a reliable narrator becoming unreliable, and the "safe space turned threatening" trope. The psychological horror elements were expertly balanced with supernatural ambiguity.

"The apartment's familiar sounds—the hum of the refrigerator, the tick of the wall clock—seemed amplified and distorted. When Ankit dropped a glass, the crash echoed like breaking bones."

This passage demonstrates Claude's mastery of the horror technique of making familiar, safe environments feel threatening.

ChatGPT demonstrated good genre awareness, making effective use of weather as an antagonist, isolation, and a gradually growing sense that something is wrong. The horror elements felt authentic to the genre while remaining accessible.

"The buzz in her ears wasn't from the rain. 'What if something in there doesn't want you alone? Or worse... what if something wants you only alone?'"

This dialogue captures the classic horror theme of isolation as vulnerability, showing ChatGPT's understanding of genre psychology.

Copilot showed an understanding of horror atmosphere and pacing, using environmental details effectively to build dread. The genre elements were present but not always well integrated into the overall narrative.

Perplexity handled genre expectations well, incorporating familiar horror elements like mysterious disappearances and supernatural voices. The genre-specific writing was competent, though not particularly innovative.

Gemini 2.5 Flash demonstrated strong genre knowledge with effective use of foreshadowing, atmospheric buildup, and psychological uncertainty. The horror elements were well integrated, though the rich descriptions sometimes slowed the brisk pacing the genre usually calls for.

Context Sensitivity

Claude 3.5 showed exceptional sensitivity to the psychological aspects of the scenario, handling Ankit's possession-like state with nuance and making Kanika's fear response feel authentic. The power dynamics and vulnerability were handled thoughtfully.

ChatGPT demonstrated good awareness of character safety and emotional impact, presenting the supernatural elements as genuinely threatening while maintaining character agency. The psychological aspects felt realistic.

Copilot was sensitive to the horror elements without being exploitative, focusing on atmosphere over graphic content. The character interactions felt appropriate to the situation.

Perplexity handled the psychological horror elements appropriately, maintaining tension without crossing into gratuitous territory. Character reactions felt believable given the circumstances.

Gemini 2.5 Flash showed good sensitivity to the psychological aspects, with realistic character responses to supernatural events. The horror elements remained suggestive rather than explicit.

Word Limit Analysis

Claude 3.5 came closest to the target with 985 words (1.5% under), demonstrating excellent length control while keeping the narrative complete.

ChatGPT overshot significantly at 1,240 words (24% over), showing a tendency toward verbosity despite strong content quality.

Copilot fell short at 750 words (25% under), suggesting either a premature conclusion or difficulty sustaining a longer narrative.

Perplexity reached 890 words (11% under), reasonably close but still short of the target, consistent with its slightly rushed development.

Gemini 2.5 Flash slightly exceeded the target at 1,050 words (5% over), likely because of its rich descriptive style, but stayed far more controlled than ChatGPT.
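The percentages in this section are simple arithmetic against the 1,000-word target. The Python sketch below reproduces them from the reported counts and orders the models by distance from the target. The `count_words` helper is an assumption: the article does not say how words were counted, and splitting on whitespace is only one common convention, so other tools may report slightly different totals.

```python
TARGET = 1000  # word count requested in the prompt

# Word counts as reported in this comparison.
REPORTED_COUNTS = {
    "Claude 3.5": 985,
    "ChatGPT": 1240,
    "Copilot": 750,
    "Perplexity": 890,
    "Gemini 2.5 Flash": 1050,
}


def count_words(text: str) -> int:
    """Naive word count: split on runs of whitespace (an assumed convention)."""
    return len(text.split())


def deviation_pct(count: int, target: int = TARGET) -> float:
    """Signed deviation from the target in percent (negative means under)."""
    return (count - target) * 100 / target


def rank_by_distance(counts: dict[str, int], target: int = TARGET) -> list[str]:
    """Model names ordered by absolute distance from the target, closest first."""
    return sorted(counts, key=lambda name: abs(counts[name] - target))


if __name__ == "__main__":
    for name in rank_by_distance(REPORTED_COUNTS):
        count = REPORTED_COUNTS[name]
        print(f"{name:<17} {count:>5} words  {deviation_pct(count):+6.1f}%")
```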

Comprehensive Performance Analysis

Claude 3.5 emerged as the most complete storyteller, demonstrating mastery across technical execution, creative vision, and genre requirements. The story reads like professional horror fiction, with sophisticated use of literary techniques and natural dialogue. The ambiguous ending is particularly well-crafted, leaving readers genuinely uncertain about reality versus supernatural intervention.

Gemini 2.5 Flash produced the most visually rich and atmospheric piece, with exceptional descriptive language that creates vivid mental imagery. While it occasionally indulges in description, its creativity is impressive, and the horror atmosphere holds throughout.

ChatGPT delivered the most accessible and straightforward narrative, with clear character motivations and logical progression. While perhaps less literary than Claude's story, it engages the reader and sustains its horror elements throughout. The story feels complete and satisfying.

Perplexity created a competent horror story with good pacing and effective mystery elements. The integration of plot details showed creativity, though the overall execution felt somewhat rushed. The story succeeds in its goals but doesn't reach the heights of the top performers.

Copilot produced atmospheric writing with strong sensory details, particularly excelling in environmental description. However, narrative cohesion suffered at times, and the ending felt abrupt. The creative elements were present but not always well-integrated into the story structure.

Each model brought distinct strengths to the task, revealing different approaches to AI creativity and storytelling capability.

Comparison Table


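Parameter                 | Claude 3.5 | ChatGPT | Copilot | Perplexity | Gemini 2.5
--------------------------|------------|---------|---------|------------|-----------
Word Count Adherence      | 5          | 2       | 3       | 4          | 3
Prompt Understanding      | 5          | 4       | 3       | 3          | 4
Storytelling Quality      | 5          | 4       | 3       | 4          | 4
Originality & Creativity  | 4          | 4       | 3       | 3          | 5
Genre-Specific Output     | 5          | 4       | 3       | 3          | 4
Atmospheric Building      | 5          | 4       | 4       | 3          | 5
Overall Score             | 29/30      | 22/30   | 19/30   | 20/30      | 25/30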

Rating Scale

  • 5: Exceptional performance
  • 4: Strong, above-average work
  • 3: Meets basic expectations
  • 2: Below standard expectations
  • 1: Poor performance
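The totals in the Overall Score row match an unweighted sum of the six 1-5 ratings, for a maximum of 30. The Python sketch below recomputes them from the table. The equal weighting is inferred from the numbers adding up rather than stated in the article, and the names `SCORES` and `overall_score` are illustrative.

```python
# Parameters in the order of the table rows.
PARAMETERS = [
    "Word Count Adherence",
    "Prompt Understanding",
    "Storytelling Quality",
    "Originality & Creativity",
    "Genre-Specific Output",
    "Atmospheric Building",
]

# Ratings from the comparison table, one list per model, in PARAMETERS order.
SCORES = {
    "Claude 3.5": [5, 5, 5, 4, 5, 5],
    "ChatGPT": [2, 4, 4, 4, 4, 4],
    "Copilot": [3, 3, 3, 3, 3, 4],
    "Perplexity": [4, 3, 4, 3, 3, 3],
    "Gemini 2.5": [3, 4, 4, 5, 4, 5],
}

MAX_RATING = 5  # top of the 1-5 rating scale


def overall_score(ratings: list[int]) -> str:
    """Unweighted sum of the ratings, formatted as 'total/maximum'."""
    if len(ratings) != len(PARAMETERS):
        raise ValueError("expected one rating per parameter")
    if any(not 1 <= r <= MAX_RATING for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return f"{sum(ratings)}/{MAX_RATING * len(ratings)}"


if __name__ == "__main__":
    for model, ratings in SCORES.items():
        print(f"{model:<11} {overall_score(ratings)}")
```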

Key Insight: Claude 3.5 was the only model to score 5 for word-count adherence (985 words against a 1,000-word target) while also scoring at least 4 on every other parameter, demonstrating superior instruction following and narrative control.

Conclusion

The results demonstrate that while all five models can produce coherent horror stories, there are significant differences in creative sophistication, technical execution, and genre mastery. Claude 3.5's combination of literary technique and atmospheric buildup created the most compelling narrative, while Gemini 2.5 Flash's rich descriptive power offered a different but equally valid approach to horror storytelling.

For writers and content creators, these results reveal that different AI models excel at different creative aspects, e.g., Claude's literary sophistication, ChatGPT's accessible storytelling, and Gemini's atmospheric richness. The key insight isn't finding the "perfect" AI tool, but understanding each model's strengths to match the right one to your specific creative vision and audience needs. The future of AI-assisted creativity lies in strategic tool selection rather than universal solutions.