# Combined Exhaustive Prompting Guide for LTX-2.0: Official and Web-Searched Insights

This document merges the original official LTX-2 Prompting Guide (from https://ltx.io/model/model-blog/prompting-guide-for-ltx-2) with updated web-searched insights (as of January 22, 2026), prioritizing Hugging Face (huggingface.co/Lightricks/LTX-2) and official LTX sites (ltx.video, ltx.studio, ltx.io). No data is lost; overlapping sections are integrated for completeness, with expansions from secondary sources (e.g., arXiv papers, community discussions, and integrations such as ComfyUI and ElevenLabs). Core official content remains primary; additions extend it without contradicting it.

The guide is structured for efficient AI consumption: hierarchical headings, bullet points for do's and don'ts, verbatim examples, and categorized terms. When using it for prompt creation, focus on story-driven, chronological descriptions and iterate with tools like negative prompts.

## Introduction and Overview

To get the most out of the LTX-2 model, a good prompt will make all the difference. The key is painting a complete picture of the story you’re telling that flows naturally from beginning to end, covering all the elements the model needs to bring your vision to life. If you’re new to writing prompts for video, this guide will help you construct an effective prompt.

LTX-2 is a DiT-based audio-video foundation model for synchronized video and audio generation, released open-source in early 2026 (arXiv:2601.03233). It supports text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), audio-to-video, and more. From Hugging Face: "Prompt following is heavily influenced by the prompting-style. The more elaborate the better." Official guides emphasize story-driven, chronological prompts, written like directions for a scene. Prompts should be elaborate (up to 200 words), descriptive, and flow naturally. Recent updates include an ElevenLabs partnership for audio-to-video and native ComfyUI support (Jan 5, 2026).
The model excels at 4K video with audio, but outputs may not perfectly match prompts, so expect to iterate.

### General Technical Tips (Merged from Official and HF/Official Docs)

- Resolution: Width and height should be divisible by 32, and the frame count should be a multiple of 8 plus 1. Pad with -1 and crop if needed.
- For I2V: Avoid high-res or highly detailed inputs to prevent static outputs; use landscape ratios.
- Use distilled models for fast iteration (8 steps, CFG=1) and the base model for training.
- Enhance prompts in pipelines (`enhance_prompt=True`).
- Limitations: The model may amplify biases or generate inappropriate content; audio without speech may be lower quality; it does not provide factual information.
- New: Supports Inference Providers like WaveSpeed and fal; 100+ Spaces; upscalers for resolution/FPS.
- Integrations: ComfyUI (native nodes), Diffusers, PyTorch (Python 3.12+).
- Training: The base model is trainable; LoRAs/IC-LoRAs for motion/style (under 1 hour).

### Example Prompt 1 (Official):

An action packed, cinematic shot of a monster truck driving fast towards the camera, the truck passes the cameras it pans left to follow the trucks reckless drive. dust and motion blur is around the truck, hand held feel to the camera as it tries to track its ride into the distance. the truck then drifts and turns around, then drives back towards the camera until seen in extreme close up.

### Example Prompt 2 (Official):

A warm sunny backyard. The camera starts in a tight cinematic close-up of a woman and a man in their 30s, facing each other with serious expressions. The woman, emotional and dramatic, says softly, “That’s it... Dad’s lost it. And we’ve lost Dad.” The man exhales, slightly annoyed: “Stop being so dramatic, Jess.” A beat. He glances aside, then mutters defensively, “He’s just having fun.” The camera slowly pans right, revealing the grandfather in the garden wearing enormous butterfly wings, waving his arms in the air like he’s trying to take off. He shouts, “Wheeeew!” as he flaps his wings with full commitment. The woman covers her face, on the verge of tears.
The tone is deadpan, absurd, and quietly tragic.

### Example Prompt 3 (from HF Discussions, Updated for Relevance):

The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks.

## Key Aspects to Include (Do's: Essential Elements for Effective Prompts)

- Establish the shot: Use cinematography terms that match your preferred film genre. Include aspects like scale or specific category characteristics to further refine the style you’re looking for. Use shot terms (e.g., close-up, wide shot) and specify the genre (e.g., film noir, sci-fi).
- Set the scene: Describe lighting conditions, color palette, surface textures, and atmosphere to shape the mood.
- Describe the action: Write the core action as a natural sequence, flowing from beginning to end. Keep the sequence chronological and in present tense.
- Define your character(s): Include age, hairstyle, clothing, and distinguishing details. Express emotions through physical cues.
- Identify camera movement(s): Specify when the view should shift and how. Including how subjects or objects appear after the camera motion gives the model a better idea of how to finish the motion. Typical moves include dolly in, pan, and tilt; relate each one to the subject.
- Describe the audio: Use clear descriptions for ambient sounds, music, audio, and speech. For dialogue, place the text between quotation marks and (if required) mention the language and accent you would like the character to have.

- From Reddit/HF: Start I2V prompts with "A cinematic scene of" for a better motion trigger. Be descriptive: instead of "woman walking," say "woman walks until she finds a door."
- From RunDiffusion (Oct 2025): Start with "A cinematic scene of" for I2V motion.
- From fal.ai (Jan 2026): For I2V, describe motion and camera work rather than static details already present in the image.

### Example Prompt for Key Aspects (Official):

INT. OVEN – DAY. Static camera from inside the oven, looking outward through the slightly fogged glass door. Warm golden light glows around freshly baked cookies. The baker’s face fills the frame, eyes wide with focus, his breath fogging the glass as he leans in. Subtle reflections move across the glass as steam rises. Baker (whispering dramatically): “Today… I achieve perfection.” He leans even closer, nose nearly touching the glass. “Golden edges. Soft center. The gods themselves will smell these cookies and weep.” Baker: “Wait—” (beat) “Did I… forget the chocolate chips?” Cut to side view — coworker pops into frame, chewing casually. Coworker (mouth full): “Nope. You forgot the sugar.” Quick zoom back to the baker’s horrified face, pressed against the oven door, as cookies deflate behind the glass. Steam drifts upward in slow motion. pixar style acting and timing

## For Best Results (Do's: Optimization Tips)

- Keep your prompt in a single flowing paragraph to give the model a cohesive scene to work with.
- Use present tense verbs to describe movement and action.
- Match your detail to the shot scale. Close-ups need more precise detail than wide shots.
- When describing camera movement, focus on the camera’s relationship to the subject.
- Expect to write 4 to 8 descriptive sentences to cover all the key aspects of the prompt.
- Don’t be afraid to iterate! LTX-2 is designed for fast experimentation, so refining your prompt is part of the workflow.
- Single flowing paragraph: Cohesive scene.
- Present tense verbs: For dynamic action.
- Match detail to scale: More for close-ups.
- Focus on the camera's relationship to the subject.
- 4-8 sentences: Cover all aspects.
- Iterate: Refine based on outputs.
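
The rules of thumb above (one flowing paragraph, 4 to 8 sentences) and the 200-word ceiling from the introduction can be checked mechanically before a generation run. The following is a minimal sketch, not part of any LTX-2 tooling: the function name `check_prompt` and both constants are hypothetical, and the sentence splitter is a rough heuristic that will miscount prompts with heavy dialogue or ellipses.

```python
import re

# Both limits are taken from this guide, not enforced by the model.
MAX_WORDS = 200          # "up to 200 words"
SENTENCE_RANGE = (4, 8)  # "4 to 8 descriptive sentences"


def check_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means no rule of thumb was tripped."""
    warnings = []
    text = prompt.strip()

    # A single flowing paragraph contains no line breaks.
    if "\n" in text:
        warnings.append("prompt spans multiple lines; use one flowing paragraph")

    # Heuristic sentence count: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    low, high = SENTENCE_RANGE
    if not low <= len(sentences) <= high:
        warnings.append(f"{len(sentences)} sentences; aim for {low} to {high}")

    # Word count by whitespace splitting.
    words = len(text.split())
    if words > MAX_WORDS:
        warnings.append(f"{words} words; keep it under about {MAX_WORDS}")

    return warnings


if __name__ == "__main__":
    draft = (
        "A wide shot of a harbor at dawn. Fog drifts over the water. "
        "A fisherman in a yellow coat coils a rope. "
        "The camera pushes in slowly on his hands. Gulls cry in the distance."
    )
    print(check_prompt(draft))
```

Treat the warnings as hints rather than hard failures: several of the official example prompts in this guide run longer than eight sentences and would be flagged by this check.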
- From LTX Studio Blog: Use scene headers (INT./EXT., time), short tone descriptions, blocking for movement, and dialogue with brackets for cues. For long shots (20s), order actions sequentially, fit them to the duration, and add closing actions.
- From Skywork AI: Use a six-part structure: scene anchor; subject/action; camera/lens; visual style; motion/time; guardrails (e.g., no artifacts).
- From NVIDIA Guide: Elaborate prompts; pro tips for consistency.
- From ltx.studio (Dec 2025 AI Video Prompt Guide): Use scene headers (INT./EXT.), tone, blocking, and dialogue cues. For consistency: AI Characters/Objects.
- From ltx.studio Long Shots (Nov 2025): For 20s videos, sequence actions to fit the duration; add closers; order chronologically.
- From Stoke McToke (2025): Use negative prompts to constrain output (e.g., "morphing, distortion, warping, artifacts, low quality, ugly, blurry").
- From fal.ai: The Pro variant offers higher fidelity; use Fast for iteration. Parameters such as steps and CFG affect quality and cost.

### Example Prompt for Best Results (Official):

INT. DAYTIME TALK SHOW SET – AFTERNOON Soft studio lighting glows across a warm-toned set. The audience murmurs faintly as the camera pans to reveal three guests seated on a couch — a middle-aged couple and the show’s host sitting across from them. The host leans forward, voice steady but probing: Host: “When did you first notice that your daughter, Missy, started to spiral?” The woman’s face crumples; she takes a shaky breath and begins to cry. Her husband places a comforting hand on her shoulder, looking down before turning back toward the host. Father (quietly, with guilt): “We… we don’t know what we did wrong.” The studio falls silent for a moment. The camera cuts to the host, who looks gravely into the lens. Host (to camera): “Let’s take a look at a short piece our team prepared — chronicling Missy’s downward path.” The lights dim slightly as the camera pushes in on the mother’s tear-streaked face.
The studio monitors flicker to life, beginning to play the segment as the audience holds its breath.

## Additional Helpful Terms (Do's: Vocabulary to Enhance Prompts)

This is not an exhaustive list. Use it for examples of how to craft the result you’re looking for.

- From official and HF: Build prompts with a clear structure: main action, movements/gestures, appearances, background, camera, lighting/colors.
- New from RunDiffusion: Add genre-specific cues (e.g., horror: "eerie shadows, tense build-up").

### Categories

- Animation: stop-motion, 2D/3D animation, claymation, hand-drawn
- Stylized: comic book, cyberpunk, 8-bit pixel, surreal, minimalist, painterly, illustrated
- Cinematic: period drama, film noir, fantasy, epic space opera, thriller, modern romance, experimental film, arthouse, documentary

#### Example Prompt for Animation Category (Official):

Pinocchio is sitting in an interrogation room, looking nervous, and slightly sweating. He's saying very quietly to himself "I didn't do it... I didn't do it... I'm not a murderer". Pinocchio's nose is quickly getting longer and longer. The camera is zooming in on the double sided mirror in the back of the room, The mirror is turning black as the camera approaches it, and exposes a blurry silhouette of two FBI detectives who stand in the dark lit room on the other side. One of them is saying "I'm telling you, I have a feeling something is off with this kiddo

#### Example Prompt for Stylized Category (Official):

The young african american woman wearing a futuristic transparent visor and a bodysuit with a tube attached to her neck. she is soldering a robotic arm. she stops and looks to her right as she hears a suspicious strong hit sound from a distance. she gets up slowly from her chair and says with an angry african american accent: "Rick I told you to close that goddamn door after you!".
then, a futuristic blue alien explorer with dreadlocks wearing a rugged outfit walks into the scene excitedly holding a futuristic device and says with a low robotic voice: "Fuck the door look what I found!". the alien hands the woman the device, she looks down at it excitedly as the camera zooms in on her intrigued illuminated face. she then says: "is this what I think it is?" she smiles excitedly. sci-fi style cinematic scene

#### Example Prompt for Cinematic Category (Official):

Cinematic action packed shot. the man says silently: "We need to run." the camera zooms in on his mouth then immediately screams: "NOW!". the camera zooms back out, he turns around, and starts running away, the camera tracks his run in hand held style. the camera cranes up and show him run into the distance down the street at a busy New York night.

### Visual Details

- Lighting conditions: flickering candles, neon glow, natural sunlight, dramatic shadows
- Textures: rough stone, smooth metal, worn fabric, glossy surfaces
- Color palette: vibrant, muted, monochromatic, high contrast
- Atmospheric elements: fog, rain, dust, particles, smoke

#### Example Prompt for Visual Details (Official):

The camera opens in a calm, sunlit frog yoga studio. Warm morning light washes over the wooden floor as incense smoke drifts lazily in the air. The senior frog instructor sits cross-legged at the center, eyes closed, voice deep and calm. “We are one with the pond.” All the frogs answer softly: “Ommm...” “We are one with the mud.” “Ommm...” He smiles faintly. “We are one with the flies.” A quiet pause. The camera slowly pans to the side — one frog twitches, eyes darting. Suddenly — *thwip!* — its tongue snaps out, catching a fly mid-air and pulling it into its mouth. The master exhales slowly, still serene. “But we do not chase the flies…” Beat. “…not during class.” The guilty frog freezes, then lowers his head in visible shame, folding his hands back into the meditative pose.
The other frogs resume their chant: “Ommm...” Camera holds for a moment on the embarrassed frog, eyes closed too tightly, pretending nothing happened.

### Sound and Voice

- Setting: Ambient coffeeshop noises, dripping rain and wind blowing, forest ambience with birds singing
- Dialogue style: Energetic announcer, resonant voice with gravitas, distorted radio-style, robotic monotone, childlike curiosity
- Volume: quiet whisper, mutters, shouts, screams
- New: Audio-to-video via ElevenLabs for voice cloning.

#### Example Prompt for Sound and Voice (Official):

A warm, intimate cinematic performance inside a cozy, wood-paneled bar, lit with soft amber practical lights and shallow depth of field that creates glowing bokeh in the background. The shot opens in a medium close-up on a young female singer in her 20s with short brown hair and bangs, singing into a microphone while strumming an acoustic guitar, her eyes closed and posture relaxed. The camera slowly arcs left around her, keeping her face and mic in sharp focus as two male band members playing guitars remain softly blurred behind her. Warm light wraps around her face and hair as framed photos and wooden walls drift past in the background. Ambient live music fills the space, led by her clear vocals over gentle acoustic strumming.
### Technical Style Markers

- Camera language: follows, tracks, pans across, circles around, tilts upward, pushes in, pulls back, overhead view, handheld movement, over-the-shoulder, wide establishing shot, static frame
- Film characteristics: jittery stop-motion, pixelated edges, lens flares, film grain
- Scale indicators: expansive, epic, intimate, claustrophobic
- Pacing and temporal effects: slow motion, time-lapse, rapid cuts, lingering shot, continuous shot, freeze-frame, fade-in, fade-out, seamless transition, dynamic movement, sudden stop
- Specific visual effects (if relevant): particle systems, motion blur, depth of field

#### Example Prompt for Technical Style Markers (Official):

An animated cinematic shot. a robot, walks slowly, the camera dollys back and keep the robots slow walk in a medium shot. the robot start running slowly and heavily. it then stops, and the camera keeps dollying back, until a blue similiar robot appears in an over the shoulder shot.

## What Works Well with LTX-2 (Do's: Recommended Approaches)

- Cinematic compositions: Wide, medium, and close-up shots with thoughtful lighting, shallow depth of field, and natural motion.
- Emotive human moments: LTX-2 excels at single-subject emotional expressions, subtle gestures, and facial nuance.
- Atmosphere & setting: Weather effects like fog, mist, golden hour light, soft shadows, rain, reflections, and ambient textures all help ground the scene.
- Clean, readable camera language: Clear directions like “slow dolly in,” “handheld tracking,” or “over-the-shoulder” improve consistency.
- Stylized aesthetics: Painterly, noir, analog film look, fashion editorial, pixelated animation, or surreal art styles work especially well when named early in the prompt.
- Lighting and mood control: Backlighting, color palettes, soft rim light, and flickering lamps anchor tone better than generic mood words.
- Voice: Characters can talk and sing in various languages.
- Cinematic: Wide, medium, and close-up shots with lighting and depth of field.
- Emotive: Facial nuance, gestures.
- Atmosphere: Weather effects, shadows, reflections.
- Clean camera language: "Slow dolly in," "handheld tracking."
- Stylized aesthetics: Named early (painterly, noir).
- Lighting/mood: Backlighting, palettes.
- Audio: Various languages, speech, sounds.
- From Medium: For audio sync, add sharp onsets (consonants, clicks); avoid time-stretching.
- From YouTube: Use long prompts when there is no input image; use AI agents for prompt generation.
- New from ltx.studio (Jan 2026): Audio-to-video for voiceovers; multi-language.

### Example Prompt for What Works Well (Official):

EXT. SMALL TOWN STREET – MORNING – LIVE NEWS BROADCAST The shot opens on a news reporter standing in front of a row of cordoned-off cars, yellow caution tape fluttering behind him. The light is warm, early sun reflecting off the camera lens. The faint hum of chatter and distant drilling fills the air. The reporter, composed but visibly excited, looks directly into the camera, microphone in hand. Reporter (live): “Thank you, Sylvia. And yes — this is a sentence I never thought I’d say on live television — but this morning, here in the quiet town of New Castle, Vermont… black gold has been found!” He gestures slightly toward the field behind him. Reporter (grinning): “If my cameraman can pan over, you’ll see what all the excitement’s about.” The camera pans right, slowly revealing a construction site surrounded by workers in hard hats. A beat of silence — then, with a sudden roar, a geyser of oil erupts from the ground, blasting upward in a violent plume. Workers cheer and scramble, the black stream glistening in the morning light. The camera shakes slightly, trying to stay focused through the chaos.
Reporter (off-screen, shouting over the noise): “There it is, folks — the moment New Castle will never forget!” The camera catches the sunlight gleaming off the oil mist before pulling back, revealing the entire scene — the small-town skyline

## What to Avoid with LTX-2 (Don'ts: Common Pitfalls)

- Internal states: Avoid emotional labels like “sad” or “confused” without describing visual cues. Use posture, gesture, and facial expression instead.
- Text and logos: LTX-2 does not currently generate readable or consistent text. Avoid signage, brand names, or printed material.
- Complex physics or chaotic motion: Non-linear or fast-twisting motion (e.g., jumping, juggling) can lead to artifacts or glitches. However, dancing can work well.
- Scene complexity overload: Too many characters, layered actions, or excessive objects reduce clarity and model accuracy.
- Inconsistent lighting logic: Avoid mixing conflicting light sources (e.g., “a warm sunset with cold fluorescent glow”) unless clearly motivated.
- Overcomplicated prompts: The more actions, characters, and instructions you add, the higher the chance some of them won’t be seen in the output. Begin with simple things and layer on additional instructions as you iterate.
- Internal states: Use cues instead of "sad."
- Text/logos: The model doesn't generate readable text.
- Complex physics: Artifacts in twisting motions (juggling bad; dancing ok).
- Overload: Too many characters/actions/objects.
- Inconsistent lighting: Conflicting sources.
- Complicated prompts: Start simple, layer iteratively.
- From HF: Avoid non-divisible resolutions and mismatched I2V details.
- From Reddit: Portrait ratios are unreliable; vague prompts lead to static output.
- New from Stoke McToke: Avoid prompting without negatives; doing so leads to invention of unwanted motion.
- From ltx.studio (Dec 2025): Avoid overly long or complex prompts without sequencing.
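
The non-divisible resolution pitfall noted above can be guarded against before a generation call. The following is a minimal sketch under two assumptions: that "frame count by 8 + 1" in the technical tips means counts of the form 8k + 1, and that rounding down to the nearest valid value is acceptable. The helper names `snap_resolution` and `snap_frame_count` are hypothetical and are not part of any LTX-2 API.

```python
def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple of `multiple`.

    The result is never smaller than one multiple (an assumption made here
    so that tiny inputs do not collapse to zero).
    """
    def snap(value: int) -> int:
        return max(multiple, (value // multiple) * multiple)

    return snap(width), snap(height)


def snap_frame_count(num_frames: int) -> int:
    """Round down to the nearest count of the form 8*k + 1, with k >= 1."""
    k = max(1, (num_frames - 1) // 8)
    return 8 * k + 1


if __name__ == "__main__":
    # 1080 is not divisible by 32, so the height drops to 1056.
    print(snap_resolution(1920, 1080))  # (1920, 1056)
    # 120 frames is not 8*k + 1; the nearest lower valid count is 113.
    print(snap_frame_count(120))        # 113
```

Rounding down rather than up is a design choice for this sketch: it keeps the request within the originally intended size, at the cost of a slightly smaller frame or shorter clip.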
## General Advice for Prompt Creation

- Master prompting by building detailed, story-driven prompts that turn your creative vision into stunning AI-generated videos.
- Focus on cohesive, flowing narratives.
- Experiment and refine iteratively.
- Use this guide as a reference to ensure prompts include all key elements while avoiding pitfalls.
- Treat prompts as mini-scripts: director's notes for flow.
- Experiment: Use playgrounds (app.ltx.studio).
- Multi-language: Supported for speech.
- For 4K/50FPS: Structured prompts reduce shimmer and artifacts.
- From Technical Report: Use for high-fidelity audio-video; the model leverages cross-attention for sync.
- Community: Check HF discussions for full demo prompts; iterate with LoRAs for control.
- New: Negative prompts are essential. Trainer guides for custom training (fal.ai, Jan 2026): use triggers and cinematographic captions.
- No superseding model exists; LTX-2 remains the core model, with updates like audio-to-video.
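
One way to make sure a prompt covers the key aspects this guide lists (shot, scene, characters, action, camera, audio) is to fill them in as separate fields and then join them into a single flowing paragraph. The following is an illustrative sketch only: the `ScenePrompt` class, its field names, and the ordering of the parts are assumptions made for this example, not an official LTX-2 structure.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """Hypothetical container for the key aspects named in this guide."""
    shot: str             # establish the shot (scale, genre)
    scene: str            # lighting, palette, textures, atmosphere
    action: str           # core action, chronological, present tense
    characters: str = ""  # age, appearance, clothing, physical cues
    camera: str = ""      # camera movement relative to the subject
    audio: str = ""       # ambient sound, music, quoted dialogue

    def render(self) -> str:
        """Join the non-empty parts into one paragraph with no line breaks."""
        parts = [self.shot, self.scene, self.characters,
                 self.action, self.camera, self.audio]
        sentences = []
        for part in parts:
            part = " ".join(part.split())  # collapse newlines and extra spaces
            if not part:
                continue
            # Add a period unless the part already ends a sentence or a quote.
            if not part.endswith((".", "!", "?", '"', "\u201d")):
                part += "."
            sentences.append(part)
        return " ".join(sentences)


if __name__ == "__main__":
    prompt = ScenePrompt(
        shot="A wide establishing shot of a fishing harbor at dawn",
        scene="Cold blue light and drifting fog",
        action="A fisherman coils a rope on the dock",
        camera="The camera pushes in slowly on his hands",
    )
    print(prompt.render())
```

Keeping the fields separate makes it easy to iterate on one aspect at a time (for example, swapping only the camera move) while the rendered output still follows the single-paragraph advice.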