@Alibaba_Qwen
Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI.

Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction.

A standout feature: 'Audio-Visual Vibe Coding'. Describe your vision to the camera, and Qwen3.5-Omni-Plus instantly builds a functional website or game for you.

Offline Highlights:
Script-Level Captioning: Generates detailed video scripts with timestamps, scene cuts & speaker mapping.
SOTA Performance: Outperforms Gemini-3.1 Pro in audio and matches its audio-visual understanding.
Massive Capacity: Natively handles up to 10h of audio or 400s of 720p video, trained on 100M+ hours of data.
Global Reach: Recognizes speech in 113 languages & speaks 36.

Real-time Features:
Fine-Grained Voice Control: Adjust emotion, pace, and volume in real time.
Built-in Web Search & complex function calling.
Voice Cloning: Customize your AI's voice from a short sample, with engineering rollout coming soon.
Human-like Conversation: Smart turn-taking that understands real intent and ignores noise.

The Qwen3.5-Omni family includes Plus, Flash, and Light variants.

Try it out:
Blog: https://t.co/yuSAz3DuO8
Realtime Interaction (click the VoiceChat/VideoChat button, bottom-right): https://t.co/nnAW9ZfRet
HF-Demo: https://t.co/rLsqejKgCG
HF-VoiceOnline-Demo: https://t.co/LIGtmITeSw
API-Offline: https://t.co/lNE7fH5YUt
API-Realtime: https://t.co/9A3lopXGwV