Alibaba, ByteDance, and Kuaishou have launched advanced AI models in robotics and video generation, signaling China’s intensifying bid to compete head-to-head with U.S. leaders across multiple layers of the AI stack.
A wave of model releases from China’s largest technology firms this week highlights how the country’s AI sector is expanding beyond chatbots into robotics, cinematic video generation, and autonomous “agentic” systems.
Alibaba, ByteDance, and Kuaishou each unveiled new AI platforms aimed at competing directly with U.S. firms such as OpenAI, Nvidia, and Google.
The momentum follows comments by Demis Hassabis, head of Google DeepMind, who told CNBC that Chinese AI models are only “months” behind Western competitors — an assessment that underscores how quickly performance gaps are narrowing.
The latest launches also show that China’s AI race is not confined to language models but extends into multimodal systems that integrate text, vision, sound, and physical interaction.
Robotics and the Push for “Embodied” AI
Alibaba’s DAMO Academy introduced RynnBrain, a model designed to give robots deeper awareness of their surroundings.
In demonstrations, robots equipped with grippers were shown identifying, counting, and picking up objects such as oranges, as well as retrieving items from a refrigerator. Tasks that appear simple to humans require substantial perception and reasoning capabilities for machines: object recognition, spatial mapping, trajectory planning, and sequential memory.
Adina Yakefu, a researcher at Hugging Face, said a central innovation in RynnBrain is built-in time and space awareness.
“Instead of simply reacting to immediate inputs, the robot can remember when and where events occurred, track task progress, and continue across multiple steps,” she said. That continuity is essential for real-world deployment in warehouses, factories, and service environments.
The strategic importance lies in “embodied AI” — systems that move beyond digital text responses to operate in physical space. Nvidia and Google are investing heavily in this domain, seeing robotics as a long-term driver of productivity in logistics, manufacturing, and healthcare. By building foundational robotics models, Alibaba signals ambitions to control a critical layer of industrial automation infrastructure.
If scaled, such systems could help address labor shortages in sectors like warehousing and elder care, while also raising new regulatory and safety considerations.
Video Generation Becomes a Commercial Battleground
In parallel, ByteDance launched Seedance 2.0, a video-generation model capable of producing cinematic sequences from text prompts, images, or source video.
Users and researchers say the system shows significant gains in visual realism, camera motion, texture detail, and audio integration. Billy Boman, who runs a creative agency producing AI-generated content, described the recent progress as transformative. “Back in 2023 … it was difficult to get someone to run or to walk,” he said. “Now I can do anything.”
Seedance competes with OpenAI’s Sora and similar U.S. models, placing Chinese firms squarely in the global contest for dominance in generative media. Improvements in controllability and production efficiency make such tools increasingly viable for advertising, entertainment, and social media content creation.
However, rapid progress has also raised governance concerns. Chinese media reported that Seedance suspended a feature that enabled voice generation from an uploaded image after questions were raised about consent. The episode highlights the tension between technological capability and ethical guardrails — a theme that extends across global AI development.
Kuaishou’s Kling 3.0, released last week, adds another competitive entry. The model supports up to 15 seconds of video, improved consistency, and multilingual audio generation. Kling is currently offered to paying subscribers, suggesting a monetization strategy integrated into Kuaishou’s short-video ecosystem.
Kuaishou’s stock has risen more than 50% over the past year, reflecting investor optimism that AI-enhanced content tools can deepen engagement and create new revenue streams.
The Rise of Open-Source and Agentic Systems
Beyond robotics and video, Chinese firms are also accelerating in large language models and AI agents.
Zhipu AI released GLM-5, an open-source model with enhanced coding capabilities and long-running task management. The company said GLM-5 approaches Anthropic’s Claude Opus 4.5 on coding benchmarks and outperforms Google’s Gemini 3 Pro in some tests, claims that CNBC has not independently verified.
MiniMax unveiled M2.5, an updated open-source model with expanded agentic features. Agentic AI refers to systems capable of autonomously carrying out multi-step workflows, such as scheduling, research, or software deployment, with limited human intervention.
Open-source distribution is a notable feature of China’s AI push. By making models accessible to developers, companies can accelerate ecosystem adoption, gather feedback, and stimulate downstream innovation. This contrasts with more closed commercial strategies adopted by some Western AI leaders.
The breadth of releases suggests a coordinated push across multiple layers of the AI value chain — from foundational models to consumer-facing applications.
China’s AI firms benefit from vast domestic user bases, particularly in short-video platforms, providing real-time data and testing environments for generative systems. Integration into super-app ecosystems enables rapid commercialization.
At the same time, U.S. export controls on advanced semiconductors have added complexity to China’s AI ambitions. Companies are investing in optimizing models for locally available hardware and in developing domestic chip capabilities to reduce reliance on U.S. suppliers.
The competitive dynamic is no longer limited to benchmark performance. It now encompasses deployment speed, integration with hardware, regulatory alignment, and commercial scale.
Demis Hassabis’ assessment that Chinese models are “months” behind Western counterparts suggests a narrowing technological gap. The latest launches indicate that in areas such as video realism and embodied robotics, competition is increasingly defined by iteration cycles and ecosystem depth rather than by clear technological dominance from one side.
The launches also underline China’s intention to compete in robotics, video generation, and agentic systems simultaneously, reinforcing a multipolar AI landscape where leadership may vary by domain rather than by geography alone.



