Alibaba’s Workforce Shrinks 34% in 2025 to 128,197 as Company Sheds Offline Retail Assets and Doubles Down on AI Ambitions

Alibaba Group Holding Ltd. disclosed Thursday that its global headcount fell sharply to 128,197 employees as of December 31, 2025, a 34% reduction from 194,320 a year earlier.

The development underlines aggressive divestitures of labor-intensive offline retail businesses and a strategic pivot toward artificial intelligence as the company’s primary growth engine.

The headcount drop, one of the largest percentage declines among major global tech firms in recent years, was driven primarily by the sale of hypermarket chain Sun Art Retail Group at the end of 2024 and the earlier exit from its stake in department store operator Intime. Those transactions removed tens of thousands of employees from Alibaba's consolidated numbers and marked the culmination of a multi-year effort to streamline non-core, capital-heavy retail operations.

Alibaba’s latest quarterly earnings report, covering the December 2025 quarter, showed revenue slightly missing analyst expectations while net profit plunged 67% year-over-year, underscoring the financial strain from restructuring costs, competitive pressures in core e-commerce, and heavy investment in cloud and AI infrastructure. Shares in Hong Kong fell 6% on Friday, reflecting investor disappointment with the profit decline and cautious near-term outlook.

Alibaba CEO Eddie Wu used the earnings call to reiterate the company’s ambition to evolve into a full-stack AI enterprise, spanning semiconductor design and manufacturing, cloud computing infrastructure, foundational models, and agentic AI applications. Wu set an explicit target of growing combined cloud and AI revenue to more than $100 billion annually within five years, a roughly fourfold increase from current levels, positioning the unit as the principal driver of future profitability.

This week, Alibaba launched Wukong, an agentic AI service tailored for businesses that enables autonomous, multi-step task execution across enterprise workflows. The company also announced price increases of up to 34% for certain cloud and storage services, citing rising demand and higher supply-chain costs for advanced compute resources.

The workforce reduction aligns with this pivot. By offloading asset-heavy retail operations, Alibaba has freed up capital and management bandwidth to fund massive AI R&D and infrastructure build-out, including domestic GPU alternatives, large-scale model training, and agentic platforms. The company has also aggressively recruited AI talent in recent quarters, offsetting some of the broader headcount decline.

Alibaba’s 34% staff reduction in 2025 is among the most dramatic of any major global tech company over the past year. It follows a pattern seen across the sector, from Silicon Valley to Hangzhou, where firms have shed jobs to improve efficiency, refocus on core growth areas (particularly AI), and respond to slower revenue growth and margin pressure.

The cuts were far larger than the 11% reduction reported for December 2024 compared with the prior year, indicating acceleration in 2025 as divestitures closed and AI investment ramped up. Alibaba’s remaining workforce continues to support its dominant e-commerce platforms (Taobao, Tmall), cloud business (Alibaba Cloud), logistics arm (Cainiao), and emerging AI initiatives.

Alibaba’s Hong Kong-listed shares declined 6% on Friday, extending year-to-date losses amid investor caution over the profit plunge, ongoing competitive intensity in e-commerce, and uncertainty surrounding the pace of AI monetization. The sharp workforce reduction was viewed as both a positive signal of cost discipline and a reminder of the challenges in transitioning from a consumer-internet giant to an AI-first enterprise.

Analysts noted that while the headcount drop improves operating leverage in the near term, the success of Alibaba’s AI strategy, including Wukong, cloud price adjustments, and semiconductor efforts, will be critical to reversing margin compression and driving sustainable growth.

Taken together, Alibaba's 2025 results and workforce disclosure show a company shedding legacy retail assets to fund an all-in bet on AI across the stack. Alibaba is positioning itself as a full-spectrum AI player, from chips to models to agentic applications, in direct competition with global leaders like Microsoft, Google, and Amazon, and with domestic rivals including Tencent and Baidu.

Some analysts have described the $100 billion cloud-and-AI revenue target over five years as one of the most ambitious growth projections in global tech. However, they warn that execution will depend on scaling compute capacity amid U.S. export restrictions, achieving rapid enterprise adoption of agentic tools like Wukong, and maintaining pricing power in a competitive cloud market.

OpenAI’s Agreement to Acquire Astral is a Strategic Investment Pivot

OpenAI has announced an agreement to acquire Astral, a startup known for building high-performance, open-source developer tools in the Python ecosystem.

Astral is behind popular tools including uv, an extremely fast Python package and project manager (an alternative to pip and similar tools); Ruff, a super-fast Python linter and formatter written in Rust; and ty, a high-speed Python type checker and language server. These tools are widely adopted, powering millions of developer workflows and forming a key part of modern Python development.

The deal will bring Astral's team and expertise into OpenAI's Codex group, the team behind OpenAI's AI-powered coding assistant, which has seen rapid growth and millions of users. OpenAI aims to accelerate Codex development, enabling deeper integrations so AI agents can interact more seamlessly with real developer tools across the full software development lifecycle.

Both companies emphasized a continued commitment to open source: OpenAI plans to support and maintain Astral’s projects post-closing, with ongoing community-focused development. Financial terms were not disclosed. The acquisition is not yet finalized—it remains subject to customary closing conditions and regulatory approval.

This move fits into OpenAI's broader push to strengthen its position in AI-driven coding tools, especially amid competition from rivals like Anthropic, which made a similar move by acquiring Bun in late 2025. It also aligns with OpenAI's recent pattern of acquisitions to bolster developer ecosystems and agentic AI capabilities.

This is a strategic step toward embedding AI more deeply in everyday developer workflows, rather than limiting it to code generation.

Anthropic's acquisition of Bun in late 2025 was its first public acquisition and a strategic move to bolster its AI-powered coding tools. Bun, the high-performance JavaScript and TypeScript runtime, toolkit, bundler, package manager, and test runner created by Jarred Sumner, joined Anthropic to power and accelerate Claude Code, Anthropic's AI coding agent.

Claude Code, which became generally available in May 2025, had already adopted Bun earlier that year and shipped as a Bun executable to millions of users. The acquisition ensures stability, faster performance, and deeper integration for Claude Code, the Claude Agent SDK, and future AI coding products.

Bun remains fully open source under the MIT license, with ongoing public development on GitHub. Anthropic committed to continued investment in Bun’s core features for the broader JS/TS developer community. Financial terms were not disclosed. The move coincided with Anthropic revealing that Claude Code had hit a $1 billion annualized revenue run rate, highlighting the explosive growth of its developer-focused AI tools.

This acquisition reflects the intensifying competition in AI-driven developer workflows. AI companies are moving beyond pure model capabilities to vertically integrate the full software development stack — from runtime environments and tooling to agentic interactions. For context, Claude Code relies heavily on Bun for speed and reliability.

Post-acquisition, Claude Code saw notable performance gains attributed to Bun’s team led by Jarred Sumner working directly on optimizations. It parallels recent moves like OpenAI’s acquisition of Astral to enhance Codex with Python ecosystem tools (uv, Ruff, ty).

This fits Anthropic’s broader strategy: disciplined acquisitions that align with technical excellence, enterprise strength, and safe AI development — while keeping core tools open and community-driven. It’s a clear signal that winning in AI coding means owning or deeply controlling the infrastructure developers actually use.

Nvidia CEO Says He Would Be Alarmed If $500K Engineers Do Not Spend Up to $250K Yearly on Tokens

Nvidia CEO Jensen Huang delivered a striking message for engineering talent during an appearance on the “All-In Podcast” episode published Thursday: top engineers who fail to consume hundreds of thousands of dollars worth of AI tokens annually are a cause for serious concern.

Huang stated he would be “deeply alarmed” if one of Nvidia’s $500,000-a-year engineers spent less than half that amount — $250,000 — on AI tokens over the course of a year.

“That $500,000 engineer at the end of the year, I’m going to ask them how much did you spend in tokens? If that person said $5,000, I will go ape something else,” he said.

When asked whether Nvidia itself is spending $2 billion annually on tokens for its engineering team, Huang replied: “We’re trying to.”

He drew a direct analogy to outdated methods: “This is no different than one of our chip designers who says, ‘Guess what? I’m just going to use paper and pencil.’”

Huang argued that engineers who underutilize AI tokens are effectively limiting their own productivity and impact.

Tokens as a Recruiting and Productivity Tool

Huang went further, revealing that AI token budgets are already becoming a competitive recruiting lever in Silicon Valley.

“They’re going to make a few hundred thousand dollars a year, their base pay,” he said of engineers. “I’m going to give them probably half of that on top of it as tokens so that they could be amplified 10X.”

“It is now one of the recruiting tools in Silicon Valley: How many tokens comes along with my job?” Huang added. “And the reason for that is very clear, because every engineer that has access to tokens will be more productive.”

Tokens, the basic unit used by large language models to process and generate text, are typically charged on a per-thousand or per-million basis by providers such as OpenAI, Anthropic, Google, and others. Heavy usage can quickly become expensive, especially for engineers running large-scale experiments, fine-tuning models, or building complex agentic workflows.
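
The arithmetic behind Huang's $250,000 figure is easy to check. As a rough back-of-envelope sketch, the blended price and daily usage below are illustrative assumptions, not quoted rates from any provider:

```python
# Back-of-envelope annual token spend, using illustrative (not quoted) prices.
# A heavy agentic workflow can burn on the order of 100M tokens per day.
PRICE_PER_MTOK = 10.0          # assumed blended $ per million tokens
DAILY_TOKENS = 100_000_000     # assumed tokens consumed per engineer per day
WORK_DAYS = 250                # working days per year

annual_spend = DAILY_TOKENS / 1_000_000 * PRICE_PER_MTOK * WORK_DAYS
print(f"${annual_spend:,.0f} per engineer per year")  # $250,000
```

Under these assumptions, an engineer would need to run roughly 100 million tokens a day, every working day, to hit Huang's threshold, which gives a sense of how compute-intensive he expects AI-native workflows to be.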

Tokens as the “Fourth Component” of Compensation

Huang is not alone in viewing generous AI compute access as a critical talent differentiator. Business Insider reported earlier in March 2026 that tech companies are experimenting with offering token budgets alongside traditional salary, bonuses, and equity — effectively treating inference power as a new form of compensation.

Tomasz Tunguz of Theory Ventures described tokens as a potential “fourth component” of pay packages. Peter Gostev, AI capability lead at Arena (a startup focused on model performance benchmarking), suggested that frontier labs like OpenAI and Anthropic could create recruitment marketplaces listing token budgets alongside salary ranges.

Thibault Sottiaux, an engineering lead on OpenAI’s Codex team, noted on X that candidates increasingly ask how much compute they will receive.

Even OpenAI CEO Sam Altman has speculated about a future where compute access replaces traditional income support. In a May 2024 appearance on the same “All-In Podcast,” Altman mused: “I wonder if the future looks something more like Universal Basic Compute than Universal Basic Income, and everybody gets a slice of GPT-7’s compute. And they can use it, they can resell it, they can donate it to somebody to use for cancer research, but what you get is not dollars but this like slice, you own part of the productivity.”

As Engineers Shift to Tokens

Huang’s comments reflect Nvidia’s unique position at the center of the AI boom. As the dominant supplier of GPUs for training and inference, Nvidia benefits directly from skyrocketing token consumption across the industry. By framing heavy token usage as a productivity imperative — and even a recruiting tool — Huang is reinforcing the narrative that access to advanced AI compute is now a core requirement for top engineering talent.

The remarks also highlight a shift in how companies measure engineering productivity. Traditional metrics (lines of code, features shipped) are giving way to compute-intensive workflows: model experimentation, agent orchestration, large-scale data processing, and real-time inference. Engineers who underuse tokens may be seen as operating below their potential in an AI-native environment.

For talent acquisition, token budgets could become a powerful differentiator — especially as frontier models grow more expensive to run at scale. Startups and large tech firms alike may increasingly compete not just on salary and equity, but on how much high-quality inference capacity they can provide.

Huang’s stance is likely to accelerate the trend toward compute-inclusive compensation packages across Silicon Valley and beyond. As AI agents and multimodal models become central to software development, engineering roles will demand ever-larger token allocations — turning inference spend into a visible line item in hiring negotiations.

Nvidia itself stands to benefit disproportionately: more engineers consuming more tokens means more demand for Nvidia GPUs and cloud capacity. The company’s push to make heavy token usage a performance expectation — and even a hiring criterion — further cements its central role in the AI talent and productivity ecosystem.

The notion also underscores a broader philosophical shift: in the AI era, raw human intelligence is increasingly amplified (and measured) by access to compute. Engineers who maximize their token spend aren’t just more productive — they’re demonstrating mastery of the new tools defining the profession. For top talent, the question may soon be less “how much equity?” and more “how many tokens come with the job?”

Cursor Releases “Composer” which Outperforms Existing Coding Models

Cursor, the AI-powered code editor from Anysphere, has released Composer 2, its latest in-house "agentic" coding model. Composer models are proprietary and optimized for low-latency, agentic coding, meaning the AI can autonomously plan, edit multiple files, test, and iterate on code within your codebase, rather than just suggesting snippets.

The big focus has always been on combining strong coding intelligence with exceptional speed (earlier versions were often 4x faster than comparable frontier models) and, now, very competitive pricing. Cursor positions Composer 2 as achieving frontier-level coding performance at a dramatically lower cost.

On benchmarks, Composer 2 scores 61.3% on CursorBench, Cursor's internal benchmark of real-world coding tasks, and 61.7% on Terminal-Bench 2.0, beating Anthropic's Claude Opus 4.6 at 58.0% though trailing OpenAI's GPT-5.4 at 75.1%. On SWE-bench Multilingual it reaches 73.7%. This represents a significant jump of roughly 17 points on some metrics over Composer 1.5, achieved in a short cycle.

It's described as on par with or beating models like Claude Opus 4.6 in practical coding scenarios, while being much more affordable. Composer 2 Standard costs $0.50 per million input tokens and $2.50 per million output tokens; Composer 2 Fast (a higher-speed variant, now the default) costs $1.50 and $7.50 per million, respectively.

For comparison, Claude Opus 4.6 is around $5/$25, and GPT-5.4 is higher, making Composer 2 roughly 3-10x cheaper depending on the competitor. Technical edges include scaled reinforcement learning (RL) on real Cursor usage data, self-summarization for better long-context handling in multi-step tasks, and optimization for interactive agentic workflows in the IDE.
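
Using the per-million-token prices quoted above, a quick sketch reproduces the "roughly 3-10x cheaper" range. The 600/400 input/output workload mix is an assumed, purely illustrative split:

```python
# Rough cost comparison based on the per-million-token prices quoted above.
PRICES = {  # (input $, output $) per million tokens
    "Composer 2 Standard": (0.50, 2.50),
    "Composer 2 Fast": (1.50, 7.50),
    "Claude Opus 4.6": (5.00, 25.00),
}

def monthly_cost(model, input_mtok, output_mtok):
    """Dollar cost for a workload measured in millions of tokens."""
    p_in, p_out = PRICES[model]
    return input_mtok * p_in + output_mtok * p_out

# Example: an agentic workload consuming 600M input / 400M output tokens a month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 600, 400):,.0f}")
```

On this assumed mix, Opus 4.6 comes out 10x the cost of Composer 2 Standard ($13,000 vs. $1,300 per month) and about 3.3x the cost of Composer 2 Fast ($3,900), which matches the 3-10x spread claimed.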

Users and early testers on X are calling it "legit," with reports of it catching subtle bugs that other models, including Claude and various GPT variants, missed, and of it handling complex refactors efficiently. There are also leaks and claims that the base model might be built on Moonshot AI's Kimi K2.5 with heavy continued pretraining and RL on Cursor's proprietary coding data, which would explain the speed and cost advantages over fully proprietary frontier models from OpenAI or Anthropic.

So does Composer 2 outperform existing coding models? It depends on the metric. Yes, on speed, cost, and practical agentic coding in Cursor's environment, especially versus similarly priced or even higher-priced options like recent Claude versions.

Partially, on raw intelligence: it is frontier-level and beats some models (Opus 4.6 on certain benchmarks), but top models like GPT-5.4 still lead on the hardest tasks. The real win is the value: high performance at a fraction of the cost, making it feel like it outperforms in real developer workflows.

SWE-bench is one of the most widely used and respected benchmarks for evaluating how well large language models (LLMs) and AI coding agents can handle real-world software engineering tasks. Introduced in late 2023, it stands out because it uses actual problems from GitHub rather than synthetic or toy coding exercises.

SWE-bench tasks an AI with resolving real GitHub issues from popular open-source repositories. For each task, the model receives the full codebase at the state before the issue was fixed, the issue description (title and body from GitHub), and sometimes additional context such as comments. The goal is to generate a code patch (diff) that fixes the problem. Success is measured automatically: the patch is applied in a clean Docker environment, and the model's change must make the relevant unit tests pass. The original full SWE-bench contains ~2,294 tasks, all drawn from 12 popular Python repositories.
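
The evaluation protocol can be sketched as follows. The container image name, mount path, and test command below are hypothetical placeholders for illustration, not the official SWE-bench harness API:

```python
# Minimal sketch of the SWE-bench evaluation flow: apply the model's patch
# inside a clean container, run the repo's tests, and count the task as
# resolved only if the tests pass. Image/paths/commands are placeholders.
import subprocess

def docker_eval_command(image, patch_path, test_cmd):
    """Build the container invocation that applies a patch and runs tests."""
    inner = f"git apply /mnt/{patch_path} && {test_cmd}"
    return ["docker", "run", "--rm", image, "bash", "-lc", inner]

def is_resolved(image, patch_path, test_cmd):
    """A task counts as resolved iff the test run exits with status 0."""
    proc = subprocess.run(docker_eval_command(image, patch_path, test_cmd))
    return proc.returncode == 0

# Example invocation (command construction only; nothing is executed here).
cmd = docker_eval_command("swebench/sympy:pre-fix", "model.diff", "pytest -x")
print(cmd)
```

The key property this sketch captures is that grading is fully automatic and binary per task: either the patched repository passes the designated tests in a fresh environment, or the task is unresolved.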

Tasks include bug fixes, small features, refactors, and more, reflecting genuine developer work. This makes it much harder and more realistic than older benchmarks like HumanEval (which tests isolated function completion), because it requires understanding large, complex codebases (often tens of thousands of lines), navigating dependencies and repo structure, interpreting sometimes ambiguous or poorly written issue reports, and generating multi-file edits that don't break existing functionality.

SWE-bench Verified is a cleaned, human-validated subset of 500 tasks. Annotators checked that issues are clear, tests are correct, tasks are solvable from the given information, and no data leaks or memorization artifacts exist.

This version is more reliable for comparing models, with less noise from bad tasks. Top models in early 2026 reach ~75-82% on Verified. SWE-bench Lite is a smaller, easier subset (~300 tasks) used for faster evaluation or when full runs are too expensive.

SWE-bench Multilingual extends the idea beyond Python. It includes 300+ curated tasks from repositories in nine languages, including Java, TypeScript/JavaScript, Go, Rust, and C/C++. This tests cross-language understanding and generalization; performance is noticeably lower than on Python-only versions because most frontier models are still heavily Python-biased in their training data.

There are also community forks and extensions like SWE-bench Pro, SWE-bench Live, and others that add multi-language depth, harder tasks, or anti-contamination measures. Scores are usually reported as "% Resolved." On SWE-bench Verified, scores are often higher, e.g., 76-82% for top models like Claude Opus 4.6 or newer Sonnet/Opus variants.
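
The "% Resolved" metric itself is simple arithmetic, the share of tasks whose patch makes the tests pass:

```python
# "% Resolved" is the fraction of benchmark tasks whose generated patch
# makes the designated unit tests pass, expressed as a percentage.
def percent_resolved(results):
    """results: iterable of booleans, one per task (True = tests passed)."""
    results = list(results)
    return 100.0 * sum(results) / len(results)

# e.g. 369 of 500 Verified tasks resolved -> 73.8%
print(f"{percent_resolved([True] * 369 + [False] * 131):.1f}%")
```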

On SWE-bench Multilingual, scores are lower overall, highlighting gaps in non-Python performance. In this context, the 73.7% that Cursor reported for Composer 2 on SWE-bench Multilingual is a very strong result, especially at its price point, showing the model is competitive even on the harder cross-language version.

The benchmark has limitations. It tests agentic capabilities (planning, exploration, multi-file editing, debugging loops), but many tasks are relatively "simple" bug fixes representing hours of human work, not days or weeks. There are also potential data contamination and memorization risks: some papers argue that top scores partly come from models "remembering" popular repos.

Overall, SWE-bench, especially its Verified and Multilingual variants, remains the de facto standard for agentic coding evaluation in 2026, far more indicative of real usefulness in tools like Cursor, Devin-style agents, or GitHub Copilot Workspace than function-level benchmarks.

Nvidia’s Million-GPU Deal With Amazon Signals the Next Phase of AI: From Training Arms Race to Inference Scale

A landmark agreement between Nvidia and Amazon Web Services is offering one of the clearest signals yet of where the artificial intelligence economy is heading—and how the balance of power between chipmakers and cloud providers is evolving.

Nvidia will supply AWS with 1 million GPUs between now and 2027, according to the company’s vice president of hyperscale and high-performance computing, Ian Buck. The timeline aligns with chief executive Jensen Huang’s projection of a $1 trillion revenue opportunity tied to its next-generation Blackwell and Rubin chip architectures.

While the headline number is striking, the structure of the deal is more revealing. This is not a simple hardware purchase. It is a full-stack infrastructure partnership that spans compute, networking, and increasingly, inference—the stage of AI deployment where models generate responses and perform tasks in real time.

That distinction marks a turning point.

For much of the past two years, the AI boom has been defined by training—the process of building large language models using vast amounts of data and compute power. Nvidia’s dominance was built on supplying the GPUs required for that phase.

Now, the center of gravity is shifting toward inference. As AI systems move from development to widespread use, the demand profile changes. Instead of massive, one-off training runs, companies need sustained, efficient compute to serve millions—or billions—of user queries.

Buck captured the complexity of that shift bluntly: inference, he said, is “wickedly hard.”

To address it, AWS is not relying on a single class of chip. The deal includes a mix of Nvidia technologies—GPUs, Spectrum networking chips, and newer inference-focused processors such as Groq—alongside six additional Nvidia chip types. The goal is to optimize performance across different workloads, from large-scale model training to latency-sensitive applications like chatbots, recommendation engines, and autonomous systems.

This multi-chip approach reflects a broader industry reality. No single architecture can efficiently handle the full spectrum of AI tasks. Instead, hyperscalers are assembling heterogeneous compute stacks, combining different processors to balance cost, speed, and energy efficiency. That has implications for Nvidia’s long-term strategy. The company is no longer just a GPU vendor; it is positioning itself as a systems provider, integrating compute, networking, and software into a unified AI platform.

The inclusion of Nvidia’s ConnectX and Spectrum-X networking gear in AWS data centers is particularly significant. Traditionally, AWS has relied heavily on its own custom-built networking infrastructure, a core part of its competitive advantage. Opening that stack to Nvidia hardware suggests a deeper level of collaboration—and a recognition that AI workloads may require different architectural choices than traditional cloud computing.

It also signals a subtle shift in leverage. Hyperscalers like AWS have spent years developing in-house chips to reduce dependence on suppliers. Yet the scale and urgency of AI demand are forcing a more pragmatic approach: partnering with Nvidia even as they continue to build their own alternatives.

For AWS, the deal is about capacity and speed. Securing access to 1 million GPUs ensures it can meet surging customer demand for AI services, from startups building generative AI applications to enterprises embedding AI into core operations. But the agreement locks in long-term demand and reinforces Nvidia’s central role in the AI ecosystem. It also provides visibility into future revenue streams at a time when investors are closely watching whether the current AI spending boom can be sustained.

There is, however, a deeper competitive undercurrent.

The emphasis on inference chips—particularly newer offerings like Groq—suggests Nvidia is moving to defend its position against a growing field of specialized competitors. Startups and established players alike are targeting inference as a more cost-sensitive and potentially higher-volume segment than training.

If training established Nvidia’s dominance, inference will test its adaptability.

The economics are different. Training workloads are episodic and capital-intensive, favoring high-performance, high-margin chips. Inference workloads are continuous and cost-driven, requiring efficiency at scale. That shift could compress margins over time, even as total demand expands.

At the same time, the deal underscores the sheer scale of the AI buildout underway. A commitment of 1 million GPUs from a single cloud provider points to an infrastructure race that is still in its early stages. Data centers are being reconfigured, power consumption is rising, and supply chains are being stretched to meet demand.

This raises broader questions about sustainability—both in terms of energy usage and capital allocation. Hyperscalers are investing tens of billions of dollars in AI infrastructure, betting that demand will justify the outlay. Nvidia, in turn, is scaling production to meet that demand, tying its growth trajectory closely to the spending cycles of a handful of large customers.

The partnership with AWS illustrates how concentrated that ecosystem has become. A small number of companies—cloud providers, chipmakers, and large AI developers—are effectively shaping the architecture of the AI economy.

But AWS continues to develop its own chips, such as Trainium and Inferentia, aimed at reducing reliance on external suppliers. The coexistence of those efforts with large-scale Nvidia purchases reflects a dual strategy: build internally where possible, but buy externally where necessary to maintain competitiveness.

In that sense, the deal is both collaborative and competitive.

It locks Nvidia into the core of AWS’s AI infrastructure while reinforcing AWS’s role as a gatekeeper of AI services for enterprises. Each depends on the other, even as both seek to expand their own capabilities.

What emerges is a clearer picture of the next phase of the AI cycle. The initial scramble to build models is giving way to a longer, more complex process of deploying them at scale.