
Nvidia CEO Says He Would Be Alarmed If $500K Engineers Don’t Spend at Least $250K a Year on Tokens

Nvidia CEO Jensen Huang delivered a striking message for engineering talent during an appearance on an episode of the “All-In Podcast” published Thursday: top engineers who fail to consume hundreds of thousands of dollars’ worth of AI tokens annually are a cause for serious concern.

Huang stated he would be “deeply alarmed” if one of Nvidia’s $500,000-a-year engineers spent less than half that amount — $250,000 — on AI tokens over the course of a year.

“That $500,000 engineer at the end of the year, I’m going to ask them how much did you spend in tokens? If that person said $5,000, I will go ape something else,” he said.

When asked whether Nvidia itself is spending $2 billion annually on tokens for its engineering team, Huang replied: “We’re trying to.”

He drew a direct analogy to outdated methods: “This is no different than one of our chip designers who says, ‘Guess what? I’m just going to use paper and pencil.’”

Huang argued that engineers who underutilize AI tokens are effectively limiting their own productivity and impact.

Tokens as a Recruiting and Productivity Tool

Huang went further, revealing that AI token budgets are already becoming a competitive recruiting lever in Silicon Valley.

“They’re going to make a few hundred thousand dollars a year, their base pay,” he said of engineers. “I’m going to give them probably half of that on top of it as tokens so that they could be amplified 10X.”

“It is now one of the recruiting tools in Silicon Valley: How many tokens comes along with my job?” Huang added. “And the reason for that is very clear, because every engineer that has access to tokens will be more productive.”

Tokens, the basic unit used by large language models to process and generate text, are typically charged on a per-thousand or per-million basis by providers such as OpenAI, Anthropic, Google, and others. Heavy usage can quickly become expensive, especially for engineers running large-scale experiments, fine-tuning models, or building complex agentic workflows.
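To make per-million pricing concrete, here is a minimal sketch that converts an annual token budget into an approximate token count. The $5/$25 per-million prices, the 80/20 split between input and output tokens, and the function names are illustrative assumptions, not any provider’s actual rates; real bills also depend on caching, discounts, and model choice.

```python
def blended_price_per_million(input_price: float, output_price: float, input_share: float) -> float:
    """Weighted average USD price per million tokens.

    input_share is the fraction of all tokens that are input (prompt) tokens.
    All prices here are hypothetical.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def tokens_for_budget(budget_usd: float, input_price: float, output_price: float, input_share: float) -> float:
    """Approximate number of tokens a USD budget buys at the blended rate."""
    price = blended_price_per_million(input_price, output_price, input_share)
    return budget_usd / price * 1_000_000


if __name__ == "__main__":
    # Hypothetical rates: $5 per million input tokens, $25 per million output
    # tokens, with an assumed 80% of all tokens being input tokens.
    tokens = tokens_for_budget(250_000, 5.0, 25.0, 0.8)
    print(f"{tokens / 1e9:.1f} billion tokens per year")
```

At these assumed rates the blended price is $9 per million tokens, so a $250,000 annual budget works out to roughly 27.8 billion tokens.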

Tokens as the “Fourth Component” of Compensation

Huang is not alone in viewing generous AI compute access as a critical talent differentiator. Business Insider reported earlier in March 2026 that tech companies are experimenting with offering token budgets alongside traditional salary, bonuses, and equity — effectively treating inference power as a new form of compensation.

Tomasz Tunguz of Theory Ventures described tokens as a potential “fourth component” of pay packages. Peter Gostev, AI capability lead at Arena (a startup focused on model performance benchmarking), suggested that frontier labs like OpenAI and Anthropic could create recruitment marketplaces listing token budgets alongside salary ranges.

Thibault Sottiaux, an engineering lead on OpenAI’s Codex team, noted on X that candidates increasingly ask how much compute they will receive.

Even OpenAI CEO Sam Altman has speculated about a future where compute access replaces traditional income support. In a May 2024 appearance on the same “All-In Podcast,” Altman mused: “I wonder if the future looks something more like Universal Basic Compute than Universal Basic Income, and everybody gets a slice of GPT-7’s compute. And they can use it, they can resell it, they can donate it to somebody to use for cancer research, but what you get is not dollars but this like slice, you own part of the productivity.”

As Engineers Shift to Tokens

Huang’s comments reflect Nvidia’s unique position at the center of the AI boom. As the dominant supplier of GPUs for training and inference, Nvidia benefits directly from skyrocketing token consumption across the industry. By framing heavy token usage as a productivity imperative — and even a recruiting tool — Huang is reinforcing the narrative that access to advanced AI compute is now a core requirement for top engineering talent.

The remarks also highlight a shift in how companies measure engineering productivity. As engineering work moves toward compute-intensive workflows (model experimentation, agent orchestration, large-scale data processing, and real-time inference), traditional metrics such as lines of code and features shipped are losing ground. Engineers who underuse tokens may be seen as operating below their potential in an AI-native environment.

For talent acquisition, token budgets could become a powerful differentiator — especially as frontier models grow more expensive to run at scale. Startups and large tech firms alike may increasingly compete not just on salary and equity, but on how much high-quality inference capacity they can provide.

Huang’s stance is likely to accelerate the trend toward compute-inclusive compensation packages across Silicon Valley and beyond. As AI agents and multimodal models become central to software development, engineering roles will demand ever-larger token allocations — turning inference spend into a visible line item in hiring negotiations.

Nvidia itself stands to benefit disproportionately: more engineers consuming more tokens means more demand for Nvidia GPUs and cloud capacity. The company’s push to make heavy token usage a performance expectation — and even a hiring criterion — further cements its central role in the AI talent and productivity ecosystem.

The notion also underscores a broader philosophical shift: in the AI era, raw human intelligence is increasingly amplified (and measured) by access to compute. Engineers who maximize their token spend aren’t just more productive — they’re demonstrating mastery of the new tools defining the profession. For top talent, the question may soon be less “how much equity?” and more “how many tokens come with the job?”

Cursor Releases “Composer 2,” Which Outperforms Existing Coding Models

Cursor, the AI-powered code editor from Anysphere, has released Composer 2, the latest in its line of in-house “agentic” coding models. The Composer models are proprietary and optimized for low-latency agentic coding, meaning the AI can autonomously plan, edit multiple files, test, and iterate on code within your codebase rather than just suggest snippets.

The focus has always been on combining strong coding intelligence with exceptional speed (earlier versions were often described as about 4x faster than comparable frontier models) and, now, very competitive pricing. Cursor is positioning Composer 2 as achieving frontier-level coding performance at a dramatically lower cost.

Cursor reported the following results: 61.3% on CursorBench, its internal benchmark of real-world coding tasks; 61.7% on Terminal-Bench 2.0, ahead of Anthropic’s Claude Opus 4.6 (58.0%) but behind OpenAI’s GPT-5.4 (75.1%); and 73.7% on SWE-bench Multilingual. That is a jump of roughly 17 points on some metrics over Composer 1.5, achieved in a short release cycle.

Composer 2 is described as on par with or better than models like Claude Opus 4.6 in practical coding scenarios, while being much more affordable. Composer 2 Standard costs $0.50 per million input tokens and $2.50 per million output tokens; Composer 2 Fast, a higher-speed variant that is now the default, costs $1.50 and $7.50 per million, respectively.

For comparison, Claude Opus 4.6 costs around $5 per million input tokens and $25 per million output tokens, and GPT-5.4 is priced higher, making Composer 2 roughly 3-10x cheaper depending on the variant and the competitor. Technical advantages include reinforcement learning (RL) scaled up on real Cursor usage data, self-summarization for better long-context handling in multi-step tasks, and optimization for interactive agentic workflows in the IDE.
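The “3-10x cheaper” figure can be checked with a small cost calculator. This sketch uses only the per-million prices quoted in this article; GPT-5.4 is left out because no price is given, and the 200,000-in / 20,000-out task size is a hypothetical workload chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token list prices in USD."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one task with the given token counts."""
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


# Prices as quoted in the article; GPT-5.4 omitted because no figure is given.
PRICES = {
    "Composer 2 Standard": Pricing(0.50, 2.50),
    "Composer 2 Fast": Pricing(1.50, 7.50),
    "Claude Opus 4.6": Pricing(5.00, 25.00),
}


def compare(input_tokens: int, output_tokens: int, baseline: str = "Claude Opus 4.6") -> dict:
    """How many times more the baseline costs than each model for the same task."""
    base_cost = PRICES[baseline].cost(input_tokens, output_tokens)
    return {name: base_cost / p.cost(input_tokens, output_tokens) for name, p in PRICES.items()}


if __name__ == "__main__":
    # Hypothetical agentic task: 200k tokens of context in, 20k tokens out.
    for name, ratio in compare(200_000, 20_000).items():
        print(f"{name}: baseline costs {ratio:.1f}x as much")
```

Because Opus 4.6’s quoted prices are exactly 10x the Standard tier and about 3.3x the Fast tier on both input and output, the ratio does not depend on the token mix, which is where the roughly 3-10x range against Opus comes from.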

Users and early testers on X are calling it “legit,” with reports of it catching subtle bugs that other models, including Claude and various GPT variants, missed, and handling complex refactors efficiently. There are also leaked, unconfirmed claims that Composer 2 may be built on Moonshot AI’s Kimi K2.5, with heavy continued pretraining and RL on Cursor’s proprietary coding data, which would help explain its speed and cost advantages over fully proprietary frontier models from OpenAI or Anthropic.

So does Composer 2 outperform existing coding models? It depends on the metric. On speed, cost, and practical agentic coding in Cursor’s environment, yes, especially against similarly priced or even higher-priced options like recent Claude versions.

On raw intelligence, only partially: it is frontier-level and beats some models, such as Opus 4.6, on certain benchmarks, but top models like GPT-5.4 still lead on the hardest tasks. The real win is value: high performance at a fraction of the cost, which can make it feel like it outperforms in real developer workflows.

SWE-bench is one of the most widely used and respected benchmarks for evaluating how well large language models (LLMs) and AI coding agents can handle real-world software engineering tasks. Introduced in late 2023, it stands out because it uses actual problems from GitHub rather than synthetic or toy coding exercises.

SWE-bench tasks an AI with resolving real GitHub issues from popular open-source repositories. For each task, the model receives the full codebase as it stood before the issue was fixed, the issue description (title and body from GitHub), and sometimes additional context such as comments. The goal is to generate a code patch (diff) that fixes the problem. Success is measured automatically: the patch is applied in a clean Docker environment, and the model’s change must make the relevant unit tests pass. The original full SWE-bench contains 2,294 tasks, all drawn from 12 popular Python repositories.
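The pass/fail rule can be expressed in a few lines. In the public SWE-bench dataset, each task lists FAIL_TO_PASS tests (which the reference fix makes pass) and PASS_TO_PASS tests (which must keep passing), and a task counts as resolved only if both sets pass after the patch is applied. The sketch below assumes the test run has already been parsed into a mapping from test name to a status string such as "PASSED"; that format and the function name are assumptions for illustration, not the official harness API.

```python
from typing import Mapping, Sequence


def is_resolved(
    results: Mapping[str, str],
    fail_to_pass: Sequence[str],
    pass_to_pass: Sequence[str],
) -> bool:
    """Return True if the patched repository passes every target test
    (FAIL_TO_PASS) and every regression test (PASS_TO_PASS)."""

    def passed(test: str) -> bool:
        # A test absent from the results (e.g. it errored before reporting)
        # is treated as a failure.
        return results.get(test) == "PASSED"

    return all(passed(t) for t in fail_to_pass) and all(passed(t) for t in pass_to_pass)


if __name__ == "__main__":
    # Hypothetical results after applying a model's patch.
    results = {
        "tests/test_parser.py::test_issue_repro": "PASSED",
        "tests/test_parser.py::test_existing_behavior": "FAILED",
    }
    # The bug is fixed, but an existing test broke, so this prints False.
    print(is_resolved(
        results,
        ["tests/test_parser.py::test_issue_repro"],
        ["tests/test_parser.py::test_existing_behavior"],
    ))
```

Requiring the PASS_TO_PASS set is what penalizes patches that fix the reported issue by breaking something else.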

Tasks include bug fixes, small features, refactors, and more, reflecting genuine developer work. This makes SWE-bench much harder and more realistic than older benchmarks such as HumanEval, which tests isolated function completion, because it requires understanding large, complex codebases (often tens of thousands of lines), navigating dependencies and repository structure, interpreting sometimes ambiguous or poorly written issue reports, and generating multi-file edits that don’t break existing functionality.

SWE-bench Verified is a cleaned, human-validated subset of 500 tasks. Annotators checked that issues are clearly specified, that tests are correct and do not reject valid solutions, and that tasks are solvable from the given information.

This version is more reliable for comparing models because there is less noise from flawed tasks; top models in early 2026 reach roughly 75-82% on Verified. SWE-bench Lite is a smaller, easier subset of about 300 tasks, used for faster evaluation or when full runs are too expensive.

SWE-bench Multilingual extends the idea beyond Python. It includes 300+ curated tasks from repositories in 9 languages, including Java, TypeScript/JavaScript, Go, Rust, and C/C++. This tests cross-language understanding and generalization. Performance is noticeably lower than on the Python-only versions, which is often attributed to most frontier models’ heavily Python-biased training data.

There are also community forks and extensions such as SWE-bench Pro and SWE-bench Live that add multi-language depth, harder tasks, or anti-contamination measures. Scores are usually reported as the percentage of tasks resolved. On SWE-bench Verified, scores are typically higher, e.g., 76-82% for top models like Claude Opus 4.6 or newer Sonnet/Opus variants.

On SWE-bench Multilingual, scores are lower overall, highlighting gaps in non-Python performance. Cursor reported 73.7% on SWE-bench Multilingual for Composer 2, a very strong result, especially at its price point, showing it is competitive even on the harder cross-language version.
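A single headline percentage can hide per-language variation. The sketch below computes an overall % resolved and a per-group breakdown from per-task outcomes; the data layout and function names are hypothetical, and the sample outcomes are toy values, not real benchmark results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percent_resolved(outcomes: Iterable[bool]) -> float:
    """Share of tasks resolved, as a percentage of all tasks attempted."""
    items = list(outcomes)
    if not items:
        raise ValueError("no tasks")
    return 100.0 * sum(items) / len(items)


def percent_resolved_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-group resolve rate, e.g. grouped by repository language."""
    groups: Dict[str, List[bool]] = defaultdict(list)
    for group, resolved in results:
        groups[group].append(resolved)
    return {group: percent_resolved(outcomes) for group, outcomes in groups.items()}


if __name__ == "__main__":
    # Toy outcomes, not real benchmark data.
    run = [("go", True), ("go", False), ("rust", True), ("java", True)]
    print(percent_resolved(resolved for _, resolved in run))
    print(percent_resolved_by_group(run))
```

In this sketch every attempted task counts in the denominator, so a task where the agent produced no usable patch simply counts as unresolved.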

On the plus side, SWE-bench tests agentic capabilities: planning, exploration, multi-file editing, and debugging loops. It also has limitations. Many tasks are relatively “simple” bug fixes representing hours of human work, not days or weeks, and there are potential data contamination and memorization risks: some papers argue that top scores partly reflect models “remembering” popular repositories.

Overall, SWE-bench, especially the Verified and Multilingual variants, remains the de facto standard for agentic coding evaluation in 2026. It is far more indicative of real usefulness in tools like Cursor, Devin-style agents, or GitHub Copilot Workspace than function-level benchmarks are.

Nvidia’s Million-GPU Deal With Amazon Signals the Next Phase of AI: From Training Arms Race to Inference Scale

A landmark agreement between Nvidia and Amazon Web Services is offering one of the clearest signals yet of where the artificial intelligence economy is heading—and how the balance of power between chipmakers and cloud providers is evolving.

Nvidia will supply AWS with 1 million GPUs between now and 2027, according to Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing. The timeline aligns with chief executive Jensen Huang’s projection of a $1 trillion revenue opportunity tied to the company’s next-generation Blackwell and Rubin chip architectures.

While the headline number is striking, the structure of the deal is more revealing. This is not a simple hardware purchase. It is a full-stack infrastructure partnership that spans compute, networking, and increasingly, inference—the stage of AI deployment where models generate responses and perform tasks in real time.

That distinction marks a turning point.

For much of the past two years, the AI boom has been defined by training—the process of building large language models using vast amounts of data and compute power. Nvidia’s dominance was built on supplying the GPUs required for that phase.

Now, the center of gravity is shifting toward inference. As AI systems move from development to widespread use, the demand profile changes. Instead of massive, one-off training runs, companies need sustained, efficient compute to serve millions—or billions—of user queries.

Buck captured the complexity of that shift bluntly: inference, he said, is “wickedly hard.”

To address it, AWS is not relying on a single class of chip. The deal includes a mix of Nvidia technologies (GPUs, Spectrum networking chips, and newer inference-focused processors built on Groq technology) alongside six additional Nvidia chip types. The goal is to optimize performance across different workloads, from large-scale model training to latency-sensitive applications like chatbots, recommendation engines, and autonomous systems.

This multi-chip approach reflects a broader industry reality. No single architecture can efficiently handle the full spectrum of AI tasks. Instead, hyperscalers are assembling heterogeneous compute stacks, combining different processors to balance cost, speed, and energy efficiency. That has implications for Nvidia’s long-term strategy. The company is no longer just a GPU vendor; it is positioning itself as a systems provider, integrating compute, networking, and software into a unified AI platform.
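As a toy illustration of balancing cost and speed across a mixed fleet, the sketch below routes each request to the cheapest hardware pool that still meets its latency budget. The pool names, latencies, and prices are invented for illustration and do not describe how AWS or Nvidia actually schedule work.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pool:
    name: str
    p99_latency_ms: float      # hypothetical tail latency for a typical request
    usd_per_m_tokens: float    # hypothetical serving cost per million tokens


def pick_pool(pools: Sequence[Pool], latency_budget_ms: float) -> Optional[Pool]:
    """Cheapest pool whose tail latency fits the request's budget, or None."""
    eligible = [p for p in pools if p.p99_latency_ms <= latency_budget_ms]
    return min(eligible, key=lambda p: p.usd_per_m_tokens, default=None)


# Hypothetical heterogeneous fleet: fast but pricey, general purpose, and batch.
POOLS = [
    Pool("low_latency_accelerator", 40, 3.00),
    Pool("general_gpu", 300, 1.20),
    Pool("batch_gpu", 5_000, 0.40),
]

if __name__ == "__main__":
    # A chatbot turn, an interactive tool call, and an offline batch job.
    for budget in (50, 1_000, 60_000):
        chosen = pick_pool(POOLS, budget)
        print(budget, chosen.name if chosen else "no pool fits")
```

Latency-sensitive requests land on the fast, expensive hardware, while tolerant workloads fall through to cheaper capacity, which is the cost/speed trade-off a mixed stack is meant to exploit.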

The inclusion of Nvidia’s ConnectX and Spectrum-X networking gear in AWS data centers is particularly significant. Traditionally, AWS has relied heavily on its own custom-built networking infrastructure, a core part of its competitive advantage. Opening that stack to Nvidia hardware suggests a deeper level of collaboration—and a recognition that AI workloads may require different architectural choices than traditional cloud computing.

It also signals a subtle shift in leverage. Hyperscalers like AWS have spent years developing in-house chips to reduce dependence on suppliers. Yet the scale and urgency of AI demand are forcing a more pragmatic approach: partnering with Nvidia even as they continue to build their own alternatives.

For AWS, the deal is about capacity and speed. Securing access to 1 million GPUs ensures it can meet surging customer demand for AI services, from startups building generative AI applications to enterprises embedding AI into core operations. For Nvidia, the agreement locks in long-term demand and reinforces its central role in the AI ecosystem. It also provides visibility into future revenue streams at a time when investors are closely watching whether the current AI spending boom can be sustained.

There is, however, a deeper competitive undercurrent.

The emphasis on inference chips, particularly newer offerings built on Groq technology, suggests Nvidia is moving to defend its position against a growing field of specialized competitors. Startups and established players alike are targeting inference as a more cost-sensitive and potentially higher-volume segment than training.

If training established Nvidia’s dominance, inference will test its adaptability.

The economics are different. Training workloads are episodic and capital-intensive, favoring high-performance, high-margin chips. Inference workloads are continuous and cost-driven, requiring efficiency at scale. That shift could compress margins over time, even as total demand expands.
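A back-of-the-envelope model shows why continuous inference rewards efficiency: the cost of each token depends on the hardware’s hourly price, its throughput, and how busy it stays. The sketch below is a simplified model under stated assumptions (a fixed hourly accelerator price, steady throughput, and a utilization factor); the numbers are hypothetical and it ignores power, networking, and staffing costs.

```python
def serving_cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float, utilization: float) -> float:
    """USD to generate one million tokens on one accelerator.

    utilization is the fraction of each hour spent doing useful work, in (0, 1].
    Ignores power, networking, staff, and other overheads.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical: a $4/hour accelerator generating 1,000 tokens per second.
    for util in (0.25, 0.5, 1.0):
        print(util, round(serving_cost_per_million_tokens(4.0, 1_000, util), 2))
```

In this model, halving utilization doubles the cost per token, and doubling throughput halves it, which is why cost-driven inference buyers weigh sustained efficiency rather than peak performance alone.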

At the same time, the deal underscores the sheer scale of the AI buildout underway. A commitment of 1 million GPUs from a single cloud provider points to an infrastructure race that is still in its early stages. Data centers are being reconfigured, power consumption is rising, and supply chains are being stretched to meet demand.

This raises broader questions about sustainability—both in terms of energy usage and capital allocation. Hyperscalers are investing tens of billions of dollars in AI infrastructure, betting that demand will justify the outlay. Nvidia, in turn, is scaling production to meet that demand, tying its growth trajectory closely to the spending cycles of a handful of large customers.

The partnership with AWS illustrates how concentrated that ecosystem has become. A small number of companies—cloud providers, chipmakers, and large AI developers—are effectively shaping the architecture of the AI economy.

Meanwhile, AWS continues to develop its own chips, such as Trainium and Inferentia, aimed at reducing reliance on external suppliers. The coexistence of those efforts with large-scale Nvidia purchases reflects a dual strategy: build internally where possible, but buy externally where necessary to maintain competitiveness.

In that sense, the deal is both collaborative and competitive.

It locks Nvidia into the core of AWS’s AI infrastructure while reinforcing AWS’s role as a gatekeeper of AI services for enterprises. Each depends on the other, even as both seek to expand their own capabilities.

What emerges is a clearer picture of the next phase of the AI cycle. The initial scramble to build models is giving way to a longer, more complex process of deploying them at scale.

Block Rehires Some of the 4,000 Laid-Off Employees, Citing Clerical Error

Block Inc., a U.S.-based financial technology company, has made headlines again, this time not for layoffs but for rehiring. The company is reported to have rehired some of the 4,000 employees it laid off last month.

In a recent LinkedIn post, Andrew Harvard, who builds agentic AI experiences at Block, disclosed that Block leadership had told him his layoff was due to a clerical error and had offered him the opportunity to return.

He wrote,

“Block leadership informed me that my layoff was due to a clerical error. They offered me the opportunity to return, and I’ve accepted. I’m grateful for the encouragement and outreach from so many of you after my initial post. It meant a great deal. To my former colleagues continuing to face the reality of layoffs, please feel welcome to contact me directly for support. I will make time to help you with whatever you need.”

Chane Rennie, a creative strategy lead at Block, also disclosed that he was asked to rejoin the company last week.

He wrote,

“Relieved to share that I was asked to rejoin Block and started back this week. To everyone who reached out with encouragement, references, job recs, job opportunities, or Zelda tips, just know it meant more than I can adequately express. I owe you all a beer, a hug, and probably both.”

A report found that Block has rehired at least four laid-off employees, based on LinkedIn posts from affected workers and their colleagues. The employees span multiple departments, from engineering to recruiting. Some said they were rehired soon after the February layoffs, while others said they rejoined later in March.

While the company did not specify the exact number of employees affected by the error, the development has raised concerns about the execution of large-scale workforce reductions within major corporations.

The incident underscores the challenges companies face when implementing rapid cost-cutting measures, particularly in times of economic uncertainty or strategic realignment. For affected employees, the experience has likely been disruptive, involving sudden job loss followed by an unexpected rehiring process.

Beyond operational implications, the situation may also have reputational consequences for Block, as such errors can erode employee trust and raise questions about internal controls and human resource management practices.

Block Inc., founded by Jack Dorsey, has been undergoing strategic shifts in recent years as it seeks to strengthen its position in digital payments, cryptocurrency, and financial services.

The company made headlines last month after it laid off more than 4,000 workers, shrinking from over 10,000 employees to just under 6,000.

Dorsey framed the decision not as a response to financial distress but as a proactive embrace of artificial intelligence and “intelligence tools” that are fundamentally reshaping how companies operate.

Dorsey emphasized that Block’s core business remains strong: gross profit is growing, customer numbers are rising, and profitability is improving. Yet he argued that gradual, repeated layoffs over months or years would erode morale, focus, and stakeholder trust more than a single decisive action.

The decision to rehire affected staff signals an attempt by the company to correct its mistakes and mitigate the impact of the flawed layoff process. However, it also highlights the importance of precision and accountability in workforce management, especially during periods of significant organizational change.

Goldman warns oil could stay above $100 as Iran war fuels fears of global downturn

Goldman Sachs has warned that oil markets are entering a prolonged period of stress, with risks to prices tilted firmly to the upside as the U.S.-Israeli war with Iran disrupts supply flows and threatens to spill over into the global economy.

The latest escalation pushed Brent crude above $119 per barrel, underscoring the severity of the shock after Iranian strikes hit energy facilities across the Gulf. The attacks have forced production shut-ins and heightened uncertainty around the Strait of Hormuz, a critical artery that carries roughly one-fifth of global oil and gas supply.

Goldman’s analysis suggests the market may be underestimating how long the disruption could last. Drawing comparisons with past crises such as the 1973 oil embargo, the bank said supply shocks of this magnitude tend to persist, keeping prices elevated even after hostilities ease. While its base case assumes flows begin to recover from April and prices ease into the $70 range by late 2026, it warned that structural damage to infrastructure or prolonged geopolitical tension could keep oil above $100 per barrel into 2027.

That prospect is already reverberating across global financial markets, where the surge in oil prices is stoking fears of a broader economic downturn. Higher crude costs feed directly into fuel, transportation, and manufacturing expenses, raising input costs for businesses and eroding consumer purchasing power. For import-dependent economies, the shock is even more acute, often triggering currency pressures and widening trade deficits.

Economists say the current trajectory raises the risk of a stagflationary cycle—where growth slows while inflation accelerates. The longer oil remains elevated, the more likely it is to choke off demand, dampen industrial output, and weigh on global trade. Goldman noted that if disruptions persist, Brent could even surpass its 2008 peak, a scenario that would significantly tighten financial conditions worldwide.

Against this backdrop, central banks are being forced into a difficult position. The inflationary impulse from energy prices is complicating what had been a gradual shift toward monetary easing. Policymakers who were previously considering rate cuts are now reassessing their stance, with some weighing the need for tighter policy to prevent inflation from becoming entrenched.

The U.S. Federal Reserve, the European Central Bank, and other major monetary authorities are expected to hold a more hawkish tone in upcoming meetings, even as growth risks mount. Higher interest rates, while aimed at containing inflation, could further slow economic activity—deepening the risk of a synchronized global slowdown.

Goldman also highlighted the potential for additional market distortions. A widening spread between Brent and West Texas Intermediate could emerge if the United States considers export restrictions to shield domestic consumers, a move that would tighten global supply further. At the same time, while OPEC retains spare capacity, the bank cautioned that deploying it may not fully offset losses, particularly if infrastructure damage or security concerns limit production.

The crisis is exposing deeper structural vulnerabilities in the oil market. Years of underinvestment in upstream capacity, combined with rising geopolitical fragmentation, have reduced the system’s ability to absorb shocks. As a result, even partial disruptions are having outsized effects on prices and volatility.

The stakes are rising quickly for governments and policymakers. Elevated energy costs are not only a threat to growth but also a political risk, as households grapple with higher fuel and food prices. In emerging markets, where energy subsidies are often used to cushion consumers, fiscal pressures could intensify sharply.

In essence, the oil shock is no longer just a commodity story—it is becoming a macroeconomic one. With supply risks lingering, central banks on alert, and markets increasingly jittery, the trajectory of crude prices may now dictate the pace and stability of the global economic outlook in the months ahead.