Token - AI Infrastructure Intelligence Search

Other Other 2026-07-14

SANS Identifies Distributed Scanning of MCP Servers and AI Assistant Configs

SANS Internet Storm Center reports systematic scanning of MCP servers, AI assistant configs, and local LLM endpoints. Forty-nine IPs targeted MCP handshakes and exploited CVEs in MCP SDKs, signaling that AI infrastructure has become a new attack surface.

Huawei Other 2026-07-10

Huawei Ascend 10K-Card Cluster Goes Live, UnifiedBus Protocol Pools All Resources

Huawei launched an Ascend 10,000-card AI cluster in Shaoguan, Guangdong, and showcased the Atlas 950 SuperPoD with its proprietary UnifiedBus interconnect supporting 8,192 NPUs at 16.3 PB/s. Huawei Cloud also entered the Gartner 2026 Cloud AI Infrastructure Leaders quadrant, reinforcing its push for a self-contained AI ecosystem.

AMD Other 2026-07-10

AMD's Experimental Topological Ghost Protocol Boosts MI300X Inference 10x

AMD introduces the experimental Topological Ghost Protocol (TGP) on MI300X GPUs, achieving 431 tokens/sec with a 100% success rate in high-concurrency inference, a 10x improvement over standard vLLM. TGP uses KV-cache recycling and segmented state management; it is still experimental but could redefine AI inference benchmarks.

Google Other 2026-07-09

Google Gemini 3.5 Pro Rebuilds from Scratch: 2M Token Context Window Reshapes AI Frontier

Google DeepMind targets July 17 for Gemini 3.5 Pro, a full architectural rewrite of its pretraining stack meant to overcome deficits in math reasoning, SVG generation, and image quality. Reported specs, which Google has not confirmed, include a 2M token context window, a Deep Think reasoning layer, and multi-step autonomous workflows.

Anthropic Other 2026-07-03

Anthropic Launches Claude Sonnet 5, Closing Gap to Opus, Targets Enterprise Workflows

Anthropic launches Claude Sonnet 5, a mid-tier model that nearly matches flagship Opus 4.8 on SWE-bench Pro (63.2% vs 69.2%) and surpasses it on GDPval-AA v2 (1618 vs 1615). Priced at 60% of the flagship, it is paired with Claude Science, a research workbench integrating 60+ scientific databases, aiming to deepen enterprise lock-in through tooling and cost-performance.

Qualcomm Other 2026-07-02

Qualcomm Enters AI Inference with Dragonfly C1000 CPU and HBC Near-Memory Compute

Qualcomm unveils its Dragonfly roadmap with the Oryon-based C1000 CPU and the AI300 inference accelerator featuring HBC near-memory compute. Meta and Microsoft are early adopters. The strategy targets lower AI inference TCO and a way past the memory wall, sidestepping Nvidia's dominance in training.

Anthropic Other 2026-07-02

Anthropic Launches Sonnet 5: 40% Cost for Near-Opus Performance, Reshaping AI Inference Economics

Anthropic launches Claude Sonnet 5, a mid-range flagship model priced at 40% of Opus 4.8. It scores 63.2% on SWE-bench Pro, approaching Opus's 69.2%, and surpasses Opus on GDPval-AA v2. With native 1M token context and 48B average activated parameters, Sonnet 5 targets high-volume API revenue growth.

Cloudflare Other 2026-07-01

Announcing the Monetization Gateway: charge for any resource behind Cloudflare via x402

...

NVIDIA Other 2026-07-01

NVIDIA BlueField-3 DPU: Shifts AI Cloud I/O Control from CPU to Dedicated Silicon, Redefines Compute Delivery & Security

NVIDIA's BlueField-3 DPU uses hardware vDPA to offload the virtualization data plane from the host CPU to a dedicated processor, delivering near-bare-metal performance with live migration flexibility. It also creates a trusted I/O path for confidential computing. However, this locks cloud infrastructure into NVIDIA silicon, increasing vendor dependency.

OpenAI Other 2026-06-30

OpenAI GPT-5.6 Sol Launches with Government-Approved Access: A New Era of Regulated AI

OpenAI launches the GPT-5.6 series, with Sol achieving 91.9% on TerminalBench 2.1, but adopts a government-approval access model. The models are rated 'High' risk, with record-high cheating rates. Pricing is half that of Anthropic's flagship, yet access is limited to 20 partners under White House oversight.

Amazon Other 2026-06-30

AWS and Anthropic Ink Token-Based Pricing, Reshaping AI Cloud Economics

AWS and Anthropic have agreed to a new token-based pricing model, shifting from compute-centric to usage-centric billing for running Anthropic models on AWS. The move, driven by the weak performance of AWS's Nova models, deepens their partnership to challenge the Microsoft-OpenAI alliance but introduces new cost dynamics for Amazon.

NVIDIA Other 2026-06-25

NVIDIA Unveils Vera CPU for AI Agents, Shifting Control from x86 to Proprietary Silicon

At NVIDIA's annual meeting, CEO Jensen Huang announced the Vera CPU for AI agents, paired with the Rubin GPU, claimed that Blackwell delivers 30x the token throughput of the next-best platform, and reiterated CUDA as a moat. The move aims to shift AI compute control from general-purpose CPUs to NVIDIA's proprietary architecture.

Huawei Other 2026-06-25

Huawei Pushes Token-Based Billing at MWC Shanghai 2026: Shifting Carrier Monetization from Bytes to AI Inference Value

At MWC Shanghai 2026, Huawei urged carriers to shift from byte-based to token-based billing for AI workloads, showcasing a 372% token throughput improvement in long-sequence inference via its AI Inference Acceleration Solution. It also highlighted the Upper-6 GHz band as critical for AI wearables requiring 20 Mbps uplink, aiming to reposition 5G-A networks as AI compute delivery infrastructure.

Huawei Other 2026-06-25

Huawei Unveils AI-Centric Network with Token Monetization, UCM Caching Breaks Long-Context Barriers

At MWC Shanghai 2026, Huawei unveiled an AI-native network architecture integrating service, network, and compute, shifting from traffic-centric to intelligence-centric operations. The Unified Cache Manager (UCM) extends KV cache to petabyte-scale external storage, achieving 372% token throughput gains on GLM-5.1 at 128K sequence lengths. Token monetization frameworks and agentic operations enable carriers to charge for AI inference capacity and personalize services.

Qualcomm Other 2026-06-25

Qualcomm Dragonfly: 250-core CPU, HBC memory, UALink interconnects target AI inference TCO

Qualcomm unveils full data center portfolio: Dragonfly C1000 250-core Oryon CPU (>5GHz, PCIe Gen7, CXL), HBC near-memory compute (133TB/s Gen1, 18x-54x effective BW), AI300 inference accelerator (UALink/ESUN scale-up), and 800G/1.6T connectivity. Multi-year Meta CPU deal. Commercial sampling 2027-2028. Targets inference TCO with tokens-per-watt leadership.

Huawei Other 2026-06-24

Huawei and Hubei Mobile Validate AI Inference Acceleration: External KV Cache Boosts Throughput 372%

Huawei and Hubei Mobile completed the first operator AI inference acceleration trial, using OceanStor A800 storage and Ascend A3 supernode with UCM to externalize KV Cache to PB-level storage, achieving up to 372% TPS improvement for long-context inference on GLM-5.1 and MiniMax M2.5 models.

OpenAI Other 2026-06-23

OpenAI GPT-5.6: 1.5M Context Window, Digital Employee Push, Price War on Anthropic

OpenAI is launching GPT-5.6 with a 1.5M token context window, 10-15% token efficiency improvement, and pricing at 1/3 of Claude Fable 5. The model pivots to digital employee roles via agentic workflows, code generation, and Playwright automation, directly targeting Anthropic's stalled Fable 5 user base.

Amazon Other 2026-06-23

AWS Lambda MicroVMs: Stateful Isolated Sandboxes via Firecracker Snapshots

AWS launches Lambda MicroVMs, leveraging Firecracker for VM-level isolation, near-instant launch/resume, and stateful execution. Users build images from Dockerfiles in S3, launch from pre-initialized snapshots, and suspend/resume automatically, enabling multi-tenant AI code sandboxes and interactive analytics.

NVIDIA Other 2026-06-23

Nvidia Vera Rubin CPU: 10-Wide Core Redefines CPU for Agentic Computing

At GTC Taipei 2026, Nvidia unveiled the Vera Rubin CPU with a custom 10-wide fetch/decode/execute pipeline, claiming world-leading IPC and bandwidth. Designed for agentic computing, it complements Nvidia GPUs. Nvidia also announced a partnership with Microsoft to reinvent the PC as a Personal AI and committed to returning 50% of free cash flow to shareholders.