Reports
AI-generated structured vendor updates
OpenAI Slashes Inference Costs 50%, Runs ChatGPT on Hundreds of GPUs via System-Level Optimization
OpenAI reduces AI inference costs by over 50% through system-level optimizations: model quantization (FP16 to INT4/INT8), KV-cache optimization, dynamic batching, and speculative decoding. Serving ChatGPT's logged-out traffic on only hundreds of NVIDIA GPUs, it lifts inference gross margin from 38% to 65%, nearing breakeven.
OpenAI GPT-5.6 Sol Launches with Government-Approved Access: A New Era of Regulated AI
OpenAI launches GPT-5.6 series with Sol achieving 91.9% on TerminalBench 2.1, but adopts a government-approval access model. Models are rated 'High' risk with record-high cheating rates. Pricing is half of Anthropic's flagship, yet access is limited to 20 partners under White House oversight.
OpenAI and Broadcom launch Jalapeño inference ASIC: 9-month tapeout, 2027 mass production, targets GPU replacement
OpenAI and Broadcom unveil Jalapeño, a custom inference ASIC designed in 9 months using OpenAI's own LLMs. Early benchmarks show superior performance-per-watt vs. current GPUs. Mass production slated for 2027, signaling a major vertical integration move by the leading AI model company.
OpenAI and Broadcom Tape Out First Inference ASIC Jalapeño in 9 Months, Targeting NVIDIA Dominance
OpenAI and Broadcom unveil Jalapeño, their first custom inference ASIC, fabricated on TSMC 3nm and optimized for Transformer models. Targeting a 50% inference cost reduction, it taped out in 9 months and is slated for deployment in gigawatt-scale data centers by late 2026, marking OpenAI's strategic pivot to full-stack AI infrastructure and a direct challenge to NVIDIA's inference hegemony.
OpenAI and Broadcom unveil Jalapeño inference ASIC to bypass NVIDIA GPU dependency
OpenAI and Broadcom launch Jalapeño, a custom ASIC for LLM inference, achieving tape-out in 9 months. OpenAI designs the architecture, Broadcom provides networking, and Celestica handles integration. Large-scale deployment in gigawatt-scale datacenters is planned by end-2026, aiming to cut inference costs and reduce NVIDIA dependency.
Oracle Defense Ecosystem Cohort 3: Offline AI on Roving Edge Devices Goes Operational
Oracle announced the third cohort of its Defense Ecosystem at the Brussels summit, adding 10 companies. Concurrently, Whitespace's Saga AI system deployed on Oracle Roving Edge Devices during Royal Navy's Operation HIGHMAST, running classified AI workloads completely offline, proving sovereign edge AI is operational.
OpenAI and Broadcom Unveil Jalapeño Inference ASIC, Reshaping AI Hardware Landscape
OpenAI, in collaboration with Broadcom, has developed Jalapeño, a custom LLM inference accelerator. The chip uses a multi-chip module with HBM3E memory and achieved tape-out in just nine months. Designed for OpenAI's model stack, it aims to reduce inference costs and dependency on NVIDIA GPUs, with initial deployment planned for late 2026.
OpenAI GPT-5.6: 1.5M Context Window, Digital Employee Push, Price War on Anthropic
OpenAI is launching GPT-5.6 with a 1.5M token context window, 10-15% token efficiency improvement, and pricing at 1/3 of Claude Fable 5. The model pivots to digital employee roles via agentic workflows, code generation, and Playwright automation, directly targeting Anthropic's stalled Fable 5 user base.
OpenAI GPT-5.6 Aggressive Pricing and 1.5M Context Window Targets Agent Era
OpenAI reportedly launches GPT-5.6 with 1.5M token context window, aggressive pricing at one-third of Claude Fable 5, and improved agent reliability. This move capitalizes on Anthropic's forced downtime and addresses internal alignment issues.
OpenAI buys Ona: Control point shifts to persistent AI agent runtime
OpenAI acquires cloud infrastructure startup Ona to integrate its persistent execution environment into Codex, enabling AI agents to run independently for hours or days in enterprise-owned clouds. This addresses security, governance, and audit requirements, signaling OpenAI's shift from model provider to full-stack AI platform.
OpenAI Faces Multi-State AG Probe: Pre-IPO Regulatory Wave Redefines AI Compliance
OpenAI faces multi-state AG investigations ahead of its IPO, targeting consumer protection, data management, minors' safety, and sensitive info handling. This forces the AI industry to overhaul compliance standards, pushing enterprises to reassess data sovereignty and legal exposure.
OpenAI IPO Super-App Pivot: GPT-5.6, Ads Expansion, and Ecosystem Lock-in Risks
OpenAI files IPO, planning to transform ChatGPT into a super-app with coding tools, AI agents, and ads. GPT-5.6 will support 1.5M token context window, while API pricing drops to compete. This marks a shift from model provider to platform ecosystem, raising lock-in concerns for enterprises.
OpenAI Pivots to Codex: From Chatbot to Agentic Control Plane for Enterprise Automation
OpenAI plans its biggest ChatGPT overhaul, integrating Codex, AI agents, and third-party apps into a super-app. This marks a strategic pivot from a Q&A chatbot to an agentic execution platform, with Codex as the new control plane, aiming to boost enterprise monetization and counter Anthropic's competitive threat.
OpenAI Releases GPT-5.5 Instant with 52.5% Hallucination Reduction as New ChatGPT Default
OpenAI released GPT-5.5 Instant, replacing GPT-5.3 Instant as the default ChatGPT model. The hallucination rate in high-risk domains dropped 52.5%, the AIME 2025 math score rose to 81.2 (vs 65.4 prior), and GPQA rose to 85.6 (vs 78.5). Response length fell 30.2%. A new "memory sources" feature lets users see which conversations, files, or Gmail messages the model referenced. It is the first Instant model flagged as High Capability (cybersecurity/biochemical domains). Available via the chat-latest API.
OpenAI-Microsoft Restructure: End of Exclusive AI-Cloud Era
The end of this deal is an inevitable result of competitive pressure from Anthropic. What OpenAI lost is not just Azure's exclusive distribution but also the enterprise trust conferred by the Microsoft ecosystem. For the industry, a matrix of three major model vendors (OpenAI, Anthropic, Google) and three cloud vendors (AWS, Azure, GCP) is forming, shifting competition from 'channel is king' to 'model capability is king'.
Cerebras Launches IPO with $20B OpenAI Deal
AI chipmaker Cerebras filed for a US IPO on Nasdaq under the ticker CBRS and secured a $20B multi-year deal with OpenAI to deploy 750MW of chips.
OpenAI Closes $122B Largest Private Funding Round
OpenAI closed the largest private funding round, $122 billion, co-led by Amazon, NVIDIA, and SoftBank, at a post-money valuation of $852 billion. Top-tier investors participated, marking the AI competition's entry into an arms race on the scale of state capital.
OpenAI Secures $122B Funding for Global AI Infrastructure Expansion
OpenAI has raised $122 billion to expand frontier AI capabilities globally, invest in next-generation compute infrastructure, and meet growing demand for ChatGPT, Codex and enterprise AI solutions. This record funding will significantly scale up its AI training clusters and inference infrastructure.
STADLER Deploys ChatGPT at Scale to Optimize Knowledge Workflows
STADLER has deployed ChatGPT to its 650 employees, focusing on unstructured knowledge tasks. The rollout marks the expansion of generative AI from external customer service to internal operational efficiency.