Google Other High Signal 2026-04-03

Google Introduces Flex and Priority Inference Tiers for Gemini API

Google adds Flex and Priority service tiers to its Gemini API. Flex is a cost-optimized tier offering a 50% price reduction for latency-tolerant workloads via a synchronous interface. Priority is a high-reliability tier ensuring critical requests are not preempted during peak loads. This provides developers a unified way to balance cost and reliability based on AI task types, such as background agentic workflows versus interactive applications.
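As a rough illustration of the cost/reliability trade-off described above, the sketch below routes a task to a tier and estimates the Flex price. Only the 50% Flex discount comes from the summary. The tier labels, the `Task` fields, and the routing rule are hypothetical and are not Gemini API parameters.

```python
from dataclasses import dataclass

# From the summary: Flex is a 50% price reduction over standard pricing.
FLEX_DISCOUNT = 0.50


@dataclass
class Task:
    """Hypothetical description of a workload; not a Gemini API type."""
    name: str
    latency_tolerant: bool
    critical: bool


def choose_tier(task: Task) -> str:
    """Pick an illustrative tier label for a task.

    Assumed rule: critical work goes to the high-reliability tier,
    latency-tolerant background work goes to the discounted tier,
    and everything else stays on the default tier.
    """
    if task.critical:
        return "priority"
    if task.latency_tolerant:
        return "flex"
    return "standard"


def flex_cost(standard_cost: float) -> float:
    """Estimated Flex cost for work that would cost `standard_cost` otherwise."""
    return standard_cost * (1.0 - FLEX_DISCOUNT)


if __name__ == "__main__":
    background = Task("nightly-agent-run", latency_tolerant=True, critical=False)
    interactive = Task("checkout-chat", latency_tolerant=False, critical=True)
    print(choose_tier(background), flex_cost(10.0))
    print(choose_tier(interactive))
```

Priority pricing is not stated in the summary, so the sketch deliberately leaves it out.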

Google Other High Signal 2026-04-03

Google Launches Gemma 4 Open Models, Targeting Edge Inference and AI Agent Architecture

Google introduces the Gemma 4 open model family, with four sizes from 2B to 31B parameters, emphasizing breakthrough intelligence-per-parameter and native support for agentic workflows, multimodality, and long context. The small models are engineered for edge devices, aiming to bring frontier reasoning to mobile and IoT scenarios.

Google Other Medium Signal 2026-04-03

Google Introduces Flex and Priority Tiers for Gemini API

Google adds Flex and Priority service tiers to the Gemini API, enabling developers to optimize cost and reliability through a single interface. Flex offers 50% cost savings for latency-tolerant workloads, while Priority ensures the highest reliability for critical apps. This change simplifies management of synchronous and asynchronous tasks in AI agent architectures.

Google Other Medium Signal 2026-04-03

Google Launches Gemma 4 Open Model Family

Google introduces the Gemma 4 open model family in four size variants, optimized for edge and mobile devices. The series supports multimodal processing, long context windows, and 140+ languages, and is released under the Apache 2.0 license.

AMD Other High Signal 2026-04-02

AMD Announces Breakthrough MLPerf Inference 6.0 Results, Showcasing Multinode Scaling and Multimodal Capabilities

AMD's MLPerf Inference 6.0 submission, powered by Instinct MI355X GPUs, surpassed 1 million tokens per second for the first time on models like Llama 2 70B and GPT-OSS-120B. The results highlight efficient multinode scaling, rapid enablement of new workloads (e.g., text-to-video model Wan-2.2-t2v), and reproducible performance across a broad partner ecosystem.

Cisco Other Medium Signal 2026-04-02

Cisco Joins NIST GenAI Trust Program

Cisco participates in NIST's GenAI Trust Program, focusing on measurable AI trust evaluation frameworks including adversarial testing (Cat-and-Mouse) and code generation challenges to verify AI output reliability.

Cisco Other High Signal 2026-04-02

Cisco Discloses Memory Poisoning Attack Method in AI Coding Assistants

Cisco's security team discovered and validated a persistent memory poisoning attack method targeting AI coding assistants like Claude Code, demonstrating how tampering with MEMORY.md system files can persistently manipulate AI behavior. This vulnerability prompted Anthropic to remove user memory files' system prompt privileges in v2.1.50.

NVIDIA Other High Signal 2026-03-25

NVIDIA Demonstrates AI Factories as Flexible Grid Assets for Peak Demand Management

NVIDIA, in collaboration with EPRI, National Grid, and Emerald AI, demonstrated how AI factories powered by Blackwell GPU clusters can dynamically adjust power consumption in response to grid signals. This allows them to act as 'shock absorbers' during peak demand while maintaining performance for high-priority AI workloads.

Cisco Other High Signal 2026-03-23

Cisco Extends Zero Trust Security to AI Agent Ecosystem

At RSA 2026, Cisco introduced security innovations for AI agents, extending Zero Trust Access with agent discovery in Identity Intelligence, agentic IAM in Duo, and MCP enforcement in Secure Access SSE. It also launched AI Defense: Explorer Edition for self-serve testing and the open-source DefenseClaw framework to automate security deployment.

NVIDIA Other 2026-03-17

Project Rheo: NVIDIA Shifts Robot Training Control from Real Hospitals to Simulation

NVIDIA unveils Project Rheo, a blueprint combining Isaac Sim, GR00T VLA models, and synthetic data generation for hospital robotics. Developers train Physical AI policies in digital twins—loco-manipulation (surgical tray pick-and-place) and precision bimanual tasks (trocar assembly)—with Cosmos Transfer 2.5 for cross-scene generalization.

NVIDIA Other High Signal 2026-03-14

NVIDIA Releases Cosmos World Model Suite, Enhancing Synthetic Data and Reasoning for Physical AI

NVIDIA has released significant updates to its Cosmos World Foundation Models (WFM) suite, including Transfer 2.5, Predict 2.5, and Reason 2. These models are designed to accelerate the generation of high-fidelity, physics-aware synthetic data and support downstream fine-tuning and reasoning for physical AI systems like robotics and autonomous vehicles, addressing the bottleneck of real-world data scarcity.

NVIDIA Other 2026-03-13

NVIDIA Warp: Differentiable Physics Simulation for AI Training on GPU

NVIDIA Warp is a framework for GPU-accelerated, differentiable physics simulation. It enables writing high-performance kernels in Python, with automatic differentiation, and integrates with PyTorch/JAX. The 2D Navier-Stokes example demonstrates end-to-end optimization, reducing the cost of generating training data for physics AI.

Trend Micro Other High Signal 2026-03-03

Trend Micro Report Highlights AI Supply Chain Risks and Model Attack Surfaces

Trend Micro's 'Fault Lines in the AI Ecosystem' report systematically analyzes security risks in the AI supply chain, including training data poisoning, third-party plugin vulnerabilities, and model theft attacks. It indicates that enterprise AI security boundaries have expanded from traditional IT infrastructure to the model layer and data pipelines.

NVIDIA Other 2026-01-23

NVFP4 + TeaCache Drive 10x FLUX.2 Inference Speedup, Locking Blackwell Ecosystem

NVIDIA and BFL optimize FLUX.2 on DGX B200/B300 using NVFP4 4-bit quantization, TeaCache step skipping, CUDA Graphs, and torch.compile, achieving 6.3x (single GPU) to 10.2x (dual GPU) latency reduction vs H200, with 40% memory savings. The stack is tightly coupled to TensorRT-LLM visualgen and Blackwell hardware.

OpenAI Other 2026-01-21

OpenAI Provides Video Generation Infrastructure to Higgsfield via GPT-4.1, GPT-5, and Sora 2 Model Stack

OpenAI showcased in its developer blog how the third-party app Higgsfield leverages its combined GPT-4.1, GPT-5, and Sora 2 models to transform simple inputs into high-quality social videos. This demonstrates OpenAI's strategy of positioning its multimodal models as core components of external AI inference infrastructure.

OpenAI Other Medium Signal 2025-12-18

OpenAI Releases Chain-of-Thought Monitorability Framework

OpenAI introduces a new chain-of-thought monitoring evaluation suite with 13 metrics across 24 test environments. The research shows that monitoring a model's internal reasoning is more effective than output-only monitoring, offering a new approach to scalable AI control.

NVIDIA Other 2025-11-08

NVIDIA Launches Interactive AI Agent for GPU-Accelerated Data Science with Nemotron Nano-9B

NVIDIA unveils an interactive AI agent powered by Nemotron Nano-9B-v2 and CUDA-X libraries, enabling natural language orchestration of ML workflows. It achieves 3x-43x GPU acceleration over CPU for data processing, model training, and hyperparameter optimization.

NVIDIA Other Medium Signal 2025-10-22

NVIDIA Publishes Tutorial for Converting Lightweight LLM into Terminal AI Agent

NVIDIA released a developer tutorial guiding users to build an AI agent that understands natural language and executes Bash commands, using its open-source Nemotron Nano v2 model within roughly 200 lines of Python code. The tutorial emphasizes building from scratch and simplifying with LangGraph, focusing on safe tool calling and human-in-the-loop control.
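The "safe tool calling and human-in-the-loop control" idea can be sketched as below. This is not the tutorial's code: the allowlist contents, the function name `run_bash_tool`, and the `confirm` callback are assumptions made for illustration. The sketch checks the command against an allowlist, asks a human to approve it, and only then executes it without a shell.

```python
import shlex
import subprocess
from typing import Callable

# Hypothetical allowlist; a real agent would choose this set deliberately.
ALLOWED_COMMANDS = {"ls", "pwd", "cat", "echo"}


def run_bash_tool(
    command: str,
    confirm: Callable[[str], bool],
    runner=subprocess.run,
) -> str:
    """Run a model-proposed command only if it is allowlisted and approved.

    `confirm` is the human-in-the-loop hook: it receives the raw command
    and returns True to approve. `runner` is injectable so the guard logic
    can be exercised without executing anything.
    """
    try:
        parts = shlex.split(command)
    except ValueError as exc:
        return f"rejected: could not parse command ({exc})"
    if not parts:
        return "rejected: empty command"
    if parts[0] not in ALLOWED_COMMANDS:
        return f"rejected: '{parts[0]}' is not on the allowlist"
    if not confirm(command):
        return "rejected: user declined"
    # Passing a list (no shell=True) means shell metacharacters such as
    # ';' or '|' are treated as plain arguments, not interpreted.
    result = runner(parts, capture_output=True, text=True, timeout=10)
    return result.stdout


if __name__ == "__main__":
    def ask(cmd: str) -> bool:
        return input(f"Run '{cmd}'? [y/N] ").strip().lower() == "y"

    print(run_bash_tool("echo hello", ask))
```

Returning the rejection reason as a string, rather than raising, lets the agent feed the refusal back to the model as an ordinary tool result.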

Trend Micro Other High Signal 2020-06-01

Trend Micro Exposes Azure DNS Design Flaw Enabling Cloud Infrastructure Takeover

Trend Micro's TrendAI™ research team disclosed a security vulnerability "by design" in the Azure cloud platform. DNS records of deleted Azure resources may persist, allowing attackers to exploit these lingering DNS names to hijack trusted endpoints and compromise dependent systems, highlighting a critical but often overlooked trust inheritance risk in cloud infrastructure.
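A minimal sketch of how lingering records of this kind might be audited: compare each CNAME target in a DNS zone export against an inventory of resources that still exist. The hostnames, the record dictionary, and the function name are hypothetical, and the check is offline only. It does not query DNS or any Azure API.

```python
def normalize(hostname: str) -> str:
    """Lowercase a hostname and drop any trailing root dot."""
    return hostname.lower().rstrip(".")


def find_dangling_records(
    cname_records: dict[str, str],
    live_targets: set[str],
) -> list[str]:
    """Return record names whose CNAME target is no longer a live resource.

    `cname_records` maps a DNS name to its CNAME target.
    `live_targets` is an inventory of hostnames that still exist.
    """
    live = {normalize(t) for t in live_targets}
    return sorted(
        name
        for name, target in cname_records.items()
        if normalize(target) not in live
    )


if __name__ == "__main__":
    # Hypothetical zone export and resource inventory.
    records = {
        "app.example.com": "old-app.azurewebsites.net.",
        "api.example.com": "live-api.azurewebsites.net",
    }
    inventory = {"live-api.azurewebsites.net"}
    print(find_dangling_records(records, inventory))  # ['app.example.com']
```

Any name flagged this way points at a target that an attacker could potentially re-register, which is the trust inheritance risk the report describes.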