🤖 Together AI ships serverless inference, Sakana AI & NVIDIA speed sparse models
Audio in Mandarin Chinese · English transcript below
⚡ Serverless models at dirt-cheap prices, sparse LLMs 20% faster, AI toy lawyer fumble, caption glasses as a real-time cheat sheet, SIGGRAPH films flaunt generative skills: the rush-hour subway edition.
Today's Top 3 Headlines
- AI Industry News
🤖 Together AI serverless: DeepSeek-V4-Pro 512k ctx, $2.1/M in, $8.4/M out
Together AI's docs show that its serverless inference now hosts 20+ LLMs, including DeepSeek-V4-Pro at $2.1/M input tokens, $4.4/M output tokens, and $0.2/M cached input tokens. Devs can run LLMs with zero reserved compute and no minimum spend, which suits prototyping and low-traffic apps.
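As a rough illustration of what per-million-token pricing means for a bill, here is a minimal sketch of the arithmetic. The function name `estimate_cost_usd` and its parameters are hypothetical, not part of any Together AI SDK. The input and cached-input defaults are the figures quoted above; the output price is passed in explicitly because the headline lists $8.4/M while the summary lists $4.4/M.

```python
def estimate_cost_usd(
    fresh_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    *,
    output_price_per_m: float,
    input_price_per_m: float = 2.1,   # quoted input price, USD per 1M tokens
    cached_price_per_m: float = 0.2,  # quoted cached-input price, USD per 1M tokens
) -> float:
    """Estimate the USD cost of a workload billed per million tokens."""
    tokens_per_million = 1_000_000
    total = (
        fresh_input_tokens * input_price_per_m
        + cached_input_tokens * cached_price_per_m
        + output_tokens * output_price_per_m
    )
    return total / tokens_per_million


# Example: 800k fresh input, 200k cached input, 100k output tokens,
# using the $4.4/M output figure from the summary.
cost = estimate_cost_usd(800_000, 200_000, 100_000, output_price_per_m=4.4)
print(f"${cost:.2f}")  # prints "$2.16"
```

With no minimum spend, a prototype that sends only a few thousand tokens a day would cost a fraction of a cent under the same formula.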
Source ↗
- AI Technology
🤖 1B-LLM 20% faster: Sakana AI×NVIDIA debut TwELL format
Sakana AI & NVIDIA unveil the TwELL sparse format plus custom CUDA kernels, speeding up LLM inference by more than 20% at 1B scale while reducing peak RAM and energy use. Devs can run bigger models or higher concurrency on the same silicon, which points to greener compute for edge deployment.
Source ↗
- AI Industry News
🤖 Overconfident Agent? Intent-Chaos Test Exposes Flaws Pre-Launch
VentureBeat: a 2026 enterprise rollout was halted 4h after an observability agent mis-flagged a batch job and triggered a rollback. The failure is attributed to LLM non-determinism plus multi-agent “poison inputs.” Intent-driven chaos testing shifts the metric from “task done” to “intent preserved.”
Source ↗
+9 more headlines
- 🤖 Palantir CEO: AI is product & target; biz may be replaced by LLM
- 🤖 Dyson 360 Vis Nav drops $919 → $279.99
- 🤖 1,500 firms race into AI kids’ toys; low-cost rush sparks content chaos and calls for regulation
- 🤖 WIRED crowns Even Realities G2 2026’s best real-time caption glasses
- 🤖 SIGGRAPH 2026 drops first slate: Mandalorian, Avatar 3, Toy Story 5 VFX wizardry
- 🤖 Redis creator ships ds4; DeepSeek V4 inference rockets on Apple Silicon
- 🤖 19 hottest EVs at 2026 Beijing Auto Show: China leads with electric+AI
- 🤖 9 spec-driven AI tools for 2026: AWS Kiro, BMAD, GSD lead
- 🤖 Palantir £600M UK win: history grad lobbyist Moseley
