Another Hemingway-bench prompt asks for an oral presentation about time management. GPT-5.2 writes like a LinkedIn engagement farm: "When people hear 'working from home,' they often think it means more freedom, more comfort, and maybe even more free time. And sometimes that's true. But what doesn't get talked about enough is how easily work-from-home life can get messy if you don't manage your time well." (🥱) Opus 4.6 feels like a charismatic creative working the room: "So... raise your hand if you've ever 'worked from home' and somehow ended up four hours into a Netflix series at 2 PM on a Tuesday. No judgment. We've all been there."
Overall: GPT-5.2 feels like a mass market writer; Opus has personality and soul. See the updated leaderboard here! https://t.co/LNah0H0gBy
* for context I'm fine-tuning VLMs at the moment, doing a LoRA rank sweep ablation. https://t.co/t4UzsvritN
New research on improving self-reflection in language agents. A core problem with agent self-reflection is that models tend to generate repetitive reflections that add noise instead of signal, hurting overall reasoning performance. It introduces ParamMem, a parametric memory module that encodes cross-sample reflection patterns directly into model parameters, then uses temperature-controlled sampling to generate diverse reflections at inference time. ParamMem shows consistent improvements over SOTA baselines across code generation, mathematical reasoning, and multi-hop QA. It also enables weak-to-strong transfer and self-improvement without needing a stronger external model, making it a practical upgrade for agentic pipelines. Paper: https://t.co/16Yp56j8Jm Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
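The paper's core move is encoding reflection patterns into parameters, which can't be shown in a few lines; but the inference-time half, temperature-controlled sampling for diverse reflections, can be sketched. A minimal illustration, assuming scored candidate reflections as input (the function and its signature are my invention, not the paper's API):

```python
import math
import random

def temperature_sample(candidates, scores, temperature, k, seed=0):
    """Draw k distinct reflections from scored candidates.

    Low temperature concentrates picks on the top-scored reflection;
    high temperature flattens the distribution toward uniform,
    trading per-sample quality for the diversity the paper is after.
    """
    rng = random.Random(seed)
    pool = list(zip(candidates, scores))
    picked = []
    for _ in range(min(k, len(pool))):
        # Softmax over remaining candidates at the given temperature.
        weights = [math.exp(s / temperature) for _, s in pool]
        r = rng.random() * sum(weights)
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                picked.append(pool.pop(i)[0])
                break
        else:  # float round-off fallback: take the last candidate
            picked.append(pool.pop()[0])
    return picked
```

Sampling without replacement is what keeps the k reflections from collapsing into the repetitive noise the paper identifies as the failure mode.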

Theory of Mind in Multi-agent LLM Systems. A good read for anyone building systems where agents need to model each other's beliefs to coordinate effectively. This work introduces a multi-agent architecture combining Theory of Mind, Belief-Desire-Intention models, and symbolic solvers for logical verification, then evaluates how these cognitive mechanisms affect collaborative decision-making across multiple LLMs. The results reveal a complex interdependency where cognitive mechanisms like ToM don't automatically improve coordination. Their effectiveness depends heavily on underlying LLM capabilities. Knowing when and how to add these mechanisms is key to building reliable multi-agent systems. Paper: https://t.co/8ASbUgzGjF Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Banger CLI tool released by Google. CLI for Google Workspace + a bunch of useful Agent Skills to go with it. We had a few unofficial ones floating around, so it's nice to finally see an official one. Testing it already. https://t.co/jDWw45P4oA
Don't overload your AGENTS.md files. Keep them brief. GitHub's analysis of 2,500+ repos found what makes AGENTS.md files work: - provide the agent a specific job or persona - exact commands to run - well-defined boundaries to follow - clear examples of good outputs https://t.co/GFSEjLsx8C
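The four findings above fit on half a page. A hypothetical example for an imaginary TypeScript repo (every command, path, and rule here is made up for illustration, not taken from GitHub's analysis):

```markdown
# AGENTS.md

You are a senior TypeScript reviewer for this repo. Keep changes minimal.

## Commands
- Build: `npm run build`
- Test: `npm test -- --runInBand`
- Lint: `npm run lint`

## Boundaries
- Never edit files under `vendor/` or `migrations/`.
- Don't bump dependency versions without being asked.

## Good output example
A diff touching one module plus its test, with a one-paragraph summary.
```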
Introducing Faster Qwen3TTS! Realistic voice generation at 4x real time: - Same amazing voice quality from Qwen's model - Streaming support with <200 ms to first audio - 5x faster than the official implementation Just pip install faster-qwen3-tts Try the demo! https://t.co/Dcf9jNXz8g
🚨 Anthropic just open-sourced the exact Skills library their own engineers use internally. Stop building Claude workflows from scratch. These are plug-and-play components that work across Claude Code, API, SDK, and VS Code: copy once, deploy everywhere. What's inside: - Excel + PowerPoint generation out of the box - File handling and document workflows - MCP-ready subagent building blocks - Pre-built patterns for multi-step automation - Production templates you'd normally spend weeks writing The old way: re-explain your workflow every single chat. The new way: build a Skill once, Claude never forgets how you work. 100% Open Source. Official Anthropic release. Repo: https://t.co/XNx3i4yNy6

In the next version of Claude Code, we're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these Skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.
New research from Georgia Tech and Microsoft Research. GUI agents today are reactive. Every step costs an LLM call, which is why a lot of GUI agents are expensive, slow, and fragile. This new research introduces ActionEngine, a framework that shifts GUI agents from reactive execution to programmatic planning. A Crawling Agent explores the application offline and builds a state-machine graph of the interface. Nodes are page states, edges are actions. Then at runtime, an Execution Agent uses this graph to synthesize a complete Python program in a single LLM call. Instead of O(N) vision model calls per task, you get O(1) planning cost. On Reddit tasks from WebArena, ActionEngine achieves 95% task success with, on average, a single LLM call, compared to 66% for the strongest vision-only baseline. Cost drops by 11.8x. Latency drops by 2x. If the pre-planned script fails at runtime, a vision-based fallback repairs the action and updates the memory graph for future runs. Why does it matter? Treating GUI interaction as graph traversal rather than step-by-step probabilistic reasoning is a compelling direction for making agents both faster and more reliable. Paper: https://t.co/UR0PjvFf0c Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
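The offline graph plus one-shot planning idea can be made concrete with a toy. A sketch, assuming a miniature state graph a Crawling Agent might have built for a hypothetical site (the states, actions, and `plan_actions` function are all invented for illustration):

```python
from collections import deque

# Toy memory graph: nodes are page states, edges are (action, next_state)
# pairs discovered during offline crawling.
GRAPH = {
    "home":      [("click_login", "login"), ("click_search", "search")],
    "login":     [("submit_creds", "dashboard")],
    "search":    [("type_query", "results")],
    "results":   [("open_post", "post")],
    "dashboard": [],
    "post":      [],
}

def plan_actions(graph, start, goal):
    """BFS over the offline state graph.

    The whole action sequence comes out of one pass over stored
    structure -- the O(1)-planning-call idea -- instead of one
    vision-model call per UI step.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # goal unreachable: fall back to vision-based repair
```

In the real system the Execution Agent synthesizes a full Python program rather than a raw action list, and the vision fallback patches the graph when a pre-planned edge has gone stale.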

How can graphs improve coding agents? Multi-agent systems can boost code generation, but fixed interaction topologies don't adapt to task difficulty. This research introduces AgentConductor, a system where an orchestrator agent uses RL to dynamically generate task-adapted interaction topologies based on inferred agent roles and difficulty levels. Two pieces make this work: a topological density function that captures communication-aware characterizations of multi-agent interactions, and difficulty interval partitioning that prevents excessive pruning and provides precise topology control. Across five code datasets, AgentConductor achieves up to 14.6% improvement in pass@1 accuracy while reducing density by 13% and token costs by 68%. The great benefit of this approach is better performance at lower cost: dynamic agent coordination is more efficient than static workflows for complex code generation. Paper: https://t.co/BypJZfU49q Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
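To ground the two terms of art: a rough sketch of what "density" and "difficulty interval partitioning" could mean here, with the caveat that the paper's density function is communication-aware and learned, not this simple edge count, and the intervals and topology names below are hypothetical:

```python
def topology_density(n_agents, edges):
    """Fraction of possible directed agent-to-agent channels in use.

    This is the plain graph-theoretic density; the quantity
    AgentConductor reports reducing by 13% is a richer,
    communication-aware version of the same idea.
    """
    return len(edges) / (n_agents * (n_agents - 1))

def choose_topology(difficulty):
    """Hypothetical difficulty-interval partitioning: each interval of
    estimated task difficulty maps to a fixed topology family, so the
    orchestrator can't over-prune easy tasks or starve hard ones."""
    if difficulty < 0.33:
        return "chain"  # sparse pipeline, minimal communication
    if difficulty < 0.66:
        return "star"   # orchestrator-centric fan-out
    return "full"       # all-to-all debate for the hardest tasks
```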

The economics of model training are such that the labs need to release their big models widely; they cannot generate returns from always holding back their best models so one customer uses them. Fine-tuning & specialized SLMs are useful, but they don't expand the capability frontier
@FrankieIsLost This diagram by @trychroma shows how accuracy crashes past ~5K tokens, dropping below 50/50. Let that sink in: you might need ~50 attempts to get the same result (if it exists). If not, you could be heading toward 100 tries with zero chance of success. https://t.co/qG2vWoAQBo https://t.co/OuSMrnUL3q

@jdegoes LLMs are basically zip files for patterns in data. There will always be code snippets outside what they've compressed. People imagine full coverage of programming languages, but that's not realistic. Coverage is patchy and drops fast once you leave popular languages or common paths. The solution? RAG. But even then it may fail if the ecosystem, docs, or training signal just isn't there. #UATFallacy #CoverageGap #KeyholePrinciple
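The RAG fallback the post suggests reduces to: retrieve the snippets the model never compressed, then put them in the prompt. A toy sketch with a keyword-overlap retriever standing in for real embedding search (all docs and names here are made up):

```python
def retrieve(query, docs, k=1):
    """Rank docs by word overlap with the query -- a crude stand-in
    for embedding similarity in a real RAG pipeline."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def augmented_prompt(query, docs, k=2):
    """Prepend the retrieved context so the model can answer from it
    instead of from (possibly absent) compressed training patterns."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The post's caveat still bites: if the ecosystem has no docs to index, there is nothing for `retrieve` to return, and the coverage gap stays open.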
Introducing WorkBuddy, Tencent's AI-native desktop agent for multi-type tasks. Handle non-technical tasks effortlessly using built-in skill templates for coding, documentation, research, data analysis, and automation. No project setup required. One minute to connect with IM tools like WeCom (WeChat for Work). Plan. Execute. Review. Deliver.
OpenClaw 2026.3.8 - ACP provenance: your agent finally knows who's talking to it - openclaw backup: because YOLO deploys need a safety net - Telegram dupes killed - 12+ security fixes We fixed more things than we broke. Progress. https://t.co/ahq26lABw3
Created close reading notebooks for almost every lesson of @jeremyphoward's fastai deep learning course (it's more than a course). Close reading is a technique for reading out of text, not into it. Use an LLM, and you're in flow state for longer: you ask right there, with all context. https://t.co/Wr1sWs40Tl
In the API, use image detail: original to unlock our biggest vision and CUA gains! https://t.co/WE1cRKzHtN
Still no Claude Cowork competitor from any other lab yet. On one hand, it's been six weeks. On the other, it's been six weeks for companies that say all their code is being written for them by AI.
Building deep research agents is pretty fun with @pydantic AI, doing a workshop this weekend with @hugobowne :) https://t.co/y4qVNSjOC4