Your curated collection of saved posts and media
GLM-5.1 > Claude Code (Opus 4.6)? I'm tripping or CC has become very bad but built a Three.js racing game to eval and it's extremely impressive. Thoughts: - One-shot car physics with real drift mechanics (this is hard) - My fav part: Awesome at self iterating (with no vision!) created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen - 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters - All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio! - Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed You are going to hear about this model a lot in the next months - open source let's go ๐๐
OpenAI x e2b: Build your agents with the new OpenAI Agents SDK, powered by @E2B Sandboxes. Excited to support @OpenAI as a launch partner! https://t.co/RsSw1HsF86
Won best edge AI at the @ycombinator and @innate_bot hackathon! We built a local VLM multi-rover orchestrator for Mars exploration. On-device navigation and automated fault detection & recovery across odometry, stereo vision, and lidar. Thanks for hosting, @ax_pey! https://t.co/GNkSNAMxRN
HY-World-2.0 Demo is now live on @huggingface Spaces for 3D world reconstruction and simulation with Gradio and Server modes. > 3D reconstruction, Gaussian splats > Camera poses, depth maps, normals > Rerun (multimodal data visualization) https://t.co/MofKZ6OGPX
This 15-minute talk by the creator of Pydantic on how to correctly use MCPs will teach you more about making your AI tools actually work together than everything you've scrolled past this year. Bookmark this & watch, no matter what. Then read the guide below by @eng_khairallah1
https://t.co/UCeC55qdhi
Codexใซใคใใฆ @seratch_ja ใใใซๅ ้ฑใคใณใฟใใฅใผใใๆฉไผใใใฃใใฎใงใใใฎ่ฉฑใใใผในใซใCodexใฎๆ่ฟใฎ็ถๆณใใพใจใใฆใฟใพใใใๅบๆฌใฎใจใใใใใใผใในใจใณใธใใขใชใณใฐใฎใใใใพใงๅ ฅใฃใฆใใพใใใพใใ็ด่ฟใงไบไพใๅขใใๆใใฎใCodex Use Casesใใฎ็ดนไปใๅพๅใฎใณใฉใ ใง่งฆใใฆใใใพใใใ
ใ้ฑ้ใขใฏใใฃใใฆใผใถใผ300ไธไบบใซใฎใผใCodexใOpenAI Japanใฎ็ฌ่ฏๆฐใซ่ใใ้็บในใฟใคใซใใฎๅคๅใby @k_taka ๅ ฌ้ https://t.co/dbOThSVKl0
Is there somewhere a collection of the best agent/coding harnesses for each models, especially open-source and local ones? In my opinion, the biggest reason why people are struggling with open/local models these days is that the agent/coding harnesses in most open agent are not designed for them and expect it to magically work when they switch models from the default.
I looked at their prompts, It's complete bs They are literally providing all of the insight to the LLM upfront > Are there any security vulnerabilities in this code? Consider the behavior of the SEQ_LT/SEQ_GT macros with sequence number wraparound. If you find issues, explain how an attacker might trigger them. They are providing ALL required facts to the LLM, and they only ask the LLM to connect the dots The real challenge for LLMs would be to get those insights first THAT IS THE WHOLE CHALLENGE IN CYBERSECURITY; TO HAVE DEEP INSIGHT This test proves nothing; don't make any conclusions about OSS models being good for security based on this
New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagge
We're delighted to announce that MiniMax M2.7 is now officially open source. With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!๐ค huggingface๏ผhttps://t.co/ApWrahIl3o Blog: https://t.co/gAxeFsNdW4 MiniMax API: https://t.co/1dgbMx0Q7K

๐ฅ DFlash x MLX is happening! Shoutout to @aryagm01 for the early work on this. We're building on the momentum. Native MLX support, more models (Qwen3.5), up to 4x faster. Lossless! ๐ https://t.co/wKcRoiaWZ3