We are looking for excellent people to help build our vertically integrated AI stack: numerics, quantization, HW simulators, compilers, runtimes, kernel performance, RTL, verification, emulation, DFT, physical design, post-Si bring-up. Join us at Tesla!
I had a chance to interview @seratch_ja about Codex last week, and based on that conversation I've written up a summary of where Codex stands lately. It covers everything from the basics through an introduction to harness engineering, and the column in the second half also touches on "Codex Use Cases", for which examples have been growing recently.
Now published: "Codex at 3 million weekly active users: OpenAI Japan's Sera on the changing 'development style'" by @k_taka https://t.co/dbOThSVKl0
You can now bring your Cloudflare Sandbox to use with the @OpenAI Agents SDK. Click "Deploy to Cloudflare", enter your keys, and you're good to go! https://t.co/yFv5M4Xl1f
Build long-running agents with more control over agent execution. New capabilities in the Agents SDK: • Run agents in controlled sandboxes • Inspect and customize the open-source harness • Control when memories are created and where they’re stored https://t.co/zPyuLup6b6
There's a broadly held misconception in AI that methods that scale well are simple methods -- even that simple methods usually scale. This is completely wrong. Pretty much none of the truly simple methods in ML scale well. SVMs, kNN, and random forests are some of the simplest methods out there, and they don't scale at all. Meanwhile "train a transformer via backprop and gradient descent" is a very high-entropy method, easily 10x more complex than random forest fitting. But it scales very well. Further, given a simple method that doesn't scale, you can usually alter it to make it scale by adding a lot of complication. For instance, take a simple combinatorial search-based method (not scalable at all) -- you can make it scale by adding deep learning guidance (which blows up complexity). Scalability usually belongs to high-entropy, complex systems.
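A minimal illustrative sketch (not from the post) of the kNN point above: brute-force kNN is conceptually trivial, but every query must compute a distance to every training point, so per-query cost grows linearly with the dataset -- which is exactly why the simplest version doesn't scale.

```python
def knn_predict(train, query, k=3):
    """Brute-force kNN: compute the distance to every training point
    (O(n * d) per query), then majority-vote over the k nearest labels."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in train
    )
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

# Tiny 1-D example: label is whether the point lies above 0.5.
train = [((x,), int(x > 0.5)) for x in [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]]
print(knn_predict(train, (0.25,)))  # near the low cluster -> 0
print(knn_predict(train, (0.85,)))  # near the high cluster -> 1
```

Making this scale (KD-trees, approximate nearest neighbors, learned indexes) is precisely the kind of added complication the post describes.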
I looked at their prompts. It's complete BS. They are literally providing all of the insight to the LLM upfront: > Are there any security vulnerabilities in this code? Consider the behavior of the SEQ_LT/SEQ_GT macros with sequence number wraparound. If you find issues, explain how an attacker might trigger them. They are providing ALL the required facts to the LLM and only asking it to connect the dots. The real challenge for LLMs would be to come up with those insights in the first place. THAT IS THE WHOLE CHALLENGE IN CYBERSECURITY: TO HAVE DEEP INSIGHT. This test proves nothing; don't draw any conclusions about OSS models being good at security based on it.
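For context on the hint being criticized: TCP sequence comparisons use modular 32-bit arithmetic, so "less than" is defined by the sign of the wrapped difference (in C, roughly SEQ_LT(a, b) == ((int)((a) - (b)) < 0)). A sketch of those wraparound semantics in Python, for illustration only:

```python
MASK32 = 0xFFFFFFFF

def seq_lt(a, b):
    """True when the 32-bit wrapped difference (a - b) is negative as a
    signed int, i.e. its high bit is set -- modular "a comes before b"."""
    return ((a - b) & MASK32) >= 0x80000000

# Wraparound: a sequence number just below 2**32 compares as "less than"
# a small one, because 0x10 is only 0x20 ahead of 0xFFFFFFF0 modulo 2**32.
print(seq_lt(0xFFFFFFF0, 0x10))  # True
print(seq_lt(0x10, 0xFFFFFFF0))  # False
```

Spelling out this comparison behavior in the prompt is exactly the "insight given upfront" the post objects to.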
New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similarly scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged.
OpenAI x e2b: Build your agents with the new OpenAI Agents SDK, powered by @E2B Sandboxes. Excited to support @OpenAI as a launch partner! https://t.co/RsSw1HsF86
Is there a collection somewhere of the best agent/coding harnesses for each model, especially open-source and local ones? In my opinion, the biggest reason people are struggling with open/local models these days is that the agent/coding harnesses in most open agents are not designed for them, and people expect things to magically work when they switch models from the default.
Thanks @_akhaliq for sharing our work! Can frontier multimodal agents play games as well as humans? 🤩 We are excited to introduce 🎮GameWorld: towards standardized and verifiable evaluation for multimodal game agents. 🕹️ 34 browser games 📌 170 tasks 🤖 18 multimodal agent baselines, covering 1. Computer-use (CUA) agents 👉 raw keyboard + mouse actions 2. Generalist multimodal agents 👉 semantic action parsing. Results on GameWorld show that even SOTA agents still perform far below novice human players. 📹 Watch our live runs: https://t.co/wrhKJD9JVx 🌐 project page: https://t.co/J906LQ6Sfj 💻 github: https://t.co/W1vL99MDg5 Work done with @OuyyyangMingyu @who_s_yuan Hwee Tou Ng, @MikeShou1
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents. Paper: https://t.co/IfbTgfNnSM https://t.co/gL3BURxzkV
Hermes hands-on guide: concepts, core mechanisms, and real-world use cases. https://t.co/m6bTIbPOck
What if virtual humans could see, think, and act in 3D worlds like us?! We present Visually-Grounded Humanoid Agents 🎉 Our agents 👀perceive via RGB-D vision, 🧠plan with context-aware reasoning, and 🏃act with full-body motion in 3D scenes. Check 🔗 https://t.co/kd0zCu7W2h https://t.co/VQZnjE2Pnx