Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
SkalskiP (@skalskip92) · 📅 Dec 06, 2024 · 513d ago · 🆔82251340 · ⭐0.86

PaliGemma2 for image to JSON data extraction
- used google/paligemma2-3b-pt-336 checkpoint; I tried to make it happen with 224, but 336 performed a lot better
- trained on A100 with 40GB VRAM
- trained with LoRA
colab with complete fine-tuning code: https://t.co/M1lbYXQUg6 https://t.co/DHNHGePaqM
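The LoRA choice in the post is what makes fine-tuning fit in 40GB of VRAM. A minimal arithmetic sketch of the idea (not the actual training script, and the layer dimensions below are hypothetical, not PaliGemma2's real sizes): LoRA freezes the full weight matrix W and learns only a low-rank update B @ A, so the trainable-parameter count collapses.

```python
# LoRA replaces a full d_out x d_in weight update with two small matrices:
# B (d_out x r) and A (r x d_in), scaled by alpha / r. Only B and A train.

def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the whole matrix."""
    return d_out * d_in

def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank pair B and A."""
    return d_out * rank + rank * d_in

# Hypothetical layer size and a typical small rank.
d_out, d_in, rank = 2048, 2048, 8
full = full_trainable_params(d_out, d_in)        # 4,194,304
lora = lora_trainable_params(d_out, d_in, rank)  # 32,768
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

At rank 8 the adapter trains well under 1% of the layer's parameters, which is why a single A100 suffices where full fine-tuning would not.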

Media 1 · Media 2
❤️ 749 likes · 🔁 95 retweets · 🖼️ Media
Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) · 📅 Dec 06, 2024 · 513d ago · 🆔75993431 · ⭐1.00

NVILA: Efficient Frontier Visual Language Models abs: https://t.co/4lk7WHWwYr NVIDIA introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Model arch focuses on scaling up spatial and temporal resolutions, and then compressing visual tokens, allowing for efficient processing of high resolutions. Also uses "DeltaLoss" data pruning and FP8 training. Competitive with proprietary VLMs on visual understanding benchmarks.

Media 1
❤️ 360 likes · 🔁 87 retweets · 🖼️ Media · 🔗 Links
elvis (@omarsar0) · 📅 Dec 05, 2024 · 514d ago · 🆔80153396

OpenAI o1 System Card https://t.co/M1kTwoDV6h

Media 1
❤️ 253 likes · 🔁 31 retweets · 🖼️ Media
elvis (@omarsar0) · 📅 Dec 05, 2024 · 514d ago · 🆔61718424

Nice set of tips for mitigating AI hallucinations. https://t.co/k0oS0cXlF3

Media 1
❤️ 596 likes · 🔁 108 retweets · 🖼️ Media
AK (@_akhaliq) · 📅 Dec 05, 2024 · 514d ago · 🆔45326765

Google presents PaliGemma 2: A Family of Versatile VLMs for Transfer https://t.co/9cdEqFxYFc

Media 1
❤️ 356 likes · 🔁 56 retweets · 🖼️ Media
Ethan Mollick (@emollick) · 📅 Dec 04, 2024 · 515d ago · 🆔94576092 · ⭐0.95

Wait, how did I miss this? https://t.co/UTvXUdBBLX

@g_leech_ •

New lifeform identified in the human gut. Function and symbiosis unknown. https://t.co/AQZL3joKuI

Media 1
❤️ 201 likes · 🔁 22 retweets · 🖼️ Media
elvis (@omarsar0) · 📅 Dec 04, 2024 · 515d ago · 🆔30727465 · ⭐1.00

🦆 Docling reaches 12.3K ⭐️ There are now a few parsers for LLMs out there but this is one of the most popular. Supports PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown. Other features include advanced PDF understanding, integrations, OCR for scanned PDFs, and even a CLI. The big feature request I am seeing is support for other types of information like code and math equations. That's coming soon!

Media 1
❤️ 239 likes · 🔁 29 retweets · 🖼️ Media
Varun Sharma (@varunconfirms) · 📅 Dec 04, 2024 · 515d ago · 🆔64213509

Your company is alive because of its customers. You have a job because of your customers. Customers matter. The Voice of Customer matters. Enter @enterpret_ai https://t.co/fgW7DGwZgR

Media 1
❤️ 101 likes · 🔁 18 retweets · 🖼️ Media
Chip Huyen (@chipro) · 📅 Dec 04, 2024 · 515d ago · 🆔11065035 · ⭐0.91

It's done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1,200 reference links. My editor just told me the manuscript has been sent to the printers.
- The ebook will be coming out later this week.
- Paperback copies should be available in a few weeks (hopefully before the end of the year). Preorder: https://t.co/kZVAEDQcMo
- The full manuscript is also accessible on the O'Reilly platform: https://t.co/P7GkBTKH7H
This wouldn't have been possible without the help of so many people who reviewed the early drafts, answered my thousands of questions, introduced me to fascinating use cases, or helped me see the beauty of overlooked techniques. Thank you everyone for making this happen!

Media 1
❤️ 5,791 likes · 🔁 604 retweets · 🖼️ Media
Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) · 📅 Dec 04, 2024 · 515d ago · 🆔50808049 · ⭐0.81

"Itโ€™s clear to me that if you zoom out way into the future, and you look back and ask what Appleโ€™s biggest contribution was, it will be in the health area." "But we have research going on." (regarding AI for health) - Tim Cook Great to see that Apple is all in on healthtech! https://t.co/h0dQlUpBsg

Media 1
❤️ 12 likes · 🔁 1 retweet · 🖼️ Media
Vik Paruchuri (@VikParuchuri) · 📅 Dec 04, 2024 · 515d ago · 🆔03852598 · ⭐0.76

I'm excited to launch the Datalab API! This builds on my last year of work with marker and surya (30k GitHub stars).
- PDF -> markdown, OCR, layout analysis, table recognition
- 15s for a 250-page PDF -> markdown
- 99.99% uptime
https://t.co/sa7mHmdiZd

Media 1
❤️ 714 likes · 🔁 66 retweets · 🖼️ Media
LlamaIndex 🦙 (@llama_index) · 📅 Dec 04, 2024 · 515d ago · 🆔95327365

Implement super-fast RAG using LlamaIndex Workflows and Groq 🚀 Learn how to build a powerful Retrieval-Augmented Generation system with our Workflows feature, including a comparison to alternatives like LangGraph:
➡️ Create an event-driven architecture for complex AI applications
➡️ Integrate Groq's high-performance LLMs for reranking and response synthesis
➡️ Visualize your workflow for better understanding and debugging
Step-by-step guide covers:
• Data indexing and retrieval
• LLM-based reranking
• Response synthesis using CompactAndRefine
Read the full tutorial: https://t.co/XFSmYMxJld
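The retrieve → rerank → synthesize stages the tutorial covers can be sketched without any library. A minimal, library-free illustration (the real post uses LlamaIndex Workflows and Groq-hosted LLMs; here each stage is a plain function and the "LLM" steps are stubbed with word-overlap scoring):

```python
# Stage 1: retrieval — score documents by word overlap with the query, keep top-k.
def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# Stage 2: reranking — in the tutorial an LLM reranks; here we reuse overlap
# scoring on the smaller candidate pool as a stand-in.
def rerank(query: str, candidates: list[str], k: int = 1) -> list[str]:
    return retrieve(query, candidates, k)

# Stage 3: synthesis — stand-in for LLM response synthesis (CompactAndRefine
# in the post), which would condense the context into an answer.
def synthesize(query: str, context: list[str]) -> str:
    return f"Q: {query}\nA (from {len(context)} chunk(s)): {context[0]}"

docs = [
    "Groq serves LLMs with very low latency.",
    "Workflows are an event-driven way to compose AI pipelines.",
    "RAG augments generation with retrieved context.",
]
print(synthesize("what is RAG", rerank("what is RAG", retrieve("what is RAG", docs))))
```

The event-driven part of Workflows is about wiring these stages together so each one fires when the previous one emits its result; the data flow itself is exactly this chain.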

Media 1
❤️ 34 likes · 🔁 8 retweets · 🖼️ Media
AshutoshShrivastava (@ai_for_success) · 📅 Dec 04, 2024 · 515d ago · 🆔33903628 · ⭐0.86

Genie 2: We now have Prompt-to-Game. Google DeepMind introduced Genie 2, a foundation world model capable of generating an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. Based on a single prompt image. 9 examples from blog 👇

โค๏ธ680
likes
๐Ÿ”110
retweets
๐Ÿ–ผ๏ธ Media
Ivan Leo (@ivanleomk) · 📅 Dec 03, 2024 · 516d ago · 🆔89946900

lmfao look at what @cursor_ai suggests when u type Grok ahahahah https://t.co/H6lYgJiOGc

Media 1
❤️ 3 likes · 🖼️ Media
elvis (@omarsar0) · 📅 Dec 04, 2024 · 515d ago · 🆔77152619 · ⭐1.00

DataLab: A Unified Platform for LLM-Powered Business Intelligence Introduces DataLab, a unified BI platform that integrates an LLM-based agent framework with an augmented computational notebook interface. DataLab achieves state-of-the-art performance on various BI tasks across popular research benchmarks. It also achieves up to a 58.58% increase in accuracy and a 61.65% reduction in token cost on enterprise-specific BI tasks.

Media 1
❤️ 509 likes · 🔁 109 retweets · 🖼️ Media
LlamaIndex 🦙 (@llama_index) · 📅 Dec 05, 2024 · 515d ago · 🆔93362508

Over on LinkedIn, Hanane D continues her excellent series of posts on data analysis with a new explanation of how to perform financial analysis with the `aisuite` library and LlamaParse. Check it out! https://t.co/CW5DgSKm25 https://t.co/L1JiYrQ2PX

Media 1
❤️ 36 likes · 🔁 15 retweets · 🖼️ Media
Sawyer Merritt (@SawyerMerritt) · 📅 Dec 05, 2024 · 515d ago · 🆔69399143

Elon Musk's xAI plans to expand its Colossus Supercomputer in Memphis to house 1 million+ GPUs, the Greater Memphis Chamber said today. Colossus was already the largest Supercomputer in the world with 100k GPUs. Now Elon is about to spend tens of billions to make it 10x bigger 😳 https://t.co/6Domx9NfTu

Media 1
❤️ 7,923 likes · 🔁 895 retweets · 🖼️ Media
jason liu (@jxnlco) · 📅 Dec 03, 2024 · 516d ago · 🆔25297839

talking to one of the creators of openai's swarm about pydantic ai https://t.co/sZyGxgsiwX

Media 1
❤️ 38 likes · 🔁 3 retweets · 🖼️ Media
davidad 🎇 (@davidad) · 📅 Thu Dec 05 · 🆔45611447 · ⭐0.61

@repligate ALERT. ALERT. CODE (D) https://t.co/0QVOPavdJ6

Media 1
❤️ 24 likes · 🔁 2 retweets · 🖼️ Media
MHcommunicate #Mastodon👉@mhcommunicate@social (@MHcommunicate) · 📅 Dec 03, 2024 · 516d ago · 🆔45758763

To protect and build market share, tech and service providers must ask when & how to integrate emerging tech. Stay competitive by adding GenAI to the product roadmap: https://t.co/HRRCpIgnmJ by @Gartner_inc 👈 #GartnerHT #ProductManagement #Tech #GenAI @bimedotcom @Khulood_Almani https://t.co/2TxGfbTD1J

Media 1
❤️ 40 likes · 🔁 24 retweets · 🖼️ Media
elvis (@omarsar0) · 📅 Dec 04, 2024 · 515d ago · 🆔70943163 · ⭐1.00

Composition of Experts Proposes Composition of Experts, an efficient modular compound AI system leveraging multiple expert LLMs. A router helps select the right expert for a given input, enabling efficient resource utilization and improving performance. While the general idea is not new, the two-step routing mechanism is interesting. I also like the focus on architecture flexibility and significantly reducing the cost of building compound AI systems.
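The two-step routing mechanism singled out above is easy to see in miniature. A hedged sketch (the experts here are stub functions and the keyword router is a stand-in for the paper's learned classifier; all names are made up for illustration):

```python
# Step 1: a router assigns the input to a coarse category.
# In the paper this is a learned classifier; here, a keyword stand-in.
def categorize(prompt: str) -> str:
    p = prompt.lower()
    if any(w in p for w in ("code", "function", "bug")):
        return "coding"
    if any(w in p for w in ("sum", "integral", "prove")):
        return "math"
    return "general"

# Step 2: the category maps to one expert LLM; only that model runs,
# which is where the resource-efficiency of the compound system comes from.
EXPERTS = {
    "coding": lambda p: f"[code-expert] {p}",
    "math": lambda p: f"[math-expert] {p}",
    "general": lambda p: f"[general-expert] {p}",
}

def compose(prompt: str) -> str:
    return EXPERTS[categorize(prompt)](prompt)

print(compose("write a function to reverse a list"))  # dispatched to the coding expert
```

Swapping an expert in or out only touches the `EXPERTS` table, which mirrors the architecture flexibility the post highlights.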

Media 1
❤️ 112 likes · 🔁 16 retweets · 🖼️ Media
Ethan Mollick (@emollick) · 📅 Dec 03, 2024 · 516d ago · 🆔20696607

Come on Amazon... https://t.co/uBDf1OTbnE

Media 1 · Media 2
❤️ 59 likes · 🔁 4 retweets · 🖼️ Media
LlamaIndex 🦙 (@llama_index) · 📅 Dec 03, 2024 · 516d ago · 🆔19188331 · ⭐1.00

Learn how to build an intelligent legal document navigation system using multi-graph, multi-agent recursive retrieval! 🧠📄 This article demonstrates:
🔍 Creating document hierarchy and definition graphs
🤖 Implementing a multi-agent workflow for smart traversal
📊 Leveraging https://t.co/wJqEPNOeu6, https://t.co/vJSQKOXBZO and LlamaIndex
Key features:
• Recursive retrieval of clauses and footnotes
• Intelligent navigation through document hierarchy
• Integration of legal definitions for context
Read more: https://t.co/HzcnkZ7H4Y
Or check out the repo: https://t.co/ebmXJSAuw6

Media 1
❤️ 190 likes · 🔁 47 retweets · 🖼️ Media
Andrej Karpathy (@karpathy) · 📅 Dec 03, 2024 · 516d ago · 🆔35380613

The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All you Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days. Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just the input, and softmax becomes elementwise, deleting the normalization). TLDR: Attention is awesome and a *major* unlock in neural network architecture design. It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention ... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc.
And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends). Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy of attending back and forth over some data sequentially. Lastly, the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau!
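The "data-dependent weighted average" reading of attention in the thread above can be written out in a few lines. A framework-free sketch of scaled dot-product attention on toy 2-d vectors, with a hand-rolled softmax so the pooling step is explicit:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """One query attends over all keys; the output is a weighted average of values."""
    d = len(query)
    # Scaled dot-product scores between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # The weights are data-dependent: change the query and they change too.
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attend([1.0, 0.0], keys, values)  # the query matches key 0, so the
print(out)                              # output leans toward the first value
```

This is the reduction/communication operation the post describes: information from all nodes is aggregated into one vector, with the mixing proportions computed from the data itself.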

Media 1
❤️ 6,598 likes · 🔁 1,002 retweets · 🖼️ Media
SkalskiP (@skalskip92) · 📅 Dec 03, 2024 · 516d ago · 🆔38388308

I missed VLMs. I worked on other stuff for the past few months, but I'm back! I'm fine-tuning Florence2 to extract data from documents in JSON format. https://t.co/9Tco0F1HsL

Media 1
❤️ 667 likes · 🔁 43 retweets · 🖼️ Media
Ethan Mollick (@emollick) · 📅 Dec 03, 2024 · 516d ago · 🆔79216618 · ⭐1.00

And then there were six or so Based on the stats, it looks like Amazon's Nova Pro is a competitive frontier model. This rounds out the GPT-4/Gen1 models: GPT-4o, Gemini 1.5, Claude 3.5, Grok 2, Llama 3.2 & maybe the three non-US models: Qwen, Yi & Mistral. Gen2 models up next? https://t.co/8s0CVL45Fy

Media 1
❤️ 215 likes · 🔁 44 retweets · 🖼️ Media
Ethan Mollick (@emollick) · 📅 Dec 03, 2024 · 516d ago · 🆔62706864

This used to be a very hard problem; I have seen a number of startups try (and fail) to solve it over the years. Now it is suddenly pretty trivial with open tools (though I am not sure why Maria got a new haircut) https://t.co/FR6TAcfi5l

Media 1 · Media 2
❤️ 160 likes · 🔁 13 retweets · 🖼️ Media
Artificial Analysis (@ArtificialAnlys) · 📅 Dec 03, 2024 · 516d ago · 🆔18030814

Amazon has launched Nova, a highly competitive family of foundation models. Nova Pro, Lite and Micro set new standards for the intelligence that can be accessed at the price and speed these models are offered at.
Nova Pro, the flagship model, ranks amongst the leading frontier models in the Artificial Analysis Quality Index. With a score of 75, Pro ranks higher than GPT-4o (November release), Mistral Large 2 and Llama 3.1 405B. Access is priced competitively at $0.8/1M input tokens and $3.2/1M output tokens, ~1/3 the cost of GPT-4o ($2.5/$10).
Nova Lite and Micro are smaller and faster models that offer competitive intelligence for their price class. Micro can be accessed at 157 output tokens/s, faster than Gemini 1.5 Flash, Llama 3.1 8B (median of providers) and GPT-4o mini. Lite and Micro are competitively priced at $0.06 and $0.1 per 1M tokens respectively (blended 3:1, input:output), positioning them well for speed- and/or price-sensitive use cases.
See below for deep dives on the performance and capabilities of these models.
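The "~1/3 the cost of GPT-4o" claim checks out arithmetically if you blend the quoted per-1M-token prices at the same 3:1 input:output ratio the post uses for Lite and Micro:

```python
# Blended $/1M tokens at a given input:output token mix.
def blended(input_price: float, output_price: float, ratio=(3, 1)) -> float:
    i, o = ratio
    return (i * input_price + o * output_price) / (i + o)

nova_pro = blended(0.8, 3.2)   # (3*0.8 + 3.2) / 4  ≈ $1.40/1M
gpt4o = blended(2.5, 10.0)     # (3*2.5 + 10) / 4  ≈ $4.38/1M
print(f"Nova Pro blended: ${nova_pro:.2f}/1M, GPT-4o blended: ${gpt4o:.2f}/1M")
print(f"ratio: {nova_pro / gpt4o:.2f}")  # ≈ 0.32, i.e. roughly 1/3 of GPT-4o
```

The 3:1 blend is an assumption carried over from the post's own convention; a workload with longer outputs would shift the blended figures toward the output prices.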

Media 1
❤️ 454 likes · 🔁 89 retweets · 🖼️ Media
Lior⚡ (@LiorOnAI) · 📅 Dec 02, 2024 · 517d ago · 🆔86133244

@hume_ai just released a new voice modulation tool that lets you create unique AI voices in seconds. You can even use sliders to adjust voices along 10 dimensions including:
- Relaxedness: from tense to relaxed.
- Masculine/Feminine: from masculine to feminine.
- Buoyancy: from deflated to buoyant.
If you're not sure how to describe the exact qualities of a voice, the voice sliders let you experiment and tweak until you land on the perfect fit.

โค๏ธ50
likes
๐Ÿ”12
retweets
๐Ÿ–ผ๏ธ Media
abhishek (@abhi1thakur) · 📅 Dec 04, 2024 · 515d ago · 🆔44821140

https://t.co/iNLCwp0a60

Media 1
❤️ 69 likes · 🔁 5 retweets · 🖼️ Media
Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) · 📅 Dec 04, 2024 · 515d ago · 🆔32385348 · ⭐0.86

Leading computer vision researchers Lucas Beyer (@giffmana), Alexander Kolesnikov (@__kolesnikov__), Xiaohua Zhai (@XiaohuaZhai) have left Google DeepMind to join OpenAI! They were behind recent SOTA vision approaches and open-source models like ViT, SigLIP, PaliGemma https://t.co/KmQVTBIitB

Media 1
❤️ 543 likes · 🔁 48 retweets · 🖼️ Media
Mervin Praison (@MervinPraison) · 📅 Dec 03, 2024 · 516d ago · 🆔42389342

Pydantic AI Agents: NEW Multi-Agent Framework
📚 Step-by-step Hands-on Tutorial
🔍 How to Install PydanticAI?
🛠️ What Models are Supported? Used @GroqInc
🎯 How to Create First Agent?
⚡ How to Add Custom Tools?
🔌 Built-in validation
🎮 How it differs from CrewAI or AutoGen?
@pydantic @samuel_colvin @jxnlco

โค๏ธ421
likes
๐Ÿ”69
retweets
๐Ÿ–ผ๏ธ Media