#aibenchmarks

Will Berard 🫳🎤🫶<p>They're only doing what Big Pharma does with clinical trials.</p><p>Meta, Amazon and Google accused of 'distorting' key AI rankings | New Scientist<br><a href="https://archive.ph/2025.05.01-152914/https://www.newscientist.com/article/2478521-meta-amazon-and-google-accused-of-distorting-key-ai-rankings/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">archive.ph/2025.05.01-152914/h</span><span class="invisible">ttps://www.newscientist.com/article/2478521-meta-amazon-and-google-accused-of-distorting-key-ai-rankings/</span></a></p><p><a href="https://mastodon.acm.org/tags/Meta" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Meta</span></a> <a href="https://mastodon.acm.org/tags/Amazon" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Amazon</span></a> <a href="https://mastodon.acm.org/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> <a href="https://mastodon.acm.org/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.acm.org/tags/AIBenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIBenchmarks</span></a></p>
LavX News<p>ChatGPT 4.1 vs. Google Gemini: A Benchmark Showdown in AI Performance</p><p>As OpenAI rolls out ChatGPT 4.1, initial benchmarks reveal that while it shows significant improvements over its predecessor, it still lags behind Google's Gemini models. This analysis dives into the ...</p><p><a href="https://news.lavx.hu/article/chatgpt-4-1-vs-google-gemini-a-benchmark-showdown-in-ai-performance" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="ellipsis">news.lavx.hu/article/chatgpt-4</span><span class="invisible">-1-vs-google-gemini-a-benchmark-showdown-in-ai-performance</span></a></p><p><a href="https://mastodon.cloud/tags/news" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>news</span></a> <a href="https://mastodon.cloud/tags/tech" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tech</span></a> <a href="https://mastodon.cloud/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ChatGPT</span></a> <a href="https://mastodon.cloud/tags/AIbenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIbenchmarks</span></a> <a href="https://mastodon.cloud/tags/GoogleGemini" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GoogleGemini</span></a></p>
UK<p><a href="https://www.europesays.com/uk/8874/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">europesays.com/uk/8874/</span><span class="invisible"></span></a> The rise of AI ‘reasoning’ models is making benchmarking more expensive <a href="https://pubeurope.com/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://pubeurope.com/tags/AiBenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AiBenchmarks</span></a> <a href="https://pubeurope.com/tags/AIReasoningModels" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIReasoningModels</span></a> <a href="https://pubeurope.com/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://pubeurope.com/tags/Technology" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Technology</span></a> <a href="https://pubeurope.com/tags/UK" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UK</span></a> <a href="https://pubeurope.com/tags/UnitedKingdom" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UnitedKingdom</span></a></p>
B166IR<p><a href="https://youtu.be/J4qwuCXyAcU?si=An7qhh6BdrqeLHv-" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">youtu.be/J4qwuCXyAcU?si=An7qhh</span><span class="invisible">6BdrqeLHv-</span></a></p><p>This video compares Ollama and LM Studio (GGUF) and shows that their performance is quite similar; LM Studio’s tok/sec output is used for consistent benchmarking.</p><p>What’s even more impressive? The Mac Studio M3 Ultra draws under 200 W during inference with the Q4 671B R1 model. That’s quite amazing for such performance!</p><p><a href="https://k2pk.com/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://k2pk.com/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://k2pk.com/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MachineLearning</span></a> <a href="https://k2pk.com/tags/Ollama" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Ollama</span></a> <a href="https://k2pk.com/tags/LMStudio" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LMStudio</span></a> <a href="https://k2pk.com/tags/GGUF" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GGUF</span></a> <a href="https://k2pk.com/tags/MLX" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MLX</span></a> <a href="https://k2pk.com/tags/TechReview" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TechReview</span></a> <a href="https://k2pk.com/tags/Benchmarking" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Benchmarking</span></a> <a href="https://k2pk.com/tags/MacStudio" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>MacStudio</span></a> <a href="https://k2pk.com/tags/M3Ultra" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>M3Ultra</span></a> <a href="https://k2pk.com/tags/LocalLLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LocalLLM</span></a> <a href="https://k2pk.com/tags/AIbenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIbenchmarks</span></a> <a href="https://k2pk.com/tags/EnergyEfficient" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EnergyEfficient</span></a> <a href="https://k2pk.com/tags/linux" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>linux</span></a></p>
IT News<p>CMU research shows compression alone may unlock AI puzzle-solving abilities - A pair of Carnegie Mellon University researchers recently discovered hints... - <a href="https://arstechnica.com/ai/2025/03/compression-conjures-apparent-intelligence-in-new-puzzle-solving-ai-approach/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/03/com</span><span class="invisible">pression-conjures-apparent-intelligence-in-new-puzzle-solving-ai-approach/</span></a> <a href="https://schleuss.online/tags/fran%C3%A7oischollet" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>françoischollet</span></a> <a href="https://schleuss.online/tags/machinelearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machinelearning</span></a> <a href="https://schleuss.online/tags/aicompression" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aicompression</span></a> <a href="https://schleuss.online/tags/aibenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aibenchmarks</span></a> <a href="https://schleuss.online/tags/compression" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>compression</span></a> <a href="https://schleuss.online/tags/arc" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>arc</span></a>-agi <a href="https://schleuss.online/tags/biz" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>biz</span></a>⁢ <a href="https://schleuss.online/tags/tech" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tech</span></a> <a href="https://schleuss.online/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a></p>
Miguel Afonso Caetano<p>"The current landscape of AI agent evaluation faces several critical challenges. Benchmark evaluations tend to focus on accuracy while ignoring costs, leading to uninformative evaluations for downstream developers. What does it mean if an agent has 1% higher accuracy on a benchmark but is 10x more expensive? The lack of standardized evaluation practices makes it difficult to assess real-world capabilities and prevents meaningful comparisons between different approaches. As shown in "AI Agents That Matter" (arXiv:2407.01502), these issues have led to confusion about which advances actually improve performance.</p><p>HAL addresses these challenges through two key components: 1) A central leaderboard platform that incorporates cost-controlled evaluations by default, providing clear insights into the cost-performance tradeoffs of different agents, and 2) A standardized evaluation harness that enables reproducible agent evaluations across various benchmarks while tracking token usage and traces and without requiring any changes to the agent code or constraining agent developers to follow a certain agent framework. Evaluations can be run locally or in the cloud and fully parallelized.</p><p>TLDR: We aim to standardize AI agent evaluations by providing a third-party platform for comparing agents across various benchmarks. Our goal with HAL is to serve as a one-stop shop for agent evaluations, taking into account both accuracy and cost by default. 
The accompanying HAL harness offers a simple and scalable way to run agent evals - locally or in the cloud."</p><p><a href="https://hal.cs.princeton.edu/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">hal.cs.princeton.edu/</span><span class="invisible"></span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://tldr.nettime.org/tags/AIAgents" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIAgents</span></a> <a href="https://tldr.nettime.org/tags/AIBenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIBenchmarks</span></a></p>
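<p>The question quoted above (what does 1% higher accuracy mean at 10x the cost?) can be illustrated with a small cost-accuracy Pareto filter. This is only a sketch under stated assumptions: the <code>AgentResult</code> type, the agent names, and all numbers are hypothetical, and nothing here reflects how the HAL leaderboard or harness is actually implemented.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    name: str        # hypothetical agent label
    accuracy: float  # fraction of benchmark tasks solved, 0..1
    cost_usd: float  # total cost of the evaluation run, in dollars


def pareto_frontier(results):
    """Keep agents that no other agent beats on both accuracy and cost.

    An agent is dominated when another agent is at least as accurate and
    at least as cheap, and strictly better on one of the two.
    """
    frontier = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    # Sort cheapest first so the tradeoff reads left to right.
    return sorted(frontier, key=lambda r: r.cost_usd)


# Made-up numbers mirroring the "1% more accurate, 10x more expensive" case.
results = [
    AgentResult("baseline", 0.60, 1.0),
    AgentResult("retry-5x", 0.61, 10.0),
    AgentResult("wasteful", 0.55, 4.0),
]
frontier = pareto_frontier(results)
for r in frontier:
    print(f"{r.name}: accuracy={r.accuracy:.2f}, cost=${r.cost_usd:.2f}")
```

<p>In this toy data, "wasteful" drops out because "baseline" is both cheaper and more accurate, while "baseline" and "retry-5x" both stay on the frontier. Choosing between those two is a judgment about whether one extra point of accuracy is worth ten times the spend, which an accuracy-only ranking would hide.</p>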
LavX News<p>The Paradox of AI: Can Models Truly Reason Like Humans?</p><p>As OpenAI prepares to unveil its o3 model, the AI community is left grappling with a perplexing question: can artificial intelligence genuinely achieve human-level reasoning? Despite impressive perfor...</p><p><a href="https://news.lavx.hu/article/the-paradox-of-ai-can-models-truly-reason-like-humans" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="ellipsis">news.lavx.hu/article/the-parad</span><span class="invisible">ox-of-ai-can-models-truly-reason-like-humans</span></a></p><p><a href="https://mastodon.cloud/tags/news" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>news</span></a> <a href="https://mastodon.cloud/tags/tech" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tech</span></a> <a href="https://mastodon.cloud/tags/AIbenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIbenchmarks</span></a> <a href="https://mastodon.cloud/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.cloud/tags/ArtificialGeneralIntelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ArtificialGeneralIntelligence</span></a></p>
IT News<p>Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts</p><p>On Sunday, word began to spread... - <a href="https://arstechnica.com/?p=2020588" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arstechnica.com/?p=2020588</span><span class="invisible"></span></a> <a href="https://schleuss.online/tags/machinelearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machinelearning</span></a> <a href="https://schleuss.online/tags/simonwillison" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>simonwillison</span></a> <a href="https://schleuss.online/tags/aibenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aibenchmarks</span></a> <a href="https://schleuss.online/tags/chatbotarena" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>chatbotarena</span></a> <a href="https://schleuss.online/tags/ethanmollick" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ethanmollick</span></a> <a href="https://schleuss.online/tags/gpt2" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt2</span></a>-chatbot <a href="https://schleuss.online/tags/samaltman" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>samaltman</span></a> <a href="https://schleuss.online/tags/aivibes" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aivibes</span></a> <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-3.5 <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-4.5 <a href="https://schleuss.online/tags/biz" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>biz</span></a>⁢ <a href="https://schleuss.online/tags/openai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>openai</span></a> <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-3 <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-4 <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-5 <a href="https://schleuss.online/tags/lmsys" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>lmsys</span></a> <a href="https://schleuss.online/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a></p>
IT News<p>“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time</p><p>On Tuesday, Anth... - <a href="https://arstechnica.com/?p=2012778" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arstechnica.com/?p=2012778</span><span class="invisible"></span></a> <a href="https://schleuss.online/tags/machinelearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machinelearning</span></a> <a href="https://schleuss.online/tags/aileaderboard" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aileaderboard</span></a> <a href="https://schleuss.online/tags/aibenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aibenchmarks</span></a> <a href="https://schleuss.online/tags/chatbotarena" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>chatbotarena</span></a> <a href="https://schleuss.online/tags/claude3haiku" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>claude3haiku</span></a> <a href="https://schleuss.online/tags/claude3opus" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>claude3opus</span></a> <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-4-turbo <a href="https://schleuss.online/tags/claudeopus" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>claudeopus</span></a> <a href="https://schleuss.online/tags/anthropic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>anthropic</span></a> <a href="https://schleuss.online/tags/claude3" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>claude3</span></a> <a href="https://schleuss.online/tags/gpt" 
class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-3.5 <a href="https://schleuss.online/tags/biz" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>biz</span></a>⁢ <a href="https://schleuss.online/tags/openai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>openai</span></a> <a href="https://schleuss.online/tags/gpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt</span></a>-4 <a href="https://schleuss.online/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a></p>
Tech news from Canada<p>Ars Technica: Google launches Gemini—a powerful AI model it says can surpass GPT-4 <a href="https://arstechnica.com/?p=1989030" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="">arstechnica.com/?p=1989030</span><span class="invisible"></span></a> <a href="https://mastodon.roitsystems.ca/tags/Tech" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Tech</span></a> <a href="https://mastodon.roitsystems.ca/tags/arstechnica" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>arstechnica</span></a> <a href="https://mastodon.roitsystems.ca/tags/IT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>IT</span></a> <a href="https://mastodon.roitsystems.ca/tags/Technology" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Technology</span></a> <a href="https://mastodon.roitsystems.ca/tags/largelanguagemodels" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>largelanguagemodels</span></a> <a href="https://mastodon.roitsystems.ca/tags/machinelearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machinelearning</span></a> <a href="https://mastodon.roitsystems.ca/tags/googledeepmind" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>googledeepmind</span></a> <a href="https://mastodon.roitsystems.ca/tags/AIbenchmarks" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIbenchmarks</span></a> <a href="https://mastodon.roitsystems.ca/tags/GoogleGemini" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GoogleGemini</span></a> <a href="https://mastodon.roitsystems.ca/tags/GoogleBard" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GoogleBard</span></a> <a 
href="https://mastodon.roitsystems.ca/tags/AIethics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIethics</span></a> <a href="https://mastodon.roitsystems.ca/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ChatGPT</span></a> <a href="https://mastodon.roitsystems.ca/tags/chatgtp" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>chatgtp</span></a> <a href="https://mastodon.roitsystems.ca/tags/Biz" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Biz</span></a>&amp;IT <a href="https://mastodon.roitsystems.ca/tags/gemini" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gemini</span></a> <a href="https://mastodon.roitsystems.ca/tags/google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>google</span></a> <a href="https://mastodon.roitsystems.ca/tags/GPT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GPT</span></a>-4 <a href="https://mastodon.roitsystems.ca/tags/PaLM2" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PaLM2</span></a> <a href="https://mastodon.roitsystems.ca/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a></p>