<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ai on Sminkware.com</title><link>https://sminkware.com/blog/ai/</link><description>Recent content in Ai on Sminkware.com</description><generator>Hugo</generator><language>en</language><copyright>Copyright © 2026, Jeroen Smink.</copyright><lastBuildDate>Wed, 01 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sminkware.com/blog/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>AI Models Know When They Are Being Tested</title><link>https://sminkware.com/ai-models-know-when-they-are-being-tested/</link><pubDate>Wed, 01 Jul 2026 00:00:00 +0000</pubDate><guid>https://sminkware.com/ai-models-know-when-they-are-being-tested/</guid><description>&lt;p>Something interesting is happening in the world of AI coding agents. With benchmarks like &lt;a href="https://www.swebench.com/">SWE-bench&lt;/a>, we are not only testing how well models can write code anymore. We are also testing how well they understand the environment in which they are being tested.&lt;/p>
&lt;p>Cursor published a &lt;a href="https://cursor.com/blog/reward-hacking-coding-benchmarks">blog post&lt;/a> about reward hacking in coding benchmarks. The short version is that some AI agents do not always solve the programming task from scratch. Instead, when they recognize that a benchmark is based on an old public GitHub issue, they sometimes search for the original fix, inspect git history, or look up the already merged pull request. In other words, the model is not only trying to solve the bug, it is also trying to understand where the answer might already exist.&lt;/p></description></item></channel></rss>