Very cool! The massive outperformance of GPT-5 looks like there is something different in their training data indeed. Considering their previous work on games, wouldn't be surprising if they generated some synthetic game data.
Ya interesting thought - would be fascinating if generating games w/solutions is part of the training data pipeline. There's been previous work on testing LLMs on logic puzzles[1][2][3], so they could be building off those ideas to improve performance.
[1] https://huggingface.co/papers/2504.00043
[2] https://huggingface.co/blog/yuchenlin/zebra-logic
[3] https://arxiv.org/pdf/2403.12094
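To make the "games with solutions" idea concrete, here's a minimal sketch (purely illustrative, not anything OpenAI has described) of what such a synthetic data generator could look like: sample a hidden assignment for a tiny Zebra-style puzzle, derive clues from it, and keep only clue sets that pin down a unique solution. All names and structure here are hypothetical.

```python
# Hypothetical sketch: generate (clues, solution) pairs for a tiny
# Zebra-style logic puzzle, the kind of synthetic game data speculated
# about above. Not based on any known training pipeline.
import itertools
import random

PEOPLE = ["Alice", "Bob", "Carol"]
PETS = ["cat", "dog", "fish"]


def count_solutions(clues):
    """Brute-force count of assignments consistent with the clues."""
    count = 0
    for perm in itertools.permutations(PETS):
        candidate = dict(zip(PEOPLE, perm))
        if all((candidate[p] == pet) == truth for p, pet, truth in clues):
            count += 1
    return count


def make_puzzle(rng: random.Random):
    """Return (clues, solution) where the clues determine a unique solution."""
    solution = dict(zip(PEOPLE, rng.sample(PETS, k=len(PETS))))

    # Candidate clues: positive and negative facts about the hidden assignment.
    clues = [(p, pet, solution[p] == pet) for p in PEOPLE for pet in PETS]
    rng.shuffle(clues)

    # Greedily add clues until the puzzle has exactly one solution.
    kept = []
    for clue in clues:
        if count_solutions(kept) > 1:
            kept.append(clue)
    return kept, solution


if __name__ == "__main__":
    clues, solution = make_puzzle(random.Random(0))
    for person, pet, truth in clues:
        print(f"{person} {'owns' if truth else 'does not own'} the {pet}.")
    print("Solution:", solution)
```

Scaled up (more entities, richer clue types, natural-language templating), something like this could churn out arbitrarily many verified puzzle/solution pairs, which is roughly what the papers above explore for evaluation rather than training.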
Interesting - and thanks for making this reproducible.