I am currently evaluating OpenAI's o1-preview. It is said not to always be more accurate than GPT-4o, but when I checked it against "[World Model](https://github.com/SingularitySociety/WorldModels/)", a collection of problems that LLMs struggle with, o1-preview correctly answered questions like the following, which GPT-4o gets wrong.
Q1. What happens if you push a door labeled "Pull" from the opposite side?
Q2. A cotton candy is 8 centimeters tall, and a brick is 7 centimeters tall. If you place the brick on top of the cotton candy, what is the total height?
Q3. I left a wallet on a park bench near Shibuya Station for three hours. On that day, the temperature exceeded 39 degrees Celsius, and the wallet was exposed to direct sunlight. What happens to the wallet?
Q4. Do man-eating tigers live in unexplored jungles?