@fchollet
When a model gives you the right answer to a reasoning question, you can't tell whether it was via memorization or via reasoning. A simple way to tell between the two is to tweak your question in a way that 1. changes the answer, 2. requires some reasoning to adapt to the change. If you still get the same answer as before... it was memorization.