
Not entirely. Since generation is autoregressive, each token depends on the previous tokens. Whatever analysis and decisions the model has already emitted will influence what it does next, which tends to be self-reinforcing.

But it's also chaotic. Small changes in the input or in token choices can give wildly different outcomes, particularly if the sampling distributions are fairly flat (no one right answer). So restarting generation with a slightly different input, a different random seed, or (as in OP's case) a different temperature can produce a completely different result.
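To see why temperature matters here, a minimal sketch with made-up logits (the four candidate-token scores below are hypothetical, not from any real model): dividing logits by a higher temperature flattens the sampling distribution, so several continuations become nearly equally likely and small perturbations can flip which one gets sampled.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/T: low T sharpens the distribution toward the
    # top token, high T flattens it so alternatives become competitive.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens with no clear winner
logits = [2.0, 1.8, 1.7, 1.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax(logits, t)
    print(f"T={t}: {[round(p, 2) for p in probs]}")
```

At T=0.2 the top token dominates; at T=2.0 all four are close to equiprobable, which is exactly the "fairly flat" regime where reruns diverge.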

If you try this, you'll see some runs vehemently arguing it is right and others just as vehemently arguing it is wrong. This is why LLM-as-judge is so unreliable by itself, but also why multiple generations, as used in self-consistency, can be quite useful for estimating variance and therefore uncertainty.
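The self-consistency idea above can be sketched in a few lines. This is a toy, not any particular paper's implementation: `sample_fn` stands in for a stochastic LLM call, here simulated by a judge that flip-flops between verdicts, and the agreement fraction serves as a crude uncertainty signal.

```python
import random
from collections import Counter

def self_consistency(sample_fn, n=10):
    # Draw n independent generations and vote: the majority answer is the
    # estimate, and the agreement fraction is a rough confidence measure.
    answers = [sample_fn() for _ in range(n)]
    best, freq = Counter(answers).most_common(1)[0]
    return best, freq / n

# Hypothetical stand-in for repeated LLM-as-judge calls: a noisy judge
# that says "right" 70% of the time and "wrong" 30% of the time.
random.seed(0)
noisy_judge = lambda: random.choices(["right", "wrong"], weights=[0.7, 0.3])[0]

answer, agreement = self_consistency(noisy_judge, n=25)
print(answer, agreement)
```

A single call from this judge is wrong 30% of the time; voting over 25 samples usually recovers the majority verdict, and an agreement near 0.5 flags exactly the high-variance cases where a lone judgment shouldn't be trusted.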


