
You're wrong. Mathematically. Here's why.

When you throw a coin 100 times, each sequence you get is equally likely. However. You can look at properties of the sequence which are more likely to be one way than the other. For instance, it's more likely that the number of heads and tails are about equal than not. The reason is that there are more sequences, in general, where that is true, than those where heads or tails strongly prevail.

With the right property, you can make statements such as: This sequence is statistically likelier to be human made than random.

One such property is, for instance, the number of changes from heads to tails or vice versa. In expectation, a random sequence switches between heads and tails on about 50% of consecutive flips. For humans, the expectation is much higher. Hence, if you compare two sequences where one has 51% changes and the other 63%, it is mathematically (/statistically) accurate to say that the latter is likelier to be human-made.
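The expected alternation rate is easy to check empirically. A minimal sketch, assuming a fair coin simulated with Python's `random` module:

```python
import random

def alternation_rate(seq):
    """Fraction of adjacent flips that differ (H->T or T->H)."""
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return changes / (len(seq) - 1)

random.seed(0)  # fixed seed for reproducibility
rates = [alternation_rate([random.choice("HT") for _ in range(100)])
         for _ in range(10_000)]
mean_rate = sum(rates) / len(rates)
print(mean_rate)  # close to 0.5 for a fair coin; human-made sequences tend higher
```

A sequence with a 63% alternation rate is an outlier under this model, which is what licenses the "likelier human-made" judgement.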



Your point doesn't refute the OP's argument. Your final statement, "the latter one is likelier human made", is not the same as "this sequence is not random". I think lots of people (especially programmers) who know the difference between true RNG output and people's expectations of RNG output might intentionally put in strings of the same number, or not include the full set, because we know that's what often happens with plain RNG. It isn't clear what the goal of the sequence is, hence the confusion in the comments.


Exactly, and their "good" dice roll sequence, 3 1 5 6 2 6 3 4 4 1 contained the full set which should only happen ~1/4 of the time for 10 rolls. It also contained no number more than twice, which should happen < 7% of the time. This looks to me like they purposely tried to make this sequence look like their idea of "random".
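Both percentages can be sanity-checked with a quick Monte Carlo simulation (a sketch; the exact values by inclusion-exclusion are about 27% for the full set and about 6.8% for no face appearing more than twice):

```python
import random
from collections import Counter

random.seed(1)  # fixed seed for reproducibility
N = 100_000
full_set = no_face_thrice = 0
for _ in range(N):
    counts = Counter(random.randint(1, 6) for _ in range(10))
    if len(counts) == 6:           # all six faces appeared
        full_set += 1
    if max(counts.values()) <= 2:  # no face more than twice
        no_face_thrice += 1
print(full_set / N)        # ~0.27
print(no_face_thrice / N)  # ~0.068
```

So a "good-looking" sequence that has both properties at once is itself fairly unusual for a genuinely random roll.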

I'm curious about how they scored this section because my overall age was reported to be 60+ with the sequence 2 1 5 2 6 2 2 4 6 6.


I also scored 60+ (actual age is in my 30s). I had similar thoughts and also did things like not use up all the numbers and repeat numbers more than twice exactly because I've looked at a lot of random number sequences in my life and I was trying to make it look like one of those.


> I've looked at a lot of random number sequences in my life and I was trying to make it look like one of those.

This is perhaps the difference between pseudo and statistically random. No idea which of those the study or the experiment is trying to validate btw.

And IIRC, interestingly, they write that the human capacity to create random numbers declines after 25. I can imagine that the older we are, the more we look for something to make our decisions look random based on what we've learned so far - more time means more random number sequences seen - and the less random the outcome will be.


> And IIRC, interestingly, they write that the human capacity to create random numbers declines after 25. I can imagine that the older we are, the more we look for something to make our decisions look random based on what we've learned so far - more time means more random number sequences seen - and the less random the outcome will be.

This is what they are testing, and at least based on the data they've got so far, it looks like it increases up to 25-ish and then stays pretty flat.

Another possibly interesting observation is that their preliminary data set (just eyeballing it) looks to have gotten

1) a flatter response

2) generally, less random responses

Which leads me to wonder if the live stats have been skewed more random as there might be some correlation between "interested in this sort of thing" and "has some idea what a random distribution ought to look like," and possibly this knowledge doesn't go away with age.


> This is what they are testing

(human capacity to create random numbers declines 25+)

Not really; what they're testing is what kinds of response differently-aged people give to their question. So it's important what the question actually is; and it's important if the question might, for example, seem to older people to be a waste of their time.

They're not measuring what they claim to be measuring.


My opinion is that "real" randomness looks less random than artificial randomness.

That's why Apple changed the iTunes shuffle to be less random: people complained that it didn't seem random enough because it replayed songs too close together.

https://www.cultofmac.com/181517/why-itunes-shuffling-order-...


> This is perhaps the difference between pseudo and statistically random.

Not quite. Pseudo-randomness is defined as being indistinguishable from a uniform distribution, meaning the next element in the sequence is no more predictable than a statistically random selection.


1/4 of the time and <7% of the time does not make it impossible. In fact, that's kinda the whole point of RNG.


I always pressed the same button, let's say 1 ten times. That gets you a rating below 60 - just in case you need the sequence to move towards your real age group when you redo the experiment.

Perhaps the ability to guess randomly also depends on the age of whoever clicks on a website? Perhaps someone should create an experiment that finds their experiment flawed ^^


What does that mean, "you can tell this sequence is not random"? If you show me a blue hat, and ask me if it's blue, I'll say yes, I can say it's blue. But there is always a chance it's not actually blue. It's very conceivable that I'm in a situation where I say with confidence that something is blue, but it isn't.

You always only ever speak in probability. Of course you can't say the sequence isn't random, because every sequence can be the result of a random process. "can you tell which is random" to me is equivalent as asking "is one of the sequences such that it is rational to choose it over the other as being random".

It's about rational decisions. Consider the frequentist view, where a probability p for an event A means that out of k trials, pk will show A (in expectation), and furthermore, as k -> infty, the proportion of trials that show A converges to p. If you want to choose the right sequence as being random as often as possible, it is rational to choose the one I described above, because it will, overall, be the one that is MORE OFTEN the random one compared to the other.


For a blue hat, it either is or isn't blue (and there's some rather strong evidence - whether or not it looks blue). Like, with a sequence of 6 digits, if you don't know whether the source was random or not, then that's like NOT showing me your hat, and asking me whether it's red or blue.

For a single sequence of six digits, it might or might not have come from a random source. You can't get any edge on that judgement by just inspecting the sequence. Only inspecting the source (the hat, if you like) can give you an advantage. Perhaps you're colour-blind, or the lighting is weird; so there's still uncertainty. But that's equivalent to examining the source of the digits, determining that it's really a random source, but making a mistake in your determination. That's uncertainty on a different level.

Red-pill blue-pill is a sort of meta-uncertainty.

> "is one of the sequences such that it is rational to choose it over the other as being random"

Most people don't care about this shit; it doesn't matter to them what random means, nor whether it's sequences or sources that can be said to be random. But for some people it does matter, and they have to try to use language precisely.

All [red|blue] hats are either red or blue. But no sequence is random or non-random; it's the source of the sequence (the process, if you like) that can be random or non-random.

If 111111 is emitted by a random process, then you can call that a "random sequence" if you like. If I emit 126692 from my ass (not a random process), that's not a "random sequence" in any sense, whatever statistical properties it has. You can't tell which is of random origin by inspection. The experimental subjects face an impossible challenge, and I can't see what conclusions you can draw from their responses.


Regarding "random process": (and sorry for commenting to myself)

I'm not taking a position on what a "random process" is; for these purposes, a PRNG, a LFSR or even the last three bits of the system-clock would do as well as radioactive decay.


"Statistically likelier," but still mathematically possible: that confirms the OP's point.

This seems to be really testing for pseudo-random numbers. Relevant Dilbert (and article): https://www.lancaster.ac.uk/~blackb/RNG.html.


I don't think this has anything to do with pseudo-random numbers :) PRNGs are not perfect, but their imperfections are impossible to detect by hand.


"he/she should not be able to tell" isn't the same as "he/she should not be able to make a statistically-probable guess".


By this logic the expression "being able to tell" should be banned from the English vocabulary, because no-one is able to tell anything with 100% certainty. Requiring 100% certainty as a precondition of using this expression is silly.


It depends on the framework. I can tell a geometric figure is a square because it’s a quadrilateral with right angles and sides of equal length. You could ask me a question like “Is a rectangle with a side of length 1 and a diagonal of root 2 a square?” and I can tell it is.

Ask me “Was 1 1 1 1 produced by a random process?” and it’s impossible to tell in the way I did with the square.


> It depends on the framework. I can tell a geometric figure is a square because it’s a quadrilateral with right angles and sides of equal length.

You're claiming to be able to craft a mathematical proof with 100% certainty. Although the thing you are proving appears to be obviously true (assuming a certain mathematical framework), the probability that you made a mistake is not 0%. You might falsely believe that the probability of making a mistake in a simple proof like this is 0%, but you would be wrong, and we have plenty of historical examples of mathematicians "proving" something and thinking that there is 0% chance of errors in the proof, only later being shown that they were incorrect.


You’re mixing up two layers of uncertainty. There’s an outer uncertainty. This would include things like I made a mistake, this is all a dream, etc. This outer uncertainty pervades all problems.

It’s often useful to ignore that outer uncertainty. We create a framework where we take certain things as true (shared reality, mathematical axioms). This framework may or may not have uncertainty inside of it, which we could call inner uncertainty.

Questions of probability have inner uncertainty. Questions of geometry do not. This makes them qualitatively different.

If you frame the initial task as something like “do your best to lead people to believe your sequence is random”, that makes sense. If the task is “make it so they can’t tell if it’s random”, that’s a bit off in some way. At the very least, it’s because you’ve presented the spotting of randomness as something that can truly be done to a logical conclusion (random/not or true/false). This violates both the outer and inner uncertainties of randomness.


Ask me "Was 19 19 19 19 19 19 19 produced by a random process" and I can say 'most likely not!'.

But then: https://www.dailymail.co.uk/news/article-2162190/What-odds-R...


Interestingly, the article computes the odds incorrectly: "... hit the same number on seven consecutive spins [...] the odds of which happening are 114billion to one...", which actually are the odds of having 7 consecutive 19s or the same (unspecified) number on 8 consecutive spins.
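The arithmetic is easy to check, assuming an American double-zero wheel with 38 pockets (which is what the quoted 114 billion figure implies):

```python
# American roulette: 38 pockets (1-36, 0, 00)
pockets = 38

# Odds of a *specified* number (e.g. 19) on 7 consecutive spins:
specified_7 = pockets ** 7    # 114,415,582,592 -> the quoted "114 billion to one"

# Odds of *some* (unspecified) number repeating across 7 consecutive spins:
# the first spin is free; only the next 6 must match it.
unspecified_7 = pockets ** 6  # ~3 billion to one

print(specified_7, unspecified_7)
```

So the article's figure is off by a factor of 38 for the event it actually describes.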


The process by which you came to hear about this particular spin of this particular roulette wheel was far from random.


> no-one is able to tell anything with 100% certainty

Including this very assertion? So it's possible that _someone_ could tell _something_ with 100% certainty?


Only a Sith deals in absolutes.

No seriously, humans can have 100% certainty about core things backed by an insane amount of empirical evidence, e.g. gravity being real. But it is accepted that for any knowledge not empirically verified ad nauseam, when we use universal quantifiers, we generally tolerate some credible kinds of exceptions, contextually.


absolutely, probably you just did.


If it's light when I wake up, I would say that I can tell it's daytime, despite the possibility that it's still nighttime but a sufficiently near star has gone supernova or that the house next door is on fire.


I sense the sarcasm but I'm not sure which way you're intending it to go. What if you live in the arctic circle?


I wasn't trying to be sarcastic, just to give evidence against the statement

> "he/she should not be able to tell" isn't the same as "he/she should not be able to make a statistically-probable guess".

I'm in agreement with the sibling comment by baobabKoodaa.


This is a good illustration of a binary epistemology vs a continuous one.

> It's always impossible to tell, for any given sequence, whether it was produced by a fair die.

Something like "you can't make any determination, because it's random". Whereas under the second worldview you can make statements about how likely things are, despite uncertainty.

For some reason the binary worldview seems to be incredibly common. My sibling commenter exhibits the same issue.


> you can make statements about how likely things are

Sure. And it's true that some sequences are more likely than others to have been emitted by a random process. [Edit] All sequences from a random process are equally likely. It's still true that some sequences are more likely to have come from non-random processes.

The point is that randomness isn't a property of the sequence; it's a property of the process.


Could be. Though if you think long enough, with this worldview you can't decide anything, ever. And it's irrational. You can't say for sure which is random, but you can say for sure on which you should bet your money if you have to.


The OP is essentially correct.

I was definitely confused and assumed it was about 'looking like' randomness.

But I did a lot of double clicking of things, because I felt that in 'real life' you're not going to get 1 roll of each number, but odd things happen.

But this is a bit moot - the people clicking 'all the same number' have obviously come to a different conclusion than the others - i.e. 'all sequences are equally likely, therefore this one is as good as any'.

So what the study is really 'testing' probably, is how people react to the question.

They really need to change the question substantially in order to get randomness.

I don't see any insightful aspect in the experiment or the debate.

It's pedantic -> some people read the question differently and do different things.


I disagree. You want people to click numbers s.t. if you asked them 10k times, a uniform distribution would emerge. But that is not what's happening. They think all numbers have the same probability, but if you click only 1, then the probability of your choice being random is low.

Edit:

Maybe this will convince you: You said each sequence of numbers is equally likely, hence, we can't tell. I'm going to disagree with that statement.

Let's say I give you a coin, and tell you: I've flipped this coin 10,000 times, and every single flip came up 1: 1111...111. Would you guess it's random, or biased? The probability of that exact sequence is as high as any other specific sequence's, but clearly, if you guessed it was random, you'd be a fool. This is exactly what is happening here, just on a smaller scale: 111111111 as a flip result has a lower probability of having come from a random process than 100101101110.

11111 has to happen at some point if the experiment is really random. The probability that it happens to YOU, however, is low. Thus, it is rational to decide that the sequence is not random, because in most cases it won't be.
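One way to make this rigorous is to compare likelihoods under two competing models. A sketch, where the 0.99-heads "biased" model is an arbitrary stand-in for "not a fair random process":

```python
def likelihood(seq, p_heads):
    """Probability of observing seq under an i.i.d. coin with P(H) = p_heads."""
    prob = 1.0
    for flip in seq:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

all_heads = "H" * 100
mixed = "HTTHHTHTTH" * 10  # 50 heads, 50 tails

# Likelihood ratio: biased model vs fair coin.
# It exceeds 1 only for the all-heads run.
for seq in (all_heads, mixed):
    ratio = likelihood(seq, 0.99) / likelihood(seq, 0.5)
    print(seq[:10], ratio > 1)
```

Both sequences are equally likely under the fair coin; what differs is how likely each is under the alternative, and that's the quantity a rational bet is based on.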


> Would you guess it's random, or biased?

Well, I'd guess that it's not a coin-flip at all; even a biased coin won't produce 10,000 heads and no tails, unless it's a two-headed coin.

Let's go back to the actual case in hand: suppose you provide me with "111111", and not 10,000 1s. I simply have no way at all of determining whether that is more or less likely to have come from a random source. So I would decline your bet. If it was 10,000 1s, then maybe I'd be a fool to bet it was of random origin; but you can't convince me that a string of 6 1s is or isn't of random origin. So no bet.

This is all irrelevant. We're discussing a single sequence of 6 digits. There are not enough samples to perform statistical analysis. Probability doesn't come into it.


> For instance, it's more likely that the number of heads and tails are about equal than not

This isn't even true. Heads and tails being equal over 100 flips is something you'll see something like 8.33% of the time (not based on probability, I just ran a simulation of 100 flips 10,000 times and got 833 instances of them being equal)

edit: I missed the key word, "about". Sure, they're more likely within maybe 5-6 of one another than not.


Yes :) The exact probability of a 50%/50% split is (100 nCr 50)/2^100, or about 0.08, as you found through experimentation.

The more often you run the experiment, the more likely you'll get a proportion close to 50%/50%, by the way. In the limit, the variance (i.e. the spread of results away from the expected value of 50%) goes to 0. This is called the law of large numbers. As the generic name suggests, it's pretty central to mathematics haha.
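The exact figure can be computed directly with `math.comb`. The same formula also shows an amusing wrinkle: the probability of an *exact* 50/50 split shrinks as n grows (roughly like sqrt(2/(pi*n))), even though the *proportion* of heads concentrates around 50%:

```python
from math import comb, pi, sqrt

for n in (100, 1000):
    p_exact = comb(n, n // 2) / 2 ** n
    approx = sqrt(2 / (pi * n))  # Stirling-based approximation
    print(n, p_exact, approx)
```

For n=100 this gives about 0.0796, matching the ~8% simulation result above.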



