
https://arxiv.org/abs/1705.08439

The original paper.

The references in the paper paint a much clearer picture of where exactly the idea behind reinforcement learning with optimal, suboptimal, or random oracles comes from. There are also mathematical proofs that these setups work.

I was quite shocked not to see references [6, 16] in any of the recent MCTS papers.

These references prove why the stuff works and show how well it works. But the whole field of imitation learning seems invisible to the deep RL papers. Don't have the faintest idea why.

The algorithm described is the ultimate generalized algorithm. If the expert policy is optimal, the algorithm learns in a completely supervised way. If the expert policy is suboptimal but the score (loss) is fully calculable, the learned policy will outperform the reference policy. If the expert policy is completely random, the algorithm behaves as reinforcement learning.

What the paper at the top adds is the ability to improve the expert policy with the learned one at the same time, and the math covered previously guarantees improvement.
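To make that loop concrete, here is a toy sketch of the idea on single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins). Everything in it is an assumption for illustration and not the paper's setup: the apprentice is a plain lookup table instead of a neural network, the expert is a one-step lookahead that lets the apprentice finish the game instead of MCTS, and the names `expert_move` and `expert_iteration` are made up. The point is only the shape of the loop: the expert searches using the current apprentice, the apprentice imitates the expert, and the next expert is stronger because its apprentice is.

```python
# Toy sketch of an expert-iteration-style loop on single-pile Nim.
# Assumptions (not from the paper): tabular apprentice, one-step lookahead
# expert, deterministic rollouts. All names here are hypothetical.

MAX_TAKE = 3  # a move removes 1..3 stones; taking the last stone wins


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def rollout_winner(stones, policy, player):
    """Finish the game with both players following `policy`.

    `player` (0 or 1) is the one to move. Returns who takes the last stone.
    """
    while True:
        stones -= policy[stones]
        if stones == 0:
            return player
        player = 1 - player


def expert_move(stones, policy):
    """Expert: try each move, then let the apprentice play out the rest."""
    for take in legal_moves(stones):
        if take == stones:
            return take  # takes the last stone, immediate win
        # After our move the opponent (player 1) is to move.
        if rollout_winner(stones - take, policy, player=1) == 0:
            return take
    return policy[stones]  # no winning move found, keep the apprentice's choice


def expert_iteration(n_stones, n_rounds):
    # Naive starting apprentice: always take one stone.
    policy = {s: 1 for s in range(1, n_stones + 1)}
    for _ in range(n_rounds):
        # Expert step: search from every state, guided by the current apprentice.
        targets = {s: expert_move(s, policy) for s in policy}
        # Imitation step: a tabular apprentice simply copies the expert's moves.
        policy = targets
    return policy


if __name__ == "__main__":
    for stones, take in expert_iteration(12, 12).items():
        print(stones, "->", take)
```

With 12 stones and 12 rounds the apprentice ends up taking `stones % 4` whenever that is non-zero, which is the known winning strategy for this game. In a real setting the lookup table would be replaced by a learned model and the lookahead by a proper search, so none of the convergence behaviour here should be read as a claim about the paper's results.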



And of course, the name is in reference to Daniel Kahneman's excellent research and book by the same title. One of the most influential pieces of literature I've had the pleasure of reading, everyone should read it.

https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow


> One of the most influential pieces of literature I've had the pleasure of reading, everyone should read it.

Seriously? I have had almost the opposite reaction to this, more in line with: https://jasoncollins.org/2016/06/29/re-reading-kahnemans-thi... (which was discussed here: https://news.ycombinator.com/item?id=12030791)


I liked that book very much when it came out, but I'm now in the leery camp. Replication and the general quality of studies are matters of concern, but even one of the core tenets of the book, namely system I vs system II, turns out to be a great oversimplification that can fool as much as it can help.

The thing is, Kahneman is likeable, he has a good reputation and books like these are pure candy for the audience who enjoy that kind of literature (myself included)--but that's also a good warning signal. Could it be too simple, too convenient and too satisfying to be true? How do you know when you fall in love with an idea/theory?


I'm more concerned with this: when you discount the studies that have been shown to be poor, does the overall system 1/system 2 model become weaker? I think it does, a bit, but he takes pains to point out that it is just a shorthand and not a real thing, so shouldn't we already be skeptical of taking it too far? How about the experiencing vs remembering selves? I think that holds up ok.

We're early in our understanding of human behavior. Many of our ideas are wrong. Everything is hard to test. But though the book has flaws and should be evaluated again (would love to see it rewritten), I still think there is usefulness in the models he suggested. And in behavioral economics in general. I think we're far away from having enough replicated studies to be really "sure" of anything, especially with this crisis, but moving in a reasonable direction.


Loosely related: I thought he had distanced himself from his older writings by now?


Good question. From: http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-s...

  But the question remains: what happens when (like in most cases) you don’t have a funnel plot?

  I don’t have a good positive answer. I do have several good negative answers.

  Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.

  Do not trust websites which are obviously biased (eg Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.

  Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence.


As an aside, please don't use indentation for quoting. I like your points, but they're hard to read:

"But the question remains: what happens when (like in most cases) you don’t have a funnel plot?

"I don’t have a good positive answer. I do have several good negative answers.

"Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.

"Do not trust websites which are obviously biased (eg Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.

"Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence."


I wonder whether this algorithm could be explained using Bayesian learning, in particular how it solves the trade-off between exploitation and exploration.



