Real probability distributions are usually very peaked, with vast wastelands of minuscule probability punctuated by sudden Everests. The Markov chain then converges to the nearest peak and stays there, leading to very biased probability estimates. It's as if the drunkard followed the scent of alcohol to the nearest tavern and stayed there all night, instead of wandering all around the city like we wanted him to.
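This getting-stuck behavior is easy to see in a simulation. The sketch below (illustrative, not from the book) runs Metropolis-Hastings sampling on a sharply bimodal distribution: two narrow peaks separated by a wasteland of near-zero probability. With small proposal steps, the chain almost never crosses the gap, so its estimate of the mean is badly biased toward whichever peak it started near.

```python
import random
import math

def density(x):
    # Two narrow, equal peaks at -10 and +10 (unnormalized).
    # The region between them has essentially zero probability.
    return math.exp(-(x + 10) ** 2 / 0.5) + math.exp(-(x - 10) ** 2 / 0.5)

def metropolis(start, steps, step_size, seed=0):
    """Plain Metropolis-Hastings with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(steps):
        proposal = x + rng.gauss(0, step_size)
        # Accept the move with probability min(1, p(proposal) / p(x)).
        if rng.random() < density(proposal) / density(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(start=-10.0, steps=50_000, step_size=0.5)
# The true mean of this symmetric distribution is 0, but the chain
# never escapes the peak where it began, so the estimate sits near -10.
print(sum(samples) / len(samples))
```

Running longer does not help much: the acceptance probability for any step into the wasteland is astronomically small, so the drunkard stays in his tavern.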
The Master Algorithm is likely to depend on multiple different disciplines, and maybe some "knots" have not been tackled yet. All of the tribes we've met so far have one thing in common: they learn an explicit model of the phenomenon under consideration, whether it's a set of rules, a multilayer perceptron, a genetic program, or a Bayesian network. When they don't have enough data to do that, they're stumped. But analogizers can learn from as little as one example because they never form a model. Let's see what they do instead.