"The inductive learning hypothesis. Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples"
"We shall see that most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Concept learning. Inferring a boolean-valued function from training examples of its input and output."
"As illustrated by these first two steps, positive training example may force the S boundary of the version space to become increasingly general. Negative training examples play the complimentary role of forcing the G boundary to become increasing specific."
"When gradient descent falls in a local minimum with respect to one of these weights, it will not necessarily be in a local minimum with respect to the other weights. In fact, the more weights in network, the more dimensions that might provide "escape routs" for gradient descent to fall away from the"
"The proof of this involoves showing that any function can e approximated by a inear combination of some samll region, and then showing that two layers of sigmoid units are sufficient to produce good local approximations."
"The only likely impact on the final error is that different error-minimization procedures may fall into different local minima. Bishop (1996) contains a general discussion of several parameter optimization methods for training networks A variety of methods have been proposed to dynamically grow or s"