Foundational Issues in Artificial Intelligence and Cognitive Science
Reading notes 13: Connectionism
The central point to be made, however, is that connectionism and PDP approaches are just as committed to, and limited by, encodingism as are compositional, or symbol manipulational, approaches (though in somewhat differing ways). The “representations,” the symbols, of a connectionist system are just as empty for the system as are those of any standard data structure (Christiansen & Chater, 1992; Van Gulick, 1982); a connectionist approach does not escape the necessity of a user semantics.

A fixed connectionist system is, therefore, specified by three sorts of information. The first is the graph of the nodes and directed connections among them and from the environment. The second is the set of weights on those connections. The third is the rules by which the nodes determine their resultant activations given their input activations. Both the graph and the weights, together with the relevant rules for setting node activations from inputs, determine a space of possible patterns of activation of the nodes, and a dynamics of the possible changes in activation patterns, forming paths or trajectories through the space of possible activation patterns. In general, this space will be a space of vectors of possible node activation levels, and the dynamics will determine trajectories of possible movement through paths of activation vectors.
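As an illustration only (not taken from the book), the three sorts of information can be written down for a tiny hypothetical network. The node count, the connection weights, and the logistic activation rule below are all assumptions chosen for the sketch; the point is just that graph, weights, and rule together fix a trajectory through the space of activation vectors.

```python
import math

# Hypothetical 3-node net; node 0 is clamped to the environmental input.
# 1) The graph: directed connections as (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 1)]

# 2) The weights on those connections (arbitrary illustrative values).
weights = {(0, 1): 0.8, (0, 2): -0.5, (1, 2): 0.6, (2, 1): 0.3}

# 3) The activation rule: logistic squashing of the summed weighted input.
def activation_rule(net_input):
    return 1.0 / (1.0 + math.exp(-net_input))

def step(state, env_input):
    """Map one activation vector to the next (synchronous update)."""
    current = [env_input] + list(state[1:])
    nxt = [env_input]
    for target in range(1, len(current)):
        net = sum(weights[(s, t)] * current[s] for (s, t) in edges if t == target)
        nxt.append(activation_rule(net))
    return nxt

def trajectory(initial_state, env_input, steps):
    """Return the path of activation vectors visited from initial_state."""
    states = [list(initial_state)]
    for _ in range(steps):
        states.append(step(states[-1], env_input))
    return states

path = trajectory([0.0, 0.0, 0.0], env_input=1.0, steps=50)
print(path[-1])  # the activation vector the dynamics settle toward
```

Nothing in this sketch makes the final activation vector mean anything to the network itself; reading it as a "representation" of the input is done by whoever inspects `path`.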

The excitement concerning connectionism and PDP derives from the fact that the weights of the system are themselves not fixed, and that it proves possible in some cases to adjust the weights according to well defined rules and error correction experiences so that particular desired differentiations of input patterns are obtained. Such adjustment of the weights is then interpreted as learning — learning of the desired differentiations, categorizations, of the input patterns.
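A minimal sketch of such error-correcting weight adjustment, using the classic perceptron rule as a stand-in (the book's main example is back-propagation, which is not shown here). The function names, learning rate, and the OR task are assumptions made for illustration. Note that the target values are supplied from outside by a tutor, which is exactly the dependence criticized later in these notes.

```python
def train_perceptron(samples, epochs=20, rate=0.1):
    """Adjust weights from tutor-supplied (inputs, target) pairs."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            net = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if net > 0 else 0
            # The error is defined by the tutor's target, not by the network.
            error = target - output
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

def classify(weights, bias, inputs):
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else 0

# Hypothetical task: differentiate inputs according to logical OR.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_samples)
print([classify(w, b, x) for x, _ in or_samples])
```

After training, the weights differentiate the two input categories, but the categories themselves were fixed in advance by the labels in `or_samples`.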


A related advantage of the connectionist approach is the sense in which the differentiating stable patterns of activation are intrinsically distributed over the entire set of (output) nodes. This distributivity of “representation” originates in the same aspect of connectionist net architecture as does the parallelism of computation mentioned above — each of the multiple activation nodes can in principle also be a node of parallel computation — but the distributivity yields its own distinct advantages. These are argued to include: 1) as with the parallelism, the distributed nature of the “representations” is at least reminiscent of the apparently distributed nature of the brain, and 2) such distributivity is held to provide “graceful degradation” in which damage to the network, or to its connection weights, yields only a gradual degradation of the differentiating abilities of the network, instead of the catastrophic failure that would be expected from the typical computer program with damaged code. This is reminiscent of, and argued to be for reasons similar to, the gradual degradation of hologram images when the holograms are physically damaged.
This phase space approach highlights several of the powerful characteristics of PDP models. The space of activation patterns of a PDP network is a space of the intrinsic dynamics of the system, not a space of (encoded) information that the system in some way makes use of. It is like an automaton in which the states form a smooth surface (differentiable manifold), the state transitions are continuous on that manifold, and the state transitions intrinsically move “downward” into local attraction basins in the overall manifold. The space, then, does not have to be searched in any of the usual senses — the system dynamics intrinsically move toward associated differentiating regions of stability.
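The idea of dynamics that move "downward" into attraction basins without any search can be sketched with a small Hopfield-style network. This is an assumed illustration, not a model discussed in the excerpt; the two stored patterns, the Hebbian weight rule, and the function names are all hypothetical choices.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, state):
    """Energy function that the asynchronous updates never increase."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def settle(w, state, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            net = sum(w[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if net >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two hypothetical stored patterns (orthogonal, so they do not interfere).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([p1, p2])

noisy = [-1] + p1[1:]          # p1 with its first unit flipped
settled = settle(w, noisy)
print(settled == p1, energy(w, noisy), energy(w, settled))
```

No list of candidate patterns is searched here: each unit simply follows its local update rule, and the state falls into the nearest stable region. That the stable region "stands for" `p1` is, again, something only the observer knows.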
Viewing connectionist systems in terms of modeling their (weight space adjustable) intrinsic dynamics, instead of in terms of the classical programmed informational manipulations and usages, is an additional perspective on both their distributed and their parallel nature. Because the activation space is the space of the possibilities and possible dynamics of the entire system, and because nothing restricts those dynamics to any simply isolable subspaces (such as would be equivalent to changing just one symbol in a data structure), then any properties of that space, such as input-differentiating dynamically-attracting regions of stability, will necessarily be properties distributed over the whole system and dynamically parallel with respect to all “parts” of the system.


Most fundamentally, the primary advantages of PDP systems are simultaneously the source of their primary weaknesses. On one hand, the emergent nature of connectionist differentiations transcends the combinatoric restrictions of standard symbol manipulation approaches. Any model that is restricted to combinations of any sort of atom, whether they be presumed representational atoms or any other kind, intrinsically cannot model the emergence of those atoms: combinations have to make use of already available atoms (Bickhard, 1991b). The classical approach, then, cannot capture the emergence of input pattern differentiators, while PDP approaches can.
...the combinatoric atomism of the standard symbol manipulation approach allows precisely what its usual name implies: the manipulation of separable “symbols.” This creates the possibility of (combinatoric) generativity, componentiality, high context and condition specificity of further system actions and constructions, a differentiation of representational functions from general system activity, a “lifting” of representational issues out of the basic flow of system activity into specialized subsystems, and so on. All of these can be of vital importance, if not necessary, to the modeling of various sorts of cognitive activity, and all of them are beyond the capabilities of connectionist approaches as understood today, or at least can be approximated only with inefficient and inflexible kludges. A restriction to combinatorics dooms a model to be unable to address the problem of the emergence of representations, but an inability to do combinatorics dooms a system to minimal representational processing.
...connectionist systems do have a distinct disadvantage with respect to the systemic constructions and manipulations of their “representations.” Put simply, symbol manipulation approaches have no way to get new “representations” (atoms), while connectionist approaches have no way of doing much with the “representations” that they can create. Of course, in neither case is there any possibility of real representations, for the systems themselves.

Problem of backpropagation and supervised learning

...the learning rules and corresponding necessary tutoring experiences that have been explored so far tend to be highly artificial and inefficient (e.g., back-propagation, Rumelhart & McClelland, 1986; McClelland & Rumelhart, 1986; Rich & Knight, 1991). It is not clear that they will suffice for many practical problems, and it is clear that they are not similar to the way the brain functions. It is also clear that they cannot be applied in a naturalistic learning environment — one without a deliberate tutor (or built-in equivalent; see the discussion of learning in passive systems above). (There are “learning” algorithms that do not involve tutors — instead the system simply “relaxes” into one of several possible appropriate weight vectors — but these require that all the necessary organization and specification of the appropriate possible weight vectors be built in from the beginning; this is even further from a general learning procedure.)

A limitation that will not usually be relevant for practical considerations, but is deeply relevant for ultimate programmatic aspirations, is that the network topology of a PDP system is fixed. From a practical design perspective, this is simply what would be expected. From a scientific perspective, however, concerning the purported evolutionary, the embryological, and, most mysteriously, the developmental and learning origins of such differentiators, this fixedness of the network topology is at best a severe incompleteness. There is no homunculus that could serve as a network designer in any of these constructive domains (see, however, Quartz, 1993).

The authors further argue that connectionist approaches are indeed a form of encodingism and are incapable of creating genuine representations.

The encodingist commitments of PDP approaches follow readily from their characterization above: PDP systems can generate novel systems of input pattern differentiators, but to take these differentiating activation patterns as representations of the differentiated input patterns is to take them as encodings.
Note that these encodings do not look like the “symbols” of the standard “symbol” manipulation approach, but they are encodings nevertheless in the fundamental sense that they are taken to be representations by virtue of their “known” correspondences with what is taken to be represented. Standard “symbols” are encodings, but not all encodings are standard “symbols” (Touretzky & Pomerleau, 1994; Vera & Simon, 1994). Both model representation as atemporal correspondences — however much they might change over time and be manipulated over time — with what is represented: the presumed representationality of the correspondences is not dependent on temporal properties or temporal extension (Shanon, 1993).
The PDP system has no epistemic relationship whatsoever with the categories of input patterns that its stable conditions can be seen to differentiate — seen by the user or designer, not by the system. PDP systems do not typically interact with their differentiated environments (Cliff, 1991), and they perforce have no goals with respect to those environments. Their environmental differentiations, therefore, cannot serve any further selection functions within the system, and there would be no criteria of correctness or incorrectness even if there were some such further selections.
A major and somewhat ironic consequence of the fact that PDP systems are not interactive and do not have goals is that these deficiencies make it impossible for connectionist networks to make good on the promise of constituting real learning systems — systems that learn from the environment, not just from an omniscient teacher with artificial access to an ad hoc weight manipulation procedure. The basic point is that, without output and goals, there is no way for the system to functionally — internally — recognize error, and, without error, there is no way to appropriately invoke any learning procedure. In a purely passive network, any inputs that might, from an observer perspective, be considered to be error-signals will be just more inputs in the general flow of inputs to the network — will just be more of the pattern(s) to be “recognized.” Even for back-propagation to work, there must be output to the teacher or tutor — and for competitive “learning,” which can occur without outputs, all the relevant information is predesigned into the competitive relationships within the network.
In other words, connectionist networks are caught in exactly the same skepticism-solipsism impossibility of learning that confounds any other encodingist system. No strictly passive system can generate internally functional error, and, therefore, no strictly passive system can learn. Furthermore, even an interactive system with goals, that therefore might be able to learn something, will not be legitimately understood to have genuine “first person” representations so long as representation is construed in epistemically passive terms — as merely the product of input processing — such that the interactions become based on the supposed already generated input encodings rather than the interactions being epistemically essential to the constitution of the representations. It is no accident that all “learning” that has been adduced requires designer-provided foreknowledge of what constitutes error with regard to the processing of the inputs, and generally also requires designer variation and selection constructions and designer-determined errors (or else already available designer foreknowledge of relevant design criteria) within the space of possible network topological designs to find one that “works.”
This point connects with the interactive identification of representational content with indicated potential interactions and their internal outcomes — connectionism simply provides a particular instance of the general issues regarding learning that were discussed earlier. It is only with respect to such strictly internal functional “expectations,” such contents, that error for the system can be defined, and, therefore, only with respect to such contents that learning can occur, and, therefore, only out of such functional “expectations” that representation can emerge. Representation must be emergent out of some sort of functional relationships that are capable of being found in error by the system itself, and the only candidate for that is output-to-input potentialities, or, more generally, interactive potentialities. Representation must be constructable, whether by evolution or development or learning; construction requires error-for-the-system; and the possibility of error-for-the-system requires indications of interactive potentialities. In short, representation must be constructed out of, and emergent out of, indications of interactive potentialities.
PDP systems are, in effect, models of the emergence of logical transducers: transducers of input categories into activation patterns. But the complexity and the emergent character of the “transduction” relationship in a PDP network does not alter the basic fact that the system itself does not know what has been “transduced,” nor even that anything like transduction, or categorization, has occurred. All relevant representational information is in the user or designer, not in the system.
