Hybrid Systems: Basic Concepts

Extract from Knowledge-Based Artificial Neural Networks (KBANN) by G. Towell and J. Shavlik

Introduction
Two extreme teaching approaches to forecasting a value or classifying an object are known in Artificial Intelligence: (1) hand-built systems (e.g., expert systems [58]) and (2) empirical learning [42,47]. Hand-built systems correspond to giving a person a "domain theory" without an extensive set of examples; one could call this learning by being told. Conversely, empirical learning corresponds to giving a person lots of examples without any explanation of why the examples are members of a particular class or why they have particular target values, e.g., stock prices. In machine learning, a domain theory [28] is a collection of rules that describes task-specific inferences that can be drawn from the given facts. For classification problems, a domain theory can be used to prove whether or not an object is a member of a particular class. Unfortunately, neither approach (1) nor approach (2) is completely satisfactory; each suffers from flaws that preclude it from being a generally applicable method.
The flaws of each method are, for the most part, complementary (see sections 2.1-2.2). Hence, a 'hybrid' system that effectively combines a hand-built system with an empirical learning algorithm might be like a student who is taught using a combination of theoretical information and examples. Such hybrid learning systems (reviewed in sections 2.4 and 6) should find synergies that make them more effective than either hand-built rules or empirical learning algorithms used in isolation.
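To make the notion of a domain theory concrete, classification by proof over propositional rules might be sketched as follows. This is a minimal illustration; the rules, facts, and names are invented, not taken from the paper:

```python
# Minimal forward chaining over a propositional "domain theory".
# The rules and facts below are invented for illustration.
rules = [
    ({"has_feathers", "lays_eggs"}, "bird"),   # antecedents -> consequent
    ({"bird", "can_fly"}, "flying_bird"),
]

def entails(facts, rules, goal):
    """Fire rules until no new facts can be derived; then test the goal."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return goal in derived

print(entails({"has_feathers", "lays_eggs", "can_fly"}, rules, "flying_bird"))  # True
```

In this sense a domain theory "proves" class membership: an object belongs to a class exactly when the class symbol is derivable from its observed features.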

The Need for Hybrid Systems

We further motivate the need for hybrid systems by listing some of the important weaknesses of hand-built rules and empirical learning systems.

2.1. Hand-built systems.

Hand-built systems are non-learning systems (except insofar as they are later altered by hand). They simply do what they are told; they do not learn at the knowledge level [9]. Despite their apparent simplicity, such systems pose many problems for those who build them.

2.2. Empirical learning.

Empirical learning systems inductively generalize specific examples. Thus, they require little theoretical knowledge about the problem domain; instead they require a large library of examples. Their almost complete ignorance of problem-specific theory means that they do not address important aspects of induction. Some of the most significant problems are:

2.3. Artificial neural networks

Artificial neural networks (ANNs), which form the basis of Knowledge-Based ANNs (KBANN), are a particular method for empirical learning. ANNs have proven to be equal, or superior, to other empirical learning systems over a wide range of domains, when evaluated in terms of their generalization ability [50,2]. However, they have a set of problems unique to their style of empirical learning. Among these problems are:

2.4. Hybrid (integrated [22]) Learning Systems

There is a significant gap between the two extreme approaches: the knowledge-intensive, learning-by-being-told approach of hand-built systems and the virtually knowledge-free approach of empirical learning. They are just two ends of a spectrum along which an intelligent system may operate. Staying at one end or the other simplifies the learning problem by allowing strong assumptions to be made about the nature of what needs to be learned. However, the middle ground is appealing; it offers the possibility that synergistic combinations of theory and data will result in powerful learning systems.

There is also psychological evidence that people rarely, if ever, learn purely from theory or examples [62]. Finally, there is a purely practical consideration: hybrid systems have proven effective on several real-world problems [57,33,36,54,19] (also see section 5).

Knowledge-based Artificial Neural Networks

KBANN (Knowledge-Based Artificial Neural Networks), the successor to the EBL-ANN algorithm [51], is such a system. The approach taken by KBANN is outlined in Table 1. Briefly, the idea is to insert a set of hand-constructed, symbolic rules into a neural network. The network is then refined using standard neural learning algorithms and a set of classified training examples. The refined network can then function as a highly accurate classifier. A final step for KBANN, the extraction of refined, comprehensible rules from the trained neural network, has been the subject of much effort [56].
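The rules-to-network step can be sketched in miniature. The sketch below follows the usual KBANN-style translation under stated assumptions: one sigmoid unit per rule, a fixed link weight ω = 4, weight +ω for positive antecedents and -ω for negated ones, and a bias chosen so the unit is highly active only when the rule is satisfied. The rule itself is invented:

```python
import math

OMEGA = 4.0  # fixed link weight used by the translation (an assumed setting)

def rule_to_unit(pos, neg):
    """Translate one propositional Horn clause into a sigmoid unit.

    Positive antecedents get weight +OMEGA, negated ones -OMEGA; the bias
    is chosen so the unit is highly active only when all antecedents hold.
    """
    weights = {a: OMEGA for a in pos}
    weights.update({a: -OMEGA for a in neg})
    bias = -(len(pos) - 0.5) * OMEGA
    return weights, bias

def activate(weights, bias, inputs):
    """Standard sigmoid activation of the translated unit."""
    net = bias + sum(w * inputs.get(a, 0.0) for a, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))

# Invented rule: x :- a, b, not c.
w, b = rule_to_unit(pos={"a", "b"}, neg={"c"})
print(activate(w, b, {"a": 1, "b": 1, "c": 0}))  # high: rule satisfied
print(activate(w, b, {"a": 1, "b": 0, "c": 0}))  # low: an antecedent is missing
```

Because the translated network initially computes the same answers as the rule set, subsequent gradient-based refinement starts from the domain theory rather than from random weights.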

The KBANN approach to learning

A1. Information about features. KBANN is able to handle examples described using the five types of features that are commonly used in machine learning.

A7. Information about rules. Rules for use by KBANN must be in the form of propositional Horn clauses. In addition, disjuncts may only be expressed as multiple rules with the same consequent. Finally, the rules may not contain cycles.
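The acyclicity requirement can be checked mechanically on the rule set's dependency graph. A small sketch (the rule sets are invented; note that disjuncts appear as multiple rules sharing a consequent, as required above):

```python
def has_cycle(rules):
    """Return True if the rule set's dependency graph contains a cycle.

    rules: list of (antecedents, consequent) pairs over proposition names.
    """
    graph = {}
    for antecedents, consequent in rules:
        graph.setdefault(consequent, set()).update(antecedents)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for dep in graph.get(node, ()):
            c = color.get(dep, WHITE)
            if c == GRAY or (c == WHITE and visit(dep)):
                return True  # back edge found: cycle
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color.get(n, WHITE) == WHITE)

acyclic = [({"a", "b"}, "x"), ({"c"}, "x"), ({"x"}, "y")]  # disjunctive x: two rules
cyclic = [({"y"}, "x"), ({"x"}, "y")]
print(has_cycle(acyclic), has_cycle(cyclic))  # False True
```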

A8. Rule types. KBANN recognizes two distinct types of rules. The first type is "definitional". Definitional rules are rules that are not to be changed during the training phase. The second type of rule is "nondefinitional" or "variable". Variable rules are standard rules; they are subject to change during training. That is, consequents may become connected to antecedents other than those mentioned in the rule, and weights on the links entering the consequent are changed during neural learning.
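The definitional/variable distinction amounts to freezing some weights during training. A toy sketch of such a masked update (the weight and gradient values are invented; actual KBANN trains full networks by backpropagation):

```python
def masked_step(weights, grads, definitional, lr=0.1):
    """One gradient step that leaves units marked definitional unchanged."""
    return [w if frozen else w - lr * g
            for w, g, frozen in zip(weights, grads, definitional)]

weights = [4.0, -4.0, 2.0]
grads = [0.5, -0.3, 1.0]
definitional = [False, False, True]  # third unit's rule must not change

print(masked_step(weights, grads, definitional))
```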

A9. Special Antecedents. In addition to simply listing positive antecedents to form a simple conjunctive rule, KBANN allows the following special predicates in the antecedents of its rules.

The specification of antecedents is recursive. Hence, any of these types of antecedents may be nested within each other.

5.1. Hybrid systems versus other learning systems

One hypothesis about hybrid learning systems is that they should be more efficient than systems that learn from data alone. One aspect of this efficiency hypothesis is that a system learning from both theory and data should require fewer training examples than a system that learns only from data. The ability to learn from relatively few training examples is important because training examples are often hard to find or expensive to collect. Thus, it is important for a learning system to be able to extract useful generalizations from a small set of examples. {This paper} confirms the hypothesis that hybrid systems can generalize from a small set of examples more efficiently than standard backpropagation (requiring roughly one-third the number of training examples). On the other hand, a hybrid system with sparse domain knowledge does not improve on an empirical learning system. Also, the domain theory may prevent the network from learning a solution that differs significantly from the solution proposed by the theory. A further conclusion from the experiments is that standard backpropagation should outperform KBANN only when large amounts of training data are available.

5.7. Comparing KBANN to standard backpropagation

Two hypotheses may account for KBANN's abilities with respect to backpropagation:

5.8. Discussion of the sources of KBANN's strength.

Tests (this paper) indicate that neither structure alone nor weight focusing alone accounts for the superiority of KBANN-nets over standard ANNs. Therefore the third hypothesis, that it is the combination of structure and focused weights that gives KBANN its advantage over backpropagation, is likely true. It is this combination of structure and weights that supplies the derived features that give KBANN its power.

Other hybrid systems

6.2. Systems based on inductive logic programming

There has been a recent proliferation of work in "inductive logic programming" (e.g., [30]), often with the same goal as KBANN; namely, to refine an existing domain theory using classified examples. As an example of these systems, we use FOCL [37].
FOCL is an extension of FOIL [43] that allows the inductive learning mechanism of FOIL to work on user-supplied background knowledge. Hence, FOCL is able to add and delete relations from existing rules as well as add entirely new rules to the existing domain theory. Using an ID3-like information-gain heuristic, FOCL is able to correct arbitrary types of errors in the provided domain theory when given a set of positive and negative examples of the problem. Tests on artificial domains show that FOCL, like KBANN, is tolerant of a variety of errors in the initial domain theory and the training examples [39].
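FOCL's heuristic builds on FOIL's information gain for choosing which literal to add to a clause. A simplified sketch of that gain (read here in a propositional style; the example counts are invented):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL-style information gain for adding a literal to a clause.

    p0, n0: positive/negative examples covered before adding the literal;
    p1, n1: the same counts after. The per-example information change is
    weighted by the number of positives still covered (p1).
    """
    info0 = -math.log2(p0 / (p0 + n0))  # bits to signal a positive, before
    info1 = -math.log2(p1 / (p1 + n1))  # bits to signal a positive, after
    return p1 * (info0 - info1)

# A literal that excludes many negatives while keeping most positives scores well:
print(foil_gain(p0=50, n0=50, p1=45, n1=5))
```

A literal that changes nothing about the positive/negative ratio scores zero, so the search is driven toward literals that purify the covered set without discarding positives.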

6.3. Systems based on Neural Networks

Many hybrid systems have been developed recently that use neural networks as their basis. The focus of these systems has varied from reducing the number of required examples [29], to probabilistic logic [24], to the propagation of Mycin-like certainty factors [21], to the refinement of systems for fuzzy logic [5]. Rather than trying to review these diverse systems, we focus only on two developments.
Fu [13] described a system very similar to KBANN in that it uses symbolic knowledge in the form of rules to establish the structure and connection weights of a neural network. However, Fu's system uses non-differentiable activation functions to handle disjunctions in the rule sets. To tolerate this non-differentiability, Fu carefully organized his network into layers in which all of the units at a given layer are either differentiable or not. Then, during training, different learning mechanisms are applied to the different types of units. An empirical study in medical diagnosis showed this technique to be quite effective. However, the method is closely tied to its particular learning mechanism. Hence, his system is unable to use developments in the methods of training neural networks (e.g., [59]).
Katz [18] describes a system which, much like KBANN, uses a knowledge base that has been mapped (apparently by hand) into a neural network. The focus of Katz's work is on improving the time required to classify an object rather than on generalization.
This speed increase is achieved by building new connections which allow his system to bypass intermediate processing stages. As a result, the initial resemblance of the network to the knowledge base is rapidly lost during learning. Thus, although the network is able to make corrections to its initial knowledge, it is impossible to interpret those corrections as symbolic rules which could be used in the construction of future networks.

6.4. Trends in hybrid systems research

There has been a pronounced trend in the development of hybrid systems. Early systems such as UNIMEM [22] required correct theoretical information. While subsequent systems eased the correctness requirement, they introduced new constraints such as strict over-generality of the domain theory [12]. More recent systems such as KBANN allow arbitrary imperfections in theories. On the data side, earlier systems required that data be noise-free, while more recent systems allow noisy examples. However, these reductions in quality requirements for both theory and data are generally accompanied by the need for more examples.
In addition, a recent trend is that new hybrid systems tend to be built on top of established empirical learning systems. This is the case for Labyrinth-k, EITHER, FOCL, and KBANN. Earlier systems tended to have specially designed empirical components which were able to take advantage of the strong constraints on theory and data that these systems imposed.

Limitations

KBANN's rule syntax is limited. KBANN cannot handle rules with cycles or variables. One research area is the expansion of KBANN to admit certain types of cyclic rules [23]. However, it is unlikely that in the near future KBANN will be able to handle quantified variables in its rules, as there are currently no appropriate techniques for binding variables within a neural network. Inductive logic programming systems such as FOIL [43], FOCL [37] and GOLEM [19] allow relational rule sets; hence, they avoid the restrictions inherent to KBANN (p. 34).
This limitation of KBANN motivates us to pay more attention to the development of the Inductive Logic Programming approach and related first-order logic forecasting rules. Below we describe basic mathematical definitions related to first-order logic.

References

[1] S. Ahmad, A Study of Scaling and Generalization in Neural Networks, Technical Report CCSR-88-13, University of Illinois, Center for Complex Systems Research (1988).
[2] L. Atlas, R. Cole, Y. Muthusamy, A. Lippman, J. Connor, D. Park, M. El-Sharkawi, and R. J. Marks, A performance comparison of trained multi-layer perceptrons and trained classification trees, Proceedings of the IEEE 78 (1990) 1614--1619.
[3] J. Bachant and J. McDermott, R1 Revisited: Four Years in the Trenches, AI Magazine 5 (1984) 21--32.
[4] E. Barnard and R. A. Cole, A Neural-Net Training Program based on Conjugate-Gradient Optimization, Technical Report CSE 89-014, Oregon Graduate Institute, Beaverton, OR (1989).
[5] H. R. Berenji, Refinement of Approximate Reasoning-Based Controllers by Reinforcement Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 475--479.
[6] L. A. Birnbaum and G. C. Collins, eds, Proceedings of the Eighth International Machine Learning Workshop, Morgan Kaufmann, Evanston, IL (1991).
[7] W. W. Cohen, Compiling Prior Knowledge into an Explicit Bias, in: Proceedings of the Ninth International Machine Learning Workshop, Aberdeen, Scotland (1992), 102--110.
[8] S. Cost and S. Salzberg, A Weighted Nearest Neighbor Algorithm for Learning With Symbolic Features, Machine Learning 10 (1993) 57--78.
[9] T. G. Dietterich, Learning at the Knowledge Level, Machine Learning 1 (1986) 287--316.
[10] S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, in: Advances in Neural Information Processing Systems, volume 2, Denver, CO (1989), 524--532.
[11] D. H. Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning 2 (1987) 139--172.
[12] N. S. Flann and T. G. Dietterich, A Study of Explanation-Based Methods for Inductive Learning, Machine Learning 4 (1989) 187--226.
[13] L. M. Fu, Integration of Neural Heuristics into Knowledge-Based Inference, Connection Science 1 (1989) 325--340.
[14] J. Gennari, P. Langley, and D. Fisher, Models of incremental concept formation, Artificial Intelligence 40 (1989) 11--61.
[15] G. E. Hinton, Connectionist Learning Procedures, Artificial Intelligence 40 (1989) 185--234.
[16] R. C. Holte, L. E. Acker, and B. W. Porter, Concept Learning and the Problem of Small Disjuncts, in: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI (1989), 813--819.
[17] IUB Nomenclature Committee, Ambiguity Codes, European Journal of Biochemistry 150 (1985) 1--5.
[18] B. F. Katz, EBL and SBL: A Neural Network Synthesis, in: Proceedings of the Eleventh Annual Conference of the Cognitive Science Society, Ann Arbor, MI (1989), 683--689.
[19] R. D. King, S. Muggleton, R. A. Lewis, and J. E. Sternberg, Drug Design by Machine Learning: The use of Inductive Logic Programming to Model the Structure---Activity Relationships of Trimethoprim Analogues Binding to Dihydrofolate Reductase, Proceedings of the National Academy of Sciences, USA 89 (1992) 11322--11326.
[20] S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata, in: Automata Studies, C. E. Shannon and J. McCarthy (Eds.), 3--41, Princeton University Press, Princeton, NJ (1956).
[21] R. C. Lacher, S. I. Hruska, and D. C. Kuncicky, Backpropagation Learning in Expert Networks, IEEE Transactions on Neural Networks 3 (1992) 62--72.
[22] M. Lebowitz, Integrated Learning: Controlling Explanation, Cognitive Science 10 (1986) 219--240.
[23] R. Maclin and J. W. Shavlik, Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning 11 (1993) 195--213.
[24] J. J. Mahoney and R. J. Mooney, Combining connectionist and symbolic learning to refine certainty-factor rule-bases, Connection Science 5 (1993) 339--364.
[25] W. S. McCulloch and W. A. Pitts, A Logical Calculus of Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics 5 (1943) 115--133.
[26] R. S. Michalski and G. Tecuci, eds, Proceedings of the First International Workshop on Multistrategy Learning, George Mason University, Harpers Ferry, W. Va. (1991).
[27] J. Mingers, An Empirical Comparison of Pruning Methods for Decision Tree Induction, Machine Learning 4 (1989) 227--243.
[28] T. M. Mitchell, R. Keller, and S. Kedar-Cabelli, Explanation-Based Generalization: A Unifying View, Machine Learning 1 (1986) 47--80.
[29] T. M. Mitchell and S. B. Thrun, Explanation-Based Neural Network Learning for Robot Control, in: Advances in Neural Information Processing Systems, volume 5, Denver, CO (1992), 287--294.
[30] S. Muggleton, ed, Inductive Logic Programming, Academic Press, San Diego, CA (1992).
[31] S. Muggleton, R. D. King, and J. E. Sternberg, Protein Secondary Structure Prediction Using Logic-Based Machine Learning, Protein Engineering 5 (1992) 647--657.
[32] G. L. Murphy and D. L. Medin, The Role of Theories in Conceptual Coherence, Psychological Review 91 (1985) 289--316.
[33] M. O. Noordewier, G. G. Towell, and J. W. Shavlik, Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences, in: Advances in Neural Information Processing Systems, volume 3, Denver, CO (1991), 530--536.
[34] M. C. O'Neill, Escherichia Coli Promoters: I. Consensus as it relates to spacing class, specificity, repeat substructure, and three dimensional organization, Journal of Biological Chemistry 264 (1989) 5522--5530.
[35] D. W. Opitz and J. W. Shavlik, Heuristically Expanding Knowledge-Based Neural Networks, in: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Chambery, France (1993).
[36] D. Ourston and R. J. Mooney, Theory refinement combining analytical and empirical methods, Artificial Intelligence 66 (1994) 273--310.
[37] M. Pazzani and D. Kibler, The Utility of Knowledge in Inductive Learning, Machine Learning 9 (1992) 57--94.
[38] M. J. Pazzani, The influence of prior knowledge on concept acquisition: Experimental and computational results, Journal of Experimental Psychology: Learning, Memory & Cognition 17 (1991) 416--432.
[39] M. J. Pazzani, C. A. Brunk, and G. Silverstein, A Knowledge-Intensive Approach to Learning Relational Concepts, in: Inductive Logic Programming, S. Muggleton (Eds.), 373--394, Academic Press (1992).
[40] F. J. Pineda, Generalization of Back-Propagation to Recurrent Neural Networks, Physical Review Letters 59 (1987) 2229--2232.
[41] L. Y. Pratt, J. Mostow, and C. A. Kamm, Direct Transfer of Learned Information Among Neural Networks, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA (1991), 584--589.
[42] J. R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81--106.
[43] J. R. Quinlan, Learning Logical Definitions from Relations, Machine Learning 5 (1990) 239--266.
[44] L. Rendell and H. Cho, Empirical Learning as a Function of Concept Character, Machine Learning 5 (1990) 267--298.
[45] B. L. Richards and R. J. Mooney, Refinement of first-order Horn-clause domain theories, Machine Learning (to appear).
[46] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, New York (1962).
[47] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, in: Parallel Distributed Processing: Explorations in the microstructure of cognition. Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland (Eds.), 318--363, MIT Press, Cambridge, MA (1986).
[48] R. Schank, G. C. Collins, and L. E. Hunter, Transcending Inductive Category Formation in Learning, Behavioral and Brain Sciences 9 (1986) 639--686.
[49] G. M. Scott, J. W. Shavlik, and W. H. Ray, Refining PID Controllers using Neural Networks, Neural Computation 4 (1992) 746--757.
[50] J. W. Shavlik, R. J. Mooney, and G. G. Towell, Symbolic and Neural Net Learning Algorithms: An Empirical Comparison, Machine Learning 6 (1991) 111--143.
[51] J. W. Shavlik and G. G. Towell, An Approach to Combining Explanation-Based and Neural Learning Algorithms, Connection Science 1 (1989) 233--255.
[52] E. H. Shortliffe and B. G. Buchanan, A Model of Inexact Reasoning in Medicine, in: Rule-Based Expert Systems, B. G. Buchanan and E. H. Shortliffe (Eds.), 233--262, Addison-Wesley, Reading, MA (1984).
[53] G. D. Stormo, Consensus Patterns in DNA, in: Methods in Enzymology, volume 183, 211--221, Academic Press, Orlando, FL (1990).
[54] K. Thompson, P. Langley, and W. Iba, Using Background Knowledge in Concept Formation, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 554--558.
[55] G. G. Towell, Symbolic Knowledge and Neural Networks: Insertion, Refinement, and Extraction, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, WI (1991).
[56] G. G. Towell and J. W. Shavlik, Extracting Refined Rules from Knowledge-Based Neural Networks, Machine Learning 13 (1993) 71--101.
[57] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, Refinement of Approximately Correct Domain Theories by Knowledge-Based Neural Networks, in: Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA (1990), 861--866.
[58] D. A. Waterman, A Guide to Expert Systems, Addison-Wesley, Reading, MA (1986).
[59] R. L. Watrous, Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization, in: Proceedings of the First International Conference on Neural Networks, volume II (1987), 619--627.
[60] J. D. Watson, H. H. Hopkins, J. W. Roberts, J. A. Steitz, and A. M. Weiner, The Molecular Biology of the Gene, Benjamin-Cummings, Menlo Park, CA (1987).
[61] S. M. Weiss and C. A. Kulikowski, Computer Systems that Learn, Morgan Kaufmann, San Mateo, CA (1990).
[62] E. J. Wisniewski and D. L. Medin, Is it a Pocket or a Purse? Tightly Coupled Theory and Data Driven Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 564--569.
[63] D. H. Wolpert, On Overfitting Avoidance as Bias, Technical Report LA-UR-90-3460, Santa Fe Institute, Santa Fe, NM (1992).