Hybrid Systems: Basic Concepts

Extract from Knowledge-Based Artificial Neural Networks (KBANN) by G. Towell and J. Shavlik

Introduction
Two extreme teaching approaches to forecasting a value or classifying an object are known in Artificial Intelligence: (1) hand-built systems (e.g., expert systems [58]) and (2) empirical learning [42,47]. Hand-built systems correspond to giving a person a "domain theory" without an extensive set of examples; one could call this learning by being told. Conversely, empirical learning corresponds to giving a person lots of examples without any explanation of why the examples are members of a particular class or why they have particular target values, e.g., stock prices. In machine learning, a domain theory [28] is a collection of rules that describes task-specific inferences that can be drawn from the given facts. For classification problems, a domain theory can be used to prove whether or not an object is a member of a particular class. Unfortunately, neither approach (1) nor approach (2) is completely satisfactory; each suffers from flaws that preclude it from being a generally applicable method.
The flaws of each method are, for the most part, complementary (see sections 2.1-2.2). Hence, a 'hybrid' system that effectively combines a hand-built system with an empirical learning algorithm might be like a student who is taught using a combination of theoretical information and examples. Such hybrid learning systems (reviewed in sections 2.4 and 6) should find synergies that make them more effective than either hand-built rules or empirical learning algorithms used in isolation.
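To make the notion of a domain theory concrete, classification by proof over propositional rules might be sketched as follows. This is a minimal illustration; the rules, facts, and names are invented, not taken from the paper:

```python
# Minimal forward chaining over a propositional "domain theory".
# The rules and facts below are invented for illustration.
rules = [
    ({"has_feathers", "lays_eggs"}, "bird"),   # antecedents -> consequent
    ({"bird", "can_fly"}, "flying_bird"),
]

def entails(facts, rules, goal):
    """Fire rules until no new facts can be derived; then test the goal."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return goal in derived

print(entails({"has_feathers", "lays_eggs", "can_fly"}, rules, "flying_bird"))  # True
```

In this sense a domain theory "proves" class membership: an object belongs to a class exactly when the class symbol is derivable from its observed features.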

The Need for Hybrid Systems

We further motivate the need for hybrid systems by listing some of the important weaknesses of hand-built rules and empirical learning systems.

2.1. Hand-built systems.

Hand-built systems are non-learning systems (except insofar as they are later altered by hand). They simply do what they are told; they do not learn at the knowledge level [9]. Despite their apparent simplicity, such systems pose many problems for those who build them.

2.2. Empirical learning.

Empirical learning systems inductively generalize specific examples. Thus, they require little theoretical knowledge about the problem domain; instead they require a large library of examples. Their almost complete ignorance of problem-specific theory means that they do not address important aspects of induction. Some of the most significant problems are:

2.3. Artificial neural networks

Artificial neural networks (ANNs), which form the basis of Knowledge-Based ANNs (KBANN), are a particular method for empirical learning. ANNs have proven to be equal, or superior, to other empirical learning systems over a wide range of domains, when evaluated in terms of their generalization ability [50,2]. However, they have a set of problems unique to their style of empirical learning. Among these problems are:

2.4. Hybrid (integrated [22]) Learning Systems

There is a significant gap between the two extreme approaches: the knowledge-intensive, learning-by-being-told approach of hand-built systems and the virtually knowledge-free approach of empirical learning. They are just two ends of a spectrum along which an intelligent system may operate. Staying at one end or the other simplifies the learning problem by allowing strong assumptions to be made about the nature of what needs to be learned. However, the middle ground is appealing; it offers the possibility that synergistic combinations of theory and data will result in powerful learning systems.

There is also psychological evidence that people rarely, if ever, learn purely from theory or examples [62]. Finally, there is a purely practical consideration: hybrid systems have proven effective on several real-world problems [57,33,36,54,19] (also see section 5).

Knowledge-based Artificial Neural Networks

KBANN (Knowledge-Based Artificial Neural Networks), the successor to the EBL-ANN algorithm [51], is such a system. The approach taken by KBANN is outlined in Table 1. Briefly, the idea is to insert a set of hand-constructed, symbolic rules into a neural network. The network is then refined using standard neural learning algorithms and a set of classified training examples. The refined network can then function as a highly accurate classifier. A final step for KBANN, the extraction of refined, comprehensible rules from the trained neural network, has been the subject of much effort [56].
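The rules-to-network step can be sketched in miniature. The sketch below follows the usual KBANN-style translation under stated assumptions: one sigmoid unit per rule, a fixed link weight ω = 4, weight +ω for positive antecedents and -ω for negated ones, and a bias chosen so the unit is highly active only when the rule is satisfied. The rule itself is invented:

```python
import math

OMEGA = 4.0  # fixed link weight used by the translation (an assumed setting)

def rule_to_unit(pos, neg):
    """Translate one propositional Horn clause into a sigmoid unit.

    Positive antecedents get weight +OMEGA, negated ones -OMEGA; the bias
    is chosen so the unit is highly active only when all antecedents hold.
    """
    weights = {a: OMEGA for a in pos}
    weights.update({a: -OMEGA for a in neg})
    bias = -(len(pos) - 0.5) * OMEGA
    return weights, bias

def activate(weights, bias, inputs):
    """Standard sigmoid activation of the translated unit."""
    net = bias + sum(w * inputs.get(a, 0.0) for a, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))

# Invented rule: x :- a, b, not c.
w, b = rule_to_unit(pos={"a", "b"}, neg={"c"})
print(activate(w, b, {"a": 1, "b": 1, "c": 0}))  # high: rule satisfied
print(activate(w, b, {"a": 1, "b": 0, "c": 0}))  # low: an antecedent is missing
```

Because the translated network initially computes the same answers as the rule set, subsequent gradient-based refinement starts from the domain theory rather than from random weights.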

The KBANN approach to learning

A1. Information about features. KBANN is able to handle examples described using the five types of features that are commonly used in machine learning.

A7. Information about rules. Rules for use by KBANN must be in the form of propositional Horn clauses. In addition, disjuncts may only be expressed as multiple rules with the same consequent. Finally, the rules may not contain cycles.
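The acyclicity requirement can be checked mechanically on the rule set's dependency graph. A small sketch (the rule sets are invented; note that disjuncts appear as multiple rules sharing a consequent, as required above):

```python
def has_cycle(rules):
    """Return True if the rule set's dependency graph contains a cycle.

    rules: list of (antecedents, consequent) pairs over proposition names.
    """
    graph = {}
    for antecedents, consequent in rules:
        graph.setdefault(consequent, set()).update(antecedents)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for dep in graph.get(node, ()):
            c = color.get(dep, WHITE)
            if c == GRAY or (c == WHITE and visit(dep)):
                return True  # back edge found: cycle
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color.get(n, WHITE) == WHITE)

acyclic = [({"a", "b"}, "x"), ({"c"}, "x"), ({"x"}, "y")]  # disjunctive x: two rules
cyclic = [({"y"}, "x"), ({"x"}, "y")]
print(has_cycle(acyclic), has_cycle(cyclic))  # False True
```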

A8. Rule types. KBANN recognizes two distinct types of rules. The first type is "definitional". Definitional rules are rules that are not to be changed during the training phase. The second type of rule is "nondefinitional" or "variable". Variable rules are standard rules; they are subject to change during training. That is, consequents may become connected to antecedents other than those mentioned in the rule, and weights on the links entering the consequent are changed during neural learning.
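The definitional/variable distinction amounts to freezing some weights during training. A toy sketch of such a masked update (the weight and gradient values are invented; actual KBANN trains full networks by backpropagation):

```python
def masked_step(weights, grads, definitional, lr=0.1):
    """One gradient step that leaves units marked definitional unchanged."""
    return [w if frozen else w - lr * g
            for w, g, frozen in zip(weights, grads, definitional)]

weights = [4.0, -4.0, 2.0]
grads = [0.5, -0.3, 1.0]
definitional = [False, False, True]  # third unit's rule must not change

print(masked_step(weights, grads, definitional))
```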

A9. Special Antecedents. In addition to simply listing positive antecedents to form a simple conjunctive rule, KBANN allows the following special predicates in the antecedents of its rules.

The specification of antecedents is recursive. Hence, any of these types of antecedents may be nested within each other.

5.1. Hybrid systems versus other learning systems

One hypothesis about hybrid learning systems is that they should be more efficient than systems that learn from data alone. One aspect of this efficiency hypothesis is that a system learning from both theory and data should require fewer training examples than a system that learns only from data. The ability to learn from relatively few training examples is important because training examples are often hard to find or expensive to collect. Thus, it is important for a learning system to be able to extract useful generalizations from a small set of examples. {This paper} confirms the hypothesis that hybrid systems can generalize from a small set of examples more efficiently than standard backpropagation (requiring roughly one-third the number of training examples). On the other hand, a hybrid system with sparse domain knowledge does not improve on an empirical learning system. Also, the domain theory may prevent the network from learning a solution that differs significantly from the solution proposed by the theory. A further conclusion from the experiments is that standard backpropagation should outperform KBANN only when large amounts of training data are available.

5.7. Comparing KBANN to standard backpropagation

Two hypotheses may account for KBANN's abilities with respect to backpropagation:

5.8. Discussion of the sources of KBANN's strength.

Tests (this paper) indicate that neither structure alone nor weight focusing alone accounts for the superiority of KBANN-nets over standard ANNs. Therefore the third hypothesis, that it is the combination of structure and focused weights that gives KBANN its advantage over backpropagation, is likely true. It is this combination of structure and weights that supplies the derived features that give KBANN its power.

Other hybrid systems

6.2. Systems based on inductive logic programming

There has been a recent proliferation of work in "inductive logic programming" (e.g., [30]), often with the same goal as KBANN; namely, to refine an existing domain theory using classified examples. As an example of these systems, we use FOCL [37].
FOCL is an extension of FOIL [43] that allows the inductive learning mechanism of FOIL to work on user-supplied background knowledge. Hence, FOCL is able to add and delete relations from existing rules as well as add entirely new rules to the existing domain theory. Using an ID3-like information-gain heuristic, FOCL is able to correct arbitrary types of errors in the provided domain theory when given a set of positive and negative examples of the problem. Tests on artificial domains show that FOCL, like KBANN, is tolerant of a variety of errors in the initial domain theory and the training examples [39].
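FOCL's heuristic builds on FOIL's information gain for choosing which literal to add to a clause. A simplified sketch of that gain (read here in a propositional style; the example counts are invented):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL-style information gain for adding a literal to a clause.

    p0, n0: positive/negative examples covered before adding the literal;
    p1, n1: the same counts after. The per-example information change is
    weighted by the number of positives still covered (p1).
    """
    info0 = -math.log2(p0 / (p0 + n0))  # bits to signal a positive, before
    info1 = -math.log2(p1 / (p1 + n1))  # bits to signal a positive, after
    return p1 * (info0 - info1)

# A literal that excludes many negatives while keeping most positives scores well:
print(foil_gain(p0=50, n0=50, p1=45, n1=5))
```

A literal that changes nothing about the positive/negative ratio scores zero, so the search is driven toward literals that purify the covered set without discarding positives.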

6.3. Systems based on Neural Networks

Many hybrid systems have been developed recently that use neural networks as their basis. The focus of these systems has varied from reducing the number of required examples [29], to probabilistic logic [24], to the propagation of Mycin-like certainty factors [21], to the refinement of systems for fuzzy logic [5]. Rather than trying to review these diverse systems, we focus only on two developments.
Fu [13] described a system very similar to KBANN in that it uses symbolic knowledge in the form of rules to establish the structure and connection weights of a neural network. However, Fu's system uses non-differentiable activation functions to handle disjunctions in the rule sets. To tolerate this non-differentiability, Fu carefully organized his network into layers in which all of the units at a given layer are either differentiable or not. Then, during training, different learning mechanisms are applied to the different types of units. An empirical study in medical diagnosis showed this technique to be quite effective. However, the method is closely tied to its particular learning mechanism. Hence, his system is unable to use developments in the methods of training neural networks (e.g., [59]).
Katz [18] describes a system which, much like KBANN, uses a knowledge base that has been mapped (apparently by hand) into a neural network. The focus of Katz's work is on improving the time required to classify an object rather than on generalization.
This speed increase is achieved by building new connections which allow his system to bypass intermediate processing stages. As a result, the initial resemblance of the network to the knowledge base is rapidly lost during learning. Thus, although the network is able to make corrections to its initial knowledge, it is impossible to interpret those corrections as symbolic rules which could be used in the construction of future networks.

6.4. Trends in hybrid systems research

There has been a pronounced trend in the development of hybrid systems. Early systems such as UNIMEM [22] required correct theoretical information. While subsequent systems eased the correctness requirement, they introduced new constraints such as strict over-generality of the domain theory [12]. More recent systems such as KBANN allow arbitrary imperfections in theories. On the data side, earlier systems required that data be noise-free, while more recent systems allow noisy examples. However, these reductions in quality requirements for both theory and data are generally accompanied by the need for more examples.
In addition, a recent trend is that new hybrid systems tend to be built on top of established empirical learning systems. This is the case for Labyrinth-k, EITHER, FOCL, and KBANN. Earlier systems tended to have specially designed empirical components which were able to take advantage of the strong constraints on theory and data that these systems imposed.

Limitations

KBANN's rule syntax is limited. KBANN cannot handle rules with cycles or variables. One research area is the expansion of KBANN to admit certain types of cyclic rules [23]. However, it is unlikely that in the near future KBANN will be able to handle quantified variables in its rules, as there are currently no appropriate techniques for binding variables within a neural network. Inductive logic programming systems such as FOIL [43], FOCL [37] and GOLEM [19] allow relational rule sets; hence, they avoid the restrictions inherent to KBANN (p. 34).
This limitation of KBANN motivates us to pay more attention to the development of the Inductive Logic Programming approach and related first-order logic forecasting rules. Below we describe basic mathematical definitions related to first-order logic.

References

[1] S. Ahmad, A Study of Scaling and Generalization in Neural Networks, Technical Report CCSR-88-13, University of Illinois, Center for Complex Systems Research (1988).
[2] L. Atlas, R. Cole, Y. Muthusamy, A. Lippman, J. Connor, D. Park, M. El-Sharkawi, and R. J. Marks, A performance comparison of trained multi-layer perceptrons and trained classification trees, Proceedings of the IEEE 78 (1990) 1614--1619.
[3] J. Bachant and J. McDermott, R1 Revisited: Four Years in the Trenches, AI Magazine 5 (1984) 21--32.
[4] E. Barnard and R. A. Cole, A Neural-Net Training Program based on Conjugate-Gradient Optimization, Technical Report CSE 89-014, Oregon Graduate Institute, Beaverton, OR (1989).
[5] H. R. Berenji, Refinement of Approximate Reasoning-Based Controllers by Reinforcement Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 475--479.
[6] L. A. Birnbaum and G. C. Collins, eds, Proceedings of the Eighth International Machine Learning Workshop, Morgan Kaufmann, Evanston, IL (1991).
[7] W. W. Cohen, Compiling Prior Knowledge into an Explicit Bias, in: Proceedings of the Ninth International Machine Learning Workshop, Aberdeen, Scotland (1992), 102--110.
[8] S. Cost and S. Salzberg, A Weighted Nearest Neighbor Algorithm for Learning With Symbolic Features, Machine Learning 10 (1993) 57--78.
[9] T. G. Dietterich, Learning at the Knowledge Level, Machine Learning 1 (1986) 287--316.
[10] S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, in: Advances in Neural Information Processing Systems, volume 2, Denver, CO (1989), 524--532.
[11] D. H. Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning 2 (1987) 139--172.
[12] N. S. Flann and T. G. Dietterich, A Study of Explanation-Based Methods for Inductive Learning, Machine Learning 4 (1989) 187--226.
[13] L. M. Fu, Integration of Neural Heuristics into Knowledge-Based Inference, Connection Science 1 (1989) 325--340.
[14] J. Gennari, P. Langley, and D. Fisher, Models of incremental concept formation, Artificial Intelligence 40 (1989) 11--61.
[15] G. E. Hinton, Connectionist Learning Procedures, Artificial Intelligence 40 (1989) 185--234.
[16] R. C. Holte, L. E. Acker, and B. W. Porter, Concept Learning and the Problem of Small Disjuncts, in: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI (1989), 813--819.
[17] IUB Nomenclature Committee, Ambiguity Codes, European Journal of Biochemistry 150 (1985) 1--5.
[18] B. F. Katz, EBL and SBL: A Neural Network Synthesis, in: Proceedings of the Eleventh Annual Conference of the Cognitive Science Society, Ann Arbor, MI (1989), 683--689.
[19] R. D. King, S. Muggleton, R. A. Lewis, and J. E. Sternberg, Drug Design by Machine Learning: The use of Inductive Logic Programming to Model the Structure---Activity Relationships of Trimethoprim Analogues Binding to Dihydrofolate Reductase, Proceedings of the National Academy of Sciences, USA 89 (1992) 11322--11326.
[20] S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata, in: Automata Studies, C. E. Shannon and J. McCarthy (Eds.), 3--41, Princeton University Press, Princeton, NJ (1956).
[21] R. C. Lacher, S. I. Hruska, and D. C. Kuncicky, Backpropagation Learning in Expert Networks, IEEE Transactions on Neural Networks 3 (1992) 62--72.
[22] M. Lebowitz, Integrated Learning: Controlling Explanation, Cognitive Science 10 (1986) 219--240.
[23] R. Maclin and J. W. Shavlik, Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning 11 (1993) 195--213.
[24] J. J. Mahoney and R. J. Mooney, Combining connectionist and symbolic learning to refine certainty-factor rule-bases, Connection Science 5 (1993) 339--364.
[25] W. S. McCulloch and W. A. Pitts, A Logical Calculus of Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics 5 (1943) 115--133.
[26] R. S. Michalski and G. Tecuci, eds, Proceedings of the First International Workshop on Multistrategy Learning, George Mason University, Harpers Ferry, W. Va. (1991).
[27] J. Mingers, An Empirical Comparison of Pruning Methods for Decision Tree Induction, Machine Learning 4 (1989) 227--243.
[28] T. M. Mitchell, R. Keller, and S. Kedar-Cabelli, Explanation-Based Generalization: A Unifying View, Machine Learning 1 (1986) 47--80.
[29] T. M. Mitchell and S. B. Thrun, Explanation-Based Neural Network Learning for Robot Control, in: Advances in Neural Information Processing Systems, volume 5, Denver, CO (1992), 287--294.
[30] S. Muggleton, ed, Inductive Logic Programming, Academic Press, San Diego, CA (1992).
[31] S. Muggleton, R. D. King, and J. E. Sternberg, Protein Secondary Structure Prediction Using Logic-Based Machine Learning, Protein Engineering 5 (1992) 647--657.
[32] G. L. Murphy and D. L. Medin, The Role of Theories in Conceptual Coherence, Psychological Review 91 (1985) 289--316.
[33] M. O. Noordewier, G. G. Towell, and J. W. Shavlik, Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences, in: Advances in Neural Information Processing Systems, volume 3, Denver, CO (1991), 530--536.
[34] M. C. O'Neill, Escherichia Coli Promoters: I. Consensus as it relates to spacing class, specificity, repeat substructure, and three dimensional organization, Journal of Biological Chemistry 264 (1989) 5522--5530.
[35] D. W. Opitz and J. W. Shavlik, Heuristically Expanding Knowledge-Based Neural Networks, in: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Chambery, France (1993).
[36] D. Ourston and R. J. Mooney, Theory refinement combining analytical and empirical methods, Artificial Intelligence 66 (1994) 273--310.
[37] M. Pazzani and D. Kibler, The Utility of Knowledge in Inductive Learning, Machine Learning 9 (1992) 57--94.
[38] M. J. Pazzani, The influence of prior knowledge on concept acquisition: Experimental and computational results, Journal of Experimental Psychology: Learning, Memory & Cognition 17 (1991) 416--432.
[39] M. J. Pazzani, C. A. Brunk, and G. Silverstein, A Knowledge-Intensive Approach to Learning Relational Concepts, in: Inductive Logic Programming, S. Muggleton (Eds.), 373--394, Academic Press (1992).
[40] F. J. Pineda, Generalization of Back-Propagation to Recurrent Neural Networks, Physical Review Letters 59 (1987) 2229--2232.
[41] L. Y. Pratt, J. Mostow, and C. A. Kamm, Direct Transfer of Learned Information Among Neural Networks, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA (1991), 584--589.
[42] J. R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81--106.
[43] J. R. Quinlan, Learning Logical Definitions from Relations, Machine Learning 5 (1990) 239--266.
[44] L. Rendell and H. Cho, Empirical Learning as a Function of Concept Character, Machine Learning 5 (1990) 267--298.
[45] B. L. Richards and R. J. Mooney, Refinement of first-order Horn-clause domain theories, Machine Learning (to appear).
[46] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, New York (1962).
[47] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, in: Parallel Distributed Processing: Explorations in the microstructure of cognition. Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland (Eds.), 318--363, MIT Press, Cambridge, MA (1986).
[48] R. Schank, G. C. Collins, and L. E. Hunter, Transcending Inductive Category Formation in Learning, Behavioral and Brain Sciences 9 (1986) 639--686.
[49] G. M. Scott, J. W. Shavlik, and W. H. Ray, Refining PID Controllers using Neural Networks, Neural Computation 4 (1992) 746--757.
[50] J. W. Shavlik, R. J. Mooney, and G. G. Towell, Symbolic and Neural Net Learning Algorithms: An Empirical Comparison, Machine Learning 6 (1991) 111--143.
[51] J. W. Shavlik and G. G. Towell, An Approach to Combining Explanation-Based and Neural Learning Algorithms, Connection Science 1 (1989) 233--255.
[52] E. H. Shortliffe and B. G. Buchanan, A Model of Inexact Reasoning in Medicine, in: Rule-Based Expert Systems, B. G. Buchanan and E. H. Shortliffe (Eds.), 233--262, Addison-Wesley, Reading, MA (1984).
[53] G. D. Stormo, Consensus Patterns in DNA, in: Methods in Enzymology, volume 183, 211--221, Academic Press, Orlando, FL (1990).
[54] K. Thompson, P. Langley, and W. Iba, Using Background Knowledge in Concept Formation, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 554--558.
[55] G. G. Towell, Symbolic Knowledge and Neural Networks: Insertion, Refinement, and Extraction, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, WI (1991).
[56] G. G. Towell and J. W. Shavlik, Extracting Refined Rules from Knowledge-Based Neural Networks, Machine Learning 13 (1993) 71--101.
[57] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, Refinement of Approximately Correct Domain Theories by Knowledge-Based Neural Networks, in: Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA (1990), 861--866.
[58] D. A. Waterman, A Guide to Expert Systems, Addison-Wesley, Reading, MA (1986).
[59] R. L. Watrous, Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization, in: Proceedings of the First International Conference on Neural Networks, volume II (1987), 619--627.
[60] J. D. Watson, H. H. Hopkins, J. W. Roberts, J. A. Steitz, and A. M. Weiner, The Molecular Biology of the Gene, Benjamin-Cummings, Menlo Park, CA (1987).
[61] S. M. Weiss and C. A. Kulikowski, Computer Systems that Learn, Morgan Kaufmann, San Mateo, CA (1990).
[62] E. J. Wisniewski and D. L. Medin, Is it a Pocket or a Purse? Tightly Coupled Theory and Data Driven Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 564--569.
[63] D. H. Wolpert, On Overfitting Avoidance as Bias, Technical Report LA-UR-90-3460, Santa Fe Institute, Santa Fe, NM (1992).