Hybrid Systems: Basic Concepts
Extract from Knowledge-Based
Artificial Neural Networks (KBANN) by G. Towell and J. Shavlik
Introduction
Two extreme approaches to teaching a system to forecast a value or to classify an object
are known in Artificial Intelligence: (1) hand-built
systems (e.g., expert systems [58]) and (2) empirical
learning [42,47]. Hand-built systems correspond to giving a person
a "domain theory" without an extensive set of examples; one could call
this learning by being told. Conversely, empirical learning corresponds
to giving a person many examples without any explanation of why the
examples are members of a particular class or why they have particular
target values, e.g., stock prices. In machine learning, a domain theory
[28] is a collection of rules that describes the task-specific inferences that
can be drawn from the given facts. For classification problems, a domain
theory can be used to prove whether or not an object is a member of a particular
class. Unfortunately, neither approach (1) nor approach (2) is completely satisfactory;
each suffers from flaws that preclude it from being a generally
applicable method.
The flaws of the two methods are, for the most part, complementary (see sections
2.1-2.2). Hence, a 'hybrid' system that effectively combines a hand-built
system with an empirical learning algorithm might be like a student who
is taught using a combination of theoretical information and examples.
Such hybrid learning systems (reviewed in sections 2.4 and 6) should
find synergies that make them more effective than either hand-built rules
or empirical learning algorithms used in isolation.
The Need for Hybrid Systems
We further motivate the need for hybrid systems by listing some of the important
weaknesses of hand-built rules and of empirical learning systems.
2.1. Hand-built systems.
Hand-built systems are non-learning systems (except insofar as they are
later altered by hand). They simply do what they are told; they do not
learn at the knowledge level [9]. Despite their apparent simplicity, such
systems pose many problems for those who build them.
- Typically, hand-built systems assume that their domain theory is complete
and correct. However, for most real-world tasks, completeness and correctness
are extremely difficult, if not impossible, to achieve. In fact, in explanation-based
learning [28] one of the major issues is dealing with incomplete
and incorrect domain theories.
- Domain theories can be intractable to use [28]. To
make a domain theory as complete and correct as possible, it may be necessary
to write thousands of interacting, possibly recursive, rules. Use of such
rule sets may be intolerably slow.
- Domain theories can be difficult to modify [3]. As
interactions proliferate in a rule set, it becomes difficult to predict
all of the changes resulting from modifying a single rule.
2.2. Empirical learning.
Empirical learning systems inductively generalize from specific examples. Thus,
they require little theoretical knowledge about the problem domain; instead
they require a large library of examples. Their almost complete ignorance
of problem-specific theory means that they do not address important aspects
of induction. Some of the most significant problems are:
- An unbounded number of features can be used to describe any object [48].
Hence, the user's choice of features can make cases appear very similar
or very different.
- Features relevant to the task are context dependent [48]. For example,
the observation that paper money is flammable may be relevant only when
a bank is on fire.
- Complex features constructed from the initial features may considerably
simplify learning [44]. However, feature construction is a difficult,
error-prone enterprise.
- Even when a large set of examples is available, small sets of exceptions
may be either unrepresented or very poorly represented [16]. As a result,
uncommon cases may be very difficult to handle correctly.
2.3. Artificial neural networks
Artificial neural networks (ANNs), which form the basis of Knowledge-Based
ANNs (KBANN), are a particular method for empirical learning. ANNs have
proven to be equal, or superior, to other empirical learning systems over
a wide range of domains, when evaluated in terms of their generalization
ability [50,2]. However, they have a set of problems unique to their style
of empirical learning. Among these problems are:
- Training times are lengthy [50].
- The initial parameters of the network can greatly affect how well concepts
are learned [1].
- There is not yet a problem-independent way to choose a good network topology,
although there has been considerable research in this direction (e.g., [10]).
- After training, neural networks are often very difficult to interpret [56].
2.4. Hybrid (integrated [22]) Learning Systems
There is a significant gap between the two extreme approaches: the knowledge-intensive,
learning-by-being-told approach of hand-built systems and the virtually
knowledge-free approach of empirical learning. They are just the two ends of
a spectrum along which an intelligent system may operate. Staying at one
end or the other simplifies the learning problem by allowing strong
assumptions to be made about the nature of what needs to be learned. However,
the middle ground is appealing; it offers the possibility that synergistic
combinations of theory and data will result in powerful learning systems.
There is also psychological evidence that people rarely, if ever, learn
purely from theory or examples [62]. Finally, there is a purely practical
consideration: hybrid systems have proven effective on several real-world
problems [57,33,36,54,19] (also see section 5).
Knowledge-based Artificial Neural Networks
KBANN (Knowledge-Based Artificial Neural Networks), the successor
to the EBL-ANN algorithm [51], is such a system. The approach taken by KBANN
is outlined in Table 1. Briefly, the idea is to insert a set of hand-constructed,
symbolic rules into a neural network. The network is then refined using standard
neural learning algorithms and a set of classified training examples. The
refined network can then function as a highly accurate classifier.
A final step of KBANN, the extraction of refined, comprehensible rules
from the trained neural network, has been the subject of much effort [56].
Table 1. The KBANN approach to learning (a code sketch of the rule-translation step follows the table).
- Given:
  - A list of features used to describe examples
  - An approximately-correct domain theory describing the problem to be solved
  - A set of classified training examples
- Do:
  - Translate the domain theory into a neural network
  - Train the knowledge-based network using the classified examples
  - Use the trained network to classify future examples
  - (Optionally) extract a refined domain theory [56]
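To make the translation step concrete, here is a minimal sketch of how one propositional Horn clause becomes a neural unit, in the spirit of KBANN's rules-to-network mapping. The (consequent, antecedents) representation, the helper names, and the exact weight value are our illustrative assumptions; only the weight-setting scheme (weight w on positive antecedents, -w on negated ones, and a bias that makes a conjunctive unit active only when all antecedents are satisfied) follows the published method.

    import numpy as np

    W_RULE = 4.0  # illustrative link weight (the paper's omega)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def unit_for_rule(antecedents):
        """Weights and bias for one conjunctive Horn clause.

        Positive antecedents get weight +W_RULE, negated ones ("not x")
        get -W_RULE; the bias is set so the unit is active only when
        every positive antecedent is true and every negated one false.
        """
        weights = {}
        n_positive = 0
        for a in antecedents:
            if a.startswith("not "):
                weights[a[4:]] = -W_RULE
            else:
                weights[a] = W_RULE
                n_positive += 1
        bias = -(n_positive - 0.5) * W_RULE
        return weights, bias

    # Disjuncts (several rules sharing one consequent) feed an OR unit:
    # each incoming link gets +W_RULE and the bias is -W_RULE / 2, so a
    # single active disjunct suffices to activate the consequent.

    # Usage for an invented rule "buy :- rising_trend, not overvalued."
    w, b = unit_for_rule(["rising_trend", "not overvalued"])
    satisfied = sigmoid(w["rising_trend"] * 1 + w["overvalued"] * 0 + b)  # ~0.88
    violated  = sigmoid(w["rising_trend"] * 1 + w["overvalued"] * 1 + b)  # ~0.12

The resulting network is then trained with ordinary gradient-based learning, so the rule-derived weights serve only as an informed starting point.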
A1. Information about features. KBANN is able to handle examples
described using the following five types of features that are commonly
used in machine learning (a sketch of one possible input encoding follows the list):
- nominal: values are drawn from a list (e.g., red, green, blue)
- Boolean: values must be either true or false
- hierarchical: values exist in an ISA hierarchy
- linear: values are numeric and continuous
- ordered: values are non-numeric but have a total ordering
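The extract does not say how these feature types are presented to the network, so the encodings below are only plausible assumptions, sketched for illustration: one-hot inputs for nominal features, a single 0/1 input for Booleans, rescaling for linear features, and a thermometer code for ordered features (hierarchical features, which need the ISA structure, are omitted).

    def encode_nominal(value, domain):
        """One input unit per possible value; exactly one is on."""
        return [1.0 if v == value else 0.0 for v in domain]

    def encode_boolean(value):
        return [1.0 if value else 0.0]

    def encode_linear(value, lo, hi):
        """Rescale a continuous value into [0, 1]."""
        return [(value - lo) / (hi - lo)]

    def encode_ordered(value, ordering):
        """Thermometer code: unit i is on iff value ranks at least i."""
        rank = ordering.index(value)
        return [1.0 if i <= rank else 0.0 for i in range(len(ordering))]

    # encode_nominal("green", ["red", "green", "blue"]) -> [0.0, 1.0, 0.0]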
A7. Information about rules. Rules for use by KBANN must be in
the form of propositional Horn clauses. In
addition, disjuncts may only be expressed as multiple rules with the same
consequent. Finally, the rules may not contain cycles.
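For illustration, the following rule set (using the (consequent, antecedents) representation of the earlier sketch; the rules themselves are invented) satisfies these constraints: the disjunctive concept "buy" is split into two clauses sharing one consequent, and no rule depends, directly or indirectly, on its own consequent.

    rules = [
        ("buy", ["rising_trend", "low_price"]),  # buy :- rising_trend, low_price.
        ("buy", ["insider_purchase"]),           # second clause = one disjunct of "buy"
        ("low_price", ["not overvalued"]),       # negated antecedent, still acyclic
    ]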
A8. Rule types. KBANN recognizes two distinct types of
rules. The first type is "definitional". Definitional rules are rules that
are not to be changed during the training phase. The second type of rule
is "nondefinitional" or "variable". Variable rules are standard rules;
they are subject to change during training. That is, consequents may become
connected to antecedents other than those mentioned in the rule, and the weights
on the links entering the consequent are changed during neural learning.
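The extract does not specify how definitional rules are kept fixed; one natural realization, sketched here purely as an assumption, is to freeze the corresponding links by masking their gradient updates during backpropagation.

    import numpy as np

    def sgd_step(W, dW, trainable_mask, lr=0.1):
        """One gradient step that leaves definitional links untouched.

        W, dW          -- weight matrix and its gradient
        trainable_mask -- 1.0 where the link comes from a variable rule,
                          0.0 where it encodes a definitional rule
        """
        return W - lr * dW * trainable_mask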
A9. Special Antecedents. In addition to simply listing positive
antecedents to form a simple conjunctive rule, KBANN allows the following
special predicates in the antecedents of its rules:
- Not: negates an antecedent
- Greater-than: only for "linear" and "ordered" features
- Less-than: only for "linear" and "ordered" features
- N-true: allows compact specification of concepts of the form "If N of the
following M antecedents are true THEN ..." (a weight-setting sketch for such units follows below)
The specification of antecedents is recursive. Hence, any of these types
of antecedents may be nested within each other.
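Under the same weight convention as the earlier translation sketch (an illustrative assumption, not code from the paper), an N-true antecedent maps naturally onto a single unit: each of the M antecedents gets weight w, and the bias is chosen so the unit becomes active once at least N of them are true.

    W_RULE = 4.0  # same illustrative link weight as above

    def n_true_unit(m_antecedents, n_required):
        """Weights and bias for an 'N of the following M are true' unit."""
        weights = [W_RULE] * m_antecedents
        bias = -(n_required - 0.5) * W_RULE
        return weights, bias

    # With M = 5 and N = 3 the net input is 4k - 10 for k true
    # antecedents, which is positive exactly when k >= 3.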
5.1. Hybrid systems versus other learning systems
One hypothesis about hybrid learning systems is that they should be
more efficient than a system that learns from data alone. One aspect of
this efficiency hypothesis is that a system learning from both theory and data should
require fewer training examples than a system that learns only from data.
The ability to learn from relatively few training examples is important
because training examples are often hard to find or expensive to collect.
Thus, it is important for a learning system to be able to extract useful
generalizations from a small set of examples. This paper confirms the
hypothesis that hybrid systems can generalize from a small set of examples
more efficiently than standard backpropagation (requiring roughly one-third
the number of training examples that backpropagation needs). On the other hand, a hybrid
system with sparse domain knowledge does not improve upon an empirical learning
system. Also, the domain theory may prevent the network from learning a
solution that differs significantly from the solution proposed by the theory.
A further conclusion from the experiments is that standard backpropagation should
outperform KBANN only when large amounts of training data are available.
5.7. Comparing KBANN to standard backpropagation
Two hypotheses may account for KBANN's abilities with respect to backpropagation:
- Structure is responsible for KBANN's strength. That is, the topology of
a KBANN-net is better suited to learning in a particular domain than that of a
standard ANN (i.e., an ANN with a single layer of hidden units and complete
connectivity between layers). Experimental results (this paper) clearly
refute the hypothesis that the strength of KBANN arises solely from the
determination of network topology.
- Initial weights are responsible for KBANN's strength. That is, the initial
weights help a KBANN-net select critical features
of the domain, thereby simplifying learning and reducing problems resulting
from spurious correlations in the training examples. Experimental results
indicate that some, but not all, of the improvement of KBANN over standard
backpropagation results from its identification of important input features.
5.8. Discussion of the sources of KBANN's strength.
Tests (this paper) indicate that neither structure nor weight (focusing)
alone accounts for the superiority of KBANN-nets over standard ANNs. Therefore
the third hypothesis, that a combination of structure and focusing
weights gives KBANN its advantage over backpropagation, is likely true.
It is the combination of structure and weights that supplies the derived
features that give KBANN its power.
Other Hybrid Systems
6.2. Systems based on inductive logic programming.
There has been a recent proliferation of work in "inductive logic programming"
(e.g., [30]), often with the same goal as KBANN; namely, to
refine an existing domain theory using classified examples. As an
example of these systems, we use FOCL [37].
FOCL is an extension of FOIL [43] that allows the inductive learning mechanism
of FOIL to work on user-supplied background knowledge.
Hence, FOCL is able to add and delete relations from existing rules as
well as add entirely new rules to the existing domain theory. Using an
ID3-like information-gain heuristic, FOCL
is able to correct arbitrary types of errors in the provided domain theory
when given a set of positive and negative examples of the problem. Tests
on artificial domains show that FOCL, like KBANN, is tolerant of a variety
of errors in the initial domain theory and the training examples [39].
6.3. Systems based on Neural Networks
Many hybrid systems that use neural networks as their basis have been
developed recently. The focus of these systems has varied from reducing the
number of required examples [29], to probabilistic logic [24], to the propagation
of MYCIN-like certainty factors [21], to the refinement of systems for
fuzzy logic [5]. Rather than trying to review these diverse systems, we
focus only on two developments.
Fu [13] described a system very similar to KBANN in that it uses symbolic
knowledge in the form of rules to establish the structure and connection
weights of a neural network. However, Fu's system uses non-differentiable
activation functions to handle disjunctions in the rule sets. To tolerate
this non-differentiability, Fu carefully organized his network into layers
in which all of the units at a given layer are either differentiable
or not. Then, during training, different learning mechanisms are applied
to the different types of units. An empirical study in medical diagnosis
showed this technique to be quite effective. However, the method is closely
tied to its particular learning mechanism. Hence, his system is unable
to use developments in the methods of training neural networks (e.g., [59]).
Katz [18] describes a system which, much like KBANN, uses a knowledge base
that has been mapped (apparently by hand) into a neural network. The focus
of Katz's work is on improving the time required
to classify an object rather than on generalization.
This speed increase is achieved by building new connections that allow
his system to bypass intermediate processing stages. As a result, the initial
resemblance of the network to the knowledge base is rapidly lost during
learning. Thus, although the network is able to make corrections to its
initial knowledge, it is impossible to interpret those corrections as symbolic
rules that could be used in the construction of future networks.
6.4. Trends in hybrid systems research
There has been a pronounced trend in the development of hybrid systems.
Early systems such as UNIMEM [22] required correct theoretical information.
While subsequent systems eased the correctness requirement, they introduced
new constraints such as strict over-generality of the domain theory [12].
More recent systems such as KBANN allow arbitrary imperfections in theories.
On the data side, earlier systems required that data be noise free, while
more recent systems allow noisy examples. However, these reductions in the
quality requirements for both theory and data are generally accompanied
by the need for more examples.
In addition, a recent trend is that new hybrid systems tend to be built
on top of an established empirical learning system. This is the case
for Labyrinth-k, EITHER, FOCL, and KBANN. Earlier systems tended to have
specially-designed empirical components that were able to take advantage
of the strong constraints on theory and data that these systems made.
Limitations
KBANN's rule syntax is limited. KBANN cannot handle rules with cycles
or variables. One research area is the expansion of KBANN to admit certain
types of cyclic rules [23]. However, it is unlikely
that in the near future KBANN will be able to handle quantified variables
in its rules, as there are currently no appropriate techniques for
binding variables within a neural network. Inductive
logic programming systems such as FOIL [43], FOCL [37], and GOLEM [19] allow
relational rule sets; hence, they avoid the restrictions inherent to KBANN
(p. 34).
This limitation of KBANN motivates us to pay more attention to the
development of the Inductive Logic Programming approach and the related first-order
logic forecasting rules. Below we describe basic mathematical
definitions related to first-order logic.
References
[1] S. Ahmad, A Study of Scaling and Generalization in Neural Networks, Technical Report CCSR-88-13, University of Illinois, Center for Complex Systems Research (1988).
[2] L. Atlas, R. Cole, Y. Muthusamy, A. Lippman, J. Connor, D. Park, M. El-Sharkawi, and R. J. Marks, A performance comparison of trained multilayer perceptrons and trained classification trees, Proceedings of the IEEE 78 (1990) 1614--1619.
[3] J. Bachant and J. McDermott, R1 Revisited: Four Years in the Trenches, AI Magazine 5 (1984) 21--32.
[4] E. Barnard and R. A. Cole, A Neural-Net Training Program Based on Conjugate-Gradient Optimization, Technical Report CSE 89-014, Oregon Graduate Institute, Beaverton, OR (1989).
[5] H. R. Berenji, Refinement of Approximate Reasoning-Based Controllers by Reinforcement Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 475--479.
[6] L. A. Birnbaum and G. C. Collins, eds., Proceedings of the Eighth International Machine Learning Workshop, Morgan Kaufmann, Evanston, IL (1991).
[7] W. W. Cohen, Compiling Prior Knowledge into an Explicit Bias, in: Proceedings of the Ninth International Machine Learning Workshop, Aberdeen, Scotland (1992), 102--110.
[8] S. Cost and S. Salzberg, A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features, Machine Learning 10 (1993) 57--78.
[9] T. G. Dietterich, Learning at the Knowledge Level, Machine Learning 1 (1986) 287--316.
[10] S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, in: Advances in Neural Information Processing Systems, volume 2, Denver, CO (1989), 524--532.
[11] D. H. Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning 2 (1987) 139--172.
[12] N. S. Flann and T. G. Dietterich, A Study of Explanation-Based Methods for Inductive Learning, Machine Learning 4 (1989) 187--226.
[13] L. M. Fu, Integration of Neural Heuristics into Knowledge-Based Inference, Connection Science 1 (1989) 325--340.
[14] J. Gennari, P. Langley, and D. Fisher, Models of incremental concept formation, Artificial Intelligence 40 (1989) 11--61.
[15] G. E. Hinton, Connectionist Learning Procedures, Artificial Intelligence 40 (1989) 185--234.
[16] R. C. Holte, L. E. Acker, and B. W. Porter, Concept Learning and the Problem of Small Disjuncts, in: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI (1989), 813--819.
[17] IUB Nomenclature Committee, Ambiguity Codes, European Journal of Biochemistry 150 (1985) 1--5.
[18] B. F. Katz, EBL and SBL: A Neural Network Synthesis, in: Proceedings of the Eleventh Annual Conference of the Cognitive Science Society, Ann Arbor, MI (1989), 683--689.
[19] R. D. King, S. Muggleton, R. A. Lewis, and J. E. Sternberg, Drug Design by Machine Learning: The Use of Inductive Logic Programming to Model the Structure-Activity Relationships of Trimethoprim Analogues Binding to Dihydrofolate Reductase, Proceedings of the National Academy of Sciences, USA 89 (1992) 11322--11326.
[20] S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata, in: Automata Studies, C. E. Shannon and J. McCarthy (Eds.), 3--41, Princeton University Press, Princeton, NJ (1956).
[21] R. C. Lacher, S. I. Hruska, and D. C. Kuncicky, Backpropagation Learning in Expert Networks, IEEE Transactions on Neural Networks 3 (1992) 62--72.
[22] M. Lebowitz, Integrated Learning: Controlling Explanation, Cognitive Science 10 (1986) 219--240.
[23] R. Maclin and J. W. Shavlik, Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning 11 (1993) 195--213.
[24] J. J. Mahoney and R. J. Mooney, Combining connectionist and symbolic learning to refine certainty-factor rule bases, Connection Science 5 (1993) 339--364.
[25] W. S. McCulloch and W. A. Pitts, A Logical Calculus of Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics 5 (1943) 115--133.
[26] R. S. Michalski and G. Tecuci, eds., Proceedings of the First International Workshop on Multistrategy Learning, George Mason University, Harpers Ferry, WV (1991).
[27] J. Mingers, An Empirical Comparison of Pruning Methods for Decision Tree Induction, Machine Learning 4 (1989) 227--243.
[28] T. M. Mitchell, R. Keller, and S. Kedar-Cabelli, Explanation-Based Generalization: A Unifying View, Machine Learning 1 (1986) 47--80.
[29] T. M. Mitchell and S. B. Thrun, Explanation-Based Neural Network Learning for Robot Control, in: Advances in Neural Information Processing Systems, volume 5, Denver, CO (1992), 287--294.
[30] S. Muggleton, ed., Inductive Logic Programming, Academic Press, San Diego, CA (1992).
[31] S. Muggleton, R. D. King, and J. E. Sternberg, Protein Secondary Structure Prediction Using Logic-Based Machine Learning, Protein Engineering 5 (1992) 647--657.
[32] G. L. Murphy and D. L. Medin, The Role of Theories in Conceptual Coherence, Psychological Review 91 (1985) 289--316.
[33] M. O. Noordewier, G. G. Towell, and J. W. Shavlik, Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences, in: Advances in Neural Information Processing Systems, volume 3, Denver, CO (1991), 530--536.
[34] M. C. O'Neill, Escherichia coli Promoters: I. Consensus as It Relates to Spacing Class, Specificity, Repeat Substructure, and Three-Dimensional Organization, Journal of Biological Chemistry 264 (1989) 5522--5530.
[35] D. W. Opitz and J. W. Shavlik, Heuristically Expanding Knowledge-Based Neural Networks, in: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Chambery, France (1993).
[36] D. Ourston and R. J. Mooney, Theory refinement combining analytical and empirical methods, Artificial Intelligence 66 (1994) 273--310.
[37] M. Pazzani and D. Kibler, The Utility of Knowledge in Inductive Learning, Machine Learning 9 (1992) 57--94.
[38] M. J. Pazzani, The influence of prior knowledge on concept acquisition: Experimental and computational results, Journal of Experimental Psychology: Learning, Memory & Cognition 17 (1991) 416--432.
[39] M. J. Pazzani, C. A. Brunk, and G. Silverstein, A Knowledge-Intensive Approach to Learning Relational Concepts, in: Inductive Logic Programming, S. Muggleton (Ed.), 373--394, Academic Press (1992).
[40] F. J. Pineda, Generalization of Back-Propagation to Recurrent Neural Networks, Physical Review Letters 59 (1987) 2229--2232.
[41] L. Y. Pratt, J. Mostow, and C. A. Kamm, Direct Transfer of Learned Information Among Neural Networks, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA (1991), 584--589.
[42] J. R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81--106.
[43] J. R. Quinlan, Learning Logical Definitions from Relations, Machine Learning 5 (1990) 239--266.
[44] L. Rendell and H. Cho, Empirical Learning as a Function of Concept Character, Machine Learning 5 (1990) 267--298.
[45] B. L. Richards and R. J. Mooney, Refinement of first-order Horn-clause domain theories, Machine Learning (to appear).
[46] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, New York (1962).
[47] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland (Eds.), 318--363, MIT Press, Cambridge, MA (1986).
[48] R. Schank, G. C. Collins, and L. E. Hunter, Transcending Inductive Category Formation in Learning, Behavioral and Brain Sciences 9 (1986) 639--686.
[49] G. M. Scott, J. W. Shavlik, and W. H. Ray, Refining PID Controllers Using Neural Networks, Neural Computation 4 (1992) 746--757.
[50] J. W. Shavlik, R. J. Mooney, and G. G. Towell, Symbolic and Neural Net Learning Algorithms: An Empirical Comparison, Machine Learning 6 (1991) 111--143.
[51] J. W. Shavlik and G. G. Towell, An Approach to Combining Explanation-Based and Neural Learning Algorithms, Connection Science 1 (1989) 233--255.
[52] E. H. Shortliffe and B. G. Buchanan, A Model of Inexact Reasoning in Medicine, in: Rule-Based Expert Systems, B. G. Buchanan and E. H. Shortliffe (Eds.), 233--262, Addison-Wesley, Reading, MA (1984).
[53] G. D. Stormo, Consensus Patterns in DNA, in: Methods in Enzymology, volume 183, 211--221, Academic Press, Orlando, FL (1990).
[54] K. Thompson, P. Langley, and W. Iba, Using Background Knowledge in Concept Formation, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 554--558.
[55] G. G. Towell, Symbolic Knowledge and Neural Networks: Insertion, Refinement, and Extraction, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, WI (1991).
[56] G. G. Towell and J. W. Shavlik, Extracting Refined Rules from Knowledge-Based Neural Networks, Machine Learning 13 (1993) 71--101.
[57] G. G. Towell, J. W. Shavlik, and M. O. Noordewier, Refinement of Approximately Correct Domain Theories by Knowledge-Based Neural Networks, in: Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA (1990), 861--866.
[58] D. A. Waterman, A Guide to Expert Systems, Addison-Wesley, Reading, MA (1986).
[59] R. L. Watrous, Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization, in: Proceedings of the First International Conference on Neural Networks, volume II (1987), 619--627.
[60] J. D. Watson, H. H. Hopkins, J. W. Roberts, J. A. Steitz, and A. M. Weiner, The Molecular Biology of the Gene, Benjamin-Cummings, Menlo Park, CA (1987).
[61] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann, San Mateo, CA (1990).
[62] E. J. Wisniewski and D. L. Medin, Is It a Pocket or a Purse? Tightly Coupled Theory and Data Driven Learning, in: Proceedings of the Eighth International Machine Learning Workshop, Evanston, IL (1991), 564--569.
[63] D. H. Wolpert, On Overfitting Avoidance as Bias, Technical Report LAUR-90-3460, Santa Fe Institute, Santa Fe, NM (1992).