Feedforward Neural Networks With Constrained Weights

Altaf H. Khan
Department of Engineering, University of Warwick, Coventry, CV4 7AL, England

August 1996

Please email your comments or questions about this thesis to altaf@altafkhan.com

The complete thesis (218 pages) is available here as a PDF file (5 MB)


PhD Thesis Abstract

The conventional multilayer feedforward network with continuous weights is expensive to implement in digital hardware. Two new types of network are proposed which lend themselves to cost-effective hardware implementation and have a fast forward pass. The two differ from the conventional model in the extra constraints placed on their weights: the first allows its weights to take integer values in the range [-3, 3] only, whereas the second restricts its synapses to the set {-1, 0, 1} while allowing unrestricted offsets. The benefits of the first configuration are weights that are only 3 bits deep and a multiplication operation requiring at most one shift, one add, and one sign-change instruction. The advantages of the second are 1-bit synapses and a multiplication operation consisting of a single sign-change instruction.
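To make the hardware-cost argument concrete, the sketch below (illustrative only, not code from the thesis) shows how multiplication by an integer weight in [-3, 3] reduces to at most one shift, one add, and one sign change, assuming a fixed-point (integer) activation x:

    def times_weight(x, w):
        # Multiply integer activation x by a weight w in [-3, 3]
        # using only shifts, adds, and sign changes.
        m = abs(w)
        if m == 0:
            p = 0
        elif m == 1:
            p = x                 # no arithmetic needed
        elif m == 2:
            p = x << 1            # one shift
        else:                     # m == 3
            p = (x << 1) + x      # one shift and one add
        return -p if w < 0 else p  # at most one sign change

For the multiplier-free network, whose synapses lie in {-1, 0, 1}, only the final sign-change step is ever exercised.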

The procedure proposed for training these networks starts like the conventional error backpropagation procedure, but becomes increasingly discretised in its behaviour as the network approaches an error minimum. Based mainly on steepest descent, it also has a perturbation mechanism for escaping local minima and a novel mechanism for rounding off `near integers'. It incorporates weight elimination implicitly, which simplifies the choice of the start-up network configuration for training.
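The abstract does not spell the procedure out in detail; the following sketch is one plausible reading of a single update step, with the snap tolerance and perturbation size invented purely for illustration:

    import numpy as np

    def update_step(w, grad, lr=0.1, snap_tol=0.05):
        # Steepest-descent step, then round off `near integers'
        # and clip to the permitted weight range [-3, 3].
        w = np.clip(w - lr * grad, -3.0, 3.0)
        near = np.abs(w - np.round(w)) < snap_tol
        w[near] = np.round(w[near])
        return w

    def perturb(w, jolt=0.5, rng=None):
        # Random jolt for escaping a local minimum.
        rng = rng or np.random.default_rng()
        return np.clip(w + rng.uniform(-jolt, jolt, w.shape), -3.0, 3.0)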

It is shown that the integer-weight network, although it lacks the universal approximation capability, can perform learning tasks, especially classification tasks, to acceptable accuracies. A new theoretical result is presented which shows that the multiplier-free network is a universal approximator over the space of continuous functions of one variable. In the light of experimental results, it is conjectured that the same is true for functions of many variables.

Decision and error surfaces are used to explore discrete-weight approximations of continuous-weight networks under discretisation schemes other than integer weights. The results suggest that, provided a suitable discretisation interval is chosen, a discrete-weight network can be found which performs as well as a continuous-weight network, although it may require more hidden neurons than its conventional counterpart.
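A uniform discretisation scheme of the kind discussed maps each continuous weight onto a grid of interval delta; the helper below is an illustrative rendering (with delta = 1 it recovers integer weights):

    import numpy as np

    def discretise(weights, delta):
        # Snap continuous weights to a uniform grid of interval delta.
        return delta * np.round(np.asarray(weights) / delta)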

Experiments are performed to compare the generalisation performance of the new networks with that of the conventional one on three very different benchmarks: the MONK's benchmark, a set of artificial tasks designed to compare the capabilities of learning algorithms; the `onset of diabetes mellitus' prediction data set, a realistic set with very noisy attributes; and the handwritten numeral recognition database, a realistic but highly structured data set. The results indicate that the new networks, despite the strong constraints on their weights, have generalisation performances similar to those of their conventional counterparts.