Document Type
Article
Department or Administrative Unit
Computer Science
Publication Date
1-9-2022
Abstract
The information bottleneck principle was recently proposed as a theory meant to explain some of the training dynamics of deep neural architectures. Information plane analysis reveals patterns in this framework, in which two phases can be distinguished: fitting and compression. We take a step further and study the behaviour of the spatial entropy characterizing the layers of convolutional neural networks (CNNs), in relation to the information bottleneck theory. We observe pattern formations that resemble the information bottleneck fitting and compression phases. From the perspective of semiotics, the study of signs and sign-using behavior, the saliency maps of CNN layers exhibit aggregations: signs are aggregated into supersigns, a process called semiotic superization. Superization can be characterized by a decrease of entropy and interpreted as information concentration. We discuss the information bottleneck principle from the perspective of semiotic superization and identify analogies related to the informational adaptation of the model. In a practical application, we introduce a modification of the CNN training process: we progressively freeze the layers whose saliency map representations show small entropy variation. Training of such layers can be stopped early without a significant impact on the performance (accuracy) of the network, which connects the evolution of entropy over time with the training dynamics of the network.
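The freezing criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `window` and `tol` parameters, and the use of a plain histogram-based Shannon entropy in place of the paper's spatial entropy are all assumptions made for illustration.

```python
import math
from collections import Counter


def histogram_entropy(saliency_map, bins=8):
    """Shannon entropy (in bits) of a 2D map with values in [0, 1].

    Assumption: a quantized intensity histogram stands in for the
    spatial entropy used in the article.
    """
    values = [v for row in saliency_map for v in row]
    # Quantize each value into one of `bins` levels; clamp 1.0 into the top bin.
    levels = [min(int(v * bins), bins - 1) for v in values]
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def layers_to_freeze(entropy_history, window=3, tol=0.05):
    """Return the layers whose entropy barely changed recently.

    `entropy_history` maps a (hypothetical) layer name to the list of
    entropy values recorded after each epoch. A layer qualifies when the
    spread (max - min) over the last `window` epochs is below `tol`.
    """
    frozen = []
    for name, history in entropy_history.items():
        if len(history) < window:
            continue  # not enough epochs observed yet
        recent = history[-window:]
        if max(recent) - min(recent) < tol:
            frozen.append(name)
    return frozen


# Hypothetical per-epoch entropies for two layers.
history = {
    "conv1": [2.0, 1.5, 1.49, 1.5],  # has settled
    "conv2": [2.0, 1.8, 1.5, 1.2],   # still decreasing
}
candidates = layers_to_freeze(history)
```

In a training loop, the layers returned by `layers_to_freeze` would have their parameters excluded from further gradient updates, while the remaining layers continue to train.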
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Recommended Citation
Musat, B., & Andonie, R. (2022). Information Bottleneck in Deep Learning - A Semiotic Approach. International Journal of Computers Communications & Control, 17(1). https://doi.org/10.15837/ijccc.2022.1.4650
Journal
International Journal of Computers Communications & Control
Rights
Copyright © 2022 by the authors.
Comments
This article was originally published Open Access in International Journal of Computers Communications & Control. The full-text article is available from the publisher at https://doi.org/10.15837/ijccc.2022.1.4650.