In today's world, Grammar-based code is a constantly talked about topic that affects people of all ages and in all parts of the world. Its impact does not go unnoticed and its relevance is undeniable in various aspects of daily life. Both on a personal and professional level, Grammar-based code has generated debate, has been the subject of study and has aroused the interest of numerous experts. Throughout history, Grammar-based code has evolved and adapted to social, political and technological changes, significantly influencing the way we face the challenges of the present and the future. In this article, we will explore in depth the impact of Grammar-based code and analyze its influence in different contexts, with the aim of better understanding its importance and the implications it has for today's society.
Lossless data compression algorithm
Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence , a grammar-based code transforms into a context-free grammar .
The problem of finding a smallest grammar for an input sequence (smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms are proposed from theoretical and practical viewpoints.
Generally, the produced grammar is further compressed by statistical encoders like arithmetic coding.
Examples and characteristics
The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel-Ziv code, and many other new universal lossless compression algorithms.
Grammar-based codes are universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source with a finite alphabet.
Practical algorithms
The compression programs of the following are available from external links.
Sequitur is a classical grammar compression algorithm that sequentially translates an input text into a CFG, and then the produced CFG is encoded by an arithmetic coder.
Re-Pair is a greedy algorithm using the strategy of most-frequent-first substitution. The compressive performance is powerful, although the main memory space requirement is very large.
GLZA, which constructs a grammar that may be reducible, i.e., contain repeats, where the entropy-coding cost of "spelling out" the repeats is less than the cost creating and entropy-coding a rule to capture them. (In general, the compression-optimal SLG is not irreducible, and the Smallest Grammar Problem is different from the actual SLG compression problem.)
^Kieffer, J. C.; Yang, E.-H. (2000), "Grammar-based codes: A new class of universal lossless source codes", IEEE Trans. Inf. Theory, 46 (3): 737–754, doi:10.1109/18.841160
^Charikar, M.; Lehman, E.; Liu, D.; Panigrahy, R.; Prabharakan, M.; Sahai, A.; Shelat, A. (2005), "The Smallest Grammar Problem", IEEE Trans. Inf. Theory, 51 (7): 2554–2576, doi:10.1109/tit.2005.850116, S2CID6900082
^Nevill-Manning, C. G.; Witten, I. H. (1997), "Identifying Hierarchical Structure in Sequences: A linear-time algorithm", Journal of Artificial Intelligence Research, 7 (4): 67–82, arXiv:cs/9709102, doi:10.1613/jair.374, hdl:10289/1186, S2CID2957960
^Conrad, Kennon J.; Wilson, Paul R. (2016). "Grammatical Ziv-Lempel Compression: Achieving PPM-Class Text Compression Ratios with LZ-Class Decompression Speed". 2016 Data Compression Conference (DCC). p. 586. doi:10.1109/DCC.2016.119. ISBN978-1-5090-1853-6. S2CID3116024.