Functional Linguistics of Biological Sequences

Up to now we have dealt formally only with the structural nature of nucleic acids, which is amenable to linguistic formulation because of its relative simplicity; we will find that a functional or informational view of the language of biological sequences is less clear-cut. This in no way weakens the results presented so far. The closure properties derived for operations on nucleic acids, for example, apply to any language encoded in DNA, or in any other string for which those operations are defined.

Rather, the greater richness of the language of genes and proteins makes the need for a well-founded descriptive paradigm all the greater. Moreover, it will be seen that the most interesting aspects of biological languages may reside at the point where structural and functional components interact.

A functional view will also allow us to expand our horizons beyond the relatively local phenomena of secondary structure, to large regions of the genome or even entire genomes (represented formally, perhaps, as strings derived by concatenation of chromosomes).

This will allow us in turn to reason linguistically about processes of evolution, at least at a conceptual level.

Partial Order of Superposition

It may be supposed that this distinction between structural and functional linguistics corresponds to the conventional one drawn between syntax and semantics.

There is much to recommend this view, insofar as gene products (i.e., proteins) and their biological activities may be thought of as the meaning of the information in genes, and perhaps entire organisms as the meaning of genomes. On the other hand, the gene grammars presented earlier clearly demonstrate a syntactic nature, and as such grammars are further elaborated with function-specific “motifs,” it may be difficult to make a sharp delineation between syntax and semantics.

Ultimately, the semantics of DNA may be based on evolutionary selection; a certain view of syntax may allow sequences that do not support life (or that support it only poorly), just as syntactically valid English sentences may nevertheless be nonsensical. The discussion that follows will not attempt to resolve where such a line should be drawn, though the potential utility of the distinction should perhaps be borne in mind.
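
The analogy with syntactically valid but nonsensical sentences can be made concrete with a small sketch. The checker below is a deliberately crude, hypothetical stand-in for a gene grammar, not one of the grammars presented earlier: it assumes that "syntax" means nothing more than an open reading frame over the alphabet A, C, G, T that begins with ATG, ends with one of the three standard stop codons, and contains no in-frame stop codon in between. The function name is_syntactic_orf and the example sequences are illustrative only.

```python
import re

# Standard stop codons of the nuclear genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_syntactic_orf(seq: str) -> bool:
    """Toy 'syntax' check: ATG, then in-frame codons, then a stop codon.

    This is an assumed, highly simplified notion of well-formedness;
    it says nothing about whether the encoded product has any function.
    """
    seq = seq.upper()
    # Only the four DNA letters, and a whole number of codons.
    if re.fullmatch(r"[ACGT]+", seq) is None or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if len(codons) < 2:
        return False
    starts_ok = codons[0] == "ATG"
    ends_ok = codons[-1] in STOP_CODONS
    # No stop codon may occur in frame before the final one.
    no_internal_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return starts_ok and ends_ok and no_internal_stop


# Well-formed under the toy syntax, yet it encodes only a short run of
# lysines, which one would not expect to be a useful gene product.
print(is_syntactic_orf("ATG" + "AAA" * 5 + "TAA"))

# Ill-formed under the toy syntax: an in-frame stop codon appears early.
print(is_syntactic_orf("ATGAAATAAAAATAA"))
```

Under these assumptions the first sequence is accepted and the second rejected. Acceptance reflects only the shape of the string; whether the encoded product would be of any use to an organism is exactly the question that such a syntax leaves open.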

