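Gene synthesis refers to the technology of artificially synthesizing double-stranded DNA molecules in vitro. Unlike cloning or the reverse transcription of mRNA, it does not require a physical template: the sequence is designed in silico and assembled de novo, typically from chemically synthesized oligonucleotides. Because any sequence can be specified, intelligent synthetic gene design is critical to genetic engineering and to the efficient production of recombinant proteins in heterologous hosts.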

Unfortunately, not all genes can be expressed successfully and efficiently in heterologous expression systems. Intrinsic sequence characteristics, including mRNA stability, codon bias, GC content, and mRNA secondary structure, play important and sometimes unexpected roles in regulating translation. The genetic code consists of 64 trinucleotide codons: 61 sense codons that encode the 20 standard amino acids, plus three stop codons. Because of this degeneracy, most amino acids are encoded by more than one synonymous codon, so many different mRNA sequences can encode the same protein. Codon optimization, the alteration of codons within a gene to improve recombinant protein expression, is therefore an important part of efficient synthetic gene design.
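The degeneracy of the code is easy to verify computationally. The following Python sketch builds the standard genetic code (NCBI translation table 1, written with DNA bases, so T stands in for U) as a dictionary and counts the synonymous codons for each amino acid. The names `CODON_TABLE`, `synonymous_codons`, and `degeneracy` are illustrative and not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_codons(aa: str) -> list[str]:
    """Return every codon that encodes the one-letter amino acid `aa`."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)

# Number of codons per amino acid, excluding stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))          # 64
print(sum(degeneracy.values()))  # 61
print(len(degeneracy))           # 20
print(synonymous_codons("L"))    # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one, so codon choice offers far more freedom for some residues than for others.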

The Origins of Codon Usage Bias

Codon usage bias is the uneven use of synonymous codons, and the preferred codons differ between organisms. In Escherichia coli and Saccharomyces cerevisiae (yeast), certain synonymous codons are preferred; these optimal codons tend to be read by the most abundant tRNAs in the cell, or by the tRNAs that pair with them most efficiently, whereas rarely used codons tend to be read by scarce tRNAs. Why highly expressed genes are enriched in preferred codons is still not fully understood. One conventional view is that optimal codons are translated faster than rare codons, increasing translation efficiency. An alternative hypothesis is that preferred codons increase translation accuracy.
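One common way to quantify this bias is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all of its synonyms were used equally, so a value of 1.0 means no preference, values above 1.0 mean the codon is favored, and values below 1.0 mean it is avoided. The sketch below assumes an in-frame DNA coding sequence and the standard genetic code; the function names `codon_counts` and `rscu` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

def rscu(cds: str) -> dict[str, float]:
    """RSCU = observed count / (family total / family size), for every codon
    of each amino acid that occurs in `cds`; stop codons are ignored."""
    counts = codon_counts(cds)
    result = {}
    for aa in set(CODON_TABLE.values()) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result

values = rscu("ATGCTGCTGCTGTTAAAA")
print(values["CTG"], values["TTA"], values["CTC"])  # 4.5 1.5 0.0
```

In practice RSCU is computed over many genes, for example a host's highly expressed genes, rather than over a single short sequence as in this toy example.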

The Functional Impact of Codon Optimization

Codon usage bias is correlated with gene expression levels. In heterologous protein expression, the gene of interest is often overexpressed, and its product can account for up to 30% of the cell’s total protein. Attempts to increase protein yield by changing synonymous codon choice led to the broad use of codon-optimized mRNAs. Originally, genes were optimized by replacing rare codons with synonymous codons that the host uses more frequently, and such optimization was found to increase expression of the corresponding proteins in both plant and mammalian cells. Conversely, the expression of viral proteins has been found to decrease after substitution of synonymous codons or of pairs of adjacent codons. The many open questions surrounding codon optimization may have profound significance for developing novel methods of vaccine design.
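The original strategy, replacing rare codons with the host's preferred synonyms, can be sketched in a few lines of Python. The host usage table is a hypothetical placeholder (real tables are usually derived from the host genome or its highly expressed genes), and the 0.10 cutoff for a "rare" codon is an arbitrary illustrative choice; `replace_rare_codons` is not an existing library function.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# HYPOTHETICAL host usage: relative frequency of each codon within its synonymous
# family. Placeholder values for illustration only, not measured for any organism.
HOST_USAGE = {
    "CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04,  # Leu
    "AAA": 0.75, "AAG": 0.25,                                                    # Lys
    "CGC": 0.40, "CGT": 0.38, "CGG": 0.10, "CGA": 0.06, "AGA": 0.04, "AGG": 0.02,  # Arg
}

def replace_rare_codons(cds: str, usage: dict[str, float], rare_threshold: float = 0.10) -> str:
    """Swap each codon whose host frequency is below `rare_threshold` for the
    host's most frequent synonymous codon. Amino acids with no entries in
    `usage` (here Met and stop) keep their original codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        family = [c for c, aa in CODON_TABLE.items()
                  if aa == CODON_TABLE[codon] and c in usage]
        if family and usage.get(codon, 0.0) < rare_threshold:
            codon = max(family, key=lambda c: usage[c])
        out.append(codon)
    return "".join(out)

print(replace_rare_codons("ATGCTAAGAAAGTAA", HOST_USAGE))  # ATGCTGCGCAAGTAA
```

Because only synonymous codons are swapped, the encoded amino acid sequence is unchanged; in this example the rare CTA (Leu) and AGA (Arg) codons are replaced, while AAG stays because it is above the cutoff.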

Strategies of Codon Optimization

A variety of approaches and software programs can be used to design codon-optimized mRNA sequences. Each must address two questions: how codon usage is quantified, and how codon substitutions are then made. Synthetic codon optimization tends to substitute rare codons with synonymous codons that the host uses more frequently. A variation known as codon harmonization instead aims to reproduce the gene's native codon usage pattern in the host: codons that are rare in the original organism are replaced with codons that are similarly rare in the expression host, and frequent codons with frequent ones.
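To contrast the two strategies, the sketch below implements a simplified form of codon harmonization: each codon is mapped to the host synonym whose host frequency is closest to that codon's frequency in the native organism. Both usage tables are hypothetical placeholders, and published harmonization methods differ in how they measure and match codon frequencies, so this closest-frequency rule (and the name `harmonize`) is only one illustrative interpretation.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# HYPOTHETICAL relative codon frequencies within each synonymous family.
# Placeholder values for illustration, not measured data for any organism.
NATIVE_USAGE = {  # organism the gene comes from
    "CTT": 0.40, "CTA": 0.30, "TTA": 0.10, "TTG": 0.10, "CTG": 0.05, "CTC": 0.05,  # Leu
    "AAG": 0.80, "AAA": 0.20,                                                    # Lys
}
HOST_USAGE = {  # expression host
    "CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04,  # Leu
    "AAA": 0.75, "AAG": 0.25,                                                    # Lys
}

def harmonize(cds: str, native: dict[str, float], host: dict[str, float]) -> str:
    """For each codon, choose the host synonym whose host frequency is closest
    to that codon's native frequency, so rare codons stay rare and frequent
    codons stay frequent. Codons missing from either table are kept."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        family = [c for c, aa in CODON_TABLE.items()
                  if aa == CODON_TABLE[codon] and c in host]
        if family and codon in native:
            target = native[codon]
            codon = min(family, key=lambda c: abs(host[c] - target))
        out.append(codon)
    return "".join(out)

print(harmonize("CTTCTGAAG", NATIVE_USAGE, HOST_USAGE))  # CTGCTAAAA
```

Here the native CTG, which is rare in the source organism, becomes CTA, which is rare in the host, so the gene keeps its original pattern of frequent and rare codons rather than being made uniformly frequent.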

However, optimizing codon usage alone is not sufficient to design synthetic genes for high protein expression. Many other factors can interfere; for example, stable mRNA secondary structure, particularly near the start codon, can impede translation. Additionally, cryptic splice sites, polyadenylation signals, and other regulatory elements should be avoided, as they can lead to undesirable processing of the mRNA. GC content directly affects the duplex stability and annealing temperature of DNA sequences. Translation initiation and termination efficiency also influence protein output and solubility. Only by taking all of the above factors into consideration can codon optimization for gene synthesis achieve its full value.
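Some of these checks are simple to automate. The Python sketch below computes overall and sliding-window GC content and scans for sequence motifs to avoid. The motif list and the 30 to 70% window thresholds are illustrative assumptions rather than universal rules: AATAAA and ATTAAA are common polyadenylation signal hexamers, while GTAAGT merely stands in for a splice-donor-like consensus. A real design tool would use host-specific rules and would also model mRNA secondary structure, which this sketch does not; all function names are hypothetical.

```python
import re

# Illustrative motifs to avoid (assumption: a real tool would use a richer,
# host-specific list and would not rely on exact matches alone).
AVOID_MOTIFS = {
    "AATAAA": "polyadenylation signal",
    "ATTAAA": "polyadenylation signal variant",
    "GTAAGT": "splice-donor-like consensus (simplified)",
}

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_outlier_windows(seq: str, window: int = 50,
                       low: float = 0.30, high: float = 0.70) -> list[int]:
    """Start positions of windows whose GC fraction lies outside [low, high]."""
    return [i for i in range(len(seq) - window + 1)
            if not low <= gc_content(seq[i:i + window]) <= high]

def find_motifs(seq: str) -> list[tuple[int, str, str]]:
    """Return (position, motif, description) for every hit, including overlaps."""
    seq = seq.upper()
    hits = []
    for motif, description in AVOID_MOTIFS.items():
        # A zero-width lookahead lets re.finditer report overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), motif, description))
    return sorted(hits)

seq = "ATGGCCAATAAAGCGTAAGTGC"
print(round(gc_content(seq), 2))              # 0.45
print([pos for pos, _, _ in find_motifs(seq)])  # [6, 14]
```

Flagged positions are only candidates for redesign; each hit would then be removed by choosing different synonymous codons, which keeps the protein sequence intact.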