Natural Products

Natural products are chemical compounds and substances produced by living organisms, and are omnipresent in our everyday lives. Many natural products, called secondary metabolites, are not essential for normal growth of the organism that makes them, but assist in predator defence, interspecies competition, or reproduction. These naturally occurring compounds represent an important source for the development of new therapeutic agents (1), but their discovery is a time-consuming and labor-intensive process. Activity-guided screening methods and culture-based approaches often re-isolate known compounds at high rates and yield few new metabolites.

Traditional methods have shown that nature can provide many surprising new compounds, yet even well-known organisms can harbor cryptic gene clusters whose chemical products have yet to be discovered (2). Increasing numbers of fully sequenced genomes, along with improved understanding of the genetics and enzymology of secondary metabolism, present an opportunity to enhance natural product discovery through systematic bioinformatic data analysis.

Penicillin core
Well-known examples of natural products include caffeine and penicillin.

Polyketides and Nonribosomal Peptides

Two major classes of bacterial secondary metabolites are polyketides and nonribosomal peptides. These compounds are often biosynthesized by large, multifunctional enzymes that sequentially construct products in an assembly line process from small carboxylic acid and amino acid building blocks (3).

Non-ribosomal peptide synthetases (NRPS) are large multi-enzyme complexes. They have a modular structure, with each module being responsible for the activation, thiolation, modification and condensation of one specific amino acid. Each module consists of a number of domains, two of which are commonly referred to as A (adenylation) and C (condensation). The A domain activates a specific amino acid (analogous to tRNA charging) and transfers it to the peptidyl carrier protein (PCP), which holds the growing peptide as a thioester. The C domain forms a peptide bond between the next aminoacyl unit and the peptidyl unit. Modifying domains for epimerisation, heterocyclisation or oxidation may additionally be integrated (4).
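The assembly-line logic described above can be sketched as a small data model. This is only an illustrative sketch: the class name `NRPSModule`, the function `assemble_peptide`, the one-letter domain label "E" for an epimerisation domain, and the example amino acids are all assumptions made for this example, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NRPSModule:
    """One NRPS module: the amino acid its A domain selects, plus its domains."""
    amino_acid: str
    domains: tuple = ("C", "A", "PCP")  # hypothetical labels for this sketch


def assemble_peptide(modules):
    """Walk the assembly line and return the residues in the order they are added."""
    peptide = []
    for position, module in enumerate(modules):
        # Every module needs an A domain (activation) and a PCP (thioester carrier).
        missing = {"A", "PCP"} - set(module.domains)
        if missing:
            raise ValueError(f"module {position + 1} lacks {sorted(missing)}")
        # Elongation modules need a C domain to form the peptide bond.
        if position > 0 and "C" not in module.domains:
            raise ValueError(f"module {position + 1} needs a C domain to elongate")
        # An optional epimerisation domain converts the residue to the D form.
        chirality = "D" if "E" in module.domains else "L"
        peptide.append(f"{chirality}-{module.amino_acid}")
    return peptide


# Hypothetical three-module assembly line; the first module has no C domain.
line = [
    NRPSModule("Val", ("A", "PCP")),
    NRPSModule("Phe", ("C", "A", "PCP", "E")),
    NRPSModule("Leu"),
]
print(assemble_peptide(line))
```

The model deliberately ignores heterocyclisation and oxidation domains and treats each module as adding exactly one residue, which is the simplest reading of the modular structure described here.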

Polyketides can be assembled by modular polyketide synthases (PKS), which are functionally related to NRPS. In contrast to peptides, which are built from amino acids, polyketides are assembled from acyl units. Selection of the monomers is performed by acyltransferase (AT) domains. Domains responsible for the elongation step are termed ketosynthase (KS) domains (5).

Domains of PKS and NRPS (figure legend): KS, ketosynthase; AT, acyltransferase; ACP, acyl carrier protein.

Phylogeny of KS and C domains

Domain-specific phylogenetic analyses of the different modules have shown the elongation domains in NRPS and PKS assembly lines, called C and KS domains, respectively, to be the most informative in terms of predicting pathway associations. Protein sequences for the majority of these domains are grouped in pathway-specific clades according to their chemical activity (6,7).

The phylogeny of KS domains delineates the major classes of polyketide synthases according to the architecture of their biosynthetic enzymes. Type I PKSs possess a multidomain architecture that consists of either sets of modules corresponding to the number of acyl units in the product, or a single set of catalytic domains that act iteratively. Type II PKSs carry each catalytic site on a separate protein (8).

The phylogeny of NRPS C domains clearly reflects the six known functional categories. An LCL domain catalyzes formation of a peptide bond between two L-amino acids, a DCL domain links an L-amino acid to a growing peptide ending with a D-amino acid, and heterocyclization domains catalyze both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues. The three other subtypes of C domains are starter domains, epimerization domains and dual epimerization/condensation domains (6).
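The distinction between LCL and DCL domains follows from the chirality of the upstream residue, which can be made concrete with a short sketch. The function name `expected_c_subtypes` and its boolean input are hypothetical, and the sketch assumes that only LCL and DCL domains are in play, ignoring starter, heterocyclization and dual domains.

```python
def expected_c_subtypes(epimerized):
    """Predict the C-domain subtype of each elongation module.

    epimerized[i] is True when module i converts its residue to the D form.
    The C domain of module i + 1 sees that residue at the end of the growing
    peptide, so it is expected to be DCL after a D residue and LCL otherwise.
    Returns one subtype per module from the second module onward.
    """
    return ["DCL" if upstream_is_d else "LCL" for upstream_is_d in epimerized[:-1]]


# Hypothetical assembly line in which only the second module epimerizes.
print(expected_c_subtypes([False, True, False]))
```

In this reading, a DCL domain is expected only immediately downstream of a module that delivers a D-amino acid.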

KS Domain Subgroup Clades

KS Type I

Type I PKSs contain multiple catalytic centers located on a single polypeptide chain, organized into domains and modules. They can act iteratively or in a modular fashion and are divided into several subgroups.

Cis-AT modular PKSs possess a multidomain architecture consisting of multiple sets of modules. Each module is responsible for the incorporation of one building block and contains at least three domains: KS, AT and ACP (8).

Enediynes are a family of biologically active natural products. The enediyne core consists of two acetylenic groups conjugated to a double bond or incipient double bond within a nine- or ten-membered ring (11).

Trans-AT modular PKS operons lack cognate AT domains; this activity is provided instead by a discrete protein encoded in trans.
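The difference between cis-AT and trans-AT modules comes down to whether the module carries its own AT domain, which a minimal sketch can express. The function `classify_pks_module` and its return labels are hypothetical; it looks only at domain content, whereas the classification described in this document is based on KS phylogeny.

```python
def classify_pks_module(domains):
    """Label a modular PKS module by its domain content alone.

    A cis-AT module carries KS, AT and ACP; a trans-AT module carries KS and
    ACP but relies on a discrete AT protein encoded elsewhere.
    """
    present = set(domains)
    if not {"KS", "ACP"} <= present:
        return "incomplete"  # not an elongation module in this simple model
    return "cis-AT" if "AT" in present else "trans-AT"


print(classify_pks_module(["KS", "AT", "ACP"]))  # module with its own AT domain
print(classify_pks_module(["KS", "ACP"]))        # AT activity supplied in trans
```

Additional domains in the input do not change the result, since only the presence of KS, AT and ACP is checked.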

Hybrids are biosynthetic assembly lines that include both PKS and NRPS components. KS domains found in hybrid assembly lines usually fall into different clades, with the exception of KS domains situated downstream of a PCP domain, which catalyze a condensation reaction between an amino acid and an acyl precursor. These KS domains are designated as hybrids, and form a monophyletic group.

Iterative type I PKSs contain the characteristic domains of type I PKSs. However, they act iteratively and reuse domains in a cyclic fashion.

KS1 domains form a monophyletic group containing a variety of KS domains that are all present in the first module of assembly lines. Typical starter KS (KSQ) domains belong to this group, as do KS domains from the curacin and salinisporamide biosynthesis pathways, which incorporate unusual precursors.

Polyunsaturated fatty acids (PUFAs) are long-chain fatty acids containing more than one double bond, including omega-3 and omega-6 fatty acids. PUFAs are not considered polyketides. However, the enzymes that synthesize them, like the fatty acid synthases, are related to PKSs and are therefore included in this tree.

KS Type II

In type II PKSs, each catalytic site is carried on a distinct protein. Type II KS domains fall into several subgroups: KS-alpha (KSa) domains perform the condensation steps, while KS-beta (KSb) domains, also known as chain length factors, determine the number of iterative condensation steps that occur. KSs involved in spore pigment biosynthesis form a specific subgroup of KS-alpha (SP) (10).

Fatty acid synthases

It has been shown that fatty acid synthases and the PKSs that produce secondary metabolites share a common evolutionary history (8). For this reason, hidden Markov models (HMMs) built from polyketide synthases often detect KS domains involved in fatty acid synthesis as well. KSs from the E. coli fatty acid synthase genes FabB and FabD have been included in our KS reference tree to show this relationship.


C Domains


The first module of a non-ribosomal peptide synthetase (NRPS) usually does not contain a C domain. When a C domain is present there, it is a starter domain, which acylates the first amino acid with a fatty acid, polyketide or other molecule.


LCL domains catalyze formation of a peptide bond between two L-amino acids.


DCL domains link an L-amino acid to a growing peptide ending with a D-amino acid.


Cyclization domains catalyze both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues.


Epimerization domains change the chirality of the last amino acid in the chain from L to D.


Dual domains catalyze both condensation and epimerization.

Modified amino acid

Modified amino acid domains (modAA) appear to be involved in the modification of the incorporated amino acid, for example the dehydration of serine to dehydroalanine.


Hybrid C domains are each located downstream of an aminotransferase domain and appear to be involved in the condensation of an amino acid to an aminated polyketide, resulting in a hybrid PKS/NRPS secondary metabolite.



  1. Li JW, Vederas JC (2009) Drug discovery and natural products: end of an era or an endless frontier? Science 325, 161-165.
  2. Zerikly M, Challis GL (2009) Strategies for the discovery of new natural products by genome mining. Chembiochem 10, 625-633.
  3. Fischbach MA, Walsh CT (2006) Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: logic, machinery, and mechanisms. Chem Rev 106, 3468-3496.
  4. Schwarzer D, Finking R, Marahiel MA (2003) Nonribosomal peptides: from genes to products. Nat Prod Rep 20, 275-287.
  5. Hertweck C (2009) The biosynthetic logic of polyketide diversity. Angew Chem Int Ed Engl 48, 4688-4716.
  6. Rausch C, Hoof I, Weber T, Wohlleben W, Huson DH (2007) Phylogenetic analysis of condensation domains in NRPS sheds light on their functional evolution. BMC Evol Biol 7, 78.
  7. Jenke-Kodama H, Dittmann E (2009) Evolution of metabolic diversity: Insights from microbial polyketide synthases. Phytochemistry.
  8. Jenke-Kodama H, Sandmann A, Mueller R, Dittmann E (2005) Evolutionary implications of bacterial polyketide synthases. Mol Biol Evol 22, 2027-2039.
  9. Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press.
  10. Metsa-Ketela M, et al. (2002) Molecular evolution of aromatic polyketides and comparative sequence analysis of polyketide ketosynthase and 16S ribosomal DNA genes from various streptomyces species. Appl Environ Microbiol 68, 4472-4479.
  11. Shen B (2003) Polyketide biosynthesis beyond the type I, II, and III polyketide synthase paradigms. Curr Opin Chem Biol 7, 285-295.