---
title: Chemodiversity
subtitle: A short overview of this project
author: Stefan Dresselhaus
license: BSD
affiliation: Theoretic Biology Group
  Bielefeld University
abstract: Attempt to find indications for chemodiversity in the plant secondary metabolism according to the screening hypothesis
date: \today
papersize: a4
fontsize: 10pt
documentclass: scrartcl
margin: 0.2
slideNumber: true
...

What is chemodiversity?
-----------------------

- It was observed that many plants seem to produce many compounds with no obvious purpose
- Using resources to produce such compounds (instead of, e.g., growing) should yield a fitness disadvantage
- One expects evolution to eliminate such behavior

Question: Why is this behavior observed?
----------------------------------------

- Are these compounds necessary for some unresearched reason?
  - unknown environmental effects?
  - unknown intermediate products for necessary defenses?
- speculative diversity because they could be useful after genetic mutations?

Screening Hypothesis
--------------------

- First suggested by Jones & Firn ([1991](https://doi.org/10.1098/rstb.1991.0077))
- new (random) compounds are rarely biologically active
- plants have a higher chance of finding an active compound if they diversify
- many (inactive) compounds are sustained for a while because they may be precursors to biologically active substances

. . .

There are indications for and against this hypothesis by [various groups](https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.12526#nph12526-bib-0093).

--------------------------------------------------------------------------------

Setting up a simulation
=======================

>If you wish to make an apple pie from scratch, you must first invent the universe
>
> - Carl Sagan

--------------------------------------------------------------------------------

Defining Chemistry
------------------

- First of all, we define the chemistry of our environment, so we know all possible interactions and can manipulate them at will.
- We differentiate between **`Substrate`{.haskell}** and **`Products`{.haskell}**:
  - **`Substrate`{.haskell}** can just be used (e.g.
    real substrates if the whole metabolism should be simulated, **`PPM`{.haskell}**^[1]^ in our simplified case)
  - **`Products`{.haskell}** are nodes in our chemistry environment.
- In code:

  ```haskell
  data Compound = Substrate Nutrient
                | Produced Component
                | GenericCompound Int
  ```

::: footer
^[1]^: the plant's primary metabolism
:::

Usage in the current Model
--------------------------

- The model used for evaluation has just one `Substrate`{.haskell}: `PPM`{.haskell}, with a fixed amount, to account for the effects of draining primary-metabolism products out of the primary metabolic cycle
  - This is used to simulate, e.g., worse growth, reduced fertility and other things affecting the fitness of a plant.
- We are not using named compounds, but restrict ourselves to generic `Compound 1`{.haskell}, `Compound 2`{.haskell} ...
- Not done, but worth exploring:
  - Take a "real-world" snapshot of nutrients and compounds and recreate them
  - See if the simulation follows the real world

Defining a Metabolism
---------------------

- We define **`Enzyme`{.haskell}s** as
  - having a recipe for a chemical reaction
  - being reversible
  - possibly depending on catalysts being present
  - possibly having higher dominance over other enzymes with the same reaction
- Inputs can be `Substrate`{.haskell} and/or `Products`{.haskell}
- Outputs can only be `Products`{.haskell}
- $\Rightarrow$ This makes them edges in a graph connecting the chemical compounds

Usage in the current Model
--------------------------

- `Enzyme`{.haskell}s all
  - only map `1`{.haskell} input to `1`{.haskell} output with a production rate of `1`{.haskell} per `Enzyme`{.haskell} (e.g.
    `-1 Compound 2 -> +1 Compound 5`{.haskell})
  - are equally dominant
  - need no catalysts

Defining Predators
------------------

- **`Predator`{.haskell}s** consist of
  - a list of `Compound`{.haskell}s that can kill them
  - a fitness impact ($[0..1]$) as the probability of killing the plant
  - an expected number of attacks per generation
  - a probability ($[0..1]$) of appearing in a single generation
- `Predator`{.haskell}s need not necessarily be biologically motivated
  - e.g. rare, nearly devastating events (floods, droughts, ...) with realistic probabilities

Example Environment
-------------------

:::::::::::::: {.columns}
::: {.column width=37%}

- The complete environment now consists of
  - `Compound`{.haskell}s: ![](img/compound_example.png){style="vertical-align:middle"}
  - `Enzyme`{.haskell}s: ![](img/enzyme_example.png){style="vertical-align:middle"}
  - `Predator`{.haskell}s: ![](img/predator_example.png){style="vertical-align:middle"}

:::
::: {.column width=63% .fragment}

![Our default test-environment](img/environment.tree.png){width=75%}

Additional rules:

- Every "subtree" from the marked `PPM`{.haskell} is treated as a separate species (fungi, animals, ...)
  $\Rightarrow$ Every predator can only be affected by toxins in the same part of the tree
- Trees can be automatically generated in a decent manner to search for environments where specific effects may arise

:::
::::::::::::::

::::: notes :::::

CTRL+Click for zoom!

- Everything starts at PPM (Plant Primary Metabolism)
- Red = Toxic
- Blue = Predators

::::

--------------------------------------------------------------------------------

Plants
------

A **`Plant`{.haskell}** consists of

- a **`Genome`{.haskell}**, a simple list of genes
  - Triple of `(Enzyme, Quantity, Activation)`{.haskell}
  - without order or locality (e.g.
    interference of neighboring genes)
  - `Quantity`{.haskell} is just an optimization (an `Int`{.haskell}) to group genes with identical `Activation`{.haskell}
  - `Activation`{.haskell} is a float $\in [0..1]$ to regulate the activity of the `Enzyme`{.haskell} genetically
- an `absorbNutrients`{.haskell} function to simulate various effects when absorbing nutrients out of the environment, depending on the environment (i.e. it *can* use information about chemistry, predators, etc.)
  - Not used in our simulation, as we only have `PPM`{.haskell} as "nutrient" and we take everything given to us.

Metabolism simulation
---------------------

Creation of compounds from the given resources is an iterative process:

- First of all, we create a conversion matrix $\Delta_c$ with a corresponding start vector $s_0$.
- We now iterate $s_i = (\mathbb{1} + \Delta_c) \cdot s_{i-1}$ for a fixed number of times (currently: $100$) to simulate the metabolism^[2]^.

::: footer :::
^[2]^: That's a 'lie': we calculate $(\mathbb{1} + \Delta_c)^{100}$ efficiently via `lapack`-internals
:::

- Entries in the matrix come from the `Genome`{.haskell}: an `Enzyme`{.haskell} which converts $i$ to $j$ with quantity $q$ and activity $a$ yields
  $$\begin{eqnarray*}
  \Delta_c[i,j] &\mathrel{+}=& q\cdot a,\\
  \Delta_c[j,i] &\mathrel{+}=& q\cdot a, \\
  \Delta_c[i,i] &\mathrel{-}=& q\cdot a, \\
  \Delta_c[j,j] &\mathrel{-}=& q\cdot a
  \end{eqnarray*}.$$
- This makes the enzyme reaction reversible, as both directions are treated equally.
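
The construction of $\Delta_c$ and the iteration can be sketched in a few lines of Haskell. This is only an illustration under assumptions: the flat `Gene` tuple, the list-based `Matrix`, and the names `conversionMatrix`, `step` and `simulate` are hypothetical and not taken from the project's code, which (according to the footnote) computes the matrix power via `lapack` instead of iterating naively.

```haskell
-- Hypothetical, simplified gene for this sketch:
-- (index of input compound, index of output compound, quantity, activation)
type Gene   = (Int, Int, Int, Double)
type Matrix = [[Double]]
type Vector = [Double]

-- | Build the n x n conversion matrix Delta_c from a genome.
--   Every gene adds q*a to [i,j] and [j,i] and subtracts q*a
--   from [i,i] and [j,j], as in the update rules above.
conversionMatrix :: Int -> [Gene] -> Matrix
conversionMatrix n genes =
  [ [ sum [ contribution g r c | g <- genes ] | c <- [0 .. n - 1] ]
  | r <- [0 .. n - 1] ]
  where
    contribution (i, j, q, a) r c
      | i == j                               = 0  -- ignore self-loops (assumption)
      | (r, c) == (i, j) || (r, c) == (j, i) = w
      | r == c && (r == i || r == j)         = negate w
      | otherwise                            = 0
      where w = fromIntegral q * a

-- | One metabolism step: s_i = (1 + Delta_c) * s_(i-1).
step :: Matrix -> Vector -> Vector
step delta s = zipWith (+) s [ sum (zipWith (*) row s) | row <- delta ]

-- | Apply the step a fixed number of times (naive iteration, no lapack).
simulate :: Int -> Matrix -> Vector -> Vector
simulate steps delta s0 = iterate (step delta) s0 !! steps
```

For instance, `simulate 100 (conversionMatrix 3 [(0, 1, 1, 0.01), (1, 2, 1, 0.01)]) [3, 0, 0]` runs a chain of two enzymes starting from three units of the nutrient. Because every column of $\Delta_c$ sums to zero, the total amount of compounds is conserved in each step.
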

Metabolism-example
------------------

- Given a simple metabolism with $1$ nutrient (first row/column) and $2$ enzymes in sequence, we get $\Delta_c$ with the corresponding start vector $s_0$:
  $$\Delta_c = 0.01 \cdot \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -1 \\ \end{pmatrix}, s_0 = \begin{pmatrix}\text{PPM:} & 3 \\ \text{Compound1:} & 0 \\ \text{Compound2:} & 0\end{pmatrix}.$$
- Repeated iteration drives this towards
  $$s_\infty = \begin{pmatrix}\text{PPM:} & 1 \\ \text{Compound1:} & 1 \\ \text{Compound2:} & 1\end{pmatrix},$$
  which is the expected outcome for an equilibrium. With the rate of $0.01$ used in this example, $100$ iterations only get part of the way: $s_{100} \approx (1.57, 0.95, 0.47)^T$.

Assumptions for metabolism simulation
-------------------------------------

- All enzymes are there from the beginning
- All enzyme reactions are reversible without loss
- static conversion matrix for fast calculations (unsuited if, e.g., enzymes depend on catalysts)
- One genetic enzyme corresponds to (infinitely) many real (proportionally weaker) enzymes in the plant, which are controlled via the "activation" parameter

Fitness
-------

- We handle fitness as $\text{survival-probability} \in [0..1]$ and model each detrimental effect as a probability; these probabilities are multiplied together.
- To calculate the fitness of an individual we take three distinct effects into consideration:
  - Static costs of enzymes
    - Creating enzymes weakens the primary cycle and thus possibly beneficial traits (growth, attraction of beneficial organisms, ...)
      $$F_s := \text{static_cost_factor} \cdot \sum_i q_i \cdot a_i \quad | \quad (e_i,q_i, a_i) \in \text{Genome}$$
    - limits the number of dormant enzymes
  - Cost of active enzymes
    - Cost of using up nutrients
      $$F_e := \text{active_cost_factor} \cdot \frac{\text{Nutrients used}}{\text{Nutrients available}}$$
  - Deterrence of attackers $F_d$ (next slide)

Attacker
--------

- Predators are modeled after [Svennungsen et al. (2007)](http://doi.org/10.1098/rspb.2007.0456)
- Each predator has an expected number of attacks $P_a$, which are Poisson-distributed, with impact $P_i$.
- Plants can defend themselves via
  - toxins that the predator is affected by, with impact probability $D_t(P_i)$
  - herd immunity via effects like automimicry: $D_{pop} = \mathbb{E}[D_t(P_i)]$
- All this yields the formula:
  $$F_d := 1 - e^{- (D_{pop} \cdot P_a) (1-D_t(P_i))}$$
- The attacker model is only valid under a number of reasonable assumptions
  - equilibrium population dynamics
  - equally dense population
  - which individual to attack is chosen independently
  - etc. (details in the paper linked above)

Haploid mating
--------------

- We hold the population size fixed at $100$
- Each plant has a reproduction probability of
  $$p(\textrm{reproduction}) = \frac{\textrm{plant-fitness}}{\textrm{total fitness in population}}$$
  yielding a fitness-weighted distribution from which $100$ new offspring are drawn
- During inheritance each gene of the parent goes through the following steps (with given default values)^[3]^

::::: footer
^[3]^: in case of quantity $q > 1$ the process is repeated $q$ times independently.
::::

- **mutation**: with $p_{mut} = 0.01$ another random enzyme is produced, but the activation is kept
- **duplication**: with $p_{dup} = 0.05$ the gene gets duplicated (quantity $+1$)
- **deletion**: with $p_{del} = p_{dup}$ the gene gets deleted (or quantity $-1$)
- **addition**: with $p_{add} = 0.005$ an additional gene producing a random enzyme with activation $0.5$ gets added, as a mutation from genes we do not track (i.e. primary cycle)
- **activation-noise**: the activation is changed by $c_{noise} = \pm 0.01$, drawn from a uniform distribution, and clamped to $[0..1]$

:::: notes
- Default values **not** motivated in any way!
- Finding out how these values influence the outcome is core!
::::

--------------------------------------------------------------------------------

Simulations
-----------

- Overall question: What parameters are necessary for chemodiversity?
- How can we see chemodiversity?
  - We define an enzyme $E$ as diverse if the average of this enzyme in the population stays below $0.5$, so $E_i \in E_{div} \text{ iff } \mathbb{E}[E_i] < 0.5$
  - We can then count the number of diverse enzymes per plant $E_{d,p_i} = |\left\lbrace E_i | E_i \in E_{div}, E_{i,p_i} > 0.5 \right\rbrace|$
- To get an insight into how this behaves we observe several other parameters every generation:
  - Fitness $\in [0..1]$
  - Number of different compounds created
  - Amount of compounds created
  - Number of plants theoretically resistant to predator $i$ (i.e. they **can** produce a toxin to defend themselves, albeit not to $100\%$).

Simulations (cont.)
-------------------

- General setup of the simulation:
  - All using the example environment shown before
    - $27$ different compounds, $1$ nutrient (simulating the primary metabolism)
    - $7$ of $27$ compounds are toxic
    - at least $3$ compounds are needed for total immunity
  - $4$ predators set to `AlwaysAttack`{.haskell}
  - Duration of $2000$ generations
  - $\text{static_enzyme_cost} = 0.02$
  - $\text{nutrient_impact} = 0.1$
- Different setups tested:
  - Behavior of predators (`AlwaysAttack`{.haskell}, `AttackRandom`{.haskell}, `AttackInterval 10`{.haskell}, `AttackInterval 100`{.haskell})
  - varying $\text{static_enzyme_cost}$ from $0.0$ to $0.20$ in steps of $0.02$
    - effectively limits the maximum number of enzymes to $\frac{1}{\text{static_enzyme_cost}}$
  - varying $\text{nutrient_impact}$ from $0.0$ to $1.0$ in steps of $0.1$
    - makes toxins less/more costly to produce

--------------------------------------------------------------------------------

Results
=======

>It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong.
>
> - Richard P.
Feynman

--------------------------------------------------------------------------------

Effect of predator behavior on chemodiversity
---------------------------------------------

![Graph](img/attackRate_E_d_mu_vs_C_mu.png)

Effect of static enzyme cost
----------------------------

![Graph](img/staticCost_Fitness_vs_num_compounds.png)

Effect of static enzyme cost (cont.)
------------------------------------

![Graph](img/staticCost_Fitness_vs_e_d_mu.png)

Effect of static enzyme cost (cont.)
------------------------------------

![Graph](img/staticCost_e_d_mu_vs_num_compounds.png)

Effect of nutrient-impact
-------------------------

![Graph](img/nutrientCost_Fitness_vs_num_compounds.png)

Effect of nutrient-impact (cont.)
---------------------------------

![Graph](img/nutrientCost_Fitness_vs_e_d_mu.png)

Effect of nutrient-impact (cont.)
---------------------------------

![Graph](img/nutrientCost_e_d_mu_vs_num_compounds.png)