chemodiversity/vortrag/vortrag.md
2018-06-27 13:02:02 +02:00


---
title: Chemodiversity
subtitle: A short overview of this project
author: Stefan Dresselhaus
license: BSD
affiliation: Theoretic Biology Group<br>
Bielefeld University
abstract: Attempt to find indications for chemodiversity in the plant secondary metabolism according to the screening hypothesis
date: \today
papersize: a4
fontsize: 10pt
documentclass: scrartcl
margin: 0.2
slideNumber: true
...
What is chemodiversity?
-----------------------
- It has been observed that many plants produce many compounds with no
obvious purpose
- Using resources to produce such compounds (instead of, e.g., growing) should
yield a fitness disadvantage
- One expects evolution to eliminate such behavior
Question: Why is this behavior observed?
--------------------------------
- Are these compounds necessary for some unresearched reason?
- unknown environmental effects?
- unknown intermediate products for necessary defenses?
- speculative diversity because they could be useful after genetic mutations?
Screening Hypothesis
--------------------
- First suggested by Jones & Firn ([1991](https://doi.org/10.1098/rstb.1991.0077))
- new (random) compounds are rarely biologically active
- plants have a higher chance of finding an active compound if they diversify
- many (inactive) compounds are sustained for a while because they may be
precursors to biologically active substances
. . .
There are indications for and against this hypothesis by [various groups](https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.12526#nph12526-bib-0093).
--------------------------------------------------------------------------------
Setting up a simulation
=======================
>If you wish to make an apple pie from scratch, you must first invent the universe
> - Carl Sagan
--------------------------------------------------------------------------------
Defining Chemistry
------------------
- First of all we define the chemistry of our environment, so we know all possible
interactions and can manipulate them at will.
- We differentiate between **`Substrate`{.haskell}** and
**`Products`{.haskell}**:
    - **`Substrate`{.haskell}** can just be used (e.g. real substrates if the whole
      metabolism should be simulated, **`PPM`{.haskell}**^[1]^ in our simplified case)
    - **`Products`{.haskell}** are nodes in our chemistry environment.
- In code:
```haskell
data Compound = Substrate Nutrient
              | Produced Component
              | GenericCompound Int
```
::: footer
^[1]^: the plant's primary metabolism
:::
Usage in the current Model
--------------------------
- The model used for evaluation has just one `Substrate`{.haskell}:
`PPM`{.haskell} with a fixed amount, to account for the effects of diverting
primary-metabolism products from the primary metabolic cycle
    - This is used to simulate e.g. worse growth, reduced fertility and other things
      affecting the fitness of a plant.
- We are not using named compounds, but restrict ourselves to generic
`Compound 1`{.haskell}, `Compound 2`{.haskell}, ...
- Not done, but worth exploring:
    - Take a "real-world" snapshot of nutrients and compounds and recreate them
    - See if the simulation follows the real world
Defining a Metabolism
---------------------
- We define **`Enzyme`{.haskell}s** so that they
    - have a recipe for a chemical reaction
    - are reversible
    - may depend on catalysts being present
    - may have higher dominance over other enzymes with the same reaction
- Inputs can be `Substrate`{.haskell} and/or `Products`{.haskell}
- Outputs can only be `Products`{.haskell}
- $\Rightarrow$ This makes them edges in a graph connecting the chemical
compounds
Usage in the current Model
--------------------------
- All `Enzyme`{.haskell}s
    - map exactly `1`{.haskell} input to `1`{.haskell} output with a production rate of `1`{.haskell} per `Enzyme`{.haskell}
      (e.g. `-1 Compound 2 -> +1 Compound 5`{.haskell})
    - are equally dominant
    - need no catalysts
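The enzyme definition above can be sketched as a small record type. This is only an illustration: the simplified `Compound` type, the field names and the helpers `simpleEnzyme` and `asEdge` are assumptions made for this example and are not taken from the project's code. `simpleEnzyme` builds the restricted one-input, one-output enzyme of the current model, and `asEdge` reads such an enzyme as an edge between two compounds.

```haskell
-- Hypothetical, simplified types; names are illustrative only.
data Compound = PPM | GenericCompound Int
  deriving (Show, Eq, Ord)

data Enzyme = Enzyme
  { enzymeName :: String
  , reaction   :: [(Compound, Int)] -- negative = consumed, positive = produced
  , catalysts  :: [Compound]        -- compounds that must be present
  , dominance  :: Int               -- precedence among enzymes with the same reaction
  } deriving (Show, Eq)

-- | Enzyme as restricted in the current model: one input, one output,
-- production rate 1, no catalysts, same dominance for every enzyme.
simpleEnzyme :: Compound -> Compound -> Enzyme
simpleEnzyme from to = Enzyme
  { enzymeName = show from ++ " -> " ++ show to
  , reaction   = [(from, -1), (to, 1)]
  , catalysts  = []
  , dominance  = 1
  }

-- | Read a one-to-one enzyme as a graph edge (input, output).
-- Returns Nothing for enzymes with several inputs or outputs.
asEdge :: Enzyme -> Maybe (Compound, Compound)
asEdge e = case (consumed, produced) of
  ([a], [b]) -> Just (a, b)
  _          -> Nothing
  where
    consumed = [c | (c, n) <- reaction e, n < 0]
    produced = [c | (c, n) <- reaction e, n > 0]
```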
Defining Predators
------------------
- **`Predator`{.haskell}s** consist of
    - a list of `Compound`{.haskell}s that can kill them
    - a fitness impact ($[0..1]$) as the probability of killing the plant
    - an expected number of attacks per generation
    - a probability ($[0..1]$) of appearing in a single generation
- `Predator`{.haskell}s need not necessarily be biologically motivated
    - e.g. rare, nearly devastating events (floods, droughts, ...) with realistic
      probabilities
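As a rough sketch, the four predator properties listed above fit into one record. The field names, the simplified `Compound` type, the validity check and the numbers used for the `flood` example are all assumptions chosen for illustration; they are not the project's actual definitions or parameters.

```haskell
-- Hypothetical, simplified types; names and values are illustrative only.
data Compound = PPM | GenericCompound Int
  deriving (Show, Eq)

data Predator = Predator
  { lethalCompounds :: [Compound] -- compounds that can kill this predator
  , fitnessImpact   :: Double     -- probability in [0,1] of killing the plant
  , expectedAttacks :: Double     -- expected number of attacks per generation
  , appearance      :: Double     -- probability in [0,1] of appearing in a generation
  } deriving (Show, Eq)

-- | Both probabilities must lie in [0,1]; the attack count must not be negative.
validPredator :: Predator -> Bool
validPredator p =
  all inUnit [fitnessImpact p, appearance p] && expectedAttacks p >= 0
  where
    inUnit x = x >= 0 && x <= 1

-- | A non-biological "predator": a rare, nearly devastating event.
-- No compound protects against it. The numbers are made up.
flood :: Predator
flood = Predator
  { lethalCompounds = []
  , fitnessImpact   = 0.95
  , expectedAttacks = 1
  , appearance      = 0.01
  }
```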
Example Environment
-------------------
:::::::::::::: {.columns}
::: {.column width=37%}
- The complete environment now consists of
- `Compound`{.haskell}s:
![](img/compound_example.png){style="vertical-align:middle"}
- `Enzyme`{.haskell}s:
![](img/enzyme_example.png){style="vertical-align:middle"}
- `Predator`{.haskell}s:
![](img/predator_example.png){style="vertical-align:middle"}
:::
::: {.column width=63% .fragment}
![Our default test-environment](img/environment.tree.png){width=75%}
Additional rules:
- Every "subtree" from the marked `PPM`{.haskell} is treated as a separate
species (fungi, animals, ...)
$\Rightarrow$ Every predator can only be affected by toxins in the same part of the tree
- Trees can be generated automatically in a reasonable manner to search for
environments where specific effects may arise
:::
::::::::::::::
::::: notes :::::
CTRL+Click for zoom!
- All starts at PPM (Plant Primary Metabolism)
- Red = Toxic
- Blue = Predators
::::
--------------------------------------------------------------------------------
Plants
------
A **`Plant`{.haskell}** consists of
- a **`Genome`{.haskell}**, a simple list of genes
    - each gene is a triple `(Enzyme, Quantity, Activation)`{.haskell}
    - without order or locality (i.e. no interference between neighboring genes)
    - `Quantity`{.haskell} is just an optimization (=Int) to group identical
      `Activation`{.haskell}s
    - `Activation`{.haskell} is a float $\in [0..1]$ to regulate the activity of
      the `Enzyme`{.haskell} genetically
- an `absorbNutrients`{.haskell} function to simulate various effects when
absorbing nutrients from the environment, depending on the environment (i.e. it
*can* use information about chemistry, predators, etc.)
    - Not used in our simulation, as we only have `PPM`{.haskell} as "nutrient"
      and we take everything given to us.
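A minimal sketch of the genome representation described above, assuming enzymes are identified by a plain name. The type synonyms and the helper `compact` are hypothetical: `compact` shows one way the `Quantity` field can act as an optimization, by merging genes that share the same enzyme and activation into a single entry.

```haskell
-- Hypothetical type synonyms; the real project may define these differently.
type Enzyme     = String
type Quantity   = Int
type Activation = Double
type Gene       = (Enzyme, Quantity, Activation)
type Genome     = [Gene]

-- | Merge genes with identical enzyme and activation into one gene whose
-- quantity is the sum of the merged quantities. Order carries no meaning
-- in this genome, so the position of the merged gene is irrelevant.
compact :: Genome -> Genome
compact = foldr insertGene []
  where
    insertGene (e, q, a) acc =
      case break (\(e', _, a') -> e' == e && a' == a) acc of
        (before, (_, q', _) : after) -> before ++ (e, q + q', a) : after
        (_, [])                      -> (e, q, a) : acc

-- | Total number of gene copies in a genome.
copies :: Genome -> Int
copies genome = sum [q | (_, q, _) <- genome]
```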
Metabolism simulation
---------------------
Creation of compounds from the given resources is an iterative process:
- First of all we create a conversion matrix $\Delta_c$ with a corresponding
start vector $s_0$.
- We now iterate $s_i = (\mathbb{1} + \Delta_c) \cdot s_{i-1}$ a fixed number of times
(currently: $100$) to simulate the metabolism^[2]^.
::: footer :::
^[2]^: That's a 'lie': we calculate $(\mathbb{1} + \Delta_c)^{100}$ efficiently via
`lapack` internals
:::
- Entries in the matrix come from the `Genome`{.haskell}: an `Enzyme`{.haskell} which
converts $i$ to $j$ with quantity $q$ and activity $a$ yields
$$\begin{eqnarray*}
\Delta_c[i,j] &\mathrel{+}=& q\cdot a,\\
\Delta_c[j,i] &\mathrel{+}=& q\cdot a, \\
\Delta_c[i,i] &\mathrel{-}=& q\cdot a, \\
\Delta_c[j,j] &\mathrel{-}=& q\cdot a
\end{eqnarray*}.$$
- This makes the enzyme reaction reversible, as both directions are treated equally.
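The four update rules above can be sketched with plain lists. This is a minimal illustration, not the project's implementation (which, according to the footnote, relies on `lapack`): genes are reduced to a hypothetical tuple of compound indices, quantity and activation, and indices are zero-based. Because every gene adds $q \cdot a$ twice and subtracts it twice, the resulting matrix is symmetric and each row sums to zero, so the total amount of material is conserved.

```haskell
-- Hypothetical gene encoding: (from index i, to index j, quantity q, activation a).
type Gene = (Int, Int, Int, Double)

-- | Build the n-by-n conversion matrix by applying the four updates
--   D[i,j] += q*a, D[j,i] += q*a, D[i,i] -= q*a, D[j,j] -= q*a
-- for every gene. Indices are zero-based and assumed to be below n.
conversionMatrix :: Int -> [Gene] -> [[Double]]
conversionMatrix n genes =
  [ [ sum [contribution g r c | g <- genes] | c <- [0 .. n - 1] ]
  | r <- [0 .. n - 1] ]
  where
    contribution (i, j, q, a) r c =
      let w = fromIntegral q * a
      in sum [ s * w
             | (r', c', s) <- [(i, j, 1), (j, i, 1), (i, i, -1), (j, j, -1)]
             , r' == r, c' == c ]
```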
Metabolism-example
------------------
- Given a simple metabolism with $1$ nutrient (first row/column) and $2$ enzymes
in sequence, we obtain $\Delta_c$ with the corresponding start vector $s_0$:
$$\Delta_c = 0.01 \cdot \begin{pmatrix}
-1 & 1 & 0 \\
1 & -2 & 1 \\
0 & 1 & -1 \\
\end{pmatrix}, s_0 = \begin{pmatrix}\text{PPM:} & 3 \\ \text{Compound1:} & 0 \\ \text{Compound2:} & 0\end{pmatrix}.$$
- In the simulation this yields
$$s_{100} \approx \begin{pmatrix}\text{PPM:} & 1 \\ \text{Compound1:} & 1 \\ \text{Compound2:} & 1\end{pmatrix},$$
which is the expected outcome for an equilibrium.
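The iteration for this three-compound example can be replayed with a few lines of list code. This is a sketch under assumptions: it uses naive repeated multiplication instead of the matrix power mentioned in the footnote, and the scale factor `k` is a parameter. The iteration conserves the total amount of $3$ and converges to the uniform vector $(1, 1, 1)$. How close it gets after a given number of steps depends on `k`. With $k = 0.01$ taken literally, $100$ plain steps give roughly $(1.57, 0.95, 0.47)$, so the rounded equilibrium is reached only with more steps or larger rates.

```haskell
type Vector = [Double]
type Matrix = [[Double]]

-- | Matrix-vector product on plain lists.
mulMV :: Matrix -> Vector -> Vector
mulMV m v = [sum (zipWith (*) row v) | row <- m]

-- | (1 + Delta_c) for the example with two enzymes in sequence,
-- where k is the scale factor in front of the matrix.
stepMatrix :: Double -> Matrix
stepMatrix k =
  [ [1 - k, k,         0    ]
  , [k,     1 - 2 * k, k    ]
  , [0,     k,         1 - k]
  ]

-- | Apply n iteration steps s_i = (1 + Delta_c) * s_(i-1) to the start vector.
simulate :: Int -> Double -> Vector -> Vector
simulate n k s0 = iterate (mulMV (stepMatrix k)) s0 !! n
```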
Assumptions for metabolism simulation
-------------------------------------
- All enzymes are present from the beginning
- All enzyme reactions are reversible without loss
- Static conversion matrix for fast calculations (unsuitable if, e.g., enzymes
depend on catalysts)
- One genetic enzyme corresponds to (infinitely) many real (proportionally weaker)
enzymes in the plant, which are controlled via the "activation" parameter
Fitness
-------
- We treat fitness as $\text{survival-probability} \in [0..1]$ and model each
detrimental effect as a probability; these probabilities are multiplied together.
- To calculate the fitness of an individual we take three distinct effects into
consideration:
    - Static costs of enzymes
        - Creating enzymes weakens the primary cycle and thus possibly beneficial
          traits (growth, attraction of beneficial organisms, ...)
          $$F_s := \text{static_cost_factor} \cdot \sum_i q_i \cdot a_i \quad | \quad (e_i,q_i, a_i) \in \text{Genome}$$
        - limits the number of dormant enzymes
    - Cost of active enzymes
        - Cost of using up nutrients
          $$F_e := \text{active_cost_factor} \cdot \frac{\text{Nutrients used}}{\text{Nutrients available}}$$
    - Deterrence of attackers $F_d$ (next slide)
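The two cost terms can be written down directly. In the sketch below, `staticCost` and `activeCost` transcribe $F_s$ and $F_e$; the gene tuple and all function names are hypothetical. The final `fitness` function is an assumption about how the three effects combine: each term is read as a probability of dying, and the survival probabilities $1 - F$ are multiplied, clamped to $[0..1]$.

```haskell
-- Hypothetical gene encoding: (enzyme name, quantity q, activation a).
type Gene = (String, Int, Double)

-- | F_s: static cost factor times the sum of q_i * a_i over the genome.
staticCost :: Double -> [Gene] -> Double
staticCost factor genome =
  factor * sum [fromIntegral q * a | (_, q, a) <- genome]

-- | F_e: active cost factor times the fraction of nutrients used.
-- Returns 0 when no nutrients are available (a guard added for this sketch).
activeCost :: Double -> Double -> Double -> Double
activeCost factor used available
  | available <= 0 = 0
  | otherwise      = factor * used / available

-- | ASSUMPTION: fitness as the product of the survival probabilities (1 - F),
-- each clamped to [0,1]. The slides only state that the effects are multiplied.
fitness :: Double -> Double -> Double -> Double
fitness fs fe fd = product [clampUnit (1 - x) | x <- [fs, fe, fd]]
  where
    clampUnit = max 0 . min 1
```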
Attacker
--------
- Predators are modeled after [Svennungsen et al. (2007)](http://doi.org/10.1098/rspb.2007.0456)
- Each predator has an expected number of attacks $P_a$, which are
Poisson-distributed, with impact $P_i$.
- Plants can defend themselves via
    - toxins that affect the predator, with impact probability $D_t(P_i)$
    - herd immunity via effects like automimicry: $D_{pop} = \mathbb{E}[D_t(P_i)]$
- All this yields the formula:
$$F_d := 1 - e^{- (D_{pop} \cdot P_a) (1-D_t(P_i))}$$
- The attacker model is only valid under a number of reasonable assumptions
    - equilibrium population dynamics
    - equally dense population
    - the individual to attack is chosen independently
    - etc. (details in the paper linked above)
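The formula for $F_d$ translates one-to-one into code. The sketch below is a literal transcription of the expression on this slide, with hypothetical function and argument names; it does not add anything from the cited paper. `populationDefense` computes $D_{pop}$ as the mean of the individual defenses, with the mean of an empty population set to $0$ as a convention of this example.

```haskell
-- | D_pop: mean individual defense D_t over the population.
-- An empty population yields 0 (a convention chosen for this sketch).
populationDefense :: [Double] -> Double
populationDefense [] = 0
populationDefense ds = sum ds / fromIntegral (length ds)

-- | F_d = 1 - exp (-(dPop * pA) * (1 - dT)), transcribed from the slide.
--   dPop: population-wide defense D_pop
--   pA:   expected number of attacks P_a
--   dT:   defense of this plant, D_t(P_i)
deterrence :: Double -> Double -> Double -> Double
deterrence dPop pA dT = 1 - exp (negate (dPop * pA) * (1 - dT))
```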
Haploid mating
--------------
- We hold the population size fixed at $100$
- Each plant has a reproduction probability of
$$p(\textrm{reproduction}) = \frac{\textrm{plant-fitness}}{\textrm{total fitness in population}}$$
yielding a fitness-weighted distribution from which $100$ new offspring are
drawn
- During inheritance each gene of the parent goes through the following steps (with
the given default values)^[3]^
::::: footer
^[3]^: in case of quantity $q > 1$ the process is repeated $q$ times
independently.
::::
- **mutation**: with $p_{mut} = 0.01$ another random enzyme is produced, but the
activation is kept
- **duplication**: with $p_{dup} = 0.05$ the gene gets duplicated (quantity $+1$)
- **deletion**: with $p_{del} = p_{dup}$ the gene gets deleted (or quantity $-1$)
- **addition**: with $p_{add} = 0.005$ an additional gene producing a random
enzyme with activation $0.5$ gets added, as a mutation from genes we do not
track (i.e. primary cycle)
- **activation-noise**: the activation is changed by $c_{noise} = \pm 0.01$, drawn from
a uniform distribution, and clamped to $[0..1]$
:::: notes
- Default values are **not** motivated in any way!
- Finding out how these values influence the results is the core question!
::::
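Fitness-weighted reproduction and the clamped activation noise can be sketched without a random number library by passing the random draw in as an argument. All names are hypothetical. `pickParent` expects a uniform draw `u` in $[0, 1)$ and a non-empty probability list, and `reproductionProbabilities` assumes the total fitness is positive. Drawing $100$ offspring then means calling `pickParent` with $100$ independent draws.

```haskell
-- | p(reproduction) for every plant: its fitness divided by the total fitness.
-- Assumes the total fitness is positive.
reproductionProbabilities :: [Double] -> [Double]
reproductionProbabilities fs = map (/ total) fs
  where
    total = sum fs

-- | Index of the parent selected by a uniform draw u from [0,1).
-- Walks the cumulative distribution; the final `min` guards against
-- rounding errors in the last cumulative sum.
pickParent :: [Double] -> Double -> Int
pickParent probs u =
  min (length probs - 1) (length (takeWhile (<= u) (scanl1 (+) probs)))

-- | Activation noise: add delta (drawn elsewhere from [-0.01, 0.01])
-- and clamp the result to [0,1].
applyNoise :: Double -> Double -> Double
applyNoise delta a = max 0 (min 1 (a + delta))
```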
--------------------------------------------------------------------------------
Simulations
-----------
- Overall question: What parameters are necessary for chemodiversity?
- How can we see chemodiversity?
    - We define an enzyme $E$ as diverse if the average of this enzyme in the
      population stays below $0.5$, so $E_i \in E_{div} \text{ iff } \mathbb{E}[E_i] < 0.5$
    - We can then count the number of diverse enzymes per plant: $E_{d,p_i} =
      |\left\lbrace E_i \mid E_i \in E_{div}, E_{i,p_i} > 0.5 \right\rbrace|$
- To get an insight into how this behaves we observe several other parameters
every generation:
    - Fitness $\in [0..1]$
    - Number of different compounds created
    - Amount of compounds created
    - Number of plants theoretically resistant to predator $i$ (i.e. they **can** produce
      a toxin to defend themselves, albeit not to $100\%$)
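The diversity measure can be sketched as two small functions. The representation is an assumption: each plant is a list with one level in $[0..1]$ per enzyme (all lists of equal length), standing in for $E_{i,p}$. The slides do not specify whether this level is presence, activation or something else. `diverseEnzymes` returns the indices in $E_{div}$, and `diversityOf` counts $E_{d,p}$ for one plant.

```haskell
import Data.List (transpose)

-- | Hypothetical representation: one level in [0,1] per enzyme.
type Plant = [Double]

-- | Indices of enzymes whose population mean stays below 0.5 (the set E_div).
diverseEnzymes :: [Plant] -> [Int]
diverseEnzymes pop = [i | (i, m) <- zip [0 ..] means, m < 0.5]
  where
    n     = fromIntegral (length pop)
    means = map ((/ n) . sum) (transpose pop)

-- | Number of diverse enzymes that a single plant carries above the 0.5 level.
diversityOf :: [Int] -> Plant -> Int
diversityOf divs plant =
  length [i | (i, x) <- zip [0 ..] plant, x > 0.5, i `elem` divs]
```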
Simulations (cont.)
-------------------
- General setup of the simulation:
- All using the example-environment shown before
- $27$ different compounds, $1$ Nutrient (simulating the primary metabolism)
- $7$ of $27$ compounds are toxic
- at least $3$ compounds are needed for total immunity
- $4$ predators set to `AlwaysAttack`{.haskell}
- Duration of $2000$ generations
- $\text{static_enzyme_cost} = 0.02$
- $\text{nutrient_impact} = 0.1$
- Different setups tested:
- Behavior of predators (`AlwaysAttack`{.haskell}, `AttackRandom`{.haskell}, `AttackInterval 10`{.haskell}, `AttackInterval 100`{.haskell})
- varying $\text{static_enzyme_cost}$ from $0.0$ to $0.20$ in steps of $0.02$
- effectively limits the maximum number of enzymes to $\frac{1}{\text{static_enzyme_cost}}$
- varying $\text{nutrient_impact}$ from $0.0$ to $1.0$ in steps of $0.1$
- makes toxins less/more costly to produce
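The two parameter ranges can be enumerated as follows. The names are hypothetical, and the sketch only lists the tested values; it makes no claim about whether the ranges were varied one at a time or combined. Values are generated from integer steps to avoid accumulating floating-point error. `maxEnzymes` gives the bound $\frac{1}{\text{static_enzyme_cost}}$, which is undefined for a cost of $0$.

```haskell
-- | static_enzyme_cost from 0.0 to 0.20 in steps of 0.02.
staticCosts :: [Double]
staticCosts = [fromIntegral k * 0.02 | k <- [0 .. 10 :: Int]]

-- | nutrient_impact from 0.0 to 1.0 in steps of 0.1.
nutrientImpacts :: [Double]
nutrientImpacts = [fromIntegral k * 0.1 | k <- [0 .. 10 :: Int]]

-- | Upper bound on the number of fully active enzymes for a given static cost.
-- Nothing means "no bound", which is the case for a cost of 0.
maxEnzymes :: Double -> Maybe Double
maxEnzymes cost
  | cost > 0  = Just (1 / cost)
  | otherwise = Nothing
```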
--------------------------------------------------------------------------------
Results
=======
>It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong.
> - Richard P. Feynman
--------------------------------------------------------------------------------
Effect of predator behavior on chemodiversity
---------------------------------------------
![Graph](img/attackRate_E_d_mu_vs_C_mu.png)
Effect of static enzyme cost
----------------------------
![Graph](img/staticCost_Fitness_vs_num_compounds.png)
Effect of static enzyme cost (cont.)
------------------------------------
![Graph](img/staticCost_Fitness_vs_e_d_mu.png)
Effect of static enzyme cost (cont.)
------------------------------------
![Graph](img/staticCost_e_d_mu_vs_num_compounds.png)
Effect of nutrient-impact
-------------------------
![Graph](img/nutrientCost_Fitness_vs_num_compounds.png)
Effect of nutrient-impact (cont.)
---------------------------------
![Graph](img/nutrientCost_Fitness_vs_e_d_mu.png)
Effect of nutrient-impact (cont.)
---------------------------------
![Graph](img/nutrientCost_e_d_mu_vs_num_compounds.png)