4. Analysis of feedforward loops
Design principles
The C1-FFL with AND logic has an on-delay, but no off-delay.
The C1-FFL with OR logic has an off-delay, but no on-delay.
The C1-FFL with both AND and OR logic can filter out short input impulses.
The I1-FFL with AND logic is a pulse generator and also speeds response compared to an unregulated circuit.
Concept
When multiple factors regulate a single gene, we need to specify the logic of the regulation, usually OR or AND.
[49]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade colorcet biocircuits watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
# ------------------------------
import numpy as np
import scipy.integrate
import biocircuits
import biocircuits.apps
import colorcet
colors = colorcet.b_glasbey_category10
import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.plotting
# Set to True to have fully interactive plots
interactive_python_plots = False
notebook_url = "localhost:8888"
bokeh.io.output_notebook()
Finding 3-gene motifs in a bacterial transcriptional network
After three chapters, we have developed a toolbox of numerical, analytical, and plotting-based approaches, and applied them to various single-gene and two-gene circuits. In the last chapter, we discussed a toggle switch that generates bistability via mutual repression by two repressor proteins. We will now move on to systems with more components and greater functional complexity, starting with three-component circuits.
There are thirteen different ways in which three transcription factors can regulate one another, as diagrammed below. Which ones should we analyze first? A sensible choice is to again look toward natural gene regulatory networks and ask which of these potential circuits are network motifs, that is, which (if any) are statistically overrepresented in natural networks.
Image adapted from Milo, et al., 2002.
Milo and coworkers (Milo, et al., 2002) performed just such an analysis. The authors tabulated the number of times that each of these thirteen regulatory patterns appears in maps of natural transcriptional circuits. But a question remains: how often would one expect to see any one of these patterns within a circuit of that size? To answer this question, a null model for a network is required. We hinted at this concept in Chapter 2, and here we go into more detail.
Random graphs enable motif identification
Consider a directed graph representing a hypothetical regulatory circuit (left). As before, each numbered node in this network represents an operon or transcription factor, and arrows indicate regulation of the target node by a transcription factor in the source node. To find over-represented patterns in this graph, we can compare it to an ensemble of randomized variants and look for sub-graphs that occur more frequently in the real circuit than in the random variants.
Schematic comparison between a specific circuit (left) and a set of randomized variants with the same incoming and outgoing arrow distributions (right). Image modified from Milo et al., Science, 2002.
To make the comparison as fair as possible, we need to design variant graphs that maintain the key statistical properties of the original graph. Specifically, we insist that all variant graphs have the same number of nodes and arrows. But this is not really enough: It would not make sense to compare a graph whose arrows are distributed over all the nodes to a variant in which, say, all arrows stem from a single source node, or converge on a single target node. We therefore need to impose a stronger constraint: all variant graphs should maintain the exact distribution of incoming and outgoing arrows for each node.
If you examine the graphs above, you can see that they were constructed to have the same numbered nodes. Furthermore, the number of arrows exiting and entering each numbered node is exactly the same in all graphs. For example, node 4 always has 2 outgoing arrows. Node 6 always has one incoming and two outgoing arrows, node 12 always has 2 incoming arrows, and so on.
To generate the randomized graphs, imagine cutting each arrow in half with scissors to generate a dangling “+” end connected to an arrowhead and a dangling “–” end connected to a source node. Then imagine tying each “+” end to a randomly chosen “–” end. Voilà, we have a new randomized graph, guaranteed through this procedure to maintain the same joint distribution of incoming and outgoing arrows. For more details, see the algorithms in Shen-Orr et al., 2002 and in Newman et al., 2001.
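The cut-and-retie procedure can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions, not the algorithm of the cited papers: the function name `rewire` and the example edge list are hypothetical, and the sketch does not filter out self-loops or duplicate arrows, which a careful implementation would need to handle.

```python
import random
from collections import Counter


def rewire(edges, seed=None):
    """Randomly re-pair arrow tails and heads, preserving each node's degrees.

    `edges` is a list of (source, target) pairs. Self-loops and repeated
    arrows may appear in the output; this sketch does not filter them.
    """
    rng = random.Random(seed)
    sources = [s for s, _ in edges]  # tail halves, still attached to their source
    targets = [t for _, t in edges]  # head halves, still attached to their target
    rng.shuffle(targets)             # tie each tail to a randomly chosen head
    return list(zip(sources, targets))


# Hypothetical example network
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1)]
randomized = rewire(edges, seed=0)

# The number of outgoing and incoming arrows of every node is unchanged
assert Counter(s for s, _ in randomized) == Counter(s for s, _ in edges)
assert Counter(t for _, t in randomized) == Counter(t for _, t in edges)
```

Because the list of sources and the list of targets are each preserved and only their pairing changes, every node keeps its in-degree and out-degree by construction.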
The feedforward loop is overrepresented in natural transcriptional networks
Now that we understand how to formulate a proper null model for motif identification, we return to the analysis by Milo, et al. The authors used a z-score to quantify over- or under-representation in units of the standard deviation of the number of occurrences for the sub-graph in the randomized circuits,
\begin{align} z = \frac{n_{obs}-\langle n\rangle}{\sigma}, \end{align}
where \(n_{obs}\) is the number of times the sub-graph was observed in the actual circuit, \(\langle n\rangle\) is the mean number of times it was observed in randomized circuits, and \(\sigma = \sqrt{\langle (n-\langle n \rangle)^2\rangle}\) is the standard deviation of the number of times it was observed in randomized circuits.
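As a small sketch of how this z-score could be computed, assuming the sub-graph counts in the randomized circuits have already been collected, consider the snippet below. The function name `motif_z_score` and all of the counts are hypothetical and serve only to illustrate the formula.

```python
import numpy as np


def motif_z_score(n_obs, n_random):
    """z-score of an observed sub-graph count against randomized-network counts."""
    n_random = np.asarray(n_random, dtype=float)
    mean = n_random.mean()
    sigma = n_random.std()  # population standard deviation, sqrt(<(n - <n>)^2>)
    return (n_obs - mean) / sigma


# Hypothetical counts of one sub-graph in four randomized networks
counts_in_random_networks = [2, 12, 7, 7]

# Hypothetical observed count in the real network
z = motif_z_score(42, counts_in_random_networks)
```

Note that `np.std` with its default arguments computes the population standard deviation, which matches the definition of \(\sigma\) given above.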
They did this for several transcriptional circuits, including E. coli, yeast (two versions), and B. subtilis. In the plot below (data digitized from Milo, et al., 2004), the four networks are so similar in their z-score profiles that the different organisms overlap almost perfectly.
[50]:
x = np.array([1, 4, 2, 7, 3, 8, 5, 9, 11, 6, 10, 12, 13])
z_ecoli = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_yeast_1 = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_yeast_2 = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_bacillus = np.array([-0.5, -0.5, -0.5, 0, 0.032, 0, 0.5, 0, 0, 0, 0, 0, 0])
p = bokeh.plotting.figure(
    frame_width=500,
    frame_height=100,
    x_axis_label="motif",
    y_axis_label="normalized z-score",
    x_range=[0, 14],
    y_range=[-1.0, 0.7],
    tools="save",
)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.ticker = bokeh.models.tickers.FixedTicker(ticks=x)
p.yaxis.ticker = bokeh.models.tickers.FixedTicker(ticks=[-0.5, 0, 0.5])
p.ray(0, 0, color='black', line_width=1)
p.ray(0, 0, angle=-np.pi, color='black', line_width=1)
ecoli = p.circle(x, z_ecoli, fill_alpha=0, size=10)
yeast_1 = p.square(x, z_yeast_1, fill_alpha=0, size=10, color=colors[1])
yeast_2 = p.diamond(x, z_yeast_2, fill_alpha=0, size=12, color=colors[2])
bacillus = p.star(x, z_bacillus, fill_alpha=0, size=10, color=colors[3])
items = [("E. coli", [ecoli]), ("yeast 1", [yeast_1]), ("yeast 2", [yeast_2]), ("B. subtilis", [bacillus])]
legend = bokeh.models.Legend(items=items)
legend.click_policy = "hide"
p.add_layout(legend, 'right')
bokeh.io.show(p)
As you can see, one particular motif, number 5, is overrepresented in all of the transcriptional networks:
This sub-graph, termed the feed-forward loop (FFL), has one node that regulates a target node two ways: directly, and indirectly through the third node.
Several features are striking from their results:
There is strong over-representation of feed-forward loops (FFLs). In E. coli, one expects to see 7±5 FFLs by chance, but one observes this pattern 42 times in the real circuit.
The over-representation of FFLs is conserved across three distinct organisms, suggesting it is a general property of transcriptional circuits.
Most other sub-graphs are neither over- nor under-represented.
Three sub-graphs are statistically under-represented. They occur significantly less often than one would expect by chance, provoking the question of what problems they might present as components of larger circuits. These three sub-graphs are all sub-graphs of the FFL sub-graph. Thus, it is not that, e.g., divergent regulation (represented by sub-graph 1) is rare; it is just that when it does occur, it does so in the context of the FFL.
There are many kinds of FFLs
Our description of regulatory interactions in the FFL so far has been oversimplified in several ways:
We have not distinguished between positive regulation (activation) and negative regulation (repression).
We have not considered how multiple regulators combine to control expression of a mutual target operon.
We have ignored the quantitative aspects of regulation.
Understanding the biological function of a motif requires thinking about these aspects more carefully.
To address the first point, we can classify the overall FFL motif into \(2^3=8\) different categories depending on which of its 3 arrows are positive or negative:
In this classification, half of the FFL architectures are coherent, meaning that X’s direct regulation of Z and its indirect regulation of Z are of the same type, both activating or both repressing. The other half are incoherent, meaning that the direct and indirect regulatory paths have opposite signs.
We can further sub-classify the FFLs according to how the regulatory arrows converging on the third node (now labeled “Z”) combine. Consider first the example where both X and Y activate Z, as in the C1-FFL and I4-FFL. In AND regulation, both X and Y need to be simultaneously present at high levels for Z to be expressed. In OR regulation, either input being at a high level is sufficient to activate Z. We will discuss this logic in more detail momentarily.
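The coherent/incoherent split can be checked by brute force. The sketch below assumes each arrow is encoded as +1 (activation) or -1 (repression) and that the net sign of the indirect path is the product of its two arrows, so that, for example, repressing a repressor counts as net activation. The variable names are illustrative only.

```python
import itertools

# Each arrow is +1 (activation) or -1 (repression).
# Order of signs in each tuple: (X -> Y, Y -> Z, X -> Z)
coherent, incoherent = [], []
for x_to_y, y_to_z, x_to_z in itertools.product([+1, -1], repeat=3):
    indirect_sign = x_to_y * y_to_z  # net sign of the path X -> Y -> Z
    if indirect_sign == x_to_z:
        coherent.append((x_to_y, y_to_z, x_to_z))
    else:
        incoherent.append((x_to_y, y_to_z, x_to_z))

print(len(coherent), "coherent;", len(incoherent), "incoherent")
```

The enumeration yields four sign patterns in each class, in agreement with the half-and-half split described above.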
Now that we have defined what FFLs are and how we can represent them, we will spend the rest of this chapter (and even some beyond) considering what functions the various FFLs can perform for cells.
The most-encountered FFLs
While FFLs in general are motifs, some FFLs are encountered more often than others. In the figure below, using data taken from a review by Uri Alon, we see the relative abundance of the eight different FFLs in E. coli and S. cerevisiae. Two FFLs, C1-FFL and I1-FFL, stand out as having much higher abundance than the other six. We will focus our study on these two in this chapter.
[51]:
# Data based on Alon, Nature Rev. Genet., 2007, https://doi.org/10.1038/nrg2102
species = ["yeast", "E. coli"]
ffls = reversed(["C1 ", "C2 ", "C3 ", "C4 ", "I1 ", "I2 ", "I3 ", "I4 "])
data = {
    "species": species,
    "E. coli": reversed([0.464, 0.09, 0, 0, 0.374, 0.055, 0.017, 0]),
    "yeast": reversed([0.377, 0.035, 0.105, 0.08, 0.28, 0.027, 0.052, 0.044]),
}
x = [(ffl, sp) for ffl in ffls for sp in species]
# Interleave the fractions in the same order as `species` (yeast first, then E. coli)
frac = sum(zip(data["yeast"], data["E. coli"]), ())
source = bokeh.models.ColumnDataSource(data=dict(x=x, frac=frac))
p = bokeh.plotting.figure(
    y_range=bokeh.models.FactorRange(*x),
    frame_height=450,
    title="Relative abundance of FFLs",
)
p.hbar(
    y="x",
    left=0,
    right="frac",
    height=0.9,
    source=source,
    fill_color=bokeh.transform.factor_cmap(
        "x", palette=colors, factors=species, start=1, end=2
    ),
    line_color="white",
)
p.x_range.start = 0
p.y_range.range_padding = 0.05
p.yaxis.group_label_orientation = "horizontal"
p.ygrid.grid_line_color = None
bokeh.io.show(p)
Logic of regulation by two transcription factors
Because X and Y both regulate Z in an FFL, we need to specify how they collaborate in the regulation.
For the sake of illustration, let us assume we are discussing the C1-FFL, where X activates Z and Y also activates Z. One can imagine a scenario where both X and Y need to be present to turn on Z. For example, they could be binding partners that together serve to recruit polymerase to the promoter. We call this AND logic. In other words, to get expression of Z, we must have X AND Y. Conversely, if either X or Y alone can activate Z, we have OR logic. That is, to get expression of Z, we must have X OR Y.
So, to fully specify an FFL, we need to also specify the logic, either AND or OR, of how Z is regulated. Including choice of logic gives a total of 16 possible FFLs.
We are now left with the task of figuring out how to mathematically encode AND and OR logic. Before doing so, we note that, as discussed previously, we are using Hill functions, which are phenomenological functions describing how effectors may regulate gene expression, capturing both the necessary concentration of effector (\(k\)) and the ultrasensitivity of the regulation (\(n\)). When the molecular details of how an effector regulates a gene are known, we may derive the appropriate functions describing gene expression regulation rather than using Hill functions. Similarly, for two effectors, we could also derive the functions from the molecular details and discover what kind of logic emerges. See, for example, this 2005 paper by Bintu and coworkers. We often do not know the molecular details, however, and Hill functions and the two-effector variants thereof that we present below are quite useful in analyzing the properties of circuit architectures.
We now proceed to formally write mathematical expressions for the dynamics of a gene product Z under regulatory control of effectors X and Y. The dynamics of the concentration of Z may be written as
\begin{align} \frac{\mathrm{d}z}{\mathrm{d}t} = \beta \,f(x, y) - \gamma z, \end{align}
where the lowercase letters denote the concentrations of the respective species.
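To make the dynamics concrete, here is a minimal sketch that integrates this equation numerically for fixed input concentrations, so that \(f(x, y)\) is a constant. The parameter values and the names `dz_dt`, `f_xy`, and `z_steady` are hypothetical, chosen only for illustration.

```python
import numpy as np
import scipy.integrate


def dz_dt(t, z, beta, gamma, f_xy):
    """Right-hand side of dz/dt = beta * f(x, y) - gamma * z for fixed f(x, y)."""
    return beta * f_xy - gamma * z


# Hypothetical parameter values, chosen only for illustration
beta, gamma, f_xy = 2.0, 1.0, 0.5

# Integrate from z = 0 over ten decay times
sol = scipy.integrate.solve_ivp(
    dz_dt, (0.0, 10.0), [0.0], args=(beta, gamma, f_xy), rtol=1e-8, atol=1e-10
)

z_final = sol.y[0, -1]
z_steady = beta * f_xy / gamma  # steady state, where production balances decay
```

With constant inputs, \(z\) relaxes exponentially toward \(\beta f(x, y)/\gamma\) on a time scale of \(1/\gamma\), so `z_final` ends up very close to `z_steady`.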
Our goal is to specify the dimensionless regulatory function \(f(x, y)\) that encodes how X and Y may together regulate Z. Our approach is to assign a “weight” to each state of a promoter region. With two effectors, X and Y, the promoter region could be unbound, bound with X, bound with Y, or bound with both X and Y. To get the regulatory function, we sum the weights of states that allow polymerase binding and divide by the sum of all weights. This gives the fraction of time that expression of the gene is “on.” For example, if X and Y are both activators and they together have AND logic, we have
\begin{align} f(x, y) = \frac{\text{X and Y bound weight}}{(\text{unbound weight}) + (\text{X bound weight}) + (\text{Y bound weight}) + (\text{X and Y bound weight})} \end{align}
The weights are chosen to give Hill-like functions.
promoter region state | weight | dimensionless weight |
---|---|---|
unbound | \(1\) | \(1\) |
X bound | \((x/k_x)^{n_x}\) | \(x^{n_x}\) |
Y bound | \((y/k_y)^{n_y}\) | \(y^{n_y}\) |
X and Y bound | \((x/k_x)^{n_x}\,(y/k_y)^{n_y}\) | \(x^{n_x}\,y^{n_y}\) |
The dimensionless weights are given by substituting \(x \leftarrow x/k_x\) and \(y \leftarrow y/k_y\). We will use the dimensionless versions of these functions henceforth. We note that the denominator of the regulatory function \(f(x,y)\) is always the same,
\begin{align} 1 + x^{n_x} + y^{n_y} + x^{n_x} y^{n_y} = (1 + x^{n_x})(1 + y^{n_y}). \end{align}
Alternatively, we could have a structure where at most one of the two effectors may be bound at a time (for example, for steric reasons). In this case, the states and weights are given in the table below.
promoter region state | weight | dimensionless weight |
---|---|---|
unbound | \(1\) | \(1\) |
X bound | \((x/k_x)^{n_x}\) | \(x^{n_x}\) |
Y bound | \((y/k_y)^{n_y}\) | \(y^{n_y}\) |
In this case, the denominator for all of the regulatory functions is \(1 + x^{n_x} + y^{n_y}\). We will refer to such regulatory functions as corresponding to “single occupancy.”
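The weight-summing prescription can be written generically. The sketch below is one possible encoding, not the implementation in the biocircuits package: the function name `regulatory_function`, the state labels, and the `on_states` argument are all hypothetical.

```python
def regulatory_function(x, y, nx, ny, on_states, single_occupancy=False):
    """Fraction of total promoter weight carried by the states in `on_states`.

    States are labeled "unbound", "X", "Y", and "XY"; x and y are the
    dimensionless concentrations.
    """
    weights = {"unbound": 1.0, "X": x**nx, "Y": y**ny}
    if not single_occupancy:
        # The doubly bound state exists only when both sites can be occupied
        weights["XY"] = x**nx * y**ny
    on_weight = sum(weights.get(state, 0.0) for state in on_states)
    return on_weight / sum(weights.values())


# Two activators with AND logic: only the doubly bound state is "on"
f_and = regulatory_function(2.0, 2.0, 2, 2, on_states=["XY"])

# Single occupancy: the doubly bound state is unavailable, so AND logic gives zero
f_and_single = regulatory_function(
    2.0, 2.0, 2, 2, on_states=["XY"], single_occupancy=True
)
```

Choosing which states count as "on" is then all that distinguishes one kind of logic from another; the denominator depends only on whether double occupancy is allowed.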
With this prescription, let us proceed to write the regulatory functions \(f(x, y)\) for various architectures.
Logic with two activators
Let us start first with X and Y, both activating, with AND logic, as seen in the C1-FFL and I4-FFL. To help conceptualize how the logic translates into expression of Z before we get into the mathematical expressions, we can construct a truth table for whether or not Z is on, given the on/off status of X and Y. The truth table is shown below, with a zero entry meaning that the gene is not on and a one entry meaning it is on.
X | Y | Z |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
We can also construct a truth table for OR logic with X and Y both activating.
X | Y | Z |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Following the above prescription, the dimensionless regulatory functions are
\begin{align} &\text{AND logic: } f(x,y) = \frac{x^{n_x} y^{n_y}}{(1 + x^{n_x})(1 + y^{n_y})},\\[1em] &\text{OR logic: } f(x,y) = \frac{x^{n_x} + y^{n_y} + x^{n_x} y^{n_y}}{(1 + x^{n_x})(1 + y^{n_y})}. \end{align}
If only single-occupancy is allowed, the gene can never be activated with AND logic, and the regulatory function with OR logic is
\begin{align} &\text{OR logic (single occupancy): } f(x,y) = \frac{x^{n_x} + y^{n_y}}{1 + x^{n_x} + y^{n_y}}. \end{align}
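As a quick check, the expressions above can be evaluated at inputs well below and well above the (dimensionless) threshold of one and compared with the truth tables. The functions below are written directly from the formulas in this section; they borrow the names of the biocircuits functions used in this chapter but are independent re-implementations, and the values of `low`, `high`, and `n` are arbitrary illustrative choices.

```python
def aa_and(x, y, nx, ny):
    """Two activators, AND logic (dimensionless)."""
    return x**nx * y**ny / ((1 + x**nx) * (1 + y**ny))


def aa_or(x, y, nx, ny):
    """Two activators, OR logic (dimensionless)."""
    return (x**nx + y**ny + x**nx * y**ny) / ((1 + x**nx) * (1 + y**ny))


def aa_or_single(x, y, nx, ny):
    """Two activators, OR logic, single occupancy (dimensionless)."""
    return (x**nx + y**ny) / (1 + x**nx + y**ny)


# "Off" and "on" are represented by concentrations well below and well above 1
low, high, n = 0.1, 10.0, 4
truth_rows = {}
for x, y in [(low, low), (low, high), (high, low), (high, high)]:
    # Round each regulatory function to 0 or 1 to compare with the truth tables
    row = [round(f(x, y, n, n)) for f in (aa_and, aa_or, aa_or_single)]
    truth_rows[(int(x > 1), int(y > 1))] = row
    print(int(x > 1), int(y > 1), row)
```

Each printed row lists the on/off state of X and Y followed by the rounded outputs for AND, OR, and single-occupancy OR logic.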
We can make plots of these regulatory functions to demonstrate how they represent the respective logic. To accentuate the logic, we will choose very sharp Hill functions with \(n_x = n_y = 20\).
[52]:
def xyz_im_plot(x, y, z, x_log, y_log, z_log, title=None, palette="Viridis256"):
    """Display x, y, z data as an image."""
    # Plot on logarithmic axes
    p_log = bokeh.plotting.figure(
        frame_height=200,
        frame_width=200,
        x_range=(x_log.min(), x_log.max()),
        y_range=(y_log.min(), y_log.max()),
        x_axis_label="x",
        y_axis_label="y",
        title=title,
        toolbar_location=None,
        x_axis_type="log",
        y_axis_type="log",
    )
    p_log.image(
        image=[z_log],
        x=x_log.min(),
        y=y_log.min(),
        dw=x_log.max() - x_log.min(),
        dh=y_log.max() - y_log.min(),
        palette=palette,
        alpha=0.8,
    )
    # Plot on linear axes
    p = bokeh.plotting.figure(
        frame_height=200,
        frame_width=200,
        x_range=(x.min(), x.max()),
        y_range=(y.min(), y.max()),
        x_axis_label="x",
        y_axis_label="y",
        title=title,
        toolbar_location=None,
    )
    p.image(
        image=[z],
        x=x.min(),
        y=y.min(),
        dw=x.max() - x.min(),
        dh=y.max() - y.min(),
        palette=palette,
        alpha=0.8,
    )
    # Show the log plot by default; the radio buttons toggle between the two
    p_log.visible = True
    p.visible = False
    radio_button_group = bokeh.models.RadioButtonGroup(
        labels=["log", "linear"], active=0, width=100
    )
    col = bokeh.layouts.column(
        p_log, p, bokeh.layouts.row(bokeh.models.Spacer(width=100), radio_button_group)
    )
    radio_button_group.js_on_change(
        "active",
        bokeh.models.CustomJS(
            args=dict(p_log=p_log, p=p),
            code="""
            if (p_log.visible == true) {
                p_log.visible = false;
                p.visible = true;
            }
            else {
                p_log.visible = true;
                p.visible = false;
            }
            """,
        )
    )
    return col
# Get x and y values for plotting
x_log = np.logspace(-2, 2, 200)
y_log = np.logspace(-2, 2, 200)
x = np.linspace(0, 2, 200)
y = np.linspace(0, 2, 200)
xx, yy = np.meshgrid(x, y)
xx_log, yy_log = np.meshgrid(x_log, y_log)
# Parameters (steep Hill functions)
nx = 20
ny = 20
# Generate plots
p_and = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_and(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_and(xx_log, yy_log, nx, ny),
    title="two activators, AND logic",
)
p_or = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_or(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_or(xx_log, yy_log, nx, ny),
    title="two activators, OR logic",
)
p_or_single = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_or_single(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_or_single(xx_log, yy_log, nx, ny),
    title="two act., OR logic, single occ.",
)
bokeh.io.show(
    bokeh.layouts.column(
        bokeh.layouts.row(p_and, bokeh.models.Spacer(width=30), p_or),
        bokeh.models.Spacer(height=20),
        bokeh.layouts.row(bokeh.models.Spacer(width=300), p_or_single),
    )
)