4. Analysis of feedforward loops
Design principles
The C1-FFL with AND logic has an on-delay, but no off-delay.
The C1-FFL with OR logic has an off-delay, but no on-delay.
The C1-FFL with both AND and OR logic can filter out short input impulses.
The I1-FFL with AND logic is a pulse generator and also speeds response compared to an unregulated circuit.
Concept
When multiple factors regulate a single gene, we need to specify the logic of the regulation, usually OR or AND.
[49]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    # Install the packages this chapter needs when running on Google Colab
    cmd = "pip install --upgrade colorcet biocircuits watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
# ------------------------------
import numpy as np
import scipy.integrate
import biocircuits
import biocircuits.apps
import colorcet
colors = colorcet.b_glasbey_category10
import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.plotting
# Set to True to have fully interactive plots
interactive_python_plots = False
notebook_url = "localhost:8888"
bokeh.io.output_notebook()
Finding 3-gene motifs in a bacterial transcriptional network
After three chapters, we have developed a toolbox of numerical, analytical, and plotting-based approaches, and applied them to various single-gene and two-gene circuits. In the last chapter, we discussed a toggle switch that generates bistability via mutual repression by two repressor proteins. We will now move on to systems with more components and greater functional complexity, starting with three-component circuits.
There are thirteen different ways in which three transcription factors can regulate one another, as diagrammed below. Which ones should we analyze first? A sensible choice is to again look toward natural gene regulatory networks and ask which of these potential circuits are network motifs, that is, which (if any) are statistically overrepresented in natural circuits.
Image adapted from Milo, et al., 2002.
Milo and coworkers (Milo, et al., 2002) performed just such an analysis. The authors tabulated the number of times that each of these thirteen regulatory patterns appeared in maps of natural transcriptional circuits. But a question remains: how often would one expect to see any one of these patterns by chance within a circuit of that size? To answer this question, a null model for a network is required. We hinted at this concept in Chapter 2, and here we go into more detail.
Random graphs enable motif identification
Consider a directed graph representing a hypothetical regulatory circuit (left). As before, each numbered node in this network represents an operon or transcription factor, and arrows indicate regulation of the target node by a transcription factor in the source node. To find overrepresented patterns in this graph, we can compare it to an ensemble of randomized variants and look for subgraphs that occur more frequently in the real circuit than in the random variants.
Schematic comparison between a specific circuit (left) and a set of randomized variants with the same incoming and outgoing arrow distributions (right). Image modified from Milo et al, Science, 2002.
To make the comparison as fair as possible, we need to design variant graphs that maintain the key statistical properties of the original graph. Specifically, we insist that all variant graphs have the same number of nodes and arrows. But this is not really enough: It would not make sense to compare a graph whose arrows are distributed over all the nodes to a variant in which, say, all arrows stem from a single source node, or converge on a single target node. We therefore need to impose a stronger constraint: all variant graphs should maintain the exact distribution of incoming and outgoing arrows for each node.
If you examine the graphs above, you can see that they were constructed to have the same numbered nodes. Furthermore, the number of arrows exiting and entering each numbered node is exactly the same in all graphs. For example, node 4 always has 2 outgoing arrows. Node 6 always has one incoming and two outgoing arrows, node 12 always has 2 incoming arrows, and so on.
To generate the randomized graphs, imagine cutting each arrow in half with scissors to generate a dangling “+” end connected to an arrowhead and a dangling “–” end connected to a source node. Then imagine tying each “+” end to a randomly chosen “–” end. Voila, we have a new randomized graph, guaranteed through this procedure to maintain the same joint distribution of incoming and outgoing arrows. For more details, see the algorithms in Shen-Orr et al, 2002 and in Newman et al, 2001.
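The cut-and-retie procedure described above can be sketched in a few lines. This is a minimal illustration, not the algorithm of the cited papers: the function names and the toy edge list are hypothetical, and the sketch makes no attempt to avoid self-loops or duplicate arrows, which a careful implementation would need to handle.

```python
import numpy as np


def randomize_graph(edges, rng):
    """Return a rewired copy of a directed edge list.

    Each arrow (source, target) is cut in two. The arrowhead ends are
    shuffled and tied back to the source ends, so every node keeps its
    number of outgoing and incoming arrows.
    """
    edges = np.asarray(edges)
    sources = edges[:, 0]
    targets = rng.permutation(edges[:, 1])
    return np.column_stack((sources, targets))


def degrees(edge_array, column):
    """Count how often each node appears in a column (0: out, 1: in)."""
    nodes, counts = np.unique(np.asarray(edge_array)[:, column], return_counts=True)
    return dict(zip(nodes.tolist(), counts.tolist()))


# Hypothetical toy circuit, given as (source, target) pairs
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1), (4, 2)]
rng = np.random.default_rng(seed=3252)
rewired = randomize_graph(edges, rng)

# The out-degree and in-degree of every node are unchanged
print(degrees(edges, 0) == degrees(rewired, 0))
print(degrees(edges, 1) == degrees(rewired, 1))
```

Calling `randomize_graph` repeatedly with the same generator yields an ensemble of variants against which subgraph counts in the real circuit can be compared.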
The feedforward loop is overrepresented in natural transcriptional networks
Now that we understand how to formulate a proper null model for motif identification, we return to the analysis by Milo, et al. The authors used a z-score to quantify over- or underrepresentation in units of the standard deviation of the number of occurrences for the subgraph in the randomized circuits,
\begin{align} z = \frac{n_{obs} - \langle n\rangle}{\sigma}, \end{align}
where \(n_{obs}\) is the number of times the subgraph was observed in the actual circuit, \(\langle n\rangle\) is the mean number of times it was observed in randomized circuits, and \(\sigma = \sqrt{\langle (n - \langle n \rangle)^2\rangle}\) is the standard deviation of the number of times it was observed in randomized circuits.
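As a small numerical illustration of this definition, the sketch below computes a z-score from a list of subgraph counts in randomized circuits. The counts are made up for the example and are not taken from Milo, et al.; the function name `z_score` is likewise hypothetical.

```python
import numpy as np


def z_score(n_obs, n_random):
    """z-score of an observed count relative to counts in randomized circuits."""
    n_random = np.asarray(n_random, dtype=float)
    # np.std uses the population form sqrt(<(n - <n>)^2>) by default,
    # which matches the definition of sigma in the text
    return (n_obs - n_random.mean()) / n_random.std()


# Made-up counts of one subgraph in eight randomized circuits
n_random = [5, 9, 7, 4, 10, 6, 8, 7]

# Made-up count of the same subgraph in the "real" circuit
n_obs = 15

print(z_score(n_obs, n_random))
```

A large positive value indicates overrepresentation relative to the null model, and a large negative value indicates underrepresentation.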
The authors computed z-scores for several transcriptional circuits, including E. coli, yeast (two versions), and B. subtilis. In the plot below (data digitized from Milo, et al., 2004), the four networks are so similar in their z-score profiles that the different organisms overlap almost perfectly.
[50]:
# Motif numbers, in the order they are plotted
x = np.array([1, 4, 2, 7, 3, 8, 5, 9, 11, 6, 10, 12, 13])

# Normalized z-scores; negative values mean underrepresentation
z_ecoli = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_yeast_1 = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_yeast_2 = np.array([-0.5, -0.5, -0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0])
z_bacillus = np.array([-0.5, -0.5, -0.5, 0, 0.032, 0, 0.5, 0, 0, 0, 0, 0, 0])

p = bokeh.plotting.figure(
    frame_width=500,
    frame_height=100,
    x_axis_label="motif",
    y_axis_label="normalized z-score",
    x_range=[0, 14],
    y_range=[-1.0, 0.7],
    tools="save",
)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.ticker = bokeh.models.tickers.FixedTicker(ticks=x)
p.yaxis.ticker = bokeh.models.tickers.FixedTicker(ticks=[-0.5, 0, 0.5])
p.ray(0, 0, color='black', line_width=1)
p.ray(0, 0, angle=np.pi, color='black', line_width=1)
ecoli = p.circle(x, z_ecoli, fill_alpha=0, size=10)
yeast_1 = p.square(x, z_yeast_1, fill_alpha=0, size=10, color=colors[1])
yeast_2 = p.diamond(x, z_yeast_2, fill_alpha=0, size=12, color=colors[2])
bacillus = p.star(x, z_bacillus, fill_alpha=0, size=10, color=colors[3])
items = [("E. coli", [ecoli]), ("yeast 1", [yeast_1]), ("yeast 2", [yeast_2]), ("B. subtilis", [bacillus])]
legend = bokeh.models.Legend(items=items)
legend.click_policy = "hide"
p.add_layout(legend, 'right')
bokeh.io.show(p)
As you can see, one particular motif, number 5, is overrepresented in all of the transcriptional networks:
This subgraph, termed the feedforward loop (FFL), has one node that regulates a target node two ways: directly, and indirectly through the third node.
Several features are striking from their results:
There is strong overrepresentation of feedforward loops (FFLs). In E. coli, one expects to see 7±5 FFLs by chance, but one observes this pattern 42 times in the real circuit.
The overrepresentation of FFLs is conserved across three distinct organisms, suggesting it is a general property of transcriptional circuits.
Most other subgraphs are neither over- nor underrepresented.
Three subgraphs are statistically underrepresented. They occur significantly less often than one would expect by chance, provoking the question of what problems they might present as components of larger circuits. These three subgraphs are all subgraphs of the FFL subgraph. Thus, it is not that, e.g., divergent regulation (represented by subgraph 1) is rare; it is just that when it does occur, it does so in the context of the FFL.
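To make the counting of FFLs concrete, here is a minimal sketch that counts feedforward loops in a directed edge list. It assumes that a triple counts as an FFL only when exactly the three FFL arrows are present among the three nodes (no arrows in the reverse directions); the function name and the toy circuit are hypothetical.

```python
from itertools import permutations


def count_ffls(edges):
    """Count node triples (X, Y, Z) with X -> Y, Y -> Z, X -> Z and no other arrows."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    for x, y, z in permutations(nodes, 3):
        has_ffl_arrows = (
            (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
        )
        has_reverse_arrows = (
            (y, x) in edge_set or (z, y) in edge_set or (z, x) in edge_set
        )
        if has_ffl_arrows and not has_reverse_arrows:
            count += 1
    return count


# Hypothetical toy circuit, given as (source, target) pairs
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4), (4, 5)]

# The triples (1, 2, 3) and (1, 3, 4) each form a feedforward loop
print(count_ffls(edges))
```

Because the three nodes of an FFL play distinguishable roles, each loop is found for exactly one ordering of its nodes, so no correction for double counting is needed. The brute-force loop over all ordered triples is only practical for small circuits.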
There are many kinds of FFLs
Our description of regulatory interactions in the FFL so far has been oversimplified in several ways:
We have not distinguished between positive regulation (activation) and negative regulation (repression).
We have not considered how multiple regulators combine to control expression of a mutual target operon.
We have ignored the quantitative aspects of regulation.
Understanding the biological function of a motif requires thinking about these aspects more carefully.
To address the first point, we can classify the overall FFL motif into \(2^3=8\) different categories depending on which of its 3 arrows are positive or negative:
In this classification, half of the FFL architectures are coherent, meaning that X’s direct regulation of Z and its indirect regulation of Z are of the same type, both activating or both repressing. The other half are incoherent, meaning that the direct and indirect regulatory paths have opposite signs.
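The coherent/incoherent distinction can be checked by enumeration. In the sketch below, activation is encoded as +1 and repression as -1 (an encoding chosen for this example); the indirect path X → Y → Z has the sign of the product of its two arrows, and the FFL is coherent when that product equals the sign of the direct arrow X → Z.

```python
from itertools import product


def is_coherent(sign_xy, sign_yz, sign_xz):
    """True if the indirect path X -> Y -> Z has the same sign as the direct arrow X -> Z."""
    return sign_xy * sign_yz == sign_xz


# Enumerate all 2^3 = 8 sign assignments for the three arrows
n_coherent = 0
for sign_xy, sign_yz, sign_xz in product((+1, -1), repeat=3):
    coherent = is_coherent(sign_xy, sign_yz, sign_xz)
    n_coherent += coherent
    label = "coherent" if coherent else "incoherent"
    print(f"X->Y: {sign_xy:+d}  Y->Z: {sign_yz:+d}  X->Z: {sign_xz:+d}  {label}")

print(n_coherent, "coherent,", 8 - n_coherent, "incoherent")
```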
We can further subclassify the FFLs according to how the regulatory arrows converging on the third node (now labeled “Z”) combine. Consider first the example where both X and Y activate Z, as in the C1-FFL and I4-FFL. In AND regulation, both X and Y need to be simultaneously present at high levels for Z to be expressed. In OR regulation, either input being at a high level is sufficient to activate Z. We will discuss this logic in more detail momentarily.
Now that we have defined what FFLs are and how we can represent them, we will spend the rest of this chapter (and even some beyond) considering what functions the various FFLs can perform for cells.
The most-encountered FFLs
While FFLs in general are motifs, some FFLs are more often encountered than others. In the figure below, using data taken from a review by Uri Alon, we see the relative abundance of the eight different FFLs in E. coli and S. cerevisiae. Two FFLs, C1-FFL and I1-FFL, stand out as having much higher abundance than the other six. We will focus our study on these two in this chapter.
[51]:
# Data based on Alon, Nature Rev. Genet., 2007, https://doi.org/10.1038/nrg2102
import bokeh.transform  # provides factor_cmap, used below

species = ["yeast", "E. coli"]
ffls = reversed(["C1 ", "C2 ", "C3 ", "C4 ", "I1 ", "I2 ", "I3 ", "I4 "])
data = {
    "species": species,
    "E. coli": reversed([0.464, 0.09, 0, 0, 0.374, 0.055, 0.017, 0]),
    "yeast": reversed([0.377, 0.035, 0.105, 0.08, 0.28, 0.027, 0.052, 0.044]),
}

# Nested factors: one (FFL, species) pair per bar
x = [(ffl, sp) for ffl in ffls for sp in species]

# Interleave the fractions in the same order as `species` (yeast, then E. coli)
frac = sum(zip(data["yeast"], data["E. coli"]), ())

source = bokeh.models.ColumnDataSource(data=dict(x=x, frac=frac))

p = bokeh.plotting.figure(
    y_range=bokeh.models.FactorRange(*x),
    frame_height=450,
    title="Relative abundance of FFLs",
)
p.hbar(
    y="x",
    left=0,
    right="frac",
    height=0.9,
    source=source,
    fill_color=bokeh.transform.factor_cmap(
        "x", palette=colors, factors=species, start=1, end=2
    ),
    line_color="white",
)
p.x_range.start = 0
p.y_range.range_padding = 0.05
p.yaxis.group_label_orientation = "horizontal"
p.ygrid.grid_line_color = None
bokeh.io.show(p)
Logic of regulation by two transcription factors
Because X and Y both regulate Z in an FFL, we need to specify how they collaborate in the regulation.
For the sake of illustration, let us assume we are discussing the C1-FFL, where X activates Z and Y also activates Z. One can imagine a scenario where both X and Y need to be present to turn on Z. For example, they could be binding partners that together serve to recruit polymerase to the promoter. We call this AND logic. In other words, to get expression of Z, we must have X AND Y. Conversely, if either X or Y alone can activate Z, we have OR logic. That is, to get expression of Z, we must have X OR Y.
So, to fully specify an FFL, we need to also specify the logic, either AND or OR, of how Z is regulated. Including choice of logic gives a total of 16 possible FFLs.
We are now left with the task of figuring out how to mathematically encode AND and OR logic. Before doing so, we note that, as discussed previously, we are using Hill functions, which are phenomenological functions describing how effectors may regulate gene expression, capturing both the necessary concentration of effector (\(k\)) and the ultrasensitivity of the regulation (\(n\)). When the molecular details of how an effector regulates a gene are known, we may derive the appropriate functions describing gene expression regulation rather than using Hill functions. Similarly, for two effectors, we could also derive the functions from the molecular details and discover what kind of logic emerges. See, for example, this 2005 paper by Bintu and coworkers. We often do not know the molecular details, however, and Hill functions and the two-effector variants thereof we present below are quite useful in analyzing the properties of circuit architectures.
We now proceed to formally write mathematical expressions for the dynamics of a gene product Z under regulatory control of effectors X and Y. The dynamics of the concentration of Z may be written as
\begin{align} \frac{\mathrm{d}z}{\mathrm{d}t} = \beta \,f(x, y) - \gamma z, \end{align}
where the lowercase letters denote the concentrations of the respective species.
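As a quick numerical check of this equation, the sketch below integrates it for inputs held constant, so that \(f(x, y)\) is just a number. The parameter values are arbitrary choices for illustration. With \(z(0) = 0\), the solution should relax toward the steady state \(\beta f / \gamma\).

```python
import numpy as np
import scipy.integrate


def dz_dt(t, z, beta, gamma, f_xy):
    """Right-hand side of dz/dt = beta * f(x, y) - gamma * z for constant f(x, y)."""
    return beta * f_xy - gamma * z


# Arbitrary illustrative parameter values
beta = 2.0
gamma = 1.0
f_xy = 0.5

t = np.linspace(0, 10, 200)
sol = scipy.integrate.solve_ivp(
    dz_dt,
    (t[0], t[-1]),
    [0.0],
    t_eval=t,
    args=(beta, gamma, f_xy),
    rtol=1e-8,
    atol=1e-10,
)
z = sol.y[0]

# Steady state obtained by setting dz/dt = 0
z_ss = beta * f_xy / gamma
print(z[-1], z_ss)
```

For time-varying inputs, `f_xy` would be replaced by a regulatory function evaluated at \(x(t)\) and \(y(t)\).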
Our goal is to specify the dimensionless regulatory function \(f(x, y)\) that encodes how X and Y may together regulate Z. Our approach is to assign a “weight” to each state of a promoter region. With two effectors, X and Y, the promoter region could be unbound, bound with X, bound with Y, or bound with both X and Y. To get the regulatory function, we sum the weights of states that allow polymerase binding and divide by the sum of all weights. This gives the fraction of time that expression of the gene is “on.” For example, if X and Y are both activators and they together have AND logic, we have
\begin{align} f(x, y) = \frac{\text{X and Y bound weight}}{(\text{unbound weight}) + (\text{X bound weight}) + (\text{Y bound weight}) + (\text{X and Y bound weight})} \end{align}
The weights are chosen to give Hill-like functions.

| promoter region state | weight | dimensionless weight |
|---|---|---|
| unbound | \(1\) | \(1\) |
| X bound | \((x/k_x)^{n_x}\) | \(x^{n_x}\) |
| Y bound | \((y/k_y)^{n_y}\) | \(y^{n_y}\) |
| X and Y bound | \((x/k_x)^{n_x}\,(y/k_y)^{n_y}\) | \(x^{n_x}\,y^{n_y}\) |

The dimensionless weights are given by substituting \(x \leftarrow x/k_x\) and \(y \leftarrow y/k_y\). We will use the dimensionless versions of these functions henceforth. We note that the denominator of the regulatory function \(f(x,y)\) is always the same,
\begin{align} 1 + x^{n_x} + y^{n_y} + x^{n_x} y^{n_y} = (1 + x^{n_x})(1 + y^{n_y}). \end{align}
Alternatively, we could have a structure where at most one of the two effectors may be bound at a time (for example, for steric reasons). In this case, the states and weights are given in the table below.

| promoter region state | weight | dimensionless weight |
|---|---|---|
| unbound | \(1\) | \(1\) |
| X bound | \((x/k_x)^{n_x}\) | \(x^{n_x}\) |
| Y bound | \((y/k_y)^{n_y}\) | \(y^{n_y}\) |

In this case, the denominator for all of the regulatory functions is \(1 + x^{n_x} + y^{n_y}\). We will refer to such regulatory functions as corresponding to “single occupancy.”
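The weight-summing prescription can be written as one small function. The sketch below is an illustration under the dimensionless weights tabulated above; the function name, the state labels, and the `single_occupancy` flag are choices made for this example and are not part of the `biocircuits` package.

```python
import numpy as np


def regulatory_function(x, y, nx, ny, active_states, single_occupancy=False):
    """Sum of weights of expressing promoter states over the sum of all weights.

    `active_states` lists the states that allow polymerase binding, drawn
    from "unbound", "X", "Y", and "XY". x and y are dimensionless.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    weights = {
        "unbound": np.ones(np.broadcast(x, y).shape),
        "X": x**nx,
        "Y": y**ny,
    }
    if not single_occupancy:
        # The doubly bound state exists only when both effectors can bind at once
        weights["XY"] = x**nx * y**ny

    numerator = sum(weights[s] for s in active_states if s in weights)
    denominator = sum(weights.values())
    return numerator / denominator


# Two activators with x = 2, y = 0.5 and Hill coefficients of 2
f_and = regulatory_function(2.0, 0.5, 2, 2, active_states=("XY",))
f_or = regulatory_function(2.0, 0.5, 2, 2, active_states=("X", "Y", "XY"))
f_or_single = regulatory_function(
    2.0, 0.5, 2, 2, active_states=("X", "Y"), single_occupancy=True
)
print(f_and, f_or, f_or_single)
```

Here the four weights are 1, 4, 0.25, and 1, so the full denominator is 6.25, which equals \((1 + 4)(1 + 0.25)\) as the factorized form requires.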
With this prescription, let us proceed to write the regulatory functions \(f(x, y)\) for various architectures.
Logic with two activators
Let us start first with X and Y, both activating, with AND logic, as seen in the C1-FFL and I4-FFL. To help conceptualize how the logic translates into expression of Z before we get into the mathematical expressions, we can construct a truth table for whether or not Z is on, given the on/off status of X and Y. The truth table is shown below, with a zero entry meaning that the gene is not on and a one entry meaning it is on.

| X | Y | Z |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

We can also construct a truth table for OR logic with X and Y both activating.

| X | Y | Z |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Following the above prescription, the dimensionless regulatory functions are
\begin{align} &\text{AND logic: } f(x,y) = \frac{x^{n_x} y^{n_y}}{(1 + x^{n_x})(1 + y^{n_y})},\\[1em] &\text{OR logic: } f(x,y) = \frac{x^{n_x} + y^{n_y} + x^{n_x} y^{n_y}}{(1 + x^{n_x})(1 + y^{n_y})}. \end{align}
If only single occupancy is allowed, the gene can never be activated with AND logic, and the regulatory function with OR logic is
\begin{align} &\text{OR logic (single occupancy): } f(x,y) = \frac{x^{n_x} + y^{n_y}}{1 + x^{n_x} + y^{n_y}}. \end{align}
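One way to convince ourselves that these expressions encode the intended logic is to evaluate them at a "low" and a "high" input with steep Hill functions and compare against the truth tables. The functions below are standalone transcriptions of the formulas above written for this check (the names `f_and`, `f_or`, and `f_or_single` are hypothetical); the low and high values 0.5 and 2 are arbitrary choices on either side of the dimensionless threshold of 1.

```python
import numpy as np


def f_and(x, y, nx, ny):
    """Two activators, AND logic (dimensionless)."""
    return x**nx * y**ny / ((1 + x**nx) * (1 + y**ny))


def f_or(x, y, nx, ny):
    """Two activators, OR logic (dimensionless)."""
    return (x**nx + y**ny + x**nx * y**ny) / ((1 + x**nx) * (1 + y**ny))


def f_or_single(x, y, nx, ny):
    """Two activators, OR logic, single occupancy (dimensionless)."""
    return (x**nx + y**ny) / (1 + x**nx + y**ny)


nx = ny = 20
levels = {0: 0.5, 1: 2.0}  # "off" and "on" input concentrations

print("X Y | AND OR OR(single)")
truth = {}
for x_on in (0, 1):
    for y_on in (0, 1):
        x, y = levels[x_on], levels[y_on]
        # Round the regulatory functions to the nearest logical value
        row = tuple(
            int(np.round(f(x, y, nx, ny))) for f in (f_and, f_or, f_or_single)
        )
        truth[(x_on, y_on)] = row
        print(x_on, y_on, "|", *row)
```

The rounded outputs should reproduce the AND and OR truth tables, with the single-occupancy OR function agreeing with the full OR function at these input levels.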
We can make plots of these regulatory functions to demonstrate how they represent the respective logic. To accentuate the logic, we will choose very sharp Hill functions with \(n_x = n_y = 20\).
[52]:
def xyz_im_plot(x, y, z, x_log, y_log, z_log, title=None, palette="Viridis256"):
    """Display x, y, z data as an image, with a toggle between log and linear axes."""
    # Plot on logarithmic axes
    p_log = bokeh.plotting.figure(
        frame_height=200,
        frame_width=200,
        x_range=(x_log.min(), x_log.max()),
        y_range=(y_log.min(), y_log.max()),
        x_axis_label="x",
        y_axis_label="y",
        title=title,
        toolbar_location=None,
        x_axis_type="log",
        y_axis_type="log",
    )
    p_log.image(
        image=[z_log],
        x=x_log.min(),
        y=y_log.min(),
        dw=x_log.max() - x_log.min(),
        dh=y_log.max() - y_log.min(),
        palette=palette,
        alpha=0.8,
    )

    # Plot on linear axes
    p = bokeh.plotting.figure(
        frame_height=200,
        frame_width=200,
        x_range=(x.min(), x.max()),
        y_range=(y.min(), y.max()),
        x_axis_label="x",
        y_axis_label="y",
        title=title,
        toolbar_location=None,
    )
    p.image(
        image=[z],
        x=x.min(),
        y=y.min(),
        dw=x.max() - x.min(),
        dh=y.max() - y.min(),
        palette=palette,
        alpha=0.8,
    )

    # Show the log plot first
    p_log.visible = True
    p.visible = False

    radio_button_group = bokeh.models.RadioButtonGroup(
        labels=["log", "linear"], active=0, width=100
    )
    col = bokeh.layouts.column(
        p_log, p, bokeh.layouts.row(bokeh.models.Spacer(width=100), radio_button_group)
    )

    # Swap which plot is visible whenever the selection changes
    radio_button_group.js_on_change(
        "active",
        bokeh.models.CustomJS(
            args=dict(p_log=p_log, p=p),
            code="""
            if (p_log.visible == true) {
                p_log.visible = false;
                p.visible = true;
            }
            else {
                p_log.visible = true;
                p.visible = false;
            }
            """,
        )
    )

    return col
# Get x and y values for plotting
x_log = np.logspace(-2, 2, 200)
y_log = np.logspace(-2, 2, 200)
x = np.linspace(0, 2, 200)
y = np.linspace(0, 2, 200)
xx, yy = np.meshgrid(x, y)
xx_log, yy_log = np.meshgrid(x_log, y_log)

# Parameters (steep Hill functions)
nx = 20
ny = 20

# Generate plots
p_and = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_and(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_and(xx_log, yy_log, nx, ny),
    title="two activators, AND logic",
)
p_or = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_or(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_or(xx_log, yy_log, nx, ny),
    title="two activators, OR logic",
)
p_or_single = xyz_im_plot(
    xx,
    yy,
    biocircuits.aa_or_single(xx, yy, nx, ny),
    xx_log,
    yy_log,
    biocircuits.aa_or_single(xx_log, yy_log, nx, ny),
    title="two act., OR logic, single occ.",
)

bokeh.io.show(
    bokeh.layouts.column(
        bokeh.layouts.row(p_and, bokeh.models.Spacer(width=30), p_or),
        bokeh.models.Spacer(height=20),
        bokeh.layouts.row(bokeh.models.Spacer(width=300), p_or_single),
    )
)