Problem 11.1: Temporal gene expression and inferring a circuit


You suspect three genes, A, B, and C, are part of a (possibly disconnected) three-gene circuit. You have fluorescent markers for each, so you can monitor the amount of A, B, and C protein in a given cell. You acquire temporal measurements of the protein quantity in 100 cells. (The cells are independent, so the levels in one cell do not affect those in another.) The data are available in the file problem_11.1_time_traces.csv. Note that these are not real data; they were generated using stochastic simulation of model circuits, a technique discussed in later chapters.

a) Make plots of protein level versus time for two or three cells. Hint: You can load in the data and parse it using Pandas. For example, to obtain the time points and intensities for cell 16, you could do the following.

[1]:
import pandas as pd

# Load the data into a Pandas data frame
df = pd.read_csv('problem_11.1_time_traces.csv')

# Extract the time points as a NumPy array
t = df.loc[df['cell']==16, 'time (min)'].values

# Extract the fluorescence values as NumPy arrays
a = df.loc[df['cell']==16, 'A intensity (a.u.)'].values
b = df.loc[df['cell']==16, 'B intensity (a.u.)'].values
c = df.loc[df['cell']==16, 'C intensity (a.u.)'].values

The measurement time points are the same for all cells and are evenly spaced; you can verify this yourself if you like.
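If you do want to check the time grid, the following is one possible sketch. The helper name `check_time_grid` is hypothetical, and the small data frame `demo` is a made-up stand-in so the snippet runs without the CSV file; it assumes only the `cell` and `time (min)` columns used in the hint above.

```python
import numpy as np
import pandas as pd


def check_time_grid(df, cell_col="cell", time_col="time (min)"):
    """Return (same_for_all_cells, evenly_spaced) for a tidy data frame.

    Hypothetical helper; column names default to those used in the hint.
    """
    # Collect each cell's time points as a NumPy array, in row order
    times = [g[time_col].values for _, g in df.groupby(cell_col)]
    ref = times[0]

    # Every cell must have the same number of points at the same times
    same = all(len(t) == len(ref) and np.allclose(t, ref) for t in times)

    # Evenly spaced means all consecutive differences are equal
    even = bool(np.allclose(np.diff(ref), ref[1] - ref[0]))

    return same, even


# Tiny synthetic stand-in for the real data set (values are made up)
demo = pd.DataFrame(
    {
        "cell": np.repeat([0, 1, 2], 5),
        "time (min)": np.tile(np.arange(5) * 10.0, 3),
    }
)
same_for_all_cells, evenly_spaced = check_time_grid(demo)
```

With the real data, you would pass the data frame `df` loaded above in place of `demo`.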

b) Compute the autocorrelation for the A, B, and C levels. Also compute all pairwise cross-correlations (A-B, A-C, and B-C). Display plots of these for the same cells you considered in part (a).
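Because the time points are evenly spaced, a correlation can be computed directly on the sample index. The sketch below is one possible convention, not the only one: it subtracts the means, uses the biased estimate (dividing by the number of samples at every lag), and normalizes by the standard deviations so that an autocorrelation equals 1 at zero lag. The function name `cross_corr` and the synthetic signals are assumptions made for illustration.

```python
import numpy as np


def cross_corr(x, y):
    """Normalized cross-correlation of two equal-length, evenly sampled signals.

    Returns (lags, r), where lags are in units of samples. With this
    convention, a peak at positive lag means y follows (lags behind) x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Work with fluctuations about the mean
    dx = x - x.mean()
    dy = y - y.mean()

    # np.correlate(dy, dx, "full")[i] = sum_k dy[k + lag] * dx[k],
    # with lag = i - (n - 1)
    r = np.correlate(dy, dx, mode="full")
    lags = np.arange(-(n - 1), n)

    # Biased estimate (divide by n), normalized by the standard deviations
    r = r / (n * dx.std() * dy.std())
    return lags, r


# Synthetic check (made-up signals): y is x delayed by 5 samples
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.roll(x, 5)

lags_xy, r_xy = cross_corr(x, y)   # cross-correlation, peak near lag +5
lags_xx, r_xx = cross_corr(x, x)   # autocorrelation, equals 1 at lag 0
```

To express lags in minutes, multiply `lags` by the spacing between time points. The sign convention for the lag varies between references, so state yours when interpreting which gene leads and which follows.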

c) Compute the mean autocorrelation and all mean pairwise cross-correlations. That is, average the results from part (b) over the 100 cells. Plot these mean correlations.
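Averaging over cells can be sketched as a loop that accumulates the per-cell correlations. The function name `mean_cross_corr` is hypothetical, the traces are assumed to be stacked into arrays of shape (number of cells, number of time points), and the data below are synthetic. The normalization (biased estimate, scaled by the standard deviations) is one choice among several.

```python
import numpy as np


def mean_cross_corr(xs, ys):
    """Cross-correlation averaged over cells.

    xs, ys : arrays of shape (n_cells, n_times), one row per cell.
    Returns (lags, mean_r) with lags in units of samples.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n_cells, n = xs.shape
    lags = np.arange(-(n - 1), n)

    total = np.zeros(2 * n - 1)
    for x, y in zip(xs, ys):
        # Fluctuations about each cell's own mean
        dx = x - x.mean()
        dy = y - y.mean()
        # Normalized, biased cross-correlation for this cell
        total += np.correlate(dy, dx, mode="full") / (n * dx.std() * dy.std())

    return lags, total / n_cells


# Synthetic traces (made up): in every cell, y is x delayed by 3 samples plus noise
rng = np.random.default_rng(1)
base = rng.standard_normal((20, 103))
xs = base[:, 3:]
ys = base[:, :-3] + 0.5 * rng.standard_normal((20, 100))

lags, mean_r = mean_cross_corr(xs, ys)
```

For the real data, one way to build such arrays (assuming each cell has one row per time point) is `df.pivot(index='cell', columns='time (min)', values='A intensity (a.u.)').values`. Passing the same array for both arguments gives the mean autocorrelation.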

d) From your analysis and plots in parts (b) and (c), propose one or two possible three-gene circuit architectures that are consistent with the measured data and explain why they may be reasonable. Propose one or two architectures that are not consistent with the measured data and explain why.

e) You suspect that a fourth gene, which we will call X, might be coupled to one or more of the components of the putative A-B-C circuit. The fluorescent intensities for X are also included in the data set. Is there evidence that X is connected to the A-B-C circuit?