• A sensitivity analysis of a biological module discovery pipeline

      Long, James; Roth, Mitchell; Rhodes, John; Marr, Thomas; Hartman, Chris (2015-05)
      Gene expression is the term applied to the combination of transcription, the process of copying information stored in DNA (deoxyribonucleic acid) into a transcript, and translation, the process of reading a transcript in order to manufacture a cellular product. Cellular products are typically proteins, which can combine either structurally or in concert to accomplish one or more tasks. Cooperating protein combinations are called modules, and it is thought that groups of transcripts whose concentrations are highly correlated may indicate such modules. An open-source version of the CODENSE algorithm was developed with improved correlation methods to computationally test this hypothesis on an artificial transcription network containing a known module motif. The artificial network was used as input to a biochemical simulator in order to obtain synthetic transcription data, which were then fed to the module discovery pipeline. Any discovered modules were compared to the known modules in the original network during a sensitivity analysis, in which the process was repeated thousands of times with slightly varied parameters for each run. This process quantifies the sensitivity of the pipeline output to each pipeline parameter; the most sensitive parameters suggest which parts of the pipeline may be candidates for further refinement. The sensitivity analysis was then extended to include variation of biological network parameters and noisy data. Lessons learned were then applied to the case of two known modules.
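
      The sensitivity-analysis loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: a toy Gaussian generator stands in for the biochemical simulator, a simple pairwise Pearson threshold stands in for CODENSE, and recovery is scored with a Jaccard index. All function names, the module layout, and the parameter values are hypothetical and chosen only for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def synthetic_expression(n_genes, module, n_samples, noise, rng):
    """Toy stand-in for the simulator (assumption): module genes share one signal."""
    signal = [rng.gauss(0, 1) for _ in range(n_samples)]
    data = []
    for gene in range(n_genes):
        if gene in module:
            data.append([s + rng.gauss(0, noise) for s in signal])
        else:
            data.append([rng.gauss(0, 1) for _ in range(n_samples)])
    return data


def discover_module(data, threshold):
    """Toy stand-in for module discovery (assumption): genes with a partner at |r| >= threshold."""
    found = set()
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if abs(pearson(data[i], data[j])) >= threshold:
                found.update((i, j))
    return found


def jaccard(a, b):
    """Overlap between the discovered and the known module, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_score(threshold, noise, repeats=20, seed=0):
    """Average recovery score over repeated runs at one parameter setting."""
    rng = random.Random(seed)
    module = {0, 1, 2, 3}  # hypothetical known module
    total = 0.0
    for _ in range(repeats):
        data = synthetic_expression(12, module, 50, noise, rng)
        total += jaccard(discover_module(data, threshold), module)
    return total / repeats


if __name__ == "__main__":
    # One-at-a-time sweeps: vary one parameter while holding the other fixed.
    for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
        print("threshold", threshold, "score", round(mean_score(threshold, 0.5), 3))
    for noise in (0.1, 0.5, 1.0, 1.5, 2.0):
        print("noise", noise, "score", round(mean_score(0.7, noise), 3))
```

      How strongly the mean score changes across each sweep gives a rough indication of how sensitive the toy pipeline is to that parameter. A real analysis would vary many more parameters over far more runs, as the abstract describes.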