The Cancer Genome Atlas is an incredible project to collect and analyze thousands of samples from dozens of tumor types. The dataset poses an enormous challenge in terms of scale, complexity, and scope. This tool reflects the insights produced by comparing data types of very different natures. Many are genomic (and can be mapped to a chromosomal position), but some are not (like clinical information and histopathologic findings). Much of the data has either continuous values (like gene expression) or categorical values (like tumor subtype), but some of it has ordinal values. Data may be nearly complete across all samples or very sparse. Comparing and analyzing these data equitably, using statistical methods as well as machine learning, is valuable in building a more inclusive picture of the biological processes that lead to tumor formation and growth.
RE is a window onto the analytical research and computation done by my colleagues. The circular ideogram layout, called Circvis, displays data features and associations. The edges inside the circle indicate associations between pairs of genomic features. The inner ring displays associations between genomic features and clinical features. The outer ring shows the karyotype bands, which visually partition the chromosomes for geneticists, much like the outlines of nations on a map. The accompanying filter panel is used to define the types of data and associations to search for.
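As a rough illustration of the layout rules just described, here is a minimal Python sketch of how an association might be routed to a part of the circle. It is not the code behind RE or Circvis: the Feature and Association classes, the placement function, the returned labels, and the example features with their coordinates are all made up for illustration. The only rules taken from the description are that genomic pairs are drawn as edges inside the circle and that genomic-to-clinical associations go on the inner ring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """A data feature (hypothetical model, not RE's actual schema)."""
    label: str
    chromosome: Optional[str] = None  # None for non-genomic features
    position: Optional[int] = None    # base-pair coordinate, if genomic

    @property
    def is_genomic(self) -> bool:
        # A feature counts as genomic if it maps to a chromosome.
        return self.chromosome is not None


@dataclass(frozen=True)
class Association:
    """A scored pairing of two features (hypothetical model)."""
    a: Feature
    b: Feature
    score: float


def placement(assoc: Association) -> str:
    """Decide where an association would be drawn in the circular layout."""
    genomic_count = sum(f.is_genomic for f in (assoc.a, assoc.b))
    if genomic_count == 2:
        return "edge"        # genomic pair: edge inside the circle
    if genomic_count == 1:
        return "inner_ring"  # genomic + non-genomic: inner ring
    # The description does not cover pairs with no genomic feature.
    return "unplaced"


# Made-up features; the labels and coordinates are not real data.
gene_a = Feature("GENE_A expression", chromosome="1", position=1_000_000)
gene_b = Feature("GENE_B copy number", chromosome="7", position=5_000_000)
subtype = Feature("tumor subtype")

print(placement(Association(gene_a, gene_b, 0.8)))   # edge
print(placement(Association(gene_a, subtype, 0.6)))  # inner_ring
```

Keeping the chromosome optional lets one record type cover both genomic and clinical features, which mirrors the mix of data types the tool has to compare.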
I wrote the original draft of the application. Jake Lin and I expanded that into the current UI. The backend services were created by Hector Rovira, Jake Lin, Andrea Eakin, and me. Timo Erkkila pushed us to create this interface. Along the way, Sheila Reynolds, Vesteinn Thorsson, Lisa Iype, and others have contributed great ideas and priceless feedback.