HK Database and Human Gene Expression Analysis


Synopsis:

Every cell in the human body has the same DNA, yet the cells are clearly not the same: a brain cell differs from a liver cell, which differs from a skin cell. So what makes each cell different when they all have the same DNA, the same set of genes? The main answer is gene expression: which genes are active in which cell, and at what level.
Much research has been done on gene expression, a lot of it devoted to detecting patterns in expression. Genes fall into three main categories in terms of where they are expressed. Housekeeping genes are expressed in all tissues; they are generally genes that are important for basic cell maintenance. Tissue-specific genes, on the other hand, are expressed in only one tissue; they generally have functions unique to the tissue in which they appear. Most research on gene expression addresses these two categories, while the third is often dismissed. These are the "mid-range" or "intermediate" genes (no definitive term exists): genes that are expressed in some but not all tissues, in more tissues than tissue-specific genes and in fewer than housekeeping genes.
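The categories described above, together with genes that show no expression at all, can be sketched as a simple count of the tissues in which a gene is expressed. This is only an illustrative sketch: the function name classify_gene, the threshold parameter, and the toy tissue values are assumptions made for the example, not the method or cutoff used to build the database.

```python
from typing import Dict


def classify_gene(expression: Dict[str, float], threshold: float = 0.0) -> str:
    """Label a gene by the number of tissues in which it is expressed.

    `expression` maps tissue name -> expression level for one gene.
    A tissue counts as "expressed" when its level is above `threshold`
    (the threshold is an assumption; the real cutoff is not given here).
    """
    n_tissues = len(expression)
    n_expressed = sum(1 for level in expression.values() if level > threshold)

    if n_expressed == 0:
        return "no expression"
    if n_expressed == 1:
        return "tissue-specific"
    if n_expressed == n_tissues:
        return "housekeeping"
    return "mid-range"


# Toy values invented for illustration only.
example = {"brain": 12.0, "liver": 0.0, "skin": 3.5}
label = classify_gene(example)  # expressed in 2 of 3 tissues
```

With these toy values the gene is expressed in two of three tissues, so it falls in the mid-range category.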
This research will analyze gene expression data and identify trends, with a focus on these mid-range/intermediate genes. For this we have created a database that contains human gene expression data from three different research groups as well as the genic DNA sequences from Ensembl (specifically the gene sequences, coding sequences, intron sequences, 5' and 3' UTRs, and protein sequences). The gene ontologies for each gene are included as well. From this, we hope to establish connections between genes' expression levels and their sequences and ontologies. Some of the questions we will address are:

  • Is there some way to categorize these mid-range/intermediate genes?
  • Is there a connection between sequence lengths and gene expression?
  • Do genes with similar expression levels cluster together?
  • Is there a connection between gene homology and gene expression levels? Do genes with low expression levels show more homology with tissue-specific genes, while genes with high expression levels show more homology with housekeeping genes?
  • Is there a connection between gene ontology and gene expression levels?


Notes/Diagrams:

HK Layout .pdf
Materials and Methods .pdf
1to1/1toMany/All Definitions .pdf

Data

General Stats on Expression Data .txt
Max/Min/Avg/StdDev expression amount in each HuGE tissue .txt
Max/Min/Avg/StdDev expression amount in each GNF tissue .txt
Max/Min/Avg/StdDev expression amount in each MPSS tissue .txt
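
As a rough illustration of what the per-tissue summaries above contain, the following sketch computes the maximum, minimum, average, and standard deviation of the expression values for one tissue. The function name tissue_stats and the input values are hypothetical, and the use of the population standard deviation is an assumption; the linked files may have been produced differently.

```python
import statistics
from typing import Dict, List


def tissue_stats(values: List[float]) -> Dict[str, float]:
    """Summarize the expression values measured in a single tissue.

    Uses the population standard deviation (statistics.pstdev); whether
    the original files use population or sample deviation is not stated.
    """
    return {
        "max": max(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "stddev": statistics.pstdev(values),
    }


# Values invented for illustration only.
stats = tissue_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```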

HuGE

HuGE housekeeping Affymetrix IDs, genes they correspond to, and corresponding transcripts / total transcripts .txt

GNF


Graphs

Graphs of gene expression across tissues .xls
Graphs of average genic lengths across expressed tissues .xls
Scatter plot of expression values .pdf (warning: large file)
Variance/Standard Deviation of average genic lengths: exon .xls, intron .xls, 5'-UTR .xls, 3'-UTR .xls
Average Intron/Exon count across expressed tissues .xls
Graphs of tissue expression list
Graphs of genic lengths list

General Data

Housekeeping Affymetrix IDs, genes they correspond to, and corresponding transcripts / total transcripts .txt
Tissue-specific Affymetrix IDs, genes they correspond to, and corresponding transcripts / total transcripts .txt
No expression Affymetrix IDs, genes they correspond to, and corresponding transcripts / total transcripts .txt
Mid-range Affymetrix IDs, genes they correspond to, and corresponding transcripts / total transcripts .txt
Mid-range Affymetrix IDs and tissues they are expressed in .txt

1-to-1 ratio (one gene, one Affymetrix probe)

Housekeeping genes .txt
Tissue-specific genes and tissues they are expressed in .txt
No expression genes .txt
Mid-range genes and number of tissues they are expressed in .txt

Housekeeping GO IDs, GO name, GO category, number of genes with this GO, and number of transcripts with this GO .txt
Tissue-specific GO IDs, GO name, GO category, number of genes with this GO, and number of transcripts with this GO .txt
Mid-range GO IDs, GO name, GO category, number of genes with this GO, and number of transcripts with this GO .txt

Housekeeping genic lengths, averages, and standard deviations .txt
Tissue-specific genic lengths, averages, and standard deviations .txt

Chromosomal breakdown .txt

1-to-many ratio (one gene, many Affymetrix probes)

Housekeeping genes .txt
Tissue-specific genes and tissues they are expressed in .txt
No expression genes .txt
Mid-range genes and number of tissues they are expressed in .txt

All (many genes, many Affymetrix probes)

Housekeeping genes .txt
Tissue-specific genes and tissues they are expressed in .txt
No expression genes .txt
Mid-range genes and number of tissues they are expressed in .txt




Resources:

Ensembl, for human genome sequence and gene ontologies
HuGE Index, for HuGE data
MPSS, for MPSS data
GNF, for GNF data


Last modified 5/15/06 2:03 PM by malawso4