An examination of data from thousands of Affymetrix GeneChips revealed high correlations between probes in seemingly unrelated probe sets. Investigation showed that these unexpected correlations arose between probes that were adjacent to high-valued probes, indicating that signal from bright probes blurs into their neighbours. Using carefully selected probes, together with simple linear models, the extent of blur has been measured for each CEL file. The blur is shown to be attributable to poorly performing scanners. It can double the values of thousands of probes, which in turn can double the reported expression level for hundreds of probe sets.
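As a rough illustration of how a simple linear model can quantify blur, the sketch below regresses the intensity of each probe lying immediately to the right of a very bright probe on the intensity of that bright probe. The function name `estimate_blur`, the `bright_quantile` parameter, and the restriction to right-hand neighbours are assumptions made for this example; they are not the probe selection or model used in the study.

```python
import numpy as np

def estimate_blur(intensity, bright_quantile=0.99):
    """Return the fitted slope of neighbour intensity against bright-probe intensity.

    intensity: 2D array of probe intensities laid out as on the chip
    (hypothetical input; reading a CEL file is outside this sketch).
    A slope near zero suggests no blur; a positive slope suggests that a
    fraction of each bright probe's signal leaks into its neighbour.
    """
    intensity = np.asarray(intensity, dtype=float)
    threshold = np.quantile(intensity, bright_quantile)

    # Locate bright probes that have a right-hand neighbour on the chip.
    rows, cols = np.nonzero(intensity[:, :-1] >= threshold)
    bright = intensity[rows, cols]
    neighbour = intensity[rows, cols + 1]

    # Ignore neighbours that are themselves bright (assumed to carry real signal).
    keep = neighbour < threshold

    # Simple linear model: neighbour = intercept + slope * bright.
    slope, _intercept = np.polyfit(bright[keep], neighbour[keep], 1)
    return slope
```

Applied to every chip in a survey, a statistic of this kind would give one blur estimate per CEL file, which could then be compared across scanners.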
We address the problem of detection and correction of spatial flaws in oligonucleotide microarrays. We present two similar procedures, of which one is intended solely for use with replicates and the other has wider applicability. By constructing a set of replicates, with one realistically flawed, we are able to examine the extent to which our procedures are capable of repairing the flaw. We find that, for this purpose, our procedures are superior to the existing 'Harshlight' procedure.
We have used large surveys of Affymetrix GeneChip HG-U133_Plus_2 data in the public domain to conduct a study of antisense expression across diverse conditions. We derive correlations between groups of probes which map uniquely to the same exon in the antisense direction. When there are no probes assigned to an exon in the sense direction, we find that many of the antisense groups fail to detect a coherent block of transcription. Only a minority of these groups contain coherent blocks of antisense expression that suggest transcription.
We also derive correlations between groups of probes which map uniquely to the same exon in both sense and antisense direction. In some of these cases the locations of sense probes overlap with the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA.
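The kind of comparison described above can be sketched as a matrix of Pearson correlations between every sense probe and every antisense probe on the same exon, computed across samples. The function name `cross_group_correlation`, the samples-by-probes input layout, and the use of log2 intensities are assumptions for this illustration, not details taken from the study.

```python
import numpy as np

def cross_group_correlation(sense, antisense):
    """Pearson correlation of each sense probe with each antisense probe.

    sense:     array of shape (n_samples, n_sense_probes), raw intensities
    antisense: array of shape (n_samples, n_antisense_probes), raw intensities
    Returns an (n_sense_probes, n_antisense_probes) matrix of r values.
    """
    s = np.log2(np.asarray(sense, dtype=float))
    a = np.log2(np.asarray(antisense, dtype=float))

    # Standardise each probe across samples (population standard deviation).
    s = (s - s.mean(axis=0)) / s.std(axis=0)
    a = (a - a.mean(axis=0)) / a.std(axis=0)

    # Mean product of standardised values is the Pearson coefficient.
    return s.T @ a / s.shape[0]
```

A consistently high block of values in the returned matrix would be one indication that the two probe groups track each other, which is the pattern interpreted above as a candidate NAT pair.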
Determining the exact antisense regulatory mechanism indicated by the correlation of sense probes with antisense probes requires further investigation of each particular case of interest. However, the analysis of microarray data has proved to be a good method for reconfirming known NATs, discovering new ones, and flagging possible problems in the annotation of antisense transcripts.
We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, such as those held in NCBI's Gene Expression Omnibus (GEO). GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic biases associated with the GeneChip technology.
Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip that identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from our pipeline has enabled us to identify systematic biases in GeneChip data. We are also able to use the pipeline as a discovery tool for biology.
We have discovered that in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription. Instead we see numerous examples of outlier probes. Our study has also identified that in a number of probesets the mismatch probes are an informative diagnostic of expression, rather than providing a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.
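One simple way to look for outlier probes within a probeset is to flag any probe whose average correlation with the other probes in the set is low. The sketch below does this; the function name `flag_outlier_probes` and the cutoff `min_mean_r` are hypothetical choices for illustration and are not the criteria used in the study.

```python
import numpy as np

def flag_outlier_probes(log_intensity, min_mean_r=0.5):
    """Flag probes that correlate poorly with the rest of their probeset.

    log_intensity: array of shape (n_samples, n_probes) holding log-scale
    intensities for the probes of one probeset across many chips.
    Returns a boolean array, True where a probe looks like an outlier.
    """
    x = np.asarray(log_intensity, dtype=float)

    # Probe-by-probe Pearson correlation matrix across samples.
    r = np.corrcoef(x, rowvar=False)
    n_probes = r.shape[0]

    # Mean correlation with the other probes (drop the diagonal of ones).
    mean_r = (r.sum(axis=0) - 1.0) / (n_probes - 1)
    return mean_r < min_mean_r
```

A probeset in which no probes are flagged would be consistent with a single coherent block of transcription; flagged probes would need case-by-case follow-up.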