% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
%\VignetteIndexEntry{4. Import Methods}
%\VignetteKeywords{Preprocessing, Affymetrix}
%\VignetteDepends{affy}
%\VignettePackage{affy}
%documentclass[12pt, a4paper]{article}
\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage{hyperref}
\usepackage[authoryear,round]{natbib}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\author{Laurent}
\begin{document}
\title{affy: Import Methods (HowTo)}
\maketitle
\tableofcontents

\section{Introduction}

This document briefly describes how to write import methods for the
\verb+affy+ package. Affymetrix data are originally stored in files
of type \verb+.CEL+ and \verb+.CDF+. The package uses file parsers to
extract the information they contain and store it in \verb+R+ data
structures. This document outlines how to get the data from sources
other than the current\footnote{At the time of writing: early 2003.}
file formats.

As usual, loading the package in your \verb+R+ session is required.
\begin{Sinput}
R> library(affy) ##load the affy package
\end{Sinput}

<<>>=
library(affy)
@

{\bf Note: this document only describes the process for .CEL files.}

Knowing the slots of \verb+Cel+ and \verb+AffyBatch+ objects will
probably help you to achieve your goals. You may want to refer to the
respective help pages: try \verb+help(Cel)+ and \verb+help(AffyBatch)+.

\section{How-to}
\subsection{Cel objects}

The functions \verb+getNrowForCEL+ and \verb+getNcolForCEL+ are assumed
to return the number of rows and the number of columns in the
\verb+.CEL+ file, respectively.

You will also need access to the $X$ and $Y$ positions of the probes in
the \verb+.CEL+ file. The functions \verb+getPosXForCEL+ and
\verb+getPosYForCEL+ are assumed to return the $X$ and $Y$ positions,
respectively.
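
As a minimal sketch of what such position accessors might look like,
assume (hypothetically) that the probe-level data were exported to a
tab-delimited text file with a header line and columns named \verb+x+,
\verb+y+ and \verb+intensity+, with 0-based coordinates. Neither this
file layout nor the helper \verb+readProbeTable+ is part of
\verb+affy+; both are assumptions made for illustration only.
\begin{Sinput}
## Hypothetical helper (not part of affy): read the exported probe table.
## Assumes a tab-delimited file with a header and columns x, y, intensity.
readProbeTable <- function(celfile)
  read.table(celfile, header=TRUE, sep="\t")

## X and Y positions, shifted by 1 because the assumed file is 0-based
## while R indexes from 1.
getPosXForCEL <- function(celfile) readProbeTable(celfile)$x + 1
getPosYForCEL <- function(celfile) readProbeTable(celfile)$y + 1
\end{Sinput}
Each accessor re-reads the file here for simplicity; with a real data
source you would probably read it once and cache the result.
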
The corresponding probe intensities are assumed to be returned by the
function \verb+getIntensitiesForCEL+.

If you stored {\bf all} the $X$ and $Y$ values that were in the
\verb+.CEL+ file, the functions \verb+getNrowForCEL+ and
\verb+getNcolForCEL+ can be written:
<<>>=
## rows are indexed by X and columns by Y, as in import.celfile below
getNrowForCEL <- function(celfile) max(getPosXForCEL(celfile))
getNcolForCEL <- function(celfile) max(getPosYForCEL(celfile))
@

You will also need the name of the corresponding \verb+.CDF+ (although
you will probably not need the \verb+.CDF+ file itself; the cdf
packages available for download will probably be enough).

\begin{Sinput}
import.celfile <- function(celfile, ...) {

  cel.nrow <- getNrowForCEL(celfile)
  cel.ncol <- getNcolForCEL(celfile)
  ## empty intensity matrix, filled in below
  x <- matrix(NA, nrow=cel.nrow, ncol=cel.ncol)

  cel.intensities <- getIntensitiesForCEL(celfile)
  cel.posx <- getPosXForCEL(celfile) # +1 if indexing starts at 0 (like in .CEL)
  cel.posy <- getPosYForCEL(celfile) # idem

  ## place each intensity at its (X, Y) position
  x[cbind(cel.posx, cel.posy)] <- cel.intensities

  ## "aCELfile.CEL" is a placeholder file name
  mycdfName <- whatcdf("aCELfile.CEL")

  myCel <- new("Cel", exprs=x, cdfName=mycdfName)

  return(myCel)
}
\end{Sinput}

The function \verb+import.celfile+ can now replace the function
\verb+read.celfile+ in the \verb+affy+ package.

\subsection{AffyBatch objects}

(scratch)

Because \verb+import.celfile+ accepts \verb+...+, you should be able to
override the function \verb+read.celfile+ with a hack like:
\begin{Sinput}
read.celfile <- import.celfile
\end{Sinput}

The function \verb+read.affybatch+ should now work using your
\verb+import.celfile+.

\end{document}