
Gene Expression Experimental Data Analysis


Goals

Get familiar with exploratory data analysis techniques:

  • hierarchical clustering (heatmaps and line plots)
  • principal component analysis

Data

  • differential expression of E. coli cells in biofilm and in suspension (i.e., clumped cells vs. cells floating freely in solution)
  • GSE3905 dataset from HW7 (article)

Preprocessing

A preview

The data file has quite a few tab-delimited fields:

[Image: head.png, the first lines of the data file]

Fields

| Time slot | Biofilm  | Suspension |
|-----------+----------+------------|
| 4 h       | GSM88912 | GSM88916   |
| 7 h       | GSM88913 | GSM88917   |
| 15 h      | GSM88914 | GSM88918   |
| 24 h      | GSM88915 | GSM88919   |

Format

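| Field | Description    |
|-------+----------------|
|     0 | IDREF          |
|     1 | IDENTIFIER     |
|     2 | 15h-suspension |
|     3 | 15h-biofilm    |
|     4 | 24h-suspension |
|     5 | 24h-biofilm    |
|     6 | 7h-suspension  |
|     7 | 7h-biofilm     |
|     8 | 4h-suspension  |
|     9 | 4h-biofilm     |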
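The fields are not in chronological order: 15 h comes first and 4 h comes last. It therefore helps to fix the suspension/biofilm field pair for each time point up front. The following is a minimal Python sketch that assumes the field indices in the table above. It also assumes each difference is suspension minus biofilm, because the task does not state the sign. `DIFF_COLUMNS` and `row_diffs` are hypothetical names.

```python
# Field indices from the Format table, reordered chronologically.
# Each entry is (hours, suspension field, biofilm field).
DIFF_COLUMNS = [
    (4, 8, 9),
    (7, 6, 7),
    (15, 2, 3),
    (24, 4, 5),
]


def row_diffs(fields):
    """Return [diff4, diff7, diff15, diff24] for one split data row.

    Assumption: each difference is suspension minus biofilm.
    """
    return [float(fields[s]) - float(fields[b]) for _, s, b in DIFF_COLUMNS]
```

For example, `row_diffs(line.rstrip("\n").split("\t"))` on one data line yields the four values for that gene's CSV row.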

Tasks

Write a data parser (~30 lines of Perl):

  1. Read the data into Perl
  2. Calculate the difference between suspension and biofilm at each time point
  3. Output a CSV file like this:
     IDREF,diff4,diff7,diff15,diff24
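The three steps above can be sketched as follows. This is a Python translation of the task, not the ~30-line Perl solution the exercise asks for. It rests on the following assumptions:

  • the data is read from standard input
  • the field positions match the Format table
  • the header row's first field is `IDREF`
  • each difference is suspension minus biofilm

Skipping rows with missing or non-numeric values (such as `null`) is a further assumption about the input. The names `PAIRS`, `parse`, and `main` are hypothetical.

```python
import csv
import sys

# (suspension field, biofilm field) per time point, in output order
# 4 h, 7 h, 15 h, 24 h, taken from the Format table.
PAIRS = [(8, 9), (6, 7), (2, 3), (4, 5)]


def parse(lines):
    """Yield [IDREF, diff4, diff7, diff15, diff24] for each usable data line.

    Assumptions: the header line's first field is "IDREF", and rows with
    missing or non-numeric values (e.g. "null") are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header and any short or metadata lines.
        if len(fields) < 10 or fields[0] == "IDREF":
            continue
        try:
            diffs = [float(fields[s]) - float(fields[b]) for s, b in PAIRS]
        except ValueError:
            continue  # missing or non-numeric expression value
        yield [fields[0]] + diffs


def main(infile, outfile):
    writer = csv.writer(outfile)
    writer.writerow(["IDREF", "diff4", "diff7", "diff15", "diff24"])
    writer.writerows(parse(infile))


if __name__ == "__main__":
    # Usage: python parse_diffs.py < data.txt > diffs.csv
    main(sys.stdin, sys.stdout)
```

The sketch writes the output through Python's `csv` module instead of joining strings by hand. This keeps quoting correct if an IDREF ever contains a comma.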

Date: 2011-11-14

Author: Jon-Michael Deldin
