Introduction to Perl


For this project, our goal is to:

Save the reverse complement of a sequence to a file

Break it down!

Let's decompose this into input, output, and process:

Input: The sequence we need to reverse complement. For this project it comes as sequence.txt, but it could be a FASTA text file, a database, an image, or something else.
Output: The reverse complement of the input sequence, saved as a file (rev_complement.txt).

For the process:

  1. Get/read sequence.txt into a variable
    • It's easier to work with files if we read them into variables. Better yet, since we're dealing with letters, we should read the file into a string variable.
    • In Perl, the open() function creates a file handle you can use to access the contents of the file. You can read the contents of the file after you've opened it.
    • You need to close() each file you open too.
  2. Take the reverse complement of the variable
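    • Perl excels at manipulating strings, and it even provides an operator suited to the job: tr///, the transliteration operator. Bound to a variable with =~, it replaces each character in place, like so: $seq =~ tr/acgt/tgca/. That gives the complement; reversing the string with reverse gives the reverse complement.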
  3. Save the file as rev_complement.txt
    • open() a file for writing
    • print() to the file
    • close() the file after writing
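
The three steps above can be sketched as one short script. This is only an outline under a few assumptions: the sub name reverse_complement_file is made up for illustration, the input contains only lowercase a, c, g, and t, and any line breaks in the input are discarded.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in_fn, write its reverse complement to $out_fn.
sub reverse_complement_file {
    my ($in_fn, $out_fn) = @_;

    # 1. Read the file into a string variable
    open my $in, '<', $in_fn or die "Could not open $in_fn: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;                 # strip the newline from every line
    my $seq = join '', @lines;

    # 2. Take the reverse complement
    my $rc = reverse $seq;        # scalar assignment, so the string is reversed
    $rc =~ tr/acgt/tgca/;         # swap each base for its complement

    # 3. Save the result
    open my $out, '>', $out_fn or die "Could not open $out_fn: $!";
    print $out "$rc\n";
    close $out;

    return $rc;
}

# Run only when sequence.txt is present in the current directory
reverse_complement_file('sequence.txt', 'rev_complement.txt') if -e 'sequence.txt';
```

Wrapping the steps in a sub keeps the file names out of the logic, so the same code can be pointed at a different input later.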

Useful Idioms

How do I read the contents of a file?

You open files like this:

my $fn = 'sequence.txt'; # filename to read
open my $in, '<', $fn;

# ...do stuff with $in...

close $in; # close the file
  • The < character is used for reading, whereas the > character is used for writing.
  • my $in declares the variable that will hold the file handle; open() fills it in so you can read from it.

You can read the contents of the file as an array like this:

my $fn = 'sequence.txt'; # filename to read
open my $in, '<', $fn or die("Could not open $fn");

my @lines = <$in>; # in list context, <$in> reads every line of the file

close $in;
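
Reading everything into an array is fine for small files. For larger ones, a while loop that reads one line at a time is a common alternative. To stay self-contained, the sketch below opens an in-memory string (the \$text reference) in place of a real file, and the sample text is made up.

```perl
use strict;
use warnings;

my $text = "acgt\nttga\n";   # made-up stand-in for the contents of a file

# Opening a reference to a string gives a read handle over that string
open my $in, '<', \$text or die "Could not open in-memory file: $!";

my $seq = '';
while (my $line = <$in>) {   # one line per iteration, in file order
    chomp $line;             # drop the trailing newline
    $seq .= $line;           # append to the growing sequence
}

close $in;

print "$seq\n";   # prints "acgtttga"
```

With a real file, only the open line changes: pass the filename instead of \$text.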

How do I convert my array into a string?

With the join() function:

my @list = qw(d n a);
my $list_as_a_string = join("", @list); #=> 'dna'

How do I get rid of the "\n" character on the end?

The \n character is a newline character, and it can be removed with the chomp() function.

my $str = "dna\n";
chomp $str; #=> "dna"
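
chomp() also accepts an array and strips the newline from every element, which pairs well with join() when a sequence spans several lines. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @lines = ("acg\n", "tta\n");   # what <$in> might return for a two-line file

chomp @lines;                     # now ("acg", "tta")
my $seq = join '', @lines;

print "$seq\n";   # prints "acgtta"
```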

How do I complement a string of DNA?

my $str  = 'cat';
my $comp = $str;
$comp =~ tr/acgt/tgca/; #=> "gta"
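
Two details of tr/// are worth knowing: it changes the string in place and returns the number of characters it matched, and it only touches the characters listed, so uppercase bases need their own entries. A sketch, where the mixed-case input is made up:

```perl
use strict;
use warnings;

my $str = 'caTG';

# Copy and complement in one statement; $str itself is left untouched
(my $comp = $str) =~ tr/acgtACGT/tgcaTGCA/;

# With an empty replacement list, tr counts matches without changing anything
my $gc = ($str =~ tr/cgCG//);

print "$comp\n";   # prints "gtAC"
print "$gc\n";     # prints "2"
```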

How do I reverse a string?

my $str = 'cat';
my $rev = reverse $str; #=> "tac"
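
One pitfall: reverse only reverses the characters of a string in scalar context. In list context, such as the arguments to print, it reverses the order of the list elements instead, so a single string comes back unchanged. A sketch of the difference:

```perl
use strict;
use warnings;

my $str = 'cat';

my $rev    = reverse $str;         # scalar assignment: "tac"
my ($same) = reverse $str;         # list assignment: still "cat"
my $forced = scalar reverse $str;  # scalar() forces string reversal: "tac"

print scalar(reverse $str), "\n";  # prints "tac"
```

When in doubt, writing scalar reverse makes the intent explicit.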

How do I write to a file?

my $out_fn = 'output.txt';
open my $out, '>', $out_fn or die("Could not open $out_fn");
print $out "Hello, World!";
close $out;

# open output.txt on your computer to verify
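
The check can also be done from Perl by reading the file back. The sketch below writes output.txt inside a temporary directory created with the core File::Temp module, purely so that it leaves nothing behind; that directory is an assumption of this example, not part of the project. It also adds $! to the error messages so a failure reports the reason.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);   # removed automatically at exit
my $out_fn = "$dir/output.txt";

# Write one line to the file
open my $out, '>', $out_fn or die "Could not open $out_fn: $!";
print $out "Hello, World!\n";
close $out or die "Could not close $out_fn: $!";

# Read the file back to confirm what was written
open my $in, '<', $out_fn or die "Could not open $out_fn: $!";
my $contents = <$in>;   # scalar assignment reads just the first line
close $in;

print $contents;   # prints "Hello, World!"
```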

Final remarks

Submit the project on Moodle.

Date: 2011-11-13 15:30:11 MST

Author: Jon-Michael Deldin
