
How do I read the contents of a file?

You open files like this:

my $fn = 'sequence.txt'; # filename to read
open my $in, '<', $fn or die "Could not open $fn: $!";

# ...do stuff with $in...

close $in; # close the file
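  • The < mode opens the file for reading. The > mode opens it for writing, creating the file or erasing it if it already exists. The >> mode appends to the end of an existing file.
  • The open my $in, ... line declares a new variable, $in, and stores the file handle in it. You then use $in wherever Perl expects a file handle, e.g. <$in> and close $in.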

You can read the contents of the file as an array like this:

my $fn = 'sequence.txt'; # filename to read
open my $in, '<', $fn or die("Could not open $fn: $!");

my @lines = <$in>; # in list context, <$in> returns every line (each still ends in "\n")

close $in;
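Reading everything into an array is convenient, but for large files you may prefer to read one line at a time so that only the current line is held in memory. The sketch below writes its own small input file first so that it runs on its own. The file name example_sequence.txt and its contents are illustrative assumptions.

```perl
use strict;
use warnings;

# Create a small sample file (illustrative name and contents).
my $fn = 'example_sequence.txt';
open my $out, '>', $fn or die "Could not open $fn: $!";
print $out "ACGT\nTTGA\nCCAT\n";
close $out;

# Read the file one line at a time.
my ($count, $total) = (0, 0);
open my $in, '<', $fn or die "Could not open $fn: $!";
while (my $line = <$in>) {
    chomp $line;                 # strip the trailing newline
    $count++;
    $total += length $line;
    print "line $.: $line\n";    # $. holds the current line number
}
close $in;

print "$count lines, $total bases\n";
```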

How do I write to a file?

my $out_fn = 'output.txt';
open my $out, '>', $out_fn or die("Could not open $out_fn: $!");
print $out "Hello, World!\n";
close $out;

# open output.txt on your computer to verify
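  • IMPORTANT: Do not put a comma after the file handle ($out in the example above). With a comma, Perl treats $out as just another value to print. It then sends something like GLOB(0xAB01) followed by your text to the screen (standard output) instead of to the file.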
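The difference between > (overwrite) and >> (append) is easiest to see by writing to the same file twice. The following is a minimal sketch, and log_example.txt is a hypothetical file name.

```perl
use strict;
use warnings;

my $out_fn = 'log_example.txt';   # hypothetical file name

# '>' creates the file, or empties it if it already exists.
open my $out, '>', $out_fn or die "Could not open $out_fn: $!";
print $out "first run\n";
close $out;

# '>>' appends: new text goes after what is already in the file.
open $out, '>>', $out_fn or die "Could not open $out_fn: $!";
print $out "second run\n";
close $out;

# Read it back: both lines are present.
open my $in, '<', $out_fn or die "Could not open $out_fn: $!";
my @lines = <$in>;
close $in;
print @lines;
```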

How do I print a row to a CSV file?

Use printf to make your life simpler:

printf $out_fh "%d, %d\n", $x, $y; # %d is a placeholder for an integer

Here's a more complete example:

my @x = (0, 1, 2, 3, 4);
my @y = (3, 1, 4, 1, 5);

open my $fh, '>', 'amazing_discovery.csv' or die "ERROR: $!\n";

# print a header
print $fh "Solar Radiation, Stock Prices\n";

for (my $i = 0; $i <= $#x; $i++) {   # $#x is the last index, so use <= to include it
    printf $fh "%d, %d\n", $x[$i], $y[$i];
}

close $fh;
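A more idiomatic way to loop over every index is the range 0 .. $#x. Because it runs from the first index to the last one inclusive, it cannot accidentally skip the final row. This sketch writes the same file as the example above and then reads it back to count the rows.

```perl
use strict;
use warnings;

my @x = (0, 1, 2, 3, 4);
my @y = (3, 1, 4, 1, 5);

open my $fh, '>', 'amazing_discovery.csv' or die "ERROR: $!\n";
print $fh "Solar Radiation, Stock Prices\n";

# 0 .. $#x visits every index, including the last one.
for my $i (0 .. $#x) {
    printf $fh "%d, %d\n", $x[$i], $y[$i];
}
close $fh;

# Read the file back: 1 header line + 5 data rows.
open my $in, '<', 'amazing_discovery.csv' or die "ERROR: $!\n";
my @rows = <$in>;
close $in;
print scalar(@rows), " lines written\n";
```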

How do I convert my array into a string?

With the join() function:

my @list = qw(d n a);
my $list_as_a_string = join("", @list); #=> 'dna'
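The first argument to join() is the separator that goes between the elements. Changing it is handy, for example, when you build a comma-separated row. A short sketch:

```perl
use strict;
use warnings;

my @list = qw(d n a);

my $plain  = join('', @list);     # no separator
my $commas = join(', ', @list);   # comma and space between elements
my $lines  = join("\n", @list);   # one element per line

print "$plain\n$commas\n$lines\n";
```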

How do I get rid of the "\n" character on the end?

The \n character is a newline character, and it can be removed with the chomp() function.

my $str = "dna\n";
chomp $str; #=> "dna"
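chomp also works on a whole array. It removes one trailing newline from each element and returns the total number of characters it removed. Note that it only touches the end of each element, not newlines in the middle. The array below is an illustrative example.

```perl
use strict;
use warnings;

my @lines = ("ACGT\n", "TTGA\n", "CCAT");   # last element has no newline

# Remove the trailing newline from every element at once.
my $removed = chomp(@lines);

print join('|', @lines), "\n";
print "$removed newlines removed\n";
```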

How do I complement a string of DNA?

my $str  = 'cat';
my $comp = $str;
$comp =~ tr/acgt/tgca/; #=> "gta"
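The recipe above only handles lowercase letters. If your sequence might be uppercase or mixed case (an assumption, e.g. sequences read from a FASTA file), list both cases in tr///. tr/// also returns the number of characters it matched.

```perl
use strict;
use warnings;

my $str  = 'CATg';
my $comp = $str;

# Both cases are listed, so every base is complemented.
# tr/// edits $comp in place and returns how many characters it matched.
my $changed = ($comp =~ tr/ACGTacgt/TGCAtgca/);

print "$comp ($changed bases complemented)\n";
```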

How do I reverse a string?

my $str = 'cat';
my $rev = reverse $str; #=> "tac"
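Complementing and reversing are often combined to get the reverse complement of a strand. Below is a sketch of a helper for that. The name reverse_complement is hypothetical. The sketch also shows a common gotcha: reverse only reverses the characters of a string in scalar context. print gives it list context, so scalar() is needed there.

```perl
use strict;
use warnings;

# Hypothetical helper: complement with tr///, then reverse the characters.
sub reverse_complement {
    my $seq = shift;
    (my $comp = $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return scalar reverse $comp;   # scalar: reverse the characters, not a list
}

my $rc = reverse_complement('ATGC');
print "$rc\n";

# In list context (as inside print), reverse('cat') returns 'cat' unchanged.
print reverse('cat'), "\n";
# Force string reversal with scalar():
print scalar(reverse('cat')), "\n";
```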

How do I remove the first element from an array?

Oh, so you have a pesky header line from a FASTA file? Let's get rid of it with shift:

# your array, typically from reading the fasta file
my @seq = ("> Obscure organism\n", "ACTGAAA\n", "AAAA");

shift @seq; # removes the first element

# @seq is now: ("ACTGAAA\n", "AAAA")
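shift also returns the element it removes. You can therefore keep the header instead of throwing it away, for example to label your output.

```perl
use strict;
use warnings;

my @seq = ("> Obscure organism\n", "ACTGAAA\n", "AAAA");

# Save the removed header line.
my $header = shift @seq;
chomp $header;

print "Header: $header\n";
print scalar(@seq), " sequence lines left\n";
```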

How do I get rid of newline characters (\n) in an array?

The laziest way is to convert the array to a string and run a substitution command on it. It is not the most efficient, but it is perfect when you already planned on converting the array to a string.

my @seq_ary = ("ACTGAAA\n", "AA\n\n\n\nAA\n");

# convert it to a string
my $seq = join('', @seq_ary);

# replace all newlines
$seq =~ s/\n//g;
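Why not just chomp the array? chomp only removes the last newline of each element, so the newlines inside "AA\n\n\n\nAA\n" survive. The sketch below shows that. It also shows tr/\n//d, which deletes every newline just like the substitution but without a regular expression.

```perl
use strict;
use warnings;

my @seq_ary = ("ACTGAAA\n", "AA\n\n\n\nAA\n");

# chomp strips only the trailing "\n" of each element.
my @copy = @seq_ary;
chomp(@copy);
print "embedded newlines remain\n" if $copy[1] =~ /\n/;

# Join first, then delete every "\n" with tr///d.
my $seq = join('', @seq_ary);
$seq =~ tr/\n//d;

print "$seq\n";
```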

How do I split a string into a letter-by-letter array?

So you want your sequence ACTG to become ('A', 'C', 'T', 'G')? There's a recipe for that:

my $seq = 'ACTG';

# split with an empty pattern (nothing between the slashes)
# breaks the string between every character
my @nucs = split //, $seq; #=> ('A', 'C', 'T', 'G')
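Once the letters are in an array, you can loop over them. As one illustrative use, the sketch below tallies how often each nucleotide occurs, using a hash that maps each letter to its count. The sequence GATTACA is an arbitrary example.

```perl
use strict;
use warnings;

my $seq  = 'GATTACA';
my @nucs = split //, $seq;

# Count each letter: %count maps nucleotide => number of occurrences.
my %count;
$count{$_}++ for @nucs;

for my $nuc (sort keys %count) {
    print "$nuc: $count{$nuc}\n";
}
```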

How do I increment or decrement something easily?

Use the ++ and -- operators.

my $x = 0;

# the annoying-to-type way:
$x = $x + 1; # x = 1

# the less annoying way:
$x += 1; # x = 2

# the lazier way:
$x++; # x = 3

$x--; # x = 2
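++ and -- can go before or after the variable. Both forms change the variable in the same way. They differ in the value the expression itself returns, which matters when you use it inside an assignment:

```perl
use strict;
use warnings;

my $x = 5;

# Postfix: returns the old value, then increments $x.
my $old = $x++;   # $old is 5, $x is now 6

# Prefix: increments $x first, then returns the new value.
my $new = ++$x;   # $new is 7, $x is now 7

print "$old $new $x\n";
```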

How do I make reading files into an array easier?

We read files into arrays quite a bit. Let's make our code less repetitive by encapsulating all those open, <$fh>, and close statements into a function.

# Returns a file's contents as an array of lines.
#   my @lines = file_to_array('my_fasta_file.fa');
# @param  string filename
# @return array
sub file_to_array {
    my $fn = shift;
    open my $fh, '<', $fn or die "ERROR: Could not read $fn: $!\n";
    my @lines = <$fh>;
    close $fh;

    return @lines;
}
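Here is a sketch that puts file_to_array to work together with earlier recipes. It reads a FASTA file, drops the header with shift, strips the newlines with chomp, and glues the sequence together with join. To run on its own, it writes a tiny example.fa first (a hypothetical file name).

```perl
use strict;
use warnings;

# file_to_array as defined above.
sub file_to_array {
    my $fn = shift;
    open my $fh, '<', $fn or die "ERROR: Could not read $fn: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Write a small illustrative FASTA file.
open my $out, '>', 'example.fa' or die "ERROR: $!\n";
print $out "> Obscure organism\nACTGAAA\nAAAA\n";
close $out;

my @lines = file_to_array('example.fa');
shift @lines;                 # discard the FASTA header line
chomp(@lines);                # strip the newline from each line
my $seq = join('', @lines);

print "$seq\n";
```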

How do I make reading files into a string easier?

Let's write another subroutine that builds on file_to_array:

# Returns a file's contents as a string.
#   my $contents = file_to_string('my_fasta_file.fa');
# @param  string filename
# @return string
sub file_to_string {
    my $fn = shift;
    my @lines = file_to_array($fn);

    # we can "glue" each line together with the join() function
    my $string = join('', @lines);

    return $string;
}
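An alternative (not the approach above) is to read the whole file in one go by temporarily undefining $/, Perl's input record separator. Without a separator, a single <$fh> returns the entire file as one string. The subroutine name file_to_string_slurp and the file example.txt are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical alternative: slurp the file by localizing $/ to undef.
sub file_to_string_slurp {
    my $fn = shift;
    open my $fh, '<', $fn or die "ERROR: Could not read $fn: $!\n";
    my $string = do { local $/; <$fh> };   # $/ is restored when the block ends
    close $fh;
    return $string;
}

# Illustrative input file.
open my $out, '>', 'example.txt' or die "ERROR: $!\n";
print $out "line one\nline two\n";
close $out;

my $contents = file_to_string_slurp('example.txt');
print $contents;
```

Either version works for typical sequence files. The slurp version skips building the intermediate array. The join version reuses code you already have.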

How do I reuse subroutines without copy-pasting?

  1. Put all of the subroutines into a separate .pl file called lib.pl (or whatever you want). For example, place the file_to_array definition in lib.pl.
  2. In the script you want to use your subroutines in, add a call to do:
# my_script.pl
use warnings;
use strict;

do './lib.pl'; # './' is required on Perl 5.26+, which no longer includes '.' in @INC

my @lines = file_to_array('sequence.fa');
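For reference, here is a sketch of what lib.pl itself might contain. The exact contents are up to you. Ending the file with 1; is a common convention. It makes loading the file return a true value, which require expects and which lets you write do './lib.pl' or die ... if you want to catch loading errors.

```perl
# lib.pl: shared subroutines (illustrative contents)
use strict;
use warnings;

# Returns a file's contents as an array of lines.
sub file_to_array {
    my $fn = shift;
    open my $fh, '<', $fn or die "ERROR: Could not read $fn: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Returns a file's contents as a string.
sub file_to_string {
    my $fn = shift;
    return join('', file_to_array($fn));
}

# Make the file return a true value when loaded.
1;
```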

Date: 2011-11-13 15:29:58 MST

Author: Jon-Michael Deldin

Org version 7.7 with Emacs version 23