William Valdar's Perl coding conventions
This document, which is intended to help other people read my code,
reflects the way I write Perl at the moment. I am not suggesting this is
the best way to write Perl or any other language; any set of conventions
has its good and bad points and this is no exception. These conventions
have evolved with time and will probably undergo further revisions as I
encounter new problems and explore new practices in my work.
Naming
Follow Java conventions where possible. Avoid abbreviations or contractions unless
they are standard: scrSeq() is bad, scoreSeq() is okay, and scoreSequence() is
better provided it doesn't make other names unwieldy, eg, scoreMultipleSequenceAlignment()
is probably worth contracting to scoreMsa(). If you have to include
an abbreviation, treat it as a single word: avoid PDBFile and ReadPDBFile;
prefer pdbFile and ReadPdbFile. Except for constants, use '_' sparingly.
Variables
- single word: atom, chain
- multi word: atomName, centralAtomName
- constants: CALPHA_ATOM_NAME or ATOM_NAME_CALPHA, ATOM_NAME_CBETA
Perl is not type-safe and this can cause confusion and errors. Use
a limited prefix notation for such common basic types as array, hash, FileHandle.
- array refs ('a' prefix): aAtoms, aChains
- hash refs ('h' prefix): hNames2Places, hChains
- FileHandle objects ('fh' prefix): fhIn, fhOut, fhPdb
- or ("ist"=input stream, "ost"=output stream): ostPdb, istMsa
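The prefix notation above might look like this in use. This is only a sketch: the data values and the temporary file are assumptions made to keep the example self-contained, and only the naming pattern is taken from the conventions.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical values, named with the prefix notation described above.
my $aAtoms        = ['N', 'CA', 'C', 'O'];         # array ref: 'a' prefix
my $hNames2Places = { N => 0, CA => 1, C => 2 };   # hash ref: 'h' prefix

# FileHandle-style object: 'fh' prefix. A temporary file (an assumption
# of this sketch) stands in for a real data file.
my $fhTmp = IO::File->new_tmpfile()
        or die "cannot create temporary file: $!";

# The prefix signals how each variable must be dereferenced or used.
for my $atomName (@$aAtoms)
{
    my $place = exists $hNames2Places->{$atomName}
            ? $hNames2Places->{$atomName}
            : -1;
    $fhTmp->print("$atomName\t$place\n");
}

# Rewind and read the lines back through the same handle.
$fhTmp->seek(0, 0);
my $aLines = [ $fhTmp->getlines() ];
$fhTmp->close();
```

Because the prefix travels with the name, a reader can tell at the point of use that $aLines needs @$aLines and that $hNames2Places needs the -> operator, without looking up the declaration.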
Functions
- single word: Trim()
- multi word: OpenFilesForReading()
Modules (packages that are not classes)
- single word: Assert
- multi word: FileIoHelper
Classes
As for modules but with 'C' prefix: CStopwatch, CWindowPanel,
Pdb::CResidue
Instance methods
- public method: plot(), getColour(), classifyHetGroups()
- private method: _plot(), _getColour(), _classifyHetGroups()
- accessor methods same as JavaBeans: getProperty(), setProperty(), isProperty()
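As an illustration of the class and method naming rules, here is a minimal sketch of a class. CStopwatch is one of the class names mentioned above, but its properties and methods here (label, running, start(), stop(), _setRunning()) are invented for the example and are not taken from any real module.

```perl
use strict;
use warnings;

package CStopwatch;    # class name carries the 'C' prefix

# Constructor. The 'label' and 'running' properties are hypothetical.
sub new
{
    my ($class, $label) = @_;
    my $this = { label => $label, running => 0 };
    return bless $this, $class;
}

# Accessor methods, named as in JavaBeans.
sub getLabel
{
    my $this = shift;
    return $this->{label};
}

sub setLabel
{
    my ($this, $label) = @_;
    $this->{label} = $label;
    return;
}

sub isRunning
{
    my $this = shift;
    return $this->{running};
}

# Public methods: lower-case first letter, no underscore.
sub start
{
    my $this = shift;
    $this->_setRunning(1);
    return;
}

sub stop
{
    my $this = shift;
    $this->_setRunning(0);
    return;
}

# Private method: leading underscore, called only from inside the class.
sub _setRunning
{
    my ($this, $running) = @_;
    $this->{running} = $running;
    return;
}

package main;

my $stopwatch = CStopwatch->new('parse');
$stopwatch->setLabel('alignment');
$stopwatch->start();
my $wasRunning = $stopwatch->isRunning();
$stopwatch->stop();
```

Perl does not enforce the privacy of _setRunning(); the underscore is only a signal to the reader that client code should not call it.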
Layout conventions
Bracket using the Allman style. Indent at four spaces. Avoid hardware tabs.
Put a single space between while/if/for and their conditionals to distinguish
them from functions. Indent overflowing lines by twice the normal indent
(ie, eight spaces). Eg,
    while (false != $condition)
    {
        if ($debug)
        {
            SomeFunction($argOne,
                    $argTwo);
            SomeOtherFunction($fred,
                    $barney, $betty, $wilma, $bambam, $bedrock,
                    $dinosaur, $otherGuy);
        }
    }
Comment long argument lists like this:
    FooFunction($name,      # person's name
            $address,       # where they live
            $zScore);       # their z-score
Comment extended regular expressions similarly.
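A commented extended regular expression might look like the following sketch. The "chain:number" identifier format and the variable names are hypothetical, chosen only to show the /x layout with one comment per element.

```perl
use strict;
use warnings;

# Hypothetical input: a residue identifier of the form "chain:number".
my $residueId = 'A:42';

my ($chain, $number) = ('', 0);
if ($residueId =~ m/
        \A ([A-Za-z])   # chain identifier, a single letter
        :               # separator
        (\d+) \z        # residue number, one or more digits
        /x)
{
    ($chain, $number) = ($1, $2);
}
```

Under /x, whitespace in the pattern is ignored and '#' starts a comment, so each element can carry its own note just as each argument does in the function call above. The comments must not contain the pattern delimiter.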
Documentation
| Notation | Meaning |
| CRE | caught run-time exception |
| URE | uncaught run-time exception |
| <foo> | argument or return value is of type foo |
| <const foo> | foo is assumed to be constant (ie, not modified directly) |
| boolean | zero is false, nonzero is true |
| char | SCALAR variable of one character |
| CONSTANT | a class constant |
| int | SCALAR variable of integer value |
| real | SCALAR variable of floating-point value |
| string | SCALAR variable holding a string |
| Pdb::CResidue | reference to an object of type Pdb::CResidue |
| @ | ARRAY |
| @ real | ARRAY of reals |
| \@ | reference to an ARRAY |
| \@ real | reference to an ARRAY of reals |
| \@@ real | reference to a 2-D ARRAY of reals |
| % | HASH |
| \% | reference to a HASH |
| \sub | reference to a subroutine |
| ist | input stream object |
| ost | output stream object |
| stream | stream object |
| this | a reference to the current object |
| SCALAR | anything as long as it's scalar (not a reference) |
| ? | SCALAR or a reference |
eg:
    ColsFromStream
    Function:
        Returns arrays representing the specified columns from a file
    ARGUMENTS:
        1. <ist> stream
        2. <\@ int> list of column numbers (starting 0..)
    RETURN:
        if scalar
            1. <\@ string> first column specified
        elsif array
            <@@ string> array of arrays of specified columns
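To show how such a header might sit above real code, here is one possible implementation of ColsFromStream. Only the documented interface comes from the example above. The body is an assumption, in particular that columns are separated by whitespace and that the scalar/array return is chosen with wantarray.

```perl
use strict;
use warnings;
use IO::File;

# ColsFromStream
# Function:
#     Returns arrays representing the specified columns from a file
# ARGUMENTS:
#     1. <ist> stream
#     2. <\@ int> list of column numbers (starting 0..)
# RETURN:
#     if scalar
#         1. <\@ string> first column specified
#     elsif array
#         <@@ string> array of arrays of specified columns
sub ColsFromStream($$)
{
    my ($ist, $aColNumbers) = @_;

    # One empty output column per requested column number.
    my @columns = ();
    for my $colNumber (@$aColNumbers)
    {
        push @columns, [];
    }

    while (defined(my $line = $ist->getline()))
    {
        chomp $line;
        # ASSUMPTION: fields are separated by whitespace.
        my @fields = split ' ', $line;
        for my $i (0 .. $#{$aColNumbers})
        {
            push @{ $columns[$i] }, $fields[ $aColNumbers->[$i] ];
        }
    }
    return wantarray ? @columns : $columns[0];
}

# Usage: a temporary file stands in for a real data file.
my $ist = IO::File->new_tmpfile()
        or die "cannot create temporary file: $!";
$ist->print("a 1 x\nb 2 y\n");

$ist->seek(0, 0);
my @cols = ColsFromStream($ist, [0, 2]);    # array context

$ist->seek(0, 0);
my $aFirst = ColsFromStream($ist, [1]);     # scalar context
```

In array context the caller receives one array reference per requested column. In scalar context it receives only the first, matching the two RETURN cases in the header.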
Programming conventions
- Always use strict.
- Always use English: prefer readable global variables to Perl-esque gibberish,
  ie, $PROGRAM_NAME instead of $0, $ARG instead of $_, $PROCESS_ID instead of $$.
- Avoid glob file handles; prefer FileHandle (or the retro-compatible IO::File)
  objects. Globs are non-standard variable types that are program-global.
  Their use makes large-scale programs clumsy at best. For instance,
      #main program
      open(IN, "input.txt");
      ReadDataFile("data.txt");

      sub ReadDataFile($)
      {
          open(IN, "data.txt");
          #ERROR: IN filehandle opened twice
          #...
      }
  This could be fixed by dynamically scoping IN with local(*IN)
  at the start of the function. However, dynamic scoping is unnecessarily
  complicated for such a simple task. Compare FileHandle objects. Not only
  are these presented as standard variables with normal scoping rules, but
  their syntax accords with POSIX standards. Also, FileHandle objects are
  more convenient: they automatically close when no longer pointed to.
  Until Perl completes its object-based IO library, globs will occasionally
  be unavoidable (eg, controlling processes). In such cases,
  avoid name conflicts by using references to anonymous globs from the Symbol
  module, eg,
      use Symbol;
      my $progReader = gensym();
      open($progReader, "prog.exe |");
- In short one-shot scripts, (<>) is okay when at the top level.
- Avoid creating and using variables with global scope. Use package-global
  variables sparingly.
- Use built-in @ and % types sparingly. Prefer array and
  hash references, since this is what a @ or % must be
  converted to if it is to be passed efficiently to a function, eg,
      my @arrayType = (1..100);
      ProcessArray(\@arrayType);

      sub ProcessArray($)
      {
          my $arrayRef = shift;   # array spends most of its life accessed
                                  # as a ref anyway
          Process($arrayRef);     # etc...
      }
- Declare a package-scoped global constant (with use constant) only
  if clients of the package need it to use the package properly. Otherwise,
  make the constant private to the package (with my()).
- Declare variables at the last possible moment, eg,
      my $aScores = [];
      for my $elem (@$array)
      {
          my $score = $elem * 2;
          push @$aScores, $score;
      }
- Initialize variables when they are declared.
- Avoid implicit use of $ARG (ie, $_). Use $ARG
  explicitly only at the top level and even then wisely. Similarly, avoid
  functions that use $ARG such as map and grep
  unless at the top level.
- If a function has a set number of parameters, make that number explicit
  in the declaration, eg, JoinTwoFiles($fileOne, $fileTwo) should
  be declared sub JoinTwoFiles($$).
- Prefer true and false constants to 1 and 0 if that's what you mean.
- In tests for numerical equality that involve a constant, prefer to put
  the constant on the left-hand side to avoid accidental assignment, eg, 2 == $value.
- Break any of the above rules to avoid clumsiness in a program.
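Several of the rules above can be seen working together in the following sketch. The function CountLinesInCopy(), the TRUE and FALSE constants and the temporary files are all hypothetical, introduced only for illustration. The point is that two IO::File handles can be open at once, one inside a function and one outside it, without the name clash shown in the glob example.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);
use IO::File;

# Hypothetical boolean constants, private to this file's use.
use constant TRUE  => 1;
use constant FALSE => 0;

# Writes the given lines to its own temporary file and counts them back.
# ARGUMENTS: 1. <\@ string> lines     RETURN: <int> number of lines
sub CountLinesInCopy($)
{
    my $aLines = shift;

    # Lexically scoped handle: cannot clash with any handle in the caller.
    my $fhInner = IO::File->new_tmpfile()
            or die "$PROGRAM_NAME: cannot create temp file: $OS_ERROR";
    for my $line (@$aLines)
    {
        $fhInner->print("$line\n");
    }
    $fhInner->seek(0, 0);

    my $numLines = 0;
    while (defined(my $line = $fhInner->getline()))
    {
        $numLines++;
    }
    return $numLines;   # $fhInner closes itself on leaving scope
}

# The caller has its own handle open while the function runs.
my $fhOuter = IO::File->new_tmpfile()
        or die "$PROGRAM_NAME: cannot create temp file: $OS_ERROR";
$fhOuter->print("outer file\n");

my $numLines = CountLinesInCopy(['a', 'b', 'c']);

# The outer handle is unaffected by the handle used inside the function.
$fhOuter->print("inner file had $numLines lines\n");
$fhOuter->seek(0, 0);
my $aOuterLines = [ $fhOuter->getlines() ];

# Constant on the left-hand side of the equality test.
my $isThree = (3 == $numLines) ? TRUE : FALSE;
```

Had the test been mistyped as 3 = $numLines, Perl would reject it at compile time, whereas $numLines = 3 would silently assign.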