Write an object-oriented module DNAsequence whose object has one attribute, a sequence of DNA, and two methods, get_dna and set_dna. Start with the code for Gene.pm, but see how far you can whittle it down to the minimum amount of code necessary to implement this new class.
The FileIO.pm module implements objects that read and write file data. However, depending on the program, these objects can deviate substantially from what is actually present in the files on your computer. For instance, you can read in all the files in a folder, and then change the filenames and data of all the objects, without writing them out. Is this a good thing or a bad thing?
In the text, you are asked why the new constructor for FileIO.pm has been whittled down to the bare bones. You can see that all it does is create an empty object. What functionality has been moved out of the new constructor and into the read and write methods? Does it make more sense to do without a new constructor entirely and instead have the read and write methods create objects? Try rewriting the code that way. Alternatively, does it make sense to try rewriting the code so that both reading and writing are handled by the new constructor? Is creating an object sometimes logically distinct from initializing it?
Use FileIO.pm as a base class for a new class that manages the annotation of a pipeline in your laboratory. For example, perhaps your lab gets sequence from your ABI machine, screens it for vectors, assesses the quality of the sequencing run, searches your local database to determine if you've seen it or something like it before, then searches GenBank to see what other known sequences it matches or resembles, and finally adds it to an assembly project. Each step has a person or persons, a timestamp for the beginning and ending of each phase, and data. You want to be able to track the work done on each sequence that emerges from your ABI. (This is just an example. Pick a set of jobs that you actually do in your lab.)
For each sequence file format handled by the SeqFileIO.pm module, find the documentation that specifies the format. Compare the documentation with the is_, parse_, and put_ methods that recognize, read, and write files in each format. How can you improve this code? Make it more complete? Faster?
My parse_ methods are somewhat ad hoc. They don't really parse the whole file according to the definition of the format. They just extract the sequence and a small amount of annotation. Take one of the formats and write a more complete parser for it. What are the advantages and disadvantages of a simple versus a more complete parser in this code? How about for other applications you may want to develop in the future?
Use the parser you developed in Exercise 4.6 to do a more complete job of identifying a file in the same format in the module's is_ method.
Add a new sequence file format to SeqFileIO.
In FileIO.pm, and in many other places in this book, the program calls croak and exits when a problem arises (such as when unsuccessfully attempting to open a file for reading). Such drastic measures are sometimes desirable; for example, you may want to kill the program if a security problem is discovered in which someone is attempting to read a forbidden file. Or, when developing software, you may like your program to print an informative message and die when a problem occurs, as that might help you develop the program faster.
However, very often what you really want is for the program to notice the error and take some appropriate steps, not simply die. If a file cannot be opened, it may be something as simple as the user of the program mistyping the filename, and what you'd like is to give the user another couple of chances to type the name in correctly. Rewrite FileIO.pm without calling croak. This may entail checking for the success or failure of certain operations and taking reasonable actions on failure. Should the class module take all such actions, or should the program that uses the class module be expected to behave appropriately when a failure is reported?
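One possible shape for reporting a failure instead of dying is sketched below. It is only an illustration, not the FileIO.pm code: the function name read_lines and its two-value return convention (data, error message) are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FileIO.pm.
# Returns (array reference, undef) on success,
# or (undef, error message) on failure, instead of calling croak.
sub read_lines {
    my ($filename) = @_;

    open(my $fh, '<', $filename)
        or return (undef, "cannot open '$filename': $!");

    my @lines = <$fh>;
    close($fh);

    return (\@lines, undef);
}
```

With this convention, the program that uses the module decides what to do next. For example, it might print the message and ask the user for the filename again a limited number of times before giving up.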
The AUTOLOAD method in FileIO.pm tests for attributes that are scalars and references to arrays. The need for this comes from the list of attributes given in the %_my_attribute_properties hash. Each attribute hash value is an anonymous array with two elements: default value and properties. From the default value you can see that a value is either a scalar (a string in this case) or an anonymous array (a reference to an array). The code that AUTOLOAD installs for accessor routines then checks if the attribute is either a scalar or a reference to an array.
This AUTOLOAD method is inherited by SeqFileIO.pm. One of the modifications that SeqFileIO.pm makes is defining its own %_my_attribute_properties to handle the new attributes that it defines, such as _sequence. In this case, all the attributes are either scalars or references to arrays, as before. What modifications are necessary if some other data type is needed for a new attribute by a class that inherited FileIO.pm? How can you rewrite FileIO.pm to make it easier to write classes that inherit it?
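The scalar-versus-array-reference test described above can be pictured with a small standalone sketch. The hash name %attribute_properties, its two entries, the 'read.write' property string, and the helper names copy_attribute and default_for are all assumptions chosen for illustration. They are not copied from FileIO.pm.

```perl
use strict;
use warnings;

# Hypothetical attribute table in the style described above:
# attribute name => [ default value, properties ]
my %attribute_properties = (
    _filename => [ '',  'read.write' ],   # scalar default
    _filedata => [ [ ], 'read.write' ],   # anonymous array default
);

# Return a copy of a value that is either a plain scalar
# or a reference to an array; refuse any other data type.
sub copy_attribute {
    my ($value) = @_;

    if (ref($value) eq 'ARRAY') {
        return [ @{$value} ];             # shallow copy of the array
    }
    elsif (!ref($value)) {
        return $value;                    # plain scalar
    }
    die "unsupported attribute type: " . ref($value) . "\n";
}

# Look up an attribute's default value and return a copy of it.
sub default_for {
    my ($attribute) = @_;

    my $entry = $attribute_properties{$attribute}
        or die "no such attribute: $attribute\n";
    return copy_attribute($entry->[0]);
}
```

The final die in copy_attribute marks the spot where a new data type, such as a hash reference, would need its own branch. That is the kind of change the questions above ask you to think about.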
The test program testSeqFileIO has certain shortcomings. For one thing, it repeats blocks of code that can be replaced with a short loop (with a little rewriting). Another problem is that it doesn't test everything in the class.
Rewrite testSeqFileIO so that it's clearer and more comprehensive. By default, make it just give a short summary of the number of tests performed and the number of tests passed, but add a verbose flag so that it prints out all its tests in detail when desired.
The module SeqFileIO.pm is lacking POD documentation. Add POD documentation to the module that can be fairly easily cut and pasted into a test program for the module.
In SeqFileIO.pm, the hash %_all_attribute_properties changed from the base class and needed to be redefined. However, the code for the _all_attributes, _attribute_default, and _permissions helper methods didn't change. Why then did the new class SeqFileIO redefine these methods? (Hint: are these helper methods closures?)
The h2xs program that ships with Perl simplifies module creation, and even helps you create the Makefile.PL that you'll need to add your own module to CPAN or to your local installation (which helps you bypass the somewhat awkward use lib directive that appears in the programs in this book). See also the perlxstut, ExtUtils::MakeMaker, and AutoLoader manpages. In particular, see the -X option to h2xs. Write a module starting from the use of h2xs.
The open calls in the read methods of the classes in this chapter specify a filehandle FileIOFH. Alternatives include using lexical scalars as filehandles or the IO::Handle package. Rewrite the read methods so files are opened with these alternative types of filehandles. What costs or benefits result from these rewritings? (See the perlopentut part of the Perl documentation.)
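The two alternatives might look roughly like the sketch below. The function names read_with_lexical and read_with_io_file are hypothetical. IO::File, a subclass of IO::Handle that ships with Perl, is used here as one way of getting a filehandle object. Both functions return a reference to the lines read, or nothing on failure.

```perl
use strict;
use warnings;
use IO::File;

# Alternative 1: a lexical scalar as the filehandle.
# The handle is private to this subroutine.
sub read_with_lexical {
    my ($filename) = @_;

    open(my $fh, '<', $filename) or return;
    my @lines = <$fh>;
    close($fh);

    return \@lines;
}

# Alternative 2: an IO::File object with method calls.
sub read_with_io_file {
    my ($filename) = @_;

    my $fh = IO::File->new($filename, 'r') or return;
    my @lines = $fh->getlines;    # getlines must be called in list context
    $fh->close;

    return \@lines;
}
```

Unlike a bareword such as FileIOFH, neither handle is a package-wide name, so two objects reading files at the same time cannot interfere with each other's handle.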
In the AUTOLOAD method, a copy of the file data is returned from the get_filedata accessor; this will protect the actual file data in the object, but it makes a copy of a potentially very large amount of data, which can overtax your system. Discuss alternatives for this behavior, and implement one of them.
Reading in a few hundred large files (as can easily happen with the modules in this chapter) can overtax your system, causing the system, or at least the program, to crash. Design two alternative methods that avoid this overuse of memory. For instance, you can avoid reading in a file until the sequence data is actually needed. You can also reread the data into the program each time it is needed, without saving it in your object. Finally, you can reclaim memory from older files. Implement one of these methods or some other. What other parts of the code need to be altered?