You want to convert ASCII text to HTML. For example, you have mail you want to display intelligently on a web page.
Use the simple little encoding filter in Example 20-3.
    #!/usr/bin/perl -w -p00
    # text2html - trivial html encoding of normal text
    # -p means apply this script to each record.
    # -00 means that a record is now a paragraph

    use HTML::Entities;
    $_ = encode_entities($_, "\200-\377");

    if (/^\s/) {
        # Paragraphs beginning with whitespace are wrapped in <PRE>
        s{(.*)$}        {<PRE>\n$1</PRE>\n}s;           # indented verbatim
    } else {
        s{^(>.*)}       {$1<BR>}gm;                     # quoted text
        s{<URL:(.*?)>}  {<A HREF="$1">$1</A>}gs         # embedded URL (good)
                        ||
        s{(http:\S+)}   {<A HREF="$1">$1</A>}gs;        # guessed URL (bad)
        s{\*(\S+)\*}    {<STRONG>$1</STRONG>}g;         # this is *bold* here
        s{\b_(\S+)\_\b} {<EM>$1</EM>}g;                 # this is _italics_ here
        s{^}            {<P>\n};                        # add paragraph tag
    }
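To see what the inline rules do to a single paragraph, it can help to try them outside the -p loop. The following is only a sketch: inline_markup() is a hypothetical helper name, not part of the example, and it leaves out the entity encoding and the paragraph tags so that it needs nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# inline_markup() is a hypothetical helper, not part of text2html.
# It applies the URL, *bold*, and _italics_ substitutions to one
# non-indented paragraph and returns the result.
sub inline_markup {
    my ($para) = @_;
    $para =~ s{<URL:(.*?)>}    {<A HREF="$1">$1</A>}gs      # embedded URL
          ||
    $para =~ s{(http:\S+)}     {<A HREF="$1">$1</A>}gs;     # guessed URL
    $para =~ s{\*(\S+)\*}      {<STRONG>$1</STRONG>}g;      # *bold*
    $para =~ s{\b_(\S+)\_\b}   {<EM>$1</EM>}g;              # _italics_
    return $para;
}

print inline_markup("See <URL:http://www.perl.com/> for *more* _details_ here.\n");
```

Because of the ||, the guessed-URL rule runs only when the paragraph contains no <URL:...> form at all, so a paragraph that mixes both styles gets only its embedded URLs linked.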
Converting arbitrary plain text to HTML has no general solution because there are too many conflicting ways to represent formatting information. The more you know about the input, the better you can format it.
For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:
    BEGIN {
        print "<TABLE>";
        $_ = encode_entities(scalar <>);
        s/\n\s+/ /g;                            # continuation lines
        while ( /^(\S+?:)\s*(.*)$/gm ) {        # parse heading
            print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
        }
        print "</TABLE><HR>";
    }
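The header handling has two steps: join continuation lines onto the line they continue, then pull out one name and value per line. As a rough illustration, the same two steps can be tried on a string. Here parse_headers() and the sample message are made up for the example, and the HTML output and entity encoding are left out so that the sketch runs with core Perl alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_headers() is a hypothetical helper: it takes the header paragraph
# of a mail message as one string and returns a list of [name, value] pairs.
sub parse_headers {
    my ($head) = @_;
    $head =~ s/\n\s+/ /g;                       # join continuation lines
    my @fields;
    while ( $head =~ /^(\S+?:)\s*(.*)$/gm ) {   # one "Name: value" per line
        push @fields, [ $1, $2 ];
    }
    return @fields;
}

# Sample header with a folded Subject line (invented data).
my $head = "From: gnat\@example.com\nSubject: a long\n\tsubject line\n";
printf "%-10s %s\n", @$_ for parse_headers($head);
```

The folded Subject line comes back as a single value, "a long subject line", because the newline and leading tab were collapsed to one space before the matching loop ran.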
The CPAN module HTML::TextToHTML has options for headers, footers, indentation, tables, and more.
See also the documentation for the CPAN modules HTML::Entities and HTML::TextToHTML.