For purposes of this appendix, let us start with a specific hypothetical data representation. Here is an easy-to-understand example. In the town of Greenfield, MA, the telephone prefixes are 772-, 773-, and 774-. (For non-USA readers: In the USA, local telephone numbers are seven digits and are conventionally represented in the form ###-####; prefixes are assigned in geographic blocks.) Suppose also that the first prefix is the mostly widely assigned of the three. The suffix portions might be any other digits, in fairly equal distribution. The data set we are interested in is "the list of all the telephone numbers currently in active use." One can imagine various reasons why this might be interesting for programmatic purposes, but we need not specify that herein.
Initially, the data set we are interested in comes in a particular data representation: a multicolumn report (perhaps generated as output of some query or compilation process). The first few lines of this report might look like:
============================================================= 772-7628 772-8601 772-0113 773-3429 774-9833 773-4319 774-3920 772-0893 772-9934 773-8923 773-1134 772-4930 772-9390 774-9992 772-2314 [...]