J4/03-0009
14 August, 2002 
 
 
 
Page 1 of 8
SUBJECT: 
Strongly Typed File Records
AUTHOR: 
John Piggott
REFERENCES:
ISO/IEC 1989:2002, Information technology — Programming
languages — COBOL
SUMMARY:
The current Standard allows any number of different strongly-typed
records to be written to a COBOL file but does not permit them to be read
back unless the order of arrival can be precisely predicted.  This paper
attempts to remedy this obvious anomaly.
DISCUSSION:
This document arose from correspondence between the author and
members of J4 during December 2002 and was subsequently requested
by them.  It is being circulated as a working paper before any further
work is done, because the chosen solution may not be obvious to all and
requires consensus.
The current Standard restricts the use of strongly-typed record formats in
the File Section.  In effect, it states that if a strongly-typed level 1
record description follows an FD, that FD may have only one record
description.  This is because level 1 record descriptions following an FD
implicitly redefine each other, and any kind of redefinition is prohibited
for strongly-typed data descriptions.
Nevertheless, the Standard allows any number of different strongly-typed
record formats to be written easily to a file using WRITE (or REWRITE)
FILE … FROM …, for example:
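As an illustration only, the following Python model (not COBOL) shows how freely such a file can be written. Several pieces are hypothetical: the two layouts `StrongType1` and `StrongType2`, their one-character record-type code in position 1, and the helper `write_from`. `write_from` mimics the implicit MOVE into a record area described like the FROM item. That area exists only for the duration of the WRITE.

```python
from dataclasses import dataclass

# Hypothetical layouts standing in for two TYPEDEF STRONG record types.
# Position 1 holds a record-type code, as in many multiple-format files.
@dataclass
class StrongType1:
    name: str      # like PIC X(10)
    balance: int   # like PIC 9(6)

    def to_record(self) -> str:
        return "1" + self.name.ljust(10)[:10] + f"{self.balance:06d}"

@dataclass
class StrongType2:
    order_no: int  # like PIC 9(8)
    qty: int       # like PIC 9(4)

    def to_record(self) -> str:
        return "2" + f"{self.order_no:08d}" + f"{self.qty:04d}"

def write_from(file: list, item) -> None:
    """Model of WRITE FILE f FROM item: the item is moved into an implicit
    record area described like it, written, and the area is discarded."""
    file.append(item.to_record())

f: list = []
write_from(f, StrongType1("SMITH", 1500))
write_from(f, StrongType2(12345678, 3))
write_from(f, StrongType1("JONES", 20))
print(f)  # ['1SMITH     001500', '2123456780003', '1JONES     000020']
```

Nothing on the writing side prevents the record types from being interleaved in any order. The difficulty appears only when the file is read back.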
But whilst allowing such a file to be written in COBOL, the Standard
provides no way to read the records back from the file (see note 1) unless
it is known in advance what type of record is next, in which case READ
INTO can be used.  In fact, it is extremely easy to read a record,
accidentally or deliberately, into the wrong type of data area (the "phoney
record") and thus to subvert the very purpose of strong typing.
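Here is a minimal Python sketch of the "phoney record" problem. It assumes fixed-length character records and uses a hypothetical layout `TYPE1_LAYOUT` for strong-type-1. `read_into` models READ … INTO as a plain byte-for-byte MOVE. As a result, a strong-type-2 record is accepted into the strong-type-1 area without complaint.

```python
# Hypothetical field layout for strong-type-1: (field name, start, length).
TYPE1_LAYOUT = [("rec_type", 0, 1), ("name", 1, 10), ("balance", 11, 6)]
RECORD_SIZE = 17  # size of the receiving area (an assumption)

def read_into(record: str, layout) -> dict:
    """Model of READ f INTO area: the record is moved into the area byte
    for byte; nothing checks that it really has that layout."""
    area = record.ljust(RECORD_SIZE)  # alphanumeric MOVE pads with spaces
    return {name: area[start:start + length] for name, start, length in layout}

# The second record on the file is really a strong-type-2 order record.
records = ["1SMITH     001500", "2123456780003", "1JONES     000020"]
phoney = read_into(records[1], TYPE1_LAYOUT)
print(phoney)  # {'rec_type': '2', 'name': '1234567800', 'balance': '03    '}
```

The record-type code "2" is visible in the result, but nothing obliges the program to look at it.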
Obviously it is untenable for a system not to be able to read its own
handwriting.  This paper was written just to correct this anomaly, not to
promote any philosophy of complex vs. simple file structures.
There are two main approaches:
Possible Approach #1
Forbid multiple WRITE ... FROM statements in such a case, i.e. bar
the above scenario, as if to close an unintentional loophole.
 
We need to leave aside any philosophical arguments in favor of
multiple record types for now and consider only practical questions.  
First, there are some points of view supporting this approach:
 
- Under the rules for WRITE FILE … FROM … the Standard says that
an implicit MOVE is made from the named record area to an implicit
record having the same description.  Since such records would not
be allowed anyway, as explained above, it could be argued that the
WRITE statements are already illegal.  But since we imagine that the
implicit record exists only for the duration of the statement and then
conceptually vanishes once the WRITE completes, this argument is hard
to support.
 
- Since this is an anomaly or loophole that cannot be fully exploited, it
does no harm to tighten it up.  No one will be affected, since the file
cannot be read back in COBOL.  The problem with this argument is
that programmers have always been enthusiastic exploiters of tricks and
loopholes.  They will use this one and will find a way round the
read-back problem, as discussed below.
Now for the points against this approach:
 
 
- It is a potentially affecting change.  Any modification to this new
Standard will not take effect until the current Standard has been in
use for some time, and by then many programs could be affected.
Note 1: Removing the strong typing from the file section records and using MOVE CORRESPONDING
fails because the latter does not handle data items with OCCURS, quite apart from questions of
efficiency and the general dislike of MOVE CORRESPONDING.
- Stating a rule would be difficult.  We would need to say something
like "if the source unit contains a WRITE FILE referencing a strongly-typed
item, any other WRITE FILE statement for the same file must
reference an item of the same or equivalent type".  But this does not
cover all cases: the FD could have a level 1 record of a different type,
or of no type at all.  This record could be innocuous (programmers are
used to coding some sort of record after an FD).
 
- There's a simple code-around.  By just coding an untyped level 1
record after the FD and writing it with WRITE record-name, the
programmer gets round the problem completely, apart from a possible
pitfall over record length.  (Strongly-typed items can always be used as
sending items.)
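To show why stating the rule is awkward, here is a toy static check in Python. It assumes a source unit can be reduced to a list of its WRITE statements. `WriteStmt` and `violations` are hypothetical names, and type equivalence is simplified to equality of type names. The check enforces the draft wording quoted above. The code-around escapes it only because it uses WRITE record-name rather than WRITE FILE.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class WriteStmt(NamedTuple):
    file: str
    form: str                 # "WRITE FILE" or "WRITE record-name"
    from_type: Optional[str]  # strong type of the FROM item; None if untyped

def violations(stmts: list) -> list:
    """Files for which WRITE FILE statements reference items of more than
    one type while at least one of them is strongly typed."""
    types_by_file = defaultdict(set)
    for s in stmts:
        if s.form == "WRITE FILE":  # the draft rule only mentions WRITE FILE
            types_by_file[s.file].add(s.from_type)
    return sorted(f for f, ts in types_by_file.items()
                  if len(ts) > 1 and any(t is not None for t in ts))

direct = [WriteStmt("OUT-FILE", "WRITE FILE", "strong-type-1"),
          WriteStmt("OUT-FILE", "WRITE FILE", "strong-type-2")]
code_around = [WriteStmt("OUT-FILE", "WRITE record-name", "strong-type-1"),
               WriteStmt("OUT-FILE", "WRITE record-name", "strong-type-2")]
print(violations(direct))       # ['OUT-FILE']
print(violations(code_around))  # []
```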
Possible Approach #2
 
Explicitly allow multiple-format files containing a strongly-typed record
to be written and read, i.e. complete the circle.
 
There are several points of view supporting this approach:
 
- Programmers historically have always expected files to be able to
contain different record formats.  Some users even have a convention
that every file must have a special header and trailer.
 
- We presumably want to encourage the use of strong typing, for
example to protect against illicit redefinition of any of the several numeric
formats COBOL provides.  These formats are already used in
traditional files.  If we restrict usage to a single record format, strong
typing will be of little use in "legacy" file structures.
- Future COBOL formats may be devised that require strong typing
(maybe something similar to object references, but applying to files).
We cannot predict how restriction to a single record format might
cramp such future developments.
 
- The current situation presents quite a nasty "gotcha".  A programmer
might well write one program to write a multiple-format file (or
enhance an existing program to do so) and then find that there is no
way to read the file back in the next program.  It is not easy to
reach this conclusion, as the author discovered.  In practice, the
programmer will simply suppress the STRONG
keyword (REPLACE ==TYPEDEF STRONG== BY ==TYPEDEF==)
and the facility will lose respect.
 
- By providing a formal way of reading a multiple-format file, we tempt
the programmer away from that nasty problem, discussed at the start,
of reading "blind" and getting a "phoney record".  (See Open Issues.)
So proceeding now with Possible Approach #2, we need to evaluate
different possible language extensions:
Suggestion #1
 
- Permit more than one level 1 record in the file section provided that
a SELECT WHEN is present on each record and only allow a record
(or any subordinate item) to be moved if its SELECT WHEN condition
is true.  For example:
 
This idea seems more promising, and it satisfies traditionalists by
relaxing the rule against implicit redefinition in the file section in
exchange for a very "reasonable" requirement of selection by
condition.  However, it too has problems.  (1) Implementations will
have to generate an implicit test for the condition every time any of the
file section records, or any of their subordinate items (including
elementary data items), is accessed.  This is not a trivial matter: even
when an elementary item in the record is referenced, for example in a
condition, the implementation will need to generate a test equivalent to
"IF condition-i … ELSE (raise some standard exception)".  (2) By
allowing implicit redefinition in the file section, we would need to
justify not also allowing explicit redefinition in other sections under
the same strictures.
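A small Python model can sketch the cost described in (1). It assumes the record area is a character string shared by every level 1 record. `GuardedRecord` and `SelectWhenError` are hypothetical names. The point of the model is that the guard runs on every field reference, not once per READ.

```python
class SelectWhenError(Exception):
    """Hypothetical stand-in for the 'standard exception' in the text."""

class GuardedRecord:
    """One of several level 1 records sharing (implicitly redefining) the
    same file record area, each with its own SELECT WHEN condition."""

    def __init__(self, area: list, layout: dict, select_when):
        self.area = area                # shared record area: a one-element list
        self.layout = layout            # field name -> (start, length)
        self.select_when = select_when  # the SELECT WHEN condition
        self.tests = 0                  # implicit tests executed so far

    def __getitem__(self, field: str) -> str:
        # Implicit test on EVERY reference, even to an elementary item:
        # "IF condition-i ... ELSE (raise some standard exception)".
        self.tests += 1
        if not self.select_when(self.area[0]):
            raise SelectWhenError(f"{field}: SELECT WHEN condition is false")
        start, length = self.layout[field]
        return self.area[0][start:start + length]

area = ["2123456780003    "]  # contents of the record area after a READ
rec1 = GuardedRecord(area, {"name": (1, 10), "balance": (11, 6)},
                     lambda r: r[0] == "1")
rec2 = GuardedRecord(area, {"order_no": (1, 8), "qty": (9, 4)},
                     lambda r: r[0] == "2")
print(rec2["order_no"], rec2["qty"])  # 12345678 0003
```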
Suggestion #2
 
- As in the previous suggestion, permit more than one level 1 record
in the file section provided that a SELECT WHEN is present on each
record, but allow only a MOVE statement from the level 1 record-name.
For example:
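Here is a rough Python model of this restriction, assuming the same kind of shared record area. The hypothetical class `MoveOnlyRecord` offers no field accessors at all. Its single operation, `move_to`, tests the SELECT WHEN condition once per MOVE. The names `f_rec1` and `w_rec1` are illustrative only.

```python
class SelectWhenError(Exception):
    """Hypothetical stand-in for the exception raised by a failed test."""

class MoveOnlyRecord:
    """A file section record under Suggestion #2. It deliberately has no
    field accessors: subordinate data-names may not be referenced, and a
    MOVE of the whole level 1 record is the only permitted operation."""

    def __init__(self, area: list, select_when):
        self._area = area              # shared record area: a one-element list
        self._select_when = select_when

    def move_to(self, layout: dict) -> dict:
        # One test per MOVE, instead of one per field reference.
        if not self._select_when(self._area[0]):
            raise SelectWhenError("record does not satisfy SELECT WHEN")
        rec = self._area[0]
        return {name: rec[s:s + n] for name, (s, n) in layout.items()}

area = ["1SMITH     001500"]
f_rec1 = MoveOnlyRecord(area, lambda r: r[0] == "1")
w_rec1 = f_rec1.move_to({"name": (1, 10), "balance": (11, 6)})
print(w_rec1)  # {'name': 'SMITH     ', 'balance': '001500'}
```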
Here the MOVE statements would be the only permissible action.  This
suggestion overcomes the foregoing possible objections from implementers,
but it seems artificial to restrict references to one type of statement.
There is a more serious problem with this suggestion.  Note
that it requires the file section record to be moved into working-
storage in order to be processed.  Now, assuming that the same type
definition is used for both the file and working-storage records, all the
data-names in them below level 1 must be qualified whenever they
are used.  But, since the data-names in the file section cannot be
referenced anyway, they are really superfluous, so the need for
qualification is just a nuisance with nothing gained in return.  See note 2.
Suggestion #3
 
- Require the receiving areas to have a SELECT WHEN and provide a
kind of selective READ INTO which tests the SELECT WHEN
conditions.  For example, something like:
Note 2: It is a matter of speculation whether the original authors of the TYPEDEF feature really
understood how intensely programmers hate having to qualify data-names.  It is a safe bet that
programmers will do anything to avoid using the same group-level TYPE more than once.  The SAME AS
clause has the same problem, only worse.  The trouble lies in the second reference: not only
must data-names under the second data description always be qualified, but any existing
references to data-names under the first data description must be changed by adding qualifiers.
where exactly one of w-rec1, w-rec2 receives the contents of the
record, namely the first area whose SELECT WHEN condition would be
true after the implied MOVE.
 
The problems with this approach are: (1) what happens if none of the
conditions are true?  (WHEN OTHER is not allowed outside the file
section.)  (2) how does the program "know" which area has been read
into?  (All the conditions will need to be tested again.)  (3) this is a big
change to the READ statement.
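A Python sketch of the selective READ INTO makes the three problems visible. It assumes each receiving area is a fixed-size character string with its own condition. `Area` and `selective_read_into` are hypothetical names. The sketch tests each condition against what the area would contain after the implied MOVE.

```python
from typing import Callable, Optional

class Area:
    """A receiving area with a SELECT WHEN condition (hypothetical model)."""

    def __init__(self, name: str, size: int, select_when: Callable[[str], bool]):
        self.name = name
        self.size = size
        self.select_when = select_when
        self.contents = " " * size

def selective_read_into(record: str, areas: list) -> Optional[str]:
    """Model of a selective READ INTO: the first area whose SELECT WHEN
    condition would be true after the implied MOVE receives the record."""
    for area in areas:
        # What the area would hold after the implied MOVE:
        # padded with spaces or truncated on the right.
        candidate = record.ljust(area.size)[:area.size]
        if area.select_when(candidate):
            area.contents = candidate
            return area.name
    # Problem (1): no condition was true; what should happen here?
    return None

w_rec1 = Area("w-rec1", 17, lambda r: r[0] == "1")
w_rec2 = Area("w-rec2", 13, lambda r: r[0] == "2")
print(selective_read_into("2123456780003", [w_rec1, w_rec2]))  # w-rec2
print(selective_read_into("9XYZ", [w_rec1, w_rec2]))           # None
```

Returning the area name is only a convenience of the model. In COBOL the statement would produce no such result, which is exactly problem (2).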
Suggestion #4
 
- Continue to forbid more than one level 1 record and provide a kind of
multiple-choice READ INTO.  For example, something like:
where each condition is formed from data items in the record (f-rec).  
 
The conditions are tested in the order written and a true value causes
that area to receive the record.
 
This syntax gets round some of the conceptual problems of the
previous solution, but it is really no better than "trusting the
programmer" and simply allowing a direct MOVE into a strongly-typed
record.
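The following Python sketch models the multiple-choice READ INTO. It assumes each choice pairs a condition on f-rec with the name of a receiving area. `multiple_choice_read_into` is hypothetical. Raising `LookupError` when no condition is true is also an assumption, since the text does not say what should happen in that case. The last lines show why the approach amounts to trusting the programmer: a wrong pairing is accepted silently.

```python
from typing import Callable

def multiple_choice_read_into(f_rec: str,
                              choices: list[tuple[Callable[[str], bool], str]],
                              areas: dict) -> str:
    """Test each (condition, area-name) pair in the order written; the first
    condition that is true for f-rec selects the area receiving the record."""
    for condition, target in choices:
        if condition(f_rec):
            areas[target] = f_rec  # padding/truncation ignored for brevity
            return target
    # Assumption: the text does not say what happens if nothing matches.
    raise LookupError("no condition was true for this record")

areas: dict = {}
choices = [(lambda r: r[0] == "1", "w-rec1"),
           (lambda r: r[0] == "2", "w-rec2")]
print(multiple_choice_read_into("2123456780003", choices, areas))  # w-rec2

# A careless pairing is accepted just as readily: the phoney record again.
swapped = [(lambda r: r[0] == "1", "w-rec2"),
           (lambda r: r[0] == "2", "w-rec1")]
print(multiple_choice_read_into("2123456780003", swapped, areas))  # w-rec1
```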
Suggestion #5
 
- Allow a MOVE of data from a non-typed record in the file section to
a strongly-typed record elsewhere, provided that a SELECT WHEN is
present on the receiving item and the condition is true after the
MOVE.  For example:
Here there is just one skeleton file section record which is not typed
and which "represents" all the possible record layouts, whether typed
or non-typed.  One possible question is how the READ statement
behaves when there are several record layouts of different sizes but
only one level 1 record in the FD.  Presumably this does not cause a
problem.
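The runtime behaviour of Suggestion #5 can be sketched in Python. The sketch assumes a single untyped skeleton record of `SKELETON_SIZE` characters whose first position holds the record type. `StrongArea`, `move` and `SelectWhenError` are hypothetical names. The sketch first performs the MOVE, padding with spaces or truncating on the right as a group MOVE does. Only then does it test the receiving item's SELECT WHEN condition. The text does not settle whether the receiving item should keep the moved data when the exception is raised; in this sketch it does.

```python
class SelectWhenError(Exception):
    """Hypothetical stand-in for the new exception the proposal calls for."""

SKELETON_SIZE = 17  # untyped FD record: record-type code plus large filler

class StrongArea:
    """A strongly-typed level 1 item carrying a SELECT WHEN clause."""

    def __init__(self, size: int, select_when):
        self.size = size
        self.select_when = select_when
        self.contents = " " * size

def move(skeleton: str, target: StrongArea) -> None:
    """MOVE skeleton-record TO strongly-typed item, as in Suggestion #5."""
    # Group MOVE: pad with spaces or truncate on the right.
    target.contents = skeleton.ljust(target.size)[:target.size]
    # The condition is tested on the receiving item AFTER the MOVE.
    if not target.select_when(target.contents):
        raise SelectWhenError("SELECT WHEN condition is false after MOVE")

# A strong-type-2 record has just been read into the skeleton record.
skeleton = "2123456780003".ljust(SKELETON_SIZE)
w_rec2 = StrongArea(13, lambda r: r[0] == "2")
w_rec1 = StrongArea(17, lambda r: r[0] == "1")
move(skeleton, w_rec2)  # succeeds
try:
    move(skeleton, w_rec1)
except SelectWhenError as exc:
    print("exception:", exc)
```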
Note that this suggestion would apply only to a MOVE from the file
section, not to the implied MOVE of a READ … INTO.  It might be
asked why we do not simply allow any such operation from any section.
The reasons are:
 
- Outside the file section, strongly-typed items will not necessarily
contain identifying values.
 
- READ (or RETURN)… INTO is already allowed without any
restriction and we don't wish to introduce changes.
OUTLINE OF PROPOSED REVISION:
It should be apparent by now that Suggestion #5 is preferred.  In
summary, the formal proposal will provide:
New syntax rule:
A MOVE to a strongly-typed item from a differently-typed or non-typed
item is permitted, provided that:
- both items are level 1
- the sending item is in the file section
- the receiving item has a SELECT WHEN clause
New general rule:
In the above scenario, the condition associated with the SELECT
WHEN clause must be true as a result of the execution of the
MOVE statement.  Otherwise a (new) exception is raised.
No other substantive changes are proposed.
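The proposed syntax rule lends itself to a compile-time check, sketched below in Python for illustration only. `DataItem` and `move_permitted` are hypothetical names, and type equivalence is reduced to equality of type names. MOVEs outside the scope of the new rule are simply treated as permitted under the existing rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataItem:
    name: str
    level: int
    section: str                 # "FILE", "WORKING-STORAGE", ...
    strong_type: Optional[str]   # None if the item is not strongly typed
    has_select_when: bool = False

def move_permitted(sender: DataItem, receiver: DataItem) -> bool:
    """A MOVE to a strongly-typed item from a differently-typed or
    non-typed item is permitted only if both items are level 1, the
    sender is in the file section and the receiver has SELECT WHEN."""
    if receiver.strong_type is None or sender.strong_type == receiver.strong_type:
        return True  # not covered by the new rule (simplified)
    return (sender.level == 1 and receiver.level == 1
            and sender.section == "FILE"
            and receiver.has_select_when)

f_rec = DataItem("f-rec", 1, "FILE", None)
w_rec2 = DataItem("w-rec2", 1, "WORKING-STORAGE", "strong-type-2", True)
w_rec3 = DataItem("w-rec3", 1, "WORKING-STORAGE", "strong-type-3")
w_skel = DataItem("w-skel", 1, "WORKING-STORAGE", None)
print(move_permitted(f_rec, w_rec2))   # True
print(move_permitted(f_rec, w_rec3))   # False: receiver lacks SELECT WHEN
print(move_permitted(w_skel, w_rec2))  # False: sender not in the file section
```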
OPEN ISSUES:
It is a measure of how difficult it is to find a perfect solution that this
proposal still leaves some questions:
- Do we really want to allow absolutely any differently-typed or non-typed
record to be moved to the strongly-typed area?  For example, the
"skeleton record" in the example under Suggestion #5 above, having just
a record type and a large filler, seems acceptable, but is it still
acceptable if f-rec has a non-equivalent type declaration?  It may be that the
programmer wants to handle one type of record (say strong-type-1) in the
file section and MOVE the other types (say strong-type-2 and
strong-type-3) into working-storage.  The only data item they all have in
common is the record type, so the MOVE only appears to be incompatible;
it is probably acceptable.
- How can we check that the record, the sending item, is really an
unadulterated input record that has just been read?  What is to prevent
the programmer from altering the record type after the READ, or indeed
moving a record that is about to be written to an output file?  Does
COBOL have a clear concept of an input file?  Should we require that the
file be open for INPUT or I-O?  It seems that we need to decide whether
the purpose of strong typing is to prevent accidental corruption or
deliberate corruption (see note 3).  There is really no safeguard against the latter,
since the programmer can simply drop the strong typing by replacing the
STRONG keyword, move the data to a non-typed area, or handle it at
the elementary level.  We want the programmer to say "great! - without
that type-check my program could have gone on to make a real mess of
the data" rather than "curses! - it's foiled me again - how can I get this
thing to work the way I want it?"
- Should we also require a SELECT WHEN on the receiving area for a
READ … INTO to prevent the "phoney record" scenario, discussed at the
start?  The rule could only take effect if there were more than one area
being read into.  This seems important.
Note 3: Deliberate corruption need not be malicious; it may simply be a "smart-Alec" operation or
trick, perhaps to "fix" or "deconstruct" a pointer or object reference, which appears to work but
later causes a foul-up.