Special format for last record in each group

Gandalfito · Post by **Gandalfito** » Thu Aug 30, 2012 1:15 pm

Hi, in a project to optimize batch jobs, we have found the following case:
1) A DFSORT step that combines two flat files, sorting on a key (there may be duplicate keys).
2) A COBOL program that, for each set of identical keys, writes a sequence number at a fixed position in the record, plus a flag marking the last record for that key.
3) An IDCAMS REPRO that copies the output from step 2 to a VSAM file.
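As a point of reference, the net effect of steps 1 and 2 can be modelled in a few lines of Python. This is only a sketch under assumptions the post does not state: records are plain strings, the key is 1,10,CH, the sequence number and flag are overlaid at positions 11-15 (as in the output layout given later), and records with equal keys keep their input order. The names `current_process` and `overlay_seq_flag` are made up for illustration.

```python
from itertools import groupby

KEY_LEN = 10  # key is 1,10,CH in the example


def overlay_seq_flag(rec: str, seq: int, last: bool) -> str:
    """Overlay a 4-digit sequence (11,4,ZD) and a flag (15,1,CH) on a record."""
    flag = "*" if last else " "
    return rec[:KEY_LEN] + f"{seq:04d}" + flag + rec[KEY_LEN + 5:]


def current_process(file_a: list, file_b: list) -> list:
    """Hypothetical model of step 1 (sort both files on the key) and
    step 2 (number the records within a key, flag the last one)."""
    # Step 1: sorted() is stable, so duplicate keys keep their input order
    # (assumed to match the original DFSORT step).
    merged = sorted(file_a + file_b, key=lambda rec: rec[:KEY_LEN])
    out = []
    # Step 2: on sorted input, groupby() yields one run of records per key.
    for _, group in groupby(merged, key=lambda rec: rec[:KEY_LEN]):
        records = list(group)
        for seq, rec in enumerate(records, start=1):
            out.append(overlay_seq_flag(rec, seq, seq == len(records)))
    return out


if __name__ == "__main__":
    a = ["key-bbbbbb     X1", "key-aaaaaa     X2"]
    b = ["key-aaaaaa     X3"]
    for rec in current_process(a, b):
        print(rec)
```

Running it prints `key-aaaaaa0001 X2`, `key-aaaaaa0002*X3` and `key-bbbbbb0001*X1`. Note that each key group is held in memory before it is written: the last record cannot be recognised until the next key (or end of file) has been seen.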

The idea is to replace all of this with a single DFSORT step. The drawback I see is that I cannot find a way to identify the last record in each group so that it can be reformatted in the same single pass (a single pass is necessary because the files have more than 50 million records and the objective is to reduce run time).

We have thought of a mixed solution: a COBOL program using DFSORT with FASTSRT. The SORT ... USING file would optimize reading of the input file, DFSORT statements would add the sequence number, the OUTPUT PROCEDURE would mark the last record of each group, and the program would write directly to the VSAM file (that part would not be optimized).
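The heart of the OUTPUT PROCEDURE idea is one record of lookahead: a record can only be written once the next record has been read, because only then is it known whether it was the last of its key. A minimal Python sketch of that loop, assuming the records arrive already sorted on the key and using the same 11-15 overlay as the example below; `mark_last_streaming` and `format_record` are illustrative names, not COBOL or DFSORT facilities.

```python
from typing import Iterable, Iterator

KEY_LEN = 10  # key is 1,10,CH in the example


def format_record(rec: str, seq: int, last: bool) -> str:
    """Overlay a 4-digit sequence (11,4,ZD) and a flag (15,1,CH)."""
    return rec[:KEY_LEN] + f"{seq:04d}" + ("*" if last else " ") + rec[KEY_LEN + 5:]


def mark_last_streaming(sorted_records: Iterable[str]) -> Iterator[str]:
    """Single pass over key-sorted records with one record of lookahead."""
    held = None  # record already read but not yet written
    seq = 0      # sequence number of the held record within its key
    for rec in sorted_records:
        if held is not None:
            # The held record is the last of its key if the new key differs.
            is_last = rec[:KEY_LEN] != held[:KEY_LEN]
            yield format_record(held, seq, is_last)
            if is_last:
                seq = 0  # the record just read starts a new key
        held = rec
        seq += 1
    if held is not None:
        yield format_record(held, seq, True)  # end of file: always the last


if __name__ == "__main__":
    data = sorted("key-" + c * 6 for c in "abcdeabceacec")
    for rec in mark_last_streaming(data):
        print(rec)
```

Only one record is held at a time, so memory use does not depend on group size. In COBOL the same shape would be a RETURN loop that writes the previously returned record, plus one extra write at end of file.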

The ideal scenario is to avoid this and keep only the DFSORT step, but that requires something I can't find, so I humbly ask for help.

The following example illustrates what I need to do:

Input file (FB 400)
Key: 1,10,CH

Code:

key-aaaaaa....
key-bbbbbb....
key-cccccc....
key-dddddd....
key-eeeeee....
key-aaaaaa....
key-bbbbbb....
key-cccccc....
key-eeeeee....
key-aaaaaa....
key-cccccc....
key-eeeeee....
key-cccccc....

Output file (FB 400)
Key: 1,10,CH
Seq: 11,4,ZD
Flag: 15,1,CH ("*")

Code:

key-aaaaaa0001....
key-aaaaaa0002....
key-aaaaaa0003*....
key-bbbbbb0001....
key-bbbbbb0002*....
key-cccccc0001....
key-cccccc0002....
key-cccccc0003....
key-cccccc0004*....
key-dddddd0001*....
key-eeeeee0001....
key-eeeeee0002....
key-eeeeee0003*....

The DFSORT we use is at the PTF level of July 2008.

Thanks

William Collins · Post by **William Collins** » Fri Aug 31, 2012 1:00 am

You are very out-of-date. Any chance of getting DFSORT up-to-date? That will help.

I don't think your level is even supported any more, so perhaps you can use both of these points to get an upgrade scheduled for some future date.

It can be done with JOINKEYS, but you don't have that.

There is another possibility, but find out about upgrade possibilities first.

NicC · Post by **NicC** » Fri Aug 31, 2012 3:10 am

The exact same question was asked in another forum today, and the DFSORT developer gave a solution using JOINKEYS. You need to get up to date; these upgrades are free.

William Collins · Post by **William Collins** » Fri Aug 31, 2012 3:59 am

Do you have WHEN=GROUP available?

Gandalfito · Post by **Gandalfito** » Fri Aug 31, 2012 9:30 pm

Unfortunately, in my company the technical staff are quite reluctant to apply upgrades (even free ones). Believe me, I've asked many times, without success so far.
I hope this will help convince them.

Fortunately, in a few months we will move to a new mainframe and, as I have been told, we will be more up to date.

William C., yes, we do have WHEN=GROUP available.

NicC, I would like to see the solution using JOINKEYS, so, if possible, please give me the link to that forum.

Greetings and thanks

William Collins · Post by **William Collins** » Sat Sep 01, 2012 3:41 am

OK, it's "a bit" more complex than the JOINKEYS solution.

The first thing you need is to concatenate a "dummy" record after your data. Its contents are irrelevant as long as its key differs from the last real key; I put "DUMMYDUMMY" to make it clear.

It is a COPY operation.

In INREC, IFOUTLEN=63 extends the record to 63 bytes.

First, the WHEN=INIT establishes a sequence number (positions 31-33), which is then taken "modulus 2" to give a value of either 0 or 1 in position 34. The sequence number is not needed after that, but I left it there whilst developing.

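Then a GROUP of two RECORDS is started whenever a 0 is encountered. The key (1,2) is PUSHed to position 37 and the whole record (1,11) to position 52. This can be rationalised.

In a similar manner, a second GROUP of two RECORDS is started whenever a 1 is encountered. Its PUSHes go to different locations: the key to position 35 and the record to position 40. Because the two kinds of group overlap by one record, every record after the first carries both its own key and the key of the record before it.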
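To see why two overlapping GROUPs are needed, here is a rough Python model of WHEN=GROUP with BEGIN and RECORDS=2, applied twice as in this solution. It is a simplification written for illustration: `group_push` and `two_keys_per_record` are hypothetical helpers, and `None` stands for the blanks a record gets when it is in no group.

```python
def group_push(values, begins, records=2):
    """Rough model of IFTHEN=(WHEN=GROUP,BEGIN=...,RECORDS=n,PUSH=...).

    values[i] is what record i would PUSH, begins[i] is its BEGIN test.
    Returns the pushed value seen by each record, or None (blanks) for a
    record that is not in any group.
    """
    pushed, count, out = None, 0, []
    for value, begin in zip(values, begins):
        if begin:
            pushed, count = value, 1      # a new group starts on this record
        elif pushed is not None and count < records:
            count += 1                    # still inside the current group
        else:
            pushed, count = None, 0       # not in any group
        out.append(pushed)
    return out


def two_keys_per_record(keys):
    """(key at 35, key at 37) for each record, as in the first solution."""
    markers = [n % 2 for n in range(1, len(keys) + 1)]   # SEQNUM MOD 2: 1,0,1,0,...
    at_37 = group_push(keys, [m == 0 for m in markers])  # groups begun by a 0
    at_35 = group_push(keys, [m == 1 for m in markers])  # groups begun by a 1
    return list(zip(at_35, at_37))


if __name__ == "__main__":
    print(two_keys_per_record(["XX", "XX", "YY", "YY", "ZZ"]))
```

This prints `[('XX', None), ('XX', 'XX'), ('YY', 'XX'), ('YY', 'YY'), ('ZZ', 'YY')]`: from the second record on, each pair is the record's own key and its predecessor's, in alternating order, and only the first record has a blank.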

The next IFTHEN just gets the "blank" key out of the way. The blank is there because the first record belongs to only one group, so only one key has been PUSHed on to it. Copying 35 over 37 keeps that record out of the following test.

In the next IFTHEN the two PUSHed keys are compared. If they differ, the record is marked as being the last ('EC' in position 12). Note that at this stage the "wrong" record is marked: the first record of the new key rather than the last record of the previous one.
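Once every record carries both keys, the comparison boils down to "does this record's key differ from the previous record's?". A tiny Python sketch of that test (the function name is hypothetical), run on the keys of the test data in this post, including the dummy record's "DU":

```python
def marks_after_compare(keys):
    """True where a record's key differs from the previous record's key.

    This is where 'EC' lands at this stage: on the FIRST record of each new
    key, i.e. the "wrong" record. The first record of the file is never
    marked, because its blank key at 37 was replaced by the key at 35.
    """
    return [n > 0 and keys[n] != keys[n - 1] for n in range(len(keys))]


if __name__ == "__main__":
    keys = ["XX"] * 5 + ["YY"] * 2 + ["ZZ"] * 3 + ["DU"]
    marked = [n + 1 for n, m in enumerate(marks_after_compare(keys)) if m]
    print(marked)  # 1-based record numbers that get 'EC' at this stage
```

It prints `[6, 8, 11]`: the first YY, the first ZZ and the dummy. The marks belong one record earlier (records 5, 7 and 10), which is what the OUTREC step sorts out.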

Then in OUTREC, to avoid the need for HIT=NEXT, the PUSHed copy of the whole previous record, chosen according to the 0/1 marker, is placed at position 1. The 'EC' in positions 12-13 stays where it is, so it now sits on the last record of the previous key.

The only remaining problem is that the first record is now blank; OUTFIL OMIT takes care of that. The dummy record has disappeared because it was overwritten by the final PUSHed whole record.
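Putting the pieces together, this Python sketch simulates the whole first solution on the 11-byte test records of this post (key 1,2) and reproduces the output shown. It is a model written for illustration, not DFSORT itself: `group_push` is a hypothetical, rough stand-in for WHEN=GROUP with RECORDS=2, and `None` stands for blanks.

```python
def group_push(values, begins, records=2):
    """Rough stand-in for WHEN=GROUP,BEGIN=...,RECORDS=n,PUSH=...;
    None means the record is in no group (blanks)."""
    pushed, count, out = None, 0, []
    for value, begin in zip(values, begins):
        if begin:
            pushed, count = value, 1
        elif pushed is not None and count < records:
            count += 1
        else:
            pushed, count = None, 0
        out.append(pushed)
    return out


def first_solution(records):
    """records: 11-byte, key-ordered, ending with the dummy record.
    Returns the 13-byte records written after OUTREC and OUTFIL."""
    keys = [r[:2] for r in records]
    markers = [n % 2 for n in range(1, len(records) + 1)]  # position 34
    zero = [m == 0 for m in markers]
    one = [m == 1 for m in markers]
    key37, rec52 = group_push(keys, zero), group_push(records, zero)  # PUSH=(37:1,2,52:1,11)
    key35, rec40 = group_push(keys, one), group_push(records, one)    # PUSH=(35:1,2,40:1,11)
    out = []
    for n in range(len(records)):
        k35 = key35[n]
        k37 = key37[n] if key37[n] is not None else k35  # OVERLAY=(37:35,2) on blank
        mark = "EC" if k35 != k37 else "  "               # OVERLAY=(12:C'EC')
        # OUTREC: marker 1 takes 52, marker 0 takes 40, i.e. the previous record
        prev = rec52[n] if markers[n] == 1 else rec40[n]
        out.append((prev if prev is not None else " " * 11) + mark)
    return [r for r in out if r[:11] != " " * 11]         # OUTFIL OMIT blank


if __name__ == "__main__":
    data = ["XX000100.10", "XX000101.11", "XX000500.34", "XX000678.23",
            "XX000099.42", "YY000578.98", "YY000728.00", "ZZ000356.89",
            "ZZ178728.90", "ZZ999999.99", "DUMMYDUMMY "]
    for rec in first_solution(data):
        print(rec)
```

It prints the ten records of the output shown, with EC on XX000099.42, YY000728.00 and ZZ999999.99.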

Note, I "enhanced" your sample data to get a mix of odd/even records in the key groups.

Test it well. There is room for improvement/rationalisation, as it was easier to develop whilst keeping redundant data.

Et voilà!

Code:

//MARKLAST EXEC PGM=SORT
//CHKOUT DD DUMMY
//SYSOUT   DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY

  INREC  IFOUTLEN=63,

        IFTHEN=(WHEN=INIT,
                OVERLAY=(31:SEQNUM,3,ZD,31,3,ZD,MOD,+2,EDIT=(T))),

         IFTHEN=(WHEN=GROUP,BEGIN=(34,1,CH,EQ,C'0'),RECORDS=2,
                 PUSH=(37:1,2,52:1,11)),

         IFTHEN=(WHEN=GROUP,BEGIN=(34,1,CH,EQ,C'1'),RECORDS=2,
                 PUSH=(35:1,2,40:1,11)),

         IFTHEN=(WHEN=(37,2,CH,EQ,C'  '),OVERLAY=(37:35,2)),

         IFTHEN=(WHEN=(35,2,CH,NE,37,2,CH),
                OVERLAY=(12:C'EC'))


  OUTREC IFOUTLEN=13,

         IFTHEN=(WHEN=(34,1,CH,EQ,C'1'),
             OVERLAY=(1:52,11)),

         IFTHEN=(WHEN=(34,1,CH,EQ,C'0'),
             OVERLAY=(1:40,11))

  OUTFIL OMIT=(1,11,CH,EQ,C' ')
//SORTIN   DD * 
XX000100.10 
XX000101.11 
XX000500.34 
XX000678.23 
XX000099.42 
YY000578.98 
YY000728.00 
ZZ000356.89 
ZZ178728.90 
ZZ999999.99 
DUMMYDUMMY

Output is:

Code:

XX000100.10   
XX000101.11   
XX000500.34   
XX000678.23   
XX000099.42EC 
YY000578.98   
YY000728.00EC 
ZZ000356.89   
ZZ178728.90   
ZZ999999.99EC

William Collins · Post by **William Collins** » Sat Sep 01, 2012 12:34 pm

In an amusing follow-up, I've just noticed that the data and values I used are from the JOINKEYS example "elsewhere".

Your records are 400 bytes. It will still work, once the positions are adjusted. Obviously you'll be interested in "how fast", but you'll have to let us know.

NicC · Post by **NicC** » Sat Sep 01, 2012 5:09 pm

It would be bad manners of me to post the other forum's details here. Go onto Google and look for mainframe forums or DFSORT forums or...

William Collins · Post by **William Collins** » Sat Sep 01, 2012 9:49 pm

If you bear in mind that I used the test data from the JOINKEYS solution, it shouldn't be difficult to find with Google.

Gandalfito · Post by **Gandalfito** » Wed Sep 05, 2012 2:06 am

William C.
Thank you very much!
I will test the code and let you know the results and any improvements I can make. Thanks again for your time!

NicC, you're right; it was wrong of me to ask for that link. Anyway, I was able to locate it. Thank you very much for your contribution!

William Collins · Post by **William Collins** » Wed Sep 05, 2012 3:36 pm

After a bit of a discussion, I have reworked it.

I have also used DFSORT symbols to "generalise" the solution.

Here is what to include for the symbols:

Code:

//SYMNAMES DD * 
INPUT-RECORD,1,20,CH 
  INPUT-KEY,=,3,CH 
  INPUT-DATA,=,20,CH 
POSITION,INPUT-RECORD 
OUTPUT-RECORD,=,23,CH 
  OUTPUT-KEY,=,3,CH 
  OUTPUT-ORIG-DATA,=,20,CH 
  OUTPUT-MARKER,*,3,CH 
* TEMPORARY FIELDS AS "EXTENSION" TO OUTPUT-RECORD
  TEMP-RECORD-SEQ,*,8,ZD 
  TEMP-GROUP-SEQ,*,3,ZD 
  TEMP-MARKER,*,3,CH 
  TEMP-GROUPOFTWO-SEQ,*,1,CH 
  TEMP-LEFT-RECORD,*,20,CH 
  TEMP-RIGHT-RECORD,*,20,CH 
* CONSTANTS 
  END-OF-GROUP-MARKER,C'BBB' 
//SYMNOUT DD SYSOUT=*

This allows you to leave the Sort Control Cards "untouched".

These are the ones you need to change to your own values:

Code:

INPUT-RECORD,1,20,CH /* 20 to your length
  INPUT-KEY,=,3,CH  /* 3 to your key length
  INPUT-DATA,=,20,CH /* 20 to your length
POSITION,INPUT-RECORD 
OUTPUT-RECORD,=,23,CH  /* 23 to your output length
  OUTPUT-KEY,=,3,CH /* 3 to your key length
  OUTPUT-ORIG-DATA,=,20,CH /* 20 to your original length
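  OUTPUT-MARKER,*,3,CH /* to position and length of your new data, * for appending

  TEMP-LEFT-RECORD,*,20,CH /* 20 to your input record length
  TEMP-RIGHT-RECORD,*,20,CH /* 20 to your input record length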

  END-OF-GROUP-MARKER,C'BBB' /* the value to mark end of group
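Because most positions are given as `=` or `*`, changing one length moves every field after it. The following Python sketch resolves the positions for the symbols in this post, to check where the temporary fields land and how long the extended INREC record becomes. It assumes `=` means "same position as the previous symbol", `*` means "next byte after the previous symbol", and `POSITION,name` resets both to the start of `name`; it covers only what this list uses and is not a SYMNAMES parser. `resolve_positions`, `thread_symbols` and `extended_length` are hypothetical names.

```python
def resolve_positions(defs):
    """Return {symbol: (start, length)} for SYMNAMES-like entries.

    Each entry is (name, position, length), where position is an int,
    "=" or "*", or ("POSITION", name). Semantics are assumed, see lead-in.
    """
    table = {}
    prev_start = next_pos = 1
    for entry in defs:
        if entry[0] == "POSITION":
            prev_start = next_pos = table[entry[1]][0]
            continue
        name, pos, length = entry
        if pos == "=":
            start = prev_start
        elif pos == "*":
            start = next_pos
        else:
            start = pos
        table[name] = (start, length)
        prev_start, next_pos = start, start + length
    return table


def thread_symbols(rec_len, key_len, marker_len=3):
    """The symbols from this post, with the lengths you would change."""
    return [
        ("INPUT-RECORD", 1, rec_len),
        ("INPUT-KEY", "=", key_len),
        ("INPUT-DATA", "=", rec_len),
        ("POSITION", "INPUT-RECORD"),
        ("OUTPUT-RECORD", "=", rec_len + marker_len),
        ("OUTPUT-KEY", "=", key_len),
        ("OUTPUT-ORIG-DATA", "=", rec_len),
        ("OUTPUT-MARKER", "*", marker_len),
        ("TEMP-RECORD-SEQ", "*", 8),
        ("TEMP-GROUP-SEQ", "*", 3),
        ("TEMP-MARKER", "*", marker_len),
        ("TEMP-GROUPOFTWO-SEQ", "*", 1),
        ("TEMP-LEFT-RECORD", "*", rec_len),
        ("TEMP-RIGHT-RECORD", "*", rec_len),
    ]


def extended_length(table):
    """Last byte used by any symbol, i.e. the length INREC builds."""
    return max(start + length - 1 for start, length in table.values())


if __name__ == "__main__":
    for rec_len, key_len in [(20, 3), (400, 10)]:
        table = resolve_positions(thread_symbols(rec_len, key_len))
        print(rec_len, key_len, table["TEMP-RIGHT-RECORD"], extended_length(table))
```

Under those assumptions, the test values give TEMP-RIGHT-RECORD at 59 and a 78-byte extended record, while 400-byte records with a 10-byte key give TEMP-RIGHT-RECORD at 819 and an extended record of 1218 bytes.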

Note, if you can have more than 999 records with the same key, you'll need to extend the SEQNUM length for TEMP-GROUP-SEQ in the control cards as well as the length of TEMP-GROUP-SEQ in SYMNAMES.

Here are the control cards:

Code:

  OPTION COPY

  INREC  IFTHEN=(WHEN=INIT,
                   OVERLAY=(TEMP-RECORD-SEQ:SEQNUM,8,ZD,
                            TEMP-GROUP-SEQ:SEQNUM,3,ZD,
                                 RESTART=(INPUT-KEY),
                            TEMP-MARKER:END-OF-GROUP-MARKER)),

         IFTHEN=(WHEN=GROUP,
                   RECORDS=2,
                    PUSH=(TEMP-GROUPOFTWO-SEQ:SEQ=1)),

         IFTHEN=(WHEN=GROUP,
                   BEGIN=(TEMP-GROUPOFTWO-SEQ,EQ,TO-PUSH-LEFT),
                    RECORDS=2,
                     PUSH=(TEMP-LEFT-RECORD:INPUT-RECORD)),

         IFTHEN=(WHEN=GROUP,
                   BEGIN=(TEMP-GROUPOFTWO-SEQ,EQ,TO-PUSH-RIGHT),
                    RECORDS=2,
                     PUSH=(TEMP-RIGHT-RECORD:INPUT-RECORD)),

         IFTHEN=(WHEN=GROUP,
                   BEGIN=(TEMP-GROUP-SEQ,EQ,FIRST-RECORD-OF-GROUP),
                    RECORDS=1,
                     PUSH=(OUTPUT-MARKER:TEMP-MARKER)),

         IFTHEN=(WHEN=(TEMP-GROUPOFTWO-SEQ,EQ,TO-GET-FROM-RIGHT),
                   OVERLAY=(OUTPUT-RECORD:TEMP-RIGHT-RECORD)),

         IFTHEN=(WHEN=(TEMP-GROUPOFTWO-SEQ,EQ,TO-GET-FROM-LEFT),
                   OVERLAY=(OUTPUT-RECORD:TEMP-LEFT-RECORD))

   OUTFIL OMIT=(TEMP-RECORD-SEQ,EQ,FIRST-RECORD-ON-FILE),
           BUILD=(OUTPUT-RECORD)

The solution is based on this: it is easy to mark the first record of a group. If the records can be "moved down" by one whilst the marker stays where it is, then the marker ends up on the last record of the previous group.
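Stripped of the DFSORT mechanics, the idea is two steps: mark the first record of each key, then move the data down one slot while leaving the marks where they are. A minimal Python sketch of just that idea (the function name and parameters are illustrative), using the test data from this post:

```python
def mark_last_by_shifting(records, key_len, dummy, marker):
    """Mark the last record of each key group in one pass, without lookahead.

    Step 1: mark every record that is the FIRST of its key group.
    Step 2: slide the data down one slot while the marks stay put, so each
    mark sits on the LAST record of the group before it. The dummy trailer
    (a key different from the last real key) gets the final group marked.
    """
    data = list(records) + [dummy]
    first = [n == 0 or data[n][:key_len] != data[n - 1][:key_len]
             for n in range(len(data))]
    blank_mark = " " * len(marker)
    # Slot n keeps its own mark but receives record n-1; slot 0 is dropped.
    return [data[n - 1] + (marker if first[n] else blank_mark)
            for n in range(1, len(data))]


if __name__ == "__main__":
    prefixes = ["XXX00100.10", "XXX00101.11", "XXX00500.34", "XXX00678.23",
                "XXX00099.42", "YYY00578.98", "YYY00728.00", "ZZZ00356.89",
                "ZZZ78728.90", "ZZZ99999.99"]
    suffixes = ["999999991"] + [str(n) for n in range(2, 11)]
    records = [p + s.rjust(9) for p, s in zip(prefixes, suffixes)]
    for rec in mark_last_by_shifting(records, 3, "DUMMYDUMMY".ljust(20), "BBB"):
        print(rec)
```

BBB ends up on records 5, 7 and 10, matching the output shown further down.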

The logic has changed from the previous version.

The INIT now has a sequence number for the whole file. This is because it is more reliable to OMIT the first record of the file later than to OMIT a "blank record", just in case a genuinely blank record exists in the file.

Also on the INIT is the sequence number within the key group (RESTART=(INPUT-KEY)). If KEYBEGIN is available, this can be removed and the GROUP that tests for a sequence equal to 1 amended to use KEYBEGIN (thanks to sk for suggesting KEYBEGIN).

Finally on the INIT is the marker value, copied from a Symbol. Once it exists on the record it can be PUSHed later; a constant cannot be PUSHed.

Rather than using MOD to set up a 0/1 value, a GROUP with two RECORDS is now used to set up a 1/2 value (thanks to sk).

The GROUP that tests for FIRST-RECORD-OF-GROUP replaces the comparison of the keys: it PUSHes the marker on to the first record of each key.
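For checking the reworked control cards without a mainframe, here is a rough Python simulation of them, one step per IFTHEN, run on the 20-byte test records. It is an illustrative model, not DFSORT: `group_push` is a hypothetical stand-in for WHEN=GROUP (with `None` for the blanks a record gets when it is in no group), and the 1/2 values of TEMP-GROUPOFTWO-SEQ are generated directly rather than through a modelled SEQ=1.

```python
def group_push(values, begins, records):
    """Rough stand-in for WHEN=GROUP,BEGIN=...,RECORDS=n,PUSH=...;
    None means the record is in no group (blanks)."""
    pushed, count, out = None, 0, []
    for value, begin in zip(values, begins):
        if begin:
            pushed, count = value, 1
        elif pushed is not None and count < records:
            count += 1
        else:
            pushed, count = None, 0
        out.append(pushed)
    return out


def second_solution(records, key_len=3, marker="BBB"):
    """records: fixed length, key-ordered, ending with the dummy record."""
    size = len(records)
    keys = [r[:key_len] for r in records]
    # WHEN=INIT: TEMP-RECORD-SEQ and TEMP-GROUP-SEQ (RESTART=(INPUT-KEY)).
    record_seq = list(range(1, size + 1))
    group_seq = []
    for n in range(size):
        restart = n == 0 or keys[n] != keys[n - 1]
        group_seq.append(1 if restart else group_seq[-1] + 1)
    # GROUP RECORDS=2 with SEQ=1: TEMP-GROUPOFTWO-SEQ is '1','2','1','2',...
    two_seq = ["1" if n % 2 == 0 else "2" for n in range(size)]
    left = group_push(records, [s == "1" for s in two_seq], 2)   # TEMP-LEFT-RECORD
    right = group_push(records, [s == "2" for s in two_seq], 2)  # TEMP-RIGHT-RECORD
    # GROUP BEGIN=(TEMP-GROUP-SEQ,EQ,1),RECORDS=1: PUSH the marker.
    marks = group_push([marker] * size, [g == 1 for g in group_seq], 1)
    out = []
    for n in range(size):
        # '1' gets from right, '2' gets from left: either way the previous record.
        prev = right[n] if two_seq[n] == "1" else left[n]
        body = prev if prev is not None else " " * len(records[n])
        mark = marks[n] if marks[n] is not None else " " * len(marker)
        if record_seq[n] != 1:         # OUTFIL OMIT first record on file
            out.append(body + mark)    # BUILD=(OUTPUT-RECORD)
    return out


if __name__ == "__main__":
    blank_first = [" " * 20, "AAA".ljust(20), "DUMMYDUMMY".ljust(20)]
    for rec in second_solution(blank_first):
        print(repr(rec))
```

The demo shows the point of OMITting on TEMP-RECORD-SEQ: a genuinely blank input record still comes out (with BBB, as the last of its key) instead of being dropped.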

I have tested with a different length record, key and marker value.

Here is the change; the constants at the end are referenced by the control cards, so they must be in SYMNAMES too:

Code:

INPUT-RECORD,1,20,CH 
  INPUT-KEY,=,3,CH 
  INPUT-DATA,=,20,CH 
POSITION,INPUT-RECORD 
OUTPUT-RECORD,=,23,CH 
  OUTPUT-KEY,=,3,CH 
  OUTPUT-ORIG-DATA,=,20,CH 
  OUTPUT-MARKER,*,3,CH 
* TEMPORARY FIELDS AS "EXTENSION" TO OUTPUT-RECORD
  TEMP-RECORD-SEQ,*,8,ZD 
  TEMP-GROUP-SEQ,*,3,ZD 
  TEMP-MARKER,*,3,CH 
  TEMP-GROUPOFTWO-SEQ,*,1,CH 
  TEMP-LEFT-RECORD,*,20,CH 
  TEMP-RIGHT-RECORD,*,20,CH 
* CONSTANTS 
  END-OF-GROUP-MARKER,C'BBB' 
  FIRST-RECORD-ON-FILE,1 
  FIRST-RECORD-OF-GROUP,1 
  TO-PUSH-LEFT,C'1' 
  TO-PUSH-RIGHT,C'2' 
  TO-GET-FROM-RIGHT,C'1' 
  TO-GET-FROM-LEFT,C'2'

Here is the input:

Code:

XXX00100.10999999991
XXX00101.11        2
XXX00500.34        3
XXX00678.23        4
XXX00099.42        5
YYY00578.98        6
YYY00728.00        7
ZZZ00356.89        8
ZZZ78728.90        9
ZZZ99999.99       10
DUMMYDUMMY

Here is the output:

Code:

XXX00100.10999999991    
XXX00101.11        2    
XXX00500.34        3    
XXX00678.23        4    
XXX00099.42        5BBB 
YYY00578.98        6    
YYY00728.00        7BBB 
ZZZ00356.89        8    
ZZZ78728.90        9    
ZZZ99999.99       10BBB

Finally, to create the "dummy" record to concatenate to your data:

A small step with your main file on SORTIN, and SORTOUT as the dummy trailer file (no DCB information). The text can be anything, as long as its key positions cannot match the key of the last real record.

Code:

  OPTION COPY,STOPAFT=1
  INREC OVERLAY=(C'CAN BE ANY TEXT FOR THIS DUMMY')
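What that step produces can be modelled in a couple of lines of Python: only the first record is kept (STOPAFT=1) and, since no position is given, the text is overlaid from position 1, so the dummy keeps the length of your real records. The sketch is illustrative (both function names are hypothetical) and assumes the text is no longer than the record; the second function is a check that the dummy's key cannot be mistaken for the last real key.

```python
def make_dummy_trailer(first_record: str,
                       text: str = "CAN BE ANY TEXT FOR THIS DUMMY") -> str:
    """Model of OPTION COPY,STOPAFT=1 with INREC OVERLAY=(C'...'):
    the text replaces the start of the first record; the length is kept."""
    if len(text) > len(first_record):
        # This sketch does not model a record being lengthened.
        raise ValueError("overlay text is longer than the record")
    return text + first_record[len(text):]


def dummy_key_is_safe(dummy: str, last_record: str, key_len: int) -> bool:
    """The trailer only gets the last group marked if its key differs."""
    return dummy[:key_len] != last_record[:key_len]


if __name__ == "__main__":
    first = "key-aaaaaa".ljust(400, ".")
    dummy = make_dummy_trailer(first)
    print(len(dummy), dummy[:30])
    print(dummy_key_is_safe(dummy, "key-eeeeee".ljust(400, "."), 10))
```

For a 400-byte first record this prints `400 CAN BE ANY TEXT FOR THIS DUMMY` and then `True`.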

mainframegurukul.com
