Eliminate duplicate records in file

Posted: Tue Oct 06, 2009 10:55 am
by banupriyab
Hi,
I have eliminated duplicate records in a file using SUM FIELDS=NONE in SORT.

//SYSIN DD *
SORT FIELDS=(1,23,CH,A)
SUM FIELDS=NONE

But I don't want the records to be sorted into ascending order before the duplicates are eliminated. Is there any other way to remove duplicates without sorting the records into ascending or descending order?

Eg.,
Input records:

MOHANK 123456789012345 RAJES
ANTAKS 123456789012345 MIRAJ
MOHANK 123456789012345 NANAK

Output records (what I get now):
ANTAKS 123456789012345 MIRAJ
MOHANK 123456789012345 RAJES

I want MOHANK first, followed by ANTAKS. Please help.


Thanks,

BanuPriya B

Posted: Tue Oct 06, 2009 6:56 pm
by Anuj Dhawan
You probably mean, "I would like to keep the order of the input records when they are copied to the output file". If so, eliminating duplicates, whether with SUM FIELDS=NONE or with SELECT, requires sorting the records so that records with the same key end up next to each other.

If you need to keep the records in their original order, then you can use the trick of adding a sequence number before you eliminate the duplicates, and then sorting on that sequence number to get the remaining records back in their original order.
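
For anyone who wants to see the logic outside of the sort product, here is a minimal Python sketch of the sequence-number trick described above. It is only a model, not DFSORT code: the function name `dedupe_keep_order` and the `key_len` parameter are made up for illustration, and the key is assumed to be the first 23 bytes, as in the original SORT FIELDS statement.

```python
# Input records from the original post; the key is columns 1-23.
records = [
    "MOHANK 123456789012345 RAJES",
    "ANTAKS 123456789012345 MIRAJ",
    "MOHANK 123456789012345 NANAK",
]


def dedupe_keep_order(recs, key_len=23):
    """Hypothetical helper: drop duplicate keys, keep input order."""
    # Step 1: tag every record with its input position (the sequence number).
    tagged = list(enumerate(recs))
    # Step 2: sort by key. sorted() is stable, so equal keys keep input order.
    by_key = sorted(tagged, key=lambda t: t[1][:key_len])
    # Step 3: keep only the first record of each run of equal keys.
    kept = []
    for seq, rec in by_key:
        if not kept or kept[-1][1][:key_len] != rec[:key_len]:
            kept.append((seq, rec))
    # Step 4: sort the survivors back into input order and drop the tag.
    return [rec for seq, rec in sorted(kept)]


result = dedupe_keep_order(records)
print("\n".join(result))
```

With the three sample records, MOHANK (RAJES) comes out first and ANTAKS second, which is the order asked for.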

Posted: Tue Oct 06, 2009 7:03 pm
by Anuj Dhawan
Another suggestion that comes to mind is to use EQUALS. EQUALS tells the sort to preserve the original input order of records that have the same sort key. Your site default is probably EQUALS. DFSORT is shipped with NOEQUALS as the default, but a site can change that to EQUALS.

If you're using DFSORT (and I'm not sure you are), you can see the value for EQUALS in message ICE128I ... it will have EQUALS=N or EQUALS=Y.

You can try using:

Code:

   OPTION NOEQUALS 
to turn off EQUALS and see what you get.
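
To see what EQUALS does and does not give you, here is a small Python model (a sketch, not DFSORT output). Python's `sorted()` is a stable sort, which is assumed here to stand in for a sort with EQUALS in effect; the names `KEY`, `by_key` and `deduped` are illustrative only. It does not model NOEQUALS, where the surviving duplicate is not predictable.

```python
# Sample records from the original post; the sort key is columns 1-23.
records = [
    "MOHANK 123456789012345 RAJES",
    "ANTAKS 123456789012345 MIRAJ",
    "MOHANK 123456789012345 NANAK",
]
KEY = slice(0, 23)

# Stable sort on the key: records with equal keys stay in input order,
# which is the guarantee assumed for EQUALS in this sketch.
by_key = sorted(records, key=lambda r: r[KEY])

# Keep the first record of each group of equal keys.
deduped = []
for rec in by_key:
    if not deduped or deduped[-1][KEY] != rec[KEY]:
        deduped.append(rec)

print("\n".join(deduped))
```

The first MOHANK record (RAJES) is the one kept, but ANTAKS still comes out ahead of it, because the output is in key order rather than input order.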

Posted: Tue Oct 06, 2009 8:05 pm
by Frank Yaeger
BanuPriya,

Here's a DFSORT/ICETOOL job that will do what you asked for. I assumed your input file has RECFM=FB and LRECL=80, but the job can be changed appropriately for other attributes.

Code:

//S1   EXEC  PGM=ICETOOL
//TOOLMSG   DD  SYSOUT=*
//DFSMSG    DD  SYSOUT=*
//IN DD DSN=...  input file (FB/80)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=...  output file (FB/80)
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,23,CH,A),EQUALS
  SUM FIELDS=NONE
/*
//CTL2CNTL DD *
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
/*
Note that EQUALS will keep the duplicate records in their original order, but will not keep all of the records in their original order. For that, you need two passes over the data.
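
As a rough illustration of how the two sets of control statements work together, here is a Python model of the two passes. It is a sketch under assumptions, not a replacement for the job: `pass1`, `pass2` and `LRECL` are hypothetical names, records are treated as 80-character strings, and Python compares characters by code point rather than by the EBCDIC collating sequence, which changes the intermediate order but not which records survive.

```python
LRECL = 80  # assumed record length (FB/80), as in the job above


def pass1(lines):
    """Model of CTL1: tag, sort on the key, drop duplicates."""
    # INREC OVERLAY=(81:SEQNUM,8,ZD): put an 8-digit sequence number,
    # starting at 1, in columns 81-88 of each record.
    tagged = [
        ln.ljust(LRECL)[:LRECL] + f"{n:08d}"
        for n, ln in enumerate(lines, start=1)
    ]
    # SORT FIELDS=(1,23,CH,A),EQUALS: stable sort on columns 1-23.
    tagged.sort(key=lambda r: r[0:23])
    # SUM FIELDS=NONE: keep one record per key (the first one, given EQUALS).
    out = []
    for r in tagged:
        if not out or out[-1][0:23] != r[0:23]:
            out.append(r)
    return out


def pass2(tagged):
    """Model of CTL2: restore input order, remove the sequence number."""
    # SORT FIELDS=(81,8,ZD,A): order by the number in columns 81-88.
    ordered = sorted(tagged, key=lambda r: int(r[80:88]))
    # OUTREC BUILD=(1,80): keep only the original 80 bytes.
    return [r[:LRECL] for r in ordered]


lines = [
    "MOHANK 123456789012345 RAJES",
    "ANTAKS 123456789012345 MIRAJ",
    "MOHANK 123456789012345 NANAK",
]
final = pass2(pass1(lines))
for rec in final:
    print(rec.rstrip())
```

The temporary records are 88 bytes long between the passes, and the final records are back to 80 bytes, in their original relative order.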

Posted: Tue Oct 06, 2009 9:40 pm
by Anuj Dhawan
Frank Yaeger wrote:Note that EQUALS will keep the duplicate records in their original order, but will not keep all of the records in their original order.
Thank you, Frank. I was a little unsure about it.

Have a good one,

Regards,

Posted: Tue Oct 13, 2009 1:35 pm
by banupriyab
Thank you!! :D

The ICETOOL code worked!!
But don't we have any other means with normal SORT/SYNCSORT? None of the jobs in our system use ICETOOL.

Posted: Tue Oct 13, 2009 8:19 pm
by Frank Yaeger
I don't know what you're asking for. If you have DFSORT, then you have ICETOOL. ICETOOL has been part of DFSORT since 1991! You said ICETOOL worked, so what is the problem?

If you don't want to use ICETOOL for some reason, then you can just use two DFSORT steps instead.

Posted: Mon Oct 26, 2009 9:53 pm
by Alissa Margulies
banupriyab wrote:...dont we have any other means with normal SORT/SYNCSORT, because none of the jobs in our system uses ICETOOL.
SyncSort ships with ICETOOL as an alias to SYNCTOOL. If you prefer, as Frank suggested, you can code the following SyncSort job:

Code:

//STEP1 EXEC PGM=SORT
//SORTIN  DD DSN=input.file
//* &&TEMP must be passed to STEP2; UNIT/SPACE copied from the T1 DD above
//SORTOUT DD DSN=&&TEMP,DISP=(,PASS),UNIT=SYSDA,SPACE=(CYL,(5,5))
//SYSOUT  DD SYSOUT=*
//SYSIN   DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,23,CH,A),EQUALS
  SUM FIELDS=NONE
/*
//STEP2 EXEC PGM=SORT
//SORTIN  DD DSN=&&TEMP,DISP=(OLD,DELETE)
//SORTOUT DD DSN=output.file
//SYSOUT  DD SYSOUT=*
//SYSIN   DD *
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
/*
