comparing two flat files
Moderators: Frank Yaeger, Moderator Group
-
- Member
- Posts: 11
- Joined: Wed Nov 08, 2006 4:57 pm
comparing two flat files
Hi,
I want to compare two files for which the senerio is:
The comapre field in the first file is at col.no 13 and in the other file its on the col.no 23.(both have same length and same format)
Now, I want to compare the two file based on these field and remove the duplicate records.
Is this possible with sort?
-Regards
Abhishek
I want to compare two files for which the senerio is:
The comapre field in the first file is at col.no 13 and in the other file its on the col.no 23.(both have same length and same format)
Now, I want to compare the two file based on these field and remove the duplicate records.
Is this possible with sort?
-Regards
Abhishek
- Frank Yaeger
- Moderator
- Posts: 812
- Joined: Sat Feb 18, 2006 5:45 am
- Location: San Jose, CA
- Contact:
It's not clear what exactly you want to do. What is the RECFM and LRECL of each input file? What is the length and format of the field in each input file you want to compare? Does input file1 have duplicates within it? Does input file2 have duplicates within it? Please show an example of the records in each input file and what you expect for output.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
-
- Member
- Posts: 11
- Joined: Wed Nov 08, 2006 4:57 pm
Frank,
please find the answeres to your queries.
1. What is the RECFM and LRECL of each input file?
LRECL for FILEA is 478 and the for the other is 435.RECFM is FB for both the files.
2.What is the length and format of the field in each input file you want to compare?
Format is ZD and the length is 14.
3. Does input file1 have duplicates within it? MAY HAVE
4. Does input file2 have duplicates within it? NO
5.Please show an example of the records in each input file and what you expect for output.
FILEA
---------1--------2---------3---------4---------5---------6---------7---
cols:1-12 PD
cols:13-26 ZD
cols:27-478(includes different fields)
FILEB
---------1--------2---------3---------4---------5---------6---------7---
cols:1-12 PD
cols:13-22 ZD
cols:23-37 ZD
and rest as above..
I won't be able to provide data at this time because m not having mainframe connectivity at home. will provide you by late night.
Comparision has to done on cols:13-26(FILEA) & cols:23-37(FILEB)
please let me know if you want any information.
-Abhi
please find the answeres to your queries.
1. What is the RECFM and LRECL of each input file?
LRECL for FILEA is 478 and the for the other is 435.RECFM is FB for both the files.
2.What is the length and format of the field in each input file you want to compare?
Format is ZD and the length is 14.
3. Does input file1 have duplicates within it? MAY HAVE
4. Does input file2 have duplicates within it? NO
5.Please show an example of the records in each input file and what you expect for output.
FILEA
---------1--------2---------3---------4---------5---------6---------7---
cols:1-12 PD
cols:13-26 ZD
cols:27-478(includes different fields)
FILEB
---------1--------2---------3---------4---------5---------6---------7---
cols:1-12 PD
cols:13-22 ZD
cols:23-37 ZD
and rest as above..
I won't be able to provide data at this time because m not having mainframe connectivity at home. will provide you by late night.
Comparision has to done on cols:13-26(FILEA) & cols:23-37(FILEB)
please let me know if you want any information.
-Abhi
- Frank Yaeger
- Moderator
- Posts: 812
- Joined: Sat Feb 18, 2006 5:45 am
- Location: San Jose, CA
- Contact:
Now, I want to compare the two file based on these field and remove the duplicate records.
This is a rather ambiguous statement. For the output, do you want records from file1 only, or records from file2 only, or records from both files? Do you want to keep one record for each key (from which file?), or do you want to eliminate all records that have a match (from both files), or what?
I wanted to see an example of your input records and output records to determine what you actually want to do. A conceptual example like this will do:
Code: Select all
File 1
F1R1 01
F1R2 01
F1R3 05
F1R4 10
F1R5 10
F1R6 10
F1R7 15
F1R8 20
F1R9 20
File2
F2R1 01
F2R2 10
F2R3 15
F2R4 18
Output
???
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
-
- Member
- Posts: 11
- Joined: Wed Nov 08, 2006 4:57 pm
Code:
File 1
File2
Output
FILEA
=COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
000001 ?? 01200607212014190418112064 Kincumber 124
000002 ?? 15200607212137080418112064 Chatswood 122
000003 ?? 01200607212138230418112064 NL Chatswood 947
000004 ?? 01200607212139060418112064 NL Chatswood 941
000005 ?? 01200607212139400418112064 NL Chatswood 941
000006 ?? 15200607212139590418112064 Chatswood 122
FILEB
=COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
000001 ?? 2006082H O0120060721201419 041811206412488 Kincu
000002 ?? 2006082H O0120060721213823 04181120640294778400 Chats
000003 ?? 2006082H O0120060721213906 04181120640294118400 Chats
000004 ?? 2006082H O0120060721213940 04181120640294118400 Chats
000005 ?? 2006082H O0120060721214055 04181120640294118000 Chats
000006 ?? 2006082H O1520060721213708 04181120641223 Chats
000007 ?? 2006082H O1520060721213959 04181120641223 Chats
The data marked in red is the key field.
Now, I want that if the key from FILEB matches key in FILEA the it should be eliminated and if it does not matches, the record should be written to the output file(the output record should be in the format of FILEA)
NOTE:
FILEA LRECL=478, FILEB LRECL=435
RECFM is same for both files.
File 1
Code: Select all
a/c loc name age add
------------------------
F1R1 01
F1R1 01
F1R1 05
F1R1 10
F1R1 10
F1R1 10
F1R1 15
F1R1 20
F1R1 20
Code: Select all
a/c status loc
-------------
F1R1 S1 01
F1R1 S2 10
F1R1 S3 15
F1R1 S4 18
Code: Select all
a/c loc name age add
-----------------------
F1R1 05
F1R1 20
=COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
000001 ?? 01200607212014190418112064 Kincumber 124
000002 ?? 15200607212137080418112064 Chatswood 122
000003 ?? 01200607212138230418112064 NL Chatswood 947
000004 ?? 01200607212139060418112064 NL Chatswood 941
000005 ?? 01200607212139400418112064 NL Chatswood 941
000006 ?? 15200607212139590418112064 Chatswood 122
FILEB
=COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
000001 ?? 2006082H O0120060721201419 041811206412488 Kincu
000002 ?? 2006082H O0120060721213823 04181120640294778400 Chats
000003 ?? 2006082H O0120060721213906 04181120640294118400 Chats
000004 ?? 2006082H O0120060721213940 04181120640294118400 Chats
000005 ?? 2006082H O0120060721214055 04181120640294118000 Chats
000006 ?? 2006082H O1520060721213708 04181120641223 Chats
000007 ?? 2006082H O1520060721213959 04181120641223 Chats
The data marked in red is the key field.
Now, I want that if the key from FILEB matches key in FILEA the it should be eliminated and if it does not matches, the record should be written to the output file(the output record should be in the format of FILEA)
NOTE:
FILEA LRECL=478, FILEB LRECL=435
RECFM is same for both files.
- Frank Yaeger
- Moderator
- Posts: 812
- Joined: Sat Feb 18, 2006 5:45 am
- Location: San Jose, CA
- Contact:
There are two 20 records in file1, but you only show one 20 record in the output file, so I assumed you only want one record with each key in the output file. I also assumed that the key in file1 starts in position 13 and the key in file2 starts in position 23 as you said in your first post. Given that, here's a DFSORT/ICETOOL job that will do what you asked for:
Code: Select all
//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN1 DD DSN=... input file1 (FB/478)
//IN2 DD DSN=... input file2 (FB/435)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(MOD,PASS)
//OUT DD DSN=... output file (FB/478)
//TOOLIN DD *
COPY FROM(IN2) TO(T1) USING(CTL1)
COPY FROM(IN1) TO(T1) USING(CTL2)
SPLICE FROM(T1) TO(OUT) ON(13,14,CH) -
KEEPNODUPS WITH(1,479) USING(CTL3)
/*
//CTL1CNTL DD *
INREC BUILD=(13:23,14,479:C'BB')
/*
//CTL2CNTL DD *
INREC OVERLAY=(479:C'VV')
/*
//CTL3CNTL DD *
OUTFIL FNAMES=OUT,INCLUDE=(479,2,CH,EQ,C'VV'),
BUILD=(1,478)
/*
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
-
- Member
- Posts: 11
- Joined: Wed Nov 08, 2006 4:57 pm
- Frank Yaeger
- Moderator
- Posts: 812
- Joined: Sat Feb 18, 2006 5:45 am
- Location: San Jose, CA
- Contact:
Glad I could help.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
-
- Member
- Posts: 11
- Joined: Wed Nov 08, 2006 4:57 pm
- Frank Yaeger
- Moderator
- Posts: 812
- Joined: Sat Feb 18, 2006 5:45 am
- Location: San Jose, CA
- Contact:
We can leave it open. Somebody may have something relevant to add someday. I generally only close a thread when it's a duplicate.I think we can close this session.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort
FREE TUTORIALS
Tutorials
Free tutorials from mainframegurukul
- JCL Tutorial
Covers all important JCL concepts. - Cobol Tutorial
This tutorials covers all Cobol Topics from STRING to COMP-3. - DB2 Tutorial
DB2 Tutorial focuses on DB2 COBOL Programming. - SORT Tutorial
This Tutorial covers all important aspects of DFSORT with examples - CICS Tutorial
This CICS tutorial covers CICS concepts and CICS Basics, CICS COBOL Programming.
Interview
Mainframe Interview questions
- Cobol Interview Questions
50+ Interview Questions - JCL Interview Questions
50+ Interview Questions - DB2 Interview Questions
100+ Interview Questions - CICS Interview Questions
70+ Interview Questions - VSAM Interview Questions
27 Interview Questions
Other References
Mainframe Tools and others
- XPEDITER Reference
Explains how we can debug a program - FILEAID Reference
Explains how to browse , edit and delete datasets - Change Man Reference
Quick Start tutorial on Changeman - Abend Reference
Important Abend codes explained - FaceBook Page
MainframeGurukul FaceBook Page - LinkedIn Page
MainframeGurkul Linkedin Page