For /F compare two text files

Post by MigrationUser »

02 Dec 2015 14:34
Newbie

I'm new to the CMD shell and I'm trying to compare two text files using FOR /F. I have read other posts and everyone uses FINDSTR, but FINDSTR is limited to a small range of words. I tried something like this from another post and it doesn't work.

Code: Select all

for /f %%a in (file1.txt) do (
    set "flag="
    for /f %%b in (file2.txt) do (
        if %%a==%%b (
            echo lines are same
            set flag=1
        )
    )

    if not defined flag (
        echo %%a is not found in file2
    )
)
Thank you in advance for the help.
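The logic this nested loop is aiming for can be sketched in Python for comparison. This is only a model of the algorithm, not a cmd replacement: the files are assumed to be already read into lists of lines (for example with `Path("file1.txt").read_text().splitlines()`), and `lines_not_found` is a hypothetical helper name. Every line of file1 may be compared against every line of file2, so the work grows with the product of the two file sizes.

```python
def lines_not_found(file1_lines, file2_lines):
    """Return the lines of file1 that have no identical line anywhere in file2.

    Mirrors the nested FOR /F loops: for each line of file1, scan file2
    and set a flag as soon as an exact match is seen.
    """
    missing = []
    for a in file1_lines:
        flag = False
        for b in file2_lines:
            if a == b:
                flag = True
                break  # one match is enough; stop scanning file2
        if not flag:
            missing.append(a)
    return missing


file1 = ["apple", "banana split", "cherry"]
file2 = ["cherry", "banana split", "date"]
print(lines_not_found(file1, file2))  # ['apple']
```

The `break` is a small design choice the batch loop lacks: once a match is found, there is no need to keep reading file2.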

----------------------------

#2 02 Dec 2015 19:27
Simon Sheppard

Why not just use FC?
http://ss64.com/nt/fc.html

----------------------------

#3 08 Dec 2015 19:16
Newbie

Because if the two files are not in the same order, the result comes out blank. I'm trying to use FOR /F to compare two files whose lines are in a different order.

----------------------------

#4 08 Dec 2015 20:15
Simon Sheppard

So are you trying to compare each line in the first file and find a matching line that could be anywhere in the second file?

What about blank lines, or lines with only one or two characters?

----------------------------

#5 08 Dec 2015 21:27
Shadow Thief

If you're looking to see if every line in file1.txt exists somewhere in file2.txt but not necessarily in the same place, you might want to try sorting both files into temporary files first and then running an fc on the two sorted files. It will go much faster.

Code: Select all

sort file1.txt >sorted1.txt
sort file2.txt >sorted2.txt
fc sorted1.txt sorted2.txt
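Sorting pays off because two sorted lists can be compared in a single pass, stepping through both at once. FC itself reports differing blocks rather than a plain list of missing lines, so the Python sketch below models the idea rather than FC's output. `compare_sorted` is a hypothetical name, and it uses Python's default string ordering, which is not necessarily the order the SORT command produces.

```python
def compare_sorted(lines1, lines2):
    """Walk two sorted line lists in step, as a comparison of sorted files would.

    Returns (only_in_1, only_in_2). Duplicate lines are paired off one-to-one.
    """
    a = sorted(lines1)
    b = sorted(lines2)
    i = j = 0
    only1, only2 = [], []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:      # same line in both: advance both sides
            i += 1
            j += 1
        elif a[i] < b[j]:     # a[i] can no longer appear in b
            only1.append(a[i])
            i += 1
        else:                 # b[j] can no longer appear in a
            only2.append(b[j])
            j += 1
    only1.extend(a[i:])       # leftovers exist in one list only
    only2.extend(b[j:])
    return only1, only2


print(compare_sorted(["pear", "apple", "fig"], ["fig", "kiwi", "apple"]))  # (['pear'], ['kiwi'])
```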
----------------------------

#6 10 Dec 2015 09:18
bluesxman

While some alternatives have been suggested, you could adapt your original code like so:

Code: Select all

for /f %%a in (file1.txt) do (
    type file2.txt | findstr /x /l "%%a" >nul
    if errorlevel 1 (
         echo "%%a" not found
    ) else (
         echo "%%a" found
    ) 
)
For larger files you may find it takes a while.

cmd | *sh | ruby | chef

----------------------------

#7 11 Dec 2015 19:57
Newbie

Thank you, bluesxman.
The code works perfectly for small files, but when I run it on a large file it takes forever. Is there a way to delete the matched lines to make the data smaller, so it runs faster?

----------------------------

#8 11 Dec 2015 21:54
bluesxman

This might be a touch quicker, but I wouldn't expect any miracles:

Code: Select all

for /f %%a in (file1.txt) do (
    findstr /x /l "%%a" file2.txt >nul
    if errorlevel 1 (
         echo "%%a" not found
    ) else (
         echo "%%a" found
    ) 
)
To be frank, I would fully expect it to take a long time for a large file, because of how it works: FINDSTR is launched once for every line of "file1" and reads the whole of "file2" each time. This is probably why the other solutions were offered. CMD is simply not efficient at this sort of thing.

If one of your files is significantly larger than the other, I would make the larger one "file2"; that should give you the best speed possible from this method.

I'm not exactly sure what you mean by "delete the matched lines".

If you mean deleting matched lines from "file2", it could be done as below. I wouldn't expect to see any particular improvement unless "file2" has a LOT of duplicates -- more likely it will just slow things down even more.

Code: Select all

for /f %%a in (file1.txt) do (
    findstr /x /l "%%a" file2.txt >nul
    if errorlevel 1 (
         echo "%%a" not found
    ) else (
         echo "%%a" found
         findstr /x /l /v "%%a" file2.txt >file2_new.txt
         move /y file2_new.txt file2.txt >nul
    ) 
)
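The effect of deleting matched lines can be modelled in Python. The sketch assumes lists of lines and uses a hypothetical helper, `match_and_remove`. Like `findstr /v`, it drops every copy of the matched line from the pool, which means a line that appears twice in file1 is reported as found the first time and as not found the second time.

```python
def match_and_remove(file1_lines, file2_lines):
    """For each line of file1, report whether it is still in the pool of
    file2 lines; on a match, remove every copy of that line from the pool
    (as `findstr /x /l /v ... > file2_new.txt` would)."""
    pool = list(file2_lines)
    report = []
    for line in file1_lines:
        if line in pool:
            report.append((line, "found"))
            pool = [p for p in pool if p != line]  # drop all copies
        else:
            report.append((line, "not found"))
    return report, pool


report, remaining = match_and_remove(["x", "y", "x"], ["x", "x", "z"])
print(report)     # [('x', 'found'), ('y', 'not found'), ('x', 'not found')]
print(remaining)  # ['z']
```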
If you mean only considering duplicate lines in "file1" once, then the next version should do that. If "file1" has a lot of duplicates and "file2" is big, you may see some performance improvement from this. There is a known limitation here -- lines that begin with white space will not behave as desired.

Code: Select all

type nul>found.txt
for /f %%a in (file1.txt) do (
    findstr /x /l "%%a" found.txt >nul
    if errorlevel 1 (
        findstr /x /l "%%a" file2.txt >nul
        if not errorlevel 1 (
            set /p "out=%%a" <nul >> found.txt
            echo:>>found.txt
        )
    ) 
)
echo: Found these lines in both files:
type found.txt
del found.txt
Last edited by bluesxman (11 Dec 2015 21:56)
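The idea behind found.txt, remembering which lines have already been reported so file2 is not searched again for them, can be sketched in Python. Plain lists stand in for found.txt and file2.txt to keep the lookups close to the batch version (a `set` would be faster); `common_lines_once` is a hypothetical name.

```python
def common_lines_once(file1_lines, file2_lines):
    """Lines of file1 that also occur in file2, each reported only once,
    in the order of their first appearance in file1."""
    found = []  # plays the role of found.txt
    for line in file1_lines:
        if line in found:
            continue  # already recorded: skip the expensive file2 search
        if line in file2_lines:
            found.append(line)
    return found


print(common_lines_once(["a", "b", "a", "c", "a"], ["c", "a"]))  # ['a', 'c']
```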

----------------------------

#9 12 Dec 2015 07:41
Aacini

The command below shows the lines in file1.txt that do not exist in file2.txt:

findstr /V /G:file2.txt file1.txt

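Without /X, a line in file1.txt that merely contains one of the lines of file2.txt is also treated as a match and left out. If the output doesn't look right, try inserting the /X switch so that only whole-line matches count.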
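The difference /X makes can be modelled in Python. The sketch treats the search strings literally (as with /L) and case-sensitively, and simply ignores empty lines in file2; FINDSTR's regular-expression handling and its other quirks are not modelled, and `not_in_file2` is a hypothetical name.

```python
def not_in_file2(file1_lines, file2_lines, whole_line=True):
    """Rough model of `findstr /V /L /G:file2.txt file1.txt`.

    whole_line=True  ~ with /X: a line is dropped only if it equals a file2 line.
    whole_line=False ~ without /X: a line is dropped if it contains a file2 line.
    """
    patterns = [p for p in file2_lines if p]  # assumption: empty search strings ignored
    kept = []
    for line in file1_lines:
        if whole_line:
            matched = line in patterns
        else:
            matched = any(p in line for p in patterns)
        if not matched:  # /V keeps the lines that did NOT match
            kept.append(line)
    return kept


file1 = ["cat", "category", "dog"]
file2 = ["cat"]
print(not_in_file2(file1, file2, whole_line=True))   # ['category', 'dog']
print(not_in_file2(file1, file2, whole_line=False))  # ['dog']
```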

----------------------------

#10 12 Dec 2015 11:32
bluesxman

Ah good point, I'd forgotten about that switch.

----------------------------

#11 12 Dec 2015 12:26
foxidrive
Newbie wrote:

FINDSTR is limited to a small range of words. I tried something like this from another post and it doesn't work.
The code you posted doesn't handle spaces, which could be why it failed with your file: without "delims=", FOR /F passes only the first word of each line to the loop variable.
It needs the "delims=" option, as well as double quotes around the FOR loop variables when comparing them.

I made some extra changes - but it may not be all that quick with your large files anyway.

Code: Select all

@echo off
for /f "delims=" %%a in (file1.txt) do (
    set "flag="
    for /f "delims=" %%b in (file2.txt) do (
        if not defined flag if "%%a"=="%%b" echo(lines are same & set flag=1
    )
    if not defined flag echo("%%a" is not found in file2
)
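Why "delims=" matters can be shown with a small Python model of what FOR /F hands to %%a. It is a simplification: only the default tokens=1 with space/tab delimiters is modelled, the default eol=; behaviour is ignored, and both function names are hypothetical.

```python
def for_f_default(line):
    """Plain `for /f %%a in (...)`: %%a gets the first token, with spaces
    and tabs as delimiters. None means the loop body would not run."""
    tokens = [t for t in line.replace("\t", " ").split(" ") if t]
    return tokens[0] if tokens else None


def for_f_delims_empty(line):
    """`for /f "delims=" %%a in (...)`: %%a gets the whole line.
    Empty lines are still skipped."""
    return line if line else None


a, b = "hello world", "hello there"
# Without "delims=", both lines reduce to "hello", so `if %%a==%%b` sees a match.
print(for_f_default(a) == for_f_default(b))            # True
print(for_f_delims_empty(a) == for_f_delims_empty(b))  # False
```

With "delims=" the whole line is compared, and the double quotes keep the IF syntax intact when a line contains spaces.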
----------------------------

#12 12 Dec 2015 17:00
Aacini

This method should run much faster than the previous ones, because each file is read only once and no external commands are used. It will run slower if file1.txt is very large and has few repeated lines.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Load all unique lines of file1 into "L" array
rem the subscript is enclosed in quotes, so it can have special characters
for /F "delims=" %%a in (file1.txt) do set L"%%a"=1

rem Delete all lines from file2
for /F "delims=" %%a in (file2.txt) do set L"%%a"=

rem Show remaining lines
for /F "delims==" %%a in ('set L^" 2^>NUL') do (
   set "line=%%a"
   echo(!line:~2,-1!
)
Last edited by Aacini (13 Dec 2015 18:45)
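The trick here is to use environment variables as a hash table keyed on the line text. A rough Python model uses a dictionary. Since cmd variable names are case-insensitive, the keys are case-folded, and the remaining lines come back in sorted order, roughly as SET lists variables. The name `unique_missing_lines` is hypothetical, keeping the first spelling seen is a simplification, and the model does not reproduce the cmd version's trouble with lines containing `=` or `!`.

```python
def unique_missing_lines(file1_lines, file2_lines):
    """Distinct lines of file1 that do not occur in file2, compared
    case-insensitively like cmd variable names."""
    table = {}
    for line in file1_lines:  # like: set L"%%a"=1
        if line:              # FOR /F skips empty lines
            table.setdefault(line.casefold(), line)  # keep first spelling (simplification)
    for line in file2_lines:  # like: set L"%%a"=  (undefines the variable)
        table.pop(line.casefold(), None)
    return [table[key] for key in sorted(table)]


print(unique_missing_lines(["Beta", "alpha", "beta", "gamma"], ["GAMMA"]))  # ['alpha', 'Beta']
```

Each line costs one dictionary insert or delete, which is why a single pass over each file is enough.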

----------------------------

#13 22 Dec 2015 16:02
Newbie

Thank you to everyone who replied to this post. I learned a lot from each and every one of you.

original thread: https://ss64.org/oldforum/viewtopic.php?id=2071