Use SED to extract part of a line?

Rekrul
Posts: 52
Joined: 2021-Aug-15, 11:29 pm

Post by Rekrul »

I know this isn't really a Windows CMD command question, but I'm not sure where else to post this. Every time I try to post a question to a site like Stack Exchange, I'm usually told I posted in the wrong section, or I just get directed to a post asking a similar, but not quite the same, question, with an answer that doesn't work for me. And since I have an old system with old browsers, I can't even reply to any answers that get posted.

If I've posted to the wrong section, I apologize.

I've just spent over an hour Googling this and trying to follow examples on the net. Invariably every single last one of them contains some parameter or text that causes it to not work for what I want to do. Half the answers tell the person posting the question to use something else, like grep (as far as I know, grep can't do specific line numbers), and when I Google how to use that program to do what I want, the replies say to use SED instead! Or they say to use AWK, which I don't have.

I've tried reading the SED documentation, but I find it extremely confusing: it has examples for complex situations but is pretty vague on simple usage.

Here is what I want to do:

This is line 35 of a file:

Code: Select all

<segment bytes="126634" number="11">part11of11.oyMncnhsPk99bxa&amp;QDD2@powerpost2000AA.local</segment>
I want to use SED to extract JUST the number between number=" and "> from only this line, which in this case would be 11.

Could someone take pity on me and tell me what SED command would do this?

I'm using a version of SED ported to Windows if that matters.
Simon Sheppard
Posts: 190
Joined: 2021-Jul-10, 7:46 pm

Post by Simon Sheppard »

SED is a stream editor intended to modify a file's contents rather than extracting values from it.

If you want to use bash utilities, I would start by using head and tail to extract just line 35.
Then when you have that single line, use grep to extract the part you need.

If you want to do this in native Windows cmd, then extract the 35th line with:

Code: Select all

SET "_line35="
FOR /F "skip=34 delims=" %%i in (demo.txt) DO if not defined _line35 set "_line35=%%i"
Echo %_line35%
Then when you have that single line, use find or another FOR command to extract the part you need.
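As a rough sketch of the head and tail suggestion above: this assumes a POSIX shell with standard head, tail and sed rather than Windows cmd, and the 40-line sample file and the variable names are made up for illustration. It also shows that sed can print a single line by number on its own.

```shell
# Build a 40-line sample file (a hypothetical stand-in for the real file).
sample=$(mktemp)
i=1
while [ "$i" -le 40 ]; do
    echo "this is line $i" >> "$sample"
    i=$((i + 1))
done

# head keeps lines 1..35, then tail keeps only the last of those.
line_a=$(head -n 35 "$sample" | tail -n 1)

# sed alone: print line 35 and quit without reading the rest of the file.
line_b=$(sed -n '35{p;q;}' "$sample")

echo "$line_a"
echo "$line_b"
rm -f "$sample"
```

Both variables should end up holding the same line. Under cmd the single quotes would need to become double quotes.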

Post by Rekrul »

Simon Sheppard wrote: 2023-Mar-12, 1:48 pm SED is a stream editor intended to modify a file's contents rather than extracting values from it.
OK. I thought that since it could be restricted to individual lines, it would be a good choice for extracting part of a specific line.
Simon Sheppard wrote: 2023-Mar-12, 1:48 pm If you want to use bash utilities, I would start by using head and tail to extract just line 35.
Then when you have that single line, use grep to extract the part you need.
Isolating that line isn't really the problem, more that I don't know how to Grep the line to get just the number inside the quotations. When I Google how to extract a substring using Grep, most of the examples either show how to cut off the start of the line, or they pipe it to SED or AWK.

Can just Grep be used to extract the number between number=" and ">?
Simon Sheppard wrote: 2023-Mar-12, 1:48 pm If you want to do this in native Windows cmd, then extract the 35th line with

Code: Select all

SET "_line35="
FOR /F "skip=34 delims=" %%i in (demo.txt) DO if not defined _line35 set "_line35=%%i"
Echo %_line35%
Then when you have that single line, use find or another FOR command to extract the part you need.
I don't mind using external commands, I was just trying to do it as efficiently as possible.
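For the question of pulling out just the number between number=" and ">, here is a minimal sketch. It assumes a POSIX shell and a grep that supports -o (GNU and BSD grep do; other ports may not), and the variable names are hypothetical. The first pipeline uses only grep, the second only sed.

```shell
# The sample line from the first post.
line='<segment bytes="126634" number="11">part11of11.oyMncnhsPk99bxa&amp;QDD2@powerpost2000AA.local</segment>'

# grep only: -o prints just the matched text, so match the attribute first,
# then match the run of digits inside it.
with_grep=$(printf '%s\n' "$line" | grep -o 'number="[0-9]*"' | grep -o '[0-9][0-9]*')

# sed only: capture the digits, replace the whole line with the capture,
# and print only lines where the substitution succeeded.
with_sed=$(printf '%s\n' "$line" | sed -n 's/.*number="\([0-9]*\)">.*/\1/p')

echo "$with_grep"
echo "$with_sed"
```

To restrict the sed version to line 35 of a file, the same expression can be prefixed with the line number, as in `sed -n '35s/.../\1/p' file`. The quoting shown is for a Unix shell; under cmd the double quotes and the > inside the expression would need escaping.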
OJBakker
Posts: 13
Joined: 2021-Jul-29, 7:06 am

Post by OJBakker »

No need to use external tools.

Code: Select all

@echo off
set "SegNr="
for /f tokens^=4^ delims^=^" %%A in ('more +34 text.txt') do if not defined SegNr set "SegNr=%%A"
echo SegmentNumber=%SegNr%
pause
or

Code: Select all

@echo off
(more +34 "text.txt")>"text_tmp.txt"
set /p Line=<"text_tmp.txt"
set "SegNr="
for /f tokens^=4^ delims^=^" %%A in ("%Line%") do set "SegNr=%%A"
echo SegmentNumber=%SegNr%
pause

Post by Rekrul »

OJBakker wrote: 2023-Mar-13, 12:07 pm No need to use external tools.
Thank you for the reply. I actually found this solution (using quotes as a delimiter) through a Google search the other day, and finished my script. Well, "finished" in the sense that it works. Now I'm going to polish it and add some options.

For those who are interested:

Short version: It's a script to automatically fix broken NZB files generated by the NZBKing website.

Long version: NZB files are XML files used to download files from Usenet Newsgroups, which are decentralized message forums. The NZB file is kind of like a torrent in that it tells the client program what files to download. NZB files can, and usually do, include multiple related files grouped together. In addition, each file is further broken up into parts to avoid hitting post size limits. The NZB lists all these parts so that the original file(s) can be recreated.

NZB search sites like NZBKing let you search for what you want and generate an NZB for it. Unfortunately, NZBKing creates broken files. It fails to append the part count to the end of the header for each file, such as (1/25). Because of this, most programs think any files with more than one part are broken. The solution is to check the number of parts for each file and put that information in the header for the file, which is a pain in the butt to do manually.

All header lines start with <file poster= and the line below the last part in each set contains </segments>. So my script uses Grep to count the number of files contained in the NZB, then uses a For loop to grep each matching header line and put the line number into a numbered array. Then it does the same for the </segments> tags, subtracting one to get the line above each. SED feeds that line to For, which gets the part number. SED is then used to append that number to the corresponding header line, writing the result to a temporary file; the original is deleted and the temp file is renamed to the original name. Finally, it checks whether it's hit the last file (in the NZB) yet, and if not, it loops, incrementing the counter.
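The steps described above can be sketched in a few lines of POSIX shell. This is not the batch script itself, only an illustration of the same idea under stated assumptions: the two-file NZB fragment is invented, the headers are assumed to end in "yEnc " followed by the closing quote, and the variable names are hypothetical.

```shell
# Hypothetical two-file NZB fragment with the part counts missing.
nzb=$(mktemp)
cat > "$nzb" <<'EOF'
<nzb>
<file poster="a" subject="demo.rar yEnc ">
<segments>
<segment bytes="100" number="1">id1</segment>
<segment bytes="100" number="2">id2</segment>
</segments>
</file>
<file poster="b" subject="demo.r00 yEnc ">
<segments>
<segment bytes="100" number="1">id3</segment>
<segment bytes="100" number="2">id4</segment>
<segment bytes="100" number="3">id5</segment>
</segments>
</file>
</nzb>
EOF

# Line numbers of every header and every closing </segments> tag.
headers=$(grep -n '<file poster=' "$nzb" | cut -d: -f1)
closers=$(grep -n '</segments>' "$nzb" | cut -d: -f1)

# Walk both lists in step: $1 is always the closer for the current header.
set -- $closers
for header in $headers; do
    last=$(( $1 - 1 ))   # the segment line just above </segments>
    shift
    parts=$(sed -n "${last}s/.*number=\"\([0-9]*\)\">.*/\1/p" "$nzb")
    # Append (1/N) on the header line only; the line count is unchanged,
    # so the line numbers collected earlier stay valid.
    sed "${header}s/yEnc /yEnc (1\/${parts})/" "$nzb" > "$nzb.tmp"
    mv "$nzb.tmp" "$nzb"
done

fixed=$(grep -c 'yEnc (1/' "$nzb")
result=$(grep '<file poster=' "$nzb")
echo "$result"
rm -f "$nzb"
```

The positional parameters play the role of the numbered array here. Like the description above, this rewrites the whole file once per header, which is simple but not fast.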

Limitations: Technically, the parts can be listed in any order, so the last one listed isn't always guaranteed to be the last part of each file. However NZBKing always seems to list the parts in ascending order, and since the script is only needed for NZBs from that website, it shouldn't be a problem. Files in the newsgroups sometimes have missing parts and people post "Par" files to fix this. If the last part of a file is missing, it won't be listed in the NZB, but since the header doesn't tell you how many parts are in the file (the problem this script was created to fix), there's no way to know that the last part listed isn't the true last part. That should be a rare occurrence and shouldn't really make a difference as to how possible it is to download the files. If the part is missing, making the part count one number higher wouldn't change much.
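If the ordering ever did turn out to be a problem, one possible workaround (not something the script does) would be to take the highest part number in a set instead of the last one listed. A small sketch, assuming a POSIX shell and an invented, deliberately shuffled set of segment lines:

```shell
# Hypothetical segment lines listed out of order.
segments='<segment bytes="100" number="3">id5</segment>
<segment bytes="100" number="1">id3</segment>
<segment bytes="100" number="2">id4</segment>'

# Extract every part number, sort numerically, keep the largest.
highest=$(printf '%s\n' "$segments" \
    | sed -n 's/.*number="\([0-9]*\)">.*/\1/p' \
    | sort -n | tail -n 1)

echo "$highest"
```

This still cannot detect a missing final part, for the reason given above.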

It's not especially fast, and surprisingly, when I removed the array usage and instead dumped the Grepped line numbers to a temporary file and then used For with SED to pull just the appropriate lines out of that file and put them into variables, it was twice as slow! I figured that since it wouldn't have to jump around with a bunch of Calls, it would be faster, but that's not the case.

If you want to try it, search for anything you want on the NZBKing website, and save one or more NZB files, then run this script in that directory and it will fix them. The original files will be copied to a directory called Originals, and it will skip any files that have already been fixed (or that don't need fixing).

Code: Select all

@echo off

for %%F in (*.nzb) do (set Filename=%%~nF
call :ProcessNZB)
goto:eof

:ProcessNZB
for /f %%E in ('grep -c "<file poster=" "%Filename%.nzb"') do set FilesCount=%%E
for /f %%E in ('grep -c "yEnc (1/" "%Filename%.nzb"') do set FixedCount=%%E
if %FixedCount% equ %FilesCount% exit /b

md Originals 2>nul

echo %Filename%
copy "%Filename%.nzb" Originals\ >nul

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "<file poster=" "%Filename%.nzb"') do (set Value=%%E
set Placeholder=Subject
call :ArrayIn
call :IncrementCounter)

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "</segments>" "%Filename%.nzb"') do (set /a Value=%%E-1
set Placeholder=LastPart
call :ArrayIn
call :IncrementCounter)

set Counter=1
:FixNZB
set Placeholder=Subject
call :ArrayOut
set SubjectLine=%Value%

set Placeholder=LastPart
call :ArrayOut
set LastPartLine=%Value%
call :IncrementCounter

for /F tokens^=4^ delims^=^" %%P in ('sed -n "%LastPartLine%,%LastPartLine%p" "%Filename%.nzb"') do set LastPartNumber=%%P

sed "%SubjectLine%s/yEnc /yEnc (1\/%LastPartNumber%)/" "%Filename%.nzb" >Temp.tmp
del "%Filename%.nzb"
ren Temp.tmp "%Filename%.nzb"

if %Counter% leq %FilesCount% goto FixNZb
exit /b

:ArrayIn
call :Array2 %Placeholder%[%Counter%] %Value%
exit /b
:ArrayOut
call :Array2 Value %%%Placeholder%[%Counter%]%%
exit /b
:Array2
set %1=%2
exit /b


:IncrementCounter
set /a Counter=Counter+1
exit /b
And here's the slower version that doesn't use arrays:

Code: Select all

@echo off

for %%F in (*.nzb) do (set Filename=%%~nF
call :ProcessNZB)
del LineNumbers.tmp
goto:eof

:ProcessNZB
for /f %%E in ('grep -c "<file poster=" "%Filename%.nzb"') do set FilesCount=%%E
for /f %%E in ('grep -c "yEnc (1/" "%Filename%.nzb"') do set FixedCount=%%E
if %FixedCount% equ %FilesCount% exit /b

md Originals 2>nul

echo %Filename%
copy "%Filename%.nzb" Originals\ >nul

grep -no "<file poster=" "%Filename%.nzb" >LineNumbers.tmp
grep -no "</segments>" "%Filename%.nzb" >>LineNumbers.tmp

set Counter=1
set /a FilesCount=FilesCount+1
:FixNZB
set /a Counter2=Counter+FilesCount-1

for /f "tokens=1 delims=:" %%E in ('sed -n "%Counter%,%Counter%p" LineNumbers.tmp') do set SubjectLine=%%E
for /f "tokens=1 delims=:" %%E in ('sed -n "%Counter2%,%Counter2%p" LineNumbers.tmp') do set /a LastPartLine=%%E-1

for /F tokens^=4^ delims^=^" %%P in ('sed -n "%LastPartLine%,%LastPartLine%p" "%Filename%.nzb"') do set LastPartNumber=%%P

sed "%SubjectLine%s/yEnc /yEnc (1\/%LastPartNumber%)/" "%Filename%.nzb" >Temp.tmp
del "%Filename%.nzb"
ren Temp.tmp "%Filename%.nzb"
call :IncrementCounter

if %Counter% lss %FilesCount% goto FixNZb
exit /b

:IncrementCounter
set /a Counter=Counter+1
exit /b

Post by Rekrul »

Unless I find any bugs, I'm declaring my NZB fixer script done. I added progress updates to the title bar, displaying the number of the file being processed as well as how many headers within that file have been fixed. I added protections against poison characters in the filenames being Echoed to the window, such as the ampersand. They didn't cause the script to fail, but the error looked unsightly. I used Delayed Expansion, since just having the names enclosed in quotes looked ugly in my opinion.

It also has optional provisions for being passed a list of files to process instead of going through every NZB file in the directory, so that it can be called from a file manager to process just selected files. In Total Commander, you do this by including the parameters

%F %Y

Here it is:

Code: Select all

@echo off
title Fix NZBKing Files

set CurrentFile=1

if "%1"=="" goto ProcessAll
set List=%1
set NZBCount=0
for /f %%F in (%List%) do set /a NZBCount=NZBCount+1

if %NZBCount% equ 0 goto ProcessAll

for /f "delims=" %%F in (%List%) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessAll
for %%F in (*.nzb) do set /a NZBCount=NZBCount+1

for %%F in (*.nzb) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessNZB
if not "%Filename:~-4%"==".nzb" exit /b

for /f %%E in ('grep -c "<file poster=" "%Filename%"') do set FilesCount=%%E
for /f %%E in ('grep -c "yEnc (1/" "%Filename%"') do set FixedCount=%%E
if %FixedCount% equ %FilesCount% (setlocal enabledelayedexpansion
echo Skipping !Filename!
endlocal
set /a CurrentFile=CurrentFile+1
exit /b)

md Originals 2>nul

setlocal enabledelayedexpansion
echo !Filename!
endlocal
copy "%Filename%" Originals\ >nul

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "<file poster=" "%Filename%"') do (set Value=%%E
call :ArrayIn Subject
call :IncrementCounter)

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "</segments>" "%Filename%"') do (set /a Value=%%E-1
call :ArrayIn LastPart
call :IncrementCounter)

set Counter=1
:FixNZB
call :ArrayOut Subject
set SubjectLine=%Value%

call :ArrayOut LastPart
set LastPartLine=%Value%

for /F tokens^=4^ delims^=^" %%P in ('sed -n "%LastPartLine%,%LastPartLine%p" "%Filename%"') do set LastPartNumber=%%P

sed "%SubjectLine%s/yEnc /yEnc (1\/%LastPartNumber%)/" "%Filename%" >Temp.tmp
del "%Filename%"
ren Temp.tmp "%Filename%"
title Fix NZBKing Files - Processing file %CurrentFile% of %NZBCount% - %Counter% of %FilesCount% headers fixed
call :IncrementCounter

if %Counter% leq %FilesCount% goto FixNZb
ren "%Filename%" "%Filename:~0,-4%-Fixed.nzb"
set /a CurrentFile=CurrentFile+1
exit /b

:ArrayIn
set ArrayName=%1
call :Array2 %ArrayName%[%Counter%] %Value%
exit /b
:ArrayOut
set ArrayName=%1
call :Array2 Value %%%ArrayName%[%Counter%]%%
exit /b
:Array2
set %1=%2
exit /b

:IncrementCounter
set /a Counter=Counter+1
exit /b

Post by Simon Sheppard »

Nice solution :)
Simon_Weel
Posts: 34
Joined: 2021-Dec-13, 3:53 pm

Post by Simon_Weel »

As for extracting part of a line, see https://superuser.com/a/1503852.

Post by Rekrul »

Simon Sheppard wrote: 2023-Mar-20, 12:10 pm Nice solution :)
Thanks. :)

I know I said it was done, but I couldn't resist making a couple tweaks to the way it handles lists passed to it from a file manager.

Previously, it simply counted the lines in the list file that was passed to it and defaulted to processing all the NZB files in the directory if the number of lines was zero. However, there's also the possibility that the user may have accidentally selected non-NZB files. The script already skipped any name that didn't end in .NZB, however those names were included in the count of total files to be processed, since it just counted the number of lines in the file. So if you accidentally selected one NZB file and 20 text files, it would say it was processing file 1 of 21 and then close after that file was done. Now it only counts NZB files so the progress messages are accurate.

Additionally, if it's passed a list that contains filenames, but doesn't contain any NZB files, it will abort rather than processing all NZB files.

And of course, it can still be used by just executing it in the directory with the NZB files. It doesn't have to be called from a file manager.

Code: Select all

@echo off
title Fix NZBKing Files

set CurrentFile=1

if "%1"=="" goto ProcessAll
set List=%1
set ListLines=0
set NZBCount=0
for /f "delims=" %%F in (%List%) do (set Filename=%%F
call :NameCheck)

if %ListLines% equ 0 goto :ProcessAll
if %NZBCount% equ 0 goto:eof

for /f "delims=" %%F in (%List%) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessAll
for %%F in (*.nzb) do set /a NZBCount=NZBCount+1

for %%F in (*.nzb) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessNZB
if not "%Filename:~-4%"==".nzb" exit /b

for /f %%E in ('grep -c "<file poster=" "%Filename%"') do set FilesCount=%%E
for /f %%E in ('grep -c "yEnc (1/" "%Filename%"') do set FixedCount=%%E
if %FixedCount% equ %FilesCount% (setlocal enabledelayedexpansion
echo Skipping !Filename!
endlocal
set /a CurrentFile=CurrentFile+1
exit /b)

md Originals 2>nul

setlocal enabledelayedexpansion
echo !Filename!
endlocal
copy "%Filename%" Originals\ >nul

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "<file poster=" "%Filename%"') do (set Value=%%E
call :ArrayIn Subject
call :IncrementCounter)

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "</segments>" "%Filename%"') do (set /a Value=%%E-1
call :ArrayIn LastPart
call :IncrementCounter)

set Counter=1
:FixNZB
call :ArrayOut Subject
set SubjectLine=%Value%

call :ArrayOut LastPart
set LastPartLine=%Value%

for /F tokens^=4^ delims^=^" %%P in ('sed -n "%LastPartLine%,%LastPartLine%p" "%Filename%"') do set LastPartNumber=%%P

sed "%SubjectLine%s/yEnc /yEnc (1\/%LastPartNumber%)/" "%Filename%" >Temp.tmp
del "%Filename%"
ren Temp.tmp "%Filename%"
title Fix NZBKing Files - Processing file %CurrentFile% of %NZBCount% - %Counter% of %FilesCount% headers fixed
call :IncrementCounter

if %Counter% leq %FilesCount% goto FixNZb
ren "%Filename%" "%Filename:~0,-4%-Fixed.nzb"
set /a CurrentFile=CurrentFile+1
exit /b

:ArrayIn
set ArrayName=%1
call :Array2 %ArrayName%[%Counter%] %Value%
exit /b
:ArrayOut
set ArrayName=%1
call :Array2 Value %%%ArrayName%[%Counter%]%%
exit /b
:Array2
set %1=%2
exit /b

:IncrementCounter
set /a Counter=Counter+1
exit /b

:NameCheck
set /a ListLines=ListLines+1
if "%Filename:~-4%"==".nzb" set /a NZBCount=NZBCount+1
exit /b
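The list-handling rule described in this post (no lines at all: process everything; lines but no NZB names: abort; otherwise process the list) can be sketched as follows. This assumes a POSIX shell, a list file with one name per line and Unix line endings, and made-up file names. The match is case-sensitive, like the batch comparison.

```shell
# Hypothetical list file as a file manager might pass it.
list=$(mktemp)
printf '%s\n' 'first.nzb' 'notes.txt' 'second file.nzb' 'readme.txt' > "$list"

# Total lines, and lines whose name ends in .nzb (spaces in names are fine
# because the whole line is matched).
list_lines=$(grep -c '' "$list")
nzb_count=$(grep -c '\.nzb$' "$list")

if [ "$list_lines" -eq 0 ]; then
    action="process all"
elif [ "$nzb_count" -eq 0 ]; then
    action="abort"
else
    action="process list"
fi

echo "$nzb_count of $list_lines entries are NZB files: $action"
rm -f "$list"
```

Only the NZB count feeds the progress messages, so selecting one NZB file and several text files reports one file to process.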

Post by Rekrul »

I thought this script was finished, but then I downloaded an NZB that it didn't work on. I had assumed (yes, I know what they say about assuming) that all posts today would be using the yEnc format. I happened to download an NZB for a post that wasn't. Or the posting software didn't put "yEnc" in the subject line for each file. In any case, my script failed to fix the file and I had to adjust it to simply look for the end of the lines rather than the word yEnc. This involved much experimentation, which I posted about earlier, but now I believe I have a true finished script. At least until some other problem crops up.

Code: Select all

@echo off
title Fix NZBKing Files

set CurrentFile=1
set Tag=)\"^>

if "%1"=="" goto ProcessAll

set List=%1
set ListLines=0
set NZBCount=0

for /f "delims=" %%F in (%List%) do (set Filename=%%F
call :NameCheck)

if %ListLines% equ 0 goto :ProcessAll
if %NZBCount% equ 0 goto:eof

for /f "delims=" %%F in (%List%) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessAll
for %%F in (*.nzb) do set /a NZBCount=NZBCount+1

for %%F in (*.nzb) do (set Filename=%%F
call :ProcessNZB)
goto:eof

:ProcessNZB
if not "%Filename:~-4%"==".nzb" exit /b

for /f %%E in ('grep -c "<file poster=" "%Filename%"') do set FilesCount=%%E
setlocal enabledelayedexpansion
for /f %%E in ('grep -c "!Tag!" "%Filename%"') do endlocal & set FixedCount=%%E
if %FixedCount% equ %FilesCount% (setlocal enabledelayedexpansion
echo Skipping !Filename!
endlocal
set /a CurrentFile=CurrentFile+1
exit /b)

md Originals 2>nul

setlocal enabledelayedexpansion
echo !Filename!
endlocal
copy "%Filename%" Originals\ >nul

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "<file poster=" "%Filename%"') do (set Value=%%E
call :ArrayIn Subject
call :IncrementCounter)

set Counter=1
for /f "tokens=1 delims=:" %%E in ('grep -no "</segments>" "%Filename%"') do (set /a Value=%%E-1
call :ArrayIn LastPart
call :IncrementCounter)

set Counter=1
:FixNZB
call :ArrayOut Subject
set SubjectLine=%Value%

call :ArrayOut LastPart
set LastPartLine=%Value%

for /F tokens^=4^ delims^=^" %%P in ('sed -n "%LastPartLine%,%LastPartLine%p" "%Filename%"') do set LastPartNumber=%%P

sed "%SubjectLine%s/ \"^>/ (1\/%LastPartNumber%)\">/" "%Filename%" >Temp.tmp
del "%Filename%"
ren Temp.tmp "%Filename%"
title Fix NZBKing Files - Processing file %CurrentFile% of %NZBCount% - %Counter% of %FilesCount% headers fixed
call :IncrementCounter

if %Counter% leq %FilesCount% goto FixNZb
ren "%Filename%" "%Filename:~0,-4%-Fixed.nzb"
set /a CurrentFile=CurrentFile+1
exit /b

:ArrayIn
set ArrayName=%1
call :Array2 %ArrayName%[%Counter%] %Value%
exit /b
:ArrayOut
set ArrayName=%1
call :Array2 Value %%%ArrayName%[%Counter%]%%
exit /b
:Array2
set %1=%2
exit /b

:IncrementCounter
set /a Counter=Counter+1
exit /b

:NameCheck
set /a ListLines=ListLines+1
if "%Filename:~-4%"==".nzb" set /a NZBCount=NZBCount+1
exit /b
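The change from matching the word yEnc to matching the end of the header can be illustrated on its own. This is a sketch in POSIX shell rather than cmd, so the quoting differs from the script above; the sample headers and the helper name fix_header are hypothetical.

```shell
# Append "(1/N)" just before the closing "> of a header line.
# $1 = header line, $2 = part count. (fix_header is a hypothetical helper.)
fix_header() {
    printf '%s\n' "$1" | sed "s/ \">/ (1\/$2)\">/"
}

with_yenc=$(fix_header '<file poster="a" subject="demo.rar yEnc ">' 11)
without_yenc=$(fix_header '<file poster="a" subject="demo.rar ">' 11)

# Already-fixed headers end in )"> so they can be counted with the same tag.
fixed_count=$(printf '%s\n' "$with_yenc" "$without_yenc" | grep -c ')">')

echo "$with_yenc"
echo "$without_yenc"
```

Because the pattern is the space, quote and > at the end of the subject, it works whether or not the posting software put yEnc in the subject line.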