#1 11 Jul 2007 16:57

avery_larry
Member
Registered: 11 Jul 2007
Posts: 266

Processing special characters in variables

I use a .cmd script to process raw email messages (and eventually send them to SpamAssassin).  My problem is with spam emails that have From: addresses which include special characters.  The one that causes the most problems is the pipe character -- |.  I have also had problems with quotes and <>.

The specific line of a file that I'm working with looks like this:

MAIL FROM:<whateveryouwanthere>

Unfortunately, spam doesn't always follow the rules.  I've seen the following variations:

MAIL FROM:<some adderess<realaddress@somewhereelse>>
(multiple @'s, multiple <>'s, and spaces)

and:

MAIL FROM:<some|addrees@somwehre.com>

and:

MAIL FROM:<"something"weoriu@somehwer.com>

Note that this particular line contains ONLY the email address -- NOT the common name plus email address like this:

From: "regular name here" <realaddress@domain.com>

Basically I want the email address to be in a variable and I want to be able to write that email address to a new file.

So to set the variable, I have the following code:

::This strips all trailing >>'s.
for /f "tokens=1 delims=>" %%a in (micro.aa) do set sto1=%%a
::This gets whatever is inside a nested <<>>.  Z1Z is appended to avoid nulls and to process tokens later.
for /f "tokens=1,2-3 delims=<>" %%a in ("%sto1%") do set sto=Z1Z%%c
::The if statement will be true if there were NOT nested <<>>'s.  The for will pull the non-nested email address.
if "%sto%"=="Z1Z" for /f "tokens=1,2 delims=<>" %%a in ("%sto1%") do set sto=Z1Z%%b
::This tests to see if the email address was null  <>.  It also strips extra double quotes for the if test.
if "%sto:"=%"=="Z1Z" set sto=Z1Znull
::This pulls just the email address into the variable -- works with all 3 options above.
for /f "tokens=1* delims=Z" %%a in ("%sto%") do set sto=%%b

Perhaps I could clean that up by using %sto:~1% instead of the last for statement (and appending just a single character instead of Z1Z).
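A minimal sketch of that idea, assuming a hypothetical one-character prefix X and a made-up address. The %var:~1% substring syntax keeps everything from the second character onward:

```batch
@echo off
setlocal DisableDelayedExpansion
rem Hypothetical value: a single prefix character followed by the address.
set "sto=Xuser@example.com"
rem Drop the first character and keep the rest.
set "sto=%sto:~1%"
echo %sto%
```

Unlike splitting on Z with for /f, this does not treat any character inside the address as a delimiter.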

Later, when writing the variable to a file, I use the following to deal with the pipe character  |

echo X-Envelope-GW: %sto:|=^|%>email0.sa

Now really, this seems to be quite robust in dealing with invalid email addresses.  I've only had 1 crash in the last several months.  I wish I had saved that email because I can't remember what was in the address that caused the problem.  It may have been the following:

MAIL FROM:<6|^|d@test.com>
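One plausible reading of why that address could break the echo line: %sto:|=^|% puts a caret in front of each pipe but leaves the caret already in the address alone, so the text expands to 6^|^^|d@test.com. The parser then consumes ^| and ^^ as escapes and the final pipe is left bare. A minimal sketch of one way around it, assuming the variable is set with delayed expansion disabled and only read with it enabled (the value is hard-coded here for illustration):

```batch
@echo off
setlocal DisableDelayedExpansion
rem Hypothetical value. Inside the quotes the pipe and caret are stored literally.
set "sto=6|^|d@test.com"

rem Delayed expansion substitutes the value after the line has been parsed,
rem so the pipe and caret in the value are never treated as operators.
setlocal EnableDelayedExpansion
>email0.sa echo X-Envelope-GW: !sto!
endlocal
```

With this form no per-character escaping of the value is needed for the echo.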


So basically I'm trying to see if there's a better way to work with special characters.

Ted Kumsher


#2 11 Jul 2007 18:57

bluesxman
Member
From: UK
Registered: 29 Dec 2006
Posts: 1,129

Re: Processing special characters in variables

You're pretty much doing the sorts of things I would to handle that stuff.  The best single piece of advice I could offer is to use double quotes when setting variables, like this:

    set "var=nasty string"

Note that they surround the value and the variable name.  Don't worry, they won't appear in the finished product, and will successfully handle just about anything.  Though I have found ! % and ^ to be quite difficult (occasionally impossible) to deal with at times.

Other than that you could investigate using a win32 port of "sed" to strip out the undesirables.


cmd | *sh | ruby | chef


#3 11 Jul 2007 19:25

Simon Sheppard
Admin
Registered: 27 Aug 2005
Posts: 1,130

Re: Processing special characters in variables

The way I approach these problems is to replace all the delimiters with something else, then run the script, then change everything back.

So < might become ~
and > could be §

It's a cop-out, but it does minimise any chance of nasty unexpected side effects and saves time adding all the right combinations of escape characters - unless you just enjoy the challenge!


#4 12 Jul 2007 17:20

avery_larry
Member
Registered: 11 Jul 2007
Posts: 266

Re: Processing special characters in variables

Simon Sheppard wrote:

The way I approach these problems is to replace all the delimiters with something else, then run the script, then change everything back.

So < might become ~
and > could be §

It's a cop-out, but it does minimise any chance of nasty unexpected side effects and saves time adding all the right combinations of escape characters - unless you just enjoy the challenge!

Um -- you mean manually?  I can't manually change all the delimiters in 15,000 files each day.  This is processing my realtime inbound email.


#5 12 Jul 2007 17:27

avery_larry
Member
Registered: 11 Jul 2007
Posts: 266

Re: Processing special characters in variables

bluesxman wrote:

You're pretty much doing the sorts of things I would to handle that stuff.

But you seem so much smarter than me!

bluesxman wrote:

The best single piece of advice I could offer is to use double quotes when setting variables, like this:

    set "var=nasty string"

Note that they surround the value and the variable name.  Don't worry, they won't appear in the finished product, and will successfully handle just about anything.  Though I have found ! % and ^ to be quite difficult (occasionally impossible) to deal with at times.

Other than that you could investigate using a win32 port of "sed" to strip out the undesirables.

That set command looks interesting.  Any quick advice on how that would change things?

Also -- is there any way to combine variable edits like this:

combine
%var:|=^|%
and
%var:^=^^%
to something like:
%var:^=^^:|=^|%

And why didn't they allow the %* variables to have the same edits as %var%?  I have to set a variable to %1 in order to do a %var:~-1%.  But whatever, I guess.

Last edited by avery_larry (12 Jul 2007 17:28)


#6 12 Jul 2007 19:15

Simon Sheppard
Admin
Registered: 27 Aug 2005
Posts: 1,130

Re: Processing special characters in variables

avery_larry wrote:

Um -- you mean manually?  I can't manually change all the delimiters in 15,000 files each day.  This is processing my realtime inbound email.

No no, I would call a text replace utility in a script - you would still need to call it using delimiters, but you can replace one at a time.
e.g.
replace all the <'s in all the files with something else
then replace all the >'s in all the files with something else
...etc
Then parse the files line by line looking for the exact data you need to extract.

This likely won't be the fastest method, if that's a concern.


#7 12 Jul 2007 19:57

avery_larry
Member
Registered: 11 Jul 2007
Posts: 266

Re: Processing special characters in variables

Simon Sheppard wrote:

No no, I would call a text replace utility in a script - you would still need to call it using delimiters, but you can replace one at a time.

Is that standard or do you have a particular utility in mind?  When I do this processing, I only have a single-line file, so it probably wouldn't be too much processing overhead.


#8 13 Jul 2007 09:22

bluesxman
Member
From: UK
Registered: 29 Dec 2006
Posts: 1,129

Re: Processing special characters in variables

avery_larry wrote:

That set command looks interesting.  Any quick advice on how that would change things?

It'll help prevent crashes due to wanky character combinations -- just about anything between double quotes will be treated as a literal string, meaning you're less likely to be tripped up by the likes of "<" ">" "&" "|".  You could achieve a similar effect with:

set var="nasty string"

but then the quotes would form part of the variable ... which, under most circumstances, you wouldn't really want to happen.
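The difference between the two forms can be seen with a short sketch (the variable names a and b and the sample text are made up for illustration):

```batch
@echo off
setlocal DisableDelayedExpansion
rem Quotes around the whole assignment: the ampersand is protected
rem and the quotes are not stored.
set "a=one & two"
rem Quotes around the value only: the ampersand is still protected,
rem but the quotes become part of the value.
set b="one & two"
rem Display both variables with their stored values.
set a
set b
```

This should print a=one & two followed by b="one & two".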

avery_larry wrote:

Also -- is there any way to combine variable edits like this:

combine
%var:|=^|%
and
%var:^=^^%
to something like:
%var:^=^^:|=^|%

Not as far as I know.  If I'm doing repeated replacements in the same variable, I'll often do something like this:

@echo off

set var=we went up the black mountain on a good day

REM Seinfeld, anyone?
for %%a in ("up=down" "black=white" "good=bad" "day=night") do set var&call set var=%%var:%%~a%%

set var

pause

Though the above method doesn't work as intended with certain special characters.
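For the caret and pipe case specifically, the two substitutions can at least be applied one after the other. A sketch, assuming the value contains no double quotes (they would break the quoted set) and using a hard-coded sample address:

```batch
@echo off
setlocal DisableDelayedExpansion
set "var=6|^|d@test.com"
rem Escape the existing carets first. Done the other way round, the carets
rem added for the pipes would themselves be doubled.
set "var=%var:^=^^%"
set "var=%var:|=^|%"
rem Unquoted use: the parser consumes one level of escaping.
echo X-Envelope-GW: %var%
```

Run as a .cmd file, this should print X-Envelope-GW: 6|^|d@test.com. The escaped value is only good for one unquoted expansion; each further pass through the parser needs another round of escaping.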

avery_larry wrote:

And why didn't they allow the %* variables to have the same edits as %var%?  I have to set a variable to %1 in order to do a %var:~-1%.  But whatever, I guess.

Who is knowing? I guess it was just to challenge the likes of us. :)

avery_larry wrote:

Is that standard or do you have a particular utility in mind?  When I do this processing, I only have a single-line file, so it probably wouldn't be too much processing overhead.

I find myself using a Win32 port of the Unix command "sed" to achieve things in a single line that would either be extremely difficult (read: lots of coding required), unreliable (read: even more coding required to achieve an acceptable level of robustness) or simply impossible to do with native "cmd" syntax.

Try this ... there's a very helpful set of example commands here too.

Last edited by bluesxman (13 Jul 2007 09:24)


cmd | *sh | ruby | chef


#9 13 Jul 2007 17:03

avery_larry
Member
Registered: 11 Jul 2007
Posts: 266

Re: Processing special characters in variables

Thank you very much.

How about something like:

perl -p -e "s/\|/¢/g" < macro.aaa > newtext.txt
perl -p -e "s/\%%/Ñ/g" < newtext.txt > newtext2.txt

etc.

?

Guess I should have thought of this myself.  I assume it's pretty much the same thing that sed would do.

Last edited by avery_larry (13 Jul 2007 18:19)


#10 16 Oct 2007 15:52

elmouj
Member
Registered: 16 Oct 2007
Posts: 1

Re: Processing special characters in variables

Hello,

    I have been using Windows batch scripting these last few days and I am really stuck on an issue. I have a Java program that launches a batch script. Inside the batch script there is a 'curl' call. The problem I have is when the parameters I pass from Java contain special characters. I have tried escaping the characters, such as doubling the ", etc., but I wonder if I could get an exhaustive list of special characters and how to escape each one.

Thanks in advance
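Not an exhaustive list, but a short sketch of the most common cases inside a batch file, assuming delayed expansion is disabled (with it enabled, the exclamation mark needs handling as well):

```batch
@echo off
setlocal DisableDelayedExpansion
rem Outside double quotes a caret escapes the next character.
echo one ^& two
echo a ^| b
echo ^<tag^>
echo caret: ^^
rem In a batch file a literal percent sign is written doubled.
echo 100%%
rem Inside double quotes the operators are literal, but echo prints the quotes too.
echo "one & two | three"
```

How arguments arrive from a calling program such as Java is a separate layer with its own quoting rules, so the values may need checking at each step.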
