I am creating a database that compares phone card rates, and I need a script to help automate the task.
There are 4 steps:
1) Download the site to a local text file
2) Process the text file retrieving countries and the rate
3) Output the country and rate together in usable format
4) Output them to a csv
So far I have achieved step 1 using wget:
wget -O site.txt site.html
Now the tricky part is processing the text file.
I feed a country into the script, for example: Australia
Then I need to retrieve the associated rate.
It would then output it as a CSV, or anything I can open in Excel, with the country in the first column and the rate in the second column.
A CSV example would look like this:
example.csv
Australia,2.5
bangladesh,8
othercountry,33
Below is some HTML code from the text files. I would search for the exact keyword Australia, then the next money value, and save them to a CSV in the %country%,%rate% format.
I just need some help in this area; any tips are welcome. Once I get it down I should be able to work the rest out myself.
card1.txt
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Aruba Mobile</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.578</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Ascension Is</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$1.320</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.037</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia - 13</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.177</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia Mobile</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.340</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia Mobilesat</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$6.133</FONT></TD></TR>
<TR height=17>
card2.txt
</tr>
<tr class="meihuabodytext">
<td>Aruba </td>
<td class="xl24">$ 0.29 </td>
</tr>
<tr class="meihuabodytext">
<td>Ascension </td>
<td class="xl24">$ 1.75 </td>
</tr>
<tr class="meihuabodytext">
<td>Australia </td>
<td class="xl25">2.5c </td>
</tr>
<tr class="meihuabodytext">
<td>Austria </td>
<td class="xl25">3.9c </td>
card3.txt
<tr height=17>
<td height=17 nowrap><span class="style12">Aruba Mobile</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.450</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Ascension Is</span></td>
<td align=right nowrap><div align="left"><span class="style12">1.485</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Australia</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Australia Mobile</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.383</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Australia Mobilesat</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.593</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Austria</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Austria -
Saltzburg</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
</tr>
<tr height=17>
<td height=17 nowrap><span class="style12">Austria -
Vienna</span></td>
<td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
I will keep you updated about other cards if need be
Thanks
cmd, vbs, ps, bash
autoit, python, swift
Since you've only included a section of the web page data, I can't account for anything further up that might cause problems. However, based on what you've provided, I'd tackle the first one like this (though I'd probably put a fair bit of error checking in there too):
* UNTESTED *
setlocal enabledelayedexpansion
set output=output.csv
REM make sure there are no providers already defined, which might muck up our data
for /f "usebackq tokens=1 delims==" %%a in (`set PROV_ 2^>nul`) do (set %%~a=)
for /f "usebackq delims=<> tokens=1,3,6" %%a in ("card1.txt") do (
if /i "%%a" EQU "TD" (
REM lines with "/TR" in token 6 are the price, the provider lines contain no token 6
if /i "%%c" NEQ "/TR" (
set provider=%%b
) ELSE (
set work=%%b
set PROV_!provider!=!work:$=!
)
)
)
REM create the CSV header row
echo:"Provider","Price">"%output%"
REM bring back all the providers/prices in alphabetical order and put them in the CSV
for /f "usebackq tokens=2* delims=_=" %%a in (`set PROV_`) do echo:"%%a","%%b">>"%output%"
I'd use similar tactics for the others too.
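If Python is an option, here is a rough sketch of the same idea that copes with all three sample layouts in one pass. It is an alternative to the batch above, not a port of it, and I have only reasoned it through against the snippets posted, not full pages. The names parse_rates and write_csv are made up for illustration. It also assumes a trailing "c" means cents and that a bare number (as in card3.txt) is already in dollars, which you would need to confirm for each card.

```python
import csv
import html
import re
from decimal import Decimal

# Split on opening <tr> tags rather than matching <tr>...</tr> pairs,
# so a row still parses if its closing tag is missing.
TR_SPLIT = re.compile(r"<tr\b[^>]*>", re.IGNORECASE)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# Optional "$", a number, optional trailing "c" (ASSUMPTION: "c" means cents).
PRICE_RE = re.compile(r"\$?\s*(\d+(?:\.\d+)?)\s*(c?)", re.IGNORECASE)


def cell_text(fragment):
    """Strip nested tags (FONT, span, div) and collapse whitespace."""
    return " ".join(html.unescape(TAG_RE.sub("", fragment)).split())


def parse_rates(page):
    """Return {destination: rate in dollars} for every two-cell price row."""
    rates = {}
    for chunk in TR_SPLIT.split(page):
        cells = [cell_text(c) for c in CELL_RE.findall(chunk)]
        if len(cells) != 2:
            continue
        match = PRICE_RE.fullmatch(cells[1])
        if match is None:
            continue  # second cell is not a price, e.g. a header row
        amount = Decimal(match.group(1))
        if match.group(2):
            amount /= 100  # cents to dollars
        # ASSUMPTION: bare numbers with no "$" or "c" are already dollars.
        rates[cells[0]] = amount
    return rates


def write_csv(rates, path):
    """Write destination,rate lines sorted by destination name."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name in sorted(rates):
            writer.writerow([name, rates[name]])
```

Something like write_csv(parse_rates(open("card1.txt").read()), "output.csv") would produce the file. Because the result is a dictionary keyed on the whole cell text, parse_rates(...)["Australia"] is an exact lookup and will not pick up "Australia Mobile" by mistake.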
Last edited by bluesxman (04 May 2007 10:22)
cmd | *sh | ruby | chef