#1 04 May 2007 06:02

NDog
Member
From: New Zealand
Registered: 31 May 2006
Posts: 121

creating database from http

I am building a database that compares the rates of different phone cards, and I need a script to help automate the task.

There are 4 steps:
1) Download the site to a local text file
2) Process the text file, retrieving each country and its rate
3) Output the country and rate together in a usable format
4) Output them to a CSV

So far I have achieved step 1 using wget:
wget -O site.txt site.html

Now the tricky part is processing the text file.

I feed a country into the script, for example Australia, and then I need to retrieve the associated rate.

The script would then output the result as a CSV, or anything I can open in Excel, with the country in the first column and the rate in the second.


A CSV example would look like this:

example.csv
Australia,2.5
bangladesh,8
othercountry,33

Below is some of the HTML from the text files. I would search for the exact keyword Australia, take the next money value after it, and save both to a CSV in %country%,%rate% format.

I just need some help in this area, and any tips are welcome. Once I get this part down I should be able to work the rest out myself.

card1.txt
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Aruba Mobile</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.578</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Ascension Is</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$1.320</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.037</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia - 13</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.177</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia Mobile</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$0.340</FONT></TD></TR>
<TR height=17>
<TD><FONT face=Arial color=#000000 size=2>Australia Mobilesat</FONT></TD>
<TD><FONT face=Arial color=#000000 size=2>$6.133</FONT></TD></TR>
<TR height=17>

card2.txt
                    </tr>
                    <tr class="meihuabodytext">
                      <td>Aruba </td>
                      <td class="xl24">$ 0.29 </td>
                    </tr>
                    <tr class="meihuabodytext">
                      <td>Ascension </td>
                      <td class="xl24">$ 1.75 </td>
                    </tr>
                    <tr class="meihuabodytext">
                      <td>Australia </td>
                      <td class="xl25">2.5c </td>
                    </tr>
                    <tr class="meihuabodytext">
                      <td>Austria </td>
                      <td class="xl25">3.9c </td>

card3.txt
  <tr height=17>
    <td height=17 nowrap><span class="style12">Aruba Mobile</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.450</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Ascension Is</span></td>
    <td align=right nowrap><div align="left"><span class="style12">1.485</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Australia</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Australia Mobile</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.383</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Australia Mobilesat</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.593</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Austria</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Austria -
      Saltzburg</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>
  </tr>
  <tr height=17>
    <td height=17 nowrap><span class="style12">Austria -
      Vienna</span></td>
    <td align=right nowrap><div align="left"><span class="style12">0.049</span></div></td>

I will keep you updated about other cards if need be
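
The three samples write the rate differently ($0.578, $ 0.29, 2.5c, 0.450), so the values need to be on one scale before they can be compared. A minimal sketch in Python, assuming a trailing c means cents and a bare number is already in dollars (both are guesses from the samples, not confirmed by the sites); parse_rate is a made-up name:

```python
import re
from decimal import Decimal

# Optional "$", a plain decimal number, then an optional trailing "c" for cents.
RATE_RE = re.compile(r"^\s*(\$)?\s*(\d+(?:\.\d+)?)\s*(c)?\s*$", re.IGNORECASE)


def parse_rate(text):
    """Convert rate text such as "$ 0.29" or "2.5c" to a Decimal amount in dollars."""
    match = RATE_RE.match(text)
    if match is None:
        raise ValueError("unrecognised rate: %r" % text)
    amount = Decimal(match.group(2))
    if match.group(3):
        # ASSUMPTION: a trailing "c" marks cents, so divide by 100.
        return amount / 100
    # ASSUMPTION: with or without "$", the number is already in dollars.
    return amount
```

Decimal is used rather than float so that a value like 3.9c divides by 100 without picking up binary rounding noise.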

Thanks


cmd, vbs, ps, bash
autoit, python, swift


#2 04 May 2007 10:00

bluesxman
Member
From: UK
Registered: 29 Dec 2006
Posts: 1,129

Re: creating database from http

Since you've only included a section of the web page data, I can't account for anything further up that might cause problems.  However, based on what you've provided, I'd tackle the first one like this (though I'd probably put a fair bit of error checking in there too):
* UNTESTED *

setlocal enabledelayedexpansion

set output=output.csv

REM make sure there are no providers already defined, which might muck up our data
for /f "usebackq tokens=1 delims==" %%a in (`set PROV_ 2^>nul`) do (set %%~a=)

for /f "usebackq delims=<> tokens=1,3,6" %%a in ("card1.txt") do (
    if /i "%%a" EQU "TD" (
        REM lines with "/TR" in token 6 are the price, the provider lines contain no token 6
        if /i "%%c" NEQ "/TR" (
            set provider=%%b
        ) ELSE (
            set work=%%b
            set PROV_!provider!=!work:$=!
        )
    )
)

REM create the CSV header row
echo:"Provider","Price">"%output%"

REM bring back all the providers/prices in alphabetical order and put them in the CSV
for /f "usebackq tokens=2* delims=_=" %%a in (`set PROV_`) do echo:"%%a","%%b">>"%output%"

I'd use similar tactics for the others too.
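
In card3.txt a cell can wrap onto a second line ("Austria -" / "Saltzburg"), which is awkward to handle line by line. As an alternative, here is a minimal sketch in Python that reads a whole file as one string and pulls the first two cells out of each table row. It assumes every page keeps the name and the rate in the first two <td> cells of a row, as the three samples do. The names extract_rates and write_csv and the Destination/Rate header are made up for illustration, and it has only been checked against the snippets posted above, not the full pages.

```python
import csv
import re
from html import unescape

# A row runs from a <tr ...> tag up to the next <tr ...> tag or the end of the text,
# so rows whose closing </tr> is missing are still picked up.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)(?=<tr\b|\Z)", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def cell_text(fragment):
    """Strip nested tags, decode entities, and collapse whitespace and line breaks."""
    return " ".join(unescape(TAG_RE.sub("", fragment)).split())


def extract_rates(page):
    """Return (name, rate text) pairs from rows whose first two cells are non-empty."""
    pairs = []
    for row in ROW_RE.findall(page):
        cells = [cell_text(cell) for cell in CELL_RE.findall(row)]
        if len(cells) >= 2 and cells[0] and cells[1]:
            pairs.append((cells[0], cells[1]))
    return pairs


def write_csv(pairs, stream):
    """Write a header row, then one name,rate row per pair."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["Destination", "Rate"])
    writer.writerows(pairs)
```

To use it, read card1.txt into a string, pass it to extract_rates, and hand the result to write_csv together with a file opened for writing with newline="". The rate is kept as text exactly as it appears on the page, so the $ sign and any trailing c are still there.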

Last edited by bluesxman (04 May 2007 10:22)


cmd | *sh | ruby | chef

