
#1 26 May 2015 11:51

jaffamuffin
Member
Registered: 21 Mar 2009
Posts: 19

Help optimising a simple script

Hi all

Below is a simple script I have used for the past year or so. It scans a file tree with find and writes the output to a text file.
It then reads the text file back in, slices each line up a little, and uploads the result into a MySQL DB for some analysis.

Is there a better (faster?) way to do this? I thought about using IFS to split the find output, but as you can see it's not a simple split; otherwise I could probably import it straight into the DB anyway.

Thoughts and comments welcome, thanks.

#!/bin/bash

templogimg=/tmp/tmp_count.log
joblog=/tmp/count_identify.log
rm -rf $joblog
## Get list of all files in scans

SCANROOT=/mnt/proserv-bookscanning/HLPP/SCANS/_LIVE
cd $SCANROOT
find ./ -name "*.tif" -type f -printf "%TY-%Tm-%Td,%TX,%p\n">>$templogimg
# OUTPUTS Lines like this e.g.:
# 2015-05-22,12:40:42.8325545000,./7/hol_01028/000303.tif
# 2015-05-22,12:40:44.3457739000,./7/hol_01028/000304.tif
# 2015-05-22,12:40:54.7355071000,./7/hol_01028/000305.tif
# 2015-05-22,12:40:56.2487265000,./7/hol_01028/000306.tif

STATUS=1

#touch -t `date +%m%d0000` /tmp/$$
#find ./ -newer /tmp/$$ -name "*.jpg" -type f -printf "%TY-%Tm-%Td,%TX,%p\n">$templogimg
#find ./ -name "*.jpg" -type f -printf "%TY-%Tm-%Td,%TX,%p\n">$templogimg

while read LINE
do
echo $LINE
        filedate=`echo $LINE|cut -c1-10`
        filetime=`echo $LINE|cut -c12-19`
        filepath=`echo $LINE|cut -d, -f3`
        scannerid=`echo $filepath|cut -d/ -f2`
        #boxid=`echo $filepath|cut -d/ -f3`
        adsvolid=`echo $filepath|awk -F'/' '{print $(NF-1)}' -`
        filename=`echo $filepath|awk -F'/' '{print $(NF)}' -`

        echo $filedate,$filetime,$filepath,$scannerid,$filename,$adsvolid>>$joblog

done < $templogimg


mysql --local-infile -uuser -ppass dbname <<MYSQLEOF
LOAD DATA LOCAL INFILE '$joblog'
INTO TABLE counts
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
(filedate, filetime, filepath, scannerid, filename, adsvolid)
SET timestamp = CURRENT_TIMESTAMP;
MYSQLEOF

Last edited by jaffamuffin (26 May 2015 11:52)


#2 23 May 2019 15:20

kteague
Member
Registered: 07 May 2019
Posts: 14

Re: Help optimising a simple script

A few things I'm noticing.

1. You're locating data and putting it into $templogimg.

find ./ -name "*.tif" -type f -printf "%TY-%Tm-%Td,%TX,%p\n">>$templogimg

Then you're taking the same data, running it through a while loop, massaging it a lot, and writing the results to $joblog. But what goes into $joblog is almost the same as what you put in $templogimg to begin with: the same date, time and path, plus three fields cut out of the path.

echo $filedate,$filetime,$filepath,$scannerid,$filename,$adsvolid>>$joblog

I think you may be able to use awk and/or sed to massage $templogimg into the same output you're putting in $joblog, which would remove the need for the while loop altogether. Granted, awk and sed are external commands too, but as far as I can see you can run one of them once against the whole of $templogimg instead of several commands for every line. I suspect the while loop is where you're taking the performance hit (see #2 below).


2. Your script calls external commands (cut & awk) inside the loop. Each call to an external command has to spawn a new process, and that comes at a cost. find only runs once, but the loop runs cut four times and awk twice for every single line of $templogimg, so the process count grows with the number of files. If it's an option for you, Perl would be better suited to this task, as it can handle everything you're trying to do internally, including talking to the MySQL database. If it's not an option, I'd suggest minimizing the number of calls you make to external commands (see my awk & sed comment above).

3. After $templogimg is massaged using awk & sed, you can import the CSV file to your database.
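
To give a rough idea of points 1 and 3, here is a minimal sketch of the loop body done as a single awk pass. I haven't run it against your real tree, so treat it as a starting point only. The function name reformat_scan_log is just something I made up for illustration, and it assumes the paths contain no commas (your original cut -d, -f3 makes the same assumption).

```shell
#!/bin/bash

# Hypothetical helper: reads "date,time,path" lines on stdin (the format
# produced by the find -printf call in the original script) and prints
# "filedate,filetime,filepath,scannerid,filename,adsvolid" on stdout.
# Assumption: no commas inside file paths.
reformat_scan_log() {
    awk -F',' -v OFS=',' '
    {
        filedate = $1
        filetime = substr($2, 1, 8)        # keep HH:MM:SS, drop fractional seconds
        filepath = $3
        n = split(filepath, parts, "/")    # "./7/hol_01028/000303.tif" -> ".", "7", ...
        scannerid = parts[2]               # same as cut -d/ -f2
        filename  = parts[n]               # last path component
        adsvolid  = parts[n - 1]           # directory holding the file
        print filedate, filetime, filepath, scannerid, filename, adsvolid
    }'
}

# Possible usage, replacing the while loop:
#   reformat_scan_log < "$templogimg" > "$joblog"
# or, skipping the temp file entirely:
#   find ./ -name "*.tif" -type f -printf "%TY-%Tm-%Td,%TX,%p\n" | reformat_scan_log > "$joblog"
```

This way awk is started once for the whole file rather than several processes per line, and $joblog should still be in the column order your LOAD DATA statement expects.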
