How can I iterate through all the files and directories
Posted: 2021-Jul-25, 1:28 pm
10 Sep 2008 15:50
kalinga
How can I iterate through all the files and directories under the current directory?
Any help :wall:
------------
Kalinga.
----------------------------
#2 20 Jul 2011 15:29
flabdablet
Re: How can I iterate through all the files and directories
I generally do this by using the find command to output a list of the pathnames I want, then pipe that output into a read loop to do the processing I want.
For example, here's a snippet that renames every .JPG, .JpG, .jPG, .JPEG, .jPeG etc. file in the current directory and all its subdirectories to give them all consistent lowercase .jpg extensions:
Code: Select all
find . -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print |
while IFS= read -r name
do
    mv "$name" "${name%.*}.jpg"
done
Check man find - it's a very flexible file-finding tool with a heap of options, and every Unix-like system has it.
In the snippet above I'm using -type f to specify that I only want find to return the names of files, not directories; you can use -type d to get directories only, or leave the -type option out altogether to get both.
The -iname '*.jpg' part tells find to list only files whose names match the pattern '*.jpg' case-insensitively, i.e. end in '.jpg' or '.JPG' or '.Jpg'. The -o -iname '*.jpeg' part extends the matching to cover all variants of '.jpeg' as well (read -o as "or"). The escaped parentheses \( ... \) are needed because -o binds more loosely than the implied "and" between neighbouring tests; without them, -print would apply only to the '*.jpeg' branch. (Files that already have a lowercase .jpg extension just make mv complain that source and destination are the same file, which is harmless.)
Note that those wildcard patterns are quoted to stop the shell from expanding them; unusually for a Unix tool, we actually want find to see patterns, not filenames.
read has the -r option applied to stop it misinterpreting any \ in a filename as an escape character, and IFS= stops it trimming leading or trailing whitespace from a name. Even so, read uses a newline to mark the end of input, and this code will fail if any of the filenames contains a newline character. Fortunately such filenames are very rare in practice (they break lots of shell scripts).
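If you do need to cope with newlines in filenames, bash's read can consume NUL-delimited input instead of line-delimited input. Here's the same rename loop made safe that way; a sketch that assumes bash, since read's -d '' delimiter option is a bashism:

```shell
# Same rename as above, but bulletproof against any filename:
# -print0 makes find emit NUL-terminated pathnames, and
# read -d '' consumes them one NUL-terminated record at a time.
find . -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 |
while IFS= read -r -d '' name
do
    # -- stops mv treating a filename beginning with - as an option
    mv -- "$name" "${name%.*}.jpg"
done
```

Since NUL can never appear inside a Unix pathname, this loop handles spaces, backslashes, leading dashes and even embedded newlines.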
There are a couple of ways to use find to process a bunch of files without piping its output through a read loop. One is to use its inbuilt -exec option. Here's an example using that to copy every file from the current directory and all its subdirectories into a single directory under /tmp, effectively giving us a "flattened" directory:
Code: Select all
mkdir /tmp/foo
find . -type f -exec cp {} /tmp/foo \;
Everything between -exec and \; (non-inclusive) is treated as a command and arguments, invoked once for each file found, with {} replaced by the file's pathname. Note that \; is escaped to stop the shell from treating the semicolon as its own end-of-command marker.
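A middle ground worth knowing: find can batch pathnames itself if you terminate -exec with + instead of \;, appending as many found pathnames to each command invocation as will fit. A sketch of the same flattening copy done that way; it assumes GNU cp, because with the + form {} must come last, so the destination has to be named up front with -t:

```shell
# Flattened copy again, but find invokes cp once per large batch
# of pathnames (xargs-style) instead of once per file.
mkdir -p /tmp/foo
find . -type f -exec cp -t /tmp/foo {} +
```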
Invoking one command per file can be a bit slow when processing lots of files. Rather than use -exec, it can be a lot quicker to pipe the output of find into xargs:
Code: Select all
mkdir /tmp/foo
find . -type f -print0 | xargs -0 cp -t /tmp/foo
xargs repeatedly builds and executes a command consisting of its own arguments followed by as many strings read from its standard input as will fit in a command line, until it runs out of input. Note the use of the -print0 action in find, which tells it to output pathnames terminated by NUL (\0) characters rather than newlines; since NUL cannot occur inside any Unix filename, that makes this pipeline completely bulletproof against weird names. xargs has the -0 option to make it expect its input stream to be formatted that way, and cp uses the -t option (a GNU extension) so that the destination directory can come before all the source pathnames.
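One piece of shell syntax the rename snippet above leans on without comment: ${name%.*} is plain POSIX parameter expansion, stripping the shortest trailing match of the pattern .* from the value, i.e. the final extension. For example:

```shell
# ${name%.*} deletes the shortest suffix matching '.*' - only the
# last dot and what follows it, leaving earlier dots alone.
name='photos/summer.2019.JPEG'
echo "${name%.*}"      # -> photos/summer.2019
echo "${name%.*}.jpg"  # -> photos/summer.2019.jpg
```

Using %% instead of % would strip the longest matching suffix, turning photos/summer.2019.JPEG into photos/summer, which is not what you want here.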