How to Bulk Rename Files to Numeric File Names in Linux


Do you want to rename a whole set of files to a numeric sequence (1.pdf, 2.pdf, 3.pdf, …) in Linux? This can be done with some light scripting, and this article shows you how.

Numeric file names

Usually, when we scan a PDF file with some hardware (a mobile phone or a dedicated PDF scanner), the file name will read something like 2020_11_28_13_43_00.pdf. Many other semi-automated systems produce similar file names based on the date and time.

Sometimes the file name may also contain the name of the application used, or other information such as the applicable DPI (dots per inch) or the scanned paper size.

When collecting PDF files from different sources, the file naming conventions can differ significantly, and it may be good to standardize on a numeric (or partly numeric) file name.

This also applies to other domains and sets of files: for example, your recipes or photo collection, sample data generated by automated monitoring systems, log files ready for archiving, a set of SQL files for a database engineer, and generally any data collected from different sources with different naming schemes.

Bulk rename files to numeric file names

In Linux, it is easy to quickly rename a whole set of files with completely different file names to a numeric sequence. “Easy” here means “easy to execute”: the problem of bulk renaming files to numeric names is complex to code. The one-liner script below took 3-4 hours to research, create, and test. Many of the other commands I tried had limitations that I wanted to avoid.

Please note that no warranty is given and this code is provided “as is”. Please do your own research before running it. Having said that, I have tested it successfully against files with various special characters, and also against more than 50k files without any file being lost. I also checked a file named 'a'$'\n''a.pdf', which contains a newline.

if [ ! -r _e -a ! -r _c ]; then echo 'pdf' > _e; echo 1 > _c; find . -name "*.$(cat _e)" -print0 | xargs -0 -I{} bash -c 'mv -n "{}" $(cat _c).$(cat _e); echo $[ $(cat _c) + 1 ] > _c'; rm -f _e _c; fi

Let’s first see how it works, then analyze the command. We created a directory with eight files, all named quite differently, except that they share the .pdf extension. We then ran the above command:

(Screenshot: bulk renaming files to numeric file names in Linux)

The result was that the eight files were renamed to 1.pdf, 2.pdf, 3.pdf, and so on, even though their names had little in common before.

The command assumes that you do not already have any files named 1.pdf through x.pdf. If you do, you can move those files to a separate directory, change echo 1 to a higher number so that the remaining files are renamed starting at a given offset, and then merge the two directories again.
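To try this without risk, the run described above can be reproduced in a throwaway directory. The following is only a sketch: the directory comes from mktemp, and the three sample file names are made up for illustration. It runs the same one-liner and lists the result.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)

# Hypothetical sample files with quite different names.
touch "$dir/2020_11_28_13_43_00.pdf" "$dir/scan 300dpi A4.pdf" "$dir/invoice.pdf"

(
  cd "$dir" &&
  if [ ! -r _e -a ! -r _c ]; then echo 'pdf' > _e; echo 1 > _c; find . -name "*.$(cat _e)" -print0 | xargs -0 -I{} bash -c 'mv -n "{}" $(cat _c).$(cat _e); echo $[ $(cat _c) + 1 ] > _c'; rm -f _e _c; fi
)

# The directory should now hold 1.pdf, 2.pdf and 3.pdf.
ls "$dir"
```

Which original file ends up with which number depends on the order in which find returns the files, so do not rely on any particular mapping.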

Always be careful not to overwrite files, and it’s always a good idea to do a quick backup before updating anything.
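A quick backup can be as simple as copying the whole directory first. This is a minimal sketch, assuming GNU cp; the src and backup names are placeholders, and the scratch directory stands in for your real one.

```shell
# Scratch directory standing in for the directory you want to rename files in.
src=$(mktemp -d)
touch "$src/a.pdf" "$src/b.pdf"

# Copy the directory, preserving timestamps and permissions (-a).
backup="$src.bak"
cp -a -- "$src" "$backup"

ls "$backup"
```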

Let’s look at the command in detail. It can help to add the -t option to xargs, which shows what is going on behind the scenes:

(Screenshot: xargs with the -t option shows what happens during the renaming process)
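To get a feel for -t without renaming anything, the sketch below feeds two made-up, null-terminated names to xargs and lets it run a harmless echo. With -t, xargs writes each command line to standard error before executing it; here that trace is captured in a temporary file so it can be shown separately. The exact quoting in the trace may differ between xargs versions.

```shell
trace=$(mktemp)

# Two hypothetical file names, separated by \0 as find -print0 would do.
printf 'first file.pdf\0second.pdf\0' |
  xargs -0 -t -I{} echo "would rename: {}" 2> "$trace"

echo "--- commands traced by -t ---"
cat "$trace"
```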

To begin with, the command uses two small temporary files (named _e and _c) as temporary storage. At the start of the one-liner, it performs a safety check using an if statement to ensure that the _e and _c files are not present. If a file with either name exists, the script will not proceed.

As for using small temporary files rather than variables: while variables would have been ideal (saving some disk I/O), I ran into two issues.

The first is that if you export a variable at the start of the one-liner and use that same variable later, another script that uses the same variable name (including this same script being run more than once simultaneously on the same machine) might affect this script, or be affected by it. It is better to avoid such interference when renaming many files!

The second was that xargs in combination with bash -c seems to have a limitation in handling variables inside the bash -c command line. Even extensive research online did not turn up a workable solution. So, I ended up using a small file, _c, which keeps track of progress.
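For comparison, one pattern often used to get values into a bash -c command line is to pass them as positional arguments instead of relying on shell variables. The sketch below is not the one-liner from this article but an alternative under stated assumptions: start and ext are hypothetical variable names, the files live in a scratch directory, and find's -exec … {} + form hands all matching names to a single bash invocation. For a very large number of files, find may split them over several invocations, and the counter would then restart, so treat this as an illustration only.

```shell
# Scratch directory with a few hypothetical files.
dir=$(mktemp -d)
touch "$dir/scan one.pdf" "$dir/b.pdf" "$dir/c.pdf"

start=1   # hypothetical: first number to use
ext=pdf   # hypothetical: extension to match, without the dot

(
  cd "$dir" &&
  # $1 and $2 carry the counter start and the extension; the file names follow.
  find . -name "*.$ext" -exec bash -c '
    n=$1; ext=$2; shift 2
    for f in "$@"; do
      mv -n -- "$f" "./$n.$ext"   # -n: never overwrite an existing file
      n=$((n + 1))
    done
  ' bash "$start" "$ext" {} +
)

ls "$dir"
```

Because the counter lives in a variable inside the single bash process, no counter file is needed in this variant.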

_e holds the extension we will be searching for and using, and _c is a counter that is automatically increased with each rename. The echo $[ $(cat _c) + 1 ] > _c code does this: it reads the file with cat, adds one to the number, and writes the result back.

The command also uses the best possible method of handling special characters in file names: a null terminator, that is, the \0 character, instead of the standard newline terminator. This is provided by the -print0 option to find and the -0 option to xargs.

The find command will search for all files with the extension specified in the _e file (created by the echo 'pdf' > _e command). You can change this extension to any other extension you like, but do not prefix it with a dot. The dot is already included in the *.$(cat _e) pattern given to -name.

After find has located all the files and sent them, \0 terminated, to xargs, xargs renames the files one by one using the counter file (_c) and the extension file (_e). To obtain the contents of the two files, a simple cat command is used, executed from a subshell.
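The counter step can be tried in isolation. The sketch below uses a temporary file in place of _c and the $(( … )) form of arithmetic expansion, which is the currently documented spelling of the older $[ … ] syntax in Bash. It works because the shell expands $(cat …) before it opens the file for writing.

```shell
counter=$(mktemp)        # stands in for the _c file
echo 1 > "$counter"

# Read the current value, add one, and write the result back.
echo $(( $(cat "$counter") + 1 )) > "$counter"

cat "$counter"           # prints 2
```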
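To see why the terminator matters, the following sketch creates two files in a scratch directory, one of which has a newline in its name (both names are made up). Counting newline-separated entries reports three items, while counting \0 terminators reports the correct two.

```shell
dir=$(mktemp -d)
# Two files; the second name contains an embedded newline.
touch "$dir/plain.pdf" "$dir/two
lines.pdf"

# Newline-separated output: the embedded newline splits one name in two.
newline_count=$(find "$dir" -name '*.pdf' -print | wc -l)

# Null-terminated output: exactly one terminator per file.
nul_count=$(find "$dir" -name '*.pdf' -print0 | tr -cd '\000' | wc -c)

echo "newline-separated entries: $newline_count"
echo "null-terminated entries:   $nul_count"
```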

The mv move command uses -n to avoid overwriting files already present. Finally, we clean up the two temporary files by deleting them.
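The effect of -n is easy to verify in a scratch directory. In this sketch (file names and contents are made up), the target already exists, so mv -n leaves both files as they are. Depending on the coreutils version, mv may print a notice or return a non-zero exit status when it skips a file, which is why the exit status is ignored here.

```shell
dir=$(mktemp -d)
echo keep > "$dir/1.pdf"      # existing file that must not be overwritten
echo new  > "$dir/scan.pdf"   # file we try to rename onto it

# -n refuses to replace the existing target.
mv -n "$dir/scan.pdf" "$dir/1.pdf" || true

cat "$dir/1.pdf"              # still prints: keep
ls "$dir"                     # both files are still present
```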

Although the cost of using two state files and forking subshells may be small per file, it does add overhead to the script, especially when dealing with a large number of files.

There are all kinds of other solutions to this same problem online, and many have tried and failed to create a fully working solution. Many solutions overlooked all kinds of edge cases, like using ls without specifying --color=never, which can lead to color escape codes being parsed as part of the file names when directory listing coloring is in use.

Yet other solutions failed to properly handle files with spaces, newlines, and other special characters. For this, the find … -print0 … | xargs -0 … combination is generally recommended and ideal (and the find and xargs manuals allude to this quite strongly).

While I don’t see my implementation as the perfect or final solution, it seems to be a significant step up from many other solutions: it uses find and \0 terminated strings, ensuring maximum file name and parsing compatibility, and it has a few other niceties, like being able to specify a starting offset and being fully native to Bash.

