Tutorial - How to Split a Tab-Indented Keyword List
How do you split a tab-indented list? The answer seems simple enough: open it in a text editor and cut and paste the parts you want into new text files. In practice it is a little more complicated than that, and mistakes are easy to make. This tutorial shows you how to split tab-indented list files correctly.
First of all, you need a suitable program that can open a tab-indented keyword list. A Text Editor is the best choice. You might already have a favorite one that you are used to. The essential requirements for a suitable Text Editor program are:
- Saves its files in the UTF-8 format
- Visualizes hidden characters, such as TAB and SPACE
Other useful functions are:
- Global and selective sorting
- Box selection
- Spell checking
- Case changing and Capitalizing
- Search and Replace
Won't a Word Processor program do just as well? You might be quite familiar with your favorite Word Processor, and like the way it works. Unfortunately, Word Processors, whilst trying to be helpful, orderly, and pretty, save more than just the basic text. Font size and color, paragraphs and page footers: none of these are useful in a Photo Keyword List that we then wish to import into Lightroom.
My suggestion for a suitable Text Editor program is 'PSPad'. It's free to download and use (though they do ask for a donation), and is just the thing to get you started.
NB: When you start PSPad for the first time, click 'Settings > Program Settings > Editor (part 2)'. You should then check the box marked 'Real Tabs', and set the 'Tab Width' to 6 or more. I'd also recommend that you click 'Format > Font' and choose a fixed-width font, like Courier New.
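If you want to double-check a list before importing it, the two essential requirements above can also be tested with a short script: the text must be UTF-8, and the indentation must use real TAB characters rather than SPACEs. The following is a minimal Python sketch, not a feature of PSPad or Lightroom. The function name check_keyword_file is hypothetical, and the sketch assumes that any SPACE in a line's indentation is a mistake.

```python
def check_keyword_file(data: bytes) -> list[str]:
    """Return a list of problems found in a tab-indented keyword list file.

    Assumption: indentation must consist of TAB characters only,
    matching PSPad's 'Real Tabs' setting described above.
    """
    try:
        # 'utf-8-sig' accepts plain UTF-8 and strips an optional byte-order mark.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent:
            problems.append(f"line {number}: indentation contains SPACE characters")
    return problems


print(check_keyword_file("[3 - WHERE]\n\t[3-01 GEOGRAPHIC]\n".encode("utf-8")))    # []
print(check_keyword_file("[3 - WHERE]\n    [3-01 GEOGRAPHIC]\n".encode("utf-8")))  # ['line 2: indentation contains SPACE characters']
```

To check a saved file, pass it the file's raw bytes, for example the result of reading 'alabama.txt' in binary mode.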
Here we see a screen capture of the Controlled Vocabulary Keyword List that we will be working on, displayed in 'PSPad'. I've clicked 'View > Special Characters' to make the program display the characters that are normally hidden. TAB is indicated by the little '>>' symbol, SPACE by a dot or period, and CARRIAGE RETURN/LINE FEED by a backwards P at the end of each line (it's called a 'pilcrow').
The top-level category in the Keyword List, '[3 - WHERE]', sits at the left edge of its line, with no preceding characters. The next level down in the keyword hierarchy, '[3-01 GEOGRAPHIC]', has one TAB character in front of it. Each subsequent level down in the hierarchy has one more TAB character in front of it than the level above it. This makes the hierarchical structure easy to see when you look at the text file.
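Because depth is encoded purely by the number of leading TABs, each line's full hierarchy can be worked out mechanically. The Python sketch below is illustrative only. keyword_paths is a hypothetical helper, and the sample is a made-up fragment: a real Controlled Vocabulary list may have extra levels, such as a country, between 'North America' and the states. The sketch also rejects a line that is indented more than one level deeper than the line above it.

```python
def keyword_paths(text: str) -> list[list[str]]:
    """Return the full hierarchy path of every keyword in a tab-indented list.

    A line's depth is its number of leading TAB characters; a line may be at
    most one level deeper than the line above it.
    """
    paths = []
    stack: list[str] = []  # keywords on the path to the current line
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        depth = len(line) - len(line.lstrip("\t"))
        if depth > len(stack):
            raise ValueError(f"line {number}: indented too deeply for its position")
        del stack[depth:]          # drop levels that are not ancestors of this line
        stack.append(line.strip())
        paths.append(list(stack))  # copy, because the stack keeps changing
    return paths


# Hypothetical fragment of a keyword list (not the real Controlled Vocabulary file).
sample = (
    "[3 - WHERE]\n"
    "\t[3-01 GEOGRAPHIC]\n"
    "\t\tNorth America\n"
    "\t\t\tAlabama\n"
    "\t\t\t\tBirmingham\n"
    "\t\t\tAlaska\n"
    "\t\t\t\tAnchorage\n"
)

for path in keyword_paths(sample):
    print(" > ".join(path))
# The last line printed is:
# [3 - WHERE] > [3-01 GEOGRAPHIC] > North America > Alaska > Anchorage
```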
So, returning to the original question of how to split the Keyword List: let's assume that we wish to split this list into two files that can then be imported into Lightroom, or other image processing and viewing programs. One file will include all the cities in Alabama, and the other will include all the cities in Alaska.
Creating the first file is easy enough. We select the line with 'Alaska' in it and every line below, and click 'Cut'. The result is shown in the image above. The file that we are left with can be saved separately as 'alabama.txt', and then imported into Lightroom by opening that program and clicking 'Metadata > Import Keywords'.
If we then open a new text file and 'Paste' the clipboard contents into it, we get the result shown in the image below. Let's save that as 'alaska.txt'. We then import this file into Lightroom as well, using the same 'Metadata > Import Keywords' command.
The result is shown in the image below. The 'alabama.txt' file has imported correctly, and its full hierarchy is displayed in the screenshot of the Lightroom 'Keyword Tags' panel. But what has happened to the 'alaska.txt' entries? Instead of being placed within the '[3 - WHERE] > [3-01 GEOGRAPHIC] > North America' hierarchy as we wanted, Alaska has moved to the left-hand edge as a new top-level category.
The reason for this is that Lightroom has no way of knowing that the Alaska entries are part of a bigger hierarchy. It ignored the leading tabs and started calculating the hierarchy from the first keyword it encountered. To fix this, we need to add Alaska's parent hierarchy to the 'alaska.txt' file. You can see what this looks like in the image below.
Back in Lightroom, I started again with a clean slate, removing all keywords by clicking 'Metadata > Purge Unused Keywords'. I then imported the 'alabama.txt' file by clicking 'Metadata > Import Keywords', then did the same with the amended 'alaska.txt' file. The result can be seen in the image below: everything is just as we expected it to be.
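The fix described above can be automated: cut at the 'Alaska' line, then give the cut-out section its own copy of the parent lines. This is a minimal Python sketch that assumes the list is validly indented. split_at and leading_tabs are hypothetical names, the keyword must match a whole line exactly, and, like the manual Cut, the second part runs from the matching line to the end of the file. The sample list is the same made-up fragment used for illustration, not the real Controlled Vocabulary file.

```python
def leading_tabs(line: str) -> int:
    """Depth of a line = number of leading TAB characters."""
    return len(line) - len(line.lstrip("\t"))


def split_at(text: str, keyword: str) -> tuple[str, str]:
    """Split a tab-indented list at the first line whose keyword equals `keyword`.

    The first part is every line above the match. The second part is the
    matching line and every line below it, prefixed with copies of its
    parent lines so that its place in the hierarchy is preserved.
    """
    lines = text.splitlines()
    matches = [i for i, line in enumerate(lines) if line.strip() == keyword]
    if not matches:
        raise ValueError(f"keyword not found: {keyword!r}")
    index = matches[0]

    # Walk upwards, picking the nearest line at each shallower depth:
    # that line is the parent, then the grandparent, and so on.
    parents = []
    wanted = leading_tabs(lines[index]) - 1
    for line in reversed(lines[:index]):
        if wanted < 0:
            break
        if line.strip() and leading_tabs(line) == wanted:
            parents.append(line)
            wanted -= 1
    parents.reverse()

    first = "".join(line + "\n" for line in lines[:index])
    second = "".join(line + "\n" for line in parents + lines[index:])
    return first, second


# Hypothetical fragment of a keyword list.
sample = (
    "[3 - WHERE]\n"
    "\t[3-01 GEOGRAPHIC]\n"
    "\t\tNorth America\n"
    "\t\t\tAlabama\n"
    "\t\t\t\tBirmingham\n"
    "\t\t\tAlaska\n"
    "\t\t\t\tAnchorage\n"
)

alabama_txt, alaska_txt = split_at(sample, "Alaska")
# alaska_txt now begins with the '[3 - WHERE]', '[3-01 GEOGRAPHIC]' and
# 'North America' lines, so Alaska keeps its place in the hierarchy.
print(alaska_txt)
```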
By following the above example, we have seen that when splitting a tab-indented file, each section must include its full hierarchy. It is quite all right for some keywords to be duplicated, such as the lines:
[3 - WHERE]
	[3-01 GEOGRAPHIC]
		North America
in the example above, as Lightroom and other programs will merge the duplicated levels during the import process. We can therefore safely load multiple small tab-indented keyword lists into Lightroom. The order in which they are imported is not important, as Lightroom always displays the list in alphabetical order.
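The claim that duplicated parent lines are harmless can be illustrated by merging split files back into one tree. This Python sketch only models the behaviour described above and is not how Lightroom works internally. merge_keyword_lists is a hypothetical helper. Sorting siblings alphabetically while ignoring case is an assumption about display order, and the function expects validly indented input.

```python
def merge_keyword_lists(*texts: str) -> str:
    """Merge several tab-indented keyword lists into one list.

    Parent lines shared between files appear only once in the result, and
    siblings are sorted alphabetically (ignoring case), whatever the input order.
    """
    tree: dict = {}  # keyword -> dict of child keywords
    for text in texts:
        stack: list[dict] = []  # child dicts along the current path
        for line in text.splitlines():
            if not line.strip():
                continue
            depth = len(line) - len(line.lstrip("\t"))
            parent = stack[depth - 1] if depth else tree
            node = parent.setdefault(line.strip(), {})  # reuse an existing keyword
            del stack[depth:]
            stack.append(node)

    out: list[str] = []

    def emit(children: dict, depth: int) -> None:
        for name in sorted(children, key=str.casefold):
            out.append("\t" * depth + name)
            emit(children[name], depth + 1)

    emit(tree, 0)
    return "".join(line + "\n" for line in out)


# Hypothetical split files, each carrying its full parent hierarchy.
alabama_txt = "[3 - WHERE]\n\t[3-01 GEOGRAPHIC]\n\t\tNorth America\n\t\t\tAlabama\n\t\t\t\tBirmingham\n"
alaska_txt = "[3 - WHERE]\n\t[3-01 GEOGRAPHIC]\n\t\tNorth America\n\t\t\tAlaska\n\t\t\t\tAnchorage\n"

merged = merge_keyword_lists(alaska_txt, alabama_txt)
print(merged)
```

Merging the two files in either order gives the same result, with 'North America' appearing only once. This matches the observation that the import order does not matter.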
Advanced Keyword List Splitting
A text editor is fine to start with, but what if you want to split your list automatically, without having to worry about the indent levels? Wouldn't it be nice to be able to rebuild your list from smaller split files too, and have any errors spotted automatically? Surprisingly, there are very few programs designed to work with tab-indented lists. I used to create my own tools to build my lists, as there was nothing else that did what I needed. Eventually I put all of the tools together, added a raft of new features, and now sell the program from this website: see my Tab-List Tools page.