How to Check for Duplicates in a List

 

As mentioned in the 'Errors' section, a 'Matching Line' is not the same thing as a 'Duplicate'. 'Matching Lines' are identical lines: same text, same number of indents. 'Duplicates' are repeated words anywhere in the text, even if they have a different indent level or a different parent in the hierarchy. Matching Lines are covered on the 'Tools > Errors' page. Here we will talk about finding and stepping through Duplicates.
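
The distinction can be illustrated with a short Python sketch. This is not the program's own code: the function names are hypothetical, and it assumes that indents are tab characters and that text is compared exactly (case-sensitive). Under the broad definition used here, lines that match exactly also show up as duplicates; the program itself may separate the two.

```python
from collections import defaultdict

def parse(lines):
    """Split each line into (indent_level, text), counting leading tabs."""
    parsed = []
    for line in lines:
        text = line.lstrip("\t")
        parsed.append((len(line) - len(text), text))
    return parsed

def matching_lines(lines):
    """Groups of line numbers whose text AND indent level are identical."""
    seen = defaultdict(list)
    for number, key in enumerate(parse(lines), start=1):
        seen[key].append(number)
    return [nums for nums in seen.values() if len(nums) > 1]

def duplicates(lines):
    """Groups of line numbers whose text is identical, whatever the indent."""
    seen = defaultdict(list)
    for number, (_, text) in enumerate(parse(lines), start=1):
        seen[text].append(number)
    return [nums for nums in seen.values() if len(nums) > 1]

# Made-up sample list, one keyword per line
sample = [
    "furniture",
    "\tbed",
    "\tbed",
    "garden",
    "\t\tbed",
]

print(matching_lines(sample))  # lines 2 and 3 match exactly
print(duplicates(sample))      # lines 2, 3 and 5 all repeat 'bed'
```

In this sample, line 5 is a duplicate of lines 2 and 3 but not a matching line, because its indent level differs.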

 

Let us not think of Duplicates as errors. There may be a number of reasons why we need to have the same word at different places in a controlled vocabulary. For instance, 'bed' is something that we sleep in, but 'bed' is also something that we grow vegetables in. Both are distinct items, and both are needed, in different branches of the hierarchy. On other occasions, though, the duplication of a word is a mistake, or unnecessary, so the Duplicates tool finds all duplicates and presents them in a list that is easy to check through.

 

To demonstrate the 'Duplicates' tool, load the 'Tools > Testing > Wedding Guests' test-list, then click 'Tools > Duplicates > Status Check'. The program will report that a number of Duplicate Groups have been found. The first one, shown at the top of the left-hand 'Duplicate Groups' box, is 'ADOPTION'. To the right of that is the 'Duplicates' box, which contains all the duplicates of the word 'ADOPTION'.

 

You can step through all the duplicate groups using the First/Next/Prev/Last buttons, and the Workspace text will scroll accordingly. If you switch from 'Groups' to 'Duplicates' in the bottom left corner, you can step through the individual duplicates within that group. Notice how the first member of a Duplicate Group is highlighted in the Workspace in a darker green than other members of that group.
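
The two stepping modes can be pictured as the same cursor applied to two different lists. The Python sketch below is only an illustration: the `Stepper` class, the line numbers, and the second group are made up, and it assumes that 'Next' and 'Prev' stop at the ends of the list rather than wrapping around.

```python
class Stepper:
    """Cursor over a non-empty list with First/Next/Prev/Last moves."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def current(self):
        return self.items[self.pos]

    def first(self):
        self.pos = 0
        return self.current()

    def last(self):
        self.pos = len(self.items) - 1
        return self.current()

    def next(self):
        # Assumption: stepping stops at the last item instead of wrapping.
        self.pos = min(self.pos + 1, len(self.items) - 1)
        return self.current()

    def prev(self):
        # Assumption: stepping stops at the first item instead of wrapping.
        self.pos = max(self.pos - 1, 0)
        return self.current()

# Hypothetical data: group word -> line numbers of its members
groups = {"ADOPTION": [3, 17, 42], "BED": [8, 29]}

group_stepper = Stepper(groups)                     # 'Groups' mode
duplicate_stepper = Stepper(groups["ADOPTION"])     # 'Duplicates' mode

print(group_stepper.next())      # moves to the second group
print(duplicate_stepper.last())  # moves to the last member of the group
```

Switching from 'Groups' to 'Duplicates' then amounts to creating a new stepper over the members of whichever group is current.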

 

If, whilst working through the duplicates in your list, you decide that a particular Duplicate Group is not important to you at the moment, you can 'Ignore' it by right-clicking on it and selecting 'Toggle Group Ignore-State'. The group will then be hidden, though not forgotten! Click 'Show Ignored Duplicates' to see ignored items again; notice that an asterisk '*' has been added between the line-number and the text of any item that has been ignored. In the same way, you can 'Ignore' individual Duplicates within a Group by right-clicking on them and selecting 'Toggle Duplicate Ignore-State'. Two counters remind you of the number of groups and duplicates that you are currently hiding. Note that if you move away from the 'Tools > Duplicates' page, your 'Ignore' selections will be lost.
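
The Ignore behaviour described above can be modelled roughly as two sets of toggled items plus a display filter. The following Python sketch is an assumption-laden illustration, not the program's implementation: the `IgnoreState` class, the row layout, and the sample data are hypothetical, and the counters here count only items that were toggled directly.

```python
class IgnoreState:
    """Tracks which groups and individual duplicates are currently ignored."""

    def __init__(self, groups):
        self.groups = groups              # word -> list of line numbers
        self.ignored_groups = set()       # ignored words
        self.ignored_duplicates = set()   # ignored (word, line number) pairs

    def toggle_group(self, word):
        # Symmetric difference flips membership: ignore, then un-ignore.
        self.ignored_groups ^= {word}

    def toggle_duplicate(self, word, line):
        self.ignored_duplicates ^= {(word, line)}

    def counters(self):
        """(ignored groups, ignored duplicates), counting direct toggles only."""
        return len(self.ignored_groups), len(self.ignored_duplicates)

    def listing(self, show_ignored=False):
        """Rows to display; ignored rows are hidden unless show_ignored is set."""
        rows = []
        for word, lines in self.groups.items():
            group_ignored = word in self.ignored_groups
            for line in lines:
                ignored = group_ignored or (word, line) in self.ignored_duplicates
                if ignored and not show_ignored:
                    continue
                # Asterisk goes between the line number and the text.
                marker = "* " if ignored else ""
                rows.append(f"{line} {marker}{word}")
        return rows

# Hypothetical data: group word -> line numbers of its members
state = IgnoreState({"ADOPTION": [3, 17], "BED": [8, 29]})
state.toggle_group("BED")
state.toggle_duplicate("ADOPTION", 17)

print(state.listing())                   # only the rows still visible
print(state.listing(show_ignored=True))  # everything, ignored rows starred
print(state.counters())
```

Because this state lives only in memory, discarding the object loses the selections, which mirrors what happens when you leave the 'Tools > Duplicates' page.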