personal pages

namesift

This is a writer's/editor's tool.

Give it the names of one or more TEXT files on the command line. It sends its results to stdout (the screen), which you can redirect to a file or pipe through a pager such as more or less.

The program does no wildcard expansion of its own, so it is more convenient on a Unix/Linux system, where the shell expands wildcards into a list of file names before the program starts. On a Windows/DOS system, you'll need to feed it a list of file names from some separate wildcard-expanding tool.

It doesn't recurse into directories either, so for a mirror spread over many directories, run it from a shell or script that walks the directory tree and hands it the files.
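On Windows, one way to get both the wildcard expansion and the directory recursion is a small helper script. The Python sketch below is not part of namesift; it assumes Python 3.5 or later, whose glob module treats a ** path component as "any depth of subdirectories" when recursive=True. The function name expand_file_args is made up for illustration.

```python
import glob
import os
from typing import List, Sequence


def expand_file_args(patterns: Sequence[str]) -> List[str]:
    """Expand wildcard patterns into a list of existing regular files.

    HYPOTHETICAL helper, not part of namesift. A "**" path component
    matches any depth of subdirectories, which covers the directory
    recursion the program itself doesn't do.
    """
    files: List[str] = []
    for pattern in patterns:
        # glob returns matches in arbitrary order, so sort each pattern's hits
        for path in sorted(glob.glob(pattern, recursive=True)):
            if os.path.isfile(path):  # skip directories the pattern matched
                files.append(path)
    return files
```

You would then hand the resulting list to the program, for example with subprocess.run(["namesift", *expand_file_args(["mirror/**/*.txt"])]); the command name namesift and the mirror/ path are placeholders for however the program is actually installed and wherever your files live.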

This program is designed to sift for proper names in a story.

Its initial job was to sift a local mirror of the entire fanfic.tass-anime.com archive for the names of secondary and newly created characters, with an eye toward building a reference list or database of them.

This was triggered by Fred Herriot's use, in "Lonely Souls", of the hormone-driven high-schooler Enzo, who was created in Kenko's "Girl Days". Just as fan-canon has pretty much voted that Mrs. Tendo's name was Kimiko, Enzo will probably be adopted as an enduring character in the Ranmaverse, and there are undoubtedly other such characters, enough to fill several Furinkan classrooms and as many blocks of Nerima dwellings.

It can also be used to find typos in proper names: a misspelling tends to show up as a separate entry with a low count.

The program looks for two proper-cased (capitalized) words in a row and reports them as a pair. If there are three proper-cased words in a row, it reports two overlapping pairs. The program ignores all punctuation, so little things like the ends of sentences won't stop it; only a paragraph break (a blank line) will restrain its combinatorial zeal.
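The pairing rule can be sketched in a few lines of Python. This illustrates the behaviour described above and is not namesift's own source; the exact test for "proper-cased" (one capital followed by lowercase letters, ASCII only) and the function name name_pairs are assumptions.

```python
import re
from typing import Iterator, Tuple

# ASSUMPTION: "proper-cased" means one capital letter followed by lowercase
# letters ("Mrs", "Tendo"), so "I" and all-caps words like "NHK" don't count.
PROPER = re.compile(r"[A-Z][a-z]+")
# Runs of ASCII letters; punctuation, digits and spaces only separate words.
WORD = re.compile(r"[A-Za-z]+")
# A paragraph break: a newline, optional whitespace, then another newline.
PARA_BREAK = re.compile(r"\n\s*\n")


def name_pairs(text: str) -> Iterator[Tuple[str, str]]:
    """Yield each pair of adjacent proper-cased words, one paragraph at a time."""
    for paragraph in PARA_BREAK.split(text):
        words = WORD.findall(paragraph)
        # Walk overlapping neighbours, so three capitals in a row give two pairs.
        for first, second in zip(words, words[1:]):
            if PROPER.fullmatch(first) and PROPER.fullmatch(second):
                yield first, second


if __name__ == "__main__":
    sample = "Ranma Saotome. Genma Saotome sighed.\n\nAkane\n\nTendo"
    for pair in name_pairs(sample):
        print(*pair)
```

Run as a script, it prints Ranma Saotome, Saotome Genma and Genma Saotome, one pair per line: the sentence break between the two Saotomes doesn't stop the pairing, while the blank lines keep Akane and Tendo apart.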

"Mrs" is a proper name to this program; that's deliberate, so as to catch all the names for the ladle lady, Nodoka's chatty neighbors in Nerima and/or Juuban, and Kasumi's greengrocer. To make that really robust, I suppose I could have had it treat "-san" as a proper name; maybe in the next revision if there is one.

The program counts the occurrences of each name pair it finds and sorts the entries first by count and then by first name. Because it can't sort until the counts are complete, it reads every file you list before producing any output. That can take a while even on a fast machine, so go get some coffee rather than panicking.
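Counting and sorting might look like this in Python. The name rank_names is hypothetical, and since the text doesn't say which way the counts run, this sketch assumes the most frequent pairs come first.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def rank_names(pairs: Iterable[Pair]) -> List[Tuple[int, Pair]]:
    """Count each distinct pair, then sort by count and by first name.

    ASSUMPTION: higher counts come first; ties fall back to the first name,
    then the second name, both in plain (case-sensitive) string order.
    """
    # Counter has to see every pair from every file before sorting can start,
    # which is why nothing can be printed until all the input has been read.
    counts = Counter(pairs)
    return sorted(
        ((n, pair) for pair, n in counts.items()),
        key=lambda item: (-item[0], item[1][0], item[1][1]),
    )


if __name__ == "__main__":
    found = [("Ranma", "Saotome"), ("Akane", "Tendo"), ("Ranma", "Saotome"),
             ("Kasumi", "Tendo"), ("Akane", "Tendo"), ("Ranma", "Saotome")]
    for n, (first, second) in rank_names(found):
        print(f"{n:5d}  {first} {second}")
```

The demo lists Ranma Saotome with a count of 3, then Akane Tendo with 2, then Kasumi Tendo with 1. Any design built this way has to finish reading before it can print, which is the coffee-break behaviour described above.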

There is an exclusion list at the tail end of the program, after the __END__ statement. Add any words you don't want listed, one word per line. I've already put in some words that sensitive people might not want to see on their screens; you have been warned.
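Here is a Python sketch of how such a list could be read and applied. The string EXCLUSIONS_TEXT stands in for the section after __END__, its three words are illustrative only, and dropping any pair that contains an excluded word is an assumption about how the list is used; the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Stand-in for the word list kept after the program's __END__ line:
# one word per line. These particular words are illustrative only.
EXCLUSIONS_TEXT = """\
The
When
But
"""


def load_exclusions(text: str) -> Set[str]:
    """Read one word per line, ignoring blank lines and stray whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_excluded(pairs: Iterable[Tuple[str, str]],
                  excluded: Set[str]) -> Iterator[Tuple[str, str]]:
    """Skip any pair in which either word is on the exclusion list
    (ASSUMPTION: exact, case-sensitive matching)."""
    for first, second in pairs:
        if first not in excluded and second not in excluded:
            yield first, second


if __name__ == "__main__":
    excluded = load_exclusions(EXCLUSIONS_TEXT)
    found = [("When", "Akane"), ("Akane", "Tendo")]
    print(list(drop_excluded(found, excluded)))  # [('Akane', 'Tendo')]
```

A sentence such as "When Akane came home" would otherwise produce the pair "When Akane"; with "When" on the list, only the genuine name pair survives.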

--siaru
