Saturday, May 22, 2010

Fixing your feed part 2

Update: This report is obsolete. Online stuff changes from time to time, and for this topic a revision would be impractical.

Read more about archived posts, if you like.

English-language syntax is tricky, and the Short pipe does a number of little tricky things to compensate. Even so, the generic replacement rules sometimes run into an exception that screws things up.

For instance, my pipe was evaluating the text string "U.S." as a place that needed a line break after the first period.

Why? The Short Feed pipe evaluates any period followed by a letter with no intervening space as the end of a paragraph, in need of some line returns. (The Full pipe does not do this.)
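Here is a rough Python sketch of that rule, for illustration only. Python's re module stands in for the pipe's Regex module, and both the pattern and the name add_paragraph_breaks are assumptions; the rule inside the pipe may well be written differently.

```python
import re

# Hypothetical stand-in for the Short pipe's rule: a period that runs
# straight into a letter is treated as the end of a paragraph.
PERIOD_THEN_LETTER = re.compile(r"\.(?=[A-Za-z])")

def add_paragraph_breaks(text: str) -> str:
    """Insert an empty paragraph after any period directly followed by a letter."""
    return PERIOD_THEN_LETTER.sub(".<p></p>", text)

# The intended case: two sentences jammed together get separated.
print(add_paragraph_breaks("It rained.Then it stopped."))
# It rained.<p></p>Then it stopped.

# The unwanted case: the first period in "U.S." also runs into a letter.
print(add_paragraph_breaks("I live in the U.S. now."))
# I live in the U.<p></p>S. now.
```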

I could have edited the post where I originally wrote "U.S." to be simply "US," but that would have been incorrect usage and I am fussy about that sort of thing. So I wrote an exception into the pipe.

I will show you what I did so that you can write simple exceptions too. (For example, you might write about time and say "11:30 A.M.") As a first step, you must go to your pipe and click Edit.

Your pipe is made of modules that operate on the elements of your feed as it flows through the pipe.


Near the beginning of the pipe, after a small module called "Reverse" (which is the one that puts your feed into chronological order) is a Regex module. It only has one rule in it, which is to replace all instances of "U.S." with "@US@".

Notice that the string "U.S." is specified, in the Replace field, as "U\.S\." That is, there is a backslash (a left-leaning slash) before each period. In the syntax of regular expressions, a period on its own has a special meaning: it matches any single character. Preceded by a backslash, it matches only a literal period.

If you were to make an exception for "A.M." you would search for "A\.M\."
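If you want to see the difference the backslash makes, this small Python sketch shows it. Python's re module is only a stand-in for the pipe's Regex module, and the sample sentence is made up.

```python
import re

text = "Trade between the U.S. and the USSR was limited."

# Unescaped: each "." matches any single character, so "USSR" matches too.
loose_matches = re.findall(r"U.S.", text)

# Escaped: each "\." matches only a literal period.
exact_matches = re.findall(r"U\.S\.", text)

# re.escape adds the backslashes for you.
am_pattern = re.escape("A.M.")

print(loose_matches)  # ['U.S.', 'USSR']
print(exact_matches)  # ['U.S.']
print(am_pattern)     # A\.M\.
```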

Note also that there is a checked box by the letter "g" towards the end of the line. That means the search is global: it replaces every instance of "U.S." in the feed, not just the first.
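The same distinction can be sketched in Python, with one caveat: Python's re.sub replaces every match by default, so the unchecked "g" box is imitated here with count=1. The sample sentence is made up.

```python
import re

text = "U.S. exports rose while U.S. imports fell."

# Like leaving the "g" box unchecked: only the first match is replaced.
first_only = re.sub(r"U\.S\.", "@US@", text, count=1)

# Like checking the "g" box: every match is replaced.
everywhere = re.sub(r"U\.S\.", "@US@", text)

print(first_only)  # @US@ exports rose while U.S. imports fell.
print(everywhere)  # @US@ exports rose while @US@ imports fell.
```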

Making this change at the start of the pipe means that the later search-replace operations won't put line returns between U. and S. But once we are past that part of the pipe we can change things back.

Four modules from the end of the pipe is another Regex module, and one of the rules searches for "@US@" and turns it back into "U.S." This rule is also global. Note that the syntax rule about putting a backslash in front of every period does not apply to the "replace with" field. In fact, putting backslashes there will put backslash characters into the feed.
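The whole round trip (protect, run the general rules, restore) can be sketched in Python like this. The function names are hypothetical, and the paragraph-break pattern is an assumed stand-in for whatever the middle of the pipe really does.

```python
import re

def protect(text: str) -> str:
    """Early Regex module: hide "U.S." behind a placeholder."""
    return re.sub(r"U\.S\.", "@US@", text)

def add_paragraph_breaks(text: str) -> str:
    """Assumed stand-in for the pipe's general rule (period followed by a letter)."""
    return re.sub(r"\.(?=[A-Za-z])", ".<p></p>", text)

def restore(text: str) -> str:
    """Late Regex module: the replacement is plain text, so no backslashes."""
    return re.sub(r"@US@", "U.S.", text)

raw = "I moved to the U.S. in May.It was warm."

without_exception = add_paragraph_breaks(raw)
with_exception = restore(add_paragraph_breaks(protect(raw)))

print(without_exception)
# I moved to the U.<p></p>S. in May.<p></p>It was warm.
print(with_exception)
# I moved to the U.S. in May.<p></p>It was warm.
```

The placeholder works because "@US@" contains no period, so the general rule has nothing to act on while the text passes through the middle of the pipe.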

You can use this method to fix other anomalies that crop up in your feed.

For instance, in one post on my blog I begin with a quote from Henry David Thoreau followed by his name and no punctuation. Then I begin the next paragraph by saying, "Thoreau asked for...."

Bring me an apple from the Tree of Life!
--Henry David Thoreau

Thoreau asked for....

All fine and proper, but the feed stripped away the returns and gave me

Bring me an apple from the Tree of Life!--Henry David ThoreauThoreau asked for....

My pipe fixed most of that, but was unable to recognize the need for line breaks after the unpunctuated "Thoreau." So the pipe did not fix "ThoreauThoreau."

I chose to fix this by replacing "ThoreauThoreau" with "Thoreau<p></p>Thoreau" (the "<p></p>" providing the line returns). I put this in the first Regex box along with "U\.S\." and a number of other rules. Unlike the "U.S." rule, though, this one does not have to run before the later search-replace operations, so I could just as easily have put it in the last Regex box.

I also did not mark "ThoreauThoreau" as a global search. It's unusual enough to be a one-off, and searching globally slows things down.

I think of these and similar anomalies as unusual exceptions to general text-handling rules.

With these two examples, I hope I have given you the tools to fix your "unusual exceptions." Just add search and replace terms to the existing modules by clicking the + Rules button and filling in the blanks.
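As a way of picturing a Regex module with several rules in it, here is a Python sketch. The Rule class and apply_rules function are hypothetical illustrations, not anything the pipe exposes, and the "@AM@" placeholder and the sample sentence are made up for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """One row in a Regex module: replace [pattern] with [replacement]."""
    pattern: str
    replacement: str
    is_global: bool = False  # mirrors the "g" checkbox

def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply each rule in order, the way the feed flows down the module."""
    for rule in rules:
        # In re.sub, count=0 means "replace all"; count=1 means "first only".
        count = 0 if rule.is_global else 1
        text = re.sub(rule.pattern, rule.replacement, text, count=count)
    return text

first_regex_module = [
    Rule(r"U\.S\.", "@US@", is_global=True),
    Rule(r"A\.M\.", "@AM@", is_global=True),
    Rule(r"ThoreauThoreau", "Thoreau<p></p>Thoreau"),  # a one-off, not global
]

sample = "At 11:30 A.M. the U.S. envoy quoted ThoreauThoreau smiled."
print(apply_rules(sample, first_regex_module))
# At 11:30 @AM@ the @US@ envoy quoted Thoreau<p></p>Thoreau smiled.
```

Each new exception is just one more row in the list, which is all the + Rules button amounts to.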

My own private version of this pipe that I use for my blog has been tweaked to deal with the above exceptions and also "J.R.R. Tolkien," "ZestarHere's," "NASAThe," "best;'T is," and several other oddities that came about when the Short option stripped the HTML from my feed.

If you use the Short feed and blog regularly, you are bound to generate some exceptions of your own.

Since I continue to blog, I proofread the newest parts of my own Journey feed to catch new "exceptions" that occur periodically.

After editing trailing spaces from your original blog post, this is the final step in polishing your first-post-first feed of your blog.*

*(Another step, editing the modules to allow for additional line returns, is only needed if you are running a Short feed through the pipe and if you have many short paragraphs in the first 250 words of any of your posts. It is detailed further here.)

Congratulations, and let me know how it turned out!
