URL Substitution


When you tap the headline of an article on the front page of multiFEED, the URL of the article body from the feed XML is used to view the article with QuickView or the full BlackBerry browser. Sometimes there is a more suitable version of the article at a slightly different variation of the regular URL. multiFEED lets you transparently modify the URL on the fly to retrieve the alternate version of the article.

Simple Substitution Example

The Register is a technology-oriented web site that provides a substantial number of Atom feeds targeting different technology topics. The article links in the Atom XML point to standard desktop versions of the article pages, which are too graphically dense for easy reading on a handheld device. The good news is that every article has a "mobile" formatted version whose URL is only slightly different from the one in the feed XML. The first part of a typical article URL from one of The Register's Atom feeds looks like this:

http://go.theregister.com/feed/www.theregister.co.uk/<path to article>

If you change the "www" in the middle to "m", you get exactly the same article, formatted for mobile devices:

http://go.theregister.com/feed/m.theregister.co.uk/<path to article>

Note that in the example above the actual article URL is reached via a redirector (the http://go.theregister.com/feed/ portion). This allows the web server to redirect the browser to a different URL if there is a more suitable one for the browser being used. The redirect slightly slows initial page loading, and since we already know the URL we want, we can shorten it to:

http://m.theregister.co.uk/<path to article>

We are not quite done yet. Although almost all Register articles referenced in the Atom feeds are located at the http://theregister.co.uk domain, a few are at http://m.channel.register.co.uk instead, so we need to take that into account too.

Now you can use this knowledge to configure multiFEED to always make this change in the background before you ever see the article page. Simply tap the URL Substitution drop-down and choose whether to modify the URL when viewing articles in QuickView, in the full BlackBerry browser, or both. A new drop-down will appear below that one to select the substitution type. You have three choices: Simple, RegExp, and RegExp (greedy).

For The Register example above, use Simple substitution, set the Match Text to "http://go.theregister.com/feed/www." and the Replacement Text to "http://m.". Because the Match Text stops right after "www." and does not include the domain name, every article you view from The Register will now open in its mobile version, regardless of which domain it is hosted on.
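The effect of this Simple substitution can be sketched in a few lines of Python. This is only an illustration of literal match-and-replace, not multiFEED's actual code: the function name, the article paths, and the second host name are hypothetical, and replacing only the first occurrence of the Match Text is an assumption.

```python
def simple_substitute(url: str, match_text: str, replacement_text: str) -> str:
    """Literal match-and-replace on a URL.

    Assumption: only the first occurrence is replaced. A URL that does not
    contain match_text is returned unchanged.
    """
    return url.replace(match_text, replacement_text, 1)


# Values taken from the settings described above.
MATCH_TEXT = "http://go.theregister.com/feed/www."
REPLACEMENT_TEXT = "http://m."

# Hypothetical article path, standing in for <path to article>.
feed_url = "http://go.theregister.com/feed/www.theregister.co.uk/some/article/"
mobile_url = simple_substitute(feed_url, MATCH_TEXT, REPLACEMENT_TEXT)
print(mobile_url)

# The match text ends before the domain name, so a different host
# (hypothetical here) is handled by the same rule.
other_url = "http://go.theregister.com/feed/www.example.co.uk/some/article/"
other_mobile_url = simple_substitute(other_url, MATCH_TEXT, REPLACEMENT_TEXT)
print(other_mobile_url)
```

The same rule removes the redirector prefix and switches "www" to "m" in one step, which is why a single Match Text and Replacement Text pair is enough here.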

RegExp Substitution Example

Simple URL substitutions will suffice in almost all situations, but occasionally a substitution is too complicated for a simple match-and-replace. For these cases multiFEED supports Regular Expression substitutions, which use pattern matching to determine the text to replace. The Canadian news service CBC provides several national and regional RSS feeds. A typical article URL in one of these feeds looks like this:

http://www.cbc.ca/<path to article>-1.2345678?cmp=rss

Just as with The Register, the article URLs actually get redirected by the web server. From a mobile browser, after redirection, the URL above resolves to:

http://www.cbc.ca/m/news/#!/content/1.2345678

As you can see, the 1.2345678 portion of the original URL is also used after the redirection as a unique identifier for the article. If we want to bypass the redirector and jump straight to the final URL, we need something a bit more sophisticated than a simple match-and-replace. In this case Regular Expressions come to our rescue. Set the URL substitution type to RegExp and enter the following match expression:

\-([0-9]+\.[0-9]+)\?

This is not a tutorial on Regular Expressions, but here is a simple breakdown of what the expression does:


\-        A hyphen, followed immediately by...
(         Starts grabbing characters for later substitution.
[0-9]+    One or more decimal digits (0-9), followed immediately by...
\.        A period, followed immediately by...
[0-9]+    One or more decimal digits, followed immediately by...
)         Finishes grabbing characters for later substitution.
\?        A question mark.
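
To see what the match expression picks out, it can be tried against a sample URL. The sketch below uses Python's re module, which accepts this expression as written. Whether multiFEED's own Regular Expression engine behaves identically in every detail is an assumption, and the article path is hypothetical.

```python
import re

# The match expression from above, written as a raw string.
MATCH_EXPRESSION = r"\-([0-9]+\.[0-9]+)\?"

# Hypothetical article path, standing in for <path to article>.
feed_url = "http://www.cbc.ca/news/some-story-1.2345678?cmp=rss"

match = re.search(MATCH_EXPRESSION, feed_url)

# group(0) is the whole matched text, hyphen and question mark included;
# group(1) is only the part grabbed between the parentheses.
matched_text = match.group(0) if match else None
article_id = match.group(1) if match else None
print(matched_text)
print(article_id)
```

Note that the earlier hyphens in the path are skipped, because none of them is followed by digits, a period, more digits, and a question mark.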


Everything between the two parentheses is grabbed by the Regular Expression and can be substituted into the replacement expression, which looks like this:

http://www.cbc.ca/m/news/#!/content/\1

Note the \1 in the replacement expression. This tells the Regular Expression engine to plug in the first block of grabbed text at that location. If we had used more pairs of parentheses to grab more than one block from the original URL, the additional grabbed blocks could be plugged in with \2, \3, etc.
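
The whole RegExp substitution can be sketched in Python as follows. This is a guess at the behaviour rather than multiFEED's implementation: it assumes the expanded replacement expression replaces the entire URL (the CBC example only produces a valid address under that reading), and that a URL which does not match is left alone. The function name, the article path, and the two-group expression are hypothetical and only illustrate \1 and \2.

```python
import re


def regexp_substitute(url: str, match_expression: str, replacement_expression: str) -> str:
    """Build a new URL from the replacement expression.

    Assumptions: the expanded replacement replaces the whole URL, and a URL
    that does not match the expression is returned unchanged.
    """
    match = re.search(match_expression, url)
    if match is None:
        return url
    # expand() plugs the grabbed blocks into \1, \2, ... in the template.
    return match.expand(replacement_expression)


# Hypothetical article path, standing in for <path to article>.
feed_url = "http://www.cbc.ca/news/some-story-1.2345678?cmp=rss"

# The expressions from the CBC example above.
MATCH_EXPRESSION = r"\-([0-9]+\.[0-9]+)\?"
REPLACEMENT_EXPRESSION = r"http://www.cbc.ca/m/news/#!/content/\1"
direct_url = regexp_substitute(feed_url, MATCH_EXPRESSION, REPLACEMENT_EXPRESSION)
print(direct_url)

# Illustrative only: a second pair of parentheses grabs the section name,
# which is then plugged in with \1 while the article id moves to \2.
TWO_GROUP_MATCH = r"/([a-z]+)/[^/]*\-([0-9]+\.[0-9]+)\?"
TWO_GROUP_REPLACEMENT = r"http://www.cbc.ca/m/\1/#!/content/\2"
two_group_url = regexp_substitute(feed_url, TWO_GROUP_MATCH, TWO_GROUP_REPLACEMENT)
print(two_group_url)
```

For this sample URL both pairs of expressions produce the same address. The two-group version simply shows how a second grabbed block shifts the numbering of the first.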


Regular Expressions are very powerful, and can be used for very sophisticated substitutions, but are quite difficult to master. For more information look here.