DWITE Online Computer Programming Contest, January 2008, Problem 3
There's a lot of spam on the internet – blog comments, forum posts, etc. – all posted to plant enough links that search engines such as Google treat a certain page as more important than it should be. One solution is to mark untrusted links with a rel="nofollow" tag, telling spiders to ignore the link. A sample link might look like:
<a href="http://compsci.ca/" title="Computer Science Canada" rel="nofollow">sample link</a>
The goal is to write a program that finds all the links in a text file and inserts nofollow tags properly. A rel="" property should be inserted as the last property of the link, unless one already exists. The nofollow tag should be inserted last in the rel= string, unless it is already there. A rel property can hold multiple space-separated tags. Refer to the sample input for examples.
The input will contain five lines of text, each containing one link in the form <a*>*</a>. Links might be surrounded by filler text. Each line will be no more than 255 characters long.
The output will contain five lines – just the parsed links.
Sample Input
This is a <a>sample link</a>.
<a rel="" href="http://dwite.org/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external">more rels</a>
text <a href="http://compsci.ca/v3/viewforum.php?f=131" title="">link</a> more text
Sample Output
<a rel="nofollow">sample link</a>
<a rel="nofollow" href="http://dwite.org/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external nofollow">more rels</a>
<a href="http://compsci.ca/v3/viewforum.php?f=131" title="" rel="nofollow">link</a>
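One possible way to apply these rules is sketched below in Python. This is not an official solution. The names add_nofollow, LINK_RE and REL_RE are made up for illustration, and the sketch assumes what the samples show: one lowercase link per line, attribute values in double quotes, and no ">" character inside the opening tag.

```python
import re

# Opening <a ...> tag (attributes optional), link text, closing </a>.
LINK_RE = re.compile(r'<a(\s[^>]*)?>(.*?)</a>')
# A rel="..." attribute preceded by whitespace inside the opening tag.
REL_RE = re.compile(r'(\srel=")([^"]*)(")')


def add_nofollow(line: str) -> str:
    """Return the link found in `line`, with nofollow added to its rel."""
    match = LINK_RE.search(line)
    if match is None:
        raise ValueError("no link found in line")
    attrs = match.group(1) or ""  # "" for a bare <a>
    text = match.group(2)

    rel = REL_RE.search(attrs)
    if rel is None:
        # No rel property yet: append it as the last property.
        attrs += ' rel="nofollow"'
    else:
        tokens = rel.group(2).split()
        if "nofollow" not in tokens:
            # Append nofollow as the last space-separated tag.
            tokens.append("nofollow")
            attrs = attrs[:rel.start(2)] + " ".join(tokens) + attrs[rel.end(2):]
    return f"<a{attrs}>{text}</a>"


if __name__ == "__main__":
    sample = [
        'This is a <a>sample link</a>.',
        '<a rel="" href="http://dwite.org/">link with rel</a>',
        '<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>',
        '<a href="http://compsci.ca/blog" rel="external">more rels</a>',
        'text <a href="http://compsci.ca/v3/viewforum.php?f=131" title="">link</a> more text',
    ]
    for line in sample:
        print(add_nofollow(line))
```

Searching for the link rather than matching the whole line drops the filler text, and splitting the rel value on whitespace keeps a tag such as "external" from being mistaken for nofollow by a substring check.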
Problem Resource: DWITE