The word canonical is a religion-related term, and means "according to canon law, scripture or doctrine." But in general use, it just means "usual, standard, conventional, customary, or according to the rules." So as a Webmaster, you choose what single domain you want to use for your site, and what single URL should be used to request each of your pages.
This article also provides very good advice that it's best to avoid "stacked redirects" --multiple redirects invoked by a single client request-- while doing things like index page and domain canonicalization.
Here is a domain/URL canonicalization and type-in fix-up routine that does the following:
Do all of the above using a single 301-Moved Permanently redirect.
The result is a routine that can "correct" a request from a badly-coded link like:
<a href="http://example.com/index,hmtl>for more info, click here</a>
where the closing quote has been omitted on the link, "html" was mis-typed, and a comma was typed where the filetype-separator period should be.
The result of a click on that link is a request for "http://example.com/index,hmtl%3Efor%20more%20info,%20click%20here%3C/a%3E"
The code will redirect that to the canonical domain and index page URL "www.example.com/" using a single redirect, correcting the comma and "hmtl", and stripping off the spurious path info along the way. Or it can fix up multiple slashes or periods, or remove trailing punctuation from links improperly embedded in text, or automatically-linked in forum posts, e.g. "For help with this code, see http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html."
This is done with a 301-Moved Permanently redirect, so that search engines are notified not to list or use the incorrect URL, but to replace it with the corrected/canonicalized one.
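To make the example concrete, the following Python sketch traces how the malformed request shown above would pass through three of the fix-up rules in the routine below. It is only an illustration under stated assumptions: the regular expressions are copied from the .htaccess code, but the function name `fix_path` is hypothetical, Python's `re` module stands in for mod_rewrite, and only a subset of the rules is modelled.

```python
import re

# Patterns copied from three RewriteConds in the routine. Each is paired with
# the replacement that the matching RewriteRule stores back into myURI.
RULES = [
    # Replace "hmtl" with "html" (anything after "hmtl" is dropped)
    (re.compile(r"^([^.,]+)[.,]+hmtl", re.I), lambda m: m.group(1) + ".html"),
    # Remove invalid trailing characters
    (re.compile(r"^([/0-9a-z._\-]*)[^/0-9a-z._\-]", re.I), lambda m: m.group(1)),
    # Redirect "<anything>/index.html" to "<anything>/"
    (re.compile(r"^(/([^/]+/)*)index\.html", re.I), lambda m: m.group(1)),
]


def fix_path(path):
    """Apply the rules once, in order; return (fixed_path, redirect_needed)."""
    redirect = False
    for pattern, replace in RULES:
        m = pattern.match(path)
        if m:
            path = replace(m)
            redirect = True  # plays the role of the qRed flag
    return path, redirect


# The URL-path as it appears in the request line (still percent-encoded)
requested = "/index,hmtl%3Efor%20more%20info,%20click%20here%3C/a%3E"
print(fix_path(requested))  # ('/', True)
```

All corrections accumulate in one variable and a single flag records that a redirect is needed, which is why only one 301 response has to be sent at the end.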
This code is intended for the most common Apache hosting set-ups: shared virtual hosting on Apache 1.3.x, with configuration options limited to .htaccess files only.
Update: It was discovered that Apache 2.0.52 has the same bug as Apache 1.3.x. Although the original bug report was closed with a statement that this bug was fixed in Apache 2.0.30, it was apparently not fixed completely. Therefore, the solution presented here applies to Apache 2.0 as well.
This routine is the right solution for my sites, which follow *my* strict URL conventions, but likely not for yours. Modification will almost certainly be required. Like almost all mod_rewrite code, this is not a simple cut-and-paste or find-and-replace proposition.
It does fix-ups on only the most common URL errors I have seen in my logs, but of course, there are many others; the code is not meant to exhaustively cover all possible errors, just the most common ones on my sites.
This code is to be examined and perhaps modified by Webmasters who are conversant and comfortable with mod_rewrite and regular expressions. Again, this code is not an entry-level exercise, and the most likely result of trying to modify it without thoroughly understanding and testing it is a disaster -- the best of which would be an immediate server crash, and the worst of which might be to thoroughly trash your search engine rankings.
Code:
# .htaccess
#
# Specify IP address(es) used by Webmaster, admins, & testers. These may access
# the server by its unique IP address without being redirected to the domain.
# Also, URLs are *not* corrected for access by this group, in order to prevent
# this code from "hiding" problems during development.
# (Note that these addresses are those of your workstations, not your server.)
SetEnvIf Remote_Addr ^192\.168\.1\. TestIP=true
SetEnvIf Remote_Addr ^10\.10\.45\.3$ TestIP=true
SetEnvIf Remote_Addr ^127\.0\.0\.[1-7]$ TestIP=true
#
#
# Setup: Enable mod_rewrite, disable MultiViews
Options +FollowSymLinks -MultiViews
RewriteEngine on
#
# Redirect non-problematic URLs
# Note: The fix-up code below is complex, and is intended for use to fix only
# generally-specified problematic URL requests. For administrative redirection
# of specific non-problematic URLs, 'normal' redirects should be placed here.
#
RewriteRule ^old_page\.html$ http://www.example.com/new_page.html [R=301,L]
RewriteRule ^old_page2\.htm$ http://www.example.com/new_page2.htm [R=301,L]
#
#
# URL FIXUP REDIRECT ROUTINE
#
# This code corrects various problems with URLs, presumably due to typos in
# links from other sites. It is complicated by measures taken to avoid a
# mod_rewrite bug in Apache 1.3. ( See http://archive.apache.org/gnats/7879 )
# This code uses a single external redirect to correct all detected problems.
#
# Skip next two rules if lowercasing in progress
# (Remove this rule if case-conversion plug-in below is removed)
RewriteCond %{ENV:qLow} ^yes$ [NC]
RewriteRule . - [S=2]
#
# Prevent recursion and over-writing of myURI and myQS
RewriteCond %{ENV:qRed} ^yes$ [NC]
RewriteRule .? - [L]
#
# Get the client-requested full URI and full query string
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ (/[^?]*)(\?[^\ ]*)?\ HTTP/
RewriteRule .? - [E=myURI:%1,E=myQS:%2]
#
#
###############################################
# Uppercase to lowercase conversion plug-in
# (This section, along with the first noted rule
# above, may be removed if not needed or wanted)
#
# Skip next 28 rules if no uppercase letters in URL
RewriteCond %{ENV:myURI} ![A-Z]
RewriteRule .? - [S=28]
#
# Else swap them out, one at a time
RewriteCond %{ENV:myURI} ^([^A]*)A(.*)$
RewriteRule . - [E=myURI:%1a%2]
RewriteCond %{ENV:myURI} ^([^B]*)B(.*)$
RewriteRule . - [E=myURI:%1b%2]
RewriteCond %{ENV:myURI} ^([^C]*)C(.*)$
RewriteRule . - [E=myURI:%1c%2]
RewriteCond %{ENV:myURI} ^([^D]*)D(.*)$
RewriteRule . - [E=myURI:%1d%2]
RewriteCond %{ENV:myURI} ^([^E]*)E(.*)$
RewriteRule . - [E=myURI:%1e%2]
RewriteCond %{ENV:myURI} ^([^F]*)F(.*)$
RewriteRule . - [E=myURI:%1f%2]
RewriteCond %{ENV:myURI} ^([^G]*)G(.*)$
RewriteRule . - [E=myURI:%1g%2]
RewriteCond %{ENV:myURI} ^([^H]*)H(.*)$
RewriteRule . - [E=myURI:%1h%2]
RewriteCond %{ENV:myURI} ^([^I]*)I(.*)$
RewriteRule . - [E=myURI:%1i%2]
RewriteCond %{ENV:myURI} ^([^J]*)J(.*)$
RewriteRule . - [E=myURI:%1j%2]
RewriteCond %{ENV:myURI} ^([^K]*)K(.*)$
RewriteRule . - [E=myURI:%1k%2]
RewriteCond %{ENV:myURI} ^([^L]*)L(.*)$
RewriteRule . - [E=myURI:%1l%2]
RewriteCond %{ENV:myURI} ^([^M]*)M(.*)$
RewriteRule . - [E=myURI:%1m%2]
RewriteCond %{ENV:myURI} ^([^N]*)N(.*)$
RewriteRule . - [E=myURI:%1n%2]
RewriteCond %{ENV:myURI} ^([^O]*)O(.*)$
RewriteRule . - [E=myURI:%1o%2]
RewriteCond %{ENV:myURI} ^([^P]*)P(.*)$
RewriteRule . - [E=myURI:%1p%2]
RewriteCond %{ENV:myURI} ^([^Q]*)Q(.*)$
RewriteRule . - [E=myURI:%1q%2]
RewriteCond %{ENV:myURI} ^([^R]*)R(.*)$
RewriteRule . - [E=myURI:%1r%2]
RewriteCond %{ENV:myURI} ^([^S]*)S(.*)$
RewriteRule . - [E=myURI:%1s%2]
RewriteCond %{ENV:myURI} ^([^T]*)T(.*)$
RewriteRule . - [E=myURI:%1t%2]
RewriteCond %{ENV:myURI} ^([^U]*)U(.*)$
RewriteRule . - [E=myURI:%1u%2]
RewriteCond %{ENV:myURI} ^([^V]*)V(.*)$
RewriteRule . - [E=myURI:%1v%2]
RewriteCond %{ENV:myURI} ^([^W]*)W(.*)$
RewriteRule . - [E=myURI:%1w%2]
RewriteCond %{ENV:myURI} ^([^X]*)X(.*)$
RewriteRule . - [E=myURI:%1x%2]
RewriteCond %{ENV:myURI} ^([^Y]*)Y(.*)$
RewriteRule . - [E=myURI:%1y%2]
RewriteCond %{ENV:myURI} ^([^Z]*)Z(.*)$
RewriteRule . - [E=myURI:%1z%2]
#
# Set lowercasing-in-progress flag
RewriteRule . - [E=qLow:yes]
#
# If any uppercase characters remain, re-start
# mod_rewrite processing from the beginning
RewriteCond %{ENV:myURI} [A-Z]
RewriteRule . - [N]
#
# If any characters were lowercased, set redirect required
# flag and reset lowercasing-in-progress flag
# (S=28 from above lands here)
RewriteCond %{ENV:qLow} ^yes$ [NC]
RewriteRule . - [E=qRed:yes,E=qLow:done]
#
# End Uppercase to lowercase conversion plug-in
###############################################
#
# Fix non-canonical domain requests (except for valid
# subdomains & stats accessed by unique server IP/address)
RewriteCond %{HTTP_HOST} !^(www|dev|test)\.example\.com(:80)?$
RewriteCond %{HTTP_HOST}<>%{ENV:TestIP} !^192\.168\.0\.101(:80)?<>true$ [NC]
RewriteCond %{HTTP_HOST}<>%{REQUEST_URI} !^192\.168\.0\.101(:80)?<>/stats/
RewriteRule .? - [E=qRed:yes]
#
# Replace "hmtl" with "html"
RewriteCond %{ENV:myURI} ^([^.,]+)[.,]+hmtl [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1.html]
#
# Replace comma(s) or multiple filetype delimiter periods in page filepaths
# with a single period (e.g. "/page,html" or "/page..html")
RewriteCond %{ENV:myURI} ^([^,.]+)([,.]{2,}|,)((s?html?|php[1-9]?|pdf|xls).*)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1.%3]
#
# Remove invalid trailing characters
RewriteCond %{ENV:myURI} ^([/0-9a-z._\-]*)[^/0-9a-z._\-] [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Fix additional directory paths appended to filenames (/logo.jpg/<directory_path>)
RewriteCond %{ENV:myURI} ^([^.]+\.[^/]+)/
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Remove trailing punctuation
RewriteCond %{ENV:myURI} ^(.*)[._\-]+$
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Remove multiple contiguous slashes in URL (up to three instances)
RewriteCond %{ENV:myURI} ^(.*)//+(.*)$
RewriteRule . - [E=qRed:yes,E=myURI:%1/%2,C]
RewriteCond %{ENV:myURI} ^(.*)//+(.*)$
RewriteRule . - [E=myURI:%1/%2,C]
RewriteCond %{ENV:myURI} ^(.*)//+(.*)$
RewriteRule . - [E=myURI:%1/%2]
#
# Redirect direct client requests for "<anything>/index.html" to "<anything>/"
RewriteCond %{ENV:myURI} ^(/([^/]+/)*)index\.html [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Redirect specific replaced/relocated pages to specific new pages
# (Note: This is 'doing it the hard way,' and only URLs that have
# been requested with typos/type-ins or other problems should be
# included here. A straight 301 redirect rule located above all of
# the code shown here can be used to redirect non-problematic URLs)
RewriteCond %{ENV:myURI}<>/locales.html ^/location\.html<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>/about/widgets-intl.html ^/about/local-widgets\.html<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>/selector/widget-selector.html ^/selector/widgets[^.]+\.xls<>(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1]
#
# Redirect all pages in old directories to same-named pages in new directories
RewriteCond /new_dir1<>%{ENV:myURI} ^([^<]+)<>/old_dir1(.+)$ [NC,OR]
RewriteCond /new_dir2<>%{ENV:myURI} ^([^<]+)<>/old_dir2(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1%2]
#
# Redirect old filetype to new filetype
RewriteCond %{ENV:myURI}<>.jpg ^(/[^.]+)\.jpeg<>(.+)$ [NC,OR]
RewriteCond %{ENV:myURI}<>.php5 ^(/[^.]+)\.php4<>(.+)$ [NC]
RewriteRule . - [E=qRed:yes,E=myURI:%1%2]
#
# Correct bad query string on products page link
RewriteCond %{ENV:myURI} ^/products\.php$
RewriteCond %{ENV:myQS} ^(([^&]+&)*)product=w1234(&.+)?$
RewriteRule . - [E=qRed:yes,E=myQS:%1product=w01234%3,S=2]
#
# Remove blank query strings from all URLs
RewriteCond %{ENV:myQS} ^\?$
RewriteRule .? - [E=qRed:yes,S=1]
#
# Remove spurious query strings from non-dynamic pages
RewriteCond %{ENV:myQS} ^\?
RewriteCond %{ENV:myURI} !^/(locales|test)\.html$
RewriteCond %{ENV:myURI} !^/(cats|products)\.php$
RewriteCond %{ENV:myURI} !^/cgi-bin/
RewriteRule .? - [E=qRed:yes,E=myQS:?]
#
# Do the external 301 redirect only if the referrer is
# not our own site, the resource exists at the corrected URL,
# and the requesting IP is not that of our site tester.
# (Note: Some of these conditions have been commented-out for code testing.
# Once the code has been tested thoroughly, be sure to un-comment these lines.)
RewriteCond %{ENV:qRed} ^yes$ [NC]
#RewriteCond %{ENV:TestIP} !^true$ [NC]
RewriteCond %{HTTP_REFERER} !^http://((www|dev|test)\.)?example\.(org|com)
RewriteCond %{HTTP_REFERER} !^http://192\.168\.0\.101(:80)?/?
#RewriteCond %{DOCUMENT_ROOT}%{ENV:myURI} -f [OR]
#RewriteCond %{DOCUMENT_ROOT}%{ENV:myURI} -d
RewriteRule .? http://www.example.com%{ENV:myURI}%{ENV:myQS} [R=301,L]
#
# ##### End URL fixup redirect routine #####
Notes:
The "<>" characters used in several RewriteConds above have no special meaning to mod_rewrite and are not regular-expressions operators. They are merely a unique character string that I use to enable unambiguous matching of combined server variable values on a single line by clearly delineating one value from the other.
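As a rough illustration of this delimiter technique, the Python sketch below joins two values with "<>" and matches the combined string with one pattern, mirroring two of the RewriteConds from the routine. The helper name `combined_match` is hypothetical, and Python's `re` module is used here only as a stand-in for mod_rewrite's pattern matching.

```python
import re


def combined_match(first, second, pattern):
    """Join two values with "<>" and match the result as a single string."""
    return re.match(pattern, first + "<>" + second, re.I)


# Mirrors: RewriteCond %{ENV:myURI}<>/locales.html ^/location\.html<>(.+)$ [NC]
m = combined_match("/location.html", "/locales.html", r"^/location\.html<>(.+)$")
print(m.group(1))  # /locales.html

# Mirrors: RewriteCond /new_dir1<>%{ENV:myURI} ^([^<]+)<>/old_dir1(.+)$ [NC]
m = combined_match("/new_dir1", "/old_dir1/page.html", r"^([^<]+)<>/old_dir1(.+)$")
print(m.group(1) + m.group(2))  # /new_dir1/page.html
```

In both cases the literal replacement text travels through the match as a back-reference, which appears to be what lets several OR'ed conditions share a single RewriteRule.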
The rules are in a specific order; some of the later rules depend upon the actions of previous rules.
Some of the rules have exclusions implemented using RewriteConds. You may not need them at all, or you will very likely need to modify them to suit your site.
The order of the RewriteConds is intentional. In some cases, the given order is required so that back-references will function correctly, and in other cases they are ordered based on performance considerations. For example, it is good to avoid directory-exists and file-exists checks if possible, since they take a lot of time and CPU resources. So these are deferred until all other conditions are met.
A very simple but effective way to test this code is to create a page of non-canonical, mis-typed, and malformed links to your site, and then click those links using a Mozilla or Firefox browser with the "Live HTTP Headers" extension enabled. The server response can then be examined in detail to be sure it's working as expected.
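If a browser extension is not at hand, the same check can be scripted. The sketch below is one possible approach rather than part of the original method: it uses Python's standard `http.client` to send a single GET request without following redirects and returns the status code and Location header. The function name `check_redirect` is hypothetical.

```python
import http.client


def check_redirect(host, path, port=80):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `check_redirect("www.example.com", "/index,hmtl")` would be expected to return a 301 status and the corrected URL once the routine is installed on that host. The host and path here are placeholders, and requests sent from an excluded tester IP address will not be redirected.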
Remember that the code was intentionally designed to *not* correct requests referred from your own site or to correct links when clicked-on by you or testers within your organization, as listed in the exclusion section at the top. It will make your life easier if you leave the RewriteConds in the 301-redirect rule commented-out until you have adapted this code to your site and have thoroughly tested it. Then un-comment those RewriteConds and re-test from a machine that is not part of your development and test network.
A brief explanation of the techniques used here and the point of this exercise: Apache has a nasty mod_rewrite bug that prevents multiple internal rewrites from working properly, except for a few cases where subdirectories are not present in the URL-path. If an attempt is made to rewrite /dir/a.html to /dir/b.html, and then to rewrite /dir/b.html to /dir/c.html in a second RewriteRule, the resulting URL will be /dir/c.html/c.html. The more sequential rewrites are done, the more times the filepath will be added to the end of the URL. And of course, if you add yet another RewriteRule to try to remove it, you still end up with two repeats of the filepath!
As was stated at the outset, it is best to avoid 'stacked' redirects, both to avoid confusing search engine robots, and to facilitate the efficient passing of PageRank/link-popularity through to the redirect target URL.
Both of the above problems are addressed in this code through the use of environment variables: "qRed" to flag a queued external redirect, "myURI" to hold the URL as it is tested and modified, and "myQS" to hold the query string as it is tested and modified. By using these second two variables to completely by-pass Apache's normal URI-handling variables, the Apache mod_rewrite bug is avoided.
Unfortunately, this makes the code at least twice as long as it would be without the bug, but I haven't found a better way to work around it.
I have provided a "plug-in" for doing uppercase-to-lowercase conversion in .htaccess. I call it a "plug-in" because I structured it so that it can be easily added or removed with minimum impact on the other code. I *do not* suggest including or using this case-conversion code unless it is absolutely necessary to correct a pre-existing or emerging problem; it is potentially a very slow, high-CPU-load routine because it will invoke a restart of all mod_rewrite processing if more than one instance of any given capital letter appears in the requested URL. As such, you should take all steps possible to avoid depending on it for any purpose other than to correct inbound links from other sites which are non-responsive to requests for link correction and are completely out of your control. It should certainly not be used to "allow" you to use mixed-case URLs on your own pages; the result is almost certain to be an overloaded server if your site is even moderately popular.
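To give a feel for the cost, the following Python sketch simulates the plug-in's behavior: on each pass, at most the first occurrence of each capital letter is lowercased, and another pass is needed while any capitals remain. This is a simplified model, not Apache itself; the function name `lowercase_passes` is hypothetical, and each pass after the first corresponds to one [N] restart of mod_rewrite processing.

```python
import string


def lowercase_passes(path):
    """Simulate the plug-in: each pass lowercases at most the first
    occurrence of each capital letter. Return (result, passes)."""
    passes = 0
    while any(c in string.ascii_uppercase for c in path):
        for letter in string.ascii_uppercase:
            # Equivalent of ^([^A]*)A(.*)$ -> %1a%2, once per letter
            path = path.replace(letter, letter.lower(), 1)
        passes += 1
    return path, passes


print(lowercase_passes("/About/Widgets.html"))  # ('/about/widgets.html', 1)
print(lowercase_passes("/AAA/BBB.HTML"))        # ('/aaa/bbb.html', 3)
```

The second path needs three passes, that is, two restarts, because the letters "A" and "B" each appear three times.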
I've tried to be specific in the description of the individual routines. Some of them may not be useable on your site. For example, the "Fix additional directory paths appended to filenames" routine cannot be used as-is on sites which have periods in directory paths. It would have to be re-coded or removed for use on such a site.
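The limitation can be seen by running that rule's pattern against a path containing a period in a directory name. The Python sketch below copies the pattern from the routine. The helper name `strip_appended_path` and both sample paths are made up for illustration.

```python
import re

# Pattern from the "Fix additional directory paths appended to filenames" rule
APPENDED_PATH = re.compile(r"^([^.]+\.[^/]+)/")


def strip_appended_path(path):
    """Truncate the path after the first 'name.ext' segment if the rule matches."""
    m = APPENDED_PATH.match(path)
    return m.group(1) if m else path


print(strip_appended_path("/images/logo.jpg/extra/path"))  # /images/logo.jpg
print(strip_appended_path("/docs/v1.2/page.html"))         # /docs/v1.2
```

The first call shows the intended correction. The second shows a valid page being truncated to its directory because the period in "v1.2" is mistaken for a filetype delimiter.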
This code came off a live server, and has therefore been fairly-thoroughly tested.
If you use this code, JdMorgan will appreciate it if you'd attribute it to him.