Friday, October 2, 2009

Optimize your URLs: Best practices for crawling & indexing

Agenda

1. Context: Why do you care?
2. Reduce inefficient crawling of your site
 1. Avoid maverick coding practices
 2. Remove user-specific details from URLs
 3. Optimize dynamic URLs
 4. Rein in infinite spaces
 5. Disallow actions Googlebot can’t perform
3. Get your preferred URLs indexed
4. Resources


Why do you care?
The Internet from > 10^6 feet

How do search engines deal with this?

Focus on efficiency in the steps below:
1. Discover unique content
2. Prioritize crawling
 1. Crawl new content
 2. Refresh old content
 3. Crawl fewer duplicates
3. Keep all the good stuff in the index
4. Return relevant search results

Funnel your crawling “budget” toward your most important content


Avoid maverick coding practices
Discourage alternative encodings
shop.example.com/items/Periods-Styles__end-table_W0QQ_catrefZ1QQ_dmptZAntiquesQ5fFurnitureQQ_flnZ1QQ_npmvZ3QQ_sacatZ100927QQ_trksidZp3286Q2ec0Q2em282
Where [W0 = ?] and [QQ = &]
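
To see why such an encoding is hard on crawlers, here is a minimal sketch that undoes only the two substitutions named above (W0 for ? and QQ for &). The function name is hypothetical, and the other tokens in the example URL (such as Z or Q5f) are left untouched because the slide does not explain them.

```python
def decode_alternative_encoding(url: str) -> str:
    """Undo the two substitutions named on the slide: W0 -> ? and QQ -> &."""
    # Only the first W0 is replaced, since a URL has one query-string marker.
    url = url.replace("W0", "?", 1)
    return url.replace("QQ", "&")


# Shortened version of the example URL above.
encoded = (
    "shop.example.com/items/Periods-Styles__end-table_"
    "W0QQ_catrefZ1QQ_dmptZAntiquesQ5fFurniture"
)
decoded = decode_alternative_encoding(encoded)
print(decoded)
```

A crawler has no way to know this mapping, so it cannot tell which parts of the URL are parameters.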

Eliminate expand/collapse "parameters"
www.example.com/ABN/GPC.nsf/MCList?OpenAgent&expand=1,3,15

Remove user-specific details from URLs
Remove from the URL path
www.example.com/cancun+hotel+zone-hotels-1-23-a7a14a13a4a23.html
www.example.com/ikhgqzf20amswbqg1srbrh55/index.aspx?tpr=4&act=ela
Creates infinite URLs to crawl

Difficult to understand algorithmically
Keywords in name/value pairs are just as good as in the path
www.example.com/skates/riedell/carrera/
www.example.com/skates.php?brand=riedell&model=carrera

Optimize dynamic URLs
Dynamic URLs contain name/value pairs
skates.php?size=6&brand=riedell
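
As a rough illustration of what "name/value pairs" means in practice, the sketch below splits the example URL with Python's standard urllib.parse module. The helper name name_value_pairs is made up for this example.

```python
from urllib.parse import parse_qs, urlsplit


def name_value_pairs(url: str) -> dict:
    """Return the query-string parameters of a URL as a name -> value dict."""
    query = urlsplit(url).query
    # parse_qs returns a list per name; keep the first value for simplicity.
    return {name: values[0] for name, values in parse_qs(query).items()}


pairs = name_value_pairs("skates.php?size=6&brand=riedell")
print(pairs)
```

Standard pairs like these can be parsed by any tool, which is what makes them easy for a crawler to interpret.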

Create patterns for crawlers to understand
www.example.com/article.php?category=1&article=3&sid=123
www.example.com/article.php?category=1&article=3&sid=456
www.example.com/article.php?category=2&article=3&sid=789
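
One way to picture the pattern in the three URLs above: if sid is assumed to be a session identifier that does not change the content, dropping it shows that the first two URLs point to the same article. This is only a sketch; the set SESSION_PARAMS and the function content_key are assumptions for illustration, and an http:// scheme is added to the example URLs so they parse cleanly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "sid" is a session ID that does not affect page content.
SESSION_PARAMS = {"sid"}


def content_key(url: str) -> str:
    """Return the URL with session-only parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


urls = [
    "http://www.example.com/article.php?category=1&article=3&sid=123",
    "http://www.example.com/article.php?category=1&article=3&sid=456",
    "http://www.example.com/article.php?category=2&article=3&sid=789",
]
unique_pages = {content_key(u) for u in urls}
print(len(unique_pages), "unique pages out of", len(urls), "URLs")
```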

Use cookies to hide user-specific details
www.example.com/skates.php?query=riedell+she+devil&id=9823576
www.example.com/skates.php?ref=www.fastgirlskates.com&color=red
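
A possible server-side sketch of this idea: move the user-specific parameters into a cookie and keep only the content-defining parameters in the URL. Treating id and ref as user-specific is an assumption based on the two examples above, and move_to_cookie is a hypothetical helper, not part of any framework.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters describe the visitor, not the page content.
USER_SPECIFIC = {"id", "ref"}


def move_to_cookie(url: str):
    """Split a URL into a clean URL plus a cookie holding user-specific values."""
    parts = urlsplit(url)
    cookie = SimpleCookie()
    kept = []
    for name, value in parse_qsl(parts.query):
        if name in USER_SPECIFIC:
            cookie[name] = value  # tracked in the cookie instead of the URL
        else:
            kept.append((name, value))
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return clean, cookie


clean_url, cookie = move_to_cookie(
    "http://www.example.com/skates.php?query=riedell+she+devil&id=9823576"
)
print(clean_url)
print(cookie["id"].value)
```

The server would then redirect to the clean URL and send the cookie along with the response, so crawlers only ever see one URL per page.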

Rein in infinite spaces
Uncover issues in CMS
www.example.com/wiki/index.php?title=Special:Ipblocklist&limit=250&offset=423780&ip=

Disallow actions Googlebot can’t perform
Googlebot is too cheap to ‘Add to cart’
Disallow shopping carts
http://www.example.com/index.php?page=EComm.AddToCart&Pid=3301674647606&returnTo=L2luZGV4LnBocD9wYWdlPUVDb21tLlByb2QmUGlkPTMzMDE2NzQ2NDc2OTI=

Googlebot is too shy to ‘Contact us’
Disallow contact forms, especially if they have unique URLs
http://www.example.com/bb/posting.zsp?mode=newtopic&f=2&subject=Seeking%20information%20about%20roller%20derby%20training

Googlebot forgets his password a lot
Disallow login pages
https://www.example.com/login.asp?er=43d9257de47d8b08a91069cccb584ab83ff21140bd46e81656dab3507f45d1ab079cd77244231e557d724dc1df1a641
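
The three "Disallow" suggestions above could be expressed as robots.txt rules along these lines. The rules are an assumption derived from the example URLs, not taken from the original slides, and the sketch checks them with Python's standard urllib.robotparser, which does simple prefix matching and may not behave exactly like Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt built from the example URLs on this slide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?page=EComm.AddToCart
Disallow: /bb/posting.zsp
Disallow: /login.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Return True if the sketch rules let a crawler fetch this URL."""
    return parser.can_fetch("Googlebot", url)


print(allowed("https://www.example.com/login.asp?er=43d9257d"))
print(allowed("http://www.example.com/bb/posting.zsp?mode=newtopic&f=2"))
print(allowed("http://www.example.com/skates/riedell/carrera/"))
```

Checking rules this way before deploying them helps confirm that content pages stay crawlable while the action URLs are blocked.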

Get your preferred URLs indexed
Set your preferred domain in Webmaster Tools
www.example.com vs. example.com

Put canonical URLs in your Sitemap
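
A minimal sketch of generating such a Sitemap with Python's standard library, assuming the list of canonical URLs is already known. The function name build_sitemap and the sample URLs are illustrative; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls) -> str:
    """Build a minimal Sitemap XML string listing only canonical URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in canonical_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://www.example.com/skates/riedell/carrera/",
    "http://www.example.com/skates/",
])
print(sitemap_xml)
```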

Use the new rel="canonical" on any duplicate URLs
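
For illustration, a duplicate page could emit a link element pointing at the preferred URL in its head section. The helper below is a hypothetical sketch that only builds the tag; choosing which URL is preferred is left to the site.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Return a <link rel="canonical"> tag for the <head> of a duplicate page."""
    # escape() turns & into &amp; so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


tag = canonical_link("http://www.example.com/skates.php?brand=riedell&model=carrera")
print(tag)
```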


Get feedback in Webmaster Tools

Source: http://docs.google.com/present/view?id=dgk2ft62_18cvjx4nk4
