1. Context: Why do you care?
2. Reduce inefficient crawling of your site
   1. Avoid maverick coding practices
   2. Remove user-specific details from URLs
   3. Optimize dynamic URLs
   4. Rein in infinite spaces
   5. Disallow actions Googlebot can’t perform
3. Get your preferred URLs indexed
4. Resources
Why do you care?
The Internet from >10^6 feet
How do search engines deal with this?
Search engines focus on efficiency at each step:
1. Discover unique content
2. Prioritize crawling
   1. Crawl new content
   2. Refresh old content
   3. Crawl fewer duplicates
3. Keep all the good stuff in the index
4. Return relevant search results
Funnel your crawling “budget” toward your most important content
Avoid maverick coding practices
Discourage alternative encodings
shop.example.com/items/Periods-Styles__end-table_W0QQ_catrefZ1QQ_dmptZAntiquesQ5fFurnitureQQ_flnZ1QQ_npmvZ3QQ_sacatZ100927QQ_trksidZp3286Q2ec0Q2em282
where W0 = ? and QQ = &
Eliminate expand/collapse "parameters"
www.example.com/ABN/GPC.nsf/MCList?OpenAgent&expand=1,3,15
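The fix for homegrown schemes like the one above is simply to emit standard query-string syntax, which every crawler already understands. A minimal sketch with Python's standard library (the parameter names are hypothetical, loosely based on the encoded example):

```python
from urllib.parse import urlencode

# Standard percent-encoded query string instead of a QQ/Z-style
# alternative encoding; any crawler can parse this.
params = {"catref": "1", "sacat": "100927", "npmv": "3"}
print("shop.example.com/items?" + urlencode(params))
# shop.example.com/items?catref=1&sacat=100927&npmv=3
```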
Remove user-specific details from URLs
Remove from the URL path
www.example.com/cancun+hotel+zone-hotels-1-23-a7a14a13a4a23.html
www.example.com/ikhgqzf20amswbqg1srbrh55/index.aspx?tpr=4&act=ela
Creates infinite URLs to crawl
Difficult to understand algorithmically
Keywords in name/value pairs are just as good as in the path
www.example.com/skates/riedell/carrera/
www.example.com/skates.php?brand=riedell&model=carrera
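The equivalence above is easy to check: a standard query string exposes the same keywords to a crawler as a keyword-rich path. A quick demonstration with Python's `urllib` (URLs are the slide's examples):

```python
from urllib.parse import urlsplit, parse_qs

# Both URLs carry the same keywords; the query-string form is just
# as readable to a crawler as the path form.
query_url = "http://www.example.com/skates.php?brand=riedell&model=carrera"
params = parse_qs(urlsplit(query_url).query)
print(params)  # {'brand': ['riedell'], 'model': ['carrera']}
```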
Optimize dynamic URLs
Dynamic URLs contain name/value pairs
skates.php?size=6&brand=riedell
Create patterns for crawlers to understand
www.example.com/article.php?category=1&article=3&sid=123
www.example.com/article.php?category=1&article=3&sid=456
www.example.com/article.php?category=2&article=3&sid=789
Use cookies to hide user-specific details
www.example.com/skates.php?query=riedell+she+devil&id=9823576
www.example.com/skates.php?ref=www.fastgirlskates.com&color=red
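One way to apply this advice is a canonicalization step that strips session- and referrer-style parameters before a URL is ever emitted in a link, keeping that state in a cookie instead. A minimal sketch; the parameter names (`sid`, `id`, `ref`) are taken from the slide's examples and would differ per site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# User-specific parameters to drop (assumed names from the examples above).
SESSION_PARAMS = {"sid", "id", "ref"}

def strip_user_params(url):
    """Return the URL with user-specific query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_user_params("http://www.example.com/article.php?category=1&article=3&sid=123"))
# http://www.example.com/article.php?category=1&article=3
```

After stripping, all three `article.php` URLs above collapse to a single crawlable URL per article.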
Rein in infinite spaces
Uncover issues in CMS
www.example.com/wiki/index.php?title=Special:Ipblocklist&limit=250&offset=423780&ip=
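For a MediaWiki-style install like the one above, one common remedy (a sketch; the exact path depends on your configuration) is to block the special pages in robots.txt. Googlebot matches the rule as a prefix against the path plus query string:

```
User-agent: *
Disallow: /wiki/index.php?title=Special:
```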
Disallow actions Googlebot can’t perform
Googlebot is too cheap to ‘Add to cart’
Disallow shopping carts
http://www.example.com/index.php?page=EComm.AddToCart&Pid=3301674647606&returnTo=L2luZGV4LnBocD9wYWdlPUVDb21tLlByb2QmUGlkPTMzMDE2NzQ2NDc2OTI=
Googlebot is too shy to ‘Contact us’
Disallow contact forms, especially if they have unique URLs
http://www.example.com/bb/posting.zsp?mode=newtopic&f=2&subject=Seeking%20information%20about%20roller%20derby%20training
Googlebot forgets his password a lot
Disallow login pages
https://www.example.com/login.asp?er=43d9257de47d8b08a91069cccb584ab83ff21140bd46e81656dab3507f45d1ab079cd77244231e557d724dc1df1a641
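The three cases above can all be handled with robots.txt rules. A sketch using the example URLs (the `*` wildcard is a Googlebot extension; paths are hypothetical and should match your own cart, form, and login URLs):

```
User-agent: *
Disallow: /*page=EComm.AddToCart
Disallow: /bb/posting.zsp
Disallow: /login.asp
```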
Get your preferred URLs indexed
Set your preferred domain in Webmaster Tools
www.example.com vs. example.com
Put canonical URLs in your Sitemap
Use the new rel=“canonical” on any duplicate URLs
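The rel="canonical" hint is a link element in the `<head>` of each duplicate page pointing at the preferred URL, for example (URL from the earlier skates example):

```
<link rel="canonical" href="http://www.example.com/skates/riedell/carrera/" />
```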
Get feedback in Webmaster Tools
Excerpted from: http://docs.google.com/present/view?id=dgk2ft62_18cvjx4nk4