If you own a website, you need to understand how data-collecting
companies find and explore your website. Once you understand the basics,
you can easily master tools that show who is visiting your site and
help stop malicious robots from hacking into it.
It is also important to understand what robots are so that you can create metadata and optimize your website to keep robots coming back and re-indexing your site.
Spiders, ants, and worms, oh my! What are they and what do they do?
Robot is a catch-all (or generic) term for programs and automated scripts that “crawl” through the web and collect data from websites and anything else on the Internet they can find.
- Spiders: Spiders are the same thing as robots, and the terms can be used interchangeably.
- Worms: Worms are robots too, but they are distinguished by being self-replicating programs (other robots do not replicate).
- Web crawlers: Web crawlers (or webcrawlers) are also the same as robots. Note: WebCrawler is a specific robot.
- WebAnts (web ants): WebAnts are cooperating robots that share and distribute information. When one ant finds information, it shares its data so other ants do not have to explore the same file.
Who uses robots, and why?
- Hackers: To find ways in, attack, or take over your website. A great way to get your website hacked is to run out-of-date software and web applications that may have security holes. Hackers often use worm robots.
- Spammers: Mass marketers and spammers use robots to harvest email addresses from websites, forums, blogs, social networks, etc.
- Search engines: To crawl your website, collect its data, and index it. Search engines use robot technology to read the metadata, content, and other information on your webpages.
- Webmasters: To study competing sites as well as their own. Webmasters also study logs of who is crawling their site and how often, to fine-tune SEO and robots.txt files (a file that tells robots which parts of your site they may or may not crawl). Every website should have a robots.txt file!
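To make the crawling and web-ant descriptions above a little more concrete, here is a minimal Python sketch of cooperating robots. It assumes a tiny, made-up `TOY_WEB` dictionary in place of real pages (so no network access is needed), and the function name `cooperative_crawl` is hypothetical. The "ants" take turns inside a single program rather than running on separate machines, but they share one list of already-seen pages, so no page is explored twice.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real robot would download each page over HTTP and extract its links instead.
TOY_WEB = {
    "/": ["/about", "/blog"],
    "/about": ["/", "/contact"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": ["/blog"],
    "/contact": [],
}


def cooperative_crawl(start, num_ants=2):
    """Crawl TOY_WEB with several 'ants' that share one visited set and one queue.

    Because the visited set is shared, no page is explored twice, which is the
    idea behind cooperating robots (web ants). Returns (visit order, pages per ant).
    """
    visited = {start}          # shared knowledge: pages any ant has already found
    frontier = deque([start])  # shared queue of pages waiting to be explored
    order = []
    work = {ant: [] for ant in range(num_ants)}
    ant = 0
    while frontier:
        url = frontier.popleft()
        order.append(url)
        work[ant].append(url)              # this ant "fetches" the page
        for link in TOY_WEB.get(url, []):
            if link not in visited:        # skip pages another ant already found
                visited.add(link)
                frontier.append(link)
        ant = (ant + 1) % num_ants         # hand the next page to the next ant
    return order, work


order, work = cooperative_crawl("/")
print("Visit order:", order)
for ant, pages in work.items():
    print(f"Ant {ant} explored: {pages}")
```

Each of the five toy pages is visited exactly once, with the work split between the two ants. A real crawler would fetch each page and pull the links out of its HTML instead of looking them up in a dictionary.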
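Search engine robots read a page's title and meta tags as well as its content. The sketch below uses Python's standard `html.parser` module to pull those out of a page, so you can see roughly what a robot sees. The sample page and the `MetaReader` class name are made up for illustration; real search engines look at far more than this.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. <meta name="description" content="...">
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>My Garden Blog</title>
<meta name="description" content="Tips for growing tomatoes.">
<meta name="robots" content="index, follow">
</head><body>Hello</body></html>"""

reader = MetaReader()
reader.feed(SAMPLE_PAGE)
print("Title:", reader.title)
print("Meta tags:", reader.meta)
```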
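Python's standard library includes a robots.txt parser, `urllib.robotparser`, which is handy for checking that your robots.txt says what you intend. The rules below are a hypothetical example: every robot is asked to stay out of `/private/`, and a robot calling itself `BadBot` (a made-up name) is asked to stay out of the whole site. The helper `allowed` is also just for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; on a real site this file lives at the site root,
# e.g. https://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if robots.txt permits `agent` to crawl `url`."""
    return parser.can_fetch(agent, url)


for agent, url in [
    ("Googlebot", "https://www.example.com/blog/"),
    ("Googlebot", "https://www.example.com/private/notes.html"),
    ("BadBot", "https://www.example.com/"),
]:
    print(agent, url, "allowed" if allowed(agent, url) else "blocked")
```

Keep in mind that robots.txt is a request, not a lock: well-behaved robots such as search engine crawlers follow it, but malicious robots can simply ignore it, so never rely on it to hide sensitive pages.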
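Webmasters who want to see which robots crawl their site, and how often, can start from the web server's access log. This sketch assumes the widely used "combined" access-log format (available in Apache and Nginx); the sample log lines, their IP addresses, and the keyword list `BOT_HINTS` used to spot robots are illustrative assumptions, not a complete robot detector.

```python
import re
from collections import Counter

# One line of the "combined" access-log format: IP, identity, user, [time],
# "request", status, size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keywords; many (not all) robots include one of these in their user agent.
BOT_HINTS = ("bot", "spider", "crawler", "slurp")


def count_robot_hits(lines):
    """Count requests per user agent, keeping only agents that look like robots."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined log format
        agent = m.group("agent")
        if any(hint in agent.lower() for hint in BOT_HINTS):
            hits[agent] += 1
    return hits


# Made-up sample lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2024:13:55:36 +0000] "GET /blog HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Mar/2024:13:56:02 +0000] "GET /about HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Mar/2024:14:01:11 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.44 - - [10/Mar/2024:14:03:40 +0000] "GET / HTTP/1.1" 200 8192 '
    '"https://www.example.com/" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0"',
]

for agent, count in count_robot_hits(SAMPLE_LOG).most_common():
    print(count, agent)
```

Because a robot can put anything it likes in its User-Agent header, treat these counts as a rough guide rather than proof of who visited.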