Getting into search engines' listings
Search engine crawlers may weigh a number of different factors when crawling a site, and many of a site's pages may not be indexed until they gain more PageRank, links, or traffic. The distance of a page from the site's root directory may also be a factor in whether it gets crawled, as may other importance metrics. Cho et al. (1998) described some standards for deciding which pages a crawler visits and sends on for inclusion in a search engine's index.
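To make the idea of ordering a crawl by importance concrete, the following is a minimal sketch rather than any search engine's actual method. It assumes two illustrative signals only: a known inbound-link count per URL and the page's path depth as a rough proxy for distance from the root. The function names `path_depth` and `crawl_order` and the example.com URLs are hypothetical.

```python
import heapq
from typing import Dict, List
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments: a rough proxy for distance from the root."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def crawl_order(inlinks: Dict[str, int]) -> List[str]:
    """Order URLs so that more inbound links come first, then shallower paths.

    `inlinks` maps each URL to an assumed count of known inbound links.
    """
    # heapq is a min-heap, so the link count is negated to pop the largest first.
    heap = [(-count, path_depth(url), url) for url, count in inlinks.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical frontier of discovered URLs with assumed inbound-link counts.
frontier = {
    "http://example.com/a/b/c": 1,
    "http://example.com/": 10,
    "http://example.com/a": 1,
}
ordered = crawl_order(frontier)
```

Here the root page is visited first because it has the most inbound links, and the two pages with equal link counts are ordered by depth. A real crawler would combine more signals and update them as it discovers new links.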
Webmasters can instruct spiders not to index certain files or directories through the standard robots.txt file in the root directory of the domain. Standard practice requires a search engine to check this file upon visiting the domain, though a crawler keeps a cached copy of the file as it visits the pages of a site and may not refresh that copy as quickly as a webmaster changes the file. The web developer can use this feature to keep pages such as shopping carts or other dynamic, user-specific content out of search engine results, and to keep spiders out of endless loops and other spider traps.
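As an illustration of how a well-behaved crawler might consult these rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `/cart/` and `/search` paths, the example.com URLs, and the `ExampleBot` user-agent name are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the shopping cart
# and out of dynamically generated search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched (or cached)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


rules = build_parser(ROBOTS_TXT)
cart_ok = is_allowed(rules, "http://example.com/cart/checkout")
product_ok = is_allowed(rules, "http://example.com/products/widget")
```

With these rules the cart URL is refused and the product URL is permitted. Because the parser here works from text that was fetched earlier, it also mirrors the caching behaviour described above: changes to the live file have no effect until the crawler fetches and parses it again.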
For search engines that have their own paid submission program (such as Yahoo!), paying a nominal fee for submission may save some time. Yahoo!'s paid submission program guarantees inclusion in its search results, but does not guarantee a specific ranking within them.