To make sense of and appropriately rank web pages, you first have to find out what's out there. To accomplish this, search engines use robots: computer programs that spend their time fetching pages on the web and either storing them or processing them on the fly. Which approach a robot takes depends on two factors: the scale of the required crawl (crawling is the act of a robot fetching pages on the web) and the complexity of the processing.
The scale of the crawl matters because a large crawl often needs to run in parallel to finish in a reasonable amount of time. When that's the case, you'll want the crawler itself to be as fast as possible, offloading the processing of pages to other computers.
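A minimal sketch of that idea, using a thread pool to fetch many pages at once. The `fetch_page` function here is a placeholder (an assumption, not a real HTTP client); in a real crawler it would download the page and hand the raw HTML off for separate processing:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url):
    # Placeholder: a real crawler would issue an HTTP GET here and
    # return the raw response body for later, offline processing.
    return f"<html><title>Page at {url}</title></html>"

def crawl(urls, workers=4):
    # Fetch pages concurrently so a large crawl finishes in a
    # reasonable amount of time; the crawler only stores raw pages,
    # leaving heavier processing to other machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch_page, urls)))

pages = crawl(["http://example.com/a", "http://example.com/b"])
```

Splitting fetching from processing this way is what keeps the crawler fast: each worker thread spends its time waiting on the network, not parsing HTML.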
Processing the pages your robot gathers is where the magic starts to happen. Each web page is broken down into elementary components, and every element a search engine deems important can be isolated: the title, links, link text, URL, and so on. From this data, a search engine can begin to determine how a page will rank for a particular search term or phrase.
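The breakdown described above can be sketched with the standard library's HTML parser. This is an illustrative example, not a production extractor; it pulls out just the title and the (href, anchor text) pairs mentioned in the text:

```python
from html.parser import HTMLParser

class PageElements(HTMLParser):
    """Isolate a few elements a search engine cares about:
    the page title, its links, and each link's anchor text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []          # list of (href, anchor_text) pairs
        self._in_title = False
        self._href = None        # href of the <a> tag currently open
        self._text = []          # anchor text collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

parser = PageElements()
parser.feed('<html><title>Demo</title><body>'
            '<a href="/about">About us</a></body></html>')
```

After feeding a page through, `parser.title` and `parser.links` hold the isolated elements, ready for whatever ranking logic comes next.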