by Escalera Technologies
Slidescope is a training and coaching institute for ASP.NET in Lucknow. Slidescope is associated with Escalera; it was founded in 2009 with the aim of providing professional training to graduates and postgraduates in engineering and computer applications. Escalera Technologies has its center in Lucknow and provides online coaching on ASP.NET with C# (C Sharp) for students in Lucknow and nearby areas.
Why You Should Choose Us?
We provide placement assistance. We are associated with many software development organizations for internships and placements, and many Slidescope students have been placed in IT consulting organizations.
We also teach UI design for the desktop and mobile versions of websites.
Students will learn the concepts of server-side and client-side scripting.
Contact Slidescope Call Now +91 – 8604000569
This post discusses the high-level architecture of Google Search.
Google Search relies on web crawlers.
Web crawling is the process of downloading web pages.
Web pages are not downloaded by a single crawler; the work is shared among several distributed crawlers.
The process starts with the URL Server, which sends the crawlers lists of URLs to be fetched.
The fetched web pages are forwarded to the Storeserver.
The compressed web pages are stored in a repository.
Note that every web page fetched has a URL and is assigned a unique ID.
This ID is known as the docID.
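The crawl stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FAKE_WEB`, `fetch`, `StoreServer` and `crawl` are hypothetical names, the "web" is an in-memory dictionary rather than real HTTP, a thread pool stands in for the distributed crawlers, and docIDs are simply handed out in the order pages arrive.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web" standing in for real HTTP downloads.
FAKE_WEB = {
    "http://example.com/": "<html><a href='about'>About us</a></html>",
    "http://example.com/about": "<html>About page</html>",
}


def fetch(url):
    """Crawler: download one page (here, just a dictionary lookup)."""
    return url, FAKE_WEB[url]


class StoreServer:
    """Compresses fetched pages and keeps them in a repository keyed by docID."""

    def __init__(self):
        self.repository = {}  # docID -> compressed page
        self.doc_ids = {}     # URL -> docID

    def store(self, url, html):
        # Assumption: a new URL gets the next unused integer as its docID.
        doc_id = self.doc_ids.setdefault(url, len(self.doc_ids))
        self.repository[doc_id] = zlib.compress(html.encode("utf-8"))
        return doc_id

    def load(self, doc_id):
        """Uncompress a stored page, as the Indexer would before parsing it."""
        return zlib.decompress(self.repository[doc_id]).decode("utf-8")


def crawl(url_list, workers=2):
    """URL Server role: hand the URL list to several crawlers, store the results."""
    store = StoreServer()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of url_list.
        for url, html in pool.map(fetch, url_list):
            store.store(url, html)
    return store
```

Calling `crawl(list(FAKE_WEB))` returns a store whose `doc_ids` maps each URL to its docID and whose `load` method gives back the original page text.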
The Indexer box shown in the image performs indexing and sorting.
The Indexer has several jobs: reading the repository, uncompressing the documents, and parsing them.
Each document is then converted into a set of word occurrences called hits.
A hit records a word and its position in the document, along with details such as font size and capitalization.
The Indexer distributes the hits into barrels, which forms a partially sorted forward index.
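To make hits and barrels more concrete, here is a minimal sketch, not Google's actual data layout. The names `make_hits` and `build_forward_barrels` are hypothetical, font size is left out because the sample documents are plain text, and choosing a barrel with `word_id % num_barrels` is an assumption made only to keep the example short.

```python
import re
from collections import defaultdict


def make_hits(text):
    """Turn a document into hits: (word, position, capitalized)."""
    hits = []
    for position, token in enumerate(re.findall(r"[A-Za-z0-9]+", text)):
        hits.append((token.lower(), position, token[0].isupper()))
    return hits


def build_forward_barrels(docs, lexicon, num_barrels=2):
    """Distribute the hits of every document into barrels.

    docs:    docID -> document text
    lexicon: word -> wordID, extended in place when a new word appears
    Returns a list of barrels, each mapping docID -> list of
    (wordID, position, capitalized) hits.
    """
    barrels = [defaultdict(list) for _ in range(num_barrels)]
    for doc_id in sorted(docs):  # documents are processed in docID order
        for word, position, capitalized in make_hits(docs[doc_id]):
            word_id = lexicon.setdefault(word, len(lexicon))
            # Assumption: the barrel is picked from the wordID by modulo.
            barrel = barrels[word_id % num_barrels]
            barrel[doc_id].append((word_id, position, capitalized))
    return barrels
```

Each barrel is grouped by docID, so it can be read as a forward index: given a document, it lists the words that occur in it.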
The Indexer also performs another important function: it parses every link in a web page and extracts the information held in its anchor tag.
This information is stored in an anchors file, which records the page each link appears on, the page it points to, and the text of the link.
In the image, the information in the Anchors box is forwarded to the URL Resolver.
The URL Resolver reads the anchors file and converts relative URLs into absolute URLs.
Each absolute URL is in turn converted into a unique docID. The anchor text is put into the forward index, associated with the docID that the anchor points to.
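A rough sketch of what the URL Resolver does with one anchors file follows. It assumes the anchors are available as `(source_url, href, anchor_text)` tuples; `resolve_anchors` is a hypothetical helper, and only `urljoin` comes from Python's standard library.

```python
from urllib.parse import urljoin


def resolve_anchors(anchors, doc_ids):
    """Resolve links and file the anchor text under the page each link points to.

    anchors: list of (source_url, href, anchor_text) tuples
    doc_ids: URL -> docID, extended in place for URLs not seen before
    Returns (anchor_hits, links):
      anchor_hits: target docID -> list of anchor texts pointing at it
      links:       list of (source docID, target docID) pairs
    """
    anchor_hits = {}
    links = []
    for source_url, href, text in anchors:
        absolute = urljoin(source_url, href)  # relative URL -> absolute URL
        src = doc_ids.setdefault(source_url, len(doc_ids))
        dst = doc_ids.setdefault(absolute, len(doc_ids))
        # The text is stored with the page the anchor points TO, not the
        # page the anchor appears on.
        anchor_hits.setdefault(dst, []).append(text)
        links.append((src, dst))
    return anchor_hits, links
```

For example, the anchor `("http://example.com/", "about", "About us")` resolves to `http://example.com/about`, and the text "About us" is filed under that page's docID.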
A database of links (pairs of docIDs) is also generated; this links database is used to compute the PageRank of each document.
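The way PageRank can be computed from such docID pairs is illustrated below with a simple power iteration. The damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions of this sketch, and `pagerank` is a hypothetical function name, not Google's implementation.

```python
def pagerank(links, num_docs, damping=0.85, iterations=50):
    """Compute PageRank from (source docID, target docID) pairs.

    docIDs are assumed to be the integers 0 .. num_docs - 1.
    Returns a list where ranks[docID] is that document's PageRank.
    """
    out_links = {doc_id: [] for doc_id in range(num_docs)}
    for src, dst in links:
        out_links[src].append(dst)

    ranks = [1.0 / num_docs] * num_docs
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = [(1.0 - damping) / num_docs] * num_docs
        for src, targets in out_links.items():
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * ranks[src] / len(targets)
                for dst in targets:
                    new_ranks[dst] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                share = damping * ranks[src] / num_docs
                for dst in range(num_docs):
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks
```

With the links `[(0, 2), (1, 2), (2, 0)]`, document 2 is linked from two pages and ends up with the highest rank, while document 1, which nothing links to, ends up with the lowest.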
The Sorter takes the barrels, which are sorted by docID, and re-sorts them by wordID to generate the inverted index.
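The re-sorting step can be shown on a tiny barrel. This is a simplified sketch: `sort_barrel_by_word_id` is a hypothetical name, a hit is reduced to a `(wordID, position)` pair, and everything fits in memory, which would not be true at web scale.

```python
def sort_barrel_by_word_id(forward_barrel):
    """Re-sort a forward barrel (grouped by docID) into an inverted index.

    forward_barrel: docID -> list of (wordID, position) hits
    Returns: wordID -> {docID -> list of positions}
    """
    # Flatten the barrel into (wordID, docID, position) records.
    postings = []
    for doc_id, hits in forward_barrel.items():
        for word_id, position in hits:
            postings.append((word_id, doc_id, position))

    # Sorting the tuples orders them by wordID first, then docID.
    postings.sort()

    inverted = {}
    for word_id, doc_id, position in postings:
        inverted.setdefault(word_id, {}).setdefault(doc_id, []).append(position)
    return inverted
```

The forward barrel answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query asks.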
A program called DumpLexicon takes the list of wordIDs together with the lexicon (a kind of vocabulary) produced by the Indexer, and generates a new lexicon for the Searcher to use.
The Searcher is run by a web server. It uses the lexicon built by DumpLexicon, together with the inverted index and the PageRank of each document, to answer users' search queries.
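How the three pieces come together at query time can be sketched as follows. This is a deliberately reduced illustration: `search` is a hypothetical function, a document matches only if it contains every query word, and results are ordered by PageRank alone, whereas a real searcher would also weigh the hits themselves.

```python
def search(query, lexicon, inverted_index, pageranks):
    """Answer a query using the lexicon, the inverted index and PageRank.

    lexicon:        word -> wordID
    inverted_index: wordID -> {docID -> list of positions}
    pageranks:      list where pageranks[docID] is the document's PageRank
    Returns the matching docIDs, highest PageRank first.
    """
    # Step 1: the lexicon turns query words into wordIDs.
    word_ids = []
    for word in query.lower().split():
        if word not in lexicon:
            return []  # an unknown word cannot match any document
        word_ids.append(lexicon[word])
    if not word_ids:
        return []

    # Step 2: the inverted index gives the documents containing each word;
    # keep only the documents that contain all of them.
    matching = set(inverted_index.get(word_ids[0], {}))
    for word_id in word_ids[1:]:
        matching &= set(inverted_index.get(word_id, {}))

    # Step 3: order the matches by PageRank (assumption: PageRank only).
    return sorted(matching, key=lambda doc_id: pageranks[doc_id], reverse=True)
```

Given a lexicon `{"google": 0, "search": 1, "engine": 2}`, the query "search" returns every document whose entry appears under wordID 1, with the higher-ranked document first.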
This is how the architecture of Google Search works.