Rewriting the Technological Landscape

Chapter 40: Search Engine Algorithms

By early afternoon, Meng Qian had arrived in Pudong, Shanghai. It was his first visit since his rebirth, though in his previous life he had come here often.

As Huaxia's financial center, Shanghai was a calling card presented to the world.

But the Shanghai of 2000 was something Meng Qian had never seen with his own eyes.

By now high-rises had begun to go up in Pudong, yet large stretches of factory buildings and shantytowns remained. As the car drove on, he could see that many areas were in the middle of demolition and redevelopment.

"Zhang Zhong is preparing to put the branch in Pudong?" After the destination, Meng Qian borrowed the memory contrast, if you didn't read a wrong, you should be a Zhangjiang High-tech Park.

Of Pudong's four major key development zones, the two the world is most familiar with are probably Lujiazui, the financial center, and Zhangjiang, the science and technology center.

In 2000, Zhangjiang's leading industries were integrated circuits, software, and biomedicine.

Zhang Shuxin nodded. "Right now the places in the south with the most development potential are, without a doubt, Shenzhen and Shanghai's Pudong, and Zhangjiang Hi-Tech Park is the incubator for technology."

At this point in time, when people talked about the development potential of southern cities, especially in science and technology, nobody would have thought of Hangzhou.

When they arrived at the office Zhang Shuxin had newly rented, five men were already waiting there, two of them foreigners.

Zhang Shuxin made the introductions. Of the two foreigners, one came from IBM and one from Google, people he wanted to poach or had already managed to poach, and both had been on search engine project teams.

Of the other three, all Chinese, one was Haiwei's technical director, and the other two had come back from Silicon Valley: one a Stanford graduate who had worked at Intel, the other a Harvard graduate who had worked at Oracle. All of them were genuine talent.

After brief greetings, everyone sat down in the conference room. What came next was Meng Qian's time to perform: today he was here to present the core technology of his search engine.

Building one calls for web crawler technology, retrieval and ranking technology, web page processing, big data processing, natural language processing, and so on. Of course, in 2000 the natural language processing and big data of the day were not yet the same concepts they would become for later generations.

But put simply, the core comes down to one thing: the algorithm.

Because every one of those technologies is inseparable from algorithms.

"I don't know how to build trees and understanding in the engine. I can only follow my rhythm. If you have any questions, I can interrupt me at any time." Meng Qian took directly to the topic before the blackboard.

"Before I show my core technology, let's take a look at the top three mainstream algorithms, whiteness hyperlink analysis, Google's PageRank algorithm and IBM's HITS algorithm.

Almost everyone considers Baidu's hyperlink analysis the most backward of the three, but some things have to be judged from the standpoint of their time, and to a certain extent Baidu's hyperlink analysis laid the foundation for the development of search engines.

There are voices claiming Google actually plagiarized Baidu's hyperlink algorithm; after all, Li Yanhong's hyperlink analysis really did come before Google's. We won't guess at whether that's true or false, but the claim reflects an important signal: no matter whose algorithm it is, the underlying foundation is actually the same.

Crawl web page information, then use some mechanism to rank those pages; when the user types in a keyword, the pages ranked by that mechanism are matched against the keyword.

So where does Baidu fall short? The key is that its basis is too simple: the more other pages point to a given result, the higher that result's value, and that's about all there is to it.

In contrast, Google's PageRank adds a few important things. The first is that a link from page A to page B is interpreted as A casting a vote for B, and on top of that Google weighs each vote by the rank of the page casting it, forming a new rank.

In other words, every page has a PR value, and your PR value in turn becomes a reference for the PR values of the pages you link to.

Then the PR value of every page is recalculated over and over. Each page starts from an assumed, essentially arbitrary PR value, and after enough rounds the PR values of all the pages settle down and stop changing, which is the state of convergence.
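A minimal sketch of the iteration just described, over a tiny made-up link graph; the damping factor 0.85 is the value commonly cited for PageRank, not something stated in this chapter.

```python
# Minimal PageRank power iteration over a tiny, hypothetical link graph.
# links[p] lists the pages that p links out to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

damping = 0.85                                  # commonly cited damping factor
pages = list(links)
pr = {p: 1.0 / len(pages) for p in pages}       # arbitrary starting PR values

for _ in range(50):
    new_pr = {}
    for p in pages:
        # Sum the "votes" from every page q that links to p, each vote
        # weighted by q's own PR and diluted by q's number of outlinks.
        vote = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
        new_pr[p] = (1 - damping) / len(pages) + damping * vote
    delta = max(abs(new_pr[p] - pr[p]) for p in pages)
    pr = new_pr
    if delta < 1e-8:
        break                                   # converged: values stopped moving

print(pr)
```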

As for HITS, its theoretical basis is still unchanged; its biggest feature, or rather its biggest change, is the view that PageRank's even distribution of weight does not match how links actually behave.

So the HITS algorithm introduces another kind of web page, called the hub page: a page that provides a collection of links to authoritative pages.

As a result, HITS gives more authoritative results than the other two, but the algorithm also greatly increases the computational burden, right?"
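For comparison, a rough sketch of the hub/authority iteration that HITS rests on, over the same kind of made-up graph; real HITS runs on a query-specific subgraph, which is part of the extra computational cost mentioned above.

```python
# Rough HITS hub/authority iteration over a hypothetical link graph.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(50):
    # A page's authority is the sum of the hub scores of pages linking to it.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # A page's hub score is the sum of the authority scores of pages it links to.
    hub = {p: sum(auth[t] for t in links[p]) for p in pages}
    # Normalise so the scores converge instead of growing without bound.
    a_norm = sum(v * v for v in auth.values()) ** 0.5
    h_norm = sum(v * v for v in hub.values()) ** 0.5
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

print(auth, hub)
```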

Meng Qian looked at the man from IBM, who, caught off guard, gave a somewhat blank nod.

"So to sum it up simply: the foundation of search engine algorithms is hyperlink analysis, and the merit of an algorithm lies in how much reference value its results carry and how much useful information it lets users obtain.

Of course, if you could directly understand a user's needs and hand him exactly the content he wants most, that would be the ideal state for a search engine, but everyone knows that's impossible.

So the real measure of a search engine is this: for the same keyword, can relatively more people find what they are looking for?

If ten users search on Google and five find what they want, while on our engine six do, then within the current technical environment of this field, we are the better engine.

On the basis of that understanding, let me introduce my search engine algorithm: dynamic-rule hyperlink analysis.

Dynamic-rule hyperlink analysis makes several changes.

First, we just said that a good engine is judged by whose feedback is better under the same keyword. So when a user searches for something, the results he most likely wants to see should be the ones more vertically related to that thing.

For example, when a user searches for cars, whether he wants to buy one or just wants to learn about them, a professional automotive page should be of more help to him.

So in my algorithm, the links pointing to a site are first given a verticality score. Say ten sites link to site A: whether those ten are all automotive sites or none of them are automotive sites has to produce different results.

There's also a small point of psychology here: peers in the same field link to one another a great deal, so a site linked to by more vertical, same-field sites is very likely more professional and more reliable than one linked to by a grab bag of unrelated sites.
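The chapter describes this verticality idea only in prose; the following is a speculative illustration, with every site category and count invented for the example.

```python
# Speculative sketch of a "verticality" score for a page's inbound links.
# The categories and counts below are made up purely for illustration.
inbound_links = {
    # page -> category of each site that links to it
    "A": ["auto", "auto", "auto", "news", "auto"],
    "B": ["forum", "news", "shopping", "blog", "auto"],
}

def verticality_score(linking_categories, topic):
    """Share of inbound links that come from sites in the same vertical."""
    if not linking_categories:
        return 0.0
    same_field = sum(1 for c in linking_categories if c == topic)
    return same_field / len(linking_categories)

for page, cats in inbound_links.items():
    print(page, verticality_score(cats, "auto"))
# A scores 0.8 and B scores 0.2: A's inbound links are far more "vertical",
# so under this rule A is treated as the more professional automotive page.
```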

Second, establish a heat-ranking mechanism for the keyword library. The search engine companies today all rank web pages; I also rank the keywords themselves, and the rule for ranking keywords is very simple: search volume.

For example, if cars are what users search for most today, then 'car' might score 10 points. The algorithm will then allocate more resources to car-related information and crawl more high-quality pages on that topic.

There are four advantages to this: it speeds up information feedback, improves the timeliness of hot-topic results, saves computing resources, and makes it easier for users of our engine to get useful information.
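A toy illustration of the heat-ranking idea, with made-up query counts and crawl budget, showing how the hottest keywords could be given the larger share of crawling resources.

```python
# Toy sketch of keyword heat ranking driving crawl-budget allocation.
# Query counts and the total crawl budget are invented for illustration.
query_counts = {"car": 120_000, "stocks": 45_000, "mp3": 30_000, "weather": 5_000}
crawl_budget = 10_000                 # pages we can afford to crawl this cycle

total = sum(query_counts.values())
heat = {k: v / total for k, v in query_counts.items()}            # heat per keyword
allocation = {k: int(crawl_budget * h) for k, h in heat.items()}  # pages per topic

for k in sorted(allocation, key=allocation.get, reverse=True):
    print(f"{k:8s} heat={heat[k]:.2f} crawl={allocation[k]} pages")
# The hottest keyword ("car") gets the largest share of the crawl budget,
# so fresh, high-quality pages for hot topics surface sooner.
```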

Third, a user feedback mechanism, which tracks users' clicks and browsing.

Take the car example again: if 100 users search for cars and 80 of them click page A, page A's rating rises; if users spend more time on page A, its rating rises; if more users follow links onward from page A, its rating rises again.

In other words, user feedback is folded into the entire web page rating system.
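An illustrative sketch of folding click-through rate, dwell time, and onward clicks into a page's rating; the specific signals, weights, and dwell-time cap are assumptions, not taken from the text.

```python
# Illustrative sketch of folding user-feedback signals into a page rating.
# The weights and the one-minute dwell cap are assumptions for illustration.
def feedback_boost(impressions, clicks, avg_dwell_seconds, onward_clicks):
    ctr = clicks / impressions if impressions else 0.0        # click-through rate
    dwell = min(avg_dwell_seconds / 60.0, 1.0)                # cap dwell at 1 minute
    onward = onward_clicks / clicks if clicks else 0.0        # users who kept exploring
    return 0.5 * ctr + 0.3 * dwell + 0.2 * onward

base_rating = 6.0                                             # page A's pre-feedback score
rating = base_rating * (1.0 + feedback_boost(100, 80, 45, 30))
print(round(rating, 2))   # the rating rises as each feedback signal improves
```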

Fourth, a pattern algorithm, which looks for high-probability behaviors across all users and feeds those patterns back to humans; for example, 60% of the users who search for cars will search for some particular term next.

Some of these patterns we can't predict ourselves, but we can let the algorithm do the large-scale data mining, and the results it feeds back can then be used by a human analysis team to score particular pages. That is the division of labor between man and machine.

Putting the four points above together: under my algorithm every web page also ends up with a score, which I call its precision value.

The factors affecting the precision value include the page's own score, the verticality score of its inbound links, user feedback, manual scoring, and the effect of external links."
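A speculative sketch of how the listed factors might be combined into a single precision value; the weights are invented, since the chapter only names the factors.

```python
# Speculative sketch of a "precision" value combining the listed factors.
# The weights are invented; the chapter names the factors but not their weighting.
def precision(own_score, verticality, feedback, manual_score, external_links):
    """All inputs are assumed to be normalised to the 0..1 range."""
    weights = {
        "own": 0.30,        # the page's own content score
        "vertical": 0.25,   # verticality score of its inbound links
        "feedback": 0.20,   # user-feedback signal
        "manual": 0.15,     # score assigned by the human analysis team
        "external": 0.10,   # effect of external links
    }
    return (weights["own"] * own_score
            + weights["vertical"] * verticality
            + weights["feedback"] * feedback
            + weights["manual"] * manual_score
            + weights["external"] * external_links)

print(round(precision(0.7, 0.8, 0.6, 0.5, 0.4), 3))
```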

After that, Meng Qian went on to lay out the algorithmic logic and the derivation formulas for each branch in detail.

But while Meng Qian was explaining the last part, the pattern algorithm, Jeff from IBM suddenly let out a shout: "Oh my God! Artificial intelligence?!"

Meng Qian turned his head, glanced at him, and frowned.

Jeff, suddenly realizing himself and thinking Meng Qian hadn't understood, blurted out in oddly accented Chinese, "Wo cao!!!"

...

With Jeff's interruption, the other four technicians, who had been immersed in what Meng Qian was sharing, came back to themselves, and the look in their eyes had clearly changed...