Microsoft has just announced that it is open-sourcing SPTAG, a key algorithm behind Bing search that enables Bing to return search results to users quickly.
Only a few years ago, web search was simple: the user entered a few keywords and browsed the results page. Today, the same user may take a photo with a phone and drop it into the search box, or ask a smart assistant a question without touching the device at all. They may also type a question and expect an actual answer rather than a list of pages that might contain one.
SPTAG (Space Partition Tree And Graph) is a distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index construction, search, and distributed online serving toolkits for large-scale vector search scenarios. With the SPTAG algorithm at the core of the open-source Python library, Bing can search through billions of pieces of information in milliseconds.
Of course, vector search itself is not a new idea. What Microsoft has done is apply the concept to deep learning models.
First, the team uses a pre-trained model to encode the data into vectors, where each vector represents a word or pixel. It then uses the new SPTAG library to build a vector index. When a query comes in, the deep learning model converts the text or image into a vector, and the library finds the most relevant vectors in the index.
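The encode, index, and query steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPTAG's API: the function names embed, build_index, and search are hypothetical, a hashed bag-of-words stands in for the pre-trained deep learning model, and an exact brute-force scan stands in for SPTAG's approximate tree-and-graph index.

```python
import math
import zlib

DIMS = 64  # vector size chosen for this toy example only


def embed(text, dims=DIMS):
    """Toy stand-in for a pre-trained model: hash each word into a bucket."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    # Normalize to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(documents):
    """Encode every document once, ahead of query time."""
    return [(doc, embed(doc)) for doc in documents]


def search(index, query, k=1):
    """Encode the query, then return the k most similar (score, document) pairs.

    This scans every vector (exact search). An ANN library avoids the
    full scan by trading a little accuracy for speed.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


documents = [
    "how tall is the eiffel tower",
    "best pizza recipes",
    "identify plant species from a photo",
]
index = build_index(documents)

for score, doc in search(index, "eiffel tower height", k=2):
    print(f"{score:.3f}  {doc}")
```

The brute-force scan costs time proportional to the number of indexed vectors, which is exactly what becomes impractical at the scale of billions of vectors and what an approximate index is meant to avoid.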
Microsoft said the SPTAG library has so far cataloged more than 150 billion pieces of data, including individual words, characters, web snippets, and full queries.
“Bing processes billions of documents every day. The idea now is to represent these items as vectors and search through a huge index of more than 100 billion vectors to find the most relevant results in 5 milliseconds.”
The Bing team expects the open-source SPTAG to be used to build applications that identify a spoken language from an audio clip, or that let users photograph a plant and identify its genus and species.
The library is now available and provides all the tools needed to build and search these distributed vector indexes.