While at TechEd, I blogged a bit about the cool features I was seeing in the Sharepoint Search functionality. I’ve finally managed to find a bit of extra time to write more about that to help bring the exciting details to a wider audience.
Like most search providers, Sharepoint Search crawls content, creates an index based on the results of crawls, and then users search against the index.
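To make that crawl / index / query split concrete, here is a minimal Python sketch of an inverted index. It is purely my own illustration and assumes nothing about how Sharepoint stores its index internally; the URLs, the `build_index` and `search` names, and the simple "all words must match" rule are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(crawled: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document URLs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in crawled.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return URLs containing every word in the query (simple AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawl output: URL -> extracted text.
crawled = {
    "http://intranet/policies/travel.doc": "travel expense policy",
    "http://intranet/hr/benefits.doc": "benefits policy overview",
}
index = build_index(crawled)
print(search(index, "travel policy"))
```

The point is only that users never search the content itself: the crawl produces text, the index is built from that text, and queries are answered from the index.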
Any given Sharepoint site is configured with a single Shared Service Provider (SSP). The SSP determines a few architectural configuration settings, including which Search index end users access when searching from that Sharepoint site. So a single Sharepoint server might host many SSPs, and therefore many different search indexes. Conversely, many Sharepoint sites across many Sharepoint servers could share the same SSP and Search index. Architecturally, that gives you a lot of flexibility in which underlying index is used.
OK, so admittedly that’s not very exciting. I’ll get to the exciting stuff now …
So the effectiveness of any search offering depends on how relevant the results it returns are. Sharepoint has a rich ability to calculate relevance. The factors it supports are:
- Title and filename
- Density of search term (e.g. 10 mentions in a 2 page doc vs. 10 mentions in a 100 page doc)
- Keywords (i.e. terms with special meaning to an organization that have special behavior associated with them. This is closely tied to the best bets feature, and additionally you can provide synonyms for keywords that broaden the results returned and the likelihood of the keyword being triggered)
- Best bets (i.e. results that have been manually tagged as a “best bet”)
- Security (i.e. users only see results they individually have permissions to see)
- Hyperlink click distance (number of “clicks” from an authoritative site)
- HTML anchor text (that’s the text of hyperlinks)
- URL depth (how nested within a website directory structure is it?)
- URL text matching
- Document Title (Office docs only)
- De-duplication of results (no duplicate results returned)
- Language of choice (as determined by browser language)
- Search scopes (definable subsets of all the index)
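A few of these factors are easy to picture in code. The sketch below is my own rough illustration in Python of three of them (term density, URL depth, and click distance); it is not how Sharepoint actually computes or weights them, and the function names and sample link graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlparse


def term_density(text: str, term: str) -> float:
    """Fraction of the words in text that equal term (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def url_depth(url: str) -> int:
    """Number of non-empty path segments, i.e. how deeply nested the URL is."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def click_distance(links: Dict[str, List[str]], start: str, target: str) -> Optional[int]:
    """Fewest clicks from start to target via breadth-first search, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, distance = queue.popleft()
        for linked in links.get(page, []):
            if linked == target:
                return distance + 1
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, distance + 1))
    return None


# Hypothetical link graph: page -> pages it links to.
links = {"home": ["hr", "it"], "hr": ["benefits"], "it": []}
print(click_distance(links, "home", "benefits"))
print(url_depth("http://example.com/hr/forms/leave.doc"))
```

Ten mentions of a term score a much higher density in a 2 page doc than in a 100 page doc, and a page two clicks from an authoritative site has a smaller click distance than one buried five clicks down.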
Another key element in what is returned in a search is what kinds of sources can be crawled. Sharepoint Search supports a diversity of sources:
- Sharepoint sites
- SMB (i.e. Windows) file shares
- Exchange public folders
- Non-Sharepoint websites
- Active Directory or any LDAP directory
- Sharepoint profile databases
- Web applications
Obviously, there is a ton of value in being able to search more than just web-based sources.
There are some details under the hood here (which I freely admit I don’t fully understand yet) with respect to secure sources that require authentication/authorization. You can specify crawler credentials for each source, but I’m not sure I understand how that security is respected.
So Sharepoint Search gives you the ability to move beyond just web searching, and it gives you a bunch of knobs and buttons to help make results more relevant.
Let’s look a bit more closely at one of those knobs. Search scopes are a way to define a more limited subset of the index to search against. Assuming you define sensible scopes, this improves the relevance of search results. In a UW enterprise Sharepoint Search offering you might imagine scopes that are targeted to specific kinds of content (via metadata or filename), to specific disciplines (via metadata, sources, or URL location), or to specific departmental sources (via source). Nice feature.
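Here is one way to think about scopes, sketched in Python. This is only a mental model, assuming a scope is nothing more than a named rule over a document's metadata, source, or URL; the `Doc` fields, the scope names, and the example URLs are all hypothetical and have nothing to do with Sharepoint's actual scope rule configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Doc:
    url: str
    source: str        # which content source the document was crawled from
    content_type: str  # metadata, e.g. "policy" or "syllabus"
    text: str


# A scope is a named rule that decides whether a document belongs to it.
Scope = Callable[[Doc], bool]

scopes: Dict[str, Scope] = {
    "All": lambda d: True,
    "Policies": lambda d: d.content_type == "policy",
    "Registrar": lambda d: d.source == "registrar-fileshare",
    "Medicine": lambda d: d.url.startswith("http://example.edu/medicine/"),
}


def search_in_scope(docs: List[Doc], scope_name: str, term: str) -> List[str]:
    """Return URLs of documents that are in the scope and mention the term."""
    in_scope = scopes[scope_name]
    term = term.lower()
    return [d.url for d in docs if in_scope(d) and term in d.text.lower()]


docs = [
    Doc("http://example.edu/medicine/leave.doc", "web", "policy", "Leave policy"),
    Doc("http://example.edu/registrar/leave.doc", "registrar-fileshare", "form", "Leave form"),
]
print(search_in_scope(docs, "Policies", "leave"))
```

The same query for "leave" returns both documents in the "All" scope but only one in "Policies", which is the relevance win: the user tells you up front which slice of the index they care about.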
As a user, you have the ability to save searches, and optionally you can configure alerting on those saved searches. That means you would be emailed when the search results change (and I’ll admit the implications of this feature scare me). You can also optionally choose to save the search as an RSS feed. Both these optional features have the effect of turning search from a pull to a push mechanism, which is very nice.
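The pull-to-push idea boils down to re-running a saved query on a schedule and comparing the results with what was seen last time. A minimal Python sketch of that comparison follows; it assumes results are identified by URL, and the `diff_results` and `should_alert` names are mine, not anything from Sharepoint.

```python
from typing import Iterable, List, Tuple


def diff_results(previous: Iterable[str], current: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) result URLs between two runs of a saved search."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)


def should_alert(previous: Iterable[str], current: Iterable[str]) -> bool:
    """True when the saved search's result set has changed at all."""
    added, removed = diff_results(previous, current)
    return bool(added or removed)


# Hypothetical result sets from yesterday's and today's run of the same query.
yesterday = ["http://intranet/a.doc", "http://intranet/b.doc"]
today = ["http://intranet/b.doc", "http://intranet/c.doc"]
print(diff_results(yesterday, today))
```

An alert email or RSS item would then carry just the added (and perhaps removed) entries rather than the full result list.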
From within Office, for example Word, you can also issue a search against Sharepoint Search. You right-click on a word, choose “Look Up”, and assuming you’ve configured the search providers within Word to point at the Sharepoint search provider, it’ll work.
From a Search service provider perspective, there are a number of nice features. For example, Sharepoint provides usage reporting which can help you tune the various factors noted above to make the search service more relevant. Typical canned usage reports that might be helpful are:
- search result destination pages
- queries with zero results
- most-clicked best bets
- queries with zero best bets
- queries with low click-through
- top query origin site collections over the previous X days
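To show what a couple of these reports boil down to, here is a small Python sketch that derives "queries with zero results" and "queries with low click-through" from a query log. The log format, the `QueryEvent` fields, and the 25% threshold are assumptions of mine for illustration; Sharepoint's own reports are canned and I don't know how it stores the underlying data.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class QueryEvent(NamedTuple):
    query: str
    result_count: int
    clicked: bool  # did the user click any result?


def zero_result_queries(log: List[QueryEvent]) -> List[Tuple[str, int]]:
    """Queries that returned nothing, most frequent first."""
    counts = Counter(e.query.lower() for e in log if e.result_count == 0)
    return counts.most_common()


def low_click_through(log: List[QueryEvent], threshold: float = 0.25,
                      min_runs: int = 2) -> List[str]:
    """Queries with results whose click rate falls below the threshold."""
    runs: Counter = Counter()
    clicks: Counter = Counter()
    for event in log:
        if event.result_count > 0:
            query = event.query.lower()
            runs[query] += 1
            if event.clicked:
                clicks[query] += 1
    return sorted(q for q, n in runs.items()
                  if n >= min_runs and clicks[q] / n < threshold)


# Hypothetical query log.
log = [
    QueryEvent("parking", 0, False),
    QueryEvent("Parking", 0, False),
    QueryEvent("vpn", 0, False),
    QueryEvent("benefits", 12, False),
    QueryEvent("benefits", 12, False),
    QueryEvent("payroll", 5, True),
    QueryEvent("payroll", 5, True),
]
print(zero_result_queries(log))
print(low_click_through(log))
```

A zero-result query is a candidate for a new keyword, synonym, or content source, while a low click-through query suggests the results exist but aren't relevant, which is where best bets and scopes can help.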