Methods to Get External Data from Sharepoint

This content was original posted on the sharepoint_tech list, and has been slightly edited for repost here. There is some new content in the asp.net bullet (last bullet).

Developing applications within sharepoint is largely dependent on getting access to data which isn’t natively within the sharepoint DB to begin with. But accessing data from within Sharepoint isn’t just for web developers anymore. End users can easily make use of external data from within Sharepoint. We’ll explore that later.

Getting access to external data from within Sharepoint is supported via a variety of options, including:

  • Manual import into a sharepoint list. This method assumes that sharepoint will become the authoritative source of this data (unless you manually synchronize). Excel import, Access import, and several other methods can accomplish this. This is a very end user accessible method, but may not be ideal if the data already exists in a sql database. Basically this is good for smaller amounts of data and for adhoc cases.
  • Business Data Catalog (BDC). The BDC is a collection of external data source connections, with definitions of the type of data within those external sources. These connections are defined centrally (this is one of the shared services that many farms can reuse), and then available to every Sharepoint site. By “pre-defining” the data connections once, the external data is widely useful to many people. The BDC allows end users to easily incorporate data into their sharepoint site without requiring those end users to know anything about the external data connection, or even the query language for that data source. Farm administrators setup the BDC connections, but end users get the benefit. This is good for data which has wide usefulness. External sources defined in the BDC can optionally be crawled by the Sharepoint search engine.
  • Define database connections within Sharepoint Designer. These connections can be leveraged with views, tables, and other data controls on that single webpage. This approach requires some technical knowledge of the database, the connection specifics, and the relevant query statement (usually tsql). This might be a common approach for web applications by web developers. This approach doesn’t assume any special privileges with respect to the Sharepoint server (more on this below).
  • Define asp.net database connections with Visual Studio. The asp.net framework has been commonly available for quite awhile to create connections to external data sources. So it’s no surprise that it can also be used within Sharepoint along with asp.net data controls (e.g. gridview) in highly customized ways. This approach requires developer-level knowledge, and would only be required for web applications with very unique requirements. This approach does assume some special privileges with respect to the Sharepoint server. Specifically, this approach requires code-behind, and all pages with code-behind fall into a special category in terms of how they are provisioned. This is because these pages don’t actually live within the Sharepoint DB, but instead live on the file system itself. On a shared Sharepoint server, there may be change management procedures behind deploying these pages.

Obviously there are many points of accessibility here with  options for users of varying sophistication. The bottom line is that Sharepoint makes it very easy to get at external data and use within a web context.

Basic Sharepoint List Features

Some of this content has been previously posted on the sharepoint_tech list, but much of it hasn’t.

Everything within Sharepoint is either a list or an item in a list. In some cases, you’ll see a different name, e.g. document library or posts or alerts, but you’ll notice that all of these are simply lists with a special name.

Because everything in Sharepoint is a list, knowing the common set of list functionality is important. And that’s what this post is all about. Knowing this information will provide some basic understanding which you can build upon. Take note that all of this functionality is available in both WSS (the version of sharepoint which comes free with the server OS) and MOSS.

Every list has:

  • “Permitted” content types
  • Columns (or site columns)
  • Views
  • Permissions
  • Workflow (optional)
  • Versioning (optional)

Content Types

Content types in particular are interesting, providing the basis for a managed experience. So for example, with a custom content type, you might:

  • Associate a word template with a document libary so anyone else can re-use your template to create new documents of that type
  • Associate workflows with a custom content type
  • Associate information management policy with a custom content type, e.g. define a retention policy for items of that type
  • Have managed metadata from within sharepoint. With managed metadata you might enable better search relevance, have a better ability to manage items within the list, plug metadata values into content regions within a template-like content type, and so on
  • Re-use defined content types across sharepoint sites

Think of content types as the schema of your lists. They define what types of data items are possible in any given list. There are many default content types which are generally useful (about 20 by default). For example, all Office documents are the document Sharepoint content type. You can build new content types from existing content types. When you’ve enabled a content type on a list, the result shows up for users in the New menu as a new type of item they can create.

Columns

Each item in a list can have many pieces of data associated with it. These pieces of data are called columns, and you can add additional columns to any list.  Those additional columns can represent whatever data you’d like to capture.

For any column in a sharepoint list, you can “normalize” the values for a column–meaning require the values to follow a set of known/controlled/approved values–where normalize means the values must be from a list of values. This requires that the approved values are either in another Sharepoint list within that sharepoint site or are in a connected external data source. To do this, when you add a column you use the lookup type to link the values to the other sharepoint list. This results in a dropdown menu for the new column for each item in the list. This kind of functionality makes categorizing things in lists easy w/o any need for coding.

Views

Each list has some default views defined, e.g. the ‘All items’ view which displays all items in the list. Alternatively, you can define custom views which target any of the properties of items, including content types and columns.

Permissions

Each list has permissions. If someone has no ability to read a list, they won’t see the list at all from parent sites. Sharepoint 2007 also supports per-item permissions, so for example, you can permit something at the list level, but not permit it for a specific item.

Workflow

With workflow you can create actions which happen when items are created, deleted, changed, etc. Out of the box workflows include: Approval, Collect Feedback, Collect Signatures, Three State. The out-of-the-box workflows are pretty useful, but you can also create custom workflows (with Sharepoint Designer) without a lot of hassle.

Versioning

Versioning provides content management. Changes to items are incremented as minor versions. List content owners control major versions, and only major versions are “published” for general viewing. This feature is much more extensive, so we’ll leave the description limited for now, and perhaps come back to it in another post.

Other List Features

Every list within sharepoint can be “exposed” via a webpart from another page. This respects the original list’s security, but allows you to manage the user experience for that list. By exposing a list via a webpart you can remove portions of functionality (say the ability to edit items), use a view to change what is exposed, target audiences (meaning that the webpart only is displayed for certain viewers), etc. Dashboards, personal sites, the ‘all site content’ page are examples of this ‘exposure via a webpart’ functionality.

This webpart exposure makes it very easy to design a portal like experience with Sharepoint. End users often comment that they didn’t realize that working with web parts would be so easy. It’s not much more complicated than click and drag.

The Action Menu

Every sharepoint list has an action menu which exposes list functionality. The following are standard list actions:

  • Export to Spreadsheet. Put the list into an Excel Spreadsheet, opening Excel on the client computer.
  • Open in Access. Put the list in an Access database.
  • View RSS feed. See the raw RSS feed. Every list is RSS-enabled by default, allowing you to change the flow of web-oriented data from a pull-based model to a push-based model. This is a very significant change, especially for collaborative lists with critical information.
  • Alert Me. This allows you request an email when things change. There are some configuration options here. Again, this enables you to turn the information flow into a push-based model.

Special List-Specific Actions

Some lists have special actions available to them. For example, a Document Library has the ‘Connect to Outlook’ option, which allows you to take documents offline for viewing and editing.

Another interesting specific action for a Document Library is the ‘send to’ action. On any item you can right-click and choose to ‘send to other location’, specifying another Sharepoint site (doesn’t have to be on same server). This creates a “live link” in that other sharepoint site. This option can/should be used to give read-only access on another site. “Live link” means that the source can optionally notify the other sites of updates to the source doc which enables intelligent document re-use.

There are other basic list features and functionality not covered here. Send in your favorite candidate and I’ll add it here.

Sharepoint: A Torrent of Info & The Big Picture

Over the past week, I’ve been posting like mad to the sharepoint_tech mailman list. If you aren’t subscribed, you should consider getting on that list. But don’t worry about missed content, because I’m going to re-post all that stuff here (sorry about duplication, but I think it makes sense for the content to be re-accessible). So expect a torrent of sharepoint related info here soon.

With a bit of my time opening up with the advent of the Nebula domain migration put on hold, I’ve been really digging into understanding Sharepoint 2007. Like everyone else, I’m interested in learning what the buzz is about, but I also bring a focus which is very architecture-centric–the “big picture” if you will. The funny thing about such a “big picture” focus is that it requires both a breadth and depth of technical detail. This is really hard to get with something like Sharepoint, where the feature set is so large. But for the first time, I’m starting to get a sense (dare I call it a vision?) for what the Sharepoint architecture might look like here at the UW.

Fundamentals

So there are several fundamentals you have to understand about Sharepoint :

  • Almost everything in Sharepoint is stored within a SQL database. There are some exceptions and it’s the exceptions which often drive architecture.
  • Almost everything in Sharepoint is a list or an object in a list. The exceptions here aren’t significant.
  • There are some fundamental, basic building block, kind of objects which define what most users see and experience. Those objects are:
    • Content types. Most everything revolves around content types. If you’re like me last week, you probably don’t know what they are. Content types are the schema component within the data architecture of Sharepoint. So every “item” within every list in sharepoint has a content type, e.g. document, calender item, contact, etc. A large set of the “global” functionality within Sharepoint is targeted at content types. For example, data retention policies are best targeted at content types. Workflows are best targeted at content types. Reuse of a custom list item on other sites requires that you create a content type. So this is one of the basic building blocks within Sharepoint.
    • Site templates. This defines what master page, webparts, content types are available by default initially for a new site. You can create new site templates, and you can modify a site based on a site template to be completely different from the initial default state.
    • Master pages. These define a template for the pages which are associated with them. They include “content regions” and webparts where you plug in your content. Master pages support styles, aka cascading style sheets (CSS). This is how you’d “brand” the look and feel of web pages.
    • Web parts. Web parts are what define most of the core cool functionality users see within Sharepoint. Webparts actively do something, resulting in content which is displayed on the webpage.
  • Shared Services. There is a subset of Sharepoint functionality which can be shared between Sharepoint farms/servers. The components of Shared Services are:
    • User profiles and Personal sites (My Sites)
    • Search (i.e. search indexes)
    • Audience definition (to facilitate targeting content dependent on membership in an audience)
    • Excel Services (browser based, server processed Excel features. i.e. clients don’t need Excel; content authors do)
    • Business Data Catalog (this is hard to explain, and I’ll likely blog about it separately)

Big Picture proposal

So getting back to the big picture, I foresee a central instance of MOSS, which primarily provides two Sharepoint features for everyone:

  • Personal sites & profiles
  • Search indexing

Nothing else. I’d call this the UW Sharepoint infrastructure.

For all the other functionality, I foresee hosted instances of Sharepoint which consume the shared services of the UW Sharepoint infrastructure. So that’d mean that departments could bring up a sharepoint server and offer whatever combination of features were appropriate for their departments, while getting the benefit of broader UW search base and centrally-provided personal sites/profiles. So a hybrid approach which maximizes where it most makes sense to do the work.

Additionally, I think a centrally *hosted* instance of MOSS is needed to provide the full complement of features for those departments which would rather not run and administrate their own sharepoint server. In some way, this could be scoped to fit into a cost-recovery sort of arrangement.

Finally, given the Sharepoint licensing implications for sites with a public audience (i.e. anonymous access), I think a centrally hosted instance of WSS is also needed. For example, this public blog might go onto that server. Other examples might include public departmental websites. The cost for us to get a internet connector license on MOSS is slightly more than $7K, so it’s definitely cheaper to just buy another server ($3-4K) and put WSS on it.

There’s been some talk about taxonomy and governance with respect to Sharepoint here at the UW. There are some good resources out on the web on this topic (I’ll post them in subsequent posts). One of the beautiful things about what I’m proposing here is that a large part of the painful process of defining the specifics of taxonomy and governance university-wide are side-stepped. It doesn’t all go away, e.g. we still want/need a consistent set of vocabulary that everyone uses and there is still value in defining common roles/responsibilities.

I’d be interested in hearing what folks think of this. Questions welcome. 🙂

Sharepoint Search Functionality

While at TechEd, I blogged a bit about the cool features I was seeing in the Sharepoint Search functionality. I’ve finally managed to find a bit of extra time to write more about that to help bring the exciting details to a wider audience.

Like most search providers, Sharepoint Search crawls content, creates an index based on the results of crawls, and then users search against the index.

Any given Sharepoint site is configured with a single Shared Service Provider (SSP). This SSP determines a few architecturally-oriented configuration settings, including what Search index end users access when searching from that Sharepoint site. So any given Sharepoint server might have many SSPs, and many different search indexes. And in contrast, many Sharepoint sites across many Sharepoint servers could share the same SSP and Search index. So from a Sharepoint architectural design there’s a lot of flexibility for which underlying index is used.

OK, so admittedly that’s not very exciting. I’ll get to the exciting stuff now …

The HOW

So the effectiveness of any search offering depends on how relevant the results it returns are. Sharepoint has a rich ability to calculate relevance. The

factors it supports are:

  • Title and filename
  • Metadata
  • Density of search term (e.g. 10 mentions in a 2 page doc vs. 10 mentions in a 100 page doc)
  • Keywords (i.e. terms with special meaning to an organization that have special behavior associated with them. This is closely tied to the best bets feature, and additionally you can provide synonyms for keywords that broaden the results returned and the likelihood of the keyword being triggered)
  • Best bets (i.e. results that have been manually tagged as a “best bet”)
  • Security (i.e. users only see results they inpidually have permissions to see)
  • Hyperlink click distance (number of “clicks” from an authoritative site)
  • HTML anchor text (that’s the text of hyperlinks)
  • URL depth (how nested within a website directory structure is it?)
  • URL text matching
  • Document Title (office docs only)
  • De-duplication of results (no duplicate results returned)
  • Language of choice (as determined by browser language)
  • Search scopes (definable subsets of all the index)

The WHAT

Another key element in what is returned in a search is what kinds of sources can be crawled. Sharepoint Search supports a persity of sources:

  • Sharepoint sites
  • SMB (i.e. Windows) file shares
  • Exchange public folders
  • Non-Sharepoint websites
  • Active Directory or any LDAP directory
  • Sharepoint profile databases
  • Databases
  • Web applications

Obviously, there is a ton of value here by being able to search more than just web-based sources.

There are some details under the hood here (which I freely admit I don’t fully understand yet) with respect to secure sources that require authentication/authorization. You can specify crawler credentials for each source, but I’m not sure I understand how that security is respected.

The Experience

So Sharepoint Search gives you the ability to move beyond just web searching, and it gives you a bunch of knobs and buttons to help make results more relevant.

Let’s look a bit more about one of those knobs. Search scopes are a way to define a more limited set of the index to search against. Assuming you define relevant scopes, this improves the relevance of search results. In an UW enterprise Sharepoint Search offering you might imagine scopes that are targeted to specific kinds of content (via metadata or filename), to specific disciplines (via metadata, sources, or URL location), to specific departmental sources (via source). Nice feature.

So as a user, you have the ability to save searches. And optionally you can configure alerting on those saved searches. Which means the user would be emailed when the search results change (and I’ll admit the implications of this feature scare me). You can also optionally choose to save the search as an RSS feed. Both these optional features have the effect of turning search from a pull to a push mechanism, which is very nice.

From within Office, for example Word, you can also issue a search against Sharepoint Search. You right click on a word, choose “Look Up” and assuming you’ve configured the search providers within Word to point at the Sharepoint search provider, it’ll work.

From a Search service provider perspective, there are a number of nice features. For example, Sharepoint provides usage reporting which can help you tune the various factors noted above to make the search service more relevant. Typical canned usage reports that might be helpful are:

  • search result destination pages
  • queries with zero results
  • most-clicked best bets
  • queries with zero best bets
  • queries with low click-through
  • top query origin site collections over the previous X days