Overview
I have worked with many different SharePoint architectures across all versions of SharePoint and have presented quite a few of my own. I have also run into many poorly architected SharePoint farms, and most of the time they are the result of:
- Not studying the Microsoft Best Practices before implementation
- A piecemeal tactic of cherry-picking individual best practices without considering the overall impact
- Employing a bean-counter approach to building SharePoint
- Assuming that best practices do not apply because of a special customization or consideration
All best practices need to be interpreted with the recognition that they are best practices, not rules. However, in a well-architected SharePoint farm, departing from best practices should only be considered when it improves performance and user experience. In this blog entry I summarize some of the best practices and experiences from my SharePoint deliveries.
Okay, so where do I go from here
On the Microsoft Best Practices site (http://technet.microsoft.com/en-us/office/sharepointserver/bb736746.aspx, which is practically my home page), Microsoft divides its best practices by the type of SharePoint solution you are deploying. Small deployments can follow the specific guidelines directly, but large global enterprises deploy all of these solutions to a large user base, which requires both a more complex, multi-farm, tiered configuration and an understanding of how the services work together. Microsoft has best practices for:
- Operational Excellence
- Team Collaboration Sites
- My Sites
- Publishing Portals
- Search
- Capacity Management
- Developing Custom Applications
A typical global enterprise deployment will include all of these solutions: Publishing Portal, Search, My Sites, Team Collaboration Sites and, usually, custom applications.
Understanding SharePoint Solutions
Single Farm Vs Multiple Farms
There are a number of factors to consider when designing an architecture for a global enterprise that will use all or most of the SharePoint solutions. Start with the key business drivers behind each solution.
Publishing and Collaboration Portals
The key business driver behind portals is an always-up gateway to internal services. Most organizations would choose availability over performance, and portals are typically centralized. Microsoft recognizes this: its best practice is that portal sites should be optimized for the highest availability, with specific limitations on lists and with application pools optimized for caching. In addition, common sense and experience have taught us that restricting access to SharePoint Designer and third-party applications increases uptime for portals.
My Sites
Most companies recognize the power of social networking, and My Sites are part of this. Another business driver might be storing and sharing files. For global organizations it is optimal to locate My Sites regionally to improve performance. Natively, My Sites and Search are both part of the same SSP (Shared Services Provider); with a large number of My Sites, they should be separated. Microsoft has a metric for My Sites, which I have copied below (http://technet.microsoft.com/en-us/library/cc262706.aspx):
“4. Plan for performance
The increased volume in sites, folder structures, drive storage, and processing that can accompany My Sites (even if they are as small as possible and used only to store a profile picture) can impact the backup and recovery processes and the availability of the rest of the SSP farm. Large organizations with 100,000 employees or more should consider putting My Sites into a separate farm from their Search SSP. We recommend a maximum of 150,000 My Sites per SSP.”
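As a back-of-the-envelope check, the two My Sites figures in that quote (100,000 employees as the point to consider a separate farm from Search, and a maximum of 150,000 My Sites per SSP) can be turned into a small sizing helper. This is a minimal Python sketch; the function name and the assumption that each user owns at most one My Site (controlled by a hypothetical `my_site_ratio`) are mine, not Microsoft's.

```python
import math

# Figures from the Microsoft guidance quoted above.
SEPARATE_FARM_THRESHOLD = 100_000  # employees: consider moving My Sites off the Search SSP farm
MAX_MY_SITES_PER_SSP = 150_000     # recommended maximum My Sites per SSP


def plan_my_sites(employees: int, my_site_ratio: float = 1.0) -> dict:
    """Rough My Sites sizing.

    Assumption (not from the guidance): my_site_ratio is the fraction of
    employees expected to own a My Site; 1.0 means one per employee.
    """
    expected_sites = math.ceil(employees * my_site_ratio)
    ssps_needed = max(1, math.ceil(expected_sites / MAX_MY_SITES_PER_SSP))
    return {
        "expected_my_sites": expected_sites,
        "ssps_needed": ssps_needed,
        "separate_from_search_farm": employees >= SEPARATE_FARM_THRESHOLD,
    }


if __name__ == "__main__":
    print(plan_my_sites(250_000))
```

For 250,000 employees this suggests two SSPs for My Sites and flags the deployment for a farm separate from Search; treat it as a starting point for capacity testing, not a design.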
Collaboration Team Sites
The key business drivers for Collaboration sites are storing documents and driving business processes; SharePoint applications often start out as dedicated team sites. Unlike Publishing Portals, Collaboration site collections often grow quite large, and because they are usually team based, they tend to be function-, tower- or region-specific. Ideally, their databases should be kept separate for the best query execution. SharePoint Designer is typically used on Collaboration sites, and sometimes third-party applications are added to drive particular business processes. Because of the large volume of content, maintenance windows for indexing and backup are even more important, so whenever possible these sites should be deployed regionally.
Search
Enterprise search is often one of the top reasons that customers purchase SharePoint. As such it typically requires the highest availability and solid performance.
Custom Applications
In terms of delivering a highly available deployment, it is possible to combine the Portal, My Site and Search solutions into a single farm if the anticipated performance requirements are low enough. Collaboration Team Sites and Custom Applications should be kept separate and local if possible.
Farm Performance
I often think of a SharePoint farm as a balanced wheel. When we consider guidelines, we have to make sure that performance and capacity are in balance across each of the services in the farm.
The number one reason for poor SharePoint performance is latency. Latency can be introduced in any component of a SharePoint farm or in the communication between components. When designing the SharePoint architecture for your organization, avoid Farms of Unfortunate Proportions at all costs.
Web Front End Servers
“Capacity is directly affected by scalability. This section lists the objects that can compose a solution and provides guidelines for acceptable performance for each type of object. Limits data is provided, along with notes that describe the conditions under which the limit obtains as well as links to additional information where available. Use the guidelines in this article to review your overall solution plans.
If your solution plans exceed the recommended guidelines for one or more objects, take one or more of the following actions:
- Evaluate the solution to ensure that compensations are made in other areas.
- Flag these areas for testing and monitoring as you build and deploy your solution.
- Re-design the solution to ensure that you do not exceed capacity guidelines.
The following tables list the objects by category and include recommended guidelines for acceptable performance. Acceptable performance means that the system as tested can support that number of objects, but that the number cannot be exceeded without some performance degradation. An asterisk (*) indicates a hard limit; no asterisk indicates a tested or supported limit.
The following table lists the recommended guidelines for site objects.
| Object | Guideline for acceptable performance | Notes |
| Web server/database server ratio | 8 Web servers per database server | The scale out factor is dependent upon the mix of operations. |
| Web server/domain controller ratio | 3 Web servers per domain controller | Depending on how much authentication traffic is generated, your environment may support a greater number of Web servers per domain controller. |
Throughput vs. number of Web servers
In our test environment, farm throughput reached a plateau at 5 Web servers per database server, and did not change substantially when additional Web servers were added. Although you can deploy up to 8 Web servers per database server, you may not realize substantial throughput gains after 5 Web servers. This is because as the number of Web servers making calls against a single database server increases, the database server eventually reaches 100% capacity. Results in your environment may vary according to the performance characteristics of your database server. You will need to conduct your own testing to determine the optimum number of Web servers in your farm environment.”
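The ratios in the quoted table can be applied directly when sketching a farm. The Python below is a minimal sketch under my own assumptions: by default it sizes database servers against the throughput plateau of 5 Web servers per database server from Microsoft's test results (rather than the 8-server guideline), and domain controllers at 3 Web servers per DC, which the quote notes is conservative. The function and parameter names are hypothetical, and your own load testing should replace these numbers.

```python
import math

# Ratios taken from the Microsoft guidance quoted above.
MAX_WEB_PER_DB = 8      # guideline: Web servers per database server
PLATEAU_WEB_PER_DB = 5  # tested throughput plateau: Web servers per database server
WEB_PER_DC = 3          # Web servers per domain controller (conservative)


def size_back_end(web_servers: int, use_plateau: bool = True) -> dict:
    """Estimate database servers and domain controllers for a number of Web servers."""
    if web_servers < 1:
        raise ValueError("a farm needs at least one Web server")
    per_db = PLATEAU_WEB_PER_DB if use_plateau else MAX_WEB_PER_DB
    return {
        "database_servers": math.ceil(web_servers / per_db),
        "domain_controllers": math.ceil(web_servers / WEB_PER_DC),
    }


if __name__ == "__main__":
    # 12 Web servers: 3 database servers at the plateau ratio, 2 at the 8:1 guideline.
    print(size_back_end(12))
    print(size_back_end(12, use_plateau=False))
```

Sizing against the plateau rather than the upper guideline reflects the point of the quote: Web servers beyond five per database server may add little throughput.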
Scaling OUT Vs Scaling UP Web Servers
Here is a quick look at the obvious benefits of each.
| Scale OUT Benefits | Scale UP Benefits |
| Optimize Network Paths | Licensing Costs |
Remember that at least one of your Web servers is going to be used by your index server for crawling, and depending on your content, that could be a big indexing job.
Index Servers
There are some index server best practices specific to Collaboration Team sites. Collaboration Team sites typically have a lot of frequently changing content, so it is essential to allocate proper maintenance windows for indexing them.
http://technet.microsoft.com/en-us/library/cc262574.aspx
“Estimate crawl window
In an Office SharePoint Server 2007 search environment, crawling content typically is the longest-running operation that is not initiated by users. You will need to perform testing in your own environment to determine the amount of time it takes to crawl content using a particular content source, and whether the throughput consumed by crawling this content interferes with your target user response times. Typically, you should verify that crawling a particular content source can be contained within an overnight time span of 12 hours.
Remote server latency
Server latency is a major factor that affects crawl performance. Performance between farm servers must be balanced for overall crawl performance to reach its potential. For example, a powerful index server can be operating at 25% of its capacity if the database server being crawled is not able to respond quickly enough. In such a case, you can scale up the database server, which will in turn increase crawl speeds across the entire farm.
You should conduct your own testing to evaluate the responsiveness of servers in your environment. The database server serving the target farm is often the bottleneck in cases where crawl performance is poor. To improve crawl performance, you can:
- Scale up database server hardware by adding or upgrading processors, adding memory, and upgrading to hard disks with faster seek and write times.
- Increase the memory on query servers in the farm
- Crawl during non-peak hours so that the database server being crawled can service user traffic during the day, and respond to crawls during off-peak hours.”
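To check the 12-hour overnight guideline before a crawl ever runs, you can divide the item count of a content source by a crawl rate measured in your own test environment. The sketch below assumes you have such a measured rate (items per second); the 19:00 to 07:00 off-peak window and the numbers in the example are hypothetical placeholders, not Microsoft figures.

```python
from datetime import time

CRAWL_WINDOW_HOURS = 12  # overnight crawl window from the guidance above


def crawl_hours(item_count: int, items_per_second: float) -> float:
    """Estimated full-crawl duration in hours for one content source."""
    if items_per_second <= 0:
        raise ValueError("crawl rate must be positive")
    return item_count / items_per_second / 3600


def fits_window(item_count: int, items_per_second: float,
                window_hours: float = CRAWL_WINDOW_HOURS) -> bool:
    """True if the estimated crawl fits inside the crawl window."""
    return crawl_hours(item_count, items_per_second) <= window_hours


def in_off_peak(now: time, start: time = time(19, 0), end: time = time(7, 0)) -> bool:
    """True if `now` falls inside an off-peak window that may wrap past midnight.

    The 19:00-07:00 default is a hypothetical 12-hour window; use your own usage data.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


if __name__ == "__main__":
    # Placeholder numbers: 2 million items at a measured 40 items/second.
    hours = crawl_hours(2_000_000, 40)
    print(f"{hours:.1f} h, fits: {fits_window(2_000_000, 40)}")
    print(in_off_peak(time(23, 30)), in_off_peak(time(14, 0)))
```

In this made-up example the crawl takes roughly 14 hours and overruns the window, which is exactly the case where the options above (scaling up the crawled database server or splitting the content source) come into play.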
SQL Database servers
When considering scaling up SQL Server, remember that there is only one tempdb per SQL Server instance, and SharePoint relies heavily on tempdb operations. Microsoft's recommendations are below. I have found that with current hardware, 2 TB of data per SQL Server instance is supportable, and I have seen as much as 5 TB per server on the Microsoft site for Collaboration Team sites. But remember that the whole farm must be balanced to handle large amounts of data, which means bigger windows for indexing and backup.
“Consider scaling out in addition to adding resources
It is important to track the following three resource components of a server running SQL Server 2005: CPU, memory, and I/O subsystem. When one or more of the components seem stretched, analyze the appropriate course of action based on the current and projected work load. Then, determine whether to add more resources or to scale out to a new server running SQL Server 2005. In general, we recommend that you consider scaling out in addition to adding more resources.
- If your deployment parameters are generally greater than the upper limits of most of the listed values, your deployment can be considered large.
| Metric | Value |
| Content database size | 50 GB |
| Number of content databases | 20 |
| Number of concurrent requests to SQL Server 2005 | 200 |
| Users | 1000 |
| Number of items in regularly accessed list | 2000 |
| Number of columns in regularly accessed list | 20 |
”
(By this calculation, 20 content databases multiplied by 50 GB gives 1 TB, which is considered large.)
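The table's rule ("greater than the upper limits of most of the listed values") can be read as a simple majority check. The following is a minimal sketch that assumes "most" means more than half of the six metrics; the metric keys are hypothetical names for the table rows, and the 1 TB = 1,000 GB conversion matches the rule of thumb above.

```python
# Upper limits from the Microsoft table above.
LARGE_THRESHOLDS = {
    "content_db_size_gb": 50,
    "content_db_count": 20,
    "concurrent_sql_requests": 200,
    "users": 1000,
    "items_in_regular_list": 2000,
    "columns_in_regular_list": 20,
}


def is_large_deployment(metrics: dict) -> bool:
    """True if more than half of the metrics exceed their thresholds.

    Assumption: "most" is read as a strict majority of the six metrics;
    metrics missing from the input count as not exceeded.
    """
    exceeded = sum(
        1 for name, limit in LARGE_THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )
    return exceeded > len(LARGE_THRESHOLDS) / 2


def total_content_tb(db_count: int, db_size_gb: float) -> float:
    """Total content in TB, using 1 TB = 1,000 GB as in the 20 x 50 GB rule of thumb."""
    return db_count * db_size_gb / 1000


if __name__ == "__main__":
    farm = {
        "content_db_size_gb": 60,
        "content_db_count": 25,
        "concurrent_sql_requests": 300,
        "users": 5000,
        "items_in_regular_list": 1500,
        "columns_in_regular_list": 15,
    }
    print(is_large_deployment(farm), total_content_tb(20, 50))
```

The example farm exceeds four of the six limits, so it counts as large under this reading.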
“We recommend that you deploy an additional server running SQL Server 2005 when you have more than four Web servers running at full capacity.
Minimal latency on the I/O subsystem that serves the server that runs SQL Server is very important. Slow response from the I/O subsystem cannot be compensated for by adding other types of resources, like CPU or memory, but it can influence and cause issues throughout the farm. Plan for minimal latency before deployment, and monitor your existing systems as described in the section on monitoring.
SQL Server performance depends heavily on the I/O subsystem. Unless your database fits into physical memory, SQL Server constantly brings database pages in and out of the buffer pool. This generates substantial I/O traffic. Similarly, the log records need to be flushed to the disk before a transaction can be declared committed. And finally, SQL Server uses tempdb for various purposes such as to store intermediate results, to sort, to keep row versions and so on. So a good I/O subsystem is critical to the performance of SQL Server.”
“Greater bus bandwidth helps improve reliability and performance. Consider that the disk is not the only user of bus bandwidth — for example, you must also account for network access.”
When deciding whether to scale SQL Server out or up, it is common to forget that the system bus can contribute to overall latency in the system, so scaling up is not always an acceptable solution.
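Putting the SQL Server figures together, the sketch below combines my rule of thumb of about 2 TB of content per SQL Server instance with Microsoft's recommendation to add a SQL Server when more than four Web servers are running at full capacity. It is a minimal Python sketch with hypothetical names, and it reads the Microsoft rule as "one SQL Server per four fully loaded Web servers", which is my interpretation. It takes the larger of the two estimates because either storage or Web server load can be the limiting factor; it says nothing about tempdb or I/O latency, which still need their own testing.

```python
import math

TB_PER_SQL_INSTANCE = 2.0        # author's experience with current hardware
WEB_SERVERS_PER_SQL_AT_FULL = 4  # Microsoft: add a SQL Server beyond four fully loaded Web servers


def sql_instances_needed(content_tb: float, web_servers_at_full_capacity: int) -> int:
    """Larger of the storage-driven and load-driven SQL Server estimates.

    Assumption: the "more than four Web servers" rule is read as one SQL Server
    per four Web servers running at full capacity.
    """
    by_storage = math.ceil(content_tb / TB_PER_SQL_INSTANCE)
    by_load = math.ceil(web_servers_at_full_capacity / WEB_SERVERS_PER_SQL_AT_FULL)
    return max(1, by_storage, by_load)


if __name__ == "__main__":
    print(sql_instances_needed(5, 3))  # storage-bound
    print(sql_instances_needed(1, 6))  # load-bound
```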
Content Databases Performance
Microsoft has outlined two SharePoint best practices around SQL Server performance, which I have placed below. You will notice that both guidelines essentially say the same thing: keep the content in each database consistent.
Here is the Best Practice for Publishing Portals:
“This is because Office SharePoint Server 2007 performs best when the types of access and usage patterns for the content in a database are similar. Separating primarily read-only content (publishing) from read-write content (authoring), into different site collections can help.”
EXECUTION PATHS are critical to response time, and latency in response is critical for indexing. Here is the Best Practice for Team Collaboration Sites:
“Consider performance
When you host team sites on a dedicated Web application, you have several content databases that contain only team site collections. If content databases host sites with similar data characteristics, Microsoft SQL Server database software operates more efficiently because SQL Server chooses a query plan based on the characteristics of a database. By contrast, if a database hosts sites with vastly different data characteristics, the query plan that SQL Server uses might not be the most efficient method for all content in the database. For example, if a database hosts team sites (that is, a large number of medium-sized sites) and portal sites (that is, a small number of very large sites with many requests), the chosen query plan will be inefficient for one of the types of sites. Therefore, by placing content for team sites in dedicated databases, you can optimize performance for SQL Server, which results in better performance for the overall server farm.”
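One way to apply this when planning is to group site collections by usage pattern first and only then pack them into content databases, so team sites never share a database with portal sites. The sketch below is hypothetical: the site records are made up, and it uses a simple first-fit-decreasing packing against the 50 GB content database size from Microsoft's table earlier in this post. In SharePoint itself you would achieve the separation with dedicated Web applications and content databases; this only models the plan.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_DB_SIZE_GB = 50  # content database size from the Microsoft table


@dataclass
class SiteCollection:
    url: str
    kind: str  # usage pattern, e.g. "team" or "portal"
    size_gb: float


def plan_content_databases(sites: list[SiteCollection]) -> dict[str, list[list[str]]]:
    """Group site collections by usage pattern, then first-fit them into databases.

    Sites of different kinds never share a database. A site larger than
    MAX_DB_SIZE_GB still gets a database of its own.
    """
    by_kind: dict[str, list[SiteCollection]] = defaultdict(list)
    for site in sites:
        by_kind[site.kind].append(site)

    plan: dict[str, list[list[str]]] = {}
    for kind, group in by_kind.items():
        databases: list[tuple[float, list[str]]] = []  # (used GB, site URLs)
        # Placing the largest sites first usually gives a tighter first-fit packing.
        for site in sorted(group, key=lambda s: s.size_gb, reverse=True):
            for i, (used, urls) in enumerate(databases):
                if used + site.size_gb <= MAX_DB_SIZE_GB:
                    databases[i] = (used + site.size_gb, urls + [site.url])
                    break
            else:
                databases.append((site.size_gb, [site.url]))
        plan[kind] = [urls for _, urls in databases]
    return plan


if __name__ == "__main__":
    sites = [
        SiteCollection("/sites/a", "team", 30),
        SiteCollection("/sites/b", "team", 25),
        SiteCollection("/sites/c", "team", 20),
        SiteCollection("/", "portal", 45),
    ]
    print(plan_content_databases(sites))
```

Here the three team sites land in two team-only databases and the portal keeps a database of its own, which is the separation the quote recommends.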
SharePoint Processes that Consume Resources
SharePoint Timer Jobs
Making sure that your collaboration farms are region-, function- or tower-specific will help keep timer jobs short, because user synchronization jobs run longer as the number of users grows. Joel O. lists this as one of his SharePoint Top Performance Killers:
http://www.sharepointjoel.com/default.aspx
“Misc Timer Jobs – User Sync for large #s of Users – the more users the longer the timer jobs will run. Profile Synchronization - This job runs once every Hour and there is one per Web Application. Quick Profile Synchronization - This job runs once every minute as performance permits and there is one per Web Application (MSDN.)”
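The per-Web-application frequencies in that quote add up quickly. As a rough illustration, the sketch below counts how many profile synchronization runs are scheduled per day across a farm. It assumes the Quick Profile Synchronization job actually gets to run every minute; the quote says "as performance permits", so the result is an upper bound.

```python
PROFILE_SYNC_RUNS_PER_DAY = 24          # once every hour, per Web application
QUICK_PROFILE_SYNC_RUNS_PER_DAY = 1440  # once every minute (upper bound), per Web application


def daily_sync_runs(web_applications: int) -> int:
    """Upper bound on scheduled profile sync job runs per day across a farm."""
    return web_applications * (PROFILE_SYNC_RUNS_PER_DAY + QUICK_PROFILE_SYNC_RUNS_PER_DAY)


if __name__ == "__main__":
    print(daily_sync_runs(4))  # 5856
```

A farm with four Web applications schedules up to 5,856 of these runs a day, each of which takes longer as the user count grows.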
Operating system issues, or why to use more redundant servers rather than fewer large ones:
Backup and Restore Operations
“By design, most backup jobs consume as many I/O resources as they can to finish the job in the available time for maintenance. Therefore, you might see disk queuing, and you might see that all I/O requests come back more slowly than usual. This is typical and should not be considered a problem.”
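To see whether a backup fits in the maintenance window, a rough duration estimate is enough. This is a minimal sketch; the throughput is a hypothetical figure you would measure on your own I/O subsystem, and the calculation ignores compression and contention from other jobs running in the same window.

```python
MB_PER_GB = 1024


def backup_hours(content_gb: float, throughput_mb_per_s: float) -> float:
    """Estimated backup duration in hours at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return content_gb * MB_PER_GB / throughput_mb_per_s / 3600


def fits_maintenance_window(content_gb: float, throughput_mb_per_s: float,
                            window_hours: float) -> bool:
    """True if the estimated backup finishes within the maintenance window."""
    return backup_hours(content_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical: 2 TB (2048 GB) of content at a measured 100 MB/s.
    print(f"{backup_hours(2048, 100):.2f} h")
```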
Index Propagation
Index propagation affects the performance of the index and query servers, depending on the size of the indexes.