Sharepoint Database Exporter Update
I just updated the Sharepoint Database Exporter. It now supports Sharepoint 2007.
For more details, see the my original post here.
No commentsSharepoint 2003 - Database Internal
When I wrote the Sharepoint Database Exporter, I learned a lot about how Sharepoint stores the data internally. I also noticed that my blog is often hit with google searches from people looking for how to query a Sharepoint database.
So here are some information…
This information is about the storage of WSS websites in either a Sharepoint Portal Server or a Windows Sharepoint Services. But its not about the storage of Sharepoint Portal Areas (my guess is it should look similar).
So if your Query Analyser connection are opened and ready to query
… the place to start looking is the database that usually terminates by “_SITE” suffix.
Sites
This table stores all the top-level websites. For each entry found here, there is at least one entry on the Webs table.
Webs
This table stores all the websites information for a Sharepoint installation.
| Id | Identifier of the website |
| SiteId | Identifier of the top-level website id (Refers to the Site table) |
| ParentWebId | Identifier of the parent website. For a top-level website, this column contains NULL. |
| Title | The name of the website. |
Lists
This table stores the lists of a website. You can join this table to the Webs table using the tp_WebId.
| tp_WebId | Identifier of the list’s owner website. |
| tp_ID | Unique Identifier of the list. |
| tp_Title | Name of the list. |
| tp_BaseType | This field indicate the basics of the list. Based on the value of this column, the storage and handling may differ.For example, a base type of ‘1′ is used for document and picture libraries. |
| tp_ServerTemplate | This field indicate which type of Sharepoint listitem object will be usedhere are some of the types I am aware of:
|
| tp_RootFolder | This field stores the root item for list attachments. It may be used to link the UserData table with the Docs table when the base type is not ‘1′ (Document/Picture Library). (see the Docs description below) |
| tp_Fields | This fields is a really important one because it is the glue between the physical column name (in table UserData) and the logical field name (defined in Sharepoint).The content of this field is an xml fragment of ‘Field’ and ‘FieldRef’ elements. These elements contains a lot of details like the formula for calculated fields, the datatype, etc. But the most important parts are the ‘Name’ and ‘ColName’ attributes. The ‘Name’ attribute stores the Sharepoint defined name and ‘ColName’ attribute stores the SQL Server column name of the UserData table.Now, the catch 22… This field is filled __ONLY__ when the list’s server template is customized. So if you created a Task list and you didn’t customized by adding an additional field or modifying an existing one, the ‘Fields’ column will be empty. I guess the definition of standard list are ‘Hardcoded’ into Sharepoint so the content of the column is interpreted only when needed. |
UserData
UserData is THE interesting table… It contains every items (records) stored in any list. Yes every list stores its data in the same table.
Additional metadata (fields that you added if you customized a list) is stored in one of the multiple generic column name (int1, int2, int3, in4, …, nvarchar1, nvarchar2, nvarchar3, …). Now you see the importance of the Fields column of the Lists table…
| tp_ID | Identifier of the list item. This is an auto-increment number managed by Sharepoint (not an IDENTITY column) |
| tp_ListId | The identifier of the list owner. |
| bit1, bit2, bitN, int1, int2, intN, float1, float2, floatN, … | The data columns |
Docs
Another interesting table. This table stores the files of a document/picture library and the attachments of a list item.
The way to find an attachment is different than the way to find a file of a document library.
For a document library:
- Use the value from Lists.tp_ID with the Docs.ListId column.
- and use the value from UserData.tp_ID with the Docs.DocLibRowId column
For a list’s attachment, its more complicated:
- The list itself map to a root folder. It is simple to retrieve the root folder from the Docs table by using Lists.tp_RootFolder with Docs.Id.The result is something like:
DirName /MyRootWebsite/MyWebsite/Lists/
LeafName MyList - To get all the attachment of a list, then you need to concatenate the DirName and LeafName of the previous result with ‘Attachments’ constant and search for it.Ex: Search for DirName = /MyRootWebsite/MyWebsite/Lists/MyList/AttachmentsThe result is something like:
DirName: /MyRootWebsite/MyWebsite/Lists/MyList/Attachments
LeafName: list identifier (ex: 1) - Now to finally get to the attachment, we need an additional concatenation: DirName + LeafNameEx: Search for DirName = /MyRootWebsite/MyWebsite/Lists/MyList/Attachments/1
The result is something like:
DirName: /MyRootWebsite/MyWebsite/Lists/MyList/Attachments/1
LeafName: MyFile.zip
Content: the content of the file.
| Id | Unique identifier of a file |
| DirName | The full name of the folder where the file is stored. |
| LeafName | The name of the file or folder depending what kind of data the row contains. |
| Type | Indicate if the row contains a file or a folder. 1 = folder, 0 = file. |
| ListId | The list identifier that owns the file |
Typical/Helper Queries
It’s not really easy to explain the structure, so it will surely not be easy to understand it correctly (misinterpretation, etc.). When we’re talking about database, sometimes nothing explain better than sample queries…
Note: in the queries, information in bold should be replace by actual values.
Retrieve a specific website based on its name:
SELECT * FROM Webs WHERE Title = ‘**MyWebsite**’;
Retrieve all lists owned by a website:
SELECT * FROM Lists WHERE tp_WebId = ‘**MyWebsiteIdentifier**’;
Retrieve all records owned by a list:
SELECT * FROM UserData WHERE tp_ListId = ‘**MyListIdentifier**’;
Retrieve all attachments for all items of a list:
SELECT content.*
FROM Lists l
INNER JOIN Docs root ON (root.Id = l.tp_RootFolder)
INNER JOIN Docs content ON (content.ListId = root.ListId)
WHERE l.tp_ID = ‘**MyListIdentifier**’
AND content.DirName LIKE root.DirName + ‘/’ + root.LeafName + ‘/Attachments’ + ‘/%’;
Retrieve all attachments for a list item:
SELECT content.*
FROM Lists l
INNER JOIN Docs root ON (root.Id = l.tp_RootFolder)
INNER JOIN Docs content ON (content.ListId = root.ListId)
WHERE l.tp_ID = ‘**MyListIdentifier**’
AND content.DirName = root.DirName + ‘/’ + root.LeafName + ‘/Attachments’ + ‘/’ + ‘**MyListItemIdentifier**’;
Retrieve all files for a document library:
SELECT * FROM Docs WHERE ListId = ‘**MyListIdentifier**’;
Retrieve the file for a document library item:
SELECT * FROM Docs WHERE ListId = ‘**MyListIdentifier**’ AND DoclibRowId = **MyListItemIdentifier**;
Sharepoint 2003 and 2007 Database Exporter
Hi,
Some months ago I was experimenting with Windows Sharepoint Services and found out that my machine was not powerfull enough to run Windows 2003, SQL Server and Sharepoint at a decent speed. I had data in Sharepoint I wanted to keep so I did an SQL Server backup of the database. I tought that I would re-setup everything easily after a machine upgrade… I was wrong! I never succeeded to restore the database backup in a running Sharepoint environment (I learned the hard way that I should have used WSS backup utilities instead of relying only on a database backup).
From that moment I decided to explore how Microsoft stores its data within Sharepoint. I found out about Docs and UserData tables and I thought I could develop a little tool to extract my data. In the mean time I discovered the Sharepoint Database Explorer from James Edelen. I was happy that I would’nt have to code it myself :)… The tool is nice, but it didn’t allow me to export custom lists and metadata…
As a learning experience, I decided to build an exporter program. It happens that I find it usefull in a couple of situations… So I decided to share in case anybody would have the same needs than I had!
***** Updated (2008-02-21) *****
Features (v1.1.0.1):
- It is possible to preview the pending version (checked-out and saved, but not yet checked-in) of a file.
(on both: WSS 2.0 and WSS 3.0) (see this screenshot) - Bug fix regarding the extraction of document librarie items when files are checked-out.
Features (v1.1.0.0):
- Connects to either Sharepoint 2003 or Sharepoint 2007 database structure (automatic detection).
- It is possible to preview the version history of a file and to export individually a previous version of a file.
Features (v1.0.0.1):
- Connects to Sharepoint 2003 databases directly
- Does not rely on a working Sharepoint environment… it only connects to the database
- You can preview the list contents (see screenshots)
- It is possible to export individual files from the preview window.
- You can export custom list attachments and document library files
- You can export metadata
Metadata is exported in an xml file. For attachments, you have the choice to export them in a folder or to embed them in the xml.
Known issues:
- When a list is not customized (kept as-is — no additional fields added), Microsoft does not store the column mapping (Sharepoint field name mapping to SQL Server column name). They probably store them in a resource file so its faster to retrieve than querying the database. For us, that means we have to hardcode the mapping for each list type (Custom List, Contact List, Discussion List, Calendar, etc.). For this reason, some list types may not export well.
Screenshots:
- Preview tab - Attachment history
- Preview tab - Custom List
- Preview tab - Listitem with attachment
- Export tab
- XML sample
Download (Latest Version):
- Sharepoint Database Exporter (v1.1.0.1) - Binaries Only
- Sharepoint Database Exporter (v1.1.0.1) - Sources and Binaries
- License (New BSD License)
Add-On:
- XSL Transformation to convert output into an HTML Table
(use this to import to an older Excel spreadsheet that does not have native XML support)
Download (Previous Versions):
- Sharepoint Database Exporter (v1.1.0.0) - Binaries Only
- Sharepoint Database Exporter (v1.1.0.0) - Sources and Binaries
- Sharepoint 2007 Database Exporter - Alpha 1 - Binaries
- Sharepoint 2007 Database Exporter - Alpha 1 - Sources and Binaries
- Sharepoint 2003 - Database Exporter (v1.0.0.1) - Binaries Only
- Sharepoint 2003 - Database Exporter (v1.0.0.1) - Source and Binaries
Final Notes:
This work is done on my free time, but I encourage to post bugs and feature requests so I can take a look when I’ll have time.
Also if you made your own modifications, feel free to post them back so I can integrate them, of course giving you the credits for what you’ve done…
56 commentsSharepoint 2003 performance issue with document libraries
Situation:
I’ve been put in charge of an application running on the Sharepoint 2003 plateform. The application composed of many webparts and stores its data directly within lists and document libraries. The website is split by language, one subsite by language. For example:
- /root/English
- /root/French
- /root/Spanish
- …
The purpose of this application is to store and search files with custom metadata attached to it. So we find a customized document library in each language subsite.
The problem:
Overall, there is about 1 000 000 records in those libraries (the biggest library is english with about 400 000 records) and searching through this using custom CAML queries is a pain!
At some point in the application we need to know in which language a document is available. For this we query each document libraries using a filter on a metadata that contains our unique identifier.
<where>
<fieldref name="MyUniqueID">
<value type="Number">00000</value>
</where>
The tracing revealed an increasing time of processing. We currently have 7 languages and the processing time goes like that:
- 1st language: less than 1 second
- 2nd language: about 2 seconds
- 3rd language: about 5 seconds
- 4th language: about 10 seconds
- 5th language: about 20 seconds
- 6th language: about 20 seconds
- 7th language: about 20 seconds
Work Around:
First of all, the design of the application is not the best but that’s not the purpose of this post.
Some would say, get that data out of Sharepoint! Ya, that’s what we’re doing (for many reasons…) but in the meantime, the system must continue to be up and running.
Metadata is dynamic (it varies from list to list) and Sharepoint is not optimized to search using this. But Sharepoint recognize when you search using its internal identifier and it goes very fast. So if we could map our identifier to the sharepoint internal identier, it should be faster. I found out that doing this 2 steps operation is actually faster (in my scenario it reduced the total from 77 seconds to 2 seconds average):
- Run an ugly SQL statement directly on the Sharepoint database to find the Sharepoint internal identifier based on our own unique identifier.
- Query Sharepoint API (using CAML) with Sharepoint internal identifier.
the SQL statement looks like this:
SELECT tp_ID FROM UserData WHERE tp_ListID = '...' AND Float1 = ...
Float1 stores our unique identifier. I determine it at run-time by looking at the Fields property (not really sure by heart of the property name…). It contains the Sharepoint column Name / SQL column name mappings.
To get the list identifier (a GUID), just take the ID property of the SPList object.
That’s about it… I could talk about it for hours but I think I said the important points.
Additional notes:
Maybe the following hotfix would have correct the problem:
http://support.microsoft.com/?id=900929
See this explanation from Keith Richie’s blog:
http://blogs.msdn.com/krichie/archive/2006/07/20/673197.aspx