Apr 2
Sharepoint 2003 – Database Internal
When I wrote the Sharepoint Database Exporter, I learned a lot about how Sharepoint stores the data internally. I also noticed that my blog is often hit with google searches from people looking for how to query a Sharepoint database.
So here are some information…
This information is about the storage of WSS websites in either a Sharepoint Portal Server or a Windows Sharepoint Services. But its not about the storage of Sharepoint Portal Areas (my guess is it should look similar).
So if your Query Analyser connection are opened and ready to query
… the place to start looking is the database that usually terminates by “_SITE” suffix.
Sites
This table stores all the top-level websites. For each entry found here, there is at least one entry on the Webs table.
Webs
This table stores all the websites information for a Sharepoint installation.
| Id | Identifier of the website |
| SiteId | Identifier of the top-level website id (Refers to the Site table) |
| ParentWebId | Identifier of the parent website. For a top-level website, this column contains NULL. |
| Title | The name of the website. |
Lists
This table stores the lists of a website. You can join this table to the Webs table using the tp_WebId.
| tp_WebId | Identifier of the list’s owner website. |
| tp_ID | Unique Identifier of the list. |
| tp_Title | Name of the list. |
| tp_BaseType | This field indicate the basics of the list. Based on the value of this column, the storage and handling may differ.For example, a base type of ‘1′ is used for document and picture libraries. |
| tp_ServerTemplate | This field indicate which type of Sharepoint listitem object will be usedhere are some of the types I am aware of:
|
| tp_RootFolder | This field stores the root item for list attachments. It may be used to link the UserData table with the Docs table when the base type is not ‘1′ (Document/Picture Library). (see the Docs description below) |
| tp_Fields | This fields is a really important one because it is the glue between the physical column name (in table UserData) and the logical field name (defined in Sharepoint).The content of this field is an xml fragment of ‘Field’ and ‘FieldRef’ elements. These elements contains a lot of details like the formula for calculated fields, the datatype, etc. But the most important parts are the ‘Name’ and ‘ColName’ attributes. The ‘Name’ attribute stores the Sharepoint defined name and ‘ColName’ attribute stores the SQL Server column name of the UserData table.Now, the catch 22… This field is filled __ONLY__ when the list’s server template is customized. So if you created a Task list and you didn’t customized by adding an additional field or modifying an existing one, the ‘Fields’ column will be empty. I guess the definition of standard list are ‘Hardcoded’ into Sharepoint so the content of the column is interpreted only when needed. |
UserData
UserData is THE interesting table… It contains every items (records) stored in any list. Yes every list stores its data in the same table.
Additional metadata (fields that you added if you customized a list) is stored in one of the multiple generic column name (int1, int2, int3, in4, …, nvarchar1, nvarchar2, nvarchar3, …). Now you see the importance of the Fields column of the Lists table…
| tp_ID | Identifier of the list item. This is an auto-increment number managed by Sharepoint (not an IDENTITY column) |
| tp_ListId | The identifier of the list owner. |
| bit1, bit2, bitN, int1, int2, intN, float1, float2, floatN, … | The data columns |
Docs
Another interesting table. This table stores the files of a document/picture library and the attachments of a list item.
The way to find an attachment is different than the way to find a file of a document library.
For a document library:
- Use the value from Lists.tp_ID with the Docs.ListId column.
- and use the value from UserData.tp_ID with the Docs.DocLibRowId column
For a list’s attachment, its more complicated:
- The list itself map to a root folder. It is simple to retrieve the root folder from the Docs table by using Lists.tp_RootFolder with Docs.Id.The result is something like:
DirName /MyRootWebsite/MyWebsite/Lists/
LeafName MyList - To get all the attachment of a list, then you need to concatenate the DirName and LeafName of the previous result with ‘Attachments’ constant and search for it.Ex: Search for DirName = /MyRootWebsite/MyWebsite/Lists/MyList/AttachmentsThe result is something like:
DirName: /MyRootWebsite/MyWebsite/Lists/MyList/Attachments
LeafName: list identifier (ex: 1) - Now to finally get to the attachment, we need an additional concatenation: DirName + LeafNameEx: Search for DirName = /MyRootWebsite/MyWebsite/Lists/MyList/Attachments/1
The result is something like:
DirName: /MyRootWebsite/MyWebsite/Lists/MyList/Attachments/1
LeafName: MyFile.zip
Content: the content of the file.
| Id | Unique identifier of a file |
| DirName | The full name of the folder where the file is stored. |
| LeafName | The name of the file or folder depending what kind of data the row contains. |
| Type | Indicate if the row contains a file or a folder. 1 = folder, 0 = file. |
| ListId | The list identifier that owns the file |
Typical/Helper Queries
It’s not really easy to explain the structure, so it will surely not be easy to understand it correctly (misinterpretation, etc.). When we’re talking about database, sometimes nothing explain better than sample queries…
Note: in the queries, information in bold should be replace by actual values.
Retrieve a specific website based on its name:
SELECT * FROM Webs WHERE Title = ‘**MyWebsite**’;
Retrieve all lists owned by a website:
SELECT * FROM Lists WHERE tp_WebId = ‘**MyWebsiteIdentifier**’;
Retrieve all records owned by a list:
SELECT * FROM UserData WHERE tp_ListId = ‘**MyListIdentifier**’;
Retrieve all attachments for all items of a list:
SELECT content.*
FROM Lists l
INNER JOIN Docs root ON (root.Id = l.tp_RootFolder)
INNER JOIN Docs content ON (content.ListId = root.ListId)
WHERE l.tp_ID = ‘**MyListIdentifier**’
AND content.DirName LIKE root.DirName + ‘/’ + root.LeafName + ‘/Attachments’ + ‘/%’;
Retrieve all attachments for a list item:
SELECT content.*
FROM Lists l
INNER JOIN Docs root ON (root.Id = l.tp_RootFolder)
INNER JOIN Docs content ON (content.ListId = root.ListId)
WHERE l.tp_ID = ‘**MyListIdentifier**’
AND content.DirName = root.DirName + ‘/’ + root.LeafName + ‘/Attachments’ + ‘/’ + ‘**MyListItemIdentifier**’;
Retrieve all files for a document library:
SELECT * FROM Docs WHERE ListId = ‘**MyListIdentifier**’;
Retrieve the file for a document library item:
SELECT * FROM Docs WHERE ListId = ‘**MyListIdentifier**’ AND DoclibRowId = **MyListItemIdentifier**;
Does anyone try to put some files into document library by sql command?
It is never safe to bypass an API to write directly in the native format (in this case, SQL Server) of any software unless you are authorized and have documentation provided by the authors of the software.
What happens if you just miss a single bit update somewhere in the database that the API would have set? The system may become unstable. One may try to use the Sharepoint stored procedures, but again, what would be the call sequence for, for example, adding a new document within a folder of a document library?
Why would you need to write directly to the Sharepoint database? perhaps there is a workaround (ex: A DTS script calling a program that uses the Sharepoint API)
Pascal.
That is an interesting article which has given me food for thought.
I was just wondering if you would have any pointers for deleting a folder via the database as the Sharepoint app will not allow the deletion of this particular folder. (Fallen foul of the don’t call a folder “Forms” issue that MS are trying to get a hotfix for)
Hi,
I have nothing handy, but let me come back to you as I may find something of have an alternative.
Does your document library has a lot of content? Or is it connected to many other objects?
Is it Sharepoint 2003 or Sharepoint 2007?
Pascal.
Hi,
This article was tremendously helpful to me to obtain data about the Sharepoint 2003 Intranet I inherited, thank you.
Unfortunately the environment is approx 90% Areas…
You very clearly state the above is: “not about the storage of Sharepoint Portal Areas (my guess is it should look similar)”.
Could you please direct me to a link that might provide the same information as you provided above for site about areas…
Need to know the dB where the Area data is stored, also *_sites?
Need the table names of what would be the equivalent to those listed above.
Any assistance greatly appreciated.
Best,
Paula
Hi Paula,
I am not aware of a similar site that talks about the database tables of Areas in an SPS site. Also, I do not have access to a Portal site so I can’t give more information. What you could do is look for stored procedures (perhaps there is one with a meaningful name like GetAreaDetails).
Most of the sharepoint API uses Stored Proc, these could reveal how areas are stored.
Pascal.
Hi Pascal,
Great article. Helps in understanding the DB schema.
I have text files stored in Doc Lib. Is there any way to get handle to those via SQL and save them back after manipulation? I’ve thousands of them in and writing something using API to get binary, manipulate and save back would be time consuming. I’m fearful about, what will happen to version history, what will happen if someone has kept a version checked out etc issues.
Thanks for the great insight.
Mangesh Kamath
Pascal,
Do you have any information on how I can get a list of all users, and which sub sites they are a member of (and ideally, what there membership group is?)
It’s got something to do with the [UserInfo] tables, the [Webs] table and the [WebMembers] table.
Ideally the data should be like:
‘guy wicks’, ‘ad\guywicks’, ‘Site1′, ‘Guest’
‘guy wicks’, ‘ad\guywicks’, ‘Site2′, ‘Contributor’
etc.
Wow, Your article answered one burning question I have been chasing for a long time.
Now that I can find the url to an attachement (e.g An AVI) on a list item, How do I dynamically display its preview everytime a corresponding list item is displayed?
Eme
Hi, I can’t tell.
I hope you are not directly querying the database from within your webparts? I wouldn’t recommend this as if Microsoft changes its structure within an update, your code may start to fail…
Pascal.
Excellent writing…
I have a number of duplicate calendar list records that I have been able to isolate. Would running a deletion SQL command to remove from the UserData table be bad?
There is a lot of linked information into a Sharepoint Database. I would never feel safe about altering the database directly. I’d rather write a little program that uses the sharepoint API to automate it in case it cannot be done manually.
Pascal.
Hi;
Thanks for this very helpfull article. I used it alot. But I have a problem that I couldn`t solve. By using information from your article I wrote couple of scripts that eliminate spaces from DirName (Directory Names) and LeafName in Docs Table. Because we had a problem with long file and folder names. When I look at them from SQL they seem to be changed but now I can`t see the files that I changed from sql in SP. And Folders names that I changed didn`t make difference in SP. Can you think any solution for this issue.
By the way I`m working on SP 2003. Right now I reset the content index from central edministration and start full update. I`ll let you now if it makes difference.
Even if the storage engine of sharepoint is a format that we can query and update, this is still a proprietary format. Updating the content directly is at your own risks.
The information available in this post was gathered by reverse-engineering when I built the sharepoint database exporter. That makes me able to read content directly from database with a minimum level of confidence, but it ain’t the case for writing. They way Sharepoint handles its folder hierarchy is not totally clear.
Since you have a working environment, I strongly suggest to have a .NET developer create a little job that can do it using the Sharepoint API.
Now if you still want to continue your way, I remember that it seemed that all database operations in sharepoint were wrapped into stored procedures. At least, if you can find one that you could use to perform your operations that would be safer. Otherwise if you can isolate the code that you need from these stored proc, that could give you the “how-to”.
Pascal.