We thought we were being brilliant when we decided to store our build output binaries in TFS. Clearly this would make it easy for developers to get the latest build, just by using "Get Latest" on the folder. And TFS only stores the latest version, plus a reverse-delta of the changes (even for binaries), so each build should only represent a nominal increase in database size, right?
We quickly discovered that our TfsVersionControl database was growing by 1.7 GB every night. But how could this happen?
Well, what happened is described in this blog post. By default, TFS will not calculate the delta for any file over 16 MB. While it sounds reasonable, we did have about 4 binaries that exceeded this limit. Unfortunately, instead of - say - reverting to the VSS behaviour and only storing the latest version of the file, TFS simply stores a full copy of the file for every version.
The 16 MB limit is configurable (though not officially documented, and thus not officially supported...) by adding the following key to the Web.config for VersionControl, which allows delta calculation for files up to 100 MB:
<add key="deltaMaxFileSize" value="104857600" />
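As a quick check of the units, here is a minimal sketch that converts the value in the key into megabytes. It assumes the setting is expressed in bytes, which is consistent with the 16 MB default but is not something we have seen documented; the helper name to_mib is made up for illustration.

```python
# Assumption: deltaMaxFileSize is a byte count (not officially documented).
MIB = 1024 * 1024


def to_mib(size_in_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / MIB


default_limit = 16 * MIB        # the 16 MB default limit, in bytes
configured_limit = 104857600    # the value used in the Web.config key

print(default_limit)             # 16777216
print(to_mib(configured_limit))  # 100.0
```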
We found out, however, that delta calculation and compression become far less efficient as the size of the binary file increases, so we were saving only 60% of the space the full binaries would have taken.
We decided to stop pushing binaries into TFS at this point, and instead to use a batch process to pull the latest binaries from the build drop location. But we still had the problem of the existing binaries bloating the database.
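For illustration only, a pull step of this kind could be sketched as follows. This is not our actual batch process: it is a Python sketch that assumes each build lands in its own subfolder of the drop location and that the newest folder is the one modified most recently. The function names and any paths you pass in are hypothetical.

```python
import shutil
from pathlib import Path


def latest_build_dir(drop_root: Path) -> Path:
    """Return the most recently modified build folder under drop_root."""
    builds = [p for p in drop_root.iterdir() if p.is_dir()]
    if not builds:
        raise FileNotFoundError(f"no build folders found in {drop_root}")
    return max(builds, key=lambda p: p.stat().st_mtime)


def pull_latest(drop_root: Path, local_dir: Path) -> Path:
    """Copy the newest build folder's contents into local_dir."""
    source = latest_build_dir(drop_root)
    # dirs_exist_ok (Python 3.8+) lets repeated runs overwrite older copies.
    shutil.copytree(source, local_dir, dirs_exist_ok=True)
    return source
```

A developer would call pull_latest with the drop share and a local folder of their choice. Files left over from older builds are not removed by this sketch, so a real process might clear the target folder first.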
After studying the tables and doing some experimentation, we found there was only one table we had to delete from in order to clean up the binaries without breaking TFS functionality. The SQL script looked something like this:
-- FileId is qualified in the subquery because both joined tables have that column.
DELETE FROM tbl_Content
WHERE tbl_Content.FileId IN
    (SELECT tbl_File.FileId FROM tbl_File
     INNER JOIN tbl_Namespace ON
         tbl_File.FileId = tbl_Namespace.FileId
     WHERE tbl_Namespace.FullPath LIKE '<TFS path>')
It took 2 hours to run this script on our 20 GB database, and after it had finished and DBCC SHRINKDATABASE was run, our database was back down to a manageable size of under 5 GB.

2 comments:
Did you mean 16Gb or 16Mb?
The post below says 16Mb
http://teamfoundation.blogspot.com/2007/07/maximum-size-of-file-under-source.html
You are correct, I did in fact mean 16 MB and I'm not sure how I missed that - twice. Thank you for pointing it out, I have fixed it now.