
Friday, November 16, 2007

Binaries in TFS

We thought we were being brilliant when we decided to store our build output binaries in TFS. Clearly this would make it easy for developers to get the latest build, just by using "Get Latest" on the folder. And TFS only stores the latest version plus a reverse delta of the changes (even for binaries), so each build should only represent a nominal increase in database size, right?

We quickly discovered that our TfsVersionControl database was growing by 1.7 GB every night. But how could this happen?

Well, what happened is described in this blog post. By default, TFS will not calculate the delta for any file over 16 MB. While it sounds reasonable, we did have about 4 binaries that exceeded this limit. Unfortunately, instead of - say - reverting to the VSS behaviour and only storing the latest version of the file, TFS simply stores a full copy of the file for every version.
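A quick way to see which files in a build drop fall over the limit is a small script along these lines. It is only a sketch in Python: the drop path is hypothetical, and it assumes the default limit is 16 MB counted as 16 * 1024 * 1024 bytes.

```python
import os

# Assumption: the default limit is 16 MB, counted as 16 * 1024 * 1024 bytes.
DELTA_LIMIT_BYTES = 16 * 1024 * 1024


def files_over_limit(root, limit=DELTA_LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit, biggest first."""
    oversized = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical drop location; replace with the real build output folder.
    for path, size in files_over_limit(r"\\buildserver\drops\latest"):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

Any file it lists would be stored as a full copy on every check-in under the default setting.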

The 16 MB limit is configurable (though not officially documented, and thus not officially supported...) by adding the following key to the Web.config for VersionControl, which allows delta calculation for files up to 100 MB:

<add key="deltaMaxFileSize" value="104857600" />

We found out, however, that delta calculation and compression become far less efficient as the size of the binary file increases, meaning we were only saving about 60% of the space taken by the full binaries.
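The arithmetic behind the key and the savings can be checked with a few lines of Python. This is a rough sketch under two assumptions: that the deltaMaxFileSize value is a byte count (which the number suggests), and that the whole 1.7 GB of nightly growth came from the oversized binaries. The function names are made up for illustration.

```python
BYTES_PER_MB = 1024 * 1024


def delta_max_file_size(megabytes):
    """Byte count to use as the deltaMaxFileSize value for a limit given in MB."""
    return megabytes * BYTES_PER_MB


def nightly_growth_gb(full_copy_gb, delta_saving):
    """Space still added per night if deltas save the given fraction of full copies."""
    return full_copy_gb * (1.0 - delta_saving)


print(delta_max_file_size(16))   # 16777216, the assumed default
print(delta_max_file_size(100))  # 104857600, the value in the key above
print(round(nightly_growth_gb(1.7, 0.60), 2))  # 0.68
```

Under those assumptions, a 60% saving would still leave roughly 0.68 GB of growth every night.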

We decided to stop pushing binaries into TFS at this point, and instead to use a batch process to pull the latest binaries from the build drop location. But we still had the problem of the existing binaries bloating the database.
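A pull process of this kind could look something like the following. This is a hedged sketch in Python rather than an actual batch file, and the function name and folder arguments are hypothetical. It copies a file only when the local copy is missing or differs in size or modification time.

```python
import shutil
from pathlib import Path


def pull_latest_binaries(drop_dir, local_dir):
    """Copy files from drop_dir into local_dir when missing or changed; return copied paths."""
    drop, local = Path(drop_dir), Path(local_dir)
    copied = []
    for source in drop.rglob("*"):
        if not source.is_file():
            continue
        target = local / source.relative_to(drop)
        src_stat = source.stat()
        # Skip files whose size and modification time already match.
        if target.exists():
            dst_stat = target.stat()
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied
```

Running it a second time against an unchanged drop folder copies nothing, so it is cheap to schedule after every build.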

After some study of the tables and some experimentation, we found there was only one table we had to delete from in order to clean up the binaries without breaking TFS functionality. The SQL script looked something like this:

DELETE FROM tbl_Content
WHERE tbl_Content.FileId IN
    (SELECT tbl_File.FileId FROM tbl_File
     INNER JOIN tbl_Namespace ON
         tbl_File.FileId = tbl_Namespace.FileId
     WHERE tbl_Namespace.FullPath LIKE '<TFS path>')

It took 2 hours to run this script on our 20 GB database, and after it had finished and DBCC SHRINKDATABASE was run, our database was back down to a manageable size of under 5 GB.

2 comments:

Frank Kish said...

Did you mean 16Gb or 16Mb?

The post below says 16Mb

http://teamfoundation.blogspot.com/2007/07/maximum-size-of-file-under-source.html

RTA said...

You are correct, I did in fact mean 16 MB and I'm not sure how I missed that - twice. Thank you for pointing it out, I have fixed it now.