Remember That Guy, whom I wrote about nearly a year ago? He's back. Actually, he didn't go anywhere; he's been here the whole time. It's just that I've only recently noticed a new character trait in him: he's an Architecture Astronaut.
The problem that brought this out is how to deploy an optional utility to clients without making it part of our main web-based application, and how to keep its version matched to the application, since the utility is not guaranteed to be backwards-compatible with the application it is used with.
ClickOnce was considered, but ruled out for two reasons: 1. There's no reliable way to ensure clients can use ClickOnce, since the servers that run our application may be completely disconnected from the Internet, and 2. ClickOnce only gives you the most recent version of the utility; you cannot easily select a specific version.
That Guy's solution? Package the utility's ClickOnce source, and deploy it along with our web application to the clients' servers. When they want to install this utility, they would hit up the location of the ClickOnce application and install it. That way, if the client installs an updated version of the web application, the next time they run the utility it would automatically update.
This is a clever solution, I will grant him that. But it is not the correct one. In fact, I'm not even sure what this solution actually solves, since part of the original reason for looking at different deployment options was that we don't want to pre-package the utility with our main application.
Luckily, I'm the one in charge of figuring out the deployment side of things. My choice solution? Deploy the utility using a basic MSI installer which we provide to clients on request, and add a very basic version checking mechanism to the utility that prompts the client if there is a version mismatch.
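To give a rough idea of what a "very basic version checking mechanism" could look like, here is a minimal sketch. It assumes both versions are available as System.Version values and that "compatible" means the major and minor numbers match. The class name, that rule, and how the utility learns the application's version are all illustrative assumptions, not a description of the real implementation.

```csharp
using System;

// Hypothetical helper; the name and the compatibility rule are assumptions.
public static class VersionCheck
{
    // Treats two versions as compatible when major and minor numbers match.
    public static bool IsCompatible(Version utility, Version application)
    {
        return utility.Major == application.Major
            && utility.Minor == application.Minor;
    }

    // Builds the message shown to the client, or returns null if nothing needs to be said.
    public static string GetMismatchMessage(Version utility, Version application)
    {
        if (IsCompatible(utility, application))
            return null;
        return "This utility is version " + utility
            + ", but the application is version " + application
            + ". Please request the matching installer.";
    }
}
```

The utility would call GetMismatchMessage at startup and prompt the client only when the result is not null.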
Occam's Razor: The simplest solution is usually the correct one.
Tuesday, August 26, 2008
Wednesday, August 13, 2008
Making Things Mesh (even with Windows Home Server)
I've been playing around with the beta of Windows Live Mesh and I must say I'm very impressed with the possibilities of this product. To break it down, the beta currently offers two main features: Remote machine access, and file and folder synchronization. Remote access is handy, but fairly straightforward and nothing new. What I really want to talk about is the folder sync functionality.
Folder synchronization has been around for a long time, too, so what makes Live Mesh so special? A few things, primarily:
1. You can select any existing folder in your file system for synchronization. For example, you can take your Windows-default Favorites folder from your PC and add it to your Mesh for synchronizing with your laptop.
2. The folder that is synced from one machine does not have to be the same folder on another. Using the above example, when you add the Favorites Mesh folder to your laptop, you are not required to use the same Windows-default Favorites folder for syncing. You can create a completely new folder anywhere on your computer that will be synchronized with the Favorites folder from your PC.
3. Folders that you add to your Mesh are also synchronized with the "cloud", up to 5 GB worth (currently). The contents of those folders are accessible from anywhere via web browser, in what is called your Live Mesh Desktop. If you exceed the 5 GB limit, contents will still be synced between devices via P2P, but will not sync with the Live Mesh Desktop.
4. You can invite anyone to share in any of your Mesh folders. That person, once signed up to Live Mesh, will have access to that folder on their Live Mesh Desktop, and if they install the Live Mesh client, can synchronize it automatically to their computer. A person only has access to the folders that you specifically invite them to share in.
Being a huge fan of Windows Home Server, I immediately thought about how I could use this to synchronize my laptop with my document share on WHS (the built-in Windows synchronization with shares is not available to Windows Vista Home Premium, only Business and Ultimate). I did read that it is possible to install the latest beta version of Mesh on Windows Server 2003 (which is what WHS runs), but that sharing the folders from the D: drive is not recommended, since all warnings about WHS say that you should access everything from the Windows Shares. I decided to investigate further, despite the warnings.
For background, WHS uses a technology called Drive Extender (DE), which lets users add disks to the server's storage pool without concern for drive letters or RAID configuration. The Drive Extender Whitepaper explains that DE uses the D: drive (a partition of the system disk) to store symbolic links to the actual files, which reside on other volumes that are not mapped to drive letters. Various background services balance the files between drives and duplicate them if that feature is enabled.
Nothing I read indicates that there are any filters operating at the Windows Share (CIFS/SMB) level. Assuming that those Windows Shares are simply run-of-the-mill shares pointing to shared folders on D:, then that means NTFS and the DE filter handle all the low-level file functions, redirecting the symbolic links to the actual file locations. Following that logic, I can't see any reason that another file sharing technology, in this case Windows Live Mesh, cannot operate on those folders the same way that the Windows shares do.
I took the dive, installed Live Mesh, and added a subfolder under my share to the Mesh. The initial synchronization to my Live Mesh Desktop was successful, and then I added it to my laptop as my Windows-default Documents folder. The synchronization there was also successful.
Finally, I ran a test by reorganizing and adding some files to my local Documents folder. Within a few minutes, those changes were reflected in the shared folder perfectly. I have been running this synchronization for about a week now, and WHS has had no problems working with those files, balancing them, or duplicating them. Apparently, my assumption was correct.
Having said and done all that, I do not recommend going against the warnings of Microsoft and the WHS community at large. Still, this experiment was extremely helpful in understanding at what levels the Drive Extender technologies work. I especially would not recommend doing anything similar with software that works at any level lower than the top-most layers of the file system; for example, keep virus scanners and disk defragmenters away from your D: drive.
It should also be noted that the Windows Live Mesh client only works when it is run interactively by a user. I tried to run it as a Service (with Administrator's credentials) using AnyServiceInstaller, but it would not sync any folders. Additionally, opening the Console from a client computer does not execute anything in the Startup folder or in the Run registry keys. To get the automatic syncing working, I added a shortcut to Live Mesh to Advanced Admin Console, so I can start it from there after any server reboot. Once the WHS Console has been opened from a client machine, it remains open in an interactive user session on the server, so Live Mesh also keeps running in the background of that interactive login. It's not a perfect solution, but it works.
I'm very interested in the possibilities presented by Live Mesh, and I'm curious to see how Microsoft plans to extend the functionality to smart phones, which they seem to be working on.
Tackling Lucene.Net - Part II
NOTE: I'm sorry that the code samples wrap. Until I figure out how to do them differently, that's the best I got.
So in Part I, I promised some detail about how to index and display line numbers in the search results. The code examples I provide may be a bit messy and are not very modular at all, but that's okay because they're short, to the point, and therefore should make fairly decent examples. I hope. I don't expect anyone to copy my code verbatim, but rather I hope that anyone who needs this functionality out of Lucene.Net will use them as the guide I didn't have when figuring this out for myself.
First, you will need to ensure that offset data is stored as part of the indexed content. When creating the Lucene.Net Field, include the Field.TermVector.WITH_POSITIONS_OFFSETS flag as a parameter:
return new Field("content", new StreamReader(filePath), Field.TermVector.WITH_POSITIONS_OFFSETS);
Originally, for generating previews, I used the Highlighter.Net contrib (which is distributed as part of Lucene.Net) to turn the search results into HTML fragments that I could assemble into a document for display. This works fine for basic display of the search results, but doesn’t provide any means of getting the offset information as line/column data, so there is no way to show line numbers as part of the results. Therefore, I had to build a means of formatting the results from scratch.
First, I created a class that “explodes” the original text into an array of lines. I won’t go into too much detail about how this class is built (it should be easy enough to figure out), but here’s the method that does the splitting, which should be fairly straightforward.
public static string[] Explode(string text)
{
    string[] explodedText = text.Replace("\r\n", "\n").Split('\n');
    // Remove the trailing empty line that occurs when the text ends with a line break.
    if (explodedText[explodedText.Length - 1].Length == 0)
        Array.Resize<string>(ref explodedText, explodedText.Length - 1);
    return explodedText;
}
And here’s the logic that gets the line and column positions of the specified offset (which is based on the original text). There is probably a more graceful way of doing this, but this was the quick and dirty method I wrote to get it working:
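public void GetPosition(int offset, out int line, out int column)
{
    int charpos = offset;
    line = column = -1;
    // _lineLength holds the length of each exploded line, without its line break.
    for (int i = 0; i < _lineLength.Length; i++)
    {
        if (charpos < _lineLength[i])
        {
            line = i;
            column = charpos;
            break;
        }
        else
            charpos -= (_lineLength[i] + 2); // +2 for the missing \r\n (assumes the original text uses \r\n line endings)
    }
}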
The method loops through each line and checks if the offset falls within that line. If it does, set the line and column out parameters and break out of the loop; otherwise, keep looking.
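As one possible take on the "more graceful way" mentioned above, here is a minimal sketch that records the starting offset of every line once and then finds the line with a binary search. This is an alternative I'm offering for illustration, not the code the search tool uses; LineIndex and its members are hypothetical names. Because it records real offsets, it does not need to assume a particular line-ending style.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to the loop-based GetPosition.
public class LineIndex
{
    private readonly int[] _lineStarts; // offset of the first character of each line
    private readonly int _textLength;

    public LineIndex(string text)
    {
        List<int> starts = new List<int>();
        starts.Add(0);
        for (int i = 0; i < text.Length; i++)
        {
            // A new line starts right after every '\n' (this covers "\r\n" as well).
            if (text[i] == '\n')
                starts.Add(i + 1);
        }
        _lineStarts = starts.ToArray();
        _textLength = text.Length;
    }

    // Converts a character offset into zero-based line and column numbers.
    public void GetPosition(int offset, out int line, out int column)
    {
        if (offset < 0 || offset > _textLength)
            throw new ArgumentOutOfRangeException("offset");
        int idx = Array.BinarySearch(_lineStarts, offset);
        if (idx < 0)
            idx = ~idx - 1; // no exact match: step back to the last line start before the offset
        line = idx;
        column = offset - _lineStarts[idx];
    }
}
```

Building the index costs one pass over the text, after which each lookup is logarithmic in the number of lines rather than linear.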
The Exploder gets called once we have our Hits object from the Lucene searcher, within a loop that gets the document for each hit. The original document is read in and exploded.
Next, we get an array of the search hit tokens, which we will use to get the location of each hit in the document, for formatting the fragment and adding line data:
List<PositionedToken> tokenPositions = GetTokenPositions(parser.GetAnalyzer().TokenStream("content", new StreamReader(filePath)), explodedText);
parser is of course the original QueryParser.
Here’s what GetTokenPositions looks like:
private List<PositionedToken> GetTokenPositions(TokenStream tokenstream, ExplodedText explodedtext)
{
    List<PositionedToken> tokenPositions = new List<PositionedToken>();
    Token token;
    while ((token = tokenstream.Next()) != null)
    {
        int line, column;
        explodedtext.GetPosition(token.StartOffset(), out line, out column);
        tokenPositions.Add(new PositionedToken(line, column, token));
    }
    return tokenPositions;
}
PositionedToken is a lightweight class that simply stores the line and column position of the start of the token, the token length, and a reference to the original Token object.
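For anyone following along, a class like that could be as small as the sketch below. The member names match how PositionedToken is used in the rest of this post, but the body is my assumption of a minimal version: it derives the length from the token's start and end offsets and needs a reference to Lucene.Net for the Token type.

```csharp
using Lucene.Net.Analysis;

// Minimal sketch; assumes the token length is the distance between its offsets.
public class PositionedToken
{
    public readonly int Line;    // zero-based line on which the token starts
    public readonly int Column;  // zero-based column of the token's first character
    public readonly int Length;  // number of characters the token spans
    public readonly Token Token; // the original Lucene.Net token

    public PositionedToken(int line, int column, Token token)
    {
        Line = line;
        Column = column;
        Length = token.EndOffset() - token.StartOffset();
        Token = token;
    }
}
```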
Based on this, it should be fairly clear that the next step will be to build some kind of preview using all the PositionedTokens to get the lines on which tokens appear and format those lines for display. My solution was to build HTMLPreviewBuilder:
public class HTMLPreviewBuilder
{
    private List<FragmentLines> _fragments;
    private ExplodedText _explodedText;
    public HTMLPreviewBuilder(List<PositionedToken> tokens, ExplodedText explodedtext)
    {
        _fragments = new List<FragmentLines>();
        _explodedText = explodedtext;
        foreach (PositionedToken token in tokens)
            _fragments.Add(new FragmentLines(token, explodedtext));
        // If for whatever reason we have no fragments, return the original text.
        if (_fragments.Count == 0)
            _fragments.Add(new FragmentLines(explodedtext));
        FormatLinesAndTokens();
    }
}
FragmentLines is a class that builds an array of lines which includes the line on which the token resides from the exploded text, and a buffer of preview lines before and after; in the example below I have simply hard-coded it to grab 2 lines before and 2 lines after:
public class FragmentLines
{
    public string[] Lines;
    public int StartLineNumber; // zero-based index of the first line in the fragment
    public int EndLineNumber;   // zero-based index of the last line in the fragment
    public PositionedToken[] Tokens;
    // Fallback fragment that covers the whole text and has no tokens.
    public FragmentLines(ExplodedText explodedtext)
    {
        Lines = explodedtext.Lines;
        Tokens = new PositionedToken[0];
        StartLineNumber = 0;
        EndLineNumber = Lines.Length - 1;
    }
    public FragmentLines(PositionedToken token, ExplodedText explodedtext)
    {
        Tokens = new PositionedToken[1] { token };
        StartLineNumber = Math.Max(0, token.Line - 2); // 2 lines prior
        EndLineNumber = Math.Min(explodedtext.LineCount - 1, token.Line + 2); // 2 lines after
        int numLines = (EndLineNumber - StartLineNumber) + 1;
        Lines = new string[numLines];
        for (int i = 0; i < numLines; i++)
            Lines[i] = explodedtext[StartLineNumber + i];
    }
}
So now our preview has an array of these, each containing a preview fragment for each token. What if other tokens are within the 2-line preview, or even on the same line, you ask? We can merge those fragments together, and I will address that in Part III. Note that Tokens in the above class is in fact an array; this is set up for this reason. For now, we'll only have one PositionedToken element in there.
Here is where the line numbers are added and the tokens formatted for HTML display:
private void FormatLinesAndTokens()
{
    // Gets the width of the string representation of the largest line number, so we can pad the line numbers appropriately.
    int maxLineNumWidth = _explodedText.LineCount.ToString().Length;
    foreach (FragmentLines frag in _fragments)
    {
        // Get the original Token, if the fragment has one (the whole-text fallback does not).
        PositionedToken token = (frag.Tokens != null && frag.Tokens.Length > 0) ? frag.Tokens[0] : null;
        // Inserts the line number on each line, and formats any tokens
        for (int lineNum = frag.StartLineNumber; lineNum <= frag.EndLineNumber; lineNum++)
        {
            string line = frag.Lines[lineNum - frag.StartLineNumber];
            int lineNumDisplay = lineNum + 1; // File line numbers start at 1.
            string lineNumPrefix = lineNumDisplay.ToString().PadLeft(maxLineNumWidth) + ": ";
            if (token != null && token.Line == lineNum)
            {
                StringBuilder lineBuilder = new StringBuilder();
                int startPos = 0;
                int endPos = line.Length;
                // Get key positions in the line so we can insert HTML
                int startPosToTokenLen = token.Column - startPos;
                int tokenEndPos = token.Column + token.Length;
                int tokenEndPosToEndPos = endPos - tokenEndPos;
                lineBuilder.Append(lineNumPrefix);
                lineBuilder.Append(EncodeForHTML(line.Substring(startPos, startPosToTokenLen)));
                lineBuilder.Append("<span style=\"background-color:#FFFF00;font-weight:bold\">");
                lineBuilder.Append(EncodeForHTML(line.Substring(token.Column, token.Length)));
                lineBuilder.Append("</span>");
                lineBuilder.Append(EncodeForHTML(line.Substring(tokenEndPos, tokenEndPosToEndPos)));
                frag.Lines[lineNum - frag.StartLineNumber] = lineBuilder.ToString();
            }
            else
                frag.Lines[lineNum - frag.StartLineNumber] = lineNumPrefix + EncodeForHTML(line);
        }
    }
}
EncodeForHTML escapes any angle-brackets and ampersands for the HTML. Now that our lines are formatted with line numbers and the search terms highlighted using more hard-coded stuff (please feel free to do one better than that), we can wrap it in an HTML document for returning to the user:
public override string ToString()
{
    StringBuilder preview = new StringBuilder("<html>");
    preview.Append("<body style=\"font-family: Courier New; font-size: 8pt; background-color: #FFFFE1\">");
    // Relies on FragmentLines overriding ToString() to return its Lines joined by line breaks (not shown above).
    foreach (FragmentLines fragment in _fragments)
        preview.Append("<pre>" + fragment.ToString() + "</pre><hr />");
    preview.Append("</body></html>");
    return preview.ToString();
}
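EncodeForHTML isn't shown anywhere above, so here is a minimal sketch of what such a helper might look like, assuming it only needs to handle the three characters mentioned (angle brackets and ampersands). The wrapper class name is hypothetical. The ampersand has to be replaced first, otherwise the entities produced for the angle brackets would get escaped a second time.

```csharp
using System;

// Hypothetical container class; only the method name comes from the post.
public static class HtmlEncoding
{
    // Escapes ampersands and angle brackets so source text displays literally in HTML.
    public static string EncodeForHTML(string text)
    {
        if (text == null)
            return string.Empty;
        return text.Replace("&", "&amp;")
                   .Replace("<", "&lt;")
                   .Replace(">", "&gt;");
    }
}
```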
Et voila! HTML preview complete with line numbers. As mentioned earlier, this will display a preview fragment for each token, regardless of overlap. In the next segment, Part III, I will show you how to merge the fragments that overlap and how to format the merged segments.
