
Sitecore Media Library -> Cloud

By default, Sitecore stores all media library images in the database and retrieves them on the fly when an image is requested. There are good reasons to improve on this, and much has been written on the web about how to move Sitecore’s images from the database to the cloud. The specific reasons to do so are to reduce the size of the content database, reduce page load times and reduce database hits. I’ve looked through a few options for porting images to Windows Azure Blob Storage (WABS)*, and wanted to outline the approaches I have seen people take, and how we ended up solving this problem.

*Other providers, such as Amazon S3, are also available.

The options for achieving this I have read about so far are broadly:

  1. Swap out the SqlServerDataProvider for your own subclassed provider that pumps the blob data out to Azure, using a GUID as the blob’s name.
  2. Reroute the media library to an Azure storage blob using ARR or something similar, for example using Sitecore’s own configuration to reroute.

When I came to investigate these approaches, I found slight problems with both of them, and hence developed a slightly different approach, which is outlined below.

Option 1 – Swapping out the data provider

The first approach is broadly elegant and offers some significant benefits, but unfortunately it also has some drawbacks. Being low-level, it should preserve Sitecore’s image resizing capabilities via the pipeline steps, and it allows us to remove the images entirely from the database, thereby minimizing the size of the database and making it easier to move around. It would also allow logic to be added that pulls images from Azure only if they exist there, and falls back to the database when no image has been uploaded, making it more robust.

The main drawback with this approach is that because we are hooking into quite a low-level part of Sitecore, in the SqlServerDataProvider, the functions to read / write the blob only get limited information about it: a GUID which identifies that blob in the SQL table, and the data itself. This leaves a problem whereby once all your images are published to the cloud, they don’t have the same item hierarchy (folder path) that would have been present in Sitecore, and worse, they don’t necessarily have the correct extension. So whilst this solution works acceptably, it’s not easy to see what has happened if an image is missing, and eventually you will have one container with potentially thousands of unstructured images in it, all named by GUIDs.

I spent some time trying to amend this solution to save / retrieve the images via a path rather than a GUID, and there are some options here. Remember that the GUID you have is not the ID of the media item. It is the ID of the blob in the database, so it’s not so easy to get from it to the item in order to look up the path.

The first option is to look up the path from the GUID via a database hit. There’s some SQL below which should do it, but I don’t like this solution. You’re getting rid of one database hit and introducing another, and it feels “dirty”. There are further options, such as creating and maintaining some sort of lookup and caching these hits, but the whole thing starts to feel pretty messy at this point.

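-- Find the item(s) whose blob field points at a given blob ID.
SELECT TOP 100
    *
FROM
    Items I
    -- Shared field that holds the blob ID for the item
    JOIN SharedFields S ON S.ItemId = I.ID AND S.FieldId = '{40E50ED9-BA07-4702-992E-A912738D32DC}'
    LEFT JOIN Blobs B ON S.Value = B.BlobId
WHERE
    -- The blob ID handed to the data provider
    S.Value = '{B018B71D-681E-4771-88E6-EFF99994F979}'
ORDER BY
    I.Created DESC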
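
If you did go down the lookup-and-cache route, it might look something like the sketch below. This is only an illustration and not something we put into production: BlobPathCache and the lookupPath delegate are made-up names, and the delegate is assumed to wrap a query like the SQL above and return the media item's path.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers blob ID -> media path lookups so repeated
// requests for the same blob do not hit the database again.
public sealed class BlobPathCache
{
    private readonly ConcurrentDictionary<Guid, string> _paths =
        new ConcurrentDictionary<Guid, string>();

    private readonly Func<Guid, string> _lookupPath;

    // lookupPath is an assumption: it should run the database query and
    // return the item's path for the given blob ID.
    public BlobPathCache(Func<Guid, string> lookupPath)
    {
        if (lookupPath == null) throw new ArgumentNullException("lookupPath");
        _lookupPath = lookupPath;
    }

    // Returns the cached path, running the lookup on a cache miss.
    public string GetPath(Guid blobId)
    {
        return _paths.GetOrAdd(blobId, _lookupPath);
    }

    // Call this when a media item is moved, renamed or deleted.
    public void Invalidate(Guid blobId)
    {
        string ignored;
        _paths.TryRemove(blobId, out ignored);
    }
}
```

Even with this in place you still have to invalidate entries whenever items move, which is a large part of why the approach feels messy.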

The second option I looked into was to try and hook into the process at a higher level where the Media item GUID is still available to use. Looking in the call stack for the ‘SetBlobStream’ function, we see the below.

[Screenshot: Sitecore call stack]

To get the full path for an item when calling SetStream, we would need to get into this call stack a bit higher – ideally at the MediaData / Media class. There is some config that looks like it might wire this up in Sitecore:

    <mediaLibrary>
      <!-- MEDIA PROVIDER
         The media provider used to generate URLs, create media items, control media caching, parse media requests, and other
         media related functionality.      
      -->
      <mediaProvider type="Sitecore.Resources.Media.MediaProvider, Sitecore.Kernel" />
      <!-- MEDIA REQUEST PREFIXES 
           Allows you to configure additional media prefixes (in addition to the prefix defined by the Media.MediaLinkPrefix setting)
           The prefixes are used by Sitecore to recognize media URLs. 
           Notice: For each custom media prefix, you must also add a corresponding entry to the <customHandlers> section 
      -->
      <mediaPrefixes>
        <!-- Example
        <prefix value="-/media"/>
        -->
      </mediaPrefixes>
      <requestParser type="Sitecore.Resources.Media.MediaRequest, Sitecore.Kernel" />
      <mediaTypes>
        <mediaType name="Any" extensions="*">
          <mimeType>application/octet-stream</mimeType>
          <forceDownload>true</forceDownload>
          <sharedTemplate>system/media/unversioned/file</sharedTemplate>
          <versionedTemplate>system/media/versioned/file</versionedTemplate>
          <metaDataFormatter type="Sitecore.Resources.Media.MediaMetaDataFormatter" />
          <mediaValidator type="Sitecore.Resources.Media.MediaValidator" />
          <thumbnails>
            <generator type="Sitecore.Resources.Media.MediaThumbnailGenerator, Sitecore.Kernel">
              <extension>png</extension>
              <filePath>/sitecore/shell/themes/Standard/Applications/32x32/Document.png</filePath>
            </generator>
            <width>150</width>
            <height>150</height>
            <backgroundColor>#FFFFFF</backgroundColor>
          </thumbnails>
          <prototypes>
            <media type="Sitecore.Resources.Media.Media, Sitecore.Kernel" />
            <mediaData type="Sitecore.Resources.Media.MediaData, Sitecore.Kernel" />
          </prototypes>
        </mediaType>

However, I found that when I changed the type that mediaData should link to, the changes had no impact. I could see my class being instantiated at points during the rendering of an image, but unfortunately it wasn’t instantiated from the Media class, which is what I needed. Looking at the class, it has an injected reference to the MediaData class, but I can’t see where I can influence this in config, and I suspect it can’t be easily done. At this point I decided that this was probably a dead end for me, and there were easier ways to get images working in cloud storage. So I moved on to looking at other options.

Option 2 – Rerouting using Application Request Routing (ARR) rewrite rules

An alternative to the above is to use some mechanism to push images to the cloud, and then reroute from the browser requests for media library URLs to the cloud, therefore bypassing Sitecore’s own media handler.

In order to achieve the first part of this solution and push images to the cloud, it made the most sense to follow the method outlined here: a publishItem pipeline step. This pipeline step is quite simple. All it does is check whether a published item is a media item and, if so, push it up to the cloud when the item has been added or updated. A code sample from our solution is below. The IImageStore interface / implementation are not provided, but hopefully it’s still clear what this is trying to do.

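using System;
using Sitecore.Data.Items;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

// Pushes published media items to cloud storage via the injected IImageStore.
public class PublishItemProcessor : Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor
{
    private readonly IImageStore _imageStore;

    // Parameterless constructor so the processor can be created from configuration.
    public PublishItemProcessor() : this(IoC.Unity.Resolve<IImageStore>())
    {
    }

    public PublishItemProcessor(IImageStore imageStore)
    {
        if (imageStore == null) throw new ArgumentNullException("imageStore");
        _imageStore = imageStore;
    }

    public override void Process(PublishItemContext context)
    {
        // Look the item up in the publishing target; ignore anything that is not media.
        var target = context.PublishOptions.TargetDatabase.GetItem(context.ItemId, context.PublishOptions.Language);
        if (target == null || !target.Paths.IsMediaItem) return;

        var mediaItem = new MediaItem(target);
        switch (context.Action)
        {
            // Added or updated: upload the media item to the store.
            case PublishAction.PublishVersion:
            case PublishAction.PublishSharedFields:
                _imageStore.Add(mediaItem);
                break;

            // Removed from the target: remove it from the store as well.
            case PublishAction.DeleteTargetItem:
                _imageStore.Remove(mediaItem);
                break;
        }
    }
}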

The IImageStore implementation here only knows that it takes a media item and publishes it to the cloud, so whether the item is being added or updated, the media in the cloud will be overwritten. Unfortunately we found a slight idiosyncrasy here, in that the DeleteTargetItem PublishAction never appears to fire. This didn’t turn out to be a significant problem. It may be necessary to add a clean-up step at a later point that goes through the Azure Storage container and removes any orphaned items, but for now the orphaned items don’t do any harm. This publish pipeline step is configured as per the article referenced above, so I won’t repeat that configuration here.
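
For what it’s worth, the core of such a clean-up step is just a set difference. The sketch below is hypothetical and we have not built it: OrphanFinder is a made-up name, and it assumes you can already list the blob names in the container and the media paths that are currently published, using the same naming scheme for both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrphanFinder
{
    // Returns the blob names that exist in the container but do not match
    // any published media path. Names are compared case-insensitively here,
    // which is an assumption; adjust it to your own naming scheme.
    public static IList<string> FindOrphans(
        IEnumerable<string> blobNamesInContainer,
        IEnumerable<string> publishedMediaPaths)
    {
        var published = new HashSet<string>(
            publishedMediaPaths, StringComparer.OrdinalIgnoreCase);

        return blobNamesInContainer
            .Where(name => !published.Contains(name))
            .ToList();
    }
}
```

Whatever this returns would then be passed to the image store for deletion, ideally after a dry run that only logs the names.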

The second part of the solution was to rewrite requests for http://<servername>/~/media/ to https://<azure_storage_name>/media/, thereby ensuring images are now served from the cloud rather than pulled from the database. We found we had one additional requirement – to still allow Sitecore to serve the images where those images are being re-sized by the server. This is largely a backwards compatibility concern, but again could be achieved using ARR. The rule that was applied is broadly as below:

    <rule name="CloudImages" stopProcessing="true">
      <match url="~/media(?:/(.+\.(?:jpg|jpeg|png|gif|bmp)))" />

      <!-- We still want Sitecore's image resizing functionality, so don't reroute requests where it is being invoked. -->
      <conditions logicalGrouping="MatchAll" trackAllCaptures="true">
        <add input="{QUERY_STRING}" negate="true" pattern="(?:^|&amp;)(?:h|w|bc|width|height)=" />
      </conditions>

      <!-- <cloud.server> is a placeholder for your storage host name. -->
      <action type="Redirect" redirectType="Permanent" url="https://<cloud.server>/media/{R:1}" />
    </rule>
    

This rule matches all images served from the media library, with the listed extensions, and serves them from a cloud server rather than the Sitecore instance. Having configured this, voila! Images are now stored in the cloud as well as the database. This allows us to take some load off the Sitecore database, with minimal interruption and fuss, and should improve page load time when Sitecore is heavily contended as well.
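
To make the rule’s behaviour easier to reason about (and to unit test), here is a rough C# equivalent of the decision it makes. This is only a sketch of the intent and not what IIS actually runs: CloudImageRule, GetRedirectUrl and the cloudBaseUrl parameter are names I have made up for illustration, and the query-string check assumes resizing parameters appear as whole parameter names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CloudImageRule
{
    // Same intent as the rule's match pattern: capture the media path of
    // requests for the listed image extensions.
    private static readonly Regex MediaPath = new Regex(
        @"~/media/(.+\.(?:jpg|jpeg|png|gif|bmp))", RegexOptions.IgnoreCase);

    // Resizing parameters that should keep the request on Sitecore.
    private static readonly Regex ResizeQuery = new Regex(
        @"(?:^|&)(?:h|w|bc|width|height)=", RegexOptions.IgnoreCase);

    // Returns the cloud URL to redirect to, or null when Sitecore should
    // serve the request itself. queryString has no leading '?'.
    public static string GetRedirectUrl(string path, string queryString, string cloudBaseUrl)
    {
        Match m = MediaPath.Match(path ?? string.Empty);
        if (!m.Success) return null;
        if (ResizeQuery.IsMatch(queryString ?? string.Empty)) return null;
        return cloudBaseUrl.TrimEnd('/') + "/media/" + m.Groups[1].Value;
    }
}
```

So a plain request for ~/media/Images/logo.png is sent to the cloud, while the same request with w=100 on the query string, or a request for a PDF, stays with Sitecore.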

Further work:

In an ideal world, the following requirements would additionally be satisfied by this solution:

  1. Backwards compatibility: Sitecore can fall back to the database where an image has failed to upload to the cloud.
  2. Image resizing / other pipeline steps can still be integrated where necessary, without fetching these images from the database.
  3. Removing or unpublishing an image deletes the redundant image from the cloud.

These requirements may be looked at as part of a refinement to this solution at some point in the future, but for now they are not considered important enough to hold us up, so we will press on with this solution. Feel free to point out other articles / approaches you consider effective in the comments section!
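
As a rough idea of what the first requirement might look like in code, here is a small sketch. It is hypothetical and not part of our solution: MediaFallback and both reader delegates are made-up names, and the cloud reader is assumed to return null or throw an IOException when the image is not available.

```csharp
using System;
using System.IO;

public static class MediaFallback
{
    // Tries the cloud store first and falls back to the database when the
    // cloud read returns null or throws an IOException.
    public static Stream Open(
        string mediaPath,
        Func<string, Stream> readFromCloud,
        Func<string, Stream> readFromDatabase)
    {
        try
        {
            Stream cloud = readFromCloud(mediaPath);
            if (cloud != null) return cloud;
        }
        catch (IOException)
        {
            // Cloud unavailable or image missing: fall through to the database.
        }
        return readFromDatabase(mediaPath);
    }
}
```

Note that this only helps where requests still pass through Sitecore. A browser that has been redirected straight to the storage account never reaches this code.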


Sitecore Azure Walkthrough and Gotchas

Walkthrough

With little documentation online on this I thought I’d share something with all the gotchas I spotted in getting it up and running.  Hope it speeds up someone else’s attempts.  The version I have running is Sitecore Azure 3.1.

Environment File

You need to request one from Sitecore as detailed in their documentation. As this can take a while it’s best to do this up-front. It took under an hour to get back to me, but their docs say to allow for up to 24 hours. The doc links you to their generic global contact us form, which isn’t too helpful. There is also an email address which might get you a response faster, with details of what to send them in the following post. But the best way I’ve found to request an environment file is via the following URL, as it captures all the fields you need. (Note: you can’t have dashes in your project name.)

Azure Pre-requisites

Unless you’ve installed SQL 2012 you WILL need to install the following:

  • Microsoft SQL Server 2012 Shared Management Objects and
  • Microsoft System CLR Types for Microsoft SQL Server 2012

Microsoft has made this part quite difficult.  Firstly they’ve changed the URL to the download so the Sitecore doc is out of date.  The actual download URL for these resources is here.

Next, they’ve chosen not to indicate the architecture of the MSI in its filename, so you may inadvertently install the x86 one when you need the x64 one. You can find out which is which by downloading the MSI and checking the Details tab of its properties.

[Screenshot: MSI properties, Details tab]

When you install these note that one is dependent on another but you can figure the order out quite easily.

You also need to install MS Azure SDK 2.0. Note: this is another gotcha. I didn’t read the doc carefully enough and went ahead and installed SDK 2.1, but the version of Sitecore Azure I was planning to use, Sitecore Azure 3.1.0 rev. 130731, was not compatible with it. I only found this out when my deploys were failing with the following:

Exception: System.ApplicationException

Message: Can't find sdk path

Source: Microsoft.ServiceHosting.Tools.MSBuildTasks

at Microsoft.ServiceHosting.Tools.Internal.SDKPaths.GetSDKPath()

Sitecore

Installing Sitecore 7.0 is fairly straightforward. Older versions of Sitecore are compatible with Azure but require some config, so I thought I’d take the path of least resistance. Note that v7 depends on .NET 4.5, so ensure that it is installed (Visual Studio 2012 users will have it already; users of earlier versions will need to install it separately). You then install the Sitecore Azure module by installing the package found on the SDN using the Package Installer. (I’m using Sitecore Azure 3.1.0 rev. 130731.zip.)

When the install completes you get a shiny new button:

[Screenshot: the new Sitecore Azure button]

This opens a tool which allows you to run your deployments.  In the process of which it will ask you to upload your environment file and also install a management certificate.  The latter process is very straightforward and doesn’t merit any observations.

Finally you should be ready to kick off a deploy.

Network Problems

When I tried this on my workstation behind a corporate firewall and a web proxy I ran into innumerable issues. It was quite clear something was getting blocked, because the XAML interface was slow to respond and hung when trying to do anything:

[Screenshot: unresponsive Sitecore Azure interface]

I was getting errors logged which pointed at network issues:

[Screenshots: logged network errors]

SQL Timeouts

The next problem I encountered was deploys failing to complete with SQL stack traces that looked like this:

ManagedPoolThread #16 16:45:00 ERROR Sitecore.Azure: Deploy database error. Retry 6

Exception: System.Data.SqlClient.SqlException

Message: A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: SSL Provider, error: 0 - The wait operation timed out.)

Source: .Net SqlClient Data Provider

at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)

at System.Data.SqlClient.SqlConnection.Open()

at Sitecore.Azure.Managers.Pipelines.DeployDatabase.TransferData.TransferDataWorker(Table table, Database targetDatabase)

at Sitecore.Azure.Managers.Pipelines.DeployDatabase.DeployDatabasePipelineProcessor.DoDeploy[T](Func`3 func, Int32 repeat, IEnumerable`1 objects, Database targetDatabase, Action`1 exceptionCallBack)

The advice I got from the helpful team at Sitecore Support was that the DefaultSQLTimeout setting in configuration was probably set too low, so I amended the default value found here:

<setting name="DefaultSQLTimeout" value="00:05:00" />

to 30 minutes (00:30:00). On redeploy I was able to successfully complete a deployment.

Missing DLLs

Subsequent to raising a ticket about it and resolving it, I noticed the missing dlls issue has been blogged about elsewhere:  http://toadcode.blogspot.co.uk/2013/04/sitecore-azure-getting-up-and-running.html

For the sake of completeness and given I am working with a different version of Sitecore than Toad’s Code, I thought I’d add what I had to do.  The missing files are as follows:

  1. System.Web.Mvc.dll (3.0.0.0)
  2. System.Web.Helpers.dll (1.0.0.0)
  3. System.Web.WebPages.dll (1.0.0.0)
  4. System.Web.WebPages.Deployment.dll (1.0.0.0)
  5. System.Web.WebPages.Razor.dll (1.0.0.0)
  6. Microsoft.Web.Infrastructure.dll (1.0.0.0)

At this stage I did not have an instance of Visual Studio running with my own code and a build pointing at my Sitecore website. Nevertheless, in order to obtain the correct versions of these (there were multiple versions on my machine and you need to choose the right ones), I opened Visual Studio, created a new ASP.NET project and added these as references, because VS does a nice job of clearly specifying which versions you are adding on the right-hand side. I then manually copied the binaries from this compiled project into my Sitecore instance and redeployed. Note: you can also RDP onto your already-deployed instances and “hot fix” them.

[Screenshot: assembly reference versions shown in Visual Studio]
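
If you would rather check the copied binaries than eyeball them, something like the sketch below could be run against your site’s bin folder before redeploying. It is only an illustration: AssemblyVersionCheck is a made-up name, and it assumes the versions in the list above are assembly versions, which is what AssemblyName.GetAssemblyName reports.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class AssemblyVersionCheck
{
    // Expected assembly versions, taken from the list of missing files above.
    public static readonly IDictionary<string, Version> Expected =
        new Dictionary<string, Version>(StringComparer.OrdinalIgnoreCase)
        {
            { "System.Web.Mvc.dll", new Version(3, 0, 0, 0) },
            { "System.Web.Helpers.dll", new Version(1, 0, 0, 0) },
            { "System.Web.WebPages.dll", new Version(1, 0, 0, 0) },
            { "System.Web.WebPages.Deployment.dll", new Version(1, 0, 0, 0) },
            { "System.Web.WebPages.Razor.dll", new Version(1, 0, 0, 0) },
            { "Microsoft.Web.Infrastructure.dll", new Version(1, 0, 0, 0) }
        };

    // Returns one message per DLL that is missing from binFolder or whose
    // assembly version differs from the expected one.
    public static IList<string> Check(string binFolder)
    {
        var problems = new List<string>();
        foreach (KeyValuePair<string, Version> pair in Expected)
        {
            string file = Path.Combine(binFolder, pair.Key);
            if (!File.Exists(file))
            {
                problems.Add(pair.Key + ": missing");
                continue;
            }

            // Reads the assembly name without loading the assembly.
            Version actual = AssemblyName.GetAssemblyName(file).Version;
            if (actual != pair.Value)
            {
                problems.Add(pair.Key + ": found " + actual + ", expected " + pair.Value);
            }
        }
        return problems;
    }
}
```

An empty result means all six files are present at the expected versions.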

When completed I now have a blank instance of a Sitecore delivery instance in the cloud:

[Screenshot: blank Sitecore delivery instance running in the cloud]