You're Deploying it Wrong! - AS Edition (Part 2)
This is part 2 of the Analysis Services DevOps blog series. Go to part 1
The first thing we need to align before getting started, is what branching strategy to use. While this sounds boring, it’s actually quite important, because it will dictate what the daily development workflow will be like, and in many cases, branches will tie directly into the project methods used by your team. For example, using the agile process within Azure DevOps, your backlog would consist of Epics, Features, User Stories, Tasks and Bugs (well, hopefully not too many Bugs, since we’re going to automate testing as part of our DevOps journey).
In the agile terminology, a User Story is a deliverable, testable piece of work. The User Story may consist of several Tasks, that are smaller pieces of work that need to be performed, typically by a developer, before the User Story may be delivered. In the ideal world, all User Stories have been broken down into manageable tasks, each taking only a couple of hours to complete, adding up to no more than a handful of days for the entire User Story. This would make a User Story an ideal candidate for a so-called Topic Branch, where the developer could make one or more commits for each of the tasks within the User Story. Once all tasks are done, you want to deliver the User Story to the client, at which time the topic branch is merged into a delivery branch (for example, a “Test” branch), and the code deployed to a testing environment.
Determining a suitable branching strategy depends on many different factors. In general, Microsoft recommends the Trunk-based Development (video) strategy, for agile and continuous delivery of small increments. The main idea is to create branches off the Master branch for every new feature or bugfix (see image below). Code review processes are enforced through pull requests from feature branches into Master, and using the Branch Policy feature of Azure DevOps, we can set up rules that require code to build cleanly before a pull request can be completed.
However, such a strategy might not be feasible in a Business Intelligence development team, for a number of reasons:
- New features often require prolonged testing and validation by business users, which may take several weeks to complete. As such, you will likely need a user-faced test environment.
- BI solutions are multi-tiered, typically consisting of a Data Warehouse tier with ETL, a Master Data Management tier, a semantic layer and reports. Dependencies exist between these layers, that further complicate testing and deployment.
- The BI team may be responsible for developing and maintaining several different semantic models, serving different areas of business (Sales, Inventory, Logistics, Finance, HR, etc.), at different maturity stages and at varying development pace.
- The most important aspect of a BI solution is the data! As a BI developer, you don’t have the luxury of simply checking out the code from source control, hitting F5 and having a full solution up and running in the few minutes it takes to compile the code. Your solution needs data, and that data has to be loaded, ETL’ed or processed across several layers to make it to the end user. Including data in your DevOps workflows could blow up build and deployment times from minutes to hours or even days. In some scenarios, it might not even be possible, due to ressource or economy constraints.
There’s no doubt that a BI team would benefit from a branching strategy that supports parallel development on any of the layers in the full BI solution, in a way that lets us mix and match features that are ready for testing. But especially due to the last bullet point above, we need to think carefully about how we’re going to handle the data. If we add a new attribute to a dimension, for example, do we want to automatically load the dimension as part of our build and deployment pipelines? If it only takes a few minutes to load such a dimension, that would probably be fine, but what if we’re adding a new column to a multi-billion row fact table? And if developers are working on new features in parallel, should each developer have their own development database, or how do we otherwise prevent them from stepping on each others toes in a shared database?
There’s no easy solution to the questions above - especially when considering all the tiers of a BI solution, and the different constellations and prefered workflows of BI teams across the planet. Also, when we dive into actual build, deployment and test automation, we are going to focus mostly on Analysis Services. The ETL- and database tiers have their own challenges from a DevOps perspective, which are outside the scope of this blog. But before we move on, let’s take a look at another branching strategy, and how it could potentially be adopted to BI workflows.
For this blog series, we’re going to use the GitFlow branching strategy by Vincent Driessen. If you’re not already familiar with this pattern, I strongly recommend reading the article.
Implementing a branching strategy similar to this, can help solve some of the DevOps problems typically encountered by BI teams, provided you put some thought into how the branches correlate to your deployment environments. In an ideal world, you would need at least 4 different environments to fully support GitFlow:
- The production environment, which should always contain the code at the HEAD of the master branch.
- A canary environment, which should always contain the code at the HEAD of the develop branch. This is where you typically schedule nightly deployments and run your integration testing, to make sure that the features going into the next release to production play nicely together.
- One or more UAT environments where you and your business users test and validate new features. Deployment happens directly from the feature branch containing the code that needs to be tested. You will need multiple test environments if you want to test multiple new features in parallel. With some coordination effort, a single test environment is usually enough, as long as you carefully consider the dependencies between your BI tiers.
- One or more sandbox environments where you and your team can develop new features, without impacting any of the environments above. As with the test environment, it is usually enough to have a single, shared, sandbox environment.
Again, I’d like to emphasize that there’s really no “one-size-fits-all” solution to these considerations. Maybe you’re not building your solution in the Cloud, and therefore don’t have the scalability or flexibility to spin up new resources in seconds or minutes. Or maybe your data volumes are very large, making it impractical to replicate environments due to resource/economy/time constraints. Before moving on, also make sure to ask yourself the question of whether you truly need to support parallel development and testing. This is rarely the case for small teams with only a few stakeholders, in which case you can still benefit from CI/CD, but where GitFlow branching might be overkill.
Even if you do need to support parallel development, you may find that multiple developers can easily share the same development or sandbox environment, without encountering too much trouble.
Start off with a new Project in Azure DevOps. Provide a name and a description for you project and use Git for version control. I prefer the Agile work item process, but you can use whichever methodology you prefer.
Once the project is created, the first thing we need to do, is initialize the Git repository, set up branches and branch policies. If you click on “Repos” in the navigation pane on the left side of the screen, you’ll see your options for initializing the repository:
If you have an existing Git or TFVC code repository somewhere else, you can Import it to the new repository. If not, just initialize an empty repository with a README and .gitignore file, or clone the empty repository to your computer and copy your existing code into the cloned repository. To clone the repository, you can use Visual Studio Team Explorer, or any other Git-capable tool of your choice. For example, to clone a repository using the Git command line, you simply write:
git clone https://tabulareditor.visualstudio.com/DevOpsBI/_git/DevOpsBI
…where the URL should be the URL of your new Azure DevOps Git repository (aka. the remote repository). This will create a new folder on your machine, with the same name as the Git repository. After you’ve copied the files and folders you need into this folder, you need to Stage, Commit and Push. Again, you can use Visual Studio for these operations, or you can use the Git command line:
git add . git commit -m "Initial commit of code base" git push
Once you have the initial code in your remote repository, navigate to “Branches” under “Repos” in the navigation pane, and create the develop branch from the existing master branch. Right-click on the develop branch and choose “Set as compare branch”. Then, go to “Project settings” > “Repositories”, expand Branches, right-click on the develop branch and choose “Set as default branch”.
Set up branch policies, by going back to the “Branches” area under “Repos” in the navigation pane. Let’s set up the policies on master first, by right-clicking on the master branch and choosing “Branch policies”. For master, we will limit the merge type to only allow Squash merges. This ensures that the commit history on master will be as simple as possible (a release branch is squashed into a single commit on master, upon deployment to production).
For the develop branch, I also generally recommend limiting merge types to only Squash merges. In addition, you should require that all pull requests are checked for linked work items, for improved traceability. Remaining policies can be set according to your own preferences.
Later on, we will also add build validation as a policy on both branches, to ensure that we only ever commit code that actually works - but before we can do that, we need to set up our build pipelines. This is the topic for the next chapter of this blog.
Once you have the develop and master branches set up according to this post, you can start developing. In general, you want to create a new branch from your develop branch, whenever you start working on a new feature. Remember that branching is a very inexpensive operation in git, so we always create a new branch whenever we start to build something new. You can even branch off an existing feature branch, if you need to experiment a little or simultaneously move the feature in two different directions.
The best way to create a new branch, is to start from the Feature or User Story, to which the branch relates:
This way, the branch will be linked to this work item, which was one of the requirements we set for pull requests into the develop branch. Also, since we marked the develop branch as “default”, it only takes a split second to create a new branch this way. Make sure to supply a proper name for the new branch (preferably using slashes for branch “folders”). Consider setting up a policy that requires branch folders, to ensure consistency.
The next step is to pull the new branch to your local computer or wherever you’re actually going to develop things. Again, use your tool of choice for the Git operations. For example, with the Git command line you would simply do:
git pull git checkout Feature/DailyInventory
Here, “Feature/DailyInventory” should be the name of the branch you want to work on. Regularly stage, commit and push your code using the same commands as we used above, when initializing the repository.
Once you’re ready to integrate your changes from your feature branch and into the develop branch, use the Azure DevOps UI to create a pull request:
Complete the pull request. By default, this will delete the feature branch and complete the work items linked to it, upon completion. In the next chapter, we will see how code in the repository can be automatically deployed using Tabular Editor and a build pipeline.