What is Version Control?

Version control is a crucial part of almost every development workflow. Whether they realize it or not, most people already practice some form of version control, whether they’re working with documents, source code, or almost any other digital file on their file system.
Some methods of version control are obviously stronger than others. For example, saving files under different names to track their versions is a form of version control, albeit a messy and highly error-prone one.
A better way is to use tooling to version our files and keep things organized, without interrupting our workflow. But when it comes to tooling, there are a few considerations to keep in mind.
So, first let’s take a step back and think about what we actually want out of version control.
1. Archive
The first thing that we want is an archive that tracks all the versions and edits made to our files.
- This gives us the ability to refer to or go back to a previous version of a file if something has gone wrong.
- Or if we want to know the history of how something has changed over time.
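To make the idea of an archive concrete, here is a minimal sketch in Python. The FileArchive class and its method names are made up for illustration and do not come from any real tool; the sketch simply keeps every saved version of one file so that we can look up an earlier version or list the full history.

```python
from dataclasses import dataclass, field


@dataclass
class FileArchive:
    """Toy archive (hypothetical): keeps every saved version of one file."""
    versions: list[str] = field(default_factory=list)

    def save(self, content: str) -> int:
        """Record a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def get(self, version: int) -> str:
        """Return the content exactly as it was at the given version."""
        if not 1 <= version <= len(self.versions):
            raise IndexError(f"no such version: {version}")
        return self.versions[version - 1]

    def history(self) -> list[tuple[int, str]]:
        """List (version number, content) pairs from oldest to newest."""
        return list(enumerate(self.versions, start=1))


archive = FileArchive()
archive.save("Draft of the report")
archive.save("Draft of the report, with a typo fxied")
archive.save("Draft of the report, with a typo fixed")

# Something went wrong in version 2, so we go back to version 1.
previous = archive.get(1)

# We can also walk the history to see how the file changed over time.
for number, content in archive.history():
    print(number, content)
```

Note that this toy archive stores a full copy of the file for every version, which is exactly the storage problem the next point deals with.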
2. Scalability
Next, we want version control that scales as our project grows. If we have to save a full copy of a file every time we make an edit, then the storage we need grows by the size of that file with every edit. This may not seem like a big deal when we’re working with documents or source code, but if we’re working with large binary files or machine learning datasets, then storage costs can grow rapidly.
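The storage arithmetic can be sketched in a few lines of Python. The file size, the number of edits, and the fixed-size chunking scheme below are all assumptions chosen for illustration; chunk-level deduplication is just one possible way to avoid storing repeated data and is not meant to describe how any particular tool works.

```python
import hashlib
import random


def full_copy_bytes(versions: list[bytes]) -> int:
    """Storage used when every saved version is kept as a complete copy."""
    return sum(len(v) for v in versions)


def deduplicated_bytes(versions: list[bytes], chunk_size: int = 1024) -> int:
    """Storage used when identical fixed-size chunks are stored only once."""
    stored: dict[str, int] = {}  # chunk hash -> chunk length
    for data in versions:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            stored[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return sum(stored.values())


# Hypothetical data: a 100 KiB binary file saved 10 times, where each
# edit only changes the first byte of the file.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(100 * 1024))
versions = [bytes([i]) + base[1:] for i in range(10)]

naive = full_copy_bytes(versions)
dedup = deduplicated_bytes(versions)

print(f"full copies:  {naive} bytes")
print(f"deduplicated: {dedup} bytes")
```

With full copies, storage grows by the whole file size on every save, even though only one byte changed. Storing only the chunks that actually differ keeps the total close to the size of a single copy.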
3. Workflow
We also want our version control system to facilitate a reasonable workflow.
- A developer’s time is one of the most valuable resources to manage, and the more time you spend worrying about versioning issues, the less time you can spend on development.
- In general, we want a tool that lets us make incremental changes to our project without constantly worrying about whether our version control system is doing what we intended.
4. Collaboration
Finally, we want to be able to collaborate with others.
- Developers rarely work on a project by themselves. Therefore, we want something that can support multiple users and provide a single source of truth to keep all the team members in sync.
Version Control in Practice
Now, let’s take a look at an example of a version control tool to see how these features work together in action.
By far the most popular version control tool for software development is Git, because it delivers almost all of the benefits we want from version control for software projects.
- Git provides us with an archive for our project, by storing all the changes made to our files in something it calls a repository.
- Each version of the project in this repository is recorded as a snapshot, and Git compresses similar content behind the scenes by storing differences (deltas), so the history stays compact and you can always restore a specific version when you need to.
- Git also introduces the concept of branches, which facilitate a developer-friendly workflow. These branches allow us to ‘branch off’ the main development path of a project to work on an idea or feature without disturbing the main state of development. When our work is ready, we can then merge it back into the main branch, moving our project along safely.
- From the collaboration perspective, Git is best used with additional services like GitHub or GitLab. These services manage the complexities of bringing work together from different developers and maintaining that single source of truth that we need.
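To see how snapshots and branches fit together, here is a deliberately simplified model in Python. The ToyRepo and Commit classes are hypothetical and are not Git’s actual implementation or API; Git’s real storage format is more involved. The sketch only captures the core idea: each commit is a snapshot that points to its parent, a branch is a named pointer to a commit, and in the simplest case merging just moves the main pointer forward.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A snapshot of the project plus a link to the commit it was based on."""
    files: tuple[tuple[str, str], ...]  # sorted (path, content) pairs
    parent: Optional[str]
    message: str

    @property
    def id(self) -> str:
        """Identify the commit by a hash of its content and ancestry."""
        payload = repr((self.files, self.parent, self.message)).encode()
        return hashlib.sha1(payload).hexdigest()


class ToyRepo:
    """Hypothetical mini repository: commits by id, branches as name -> id."""

    def __init__(self) -> None:
        self.commits: dict[str, Commit] = {}
        self.branches: dict[str, Optional[str]] = {"main": None}

    def commit(self, branch: str, files: dict[str, str], message: str) -> str:
        """Store a snapshot and move the branch pointer to it."""
        new = Commit(tuple(sorted(files.items())), self.branches[branch], message)
        self.commits[new.id] = new
        self.branches[branch] = new.id
        return new.id

    def create_branch(self, name: str, start: str) -> None:
        """A new branch is just another pointer to an existing commit."""
        self.branches[name] = self.branches[start]

    def is_ancestor(self, ancestor: Optional[str], descendant: Optional[str]) -> bool:
        """Walk parent links from descendant, looking for ancestor."""
        if ancestor is None:  # the empty history precedes every commit
            return True
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.commits[current].parent
        return False

    def fast_forward(self, target: str, source: str) -> None:
        """Merge the simple case: target has not moved since source branched."""
        if not self.is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError("branches have diverged; a real merge is needed")
        self.branches[target] = self.branches[source]

    def checkout(self, commit_id: str) -> dict[str, str]:
        """Restore the files exactly as they were in the given commit."""
        return dict(self.commits[commit_id].files)


repo = ToyRepo()
first = repo.commit("main", {"app.py": "print('hello')"}, "Initial commit")

# Branch off to work on a feature without disturbing main.
repo.create_branch("feature", "main")
repo.commit("feature", {"app.py": "print('hello, world')"}, "Improve greeting")
main_before_merge = repo.branches["main"]  # still the first commit

# When the work is ready, bring it back into main.
repo.fast_forward("main", "feature")
```

This sketch only handles the case where main has not changed since the branch was created. When both branches have new commits, a real tool has to combine the two lines of work, which is where most of the complexity of merging comes from.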
Concluding Thoughts
Despite all of the amazing things that Git does for software development, it does have its limitations. In particular, what’s good for managing code and documents isn’t always good for managing other types of data.
In our next blog post, we’ll talk more about the limitations of Git and what version control for Data Science looks like.