Canonicalizing a URL path using std::filesystem::canonical

More specifically, a file path from which we want to remove the .. and . components.

So, if you have “projects/vectorization/../gitstuff/./../../something/index.html”, canonicalizing the string reduces it to “something/index.html”.

We have a couple of options to do this.

Option 1 – use std::filesystem
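Since C++17, you can take a path and canonicalize it using std::filesystem::canonical. It returns an absolute path with symlinks, .. and . resolved, and it requires the path to exist.

For paths that do not exist, or if you just want to experiment, you can use std::filesystem::weakly_canonical to remove .. and . instead. Note that if the leading part of the path exists on disk, that part is resolved to an absolute path.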

It’s really simple to work with, and here’s an example of how to use it:

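#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;
// Streaming a path to std::cout prints it wrapped in double quotes
void option_one(const std::string& path)
{
  const auto fs_path   = fs::path(path);
  const auto canonical = fs::weakly_canonical(fs_path);
  std::cout << canonical << '\n';
}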
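
To make the difference between the two calls concrete, here is a minimal sketch. The helper names strictly_canonicalizable and lexical_only are made up for this illustration and are not part of std::filesystem. The first uses the non-throwing std::error_code overload of canonical to check whether a path can be resolved on disk. The second uses path::lexically_normal, which only rewrites the string and never touches the file system, so it may be the better fit when the input is a URL path rather than a real file.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true if fs::canonical can resolve the path,
// which requires the path to exist on this machine.
bool strictly_canonicalizable(const std::string& path)
{
  std::error_code ec;
  fs::canonical(fs::path(path), ec);  // non-throwing overload, reports via ec
  return !ec;
}

// Hypothetical helper: purely lexical clean-up of `..` and `.`,
// with no file system access at all.
std::string lexical_only(const std::string& path)
{
  return fs::path(path).lexically_normal().generic_string();
}
```

With the example string from the top of the post, lexical_only returns something/index.html, while strictly_canonicalizable returns false for any path that is not actually present on disk.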

Option 2 – use a stack
We can use a stack to hold the path components that we find while walking through the string.

This is a lot more involved than Option 1, but it works if C++17 is not available in your project:

#include <iostream>
#include <stack>
#include <string>

/**********************************************************/
std::string option_two(const std::string& path)
{
  // Don't work on an empty path
  if (path.empty())
    return "";

  // Use a stack to hold each path component
  auto path_components = std::stack<std::string>();

  // Use 2 variables to hold the beginning and the end of a path component string
  // initialized to the beginning of the string, and the first slash found
  auto beginning = std::string::size_type{0};
  auto end = path.find("/", beginning);

  // Now, walk up the string, gathering each path component
  while (end != std::string::npos)
  {
    // Check if the path component is a `..` or `.`
    const auto item = path.substr(beginning, end - beginning);

    // If it's a `..`, pop the previous component; keep the `..` when there is nothing
    // left to cancel. Ignore `.` and push everything else
    if (item == ".." && !path_components.empty() && path_components.top() != "..")
      path_components.pop();
    else if (item != ".")
      path_components.push(item);

    // Set our variables to the current slash position, and the next found slash
    beginning = end + 1;
    end = path.find("/", beginning);
  }

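  // Handle the last path component, if we have a trailing one.
  // A trailing `..` must cancel the previous component, just like inside the loop
  if ((path.length() - beginning) > 0)
  {
    const auto last = path.substr(beginning, path.length() - beginning);
    if (last == ".." && !path_components.empty() && path_components.top() != "..")
      path_components.pop();
    else if (last != ".")
      path_components.push(last);
  }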

  // Reverse the stack so that popping yields the components in left-to-right order
  std::stack<std::string> rpath_components;
  while (!path_components.empty())
  {
    rpath_components.push(path_components.top());
    path_components.pop();
  }

  // Append the path components to our string, delimited with `/`
  std::string canonical;
  while (!rpath_components.empty())
  {
    canonical += rpath_components.top();
    rpath_components.pop();
    canonical += "/";
  }

  // Remove the last trailing forward slash
  canonical = canonical.substr(0, canonical.length() - 1);

  std::cout << canonical << "\n";
  return canonical;
}
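
If the double reversal feels clumsy, the same idea can be sketched with a std::vector acting as the stack, so the components can be joined front to back. This is only an illustration under a few assumptions of mine: the name normalize_with_vector is made up, / is the only separator, empty components (from //) are dropped, a leading / is preserved, and a .. with nothing left to cancel is kept for relative paths. It sticks to C++11 features.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the stack approach: a vector is used as the stack,
// so no second container is needed to restore the original order.
std::string normalize_with_vector(const std::string& path)
{
  std::vector<std::string> parts;
  const bool absolute = !path.empty() && path[0] == '/';

  std::string::size_type beginning = 0;
  while (beginning <= path.length())
  {
    // Find the end of the current component (next slash or end of string)
    std::string::size_type end = path.find('/', beginning);
    if (end == std::string::npos)
      end = path.length();

    const std::string item = path.substr(beginning, end - beginning);
    if (item == "..")
    {
      if (!parts.empty() && parts.back() != "..")
        parts.pop_back();       // cancel the previous component
      else if (!absolute)
        parts.push_back(item);  // nothing to cancel: keep a leading `..`
    }
    else if (!item.empty() && item != ".")
    {
      parts.push_back(item);
    }
    beginning = end + 1;
  }

  // Join the components with `/`, restoring the leading slash if there was one
  std::string result = absolute ? "/" : "";
  for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i)
  {
    if (i != 0)
      result += '/';
    result += parts[i];
  }
  return result;
}
```

For the example path from the top of the post this returns something/index.html, and an input such as ../../a is left unchanged because there is nothing for the .. components to cancel.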

So, you can see there are 2 ways of canonicalizing a path, and plenty more if you put your mind to it.

Happy coding!

Refs
https://en.cppreference.com/w/cpp/filesystem/canonical
