Solving Vietnamese URL Slug Issues in Optimizely CMS 12

Optimizely CMS 12 (formerly EPiServer) is a robust enterprise platform, but its default URL segment generation often mishandles non-ASCII characters. For Vietnamese content, the CMS can strip diacritics inconsistently or drop letters such as "đ" entirely. The result is broken, unreadable URL slugs that hurt SEO and user experience.

In this technical guide, we will implement a production-ready solution to automatically generate SEO-friendly Vietnamese URL slugs whenever content is saved.

Step 1: Create the Slug Normalizer

The core logic resides in a static utility class. It uses canonical decomposition (NormalizationForm.FormD) to split each accented letter into a base character followed by combining marks, which are then removed. The Vietnamese letter "đ" (U+0111) is a distinct letter with no canonical decomposition, so FormD leaves it unchanged and we replace it manually.
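To see why the manual replacement is needed, the following minimal sketch prints what FormD does to two letters. The class and method names (DecompositionDemo, CodePoints) are illustrative only and are not part of the project.

```csharp
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Returns the UTF-16 code units of the FormD (canonically decomposed)
    // form of the input as hex labels, for example "U+0061 U+0306".
    public static string CodePoints(string text) =>
        string.Join(" ", text.Normalize(NormalizationForm.FormD)
                             .Select(c => $"U+{(int)c:X4}"));
}
```

CodePoints("ă") returns "U+0061 U+0306" (the base letter "a" plus a combining breve that the normalizer can strip), while CodePoints("đ") returns "U+0111": the letter comes back unchanged, so no combining mark is available to remove.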

File: CmsIv.Web/Business/UrlSlugNormalizer.cs

using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

namespace CmsIv.Web.Business
{
    public static class UrlSlugNormalizer
    {
        public static string NormalizeToSlug(string text)
        {
            if (string.IsNullOrWhiteSpace(text))
                return string.Empty;

            // Handle 'đ' and 'Đ' specifically as they are not decomposed by FormD
            string processedText = text.Replace('đ', 'd').Replace('Đ', 'D');

            // Step 1: Use NFD (Canonical Decomposition)
            string decomposed = processedText.Normalize(NormalizationForm.FormD);

            // Step 2: Remove combining characters (diacritical marks)
            var result = new StringBuilder();
            foreach (char c in decomposed)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    result.Append(c);
                }
            }

            string normalized = result.ToString();

            // Step 3: Lowercase, replace spaces with hyphens, and remove non-alphanumeric characters
            normalized = normalized.ToLowerInvariant();
            normalized = Regex.Replace(normalized, @"\s+", "-");
            normalized = Regex.Replace(normalized, @"[^a-z0-9\-]", string.Empty);
            normalized = Regex.Replace(normalized, @"-+", "-");
            
            return normalized.Trim('-');
        }

        public static bool IsValidSlug(string slug)
        {
            if (string.IsNullOrEmpty(slug)) return false;
            return Regex.IsMatch(slug, @"^[a-z0-9]+(-[a-z0-9]+)*$");
        }
    }
}

Step 2: Implement the Initialization Module

We use an IInitializableModule to hook into the IContentEvents.SavingContent event. This allows us to intercept the saving process and fix the URLSegment before it is persisted to the database.

File: CmsIv.Web/Initialization/VietnameseUrlSlugModule.cs

using System;
using System.Text.RegularExpressions;
using EPiServer; // ContentEventArgs
using EPiServer.Core;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using CmsIv.Web.Business;

namespace CmsIv.Web.Initialization
{
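    // [InitializableModule] is required for the CMS to discover the module at startup.
    [InitializableModule]
    [ModuleDependency(typeof(EPiServer.Web.InitializationModule))]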
    public class VietnameseUrlSlugModule : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            var contentEvents = context.Locate.Advanced.GetInstance<IContentEvents>();
            contentEvents.SavingContent += OnSavingContent;
        }

        public void Uninitialize(InitializationEngine context)
        {
            var contentEvents = context.Locate.Advanced.GetInstance<IContentEvents>();
            contentEvents.SavingContent -= OnSavingContent;
        }

        private void OnSavingContent(object sender, ContentEventArgs e)
        {
            try
            {
                if (!(e.Content is PageData pageData)) return;

                string name = pageData.Name ?? string.Empty;
                if (string.IsNullOrWhiteSpace(name) || !ContainsVietnameseCharacters(name))
                    return;

                string expectedSlug = UrlSlugNormalizer.NormalizeToSlug(name);
                string corruptedEpiSlug = GenerateCorruptedEpiSlug(name);

                // Overwrite only if URLSegment is empty or matches EPiServer's corrupted generation
                if (string.IsNullOrWhiteSpace(pageData.URLSegment) || 
                    pageData.URLSegment.Equals(corruptedEpiSlug, StringComparison.OrdinalIgnoreCase))
                {
                    pageData.URLSegment = expectedSlug;
                }
            }
            catch (Exception)
            {
                // Swallow the exception so a slug failure never blocks the save.
                // Log it here with your logging framework for monitoring.
            }
        }

        private string GenerateCorruptedEpiSlug(string name)
        {
            var normalized = name.Normalize(System.Text.NormalizationForm.FormD);
            var sb = new System.Text.StringBuilder();

            foreach (var c in normalized)
            {
                if (char.IsLetterOrDigit(c) && c <= 127)
                    sb.Append(char.ToLowerInvariant(c));
                else if (char.IsWhiteSpace(c) || c == '-')
                    sb.Append('-');
            }

            return Regex.Replace(sb.ToString(), @"-+", "-").Trim('-');
        }

        private bool ContainsVietnameseCharacters(string text) =>
            Regex.IsMatch(text, @"[áàảãạăắằẳẵặâấầẩẫậéèẻẽẹêếềểễệíìỉĩịóòỏõọôốồổỗộơớờởỡợúùủũụưứừửữựýỳỷỹỵđĐ]", RegexOptions.IgnoreCase);
    }
}
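The overwrite rule in OnSavingContent is easier to reason about as pure functions. The sketch below mirrors the filtering in GenerateCorruptedEpiSlug and wraps the condition in a hypothetical helper, ShouldOverwrite, which does not exist in the module above. Whether the ASCII-only output matches what the CMS actually generates by default is the article's assumption, so verify it against your own installation.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class OverwriteDecisionSketch
{
    // Same filtering as GenerateCorruptedEpiSlug: keep ASCII letters and digits,
    // turn whitespace and hyphens into '-', and drop everything else.
    public static string AsciiOnlySlug(string name)
    {
        var sb = new StringBuilder();
        foreach (char c in name.Normalize(NormalizationForm.FormD))
        {
            if (c <= 127 && char.IsLetterOrDigit(c))
                sb.Append(char.ToLowerInvariant(c));
            else if (char.IsWhiteSpace(c) || c == '-')
                sb.Append('-');
        }
        return Regex.Replace(sb.ToString(), "-+", "-").Trim('-');
    }

    // Hypothetical helper: true when the stored segment is empty or equals
    // the lossy slug, i.e. when no editor has customised it.
    public static bool ShouldOverwrite(string currentSegment, string pageName) =>
        string.IsNullOrWhiteSpace(currentSegment) ||
        currentSegment.Equals(AsciiOnlySlug(pageName), StringComparison.OrdinalIgnoreCase);
}
```

For the page name "Đường đi", AsciiOnlySlug returns "uong-i" because both "Đ" and "đ" are dropped. ShouldOverwrite("uong-i", "Đường đi") is therefore true, while a hand-written segment such as "duong-di-custom" yields false and is left untouched.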

Step 3: Verification with Unit Tests

Automated tests confirm that the normalizer handles tone marks, the letter "đ", and edge cases such as repeated whitespace.

File: CmsIv.Web.Tests/Business/UrlSlugNormalizerTests.cs

using Xunit;
using CmsIv.Web.Business;

namespace CmsIv.Web.Tests.Business
{
    public class UrlSlugNormalizerTests
    {
        [Theory]
        [InlineData("Năng lực con người vượt xa thực thi trong kỷ nguyên AI", "nang-luc-con-nguoi-vuot-xa-thuc-thi-trong-ky-nguyen-ai")]
        [InlineData("Đường đi đến Đà Nẵng", "duong-di-den-da-nang")]
        [InlineData("  Khoảng   trắng   nhiều  ", "khoang-trang-nhieu")]
        [InlineData("Xin chào, thế giới!", "xin-chao-the-gioi")]
        public void NormalizeToSlug_ValidInputs_ReturnsExpectedSlug(string input, string expected)
        {
            string result = UrlSlugNormalizer.NormalizeToSlug(input);
            Assert.Equal(expected, result);
        }
    }
}
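IsValidSlug is not exercised by the tests above. As a minimal sketch of the cases one might add, the block below copies the pattern from Step 1 into a standalone class (SlugValidationSketch is an illustrative name) so that it can be tried without the rest of the project.

```csharp
using System.Text.RegularExpressions;

public static class SlugValidationSketch
{
    // Pattern copied from UrlSlugNormalizer.IsValidSlug in Step 1:
    // lowercase alphanumeric groups separated by single hyphens.
    private const string SlugPattern = @"^[a-z0-9]+(-[a-z0-9]+)*$";

    public static bool IsValidSlug(string slug) =>
        !string.IsNullOrEmpty(slug) && Regex.IsMatch(slug, SlugPattern);
}
```

With this pattern, "duong-di-den-da-nang" is valid, whereas "duong--di" (double hyphen), "-duong" (leading hyphen), "Duong" (uppercase), and the empty string are all rejected.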

Strategic Insights & Troubleshooting

  • Character 'đ' Handling: Remember that FormD decomposition does not convert 'đ' to 'd'. It remains as a special character, which is why manual replacement is mandatory.
  • SEO Persistence: The implementation only overwrites the slug if the current slug is empty or matches the "corrupted" default. This prevents overwriting custom slugs manually entered by SEO experts.
  • Performance: ContainsVietnameseCharacters runs a single Regex.IsMatch check first, so pages whose names contain no Vietnamese letters skip slug processing entirely and add very little overhead to the SavingContent event.
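One possible refinement of the detection step, which is not part of the original implementation, is to keep the regular expression in a static field and to normalize the input to FormC before matching. The character class lists precomposed letters, so a name that arrives in decomposed form (base letter plus combining marks) would otherwise go undetected unless it contains "đ". The class name VietnameseDetectionSketch is illustrative.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class VietnameseDetectionSketch
{
    // Built once and reused for every save; the character class is copied from Step 2.
    private static readonly Regex VietnameseLetters = new Regex(
        "[áàảãạăắằẳẵặâấầẩẫậéèẻẽẹêếềểễệíìỉĩịóòỏõọôốồổỗộơớờởỡợúùủũụưứừửữựýỳỷỹỵđĐ]",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    // FormC recombines base letters and combining marks into the
    // precomposed characters that the class above lists.
    public static bool ContainsVietnameseCharacters(string text) =>
        !string.IsNullOrEmpty(text) &&
        VietnameseLetters.IsMatch(text.Normalize(NormalizationForm.FormC));
}
```

Whether the compiled regex is measurably faster than the static Regex.IsMatch call depends on save volume, so treat this as an option to benchmark rather than a required change.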