Remove Accents from Strings in MuleSoft

A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or basic glyph.

The main use of diacritical marks in the Latin script is to change the sound values of the letters to which they are added.

Now let us talk about the use case we are trying to solve.

We want to strip these accents so that each letter is reduced to its base form. For example, José Alberto should become Jose Alberto.

The Approaches

1. Using Custom Java Code

We can create a simple Java class called StringUtils in the utility package under src/main/java.

The code looks like this:

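package utility;

import java.text.Normalizer;

public class StringUtils {
	// Decomposes each accented character into base letter + combining marks (NFD),
	// then removes only the combining marks (Unicode category M).
	public static String stripAccents(String src) {
		if (src == null) {
			return null; // Normalizer.normalize throws NullPointerException on null
		}
		return Normalizer.normalize(src, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
	}
}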
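To see why the normalization step matters, here is a minimal sketch that prints the code points of a name before and after NFD decomposition. The class NfdDemo and its helper methods are hypothetical names used only for illustration; they are not part of the Mule project.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

class NfdDemo {
    // Canonical decomposition: a precomposed letter becomes base letter + combining mark(s).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Renders each code point as U+XXXX so the otherwise invisible combining marks show up.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9"; // "José" written with the precomposed letter U+00E9
        System.out.println(codePoints(name));            // U+004A U+006F U+0073 U+00E9
        System.out.println(codePoints(decompose(name))); // U+004A U+006F U+0073 U+0065 U+0301
    }
}
```

After decomposition the accent is a separate code point (U+0301), so a regular expression can delete it and leave the plain letter e behind. That is all the stripAccents method in StringUtils relies on.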

We can then call the StringUtils class from a Java connector or, better, use it directly in DataWeave as below:

%dw 2.0
import java!utility::StringUtils
output application/json
---
{
	name1: StringUtils::stripAccents("Santiago Muñez"),
	name2: StringUtils::stripAccents("Mesut Özil")	
}

Here we import the Java class with import java!utility::StringUtils, which makes its static methods callable from DataWeave.

The output:

{
  "name1": "Santiago Munez",
  "name2": "Mesut Ozil"
}

2. Using Apache Commons Lang

First, we add the Maven dependency to pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>

Next, we import it directly in DataWeave and call the stripAccents function:

%dw 2.0
import java!org::apache::commons::lang3::StringUtils
output application/json
---
{
	name1: StringUtils::stripAccents("Santiago Muñez"),
	name2: StringUtils::stripAccents("Mesut Özil")	
}

Here we import the StringUtils class from the Apache Commons Lang library, so no custom Java code is needed.

The output:

{
  "name1": "Santiago Munez",
  "name2": "Mesut Ozil"
}
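As a side note on both approaches, which characters disappear depends on the pattern applied after normalization. The sketch below compares two patterns on a letter that has no canonical decomposition (ø). The class AccentCaveats and its method names are hypothetical and exist only for this comparison. It uses only the JDK, so it says nothing about how a particular library version treats such letters.

```java
import java.text.Normalizer;

class AccentCaveats {
    // Removes only the combining marks left behind by NFD decomposition.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Removes every non-ASCII character left after NFD decomposition.
    static String asciiOnly(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String name = "S\u00f8ren Mu\u00f1oz"; // "Søren Muñoz"
        System.out.println(stripMarks(name)); // ø does not decompose, so it is kept as is
        System.out.println(asciiOnly(name));  // ø is deleted outright, shortening the name
    }
}
```

If the data can contain letters such as ø or ß, it may be worth running a few representative names through whichever approach you choose before relying on it in an integration.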

Hope you liked it and that it will be useful in some of your integrations.

Cheers.