Extract Regex

Coming up with keywords can be tough, especially when your data feed isn’t tidy: for example, when the source’s product_name contains many details that you don’t want in keywords because they are either too general or too specific.

You can solve this by modifying the variable: create a few new variables of your own and use regular expressions to remove groups of words or characters from them until only the required words are left.

But maybe you have wondered whether this lengthy process could be simplified. What if you could extract a group of words from the text instead of removing everything around it? In other words, what if the function worked the opposite way from the usual remove-by-regex approach? That is exactly what our new function does: Extract Regex!

We’ll show you how it works in practice, using three different examples: electronics, fashion, and anything that falls outside these two categories.

In the case of electronics, _brand_ + _model_number_ makes a good keyword. Let’s say our initial text is Lenovo Legion Y520-15IKB, black, and we want to post an ad for Lenovo Y520-15IKB. In our own variable, we choose the Extract Regex function and select the regular expression “A mix of letters and digits (in most cases): \b((?=[A-Za-z-\/]{0,19}\d)[A-Za-z0-9\/-]{4,20})\b”.
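To see what this expression picks out, here is a minimal Python sketch of the same extraction. It is only an illustration, not how the tool works internally: the helper extract_model and the separate brand value are assumptions made for this example, and the product text is written with a plain hyphen.

```python
import re

# Pattern from the article: a token of 4-20 letters, digits, "/" or "-".
# The lookahead (?=...) additionally requires the token to contain a digit.
MODEL_PATTERN = re.compile(r"\b((?=[A-Za-z-\/]{0,19}\d)[A-Za-z0-9\/-]{4,20})\b")


def extract_model(text):
    """Return the first model-number-like token in text, or None."""
    match = MODEL_PATTERN.search(text)
    return match.group(1) if match else None


product_name = "Lenovo Legion Y520-15IKB, black"
brand = "Lenovo"  # assumption: the brand comes from a separate feed field

model_number = extract_model(product_name)
keyword = f"{brand} {model_number}"
print(keyword)  # Lenovo Y520-15IKB
```

Words without a digit, such as Legion or black, are skipped because of the lookahead, so only the model number is left.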

In the case of clothes and shoes, it definitely pays off to have more variations. Say our initial text is Vans UltraRange 3D - Black/Asphalt 47, and we’d like to create these four variations:

1.) name of the shoe, no color or size

2.) name of the shoe + color

3.) name of the shoe + size

4.) name of the shoe + size + color

Option number 1.) can be created the way you’re already used to: with a template of regular expressions, by removing the color and size. This gives you Vans UltraRange 3D.

For option number 2.), you have to create your own text variable for colors to go along with the shoe’s name. In the Find Regular Expression function, select “Colors in English: \b(black|white|grey|gray|red|green|blue|silver|gold|brown|yellow|orange|pink|purple)\b”. You end up with Vans UltraRange 3D Black.

Option number 3.) can be created in the same way, with the Extract Regex function: \b(35|36|37|38|39|40|41|42|43|44|45|46|47|48|L|XXS|XS|S|XL|M|XXL|XXXL)\b. We arrive at Vans UltraRange 3D 47.

The last option, number 4.), is created by simply adding your own text variables _size_ + _color_ to the shoe’s name. The final keyword is Vans UltraRange 3D 47 Black.
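The four variations can be sketched in Python as follows. This is a rough illustration under a few assumptions: the helper extract is hypothetical, the shoe name is taken as the already cleaned result of option 1.), and the color pattern is matched case-insensitively so that the lowercase "black" in the expression finds "Black" in the text.

```python
import re

# Color pattern from the article.
COLOR_PATTERN = re.compile(
    r"\b(black|white|grey|gray|red|green|blue|silver|gold|brown"
    r"|yellow|orange|pink|purple)\b",
    re.IGNORECASE,  # assumption: lets "black" match "Black"
)
# Size pattern from the article (case-sensitive, so "L" does not match "l").
SIZE_PATTERN = re.compile(
    r"\b(35|36|37|38|39|40|41|42|43|44|45|46|47|48"
    r"|L|XXS|XS|S|XL|M|XXL|XXXL)\b"
)


def extract(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


product_name = "Vans UltraRange 3D - Black/Asphalt 47"
shoe_name = "Vans UltraRange 3D"  # assumption: result of option 1.)

color = extract(COLOR_PATTERN, product_name)  # first color found
size = extract(SIZE_PATTERN, product_name)    # first size found

keywords = [
    shoe_name,                      # 1.) name only
    f"{shoe_name} {color}",         # 2.) name + color
    f"{shoe_name} {size}",          # 3.) name + size
    f"{shoe_name} {size} {color}",  # 4.) name + size + color
]
for keyword in keywords:
    print(keyword)
```

Only the first color in the text is extracted, so Black/Asphalt yields Black here.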

Let’s demonstrate the last example with a building set. The initial text is LEGO Batman Movie 70905 Batmobil. You’d like to post an ad for it using the keyword _brand_ + _set_number_, in this case LEGO 70905. Such a keyword can be built with your own text variable: use the Extract Regex function with \d+ to extract the digits.
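As a quick check of what \d+ returns for this text, here is a small Python sketch. The helper extract_digits and the separate brand value are assumptions made for illustration only.

```python
import re


def extract_digits(text):
    """Return the first run of digits in text, or None."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None


product_name = "LEGO Batman Movie 70905 Batmobil"
brand = "LEGO"  # assumption: the brand comes from a separate feed field

set_number = extract_digits(product_name)
keyword = f"{brand} {set_number}"
print(keyword)  # LEGO 70905
```

In this sketch the first run of digits wins, so a product name that contains another number before the set number would probably need a stricter expression.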
