DCU Library | Digital Humanities Workshops | Intro to Data Cleaning with OpenRefine

Transformations

Through facets, filters and clusters OpenRefine offers relatively straightforward ways of getting an overview of your data, and making changes where you want to standardise terms used to a common set of values.

However, sometimes there will be changes you want to make to the data that cannot be achieved in this way. Such types of changes include:

To support this type of activity OpenRefine supports Transformations which are ways of manipulating data in columns. Transformations are normally written in a special language called GREL (General Refine Expression Language). GREL expressions are similar to Excel Formula, although they tend to focus on text manipulations rather than numeric functions.

Full documentation for the GREL is available at https://docs.openrefine.org/manual/grelfunctions. This tutorial covers only a small subset of the commands available.

Common transformations

Some transformations are used regularly and are accessible directly through menu options, without having to type them directly.

Examples of some of these common transformations are given in the table below, with their ‘GREL’ equivalents.

Common Transformation Action GREL expression
To Uppercase Converts the current value to uppercase value.toUppercase()
To Lowercase Converts the current value to lowercase value.toLowercase()
To Titlecase Converts the current value to titlecase value.toTitlecase()
Trim leading and trailing whitespace Removes any ‘whitespace’ characters (e.g. spaces, tabs) from the start or end of the current value value.trim()

Sometimes there will be changes that are more structural.

Remove whitespace in Publisher Column

Using GREL

To start writing GREL transformations:

You write transformations in the ‘Expression’ box and then can Preview the effect the transformation would have on the first 10 rows of your data. The transformation you type into the ‘Expression’ box has to be a valid GREL expression. The simplest expression is simply the word value by itself - which simply means ‘the value that is currently in the column’ - that is, “make no change”.

GREL functions are written by giving a value of some kind (a text string, a date, a number etc.) to a GREL function. Some GREL functions take additional parameters or options which control how the function works. GREL supports two syntaxes:

Either is valid, and which is used is completely down to personal preference. Next to the Preview option are options to view:

Use GREL to standardise Publisher formatting


Undo / Redo and Apply / Extract

OpenRefine lets you undo, and redo, any steps you have taken in cleaning the data. This means you can always try out transformations and ‘undo’ if you need to. The way OpenRefine records the steps you have taken even allows you to take the steps you’ve carried out on one data set, and apply it to another data set by a simple copy and paste operation.

Back
Next