Through facets, filters and clusters OpenRefine offers relatively straightforward ways of getting an overview of your data, and making changes where you want to standardise terms used to a common set of values.
However, sometimes there will be changes you want to make to the data that cannot be achieved in this way. Such types of changes include:
To support this type of activity OpenRefine supports Transformations which are ways of manipulating data in columns. Transformations are normally written in a special language called GREL (General Refine Expression Language). GREL expressions are similar to Excel Formula, although they tend to focus on text manipulations rather than numeric functions.
Full documentation for the GREL is available at https://docs.openrefine.org/manual/grelfunctions. This tutorial covers only a small subset of the commands available.
Some transformations are used regularly and are accessible directly through menu options, without having to type them directly.
Examples of some of these common transformations are given in the table below, with their ‘GREL’ equivalents.
Common Transformation | Action | GREL expression |
---|---|---|
To Uppercase | Converts the current value to uppercase | value.toUppercase() |
To Lowercase | Converts the current value to lowercase | value.toLowercase() |
To Titlecase | Converts the current value to titlecase | value.toTitlecase() |
Trim leading and trailing whitespace | Removes any ‘whitespace’ characters (e.g. spaces, tabs) from the start or end of the current value | value.trim() |
Sometimes there will be changes that are more structural.
Publisher > Text facet
Edit cells > common transform > remove consecutive whitespace
To start writing GREL transformations:
Edit cells > Transform…
to see the GREL screen.You write transformations in the ‘Expression’ box and then can Preview the effect the transformation would have on the first 10 rows of your data. The transformation you type into the ‘Expression’ box has to be a valid GREL expression. The simplest expression is simply the word value
by itself - which simply means ‘the value that is
currently in the column’ - that is, “make no change”.
GREL functions are written by giving a value of some kind (a text string, a date, a number etc.) to a GREL function. Some GREL functions take additional parameters or options which control how the function works. GREL supports two syntaxes:
value.function(options)
function(value, options)
Either is valid, and which is used is completely down to personal preference.
Next to the Preview
option are options to view:
History
- a list of transformations you’ve previously used with the option to reuse them
immediately or to ‘star’ them for easy accessStarred
- a list of transformations you’ve ‘starred’ via the ‘History’ viewHelp
- a list of all the GREL functions and brief information on how to use thempublisher
Akshantala Enterprises
and Society of Pharmaceutical Technocrats
include
optionTitle
columnEdit cells->Transform...
`value.toTitlecase()
OpenRefine lets you undo, and redo, any steps you have taken in cleaning the data. This means you can always try out transformations and ‘undo’ if you need to. The way OpenRefine records the steps you have taken even allows you to take the steps you’ve carried out on one data set, and apply it to another data set by a simple copy and paste operation.
Undo
and Redo
options are accessed via the lefthand panel
When you Undo, remaining steps will still show but are greyed out. However if you then start doing new transformations the greyed out steps will disappear and you’ll no longer have the option to redo those steps.
To be reapply a set of steps or transformations you click on the Extract
button. These are saved in JSON format. Select which one you want to save and copy the transformations into a .txt or .json file. You can then apply this from the same menu in a new dataset or project.
The steps in the Undo / Redo menu of a project is saved with your project - it is not just during that session. So later on if you realise a transformation from an old project would be useful you can go back in and pull it out.