DCU Library | Digital Humanities Workshops | Intro to Data Cleaning with OpenRefine

Data Types

There are five data types in OpenRefine:

Working with Dates

OpenRefine tends to default to seeing content as text. But you can turn them into these other data types. This project has a date field which is actually being treated as a string. With a transform we can tell it that this is a date

Note how the values are now displayed in green and follow a standard convention for their display format (ISO 8601) - this indicates they are now stored as date data types in OpenRefine. We can now carry out functions that are specific to Dates

Arrays

An array is a list of values or strings that can be reordered or manipulated. They only come into play when performing transformations. Arrays cannot be stored in a cell in OpenRefine. In GREL it is the split function which creates an array (similar to the way we split cells earlier).

Like earlier, the | is whatever is the separator already there. The preview shows the column values square brackets [ at the end (denoting this as an array) and quote marks around each individual value in the array.

There are various array functions that can be used to do different things. We can try a couple out here just to see the results in the preview screen (without clicking Ok)

length()

uniques()

Let’s instead try sort(), another array function which organises the values in an array alphabetically:

So that didn’t quite work as planned, The sort function is case sensitive. So we’ll fix that first:

Back
Next