R has excellent support for dates and times via the built-in Date and POSIXt
classes. Their usage, however, is not always as straightforward as one
would want. Certain conversions are more cumbersome than we would like: while
as.Date("2015-03-22") works, would it not be nice if as.Date("20150322") (a
format often used in logfiles) also worked, or for that matter
as.Date(20150322L) using an integer variable, or even
as.Date("2015-Mar-22") and as.Date("2015Mar22")?
Similarly, many date and time formats suitable for POSIXct (the short form)
and POSIXlt (the long form with accessible components) often require rather too
much explicit formatting and/or default arguments. Why, for example, does
as.POSIXct(as.numeric(Sys.time()), origin="1970-01-01") require the
origin argument for the conversion back (from fractional seconds since the
epoch) into a datetime, when no origin is needed to create that
double-precision floating-point representation of time since the epoch in the first place?
But thanks to the excellent
Boost Date_Time
library, which we already mentioned in
this post about the BH package, we can
address the parsing of dates and times. It allowed us to write a new function,
toPOSIXct(), which is now part of the
RcppBDT package (albeit, for now, only the GitHub version, but we
expect it to migrate to CRAN “soon” as well).
Implementation
We now outline the implementation. For full details,
see
the source file.
Headers and Constants
Note that we show only two datetime formats along with two date
formats. The actual implementation has many more.
Core Converter
The actual conversion from string to a double (the underlying format in
POSIXct) is done by the following function. It loops over all given
formats and returns the value computed from the first one that matches. If no
format matches, a floating-point NA is returned.
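A minimal sketch of such a first-match loop, written in standard C++ rather than with Boost, is shown below. It is not the RcppBDT source, and several pieces are assumptions: `std::get_time` stands in for Boost's input facets and has no fractional-seconds flag, so an optional `.fff` suffix after the seconds field is read by hand; all strings are interpreted as UTC; the four-entry `kFormats` table and the names `stringToSeconds` and `daysFromCivil` are hypothetical; and NaN plays the role of R's `NA_REAL`. A format only counts as a match if it consumes the whole string.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <ctime>
#include <iomanip>
#include <limits>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Days since 1970-01-01 in the proleptic Gregorian calendar
// (Howard Hinnant's days_from_civil algorithm), independent of time zones.
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Hypothetical format table: two datetime formats tried before two date formats.
const std::vector<std::string> kFormats = {
    "%Y-%m-%d %H:%M:%S", "%Y%m%d %H%M%S",  // datetime
    "%Y-%m-%d", "%Y%m%d"                   // date only
};

// Hypothetical converter: seconds since the epoch (UTC) for the first format
// that consumes the whole string, or NaN (standing in for NA_REAL) otherwise.
double stringToSeconds(const std::string& s) {
    for (const std::string& fmt : kFormats) {
        std::istringstream is(s);
        is.imbue(std::locale::classic());
        std::tm tm{};
        is >> std::get_time(&tm, fmt.c_str());
        if (is.fail()) continue;  // this format does not match

        double frac = 0.0;  // optional ".fff" after a seconds field, read by hand
        if (fmt.back() == 'S' && is.peek() == '.') {
            is.get();
            std::string digits;
            while (std::isdigit(is.peek())) digits += static_cast<char>(is.get());
            if (digits.empty()) continue;
            frac = std::stod("0." + digits);
        }
        if (is.peek() != std::char_traits<char>::eof()) continue;  // leftover input

        const long long days = daysFromCivil(tm.tm_year + 1900,
                                             static_cast<unsigned>(tm.tm_mon + 1),
                                             static_cast<unsigned>(tm.tm_mday));
        return static_cast<double>(days * 86400LL + tm.tm_hour * 3600LL +
                                   tm.tm_min * 60LL + tm.tm_sec) + frac;
    }
    return std::numeric_limits<double>::quiet_NaN();  // no format matched
}
```

Because every format must consume the entire input, the order of `kFormats` only matters when two formats could accept the same string; in that case the first one wins, as in the loop described above.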
Convenience Wrappers
We want to be able to convert from numeric as well as string formats. For
this, we write a templated (and vectorised) function which invokes the actual
conversion function for each argument. It also deals (somewhat
heuristically) with two corner cases: we want 20150322 to be converted from
either an integer or a numeric value, but in the latter case we need to
distinguish this value and its range from the (much larger) values for seconds
since the epoch. That creates a minor ambiguity: we cannot convert inputs given
as seconds since the epoch that fall within the first few years after
January 1, 1970. But as such timestamps are rare, we can accept the trade-off.
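The numeric heuristic can be sketched as a template over the element type. The cut-off used here is a hypothetical choice for illustration, not necessarily the one RcppBDT uses: integers are always read as YYYYMMDD, while doubles are read as YYYYMMDD only when they are whole numbers in [0, 1e8). Since 1e8 seconds after the epoch is 1973-03-03 09:46:40 UTC, whole-second timestamps before that date are misinterpreted, which is the kind of trade-off described above. The names `numericToSeconds`, `ymdToSeconds` and `kYmdLimit` are illustrative, and the digits are split arithmetically instead of being formatted as a string and parsed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <type_traits>
#include <vector>

// Days since 1970-01-01 (Howard Hinnant's days_from_civil algorithm).
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Hypothetical cut-off: whole doubles in [0, 1e8) are treated as YYYYMMDD.
constexpr double kYmdLimit = 1e8;

// YYYYMMDD -> seconds since the epoch at UTC midnight, or NaN when the month
// or day is out of range (coarse check only; month lengths are not verified).
double ymdToSeconds(long long ymd) {
    const long long y = ymd / 10000;
    const long long m = (ymd / 100) % 100;
    const long long d = ymd % 100;
    if (m < 1 || m > 12 || d < 1 || d > 31)
        return std::numeric_limits<double>::quiet_NaN();
    return static_cast<double>(
        daysFromCivil(y, static_cast<unsigned>(m), static_cast<unsigned>(d)) * 86400LL);
}

// Hypothetical templated, vectorised converter for integer or floating-point input.
template <typename T>
std::vector<double> numericToSeconds(const std::vector<T>& in) {
    static_assert(std::is_arithmetic<T>::value, "numeric input expected");
    std::vector<double> out;
    out.reserve(in.size());
    for (const T v : in) {
        if (std::is_integral<T>::value) {  // integers: always YYYYMMDD
            out.push_back(ymdToSeconds(static_cast<long long>(v)));
            continue;
        }
        const double x = static_cast<double>(v);
        if (x >= 0 && x < kYmdLimit && x == std::floor(x)) {
            out.push_back(ymdToSeconds(static_cast<long long>(x)));  // looks like YYYYMMDD
        } else {
            out.push_back(x);  // NaN or genuine seconds since the epoch pass through
        }
    }
    return out;
}
```

With this rule, 20150322 yields the same value whether it arrives as an integer or a double, while a small whole-number timestamp such as 12345 is misread as a date (month 23) and comes back as NaN.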
User-facing Function
Finally, we can look at the user-facing function. It accepts input in either
integer, numeric or character vector form, and then dispatches accordingly to
the templated internal function we just discussed. Other inputs are
unsuitable and trigger an error.
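The dispatch step can be modelled in standard C++17 with a `std::variant` standing in for an R vector whose type is only known at run time; in Rcpp one would more typically switch on `TYPEOF(x)` (`INTSXP`, `REALSXP`, `STRSXP`) and call `Rcpp::stop()` for anything else. Everything here is a hypothetical sketch: the per-element converters are deliberately cut down (strings must be `YYYY-MM-DD` or `YYYYMMDD`; all integers, and whole doubles below 1e8, are read as YYYYMMDD), and `toPOSIXctSketch`, `convertAll` and `AnyVector` are invented names. A logical vector plays the part of an unsupported input and triggers an exception.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Days since 1970-01-01 (Howard Hinnant's days_from_civil algorithm).
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// YYYYMMDD -> seconds since the epoch at UTC midnight; coarse range check only.
double ymdToSeconds(long long ymd) {
    const long long m = (ymd / 100) % 100, d = ymd % 100;
    if (m < 1 || m > 12 || d < 1 || d > 31)
        return std::numeric_limits<double>::quiet_NaN();
    return static_cast<double>(
        daysFromCivil(ymd / 10000, static_cast<unsigned>(m), static_cast<unsigned>(d)) * 86400LL);
}

// Simplified, hypothetical per-element converters, one overload per input type.
double convertOne(int v) { return ymdToSeconds(v); }  // integers are YYYYMMDD
double convertOne(double x) {  // whole doubles below 1e8 are YYYYMMDD, else seconds
    return (x >= 0 && x < 1e8 && x == std::floor(x)) ? ymdToSeconds(static_cast<long long>(x)) : x;
}
double convertOne(const std::string& s) {  // only YYYY-MM-DD or YYYYMMDD
    std::string d = s;
    if (d.size() == 10 && d[4] == '-' && d[7] == '-')
        d = d.substr(0, 4) + d.substr(5, 2) + d.substr(8, 2);
    const bool ok = d.size() == 8 && std::all_of(d.begin(), d.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
    return ok ? ymdToSeconds(std::stoll(d)) : std::numeric_limits<double>::quiet_NaN();
}

// Templated, vectorised step: apply the matching overload to each element.
template <typename T>
std::vector<double> convertAll(const std::vector<T>& in) {
    std::vector<double> out;
    out.reserve(in.size());
    for (const T& v : in) out.push_back(convertOne(v));
    return out;
}

// Hypothetical stand-in for an R vector whose element type is known only at run time.
using AnyVector = std::variant<std::vector<int>, std::vector<double>,
                               std::vector<std::string>, std::vector<bool>>;

// Hypothetical user-facing entry point: dispatch on the held type, reject the rest.
std::vector<double> toPOSIXctSketch(const AnyVector& x) {
    return std::visit([](const auto& v) -> std::vector<double> {
        using T = typename std::decay_t<decltype(v)>::value_type;
        if constexpr (std::is_same_v<T, bool>) {
            throw std::invalid_argument("unsupported input type");
        } else {
            return convertAll(v);
        }
    }, x);
}
```

Keeping one templated loop and letting overload resolution pick the element converter means the user-facing function only decides *which* instantiation to call, mirroring the split between the convenience wrapper and the entry point described above.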
Illustration
A simple illustration follows. A fuller demonstration is
part of the RcppBDT package.
This already shows support for subsecond granularity and a variety of date formats.