R has excellent support for dates and times via the built-in Date and POSIXt
classes. Their usage, however, is not always as straightforward as one
would want. Certain conversions are more cumbersome than we would like: while
as.Date("2015-03-22") works, would it not be nice if as.Date("20150322") (a
format often used in logfiles) also worked, or for that matter
as.Date(20150322L) using an integer variable, or even
as.Date("2015-Mar-22") and as.Date("2015Mar22")?
Similarly, many date and time formats suitable for POSIXct (the short form)
and POSIXlt (the long form with accessible components) often require rather too
much explicit formatting, and/or arguments that could have been defaults. Why,
for example, does
as.POSIXct(as.numeric(Sys.time()), origin="1970-01-01") require the
origin argument on the conversion back (from fractional seconds since the
epoch) into a datetime, when no origin is needed to create the
double-precision floating point representation of time since the epoch?
We will now discuss the outline of this implementation. For full details,
see the source file.
Headers and Constants
Note that we show only two datetime formats along with two date
formats. The actual implementation has many more.
The actual conversion from string to a double (the underlying format in
POSIXct) is done by the following function. It loops over all given
formats, and returns the computed value after the first match. In case of
failure, a floating point NA is returned.
We want to be able to convert from numeric as well as string formats. For
this, we write a templated (and vectorised) function which invokes the actual
conversion function for each argument. It also deals (somewhat
heuristically) with two corner cases: we want 20150322 to be converted from
either integer or numeric, but in the latter case need to distinguish this value
and its range from the (much larger) value for seconds since the epoch.
That creates a minor ambiguity: we will not be able to convert back inputs
given as seconds since the epoch for the first few years after January 1, 1970.
But as these are rare in timestamp form, we can accept the trade-off.
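To make the trade-off concrete, here is a hedged sketch of such a templated, vectorised conversion in plain standard C++. It is not the implementation from the source file: the function names (`daysFromCivil`, `decodeYmd`, `toEpochSeconds`), the use of `std::vector` in place of R vectors, and the cutoffs `kMinYmd` and `kMaxYmd` are all assumptions chosen for illustration. A whole number that reads as a plausible yyyymmdd value is decoded as a date; anything else is taken to be seconds since the epoch.

```cpp
#include <cmath>
#include <type_traits>
#include <vector>

// Days from 1970-01-01 to the given proleptic Gregorian date.
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    y -= (m <= 2) ? 1 : 0;  // count years from March so leap days fall at year end
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Hypothetical bounds for "looks like yyyymmdd": years 1000 through 9999.
const double kMinYmd = 10000101.0;
const double kMaxYmd = 99991231.0;

// Decode v as yyyymmdd if it is a whole number in range with a plausible
// month and day; on success store seconds since the epoch (UTC) in out.
bool decodeYmd(double v, double& out) {
    if (v < kMinYmd || v > kMaxYmd || v != std::floor(v)) return false;
    const long long n = static_cast<long long>(v);
    const unsigned m = static_cast<unsigned>((n / 100) % 100);
    const unsigned d = static_cast<unsigned>(n % 100);
    if (m < 1 || m > 12 || d < 1 || d > 31) return false;
    out = 86400.0 * daysFromCivil(n / 10000, m, d);
    return true;
}

// Works for any arithmetic element type (int, double, ...).
template <typename T>
std::vector<double> toEpochSeconds(const std::vector<T>& xs) {
    static_assert(std::is_arithmetic<T>::value, "numeric input expected");
    std::vector<double> out;
    out.reserve(xs.size());
    for (const T& x : xs) {
        const double v = static_cast<double>(x);
        double secs = 0.0;
        // Date-like values are decoded; everything else is taken as seconds.
        out.push_back(decodeYmd(v, secs) ? secs : v);
    }
    return out;
}
```

With these assumed bounds, a whole-second timestamp below about 1.0e8 (roughly the first three years after 1970) that also happens to read as a valid yyyymmdd value is misinterpreted as a date, which is the ambiguity described above.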
Finally, we can look at the user-facing function. It accepts input in either
integer, numeric or character vector form, and then dispatches accordingly to
the templated internal function we just discussed. Other inputs are
unsuitable and trigger an error.
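The dispatch pattern can be illustrated in isolation. The sketch below is an assumption-laden stand-in rather than the actual user-facing function: `Kind` and `Input` are hypothetical substitutes for a tagged R vector, and the `convertOne` overloads are placeholders that simply treat every value as seconds since the epoch, so that the example stays focused on routing each input type to one templated worker and rejecting everything else.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an R vector: a type tag plus one populated payload.
enum class Kind { Integer, Numeric, Character, Logical };

struct Input {
    Kind kind;
    std::vector<int> ints;
    std::vector<double> nums;
    std::vector<std::string> strs;
};

// Placeholder per-element conversions (assumption: values are already seconds
// since the epoch). A real version would parse dates and datetimes here.
double convertOne(int x) { return static_cast<double>(x); }
double convertOne(double x) { return x; }
double convertOne(const std::string& s) { return std::stod(s); }

// Templated, vectorised worker: applies the matching overload to each element.
template <typename T>
std::vector<double> convertAll(const std::vector<T>& xs) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (const T& x : xs) out.push_back(convertOne(x));
    return out;
}

// User-facing entry point: dispatch on the tag, error on unsupported types.
std::vector<double> toTime(const Input& in) {
    switch (in.kind) {
        case Kind::Integer:   return convertAll(in.ints);
        case Kind::Numeric:   return convertAll(in.nums);
        case Kind::Character: return convertAll(in.strs);
        default:              break;
    }
    throw std::invalid_argument("Unsupported type");
}
```

Keeping the type check in one small entry point means the worker template never has to reason about inputs it cannot handle.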
A simple illustration follows. A fuller demonstration is
part of the RcppBDT package.
This already shows support for subsecond granularity and a variety of date formats.