Using DateDataParser

dateparser.parse() uses a default parser that tries to detect the language every time it is called. This is not the most efficient approach when parsing many dates from the same source.

DateDataParser provides an alternative, more efficient way to control language detection behavior.

A DateDataParser instance narrows down the set of applicable languages with each date it parses, until only one language, or none, is left. It assumes the previously detected language for all subsequent dates and does not run language detection again once a language has been discarded.

This class wraps around the core dateparser functionality, and by default assumes that all of the dates fed to it are in the same language.
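As a rough mental model of this narrowing, here is a toy, self-contained sketch. It is not dateparser's implementation: `NarrowingDetector` and `TOY_VOCABULARY` are hypothetical names, and a language counts as applicable only when its tiny, made-up vocabulary contains every word in the string.

```python
import re

# Hypothetical, deliberately tiny vocabularies; dateparser's real language data is much richer.
TOY_VOCABULARY = {
    "en": {"march", "august", "october", "of"},
    "es": {"marzo", "agosto", "octubre", "de"},
    "de": {"märz", "august", "oktober", "um", "uhr"},
}


class NarrowingDetector:
    """Hypothetical model of a parser whose candidate languages only ever shrink."""

    def __init__(self, languages=None):
        # Start from every known language unless an explicit list is given.
        self.candidates = set(languages if languages is not None else TOY_VOCABULARY)

    def detect(self, date_string):
        # Letters-only tokens, lowercased (digits and punctuation are ignored).
        words = {w.lower() for w in re.findall(r"[^\W\d_]+", date_string)}
        # Keep only candidates whose vocabulary covers every word; languages
        # dropped here are never added back, so the set can end up empty.
        self.candidates = {
            lang for lang in self.candidates if words <= TOY_VOCABULARY[lang]
        }
        return sorted(self.candidates)


detector = NarrowingDetector()
print(detector.detect("23 Marzo 2015"))   # ['es']
print(detector.detect("11 August 2012"))  # []
```

Because the candidate set only shrinks, a detector that has settled on 'es' returns no candidates for a later English string instead of re-detecting.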

class dateparser.date.DateDataParser(*args, **kwargs)

Class that handles language detection, translation, and subsequent generic parsing of strings representing a date and/or time.

Parameters:
- languages (list) – A list of two-letter language codes, e.g. ['en', 'es']. If languages are given, it will not attempt to detect the language.
- allow_redetect_language (bool) – Enables/disables language re-detection.
- settings (dict) – Configure customized behavior using settings defined in `dateparser.conf.Settings`.
Returns: A parser instance
Raises: ValueError – Unknown language; TypeError – Languages argument must be a list
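The documented argument checks can be sketched as follows. `check_languages` and `KNOWN_LANGUAGES` are hypothetical stand-ins (the real list of supported codes lives inside dateparser); the sketch only mirrors the two documented errors.

```python
# Hypothetical stand-in for dateparser's list of supported language codes.
KNOWN_LANGUAGES = {"de", "en", "es", "nl", "tr"}


def check_languages(languages):
    """Mirror the documented errors: TypeError for a non-list, ValueError for unknown codes."""
    if languages is None:
        return None  # no languages given: language detection stays enabled
    if not isinstance(languages, list):
        raise TypeError("Languages argument must be a list, not %s" % type(languages).__name__)
    unknown = [code for code in languages if code not in KNOWN_LANGUAGES]
    if unknown:
        raise ValueError("Unknown language(s): %s" % ", ".join(unknown))
    return languages
```

For instance, check_languages('en') raises TypeError because a bare string is not a list, even though it looks like a single language code.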

get_date_data(date_string, date_formats=None)

Parse a string representing a date and/or time in recognizable localized formats. Supports parsing multiple languages and timezones.

Parameters:
- date_string (str|unicode) – A string representing a date and/or time in a recognizably valid format.
- date_formats (list) – A list of format strings using the directives of Python's strftime()/strptime(). The parser applies the formats one by one, taking the detected languages into account.
Returns: A dict with the keys 'date_obj' (a `datetime.datetime` object) and 'period'. For example: {'date_obj': datetime.datetime(2015, 6, 1, 0, 0), 'period': u'day'}
Raises: ValueError – Unknown language
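The "apply formats one by one" step can be approximated with the standard library. `parse_with_formats` is a hypothetical helper; unlike dateparser it does not translate localized month names first, and it always reports the period as 'day'.

```python
from datetime import datetime


def parse_with_formats(date_string, date_formats):
    """Hypothetical helper: try each strptime format in order, return a get_date_data-style dict."""
    for fmt in date_formats:
        try:
            date_obj = datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # this format does not match; fall through to the next one
        return {"date_obj": date_obj, "period": "day"}
    # No format matched: keep the same dict shape, with no date.
    return {"date_obj": None, "period": "day"}


print(parse_with_formats("01.06.2015", ["%Y-%m-%d", "%d.%m.%Y"]))
# {'date_obj': datetime.datetime(2015, 6, 1, 0, 0), 'period': 'day'}
```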

Note

Period values can be 'day' (default), 'week', 'month', or 'year'.

The period represents the granularity of the date parsed from the given string.

In the example below, no day information is present, so the day is taken from the current date (June 16, 2015, at the time of writing), giving day 16. Hence, the level of precision is month:

>>> from dateparser.date import DateDataParser
>>> DateDataParser().get_date_data(u'March 2015')
{'date_obj': datetime.datetime(2015, 3, 16, 0, 0), 'period': u'month'}

Similarly, for date strings with no day or month information, the level of precision is year, and the day (16) and month (6) are taken from the current date.

>>> DateDataParser().get_date_data(u'2014')
{'date_obj': datetime.datetime(2014, 6, 16, 0, 0), 'period': u'year'}
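The way missing fields are borrowed from the current date, and how that determines the period, can be sketched as below. `fill_from_current` is a hypothetical helper that takes already-extracted numbers rather than a string; passing `now` explicitly pins the current date to June 16, 2015.

```python
from datetime import datetime


def fill_from_current(year, month=None, day=None, now=None):
    """Hypothetical helper: fill a missing month/day from `now` and report the period."""
    now = now or datetime.now()
    if month is None:
        period = "year"   # only the year was given
    elif day is None:
        period = "month"  # year and month were given
    else:
        period = "day"
    # Note: a borrowed day that does not exist in the month (e.g. 31 in April)
    # makes datetime() raise ValueError; this sketch does not handle that case.
    date_obj = datetime(year, month or now.month, day or now.day)
    return {"date_obj": date_obj, "period": period}


now = datetime(2015, 6, 16)
print(fill_from_current(2015, 3, now=now))  # March 2015: day 16, period 'month'
print(fill_from_current(2014, now=now))     # 2014: June 16, period 'year'
```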

Dates with time zone indications or UTC offsets are returned in UTC time.

>>> DateDataParser().get_date_data(u'23 March 2000, 1:21 PM CET')
{'date_obj': datetime.datetime(2000, 3, 23, 14, 21), 'period': 'day'}

Warning

In the example below, an English date fails to parse because Spanish was previously detected and stored in the ddp instance:

>>> ddp.get_date_data('11 August 2012')
{'date_obj': None, 'period': 'day'}

dateparser.date.DateDataParser can also be initialized with known languages:

>>> ddp = DateDataParser(languages=['de', 'nl'])
>>> ddp.get_date_data(u'vr jan 24, 2014 12:49')
{'date_obj': datetime.datetime(2014, 1, 24, 12, 49), 'period': u'day'}
>>> ddp.get_date_data(u'18.10.14 um 22:56 Uhr')
{'date_obj': datetime.datetime(2014, 10, 18, 22, 56), 'period': u'day'}

Settings

dateparser's parsing behavior can be configured as follows:

TIMEZONE defaults to UTC: all dates, complete or relative, are assumed to be in UTC. When a timezone is specified, the resulting datetime is converted to it:

>>> from dateparser import parse
>>> parse('January 12, 2012 10:00 PM')
datetime.datetime(2012, 1, 12, 22, 0)

>>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern'})
datetime.datetime(2012, 1, 12, 17, 0)
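The conversion in the second example is ordinary UTC-to-local arithmetic. The sketch below reproduces it with the standard library; `utc_naive_to` is a hypothetical helper, and the fixed UTC-5 offset is an assumption that only holds for US/Eastern outside daylight saving time (as in January). A real conversion should use a time zone database such as zoneinfo.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Eastern in January is EST, a fixed UTC-5 offset.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def utc_naive_to(dt, tz):
    """Treat a naive datetime as UTC, convert it to `tz`, then drop tzinfo again."""
    return dt.replace(tzinfo=timezone.utc).astimezone(tz).replace(tzinfo=None)


print(utc_naive_to(datetime(2012, 1, 12, 22, 0), EASTERN_STANDARD))
# 2012-01-12 17:00:00
```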

PREFER_DAY_OF_MONTH defaults to 'current' and can also be set to 'first' or 'last':

>>> from dateparser import parse
>>> parse(u'December 2015')  # default behavior
datetime.datetime(2015, 12, 16, 0, 0)
>>> parse(u'December 2015', settings={'PREFER_DAY_OF_MONTH': 'last'})
datetime.datetime(2015, 12, 31, 0, 0)
>>> parse(u'December 2015', settings={'PREFER_DAY_OF_MONTH': 'first'})
datetime.datetime(2015, 12, 1, 0, 0)
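The three PREFER_DAY_OF_MONTH choices amount to picking day 1, the month's last day, or today's day. `pick_day` is a hypothetical helper, and clamping today's day to the month length is an assumption of this sketch, not documented dateparser behavior.

```python
import calendar
from datetime import date


def pick_day(year, month, prefer="current", today=None):
    """Hypothetical helper mimicking PREFER_DAY_OF_MONTH for a month-only string."""
    today = today or date.today()
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    if prefer == "first":
        day = 1
    elif prefer == "last":
        day = last_day
    elif prefer == "current":
        # Assumption: clamp so that e.g. the 31st still yields a valid date in April.
        day = min(today.day, last_day)
    else:
        raise ValueError("prefer must be 'current', 'first' or 'last'")
    return date(year, month, day)


print(pick_day(2015, 12, "last"))                    # 2015-12-31
print(pick_day(2015, 12, today=date(2015, 6, 16)))  # 2015-12-16
```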

PREFER_DATES_FROM defaults to 'current_period' and can also be set to 'past' or 'future'. Assuming the current date is June 16, 2015:

>>> from dateparser import parse
>>> parse(u'March')
datetime.datetime(2015, 3, 16, 0, 0)
>>> parse(u'March', settings={'PREFER_DATES_FROM': 'future'})
datetime.datetime(2016, 3, 16, 0, 0)
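One plausible reading of PREFER_DATES_FROM for a month-only string is: start in the current year, then move a year forward ('future') or back ('past') if the result lies on the wrong side of today. `resolve_month` is a hypothetical helper; dateparser's exact rules may differ.

```python
from datetime import date


def resolve_month(month, prefer="current_period", today=None):
    """Hypothetical helper approximating PREFER_DATES_FROM for a month-only string."""
    today = today or date.today()
    # Borrow the year and day of month from today (no handling of e.g. day 31 in April).
    candidate = date(today.year, month, today.day)
    if prefer == "future" and candidate < today:
        candidate = candidate.replace(year=today.year + 1)  # already passed: use next year
    elif prefer == "past" and candidate > today:
        candidate = candidate.replace(year=today.year - 1)  # not reached yet: use last year
    return candidate


today = date(2015, 6, 16)
print(resolve_month(3, today=today))            # 2015-03-16
print(resolve_month(3, "future", today=today))  # 2016-03-16
print(resolve_month(12, "past", today=today))   # 2014-12-16
```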

SKIP_TOKENS is a list of tokens to discard while detecting the language. It defaults to ['t'], which skips the T in ISO-format datetime strings such as 2015-05-02T10:20:19+0000:

>>> from dateparser.date import DateDataParser
>>> DateDataParser(settings={'SKIP_TOKENS': ['de']}).get_date_data(u'27 Haziran 1981 de')  # Turkish (at 27 June 1981)
{'date_obj': datetime.datetime(1981, 6, 27, 0, 0), 'period': 'day'}
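The effect of SKIP_TOKENS on language detection can be illustrated by filtering the letter tokens that detection would look at. `detection_tokens` is a hypothetical helper, and its tokenizer (runs of letters only) is a simplification, not dateparser's own.

```python
import re


def detection_tokens(date_string, skip_tokens=("t",)):
    """Hypothetical helper: the alphabetic tokens left over for language detection."""
    skip = {token.lower() for token in skip_tokens}  # compare case-insensitively
    words = re.findall(r"[^\W\d_]+", date_string)     # runs of letters only
    return [w for w in words if w.lower() not in skip]


print(detection_tokens("2015-05-02T10:20:19+0000"))   # []
print(detection_tokens("27 Haziran 1981 de", ["de"]))  # ['Haziran']
```

With nothing but 'Haziran' left, only languages that know that word remain candidates, which is why skipping the trailing 'de' helps here.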