Modules

Returns a list of tuples comprising the located date and the word index at which it was found.

Parameters:

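Name      Type            Description                                                                  Default
text      str             The corpus of text in which to find dates/times.                             required
context   Optional[str]   Optional additional text; any date/time tokens found in it are grouped and   None
                          passed to format_token_groups together with the groups found in text.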

Returns:

Type                    Description
list[tuple[str, int]]   A list of tuples, each containing a string representing the date and time and the integer word index at which it was found.

Examples:

Get dates from a text sample.

>>> find_dates("A thing happened on Jan 1st 2012 and the next morning at 09:15 and also jan 15th at 12am in 2018.")
[
    ('2012-01-01', 4),
    ('2012-01-02 09:15', 9),
    ('2018-01-15 12:00', 15)
]
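The word indices in the example above line up with a 0-based split of the text on whitespace. The sketch below assumes that convention (it is not confirmed by the source beyond this one example) and hard-codes the documented output instead of calling the library; `words_at` is a hypothetical helper name.

```python
# Documented example input and output, copied verbatim from above.
text = (
    "A thing happened on Jan 1st 2012 and the next morning at 09:15 "
    "and also jan 15th at 12am in 2018."
)
results = [
    ("2012-01-01", 4),
    ("2012-01-02 09:15", 9),
    ("2018-01-15 12:00", 15),
]


def words_at(text: str, results: list[tuple[str, int]]) -> list[tuple[str, str]]:
    """Pair each date string with the word found at its reported index.

    Assumption: indices are 0-based positions in text.split().
    """
    words = text.split()
    return [(date, words[index]) for date, index in results]


for date, word in words_at(text, results):
    print(f"{date!r} starts at word {word!r}")
```

Under that assumption, each index points at the first word of the phrase that produced the date ("Jan", "next", "jan").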
Source code in date_fuzz/extraction.py
def find_dates(text: str, context: Optional[str] = None) -> list[tuple[str, int]]:
    """Returns a list of tuples comprising the located date and the
    word index at which it was found.

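    Args:
        text (str): The corpus of text in which to find dates/times.
        context (Optional[str]): Optional additional text whose date/time
            tokens are passed to format_token_groups as context groups.
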
    Returns:
        list[tuple[str, int]]: A list of tuples containing a string
            representing the date and time and an integer word index at
            which it was found.

    Examples:
        Get dates from a text sample.

        >>> find_dates("A thing happened on Jan 1st 2012 and the next morning at 09:15 and also jan 15th at 12am in 2018.")
        [
            ('2012-01-01', 4),
            ('2012-01-02 09:15', 9),
            ('2018-01-15 12:00', 15)
        ]
    """
    tokens = find_tokens(text)
    if len(tokens) == 0:
        return []
    groups = group_tokens(text, tokens)

    # Use the context only when it is provided and contains date/time tokens.
    context_tokens = find_tokens(context) if context else []
    if context_tokens:
        context_groups = group_tokens(context, context_tokens)
        formatted_groups = format_token_groups(groups, context_groups)
    else:
        formatted_groups = format_token_groups(groups)

    return formatted_groups
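The date strings in the documented find_dates example come in two shapes: date only ('2012-01-01') and date with time ('2012-01-02 09:15'). A minimal sketch for turning such strings into datetime objects follows. It assumes these two formats are the only ones to handle, which the source does not state, and `parse_result` is a hypothetical helper, not part of date_fuzz.

```python
from datetime import datetime

# Assumed formats, inferred only from the documented example output.
FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d")


def parse_result(value: str) -> datetime:
    """Parse a date string by trying each assumed format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # Not this format; try the next one.
    raise ValueError(f"Unrecognised date string: {value!r}")


# Date strings copied from the documented example output.
for value in ("2012-01-01", "2012-01-02 09:15", "2018-01-15 12:00"):
    print(parse_result(value).isoformat())
```

Date-only strings parse to midnight of that day, so callers that need to tell "no time given" apart from "00:00" should check the string's shape before parsing.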

Function to remove all date/time indicators from a text sample.

Parameters:

Name   Type   Description            Default
text   str    Text block to strip.   required

Returns:

Type   Description
str    Text block with dates/times removed.

Examples:

This can be used to get raw text once dates have been extracted.

>>> strip_dates("Jan 1st 2012: A thing happened.")
A thing happened.
Source code in date_fuzz/extraction.py
def strip_dates(text: str) -> str:
    """Function to remove all date/time indicators from a text sample.

    Args:
        text (str): Text block to strip.

    Returns:
        str: Text block with dates/times removed.

    Examples:
        This can be used to get raw text once dates have been extracted.

        >>> strip_dates("Jan 1st 2012: A thing happened.")
        A thing happened.
    """
    # Strings are immutable, so a plain reference is enough (no deepcopy needed).
    stripped_text = text
    for pattern in date_time_patterns_dict:
        stripped_text = pattern.sub("", stripped_text)

    stripped_text = stripped_text.replace("  ", " ")
    stripped_text = stripped_text.replace(" ,", "")
    stripped_text = stripped_text.replace(" :", "")
    stripped_text = stripped_text.replace(" .", ".")
    return stripped_text.replace("  ", " ")
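The substitute-then-tidy approach in strip_dates can be tried in isolation. The sketch below is not the library's implementation: `DATE_PATTERNS` is a hypothetical stand-in for `date_time_patterns_dict` (whose contents are not shown here) and covers only simple "Jan 1st 2012" and "09:15" forms. The cleanup chain mirrors the one in the source above.

```python
import re

# Hypothetical patterns for illustration only; the real library's are not shown.
DATE_PATTERNS = [
    # Month name, day with optional ordinal suffix, optional year.
    re.compile(
        r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"
        r"\s+\d{1,2}(?:st|nd|rd|th)?(?:,?\s+\d{4})?",
        re.IGNORECASE,
    ),
    # 24-hour clock time such as 09:15.
    re.compile(r"\b\d{1,2}:\d{2}\b"),
]


def strip_dates_sketch(text: str) -> str:
    """Remove matches of the assumed patterns, then tidy leftover spacing."""
    stripped = text
    for pattern in DATE_PATTERNS:
        stripped = pattern.sub("", stripped)

    # Same cleanup chain as the source above.
    stripped = stripped.replace("  ", " ")
    stripped = stripped.replace(" ,", "")
    stripped = stripped.replace(" :", "")
    stripped = stripped.replace(" .", ".")
    return stripped.replace("  ", " ")


print(strip_dates_sketch("The meeting on Jan 1st 2012 went well."))
```

Removing a date from the middle of a sentence leaves two adjacent spaces, which is why the first cleanup step collapses double spaces.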