
Defaulting to Microseconds in Timestamp/Timedelta Creation from Numeric Input

Core Problem

When timestamps or timedeltas are created from numeric input with pandas functions such as date_range, Timestamp, or to_datetime together with the unit parameter, the resulting dtype often defaults to nanoseconds, regardless of the unit that was passed. When the input is expressed in seconds, minutes, hours, and so on, it would be more consistent and efficient to return a result at that resolution instead of nanoseconds.
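
As a quick illustration of the behavior described above, the following sketch builds a DatetimeIndex from integer seconds and prints the dtype pandas chose. It assumes pandas 2.0 or later, where as_unit is available for converting to a coarser resolution. The first dtype printed depends on the installed pandas version, so no output is shown.

```python
import pandas as pd

# Integer epoch seconds, converted with an explicit unit.
idx = pd.to_datetime([1, 2, 3], unit="s")

# Resolution pandas picked for the result (version dependent).
print(idx.dtype)

# Explicitly request second resolution (assumes pandas >= 2.0).
coarse = idx.as_unit("s")
print(coarse.dtype)
```

After the explicit conversion, the underlying integers are the original second counts, which is the representation the proposal would like to get without the extra step.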

Solution & Analysis

One possible solution is to default to microsecond resolution when creating timestamps or timedeltas from numeric input. to_datetime would then treat the unit parameter as a multiplier that converts each value to microseconds. The sketch below covers only the case where no unit is given, and guesses one from the magnitude of the first value.

import pandas as pd

def to_datetime(values, unit=None):
    # ... ( existing code )

    # Guess a unit only when none was passed and the input is non-empty.
    if unit is None and len(values) > 0 and isinstance(values[0], int):
        # Default to microseconds when possible. The thresholds are a
        # magnitude heuristic that looks at the first value only.
        if values[0] > 1e6:
            unit = 'us'
        elif values[0] > 1e3:
            unit = 'ms'

    # ... ( existing code )
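
The idea of using the unit as a multiplier toward microseconds can also be sketched on its own, outside pandas. The example below is a minimal standard-library illustration, not pandas' implementation: the names MICROSECONDS_PER_UNIT, to_epoch_microseconds, and to_timestamp are hypothetical, and the unit strings are assumed to mirror those accepted by the unit parameter. Nanoseconds are left out because they are finer than the target resolution.

```python
from datetime import datetime, timedelta, timezone

# Microseconds per unit. The unit strings mirror those accepted by
# pandas' unit parameter, but this table itself is an assumption.
MICROSECONDS_PER_UNIT = {
    "D": 86_400_000_000,
    "h": 3_600_000_000,
    "m": 60_000_000,
    "s": 1_000_000,
    "ms": 1_000,
    "us": 1,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_microseconds(value, unit="s"):
    """Convert a number expressed in `unit` to integer microseconds since the epoch."""
    try:
        factor = MICROSECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    # Round so fractional input (e.g. 1.5 seconds) maps to a whole microsecond.
    return round(value * factor)


def to_timestamp(value, unit="s"):
    """Build a timezone-aware datetime with microsecond resolution."""
    return EPOCH + timedelta(microseconds=to_epoch_microseconds(value, unit))


print(to_epoch_microseconds(90, unit="m"))
print(to_timestamp(1.5, unit="s"))
```

Because the unit is always explicit here, no guessing from the size of the value is needed, which avoids misreading a large count of seconds as microseconds.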

Another possible solution is to return a result at the same resolution as the input when converting from seconds, minutes, hours, and so on. This can be achieved by having to_datetime choose the unit from the input value, extending the magnitude heuristic to coarser units.

import pandas as pd

def to_datetime(values, unit=None):
    # ... ( existing code )

    # Infer a unit only when the caller did not pass one.
    if unit is None and len(values) > 0 and isinstance(values[0], int):
        first = values[0]
        # Thresholds are illustrative and are checked from largest to
        # smallest so that every branch is reachable.
        if first > 1e6:
            unit = 'us'
        elif first > 3600:  # hours
            unit = 'h'
        elif first > 1e3:
            unit = 'ms'
        elif first > 60:   # minutes
            unit = 'm'

    # ... ( existing code )
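
Matching the result to the input unit does not have to rely on the size of the values. As a hedged alternative sketch, the hypothetical helper below maps an explicitly passed unit to a storage resolution. It assumes that only second, millisecond, microsecond, and nanosecond resolutions are available, as in pandas 2.x, so coarser units such as minutes and hours fall back to seconds.

```python
# Resolutions assumed to be available for datetime64/timedelta64 data
# in pandas 2.x, ordered from coarsest to finest.
SUPPORTED_RESOLUTIONS = ("s", "ms", "us", "ns")

# Input units coarser than a second have no matching resolution,
# so they fall back to seconds, which represents them exactly.
COARSE_UNITS = {"D", "h", "m"}


def resolution_for_unit(unit):
    """Return the coarsest supported resolution that represents `unit` exactly."""
    if unit in SUPPORTED_RESOLUTIONS:
        return unit
    if unit in COARSE_UNITS:
        return "s"
    raise ValueError(f"unsupported unit: {unit!r}")


for unit in ("h", "s", "ms"):
    print(unit, "->", resolution_for_unit(unit))
```

This mapping is exact only for integer input. Fractional values, such as 1.5 hours, may need a finer resolution than the one returned here.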

Conclusion

Defaulting to microseconds when creating timestamps or timedeltas from numeric input can improve consistency and efficiency. Having to_datetime choose the resolution from the input, rather than always returning nanoseconds, achieves this while keeping flexibility and compatibility with different types of input values.
