Data Module

This module handles non-geographic data.

preprocess_data(data_folder, data_column_remapping=None, add_administrative_informations=None, regions_data_path=None, regions_target_columns=None, provinces_data_path=None, provinces_target_columns=None, municipalities_data_path=None, municipalities_target_columns=None, output_folder=None)

Preprocess census CSV files and return the concatenated data and the trace record.

This function performs the following operations:

  1. Searches for all CSV files in the specified folder.
  2. Uses the last CSV file (in alphabetical order) as the trace record file.
  3. Loads and concatenates all other CSV files into a single DataFrame.
  4. Applies column name remapping if data_column_remapping is provided.
  5. Adds administrative information (regions, provinces, municipalities) if add_administrative_informations is True.
  6. Replaces any NaN values with 0.
  7. Loads the trace record file into a dedicated DataFrame.
  8. Returns either:
     - a dictionary containing the census_data and trace DataFrames, or
     - the path to output_folder, after saving the resulting CSV files there.
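
The file-selection part of these steps (find the CSV files, treat the last one alphabetically as the trace record, keep the rest as data) can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: the helper name `split_census_files` is hypothetical, and sorting by file name is an assumption about what "alphabetical order" means here.

```python
from pathlib import Path


def split_census_files(data_folder):
    """Return (data_files, trace_file) for the CSV files in data_folder.

    Hypothetical helper: mirrors the documented rule that the last CSV
    in alphabetical order is the trace record file.
    """
    # Assumption: "alphabetical order" means sorting by file name.
    csv_files = sorted(Path(data_folder).glob("*.csv"), key=lambda p: p.name)
    if not csv_files:
        # Matches the documented ValueError when no CSV files are found.
        raise ValueError(f"No CSV files found in {data_folder}")
    # Everything except the last file is census data; the last is the trace.
    *data_files, trace_file = csv_files
    return data_files, trace_file
```

Note that under this rule a folder containing a single CSV yields an empty list of data files, because that one file is taken as the trace record.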
PARAMETER DESCRIPTION
data_folder

Folder containing the CSV files to process.

TYPE: Path

data_column_remapping

Optional dictionary for renaming census dataset columns (e.g., {"pro_com": "PRO_COM"}).

TYPE: dict | None DEFAULT: None
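
The remapping dictionary maps existing column names to new ones, which is the shape pandas accepts for column renames. The sketch below shows how concatenation, remapping, and NaN replacement could fit together, assuming the function relies on `pandas.concat`, `DataFrame.rename`, and `DataFrame.fillna`; the helper name `remap_and_fill` is hypothetical and the real implementation may differ.

```python
import pandas as pd


def remap_and_fill(frames, data_column_remapping=None):
    """Concatenate DataFrames, rename columns, and replace NaN with 0.

    Hypothetical helper illustrating the documented behaviour only.
    """
    # Stack all census tables; columns missing from a table become NaN.
    combined = pd.concat(frames, ignore_index=True)
    if data_column_remapping:
        # e.g. {"pro_com": "PRO_COM"} renames pro_com to PRO_COM.
        combined = combined.rename(columns=data_column_remapping)
    # Replace any NaN values with 0, as the function description states.
    return combined.fillna(0)
```

Columns that are not listed in the dictionary keep their original names.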

add_administrative_informations

If True, enriches data with administrative information (regions, provinces, municipalities) via add_administrative_info().

TYPE: bool | None DEFAULT: None

regions_data_path

Optional path to the file containing region data.

TYPE: Path | None DEFAULT: None

regions_target_columns

Optional list of columns to extract/keep for region data.

TYPE: list | None DEFAULT: None

provinces_data_path

Optional path to the file containing province data.

TYPE: Path | None DEFAULT: None

provinces_target_columns

Optional list of columns to extract/keep for province data.

TYPE: list | None DEFAULT: None

municipalities_data_path

Optional path to the file containing municipality data.

TYPE: Path | None DEFAULT: None

municipalities_target_columns

Optional list of columns to extract/keep for municipality data.

TYPE: list | None DEFAULT: None

output_folder

Optional destination folder where the following files will be saved:

- census_data.csv for the concatenated data
- census_trace.csv for the trace record

If None, data is returned as a dictionary of DataFrames.

TYPE: Path | None DEFAULT: None

RETURNS DESCRIPTION
dict | Path

Either a dictionary with keys:

- "census_data": DataFrame containing the concatenated census data
- "trace": DataFrame containing the field trace record

or the path to output_folder, if specified, where census_data.csv and census_trace.csv have been saved.
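
Because the return type depends on whether output_folder was given, calling code may want to handle both shapes in one place. A minimal sketch, assuming only the dictionary keys and file names documented above; the helper name `resolve_outputs` is hypothetical.

```python
from pathlib import Path


def resolve_outputs(result):
    """Normalise the return value of preprocess_data().

    Hypothetical helper: returns (census_data, trace), which are either
    two DataFrames (dictionary case) or two CSV paths (folder case).
    """
    if isinstance(result, dict):
        # In-memory case: the documented keys hold the DataFrames.
        return result["census_data"], result["trace"]
    # On-disk case: result is output_folder with the documented file names.
    folder = Path(result)
    return folder / "census_data.csv", folder / "census_trace.csv"
```

In the on-disk case the caller still has to read the two CSV files to obtain DataFrames.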

RAISES DESCRIPTION
ValueError

If no CSV files are found in the specified folder.

Note

The trace record file is considered to be the last CSV in alphabetical order within data_folder. The check_encoding() function is used to determine the correct encoding for CSV files.
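
This page does not describe how check_encoding() works internally. As a rough illustration of the general idea of choosing an encoding before reading a CSV, the sketch below tries a short list of candidates and returns the first one that decodes the file without error. The function `guess_encoding` and its candidate list are assumptions made for this example; it is a stand-in, not the module's check_encoding().

```python
from pathlib import Path


def guess_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the file cleanly.

    Illustrative stand-in only; not the module's check_encoding().
    """
    raw = Path(path).read_bytes()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
        return encoding
    raise ValueError(f"None of {candidates} can decode {path}")
```

The order of candidates matters: latin-1 accepts any byte sequence, so it only makes sense as the final fallback.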