Data Module
This module handles non-geographic data.
preprocess_data(data_folder, data_column_remapping=None, add_administrative_informations=None, regions_data_path=None, regions_target_columns=None, provinces_data_path=None, provinces_target_columns=None, municipalities_data_path=None, municipalities_target_columns=None, output_folder=None)
Preprocess census CSV files and return aggregated data and trace record.
This function performs the following operations:
- Searches for all CSV files in the specified folder.
- Uses the last CSV file (in alphabetical order) as the trace record file.
- Loads and concatenates all other CSV files into a single DataFrame.
- Applies column name remapping if `data_column_remapping` is provided.
- Adds administrative information (regions, provinces, municipalities) if requested.
- Replaces any NaN values with 0.
- Loads the trace record file into a dedicated DataFrame.
- Then either:
  - returns a dictionary containing the `census_data` and `trace` DataFrames, or
  - saves the resulting CSV files to `output_folder` and returns the path.
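The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the function name `sketch_preprocess`, the sample column names (`SEZ`, `P1`), and the file names are hypothetical, and the administrative enrichment and `output_folder` branches are left out.

```python
from pathlib import Path
import tempfile

import pandas as pd


def sketch_preprocess(data_folder, data_column_remapping=None):
    """Illustrative stand-in for the documented steps (not the library code)."""
    # Find every CSV in the folder, sorted alphabetically by file name.
    csv_files = sorted(Path(data_folder).glob("*.csv"), key=lambda p: p.name)
    if not csv_files:
        raise ValueError(f"No CSV files found in {data_folder}")

    # The last file in alphabetical order is treated as the trace record.
    *census_files, trace_file = csv_files

    # Concatenate all remaining CSV files into a single DataFrame.
    frames = [pd.read_csv(path) for path in census_files]
    census_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

    # Rename columns only when a remapping is supplied.
    if data_column_remapping:
        census_data = census_data.rename(columns=data_column_remapping)

    # Replace missing values with 0, then load the trace record separately.
    census_data = census_data.fillna(0)
    trace = pd.read_csv(trace_file)
    return {"census_data": census_data, "trace": trace}


# Small demonstration on throwaway files (hypothetical names and columns).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a_census.csv").write_text("SEZ,P1\n1,10\n2,\n")
    Path(tmp, "b_census.csv").write_text("SEZ,P1\n3,7\n")
    Path(tmp, "z_trace.csv").write_text("field,meaning\nP1,population\n")
    result = sketch_preprocess(tmp, {"P1": "population"})

print(result["census_data"])
```

Because the trace file is chosen purely by alphabetical position, a naming scheme that sorts it last (as `z_trace.csv` does here) matters more than the file's contents.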
| PARAMETER | DESCRIPTION |
|---|---|
| `data_folder` | Folder containing the CSV files to process. |
| `data_column_remapping` | Optional dictionary for renaming census dataset columns (e.g., …). |
| `add_administrative_informations` | If True, enriches data with administrative information (regions, provinces, municipalities) via … |
| `regions_data_path` | Optional path to the file containing region data. |
| `regions_target_columns` | Optional list of columns to extract/keep for region data. |
| `provinces_data_path` | Optional path to the file containing province data. |
| `provinces_target_columns` | Optional list of columns to extract/keep for province data. |
| `municipalities_data_path` | Optional path to the file containing municipality data. |
| `municipalities_target_columns` | Optional list of columns to extract/keep for municipality data. |
| `output_folder` | Optional destination folder where the following files will be saved: … |
| RETURNS | DESCRIPTION |
|---|---|
| `dict \| Path` | Either a dictionary with keys: … |
| `dict \| Path` | Or the path to … |
| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If no CSV files are found in the specified folder. |
Note
The trace record file is considered to be the last CSV in alphabetical
order within data_folder. The check_encoding() function is used to
determine the correct encoding for CSV files.
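The source does not show how `check_encoding()` works, so the following is only a sketch of one common approach to the same problem: try a short list of candidate encodings and keep the first that decodes the file cleanly. The helper name `guess_encoding`, the candidate list, and the sample file are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def guess_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: it is NOT the module's check_encoding().
    """
    raw = Path(path).read_bytes()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # This candidate cannot decode the bytes; try the next.
        return encoding
    raise ValueError(f"None of {candidates} can decode {path}")


# Demonstration: a CSV header with an accented character saved as cp1252
# is not valid UTF-8, so the helper falls through to the second candidate.
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp, "legacy.csv")
    legacy.write_bytes("Comune,Località\n".encode("cp1252"))
    detected = guess_encoding(legacy)

print(detected)
```

The detected name can then be passed to a CSV reader through its encoding argument. Trial decoding is a heuristic: a file may decode without error under the wrong encoding, which is why the stricter candidates are listed first.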