Process and Pre-process Module
The Process and Pre-process module handles advanced management and processing of census data: it integrates administrative information and combines geographic and tabular data to produce enriched datasets.
finalize_census_data(census_data_path, years, output_data_folder=None, delete_preprocessed_data=False)
Finalize census data by merging geodata and tabular data into a single GeoPackage.
For each specified year, this function:
1. Reads geographic data (census sections) from layer census<year>.
2. Reads tabular data from layer data<year>.
3. Removes unnecessary columns from tabular data.
4. Performs a join between geographic and tabular data on key SEZ<year>.
5. Sorts the data and builds a final GeoDataFrame.
6. Saves the result to a GeoPackage named census_data.gpkg,
with one layer per year (census<year>).
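Steps 3 to 5 can be illustrated with a minimal, library-free sketch. Plain dictionaries stand in for GeoDataFrame rows. The sketch assumes an inner join and sorting by the join key, neither of which is stated here. The helper name `join_sections`, the sample columns (`P1`, `TMP`), and the sample values are hypothetical.

```python
def join_sections(sections, table_rows, year, columns_to_remove=()):
    """Join section records with tabular rows on the SEZ<year> key.

    Illustrative only: plain dicts stand in for GeoDataFrame rows, and an
    inner join plus sorting by the join key are assumptions.
    """
    key = f"SEZ{year}"  # dynamic join key, e.g. "SEZ2011"

    # Step 3: drop unnecessary columns from the tabular rows.
    cleaned = [
        {k: v for k, v in row.items() if k not in columns_to_remove}
        for row in table_rows
    ]

    # Step 4: index the tabular rows by key, then join.
    # A row or section without the key column raises KeyError.
    by_key = {row[key]: row for row in cleaned}
    joined = [
        {**section, **by_key[section[key]]}
        for section in sections
        if section[key] in by_key
    ]

    # Step 5: sort the joined records by the join key.
    return sorted(joined, key=lambda rec: rec[key])


# Hypothetical sample data.
sections = [
    {"SEZ2011": 2, "geometry": "POLYGON B"},
    {"SEZ2011": 1, "geometry": "POLYGON A"},
    {"SEZ2011": 3, "geometry": "POLYGON C"},  # no tabular match
]
rows = [
    {"SEZ2011": 1, "P1": 120, "TMP": "x"},
    {"SEZ2011": 2, "P1": 85, "TMP": "y"},
]
result = join_sections(sections, rows, 2011, columns_to_remove={"TMP"})
```

In this sketch the section without a tabular match is dropped; a left join would keep it with empty attributes instead.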
| PARAMETER | DESCRIPTION |
|---|---|
| `census_data_path` | Folder containing the pre-processed GeoPackage file […] |
| `years` | List of census years to finalize (e.g., [1991, 2001, 2011, 2021]). |
| `output_data_folder` | Optional folder where the final GeoPackage […] |
| `delete_preprocessed_data` | If True, deletes the […] |
| RAISES | DESCRIPTION |
|---|---|
| `FileNotFoundError` | If the main GeoPackage file […] |
| `KeyError` | If the join column […] |
Note
The join key between geographic and tabular data is dynamic and follows
the SEZ<year> convention. Columns to remove from tabular data are
defined in census_data[year]['data_columns_to_remove']. The final
GeoPackage may contain multiple layers, one per processed year.
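The naming conventions in this note can be sketched as a small lookup helper. This is an illustration under stated assumptions, not the module's code: the `census_data` excerpt is hypothetical (only the `data_columns_to_remove` key is named above), integer year keys are assumed, and `year_settings` and the `EXAMPLE_*` column names are invented for the example.

```python
# Hypothetical excerpt of the configuration dictionary; only the
# 'data_columns_to_remove' key is named in the note above.
census_data = {
    1991: {"data_columns_to_remove": ["EXAMPLE_A"]},
    2011: {"data_columns_to_remove": ["EXAMPLE_A", "EXAMPLE_B"]},
}


def year_settings(year, config=census_data):
    """Return layer names, join key, and columns to drop for one year."""
    settings = config[year]  # KeyError for a year missing from the config
    return {
        "geo_layer": f"census{year}",   # geographic layer read by the function
        "data_layer": f"data{year}",    # tabular layer read by the function
        "join_key": f"SEZ{year}",       # dynamic join key
        "columns_to_remove": list(settings["data_columns_to_remove"]),
    }


settings_2011 = year_settings(2011)
```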
preprocess_census(processed_data_folder, years, output_data_folder=None, delete_download_folder=False, municipalities_code=[])
Preprocess census data for one or more years, integrating geodata, boundaries, and tables.
This function coordinates the entire census preprocessing workflow, executing for each requested year:
- Preprocessing of geographic data (shapefiles or similar).
- Preprocessing of tabular data (CSV).
- Addition of administrative information (optional).
- Saving of resulting data into a GeoPackage.
- Optional deletion of the pre-processed data folder.
Required inputs are read from the census_data configuration dictionary, which
defines paths, columns, mappings, and specific settings for each census year.
| PARAMETER | DESCRIPTION |
|---|---|
| `processed_data_folder` | Folder containing pre-downloaded or pre-processed data (geodata, tabular data, administrative boundaries). |
| `years` | List of census years to process (e.g., [1991, 2001, 2011, 2021]). |
| `output_data_folder` | Optional folder where processed data will be saved (GeoPackage and associated layers). If None, the parent folder of […] |
| `delete_download_folder` | If True, deletes the […] |
| `municipalities_code` | Optional list of ISTAT municipality codes to filter in the data […] |
| RETURNS | DESCRIPTION |
|---|---|
| `Path` | Path to the final folder containing the processed data. |
| RAISES | DESCRIPTION |
|---|---|
| `KeyError` | If the […] |
| `Exception` | For any errors during reading, preprocessing, or saving. |
Note
The GeoPackage file is opened and written via sqlite3.connect().
Layers inserted into the GeoPackage follow conventions such as:
- sezioni<year> for the geographic layer
- data<year> for the tabular dataset
- tracciato<year> for the record layout (field definitions)
The municipalities_code parameter is applied during the geodata phase.
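Because a GeoPackage is an SQLite database, its layers can be inspected with the standard `sqlite3` module by querying the `gpkg_contents` table defined by the GeoPackage standard. The sketch below builds an in-memory stand-in with a reduced `gpkg_contents` schema instead of opening a real file. The helper name `list_layers` and the `data_type` values assigned to each layer are assumptions made for the example.

```python
import sqlite3


def list_layers(connection):
    """Return (table_name, data_type) pairs from gpkg_contents, sorted by name."""
    cursor = connection.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return cursor.fetchall()


# In-memory stand-in for a GeoPackage: a real file carries the full
# gpkg_contents schema plus the layer tables themselves.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents ("
    "table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [
        ("sezioni2011", "features"),       # geographic layer
        ("data2011", "attributes"),        # tabular dataset (assumed type)
        ("tracciato2011", "attributes"),   # assumed type
    ],
)
layers = list_layers(conn)
conn.close()
```

With a real file, `":memory:"` would be replaced by the path to the GeoPackage, and the table creation and inserts would be omitted.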
add_administrative_info(census_data, regions_data_path, regions_target_columns, provinces_data_path, provinces_target_columns, municipalities_data_path, municipalities_target_columns)
Enrich census data with administrative information (municipalities, provinces, regions).
This function integrates corresponding administrative codes and names into the census data, sourced from three external datasets: regional, provincial, and municipal boundaries.
The logical workflow includes:
1. Standardization of census dataset column names.
2. Reading of administrative datasets (regions, provinces, municipalities).
3. Merge of municipalities with provinces.
4. Merge of result with regions.
5. Final join with census dataset on the municipal key (PRO_COM).
6. Cleanup and renaming of final administrative columns.
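The merge chain in steps 3 to 5 can be sketched without any dataframe library. The function itself returns a DataFrame; here plain dictionaries keyed by their first column stand in for the administrative datasets. The helper name `enrich`, the name columns (`COMUNE`, `DEN_PROV`, `DEN_REG`), the sample values, and the left-join behaviour for unmatched rows are assumptions made for the example.

```python
def enrich(census_rows, municipalities, provinces, regions):
    """Attach municipality, province, and region attributes to census rows."""
    # Steps 3-4: merge municipalities with provinces, then with regions.
    admin_by_municipality = {}
    for pro_com, municipality in municipalities.items():
        province = provinces[municipality["COD_PROV"]]
        region = regions[province["COD_REG"]]
        admin_by_municipality[pro_com] = {**municipality, **province, **region}

    # Step 5: join with the census rows on PRO_COM. A left join is assumed:
    # rows without a matching municipality are kept unchanged.
    return [
        {**row, **admin_by_municipality.get(row["PRO_COM"], {})}
        for row in census_rows
    ]


# Hypothetical lookup tables, each keyed by its first column.
regions = {1: {"DEN_REG": "Piemonte"}}
provinces = {1: {"COD_REG": 1, "DEN_PROV": "Torino"}}
municipalities = {1272: {"COD_PROV": 1, "COMUNE": "Torino"}}
census_rows = [{"PRO_COM": 1272, "P1": 100}, {"PRO_COM": 9999, "P1": 5}]

enriched = enrich(census_rows, municipalities, provinces, regions)
```

Building the municipality lookup once keeps the final join a single pass over the census rows.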
| PARAMETER | DESCRIPTION |
|---|---|
| `census_data` | Census dataset to which administrative information will be added. |
| `regions_data_path` | Path to the file containing regional data. |
| `regions_target_columns` | List of columns to extract from the regional dataset (the first column is used as the index). |
| `provinces_data_path` | Path to the file containing provincial data. |
| `provinces_target_columns` | List of columns to extract from the provincial dataset (the first column is used as the index). |
| `municipalities_data_path` | Path to the file containing municipal data. |
| `municipalities_target_columns` | List of columns to extract from the municipal dataset (the first column is used as the index). |
| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | Census DataFrame enriched with administrative information on municipalities, provinces, and regions. |
Note
Administrative codes used for merges are assumed to be:
PRO_COM (municipality), COD_PROV/COD_PRO (province), COD_REG (region).
The function uses read_administrative_boundaries() to load and filter
administrative datasets.