Get data¶
Get macroeconomic data¶
- pynsee.macrodata.get_series(*idbanks, update=False, silent=False, metadata=True, startPeriod=None, endPeriod=None, firstNObservations=None, lastNObservations=None, includeHistory=None, updatedAfter=None)¶
Get data from INSEE series idbank
- Args:
idbanks (str or list or pd.Series): some idbanks provided by get_series_list()
update (bool, optional): Set to True, to update manually the data stored locally on the computer. Defaults to False.
silent (bool, optional): Set to True, to disable messages printed in log info
metadata (bool, optional): If True, some metadata is added to the data
startPeriod (str, optional): start date of the data.
endPeriod (str, optional): end date of the data.
firstNObservations (int, optional): get the first N observations for each key series (idbank).
lastNObservations (int, optional): get the last N observations for each key series (idbank).
includeHistory (boolean, optional): boolean to access the previous releases (not available on all series).
updatedAfter (str, optional): starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss)
- Returns:
DataFrame: contains the data, indexed by DATE and sorted by IDBANK
- Examples:
>>> from pynsee.macrodata import get_series_list, get_series
>>> # inflation figures in France
>>> df_idbank = get_series_list("IPC-2015")
>>> df_idbank = df_idbank.loc[
...     (df_idbank.FREQ == "M") &  # monthly
...     (df_idbank.NATURE == "INDICE") &  # index
...     (df_idbank.MENAGES_IPC == "ENSEMBLE") &  # all kinds of household
...     (df_idbank.REF_AREA == "FE") &  # all France including overseas departements
...     (df_idbank.COICOP2016.str.match("^[0-9]{2}$"))]  # coicop aggregation level
>>> # get data
>>> data = get_series(df_idbank.IDBANK)
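The updatedAfter argument expects a timestamp written as yyyy-mm-ddThh:mm:ss. A minimal sketch of producing such a string with the standard library follows; the helper name to_updated_after is hypothetical and is not part of pynsee.

```python
from datetime import datetime


def to_updated_after(moment: datetime) -> str:
    """Format a datetime as yyyy-mm-ddThh:mm:ss (hypothetical helper, not a pynsee function)."""
    # strftime with this pattern always writes the seconds field and
    # leaves out microseconds and any time zone suffix.
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


stamp = to_updated_after(datetime(2017, 7, 11, 8, 45))
print(stamp)  # 2017-07-11T08:45:00
```

The resulting string could then be passed as updatedAfter, together with includeHistory=True, on series for which previous releases are available.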
- pynsee.macrodata.get_dataset(dataset, update=False, silent=False, metadata=True, filter=None, startPeriod=None, endPeriod=None, firstNObservations=None, lastNObservations=None, includeHistory=None, updatedAfter=None)¶
Get dataset’s data from INSEE BDM database
- Args:
dataset (str): an INSEE dataset included in the list provided by get_dataset_list()
update (bool, optional): Set to True, to update manually the data stored locally on the computer. Defaults to False.
metadata (bool, optional): If True, some metadata is added to the data
filter (str, optional): Use the filter to choose only some values in a dimension. It is recommended to use it for big datasets. A dimension left empty means all values are selected. To select multiple values in one dimension put a “+” between those values.
startPeriod (str, optional): start date of the data.
endPeriod (str, optional): end date of the data.
firstNObservations (int, optional): get the first N observations for each key series (idbank).
lastNObservations (int, optional): get the last N observations for each key series (idbank).
includeHistory (boolean, optional): boolean to access the previous releases (not available on all series).
updatedAfter (str, optional): starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss)
- Raises:
ValueError: dataset should be in INSEE’s datasets list
- Returns:
DataFrame: contains the data
- Examples:
>>> from pynsee.macrodata import get_dataset
>>> ipc_data = get_dataset("IPC-2015",
...                        filter="M......ENSEMBLE...CVS.2015",
...                        includeHistory=True,
...                        updatedAfter="2017-07-11T08:45:00")
>>> #
>>> business_climate = get_dataset("CLIMAT-AFFAIRES", lastNObservations=1)
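The filter string can be tedious to write by hand. The sketch below assembles one, assuming (as the example filter suggests) that dimensions are positional and separated by dots, that an empty position selects every value, and that several values in one position are joined with "+". The helper build_filter is hypothetical, and the count of 11 positions is simply read off the example filter for IPC-2015.

```python
def build_filter(n_dimensions, selections):
    """Assemble a filter string (hypothetical helper, not a pynsee function).

    n_dimensions: number of positional dimensions in the dataset key.
    selections: dict mapping a zero-based position to a list of values.
    """
    parts = []
    for position in range(n_dimensions):
        # A position without a selection stays empty, meaning "all values".
        values = selections.get(position, [])
        # Several values for one dimension are joined with "+".
        parts.append("+".join(values))
    return ".".join(parts)


ipc_filter = build_filter(11, {0: ["M"], 6: ["ENSEMBLE"], 9: ["CVS"], 10: ["2015"]})
print(ipc_filter)  # M......ENSEMBLE...CVS.2015
```

Which value belongs in which position depends on the dataset, so the positions used here are only those visible in the example above.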
- pynsee.macrodata.get_series_title(series)¶
Get French and English titles of a list of series (idbanks)
- Args:
series (list): a list of series (idbanks)
- Examples:
>>> from pynsee.macrodata import get_series_list, get_series_title
>>> series = get_series_list("CLIMAT-AFFAIRES")
>>> series = series.loc[:3, "IDBANK"].to_list()
>>> titles = get_series_title(series)
Get geographical data¶
- pynsee.geodata.get_geodata(id, update=False, crs='EPSG:3857')¶
Get geographical data with identifier and from IGN API
- Args:
id (str): data identifier from get_geodata_list function
update (bool, optional): data is saved locally, set update=True to trigger an update. Defaults to False.
crs (str, optional): CRS used for the geodata output. Defaults to ‘EPSG:3857’.
- Examples:
>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
- class pynsee.geodata.GeoFrDataFrame.GeoFrDataFrame(*args, **kwargs)¶
Class for handling dataframes built from IGN’s geographical data
- get_geom()¶
Extract a shape (Polygon, Point …) from a GeoFrDataFrame
- Examples:
>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
>>> #
>>> # Extract a polygon from the GeoFrDataFrame
>>> geo = df.get_geom()
- translate(departement=['971', '972', '974', '973', '976'], factor=[None, None, None, 0.35, None], center=(-133583.39, 5971815.98), radius=650000, angle=0.3490658503988659, startAngle=2.6179938779914944)¶
Move overseas departements closer to metropolitan France
- Args:
departement (list, optional): list of departements to be moved, overseas departement list is used by default
factor (list, optional): make departements bigger or smaller, it should correspond to the departement list. This parameter is used by shapely.affinity.scale function, please refer to its documentation to choose the value. By default, only Guyane’s size is reduced. If the value is None, no rescaling is performed.
center (tuple, optional): center point from which offshore points are computed to move overseas departements. It should be defined as a (longitude, latitude) point in crs EPSG:3857
radius (float, optional): radius used with center point to make offshore points, distance in meter
angle (float, optional): angle used between offshore points, by default it is pi/9
startAngle (float, optional): start angle defining offshore points, by default it is pi * (1 - 1.5 * 1/9)
- Notes:
By default, the translate method focuses on overseas departements, but it can be used to move any departement anywhere on the map.
- Examples:
>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
>>> #
>>> # Move overseas departements closer to metropolitan France
>>> dfTranslate = df.translate()
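The numeric defaults of angle and startAngle in the translate signature are the decimal values of pi/9 and pi * (1 - 1.5 * 1/9). The sketch below checks that arithmetic and then illustrates one way offshore points could be laid out on a circle around center. The helper offshore_points is hypothetical, and the assumption that the i-th point sits at startAngle + i * angle is ours; pynsee's internal placement may differ.

```python
import math

# Defaults quoted in the translate() signature
angle = math.pi / 9
start_angle = math.pi * (1 - 1.5 * 1 / 9)
center = (-133583.39, 5971815.98)  # EPSG:3857 coordinates
radius = 650000  # meters
departements = ['971', '972', '974', '973', '976']


def offshore_points(center, radius, start_angle, angle, count):
    """Hypothetical helper: `count` points on a circle around `center`.

    Assumption: the i-th point sits at start_angle + i * angle.
    """
    cx, cy = center
    return [
        (cx + radius * math.cos(start_angle + i * angle),
         cy + radius * math.sin(start_angle + i * angle))
        for i in range(count)
    ]


points = offshore_points(center, radius, start_angle, angle, len(departements))
for dep, (x, y) in zip(departements, points):
    print(dep, round(x), round(y))
```

Every point produced this way lies exactly radius meters from center, which is why radius and center together control how far the moved departements end up from metropolitan France.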
- zoom(departement=['75', '92', '93', '94'], center=(-133583.39, 5971815.98), radius=650000, startAngle=2.2689280275926285, factor=2)¶
Zoom on Parisian departements
- Args:
departement (list, optional): list of departements to be moved, departements closest to Paris are selected by default
center (tuple, optional): center point from which an offshore point is computed to move Parisian departements. It should be defined as a (longitude, latitude) point in crs EPSG:3857
radius (float, optional): radius used with center point to make offshore point, distance in meter
startAngle (float, optional): start angle defining offshore point, by default it is pi * (1 - 2.5 * 1/9)
factor (float, optional): make departements bigger or smaller. This parameter is used by shapely.affinity.scale function, please refer to its documentation to choose the value.
- Notes:
By default, the zoom method focuses on the departements closest to Paris, but it can be used to zoom on any departement anywhere on the map.
- Examples:
>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
>>> #
>>> # Zoom on Parisian departements
>>> dfZoom = df.zoom()
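The factor argument is passed on to shapely.affinity.scale, whose default origin is the center of the geometry's bounding box. To give a feel for what factor=2 does, here is a pure-Python sketch of that kind of scaling applied to a list of coordinates. The helper scale_about_bbox_center is hypothetical, and whether pynsee keeps shapely's default origin is an assumption.

```python
def scale_about_bbox_center(coords, factor):
    """Scale (x, y) coordinates about the center of their bounding box.

    Hypothetical helper that mimics uniform scaling with a bounding-box
    origin; it is not part of pynsee or shapely.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    # Each point moves away from (or toward) the center by `factor`.
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in coords]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
zoomed = scale_about_bbox_center(square, factor=2)
print(zoomed)  # [(-1.0, -1.0), (3.0, -1.0), (3.0, 3.0), (-1.0, 3.0)]
```

A factor above 1 enlarges the shape around its own center and a factor below 1 shrinks it, which matches the 0.35 used for Guyane in translate.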
Get local data¶
- pynsee.localdata.get_local_data(variables, dataset_version, nivgeo='FE', geocodes=['1'], update=False, silent=False)¶
Get INSEE local numeric data
- Args:
variables (str): one or several variables separated by a hyphen (see get_local_metadata)
dataset_version (str): code of a dataset version (see get_local_metadata), if dates are replaced by ‘latest’ the function triggers a loop to find the latest data available (examples: ‘GEOlatestRPlatest’, ‘GEOlatestFLORESlatest’)
nivgeo (str): code of kind of French administrative area (see get_nivgeo_list), by default it is ‘FE’ i.e. all France
geocodes (list): codes of one or several specific areas (see get_geo_list), by default it is [‘1’] i.e. all France
update (bool): data is saved locally, set update=True to trigger an update
silent (bool, optional): Set to True, to disable messages printed in log info
- Raises:
ValueError: Error if geocodes is not a list
- Examples:
>>> from pynsee.localdata import get_local_metadata, get_nivgeo_list, get_geo_list, get_local_data
>>> metadata = get_local_metadata()
>>> nivgeo = get_nivgeo_list()
>>> departement = get_geo_list('departements')
>>> #
>>> data_all_france = get_local_data(dataset_version='GEO2020RP2017',
...                                  variables='SEXE-DIPL_19')
>>> #
>>> data_91_92 = get_local_data(dataset_version='GEO2020RP2017',
...                             variables='SEXE-DIPL_19',
...                             nivgeo='DEP',
...                             geocodes=['91', '92'])
>>> #
>>> # get latest data from RP (Recensement / Census) on socio-professional categories by sex in Paris
>>> data_paris = get_local_data(dataset_version='GEOlatestRPlatest',
...                             variables='CS1_8-SEXE',
...                             nivgeo='COM',
...                             geocodes=['75056'])
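Two argument conventions of get_local_data are easy to get wrong: variables is a single hyphen-separated string, and geocodes must be a list even for one area. The sketch below prepares both arguments with the standard library only; the helper names join_variables and as_geocode_list are hypothetical and not part of pynsee.

```python
def join_variables(variables):
    """Join variable names with a hyphen, as the `variables` argument expects."""
    return "-".join(variables)


def as_geocode_list(geocodes):
    """Return geocodes as a list, wrapping a single code given as a string.

    get_local_data is documented to raise ValueError when geocodes is not a list.
    """
    if isinstance(geocodes, str):
        return [geocodes]
    return list(geocodes)


variables = join_variables(["CS1_8", "SEXE"])
geocodes = as_geocode_list("75056")
print(variables)  # CS1_8-SEXE
print(geocodes)   # ['75056']
```

The two values could then be passed as variables=variables and geocodes=geocodes, alongside nivgeo='COM' for a commune.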
- pynsee.localdata.get_population()¶
Get population data on all French communes (cities)
- Examples:
>>> from pynsee.localdata import get_population
>>> pop = get_population()
- pynsee.localdata.get_included_area(area_type, codeareas)¶
Get all areas included in the list of areas provided
- Args:
area_type (str): type of area
codeareas (list): list of area codes
- Raises:
ValueError: Error if codeareas is not a list
- Examples:
>>> from pynsee.localdata import get_area_list, get_included_area
>>> area_list = get_area_list()
>>> paris_empl_area = get_included_area(area_type='zonesDEmploi2020', codeareas=['1109'])
- pynsee.localdata.get_old_city(code, date=None)¶
Get data about the old cities that were merged to form the new one
- Notes:
Local data is always provided with a cities classification which depends on a year. This classification evolves over time due to the merger of some cities. It is often useful to keep track of these mergers to reconcile some data.
- Args:
code (str): city code
date (str, optional): date used to analyse the data, format: ‘AAAA-MM-JJ’. If date is None, it defaults to the current year.
- Examples:
>>> from pynsee.localdata import get_old_city
>>> df = get_old_city(code='24259')
- pynsee.localdata.get_new_city(code, date=None)¶
Get data about the new city made from the old ones
- Notes:
Local data is always provided with a cities classification which depends on a year. This classification evolves over time due to the merger of some cities. It is often useful to keep track of these mergers to reconcile some data.
To get a city at a given date, use get_area_projection instead.
- Args:
code (str): city code
date (str, optional): date used to analyse the data, format: ‘AAAA-MM-JJ’. If date is None, it defaults to ten years before the current year.
- Examples:
>>> from pynsee.localdata import get_new_city
>>> df = get_new_city(code='24431', date='2018-01-01')
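The date format ‘AAAA-MM-JJ’ used by get_old_city and get_new_city is the French notation for year-month-day (année, mois, jour), so '2018-01-01' is 1 January 2018. A small sketch for building and checking such strings with the standard library follows; the helper names to_city_date and is_city_date are hypothetical.

```python
from datetime import date, datetime


def to_city_date(day: date) -> str:
    """Format a date as year-month-day, e.g. '2018-01-01' (hypothetical helper)."""
    return day.isoformat()


def is_city_date(text: str) -> bool:
    """Return True if `text` parses as a year-month-day date.

    Other layouts, such as day/month/year, are rejected.
    """
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(to_city_date(date(2018, 1, 1)))  # 2018-01-01
print(is_city_date("01/01/2018"))      # False
```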
- pynsee.localdata.get_ascending_area(area: str, code: str, date: str = None, type: str = None, update: bool = False, silent: bool = False)¶
Get information about areas containing a given area
- Args:
area (str): case sensitive, area type, any of (‘arrondissement’, ‘arrondissementMunicipal’, ‘circonscriptionTerritoriale’, ‘commune’, ‘communeAssociee’, ‘communeDeleguee’, ‘departement’, ‘district’)
code (str): area code
type (str) : case insensitive, any of ‘Arrondissement’, ‘Departement’, ‘Region’, ‘UniteUrbaine2020’, ‘ZoneDEmploi2020’, …
date (str, optional): date used to analyse the data, format : ‘AAAA-MM-JJ’. If date is None, by default the current date is used.
update (bool): locally saved data is used by default. Trigger an update with update=True.
silent (bool, optional): Set to True, to disable messages printed in log info
- Examples:
>>> from pynsee.localdata import get_ascending_area
>>> df = get_ascending_area("commune", code='59350', date='2018-01-01')
>>> df = get_ascending_area("departement", code='59')
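Because area is case sensitive while type is not, a misspelled capital letter in area is a likely source of failed queries. The sketch below checks an area value against the list given in the Args above before any request is made; check_area is a hypothetical helper, and the error message wording is ours rather than pynsee's.

```python
# Area types listed for get_ascending_area in the Args above
ASCENDING_AREAS = (
    'arrondissement', 'arrondissementMunicipal', 'circonscriptionTerritoriale',
    'commune', 'communeAssociee', 'communeDeleguee', 'departement', 'district',
)


def check_area(area, allowed=ASCENDING_AREAS):
    """Return `area` if it is an exact, case-sensitive match; raise otherwise.

    Hypothetical helper: when only the letter case differs, the error
    message suggests the expected spelling.
    """
    if area in allowed:
        return area
    by_lowercase = {name.lower(): name for name in allowed}
    suggestion = by_lowercase.get(area.lower())
    hint = f"; did you mean '{suggestion}'?" if suggestion else ""
    raise ValueError(f"unknown area type '{area}'{hint}")


print(check_area("commune"))  # commune
```

Calling check_area("Commune") raises a ValueError that points to 'commune' as the expected spelling.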
- pynsee.localdata.get_descending_area(area: str, code: str, date: str = None, type: str = None, update: bool = False)¶
Get information about areas contained in a given area
- Args:
area (str): case sensitive, area type, any of (‘aireDAttractionDesVilles2020’, ‘arrondissement’, ‘collectiviteDOutreMer’, ‘commune’, ‘departement’, ‘region’, ‘uniteUrbaine2020’, ‘zoneDEmploi2020’)
code (str): area code
type (str) : case insensitive, any of ‘Arrondissement’, ‘Departement’, ‘Region’, ‘UniteUrbaine2020’, ‘ZoneDEmploi2020’, …
date (str, optional): date used to analyse the data, format: ‘AAAA-MM-JJ’. If date is None, by default the current date is used.
update (bool): locally saved data is used by default. Trigger an update with update=True.
- Examples:
>>> from pynsee.localdata import get_descending_area
>>> df = get_descending_area("commune", code='59350', date='2018-01-01')
>>> df = get_descending_area("departement", code='59', date='2018-01-01')
>>> df = get_descending_area("zoneDEmploi2020", code='1109')
Get metadata¶
- pynsee.metadata.get_definition(ids)¶
Get the definition of a concept from its identifier
- Args:
ids (list): a list of concept identifiers
- Raises:
ValueError: an error is raised if ids is not a list
- Examples:
>>> from pynsee.metadata import get_definition_list, get_definition
>>> def_list = get_definition_list()
>>> # geographic areas definition
>>> geo_definitions = get_definition(['c1468', 'c1282', 'c1762',
...                                   'c1501', 'c1346', 'c1502', 'c1912',
...                                   'c1361', 'c2173', 'c2070'])
- pynsee.metadata.get_legal_entity(codes, print_err_msg=True, update=False, silent=False)¶
Get legal entities labels
- Args:
codes (list): list of legal entities code of 2 or 4 characters
update (bool, optional): Trigger an update, otherwise locally saved data is used. Defaults to False
silent (bool, optional): Set to True, to disable messages printed in log info
- Examples:
>>> from pynsee.metadata import get_legal_entity
>>> legal_entity = get_legal_entity(codes=['5599', '83'])
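Since codes must be made of 2 or 4 characters, it can help to separate well-formed codes from the rest before querying. The following is a minimal sketch with a hypothetical helper, split_legal_entity_codes; it checks only the length and says nothing about whether a code actually exists.

```python
def split_legal_entity_codes(codes):
    """Split codes into those of 2 or 4 characters and all others.

    Hypothetical helper: only the length is checked.
    """
    valid = [code for code in codes if len(code) in (2, 4)]
    invalid = [code for code in codes if len(code) not in (2, 4)]
    return valid, invalid


valid, invalid = split_legal_entity_codes(['5599', '83', '123'])
print(valid)    # ['5599', '83']
print(invalid)  # ['123']
```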
Get sirene data¶
- pynsee.sirene.get_sirene_data(*id)¶
Get data about one or several companies from siren or siret identifiers
- Notes:
This function may return personal data, please check and comply with the legal framework relating to personal data protection
- Examples:
>>> from pynsee.sirene import get_sirene_data
>>> df = get_sirene_data("552081317", "32227167700021")
>>> df = get_sirene_data(['32227167700021', '26930124800077'])
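The two identifier kinds differ by length: a siren has 9 digits and identifies a legal unit, while a siret has 14 digits (the siren followed by a 5-digit establishment number). The sketch below tells them apart; classify_identifier is a hypothetical helper and performs no checksum validation.

```python
def classify_identifier(identifier: str) -> str:
    """Return 'siren' for 9 digits, 'siret' for 14 digits (hypothetical helper)."""
    if identifier.isdigit() and len(identifier) == 9:
        return "siren"
    if identifier.isdigit() and len(identifier) == 14:
        return "siret"
    raise ValueError(f"'{identifier}' is neither a 9-digit siren nor a 14-digit siret")


print(classify_identifier("552081317"))       # siren
print(classify_identifier("32227167700021"))  # siret
# The first 9 digits of a siret are the siren of the owning legal unit
print("32227167700021"[:9])                   # 322271677
```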
- pynsee.sirene.get_sirene_relatives(*siret)¶
Find parent or child entities for one siret entity (etablissement)
- Args:
siret (str or list): siret or list of siret codes
- Raises:
ValueError: siret should be str or list
- Returns:
pandas.DataFrame: dataframe containing the query content
- Examples:
>>> from pynsee.sirene import get_sirene_relatives
>>> # find parent or child entities for one siret entity (etablissement)
>>> data = get_sirene_relatives('00555008200027')
>>> data = get_sirene_relatives(['39860733300059', '00555008200027'])
- class pynsee.sirene.SireneDataFrame.SireneDataFrame(*args, **kwargs)¶
Class for handling dataframes built from INSEE SIRENE API’s data
- get_location(update=False)¶
Get latitude and longitude from OpenStreetMap, add a geometry column and turn the SireneDataFrame into a GeoFrDataFrame.
- Args:
update (bool, optional): data is saved locally, set update=True to trigger an update. Defaults to False.
- Notes:
If it fails to find the exact location, by default it returns the location of the city. Whether the exact location has been found or not is encoded in the exact_location column of the new GeoFrDataFrame.
- Examples:
>>> from pynsee.metadata import get_activity_list
>>> from pynsee.sirene import search_sirene
>>> #
>>> # Get activity list
>>> naf5 = get_activity_list('NAF5')
>>> #
>>> # Get alive legal entities belonging to the automotive industry
>>> df = search_sirene(variable=["activitePrincipaleEtablissement"],
...                    pattern=['29.10Z'], kind='siret')
>>> #
>>> # Keep businesses with more than 100 employees
>>> df = df.loc[df['effectifsMinEtablissement'] > 100]
>>> df = df.reset_index(drop=True)
>>> #
>>> # Get location
>>> df = df.get_location()
Get data from insee.fr files¶
- pynsee.download.download_file(id, variables=None, update=False, silent=False)¶
User level function to download files from insee.fr
- Args:
id (str): file id, check get_file_list to have a full list of available files
variables (list): a list of variables to load from the data file, use get_column_metadata function to have the full list
update (bool, optional): Trigger an update, otherwise locally saved data is used. Defaults to False.
silent (bool, optional): Set to True, to disable messages printed in log info
- Returns:
Returns the requested dataframe as a pandas object
- Examples:
>>> from pynsee.download import download_file
>>> df = download_file("AIRE_URBAINE")
- pynsee.download.get_column_metadata(id)¶
Get metadata about an insee.fr file
- Returns:
Returns the requested dataframe as a pandas object
- Examples:
>>> from pynsee.download import get_column_metadata
>>> rp_logement_metadata = get_column_metadata("RP_LOGEMENT_2016")