Modules and functions

Download

pynsee.download.download_file(id, variables=None, update=False, silent=False) DataFrame

User-level function to download files from insee.fr

Note: for vintaged ids (e.g. TAG_COM_2025), it should be safe to use a _LATEST tag instead if you just want the latest available dataset (e.g. TAG_COM_LATEST). In case of side effects, pay attention to the log entries and revert to the vintaged id.

Parameters:
  • id (str) – file id, see get_file_list for the full list of available files

  • variables (list) – a list of variables to load from the data file, see get_column_metadata for the full list

  • update (bool, optional) – Trigger an update, otherwise locally saved data is used. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Returns:

Returns the requested dataframe as a pandas object

Examples

>>> from pynsee.download import download_file
>>> df = download_file("TAG_COM_LATEST")
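
To illustrate the naming convention in the note on vintaged ids above, here is a minimal sketch that rewrites a vintaged id into its _LATEST counterpart. The helper to_latest_id is hypothetical (it is not part of pynsee) and assumes the vintage is a trailing four-digit year; whether the resulting id actually exists should be checked with get_file_list.

```python
import re


def to_latest_id(file_id: str) -> str:
    """Replace a trailing 4-digit vintage (e.g. _2025) with _LATEST.

    Hypothetical helper, not part of pynsee. Ids without a trailing
    year are returned unchanged.
    """
    return re.sub(r"_\d{4}$", "_LATEST", file_id)


latest_id = to_latest_id("TAG_COM_2025")
print(latest_id)
```

The resulting string could then be passed as the id argument of download_file.
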
pynsee.download.get_column_metadata(id)

Get metadata about an insee.fr file

Returns:

Returns the requested dataframe as a pandas object

Examples

>>> from pynsee.download import get_column_metadata
>>> rp_logement_metadata = get_column_metadata("RP_LOGEMENT_2016")
pynsee.download.get_file_list()

Download a list of files available on insee.fr

Returns:

Returns the requested dataframe as a pandas object

Notes

pynsee.download’s metadata relies on volunteer contributors and their manual updates. get_file_list does not provide data from Insee’s official metadata API. Consequently, please report any issue you encounter.

Examples

>>> from pynsee.download import get_file_list
>>> insee_file_list = get_file_list()

Geodata

class pynsee.geodata.GeoFrDataFrame(*args, **kwargs)

Class for handling GeoDataFrames built from IGN’s geographical data. It inherits from GeoDataFrame.

copy(deep: bool = True) GeoFrDataFrame

Copy the GeoFrDataFrame object.

get_geom()

Return the combination of all geometries in the GeoFrDataFrame.

Deprecated since version 0.2.0: Use geometry instead and call union_all() on it. See also the documentation of geopandas.

Example

>>> geo = gdf.geometry.union_all()
transform_overseas(*args, **kwargs) GeoFrDataFrame

Apply translation and zoom to overseas territories.

See: pynsee.geodata.transform_overseas().

translate(*args, **kwargs) GeoSeries | GeoFrDataFrame

This function is a deprecated alias of transform_overseas(). It will try to guess whether you want to run transform_overseas or the real GeoDataFrame.translate() depending on the arguments that were passed. If no arguments are passed, it will run transform_overseas.

Warning

Starting with pynsee >= 0.3.0, this function will only run GeoSeries.translate(). Do switch to transform_overseas() if this is what you wanted to run.

zoom(*args, **kwargs) GeoFrDataFrame

Zoom on Parisian departments.

See: pynsee.geodata.zoom().

pynsee.geodata.get_geodata(dataset_id: str, update: bool = False, crs: Any = 'EPSG:3857', constrain_area: GeoDataFrame | None = None) GeoFrDataFrame

Get geographical data from the IGN API, given a dataset identifier

Parameters:
  • dataset_id (str) – data identifier from get_geodata_list function

  • update (bool, optional) – data is saved locally, set update=True to trigger an update. Defaults to False.

  • crs (any valid CRS input, optional) – CRS used for the geodata output. Defaults to ‘EPSG:3857’.

  • constrain_area (GeoDataFrame, optional) – GeoDataFrame used to constrain the area of interest. Defaults to None.

Examples

>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
pynsee.geodata.get_geodata_list(update=False, silent=False) DataFrame

Get a list of geographical limits of French administrative areas from IGN API

Parameters:
  • update (bool, optional) – Trigger an update, otherwise locally saved data is used. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Examples

>>> from pynsee.geodata import get_geodata_list
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
pynsee.geodata.transform_overseas(gdf: GeoDataFrame, departement: tuple[str, ...] = ('971', '972', '974', '973', '976'), factor: tuple[float | None, ...] = (None, None, None, 0.35, None), center: tuple[float, float] = (-133583.39, 5971815.98), radius: float = 650000, angle: float = 0.3490658503988659, startAngle: float = 2.6179938779914944) GeoDataFrame

Move overseas departements closer to metropolitan France

Parameters:
  • departement (tuple, optional) – list of departements to be moved, overseas departement list is used by default

  • factor (tuple, optional) – make departements bigger or smaller; it should correspond to the departement list. This parameter is used by the shapely.affinity.scale function, please refer to its documentation to choose the value. By default, only Guyane's size is reduced. If the value is None, no rescaling is performed.

  • center (tuple, optional) – center point from which offshore points are computed to move overseas departements. It should be defined as a point in EPSG:3857 coordinates.

  • radius (float, optional) – radius used with center point to make offshore points, distance in meters

  • angle (float, optional) – angle used between offshore points, by default it is pi/9

  • startAngle (float, optional) – start angle defining offshore points, by default it is pi * (1 - 1.5 * 1/9)
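
The way center, radius, angle and startAngle interact can be pictured with a small sketch. The function offshore_points below is hypothetical and is not pynsee's implementation: it assumes that offshore point number i sits on a circle of the given radius around center, at angle startAngle + i * angle measured counter-clockwise from the x axis. The actual placement used by transform_overseas may differ.

```python
import math


def offshore_points(center, n_points, radius=650000.0,
                    angle=math.pi / 9,
                    start_angle=math.pi * (1 - 1.5 / 9)):
    """Return n_points (x, y) tuples on a circle around center.

    Assumption (not taken from pynsee's source): point i is placed at
    angle start_angle + i * angle, in radians.
    """
    cx, cy = center
    return [
        (cx + radius * math.cos(start_angle + i * angle),
         cy + radius * math.sin(start_angle + i * angle))
        for i in range(n_points)
    ]


# One offshore point per overseas departement in the default list
points = offshore_points((-133583.39, 5971815.98), n_points=5)
for x, y in points:
    print(round(x), round(y))
```

The default values in the signature match these expressions: pi/9 is about 0.349 and pi * (1 - 1.5/9) is about 2.618.
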

Notes

By default, transform_overseas focuses on overseas departements, but it can be used to move any departement anywhere on the map.

Examples

>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> gdf = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
>>> #
>>> # Move overseas departements closer to metropolitan France
>>> dfTranslate = gdf.transform_overseas()
pynsee.geodata.zoom(gdf: GeoDataFrame, departement: tuple[str, ...] = ('75', '92', '93', '94'), center: tuple[float, float] = (-133583.39, 5971815.98), radius: float = 650000, startAngle: float = 2.2689280275926285, factor: float = 2) GeoDataFrame

Zoom on Parisian departements

Parameters:
  • departement (tuple, optional) – departements to be moved, departements closest to Paris are selected by default

  • center (tuple, optional) – center point from which an offshore point is computed to move Parisian departements. It should be defined as a point in EPSG:3857 coordinates.

  • radius (float, optional) – radius used with center point to make offshore point, distance in meters

  • startAngle (float, optional) – start angle defining offshore point, by default it is pi * (1 - 2.5 * 1/9)

  • factor (float, optional) – make departements bigger or smaller. This parameter is used by the shapely.affinity.scale function, please refer to its documentation to choose the value.

Notes

By default, zoom focuses on the departements closest to Paris, but the function can be used to zoom on any departement anywhere on the map.

Examples

>>> from pynsee.geodata import get_geodata_list, get_geodata
>>> #
>>> # Get a list of geographical limits of French administrative areas from IGN API
>>> geodata_list = get_geodata_list()
>>> #
>>> # Get geographical limits of departments
>>> df = get_geodata('ADMINEXPRESS-COG-CARTO.LATEST:departement')
>>> #
>>> # Zoom on parisian departements
>>> dfZoom = df.zoom()

Local data

pynsee.localdata.get_area_list(area=None, date=None, update=False, silent=False) DataFrame

Get an exhaustive list of administrative areas: communes, departments, and urban, employment or functional areas

Parameters:
  • area (str, optional) – Defaults to None, then get all values

  • date (str) – date of validity (YYYY-MM-DD)

  • update (bool) – locally saved data is used by default. Trigger an update with update=True.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Raises:

ValueError – Error if area is not available

Examples

>>> from pynsee.localdata import get_area_list
>>> area_list = get_area_list()
>>> #
>>> # get list of all regions in France
>>> reg = get_area_list(area='regions')
pynsee.localdata.get_area_projection(area: str, code: str, date: str, dateProjection: str = None, silent: bool = False)

Get data about the area (valid at given date datetime) projected at dateProjection datetime.

Parameters:
  • area (str) – case insensitive, area type, any of (‘arrondissement’, ‘arrondissementMunicipal’, ‘commune’, ‘departement’, ‘region’)

  • code (str) – area code

  • date (str) – date used to analyse the data, format: ‘YYYY-MM-DD’.

  • dateProjection (str, optional) – date used to project the area into, format: ‘YYYY-MM-DD’. If dateProjection is None, the current date is used (i.e. projection into today’s value).

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Examples

>>> from pynsee.localdata import get_area_projection
>>> df = get_area_projection(
>>>         area='commune',
>>>         code='01039',
>>>         date='2020-01-01',
>>>         dateProjection='2023-04-01'
>>>         )
pynsee.localdata.get_ascending_area(area: str, code: str, date: str = None, type: str = None, update: bool = False, silent: bool = False)

Get information about areas containing a given area

Parameters:
  • area (str) – case sensitive, area type, any of (‘arrondissement’, ‘arrondissementMunicipal’, ‘circonscriptionTerritoriale’, ‘commune’, ‘communeAssociee’, ‘communeDeleguee’, ‘departement’, ‘district’)

  • code (str) – area code

  • type (str) – case insensitive, any of ‘Arrondissement’, ‘Departement’, ‘Region’, ‘UniteUrbaine2020’, ‘ZoneDEmploi2020’, …

  • date (str, optional) – date used to analyse the data, format: ‘YYYY-MM-DD’. If date is None, the current date is used.

  • update (bool) – locally saved data is used by default. Trigger an update with update=True.

  • silent (bool, optional) – Set to True to disable messages printed in log info

Examples

>>> from pynsee.localdata import get_ascending_area
>>> df = get_ascending_area("commune", code='59350', date='2018-01-01')
>>> df = get_ascending_area("departement", code='59')
pynsee.localdata.get_descending_area(area: str, code: str, date: str = None, type: str = None, update: bool = False, silent: bool = False)

Get information about areas contained in a given area

Parameters:
  • area (str) – case sensitive, area type, any of (‘aireDAttractionDesVilles2020’, ‘arrondissement’, ‘collectiviteDOutreMer’, ‘commune’, ‘departement’, ‘region’, ‘uniteUrbaine2020’, ‘zoneDEmploi2020’)

  • code (str) – area code

  • type (str) – case insensitive, any of ‘Arrondissement’, ‘Departement’, ‘Region’, ‘UniteUrbaine2020’, ‘ZoneDEmploi2020’, …

  • date (str, optional) – date used to analyse the data, format: ‘YYYY-MM-DD’. If date is None, the current date is used.

  • update (bool) – locally saved data is used by default. Trigger an update with update=True.

  • silent (bool, optional) – Set to True to disable messages printed in log info

Examples

>>> from pynsee.localdata import get_descending_area
>>> df = get_descending_area("commune", code='59350', date='2018-01-01')
>>> df = get_descending_area("departement", code='59', date='2018-01-01')
>>> df = get_descending_area("zoneDEmploi2020", code='1109')
pynsee.localdata.get_geo_list(geo=None, date=None, update=False, silent=False)

Get a list of French geographic areas (communes, departements, regions …)

Parameters:
  • geo (str) – choose among : communes, communesDeleguees, communesAssociees, regions, departements, arrondissements, arrondissementsMunicipaux

  • date (str) – date of validity (YYYY-MM-DD)

  • update (bool) – locally saved data is used by default. Trigger an update with update=True.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Raises:

ValueError – geo should be among the geographic area list

Examples

>>> from pynsee.localdata import get_geo_list
>>> city_list = get_geo_list('communes')
>>> region_list = get_geo_list('regions')
>>> departement_list = get_geo_list('departements')
>>> arrondiss_list = get_geo_list('arrondissements')
pynsee.localdata.get_local_data(variables, dataset_version, nivgeo='FE', geocodes=['1'], update=False, silent=False, backwardperiod=6)

Get INSEE local numeric data

Parameters:
  • variables (str) – one or several variables separated by a hyphen (see get_local_metadata)

  • dataset_version (str) – code of a dataset version (see get_local_metadata), if dates are replaced by ‘latest’ the function triggers a loop to find the latest data available (examples: ‘GEOlatestRPlatest’, ‘GEOlatestFLORESlatest’).

  • nivgeo (str) – code of kind of French administrative area (see get_nivgeo_list), by default it is ‘FE’ ie all France

  • geocodes (list) – code one specific area (see get_geo_list), by default it is [‘1’] ie all France

  • update (bool) – data is saved locally, set update=True to trigger an update

  • silent (bool, optional) – Set to True to disable messages printed in log info

  • backwardperiod (int, optional) – this arg is used only whenever the latest data is searched, it specifies the number of past years the loop should run through.

Raises:

ValueError – Error if geocodes is not a list
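
The expected argument shapes can be summarised in a short sketch: variables is a single hyphen-separated string and geocodes must be a list. The helper build_local_data_args is hypothetical (it is not part of pynsee); it only mirrors the documented conventions and the documented ValueError.

```python
def build_local_data_args(variables, geocodes):
    """Prepare arguments in the shape documented for get_local_data.

    Hypothetical helper: joins variable names with hyphens and rejects
    geocodes that are not a list.
    """
    if not isinstance(geocodes, list):
        raise ValueError("geocodes must be a list, e.g. ['75056']")
    return {"variables": "-".join(variables), "geocodes": geocodes}


args = build_local_data_args(["CS1_8", "SEXE"], ["75056"])
print(args)
```
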

Examples

>>> from pynsee.localdata import get_local_metadata, get_nivgeo_list, get_geo_list, get_local_data
>>> metadata = get_local_metadata()
>>> nivgeo = get_nivgeo_list()
>>> departement = get_geo_list('departements')
>>> #
>>> data_all_france = get_local_data(dataset_version='GEO2020RP2017',
>>>                        variables =  'SEXE-DIPL_19')
>>> #
>>> data_91_92 = get_local_data(dataset_version='GEO2020RP2017',
>>>                        variables =  'SEXE-DIPL_19',
>>>                        nivgeo = 'DEP',
>>>                        geocodes = ['91','92'])
>>> #
>>> # get latest data from RP (Recensement / Census) on socio-professional categories by sex in Paris
>>> data_paris = get_local_data(dataset_version='GEOlatestRPlatest',
>>>                        variables =  'CS1_8-SEXE',
>>>                        nivgeo = 'COM',
>>>                        geocodes = ['75056'])
pynsee.localdata.get_local_metadata()

Get a list of all combinations of datasets, variables and unit measures available from INSEE Local API

Notes

This function renders only package’s internal data, it might not be the most up-to-date

Examples

>>> from pynsee.localdata import get_local_metadata
>>> metadata = get_local_metadata()
pynsee.localdata.get_new_city(code, date=None)

Get data about the new city made from the old ones

Notes

Local data is always provided with a cities classification which depends on a year. This classification evolves over time due to the merger of some cities. It is often useful to keep track of these mergers to reconcile some data.

To get a city at a given date, use get_area_projection instead.

Parameters:
  • code (str) – city code

  • date (str, optional) – date used to analyse the data, format: ‘YYYY-MM-DD’. If date is None, it is supposed to be ten years before the current year.

Examples

>>> from pynsee.localdata import get_new_city
>>> df = get_new_city(code = '24431', date = '2018-01-01')
pynsee.localdata.get_nivgeo_list()

Get a list of geographic levels

Examples

>>> from pynsee.localdata import get_nivgeo_list
>>> nivgeo_list = get_nivgeo_list()
pynsee.localdata.get_old_city(code, date=None)

Get data about the old cities made from the new one

Notes

Local data is always provided with a cities classification which depends on a year. This classification evolves over time due to the merger of some cities. It is often useful to keep track of these mergers to reconcile some data.

Parameters:
  • code (str) – city code

  • date (str, optional) – date used to analyse the data, format: ‘YYYY-MM-DD’. If date is None, it is supposed to be the current year.

Examples

>>> from pynsee.localdata import get_old_city
>>> df = get_old_city(code = '24259')
pynsee.localdata.get_population()

Get population data on all French communes (cities)

Examples

>>> from pynsee.localdata import get_population
>>> pop = get_population()

Macro data

pynsee.macrodata.get_column_title(dataset=None, update=True)

Get the title of a dataset’s columns

Parameters:

dataset (str, optional) – An INSEE dataset name. Defaults to None, this returns all columns.

Raises:
  • ValueError – Only one string (length one)

  • ValueError – Dataset must belong to INSEE datasets list

Examples

>>> from pynsee.macrodata import get_column_title
>>> insee_all_columns = get_column_title()
>>> balance_paiements_columns = get_column_title("BALANCE-PAIEMENTS")
pynsee.macrodata.get_dataset(dataset, update=False, silent=False, metadata=True, filter=None, startPeriod=None, endPeriod=None, firstNObservations=None, lastNObservations=None, updatedAfter=None)

Get dataset’s data from INSEE BDM database

Parameters:
  • dataset (str) – an INSEE dataset included in the list provided by get_dataset_list()

  • update (bool, optional) – Set to True, to update manually the data stored locally on the computer. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

  • metadata (bool, optional) – If True, some metadata is added to the data

  • filter (str, optional) – Use the filter to choose only some values in a dimension. It is recommended to use it for big datasets. A dimension left empty means all values are selected. To select multiple values in one dimension put a "+" between those values.

  • startPeriod (str, optional) – start date of the data.

  • endPeriod (str, optional) – end date of the data.

  • firstNObservations (int, optional) – get the first N observations for each key series (idbank).

  • lastNObservations (int, optional) – get the last N observations for each key series (idbank).

  • updatedAfter (str, optional) – starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss)
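
The filter syntax can be illustrated with a small sketch. The helper build_filter is hypothetical and not part of pynsee. It assumes, based on the example filter "M......ENSEMBLE...CVS.2015" used on this page, that dimensions are separated by dots; the number of dimensions and the position of each one are also assumptions that depend on the dataset.

```python
def build_filter(n_dimensions, selected):
    """Build a filter string for a dataset with n_dimensions dimensions.

    selected maps a zero-based dimension position to a list of values.
    A dimension that is not listed is left empty (all values selected);
    several values in one dimension are joined with "+".
    """
    parts = []
    for position in range(n_dimensions):
        parts.append("+".join(selected.get(position, [])))
    return ".".join(parts)


ipc_filter = build_filter(
    11, {0: ["M"], 6: ["ENSEMBLE"], 9: ["CVS"], 10: ["2015"]}
)
print(ipc_filter)
```
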

Raises:

ValueError – dataset should be in INSEE’s datasets list

Returns:

DataFrame – contains the data

Examples

>>> from pynsee.macrodata import get_dataset
>>> ipc_data = get_dataset("IPC-2015",
>>>        filter = "M......ENSEMBLE...CVS.2015",
>>>        updatedAfter = "2017-07-11T08:45:00")
>>> #
>>> business_climate = get_dataset("CLIMAT-AFFAIRES", lastNObservations = 1)
pynsee.macrodata.get_dataset_list(update=False, silent=False)

Download a full INSEE’s datasets list from BDM macroeconomic database

Parameters:
  • update (bool, optional) – Set to True, to update manually the metadata stored locally on the computer. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Returns:

DataFrame – a dataframe containing the list of datasets available

Examples

>>> from pynsee.macrodata import get_dataset_list
>>> insee_dataset = get_dataset_list()
pynsee.macrodata.get_last_release()

Get the datasets from BDM macroeconomic database released in the last 30 days

Examples

>>> from pynsee.macrodata import get_last_release
>>> dataset_released = get_last_release()
pynsee.macrodata.get_series(*idbanks, update=False, silent=False, metadata=True, startPeriod=None, endPeriod=None, firstNObservations=None, lastNObservations=None, updatedAfter=None)

Get data from INSEE series idbank

Parameters:
  • idbanks (str or list or pd.Series) – some idbanks provided by get_idbank_list()

  • update (bool, optional) – Set to True, to update manually the data stored locally on the computer. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

  • metadata (bool, optional) – If True, some metadata is added to the data

  • startPeriod (str, optional) – start date of the data.

  • endPeriod (str, optional) – end date of the data.

  • firstNObservations (int, optional) – get the first N observations for each key series (idbank).

  • lastNObservations (int, optional) – get the last N observations for each key series (idbank).

  • updatedAfter (str, optional) – starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss)
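
The updatedAfter format (yyyy-mm-ddThh:mm:ss) can be produced from a datetime object with the standard library. This is a minimal sketch; format_updated_after is a hypothetical helper name, not a pynsee function.

```python
from datetime import datetime


def format_updated_after(moment: datetime) -> str:
    """Format a datetime as yyyy-mm-ddThh:mm:ss."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


updated_after = format_updated_after(datetime(2017, 7, 11, 8, 45))
print(updated_after)
```

The resulting string has the same shape as the value passed to updatedAfter in the get_dataset example on this page.
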

Returns:

DataFrame – contains the data, indexed by DATE and sorted by IDBANK

Examples

>>> from pynsee.macrodata import get_series_list, get_series
>>> # inflation figures in France
>>> df_idbank = get_series_list("IPC-2015")
>>> df_idbank = df_idbank.loc[
>>>                    (df_idbank.FREQ == "M") & # monthly
>>>                    (df_idbank.NATURE == "INDICE") & # index
>>>                    (df_idbank.MENAGES_IPC == "ENSEMBLE") & # all kinds of household
>>>                    (df_idbank.REF_AREA == "FE") & # all France including overseas departements
>>>                    (df_idbank.COICOP2016.str.match("^[0-9]{2}$"))] # coicop aggregation level
>>> # get data
>>> data = get_series(df_idbank.IDBANK)
pynsee.macrodata.get_series_list(*datasets, update=False, silent=False)

Download an INSEE’s series key list for one or several datasets from BDM macroeconomic database

Parameters:
  • datasets (str) – datasets should be among the datasets list provided by get_dataset_list()

  • update (bool, optional) – Set to True, to update manually the metadata stored locally on the computer. Defaults to False.

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Raises:

ValueError – datasets should be among the datasets list provided by get_dataset_list()

Returns:

DataFrame – contains dimension columns, series keys, dataset name

Notes

Some metadata is stored locally on the computer for 3 months. It is updated automatically.

Examples

>>> from pynsee.macrodata import get_dataset_list, get_series_list
>>> dataset_list = get_dataset_list()
>>> idbank_ipc = get_series_list('IPC-2015', 'CLIMAT-AFFAIRES')
pynsee.macrodata.get_series_title(series)

Get French and English titles of a list of series (idbanks)

Parameters:

series (list) – a list of series (idbanks)

Examples

>>> from pynsee.macrodata import get_series_list, get_series_title
>>> series = get_series_list("CLIMAT-AFFAIRES")
>>> series = series.loc[:3, "IDBANK"].to_list()
>>> titles = get_series_title(series)
pynsee.macrodata.search_macrodata(pattern='.*', metadata=True)

Search a pattern among insee series (idbanks) from BDM macroeconomic database

Notes

This function uses package’s internal data which might not be the most up-to-date.

Parameters:

pattern (str, optional) – String used to filter the idbank list. Defaults to “.*”, returns all series.

Examples

>>> from pynsee.macrodata import search_macrodata
>>> search_all = search_macrodata()
>>> search_paper = search_macrodata("pâte à papier")
>>> search_paris = search_macrodata("PARIS")
>>> search_survey_gdp = search_macrodata("Survey|GDP")

Metadata

pynsee.metadata.get_activity_list(level)

Get a list of economic activities from NAF/NACE rev 2 2008 classification

Notes

This function uses NAF/NACE rev. 2 classification made in 2008. This function renders only package’s internal data.

Parameters:

level (str) – Levels available are : A5, A10, A17, A21, A38, A64, A88, A129, A138, NAF1, NAF2, NAF3, NAF4, NAF5

Raises:

ValueError – an error is raised if level is not in the default list

Examples

>>> from pynsee.metadata import get_activity_list
>>> activity_A138 = get_activity_list('A138')
>>> activity_NAF3 = get_activity_list('NAF3')
>>> activity_NAF5 = get_activity_list('NAF5')
pynsee.metadata.get_definition(ids)

Get the definition of a concept from its identifier

Parameters:

ids (list) – a list of concept identifiers

Raises:

ValueError – an error is raised if ids is not a list

Examples

>>> from pynsee.metadata import get_definition_list, get_definition
>>> def_list = get_definition_list()
>>> # geographic areas definition
>>> geo_definitions = get_definition(['c1468', 'c1282', 'c1762',
>>>                       'c1501', 'c1346', 'c1502', 'c1912',
>>>                       'c1361', 'c2173', 'c2070'])
pynsee.metadata.get_definition_list()

Get a list of concept definitions

Examples

>>> from pynsee.metadata import get_definition_list
>>> definition = get_definition_list()
pynsee.metadata.get_legal_entity

Get legal entities labels

Parameters:
  • codes (list) – list of legal entities code of 2 or 4 characters

  • update (bool, optional) – Trigger an update, otherwise locally saved data is used. Defaults to False

  • silent (bool, optional) – Set to True, to disable messages printed in log info

Examples

>>> from pynsee.metadata import get_legal_entity
>>> legal_entity = get_legal_entity(codes = ['5599', '83'])

SIRENE

class pynsee.sirene.SireneDataFrame(*args, **kwargs)

Class for handling dataframes built from INSEE SIRENE API’s data

get_location(update=False) GeoFrDataFrame | SireneDataFrame

Get latitude and longitude from OpenStreetMap, add a geometry column and turn the SireneDataFrame into a GeoFrDataFrame.

Parameters:

update (bool, optional) – data is saved locally, set update=True to trigger an update. Defaults to False.

Notes

If it fails to find the exact location, by default it returns the location of the city. Whether the exact location has been found or not is encoded in the exact_location column of the new GeoFrDataFrame.

Examples

>>> from pynsee.metadata import get_activity_list
>>> from pynsee.sirene import search_sirene
>>> #
>>> #  Get activity list
>>> naf5 = get_activity_list('NAF5')
>>> #
>>> # Get alive legal entities belonging to the automotive industry
>>> df = search_sirene(variable = ["activitePrincipaleEtablissement"],
>>>                    pattern = ['29.10Z'], kind = 'siret')
>>> #
>>> # Keep businesses with more than 100 employees
>>> df = df.loc[df['effectifsMinEtablissement'] > 100]
>>> df = df.reset_index(drop=True)
>>> #
>>> # Get location
>>> df = df.get_location()
pynsee.sirene.get_dimension_list(kind='siret')

Get a list of all columns useful to make queries with search_sirene

Parameters:

kind (str, optional) – Choose between siret and siren. Defaults to ‘siret’.

Examples

>>> from pynsee.sirene import get_dimension_list
>>> sirene_dimension = get_dimension_list()
pynsee.sirene.get_sirene_data(*id)

Get data about one or several companies from siren or siret identifiers

Notes

This function may return personal data, please check and comply with the legal framework relating to personal data protection

Examples

>>> from pynsee.sirene import get_sirene_data
>>> df = get_sirene_data("552081317", "32227167700021")
>>> df = get_sirene_data(['32227167700021', '26930124800077'])
pynsee.sirene.get_sirene_relatives(*siret)

Find parent or child entities for one siret entity (etablissement)

Parameters:

siret (str or list) – siret or list of siret codes

Raises:

ValueError – siret should be str or list

Returns:

pandas.DataFrame – dataframe containing the query content
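
Since siret is collected through *siret and may be a str or a list, both call styles shown in the examples end up as a flat list of codes. The sketch below shows one way such normalisation could work; flatten_codes is a hypothetical helper and not pynsee's actual code.

```python
def flatten_codes(*codes):
    """Flatten a mix of strings and lists of strings into one list.

    Hypothetical helper: raises ValueError for any other type,
    mirroring the documented error.
    """
    flat = []
    for item in codes:
        if isinstance(item, str):
            flat.append(item)
        elif isinstance(item, list):
            flat.extend(item)
        else:
            raise ValueError("siret should be str or list")
    return flat


print(flatten_codes("00555008200027"))
print(flatten_codes(["39860733300059", "00555008200027"]))
```
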

Examples

>>> # find parent or child entities for one siret entity (etablissement)
>>> data = get_sirene_relatives('00555008200027')
>>> data = get_sirene_relatives(['39860733300059', '00555008200027'])
pynsee.sirene.search_sirene(variable, pattern, kind='siret', phonetic_search=False, and_condition=True, upper_case=False, decode=False, number=1000, activity=True, legal=False, closed=False, update=False, silent=False)

Get data about companies from criteria on variables

Parameters:
  • variable (str or list) – name of the variable on which the search is applied.

  • pattern (str or list) – the pattern or criterion searched

  • kind (str, optional) – kind of companies : siren or siret. Defaults to “siret”

  • phonetic_search (bool, or list of bool, optional) – If True, phonetic search is triggered on all variables of the list. If it is a list of True/False, phonetic search is used accordingly on the list of variables.

  • and_condition (bool, optional) – If True, only records meeting all conditions are kept (AND is inserted between the conditions). If False, all records meeting at least one condition are kept (OR is inserted between the conditions).

  • number (int, optional) – Number of companies searched. Defaults to 1000. If it is above 1000, multiple queries are triggered.

  • upper_case (bool, optional) – If True, values of argument ‘pattern’ are converted to upper case and added to the list of searched patterns.

  • decode (bool, optional) – If True, values of argument ‘pattern’ are decoded, especially accents are removed and added to the list of searched patterns.

  • activity (bool, optional) – If True, the activity title is added based on NAF/NACE. Defaults to True.

  • legal (bool, optional) – If True, legal entity titles are added

  • closed (bool, optional) – If False, closed entities are removed from the data and for each legal entity only the last period for which the data is stable is displayed

  • silent (bool, optional) – Set to True, to disable messages printed in log info
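
The effect of upper_case and decode on the searched patterns can be pictured as follows. This is only a sketch under assumptions: expand_patterns and strip_accents are hypothetical helpers, and pynsee's own handling of the pattern list may differ in order or detail.

```python
import unicodedata


def strip_accents(text):
    """Remove accents by dropping Unicode combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_patterns(pattern, upper_case=False, decode=False):
    """Return the pattern plus its upper-case and accent-free variants."""
    variants = [pattern]
    if upper_case:
        variants.append(pattern.upper())
    if decode:
        variants.append(strip_accents(pattern))
    # Drop duplicates while keeping the original order
    return list(dict.fromkeys(variants))


variants = expand_patterns("Dassot Système", upper_case=True, decode=True)
print(variants)
```
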

Notes

This function may return personal data, please check and comply with the legal framework relating to personal data protection

Examples

>>> from pynsee.metadata import get_activity_list
>>> from pynsee.sirene import search_sirene
>>> from pynsee.sirene import get_dimension_list
>>> #
>>> # Get available column names, it is useful to design your query with search_sirene
>>> sirene_dimension = get_dimension_list()
>>> #
>>> # Get activity list (NAF rev 2)
>>> naf5 = get_activity_list('NAF5')
>>> #
>>> # Get a list of hospitals in Paris
>>> df = search_sirene(variable = ["activitePrincipaleUniteLegale",
>>>                                "codePostalEtablissement"],
>>>                    pattern = ["86.10Z", "75*"], kind = "siret")
>>> #
>>> # Get a list of companies located in Igny city whose name matches with 'pizza' using a phonetic search
>>> df = search_sirene(variable = ["libelleCommuneEtablissement",
>>>                            'denominationUniteLegale'],
>>>                    pattern = ["igny", 'pizza'],
>>>                    phonetic_search=True, kind = "siret")
>>> #
>>> # Get a list of companies whose name matches with 'SNCF' (French national railway company)
>>> # and whose legal status is SAS (societe par actions simplifiee)
>>> df = search_sirene(variable=["denominationUniteLegale",
>>>                              'categorieJuridiqueUniteLegale'],
>>>                    pattern=["sncf", '5710'], kind="siren")
>>> #
>>> # Get data on Hadrien Leclerc
>>> df = search_sirene(variable = ['prenom1UniteLegale', 'nomUniteLegale'],
>>>                           pattern = ['hadrien', 'leclerc'],
>>>                           phonetic_search = [True, False],
>>>                           closed=True)
>>> #
>>> # Find 2500 tobacco shops
>>> df = search_sirene(variable = ['denominationUniteLegale'],
>>>            pattern = ['tabac'],
>>>            number = 2500,
>>>            kind = "siret")
>>> #
>>> # Find 1000 companies whose name sounds like Dassault Système or is a big company (GE),
>>> # search is made as well on patterns whose accents have been removed
>>> import os
>>> # environment variable 'pynsee_print_url' forces the package to print the request
>>> os.environ["pynsee_print_url"] = 'True'
>>> df = search_sirene(variable = ["denominationUniteLegale", 'categorieEntreprise'],
>>>                 pattern = ['Dassot Système', 'GE'],
>>>                 and_condition = False,
>>>                 upper_case = True,
>>>                 decode = True,
>>>                 update = True,
>>>                 phonetic_search  = [True, False],
>>>                 number = 1000)

Utils

class pynsee.utils.PynseeAPISession(sirene_key: str | None = None, http_proxy: str | None = None, https_proxy: str | None = None)

Session class used throughout pynsee for http(s) queries. This session object uses a specific set of config values, use help(PynseeAPISession.__init__) for more information.

request(method: str, url: str, timeout: tuple | int = (10, 15), raise_if_not_ok: bool = True, **kwargs) Response

Overwrite requests.Session’s request. Allows to set specific timeouts and to raise exceptions if response is not ok. Also silences urllib’s warnings on insecure requests being performed.

Parameters:
  • method (str) – Usually, “GET” or “POST”.

  • url (str) – URL to query.

  • timeout (Union[tuple, int], optional) – timeout used for the query. See requests’ documentation for more info. The default is (10, 15).

  • raise_if_not_ok (bool, optional) – If set to True, a RequestException will automatically be raised if the response is not ok (a response is ok when status_code < 400). The default is True.

  • **kwargs – Any other kwargs are passed directly to requests.Session.request

Raises:

RequestException – If the request fails (only if raise_if_not_ok is set to True).

Returns:

response (requests.Response) – HTTP response from the requests package.

request_insee(api_url: str | None = None, sdmx_url: str | None = None, file_format: str = 'application/xml', print_msg: bool = True, raise_if_not_ok: bool = False) Response

Performs a query to INSEE, either through API or sdmx_url

Parameters:
  • api_url (Optional[str]) – URL to be queried on the API portal, optional. The default is None.

  • sdmx_url (Optional[str]) – URL to be queried on the SDMX (Statistical Data and Metadata eXchange) webservice of INSEE, optional. The default is None.

  • file_format (str, optional) – Which kind of file to expect. This currently alters the output of INSEE’s APIs. The default is “application/xml”.

  • print_msg (bool, optional) – If True, will log critical entries to warn of failures to query the APIs. The default is True.

  • raise_if_not_ok (bool, optional) – See PynseeAPISession.request. The default is False.

Raises:
  • RequestException – In case sirene_key is missing for a call to SIRENE API.

  • ValueError – If neither api_url nor sdmx_url has been set.

Returns:

result (requests.Response) – HTTP response from the requests package.

pynsee.utils.clear_all_cache()

Clear the cache of all functions

Notes

If the credentials provided fail to get a token from api.insee.fr even after a double check, try to clear the cache, as the output of the function retrieving the token is cached even if it is an error.

Examples

>>> from pynsee.utils import clear_all_cache
>>> clear_all_cache()
pynsee.utils.init_conn(sirene_key: str | None = None, http_proxy: str | None = None, https_proxy: str | None = None) None

Save your credentials to connect to INSEE APIs (subscribe on api.insee.fr).

Parameters:
  • sirene_key (str, optional) – user’s key for sirene API

  • http_proxy (str, optional) – Proxy server address, e.g. ‘http://my_proxy_server:port’. Defaults to None.

  • https_proxy (str, optional) – Proxy server address, e.g. ‘http://my_proxy_server:port’. Defaults to None.

Notes

Environment variables can be used instead of init_conn function
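
As a rough picture of the environment-variable route, the sketch below collects the three variable names used in the example further down (sirene_key, http_proxy, https_proxy) from a mapping. The function read_connection_settings is hypothetical and only illustrates the lookup; it is not how pynsee itself reads its configuration.

```python
import os


def read_connection_settings(environ=None):
    """Collect connection settings from environment-style variables.

    Hypothetical helper: missing variables are returned as None.
    """
    if environ is None:
        environ = os.environ
    keys = ("sirene_key", "http_proxy", "https_proxy")
    return {key: environ.get(key) for key in keys}


# A plain dict stands in for os.environ so no real credentials are needed
settings = read_connection_settings({"sirene_key": "my_sirene_key"})
print(settings)
```
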

Examples

>>> from pynsee.utils import init_conn
>>> init_conn(sirene_key="my_sirene_key")
>>> #
>>> # if the user has to use a proxy server use http_proxy and https_proxy arguments as follows:
>>> from pynsee.utils import init_conn
>>> init_conn(sirene_key="my_sirene_key",
>>>           http_proxy="http://my_proxy_server:port",
>>>           https_proxy="http://my_proxy_server:port")
>>> #
>>> # Alternatively, you can use environment variables directly as follows:
>>> # Beware not to commit your credentials!
>>> import os
>>> os.environ['sirene_key'] = 'my_sirene_key'
>>> os.environ['http_proxy'] = "http://my_proxy_server:port"
>>> os.environ['https_proxy'] = "http://my_proxy_server:port"
>>> init_conn()