Detailed description of the methodology behind extraction and processing of data from the EPFO

Source: We process the Employees' Provident Fund data from the EPFO: Establishment Search page


Using the Establishment Search function, we get a list of all establishment names and their Establishment IDs (codes). The Establishment ID is a unique identifier for each establishment. In total, there are about 16 lakh establishments with a unique Establishment ID (Jan 2019 - present). Each establishment is associated with one of the 123 EPFO offices throughout the country.


Establishment and payment details

Two sections of details are available under "View details": Establishment details and Payment details.


We merge these using the Establishment Code as the unique identifier. The establishments are then aggregated to the district level using the following methods:


  1. State and district names already present in Establishment details
  2. If district names are not present, we map the establishment's pin codes to their respective districts. (source for mapping pin codes)
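
The merge and the district fallback described above can be sketched as follows. This is a minimal illustration only: the field names (establishment_id, district, pin_code), the sample records and the pin_to_district lookup are assumptions made for the example, not the actual labels or mapping used in the pipeline.

```python
# Hypothetical records; field names are illustrative, not the EPFO page's labels.
establishment_details = [
    {"establishment_id": "E001", "state": "Karnataka",
     "district": "Bengaluru Urban", "pin_code": "560001"},
    {"establishment_id": "E002", "state": "Maharashtra",
     "district": "", "pin_code": "411001"},
]
payment_details = [
    {"establishment_id": "E001", "wage_month": "2021-03", "employees": 40},
    {"establishment_id": "E002", "wage_month": "2021-03", "employees": 75},
]
# Hypothetical pin code -> district lookup (stand-in for the real mapping source).
pin_to_district = {"411001": "Pune"}


def resolve_district(establishment, pin_lookup):
    """Prefer the district already present; otherwise fall back to the pin code."""
    district = (establishment.get("district") or "").strip()
    if district:
        return district
    # Returns None when the pin code is not in the lookup.
    return pin_lookup.get(establishment.get("pin_code"))


def merge_details(establishments, payments, pin_lookup):
    """Join payment rows to establishment rows on the establishment ID."""
    by_id = {e["establishment_id"]: e for e in establishments}
    merged = []
    for payment in payments:
        est = by_id.get(payment["establishment_id"])
        if est is None:
            continue  # payment row with no matching establishment details
        merged.append({
            **payment,
            "state": est["state"],
            "district": resolve_district(est, pin_lookup),
        })
    return merged


merged = merge_details(establishment_details, payment_details, pin_to_district)
```

Here E001 keeps the district given in its Establishment details, while E002 has no district and is assigned one through its pin code.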

Dimensions for establishments

We use the details available to identify the following:


  1. Sector and various subcategories: Using the Primary Business Activity, we classify the establishment into one of the following:

    1. Manufacturing

    2. Services

    3. Agriculture

    4. Trading

    5. Others

  2. Size of establishment: Using the number of employees for whom PF contribution is paid, we classify the establishment into one of the following:

    1. Small: less than 50 employees

    2. Medium: Between 50-500 employees

    3. Large: More than 500 employees
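
The size bands above can be expressed as a small function. This is a sketch, assuming the "50-500" band includes both 50 and 500; the function name classify_size is hypothetical.

```python
def classify_size(employee_count: int) -> str:
    """Classify an establishment by the number of employees with PF contributions.

    Assumes the Medium band is inclusive at both ends (50 and 500).
    """
    if employee_count < 50:
        return "Small"
    if employee_count <= 500:
        return "Medium"
    return "Large"


sizes = [classify_size(n) for n in (12, 50, 500, 501)]
```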

Metrics for Establishments

We use the Establishment ID as a unique identifier to extract the payment details and merge with the Establishment details.

Data is aggregated at a monthly level based on the Wage month for all the metrics.

List of metrics available:

  1. Total PF paying firms: Firms which have paid PF contributions for the respective month. 

  2. Total number of employees: Employees for whom PF is paid for the respective month

  3. Total PF amount paid: Total amount paid as PF contributions

  4. Firms missing PF dues: Firms which have failed to pay their PF contributions for the month. A firm is identified as having missed PF dues only if it has not paid the EPF amount for even one employee in that month. A firm that makes a partial payment covering some of its employees is not recorded here.
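
One way to compute the four metrics from payment rows is sketched below. The row layout and sample values are made up for illustration. Two assumptions are built in and are not confirmed by the source: a firm counts as missing dues in a month if it paid for some earlier wage month but has no payment at all for that month, and employee counts across rows of the same wage month are simply summed.

```python
from collections import defaultdict

# Hypothetical rows: (establishment_id, wage_month, employees, amount_paid)
payments = [
    ("E001", "2020-04", 10, 5000),
    ("E002", "2020-04", 20, 9000),
    ("E001", "2020-05", 10, 5000),
    ("E001", "2020-06", 11, 5500),
    ("E002", "2020-06", 20, 9000),
]


def monthly_metrics(rows):
    """Aggregate payment rows by wage month into the four metrics."""
    paying = defaultdict(set)
    employees = defaultdict(int)
    amount = defaultdict(int)
    for est_id, wage_month, n_employees, amount_paid in rows:
        paying[wage_month].add(est_id)
        employees[wage_month] += n_employees
        amount[wage_month] += amount_paid

    metrics = {}
    seen_before = set()  # firms that paid for any earlier wage month
    for month in sorted(paying):  # "YYYY-MM" strings sort chronologically
        # Assumption: missing = paid earlier, no payment at all this month.
        missing = seen_before - paying[month]
        metrics[month] = {
            "paying_firms": len(paying[month]),
            "employees": employees[month],
            "amount_paid": amount[month],
            "firms_missing_dues": len(missing),
        }
        seen_before |= paying[month]
    return metrics


metrics = monthly_metrics(payments)
```

In this sample, E002 has no payment for May 2020, so it is counted under firms missing dues for that month and as a paying firm again in June 2020.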

We consider the Wage month and not the date of credit. Hence, if an establishment pays its PF dues for March 2021 in June 2021, the payment is attributed to March 2021 and not June 2021.

For example, if an establishment pays its March 2021 dues in separate tranches in April and July 2021, all of these payments are recorded under March 2021.
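
The attribution rule can be illustrated with a short sketch. The tranche amounts and credit dates are invented for the example; the point is only that grouping uses the wage month and ignores the date of credit.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tranches: (wage_month, date_of_credit, amount)
tranches = [
    ("2021-03", date(2021, 4, 15), 60000),   # March dues, paid in April
    ("2021-03", date(2021, 7, 10), 40000),   # March dues, paid in July
    ("2021-04", date(2021, 5, 14), 100000),  # April dues, paid in May
]


def total_by_wage_month(rows):
    """Sum amounts by wage month; the date of credit is deliberately ignored."""
    totals = defaultdict(int)
    for wage_month, _credit_date, amount in rows:
        totals[wage_month] += amount
    return dict(totals)


totals = total_by_wage_month(tranches)
```

Both March tranches land under the 2021-03 key, even though they were credited in April and July.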

Firms typically pay their EPF contributions a couple of months after the wage month. Hence, it is recommended to ignore the last two months of data, as most establishments would not yet have filed their contributions for those months.
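
A simple way to apply this recommendation is to drop the most recent wage months before analysis. The sketch below assumes "the last two months" means the two latest wage months present in the data; the function name and sample counts are hypothetical.

```python
def drop_recent_months(monthly, n=2):
    """Return a copy of a {wage_month: value} mapping without the n latest months."""
    months = sorted(monthly)  # "YYYY-MM" strings sort chronologically
    keep = months[:-n] if n > 0 else months
    return {month: monthly[month] for month in keep}


# Hypothetical counts of PF paying firms; the latest months look artificially low.
paying_firms = {"2021-01": 100, "2021-02": 98, "2021-03": 61, "2021-04": 12}
stable = drop_recent_months(paying_firms)
```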

How is this different from the EPFO dashboard?

The EPFO releases data on paying firms and members (employees) on its own dashboard. You can find the dashboard here:

However, there are several key differences: 

  1. The EPFO dashboard aggregates the data for the past year (it does not state whether this is the last 12 months or the calendar year)

  2. Under Contributing Establishments, it counts a firm if the firm has paid a PF contribution in any one of the months within the reference period.

    1. Example: a firm may have paid PF dues for April to June 2020 but registered no payments after that. It will still be counted on the EPFO dashboard. On Sales Pulse, however, it is recorded as a PF paying firm until June 2020 and as Missing PF dues from July 2020 onwards. If the firm resumes paying PF dues from November 2020, it is again recorded as a PF paying firm.
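
The difference between the two views can be sketched for the firm in the example. The "any month in the reference period" rule below is only a reading of the description above, not the EPFO's actual implementation, and the function names and status labels are hypothetical.

```python
# Wage months for which the example firm paid PF dues.
paid_months = {"2020-04", "2020-05", "2020-06", "2020-11", "2020-12"}
# Illustrative reference period: April to December 2020.
reference_period = [f"2020-{m:02d}" for m in range(4, 13)]


def contributed_in_period(paid, period):
    """Dashboard-style view: one flag for the whole reference period."""
    return any(month in paid for month in period)


def monthly_status(paid, period):
    """Monthly view: one status per wage month."""
    status = {}
    has_paid_before = False
    for month in period:
        if month in paid:
            status[month] = "paying"
            has_paid_before = True
        elif has_paid_before:
            status[month] = "missing"
        else:
            status[month] = "not yet observed"
    return status


dashboard_flag = contributed_in_period(paid_months, reference_period)
status = monthly_status(paid_months, reference_period)
```

The single period-level flag stays true throughout, while the monthly view shows the gap from July to October 2020 and the resumption in November 2020.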

Because Sales Pulse looks at monthly data, its figures are not directly comparable to what the EPFO dashboard displays, but they are better suited to tracking month-to-month changes.