Python Project Structure: From Pipeline to Code (P2C)

Ismail Mebsout
October 23, 2024
5 min

As a programmer, I find that coding is an art. When working on a complex project, there are many steps to follow in order to develop coherent, solid, and sustainable code that other contributors can read and pick up:

Project's E2E steps
  • First, understand the problem so that you address the right need
  • Split the project into sub-projects, which simplifies the work and, above all, the collaboration
  • Your pipeline follows directly from your sub-projects
  • Structure the code along the same lines, so that it mirrors the pipeline and its technical logic
  • Within each fragment, proceed in baby steps to develop your code efficiently

In this article, we will see how to go from a business need to fully functional Python code that is easy to understand.

The summary is as follows:

  1. Project example
  2. Project organization
  3. Imports in Python

Project example

For the sake of illustration, we will consider the following business need:

As a highway manager, I would like to carry out a daily count of the vehicles using a given route. To meet this need, a data science team was assigned to the project and decided to use a fixed camera and count the number of unique license plates.
(one suggestion among others)

This idea can be broken down into a sequence of steps (one simplified decomposition among others):

  • Vehicle detection
  • License plate detection
  • License plate OCR

Hence, the following pipeline:

Technical pipeline
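
To make the end goal concrete, here is a minimal sketch of the last step of the business need: turning the plates read by the OCR brick into a daily count of unique vehicles. The function names, the (day, plate) input format, and the normalization rule are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def normalize_plate(raw: str) -> str:
    """Uppercase the OCR output and keep only letters and digits (assumed convention)."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def count_unique_plates_per_day(readings):
    """Count distinct plates per day from an iterable of (day, raw_plate) pairs."""
    plates_by_day = defaultdict(set)
    for day, raw_plate in readings:
        plates_by_day[day].add(normalize_plate(raw_plate))
    return {day: len(plates) for day, plates in plates_by_day.items()}


# Hypothetical OCR outputs: the first two readings are the same vehicle.
readings = [
    (date(2024, 10, 23), "ab-123-cd"),
    (date(2024, 10, 23), "AB 123 CD"),
    (date(2024, 10, 23), "EF-456-GH"),
    (date(2024, 10, 24), "AB-123-CD"),
]
daily_counts = count_unique_plates_per_day(readings)
```

Using a set per day means a vehicle filmed several times on the same day is counted once, which is what "unique license plates" asks for.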

Project organization

Given the pipeline above, one can organize the code as follows:

|--data/ #useful for storing temporary files, for instance
|--Tests/ #hosts the functional tests of your code/API
|--notebooks/ #helpful for testing, developing, and debugging
|--|--develop.ipynb
|--weights/ #weights are kept apart for easier manipulation
|--counthighwaypy/ #folder hosting your entire code
|--|--detectvehicle/ #1st brick of your pipeline
|--|--|--detect_vehicle_main.py
|--|--|--detect_vehicle_utils.py
|--|--|--detect_vehicle_conf.py
|--|--|--Tests/ #independent unit tests for the 1st brick
|--|--detectlicenseplate/ #2nd brick of your pipeline
|--|--|--licence_plate_main.py
|--|--|--licence_plate_utils.py
|--|--|--licence_plate_conf.py
|--|--|--Tests/ #independent unit tests for the 2nd brick
|--|--ocrlicenseplate/ #3rd brick of your pipeline
|--|--|--ocr_license_main.py
|--|--|--ocr_license_utils.py
|--|--|--ocr_license_conf.py
|--|--|--Tests/ #independent unit tests for the 3rd brick
|--|--utils.py
|--|--conf.py #! very important file (see below)
|--|--main.py #orchestrator of the different bricks
|--|--Tests/ #E2E technical tests
+--README.md
+--app.py #hosts your API and calls main.py
+--packages.txt #Python environment
+--launch_tests.sh #launches the functional tests
+--pytest.ini
+--Dockerfile

As mentioned in the previous section, the structure of the repository follows the same logic as the pipeline.

  • Each brick hosts:
    + utils file: contains all the auxiliary functions of your brick
    + conf file: contains all the constant parameters (names of variables, directories, values of hyper-parameters, …)
    + main file: usually hosts one function that assembles all the baby-step functions of the utils file
    + Tests folder: contains unit tests that evaluate the regressions and improvements specific to the brick, independently of the other bricks. This is an essential principle that enables faster and more efficient debugging.
  • When working on machine learning algorithms, it is better to store the weights at the root of the project, since they may be replaced very often during development.
  • Trying new features can easily be done in notebooks. Given the structure, each notebook should start with the following Python code in order to be able to “see” and import the module counthighwaypy:

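import os
import sys

# Make the parent folder (the project root) importable.
# The relative path assumes the notebook runs from the notebooks/ folder.
sys.path.insert(0, os.path.abspath("../"))
from counthighwaypy import ...  # replace "..." with the names you need
### your code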
  • Within the module counthighwaypy, it is important to have a utils file, a conf file, and a main file that orchestrates the different bricks, without forgetting the E2E Tests
  • The conf file is essential because it sets the root of the project and its different sub-modules and directories. It can be written as follows:
import os

# conf.py lives in counthighwaypy/, so the project root is the folder above it
PROJECT_ROOT = os.path.realpath(os.path.join(os.path.realpath(__file__), "../.."))

##### directories
DATA = os.path.join(PROJECT_ROOT, "data/")
NOTEBOOK = os.path.join(PROJECT_ROOT, "notebooks")
SRC = os.path.join(PROJECT_ROOT, "counthighwaypy/")
WEIGHTS = os.path.join(PROJECT_ROOT, "weights/")
MODULE_DETECT_VEHICLE = os.path.join(SRC, "detectvehicle/")
MODULE_DETECT_LICENCE_PLATE = os.path.join(SRC, "detectlicenseplate/")
MODULE_OCR_LICENSE_PLATE = os.path.join(SRC, "ocrlicenseplate/")
  • The app file wraps your project in an API that other users and services can consume
  • Additional files such as packages.txt, pytest.ini, and Dockerfile are placed at the root of the project
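
As an alternative illustration of the conf file described above, the same root-and-directories logic can be sketched with pathlib. This is only a sketch under assumptions: the helper names project_root_from and build_directories are hypothetical, and the example path does not need to exist on disk.

```python
from pathlib import Path


def project_root_from(conf_file: str) -> Path:
    """Return the folder two levels above conf_file: conf.py -> counthighwaypy/ -> root."""
    return Path(conf_file).resolve().parent.parent


def build_directories(project_root: Path) -> dict:
    """Derive the project directories from the root, mirroring the conf file above."""
    src = project_root / "counthighwaypy"
    return {
        "DATA": project_root / "data",
        "NOTEBOOK": project_root / "notebooks",
        "SRC": src,
        "WEIGHTS": project_root / "weights",
        "MODULE_DETECT_VEHICLE": src / "detectvehicle",
        "MODULE_DETECT_LICENCE_PLATE": src / "detectlicenseplate",
        "MODULE_OCR_LICENSE_PLATE": src / "ocrlicenseplate",
    }


# Inside conf.py one would pass __file__; a hypothetical path is used here instead.
root = project_root_from("/tmp/myproject/counthighwaypy/conf.py")
directories = build_directories(root)
```

Because every path is derived from the location of conf.py rather than from the current working directory, the bricks, the notebooks, and the API can all rely on the same directories wherever they are launched from.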

Once the code’s structure is set, it is better to develop each brick on its own, independently of the others. With that in mind, here are some guidelines you can follow:

  • Set the format of the input and output of each brick, where the output of brick i is the input of brick i+1
  • Write the code’s canvas (empty functions) in a simple way, so that when you read it, you instantly understand what the script does
  • Don’t forget the function signatures and the comments
  • Use a code versioning tool, git for example, for more efficient collaboration
  • Keep your code clean using code formatters and code linters
  • For cross-team collaboration, expose your code/package as a consumable API
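
The first three guidelines can be sketched as a code canvas for the example pipeline. All names below (BoundingBox, detect_vehicles, detect_license_plates, read_license_plates, process_frame) are hypothetical, and the type of frame is left open on purpose; the bricks are intentionally empty.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class BoundingBox:
    """Rectangle in pixel coordinates (assumed exchange format between bricks)."""
    x: int
    y: int
    width: int
    height: int


def detect_vehicles(frame: Any) -> List[BoundingBox]:
    """Brick 1: return one bounding box per vehicle found in the frame."""
    raise NotImplementedError


def detect_license_plates(frame: Any, vehicles: List[BoundingBox]) -> List[BoundingBox]:
    """Brick 2: return one bounding box per license plate found on the vehicles."""
    raise NotImplementedError


def read_license_plates(frame: Any, plates: List[BoundingBox]) -> List[str]:
    """Brick 3: return the text read on each license plate."""
    raise NotImplementedError


def process_frame(frame: Any) -> List[str]:
    """Orchestrator: the output of each brick is the input of the next one."""
    vehicles = detect_vehicles(frame)
    plates = detect_license_plates(frame, vehicles)
    return read_license_plates(frame, plates)
```

Even with empty bricks, reading process_frame is enough to understand what the script does, and each signature fixes the contract that the brick's own unit tests can check.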

Imports in Python

Since Python 3.3, a folder foldername is treated as a package (a namespace package, with no need for an __init__.py file) and can be imported in a Python file as long as it is visible, i.e. its parent folder is on the import path (for example, when it sits at the same tree level as the script you run), by using:

import foldername

Say we have the following structure:

|--FOLDER1/
|--|--file1.py
|--FOLDER2/
|--|--file2.py
|--main.py
  • In main.py, we can write:
import FOLDER1.file1
import FOLDER2.file2
  • To import file2 in file1:
import os
import sys
# make FOLDER2 visible to file1 (one step up in the tree);
# the relative path assumes file1.py is run from inside FOLDER1/
sys.path.insert(0, os.path.abspath("../"))
from FOLDER2 import file2
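
The snippet above relies on the current working directory. A possible variant, sketched here under assumptions, computes the parent folder from the location of file1.py instead. To keep the example self-contained, it builds the FOLDER1/FOLDER2 tree in a temporary directory and runs file1.py as a script; the GREETING and MESSAGE variables are made up for the demonstration.

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Content written to FOLDER1/file1.py for the demonstration.
FILE1_SOURCE = '''\
import sys
from pathlib import Path

# Parent of the folder containing this file, whatever the working directory is.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from FOLDER2 import file2

MESSAGE = file2.GREETING.upper()
'''

saved_path = list(sys.path)  # restore the import path afterwards
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "FOLDER1").mkdir()
    (root / "FOLDER2").mkdir()  # no __init__.py: imported as a namespace package
    (root / "FOLDER2" / "file2.py").write_text("GREETING = 'hello from file2'\n")
    file1_path = root / "FOLDER1" / "file1.py"
    file1_path.write_text(FILE1_SOURCE)

    # Run file1.py as a script; the temporary root is not on sys.path beforehand.
    namespace = runpy.run_path(str(file1_path))
    message = namespace["MESSAGE"]
sys.path[:] = saved_path
```

Anchoring the path on __file__ makes the import work no matter where the script is launched from, at the cost of one extra line.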

In a complex Python project, in order to keep your imports consistent, it is recommended to start them all from the source of your code. In our case, start all your imports in any .py file with:

from counthighwaypy.xxx.xxx import xxx

Conclusion

There are other ways to structure a Python project, but I find the one described in this article straightforward to understand and easy to follow. It can also be applied to languages other than Python.

I hope you have enjoyed reading this article and that it will help you organize your work better in the future.
All comments and suggestions are welcome!
