https://www.youtube.com/watch?v=ImWfIDTxn7E
Use questions from comments:
sudo apt update
sudo apt install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
curl -O https://www.python.org/ftp/python/3.8.3/Python-3.8.3.tgz
tar -xf Python-3.8.3.tgz
cd Python-3.8.3
./configure --enable-optimizations
make -j $(nproc)
python -m spacy download en_core_web_sm
The queue wrapper is going to be created by a parent process. Then it’s going to be pickled and passed to our worker processes. Pickle is a Python-specific object serialization format, which makes it easy to save the state of an object. Because the wrapper is pickled before it reaches the workers, the properties specified in the class are going to be copies of the version that was created by the parent. If we create an instance of this class and pass it to two processes, we have three instances of QueueWrapper. Since the Queue is multi-process aware, those copies all end up talking to the same underlying queue, and Python sets that up for us automatically. For properties that are not multi-process aware, the values don’t sync up automatically.

Here’s a more complete example. Imagine we add a Boolean property to QueueWrapper and set its value to True. When we create an instance of QueueWrapper and pass it off to our worker processes, each process is handed a pickled copy of QueueWrapper. We now have three copies of QueueWrapper: the original resides in the parent process, and there is one for each of the two worker processes. Again, we can use the Queue from each because the multiprocessing Queue is designed to be used by different processes. The Boolean properties, being plain Python values, are all independent. Changing one is not going to impact the others. So we need a mechanism that is multi-process aware to determine if the queue is writeable.

The way I chose to solve this problem is to use a multiprocessing Event. This is the multiprocessing version of threading.Event. The idea is that an event is basically a Boolean value: it’s either set or unset. When an event is set, any process that is waiting on that event is notified. This is similar to our multiprocessing Queue in that, because it was designed to be multi-process aware, we can create it in the parent process and pass it to our worker processes.
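
The difference between the plain Boolean and the Event can be sketched in a few lines. This is only an illustration under stated assumptions: the real QueueWrapper is not shown in these notes, so the class body, the attribute names `writable` and `closed`, and the helpers `worker` and `run_demo` are all hypothetical. The parent changes both flags after the workers have started, and each worker reports what its own copy sees.

```python
import multiprocessing as mp


class QueueWrapper:
    """Hypothetical stand-in for the QueueWrapper discussed above."""

    def __init__(self):
        self.q = mp.Queue()       # multi-process aware: shared by every copy
        self.writable = True      # plain bool: each process gets its own copy
        self.closed = mp.Event()  # multi-process aware flag, initially unset


def worker(wrapper, name):
    # Wait (up to 5 seconds) for the parent to signal that writes should stop.
    wrapper.closed.wait(timeout=5)
    # Report what this process's copy of the wrapper sees.
    wrapper.q.put((name, wrapper.writable, wrapper.closed.is_set()))


def run_demo():
    wrapper = QueueWrapper()
    procs = [
        mp.Process(target=worker, args=(wrapper, f"worker-{i}"))
        for i in range(2)
    ]
    for p in procs:
        p.start()

    # Change both flags in the parent only, after the workers have their copies.
    wrapper.writable = False
    wrapper.closed.set()

    # Collect one report per worker, then wait for the workers to exit.
    results = sorted(wrapper.q.get(timeout=10) for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for name, plain, event in run_demo():
        print(f"{name}: plain bool = {plain}, event set = {event}")
```

Each worker should still see its plain Boolean as True, because the parent's assignment only touches the parent's copy, while both workers see the Event as set. The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, since the module is re-imported in each child.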
from setuptools import setup, find_packages

setup(
    name="ingestion",
    version="0.0.1",
    author="Ben Lambert",
    author_email="support@cloudacademy.com",
    description="A demo application created to accompany a Python course.",
    keywords="learn python cloud academy",
    url="http://cloudacademy.com",
    packages=find_packages(),
    # Command-line entry points installed alongside the package.
    entry_points={"console_scripts": [
        "ingestiond=ingest.backend:main",
        "getdataset=simulator.download:download_and_extract",
        "uploaddataset=simulator.upload:main",
    ]},
    install_requires=[
        "fastapi==0.58.0",
        "google-cloud-firestore==2.7.0",
        "pydantic==1.5.1",
        "uvicorn==0.11.7",
        "gunicorn==20.0.4",
        "passlib==1.7.2",
        "bcrypt==3.1.7",
        "PyJWT==1.7.1",
        "spacy==2.3.2",
        "spacy-lookups-data==0.3.2",
        "typer==0.3.0",
        "httpx==0.13.3",
        "supervisor==4.2.0",
    ],
    # Optional dependency groups, e.g. pip install ".[dev]" or ".[web]".
    extras_require={
        "dev": [
            "pytest==5.4.3",
        ],
        "web": [
            "wordcloud==1.7.0",
            "falcon==2.0.0",
            "google-cloud-storage==1.29.0",
        ],
    },
)