
Processing 3D Data Using Python Multiprocessing Library


Large amounts of data reveal problems that require creative approaches. Fortunately, the Python language and its extensive set of libraries can help.
Today we'll cover tools that come in very handy when working with large amounts of data. Rather than repeating general information that can be found in the manuals, I'll share some little tricks I've discovered, such as using tqdm with multiprocessing's imap, working with archives in parallel, plotting and processing 3D data, and searching for a similar object among object meshes when you have a point cloud.

So why should we resort to parallel computing? Nowadays, if you work with any kind of data, you might face problems related to "big data". Whenever the data doesn't fit into RAM, we need to process it piece by piece. Fortunately, modern programming languages allow us to spawn multiple processes (or even threads) that work perfectly on multi-core processors. (NB: that doesn't mean single-core processors cannot handle multiprocessing. Here's the Stack Overflow thread on that topic.)

Today we'll try our hand at a frequently occurring 3D computer vision task: computing distances between a mesh and a point cloud. You might face this problem, for example, when you need to find, among all available meshes, the one that defines the same 3D object as a given point cloud.

Our data consist of .obj files stored in a .7z archive, which is great in terms of storage efficiency. But when we need to access an exact portion of it, we have to make an effort. Here I define a class that wraps the 7-zip archive and provides an interface to the underlying data. The class relies heavily on the py7zlib package, which lets us decompress data each time we call the get method and gives us the number of files inside the archive. We also define __iter__, which will let us start a multiprocessing map on this object as on an iterable. As you might know, it is possible to create a Python class from which one can instantiate iterable objects.
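The original code listing is not reproduced in this excerpt, but a minimal sketch of such a wrapper could look like the following. The class and function names here are my own; the only assumed external API is py7zlib's `Archive7z` (from the pylzma package), with its `getnames()` and `getmember()` methods.

```python
class MeshArchive:
    """Iterable wrapper around a 7-zip archive of .obj files.

    `archive` is any object exposing getnames() and getmember(name),
    where members have a read() method -- this matches py7zlib.Archive7z.
    """

    def __init__(self, archive):
        self._archive = archive
        self._names = archive.getnames()

    def __len__(self):
        # Number of files inside the archive
        return len(self._names)

    def get(self, name):
        # py7zlib decompresses the member each time read() is called
        return self._archive.getmember(name).read()

    def __iter__(self):
        # Yield (name, raw bytes) pairs so multiprocessing.Pool.imap
        # can consume the archive as a plain iterable
        return ((name, self.get(name)) for name in self._names)


def open_mesh_archive(path):
    # Convenience constructor; py7zlib is third-party (pip install pylzma)
    from py7zlib import Archive7z
    return MeshArchive(Archive7z(open(path, 'rb')))
```

With something like this in place, a worker pool can iterate over the archive directly, and each worker receives the already-decompressed bytes of one .obj file.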
Such a class should meet the following conditions: override __iter__ to return self, and __next__ to return the following element. And we are definitely following this rule. The above definition gives us the possibility to iterate over the archive, but does it allow random access to its contents in parallel? It's an interesting question, to which I haven't found an answer online, but we can research the source code of py7zlib and try to answer it ourselves.
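As a minimal illustration of that iterator protocol (my own toy example, not taken from the article):

```python
class Countdown:
    """Toy iterator: __iter__ returns self, __next__ returns the next element."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterable
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals that iteration is exhausted
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))  # → [3, 2, 1]
```

Any object built this way can be handed to a `for` loop, `list()`, or a multiprocessing map, since they all only rely on `__iter__` and `__next__`.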
