Essentially, for rows whose work_height, work_width, and work_depth dimensions are missing but where the work_dimensions column contains a description of those dimensions, I want to parse that description into the work_height, work_width, and work_depth columns. Based on my exploration, there are a few types of structures:

- __ unit x __ unit x __ unit, e.g. 200 x 300 mm. This one should be easy.
- __ unit x __ unit \newline __ unit x __ unit, e.g. 200 x 300 mm\n400 x 760 mm. I believe these are two different dimension settings possible for the same image. I want to create a new image item (row) for the second setting (or the third, and so on).
- Written-out mixed fractions, e.g. 16 7/8 in (42.8 cm) or 16 7/8in (42.8cm). How is this supposed to be parsed? This is one of the hard ones. Since the unit column work_measurement_unit is generally mm, I presume that's the unit to parse into (and even then I have to convert from cm to mm).
- A measurement description followed by the mixed fraction and the other unit in parentheses, as above, e.g. Diameter: 19 3/7 in (72.5 cm).
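[CODE lang="python" title="code to get missing data"]dim_cols = ['work_height', 'work_width', 'work_depth']
# Rows that have a dimension description but all three numeric dimensions set to the -1 placeholder
mask = (
    df['work_dimensions'].notnull()
    & (df['work_dimensions'] != '-1')
    & (df[dim_cols] == -1.0).all(axis=1)
)
# Use .loc to select rows and columns in one step instead of chained indexing
df.loc[mask, ['work_dimensions'] + dim_cols + ['work_measurement_unit']][/CODE]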
I'm not too familiar with regex in Python (or in general), so any help would be appreciated!
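
To make the four formats above concrete, here is a rough sketch of the kind of parser I have in mind, using only the standard library. Everything in it is an assumption on my part rather than a known-good solution: the helper names (to_fraction, parse_line, parse_dimensions) are made up, the only units handled are mm, cm, and in, a value with no unit is treated as mm, and when a metric value appears in parentheses it is used in place of the inch fraction.

```python
import re
from fractions import Fraction

# Conversion factors to millimetres (mm is assumed to be the target unit).
TO_MM = {"mm": Fraction(1), "cm": Fraction(10), "in": Fraction("25.4")}

# A number: mixed fraction ("16 7/8"), plain fraction ("7/8"), or decimal ("42.8").
NUMBER = r"\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?"
UNIT = r"mm|cm|in"
# One value between 'x' separators, with an optional unit.
PIECE = re.compile(rf"\s*({NUMBER})\s*({UNIT})?\s*", re.IGNORECASE)


def to_fraction(text):
    """Turn '200', '42.8', '7/8' or '16 7/8' into an exact Fraction."""
    return sum(Fraction(part) for part in text.split())


def parse_line(line):
    """Parse one setting, e.g. '200 x 300 mm', into a list of values in mm."""
    # Drop a leading label such as 'Diameter:'.
    line = line.split(":", 1)[-1]
    # Assumption: if a value is given in parentheses, use it instead of the inch fraction.
    inside = re.search(r"\(([^)]*)\)", line)
    if inside:
        line = inside.group(1)
    pieces = [PIECE.fullmatch(p) for p in re.split(r"[x×]", line, flags=re.IGNORECASE)]
    if not all(pieces):
        return None  # unrecognised format: leave for manual inspection
    values, unit = [], None
    # Walk right to left so a trailing unit ('200 x 300 mm') applies to earlier values.
    for m in reversed(pieces):
        unit = (m.group(2) or unit or "mm").lower()  # assumption: default to mm
        values.append(float(to_fraction(m.group(1)) * TO_MM[unit]))
    return values[::-1]


def parse_dimensions(text):
    """Return one list of mm values per setting (line) in the description."""
    # Split on a real newline or on a literal backslash-n, whichever the data contains.
    lines = [ln for ln in re.split(r"\\n|\n", text) if ln.strip()]
    return [parse_line(ln) for ln in lines]


print(parse_dimensions("200 x 300 mm\n400 x 760 mm"))
print(parse_dimensions("Diameter: 19 3/7 in (72.5 cm)"))
```

Each inner list is one setting, so a description with two lines would produce two candidate rows. What this does not decide is which value is height, width, or depth; I am assuming that has to come from the order in the description or from the label (e.g. Diameter), and I have not checked that against the data.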