
Eppy suffers from performance issues for larger IDF models #303

Open
samuelduchesne opened this issue Aug 31, 2020 · 10 comments

Comments

@samuelduchesne
Contributor

I have found that some of the heftier models can take a while to load.

I am investigating possible fixes.

@santoshphilip
Owner

ha! I was not aware of that.
I knew that the IDD takes time to load, but did not know that about the IDF.
Are you running a profiler on it to see where it is taking time?

Let us know what you discover.

@samuelduchesne
Contributor Author

Yes, I'm running yappi to figure out which functions are taking the most time. Then I use line_profiler to profile line by line.

I found out that "onlylegalchar" is very heavy, especially since it can be called millions of times.
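For anyone reproducing this without yappi, the same hot-spot hunt can be done with the standard-library profiler. A minimal sketch (the helper name `profile_call` is mine, not part of eppy):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the top 10 entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result

# stand-in workload; with eppy you would profile the IDF load call instead
profile_call(sorted, range(1000), key=str)
```

Once the heavy function is identified this way, line_profiler can narrow it down to individual lines.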

from string import ascii_letters, digits

def onlylegalchar(name):
    """return only legal chars"""
    legalchar = ascii_letters + digits + " "
    return "".join([s for s in name[:] if s in legalchar])

Possible solution

A simple fix is to encode and decode the string:

def faster_onlylegalchar(name):
    return name.encode('ascii', 'ignore').decode()

It is much faster (~7x). This translated into a 30% speed increase when parsing a large IDF file.

>>> %timeit onlylegalchar("This is a non-ascii: Æ")
2 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit faster_onlylegalchar("This is a non-ascii: Æ")
272 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The result does, however, keep characters that onlylegalchar would strip, such as dashes (-) and colons (:):

>>> onlylegalchar("This is a non-ascii: Æ")
'This is a nonascii '

>>> faster_onlylegalchar("This is a non-ascii: Æ")
'This is a non-ascii: '
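Since the two functions differ on punctuation, a compiled regex can match the original behavior while still avoiding the per-character Python loop. A sketch (`regex_onlylegalchar` is a hypothetical name, not necessarily what the eventual fix ships):

```python
import re
from string import ascii_letters, digits

# Precompiled character class matching everything OUTSIDE the legal set.
# Compiling once at module level matters when the function is called millions of times.
_ILLEGAL = re.compile("[^%s%s ]" % (ascii_letters, digits))

def regex_onlylegalchar(name):
    """Return only letters, digits and spaces, like the original onlylegalchar."""
    return _ILLEGAL.sub("", name)

print(regex_onlylegalchar("This is a non-ascii: Æ"))  # 'This is a nonascii '
```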

Let's see if the CI is happy...

samuelduchesne added a commit to samuelduchesne/eppy that referenced this issue Sep 2, 2020
@santoshphilip
Owner

Apparently CI is unhappy
https://ci.appveyor.com/project/santoshphilip/eppy/builds/34998766

I am making some changes in response to #287 that will fix some of that unhappiness. I'll merge those changes in a couple of days.

@samuelduchesne
Contributor Author

Another lag is created by the two decorators on extractidddata, make_idd_index and embedgroupdata, which essentially translate into 3 calls to extractidddata. Rewriting this function would significantly speed up the idfreader1 function.

@make_idd_index
@embedgroupdata
def extractidddata(fname, debug=False):
    """
    extracts all the needed information out of the idd file
    if debug is True, it generates a series of text files.
    Each text file is incrementally different. You can do a diff
    see what the change is
    -
    this code is from 2004.
    it works.
    I am trying not to change it (until I rewrite the whole thing)
    to add functionality to it, I am using decorators
    So if
    Does not integrate group data into the results (@embedgroupdata does it)
    Does not integrate iddindex into the results (@make_idd_index does it)
    """
    try:
        if isinstance(fname, (file, StringIO)):
            astr = fname.read()
            try:
                astr = astr.decode("ISO-8859-2")
            except AttributeError:
                pass
        else:
            astr = mylib2.readfile(fname)
            # astr = astr.decode('ISO-8859-2') -> mylib1 does a decode
    except NameError:
        if isinstance(fname, (FileIO, StringIO)):
            astr = fname.read()
            try:
                astr = astr.decode("ISO-8859-2")
            except AttributeError:
                pass
        else:
            astr = mylib2.readfile(fname)
            # astr = astr.decode('ISO-8859-2') -> mylib2.readfile has decoded
    (nocom, nocom1, blocklst) = get_nocom_vars(astr)
    astr = nocom
    st1 = removeblanklines(astr)
    if debug:
        mylib1.write_str2file("nocom2.txt", st1.encode("latin-1"))
    # find the groups and the start object of the group
    # find all the group strings
    groupls = []
    alist = st1.splitlines()
    for element in alist:
        lss = element.split()
        if lss[0].upper() == "\\group".upper():
            groupls.append(element)
    # find the var just after each item in groupls
    groupstart = []
    for i in range(len(groupls)):
        iindex = alist.index(groupls[i])
        groupstart.append([alist[iindex], alist[iindex + 1]])
    # remove the group commentline
    for element in groupls:
        alist.remove(element)
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom3.txt", st1.encode("latin-1"))
    # strip each line
    for i in range(len(alist)):
        alist[i] = alist[i].strip()
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom4.txt", st1.encode("latin-1"))
    # ensure that each line is a comment or variable
    # find lines that don't start with a comment
    # if this line has a comment in it
    # then move the comment to a new line below
    lss = []
    for i in range(len(alist)):
        # find lines that don't start with a comment
        if alist[i][0] != "\\":
            # if this line has a comment in it
            pnt = alist[i].find("\\")
            if pnt != -1:
                # then move the comment to a new line below
                lss.append(alist[i][:pnt].strip())
                lss.append(alist[i][pnt:].strip())
            else:
                lss.append(alist[i])
        else:
            lss.append(alist[i])
    alist = lss[:]
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom5.txt", st1.encode("latin-1"))
    # need to make sure that each line has only one variable - as in WindowGlassSpectralData,
    lss = []
    for element in alist:
        # if the line is not a comment
        if element[0] != "\\":
            # test for more than one var
            llist = element.split(",")
            if llist[-1] == "":
                tmp = llist.pop()
            for elm in llist:
                if elm[-1] == ";":
                    lss.append(elm.strip())
                else:
                    lss.append((elm + ",").strip())
        else:
            lss.append(element)
    ls_debug = alist[:]  # needed for the next debug - 'nocom7.txt'
    alist = lss[:]
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom6.txt", st1.encode("latin-1"))
    if debug:
        # need to make sure that each line has only one variable - as in WindowGlassSpectralData,
        # this is same as above.
        # but the variables are put in without the ';' and ','
        # so we can do a diff between 'nocom7.txt' and 'nocom8.txt'. Should be identical
        lss_debug = []
        for element in ls_debug:
            # if the line is not a comment
            if element[0] != "\\":
                # test for more than one var
                llist = element.split(",")
                if llist[-1] == "":
                    tmp = llist.pop()
                for elm in llist:
                    if elm[-1] == ";":
                        lss_debug.append(elm[:-1].strip())
                    else:
                        lss_debug.append((elm).strip())
            else:
                lss_debug.append(element)
        ls_debug = lss_debug[:]
        st1 = "\n".join(ls_debug)
        mylib1.write_str2file("nocom7.txt", st1.encode("latin-1"))
    # replace each var with '=====var======'
    # join into a string,
    # split using '=====var====='
    for i in range(len(lss)):
        # if the line is not a comment
        if lss[i][0] != "\\":
            lss[i] = "=====var====="
    st2 = "\n".join(lss)
    lss = st2.split("=====var=====\n")
    lss.pop(0)  # the above split generates an extra item at start
    if debug:
        fname = "nocom8.txt"
        fhandle = open(fname, "wb")
        k = 0
        for i in range(len(blocklst)):
            for j in range(len(blocklst[i])):
                atxt = blocklst[i][j] + "\n"
                fhandle.write(atxt)
                atxt = lss[k]
                fhandle.write(atxt.encode("latin-1"))
                k = k + 1
        fhandle.close()
    # map the structure of the comments -(this is 'lss' now) to
    # the structure of blocklst - blocklst is a nested list
    # make lss a similar nested list
    k = 0
    lst = []
    for i in range(len(blocklst)):
        lst.append([])
        for j in range(len(blocklst[i])):
            lst[i].append(lss[k])
            k = k + 1
    if debug:
        fname = "nocom9.txt"
        fhandle = open(fname, "wb")
        k = 0
        for i in range(len(blocklst)):
            for j in range(len(blocklst[i])):
                atxt = blocklst[i][j] + "\n"
                fhandle.write(atxt)
                fhandle.write(lst[i][j].encode("latin-1"))
                k = k + 1
        fhandle.close()
    # break up multiple line comment so that it is a list
    for i in range(len(lst)):
        for j in range(len(lst[i])):
            lst[i][j] = lst[i][j].splitlines()
            # remove the '\'
            for k in range(len(lst[i][j])):
                lst[i][j][k] = lst[i][j][k][1:]
    commlst = lst
    # copied with minor modifications from readidd2_2.py -- which has been erased ha !
    clist = lst
    lss = []
    for i in range(0, len(clist)):
        alist = []
        for j in range(0, len(clist[i])):
            itt = clist[i][j]
            ddtt = {}
            for element in itt:
                if len(element.split()) == 0:
                    break
                ddtt[element.split()[0].lower()] = []
            for element in itt:
                if len(element.split()) == 0:
                    break
                # ddtt[element.split()[0].lower()].append(string.join(element.split()[1:]))
                ddtt[element.split()[0].lower()].append(" ".join(element.split()[1:]))
            alist.append(ddtt)
        lss.append(alist)
    commdct = lss
    # add group information to commlst and commdct
    # glist = iddgroups.idd2grouplist(fname)
    # commlst = group2commlst(commlst, glist)
    # commdct = group2commdct(commdct, glist)
    return blocklst, commlst, commdct
    # give blocklst a better name :-(
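The cost pattern is easy to see in a toy model (this is not eppy's code, just an illustration of why merging the decorators into extractidddata helps): a wrapper that re-invokes the wrapped parser multiplies the expensive text pass, while a merged version derives every product from a single parse.

```python
import functools

CALLS = {"parse": 0}

def parse(text):
    """Stand-in for the heavy extractidddata text pass."""
    CALLS["parse"] += 1
    return text.split()

def add_index(func):
    """Toy decorator that naively re-runs the parser to build its index."""
    @functools.wraps(func)
    def wrapper(text):
        data = func(text)
        index = {word: i for i, word in enumerate(func(text))}  # second full parse
        return data, index
    return wrapper

@add_index
def decorated_parse(text):
    return parse(text)

def merged_parse(text):
    """Merged version: parse once, derive the index from the result."""
    data = parse(text)
    return data, {word: i for i, word in enumerate(data)}

decorated_parse("a b c")
print(CALLS["parse"])  # 2 parses
CALLS["parse"] = 0
merged_parse("a b c")
print(CALLS["parse"])  # 1 parse
```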

@santoshphilip
Owner

I suspected make_idd_index was a culprit.

I used make_idd_index as a way of speeding up searches through the file:

  • like finding walls in a zone
  • the idea behind it was to have something like indexing in databases (MongoDB was the inspiration)
  • I don't think it sped things up much.

Removing it is an option.

Regarding embedgroupdata, I will have to look into what it does and why it is useful; I don't recall off the top of my head.

@santoshphilip
Owner

Do you have a sample large file I can play with?

I also want to test it with eppy3000. So far eppy3000 has turned out to be 5-10x faster than eppy.

@santoshphilip
Owner

I updated branch develop with some changes.
It may fix some of your CI issues.

@samuelduchesne
Contributor Author

> Do you have a sample large file I can play with?
>
> I also want to test it with eppy3000. So far eppy3000 has turned out to be 5-10x faster than eppy.

Yes, I need to check that out! Happy to see you are moving to epJSON.

@samuelduchesne
Contributor Author

> I updated branch develop with some changes.
> It may fix some of your CI issues.

Latest changes on #304 seem to pass all tests.

I implemented a regex instead of encode/decode. It is not as fast, but it at least follows the same behavior as the original onlylegalchar function.

@samuelduchesne
Contributor Author

#304 also merges the 2 decorators into extractidddata, which prevents calling it 3 times.
