
Eppy suffers from performance issues for larger IDF models #303

Open
samuelduchesne opened this issue Aug 31, 2020 · 10 comments

Comments

@samuelduchesne
Contributor

I have found that some of the heftier models can take a while to load.

I am investigating possible fixes.

@santoshphilip
Owner

ha! I was not aware of that.
I knew that the IDD takes time to load, but did not know that about the IDF.
Are you running a profiler on it to see where it is taking time?

Let us know what you discover.

@samuelduchesne
Contributor Author

Yes, I'm running yappi to figure out which functions are taking the most time. Then I use line_profiler to profile line by line.

I found out that "onlylegalchar" is very heavy, especially since it can be called millions of times.
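For anyone reproducing this without yappi, the same hot-spot hunt can be done with the standard-library profiler. A minimal sketch (the helper name `profile_call` is mine, not part of eppy):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the top 10 entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result

# stand-in workload; with eppy you would profile the IDF load call instead
profile_call(sorted, range(1000), key=str)
```

Once the heavy function is identified this way, line_profiler can narrow it down to individual lines.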

from string import ascii_letters, digits

def onlylegalchar(name):
    """return only legal chars"""
    legalchar = ascii_letters + digits + " "
    return "".join([s for s in name[:] if s in legalchar])

Possible solution

A simple fix is to encode and decode the string:

def faster_onlylegalchar(name):
    return name.encode('ascii', 'ignore').decode()

It is much faster (~7x). This translated into a 30% speed increase when parsing a large IDF file.

>>> %timeit onlylegalchar("This is a non-ascii: Æ")
2 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit faster_onlylegalchar("This is a non-ascii: Æ")
272 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The result does, however, keep characters that onlylegalchar would strip, such as dashes (-) and colons (:):

>>> onlylegalchar("This is a non-ascii: Æ")
'This is a nonascii '

>>> faster_onlylegalchar("This is a non-ascii: Æ")
'This is a non-ascii: '
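Since the two functions differ on punctuation, a compiled regex can match the original behavior while still avoiding the per-character Python loop. A sketch (`regex_onlylegalchar` is a hypothetical name, not necessarily what the eventual fix ships):

```python
import re
from string import ascii_letters, digits

# Precompiled character class matching everything OUTSIDE the legal set.
# Compiling once at module level matters when the function is called millions of times.
_ILLEGAL = re.compile("[^%s%s ]" % (ascii_letters, digits))

def regex_onlylegalchar(name):
    """Return only letters, digits and spaces, like the original onlylegalchar."""
    return _ILLEGAL.sub("", name)

print(regex_onlylegalchar("This is a non-ascii: Æ"))  # 'This is a nonascii '
```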

Let's see if the CI is happy...

samuelduchesne added a commit to samuelduchesne/eppy that referenced this issue Sep 2, 2020
@santoshphilip
Owner

Apparently CI is unhappy
https://ci.appveyor.com/project/santoshphilip/eppy/builds/34998766

I am making some changes in response to #287 that will fix some of that unhappiness. I'll merge those changes in a couple of days.

@samuelduchesne
Contributor Author

Another lag is created by the two decorators on extractidddata, make_idd_index and embedgroupdata, which essentially translate into 3 calls to extractidddata. Rewriting this function would significantly speed up the idfreader1 function.

@make_idd_index
@embedgroupdata
def extractidddata(fname, debug=False):
    """
    extracts all the needed information out of the idd file
    if debug is True, it generates a series of text files.
    Each text file is incrementally different. You can do a diff
    see what the change is
    -
    this code is from 2004.
    it works.
    I am trying not to change it (until I rewrite the whole thing)
    to add functionality to it, I am using decorators
    So if
    Does not integrate group data into the results (@embedgroupdata does it)
    Does not integrate iddindex into the results (@make_idd_index does it)
    """
    try:
        if isinstance(fname, (file, StringIO)):
            astr = fname.read()
            try:
                astr = astr.decode("ISO-8859-2")
            except AttributeError:
                pass
        else:
            astr = mylib2.readfile(fname)
            # astr = astr.decode('ISO-8859-2') -> mylib1 does a decode
    except NameError:
        if isinstance(fname, (FileIO, StringIO)):
            astr = fname.read()
            try:
                astr = astr.decode("ISO-8859-2")
            except AttributeError:
                pass
        else:
            astr = mylib2.readfile(fname)
            # astr = astr.decode('ISO-8859-2') -> mylib2.readfile has decoded
    (nocom, nocom1, blocklst) = get_nocom_vars(astr)
    astr = nocom
    st1 = removeblanklines(astr)
    if debug:
        mylib1.write_str2file("nocom2.txt", st1.encode("latin-1"))
    # find the groups and the start object of the group
    # find all the group strings
    groupls = []
    alist = st1.splitlines()
    for element in alist:
        lss = element.split()
        if lss[0].upper() == "\\group".upper():
            groupls.append(element)
    # find the var just after each item in groupls
    groupstart = []
    for i in range(len(groupls)):
        iindex = alist.index(groupls[i])
        groupstart.append([alist[iindex], alist[iindex + 1]])
    # remove the group commentline
    for element in groupls:
        alist.remove(element)
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom3.txt", st1.encode("latin-1"))
    # strip each line
    for i in range(len(alist)):
        alist[i] = alist[i].strip()
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom4.txt", st1.encode("latin-1"))
    # ensure that each line is a comment or variable
    # find lines that don't start with a comment
    # if this line has a comment in it
    # then move the comment to a new line below
    lss = []
    for i in range(len(alist)):
        # find lines that don't start with a comment
        if alist[i][0] != "\\":
            # if this line has a comment in it
            pnt = alist[i].find("\\")
            if pnt != -1:
                # then move the comment to a new line below
                lss.append(alist[i][:pnt].strip())
                lss.append(alist[i][pnt:].strip())
            else:
                lss.append(alist[i])
        else:
            lss.append(alist[i])
    alist = lss[:]
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom5.txt", st1.encode("latin-1"))
    # need to make sure that each line has only one variable - as in WindowGlassSpectralData,
    lss = []
    for element in alist:
        # if the line is not a comment
        if element[0] != "\\":
            # test for more than one var
            llist = element.split(",")
            if llist[-1] == "":
                tmp = llist.pop()
            for elm in llist:
                if elm[-1] == ";":
                    lss.append(elm.strip())
                else:
                    lss.append((elm + ",").strip())
        else:
            lss.append(element)
    ls_debug = alist[:]  # needed for the next debug - 'nocom7.txt'
    alist = lss[:]
    if debug:
        st1 = "\n".join(alist)
        mylib1.write_str2file("nocom6.txt", st1.encode("latin-1"))
    if debug:
        # need to make sure that each line has only one variable - as in WindowGlassSpectralData,
        # this is same as above.
        # but the variables are put in without the ';' and ','
        # so we can do a diff between 'nocom7.txt' and 'nocom8.txt'. Should be identical
        lss_debug = []
        for element in ls_debug:
            # if the line is not a comment
            if element[0] != "\\":
                # test for more than one var
                llist = element.split(",")
                if llist[-1] == "":
                    tmp = llist.pop()
                for elm in llist:
                    if elm[-1] == ";":
                        lss_debug.append(elm[:-1].strip())
                    else:
                        lss_debug.append((elm).strip())
            else:
                lss_debug.append(element)
        ls_debug = lss_debug[:]
        st1 = "\n".join(ls_debug)
        mylib1.write_str2file("nocom7.txt", st1.encode("latin-1"))
    # replace each var with '=====var======'
    # join into a string,
    # split using '=====var====='
    for i in range(len(lss)):
        # if the line is not a comment
        if lss[i][0] != "\\":
            lss[i] = "=====var====="
    st2 = "\n".join(lss)
    lss = st2.split("=====var=====\n")
    lss.pop(0)  # the above split generates an extra item at start
    if debug:
        fname = "nocom8.txt"
        fhandle = open(fname, "wb")
        k = 0
        for i in range(len(blocklst)):
            for j in range(len(blocklst[i])):
                atxt = blocklst[i][j] + "\n"
                fhandle.write(atxt)
                atxt = lss[k]
                fhandle.write(atxt.encode("latin-1"))
                k = k + 1
        fhandle.close()
    # map the structure of the comments -(this is 'lss' now) to
    # the structure of blocklst - blocklst is a nested list
    # make lss a similar nested list
    k = 0
    lst = []
    for i in range(len(blocklst)):
        lst.append([])
        for j in range(len(blocklst[i])):
            lst[i].append(lss[k])
            k = k + 1
    if debug:
        fname = "nocom9.txt"
        fhandle = open(fname, "wb")
        k = 0
        for i in range(len(blocklst)):
            for j in range(len(blocklst[i])):
                atxt = blocklst[i][j] + "\n"
                fhandle.write(atxt)
                fhandle.write(lst[i][j].encode("latin-1"))
                k = k + 1
        fhandle.close()
    # break up multiple line comment so that it is a list
    for i in range(len(lst)):
        for j in range(len(lst[i])):
            lst[i][j] = lst[i][j].splitlines()
            # remove the '\'
            for k in range(len(lst[i][j])):
                lst[i][j][k] = lst[i][j][k][1:]
    commlst = lst
    # copied with minor modifications from readidd2_2.py -- which has been erased ha !
    clist = lst
    lss = []
    for i in range(0, len(clist)):
        alist = []
        for j in range(0, len(clist[i])):
            itt = clist[i][j]
            ddtt = {}
            for element in itt:
                if len(element.split()) == 0:
                    break
                ddtt[element.split()[0].lower()] = []
            for element in itt:
                if len(element.split()) == 0:
                    break
                # ddtt[element.split()[0].lower()].append(string.join(element.split()[1:]))
                ddtt[element.split()[0].lower()].append(" ".join(element.split()[1:]))
            alist.append(ddtt)
        lss.append(alist)
    commdct = lss
    # add group information to commlst and commdct
    # glist = iddgroups.idd2grouplist(fname)
    # commlst = group2commlst(commlst, glist)
    # commdct = group2commdct(commdct, glist)
    return blocklst, commlst, commdct
    # give blocklst a better name :-(
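The cost pattern is easy to see in a toy model (this is not eppy's code, just an illustration of why merging the decorators into extractidddata helps): a wrapper that re-invokes the wrapped parser multiplies the expensive text pass, while a merged version derives every product from a single parse.

```python
import functools

CALLS = {"parse": 0}

def parse(text):
    """Stand-in for the heavy extractidddata text pass."""
    CALLS["parse"] += 1
    return text.split()

def add_index(func):
    """Toy decorator that naively re-runs the parser to build its index."""
    @functools.wraps(func)
    def wrapper(text):
        data = func(text)
        index = {word: i for i, word in enumerate(func(text))}  # second full parse
        return data, index
    return wrapper

@add_index
def decorated_parse(text):
    return parse(text)

def merged_parse(text):
    """Merged version: parse once, derive the index from the result."""
    data = parse(text)
    return data, {word: i for i, word in enumerate(data)}

decorated_parse("a b c")
print(CALLS["parse"])  # 2 parses
CALLS["parse"] = 0
merged_parse("a b c")
print(CALLS["parse"])  # 1 parse
```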

@santoshphilip
Owner

I suspected make_idd_index was a culprit.

I used make_idd_index as a way of speeding up searches through the file:

  • like finding walls in a zone
  • the idea behind it was to have something like indexing in databases (MongoDB was the inspiration)
  • I don't think it sped things up much.

Removing it is an option.

Regarding embedgroupdata, I will have to look into what it does and why it is useful; I don't recall off the top of my head.

@santoshphilip
Owner

Do you have a sample large file I can play with?

I also want to test it with eppy3000. So far eppy3000 has turned out to be 5-10x faster than eppy.

@santoshphilip
Owner

I updated branch develop with some changes.
It may fix some of your CI issues.

@samuelduchesne
Contributor Author

> Do you have a sample large file I can play with?
>
> I also want to test it with eppy3000. So far eppy3000 has turned out to be 5-10x faster than eppy.

Yes, I need to check that out! Happy to see you are moving to epJSON.

@samuelduchesne
Contributor Author

> I updated branch develop with some changes.
> It may fix some of your CI issues.

Latest changes on #304 seem to pass all tests.

I implemented a regex instead of encode/decode. It is not as fast, but it at least follows the same behavior as the original onlylegalchar function.

@samuelduchesne
Contributor Author

#304 also merges the 2 decorators into extractidddata, which prevents calling it 3 times.
