#120 Improved statistics calculation performance

병합
jens.klein stefan.schmidt/stat_perf 에서 SPIN/master 로 12 commits 를 머지했습니다 6 년 전
Stefan Schmidt 코멘트됨, 6 년 전

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time.

Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM.

I verified that this change doesn't alter the content of the generated statistics files.

This should solve #119, although there probably are more performance improvements possible.

/edit:

I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect here (timestamps are weird, but the memory amounts are correct).

/edit 2:

I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time. Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM. I verified that this change doesn't alter the content of the generated statistics files. This should solve #119, although there probably are more performance improvements possible. /edit: I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect [here](https://imgur.com/a/gVjksyG) (timestamps are weird, but the memory amounts are correct). /edit 2: I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.
Carlos Garcia 코멘트됨, 6 년 전
소유자

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.
Carlos Garcia 코멘트됨, 6 년 전
소유자

@stefan.schmidt One small issue to address: The calculation of node degrees in statistics.cpp (as in issue #119) should only be calculated if the -t flag (for advanced statistics) is given.

As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because -t is not specified).

@stefan.schmidt One small issue to address: The calculation of node degrees in `statistics.cpp` (as in issue #119) should only be calculated if the `-t` flag (for advanced statistics) is given. As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because `-t` is not specified).
Stefan Schmidt 코멘트됨, 6 년 전
협업자

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the -t flag if that's ok.

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the ```-t``` flag if that's ok.
Carlos Garcia 코멘트됨, 6 년 전
소유자

I'll move them both behind the -t flag

@stefan.schmidt, yes, that is the way to go.

> I'll move them both behind the -t flag @stefan.schmidt, yes, that is the way to go.
Stefan Schmidt 코멘트됨, 6 년 전
협업자

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.
이 풀리퀘스트가 성공적으로 머지되었습니다!
로그인하여 이 대화에 참여
레이블 없음
Bug
마일스톤 없음
담당자 없음
참여자 2명
로딩중...
취소
저장
아직 콘텐츠가 없습니다.