Creating spdx-tv output sometimes fails with an exception #448
@sschuberth yes, by design directories are also returned in a scan. Rather than letting the spdx library compute the sha1 with
I agree it makes sense to reuse an existing SHA1 for a file if it's known, but my understanding is that it's not known unless you pass
Actually, the file information is always collected, in particular because it is used for cache handling. Other type and related information is also always computed, as it is used by various scans too. So a SHA1 is always computed whether or not you ask for it in the scan with
Ah! That sort of explains why the scan is so slow ;-P
I am pretty sure the impact of a SHA1 on scanning times is pretty small. In particular, it allows caching and streaming results to support multiprocessing, and a side effect of caching is that a file is scanned only once in a codebase that contains the same file multiple times. It could be worth measuring, of course.
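To illustrate the caching idea described above, here is a minimal sketch. It is not ScanCode's actual implementation: `sha1_of_file` and `scan_once` are hypothetical helpers, and the cache is assumed to be a plain in-memory dict keyed by the file's SHA1.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def scan_once(paths, scanner, cache=None):
    """Hypothetical helper: run `scanner` only once per unique file content.

    `scanner` is any callable taking a path. Results are cached by SHA1,
    so files with identical content share a single scan result.
    """
    cache = {} if cache is None else cache
    results = {}
    for path in paths:
        key = sha1_of_file(path)
        if key not in cache:
            # First time this content is seen: do the real scan.
            cache[key] = scanner(path)
        results[path] = cache[key]
    return results
```

With this shape, the hash is what makes deduplication possible, which is why computing it up front can pay for itself on codebases with many duplicate files.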
So, I tried to read the SHA1 from the cache, but it fails for now, see PR #449. The file name is not found in the hash table, it seems:
Would you mind having a look to point me in the right direction?
Not sure yet what's going on, but I'd like to document it here. For me running
might end up with
@pombredanne In `Algorithm('SHA1', file_entry.calc_chksum())` the `file_entry` comes from here, so it looks like `file_data['path']` might contain the path to a directory instead of to a file. Is that correct / by design?
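If directories can indeed show up in the scan results, one possible guard is to compute a checksum only for regular files. This is a sketch only: `file_sha1_or_none` is a hypothetical helper, not part of ScanCode or the SPDX library, and how the caller should treat directories is an assumption left open here.

```python
import hashlib
import os


def file_sha1_or_none(path):
    """Return the hex SHA1 of a regular file, or None for anything else.

    Directories (and missing paths) yield None instead of raising, so the
    caller can decide whether to skip the entry.
    """
    if not os.path.isfile(path):
        return None
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()
```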