ghidra-fid-generator
How do I use it?
Pre-requisites:
- Set GHIDRA_HOME env var to your Ghidra installation, e.g. via export GHIDRA_HOME=/home/user/ghidra/ghidra_9.0.4
- Set GHIDRA_PROJ env var to your Ghidra project directory, e.g. via export GHIDRA_PROJ=/home/user/ghidra_projects
- Must use ghidra-9.1-DEV or later due to a bug in X86_64 relocation handling (https://github.com/NationalSecurityAgency/ghidra/pull/910)
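Before starting a long run it can help to confirm that both variables point to existing directories. The following is only a minimal sketch; check_dir_var and check_ghidra_env are hypothetical helper names and are not part of this repository.

```shell
# check_dir_var NAME VALUE: succeed only if VALUE is a non-empty path to a directory.
# NAME is only used in the error message.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 is not a directory: $2" >&2
        return 1
    fi
    return 0
}

# check_ghidra_env: verify the two environment variables from the pre-requisites.
# Stops at the first variable that is missing or invalid.
check_ghidra_env() {
    check_dir_var GHIDRA_HOME "${GHIDRA_HOME:-}" &&
        check_dir_var GHIDRA_PROJ "${GHIDRA_PROJ:-}"
}
```

Calling check_ghidra_env before the numbered scripts returns a non-zero status and prints the offending variable name if something is wrong.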
Only tested with CentOS 7. Requires:
- wget
- grep
- sed
- sort
- gzip
- 7z
- find
- rpm2cpio
- cpio
- unzip
- tar
- ar
- tee
- (maybe others; please open an issue if you have problems)
Everything should be already installed (even on a minimal install) except:
yum install epel-release
yum install p7zip p7zip-plugins
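To see quickly which of the listed tools are missing on a system, a small check like the one below may be useful. It is a sketch only; missing_tools is a hypothetical helper, and the tool list is simply copied from the list above.

```shell
# Tools copied from the requirements list above.
REQUIRED_TOOLS="wget grep sed sort gzip 7z find rpm2cpio cpio unzip tar ar tee"

# missing_tools TOOL...: print every tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local rc=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (word splitting of $REQUIRED_TOOLS is intended here):
#   missing_tools $REQUIRED_TOOLS
```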
To generate fidb/el7-x86.LE.32.default.fidb and fidb/el7-x86.LE.64.default.fidb, run:
./00-el-get-rpms.sh
./01-el-unpack-all-rpms.sh
./02-unpack-libs.sh lib/el7
./03-ghidra-import.sh lib/el7
./04-checklog.sh lib/el7
./05-ghidra-fidb.sh lib/el7
To additionally generate fidb/el6-x86.LE.32.default.fidb and fidb/el6-x86.LE.64.default.fidb, you only need to run:
./02-unpack-libs.sh lib/el6
./03-ghidra-import.sh lib/el6
./04-checklog.sh lib/el6
./05-ghidra-fidb.sh lib/el6
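Since the import step alone can take hours, it may be convenient to chain the steps and stop as soon as one fails. The sketch below assumes that each script exits with a non-zero status on failure, which has not been verified for the scripts in this repository; run_steps is a hypothetical helper.

```shell
# run_steps DIR STEP...: run each STEP with DIR as its only argument,
# in order, and stop at the first step that fails.
# Assumption: a failing step returns a non-zero exit status.
run_steps() {
    local dir="$1"
    shift
    local step
    for step in "$@"; do
        echo "==> $step $dir"
        if ! "$step" "$dir"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Usage:
#   run_steps lib/el6 ./02-unpack-libs.sh ./03-ghidra-import.sh \
#       ./04-checklog.sh ./05-ghidra-fidb.sh
```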
You can manually add analysis to the lib-fidb Ghidra project. To then regenerate
fidb/el7-x86.LE.32.default.fidb and fidb/el7-x86.LE.64.default.fidb, run:
rm fidb/el7-*.fidb
./05-ghidra-fidb.sh lib/el7
How does this work?
- 00-el-get-rpms.sh: Downloads RPMs from http://mirror.centos.org/centos/ into folder rpms.
- 01-el-unpack-all-rpms.sh: Unpacks all the RPMs from rpms to lib/el{6,7}.{i686,x86_64}/libname/version/release/*.o (calls 01-unpack-rpm.sh).
- 02-unpack-libs.sh <library>: Unpacks .lib files to .o files.
- 03-ghidra-import.sh <library>: Imports (and analyzes) the folder <library> into the Ghidra project lib-fidb.
- 04-checklog.sh <library>: Checks the analysis log and generates lib/library-langids.txt. Generating library-langids.txt is important!
- 05-ghidra-fidb.sh <library>: Generates .fidb files (one for each Language ID in lib/library-langids.txt) in fidb/ with signatures for the libraries in the <library> folder.
How can I manually add libraries?
Add your .lib files into the lib folder as follows:
+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1.a
|   |   |           `-- lib2.lib
- provider-name: The name of the provider of the libraries. This will also be the filename of the generated .fidb files.
- library-name: The name of the library.
- version: Version.
- variant: Variant or release string.
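As an illustration of this layout, the sketch below copies library files into the matching lib/provider-name/library-name/version/variant folder. add_library is a hypothetical helper and not part of this repository; it must be run from the repository root so that the relative lib path is correct.

```shell
# add_library PROVIDER LIBRARY VERSION VARIANT FILE...
# Copies the given .a/.lib files into lib/PROVIDER/LIBRARY/VERSION/VARIANT
# and prints the destination folder.
add_library() {
    if [ "$#" -lt 5 ]; then
        echo "usage: add_library PROVIDER LIBRARY VERSION VARIANT FILE..." >&2
        return 2
    fi
    local dest="lib/$1/$2/$3/$4"
    shift 4
    mkdir -p "$dest" && cp -- "$@" "$dest/" && echo "$dest"
}

# Usage (all names are placeholders):
#   add_library provider-name library-name 1.0 release lib1.a lib2.lib
```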
To extract the .a and/or .lib files run ./02-unpack-libs.sh lib/provider-name.
After this the folders should be:
+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1
|   |   |           |   |-- foo.o
|   |   |           |   `-- bar.o
|   |   |           `-- lib2
|   |   |               |-- this.obj
|   |   |               `-- that.obj
(You can also add .o files directly.)
Then run ./03-ghidra-import.sh lib/provider-name to import this folder structure into the Ghidra project lib-fidb.
After the import, run ./04-checklog.sh lib/provider-name. This reads the lib/provider-name-headless.log file written during 03-ghidra-import.sh
and generates lib/provider-name-langids.txt from it. lib/provider-name-langids.txt tells 05-ghidra-fidb.sh for which processor architectures
Function ID datasets should be generated.
Add the file lib/provider-name-common.txt. This file lists common function names that will be excluded from the Function ID signatures. Currently the file is simply left empty, so you can create it with touch lib/provider-name-common.txt.
Finally, run ./05-ghidra-fidb.sh lib/provider-name to generate fidb/provider-name-PROC.ENDIAN.SIZE.VARIANT.fidb.
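To predict which .fidb files a run should produce, the language IDs can be mapped to file names. The sketch below rests on two assumptions that are inferred from the examples in this README rather than confirmed from the scripts: the file name is the language ID with ':' replaced by '.', and lib/provider-name-langids.txt holds one language ID per line. fidb_name and expected_fidbs are hypothetical helpers.

```shell
# fidb_name PROVIDER LANGID: print the expected .fidb path for one language ID.
# Assumption: the file name is the language ID with ':' replaced by '.'.
fidb_name() {
    printf 'fidb/%s-%s.fidb\n' "$1" "$(printf '%s' "$2" | tr ':' '.')"
}

# expected_fidbs PROVIDER: print one expected .fidb path per language ID.
# Assumption: lib/PROVIDER-langids.txt holds one language ID per line.
expected_fidbs() {
    local provider="$1"
    local langid
    while IFS= read -r langid || [ -n "$langid" ]; do
        if [ -n "$langid" ]; then
            fidb_name "$provider" "$langid"
        fi
    done < "lib/$provider-langids.txt"
}
```

For example, fidb_name el7 x86:LE:32:default prints fidb/el7-x86.LE.32.default.fidb.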
Can I just download the .fidb files?
Yes: https://github.com/threatrack/ghidra-fidb-repo
How much disk space and time will this take?
As an example, look at el7.x86_64.fidb. It includes:
- boost-static/1.53.0/27.el7.x86_64
- glibc-static/2.17/260.el7_6.3.x86_64
- glibc-static/2.17/260.el7_6.6.x86_64
- glibc-static/2.17/260.el7.x86_64
- glibc-static/2.17/292.el7.x86_64
- libgo-static/4.8.5/36.el7_6.1.x86_64
- libgo-static/4.8.5/36.el7.x86_64
- libstdc++-static/4.8.5/36.el7.x86_64
- lua-static/5.1.4/15.el7.x86_64
- openssl-static/1.0.2k/16.el7_6.1.x86_64
- openssl-static/1.0.2k/16.el7.x86_64
- openssl-static/1.0.2k/19.el7.x86_64
- protobuf-lite-static/2.5.0/8.el7.x86_64
- protobuf-static/2.5.0/8.el7.x86_64
- zlib-static/1.2.7/18.el7.x86_64
The object files in el/el7.x86_64 were 192MB.
The resulting Ghidra project after running 02-ghidra-import.sh (which took 4h on an i5-2520M) was 16GB.
Running 03-ghidra-fidb.sh (which took 15min) resulted in a 6.6MB fidb/el7.x86_64.fidb file.
Using RepackFid.java the final size is 5.9M.
Stats
Here are the stats for some of the Function ID datasets in https://github.com/threatrack/ghidra-fidb-repo:
| .fidb | # .o | du .o | 02-ghidra-import.sh | du .gpr | 03-ghidra-fidb.sh | du .fidb | # Entries |
|---|---|---|---|---|---|---|---|
| el7.x86_64.fidb | 13036 | 195M | ~ 4h | ~ 16GB | ~ 15min | 6.6M | 57966 |
| el7.i686.fidb | 12600 | 132M | ~ 8h | ~ 16GB | ~ 26min | 6.6M | 53823 |
| el6.x86_64.fidb | 5695 | 53M | ~ 3h | ~ 8GB | ~ 3min | 2.2M | 16912 |
| el6.i686.fidb | 5709 | 45M | ~ 2h | ~ 8GB | ~ 4min | 2.5M | 21612 |
(These are only ballpark figures, as the measurements may have been impacted by thermal throttling or concurrent tasks running on the system.)
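Comparable "# .o" and "du .o" figures for your own library folder could be gathered roughly as follows. This is a sketch and not necessarily how the numbers above were measured; count_objects and objects_size are hypothetical helpers, and objects_size reports the size of the whole folder, not only of the .o files in it.

```shell
# count_objects DIR: number of .o files below DIR (compare the "# .o" column).
count_objects() {
    find "$1" -type f -name '*.o' | wc -l | tr -d ' '
}

# objects_size DIR: human-readable disk usage of DIR (compare the "du .o" column).
# Note: this includes every file below DIR, not only .o files.
objects_size() {
    du -sh "$1" | cut -f1
}

# Usage:
#   count_objects lib/provider-name
#   objects_size lib/provider-name
```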
Known issues
Program has different compiler spec than already established
If you receive an error like the following when running 05-ghidra-fidb.sh:
ERROR REPORT SCRIPT ERROR: /home/user/github/threatrack/ghidra-fid-generator/ghidra_scripts/AutoCreateMultipleLibraries.java : Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc) (HeadlessAnalyzer)
java.lang.IllegalArgumentException: Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc)
You can fix it in Ghidra: in the project view, right-click the affected program (in this case x86_64cpuid.o) and change its Language to gcc (or whatever the error says it should be).
The cause seems to be that Ghidra misidentified the compiler on import and then complains about the mismatch when generating the .fidb.
You can use ghidra_scripts/SearchFalseCspecsInPrograms.py to search for programs in a project that do not match a desired compiler spec.
You can use ghidra_scripts/SetCspecForPrograms.py to automatically force a compiler spec for all programs under a root folder.
The AutoImporter could not successfully load...
On libsodium there was a problem with the auto importer:
2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) The AutoImporter could not successfully load /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj with the provided import parameters. Please ensure that any specified processor/cspec arguments are compatible with the loader that is used during import and try again.
2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) REPORT: Import failed for file: /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj
However, only some files were affected, so the files that could not be imported were ignored ... for now.
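To see which files were skipped, the failed imports can be extracted from the headless log. The sketch below relies on the "REPORT: Import failed for file:" line format shown above; failed_imports is a hypothetical helper and not part of this repository.

```shell
# failed_imports LOGFILE: print the unique paths of files whose import failed.
# LOGFILE is a headless log such as lib/provider-name-headless.log.
failed_imports() {
    sed -n 's/^.*REPORT: Import failed for file: //p' "$1" | sort -u
}

# Usage:
#   failed_imports lib/provider-name-headless.log
```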
TODO
- De-duplicate .o files. Going from one minor version to the next some .o files in a package don't change at all. Analyzing the same file multiple times wastes time.
- Re-do el{6,7} with new system.
- FIXME: libsodium exhibited AutoImporter could not successfully load... error. Needs to be figured out and fixed.
- Add ghidra_scripts/MergeFidb.py to merge multiple Function ID datasets.
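As a starting point for the de-duplication item, byte-identical .o files can be found by comparing checksums. The sketch below only lists the duplicates and does not remove or skip anything; duplicate_objects is a hypothetical helper, and it assumes GNU sha256sum and awk are available (neither is in the requirements list above).

```shell
# duplicate_objects DIR: print the paths of all .o files below DIR whose
# content is byte-identical to at least one other .o file below DIR.
# Files with the same content are printed on consecutive lines.
duplicate_objects() {
    find "$1" -type f -name '*.o' -exec sha256sum {} + |
        sort |
        awk '{ h = $1; p = substr($0, 67) }  # 64 hex chars + 2 separators, then the path
             h == prev { if (!shown) print prevpath; print p; shown = 1; next }
             { prev = h; prevpath = p; shown = 0 }'
}

# Usage:
#   duplicate_objects lib/provider-name
```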
