CLI Tools usage - Importing data into Cerebrate using the importer CLI tool
Populating Cerebrate with data is straightforward using the built-in importer tool available by default on every instance.
Brief description of the accepted arguments and the configuration file
The following command lists the accepted arguments and options:
bin/cake Importer --help
- config: Path to the configuration file describing the format of the incoming data and how it maps to Cerebrate.
- source: Source of the data to be imported. Accepts either a valid URL or a file path.
- model_class: The Cerebrate model in which the data should be saved. For example, passing --model_class Organisations creates new organisations in the system.
- primary_key: Enables a degree of data reconciliation, which avoids duplicates. When specified, the incoming data is joined with the existing data on the value of this field, so matching entities are updated instead of being created again. A typical choice is uuid, or name if the incoming dataset does not provide UUIDs.
Populating brand new data into an empty Cerebrate instance
The simplest use case of the importer is importing organisations from a file on disk or from a URL into an empty Cerebrate instance.
Importing new data from a MISP instance
bin/cake Importer src/Command/config/config-misp-format-organisation.json https://misp.csirt-tooling.org
This is what the config-misp-format-organisation.json could look like:
{
"format": "json",
"mapping": {
"name": "{n}.Organisation.name",
"uuid": "{n}.Organisation.uuid",
"nationality": "{n}.Organisation.nationality"
},
"sourceHeaders": {
"Authorization": "~~YOUR_API_KEY_HERE~~"
}
}
The sourceHeaders field can be used to provide authentication information, such as the MISP API key sent in the Authorization header.
Importing new data from the ENISA CSIRT inventory
bin/cake Importer src/Command/config/config-enisa-csirts-inventory.json https://www.enisa.europa.eu/topics/csirts-in-europe/csirt-inventory/certs-by-country-interactive-map/tool_data.json
This is what the config-enisa-csirts-inventory.json could look like:
{
"format": "json",
"mapping": {
"name": "data.{n}.short-team-name",
"url": "data.{n}.website",
"contacts": "data.{n}.email",
"ISO 3166-1 Code": "data.{n}.country-code",
"website": "data.{n}.website",
"enisa-geo-group": "data.{n}.enisa-geo-group",
"is-approved": "data.{n}.is_approved",
"first-member-type": "data.{n}.first-member-type",
"team-name": "data.{n}.team-name",
"oes-coverage": "data.{n}.oes-coverage",
"enisa-tistatus": "data.{n}.enisa-tistatus",
"csirt-network-status": "data.{n}.csirt-network-status",
"constituency": "data.{n}.constituency",
"establishment": "data.{n}.establishment",
"email": "data.{n}.email",
"country-name": "data.{n}.country-name",
"short-team-name": "data.{n}.short-team-name",
"key": "data.{n}.key"
},
"metaTemplateUUID": "089c68c7-d97e-4f21-a798-159cd10f7864"
}
The metaTemplateUUID field indicates that some of the mapped fields (such as enisa-geo-group) are related to the meta template identified by that UUID.
Populating data into a Cerebrate instance already containing data
In most cases, imports are run on an instance that already contains data.
To avoid duplicate entries and to support updating existing records, the tool must be told how to find the records to update.
This is done with the primary_key parameter.
Importing and merging data from the ENISA CSIRT inventory
bin/cake Importer --primary_key name src/Command/config/config-enisa-csirts-inventory.json https://www.enisa.europa.eu/topics/csirts-in-europe/csirt-inventory/certs-by-country-interactive-map/tool_data.json
{
"format": "json",
"mapping": {
"name": "data.{n}.short-team-name",
"url": "data.{n}.website",
"contacts": "data.{n}.email",
// [...] same as the configuration shown above
"email": "data.{n}.email",
"country-name": {
"path": "data.{n}.country-name",
"override": false
},
"short-team-name": "data.{n}.short-team-name",
"key": "data.{n}.key"
},
"metaTemplateUUID": "089c68c7-d97e-4f21-a798-159cd10f7864"
}
The command above does the following:
1. Fetch the remote data from the provided URL.
2. Fetch the existing records from the Cerebrate instance.
3. For each entry to be imported, find the matching existing record based on the organisation name (as specified by --primary_key name).
   - Note: If no matching record exists, no reconciliation is needed and the entry is created right away. Pass the --update-only parameter to skip such entries instead.
4. Create missing fields and update existing ones.
   - Note: The mapping for country-name differs slightly from the one shown earlier. Its override flag is set to false, which tells the importer not to modify this field with the imported data.
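The reconciliation logic of steps 3 and 4 can be sketched in Bash. This is a simplified illustration, not the importer's actual implementation. Each record is reduced to its primary-key value mapped to a single field (such as country-name). The array names existing and incoming and the function reconcile are hypothetical. The sketch also assumes that "override": false only protects values that are already set. It requires Bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the importer's reconciliation (steps 3 and 4 above).
# Records are reduced to "primary key -> one field value"; these are NOT
# Cerebrate's real data structures.
declare -A existing=()   # records already in Cerebrate
declare -A incoming=()   # records coming from the source

# reconcile UPDATE_ONLY OVERRIDE
#   UPDATE_ONLY: 1 mimics --update-only (skip entries without a match)
#   OVERRIDE:    false mimics "override": false (assumed to keep values already set)
# Prints one decision per incoming entry and applies it to `existing`.
reconcile() {
    local update_only=$1 override=$2 key
    (( ${#incoming[@]} )) || return 0
    while IFS= read -r key; do
        if [[ -n ${existing[$key]+set} ]]; then
            # Strict string match on the primary key: "CERT EU" != "CERT-EU".
            if [[ $override == false && -n ${existing[$key]} ]]; then
                echo "keep: $key"
            else
                existing[$key]=${incoming[$key]}
                echo "update: $key"
            fi
        elif [[ $update_only == 1 ]]; then
            echo "skip: $key"
        else
            existing[$key]=${incoming[$key]}
            echo "create: $key"
        fi
    done < <(printf '%s\n' "${!incoming[@]}" | LC_ALL=C sort)   # stable order
}
```

Suppose an existing record is keyed CERT EU and an incoming entry is keyed CERT-EU. In that case, reconcile 0 true prints create: CERT-EU and reconcile 1 true prints skip: CERT-EU, because the keys are compared as exact strings. This mirrors the limitation described in the Limitation section.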
Use-cases
Importing data from MISP and keeping it in sync with ENISA's CSIRT inventory
A simple use case for this importer tool is keeping a Cerebrate instance in sync with both a remote MISP instance and ENISA's CSIRT inventory. With the appropriate configuration files (already provided), this process becomes trivial.
# Import data from a MISP instance.
# (--primary_key uuid) Organisations are matched on their UUID, so no duplicates are created.
bin/cake Importer --primary_key uuid --yes /var/www/cerebrate/src/Command/config/config-misp-format-organisation.json https://misp.circl.lu/organisations/index.json
# Sync the Cerebrate database with ENISA's CSIRT inventory.
# (--primary_key name) Records are reconciled based on the organisation name.
# (--update-only) Entries from ENISA's inventory are not ingested unless they already exist (i.e. were imported from MISP).
bin/cake Importer --primary_key name --update-only --yes src/Command/config/config-enisa-csirts-inventory.json https://www.enisa.europa.eu/topics/csirts-in-europe/csirt-inventory/certs-by-country-interactive-map/tool_data.json
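The two commands above can be wrapped in a small Bash function so the synchronisation can run periodically, for example from cron. The ENISA step then runs only if the MISP import succeeded. This is a sketch. The function name sync_organisations and the CEREBRATE_DIR and CAKE variables (with their defaults) are assumptions to adapt to the actual installation. Only the importer options shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two sync commands above.
# CEREBRATE_DIR and CAKE are assumptions: adapt them to your installation.
CEREBRATE_DIR=${CEREBRATE_DIR:-/var/www/cerebrate}
CAKE=${CAKE:-bin/cake}
MISP_URL="https://misp.circl.lu/organisations/index.json"
ENISA_URL="https://www.enisa.europa.eu/topics/csirts-in-europe/csirt-inventory/certs-by-country-interactive-map/tool_data.json"

# The body runs in a subshell ( ... ) so that `cd` does not affect the caller.
sync_organisations() (
    cd "$CEREBRATE_DIR" || exit 1
    # 1. Import MISP organisations, reconciled on UUID (no duplicates).
    # 2. Only if step 1 succeeded: update those records from ENISA's
    #    inventory, matched by name, without creating new ones.
    "$CAKE" Importer --primary_key uuid --yes \
        src/Command/config/config-misp-format-organisation.json "$MISP_URL" \
    && "$CAKE" Importer --primary_key name --update-only --yes \
        src/Command/config/config-enisa-csirts-inventory.json "$ENISA_URL"
)
```

Once saved to a file, a crontab entry such as 0 3 * * * bash -c '. /path/to/sync.sh && sync_organisations' could run the sync nightly. The path and schedule are placeholders.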
Importing data from ENISA's CSIRT inventory and overriding Cerebrate's UUIDs with the UUIDs coming from MISP
This example is the opposite of the one described above. Cerebrate first ingests the data coming from ENISA's CSIRT inventory and then uses MISP's organisation UUIDs instead of the automatically generated ones.
# Import data from ENISA's CSIRT inventory
bin/cake Importer --yes src/Command/config/config-enisa-csirts-inventory.json https://www.enisa.europa.eu/topics/csirts-in-europe/csirt-inventory/certs-by-country-interactive-map/tool_data.json
# Sync the UUIDs from MISP, matching organisations by name (--update-only: no new organisations are created)
bin/cake Importer --primary_key name --update-only --yes /var/www/cerebrate/src/Command/config/config-misp-format-organisation.json https://misp.circl.lu/organisations/index.json
Limitation
One limitation of the importer tool lies in how existing and incoming data are joined. Currently, the join relies on a strict string match on the value extracted from the primary_key field. As a result, entries that differ only slightly are not matched, even though they refer to the same entity. For example, if Cerebrate has an organisation record named CERT EU and the incoming data refers to the exact same organisation as CERT-EU, there is no match. The entry is skipped when --update-only is passed, or created as a new, duplicate record otherwise.
Overriding data in Cerebrate using the FieldSquasher CLI tool
Even though the importer CLI tool supports a lightweight way to override data through the primary_key argument, its primary goal is to import new data into Cerebrate. The FieldSquasher tool, by contrast, is meant to override specific fields, and it offers a more flexible way to join existing data with incoming data.
Annotated example of a configuration file
{
"source": "data.json", // Source from which the data is taken. Can be either a file path or a URL
"finder": { // Describes how to reach the records and how to join the data
"path": "{n}.Organisation", // Path to each record to be joined
"joinFields": {
"squashed": "name", // Path for the left part of the join (data existing in Cerebrate)
"squashing": "name" // Path for the right part of the join (data coming from the source)
},
"type": "closest", // Join method: `exact` for an exact string match, or `closest` for a Levenshtein-distance match
"levenshteinScore": 1 // Maximum Levenshtein distance at which two values are still joined
},
"target": {
"model": "Organisations", // Model the tool operates on
"squashedField": "uuid" // Field to be overridden
},
"squashingData": {
"squashingField": "uuid", // Path, within each record reached through finder.path, to the overriding value
"massage": "validateUUID" // Optional function called to transform the data before the override
}
}
With such a configuration, slight differences such as cert eu versus cert-eu are detected, and the records are joined nonetheless.
Note that errors can occur with very short strings. For example, cert.lu and cert.eu are at a Levenshtein distance of 1 and would thus be joined. To help catch such mistakes, the tool can dump the steps it took to disk. Administrators can then check that everything went as intended and fix mistakes manually if needed.
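The closest join type relies on the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The Bash sketch below computes it with the classic dynamic-programming algorithm. It is not FieldSquasher's actual code. The helper closest_match is hypothetical and assumes that an incoming value is joined with the existing value at the smallest distance, as long as that distance does not exceed levenshteinScore.

```shell
#!/usr/bin/env bash
# Levenshtein distance between two strings, using the classic
# dynamic-programming recurrence while keeping only two rows of the table.
levenshtein() {
    local a=$1 b=$2
    local -i la=${#a} lb=${#b} i j cost best
    local -a prev cur
    for (( j = 0; j <= lb; j++ )); do prev[j]=$j; done
    for (( i = 1; i <= la; i++ )); do
        cur=( "$i" )   # distance between the first i characters of a and ""
        for (( j = 1; j <= lb; j++ )); do
            if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
            best=$(( prev[j] + 1 ))                                        # deletion
            (( cur[j-1] + 1 < best )) && best=$(( cur[j-1] + 1 ))          # insertion
            (( prev[j-1] + cost < best )) && best=$(( prev[j-1] + cost ))  # substitution
            cur[j]=$best
        done
        prev=( "${cur[@]}" )
    done
    echo "${prev[lb]}"
}

# Hypothetical helper: print the candidate closest to VALUE if its distance
# does not exceed THRESHOLD (the role assumed for "levenshteinScore").
# closest_match VALUE THRESHOLD CANDIDATE...   (returns 1 if nothing qualifies)
closest_match() {
    local value=$1 candidate match=""
    local -i threshold=$2 d best=$(( $2 + 1 ))
    shift 2
    for candidate in "$@"; do
        d=$(levenshtein "$value" "$candidate")
        if (( d < best )); then best=$d; match=$candidate; fi
    done
    (( best <= threshold )) && printf '%s\n' "$match"
}
```

For example, closest_match 'cert.lu' 1 'cert.eu' prints cert.eu. This is exactly the false positive described above, and it is why reviewing the dumped steps matters.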
Generating summaries
The Summary CLI command generates summaries of the changes made over the given number of days.
It creates one txt file per organisation nationality (or only for the given nationality) in the /tmp directory, or in the folder set with --output.
Each txt file contains the changes for the following data:
- Organisations
- Individuals
- Users
Usage
$ bin/cake Summary --help
Usage:
cake summary [-d 7] [-h] [-q] [-v] [<nationality>]
Options:
--days, -d The amount of days to look back in the logs
(default: 7)
--help, -h Display this help.
--output, -o The destination folder where to write the files
(default: /tmp)
--quiet, -q Enable quiet output.
--verbose, -v Enable verbose output.
Arguments:
nationality The organisation nationality. (optional)
Example
$ bin/cake Summary Luxembourg -o /tmp/countries
$ cat /tmp/countries/Luxembourg.txt
Modified users:
Model,Action,"Editor user","Log ID",Datetime,Change
Users,add,"admin (1)",1187,"2022-11-14 09:18:27","{""username"":""johndoe"",""organisation_id"":2,""role_id"":3,""individual_id"":4,""created"":""2022-11-14T09:18:27+00:00"",""uuid"":""27b5390c-8a44-4c67-954e-c74fdd21fa88""}"
Users,add,"admin (1)",1192,"2022-11-14 09:24:40","{""username"":""johndoe2"",""organisation_id"":2,""role_id"":2,""individual_id"":5,""created"":""2022-11-14T09:24:40+00:00"",""uuid"":""3cdc960e-1bdc-463f-9f2f-dd780ca22f81""}"
Users,add,"admin (1)",1195,"2022-11-14 09:37:58","{""username"":""johndoe3"",""organisation_id"":2,""role_id"":2,""individual_id"":6,""created"":""2022-11-14T09:37:58+00:00"",""uuid"":""c54bf116-549c-4828-958f-05e12cfaa76b""}"
...
Modified organisations:
Model,Action,"Editor user","Log ID",Datetime,Change
Organisations,edit,"admin (1)",1188,"2022-11-14 09:23:30","{""nationality"":["""",""Luxembourg""]}"
Organisations,edit,"admin (1)",1189,"2022-11-14 09:24:06","{""nationality"":[""Luxembourg"",""""]}"
Organisations,edit,"admin (1)",1303,"2022-11-15 10:38:19","{""nationality"":["""",""Luxembourg""]}"
...
Modified individuals:
Model,Action,"Editor user","Log ID",Datetime,Change
Individuals,add,"admin (1)",1185,"2022-11-14 09:18:27","{""email"":""john.doe@my-fake-org.com"",""first_name"":""John"",""last_name"":""Doe"",""uuid"":""39369a66-6d0f-461a-ae99-f9450c8839c8"",""created"":""2022-11-14T09:18:27+00:00""}"
Individuals,add,"admin (1)",1190,"2022-11-14 09:24:40","{""email"":""john.doe2@my-fake-org.com"",""first_name"":""John2"",""last_name"":""Doe2"",""uuid"":""b56c5435-4f45-4848-ba24-568ff9004cba"",""created"":""2022-11-14T09:24:40+00:00""}"
Individuals,add,"admin (1)",1193,"2022-11-14 09:37:58","{""email"":""john.doe3@my-fake-org.com"",""first_name"":""John3"",""last_name"":""Doe3"",
...
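Afterwards, these files can, for example, be sent by email using the following script:
#!/usr/bin/env bash
# Send each summary file as an email attachment.
# path must match the --output folder used with bin/cake Summary (here /tmp/countries, as in the example above).
# Note: with s-nail/heirloom mailx, -a FILE attaches a file; GNU mailutils' mail uses -A FILE instead.
shopt -s nullglob  # do nothing if no .txt file exists
path="/tmp/countries"
mail_subject="Periodic summary"
email_address="john.doe@example.test"
for filename in "$path"/*.txt; do
    mail -a "$filename" -s "$mail_subject" "$email_address" < /dev/null
done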