
Consider changing to/giving option of couchDB? #212

Closed
PidgeyL opened this issue Aug 19, 2015 · 20 comments
Labels
enhancement (New feature or request), In progress

Comments

@PidgeyL
Member

PidgeyL commented Aug 19, 2015

Should we consider migrating to CouchDB, or making CVE-Search compatible with both?
I have heard a lot of negative comments about Mongo, and it would be neat to offer multiple database options.
The database layer should also be abstracted a lot more. (I can do this.)
My idea would be to build a database abstraction layer that implements the same functions for Mongo, CouchDB, Postgres, ... (we could extend this further if we see fit), so that we could switch databases through the configuration files.
Your thoughts? @adulau @wimremes
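The abstraction layer proposed above could look roughly like the following minimal Python sketch. `CveStore`, `open_store`, the `backend`/`path` configuration keys and both backends are hypothetical names chosen for illustration; an in-memory dict stands in for a document store such as MongoDB or CouchDB, and SQLite stands in for a relational backend.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class CveStore(ABC):
    """Hypothetical backend-neutral interface the rest of the code would call."""

    @abstractmethod
    def upsert(self, cve: dict) -> None:
        """Insert or overwrite one CVE document keyed by its "id" field."""

    @abstractmethod
    def get(self, cve_id: str) -> Optional[dict]:
        """Return the stored CVE document, or None if it is unknown."""


class MemoryStore(CveStore):
    """Stand-in for a document store such as MongoDB or CouchDB."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, cve: dict) -> None:
        self._docs[cve["id"]] = dict(cve)

    def get(self, cve_id: str) -> Optional[dict]:
        doc = self._docs.get(cve_id)
        return dict(doc) if doc is not None else None


class SqliteStore(CveStore):
    """Relational backend: stores each CVE as a JSON blob keyed by its id."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cves (id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
        )

    def upsert(self, cve: dict) -> None:
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO cves (id, doc) VALUES (?, ?)",
                (cve["id"], json.dumps(cve)),
            )

    def get(self, cve_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT doc FROM cves WHERE id = ?", (cve_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


# Registry mapping the configured backend name to a constructor.
BACKENDS: Dict[str, Callable[[dict], CveStore]] = {
    "memory": lambda cfg: MemoryStore(),
    "sqlite": lambda cfg: SqliteStore(cfg.get("path", ":memory:")),
}


def open_store(config: dict) -> CveStore:
    """Pick the backend named in the configuration, e.g. {"backend": "sqlite"}."""
    try:
        factory = BACKENDS[config["backend"]]
    except KeyError:
        raise ValueError(f"unsupported backend: {config.get('backend')!r}") from None
    return factory(config)
```

Callers only ever see `CveStore`, so adding a Mongo or Postgres backend would mean one new class plus one registry entry, with no changes to the calling code.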

@adulau
Member

adulau commented Aug 19, 2015

Sure, it's a good idea. I think the best approach would be to abstract the database access further.

Some document-based databases, like Hyperdex, offer a MongoDB-compatible interface (see http://hackingdistributed.com/2015/01/12/hyperdex-1.6.0/).

Maybe we should start by abstracting the database access together. Then we can see whether we can split the document-based access from the key-value store access.

pombredanne referenced this issue in pombredanne/cve-search Sep 18, 2015
replace posts with ajax requests
@pombredanne

Have you ever considered using a more structured DB, e.g. a traditional relational DB?

@PidgeyL
Member Author

PidgeyL commented Jul 17, 2019

We will start the migration to Postgres soon.

@PidgeyL PidgeyL self-assigned this Jul 17, 2019
@PidgeyL PidgeyL added the In progress In progress label Jul 17, 2019
@github-actions
Contributor

github-actions bot commented Oct 3, 2020

Stale issue message

@iTosun

iTosun commented Jun 11, 2021

When is the postgres migration planned?

@P-T-I
Member

P-T-I commented Jun 16, 2021

Not as far as I'm aware of.

@baonq-me

@P-T-I If you have any migration plan, I would like to spend some weeks executing it. I have experience with MongoDB, PostgreSQL, and ElasticSearch. As I see it, the cve-search code base is a little messy, so a complete rework is needed, maybe as a 4.3 version.

@P-T-I
Member

P-T-I commented Dec 13, 2023

@baonq-me Well, there were some thoughts (also briefly discussed with @oh2fih) about stripping cve-search of all backend code and keeping it solely as a front end, letting CveXplore handle all the backend code. In that case we would move towards as little overlap between the two as possible (probably cleaning up the messy code base in the process) and let users choose based on their requirements: if they prefer to work on the CLI they would only need CveXplore, and if they would like a GUI they could simply add cve-search to the mix. So any new database logic should be added to CveXplore. For me this split in functionality makes sense; do you agree? Coming back to the topic: I believe a SQL backend is a nice addition too. But I wouldn't narrow it down to Postgres; I would opt for a SQLAlchemy ORM model approach so you could use a variety of SQL databases (MySQL, MariaDB, Postgres, etc.).
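For reference, SQLAlchemy selects the target database from a single URL of the form `dialect+driver://username:password@host:port/database` (for example `sqlite:///cves.db` or `postgresql+psycopg2://...`), so with an ORM approach, switching between MySQL, MariaDB or Postgres becomes largely a configuration change. The stdlib-only sketch below shows how such a configured URL breaks down; `parse_db_url` and `DbTarget` are hypothetical helpers for illustration only, since SQLAlchemy already does this itself (`sqlalchemy.engine.make_url`) and the URL would normally just be passed to `sqlalchemy.create_engine`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DbTarget:
    dialect: str             # e.g. "postgresql", "mysql", "sqlite"
    driver: Optional[str]    # e.g. "psycopg2", "pymysql"; None means the default
    host: Optional[str]
    port: Optional[int]
    database: str            # database name, or file path for SQLite


def parse_db_url(url: str) -> DbTarget:
    """Split a SQLAlchemy-style URL into dialect, driver, host, port and database."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    # Convention: everything after the first "/" of the path is the database,
    # so "sqlite:///cves.db" yields the relative file path "cves.db".
    database = parts.path[1:] if parts.path.startswith("/") else parts.path
    return DbTarget(
        dialect=dialect,
        driver=driver or None,
        host=parts.hostname,
        port=parts.port,
        database=database,
    )
```

The same code base could then ship with `sqlite:///cves.db` as a zero-install default, while a server deployment only changes that one configuration value.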

@P-T-I P-T-I reopened this Dec 13, 2023
@baonq-me

baonq-me commented Dec 13, 2023

I agree that letting CveXplore handle all the backend code is a good choice in terms of maintainability. Let's do a quick analysis:

As far as I know, there are three kinds of users:

  • The first is a person who only needs the CLI functionality and doesn't want to run anything on their system continuously. They just need something clean and quick.
  • The second is people working on some kind of air-gapped system (such as in a SOC) without an internet connection, so they need tools like cve-search to search for CVEs while collaborating on tasks like incident response.
  • The third is people who want to integrate cve-search into an existing system. For example, I have a list of third-party apps running on my system, and I want to know which of them are affected by a CVE as soon as possible (ideally within several hours of the CVE announcement). So cve-search can be an alert tool or an HTTP endpoint that provides data to other systems.

This analysis led me to the idea that we can use the SqlAlchemy ORM model as you said.

  • For the first group, a lightweight embedded database like SQLite is acceptable, portable, and requires almost no installation.
  • The second and third groups should be fine with MySQL, PostgreSQL, MariaDB, etc.
  • Another advantage of this approach is that we can use change data capture (CDC) to do more, like reindexing data into another database such as ElasticSearch (full-text search), Redis (caching) or a message queue (new-CVE alerts), without requiring any coding at all.
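CDC tools such as Debezium capture changes by reading a database server's write-ahead log or binlog. SQLite has no such feed, so the Python sketch below swaps in a simpler technique: triggers that append every insert/update to a change-log table (an "outbox") that a consumer polls. Table, trigger and function names are hypothetical, and `publish` stands in for whatever sink is configured (ElasticSearch, Redis, a message queue).

```python
import sqlite3
from typing import Callable, List, Tuple

SCHEMA = """
CREATE TABLE cves (id TEXT PRIMARY KEY, summary TEXT NOT NULL);
-- Append-only change log that downstream consumers read from.
CREATE TABLE cve_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    cve_id TEXT NOT NULL,
    op TEXT NOT NULL
);
CREATE TRIGGER cves_after_insert AFTER INSERT ON cves BEGIN
    INSERT INTO cve_changes (cve_id, op) VALUES (NEW.id, 'insert');
END;
CREATE TRIGGER cves_after_update AFTER UPDATE ON cves BEGIN
    INSERT INTO cve_changes (cve_id, op) VALUES (NEW.id, 'update');
END;
"""


def poll_changes(conn: sqlite3.Connection, after_seq: int) -> List[Tuple[int, str, str]]:
    """Return change events newer than after_seq, oldest first."""
    return conn.execute(
        "SELECT seq, cve_id, op FROM cve_changes WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()


def forward(events: List[Tuple[int, str, str]],
            publish: Callable[[dict], None], last_seq: int) -> int:
    """Hand each event to the sink and return the new cursor position."""
    for seq, cve_id, op in events:
        publish({"seq": seq, "cve_id": cve_id, "op": op})
        last_seq = seq
    return last_seq
```

A consumer would persist the returned cursor and call `poll_changes(conn, cursor)` on a timer; the importer writing CVEs never needs to know which sinks exist.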

Here are my additional ideas for this big refactor:

  • When we initialize the cve-search instance, we only need CveXplore to initialize its SQLite database first (it could be bundled the way CVE-Search-Docker does), then import that SQLite database into a higher-level database like MySQL, PostgreSQL, MariaDB, etc. This approach will decouple the two projects. The process can be slow, but it only needs to be done once.
  • With the current workflow, where cve-search depends heavily on the CveXplore interface, things are very tricky for new developers or when I need to do some debugging. "Pass mongodb connection string when initialize CveXplore" (cve-search#1030) is an example: the MongoDB connection string is not passed from cve-search to CveXplore.
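The one-time bootstrap in the first bullet could be sketched as a batched copy from the SQLite file into the target database through Python's DB-API 2.0 interface, so the same function works with any compliant driver. The `cves` table layout and the function names are hypothetical; in the usage below a second sqlite3 connection stands in for the MySQL/PostgreSQL/MariaDB target.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[str, str]


def read_batches(src: sqlite3.Connection, size: int) -> Iterator[List[Row]]:
    """Stream rows from the SQLite bootstrap database in fixed-size batches."""
    cur = src.execute("SELECT id, summary FROM cves ORDER BY id")
    while True:
        batch = cur.fetchmany(size)
        if not batch:
            return
        yield batch


def import_all(src: sqlite3.Connection, dst, insert_sql: str, size: int = 1000) -> int:
    """Copy every row into a DB-API 2.0 target connection; return rows copied.

    insert_sql must use the target driver's placeholder style
    ("?" for sqlite3, "%s" for psycopg2 or PyMySQL).
    """
    copied = 0
    cur = dst.cursor()
    try:
        for batch in read_batches(src, size):
            cur.executemany(insert_sql, batch)
            copied += len(batch)
        dst.commit()  # commit once at the end
    except Exception:
        dst.rollback()  # leave the target untouched if the import fails
        raise
    return copied
```

Batching keeps memory flat however large the SQLite dump is, and the single commit means a failed run can simply be restarted.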

@P-T-I
Member

P-T-I commented Dec 13, 2023

Agreed with those three groups; I believe for the first, CveXplore alone should suffice; for the second, both CveXplore and CveSearch would be needed; and for the third, either CveSearch or CveXplore could suffice, depending on 'how' you would facilitate third-party integration (the CveSearch HTTP API or the CveXplore package).

  • Another advantage of this approach is that we can utilize the Change-data-capture (CDC) capability to do more things like reindexing data to another database like ElasticSearch (full-text search), Redis (caching) or message queue (alert new CVEs) while requiring no coding at all.

I like this idea, especially the push towards a message queue (Kafka would be the de facto go-to, I guess). These functionalities should be added to CveXplore, right?

  • When we initialize the cve-search instance, we only need to use CveXplore to initialize its SQLite database first (can include it like the way CVE-Search-Docker do), then import that SQLite database to a higher-level database like MySQL, PostgreSQL, MariaDB, etc. This approach will decouple two projects. This process can be slow but just need to do one single time.

I'm reluctant to add a database dump to the code base itself (I know I've done this in the CVE-Search-Docker repo); I would opt for hosting database dumps externally, which might already be provided by the vulnerability-lookup project.

I agree, but I do not see a way around this; once this path of decoupling is taken, there is no way back. In the long term, I would say the benefits in configuration effort, maintainability and de-duplication of code outweigh the 'high dependency' downside.

I would suggest we move this discussion into a new project / issue list in the cve-search/CveXplore repo, agreed?

If so, I'll transfer this issue into the cve-search/CveXplore repo

@baonq-me

I like this idea; especially the push towards a message queue (Kafka would be the defacto goto I guess). These functionalities should be added into the CveXplore functionality, right?

I can update CVE-Search-Docker as a demonstration, as well as the documentation, to clarify this.

I would suggest we move this discussion into a new project / issue list in the cve-search/CveXplore repo, agreed?

I agree

@P-T-I P-T-I transferred this issue from cve-search/cve-search Dec 13, 2023
@P-T-I P-T-I added the enhancement New feature or request label Dec 13, 2023
@P-T-I
Member

P-T-I commented Dec 13, 2023

I can update CVE-Search-Docker for demonstration as well as documents to clarify this.

Which specific demonstration are you talking about?

@pombredanne

FWIW, my concern with MongoDB was/is that it is no longer released under an open source license.

@baonq-me

baonq-me commented Dec 14, 2023

I can update CVE-Search-Docker for demonstration as well as documents to clarify this.

Which specific demonstration are you talking about?

Here is my reference architecture.

[Image: reference architecture diagram]

In my use case, cve-search/CveXplore is not just a tool to search for CVEs but also an offline data source that makes it possible to detect vulnerable software in a timely, highly reliable and fully automated way, from gathering software versions to detection. This holds even in special cases like CVE-2023-22522, where NVD updated the vulnerable configurations 5 days after the vendor announcement (in such cases a human must be involved to review manually).

Of course, I can add more code to cve-search/CveXplore if needed. The above is just a reference architecture to express my idea.
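The detection step in this architecture (matching gathered software versions against CVE data) could be sketched as a version-range check. NVD CPE match entries express affected ranges with fields such as `versionStartIncluding` and `versionEndExcluding`; the simplified dict layout, the product names and the numeric-only version parsing below are assumptions for illustration, and real-world version strings need a more careful comparator.

```python
from typing import Dict, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Naive parse of dot-separated integers; trailing zeros are dropped
    so that "2.0" and "2.0.0" compare equal."""
    parts = [int(p) for p in v.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str, rng: Dict[str, str]) -> bool:
    """Check a version against NVD-style bounds; absent bounds are open."""
    v = parse_version(version)
    if "versionStartIncluding" in rng and v < parse_version(rng["versionStartIncluding"]):
        return False
    if "versionStartExcluding" in rng and v <= parse_version(rng["versionStartExcluding"]):
        return False
    if "versionEndIncluding" in rng and v > parse_version(rng["versionEndIncluding"]):
        return False
    if "versionEndExcluding" in rng and v >= parse_version(rng["versionEndExcluding"]):
        return False
    return True


def find_vulnerable(inventory: Dict[str, str], cves: List[dict]) -> List[Tuple[str, str]]:
    """Return (product, cve_id) pairs whose installed version is in an affected range."""
    hits = []
    for cve in cves:
        for rng in cve["ranges"]:
            installed = inventory.get(rng["product"])
            if installed is not None and in_range(installed, rng):
                hits.append((rng["product"], cve["id"]))
    return hits
```

Because the check runs purely against local data, it can be re-run whenever NVD revises a CVE's configurations, which is what catches late updates like the one described above.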

@P-T-I
Member

P-T-I commented Dec 14, 2023

Looks very nice; I'll give a more detailed response in an hour or so, let me get to the office first ;-)

@P-T-I
Member

P-T-I commented Dec 14, 2023

Right, the way I see it is that the bulk of the logic needs to be incorporated into the cvexplore repo (the green parts). The GUI (frontend as discussed earlier) should be the cvesearch repo (purple):
[Image: architecture diagram (cvexplore)]
The dotted parts should, in my opinion, be made optional / configurable, and cvesearch should be able to function fully both with and without them.
The blue boxes (alerting etc.) are third-party options / integrations; they are not part of either code base and are out of scope for development, right?
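One way to keep the dotted components optional, so that cvesearch works with or without them, is to hide each one behind a small interface with a no-op default (the "null object" pattern). The sketch below assumes a cache as the optional component; `Cache`, `NullCache`, `DictCache`, `make_cache`, `lookup_cve` and the `cache` configuration key are hypothetical names, and `DictCache` stands in for something like Redis.

```python
from typing import Callable, Dict, Optional, Protocol


class Cache(Protocol):
    def get(self, key: str) -> Optional[dict]: ...
    def set(self, key: str, value: dict) -> None: ...


class NullCache:
    """Used when no cache is configured: every lookup misses, writes are dropped."""

    def get(self, key: str) -> Optional[dict]:
        return None

    def set(self, key: str, value: dict) -> None:
        pass


class DictCache:
    """In-process stand-in for an external cache such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value


def make_cache(config: dict) -> Cache:
    """Return the configured cache, or a no-op one when the option is absent."""
    return DictCache() if config.get("cache") == "memory" else NullCache()


def lookup_cve(cve_id: str, load: Callable[[str], Optional[dict]], cache: Cache) -> Optional[dict]:
    """Read-through lookup: try the optional cache, fall back to the primary database."""
    hit = cache.get(cve_id)
    if hit is not None:
        return hit
    doc = load(cve_id)
    if doc is not None:
        cache.set(cve_id, doc)
    return doc
```

The calling code never branches on whether the component exists; leaving it out of the configuration just means every call goes to the primary database.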

@baonq-me

The dotted parts should be, in my opinion, made optional / configurable and cvesearch should be able to fully function with, but also without them present.

The blue boxes (alerting etc) are 3rd party options / integrations, but not part of both code bases and are out of scope for development, right?

I agree with both. I think we can break those ideas down into a more detailed task list of what needs to be done.

@P-T-I
Member

P-T-I commented Dec 14, 2023

Agreed; I'll start on that task list right away (in no particular order), please append as you see fit. I'll create a new issue, #213, as a master issue to track the work.

@P-T-I
Member

P-T-I commented Dec 14, 2023

FWIW, my concern with MongoDB was/is that this is no longer using an open source license.

@pombredanne any specific database wishes?

@P-T-I
Member

P-T-I commented Dec 19, 2023

closing in favor of #213

@P-T-I P-T-I closed this as completed Dec 19, 2023

6 participants