dimanche 30 août 2015

SQL Server vs. No-SQL Database

I have inherited a legacy content delivery system and I need to re-design & re-build it. The content is delivered by content suppliers (e.g. Sony Music) and is ingested by a legacy .NET app into a SQL Server database.

Each content has some common properties (e.g. Title & Artist Name) as well as some content-type specific properties (e.g. Bit Rate for MP3 files and Frame Rate for video files).

This information is stored in a relational database in multiple tables. These tables might have null values in some of their fields because those fields might not belong to a property of the content. The database is constantly under write operations because the content ingestion system is constantly receiving content files from the suppliers and then adds their metadata to the DB.

Also, there is a public facing web application which lets end users buy the ingested contents (e.g. musics, videos etc). This web application totally relies on an Elasticsearch index. In fact this application does not see the database at all and uses the Elasticsearch index as the source of data. The reason is that SQL Server does not perform as fast and as efficient as Elasticsearch when it comes to text-search.

To keep the database and Elasticsearch in sync there is a Windows service which reads the updates from SQL Sever and writes them to the Elasticsearch index!

As you can see there are a few problems here:

1- The data is saved in a relational database which the data hard to manage. e.g. there is a table of 3 billion records to store metadata of each contents as a key value pairs! To me using a no-sql database or index would make a lot more sense as they allow to store documents with different formats in them.

2- The Elasticsearch index needs to be kept in Sync with the database. If the Windows services does not work for any reason then the index will not get updated. Also when there are too many inserts/updates in the DB it takes a while for the index to get updated.

3- We need to maintain two sources of data which has cost overhead.

Now my question is that : Is there a no-sql database which has these characteristics?

A- Allows me to store documents with different structures in it? B- Provides good text-search functions and performance? e.g. Fuzzy search etc. C- Allows multiple updates to be made to its data concurrently? Based on my experience Elasticsearch has problems with concurrent updates. D- It can be installed and used at Amazon AWS infrastructure because our new products will be hosted on AWS.Auto scaling and clustering is important. e.g. DynamoDB.

E- It would have a kind of GUI so that support staff or developers could modify the data to some extent.




Aucun commentaire:

Enregistrer un commentaire