Data Engineering

Good Reading:

Tool:

https://github.com/datasciencemasters

Stream Processing

  1. Simple Event Processing
    A simple filter on individual events (e.g. is this a gold or platinum customer?)
  2. Event Stream Processing
    Looking across multiple event streams, joining multiple events, etc.
  3. Complex Event Processing
    Processing multiple event streams to identify meaningful patterns, using complex conditions and temporal windows
    e.g. overall trading activity has increased by more than 10% and the average price of commodities has fallen 2% in the last 4 hours
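
The three levels above can be sketched in plain Python. This is only an illustration, not a real streaming engine: events are assumed to be dicts, the field names (`tier`, `ts`, `price`) are hypothetical, and the 10% / 2% changes are assumed to be measured against the preceding 4-hour window, which the note does not specify.

```python
from statistics import mean

HOUR = 3600  # seconds


def is_premium(event):
    """Simple event processing: stateless filter on a single event."""
    return event["tier"] in {"gold", "platinum"}


def join_streams(left, right, key):
    """Event stream processing: join two streams on a shared key."""
    index = {}
    for event in right:
        index.setdefault(event[key], []).append(event)
    for event in left:
        for match in index.get(event[key], []):
            yield {**match, **event}


def in_window(trades, start, end):
    """Return the trades whose timestamp falls in [start, end)."""
    return [t for t in trades if start <= t["ts"] < end]


def activity_up_price_down(trades, now, span=4 * HOUR,
                           min_rise=0.10, min_drop=0.02):
    """Complex event processing: a pattern over two temporal windows.

    Assumption: "increase" and "fallen" are measured against the
    preceding window of the same length.
    """
    current = in_window(trades, now - span, now)
    previous = in_window(trades, now - 2 * span, now - span)
    if not current or not previous:
        return False  # not enough history to compare
    activity_change = len(current) / len(previous) - 1
    price_change = (mean(t["price"] for t in current)
                    / mean(t["price"] for t in previous) - 1)
    return activity_change > min_rise and price_change <= -min_drop


# Hypothetical data: 10 trades at 100.0, then 12 trades at 97.0 four hours later.
trades = (
    [{"ts": i * 60, "price": 100.0} for i in range(10)]
    + [{"ts": 4 * HOUR + i * 60, "price": 97.0} for i in range(12)]
)
print(activity_up_price_down(trades, now=8 * HOUR))  # True
```

The filter needs no state, the join needs an index over one stream, and the complex pattern needs both history and a notion of time, which is roughly why each level is harder to run than the one before it.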

Data Integrity

Ref:

Data integrity is the assurance that data is consistent, accurate, reliable and accessible.

“Guilty until proven innocent” approach: treat incoming data as invalid until it has passed validation.

Types of integrity constraint:

  1. Entity integrity: within table (primary key)
  2. Referential integrity: inter table relationship (foreign key)
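  3. Domain integrity: valid values within a column (data type, range, format, NOT NULL, CHECK)
  4. User-defined integrity: business rules not covered by the other three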
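
A minimal sketch of the four constraint types, using Python's built-in `sqlite3` module with an in-memory database. The `customer` and `orders` tables, their columns, and the "discount at most half the amount" rule are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,                              -- entity integrity
    tier TEXT NOT NULL
         CHECK (tier IN ('standard', 'gold', 'platinum'))  -- domain integrity
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),  -- referential integrity
    amount      REAL NOT NULL CHECK (amount > 0),          -- domain integrity
    discount    REAL NOT NULL DEFAULT 0,
    CHECK (discount <= amount * 0.5)                       -- user-defined rule
);
""")

conn.execute("INSERT INTO customer (id, tier) VALUES (1, 'gold')")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


checks = {
    "entity": "INSERT INTO customer (id, tier) VALUES (1, 'gold')",
    "referential": "INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)",
    "domain": "INSERT INTO customer (id, tier) VALUES (2, 'silver')",
    "user-defined": "INSERT INTO orders (customer_id, amount, discount) "
                    "VALUES (1, 10.0, 8.0)",
}
for name, sql in checks.items():
    print(name, "violation rejected:", violates(sql))
```

Each of the four statements is rejected by the database itself, so the rules hold no matter which application writes the data.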