Data Models
Data Models¶
Importance of Data Modeling¶
- Helps clarify how all the pieces of a business fit together, but can drill down into details. (High Level and Detailed views).
- A good Data Model (Blueprint) means different end-user views of the data won’t matter.
Data Modeling Foundation¶
- Data Modeling: The process of creating a specific data model for a problem domain.
- Starts with simple problem and understanding
- Increases detail as understanding increases
- When complete Data Model is a blueprint for:
- Instructions to build a database
- Meets all end-user requirements
- Will clearly be defined in text and diagrams
- Should meet at least these 3 criteria
- Description of the data structures
- Enforcable data integrity rules
- Method for manipulating and transforming data.
- Data Model: Represents data structures and their characteristics relations, constraints, transformations, and other constructs with the purpose of supporting a specific problem domain.
Building Blocks:¶
- Entities: Nouns
- Attributes: Characteristic of Entity
- Relationships: How Entities are associated (usually linking verbs)
- 1:M, 1..*: One to many
- M:N or ..: Many to many
- 1:1 or 1..1: One to one
- Constraints: Data restrictions
- Ensure data integrity
- Expressed as rules:
Business Rules & Database Design¶
- Business Rules are policies, procedures, and principles of organizations.
- They are also the first step in identifying and creating the aforementioned building blocks
- Rules for Business Rules:
- Should be easy to understand
- Should be widely disseminated
Business Rule Discovery¶
- Direct Interviews with End-Users (managers, policy makers)
- Verify end-user perceptions, however
- Verify end-user perceptions, however
- Documentation
Note
Document identified business rules to: * Standardize the company’s view of data * Facilitate communication between users and designers * Abets understanding scope of data * Clarifies business process and perceptions * Allows developing and re-visting correct relationships and constraints.
Translating Business Rules into a Data Model¶
- Nouns often translate into an Entities.
- Verbs often translate into relationships between entities.
- From business rule, determine cardinality by assuming bi-directional relationships and asking:
- How many instances of entity B are related to entity A
- How many instances of entity A are related to entity B
- Attributes can also be nouns or adverbs, but generally are characteristics of the entities.
- Naming conventions should be followed:
- CUS_CREDIT_LIMIT: e.g. field/attribute name: All upper case, proceeded by Entity Abbrev, underscores for spaces.
Evolution of Major Models¶
graph TB
subgraph "1960-1969 - Least Semantics"
comments1["- Difficult to represent M:N relationships (hierarchical only)\n- Structural level dependency\n- No ad hoc queries (record-at-a-time access)\n- Access path predefined (navigational access)"]
hierarchical[Hierarchical]
network[Network]
end
subgraph 1970
comments2["- Conceptual simplicity (structural independence)\n- Provides ad hoc queries (SQL)\n- Set-oriented access"]
relational[Relational]
end
subgraph 1976
comments3["- Easy to understand (more semantics)\n - Limited to conceptual modeling\n(no implememntation component)"]
entity_relationship[Entity Relationship]
end
subgraph "1978-1990 - Most Semantics"
comments4["- More semantics in data model\n- Support for complex objects\n- Inheritance (class hierarchy)\n- Behavior\n- Unstructured data (XML Begins)\n- LATE 1990s XML data exchanges"]
semantic[Semantic]
object_oriented[Object Oriented]
extended_relational["Extended Relational\n(O/R DBMS)"]
end
subgraph 2005-Present
comments5["- Addresses Big Data problem\n- Less semantics in data model\n- Based on schema-less, key-value data model\nBest suited for large, sparse data stores"]
xml_json["Unstructured\n(Document Data,\nXML/JSON)"]
bigdata[Big Data / NoSQL]
end
hierarchical --> network
network --> relational
relational --> entity_relationship
entity_relationship --> semantic
semantic --> object_oriented
semantic --> extended_relational
extended_relational -->|DBs Still Play a Part| bigdata
object_oriented --> xml_json
xml_json -->|Some aspects shared| bigdata
style hierarchical fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style network fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style relational fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style entity_relationship fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style semantic fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style object_oriented fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style extended_relational fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style xml_json fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style bigdata fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
- Hierarchical model:
- Upside-down tree
- Parent can have many children, but children only have a single parent.
- Composed of segments, which are like file-system records
- Network model:
- DB is a collection of records with 1:M relationships, but children can have many parents.
- Defined:
- Schema (Org of DB as seen by DBA, tables, indexes, views, and queries),
- SubSchema (Org of DB as seen by Applications),
- DML (Defines how to manipulate DB i.e. SQL),
- DDL (Allows DBA to define Schema Structure)
- Limited Ad-Hoc queryability
- Heavy demand on application developers
- Structural changes lead to application massive changes (dats dependence)
- Relational model (Solves general Data Problems):
- Based on mathematical relation concept (But think of it as a TABLE of ROWS and COLUMNS)
- COLUMNS are attributes, and ROWS are called tuples
- TABLES: complete data and structural independence because it is a logical structure.
- How a table is stored is of no concern to a user or designer.
- RDBMS sophisticated mathematical data manipulation constructs, but they are hidden from the end-user!!!
- Queries are intuitive and logical (SQL, specifies what must be done without expressing HOW).
- Tables are releated via a common attribute
- Easy to diagrame with Relational Diagrams
- 3 Simple Parts
- User Interface: DBBeaver, Sql Studio, etc. Helps generate SQL
- Collection of Tables: Pesents data in an easy to unserstand fashion.
- SQL Engine: Translate SQL user requests into complex mathematical data manipulations.
- Due to robust mathematical foundation RDBMs can easily be extended with new capabilities like XML and Objects.
- Entity relationship model:
- Graphical way of representing the relational model with ERDs
- Entities: Rectangles
- Name: Noun in center of rectangle.
- Attributes: Circles branched from entities in ERD
- Characteristics of Entity
- Relationships: Diamond, Crows Foot, or UML/Class notation.
- OO Model (Solves specific needs):
- Popular in CAD, GIS markets, which require complex objects.
- Data and relationships contained in single structure known as an object.
- Contains all operations that can be performed on an object.
- Components:
- Object: Single instance of a real world entity
- Attributes: Properties of an object
- Class: Collection of similar objects with shared structure.
- Methods: Behaviors you can run on objects.
- Class Hierarchy: upside-down tree with single parent relationship
- Inheritance: Can inherit attributes and methods of parent.
- Depicted with UML
- Object/Relational and XML
- ERDM adds OO features into simpler RDM structure.
- Encapsulated data and methods
- Extensible Data Types based on Classes
- XML Databases emerged to manage unstructured data like e-mail, web-pages, diagrams, etc.
- ERDM adds OO features into simpler RDM structure.
Data Model Basic Terminology Comparison¶
Real World | Example | File Processing | Hierarchical Model | Network Model | Relational Model | ER Model | OO Model |
---|---|---|---|---|---|---|---|
A group of vendors | Vendor file cabinet | File | Segment type | Record type | Table | Entity set | Class |
A single vendor | Global supplies | Record | Segment occurrence | Current record | Row (tuple) | Entity occurrence | Object |
The contact name | Johnny Ventura | Field | Segment field | Record field | Table attribute | Entity attribute | Object attribute |
The vendor identifier | G12987 | Index | Sequence field | Record key | Key | Entity identifier | Object identifier |
Emerging Data Models¶
- Exponential data growth
- Getting actionable BI data from Petabytes of collected data has become paramount.
- Browsing patterns, purchasing histories, customer preference, customer behaviors, social connections.
- Mobile data, GPS, RFID, sensors, IoT, sound, pictures, etc.
- Big Data: Find better ways to manage large data, get valuable business insights, but reduce costs.
- 3Vs:
- Volume: Petabytes of data
- Velocity: Exponential growth, and need to mine it quickly (Response time is expected to be nearly instant).
- Variety: Many different data formats not suitable to traditional RDBMS.
- 3Vs:
BigData Technologies¶
- hadoop: java based, distributed, with thousands of nodes.
- Componenets are HDFS and MapReduce
- HDFS: Write once, read-many model
- Name Node: Stores all FS metadata
- Data Node: Stores fixed data-blocks that gets replicated
- Client Node: User interface to HDFS
- MapReduce: API that works with structured and unstructured data
- Uses many nodes to perform 2 functions Map and Reduce
- Map divides a job, Reduce reduces it into a filtered result set.
- NoSQL: Large distributed DB System that stores structured and unstructured data.
- They are not based on RDBMS Model
- They are highly distributed
- They are highly scalable, fault tolerant, and available
- Support data with large number of attributes but sparse entity types
- Built for performance and not data consistency
- Examples:
- Key/Value stores: Memcached, Redis
- Document DBs: Mongo, ElasticSearch
- Graph DBs: Neo4J
Classification of Data Models¶
graph TB
subgraph " "
direction TB
subgraph " "
direction TB
subgraph " "
enduser[End User View]
designer[Designer's View]
dbms[DBMS View]
end
subgraph " "
direction TB
external1[External Model]
external2[External Model]
conceptual[Conceptual Model]
internal[Internal Model]
physical[Physical Model]
conceptual -->|Logical independence| internal
internal -->|Physical independence| physical
external1 --> conceptual
external2 --> conceptual
end
enduser --> external1
enduser --> external2
designer --> conceptual
dbms --> internal
end
subgraph "Degreee of Abstration"
subgraph " "
high[High]
medhi["High/Medium"]
medium[Medium]
low[Low]
end
subgraph " "
er[ER]
oo[Object Oriented]
rel[Relational]
hinet[Network\nHierarchical]
end
subgraph "Characteristics"
char1["Hardware-independent\nSoftware-independent"]
char2["Hardware-independent\nSoftware-dependent"]
char3["Hardware-dependent\nSoftware-dependent"]
end
er --> high
oo --> medhi
rel --> medium
hinet --> low
high --> char1
medium --> char2
low --> char3
end
end
style conceptual fill:#0055ff,stroke:#4caf50,stroke-width:2px;
style external1 fill:#cc004a,stroke:#4caf50,stroke-width:2px;
style external2 fill:#cc004a,stroke:#4caf50,stroke-width:2px;
style internal fill:#ccffcc,stroke:#4caf50,stroke-width:2px;
style physical fill:#aaffaa,stroke:#4caf50,stroke-width:2px;
- Classified by ANSI/SPARC defined Levels of Abstraction (External, Conceptual, Internal, Physical)
- External: Subset of the Data Model necessary for the application developer or report writers.
- Conceptual: The global (Bird’s Eye) view of the data model. A Macro picture of the organization’s data.
- Internal: Map of conceptual model to a specific DBMS
- Entities of Conceptual model are mapped to Tables
- Attributes to columns, data-types, constraints, etc.
- Physical: The actual hardware (SSDs, SAN, etc.) the data is stored on.