Database Systems
Database Systems¶
Data Vs Information¶
- Data is raw facts, the building blocks of info
- Information is processing raw data for meaning
- Accurate, contextual, and timely information is key to good decision making.
- Other Relevant terms:
- knowledge: The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
- data management: A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing.
What is a Database¶
- database: A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata.
- metadata: Data about data.
- Relationships
- Data characteristics (names, types, null, etc.)
-
Database management system (DBMS): The collection of programs that manages the database structure and controls access to the data stored in the database.
-
A database provides:
- Improved data sharing
- Improved data security
- Better data integration
- Minimized data inconsistency
- Improved data access
- Improved decision making
- Increased end-user productivity
-
DB Types and Classifications
- User Classification
- Single User/Desktop: 1 user on a dekstop
- Multiuser/Workgroup: 50 or less users supported
- Enterprise: Supports hundreds of thousands of users
- Location Classification
- Centralized: single site, multi-user
- Distributed: 2 or more independent sites with related data, multi-user
- Cloud Database: Maintained by cloud providers, unknown hardware, but known performance
- Data Classification (i.e. type of data stored)
- General-Purpose: wide variety of data used by multiple disciplines
- Discipline-Specific: Focused on specific subjects.
- Data Structure Classification
- Unstructured data: original raw data
- structured data: formatted, and stored to aid storage and retrieval
- Semi-Structured data: Partially structured.
- XML Database: an example of a semi-structured DB.
- Usage Classification (Most common today)
- Operational: Designed to support day-to-day operations
- OLTP (Online Transaction Processing): Same as Operational
- Transactional: Same as Operational
- Production: Same as Operational
- Analytical: Tactical / Strategic decision usage
- Data warehouse: historical data optimized for decision support
- Warehouse data from Operational DB + Other Sources
- OLAP: Frontend tools used to model data on warehouse
- BI: Comprehensive set of tools used to make business decisions on warehouse
- NoSQL (Not Just SQL)
- Not a traditional RDBMS
- built to handle: volume, variety, and velocity of data of new business demands from social medai
- User Classification
Why Design a Database¶
- To make data atomic
- Avoid repetetive data
- Optimize storage, avoid unnecessary nulls
- Optimize retrieval via indexing and views
Database Evolution¶
- DP Problems: data inconsistency, data anomaly, data dependence, and structural dependence
- Lengthy development times
- Difficulty getting quick answers
- Complex to manage and administrate - Data Redundancy, islands of info, no data integrity
- Lacked security and data sharing
- Difficult to change - Structural dependence
- Data type change leads to program change - Data dependence
Database Components¶
- Hardware: servers, workstations, switches, physical storage devices
- Software:
- Operating System: Linux, MacOS Unix, Windows
- DBMS Software: MySQL, SQL Server, Oracle, DB2, PostgreSQL
- Applications: DBBeaver, SQL Studio
- People: Sys Admins, DBAs, DB Architects, Developers, End Users
- Procedures: Audits, monitoring, standards of business.
- Data: set of facts enetered into DB.
Main Functions of RDBMS¶
- Data Dictionary Mgmt: schema, metadata, removes structural and data dependence from DBMS
- Data Storage Mgmt: Manages complex structures required for storage and retrieval. Important for DB Perf. Tuning.
- Data Transformation and Presentation: Coerces data into required formats. Relieves you from chore of distinguishing between logical data and physical data format (think date formatting VS how its stored).
- Security Mgmt: User level security, CRUD ACLs, query and report access
- Multiuser access control: Multi-CRUD operations without violating DB integrity
- Backup and recovery Mgmt: Just what it sounds like
- Data Integrity Mgmt: Integrity rules to reduce redundancy and maximize data consistency. Relationships are used to enforce data integrity
- Database Acess Languages and APIs: query lang (SQL), APIs to Java, C, C#, Python, etc.
- Database Comm Interface: Direct connect to websites, email apps, etc.