sijitang

Wednesday, January 21, 2009

database design

Use a stack of intermediate languages (e.g. the ER model) to get from the description of an application to a database schema, moving from intuitive and semi-formal to technical and formal.

Miniworld

Mapping to a semantic data model
e.g. ER model => ER diagram

Mapping to a data model
relational database model => relational database schema

Main Phases of Database Design

Requirements collection and analysis:
Understand and document the users' data requirements
Result: a set of users' requirements

Conceptual design:
Includes detailed descriptions of the entity types, relationships, and constraints
Does not include implementation details
Concentrates on specifying the properties of the data, without being concerned with storage details

Logical design or data model mapping:
Implementation data model: such as the relational or the object-oriented database model
Result: a conceptual database schema in the implementation data model of the DBMS (additionally: external schemas)

Physical design:
Internal storage structures, access paths, and file organizations for the database files are specified
Result: internal schema

A database is an integrated, self-describing collection of related data.
Related: data has some relationship to other data. For example, in a university we have students who take courses taught by professors.
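The university example can be carried through the mapping steps as a small relational schema. The following is only a sketch using Python's built-in sqlite3 module; the table names, column names and sample rows are assumptions made for illustration and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relational schema for: students take courses taught by professors.
# The n:m relationship "takes" becomes its own table; "taught by" becomes a foreign key.
conn.executescript("""
CREATE TABLE professor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    professor_id INTEGER NOT NULL REFERENCES professor(id)
);
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE takes (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id INTEGER NOT NULL REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO professor VALUES (1, 'Prof. Meyer');
INSERT INTO course VALUES (10, 'Databases', 1);
INSERT INTO student VALUES (100, 'Anna'), (101, 'Ben');
INSERT INTO takes VALUES (100, 10), (101, 10);
""")

# Follow the relationships: which student takes which course with which professor?
rows = conn.execute("""
SELECT s.name, c.title, p.name
FROM student s
JOIN takes t ON t.student_id = s.id
JOIN course c ON c.id = t.course_id
JOIN professor p ON p.id = c.professor_id
ORDER BY s.name
""").fetchall()

for student, course, professor in rows:
    print(student, "takes", course, "taught by", professor)
```

The join makes the "related" part of the definition visible: no single table answers the question, only the relationships between them do.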

Database Management System (DBMS)

An Example Database (Relation)

A Database contains
• User Data
• Metadata
• Indexes
• Application metadata

User data:
Data is generally stored in tables, with relationships between the tables.
Each table has one or more columns. A set of column values forms a database record (row).
Metadata: data about data
Data that describes how user data is stored in terms of table name, column name, data type, length, primary keys, etc.
Indexes: provide an alternate means of accessing user data, used for sorting and searching.
They allow the database to access a record without having to search through the entire table.
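To make the three kinds of content concrete, here is a minimal sketch with Python's sqlite3 module. The student table, its columns and the index name are hypothetical; sqlite_master, PRAGMA table_info and EXPLAIN QUERY PLAN are SQLite's own facilities for reading metadata and inspecting access paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# User data: a hypothetical table with three records.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, semester INTEGER)")
conn.executemany(
    "INSERT INTO student (name, semester) VALUES (?, ?)",
    [("Anna", 1), ("Ben", 3), ("Clara", 5)],
)

# Index: an alternate access path on the name column.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# Metadata: column names and declared types, read from the database itself.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

# Metadata: the catalog of schema objects (tables and indexes).
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

# Ask SQLite how it would answer a lookup by name; the last field of each
# plan row is a human-readable description of the chosen access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT semester FROM student WHERE name = ?", ("Ben",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

print(columns)
print(catalog)
print(plan_text)
```

The exact wording of the plan differs between SQLite versions, so the sketch only prints it; the point is that the plan names the index instead of describing a full table scan.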

Slot, block, file concept and options for catalog entries

Block: the unit of transfer
between disk and main memory

Slot: a place on a track
that is intended to hold
one block

File concept

File catalog:
maps the file name to a sequence of blocks
• is itself stored on the disk
• usually at a fixed position (cylinder 0, track 0, block 0 or similar)
Entry point is the file name
• must then be able to return the physical slot address for a given block number

Options for catalog entries:
A: Entry = (physical slot address of the first block; number of slots occupied starting at this address)
The required number of blocks is mapped to a physically sequential, gap-free run of slots

B: Entry = (physical slot address of the first block)
The blocks are chained as a linear list
Extension by new blocks: always possible as long as there are enough free slots somewhere on the disk

C: Entry = (array with the slot addresses of all blocks)
Access to block i: unproblematic
Extension by new blocks: always possible as long as there are enough free slots somewhere on the disk
Sequential reading of all blocks: slow (many arm movements)
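The three options differ mainly in how the catalog resolves a block number to a slot address. The sketch below models this in Python; the class names, the slot numbers and the next_slot dictionary (standing in for the pointer stored inside each block on disk) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContiguousEntry:
    """Option A: first slot plus number of occupied slots."""
    first_slot: int
    slot_count: int

    def slot_of(self, block_no: int) -> int:
        if not 0 <= block_no < self.slot_count:
            raise IndexError(block_no)
        # Blocks lie in a gap-free run, so the address is simple arithmetic.
        return self.first_slot + block_no


@dataclass
class LinkedEntry:
    """Option B: only the first slot; every block points to its successor."""
    first_slot: int

    def slot_of(self, block_no: int, next_slot: Dict[int, Optional[int]]) -> int:
        slot = self.first_slot
        # Reaching block i means following i pointers (i block reads on a real disk).
        for _ in range(block_no):
            successor = next_slot[slot]
            if successor is None:
                raise IndexError(block_no)
            slot = successor
        return slot


@dataclass
class IndexedEntry:
    """Option C: array with the slot addresses of all blocks."""
    slots: List[int]

    def slot_of(self, block_no: int) -> int:
        if not 0 <= block_no < len(self.slots):
            raise IndexError(block_no)
        return self.slots[block_no]


# Hypothetical disk state: the successor pointer stored with each slot (option B).
next_slot = {7: 42, 42: 13, 13: None}

a = ContiguousEntry(first_slot=20, slot_count=3)
b = LinkedEntry(first_slot=7)
c = IndexedEntry(slots=[7, 42, 13])

print(a.slot_of(2), b.slot_of(2, next_slot), c.slot_of(2))
```

In B and C the slots 7, 42 and 13 are scattered over the disk, which is why extension is easy but sequential reading costs arm movements; in A the run 20, 21, 22 reads fast but can only grow if the following slot happens to be free.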

Monday, January 19, 2009

Data Warehouse 2: multidimensional modelling

Micro data:
• correspond to single observations
• they are the result of the loading phase
-> Base data
Macro data:
• correspond to the preparation/aggregation of base data for analysis purposes
• they are the result of the evaluation phase
-> Prepared data (data warehouse, data mart)
Meta data:
• describe the properties of micro and macro data and how they are produced, stored, aggregated and analysed throughout the complete data warehousing process

Multidimensional Perspective:

qualifying data – attributes of categories

quantifying data – attributes for summing up sales

Modelling Approaches:
Data models are needed that focus on representing the macro data as a statistical table

Multidimensional Data Modelling:
qualifying information – dimensions of the cube

Hierarchies, dimensional attributes
form the starting point for selection and aggregation

quantifying information – cells of the cube

Facts, measures – plain
Facts, measures – computed

Graphical Design:

"->" denotes functional dependency

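Orthogonality: there is no functional dependency D.Di -> D'.Dj with D ≠ D' (no attribute of one dimension determines an attribute of another dimension)

Functional dependencies impose a tree structure on the instances; they are 1:n relationships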
Each path from a dimensional attribute to Top defines a class hierarchy

Sales and turnover per article, shop and day:

CSales[(P.Article, S.Shop, T.Day), (Sales, Turnover)]

Sales per product group, region and quarter:

CSales[(P.Group, S.Region, T.Quarter), (Sales, Turnover)]
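Going from the first cube to the second is a roll-up along the functional dependencies Article -> Group, Shop -> Region and Day -> Quarter. Below is a minimal Python sketch of that aggregation; all member names, the mapping dictionaries and the figures are invented for illustration, and summing is assumed to be the right aggregation for both measures.

```python
from collections import defaultdict

# Hypothetical functional dependencies (child -> parent) within each dimension.
article_to_group = {"cola": "drinks", "water": "drinks", "bread": "bakery"}
shop_to_region = {"shop1": "north", "shop2": "north", "shop3": "south"}


def day_to_quarter(day: str) -> str:
    """Map an ISO date 'YYYY-MM-DD' to a label such as '2009-Q1'."""
    year, month, _ = day.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"


# Base cube CSales[(P.Article, S.Shop, T.Day), (Sales, Turnover)]
base_cube = {
    ("cola", "shop1", "2009-01-19"): (10, 15.0),
    ("water", "shop2", "2009-02-03"): (4, 2.0),
    ("bread", "shop3", "2009-01-21"): (6, 12.0),
    ("cola", "shop3", "2009-04-02"): (3, 4.5),
}


def roll_up(cube):
    """Aggregate to CSales[(P.Group, S.Region, T.Quarter), (Sales, Turnover)]."""
    result = defaultdict(lambda: [0, 0.0])
    for (article, shop, day), (sales, turnover) in cube.items():
        key = (article_to_group[article], shop_to_region[shop], day_to_quarter(day))
        result[key][0] += sales
        result[key][1] += turnover
    return {key: tuple(value) for key, value in result.items()}


rolled = roll_up(base_cube)
for cell, measures in sorted(rolled.items()):
    print(cell, measures)
```

Because each mapping is a function (every article has exactly one group, and so on), every base cell lands in exactly one coarser cell, which is what makes the aggregated totals well defined.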

Data Warehouse 1

A data warehouse is a central collection of data (usually a database) whose content is composed of data from different sources. The data is loaded from the data sources into the data warehouse and stored there long-term, primarily for data analysis and to support business decisions in companies. A data warehouse serves information integration. As part of the ETL process (extraction, transformation, loading), data is extracted from various sources, cleansed and unified through transformation, and then loaded into the data warehouse.
Data warehouse system:

Data sources (external systems + operational systems)
ETL process (extraction, transformation, loading)
DWM
Queries, analysis
Data mining (BI tools: reports, evaluation, spreadsheets)
OLAP
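The ETL step can be sketched in a few lines of Python. Everything here is hypothetical: two in-memory lists stand in for an operational and an external source, their field names and formats are invented, and an in-memory SQLite table plays the role of the warehouse.

```python
import sqlite3

# Extraction: hypothetical rows as they might arrive from two different sources.
operational_rows = [
    {"article": "Cola ", "shop": "shop1", "day": "2009-01-19", "turnover": "15.00"},
    {"article": "bread", "shop": "shop3", "day": "2009-01-21", "turnover": "12.00"},
]
external_rows = [
    {"Artikel": "COLA", "Filiale": "shop3", "Datum": "02.04.2009", "Umsatz": "4,50"},
]


def transform_operational(row):
    """Cleanse: trim and lower-case the article name, parse the amount."""
    return (row["article"].strip().lower(), row["shop"], row["day"], float(row["turnover"]))


def transform_external(row):
    """Unify: rename fields, convert DD.MM.YYYY dates and decimal commas."""
    day, month, year = row["Datum"].split(".")
    return (
        row["Artikel"].strip().lower(),
        row["Filiale"],
        f"{year}-{month}-{day}",
        float(row["Umsatz"].replace(",", ".")),
    )


def run_etl(conn):
    """Transform both sources into one format and load them into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (article TEXT, shop TEXT, day TEXT, turnover REAL)"
    )
    cleaned = [transform_operational(r) for r in operational_rows]
    cleaned += [transform_external(r) for r in external_rows]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", cleaned)
    return len(cleaned)


warehouse = sqlite3.connect(":memory:")
loaded = run_etl(warehouse)
total_cola = warehouse.execute(
    "SELECT SUM(turnover) FROM sales_fact WHERE article = 'cola'"
).fetchone()[0]
print(loaded, total_cola)
```

Only after the transformation do "Cola " and "COLA" count as the same article, which is the information integration the warehouse is meant to provide.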

DWH compared with a conventional transactional DB
Transactional:

read, write, delete
relatively few data records
no or few null values
mostly independent data objects
high dynamics of modification
tuple queries
flexible with respect to query formulation

DWH:

read, periodic append
many data records (very high data volume)
very many null values
comprehensive dependencies between data
mostly statistical data (stable data, modification only in production phase)
region queries
customized for analysis
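The difference between tuple queries and region queries can be shown on one small table. This is a sketch with Python's sqlite3 module and invented data; reading "region query" as an aggregation over a range of the data space (here, one quarter) is an assumption about what the notes mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, shop TEXT, day TEXT, turnover REAL)")
conn.executemany(
    "INSERT INTO sales (shop, day, turnover) VALUES (?, ?, ?)",
    [
        ("shop1", "2009-01-19", 15.0),
        ("shop1", "2009-02-03", 2.0),
        ("shop2", "2009-01-21", 12.0),
        ("shop2", "2009-04-02", 4.5),
    ],
)

# Tuple query (typical for transactional use): fetch one record by its key.
one_record = conn.execute(
    "SELECT shop, day, turnover FROM sales WHERE id = ?", (3,)
).fetchone()

# Region query (typical for a DWH): aggregate over a whole range of records.
per_shop_q1 = conn.execute(
    """
    SELECT shop, SUM(turnover)
    FROM sales
    WHERE day BETWEEN '2009-01-01' AND '2009-03-31'
    GROUP BY shop
    ORDER BY shop
    """
).fetchall()

print(one_record)
print(per_shop_q1)
```

The first query touches a single row and returns it unchanged; the second reads every row in the quarter and returns only summaries, which is why a warehouse is organized around aggregation rather than around fast single-record updates.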

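A data warehouse is a data storage concept for information systems. It emphasizes storing data in particular ways so that the data is especially well suited to analytical processing, in order to produce valuable information and base decisions on it.
Data stored in a data warehouse has the property that, once stored, it does not change over time, and stored data always carries a time attribute. A data warehouse usually contains a large amount of historical data, from which specific information is extracted using particular analysis methods.
Its main function is to take the large volumes of data that an organization has accumulated over the years through online transaction processing (OLTP) in its information systems and, using the storage architecture specific to data warehousing, analyze and organize them systematically. This facilitates analysis methods such as online analytical processing (OLAP) and data mining, which in turn support the construction of decision support systems (DSS) and executive information systems (EIS). These help decision makers extract valuable information from large volumes of data quickly and effectively, support decision making and fast responses to changes in the external environment, and help build business intelligence (BI).
Data mining and OLAP are both analysis tools. The difference is that OLAP gives users a convenient multidimensional view and methods for running complex queries on the data efficiently, with the query conditions set by the user in advance, whereas data mining lets the information system itself discover hidden, previously unnoticed information in the data sources, which, combined with the user's understanding, produces knowledge.
Data mining is the exploration and analysis of large volumes of data by automatic or semi-automatic means in order to build effective models and rules. Through data mining, companies understand their customers better and can thereby improve their marketing, sales and customer service operations. Data mining is the most important of the ways in which a data warehouse is used. Essentially, data mining is used to dig out the information hidden in your data, so it is really part of what is called knowledge discovery. Data mining uses many statistical analysis and modeling methods to find useful patterns (German: Muster) and relationships in the data. The knowledge discovery process has an important influence on whether an application of data mining succeeds; only it can ensure that data mining yields meaningful results.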

