- Introduction to Data Modeling
Identifying Data Objects and Relationships
In order to begin constructing the basic model, the modeler must analyze the information gathered during the requirements analysis for the purpose of:
- classifying data objects as either entities or attributes
- identifying and defining relationships between entities
- naming and defining identified entities, attributes, and relationships
- documenting this information in the data document
- entities contain descriptive information
- attributes either identify or describe entities
- relationships are associations between entities
- Entities
- Attributes
Validating Attributes
Derived Attributes and Code Values - Relationships
- Naming Data Objects
- Object Definition
- Recording Information in Design Document
Entities:
"Any distinguishable person, place, thing, event, or concept, about which information is kept"
"A thing which can be distinctly identified"
"Any distinguishable object that is to be represented in a database"
"...anything about which we store information (e.g. supplier, machine tool, employee, utility pole, airline seat, etc.). For each entity type, certain attributes are stored".
Attributes
Attributes are data objects that either identify or describe entities. Attributes that identify entities are called key attributes. Attributes that describe an entity are called non-key attribute
The process for identifying attributes is similar except now you want to look for and extract those names that appear to be descriptive noun phrases
Validating Attributes
Attribute values should be atomic, that is, present a single fact. Having disaggregated data allows simpler programming, greater reusability of data, and easier implementation of changes. Normalization also depends upon the "single fact" rule being followed. Common types of violations include:
simple aggregation - a common example is Person Name which concatenates first name, middle initial, and last name. Another is Address which concatenates, street address, city, and zip code. When dealing with such attributes, you need to find out if there are good reasons for decomposing them. For example, do the end-users want to use the person's first name in a form letter? Do they want to sort by zip code?
complex codes - these are attributes whose values are codes composed of concatenated pieces of information. An example is the code attached to automobiles and trucks. The code represents over 10 different pieces of information about the vehicle. Unless part of an industry standard, these codes have no meaning to the end user. They are very difficult to process and update.
text blocks - these are free-form text fields. While they have a legitimate use, an over reliance on them may indicate that some data requirements are not met by the model.
mixed domains - this is where a value of an attribute can have different meaning under different conditions
Derived Attributes and Code Values
Two areas where data modeling experts disagree is whether derived attributes and attributes whose values are codes should be permitted in the data model.
Derived attributes are those created by a formula or by a summary operation on other attributes. Arguments against including derived data are based on the premise that derived data should not be stored in a database and therefore should not be included in the data model. The arguments in favor are:
- derived data is often important to both managers and users and therefore should be included in the data model
- it is just as important, perhaps more so, to document derived attributes just as you would other attributes
- including derived attributes in the data model does not imply how they will be implemented
A coded value uses one or more letters or numbers to represent a fact. For example, the value Gender might use the letters "M" and "F" as values rather than "Male" and "Female". Those who are against this practice cite that codes have no intuitive meaning to the end-users and add complexity to processing data. Those in favor argue that many organizations have a long history of using coded attributes, that codes save space, and improve flexibility in that values can be easily added or modified by means of look-up tables.
Relationships
Relationships are associations between entities. Typically, a relationship is indicated by a verb connecting two or more entities. For example:
"employees are assigned to projects "
As relationships are identified they should be classified in terms of cardinality, optionality, direction, and dependence.
Cardinality quantifies the relationships between entities by measuring how many instances of one entity are related to a single instance of another. To determine the cardinality, assume the existence of an instance of one of the entities. Then determine how many specific instances of the second entity could be related to the first. Repeat this analysis reversing the entities. For example:
employees may be assigned to no more than three projects at a time; every project has at least two employees assigned to it.
Here the cardinality of the relationship from employees to projects is three; from projects to employees, the cardinality is two. Therefore, this relationship can be classified as a many-to-many relationship.
If a relationship can have a cardinality of zero, it is an optional relationship. If it must have a cardinality of at least one, the relationship is mandatory. Optional relationships are typically indicated by the conditional tense. For example:
an employee may be assigned to a project
Mandatory relationships, on the other hand, are indicated by words such as must have. For example:
a student must register for at least three course each semester
In the case of the specific relationship form (1:1 and 1:M), there is always a parent entity and a child entity. In one-to-many relationships, the parent is always the entity with the cardinality of one. In one-to-one relationships, the choice of the parent entity must be made in the context of the business being modeled. If a decision cannot be made, the choice is arbitrary.
Naming Data Objects
The names should have the following properties:
- unique
- have meaning to the end-user
- contain the minimum number of words needed to uniquely and accurately describe the object.
For entities and attributes, names are singular nouns while relationship names are typically verbs.
Object Definition
Complete and accurate definitions are important to make sure that all parties involved in the modeling of the data know exactly what concepts the objects are representing.
Definitions should use terms familiar to the user and should precisely explain what the object represents and the role it plays in the enterprise. Some authors recommend having the end-users provide the definitions. If acronyms, or terms not universally understood are used in the definition, then these should be defined .
While defining objects, the modeler should be careful to resolve any instances where a single entity is actually representing two different concepts (homonyms) or where two different entities are actually representing the same "thing" (synonyms).
This situation typically arises because individuals or organizations may think about an event or process in terms of their own function.
An example of a homonym would be a case where the Marketing Department defines the entity MARKET in terms of geographical regions while the Sales Departments thinks of this entity in terms of demographics. Unless resolved, the result would be an entity with two different meanings and properties.
Conversely, an example of a synonym would be the Service Department may have identified an entity called CUSTOMER while the Help Desk has identified the entity CONTACT.
In reality, they may mean the same thing, a person who contacts or calls the organization for assistance with a problem. The resolution of synonyms is important in order to avoid redundancy and to avoid possible consistency or integrity problems.
Refrence :http://www.utexas.edu/its/windows/database/datamodeling/dm/objects.html
.jpg)