
\section{XML}

The website uses two kinds of XML files. The first kind is the database XML: one file stores the main information and is supported by three other XML files. The XML database grows in two ways: information can be entered manually, or it is imported from other sites via RSS feeds. An RSS feed is the second kind of XML file; its format is standardized. The website also offers its own information via an RSS feed.

This section describes how the XML files are built up.

\subsection{Database XML}

All the information for the website is stored in four database XML files. One file contains the main information about all the liquor. Two files contain grouping information for the liquor. Finally, one file contains technical information for the incoming RSS feeds. This section discusses the four files. For each file we give a description, (a part of) the file itself and an XML tree. We start with the two grouping files. \\
 
\textbf{Note}: The XML snippets show the actual data (or a part of it) that the files contained at the time this report was written. Because of the dynamic nature of the website, some files may be extended with more information. \\

\subsubsection{LiquorTypes.xml}

This file stores the different kinds of liquor for which the database contains data. Every type has its own ID, to which the items refer. The names of the liquor types are also used by the search engine of the site. The XML file with the liquor types is shown below: \\

\begin{verbatim}
<LiquorTypes>
  <LiquorType ID="1"><Name>General</Name></LiquorType>
  <LiquorType ID="2"><Name>Beer</Name></LiquorType>
  <LiquorType ID="3"><Name>Wine</Name></LiquorType>
  <LiquorType ID="4"><Name>Cocktail</Name></LiquorType>
</LiquorTypes>
\end{verbatim}

The figure below shows the data tree of LiquorTypes.xml. Of each kind of element, only one is shown completely. Further elements of the same kind as their siblings are shown dotted (without their descendants).

\begin {center}
  \includegraphics[width=53.7mm]{LiquorType.png} \\
  Fig 2. Data tree of LiquorTypes.xml
\end {center}
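
As an illustration of how the IDs in this file can be resolved to names, the following PHP sketch builds a lookup table with SimpleXML. The function name \textit{loadLiquorTypeNames} is hypothetical and not part of the actual site code. The XML is embedded as a string to keep the example self-contained; the site itself would read LiquorTypes.xml from disk.

\begin{verbatim}
<?php
// Hypothetical helper: maps every LiquorType ID to its name.
function loadLiquorTypeNames(string $xml): array
{
    $names = [];
    $doc = simplexml_load_string($xml);
    foreach ($doc->LiquorType as $type) {
        // Attributes are read with array syntax, children as properties.
        $names[(int) $type['ID']] = (string) $type->Name;
    }
    return $names;
}

$xml = <<<XML
<LiquorTypes>
  <LiquorType ID="1"><Name>General</Name></LiquorType>
  <LiquorType ID="2"><Name>Beer</Name></LiquorType>
  <LiquorType ID="3"><Name>Wine</Name></LiquorType>
  <LiquorType ID="4"><Name>Cocktail</Name></LiquorType>
</LiquorTypes>
XML;

$names = loadLiquorTypeNames($xml);
echo $names[3], "\n"; // Wine
\end{verbatim}

The same approach would work for NewsCategories.xml, since both files share the same structure.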

\subsubsection{NewsCategories.xml}

This file stores the different categories to which the items belong. It is very similar to LiquorTypes.xml. Every category has its own ID, to which the items refer. The names of the news categories are also used by the search engine of the site. A snippet of the file is shown below: \\

\begin{verbatim}
<NewsCategories>
  <NewsCategory ID="1"><Name>News</Name></NewsCategory>
  <NewsCategory ID="2"><Name>Fun</Name></NewsCategory>
  <NewsCategory ID="3"><Name>Recipe</Name></NewsCategory>
</NewsCategories>
\end{verbatim}

The figure below shows the data tree of NewsCategories.xml. Of each kind of element, only one is shown completely. Further elements of the same kind as their siblings are shown dotted (without their descendants).

\begin {center}
  \includegraphics[width=138.1mm]{NewsCategory.png} \\
  Fig 3. Data tree of NewsCategories.xml
\end {center}

\subsubsection{DB.xml}

This file stores the main information of the website. A snippet of the file is shown below.

\begin{verbatim}
<db>
  <items>
    <item ID="1">
      <LiquorTypeID>3</LiquorTypeID>
      <NewsCategoryID>1</NewsCategoryID>
      <pubDate>2008/05/15</pubDate>
      <guid>http://www.wijnspecialist.be/site/wijnnieuws/op
            -restaurant-met-je-eigen-wijn.htm
      </guid>
      <title>Op restaurant met je eigen wijn?</title>
      <intern>index.php?link=news&amp;title=Op_restaurant_met_je_eigen_wijn?
      </intern>
      <description>BYO (Bring your own) ontstond in de Engelstalige landen...
      </description>
    </item>

    ...

  </items>
</db>
\end{verbatim}

The site contains many news messages, each stored in an ``item'' element, a child of the ``items'' element. An item element contains the following elements:

\begin{itemize}
  \item \underline{LiquorTypeID}: The ID of the LiquorType of the news message. A news message can contain one or more LiquorTypeID elements.
  \item \underline{NewsCategoryID}: The ID of the NewsCategory of the news message. A news message can contain one or more NewsCategoryID elements.
  \item \underline{pubDate}: The date on which the news message was written. (If the incoming RSS feed does not contain a date, the pubDate is the date on which the news message was added to the site.)
  \item \underline{guid}: The link to the original news message on the original site.
  \item \underline{title}: The title of the news message.
  \item \underline{intern}: The link to the news message on our own site. If this link is followed, the news message is shown on the site.
  \item \underline{description}: The news message itself.
\end{itemize}

The figure below shows the data tree of DB.xml. Of each kind of element, only one is shown completely. Further elements of the same kind as their siblings are shown dotted (without their descendants).

\begin {center}
  \includegraphics[width=160.0mm]{DB.png} \\
  Fig 4. Data tree of DB.xml
\end {center}
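
Because an item can carry several LiquorTypeID elements, selecting the items of one liquor type is a natural fit for XPath. The sketch below shows one possible way to do this with PHP's DOMXPath class. The function name \textit{titlesForLiquorType} is hypothetical, and the second item in the sample data is invented purely to illustrate an item with two liquor types.

\begin{verbatim}
<?php
// Hypothetical helper: titles of all items bound to one liquor type.
function titlesForLiquorType(DOMDocument $dom, int $liquorTypeId): array
{
    $xpath = new DOMXPath($dom);
    // The predicate is true if ANY LiquorTypeID child equals the ID.
    $query = sprintf(
        '/db/items/item[LiquorTypeID=%d]/title',
        $liquorTypeId
    );
    $titles = [];
    foreach ($xpath->query($query) as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<db><items>'
    . '<item ID="1"><LiquorTypeID>3</LiquorTypeID>'
    . '<title>Op restaurant met je eigen wijn?</title></item>'
    // Invented sample item with two liquor types.
    . '<item ID="2"><LiquorTypeID>2</LiquorTypeID>'
    . '<LiquorTypeID>3</LiquorTypeID>'
    . '<title>Sample item with two liquor types</title></item>'
    . '</items></db>'
);

print_r(titlesForLiquorType($dom, 3)); // both titles
print_r(titlesForLiquorType($dom, 2)); // only the sample item
\end{verbatim}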

\subsubsection{rssfeeds.xml}

This file stores the binding information for the incoming RSS feeds, so that the website knows from which RSS feed the information must be taken and how it must be stored. A part of the file is shown below.

\begin{verbatim}
<IncomingRSSFeedBindings>
  <RSSFeed ID="1">
    <Name>Goedkoopbier - Drankspellen</Name>
    <Url>http://www.goedkoopbier.nl/rss/drankspellen.xml</Url>
    <LiquorTypeID>2</LiquorTypeID>
    <NewsCategoryID>2</NewsCategoryID>
  </RSSFeed>

  ...

</IncomingRSSFeedBindings>

\end{verbatim}

The following elements are part of the RSSFeed element:

\begin{itemize}
  \item \underline{Name}: This is the name of the RSS feed. It is used to show the RSS feed bindings on the site.
  \item \underline{Url}: This is the URL of the RSS feed. The website gathers its information from this URL.
  \item \underline{LiquorTypeID}: This is the ID for the liquor type, which will be bound to all the incoming messages from the RSS feed.
  \item \underline{NewsCategoryID}: This is the ID for the News Category, which will be bound to all the incoming messages from the RSS feed.
\end{itemize}

The figure below shows the data tree of rssfeeds.xml. Of each kind of element, only one is shown completely. Further elements of the same kind as their siblings are shown dotted (without their descendants).

\begin {center}
  \includegraphics[width=117.2mm]{rssfeeds.png} \\
  Fig 5. Data tree of rssfeeds.xml
\end {center}
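
To make the role of the bindings more concrete, the following sketch reads the RSSFeed elements into plain PHP arrays, so that an import script could loop over them. It is only an illustration: the function name \textit{loadFeedBindings} and the array keys are assumptions, not the names used by the actual site.

\begin{verbatim}
<?php
// Hypothetical helper: turns every RSSFeed element into an array.
function loadFeedBindings(string $xml): array
{
    $bindings = [];
    foreach (simplexml_load_string($xml)->RSSFeed as $feed) {
        $bindings[] = [
            'name'           => (string) $feed->Name,
            'url'            => (string) $feed->Url,
            'liquorTypeId'   => (int) $feed->LiquorTypeID,
            'newsCategoryId' => (int) $feed->NewsCategoryID,
        ];
    }
    return $bindings;
}

$xml = <<<XML
<IncomingRSSFeedBindings>
  <RSSFeed ID="1">
    <Name>Goedkoopbier - Drankspellen</Name>
    <Url>http://www.goedkoopbier.nl/rss/drankspellen.xml</Url>
    <LiquorTypeID>2</LiquorTypeID>
    <NewsCategoryID>2</NewsCategoryID>
  </RSSFeed>
</IncomingRSSFeedBindings>
XML;

$bindings = loadFeedBindings($xml);
foreach ($bindings as $binding) {
    echo $binding['name'], ' -> ', $binding['url'], "\n";
}
\end{verbatim}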

\subsection{RSS-Feeds}

The site uses RSS feeds both to gather and to deliver information. RSS is a standardized XML format. We first describe the types of RSS feeds the website uses and then what an RSS feed looks like. \\

The website uses two types of RSS feeds: incoming and outgoing.

\subsubsection{Incoming RSS-feed}

This is the kind of RSS feed from which the website takes its information. There are several incoming RSS feeds, and the website follows the same procedure for each of them to add the information to the database. A PHP script does this six times a day. The following steps are carried out for every RSS feed known to Pottepei:

\begin{enumerate}
  \item Take an RSS feed and read the information from it.
  \item For every item, check whether it is already in the database. This is done by a title check; more information about this can be found in chapter 4. If the item is already in the database, continue with step 4.
  \item Add the item to DB.xml and go on with the next item of the RSS feed.
  \item The news item is already there, so check whether its LiquorTypeID and NewsCategoryID are the same as those bound to the feed. If they are, go on with the next item of the RSS feed.
  \item The news item is already there but has a different LiquorTypeID or NewsCategoryID, so add the new LiquorTypeID or NewsCategoryID to the item and go on with the next item of the RSS feed.
\end{enumerate}
\end{enumerate}
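
The steps above can be sketched in PHP as follows. This is a simplified illustration under stated assumptions, not the actual import script: all function names are hypothetical, the title check is a plain string comparison (the real check is described in chapter 4), and the ID attribute and the intern element of a new item are left out.

\begin{verbatim}
<?php
// Step 2: look for an existing item with the same title.
function findItemByTitle(DOMDocument $db, string $title): ?DOMElement
{
    foreach ($db->getElementsByTagName('item') as $item) {
        $titles = $item->getElementsByTagName('title');
        if ($titles->length > 0
            && $titles->item(0)->textContent === $title) {
            return $item;
        }
    }
    return null;
}

// Steps 4 and 5: add <$tag>$id</$tag> unless it is already present.
function addIdIfMissing(
    DOMDocument $db, DOMElement $item, string $tag, int $id
): void {
    foreach ($item->getElementsByTagName($tag) as $existing) {
        if ((int) $existing->textContent === $id) {
            return; // step 4: same ID, nothing to do
        }
    }
    $item->appendChild($db->createElement($tag, (string) $id));
}

function importItem(
    DOMDocument $db, array $feedItem,
    int $liquorTypeId, int $newsCategoryId
): void {
    $item = findItemByTitle($db, $feedItem['title']);
    $isNew = ($item === null);
    if ($isNew) {
        // Step 3: the item is new, so append it to <items>.
        $item = $db->createElement('item');
        $db->getElementsByTagName('items')->item(0)
           ->appendChild($item);
    }
    addIdIfMissing($db, $item, 'LiquorTypeID', $liquorTypeId);
    addIdIfMissing($db, $item, 'NewsCategoryID', $newsCategoryId);
    if ($isNew) {
        foreach (['pubDate', 'guid', 'title', 'description'] as $tag) {
            $child = $db->createElement($tag);
            // A text node escapes characters such as & correctly.
            $child->appendChild($db->createTextNode($feedItem[$tag]));
            $item->appendChild($child);
        }
    }
}

$db = new DOMDocument();
$db->loadXML('<db><items/></db>');
$feedItem = [
    'pubDate'     => '2008/05/15',
    'guid'        => 'http://www.goedkoopbier.nl/nieuws/Heineken_Trom_Pet',
    'title'       => 'Heineken Trom-Pet',
    'description' => 'Heineken komt voor aankomend EK weer met een '
                   . 'hoofddeksel op de proppen: de Trom-Pet.',
];
importItem($db, $feedItem, 2, 1); // new item (step 3)
importItem($db, $feedItem, 2, 2); // same title, new category (step 5)
$db->formatOutput = true;
echo $db->saveXML();
\end{verbatim}

After both calls the database contains a single item with one LiquorTypeID and two NewsCategoryID elements. In this sketch an ID that is added later ends up after the description element; a real implementation may want to keep the element order of DB.xml.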

\subsubsection{Outgoing RSS-feed}

The website also has an outgoing RSS feed, which contains the latest twenty news messages from the database. These messages are retrieved from DB.xml via an XPath query and written to the RSS feed with PHP DOM statements. A part of these statements is shown below.

\begin{verbatim}
  // create the XML document (version 1.0)
  $dom = new DOMDocument("1.0");

  // create root element
  $root = $dom->createElement("rss");
  $dom->appendChild($root);
  $root->setAttribute("version", "2.0");
  $dom->formatOutput = true;

  // create channel element
  $channel = $dom->createElement("channel");
  $root->appendChild($channel);

  // create title element
  $title = $dom->createElement("title");
  $channel->appendChild($title);
  $text = $dom->createTextNode("Pottepei RSS feed");
  $title->appendChild($text);
\end{verbatim}

This part of the code first creates an XML document with version \textit{"1.0"} and then creates the root element, in this case called \textit{"rss"}. The root also gets an attribute named \textit{"version"} with the value \textit{"2.0"}. Setting \textit{"\$dom-$>$formatOutput"} to true indents the output, which makes the file easier to read. \\

After that the code creates a channel element and adds it as a child of the root element. Then a title element is created and added to the channel element as a child. This title element contains the text \textit{"Pottepei RSS feed"}. The whole RSS feed is built in this way. Below is the part of the XML file that this part of the code generates.

\begin{verbatim}
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Pottepei RSS feed</title>
\end{verbatim}

This code will be automatically executed every day.
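
The query that selects the latest twenty news messages is not shown in this report. Since XPath 1.0 itself cannot sort, one possible approach is to select all items with XPath and to sort them by pubDate in PHP. The sketch below illustrates this; the function name \textit{latestItems} is hypothetical, and it assumes that every pubDate has the form YYYY/MM/DD used in DB.xml, so that dates can be compared as plain strings.

\begin{verbatim}
<?php
// Hypothetical helper: returns the $limit newest <item> elements.
function latestItems(DOMDocument $db, int $limit = 20): array
{
    $xpath = new DOMXPath($db);
    $items = iterator_to_array($xpath->query('/db/items/item'), false);
    usort($items, function ($a, $b) use ($xpath) {
        $dateA = $xpath->evaluate('string(pubDate)', $a);
        $dateB = $xpath->evaluate('string(pubDate)', $b);
        return strcmp($dateB, $dateA); // newest first
    });
    return array_slice($items, 0, $limit);
}

// Minimal sample data; only the fields needed here are present.
$db = new DOMDocument();
$db->loadXML(
    '<db><items>'
    . '<item ID="1"><pubDate>2008/05/15</pubDate></item>'
    . '<item ID="2"><pubDate>2008/05/17</pubDate></item>'
    . '<item ID="3"><pubDate>2008/05/16</pubDate></item>'
    . '</items></db>'
);

foreach (latestItems($db, 2) as $item) {
    echo $item->getAttribute('ID'), "\n"; // prints 2, then 3
}
\end{verbatim}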

\subsubsection{The construction of an RSS feed}

An RSS feed has a standard structure. We use RSS version 2.0. An RSS feed can store a lot of information, of which we use only a part. A part of our RSS feed file is shown below:

\begin{verbatim}
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Pottepei RSS feed</title>
    <link>http://pottepei.schinagl.nl</link>
    <description>Het laatste nieuws van Pottepei.</description>
    <language>nl</language>
    <pubDate>2008/05/15</pubDate>

    <item>
      <pubDate>2008/05/15</pubDate>
      <guid>http://www.goedkoopbier.nl/nieuws/Heineken_Trom_Pet</guid>
      <title>Heineken Trom-Pet</title>
      <description>Heineken komt voor aankomend EK weer met een
                   hoofddeksel op de proppen: de Trom-Pet.
      </description>
    </item>

    ...

  </channel>
</rss>

\end{verbatim}

This file contains the following elements:

\begin{itemize}
  \item \underline{rss}: The standard root element, which indicates that the XML file is an RSS feed. The attribute \underline{version} states which RSS version the document uses.
  \item \underline{channel}: The standard RSS element that contains the information about the feed and its items.
  \item \underline{title}: The title of the RSS feed.
  \item \underline{link}: The link to the website.
  \item \underline{description}: A description of the RSS feed.
  \item \underline{language}: The language of the RSS feed.
  \item \underline{pubDate}: The date on which the RSS feed was published.
  \item \underline{item}: Contains the information of one news message:
    \begin{itemize}
      \item \underline{pubDate}: The date of the news message.
      \item \underline{guid}: The link where the news message can be found.
      \item \underline{title}: The title of the news message.
      \item \underline{description}: The news message itself.
    \end{itemize}
\end{itemize}

These are also the elements we use from the incoming RSS feeds.\\
 
\textbf{Note}: Of the initial RSS feeds, the feed from www.cocktailz.nl does not provide a pubDate. In this special case we use the day on which the website imports the news message as its pubDate.
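
The fallback described in this note could be sketched as follows. This is a minimal illustration, not the site's actual code: the function name \textit{resolvePubDate} is hypothetical, and the item without a date is invented sample data.

\begin{verbatim}
<?php
// Hypothetical helper: the feed's pubDate, or the import day
// if the feed item does not provide one.
function resolvePubDate(
    SimpleXMLElement $feedItem, string $importDay
): string {
    // A missing child element casts to an empty string.
    $pubDate = trim((string) $feedItem->pubDate);
    return $pubDate !== '' ? $pubDate : $importDay;
}

$withDate = simplexml_load_string(
    '<item><pubDate>2008/05/15</pubDate>'
    . '<title>Heineken Trom-Pet</title></item>'
);
// Invented sample item without a pubDate.
$withoutDate = simplexml_load_string(
    '<item><title>Sample item without a date</title></item>'
);

$today = date('Y/m/d'); // same YYYY/MM/DD format as in DB.xml
echo resolvePubDate($withDate, $today), "\n";    // 2008/05/15
echo resolvePubDate($withoutDate, $today), "\n"; // the import day
\end{verbatim}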

The figure below shows the data tree of the RSS feed XML. Of each kind of element, only one is shown completely. Further elements of the same kind as their siblings are shown dotted (without their descendants).

\begin {center}
  \includegraphics[width=\textwidth]{RSS.png} \\
  Fig 6. Data tree of rss.xml
\end {center}