As specified by the W3C, a well-formed XML document should begin with a prolog.
In its simplest form, the prolog should contain at least an XML declaration that specifies the version of XML being used.
<?xml version="1.0" ?>
An XML declaration uses "<?xml" and "?>" as its opening and closing delimiters. Inside the delimiters can appear the version, an optional encoding, and an optional standalone setting of yes or no. Yes specifies that this document exists entirely on its own, without depending on other files. No indicates that the document depends on other documents, for example an external Document Type Definition (DTD). A full XML declaration at the start of an XML file might thus appear as
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
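As a minimal sketch of how a parser sees the declaration, the following Python uses the standard library's expat bindings to report the three values. The function name read_declaration and the sample document are assumptions made for illustration only.

```python
import xml.parsers.expat

def read_declaration(xml_bytes):
    """Return the version, encoding, and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no), or -1 (omitted)
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found

# Hypothetical one-element document used only to exercise the function
sample = b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><FAQs/>'
print(read_declaration(sample))
```

If the document has no XML declaration at all, the handler is never called and the returned dictionary is empty.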
In addition to the XML declaration, the prolog section may also contain XML comments, processing instructions, and a document type declaration (DOCTYPE). Here is an example of a more complete prolog.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!--
  **********************************
  faq.xml
  Ed Gellenbeck
  October 4, 2002
  **********************************
-->
<?xml-stylesheet href="faq.css" type="text/css" ?>
<!DOCTYPE FAQs SYSTEM "faq.dtd" >
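To see the separate parts of a prolog, a parser's event handlers can be used. The sketch below, again based on Python's standard expat bindings, collects the comment, the processing instruction, and the DOCTYPE. The function name list_prolog_parts and the shortened sample are assumptions for illustration; the handlers would also fire for comments and processing instructions that appear after the prolog.

```python
import xml.parsers.expat

def list_prolog_parts(xml_bytes):
    """Collect comments, processing instructions, and the DOCTYPE reported by expat."""
    parts = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = lambda data: parts.append(("comment", data.strip()))
    parser.ProcessingInstructionHandler = (
        lambda target, data: parts.append(("pi", target, data.strip())))
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset:
            parts.append(("doctype", name, system_id)))
    parser.Parse(xml_bytes, True)
    return parts

# Shortened version of the prolog, followed by an empty root element
prolog = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- faq.xml -->
<?xml-stylesheet href="faq.css" type="text/css" ?>
<!DOCTYPE FAQs SYSTEM "faq.dtd" >
<FAQs/>"""

for part in list_prolog_parts(prolog):
    print(part)
```

Note that expat does not fetch faq.dtd here; it only reports the name and system identifier given in the DOCTYPE.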
XML documents are trees and are required to have exactly one root element. The root element encapsulates the remaining XML elements and data in the document. Choose a root name that describes your XML data as this name is used for the document type declaration.
XML names should begin with a letter or an underscore, followed by letters, digits, hyphens, periods, or underscores. XML names cannot contain whitespace, nor can they begin with the letters xml (in any combination of upper and lower case), which are reserved. XML names are case-sensitive.
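The naming rules can be sketched as a small check. This is a simplified, ASCII-only approximation and not the full XML 1.0 name grammar (which also permits many non-ASCII letters, and allows hyphens and periods after the first character); the function name is_simple_xml_name is an assumption for illustration.

```python
import re

# Simplified pattern: a letter or underscore, then letters, digits, '_', '-', or '.'
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_xml_name(name):
    """Return True if name passes the simplified ASCII-only naming rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved
    return not name.lower().startswith("xml")

for candidate in ["Page", "_id", "2Page", "my name", "xmlData"]:
    print(candidate, is_simple_xml_name(candidate))
```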
To be considered well-formed, an XML document needs to follow some basic rules.
Start tags and end tags must match, are case-sensitive, and every start tag must have a corresponding end tag. XML has a shortened version for closing empty elements (i.e. elements that have no data content). The following two lines are equivalent (and the second one is preferred).
<Picture file="photo.jpg"></Picture>
<Picture file="photo.jpg" />
In the above example, photo.jpg is called an attribute value. XML requires that attribute values be enclosed in quotation marks. Both single and double quotation marks are allowed, as long as the opening and closing marks around a value match.
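A quick way to confirm that the two Picture forms are equivalent, and that either style of quotation mark works, is to parse each one and compare the results. This sketch uses Python's standard xml.etree.ElementTree module; the helper name describe is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def describe(xml_text):
    """Return the tag, attributes, and number of children of a parsed element."""
    element = ET.fromstring(xml_text)
    return element.tag, dict(element.attrib), len(element)

long_form = describe('<Picture file="photo.jpg"></Picture>')
short_form = describe('<Picture file="photo.jpg" />')
single_quotes = describe("<Picture file='photo.jpg' />")

print(long_form)
print(long_form == short_form == single_quotes)
```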
Elements must be strictly nested; elements cannot overlap. If you start element A, then start element B, then element B must be closed before you close element A. The following example shows properly nested elements in a well-formed XML document.
<?xml version="1.0" ?>
<!-- File: faq.xml -->
<FAQs>
  <GeneralQuestions>
    <Page number="1">
      <Question>What is XML?</Question>
      <Answer>XML is the Extensible Markup Language. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification.</Answer>
    </Page>
    <Page number="2">
      <Question>What is XML for?</Question>
      <Answer>XML is intended to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web.</Answer>
    </Page>
  </GeneralQuestions>
</FAQs>
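The well-formedness rules above can be tried out with any XML parser, since a conforming parser rejects a document that breaks them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name is_well_formed and the short test strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<A><B></B></A>"))     # properly nested
print(is_well_formed("<A><B></A></B>"))     # overlapping elements
print(is_well_formed("<Page></page>"))      # start and end tags differ in case
print(is_well_formed("<Page number=1 />"))  # attribute value without quotation marks
```

Only the first string is accepted; each of the others breaks one of the rules described above.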
A standard cascading style sheet can be used to apply formatting to your XML elements. Each element needs its display property set to block, inline, list-item, or none. Block places a line break before and after the element, inline places no line break before or after the element, list-item is the same as block except that a list-item marker is added, and none means the element is not displayed. For example, the following cascading style sheet could be used to format the FAQ XML document.
/* File: faq.css */
Page {
  display : block;
  border-bottom-color : blue;
  border-bottom-style : solid;
  margin-top : 1em;
  margin-bottom : 1em;
}
Question {
  display : block;
  font-style : italic;
  font-weight : bolder;
}
Answer {
  display : block;
  margin-left : 5%;
  margin-right : 30%;
  margin-bottom : 1em;
}
An XML document is well-formed when it is structured according to the rules defined in Section 2.1 of the XML 1.0 Recommendation. Basically, the rules state that there is exactly one root element, that elements are delimited by their start and end tags and are nested properly within one another, and that attribute values are enclosed in quotation marks.
Validation takes this process one step further. An XML document is valid if it has an associated Document Type Declaration (DOCTYPE) and if the document complies with the constraints expressed by the Document Type Definition (DTD) (or XML Schema). A valid document is, by definition, also a well-formed XML document.
The original W3C XML specification included a description of the Document Type Definition (DTD) based on the parent language SGML. DTDs are used to describe the elements and attributes allowed in the XML document and their relationships. At that time, the term "valid" was defined to mean conformance to a certain DTD.
It soon became apparent that XML required a more flexible way to describe the content and corresponding data types. In 1999 the W3C started work on the new XML Schema definition language that is now available as a final Recommendation dated May 2nd, 2001.
A Document Type Definition to describe the faq.xml document could look like the following:
<!-- file: faq.dtd -->
<!ELEMENT FAQs ( GeneralQuestions, SpecificQuestions? ) >
<!ELEMENT GeneralQuestions ( Page+ ) >
<!ELEMENT Page ( Question, Answer ) >
<!ATTLIST Page number CDATA #REQUIRED >
<!ELEMENT Question ( #PCDATA ) >
<!ELEMENT Answer ( #PCDATA ) >
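To make the meaning of these declarations concrete, the sketch below re-states the main constraints of faq.dtd as hand-written Python checks. It is not a DTD validator (Python's standard library does not include one, and this code never reads faq.dtd); it only mirrors the element order and the required number attribute, and it does not verify that Question and Answer contain text only. The function name check_faq is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def check_faq(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "FAQs":
        problems.append("root element must be FAQs")

    # <!ELEMENT FAQs ( GeneralQuestions, SpecificQuestions? ) >
    child_tags = [child.tag for child in root]
    if child_tags not in (["GeneralQuestions"],
                          ["GeneralQuestions", "SpecificQuestions"]):
        problems.append("FAQs must contain GeneralQuestions, then optional SpecificQuestions")

    for general in root.findall("GeneralQuestions"):
        # <!ELEMENT GeneralQuestions ( Page+ ) >
        pages = list(general)
        if not pages or any(page.tag != "Page" for page in pages):
            problems.append("GeneralQuestions must contain one or more Page elements")
        for page in general.findall("Page"):
            # <!ELEMENT Page ( Question, Answer ) >
            if [child.tag for child in page] != ["Question", "Answer"]:
                problems.append("Page must contain Question followed by Answer")
            # <!ATTLIST Page number CDATA #REQUIRED >
            if "number" not in page.attrib:
                problems.append("Page is missing the required number attribute")
    return problems

good = ('<FAQs><GeneralQuestions><Page number="1">'
        '<Question>What is XML?</Question><Answer>A markup language.</Answer>'
        '</Page></GeneralQuestions></FAQs>')
bad = ('<FAQs><GeneralQuestions><Page>'
       '<Question>What is XML?</Question>'
       '</Page></GeneralQuestions></FAQs>')

print(check_faq(good))
print(check_faq(bad))
```

The second document is well-formed but would not be valid against the DTD: its Page has no Answer and no number attribute.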
Instructions for Lab 1's Homework Assignment