Thursday, September 17, 2015

HTML Tag Checker (KICTM UiTM Jasin 2015)

Markup languages such as HTML use tags to highlight sections with special significance. In this way, a sentence in boldface can be indicated thus:

    <B>This is a sentence in boldface</B>

Typically every tag has an opening tag of the form <TAG> and a closing tag of the form </TAG>, so that portions of text can be bracketed as above. Tags can then be combined to achieve more than one effect on a particular piece of text simply by nesting them properly, for instance:

    <CENTER><B>This text is centred and in boldface</B></CENTER>

Two of the most common mistakes when tagging text are:

getting the nesting wrong:
    <B><CENTER>This should be centred boldface, but the tags are wrongly nested</B></CENTER>
forgetting a tag:
    <B><CENTER>This should be centred boldface, but there is a missing tag</CENTER>
Write a program to check that all the tags in a given piece of text (a paragraph) are correctly nested, and that there are no missing or extra tags. An opening tag for this problem is enclosed by angle brackets and contains exactly one upper case letter, for example <T>, <X>, <S>. The corresponding closing tag is the same letter preceded by the symbol "/". For the examples above these would be </T>, </X>, </S>.
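
The tag definition above can be sketched as a small scanner. This is only an illustration, not the solution posted further down: the class name `TagScanner`, the method `scan`, and the use of a regular expression are assumptions made for the example. It treats anything that is not exactly `<X>` or `</X>` with a single upper case letter as ordinary text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original solution.
class TagScanner {
    // Matches "<X>" or "</X>" where X is exactly one upper case letter.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Z])>");

    // Returns the tags in order of appearance: "B" for <B>, "/B" for </B>.
    static List<String> scan(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1) + m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Second sample paragraph: <\g>, <<*>, <\6> and <<d> are not tags.
        String sample = "<B>This <\\g>is <B>boldface</B> in <<*> a</B> <\\6> <<d>sentence#";
        System.out.println(scan(sample)); // prints [B, B, /B, /B]
    }
}
```

Note that lower case letters are deliberately not matched, so `<d>` stays plain text.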

Input 

The input will consist of any number of paragraphs. Each paragraph will consist of a sequence of tagged sentences, over as many lines as necessary, and terminating with a # which will not occur elsewhere in the text. The input will never break a tag between two lines and no line will be longer than 80 characters. The input will be terminated by an empty paragraph, i.e. a line containing only a single #.
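
One way to read this input format is sketched below. The class name `ParagraphReader` and the method `readParagraphs` are hypothetical, and the sketch assumes that anything following a `#` on the same line can be ignored, which the problem statement implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper for illustration; not part of the original solution.
class ParagraphReader {
    // Collects paragraphs (without their terminating '#') until the empty
    // paragraph, i.e. a line containing only "#", is reached.
    static List<String> readParagraphs(Scanner in) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            // A lone "#" with no pending text is the end-of-input marker.
            if (current.length() == 0 && line.equals("#")) {
                break;
            }
            int hash = line.indexOf('#');
            if (hash < 0) {
                // Paragraph continues on the next line.
                current.append(line).append('\n');
            } else {
                // Assumption: text after '#' on the same line is ignored.
                current.append(line, 0, hash);
                paragraphs.add(current.toString());
                current.setLength(0);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("<B>two\nlines</B>#\n<C>one line#\n#\n");
        System.out.println(readParagraphs(in).size()); // prints 2
    }
}
```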

Output 

If the paragraph is correctly tagged then output the line "Correctly tagged paragraph", otherwise output a line of the form "Expected <expected> found <unexpected>" where <expected> is the closing tag matching the most recent unmatched tag and <unexpected> is the closing tag encountered. If either of these is the end of paragraph, i.e. there is either an unmatched opening tag or no matching closing tag at the end of the paragraph, then replace the tag or closing tag with #. These points are illustrated in the examples below, which should be followed exactly as far as spacing is concerned.
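
The output rules map naturally onto a stack of open tags. The following is a minimal sketch under that assumption, not the solution posted below: `TagChecker` and `check` are hypothetical names, and the regular expression is one possible way to recognise tags. Checking stops at the first error, since only one line is printed per paragraph.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original solution.
class TagChecker {
    // Matches "<X>" or "</X>" where X is exactly one upper case letter.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Z])>");

    // Returns the output line for one paragraph (the trailing '#' is optional).
    static String check(String paragraph) {
        Deque<Character> open = new ArrayDeque<>(); // most recent open tag on top
        Matcher m = TAG.matcher(paragraph);
        while (m.find()) {
            char letter = m.group(2).charAt(0);
            if (m.group(1).isEmpty()) {
                open.push(letter);                  // opening tag
            } else if (open.isEmpty()) {
                // Closing tag with nothing open: only '#' was acceptable here.
                return "Expected # found </" + letter + ">";
            } else if (open.peek() != letter) {
                // Closing tag does not match the most recent unmatched tag.
                return "Expected </" + open.peek() + "> found </" + letter + ">";
            } else {
                open.pop();                         // properly matched pair
            }
        }
        // End of paragraph: any tag still open is missing its closing tag.
        return open.isEmpty()
                ? "Correctly tagged paragraph"
                : "Expected </" + open.peek() + "> found #";
    }

    public static void main(String[] args) {
        System.out.println(check("<B><C>wrongly nested</B></C>#")); // prints Expected </C> found </B>
    }
}
```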

Sample Input 

The following text<C><B>is centred and in boldface</B></C>#
<B>This <\g>is <B>boldface</B> in <<*> a</B> <\6> <<d>sentence#
<B><C> This should be centred and in boldface, but the tags are wrongly nested </B></C>#
<B>This should be in boldface, but there is an extra closingtag</B></C>#
<B><C>This should be centred and in boldface, but there is a missing closing tag</C>#
#

Sample Output 

Correctly tagged paragraph
Correctly tagged paragraph
Expected </C> found </B>
Expected # found </C>
Expected </B> found #


import java.util.*;

@SuppressWarnings("unchecked")

public class Q3 {
    public static void main(String[] args) throws Exception {
        Scanner scan = new Scanner(System.in);
        // Split on any line terminator (\R, Java 8+) so the input is read the
        // same way regardless of the platform's own line separator.
        scan.useDelimiter("\\R");

        String nextTag, lastTag;
        Vector stackTag;
        Vector stackList;

        String in = scan.next();
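        // The input ends with an empty paragraph: a line starting with '#'.
        // The isEmpty() guard keeps a blank input line from crashing charAt(0).
        while (in.isEmpty() || in.charAt(0) != '#') {
            // A paragraph may span several lines; join them up to the closing '#'.
            while (!in.endsWith("#"))
                in = in + scan.next();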
            
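            // Tags are case-sensitive: only <A>..<Z> and </A>..</Z> count, so the
            // text must not be upper-cased (that would turn <d> into a tag and
            // break the second sample paragraph).
            stackList = getTags(in);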

            stackTag = new Vector();
            while (!stackList.isEmpty()) {
                nextTag = (String) stackList.remove(0);

                // Final tag
                if (nextTag.equals("#")) {
                    break;
                }

                // Open tag
                if (nextTag.length() == 1) {
                    stackTag.add(nextTag);
                    continue;
                }

                // Close tag
                if (stackTag.isEmpty()) {
                    stackList.add(0, nextTag);
                    break;
                }

                // Validating open close is match or not?
                lastTag = (String) stackTag.remove(stackTag.size() - 1);
                if (lastTag.charAt(0) != nextTag.charAt(1)) {
                    stackTag.add(lastTag);
                    stackList.add(0, nextTag);
                    break;
                }
            }

            if (stackTag.isEmpty() && stackList.isEmpty()) {
                System.out.println("Correctly tagged paragraph");
            } else if (stackTag.isEmpty()) {
                System.out.println("Expected # found <" + stackList.get(0) + ">");
            } else if (stackList.isEmpty()) {
                System.out.println("Expected </" + stackTag.get(stackTag.size() - 1) + "> found #");
            } else {
                System.out.println("Expected </" + stackTag.get(stackTag.size() - 1) + "> found <" + stackList.get(0) + ">");
            }

            in = scan.next();
         }
    }
    public static Vector getTags(String str) {
        int x;
        Vector tags = new Vector();
        for (x = 0; x < str.length(); x++) {
            // End-of-paragraph
            if (str.charAt(x) == '#') {
                tags.add("#");
            }

            // Open-tag <A>
            if ((str.charAt(x) == '<') && (x + 2 < str.length()) && (str.charAt(x + 1) >= 'A') && (str.charAt(x + 1) <= 'Z') && (str.charAt(x + 2) == '>')) {
                tags.add("" + str.charAt(x + 1));
            }

            // Close-tag </A>
            if ((str.charAt(x) == '<') && (x + 3 < str.length()) && (str.charAt(x + 1) == '/') && (str.charAt(x + 2) >= 'A') && (str.charAt(x + 2) <= 'Z') && (str.charAt(x + 3) == '>')) {
                tags.add("/" + str.charAt(x + 2));
            }
        }
        return tags;
    }
}
