Class HTMLParser
- java.lang.Object
-
- org.apache.jmeter.protocol.http.parser.HTMLParser
-
- Direct Known Subclasses:
JsoupBasedHtmlParser
public abstract class HTMLParser extends Object
HtmlParsers can parse HTML content to obtain URLs.
-
-
Field Summary
Fields (Modifier and Type, Field)
protected static String ATT_BACKGROUND
protected static String ATT_CODE
protected static String ATT_CODEBASE
protected static String ATT_DATA
protected static String ATT_HREF
protected static String ATT_IS_IMAGE
protected static String ATT_REL
protected static String ATT_SRC
protected static String ATT_STYLE
protected static String ATT_TYPE
static String DEFAULT_PARSER
protected static String IE_UA
protected static Pattern IE_UA_PATTERN
static String PARSER_CLASSNAME
protected static String STYLESHEET
protected static String TAG_APPLET
protected static String TAG_BASE
protected static String TAG_BGSOUND
protected static String TAG_BODY
protected static String TAG_EMBED
protected static String TAG_FRAME
protected static String TAG_IFRAME
protected static String TAG_IMAGE
protected static String TAG_INPUT
protected static String TAG_LINK
protected static String TAG_OBJECT
protected static String TAG_SCRIPT
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
protected HTMLParser()
Protected constructor to prevent instantiation except from within subclasses.
-
Method Summary
Methods (Modifier and Type, Method, Description)
protected Float extractIEVersion(String userAgent)
Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding)
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding)
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding)
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
static HTMLParser getParser()
static HTMLParser getParser(String htmlParserClassName)
protected boolean isEnableConditionalComments(Float ieVersion)
protected boolean isReusable()
Parsers should override this method if the parser class is reusable, in which case the class will be cached for the next getParser() call.
-
-
-
Field Detail
-
ATT_BACKGROUND
protected static final String ATT_BACKGROUND
- See Also:
- Constant Field Values
-
ATT_CODE
protected static final String ATT_CODE
- See Also:
- Constant Field Values
-
ATT_CODEBASE
protected static final String ATT_CODEBASE
- See Also:
- Constant Field Values
-
ATT_DATA
protected static final String ATT_DATA
- See Also:
- Constant Field Values
-
ATT_HREF
protected static final String ATT_HREF
- See Also:
- Constant Field Values
-
ATT_REL
protected static final String ATT_REL
- See Also:
- Constant Field Values
-
ATT_SRC
protected static final String ATT_SRC
- See Also:
- Constant Field Values
-
ATT_STYLE
protected static final String ATT_STYLE
- See Also:
- Constant Field Values
-
ATT_TYPE
protected static final String ATT_TYPE
- See Also:
- Constant Field Values
-
ATT_IS_IMAGE
protected static final String ATT_IS_IMAGE
- See Also:
- Constant Field Values
-
TAG_APPLET
protected static final String TAG_APPLET
- See Also:
- Constant Field Values
-
TAG_BASE
protected static final String TAG_BASE
- See Also:
- Constant Field Values
-
TAG_BGSOUND
protected static final String TAG_BGSOUND
- See Also:
- Constant Field Values
-
TAG_BODY
protected static final String TAG_BODY
- See Also:
- Constant Field Values
-
TAG_EMBED
protected static final String TAG_EMBED
- See Also:
- Constant Field Values
-
TAG_FRAME
protected static final String TAG_FRAME
- See Also:
- Constant Field Values
-
TAG_IFRAME
protected static final String TAG_IFRAME
- See Also:
- Constant Field Values
-
TAG_IMAGE
protected static final String TAG_IMAGE
- See Also:
- Constant Field Values
-
TAG_INPUT
protected static final String TAG_INPUT
- See Also:
- Constant Field Values
-
TAG_LINK
protected static final String TAG_LINK
- See Also:
- Constant Field Values
-
TAG_OBJECT
protected static final String TAG_OBJECT
- See Also:
- Constant Field Values
-
TAG_SCRIPT
protected static final String TAG_SCRIPT
- See Also:
- Constant Field Values
-
STYLESHEET
protected static final String STYLESHEET
- See Also:
- Constant Field Values
-
IE_UA
protected static final String IE_UA
- See Also:
- Constant Field Values
-
IE_UA_PATTERN
protected static final Pattern IE_UA_PATTERN
-
PARSER_CLASSNAME
public static final String PARSER_CLASSNAME
- See Also:
- Constant Field Values
-
DEFAULT_PARSER
public static final String DEFAULT_PARSER
- See Also:
- Constant Field Values
-
-
Method Detail
-
getParser
public static final HTMLParser getParser()
-
getParser
public static final HTMLParser getParser(String htmlParserClassName)
-
getEmbeddedResourceURLs
public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc. URLs should not appear twice in the returned iterator.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
HTMLParseException - when parsing the html fails
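The contract described above (no URL appears twice in the returned iterator, and malformed links are reported instead of aborting the whole parse) can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not JMeter's implementation: the class EmbeddedUrlDedupSketch, the method resolveUnique and the malformed list are hypothetical names, java.net.URI stands in for URL and URLString, and collecting bad links in a separate list is a simplification of returning them through the Iterator.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not part of JMeter: resolves raw link strings against
// a base URL and returns each resolved address at most once.
class EmbeddedUrlDedupSketch {

    static Iterator<URI> resolveUnique(URI baseUrl, List<String> rawLinks, List<String> malformed) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<URI> unique = new LinkedHashSet<>();
        for (String raw : rawLinks) {
            try {
                // resolve(String) parses the link and resolves it against the base.
                unique.add(baseUrl.resolve(raw.trim()));
            } catch (IllegalArgumentException e) {
                // Syntactically invalid link: report it, keep processing the rest.
                malformed.add(raw);
            }
        }
        return unique.iterator();
    }
}
```

Called with the base http://example.com/dir/page.html and the links "img/a.png", "/style.css", "img/a.png" and "bad link", the sketch yields two resolved addresses and records "bad link" in the malformed list.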
-
getEmbeddedResourceURLs
public abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc. All URLs should be added to the Collection.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
HTMLParseException - when parsing the html fails
-
getEmbeddedResourceURLs
public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - Collection; will contain URLString objects, not URLs
encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
HTMLParseException - when parsing the html fails
-
isReusable
protected boolean isReusable()
Parsers should override this method if the parser class is reusable, in which case the class will be cached for the next getParser() call.
- Returns:
- true if the Parser is reusable
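The caching behaviour described for isReusable() can be pictured roughly as follows. This is a minimal sketch under assumptions, not JMeter's actual lookup code: ParserCacheSketch, Parser, StatelessParser, StatefulParser and the factory parameter are hypothetical names, and the sketch caches instances keyed by class name because the source does not show how the real cache is organised.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in, not part of JMeter: a lookup that only caches
// parsers which declare themselves reusable.
class ParserCacheSketch {

    abstract static class Parser {
        // Default mirrors the documented contract: not reusable unless overridden.
        boolean isReusable() { return false; }
    }

    static class StatelessParser extends Parser {
        @Override boolean isReusable() { return true; }
    }

    static class StatefulParser extends Parser { }

    private static final Map<String, Parser> CACHE = new ConcurrentHashMap<>();

    static Parser getParser(String className, Supplier<Parser> factory) {
        Parser cached = CACHE.get(className);
        if (cached != null) {
            return cached; // a reusable parser was stored by an earlier call
        }
        Parser fresh = factory.get();
        if (fresh.isReusable()) {
            // Keep whichever instance reached the cache first.
            Parser previous = CACHE.putIfAbsent(className, fresh);
            if (previous != null) {
                return previous;
            }
        }
        return fresh; // non-reusable parsers are created anew on every call
    }
}
```

In this sketch, two lookups for a reusable parser return the same object, while two lookups for a non-reusable parser return distinct objects.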
-
isEnableConditionalComments
protected final boolean isEnableConditionalComments(Float ieVersion)
- Parameters:
ieVersion - Float IE version
- Returns:
- true if the IE version is lower than 10
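The version check above, together with extractIEVersion(String userAgent), can be sketched like this. The regular expression is an assumption (the value of IE_UA_PATTERN is not shown on this page), as are the null handling and the names IeVersionSketch and ASSUMED_IE_UA_PATTERN; the sketch shows the idea, not the library's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of JMeter.
class IeVersionSketch {

    // Assumed pattern: "MSIE " followed by a major.minor version number.
    private static final Pattern ASSUMED_IE_UA_PATTERN =
            Pattern.compile("MSIE ([0-9]+\\.[0-9]+)");

    // Returns the IE version found in the User-Agent, or null if there is none.
    static Float extractIEVersion(String userAgent) {
        if (userAgent == null) {
            return null;
        }
        Matcher matcher = ASSUMED_IE_UA_PATTERN.matcher(userAgent);
        return matcher.find() ? Float.valueOf(matcher.group(1)) : null;
    }

    // True only for IE versions below 10; treating null as false is an assumption.
    static boolean isEnableConditionalComments(Float ieVersion) {
        return ieVersion != null && ieVersion < 10.0f;
    }
}
```

With these assumptions, a User-Agent containing "MSIE 9.0" enables conditional comments, while one containing "MSIE 10.0" or no MSIE token does not.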
-
-