HEX

File: //proc/2/cwd/lib64/python2.7/site-packages/lxml/html/html5parser.pyc
�
�'�Yc@s�dZddlZddlZddlmZddlmZddlm	Z	ddl
mZmZm
Z
y
eZWnek
r�eefZnXyddlmZWn!ek
r�ddlmZnXyddlmZWn!ek
rddlmZnXd	efd
��YZyddlmZWnek
rAn Xdefd
��YZe�Zd�Zddd�Zeddd�Z eddd�Z!ddd�Z"ddd�Z#d�Z$e�Z%dS(s?
An interface to html5lib that mimics the lxml.html interface.
i����N(t
HTMLParser(tTreeBuilder(tetree(tElementtXHTML_NAMESPACEt_contains_block_level_tag(turlopen(turlparseRcBseZdZed�ZRS(s*An html5lib HTML parser with lxml as tree.cKs tj|d|dt|�dS(Ntstrictttree(t_HTMLParsert__init__R(tselfRtkwargs((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyRs(t__name__t
__module__t__doc__tFalseR(((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyRs(tXHTMLParserRcBseZdZed�ZRS(s+An html5lib XHTML Parser with lxml as tree.cKs tj|d|dt|�dS(NRR	(t_XHTMLParserRR(RRR
((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyR*s(RRRRR(((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyR'scCs6|j|�}|dk	r|S|jdt|f�S(Ns{%s}%s(tfindtNoneR(R	ttagtelem((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyt	_find_tag0scCs�t|t�std��n|dkr3t}ni}|dkr]t|t�r]t}n|dk	rv||d<n|j||�j�S(s�
    Parse a whole document into a string.

    If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.
    sstring requiredt
useChardetN(	t
isinstancet_stringst	TypeErrorRthtml_parsertbytestTruetparsetgetroot(thtmlt
guess_charsettparsertoptions((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pytdocument_fromstring7s		
cCs�t|t�std��n|dkr3t}ni}|dkr]t|t�r]t}n|dk	rv||d<n|j|d|�}|r�t|dt�r�|r�|dj�r�t	j
d|d��n|d=q�n|S(s`Parses several HTML elements, returning a list of elements.

    The first item in the list may be a string.  If no_leading_text is true,
    then it will be an error if there is leading text, and it will always be
    a list of only elements.

    If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.
    sstring requiredRtdivisThere is leading text: %rN(RRRRRRRt
parseFragmenttstripRtParserError(R"tno_leading_textR#R$R%tchildren((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pytfragments_fromstringOs"		
	
cCs;t|t�std��nt|�}t|d|d|d|�}|r�t|t�sgd}nt|�}|r�t|dt�r�|d|_|d=n|j|�n|S|s�tj	d��nt
|�dkr�tj	d	��n|d}|jr.|jj�r.tj	d
|j��nd|_|S(s�Parses a single HTML element; it is an error if there is more than
    one element, or if anything but whitespace precedes or follows the
    element.

    If 'create_parent' is true (or is a tag name) then a parent node
    will be created to encapsulate the HTML in a single element.  In
    this case, leading or trailing text is allowed.

    If `guess_charset` is true, the `chardet` library will perform charset
    guessing on the string.
    sstring requiredR#R$R+R'isNo elements foundisMultiple elements foundsElement followed by text: %rN(RRRtboolR-RttexttextendRR*tlenttailR)R(R"t
create_parentR#R$taccept_leading_texttelementstnew_roottresult((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pytfragment_fromstringqs2

	

	cCsAt|t�std��nt|d|d|�}|d }t|t�rd|jdd�}n|j�j�}|jd�s�|jd�r�|St	|d	�}t
|�r�|St	|d
�}t
|�dkr|js�|jj�r|dj
s|dj
j�r|d
St|�r4d|_n	d|_|S(s�Parse the html, returning a single element/document.

    This tries to minimally parse the chunk of text, without knowing if it
    is a fragment or a document.

    'base_url' will set the document's base_url attribute (and the tree's
    docinfo.URL)

    If `guess_charset` is true, or if the input is not Unicode but a
    byte string, the `chardet` library will perform charset guessing
    on the string.
    sstring requiredR$R#i2tasciitreplaces<htmls	<!doctypetheadtbodyii����iR'tspan(RRRR&Rtdecodetlstriptlowert
startswithRR1R/R)R2RR(R"R#R$tdoctstartR;R<((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyt
fromstring�s*
	
,"	cCs�|dkrt}nt|t�sB|}|dkr�t}q�nTt|�rrt|�}|dkr�t}q�n$t|d�}|dkr�t}ni}|r�||d<n|j	||�S(s*Parse a filename, URL, or file-like object into an HTML document
    tree.  Note: this returns a tree, not an element.  Use
    ``parse(...).getroot()`` to get the document root.

    If ``guess_charset`` is true, the ``useChardet`` option is passed into
    html5lib to enable character detection.  This option is on by default
    when parsing from URLs, off by default when parsing from file(-like)
    objects (which tend to return Unicode more often than not), and on by
    default when parsing from a file path (which is read in binary mode).
    trbRN(
RRRRRt_looks_like_urlRRtopenR (tfilename_url_or_fileR#R$tfpR%((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyR �s"		
cCsVt|�d}|stStjdkrN|tjkrNt|�dkrNtStSdS(Nitwin32i(RRtsystplatformtstringt
ascii_lettersR1R(tstrtscheme((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyRF�s(&RRKRMthtml5libRR
t html5lib.treebuilders.etree_lxmlRtlxmlRt	lxml.htmlRRRt
basestringRt	NameErrorRROturllib2RtImportErrorturllib.requestRturllib.parseRRtxhtml_parserRRR&RR-R8RDR RFR(((s;/usr/lib64/python2.7/site-packages/lxml/html/html5parser.pyt<module>sF




		!+6$