From: Ted Mielczarek <ted <at> mielczarek.org>
Subject: wget css parsing
Newsgroups: gmane.comp.web.wget.patches
Date: 2006-12-05 17:00:15 GMT

Hello,

I have updated my CSS parser patch to apply against trunk. Attached are the patch file and a textual changelog. This is a fairly sizable patch, so don't hesitate to ask me for more information. Several people have expressed interest in this patch, but I have had no response from the wget maintainers. Is this something you would be interested in taking into wget?

You can also browse and download a tarball of my source tree at:
http://ted.mielczarek.org/code/wget-modified/trunk/

Regards,
-Ted

Attachment (wget_simple_css_parser.patch): application/octet-stream, 51 KiB
wget.h:
  added TEXTCSS to dt flags enum

convert.h:
  added link_css_p to struct urlpos
  added link_expect_css to struct urlpos
  added downloaded_css_set hash table
  added register_css function prototype

convert.c:
  added #include "html-url.h"
  added downloaded_css_set hash table
  added register_css function
  moved most of convert_all_links to function convert_links_in_hashtable
  call convert_links_in_hashtable for each of downloaded_{html,css}_set
  made convert_links_in_hashtable handle css files
  added replace_plain function

retr.c:
  added #include "html-url.h"
  added TEXTCSS -> register_css in retrieve_url

http.c:
  added #define TEXTCSS_S
  added check for CSS content type -> TEXTCSS flag
  added function ensure_extension
  changed code to use ensure_extension for HTML and CSS files
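
For readers without the patch at hand, the idea behind ensure_extension can be sketched roughly as follows. This is only an illustration under assumptions: the name ensure_extension_sketch, its signature, and the helper has_suffix_nocase are hypothetical and are not taken from the patch, which works on wget's own HTTP state rather than on bare strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if NAME ends with EXT, ignoring case. */
static int
has_suffix_nocase (const char *name, const char *ext)
{
  size_t nlen = strlen (name);
  size_t elen = strlen (ext);

  if (elen > nlen)
    return 0;
  name += nlen - elen;
  for (; *ext; name++, ext++)
    if (tolower ((unsigned char) *name) != tolower ((unsigned char) *ext))
      return 0;
  return 1;
}

/* Hypothetical stand-in for ensure_extension: return a newly allocated
   copy of NAME that is guaranteed to end in EXT (e.g. ".html" or ".css").
   The caller frees the result.  Returns NULL if allocation fails. */
static char *
ensure_extension_sketch (const char *name, const char *ext)
{
  size_t nlen, elen;
  int add;
  char *out;

  assert (name != NULL && ext != NULL);
  nlen = strlen (name);
  elen = strlen (ext);
  add = !has_suffix_nocase (name, ext);

  out = malloc (nlen + (add ? elen : 0) + 1);
  if (!out)
    return NULL;
  memcpy (out, name, nlen + 1);          /* copy name and its NUL */
  if (add)
    memcpy (out + nlen, ext, elen + 1);  /* append ext and its NUL */
  return out;
}
```

Sharing one helper between the HTML and CSS cases keeps the suffix logic in a single place, which is presumably why the change was factored this way.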

recur.h:
  removed prototypes for functions from html-url.c

recur.c:
  added #include "html-url.h"
  added #include "css-url.h"
  added 'css_allowed' to enqueue/dequeue_url functions, struct queue_element
  modified retrieve_tree to handle css_allowed, set descend properly,
    call get_urls_css_file

html-url.h:
  added prototypes from recur.h
  added prototype for append_url
  added definition of struct map_context

html-url.c:
  added #include "html-url.h"
  removed definition of struct map_context
  added ATTR_POS and ATTR_SIZE to shorten calls to append_url
  changed append_url to be non-static and take position and size
    as parameters instead of tag/attrind
  modified tag_handle_link to set link_expect_css for link rel="stylesheet"
  added "style" to additional_attributes array
  added check_style_attr function
  modified collect_tags_mapper to call check_style_attr, and handle
    uninteresting tags
  added check in collect_tags_mapper 
  modified get_urls_html to call map_html_tags with NULL as the
    interesting_tags parameter, so we receive all tags
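
To illustrate why append_url now takes a position and a size, here is a rough sketch of locating a url(...) reference inside a style attribute value. It is not the parser from the patch; find_css_url is a hypothetical name, and the sketch assumes lowercase "url(" and ignores CSS comments and escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner: find the next url(...) in CSS at or after offset
   FROM (which must not exceed strlen (CSS)).  On success, store the offset
   and length of the URL text, with quotes and surrounding whitespace
   excluded, in *POS and *SIZE and return 1.  Return 0 if no complete
   url(...) is found. */
static int
find_css_url (const char *css, size_t from, size_t *pos, size_t *size)
{
  const char *p, *start, *end;
  char quote = 0;

  assert (css != NULL && pos != NULL && size != NULL);

  p = strstr (css + from, "url(");
  if (!p)
    return 0;
  p += 4;                               /* skip "url(" */
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '"' || *p == '\'')
    quote = *p++;                       /* remember the opening quote */

  start = p;
  while (*p && *p != (quote ? quote : ')'))
    p++;
  if (!*p)
    return 0;                           /* unterminated reference */

  end = p;
  if (!quote)                           /* trim blanks before ')' */
    while (end > start && isspace ((unsigned char) end[-1]))
      end--;

  *pos = (size_t) (start - css);
  *size = (size_t) (end - start);
  return 1;
}
```

Reporting an offset and a length rather than a tag and attribute index means the same link-recording code can serve both HTML attributes and raw CSS text, where there is no tag to point at.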

html-parse.c:
  added struct tagstack_item (for tag stack, a doubly linked list)
  added functions tagstack_push, tagstack_pop, tagstack_find
  added head and tail tag stack pointers in map_html_tags
  in map_html_tags, if not an end tag, push tag details to the stack
  when end of tag is found, set contents_begin on stack
  before calling map function, if an end tag, find matching start tag
    on the stack, pop it, and save contents info in taginfo struct
  cleanup tag stack when finished
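
The tag stack described above could look roughly like the following. This is a minimal sketch, not the code from the patch: the field names, the exact signatures, and the choice to discard any unclosed tags above the popped item are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the struct in the patch may carry other fields. */
struct tagstack_item {
  const char *tagname_begin;    /* first character of the tag name */
  const char *tagname_end;      /* one past the last character */
  const char *contents_begin;   /* set once the end of the start tag is seen */
  struct tagstack_item *prev;
  struct tagstack_item *next;
};

/* Append a zeroed item at the tail and return it, or NULL on failure. */
static struct tagstack_item *
tagstack_push (struct tagstack_item **head, struct tagstack_item **tail)
{
  struct tagstack_item *ts = calloc (1, sizeof *ts);
  if (!ts)
    return NULL;
  if (*head == NULL)
    *head = ts;
  else
    {
      (*tail)->next = ts;
      ts->prev = *tail;
    }
  *tail = ts;
  return ts;
}

/* Remove TS and everything pushed after it (assumed policy: tags left
   unclosed above the matched start tag are discarded with it). */
static void
tagstack_pop (struct tagstack_item **head, struct tagstack_item **tail,
              struct tagstack_item *ts)
{
  assert (ts != NULL);
  if (ts->prev)
    ts->prev->next = NULL;
  else
    *head = NULL;
  *tail = ts->prev;
  while (ts)
    {
      struct tagstack_item *next = ts->next;
      free (ts);
      ts = next;
    }
}

/* Search from the tail for the most recent item whose name matches
   [NAME_BEGIN, NAME_END).  Comparison is case-sensitive in this sketch. */
static struct tagstack_item *
tagstack_find (struct tagstack_item *tail, const char *name_begin,
               const char *name_end)
{
  size_t len = (size_t) (name_end - name_begin);
  for (; tail; tail = tail->prev)
    if ((size_t) (tail->tagname_end - tail->tagname_begin) == len
        && memcmp (tail->tagname_begin, name_begin, len) == 0)
      return tail;
  return NULL;
}
```

Keeping both head and tail pointers makes the push and the search from the most recent tag cheap, and calling tagstack_pop on the head item clears the whole stack when parsing finishes.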