Procedures and Control (Introduction to Web Browsers)
In Unit 1, you wrote a program to extract the first link from a web page. The next step towards building your search engine is to extract all of the links from a web page. In order to write a program to extract all of the links, you need to know these two key concepts:
1. Procedures – a way to package code so it can be reused with different inputs.
2. Control – a way to have the computer execute different instructions depending on the data (instead of just executing instructions one after the other).
In this unit, you will learn three important programming constructs: procedures, if statements, and while loops.
Procedures, also known in Python as “functions,” enable you to abstract code from its inputs;
if statements allow you to write code that executes differently depending on the data; and
while loops provide a convenient way to repeat the same operations many times. You will combine these to solve the problem of finding all of the links on a web page.
Procedures
A procedure takes in inputs, does some processing, and produces outputs. This lets you use a few lines of code to do many different things.
Let’s consider how to turn the code for finding the first link into a get_next_target procedure that finds the next link target in the page contents. The input to this procedure is a string giving the rest of the content of the web page. Its outputs are the URL and end_quote, the position of the quote that closes the URL, which marks where the search for the following link should resume.
def get_next_target(page):
    # Find where the next link tag starts.
    start_link = page.find('<a href=')
    # The URL sits between the next two double quotes.
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote
Procedures are a very important concept: the core of programming is breaking problems into procedures and implementing those procedures.
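As a minimal sketch of how the procedure could be reused, the example below calls get_next_target twice on a small sample page. The page string and its example.com URLs are invented for illustration, and print is written in Python 3 form.

```python
def get_next_target(page):
    # Find where the next link tag starts.
    start_link = page.find('<a href=')
    # The URL sits between the next two double quotes.
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


# Hypothetical sample page, made up for this example.
page = 'Hello <a href="http://example.com/a">A</a> and <a href="http://example.com/b">B</a>'

first_url, end_quote = get_next_target(page)
print(first_url)   # http://example.com/a

# Calling the same procedure on the rest of the page finds the second link.
second_url, _ = get_next_target(page[end_quote:])
print(second_url)  # http://example.com/b
```

The same few lines of code handle both links; only the input changes.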
Making Decisions
Now, let’s figure out a way to make code behave differently based on decisions. To do so, you need a way to make comparisons, which give you a way to test a condition and ultimately allow your program to decide what to do.
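To make the idea of a comparison concrete, here is a small sketch. Each comparison evaluates to True or False; the sample page string and the has_link name are assumptions made up for this example.

```python
print(2 < 3)        # True
print(7 * 3 == 21)  # True
print(7 != 7)       # False

# A comparison can test the result of another operation.
# find returns -1 when the text it searches for is not present.
page = 'plain text with no links'   # hypothetical sample input
has_link = page.find('<a href=') != -1
print(has_link)     # False
```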
* If statements: an if statement executes the code inside it only if its condition is true. An else clause can follow an if statement to provide the code that runs when the condition is false, and elif adds a further condition to test.
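A short sketch of if, elif, and else working together. The describe procedure is a hypothetical example, not part of the search engine code.

```python
def describe(n):
    # Exactly one of the three branches runs for any number n.
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'


print(describe(-4))  # negative
print(describe(0))   # zero
print(describe(9))   # positive
```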
Loops
Loops are used to run the same code again and again for as long as a certain condition remains true.
* While loop: as long as the condition is true, the code inside the loop body is executed. A break statement inside the loop stops the loop immediately.
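The two ways of ending a while loop can be sketched as follows. Both procedures are hypothetical examples: the first stops when its condition becomes false, the second uses while True and leaves the loop with break.

```python
def count_up_to(n):
    # The loop body repeats as long as i <= n is true.
    numbers = []
    i = 1
    while i <= n:
        numbers.append(i)
        i = i + 1
    return numbers


def first_multiple_of(k, start):
    # Assumes k is not zero. The condition is always True,
    # so the loop only ends when break is reached.
    candidate = start
    while True:
        if candidate % k == 0:
            break
        candidate = candidate + 1
    return candidate


print(count_up_to(3))             # [1, 2, 3]
print(first_multiple_of(7, 10))   # 14
```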
You now know enough to reach the goal of this unit: writing code that extracts all of the links on a web page.
Before using the get_next_target procedure, though, there is a small problem we have not considered: what if there are no links on the page? What output would you expect from the code?
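For a page that contains no link and no double quote characters, the procedure returns the whole page content except its last character as the URL. When the find operation does not find what it is looking for, it returns -1, so start_link, start_quote, and end_quote all end up as -1. When -1 is used as the end of a slice, as in page[0:-1], the slice stops just before the last character of the string.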
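This behaviour can be checked with a small sketch. It uses the first version of get_next_target, without any check for a missing link, and a made-up page string that contains no link and no double quotes.

```python
def get_next_target(page):
    # First version: no check for a missing link.
    start_link = page.find('<a href=')
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


page = 'this page has no links and no quotes'  # hypothetical input

print(page.find('<a href='))  # -1

url, end_quote = get_next_target(page)
print(url)        # this page has no links and no quote
print(end_quote)  # -1
```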
So the get_next_target function becomes:
def get_next_target(page):
    start_link = page.find('<a href=')
    # find returns -1 when there is no link left on the page.
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote
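A quick check of both cases, as a sketch. The two input strings are invented for this example.

```python
def get_next_target(page):
    start_link = page.find('<a href=')
    # find returns -1 when there is no link left on the page.
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


# Hypothetical inputs: one page without a link, one with a single link.
print(get_next_target('no links here'))
# (None, 0)
print(get_next_target('<a href="http://example.com">x</a>'))
# ('http://example.com', 27)
```

Returning None for the URL gives the caller a simple value to test before using the result.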
The main code for finding all of the links on a web page, which uses the previous function, is as follows.
NOTE: use a get_page(url) function to get the source code of the web page. get_page is not built into Python, so it has to be defined separately.
def print_all_links(page):
    while True:
        url, endpos = get_next_target(page)
        if url:
            print(url)
            # Continue searching after the link that was just found.
            page = page[endpos:]
        else:
            # No more links: leave the loop.
            break
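Putting the pieces together, here is a self-contained sketch that can be run without a network connection. It makes several assumptions: get_all_links is a hypothetical variant that collects the links in a list instead of printing them, the sample page is invented, and get_page is only one possible stand-in written with Python 3's urllib.request, not necessarily the helper the unit has in mind.

```python
from urllib.request import urlopen


def get_page(url):
    # Hypothetical stand-in for get_page; needs network access when called.
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


def get_all_links(page):
    # Same loop as print_all_links, but the URLs are collected in a list.
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links


# Hypothetical sample page, so the example runs offline.
sample = ('<a href="http://example.com/a">A</a> some text '
          '<a href="http://example.com/b">B</a>')
print(get_all_links(sample))
# ['http://example.com/a', 'http://example.com/b']
```

With network access, the same procedure could be applied to a real page by passing it the string returned by get_page.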