- User stories can be split, actually they should be split to accommodate them in a three to five days time frame
- User stories in the Sprint Backlog must include all necessary work; development, infodev and QE (Quality Engineering) tasks would need to be considered
- User stories should be delivered on a weekly basis, we should try to avoid at all cost to deliver all user stories by the end of the Sprint. On the contrary, our target should be to deliver small increments of implemented and tested functionality each week, see the bar chart below

A collateral benefit of having weekly deliverables is that by doing this, we'll contribute to distributing work evenly during the Sprint-sustainable phase concept-avoiding peaks at the end of the Sprint that usually cause to have implemented but not tested user stories.

I think that the key for this to happen is good planning, good in the sense of team's involvement but not description of tasks to the last detail. It's very common that planning is more an individual effort focused on estimating hours the more accurately than experience and guessing can permit.
Scrum approach for this is quiet different, firstly, planning is a team's activity were everybody has an stake. Secondly, it's divided in two parts: one for understanding what has to be done for each user story, and the other for providing time estimations. Both sessions combined should not take more than eight hours-it's recommended that the team invest four hours tops in each part.
The first part is also good for developers and QEs to communicate their understanding of what needs to be done; feedback from the whole team is what in general would increase the degree of understanding of what would be done and how it would be tested.
For the part where time estimations should be provided, I've been working in an approach that might work, no magic recipe that can be extrapolated to all teams but certainly the essence can be adapted to team's particular needs.
Let's start recognizing that estimations will always be inaccurate, no matter how much effort and time you invest. Further, estimating only hours is by far the most inaccurate way to estimate tasks duration.
An alternative is to estimate size and complexity. My preferred technique for this is a vote by a show of hands. If size and complexity are defined next comes priority, again the team would need to decide what they'd need to tackle first. This goes hand in hand with Sprint Backlog prioritization.
Practical experience shows that having a target date for implementation helps QE to get prepared for receiving implemented user stories. It's also a good idea to provide an estimation for this.
Hours come next and again the team should vote for estimating them to tasks. This voting would take into consideration the other already estimated parameters and hopefully would conduct to a more educated guessing. Again, accuracy is not what we're looking for.
Lastly, team members can start to volunteer for the work they want to do for the Sprint. Work balancing would be oriented to balance complexity and size more than just hours. I present and spreadsheet that summarizes this approach:
 
 
 




















 

